Expression Parsing in Data Structure
An expression is a statement that generates a value on evaluation. Parsing means analyzing a string or a set of symbols one by one depending on a particular criterion. Expression parsing a term used in a programming language to evaluate arithmetic and logical expressions.
Whenever we input an expression into the computer, it reads it as a string and parses it to generate the relevant output. An expression looks something as follows: a+b^c-d+x*p.
Expression Parsing in Data Structure
In data structures, expression parsing is used to evaluate logical and arithmetic expressions. We can describe arithmetic operations in the form of a notation. There are three ways to write an arithmetic expression i.e. three different notations for arithmetic expressions:
1. Infix notation
2. Postfix notation
3. Prefix notation
We need to note here that a notation is just a form of representation. The output of the expression remains the same irrespective of the notation used.
Before digging deeper into these notations, let us first understand the concept of precedence and associativity.
Associativity in Expression Parsing
Associativity comes into the picture when an expression contains an operator more than once. It is also used whenever an expression has two or more operators of the same precedence.
In such cases, we need to decide the priority/order of evaluation of every operator. Associativity can be either:
1. Left to right or
2. Right to left
Example of Expression Parsing
Consider the expression p+q-r.
Here, both + and – have the same precedence, so which of the two will evaluate first? This decision will be made by associativity. Both + and – have left to right associativity. Thus, the addition will evaluate before subtraction.
Let an expression be 5+10-4. We will evaluate (5 + 10) first which is equal to 15. Then we will perform subtraction i.e. 15 – 4 = 11.
Thus, the outcome of the expression will be 11.
Example 2:
Consider the expression p*q/r. Both * and / have the same precedence but the associativity is left to right.
Hence multiplication will occur first.
Let an expression be 20*5/10. Firstly, 20*5 will evaluate which gives out 100. Then division will evaluate i.e. 100/10 = 10.
Thus, 10 will be the outcome.
Precedence in Expression Parsing
Precedence defines the order which the operators must follow during the evaluation of an expression. The order of operators is always decided by the precedence first.
If precedence fails to decide the order, we take the help of associativity to do so. This mostly occurs when an expression contains two or more operators of the same precedence.
The following table shows the decreasing order of precedence of operators along with their associativity.
S. No. | Operator | Description | Associativity |
1 | ( ) | Parentheses | Left to right |
2 | [ ] | Brackets | |
3 | . | Object | |
4 | -> | Pointer | |
5 | ++ — | Postfix increment and decrement | |
6 | ++ — | Prefix increment and decrement | Right to left |
7 |
|
Unary plus/minus | |
8 | !~ | Logical negation and bitwise complement | |
9 | (type) | Cast | |
10 | * | Dereference operator | |
11 | & | Address of operator | |
12 | sizeof | Unary operator, returns size in bytes. | |
13 | * / % | Multiplication/ division/ modulus | Left to right |
14 |
|
Addition/ Subtraction | |
15 | << >> | Bitwise left shift/ Bitwise right shift | Left to right |
16 | < <= | Relation less than, less than or equal to | Left to right |
17 | > >= | Relational greater than, greater than or equal to | Left to right |
18 | == != | Is equal to, Not equal to | Left to right |
19 | & | Bitwise AND | Left to right |
20 | ^ | Bitwise exclusive OR | Left to right |
21 | | | Bitwise inclusive OR | Left to right |
22 | && | Logical AND | Left to right |
23 | || | Logical OR | Left to right |
24 | ? : | Ternary operator | Left to right |
25 | = | Assignment operator | Right to left |
26 | += -= | Addition/Subtraction assignment | |
27 | *= /= | Multiplication/division assignment | |
28 | %= &= | Modulus/Bitwise AND assignment | |
29 | ^= |= | Bitwise exclusive/ Bitwise inclusive OR | |
30 | <<= >>= | Bitwise left shift/ Bitwise right shift assignment | |
31 | , | Comma operator | Left to right |
Types of Notations:
As discussed above, there are three types of notations:
1. Infix
2. Postfix
3. Prefix
Let us discuss them in detail.
1. Infix Notation
In this notation, the operator is in-between both the operands. It is the easiest readable form of expression for humans. However, it is very complex for the computer. This is why whenever we input an infix expression into the program, the compiler first converts it into postfix notation and then evaluates it.
For a computer, infix notation requires a lot of memory and time for computation. The following expression represents an infix notation: (p+q) * (r-s)
2. Postfix notation
It is also known as reverse polish notation. In postfix notation, the operators are written after the operand. The following expression represents a postfix notation: pq+rs-*
3. Prefix Notation
It is also known as polish notation. In this notation, the operator comes before the operands. The following expression represents the prefix notation: *+pq-rs
Expression Parsing in Data Structure
To parse any expression, the first thing we need is to take into consideration associativity and the order of precedence.
But why do we need to parse an expression? The answer is simple. The algorithms that are designed in infix are not very efficient. Therefore, we need to convert the infix expressions into a prefix or postfix expression.
Let us check two different conversions:
1. Infix to postfix
2. Infix to prefix
1. Infix to Prefix Conversion:
We can convert an infix expression into a prefix or postfix expression without or with using stack.
Let us first see an infix conversion into prefix without the help of a stack.
Consider an expression: A+((B*C-D/E)^F^G)+H
Step 1: First, we will solve B*C i.e. A + ((*BC – D/E) ^F ^ G) + H
Step 2: A + ((*BC – /DE) ^F ^ G) + H
Step 3: A + (( -*BC /DE) ^F ^ G) + H
Step 4: A + (( -*BC /DE) ^ ^FG) + H
Step 5: A + (^ -*BC /DE ^FG) + H
Step 6: + A (^ -*BC /DE ^FG) + H
Step 7: + + A (^ -*BC /DE ^FG) H
Thus, according to the order of precedence and associativity, we have calculated the prefix expression from the infix expression.
Infix to Prefix Conversion Using Stack
Consider the expression: A+(B-C)*D-E/F
Step 1: Reverse the expression as shown: F/E-D*)C-B(+A
Step 2:
Input symbol | Stack | Prefix |
F | Empty | F |
/ | / | F |
E | / | FE |
– | – | FE/ |
D | – | FE/D |
* | -* | FE/D |
) | -*) | FE/D |
C | -*) | FE/DC |
Step 3: Reverse the results -+A*-BCD/EF
Infix to Postfix Conversion:
We can convert the infix expression to postfix expression with or without using stack. Let us see the conversion with the help of the stack.
Consider the following expression:
A+(B*C/D-E^F^I+G)/H
Its postfix will be:
Input symbol | Stack | Postfix |
A | Empty | A |
+ | + | A |
( | +( | A |
B | +( | AB |
* | +(* | AB |
C | +(* | ABC |
/ | +(/ | ABC* |
D | +(/ | ABC*D |
– | +(- | ABC*D/ |
E | +(- | ABC*D/E |
^ | +(-^ | ABC*D/E |
F | +(-^ | ABC*D/EF |
^ | +(-^^ | ABC*D/EF |
I | +(-^^ | ABC*D/EFI |
+ | +(+ | ABC*D/EFI^^- |
G | +(+ | ABC*D/EFI^^-G |
) | + | ABC*D/EFI^^-G+ |
/ | +/ | ABC*D/EFI^^-G+ |
H | +/ | ABC*D/EFI^^-G+H |
NULL | Empty | ABC*D/EFI^^-G+H/+ |
Thus, the outcome of the above expression in postfix is: ABC*D/EFI^^-G+H/+
Postfix evaluation:
The rules for postfix evaluation are:
- Whenever we encounter an operand, we push it into the stack.
- Whenever we encounter an operator in the postfix notation, we pop the top 2 elements out of the stack.
- The lower element i.e. the element at the position top-1 will be written first during evaluation.
For example: Let the postfix expression be 10 4 + 12 3 / *
Input | Current stack | Current symbol | Operations on stack |
10 4 + 12 3 / * | Empty | 10 | Push 10 |
4 + 12 3 / * | 10 | 4 | Push 4 |
+ 12 3 / * | 4, 10 | + | Pop 4,10 push 4+10 |
12 3 / * | 14 | 12 | Push 12 |
3 / * | 14, 12 | 3 | Push 3 |
/ * | 14, 12, 3 | / | Pop 3, 12 and push 12/3 |
* | 14, 4 | * | Pop 4, 14 and push 4*14 |
The result of this postfix evaluation is: 56
Algorithm for expression parsing:
Step 1: Scan all the symbols of the expression one by one.
Step 2: If the input character is an operand, we push it into the stack.
Step 3: If the input character is an operator, we pop it out and perform the relevant operation.
Step 4: Push the result of the operation back into the stack.
Step 5: Repeat steps 1 to 4 until we have performed all the operations.
Step 6: Pop the stack. Perform the final operation.
Postfix evaluation in C++
#include <bits/stdc++.h> using namespace std; int Precedence_Order(char opd){ if(opd == '+'||opd == '-') return 1; if(opd == '*'||opd == '/') return 2; return 0; } int Operation(int num1, int num2, char opd){ switch(op){ case '+': return num1 + num2; case '-': return num1 - num2; case '*': return num1 * num2; case '/': return num1 / num2; } } int Evaluate(string expression){ int i; stack <int> values; stack <char> ops; for(i = 0; i < expression.length(); i++){ if(expression[i] == ' ') continue; else if(expression[i] == '('){ ops.push(expression[i]); } else if(isdigit(expression[i])){ int val = 0; while(i < expression.length() && isdigit(expression[i])) { val = (val*10) + (expression[i]-'0'); i++; } values.push(val); i--; } else if(expression[i] == ')') { while(!ops.empty() && ops.top() != '(') { int val2 = values.top(); values.pop(); int val1 = values.top(); values.pop(); char opd = ops.top(); ops.pop(); values.push(Operation(val1, val2, opd)); } if(!ops.empty()) ops.pop(); } else { while(!ops.empty() && Precedence_Order(ops.top())>= Precedence_Order(expression[i])){ int val2 = values.top(); values.pop(); int val1 = values.top(); values.pop(); char opd = ops.top(); ops.pop(); values.push(Operation(val1, val2, opd)); } ops.push(expression[i]); } } while(!ops.empty()){ int val2 = values.top(); values.pop(); int val1 = values.top(); values.pop(); char opd = ops.top(); ops.pop(); values.push(Operation(val1, val2, opd)); } return values.top(); } int main() { cout << Evaluate("34 + 13 * 93") << "\n"; cout << Evaluate("85 / 23 - 16") << "\n"; cout << Evaluate("100 * ( 2 + 12 )") << "\n"; cout << Evaluate("121 * ( 5 - 1 ) / 11"); return 0; }
Expression parsing in Python:
def Precedence_Order(opd): if opd == '+' or opd == '-': return 1 if opd == '*' or opd == '/': return 2 return 0 def Operation(num1, num2, opd): if opd == '+': return num1 + num2 if opd == '-': return num1 - num2 if opd == '*': return num1 * num2 if opd == '/': return num1 // num2 def Evaluate(expression): values = [] ops = [] i = 0 while i < len(expression): if expression[i] == ' ': i += 1 continue elif expression[i] == '(': ops.append(expression[i]) elif expression[i].isdigit(): val = 0 while (i < len(expression) and expression[i].isdigit()): val = (val * 10) + int(expression[i]) i += 1 values.append(val) i-=1 elif expression[i] == ')': while len(ops) != 0 and ops[-1] != '(': val2 = values.pop() val1 = values.pop() opd = ops.pop() values.append(Operation(val1, val2, opd)) ops.pop() else: while (len(ops) != 0 and Precedence_Order(ops[-1]) >= Precedence_Order(expression[i])): val2 = values.pop() val1 = values.pop() opd = ops.pop() values.append(Operation(val1, val2, opd)) ops.append(expression[i]) i += 1 while len(ops) != 0: val2 = values.pop() val1 = values.pop() opd = ops.pop() values.append(Operation(val1, val2, opd)) return values[-1] if __name__ == "__main__": print(Evaluate("34 + 13 * 93")) print(Evaluate("85 / 23 - 16")) print(Evaluate("100 * ( 2 + 12 )")) print(Evaluate("121 * ( 5 - 1 ) / 11"))
Conclusion
Expression parsing is a very important concept of data structures because it tells us how a compiler evaluates the arithmetic expressions. As mentioned earlier, it is difficult for a compiler to solve the expressions in infix notations which is why it converts the expression into postfix or prefix.
Thus, it is very important for us to understand what are postfix and prefix notations and how a compiler works on them or evaluates them.