Expression Parsing in Data Structure

An expression is a statement that generates a value on evaluation. Parsing means analyzing a string or a set of symbols one by one depending on a particular criterion. Expression parsing a term used in a programming language to evaluate arithmetic and logical expressions.

Whenever we input an expression into the computer, it reads it as a string and parses it to generate the relevant output. An expression looks something as follows: a+b^c-d+x*p.

Expression Parsing in Data Structure

In data structures, expression parsing is used to evaluate logical and arithmetic expressions. We can describe arithmetic operations in the form of a notation. There are three ways to write an arithmetic expression i.e. three different notations for arithmetic expressions:

1. Infix notation
2. Postfix notation
3. Prefix notation

We need to note here that a notation is just a form of representation. The output of the expression remains the same irrespective of the notation used.

Before digging deeper into these notations, let us first understand the concept of precedence and associativity.

Associativity in Expression Parsing

Associativity comes into the picture when an expression contains an operator more than once. It is also used whenever an expression has two or more operators of the same precedence.

In such cases, we need to decide the priority/order of evaluation of every operator. Associativity can be either:

1. Left to right or
2. Right to left

Example of Expression Parsing

Consider the expression p+q-r.

Here, both + and – have the same precedence, so which of the two will evaluate first? This decision will be made by associativity. Both + and – have left to right associativity. Thus, the addition will evaluate before subtraction.

Let an expression be 5+10-4. We will evaluate (5 + 10) first which is equal to 15. Then we will perform subtraction i.e. 15 – 4 = 11.

Thus, the outcome of the expression will be 11.

Example 2:
Consider the expression p*q/r. Both * and / have the same precedence but the associativity is left to right.
Hence multiplication will occur first.

Let an expression be 20*5/10. Firstly, 20*5 will evaluate which gives out 100. Then division will evaluate i.e. 100/10 = 10.

Thus, 10 will be the outcome.

Precedence in Expression Parsing

Precedence defines the order which the operators must follow during the evaluation of an expression. The order of operators is always decided by the precedence first.

If precedence fails to decide the order, we take the help of associativity to do so. This mostly occurs when an expression contains two or more operators of the same precedence.

The following table shows the decreasing order of precedence of operators along with their associativity.

S. No. Operator Description  Associativity 
1 ( ) Parentheses  Left to right
2 [ ] Brackets 
3 . Object 
4 -> Pointer 
5 ++   — Postfix increment and decrement
6 ++  — Prefix increment and decrement Right to left
7
Unary plus/minus
8 !~ Logical negation and bitwise complement
9 (type) Cast 
10 * Dereference operator
11 & Address of operator
12 sizeof Unary operator, returns size in bytes.
13 * / % Multiplication/ division/ modulus  Left to right
14
Addition/ Subtraction
15 <<  >> Bitwise left shift/ Bitwise right shift Left to right
16 <  <= Relation less than, less than or equal to Left to right
17 >  >= Relational greater than, greater than or equal to Left to right
18 ==  != Is equal to, Not equal to Left to right
19 & Bitwise AND Left to right
20 ^ Bitwise exclusive OR Left to right
21 | Bitwise inclusive OR Left to right
22 && Logical AND Left to right
23 || Logical OR Left to right
24 ? : Ternary operator Left to right
25 = Assignment operator Right to left
26 +=  -= Addition/Subtraction assignment
27 *=  /= Multiplication/division assignment
28 %=  &= Modulus/Bitwise AND assignment 
29 ^=  |= Bitwise exclusive/ Bitwise inclusive OR
30 <<=  >>= Bitwise left shift/ Bitwise right shift assignment
31 , Comma operator Left to right

Types of Notations:

As discussed above, there are three types of notations:

1. Infix
2. Postfix
3. Prefix

Let us discuss them in detail.

1. Infix Notation

In this notation, the operator is in-between both the operands. It is the easiest readable form of expression for humans. However, it is very complex for the computer. This is why whenever we input an infix expression into the program, the compiler first converts it into postfix notation and then evaluates it.

For a computer, infix notation requires a lot of memory and time for computation. The following expression represents an infix notation: (p+q) * (r-s)

2. Postfix notation

It is also known as reverse polish notation. In postfix notation, the operators are written after the operand. The following expression represents a postfix notation: pq+rs-*

3. Prefix Notation

It is also known as polish notation. In this notation, the operator comes before the operands. The following expression represents the prefix notation: *+pq-rs

Expression Parsing in Data Structure

To parse any expression, the first thing we need is to take into consideration associativity and the order of precedence.

But why do we need to parse an expression? The answer is simple. The algorithms that are designed in infix are not very efficient. Therefore, we need to convert the infix expressions into a prefix or postfix expression.

Let us check two different conversions:
1. Infix to postfix
2. Infix to prefix

1. Infix to Prefix Conversion:

We can convert an infix expression into a prefix or postfix expression without or with using stack.

Let us first see an infix conversion into prefix without the help of a stack.

Consider an expression: A+((B*C-D/E)^F^G)+H

Step 1: First, we will solve B*C i.e. A + ((*BC – D/E) ^F ^ G) + H
Step 2: A + ((*BC – /DE) ^F ^ G) + H
Step 3: A + (( -*BC /DE) ^F ^ G) + H
Step 4: A + (( -*BC /DE) ^ ^FG) + H
Step 5: A + (^ -*BC /DE ^FG) + H
Step 6: + A (^ -*BC /DE ^FG) + H
Step 7: + + A (^ -*BC /DE ^FG) H

Thus, according to the order of precedence and associativity, we have calculated the prefix expression from the infix expression.

Infix to Prefix Conversion Using Stack

Consider the expression: A+(B-C)*D-E/F

Step 1: Reverse the expression as shown: F/E-D*)C-B(+A
Step 2:

Input symbol Stack  Prefix 
F Empty  F
/ / F
E / FE
FE/
D FE/D
* -* FE/D
) -*) FE/D
C -*) FE/DC

Step 3: Reverse the results -+A*-BCD/EF

Infix to Postfix Conversion:

We can convert the infix expression to postfix expression with or without using stack. Let us see the conversion with the help of the stack.

Consider the following expression:
A+(B*C/D-E^F^I+G)/H

Its postfix will be:

Input symbol Stack  Postfix 
A Empty  A
+ + A
( +( A
B +( AB
* +(* AB
C +(* ABC
/ +(/ ABC*
D +(/ ABC*D
+(- ABC*D/
E +(- ABC*D/E
^ +(-^ ABC*D/E
F +(-^ ABC*D/EF
^ +(-^^ ABC*D/EF
I +(-^^ ABC*D/EFI
+ +(+ ABC*D/EFI^^-
G +(+ ABC*D/EFI^^-G
) + ABC*D/EFI^^-G+
/ +/ ABC*D/EFI^^-G+
H +/ ABC*D/EFI^^-G+H
NULL Empty  ABC*D/EFI^^-G+H/+

Thus, the outcome of the above expression in postfix is: ABC*D/EFI^^-G+H/+

Postfix evaluation:

The rules for postfix evaluation are:

  • Whenever we encounter an operand, we push it into the stack.
  • Whenever we encounter an operator in the postfix notation, we pop the top 2 elements out of the stack.
  • The lower element i.e. the element at the position top-1 will be written first during evaluation.

For example: Let the postfix expression be 10 4 + 12 3 / *

Input  Current stack Current symbol Operations on stack
10 4 + 12 3 / * Empty  10 Push 10
4 + 12 3 / * 10 4 Push 4
+ 12 3 / * 4, 10 + Pop 4,10 push 4+10
12 3 / * 14 12 Push 12
3 / * 14, 12 3 Push 3
/ * 14, 12, 3 / Pop 3, 12 and push 12/3
* 14, 4 * Pop 4, 14 and push 4*14

The result of this postfix evaluation is: 56

Algorithm for expression parsing:

Step 1: Scan all the symbols of the expression one by one.
Step 2: If the input character is an operand, we push it into the stack.
Step 3: If the input character is an operator, we pop it out and perform the relevant operation.
Step 4: Push the result of the operation back into the stack.
Step 5: Repeat steps 1 to 4 until we have performed all the operations.
Step 6: Pop the stack. Perform the final operation.

Postfix evaluation in C++

#include <bits/stdc++.h>
using namespace std;
 
int Precedence_Order(char opd){
    if(opd == '+'||opd == '-')
        return 1;
    if(opd == '*'||opd == '/')
        return 2;
    return 0;
}
 
int Operation(int num1, int num2, char opd){
    switch(op){
        case '+': return num1 + num2;
        case '-': return num1 - num2;
        case '*': return num1 * num2;
        case '/': return num1 / num2;
    }
}
 
int Evaluate(string expression){
    int i;
     
    stack <int> values;
     
    stack <char> ops;
     
    for(i = 0; i < expression.length(); i++){
        
        if(expression[i] == ' ')
            continue;
         
        else if(expression[i] == '('){
            ops.push(expression[i]);
        }
         
        else if(isdigit(expression[i])){
            int val = 0;
             
            while(i < expression.length() && isdigit(expression[i]))
            {
                val = (val*10) + (expression[i]-'0');
                i++;
            }
             
            values.push(val);
             
              i--;
        }
         
       
        else if(expression[i] == ')')
        {
            while(!ops.empty() && ops.top() != '(')
            {
                int val2 = values.top();
                values.pop();
                 
                int val1 = values.top();
                values.pop();
                 
                char opd = ops.top();
                ops.pop();
                 
                values.push(Operation(val1, val2, opd));
            }
             
            
            if(!ops.empty())
               ops.pop();
        }
         
        
        else
        {
            
            while(!ops.empty() && Precedence_Order(ops.top())>= Precedence_Order(expression[i])){
                int val2 = values.top();
                values.pop();
                 
                int val1 = values.top();
                values.pop();
                 
                char opd = ops.top();
                ops.pop();
                 
                values.push(Operation(val1, val2, opd));
            }
             
            
            ops.push(expression[i]);
        }
    }
     
    
    while(!ops.empty()){
        int val2 = values.top();
        values.pop();
                 
        int val1 = values.top();
        values.pop();
                 
        char opd = ops.top();
        ops.pop();
                 
        values.push(Operation(val1, val2, opd));
    }
     
    
    return values.top();
}
 
int main() {
    cout << Evaluate("34 + 13 * 93") << "\n";
    cout << Evaluate("85 / 23 - 16") << "\n";
    cout << Evaluate("100 * ( 2 + 12 )") << "\n";
    cout << Evaluate("121 * ( 5 - 1 ) / 11");
    return 0;
}

Expression parsing in Python:

def Precedence_Order(opd):
    
    if opd == '+' or opd == '-':
        return 1
    if opd == '*' or opd == '/':
        return 2
    return 0
 
def Operation(num1, num2, opd):
     
    if opd == '+': return num1 + num2
    if opd == '-': return num1 - num2
    if opd == '*': return num1 * num2
    if opd == '/': return num1 // num2
 
def Evaluate(expression):
     
    values = []
    ops = []
    i = 0
     
    while i < len(expression):
         
        if expression[i] == ' ':
            i += 1
            continue
         
        elif expression[i] == '(':
            ops.append(expression[i])
         
        elif expression[i].isdigit():
            val = 0
             
            while (i < len(expression) and
                expression[i].isdigit()):
             
                val = (val * 10) + int(expression[i])
                i += 1
             
            values.append(val)
             
            i-=1
        
        elif expression[i] == ')':
         
            while len(ops) != 0 and ops[-1] != '(':
             
                val2 = values.pop()
                val1 = values.pop()
                opd = ops.pop()
                 
                values.append(Operation(val1, val2, opd))
            ops.pop()
         
        else:
        
            while (len(ops) != 0 and Precedence_Order(ops[-1]) >= Precedence_Order(expression[i])):
                val2 = values.pop()
                val1 = values.pop()
                opd = ops.pop()
                 
                values.append(Operation(val1, val2, opd))
             
            
            ops.append(expression[i])
         
        i += 1
     
    while len(ops) != 0:
         
        val2 = values.pop()
        val1 = values.pop()
        opd = ops.pop()
                 
        values.append(Operation(val1, val2, opd))
     
    return values[-1]
 
if __name__ == "__main__":
     
    print(Evaluate("34 + 13 * 93"))
    print(Evaluate("85 / 23 - 16"))
    print(Evaluate("100 * ( 2 + 12 )"))
    print(Evaluate("121 * ( 5 - 1 ) / 11"))

Conclusion

Expression parsing is a very important concept of data structures because it tells us how a compiler evaluates the arithmetic expressions. As mentioned earlier, it is difficult for a compiler to solve the expressions in infix notations which is why it converts the expression into postfix or prefix.

Thus, it is very important for us to understand what are postfix and prefix notations and how a compiler works on them or evaluates them.