The internet abounds with tutorials on programming projects like building
a calculator, for instance. The focus is often on the GUI aspects, ignoring
the complexities of expression evaluation. Mathematical expressions are usually
evaluated by the eval
function. This is not a safe method if arbitrary input
is allowed. In this and the following two posts, I will describe four
ways in which arithmetic expressions can be evaluated, starting with the
simplest - the dubious eval
. (I will not dwell on the GUI part of a
calculator.) The ways are:
eval
builtin functionThis post deals with options 1 and 2.
This can be used to evaluate any Python expression, not just arithmetic ones.
The syntax is eval(expression[, globals [, locals]])
. The optional globals
and locals are the global and local namespaces. If locals is omitted, it
defaults to the global namespace. If both are omitted, the current global and
local variables are used. A simple example of its usage is shown below:
>>> eval('2 * (3 + 9) / 8 - 1')
2.0
Note that since any Python expression can be evaluated, this method is unsafe.
User input can be a string containing malicious code which eval
would process
in the same way it would an arithmetic expression. So, though simple to use, it
is not suitable for arbitrary input in an application.
This is a well-known algorithm in Computer Science that converts mathematical
expressions in infix notation to the postfix form. Infix expressions are
those in which the operator lies in between the operands, like 2 + 3 * 5
.
On the other hand, in postfix expressions, the operator comes after the operands.
The postfix form of the expression 2 + 3 * 5
is 235*+
. Postfix notation is
also known as RPN or Reverse Polish Notation.
The algorithm is so named because its operation resembles that of a railroad shunting yard.
The algorithm parses the string by adding operands to an output string and pushing operators to an operator stack. It pops the stack and adds the operators to the output string based on their precedence and associativity. Specifically, if the incoming (from the input string) operator has a lower precedence than the operator on the stack or if the precedences are equal and the incoming operator is left associative, then the operator on the stack is popped and added to the output. After the input is exhausted, the remaining operators on the stack are popped off and added to the output to get the final postfix string.
Suppose the input consists of the string 10 + ( 3 * 2 ) ^ 2 ^ 3 - 25 / 5
.
The following table shows the algorithm in action -
Token | Stack | Output | Comments |
---|---|---|---|
10 | 10 | Number gets appended to output. |
|
+ | + | 10 | Symbol gets pushed onto stack. |
( | (+ | 10 | Left parenthesis is put on stack |
3 | (+ | 10 3 | Number appended to output |
* | *(+ | 10 3 | Symbol is put on stack. |
2 | *(+ | 10 3 2 | Number appended to output. |
) | + | 10 3 2 * | Stack gets popped until left parenthesis which is discarded. |
^ | ^+ | 10 3 2 * | Higher precedence operator put on stack. |
2 | ^+ | 10 3 2 * 2 | Number appended to output. |
^ | ^^+ | 10 3 2 * 2 | Equal precedence, right associative operator is put on stack. |
3 | ^^+ | 10 3 2 * 2 3 | Number appended to output. |
- | - | 10 3 2 * 2 3 ^ ^ + | Higher and equal precedence (incoming is left associative) operators popped and incoming operator pushed. |
25 | - | 10 3 2 * 2 3 ^ ^ + 25 | Number appended to output. |
/ | /- | 10 3 2 * 2 3 ^ ^ + 25 | Higher precedence operator pushed onto stack. |
5 | /- | 10 3 2 * 2 3 ^ ^ + 25 5 | Number appended to output. |
10 3 2 * 2 3 ^ ^ + 25 5 / - | Remaining operators popped and added to output. |
Thus the final postfix form is 10 3 2 * 2 3 ^ ^ + 25 5 / -
.
The Python code is described below in parts and then the whole code is presented
for reference. First we initialize 2 dictionaries, PREC
for holding the
operator precedences and ASSOC
for holding the operator associativities.
PREC = {'+': 1, '-': 1, '*': 2, '/': 2, '^': 3}
ASSOC = {'+': 'left', '-': 'left', '*': 'left', '/': 'left', '^': 'right'}
The input string, an infix expression, is split into tokens by the following code. Note that I have left out error handling for incorrectly formed input for simplicity's sake. The input string is split into tokens by a space character.
def get_next_token(s):
'''Split the input expression into a list of tokens - numbers and
operators.
'''
tokens = s.split(' ')
for token in tokens:
yield token
We compare precedences of operators t1
- the next token from the input string -
with t2
- the operator on top of the stack. If t1
has a higher precedence or
if precedences are equal and the incoming operator has right associativity, then
return True
- meaning that the incoming operator will be pushed onto the stack.
False
means that the stack be popped and the operator will be appended to
output. Note that the left parenthesis is considered to have the lowest precedence,
so the function returns True
if t2
is equal to it.
def compare(t1, t2):
'''Check if the operator from the input has a higher priority than the
operator on the stack.
'''
if t2 == '(':
return True
if (PREC[t1] > PREC[t2] or
PREC[t1] == PREC[t2] and
ASSOC[t1] == 'right'):
return True
else:
return False
The next function forms the bulk of the processing. An operator from the input
string (hereafter, 'the token') is compared with the stack top. If the token is a
left parenthesis, it is simply pushed on to the stack. If the token is a right
parenthesis, operators are popped from the stack until a left parenthesis appears
(which is discarded) and appended to the return value from the function. The
return value is a space separated string. Any other token is compared with the
stack top. If it has a higher priority as determined by the compare
function
above, then it is pushed on to the stack. If not, then the stack is successively
popped and the symbols appended to the return value until compare
returns True
.
Then it pushes the token onto the stack. If while popping the stack becomes empty,
the while
loop is terminated. The final else
simply pushes the token onto the
stack if it is empty. retval
is returned from the function.
def process(token, stack):
'''If the next input token is an operator, take action based on the contents
of the stack. Return a string that will be appended to the output.
'''
retval = ''
if stack:
if token == '(':
stack.append(token)
elif token == ')':
while stack[-1] != '(':
retval += stack.pop() + ' '
stack.pop()
elif compare(token, stack[-1]):
stack.append(token)
else:
while not compare(token, stack[-1]):
retval += stack.pop() + ' '
if not stack:
break
stack.append(token)
else:
stack.append(token)
return retval
The next function is the driver for the process. It initializes the stack and
processes the input string in a loop by calling get_next_token
. If the token is
a number, it is appended to the output. If not, then compare
is called and its
result is appended to the output. After the input string is exhausted, the
remaining operators on the stack are popped and appended to the output. Then
the output, which is the final postfix expression, is returned.
def infix_to_postfix(s):
'''Convert an infix expression to postfix form.'''
stack = []
output = ''
for token in get_next_token(s):
try:
int(token)
except ValueError:
output += process(token, stack)
else:
output += token + ' '
while stack:
output += stack.pop() + ' '
return output
The main
function is simple. It gets the infix expression from the user and calls
infix_to_postfix
to convert it to postfix, which is then printed out.
def main():
'''Get infix expression and convert it to postfix.'''
infix = input('Enter a space separated infix expression ')
postfix = infix_to_postfix(infix)
print('The corresponding postfix expression is ', postfix)
The final piece of code executes the whole program as a script:
if __name__ == "__main__":
main()
When the program (called shunting.py
) is run from a terminal with our example
infix string, we get the following response:
dp@asus:~/blog$ python3 shunting.py
Enter a space separated infix expression 10 + ( 3 * 2 ) ^ 2 ^ 3 - 25 / 5
The corresponding postfix expression is 10 3 2 * 2 3 ^ ^ + 25 5 / -
The complete code for shunting.py
is given below for your reference.
PREC = {'+': 1, '-': 1, '*': 2, '/': 2, '^': 3}
ASSOC = {'+': 'left', '-': 'left', '*': 'left', '/': 'left', '^': 'right'}
def get_next_token(s):
'''Split the input expression into a list of tokens - numbers and
operators.
'''
tokens = s.split(' ')
for token in tokens:
yield token
def compare(t1, t2):
'''Check if the operator from the input has a higher priority than the
operator on the stack.
'''
if t2 == '(':
return True
if (PREC[t1] > PREC[t2] or
PREC[t1] == PREC[t2] and
ASSOC[t1] == 'right'):
return True
else:
return False
def process(token, stack):
'''If the next input token is an operator, take action based on the contents
of the stack. Return a string that will be appended to the output.
'''
retval = ''
if stack:
if token == '(':
stack.append(token)
elif token == ')':
while stack[-1] != '(':
retval += stack.pop() + ' '
stack.pop()
elif compare(token, stack[-1]):
stack.append(token)
else:
while not compare(token, stack[-1]):
retval += stack.pop() + ' '
if not stack:
break
stack.append(token)
else:
stack.append(token)
return retval
def infix_to_postfix(s):
'''Convert an infix expression to postfix form.'''
stack = []
output = ''
for token in get_next_token(s):
try:
int(token)
except ValueError:
output += process(token, stack)
else:
output += token + ' '
while stack:
output += stack.pop() + ' '
return output
def main():
'''Get infix expression and convert it to postfix.'''
infix = input('Enter a space separated infix expression ')
postfix = infix_to_postfix(infix)
print('The corresponding postfix expression is ', postfix)
if __name__ == "__main__":
main()
The program to evaluate a postfix expression is simple. Again it uses a stack, but this time it is an operand stack. Numbers in the input expression are pushed on to the stack. When the input token is an operator, the top 2 elements of the stack are popped, the operator is applied to them and the result is pushed back on to the stack. When a correctly framed postfix expression is processed in this way, the stack will contain only one element at the end which will be the result of the postfix evaluation.
The program, which I call posteval.py
, can be run in the terminal, with the
postfix expression we obtained from the conversion process above, as follows:
dp@asus:~/blog$ python3 posteval.py
Enter the postfix expression 10 3 2 * 2 3 ^ ^ + 25 5 / -
The value of entered postfix expression is 1679621.0
To check if the value is correct, we can evaluate the infix form of the expression in a Python REPL as follows:
>>> 10 + ( 3 * 2 ) ** 2 ** 3 - 25 / 5
1679621.0
So the value obtained by evaluating the postfix is correct.
The complete code of posteval.py
is given below. (I have left out error checking
for improper input in order to focus on the evaluation.)
def get_next_token(s):
'''Split the input expression into a list of tokens - numbers and
operators.
'''
tokens = s.split(' ')
for token in tokens:
yield token
def process_operator(op, stack):
'''Process operator from the input string and the operand stack.'''
y = stack.pop()
x = stack.pop()
if op == '+':
z = x + y
elif op == '-':
z = x - y
elif op == '*':
z = x * y
elif op == '/':
z = x / y
elif op == '^':
z = x ** y
stack.append(z)
def postfix_eval(s):
'''Evaluate a postfix expression.'''
stack = []
for token in get_next_token(s):
try:
x = int(token)
except ValueError:
process_operator(token, stack)
else:
stack.append(x)
return stack.pop()
def main():
'''Get the postfix string from the user and evaluate it.'''
postfix = input('Enter the postfix expression ')
result = postfix_eval(postfix)
print('The value of entered postfix expression is', result)
if __name__ == "__main__":
main()
So, in this post, you learned about two ways of evaluating an arithmetic expression, namely,
eval
functionIn the next post, I will build what is known as a syntax directed translator that evaluates an expression by following a grammar. This draws from compiler theory which I will touch upon briefly.