Back to index

## 4 ways to build a simple calculator in Python - Part 3

In this concluding part of the series, I will build an abstract syntax tree or AST. An AST holds the information present in the arithmetic expression and can be evaluated to yield the result. It is built from grammar rules just like the translator in part 2 and evaluated using the design pattern known as the visitor pattern. Note that the `ast` module in Python can be used to build the tree, as I will show below, but it is more instructive to build it from first principles. The `ast` module is a general method to contruct ASTs for code, not just for arithmetic expressions.

One may argue that building an AST is overkill for simple arithmetic expressions - the method given in part 1 or in part 2 of this series should suffice. But the purpose here is to study it to get a peek at how programming languages parse expressions (and code in general) behind the scenes.

### What is an AST?

An AST is a tree-like structure that holds the essential information in an expression. The interior nodes are the operators of the expression while the leaves are the operands.

In contrast, a parse tree is contructed from the grammar with the nonterminals as the inner nodes and the terminals or tokens of the expression as leaves.

The diagram below shows an AST and a parse tree for the expression `5 * (2 + 4)`. The diagram shows that the AST comprises of only the operators and operands of the expression, without the parentheses, while the parse tree is elaborate - consisting of the nonterminals of the grammar (`expr`, `term` and `factor`) and the expression itself (reading the leaves from left to right).

### The Python `ast` module

We can use the Python standard library module `ast` to build an AST. I will use the same expression as above to illustrate.

``````>>> import ast
>>> tree = ast.parse('5 * (2 + 4)', mode='eval')
>>> ast.dump(tree)
``````

We see that the AST has, among other classes, two classes named `BinOp` and `Num`. These stand for the interior and leaf nodes of the AST respectively. I will use these classes in the design of the parser below.

The AST node (or tree) above can be evaluated to yield a value. It first needs to be compiled into a code object by the `compile` builtin function and then passed to the `eval` function to find the value.

``````>>> co = compile(tree, filename='<string>', mode='eval')
>>> eval(co)
30
``````

### The lexer

The lexical analyzer or lexer for the translator is identical to what was built in part 2. I am repeating it below for convenience.

``````import re

class Num:
def __init__(self, val):
self.value = float(val) if '.' in val else int(val)

def __repr__(self):
return f'Num: {self.value}'

def lexan():
'''Lexical analyzer - creates a stream of tokens'''

inp = input('Enter the expression ')

i = 0
while i < len(inp):
c = inp[i]
if c.isdigit() or c == '.':
val = re.match(r'[.0-9]+', inp[i:])
yield Num(val)
i += len(val)
else:
if inp[i:i + 2] == '**':
yield '**'
i += 1
elif c != ' ':
yield c
i += 1
``````

### The parser

The grammar for the parser will be the same as in part 2. I have repeated it below: However, instead of performing the operations while parsing, we build nodes of the tree. There are three kinds of nodes - `Num`, `BinOp` and `UMinus`. `Num` represents a number - a sequence of digits with an optional decimal point. `BinOp` stands for a binary operation. It has three parts - `left` (left operand), `op` (operator) and `right` (right operand). `UMinus` stands for a unary minus operation. It has two parts - `op` (operator - `-` sign) and `value` (the number). The Python classes for `BinOp` and `UMinus` are shown below - `Num` was shown above:

``````class BinOp:
def __init__(self, op, left, right):
self.left = left
self.op = op
self.right = right

def __repr__(self):
return f'<BinOp: left={self.left} op={self.op} right={self.right}>'

class UMinus:
def __init__(self, value):
self.op = '-'
self.value = value

def __repr__(self):
return f'<UMinus: value={self.op}{self.value}>'
``````

The rest of the code for the parser is similar to what was built in part 2. The difference is in the places where operations were performed - we now build nodes.

``````def expr():
'''Function to start the parsing. It builds the `+` and `-` nodes as
appropriate.'''

result = term()
while True:
match('+')
result = BinOp(left=result, op='+', right=term())
match('-')
result = BinOp(left=result, op='-', right=term())
else:
break

return result

def term():
'''This function builds the `*` and `/` nodes.'''

result = factor()
while True:
match('*')
result = BinOp(left=result, op='*', right=factor())
match('/')
result = BinOp(left=result, op='/', right=factor())
else:
break

return result

def factor():
'''This recursive function builds the node for the right associative
power operation.'''

result = base()
match('**')
result = BinOp(left=result, op='**', right=factor())

return result

def base():
'''This function evaluates another expression in parentheses or negates
a number or simply return a number.'''

match('(')
result = expr()
match(')')
match('-')
result = UMinus(value=base())
else:
match('number')

return result

def match(t):
'''Checks if token is valid and gets the next one.'''

try:
except StopIteration:
else:
``````

To test the parser, we can code a driver function as follows:

``````def main():
'''This function sets a starting value for the `lookahead` and calls the
parser.'''

token_gen = lexan()

tree = expr()
match('')
print(tree)
``````

Running `main` with the string `5 * (2 + 4)` gives the following output:

``````>>> main()
Enter the expression 5 * (2 + 4)
<BinOp: left=Num: 5 op=* right=<BinOp: left=Num: 2 op=+ right=Num: 4>>
``````

Note the similarity with what was obtained above by a dump of the AST using Python's `ast` module.

Now we discuss the evaluation of the AST.

### The visitor pattern

This is a design pattern commonly used in parsing and compiling. There are two ideas inherent in it. Firstly, it separates the manipulation of a complex data structure like an AST from the data structure itself. The classes `Num`, `BinOp` and `UMinus` do not carry any logic to manipulate the data they hold. Secondly, the visitor class, given below, uses a clever technique to dispatch actions to different methods, based on the node type. It uses recursion in the process. The visitor pattern can be used to evaluate an AST or take some other action like generating instructions for instance.

``````class NodeVisitor:
def visit(self, node):

method_name = 'visit_' + type(node).__name__
method = getattr(self, method_name, None)
if method:
return method(node)
else:
self.generic_visit(node)

def generic_visit(self, node):

raise RuntimeError(f'No method found for node {node}')

class Visitor(NodeVisitor):
def visit_BinOp(self, node):

if node.op == '+':
return self.visit(node.left) + self.visit(node.right)
elif node.op == '-':
return self.visit(node.left) - self.visit(node.right)
elif node.op == '*':
return self.visit(node.left) * self.visit(node.right)
elif node.op == '/':
return self.visit(node.left) / self.visit(node.right)
elif node.op == '**':
return self.visit(node.left) ** self.visit(node.right)

def visit_UMinus(self, node):

return -self.visit(node.value)

def visit_Num(self, node):

return node.value
``````

The `NodeVisitor` class dispatches a node of the AST to methods based on the type of the node. It uses the `getattr` builtin function to dynamically get the method from its name. If no method is found, then the `generic_visit` method is called to raise an exception. The `Visitor` class inherits from the `NodeVisitor` class and provides visit methods for the different node types. Its methods call themselves recursively to traverse the AST.

To test this code, we can modify the driver function as shown below:

``````def main():
'''This function sets a starting value for the `lookahead` and calls the
parser.'''

token_gen = lexan()

tree = expr()
match('')
v = Visitor()
result = v.visit(tree)
print(result)
``````

Following are some tests carried out in the REPL:

``````>>> main()
Enter the expression 5 * (2 + 4)
30
>>> main()
Enter the expression -(1 + 2) / 2
-1.5
>>> main()
Enter the expression 10 + ( 3 * 2 ) ** 2 ** 3 - 25 / 5
1679621.0
``````

The last example shows the same result as obtained in parts 1 and 2 of the series.

Complete listing

The complete listing of the translator program including the lexer, parser, AST evaluation and the driver components:

``````import re

class Num:
def __init__(self, val):
self.value = float(val) if '.' in val else int(val)

def __repr__(self):
return f'Num: {self.value}'

class BinOp:
def __init__(self, op, left, right):
self.left = left
self.op = op
self.right = right

def __repr__(self):
return f'<BinOp: left={self.left} op={self.op} right={self.right}>'

class UMinus:
def __init__(self, value):
self.op = '-'
self.value = value

def __repr__(self):
return f'<UMinus: value={self.op}{self.value}>'

class NodeVisitor:
def visit(self, node):

method_name = 'visit_' + type(node).__name__
method = getattr(self, method_name, None)
if method:
return method(node)
else:
self.generic_visit(node)

def generic_visit(self, node):

raise RuntimeError(f'No method found for node {node}')

class Visitor(NodeVisitor):
def visit_BinOp(self, node):

if node.op == '+':
return self.visit(node.left) + self.visit(node.right)
elif node.op == '-':
return self.visit(node.left) - self.visit(node.right)
elif node.op == '*':
return self.visit(node.left) * self.visit(node.right)
elif node.op == '/':
return self.visit(node.left) / self.visit(node.right)
elif node.op == '**':
return self.visit(node.left) ** self.visit(node.right)

def visit_UMinus(self, node):

return -self.visit(node.value)

def visit_Num(self, node):

return node.value

def lexan():
'''Lexical analyzer - creates a stream of tokens'''

inp = input('Enter the expression ')

i = 0
while i < len(inp):
c = inp[i]
if c.isdigit() or c == '.':
val = re.match(r'[.0-9]+', inp[i:])
yield Num(val)
i += len(val)
else:
if inp[i:i + 2] == '**':
yield '**'
i += 1
elif c != ' ':
yield c
i += 1

def expr():
'''Function to start the parsing. It builds the `+` and `-` nodes as
appropriate.'''

result = term()
while True:
match('+')
result = BinOp(left=result, op='+', right=term())
match('-')
result = BinOp(left=result, op='-', right=term())
else:
break

return result

def term():
'''This function builds the `*` and `/` nodes.'''

result = factor()
while True:
match('*')
result = BinOp(left=result, op='*', right=factor())
match('/')
result = BinOp(left=result, op='/', right=factor())
else:
break

return result

def factor():
'''This recursive function builds the node for the right associative
power operation.'''

result = base()
match('**')
result = BinOp(left=result, op='**', right=factor())

return result

def base():
'''This function evaluates another expression in parentheses or negates
a number or simply return a number.'''

match('(')
result = expr()
match(')')
match('-')
result = UMinus(value=base())
else:
match('number')

return result

def match(t):
'''Checks if token is valid and gets the next one.'''

try:
except StopIteration:
else:

def main():
'''This function sets a starting value for the `lookahead` and calls the
parser.'''

token_gen = lexan()

tree = expr()
match('')
v = Visitor()
result = v.visit(tree)
print(result)

if __name__ == '__main__':
main()
``````

This concludes this post and the series. In this post, you learned about ASTs and the visitor pattern for evaluating them. ASTs are used by compilers as a representation of code from which target language instructions are generated. Here I applied the concept to a simple calculator. Any of the techniques presented in this 3-part series can be used to build the calculator.

#### References

1. Compilers - Principles, techniques and tools by Aho, Sethi and Ullman

2. Let's build a simple interpreter. Part 7 - Blog by Ruslan Spivak

3. Python cookbook by David Beazley and Brian Jones

Back to index