12,244,041 members (47,803 online)
Tip/Trick
alternative version

Stats

7.8K views
2 bookmarked
Posted

, 24 May 2014 GPL3
 Rate this:
A Java program to translate arithmetic expression to three-address codes

Introduction

In this tip, I am demonstrating `CodeConvert`, a Java based translator which converts arithmetic expressions to 3-address codes.

This program was written as a part of Compiler Design course and its aim was to help students familiarize with various compiler modules, for example: lexer, parser, intermediate code generator and optimizer.

Using the Program

To run executable jar file, open terminal and do:

```\$ cd demo
\$ java -jar CodeConvert.jar
Instruction File Path: input_file_path```

`CodeConvert `takes input in the form of text file which contains arithmetic expressions. The content of input file is converted to 3-address codes which may contain many redundancies. Finally, the program performs various optimization techniques, such as Constant folding, Algebric simplification, Copy propagation and Dead code removal to give final optimized code.

 Input Unoptimized 3-address codes Codes after 4 optimization rounds Final output (8 optimization rounds) x = 9 y = sin(23) + cos(76) z = x*x + 2*y*x + y*y x = z/2 + y/2 y = (x + y - z * (z - 2))/(x/z + y/z + 2) x = 9 + 0 y = sin(23) + cos(76) t_1 = x * x t_2 = 2 * y t_3 = t_2 * x t_4 = t_1 + t_3 t_5 = y * y z = t_4 + t_5 t_6 = z / 2 t_7 = y / 2 x = t_6 + t_7 t_8 = x + y t_9 = z - 2 t_10 = z * t_9 t_11 = t_8 - t_10 t_12 = x / z t_13 = y / z t_14 = t_12 + t_13 t_15 = t_14 + 2 y = t_11 / t_15 x = 9 y = 0 t_4 = 81 z = 81 t_6 = 81 / 2 x = t_6 t_9 = 81 - 2 t_10 = 81 * t_9 t_11 = t_6 - t_10 t_12 = t_6 / 81 t_13 = 0 / 81 t_14 = t_12 + t_13 t_15 = t_14 + 2 y = t_11 / t_15 x = 9 y = 0 z = 81 x = 40 y = -3179

Language

Note from the example in the previous section that the input file has a specific structure.

1. Each line contains only one expression
2. Each expression can have only a variable on LHS. Numbers or other expressions are not allowed in LHS
3. RHS may contain nested expression with both numbers and variables.
4. Any variable can be used on LHS any number of times.
5. Expression may contain undefined variables in them, unlike the example given above.
6. Some other functions are also supported with the help of JExel. These functions are:

• `max`, `min`
• `floor`, `ceil`
• `neg`, `abs`
• `cos`, `sin`, `tan`
• `acos`, `asin`, `atan`
• `deg`, `rad`

Alternatives

This program was written to help understand different parts of compiler and challenges faced in their implementation. Instead of writing these parts from scratch, one can simply use existing utilities such as flex and bison to build lexer and parser.

Also I have assumed that input file follows the structure specified above and contains no syntactic errors. Alternatively, one can modify the source code and place checks for correctness.

Brief Description of Working

File is read line-by-line and its contents are saved in an `ArrayList`. Each item in `ArrayList `corresponds to one expression (one line from file).

```See Test.java, line 60 to 68

while(scan.hasNextLine()){
System.out.println(lines.get(lines.size()-1));   // Print input line
}```

2. Converting to Token Stream

Each element of `ArrayList `is processed to find tokens using regular expressions.

```See Builder/TreeBuilder.java, function GetTokens, line 81

[a-zA-Z_]+[a-zA-Z0-9_]*\\([^()]*\\) to find functions e.g. floor(2.1), cos(9.1)
[0-9]+                              to find numbers   e.g. 9, 13, 103
[a-zA-Z_][a-zA-Z0-9_]*              to find variables e.g. abc, _velocity, speed0
\\Q(\\E                             to find opening brackets (
\\Q)\\E                             to find closing brackets )```

Each line of input file is itself converted to an `ArrayList `of tokens by the function `GetTokens `referred above.

3. Token Stream to Syntax Tree

Token stream from previous section is now passed to function `GenerateTree `which reads tokens one-by-one and identifies its type. Utilizing tokens' types, it places them accordingly in a Binary Tree. Here we have to take care of operator precedence and importance of brackets. We define `root` as the root node of tree generated so far. `current` is a new node containing current token and is not attached to the tree yet.

1. If `root` is a number and `current` is an operator, `root` will become the left child of `current`. `current` will become new `root` of the tree.
2. If `root` has lower precedence than `current`, then it will remain `root` of the tree and `current` will become its right child.
3. Entities within brackets `(...)` will be treated as units for all outside nodes. We define `current_br` as the root of subexpression within brackets, it is yet not attached to the expression tree.
4. If `root` has higher precedence than `current` and `current` is not a `current_br` then `root` will replace left child of `current`. The original left child of `current` will become right child of `root`. `current` will become new `root`.
5. If `root` has higher precedence than `current` but `current` is a `current_br` then `root` will remain root and will take `current_br` as right child.

Actions in other situations are simpler and if you wish to check them, see `Builder/TreeBuilder.java`, `line 96-169`. There may be flaws in this logic but all the test expressions have given desired results so far. To test validity of any expression, you can print its expression tree using the `TreePrinter` class as follows:

```TreeBuilder tb = new TreeBuilder();
Node n = tb.GenerateTree("a*b-p/(a+b)");                // build syntax tree

n = ThreeAddressBuilder.normalize(n);                   // hack for special case

TextNode m = TreePrint.TreePrinter.TreeToTextTree(n);   // convert to printable TextNode
String print = TreePrint.TreePrinter.TreeString(m);

System.out.println(print);```

The above code will print the tree in standard output as:

```            [-]
/   \
/     \
/       \
/         \
/           \
[*]             [/]
/   \           /   \
[a]     [b]     [p]     \
\
\
\
[+]
/   \
[a]     [b]```

4. Syntax Tree to 3-Address Codes

Every single expression will now be converted to a series of 3-address codes. Class `ExprBuilder` in `Builder` package does this job. Tree is traversed top-bottom and temporary variables are defined to build 3-address codes.

```See Builder/ThreeAddressBuilder.java, line 65-77

ArrayList<Instruction> i = new ArrayList<>();       // temporary storage for 3-address codes
Node n = tb.GenerateTree(s[1].trim());
n = normalize(n);

ExprBuilder.Parse(i, n, start);        // convert tree to 3-address codes, start: starting index of temp-variables
code.addAll(i);                        // merge 3-address codes of current expression to codes so far

start = LastNumbered(code) + 1;        // update starting index of temp-variables of next series so to avoid confusion```

This stage will give us unoptimized 3-address codes of entire input file.

5. Optimization

See `Optimizer.java` in `Optimize` package to check working of various optimization techniques.

```See Test.java, line 102-109

boolean optimized = true;
int i = 1;

while(optimized && i < MAX_ROUNDS){     // try optimization as long as possible
optimized = false;

optimized = op.Optimize(ins);
i++;
}```

Features Missing

Some features are not implemented but I may do so in future. These are:

1. Custom operators and with defined semantics. See `Elements/Operator.java`
2. Associativity of operators may have bugs
3. Optimization implementation is very basic and can be extended to add other optimizations

Share

 Student India
No Biography provided

You may also be interested in...

 First Prev Next
 3 address code in compiler design Member 118068171-Jul-15 11:49 Member 11806817 1-Jul-15 11:49
 WESH Member 108744626-Jul-14 17:15 Member 10874462 6-Jul-14 17:15
 My vote of 5 xfzheng27-May-14 16:07 xfzheng 27-May-14 16:07
 Have I seen this before? DaveAuld24-May-14 8:47 DaveAuld 24-May-14 8:47
 Re: Have I seen this before? Psycho_Coder26-May-14 5:29 Psycho_Coder 26-May-14 5:29
 Last Visit: 31-Dec-99 19:00     Last Update: 1-May-16 2:41 Refresh 1