Visual AST for ANTLR Generated Parser Output

Matthew So (Hong Kong)

4.92/5 (14 votes)

Apr 23, 2012

CPOL

13 min read

94422

2201

Build a visual AST from ANTLR parser

Download source - 178 KB

Visual AST

Introduction

Building a concise Abstract Syntax Tree (AST) representing source code is a key step for conveniently producing any compiled program. The program presented here is written in C# using VS2010 and it uses .NET GDI+ library to draw an image for AST returned from parser after parsing input source code. The parse is generated with ANTLR and it comes with an editor ANTLRWorks which can also shows AST graphically and is written in Java. This program shows how to use C# to create a graphical AST which is easily integrated into C# project while building DSL (Domain-specific language) module. Also, I am going to formalize my Entity Mapping language grammar and used it as an example for building the parser and draw the AST image. The Entity Mapping language is first described in my previous article Entity mapping language implementation using bsn-goldparser with CodeDom.

Background

ANTLR Tree Grammar

Before talking about the drawing program, I would like to discuss on the tree grammar when defining parser with ANTLR. ANTLR has several ways to let you build your parser with desired AST output format and using the tree grammar is one I found it very powerful and straight forward to twist the output format wanted.

Let us consider a simple tree which may be produced by a parser generated from ANTLR after parsing an expression x = 12 + 5. Using BNF syntax in ANTLR style as below,
assignment : IDENTIFIER '=' expression;
expression : INT '+' INT;
what we get from the parser output will be a flatten, one dimensional list of tokens as shown in below figure. Such flatten one dimensional AST will be very difficult to manipulate. If you need a better tree structure output, you need to use other ANTLR techniques like including semantic actions to assist in building your AST structure. Semantic actions are actually fragment of codes in target language (e.g. C#) spitted to the generated parser and lexer classes. So you need to have good understand of the ANTLR auto-generate mechanism before comfortably adding code to help generating lexer and parser classes with additional features you wanted.

Standard Tree Ouput

Using ANTLR tree grammar, to turn the output as a 2-dimensional tree, we annotate the token that is the root of a sub-tree by appending a ^ character to it when defining parser with BNF syntax. For the previous example, we can express it as
assignment : IDENTIFIER '='^ expression;
expression : INT '+'^ INT;
Then what we get a is a nicer AST structure as shown below which can be much easier for further manipulation. But if you look carefully the diagram, you will find that the root node nil is a bit odd and often not we wanted. Actually, we have only used shorthand syntax of ANTLR tree grammar, I shall discuss the full tree grammar when reviewing the sample Entity Mapping language grammar in later section. Full tree grammar syntax helps to add imaginary nodes such that our tree looks nicer for further processing and surely the nil node will be gone.

AST with tree grammar

Define Entity Mapping Language

My proposed Entity Mapping Language is a simple collection of assignment statements to let user specifying how they want one entity's properties mapped to another entity's properties. She needs to specify what source and target entity type are in first line of statements. I call it mapping declaration statement and BNF syntax [Target Entity Type] <- [Source Entity Type]is used. After the mapping declaration, there are series of assignment statements to set properties of target entity and the BNF syntax [Targte Property] = Expression is used. Expression in each assignment statement can be computed from combinations of source and target properties and global function call(s). Each statement is terminated by semicolon ; character so that it can span multiple lines. Also, conditional IF expression is supported (similar to Excel IF expression) to allow user entering branching expressions with condition. The following is an example program allowed to be entered by user to specify how an invoice entity be assigned its properties from an order entity. For more information, please refer to my previous article which details the concept for defining an entity mapping language. Comparing to previous article, I have omitted static method call and replaced with global function call as it looks more natural for user who just entering assignment statements like what she did in Excel while entering formula.

// Listing 1
Invoice <- Order ;
InvoiceDate = Order.OrderDate + 30 ;
InvoiceNo = GetNextInvoiceNo() ;
Freight = Order.TotalCBM * 1.5 + Order.TotalWeight * 2.2;
ShipVia = IF(Order.IsLocal, "Express", "Mail") ;

The following listing is the grammar for the lexer and parser defined in ANTLR grammar file for the Entity Mapping language. I use what ANTLR called combined grammar to define my language that it combines both lexer and parer definitions in a single file. Also, I have specified C# 3 as the target language to create lexer and parser classes that is indicated in options section with lanaguage = CSharp3 setting. The setting output = AST means returning AST by the parser and format is in CommonTree type. Also the lines starting with @lexer and @parser allow us to specify the generated classes in .NET namespace and I use EntMapping here. The @header section will insert whatever code written inside to generate class headers and I specify C# using clauses here. Actually, you have full control on how to generate your lexer and parser classes like inserting fields, properties, methods and semantic actions (C# codes) in them. As these topics are out of my scope here, please refer to ANTLR website for more details.

// Listing 2 - EntityMapping.g
grammar EntityMapping;

options{
output=AST;
language=CSharp3;
ASTLabelType=CommonTree;
}

tokens{ 
ENTMAP;
MAPDCL;
MAPBODY;
ASSIGN;
GETMEMBER;
CALL;
COND;
TFEXPR;
}

@lexer::namespace { EntMapping }
@parser::namespace { EntMapping }

@header {
using System;
using System.Collections;
}

public prog : mapDeclaration stat* -> ^(ENTMAP mapDeclaration ^(MAPBODY stat*) ) ;

mapDeclaration : ID '<-' ID ';' -> ^(MAPDCL ID ID) ;

stat : ID '=' expr ';' -> ^(ASSIGN ID expr);

expr : boolAndExpr (OR^ boolAndExpr)*;

boolAndExpr : equalityExpr (AND^ equalityExpr)*;

equalityExpr : relationalExpr ((EQUALS^|NOTEQUALS^)relationalExpr)*;

relationalExpr : additiveExpr ((LT^|LTEQ^|GT^|GTEQ^)additiveExpr)* ;

additiveExpr : multiExpr ( (PLUS^|MINUS^) multiExpr )*;

multiExpr : unaryExpr ( (MULT^|DIV^|MOD^) unaryExpr)*;

unaryExpr : (NOT^|INC^|DEC^|) atom;

atom : value | ID | functionCall | ifExpr  | memberAccess | '(' expr ')' ;

value : INT | FLOAT | STRING |CHAR;

memberAccess : ID '.' ID -> ^(GETMEMBER ID ID);

ifExpr : 'IF''(' ce=expr ',' e1=expr ',' e2=expr ')' -> ^('IF' ^(COND $ce) ^(TFEXPR $e1 $e2));

functionCall : ID '(' (params)? ')' -> ^(CALL ID params*) ;

params : expr (',' expr )*;

ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;

INT : '0'..'9'+ ;

FLOAT : ('0'..'9')+ '.' ('0'..'9')* EXPONENT? | '.' ('0'..'9')+ EXPONENT? | ('0'..'9')+ EXPONENT ;

COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' { Skip();} | '/*' ( options {greedy=false;} : . )* '*/' { Skip();} ;

WS : ( ' ' | '\t' | '\r' | '\n' ) { Skip();} ;


STRING : '"' ( ESC_SEQ | ~('\\'|'"') )* '"' ;

CHAR: '\'' ( ESC_SEQ | ~('\''|'\\') ) '\'' ;

fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;

fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment
ESC_SEQ : '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\') | UNICODE_ESC | OCTAL_ESC ;

fragment
OCTAL_ESC : '\\' ('0'..'3') ('0'..'7') ('0'..'7') | '\\' ('0'..'7') ('0'..'7')  | '\\' ('0'..'7') ;

fragment
UNICODE_ESC : '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT ;

OR : '||' | 'or';
AND : '&&' | 'and';
EQUALS : '==' | '=' | 'eq';
NOTEQUALS : '!=' | '<>' | 'ne';
LT : '<';
LTEQ : '<=';
GT : '>';
GTEQ : '>=';
PLUS : '+';
MINUS : '-';
MULT : '*';
DIV : '/';
MOD : '%';
NOT : '!' | 'not';
INC : '++';
DEC : '--';

In tokens section, it includes the imaginary tokens which will be added to the final AST tree and help us tidy up our final tree structure. For example, ENTMAP will become root node in the final AST and it replaces the nil node shown in example at previous section. The entry point for my language definition is the production with nonterminal prog which has a public attribute before it. By default, each production will be generated as a private method in the parser class and we need to have the entry method to be made public to allow us to execute it. Another way to allow calling into private method is to add a public method to another file declared as partial class to the parser. As ANTLR generates partial class and we can complement public methods on it to make it accessible outside the class.

Again, when we look at the first production, public prog : mapDeclaration stat* -> ^(ENTMAP mapDeclaration ^(MAPBODY stat*) ), we find there is statement after the -> characters which is not a normal BNF syntax (ANTLR uses EBNF which is an extended version of BNF and I assume reader understands when I talk about BNF here which does not necessary 100% follow the BNF standard). Actually, it is the tree building instruction and tokens within ^() form a tree with the first token treated as root and others as children. In the prog tree building instruction, ENTMAP becomes root of the tree and mapDeclaration and ^(MAPBODY stat*) become children of it. To further decomposing the right node of this tree, ^(MAPBODY stat*) forms another sub-tree at right side with MAPBODY as the root of this sub-tree and stat* as the children. Still confusing? Don't feel desperate. That's why I create visual AST program here to draw the AST tree output. Please keep on reading and I am sure you shall sooner or later understand what mentioned here after full picture is drawn on screen as a visual tree.

When you browse further down the list of productions, you cannot find ENTMAP and MAPBODY nonterminal declarations that appeared in prog production tree building instruction before. I think you can guess it right that they are imaginary tokens which I have talked about before. Their existence is solely for building connecting nodes on the AST tree output. ENTMAP specified here becomes the root node of the whole AST replacing the nil node before. Under this root node, the left child is the mapping declaration statement and you can find its production next on the listing. Since there are zero of more assignment statements followed and I want it to be grouped as children under an imaginary body node, I specify MAPBODY as the root of the right sub-tree and put all assignment statements under it as children.

Beside using tree building instruction followed each production, I have used inline instruction as talked before in previous section. Just annotate the token with ^ character after it for specifying it as the root node of the sub-tree in place. For example, the production - expr : boolAndExpr (OR^ boolAndExpr)*, it specifies OR token as the root node of the sub-tree of this production. I think this is quite straight forward for reader who has prior knowledge on compiler building (of course I expect reader should has basic understanding on compiler building before reading this article or else I suggest who needs to go through certain basic compiler materials before coming back to review this article), binary operator is the best candidate raised as root node at each sub-tree.

At last, I remind reader who wants to try creating grammar themselves using ANTLR and targeting C# language, remember to translate all referred sample codes at ANLTR website to C# syntax even for minor ones like Skip() I used here which was all lower case characters when originally written in Java code but we need to change it to Pascal naming convention for this method call written in C#. This is because naming convention changed when ANLTR library for C# target ported from Java code. I do want to help readers here save their time without trying to figure out what's wrong when everything followed the website examples but still does not work because information given usually based on Java but not C#. For this project, I used ANLTRWorks to create the grammar file (*.g) and generate the C# lexer and parser classes. This editor created using Java and includes necessary libraries to generate C# target lexer and paser classes without separated downloads. Of course you need to have Java runtime library installed before you successfully start the tool. After generating lexer and parser classes, you can start building the compiler using these generated classes.

Using the code

Before talking about the AST drawing codes, let me show you the outcome from running this program. Make sure you have download Antlr3.Runtime.dll from ANTLR website and you may need to add it as reference before running the program. In the example below, I enter testing statements in listing 1 and click Parse button. The returned result from the parser shows in two forms. The first form is in string format with brackets to describe the AST tree structure and the second form draws the AST as an image. Clearly, you can immediately get the whole picture from the AST image and easily compare it to the source statements. In contrast, unless your brain is a digitized machine you should be like me as a normal person having difficulty to comprehend the tree string to figure out how the hierarchy is formed.

ENT Mapping Program

When you browse through the AST image, you shall agree that having the imaginary nodes like ENTMAP, MAPDCL, MAPBODY and ASSIGN inserted as the connection nodes on the tree makes it much easier for reviewing and manipulation. For example, we can follow the AST tree and read like this - the result shows an entity mapping (ENMAP) program that has mapping declaration (MAPDCL) for Invoice mapped from Order entity. The mapping statement body (MAPBODY) includes four assignment statements (ASSIGN) to set the properties of Invoice entity. The first assignment statement sets the InvoiceDate property by getting OrderDate from Order entity and adding 30 days to it. So, you can follow the tree and speak out the other assignment statements as easy as I do here.

I think creating a visual AST shown about is not only for pretty but it really helps in syntax checking when input statements are incorrect or just to double checking assignment statements are all correctly entered before parsed result passing for compilation. In this article I just concentrate on AST presentation without further discussion on how we make use the AST output. Actually, reader may regard the drawing routine as a form of tree visiting steps and use as a reference for further development to a compilation module.

Points of Interest

The drawing function has idea adopted from the article Graphical BinaryTrees which let each node in tree draw its own image and delegate children drawing to each child. First, let us review the C# Interface designed for supporting AST drawing below. Since I do not want the AST drawing function restricted to only ANTLR AST output, defining an Interface, IASTTreeNode, gives us the flexibility to support other tools output. Any AST node implements this Interface which gives property Text as node's label, returns property Count as number of children and has Children property to return child nodes can be passed to the drawing function.

// Listing 3 - IASTTreeNode.cs
//
public interface IASTTreeNode
{
    string Text { get; }
    int Count { get; }
    IEnumerable<IASTTreeNode> Children { get; }
}

To create implementation for IASTTreeNode from ANTLR parser generated tree is rather straight forward. As shown in listing 4, just passing in the ANTLR parser returned AST, ITree, and delegating all implementation details by calling its methods or properties then we are ready to go on drawing.

// Listing 4 - ASTTreeNode.cs
//
public class ASTTreeNode : IASTTreeNode
{
    ITree _tree;
    
    public ASTTreeNode(ITree tree)
    {
        _tree = tree;
    }
    
    public string Text
    {
        get { return _tree.Text; }
    }
    
    public int Count
    {
        get { return _tree.ChildCount; }
    }
    
    public IEnumerable<IASTTreeNode> Children
    {
        get
        {
            for (int i = 0; i < _tree.ChildCount; ++i)
            {
                yield return new ASTTreeNode(_tree.GetChild(i));
            }
        }
    }    
}

Then, let us turn our discussion on the drawing function. When drawing the tree, we can consider each parent node may has zero or many children. When there is no child under an parent node, there is no need to reserve vertical gap and the total height is the parent node's height. But if there is any child, we need to calculate maximum child's height among children plus vertical space and add to parent node's height to get the total height of the image. Also, the total width is sum of all widths from children and add gap widths among the children. Of course, when in the case of no child, only the width of parent node plus a little gapping is needed. The center position from left side is returned from the drawing function as it is important to let each node's parent create connection line to it's top center. Reader can refer to figure below to get the geometry of image when drawing the tree.

Listing 5 shows all drawing codes within the class VisualAST. Start from the root node, each node only responsible to draw its own image and possible connection lines if it has any child. Then the parent node in current context recursively calls Draw() method and passing each of its children to it with with each child's center position returned. By using the center position returned, connection line is drawn from bottom of parent node to its child.

One final point I want to remind reader is that the GDI objects are often regarded as sacred resources. So, we need to release them once completed drawing but need to cached them in static properties when they can be shared to improve performance. That's why I cache Font and Pens objects in the class as they can be shared from multiple invocation of drawings.

// Listing 5 - VisualAST.cs
//
public class VisualAST
{
    private static Font NodeTextFont = new Font("Verdana", 8f);
    private static SizeF MinimumNodeSize = new SizeF(32, 28);
    private static Size NodeGapping = new Size(4, 32);
    private static Dictionary<string, Pen> Pens = new Dictionary<string, Pen>();

    public IASTTreeNode ASTTreeNode { get; private set; }

    public VisualAST(IASTTreeNode astTreeNode)
    {
        ASTTreeNode = astTreeNode;
    }

    private static Bitmap CreateNodeImage(Size size, string text, Font font)
    {
        Bitmap img = new Bitmap(size.Width, size.Height);
        using (var g = Graphics.FromImage(img))
        {
            g.SmoothingMode = SmoothingMode.HighQuality;
            var rcl = new Rectangle(1, 1, img.Width - 2, img.Height - 2);
            g.FillRectangle(Brushes.White, rcl);

            LinearGradientBrush linearBrush = 
              new LinearGradientBrush(rcl, Color.LightBlue, 
              Color.White, LinearGradientMode.ForwardDiagonal);
            g.DrawEllipse(NodeBorderPen, rcl);
            g.FillEllipse(linearBrush, rcl);
            linearBrush.Dispose();

            var sizeText = g.MeasureString(text, font);
            g.DrawString(text, font, Brushes.Black, 
              Math.Max(0, (size.Width - sizeText.Width) / 2), 
              Math.Max(0, (size.Height - sizeText.Height) / 2));
        }
        return img;
    }

    private static Pen ConnectionPen
    {
        get
        {
            string penName = "ConnectionPen";
            if (!Pens.ContainsKey(penName))
            {
                Pens.Add(penName, new Pen(Brushes.Black, 1) { EndCap = LineCap.ArrowAnchor, StartCap = LineCap.Round });
            }
            return Pens[penName];
        }
    }

    private static Pen NodeBorderPen
    {
        get
        {
            string penName = "NodeBorderPen";
            if (!Pens.ContainsKey(penName))
            {
                Pens.Add(penName, new Pen(Color.Silver, 1));
            }
            return Pens[penName];
        }
    }

    public Image Draw()
    {
        int center;
        Image image = Draw(this.ASTTreeNode, out center);
        return image;
    }

    private Image Draw(IASTTreeNode astTreeNode, out int center)
    {
        var nodeText = astTreeNode.Text;
        var nodeSize = TextMeasurer.MeasureString("*" + nodeText + "*", NodeTextFont);
        nodeSize.Width = Math.Max(MinimumNodeSize.Width, nodeSize.Width);
        nodeSize.Height = Math.Max(MinimumNodeSize.Height, nodeSize.Height);

        var childCentres = new int[astTreeNode.Count];
        var childImages = new Image[astTreeNode.Count];
        var childSizes = new Size[astTreeNode.Count];

        var enumerator = astTreeNode.Children.GetEnumerator(); ;
        int i = 0;
        while (enumerator.MoveNext())
        {
            var currentNode = enumerator.Current;
            var lCenter = 0;
            childImages[i] = Draw(currentNode, out lCenter);
            childCentres[i] = lCenter;
            if (childImages[i] != null)
            {
                childSizes[i] = childImages[i] != null ? childImages[i].Size : new Size();
            }
            ++i;
        }

        // draw current node and it's children
        var under = childImages.Any(nodeImg => nodeImg != null);// if true the current node has childs
        var maxHeight = astTreeNode.Count > 0 ? childSizes.Max(c => c.Height) : 0;
        var totalFreeWidth = astTreeNode.Count > 0 ? NodeGapping.Width * (astTreeNode.Count - 1) : NodeGapping.Width;
        var totalChildWidth = childSizes.Sum(s => s.Width);

        var nodeImage = CreateNodeImage(nodeSize.ToSize(), nodeText, NodeTextFont);

        var totalSize = new Size
        {
            Width = Math.Max(nodeImage.Size.Width, totalChildWidth) + totalFreeWidth,
            Height = nodeImage.Size.Height + (under ? maxHeight + NodeGapping.Height : 0)
        };

        var result = new Bitmap(totalSize.Width, totalSize.Height);
        var g = Graphics.FromImage(result);
        g.SmoothingMode = SmoothingMode.HighQuality;
        g.FillRectangle(Brushes.White, new Rectangle(new Point(0, 0), totalSize));

        var left = (totalSize.Width - nodeImage.Width) / 2;

        g.DrawImage(nodeImage, left, 0);

        center = Math.Max(totalSize.Width / 2, (nodeImage.Width + NodeGapping.Width) / 2);

        var fromLeft = 0;

        for (int j = 0; j < astTreeNode.Count; ++j)
        {
            float x1 = center;
            float y1 = nodeImage.Height;
            float y2 = nodeImage.Height + NodeGapping.Height;
            float x2 = fromLeft + childCentres[j];
            var h = y2 - y1;
            var w = x1 - x2;
            var childImg = childImages[j];
            if (childImg != null)
            {
                g.DrawImage(childImg, fromLeft, nodeImage.Size.Height + NodeGapping.Height);
                fromLeft += childImg.Width + NodeGapping.Width; // Prepare next child left starting point 
                var points1 = new List<PointF>
                              {
                                  new PointF(x1, y1),
                                  new PointF(x1 - w/6, y1 + h/3.5f),
                                  new PointF(x2 + w/6, y2 - h/3.5f),
                                  new PointF(x2, y2),
                              };
                g.DrawCurve(ConnectionPen, points1.ToArray(), 0.5f);
            }

            childImages[j].Dispose(); // Release child image as it aleady drawn on parent node's surface 
        }

        g.Dispose();

        return result;
    }
}

The main function to call the drawing method is listed below. It gets tree object returned from parser and passes it to Draw() method of VisualAST and assign image returned to a PictureBox on the testing Form. That's all how we get for the final AST drawn on screen

// Listing 6 - FormTest.cs
//
string inputString = textBoxInput.Text;
var input = new ANTLRStringStream(inputString);
var lexer = new EntityMappingLexer(input);
var tokens = new CommonTokenStream(lexer);
var parser = new EntityMappingParser(tokens);
var result = parser.prog();
var tree = result.Tree;
textBoxStringTree.Text = tree.ToStringTree();

var treeNodeDrawable = new ASTTreeNode(tree);
int center;
if (pictureBox1.Image != null)
{
    pictureBox1.Image.Dispose();
    pictureBox1.Image = null;
}
Image image = new VisualAST(treeNodeDrawable).Draw();
pictureBox1.Image = image;

History

2012.04.23 Article submitted.