Visual AST for ANTLR Generated Parser Output






4.92/5 (14 votes)
Build a visual AST from ANTLR parser
Introduction
Building a concise Abstract Syntax Tree (AST) representing source code is a key step for conveniently producing any compiled program. The program presented here is written in C# using VS2010 and it uses .NET GDI+ library to draw an image for AST returned from parser after parsing input source code. The parse is generated with ANTLR and it comes with an editor ANTLRWorks which can also shows AST graphically and is written in Java. This program shows how to use C# to create a graphical AST which is easily integrated into C# project while building DSL (Domain-specific language) module. Also, I am going to formalize my Entity Mapping language grammar and used it as an example for building the parser and draw the AST image. The Entity Mapping language is first described in my previous article Entity mapping language implementation using bsn-goldparser with CodeDom.
Background
ANTLR Tree Grammar
Before talking about the drawing program, I would like to discuss on the tree grammar when defining parser with ANTLR. ANTLR has several ways to let you build your parser with desired AST output format and using the tree grammar is one I found it very powerful and straight forward to twist the output format wanted.
Let us consider a simple tree which may be produced by a parser generated from ANTLR after parsing an expression x = 12 + 5
.
Using BNF syntax in ANTLR style as below, assignment : IDENTIFIER '=' expression;
expression : INT '+' INT;
what we get from the parser output will be a flatten, one dimensional list of tokens as shown
in below figure. Such flatten one dimensional AST will be very difficult to manipulate.
If you need a better tree structure output, you need to use
other ANTLR techniques like including semantic actions to assist in building your
AST structure. Semantic actions are actually fragment of codes in target
language (e.g. C#) spitted to the generated parser and lexer classes. So you
need to have good understand of the ANTLR auto-generate mechanism before comfortably
adding code to help generating lexer and parser classes with additional features
you wanted.
Using ANTLR tree grammar, to turn the output as a 2-dimensional tree, we annotate the token that is the root of
a sub-tree by appending a ^ character to it when defining parser with BNF syntax.
For the previous example, we can express it as assignment : IDENTIFIER '='^ expression;
expression : INT '+'^ INT;
Then what we get a is a nicer AST structure as shown below which can be much easier for further manipulation. But if you look carefully the diagram, you will find that the root node nil
is a bit odd and often not we
wanted. Actually, we have only used shorthand syntax of ANTLR tree grammar, I shall discuss the full tree grammar when reviewing the sample
Entity Mapping language grammar in later section. Full tree grammar syntax helps to add imaginary nodes such that
our tree looks nicer for further processing and surely the nil
node will be gone.
Define Entity Mapping Language
My proposed Entity Mapping Language is a simple collection of assignment statements to let user
specifying how they want one entity's properties mapped to another entity's
properties. She needs to specify what source and target entity type are in first
line of statements. I call it mapping declaration statement and BNF syntax [Target
Entity Type] <- [Source Entity Type]
is used. After the mapping declaration,
there are series of assignment statements to set properties of target entity and
the BNF syntax [Targte Property] = Expression
is used.
Expression in each assignment statement can be computed from combinations of source and
target properties and global function call(s). Each statement is terminated by
semicolon ;
character so that it can span multiple lines.
Also, conditional IF
expression is supported (similar to Excel IF expression) to
allow user entering branching expressions with condition. The following is an
example program allowed to be entered by user to specify how an invoice entity
be assigned its properties from an order entity. For more information,
please refer to my previous article which details the concept for defining an
entity mapping language. Comparing to previous article, I have omitted
static method call and replaced with global function call as it looks more
natural for user who just entering assignment statements like what she did in
Excel while entering formula.
// Listing 1
Invoice <- Order ;
InvoiceDate = Order.OrderDate + 30 ;
InvoiceNo = GetNextInvoiceNo() ;
Freight = Order.TotalCBM * 1.5 + Order.TotalWeight * 2.2;
ShipVia = IF(Order.IsLocal, "Express", "Mail") ;
The following listing is the grammar for the lexer and parser defined in ANTLR grammar file for the
Entity Mapping language. I use what ANTLR called combined grammar to
define my language that it combines
both lexer and parer definitions in a single file. Also, I have specified C#
3 as the target language to create lexer and parser classes that is indicated in options section
with lanaguage = CSharp3
setting. The setting output = AST
means returning AST by the parser
and format is in CommonTree
type. Also the lines starting with @lexer
and @parser
allow us to specify the
generated classes in .NET namespace and I use EntMapping
here.
The @header
section will insert whatever code written inside to generate
class headers and I specify C# using clauses here. Actually, you
have full control on how to generate your lexer and parser classes like
inserting fields, properties, methods and semantic actions (C# codes) in them.
As these topics are out of my scope here, please refer to ANTLR website for more details.
// Listing 2 - EntityMapping.g grammar EntityMapping; options{ output=AST; language=CSharp3; ASTLabelType=CommonTree; } tokens{ ENTMAP; MAPDCL; MAPBODY; ASSIGN; GETMEMBER; CALL; COND; TFEXPR; } @lexer::namespace { EntMapping } @parser::namespace { EntMapping } @header { using System; using System.Collections; } public prog : mapDeclaration stat* -> ^(ENTMAP mapDeclaration ^(MAPBODY stat*) ) ; mapDeclaration : ID '<-' ID ';' -> ^(MAPDCL ID ID) ; stat : ID '=' expr ';' -> ^(ASSIGN ID expr); expr : boolAndExpr (OR^ boolAndExpr)*; boolAndExpr : equalityExpr (AND^ equalityExpr)*; equalityExpr : relationalExpr ((EQUALS^|NOTEQUALS^)relationalExpr)*; relationalExpr : additiveExpr ((LT^|LTEQ^|GT^|GTEQ^)additiveExpr)* ; additiveExpr : multiExpr ( (PLUS^|MINUS^) multiExpr )*; multiExpr : unaryExpr ( (MULT^|DIV^|MOD^) unaryExpr)*; unaryExpr : (NOT^|INC^|DEC^|) atom; atom : value | ID | functionCall | ifExpr | memberAccess | '(' expr ')' ; value : INT | FLOAT | STRING |CHAR; memberAccess : ID '.' ID -> ^(GETMEMBER ID ID); ifExpr : 'IF''(' ce=expr ',' e1=expr ',' e2=expr ')' -> ^('IF' ^(COND $ce) ^(TFEXPR $e1 $e2)); functionCall : ID '(' (params)? ')' -> ^(CALL ID params*) ; params : expr (',' expr )*; ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ; INT : '0'..'9'+ ; FLOAT : ('0'..'9')+ '.' ('0'..'9')* EXPONENT? | '.' ('0'..'9')+ EXPONENT? | ('0'..'9')+ EXPONENT ; COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' { Skip();} | '/*' ( options {greedy=false;} : . )* '*/' { Skip();} ; WS : ( ' ' | '\t' | '\r' | '\n' ) { Skip();} ; STRING : '"' ( ESC_SEQ | ~('\\'|'"') )* '"' ; CHAR: '\'' ( ESC_SEQ | ~('\''|'\\') ) '\'' ; fragment EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ; fragment HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ; fragment ESC_SEQ : '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\') | UNICODE_ESC | OCTAL_ESC ; fragment OCTAL_ESC : '\\' ('0'..'3') ('0'..'7') ('0'..'7') | '\\' ('0'..'7') ('0'..'7') | '\\' ('0'..'7') ; fragment UNICODE_ESC : '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT ; OR : '||' | 'or'; AND : '&&' | 'and'; EQUALS : '==' | '=' | 'eq'; NOTEQUALS : '!=' | '<>' | 'ne'; LT : '<'; LTEQ : '<='; GT : '>'; GTEQ : '>='; PLUS : '+'; MINUS : '-'; MULT : '*'; DIV : '/'; MOD : '%'; NOT : '!' | 'not'; INC : '++'; DEC : '--';
In tokens
section, it includes the imaginary tokens which will be added to the final AST tree and help us tidy up our final tree structure. For example, ENTMAP
will become root node in
the final AST and it replaces the nil
node shown in example
at previous section. The entry point for my language definition is the
production with nonterminal prog
which has a public
attribute before it.
By default, each production will be generated as a private method in the parser
class and we need to have the entry method to be made public to allow us to
execute it. Another way to allow calling into private method is to add a
public method to another file declared as partial class to the parser. As
ANTLR generates partial class and we can complement public methods on it to make
it accessible outside the class.
Again, when we look at the first production, public prog : mapDeclaration stat* -> ^(ENTMAP mapDeclaration ^(MAPBODY stat*) )
,
we find there is statement after the ->
characters which is not
a normal BNF
syntax (ANTLR uses EBNF which is an extended version of BNF and I assume reader
understands when I talk about BNF here which does not necessary 100% follow the
BNF standard). Actually, it is the tree building instruction and tokens within
^()
form a tree with the first token treated as root and others as children. In
the prog
tree building instruction, ENTMAP
becomes root of the tree and
mapDeclaration
and ^(MAPBODY stat*)
become children of it. To further
decomposing the right node of this tree, ^(MAPBODY stat*)
forms another sub-tree at
right side with MAPBODY
as the root of this sub-tree and stat*
as the children.
Still confusing? Don't feel desperate. That's why I create visual AST
program here to draw the AST tree output. Please keep on reading and I am
sure you shall sooner or later understand what mentioned here after full
picture is drawn on screen as a visual tree.
When you browse further down the list of productions, you cannot find ENTMAP
and MAPBODY
nonterminal declarations
that appeared in prog
production tree building instruction
before. I think you can guess it right that they
are imaginary tokens which I have talked about before. Their existence is solely
for building connecting nodes on the AST tree output. ENTMAP
specified here
becomes the root node of the whole AST replacing the nil
node before. Under
this root node, the left child is the mapping declaration statement and you can
find its production next on the listing. Since
there are zero of more assignment statements followed and I want it to be
grouped as children under an imaginary body node, I specify MAPBODY
as the root
of the right sub-tree and put all assignment statements under it as children.
Beside using tree building instruction followed each production, I have used inline instruction as talked before in previous section. Just
annotate the token with ^
character after it for specifying it as the root node
of the sub-tree in place. For example, the production - expr : boolAndExpr
(OR^ boolAndExpr)*
, it specifies OR
token as the root node of the
sub-tree of this production. I think this is quite straight forward for
reader who has prior knowledge on compiler building (of course I expect reader
should has basic understanding on compiler building before reading this article
or else I suggest who needs to go through certain basic compiler materials
before coming back to review this article), binary operator is the
best candidate raised as root node at each sub-tree.
At last, I remind
reader who wants to try creating grammar themselves using ANTLR and targeting C#
language, remember to translate all referred sample codes at ANLTR website to C# syntax even
for minor ones like Skip()
I used here which was all lower case characters when
originally written in Java code but we need to change it to Pascal naming
convention for
this method call written in C#. This is because naming convention changed
when ANLTR library for C# target ported from Java code. I do want to help
readers here save their time without trying to figure out what's wrong when
everything followed the website examples but still does not work because
information given usually based on Java but not C#. For this project, I used
ANLTRWorks to create the grammar file (*.g) and generate the C# lexer and parser classes.
This editor created using Java and includes necessary libraries to generate C#
target lexer and paser classes without separated downloads. Of course you need
to have Java runtime library installed before you successfully start the tool. After generating
lexer and parser classes, you can start building the compiler using these
generated classes.
Using the code
Before talking about the AST drawing codes, let me show you the outcome from running this program. Make sure you have download Antlr3.Runtime.dll from ANTLR website and you may need to add it as reference before running the program. In the example below, I enter testing statements in listing 1 and click Parse button. The returned result from the parser shows in two forms. The first form is in string format with brackets to describe the AST tree structure and the second form draws the AST as an image. Clearly, you can immediately get the whole picture from the AST image and easily compare it to the source statements. In contrast, unless your brain is a digitized machine you should be like me as a normal person having difficulty to comprehend the tree string to figure out how the hierarchy is formed.
When you browse through the AST image, you shall agree that having the imaginary nodes like ENTMAP
, MAPDCL
,
MAPBODY
and ASSIGN
inserted as the connection nodes on the tree makes it much
easier for reviewing and manipulation. For example, we can follow the AST
tree and read like this - the result shows an entity mapping (ENMAP
) program
that has
mapping declaration (MAPDCL
) for Invoice mapped from Order entity. The
mapping statement body (MAPBODY
) includes four assignment statements (ASSIGN
) to
set the properties of Invoice entity. The first assignment statement sets the
InvoiceDate property by getting OrderDate from Order entity and adding
30 days
to it. So, you can follow the tree and speak out the other assignment
statements as easy as I do here.
I think creating a visual AST shown about is not only for pretty but it really helps in syntax checking when input statements are incorrect or just to double checking assignment statements are all correctly entered before parsed result passing for compilation. In this article I just concentrate on AST presentation without further discussion on how we make use the AST output. Actually, reader may regard the drawing routine as a form of tree visiting steps and use as a reference for further development to a compilation module.
Points of Interest
The drawing function has idea adopted from the article Graphical BinaryTrees
which let each node in tree draw its own image and delegate children drawing to
each child. First, let us review the C# Interface
designed for supporting AST drawing below. Since I do not want the AST
drawing function restricted to only ANTLR AST output, defining an Interface,
IASTTreeNode
, gives us the flexibility to support other tools output.
Any AST node implements this Interface which gives property Text as node's label, returns property Count as
number of children and has Children property to return child nodes can
be passed to the drawing function.
// Listing 3 - IASTTreeNode.cs
//
public interface IASTTreeNode
{
string Text { get; }
int Count { get; }
IEnumerable<IASTTreeNode> Children { get; }
}
To create implementation for IASTTreeNode
from ANTLR parser generated tree is rather straight forward. As shown in listing
4, just passing in the ANTLR parser returned AST, ITree
, and delegating all
implementation details by calling its methods or properties then we are ready to
go on drawing.
// Listing 4 - ASTTreeNode.cs
//
public class ASTTreeNode : IASTTreeNode
{
ITree _tree;
public ASTTreeNode(ITree tree)
{
_tree = tree;
}
public string Text
{
get { return _tree.Text; }
}
public int Count
{
get { return _tree.ChildCount; }
}
public IEnumerable<IASTTreeNode> Children
{
get
{
for (int i = 0; i < _tree.ChildCount; ++i)
{
yield return new ASTTreeNode(_tree.GetChild(i));
}
}
}
}
Then, let us turn our discussion on the drawing function. When drawing the tree, we can consider each parent node may has zero or many children. When there is no child under an parent node, there is no need to reserve vertical gap and the total height is the parent node's height. But if there is any child, we need to calculate maximum child's height among children plus vertical space and add to parent node's height to get the total height of the image. Also, the total width is sum of all widths from children and add gap widths among the children. Of course, when in the case of no child, only the width of parent node plus a little gapping is needed. The center position from left side is returned from the drawing function as it is important to let each node's parent create connection line to it's top center. Reader can refer to figure below to get the geometry of image when drawing the tree.
Listing 5 shows all drawing codes within the class VisualAST
.
Start from the root node, each node only responsible to draw its own image and
possible connection lines if it has any child. Then the parent node in current
context recursively calls Draw()
method and passing each of its children to it
with with each child's center position returned. By using the center
position returned, connection line is drawn from bottom of parent node to its
child.
One final point I want to remind reader is that the GDI objects are often regarded as sacred resources. So, we need to release them once completed drawing but need to cached them in static properties when they can be shared to improve performance. That's why I cache Font and Pens objects in the class as they can be shared from multiple invocation of drawings.
// Listing 5 - VisualAST.cs
//
public class VisualAST
{
private static Font NodeTextFont = new Font("Verdana", 8f);
private static SizeF MinimumNodeSize = new SizeF(32, 28);
private static Size NodeGapping = new Size(4, 32);
private static Dictionary<string, Pen> Pens = new Dictionary<string, Pen>();
public IASTTreeNode ASTTreeNode { get; private set; }
public VisualAST(IASTTreeNode astTreeNode)
{
ASTTreeNode = astTreeNode;
}
private static Bitmap CreateNodeImage(Size size, string text, Font font)
{
Bitmap img = new Bitmap(size.Width, size.Height);
using (var g = Graphics.FromImage(img))
{
g.SmoothingMode = SmoothingMode.HighQuality;
var rcl = new Rectangle(1, 1, img.Width - 2, img.Height - 2);
g.FillRectangle(Brushes.White, rcl);
LinearGradientBrush linearBrush =
new LinearGradientBrush(rcl, Color.LightBlue,
Color.White, LinearGradientMode.ForwardDiagonal);
g.DrawEllipse(NodeBorderPen, rcl);
g.FillEllipse(linearBrush, rcl);
linearBrush.Dispose();
var sizeText = g.MeasureString(text, font);
g.DrawString(text, font, Brushes.Black,
Math.Max(0, (size.Width - sizeText.Width) / 2),
Math.Max(0, (size.Height - sizeText.Height) / 2));
}
return img;
}
private static Pen ConnectionPen
{
get
{
string penName = "ConnectionPen";
if (!Pens.ContainsKey(penName))
{
Pens.Add(penName, new Pen(Brushes.Black, 1) { EndCap = LineCap.ArrowAnchor, StartCap = LineCap.Round });
}
return Pens[penName];
}
}
private static Pen NodeBorderPen
{
get
{
string penName = "NodeBorderPen";
if (!Pens.ContainsKey(penName))
{
Pens.Add(penName, new Pen(Color.Silver, 1));
}
return Pens[penName];
}
}
public Image Draw()
{
int center;
Image image = Draw(this.ASTTreeNode, out center);
return image;
}
private Image Draw(IASTTreeNode astTreeNode, out int center)
{
var nodeText = astTreeNode.Text;
var nodeSize = TextMeasurer.MeasureString("*" + nodeText + "*", NodeTextFont);
nodeSize.Width = Math.Max(MinimumNodeSize.Width, nodeSize.Width);
nodeSize.Height = Math.Max(MinimumNodeSize.Height, nodeSize.Height);
var childCentres = new int[astTreeNode.Count];
var childImages = new Image[astTreeNode.Count];
var childSizes = new Size[astTreeNode.Count];
var enumerator = astTreeNode.Children.GetEnumerator(); ;
int i = 0;
while (enumerator.MoveNext())
{
var currentNode = enumerator.Current;
var lCenter = 0;
childImages[i] = Draw(currentNode, out lCenter);
childCentres[i] = lCenter;
if (childImages[i] != null)
{
childSizes[i] = childImages[i] != null ? childImages[i].Size : new Size();
}
++i;
}
// draw current node and it's children
var under = childImages.Any(nodeImg => nodeImg != null);// if true the current node has childs
var maxHeight = astTreeNode.Count > 0 ? childSizes.Max(c => c.Height) : 0;
var totalFreeWidth = astTreeNode.Count > 0 ? NodeGapping.Width * (astTreeNode.Count - 1) : NodeGapping.Width;
var totalChildWidth = childSizes.Sum(s => s.Width);
var nodeImage = CreateNodeImage(nodeSize.ToSize(), nodeText, NodeTextFont);
var totalSize = new Size
{
Width = Math.Max(nodeImage.Size.Width, totalChildWidth) + totalFreeWidth,
Height = nodeImage.Size.Height + (under ? maxHeight + NodeGapping.Height : 0)
};
var result = new Bitmap(totalSize.Width, totalSize.Height);
var g = Graphics.FromImage(result);
g.SmoothingMode = SmoothingMode.HighQuality;
g.FillRectangle(Brushes.White, new Rectangle(new Point(0, 0), totalSize));
var left = (totalSize.Width - nodeImage.Width) / 2;
g.DrawImage(nodeImage, left, 0);
center = Math.Max(totalSize.Width / 2, (nodeImage.Width + NodeGapping.Width) / 2);
var fromLeft = 0;
for (int j = 0; j < astTreeNode.Count; ++j)
{
float x1 = center;
float y1 = nodeImage.Height;
float y2 = nodeImage.Height + NodeGapping.Height;
float x2 = fromLeft + childCentres[j];
var h = y2 - y1;
var w = x1 - x2;
var childImg = childImages[j];
if (childImg != null)
{
g.DrawImage(childImg, fromLeft, nodeImage.Size.Height + NodeGapping.Height);
fromLeft += childImg.Width + NodeGapping.Width; // Prepare next child left starting point
var points1 = new List<PointF>
{
new PointF(x1, y1),
new PointF(x1 - w/6, y1 + h/3.5f),
new PointF(x2 + w/6, y2 - h/3.5f),
new PointF(x2, y2),
};
g.DrawCurve(ConnectionPen, points1.ToArray(), 0.5f);
}
childImages[j].Dispose(); // Release child image as it aleady drawn on parent node's surface
}
g.Dispose();
return result;
}
}
The main function to call the drawing method is listed below. It gets
tree object returned from parser and passes it to Draw()
method of VisualAST
and
assign image returned to a PictureBox on the testing Form. That's all how
we get for the final AST drawn on screen
// Listing 6 - FormTest.cs
//
string inputString = textBoxInput.Text;
var input = new ANTLRStringStream(inputString);
var lexer = new EntityMappingLexer(input);
var tokens = new CommonTokenStream(lexer);
var parser = new EntityMappingParser(tokens);
var result = parser.prog();
var tree = result.Tree;
textBoxStringTree.Text = tree.ToStringTree();
var treeNodeDrawable = new ASTTreeNode(tree);
int center;
if (pictureBox1.Image != null)
{
pictureBox1.Image.Dispose();
pictureBox1.Image = null;
}
Image image = new VisualAST(treeNodeDrawable).Draw();
pictureBox1.Image = image;
History
- 2012.04.23 Article submitted.