Inside Groovy

22 Mar 2013, CPOL
Let’s go inside Groovy to discover how it works internally.

Introduction

Groovy is an object-oriented programming language for the Java platform. It is a dynamic language with features similar to those of Python, Ruby, Perl, and Smalltalk. It can be used as a scripting language for the Java Platform, is dynamically compiled to Java Virtual Machine (JVM) bytecode, and interoperates with other Java code and libraries.

To explore Groovy’s internals we use JArchitect. Groovy comes with many libraries, such as Groovy-SQL and Groovy-JSON. Here’s the dependency structure matrix of all the Groovy jars:

groovy10

The DSM (Dependency Structure Matrix) is a compact way to represent and navigate dependencies between components.

Inside Groovy

In this post we focus only on the Groovy compiler and walk through its compilation phases, from source code to bytecode.

Step1: Generate ANTLR AST

ANTLR is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It’s widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.

From a grammar file, ANTLR generates the classes needed to build the AST from source files. For Groovy, two classes are generated: GroovyLexer and GroovyRecognizer.

Here’s the dependency graph of org.codehaus.groovy.antlr.parser package.

groovy2

GroovyLexer inherits from CharScanner, and its role is to walk through the tokens of the source file by invoking the nextToken method.

Let’s see what happens when nextToken is invoked.

from m in Methods where m.IsUsedBy ("org.codehaus.groovy.antlr.parser.GroovyLexer.nextToken()")
select new { m, m.NbBCInstructions }

groovy4

Depending on the token found, nextToken delegates the processing to the corresponding method. To be more concrete, take the case where a comma is found:

groovy5

For each token found, a GroovySourceToken is created, and information such as line and column is assigned to this instance. So the GroovyLexer acts as a scanner that iterates through the tokens.
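To make the scanner role concrete, here is a minimal sketch in plain Java. It is not the generated GroovyLexer: SketchToken and SketchLexer are hypothetical names, and only commas and identifiers are recognized. It shows the two ideas described above: nextToken dispatches to one helper per token kind, and every token records the line and column where it starts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified token; it only mirrors the idea of GroovySourceToken.
final class SketchToken {
    final String type;
    final String text;
    final int line;
    final int column;

    SketchToken(String type, String text, int line, int column) {
        this.type = type;
        this.text = text;
        this.line = line;
        this.column = column;
    }
}

// Hypothetical hand-written scanner; the real GroovyLexer is generated by ANTLR.
final class SketchLexer {
    private final String input;
    private int pos = 0;
    private int line = 1;
    private int column = 1;

    SketchLexer(String input) {
        this.input = input;
    }

    // Dispatches on the current character and delegates to one helper per token kind.
    SketchToken nextToken() {
        skipWhitespace();
        if (pos >= input.length()) {
            return new SketchToken("EOF", "", line, column);
        }
        char c = input.charAt(pos);
        if (c == ',') {
            return comma();
        }
        if (Character.isLetter(c)) {
            return identifier();
        }
        throw new IllegalStateException("Unexpected '" + c + "' at " + line + ":" + column);
    }

    private SketchToken comma() {
        SketchToken token = new SketchToken("COMMA", ",", line, column);
        pos++;
        column++;
        return token;
    }

    private SketchToken identifier() {
        int start = pos;
        int startColumn = column;
        while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) {
            pos++;
            column++;
        }
        return new SketchToken("IDENT", input.substring(start, pos), line, startColumn);
    }

    // Skips blanks and keeps the line and column counters up to date.
    private void skipWhitespace() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            if (input.charAt(pos) == '\n') {
                line++;
                column = 1;
            } else {
                column++;
            }
            pos++;
        }
    }

    // Drains the lexer into a list, stopping after the EOF token.
    static List<SketchToken> tokenize(String source) {
        SketchLexer lexer = new SketchLexer(source);
        List<SketchToken> tokens = new ArrayList<>();
        SketchToken token;
        do {
            token = lexer.nextToken();
            tokens.add(token);
        } while (!token.type.equals("EOF"));
        return tokens;
    }
}
```

Calling SketchLexer.tokenize on a small input returns the tokens in order, each carrying its own position, which is the information a parser later needs for error reporting.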

To generate the AST we need a parser that checks the syntax and builds the tree; this is the role of the other generated class, GroovyRecognizer.

GroovyRecognizer inherits from ANTLR’s LLkParser; for more detail about this kind of parser, please refer to this article.

To generate the ANTLR AST, the compilationUnit method is used. Let’s search for all the methods used by compilationUnit, directly or indirectly:

from m in Methods
let depth0 = m.DepthOfIsUsedBy("org.codehaus.groovy.antlr.parser.GroovyRecognizer.compilationUnit()")
where depth0 >= 0 orderby depth0
select new { m, depth0 }

groovy11

As we can observe, this method mainly uses the ANTLR AST classes to generate the AST nodes.

Here’s the dependency graph representing the collaboration between GroovyLexer and GroovyRecognizer when the main method is invoked:
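The notion of "directly or indirectly used, ordered by depth" in the query above can be pictured as a breadth-first walk over a call graph. The sketch below is plain Java and does not reflect how JArchitect is implemented. CallDepthSketch and the map-based call graph are hypothetical, and assigning depth 0 to the starting method is an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: computes the call depth of every method reachable from a root.
final class CallDepthSketch {

    // calls maps a method name to the methods it calls directly.
    static Map<String, Integer> depthsFrom(String root, Map<String, List<String>> calls) {
        Map<String, Integer> depth = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depth.put(root, 0); // assumption: the root itself has depth 0
        queue.add(root);
        while (!queue.isEmpty()) {
            String caller = queue.remove();
            List<String> callees = calls.getOrDefault(caller, Collections.<String>emptyList());
            for (String callee : callees) {
                // The first visit gives the shortest depth; later visits are ignored.
                if (!depth.containsKey(callee)) {
                    depth.put(callee, depth.get(caller) + 1);
                    queue.add(callee);
                }
            }
        }
        return depth;
    }
}
```

Because the walk is breadth-first, each method gets the length of its shortest call chain from the root, and cycles in the call graph (common in recursive-descent parsers) do not cause an infinite loop.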

groovy1

Now that we know the role of each class, let’s take a look at their design:

Cohesion:

Both GroovyLexer and GroovyRecognizer have many methods and fields. GroovyLexer has 140 methods and 35 fields, and GroovyRecognizer has 310 methods and 121 fields.

In general, classes of this kind are prone to low cohesion, because it’s hard to maintain high cohesion when the number of methods and fields is high.

The single responsibility principle states that a class should have one, and only one, reason to change. Such a class is said to be cohesive. A high LCOM value generally pinpoints a poorly cohesive class. There are several LCOM metrics. The LCOM takes its values in the range [0-1]. The LCOMHS (HS stands for Henderson-Sellers) takes its values in the range [0-2]. Note that the LCOMHS metric is often considered more effective at detecting non-cohesive types.

An LCOMHS value higher than 1 should be considered alarming.

The LCOMHS of GroovyLexer is equal to 0.96814 and for GroovyRecognizer it’s equal to 0.98932.

So even though they have many methods and fields, their LCOMHS is acceptable.
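For readers who want to see how such values can arise, here is a small sketch in plain Java. It assumes the commonly documented definitions, which the article itself does not spell out: with M methods, F instance fields and sum(MF) the number of (method, field) pairs where the method accesses the field, LCOM = 1 - sum(MF) / (M * F) and LCOMHS = (M - sum(MF) / F) / (M - 1). CohesionSketch and its input format are hypothetical.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper; the formulas are assumed, not taken from the article.
final class CohesionSketch {

    // Counts the (method, field) pairs where the method accesses the field.
    private static int totalAccesses(List<Set<String>> fieldsUsedByMethod, Set<String> fields) {
        int total = 0;
        for (Set<String> used : fieldsUsedByMethod) {
            for (String field : fields) {
                if (used.contains(field)) {
                    total++;
                }
            }
        }
        return total;
    }

    // LCOM = 1 - sum(MF) / (M * F), in the range [0, 1].
    static double lcom(List<Set<String>> fieldsUsedByMethod, Set<String> fields) {
        int m = fieldsUsedByMethod.size();
        int f = fields.size();
        if (m == 0 || f == 0) {
            return 0.0; // nothing to measure
        }
        return 1.0 - (double) totalAccesses(fieldsUsedByMethod, fields) / (m * f);
    }

    // LCOMHS = (M - sum(MF) / F) / (M - 1), in the range [0, 2].
    static double lcomHs(List<Set<String>> fieldsUsedByMethod, Set<String> fields) {
        int m = fieldsUsedByMethod.size();
        int f = fields.size();
        if (m < 2 || f == 0) {
            return 0.0; // the formula is undefined for fewer than two methods
        }
        return (m - (double) totalAccesses(fieldsUsedByMethod, fields) / f) / (m - 1);
    }
}
```

Under these definitions LCOMHS equals LCOM multiplied by M / (M - 1), so for classes with hundreds of methods the two values are almost identical. A class whose methods each touch every field scores 0 on both metrics.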

Coupling

Low coupling is desirable because a change in one area of an application then requires fewer changes throughout the rest of the application. In the long run, this can save a lot of the time, effort, and cost associated with modifying an application and adding new features to it.

Using interfaces and abstract classes helps keep coupling low.

In the case of GroovyLexer, here are all the classes used:

groovy12

GroovyLexer uses only a few interfaces and abstract classes, and it is highly coupled with ANTLR classes. The same remark applies to GroovyRecognizer. This is not a problem for classes of this kind, because they are generated by ANTLR; for non-generated classes, however, it’s better to avoid high coupling with other classes.

Step2: Generate Groovy AST

In the first step an ANTLR AST is generated, but Groovy uses its own AST nodes, so the next step is to convert the ANTLR AST to a Groovy AST.

AntlrParserPlugin is responsible for this conversion, and its convertGroovy method does the job. Here’s its dependency graph:

groovy6

This method iterates through the ANTLR AST nodes, and for each kind of node found it delegates the processing to the corresponding method. For example, here’s what happens in the case of a statement node:

groovy7

There are many possible kinds of statements (try, continue, if, while, …), and, as before, this method delegates the processing to the corresponding methods. This makes the code very easy to understand and isolates each responsibility in a specific method.

Using ANTLR is a good choice, but it’s better to isolate the use of this library to avoid high coupling with it. This gives the flexibility to move to a new version of ANTLR, or even to another parser generator, without impacting the whole code base.

Let’s search for the types that use ANTLR:
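The delegation style described here, one conversion method per kind of node, can be sketched in plain Java as follows. This is not the code of AntlrParserPlugin: RawNode, Stmt, IfStmt, WhileStmt, ExprStmt and ConverterSketch are hypothetical names, and the untyped tree is reduced to a type string, a text and a list of children.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical untyped parse-tree node, standing in for an ANTLR AST node.
final class RawNode {
    final String type;
    final String text;
    final List<RawNode> children;

    RawNode(String type, String text, RawNode... children) {
        this.type = type;
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

// Hypothetical typed statement nodes, standing in for the compiler's own AST.
interface Stmt {
    String describe();
}

final class IfStmt implements Stmt {
    final String condition;
    final Stmt body;

    IfStmt(String condition, Stmt body) {
        this.condition = condition;
        this.body = body;
    }

    public String describe() {
        return "if (" + condition + ") " + body.describe();
    }
}

final class WhileStmt implements Stmt {
    final String condition;
    final Stmt body;

    WhileStmt(String condition, Stmt body) {
        this.condition = condition;
        this.body = body;
    }

    public String describe() {
        return "while (" + condition + ") " + body.describe();
    }
}

final class ExprStmt implements Stmt {
    final String expression;

    ExprStmt(String expression) {
        this.expression = expression;
    }

    public String describe() {
        return expression;
    }
}

final class ConverterSketch {
    // Dispatches on the node kind; each kind has its own conversion method.
    static Stmt statement(RawNode node) {
        switch (node.type) {
            case "IF":
                return ifStatement(node);
            case "WHILE":
                return whileStatement(node);
            case "EXPR":
                return new ExprStmt(node.text);
            default:
                throw new IllegalArgumentException("Unknown node type: " + node.type);
        }
    }

    // Assumes child 0 is the condition and child 1 is the body.
    private static Stmt ifStatement(RawNode node) {
        return new IfStmt(node.children.get(0).text, statement(node.children.get(1)));
    }

    private static Stmt whileStatement(RawNode node) {
        return new WhileStmt(node.children.get(0).text, statement(node.children.get(1)));
    }
}
```

Supporting a new statement kind in this layout means adding one case and one small method, without touching the existing ones.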

from t in Types where t.IsUsing ("antlr-2.7.7") select  t

groovy13

Only a few types use ANTLR directly, which is very good if another parser generator is used in the future.

Step3: Generate bytecode

To generate bytecode, Groovy walks through the AST and emits the bytecode as it goes. The technique commonly used by compilers for this is the visitor pattern.
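One common way to obtain this kind of isolation is to hide the parsing library behind a small interface, so that only one class refers to it. The sketch below is plain Java with hypothetical names (ParserFrontEnd, SemicolonFrontEnd, NewlineFrontEnd, CompilerSketch); it is not Groovy’s actual design, and the two "parsers" merely split text so that the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction: the rest of the compiler sees only this interface.
interface ParserFrontEnd {
    List<String> parseStatements(String source);
}

// First implementation; in a real code base only this class would import the parser library.
final class SemicolonFrontEnd implements ParserFrontEnd {
    @Override
    public List<String> parseStatements(String source) {
        List<String> statements = new ArrayList<>();
        for (String part : source.split(";")) {
            if (!part.trim().isEmpty()) {
                statements.add(part.trim());
            }
        }
        return statements;
    }
}

// Second implementation, standing in for "another parser generator".
final class NewlineFrontEnd implements ParserFrontEnd {
    @Override
    public List<String> parseStatements(String source) {
        List<String> statements = new ArrayList<>();
        for (String part : source.split("\n")) {
            if (!part.trim().isEmpty()) {
                statements.add(part.trim());
            }
        }
        return statements;
    }
}

// The compiler depends on the interface, so the front end can be swapped freely.
final class CompilerSketch {
    private final ParserFrontEnd frontEnd;

    CompilerSketch(ParserFrontEnd frontEnd) {
        this.frontEnd = frontEnd;
    }

    int countStatements(String source) {
        return frontEnd.parseStatements(source).size();
    }
}
```

Swapping the front end is then a one-line change at the place where CompilerSketch is constructed.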

Motivation of using the visitor pattern

We can apply many algorithms and operations to the AST nodes, such as:

  • print the AST
  • save it to XML file
  • generate byte code
  • save it to HTML file

The visitor design pattern is a way of separating an algorithm from the object structure on which it operates. A practical result of this separation is the ability to add new operations to existing object structures without modifying those structures.

The idea is to implement an interface that contains many visitXXX methods, such as GroovyCodeVisitor:

groovy14
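A minimal sketch of the pattern in plain Java, independent of Groovy’s own classes: ExprNode, ExprVisitor and the two visitors are hypothetical names, and the "code generator" emits instructions for an imaginary stack machine instead of JVM bytecode. The point is that printing and code generation are two separate visitors over the same unchanged node classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor interface with one visitXXX method per node type.
interface ExprVisitor {
    void visitNumber(NumberNode node);
    void visitAdd(AddNode node);
}

// Every node only knows how to hand itself to a visitor.
interface ExprNode {
    void accept(ExprVisitor visitor);
}

final class NumberNode implements ExprNode {
    final int value;

    NumberNode(int value) {
        this.value = value;
    }

    public void accept(ExprVisitor visitor) {
        visitor.visitNumber(this);
    }
}

final class AddNode implements ExprNode {
    final ExprNode left;
    final ExprNode right;

    AddNode(ExprNode left, ExprNode right) {
        this.left = left;
        this.right = right;
    }

    public void accept(ExprVisitor visitor) {
        visitor.visitAdd(this);
    }
}

// First operation: print the tree as a parenthesized expression.
final class PrintVisitor implements ExprVisitor {
    final StringBuilder out = new StringBuilder();

    public void visitNumber(NumberNode node) {
        out.append(node.value);
    }

    public void visitAdd(AddNode node) {
        out.append("(");
        node.left.accept(this);
        out.append(" + ");
        node.right.accept(this);
        out.append(")");
    }
}

// Second operation: emit instructions for an imaginary stack machine.
final class StackCodeVisitor implements ExprVisitor {
    final List<String> code = new ArrayList<>();

    public void visitNumber(NumberNode node) {
        code.add("PUSH " + node.value);
    }

    public void visitAdd(AddNode node) {
        node.left.accept(this);  // operands first
        node.right.accept(this);
        code.add("ADD");         // then the operation
    }
}
```

Adding a third operation, for example an XML export, would mean writing one more ExprVisitor implementation and leaving NumberNode and AddNode untouched.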

In the case of Groovy, the AsmClassGenerator class is responsible for generating the bytecode.

Let’s search for all its base classes:

from t in Types where t.FullName=="org.codehaus.groovy.classgen.AsmClassGenerator"
select new { t, t.BaseClasses}

groovy8

ClassCodeVisitorSupport implements the GroovyClassVisitor interface, and CodeVisitorSupport implements the GroovyCodeVisitor interface.

Extend Groovy capabilities:

Although at times it may sound like a good idea to extend the syntax of Groovy to implement new features, most of the time we can’t just add a new keyword to the grammar or create some new syntax construct to represent a new concept. However, with AST (Abstract Syntax Tree) Transformations, we are able to tackle new and innovative ideas without changing the grammar.

When the Groovy compiler compiles Groovy scripts and classes, at some point in the process, the source code will end up being represented in memory in the form of a Concrete Syntax Tree, then transformed into an Abstract Syntax Tree. The purpose of AST Transformations is to let developers hook into the compilation process to be able to modify the AST before it is turned into bytecode that will be run by the JVM.

AST Transformations provide Groovy with improved compile-time metaprogramming capabilities, allowing powerful flexibility at the language level without a runtime performance penalty.

One hook for accessing this capability is via annotations. In your Groovy code you can use one or more annotations to mark a class for receiving an AST transformation during compilation.

Let’s search for all the standard Groovy transformations, that is, the types that have the GroovyASTTransformation annotation:

from t in Types where t.HasAnnotation("org.codehaus.groovy.transform.GroovyASTTransformation")
select t

groovy15

Users can also create their own transformations to extend Groovy’s capabilities; this feature makes Groovy very flexible and powerful.
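The principle of hooking into compilation between parsing and code generation can be illustrated with a toy pipeline in plain Java. It deliberately does not use Groovy’s transformation API: ClassModel, TreeTransformation, AddToStringTransformation and PipelineSketch are hypothetical, the "AST" is reduced to a class name with its annotations and method names, and "code generation" only lists the methods that would be emitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, heavily simplified model of a class in the AST.
final class ClassModel {
    final String name;
    final Set<String> annotations = new LinkedHashSet<>();
    final List<String> methods = new ArrayList<>();

    ClassModel(String name) {
        this.name = name;
    }
}

// Hypothetical hook: a transformation may modify the tree before code generation.
interface TreeTransformation {
    void visit(ClassModel classModel);
}

// Adds a toString method to classes marked with a "ToString" annotation.
final class AddToStringTransformation implements TreeTransformation {
    public void visit(ClassModel classModel) {
        boolean marked = classModel.annotations.contains("ToString");
        if (marked && !classModel.methods.contains("toString")) {
            classModel.methods.add("toString");
        }
    }
}

final class PipelineSketch {
    // Runs every transformation, then "generates code" from the modified tree.
    static List<String> compile(ClassModel classModel, List<TreeTransformation> transformations) {
        for (TreeTransformation transformation : transformations) {
            transformation.visit(classModel);
        }
        List<String> generated = new ArrayList<>();
        for (String method : classModel.methods) {
            generated.add(classModel.name + "." + method + "()");
        }
        return generated;
    }
}
```

The generated output contains a method that never appeared in the source, and all the work happened at compile time, which is why this approach carries no runtime penalty.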

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

James_Carter
Software Developer (Senior)
United States
CppDepend lead developer.

Article Copyright 2013 by James_Carter