Click here to Skip to main content
Click here to Skip to main content

Portable SQL with ANTLR

, 25 Feb 2008 CPOL
Rate this:
Please Sign up or sign in to vote.
Architectural overview of the MacroScope portable SQL ADO.NET provider

Introduction

In theory, SQL is a standardized language, but in practice, database vendors see too many advantages in product differentiation to standardize everything. From data types to limiting the number of rows returned by a SELECT query, different databases use various incompatible, obscure and generally irritating constructs for doing basically the same thing. That is a problem for applications which should work with more than one backend, and the usually recommended solution is to move all database access to stored procedures, implemented and maintained separately for each supported backend. When all supported backends have stored procedures and the application doesn't need dynamically generated commands, such code duplication is possible, but clearly at variance with a well-established software engineering principle. This article describes an alternative solution, implemented by a component called MacroScope.

When using MacroScope libraries, all database access is embedded in the application's database access layer, but the database commands are not duplicated manually - they're automatically transformed to a SQL dialect understood by the run-time backend. .NET is well-suited to such an architecture: a custom .NET Framework data provider can intercept the application's database access and delegate the transformed SQL to the "real" provider of a specific database backend. But, to transform the application's commands, they must be analyzed - which is where ANTLR comes in. ANTLR "provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions", so parsing SQL is well within its scope - actually the task is so common that an SQL grammar was already written and available for reuse by this project. The linked grammar is for an old version of ANTLR, implements only SELECT and is limited to Microsoft SQL Server, but extending it (for commands INSERT, UPDATE and DELETE and backends Oracle and Microsoft Access) still proved much better than starting from scratch.

ANTLR grammars are written in a variant of BNF, in a form similar to grammars for Lex and Yacc - but if you know Yacc, be warned that ANTLR differs from it in being left-recursive, like a typical hand-written parser, so traditional right-recursive grammars must be adapted to work with ANTLR.

The core of MacroScope is therefore a formal SQL grammar. It isn't used only internally, but also serves as the documentation of the SQL variant supported by the frontend, which is actually quite difficult to describe informally with any precision. It's a subset of SQL 92, with added incompatible constructs from the supported backends, so for technical details, the formal definition is indispensable. For example, the following rules describe how to write a date literal:

MAccessDateTime :
    '#' Digit Digit Digit Digit '-'
        Digit Digit '-' Digit Digit ' '
        Digit Digit ':' Digit Digit ':' Digit Digit
        '#'
    ;

Iso8601DateTime :
    Digit Digit Digit Digit '-'
        Digit Digit '-' Digit Digit ( 't' | ' ' )
        Digit Digit ':' Digit Digit ':' Digit Digit
    ;

In plain words, MacroScope supports a subset of ISO 8601, optionally (for historical reasons, because some existing application code already had them) surrounded by Microsoft Access date delimiters. Lexer rules like those above - they are lexer rules because they start with a capital letter - are the basic building blocks of parser rules, for example for the date:

educationalSimplification returns [ INode value ] :
    // subset of an MS Access-specific format
    | MAccessDateTime {
        $value = new LiteralDateTime($MAccessDateTime.text);
    }
    // subset of ISO 8601 recommended for SQL Server 2005
    | Iso8601DateTime {
        $value = new LiteralDateTime($Iso8601DateTime.text);
    }
    ;

The actually implemented rule has more branches (because it recognizes all literals, not just dates) but the same structure.

Grammar rules have associated actions, written in an imperative language - basically the language ANTLR generates, in our case C#, except the generator replaces names starting with '$' with real parser variables. Current version of ANTLR doesn't have any form of name mangling, and therefore its internal names can clash with names defined by the grammar, but it isn't hard to work around this problem by keeping the names reasonably long (or, in the case of MacroScope local variables, very short).

Grammar actions, executed during parsing whenever their rule matches, build an object model by returning objects implementing the custom INode interface (i.e. the rule above returns an instance of LiteralDateTime) and building trees from them, up to a full SQL statement. The INode interface is very simple:

public interface INode
{
    INode Clone();

    void Traverse(IVisitor visitor);
}

All transformations, including the transformation from application-specified to backed-compatible SQL, are performed by visitors. The IVisitor interface has methods accepting all classes implementing INode, called at specific points during tree traversal - for example objects that can have a subtree call the PerformBefore method before traversing that subtree and PerformAfter afterwards. Tree leaves like LiteralDateTime call a single Perform method.

Every supported backend has a dedicated visitor (called "tailor" in MacroScope) with hardcoded transformation rules - for example OracleTailor knows that Oracle doesn't have the Microsoft-specific GETDATE() or NOW() functions but can use SYSDATE instead. This rule is checked for every function call in the traversed tree:

public override void PerformBefore(FunctionCall node)
{
    string name = node.Name.ToLowerInvariant();
    if (name.Equals(TailorUtil.GETDATE) || name.Equals(TailorUtil.NOW))
    {
        ReplaceTerm(node, new DbObject(new Identifier(
            TailorUtil.SYSDATE.ToUpperInvariant())));
    }
}

After a tailor traverses the SQL tree, all known incompatibilities are removed and the command is ready to be passed to the backend. Standard ADO.NET providers naturally don't handle the tree form, but they'll accept the command as a simple string, to which the tree must be serialized (by another class of visitor, Stringifier).

Using a code generator doesn't have only advantages - apart from debugging complications, generated code significantly complicates the component's build. Visual Studio has some hooks, but for real build rules with correct dependencies, NAnt provides a much better framework, supporting not just automated build (including the code generation step) but also integrated unit tests using NUnit. On the other hand, Visual Studio's debugger comes in very handy for interactive debugging, so it's useful to keep its projects up to date. The current implementation treats the Visual Studio projects as authoritative and generates NAnt build scripts from them with XSLT. This setup unfortunately isn't stable enough to be used unchanged across projects - some details of the build stylesheets are project-specific - but for a single project it's quite stable, and it allows the use of Visual Studio IDE for all development (including adding files) except grammar changes and unit testing. Overall, MacroScope proved a success and is in use in multiple live projects.

History

  • 25th February, 2008: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

vb3
Web Developer
Czech Republic Czech Republic
No Biography provided

Comments and Discussions

 
GeneralINSERT, UPDATE and DELETE grammar Pinmemberalhambra-eidos20-Aug-10 0:17 
GeneralRe: INSERT, UPDATE and DELETE grammar Pinmembervb320-Aug-10 0:31 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web01 | 2.8.1411023.1 | Last Updated 25 Feb 2008
Article Copyright 2008 by vb3
Everything else Copyright © CodeProject, 1999-2014
Layout: fixed | fluid