NetBase is a C# project that creates and reads DBF-like files. It supports a tiny subset of SQL to create, insert, and select data from a table. This small project has been an interesting crutch for writing a basic parser. The project also includes a small utility that provides a front-end.
See the first part of this two-part series for more information:
- Introduction and a look at the DBF file format (the storage namespace)
- The SQL parser and query builder (the
sql namespace) - this article
This code is also hosted in github at http://github.com/buttonpusher/NetBase/.
Before we start, to follow through the examples, you should open up the
NetBase.Viewer application and run the following SQL statements to set up a table ready for the examples below.
CREATE TABLE part2 (first,second,third)
INSERT INTO part2 (first,second,third) VALUES (1,2,3)
INSERT INTO part2 (first,second,third) VALUES (4,5,6)
INSERT INTO part2 (first,second,third) VALUES (7,8,9)
INSERT INTO part2 (first,second,third) VALUES (4,10,11)
The process of parsing involves building a data structure that represents the meaning of a text string. In the
NetBase application, this data structure then determines what actions need to be made on the database to give the desired results. The parser in
NetBase uses techniques very similar to those described by Jack Crenshaw's "Let's Build a Compiler". This type of parser is called a "recursive descent parser". There is a lot of theory about parsing and compiling; however, we can take a quick overview at the basics behind this kind of parsing, in this case, in the context of SQL.
The idea is that each SQL statement can be described as a tree structure. The root node of the tree encompasses the whole string:
SELECT third FROM part2 WHERE first = 4
The first layer of the tree divides the whole string into meaningful divisions. In the case of
NetBase, this string would be divided into two branches of the tree, the
SELECT third FROM part2
WHERE first = 4
The idea of a recursive descent parser is that each node in the tree - subdivision of the string - is parsed by a function, which calls other functions that parse the child nodes of the tree. This is the descending aspect. The recursive aspect is simply that functions may eventually call themselves, although it is worth noting that this does not happen in
NetBase's simplistic version of SQL.
The code involved in parsing our
SELECT statement in
NetBase is as follows - it has been reorganized to better illustrate this particular example.
private void Expression()
if (tokenizer.Token == "SELECT")
else if (tokenizer.Token == "INSERT")
else if (tokenizer.Token == "CREATE")
throw new QuerySyntaxException("Syntax Error: Expected \"select\";" +
" \"insert\"; or \"create table\" clause");
private void SelectExpression()
Query = new SelectQuery();
Query.TableName = tokenizer.Token;
if (tokenizer.Token == "JOIN")
if (tokenizer.Token == "WHERE")
private void FieldList()
if (tokenizer.Token == "*")
((SelectQuery)this.Query).SelectAll = true;
if (tokenizer.Token == ",") tokenizer.Match(",");
} while (tokenizer.Token == ",");
private void WhereClause()
if (tokenizer.Token == "AND") tokenizer.Match("AND");
string Field = tokenizer.Token;
string value = tokenizer.Token;
Where w = new Where();
w.Field = Field;
w.Value = value;
} while (tokenizer.Token == "AND");
It is possible to see from this code snippet that, for example, the
WHERE clause supports only
AND and not
OR. Note that the
JOIN functionality is incomplete.
It may be useful to step through this in a debugger to see how the internal representation of the query is constructed. The
QueryBuilder class is very large, and it would definitely be possible to improve the design of this part of the database.
Using the Parsed Representation to Run the Query
The parsed representation of the query above is now wholly contained in an instance of
SelectQuery. From here, we can determine what is required to return the right results. This is done by the
Selector class in the executive namespace.
Execute function of this class starts by allocating a
Storage.MemoryTable instance to store the results of the query. It then constructs the set of columns required from the
Fields function (which contains the set of fields in the
SELECT clause of the original query string).
MemoryTable nt = new MemoryTable();
ITable t = db.Tables[Query.TableName];
foreach (Column c in t.Columns)
foreach (Join j in Query.Joins)
ITable jt = db.Tables[j.TableName];
foreach (Column c in jt.Columns)
foreach (string s in Query.Fields)
Column c = t.Columns.Find(o => o.Name == s);
The next step is to move through the rows of the source table, using the
Filter method of the
sql.Where class to determine which rows should be included. If a row should be included, then the contents of the appropriate columns are copied into a new row, which is then added to the
Row r = t.NextRow();
bool pass = true;
foreach (Where w in Query.Wheres)
pass = false;
Row nr = new Row(nt);
Note that in the above snippet, I have excised a section of unfinished code which attempts to implement joins, because this functionality is incomplete and irrelevant to this article.
nt now contains the results of the query, and can be returned up through the API to the front-end, or perhaps in a later project, sent across the network to a client application. The results of the query as displayed in the
NetBase.Viewer application will be:
The comments about indexes are referring to the inefficient way in which every row in the table is checked to see if it matches the
where clause. Modern RDBMSs use - amongst other techniques - data structures like b-trees to efficiently seek for matching rows, and therefore do not always have to scan through the entire table.
There are a lot of additional subprojects that could be interesting to work on with
- Implementing the
- Implementing a network layer
- Enabling concurrent access (locking)
- Adding b-trees, indexes, and other performance optimisations
Perhaps at some point, I or someone else will undertake these projects; they'd be interesting by themselves.