Summary
A simple implementation of an INI file reader using the boost::spirit framework.
Introduction
This article shows how the spirit framework can be used as the parser for a simple library that reads and writes INI files. It is not intended as a crash course in parsing or in the spirit library, but it can serve as an example for such purposes.
Background
Applications usually need some means of storing and retrieving configuration data. On UNIX systems it is customary to use configuration files; on Microsoft Windows, the registry is most commonly used. However, the registry can be inconvenient: it cannot be treated as a file, so backup, sharing and source control systems can have problems dealing with it. Using a configuration file solves those problems, but there is no painless way of doing that. Writing the settings is easy, but reading them back can become a problem if the file is edited by users.
The difficulty with reading usually lies in the white space, comments and optional items in the file structure, which can make hand-written parsing code obscure and difficult to maintain. For ease of maintenance, a specialized parser is therefore the obvious choice.
The best-known tools for generating lexers and parsers are probably flex and yacc, but others are available. For this purpose we will use the spirit framework, part of the boost library. The boost::spirit framework differs from other parser generators in a few ways. First, it is a fully object oriented lexer and parser. It is implemented as a C++ template library that uses overloaded operators to describe the grammar in a BNF-like language. As the grammar is described in actual C++ code and the parser is generated through template instantiation, there is no pre-compile step to generate C code from the grammar definition, nor is any external executable required for that purpose. Second, it is a recursive descent parser for LL grammars (yacc, by contrast, generates LALR(1) parsers). Third, it provides some useful predefined parsers, as well as some predefined semantic actions (called actors in spirit).
Please note that because of its heavy template usage, the spirit library is supported by only a limited number of compilers. This code was tested with Visual Studio 8. To compile the code you will need to download and install the boost library from the Boost website.
Requirements for the library
We'd like the library to parse a file like the one below, recognizing C-style comments (/* ... */) as well as line comments starting with ;, # or //.
Each category has its name enclosed in square brackets; the name can contain only letters, numbers and underscores. Each category can have an arbitrary number of entries, each with a name and a value. Entry names follow the same conventions as category names.
[Cat1]
name=100
name=dhfj dhjgfd
/*
[Cat_2_bak]
UsagePage=9
Usage=19
Offset=0x1204
*/
[Cat_3]
UsagePage=12
Usage=39
//Usage2=39
;Usage3=39
#Usage4=39
Offset=0x12304
Grammar definition
First, we define some helper rules:
rule<> char_ident_start = alpha_p | ch_p('_') ;
rule<> char_ident_middle = alnum_p | ch_p('_') ;
rule<> ident = char_ident_start >> * char_ident_middle ;
rule<> char_start_comment = ch_p('#') | ch_p(';') | str_p("//");
rule<> blanks_p = * blank_p;
rule<> value_p = * ( alnum_p | blank_p | punct_p );
ident stands for identifier: it is either a category name or an entry name. Note that it is composed of a start character followed by an arbitrary number of "middle" characters. Start characters are either a letter or an underscore; middle characters are letters, numbers or underscores.
value_p stands for an entry value: it is any combination of letters, numbers, spaces and punctuation characters.
blanks_p is a white space eater: it matches any number of blanks (spaces or tabs), including none.
Please note the use of predefined parsers such as alpha_p, alnum_p, blank_p, punct_p and ch_p.
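To make the ident rule concrete, here is a hand-written equivalent in plain C++. It does not use spirit at all; the function name is_ident is hypothetical and exists only for this illustration.

```cpp
#include <cctype>
#include <string>

// Hand-written equivalent of: ident = char_ident_start >> *char_ident_middle
// (illustration only; the article's parser uses the spirit rules instead).
bool is_ident(const std::string& s)
{
    if (s.empty())
        return false;

    // char_ident_start: a letter or an underscore
    unsigned char first = static_cast<unsigned char>(s[0]);
    if (!std::isalpha(first) && first != '_')
        return false;

    // char_ident_middle: any number of letters, digits or underscores
    for (std::string::size_type i = 1; i < s.size(); ++i)
    {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (!std::isalnum(c) && c != '_')
            return false;
    }
    return true;
}
```

With this check, Cat_2_bak and _x are accepted, while 2abc (starts with a digit) and a name containing a space are rejected. The spirit version expresses the same thing in three lines.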
Now let's deal with empty lines and comments:
rule<> l_comment = blanks_p >> char_start_comment >> * print_p >> eol_p;
rule<> l_empty = blanks_p >> eol_p;
rule<> c_comment_rule = confix_p("/*", *anychar_p, "*/");
rule<> b_comment =
    blanks_p >>
    c_comment_rule >>
    blanks_p >>
    eol_p
    ;
A line comment is simply any number of blanks, then one of the comment start sequences, then any number of printable characters, followed by a line end.
An empty line is similar: any number of blanks followed by a line end.
For the C-style comment, we use the provided confix_p parser, and require that there is only white space before and after the comment (up to the line boundaries).
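As a rough illustration of what l_empty and l_comment accept, the following plain C++ sketch classifies a single line whose line ending has already been removed. The helper names are hypothetical, and unlike the spirit rule it does not check that the rest of a comment line is printable.

```cpp
#include <string>

// Skip leading spaces and tabs, like the blanks_p rule.
static std::string::size_type skip_blanks(const std::string& line)
{
    std::string::size_type i = 0;
    while (i < line.size() && (line[i] == ' ' || line[i] == '\t'))
        ++i;
    return i;
}

// Rough equivalent of l_empty: nothing but blanks on the line.
bool is_empty_line(const std::string& line)
{
    return skip_blanks(line) == line.size();
}

// Rough equivalent of l_comment: blanks, then '#', ';' or "//".
bool is_line_comment(const std::string& line)
{
    std::string::size_type i = skip_blanks(line);
    if (i == line.size())
        return false;
    if (line[i] == '#' || line[i] == ';')
        return true;
    // compare() clips the range, so a lone trailing '/' does not match
    return line.compare(i, 2, "//") == 0;
}
```

Every such special case (tabs, a lone slash, an empty line) has to be handled by hand here, which is the kind of code the grammar rules replace.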
Here are the structures that hold the INI file in memory; the parser fills them in:
struct Entry
{
    string name;
    string value;
};
struct Category
{
    string name;
    vector<Entry> entries;
};
typedef vector<Category> CategoryList;
class CIniFile
{
    CategoryList m_file;
    int m_crt_category;
    int m_crt_entry;
};
Now let's describe the category:
rule<> l_category =
    blanks_p >>
    ch_p('[') >>
    blanks_p >>
    ident [ addCategory ] >>
    blanks_p >>
    ch_p(']') >>
    blanks_p >>
    eol_p
    ;
Quite simple: we have an ident surrounded by square brackets, and we eat the spaces around it. However, when we encounter a category we need to take some action (save it someplace), so we have to provide a functor that will handle it, in our case addCategory. The functor is defined as follows:
struct AddCategory
{
    CategoryList * m_file;
    void operator() (char const* str, char const* end) const
    {
        string name(str, end);
        Category cat;
        cat.name = name;
        m_file->push_back(cat);
    }
};
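The article does not show how the addCategory instance is created or what arguments it receives. As a sketch, spirit calls the functor with a pointer to the first matched character and a pointer one past the last, and that call can be simulated by hand without spirit. The constructor and the demo_add_category driver below are assumptions added for this illustration.

```cpp
#include <string>
#include <vector>

struct Entry { std::string name; std::string value; };
struct Category { std::string name; std::vector<Entry> entries; };
typedef std::vector<Category> CategoryList;

// Same functor as above, plus a constructor (an addition for this sketch)
// so that an instance can be bound to the list it fills.
struct AddCategory
{
    CategoryList* m_file;
    explicit AddCategory(CategoryList* file) : m_file(file) {}
    void operator()(char const* str, char const* end) const
    {
        Category cat;
        cat.name = std::string(str, end);
        m_file->push_back(cat);
    }
};

// Hypothetical driver: simulates the ident rule matching inside "[Cat1]".
std::string demo_add_category()
{
    CategoryList file;
    AddCategory addCategory(&file);

    const char* line = "[Cat1]";
    // ident matched the characters between the brackets: [line + 1, line + 5)
    addCategory(line + 1, line + 5);

    return file.back().name;
}
```

Because the functor only holds a pointer to the list, copies of it made by the parser still append to the same CategoryList.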
Describing the entry
rule<> l_entry =
    blanks_p >>
    ident [ addName ] >>
    blanks_p >>
    ch_p('=') >>
    blanks_p >>
    value_p [ addValue ] >>
    blanks_p >>
    eol_p
    ;
As with the category, we use the previously defined parsers, and we need to provide two functors as semantic actions:
struct AddName
{
    CategoryList * m_file;
    void operator() (char const* str, char const* end) const
    {
        string name(str, end);
        // size() is unsigned, so convert before subtracting
        int last_cat_id = static_cast<int>(m_file->size()) - 1;
        if (last_cat_id == -1)
        {
            // An entry seen before any category goes into a default "Globals" category
            Category cat;
            cat.name = "Globals";
            m_file->push_back(cat);
            last_cat_id = 0;
        }
        Entry entry;
        entry.name = name;
        m_file->at(last_cat_id).entries.push_back(entry);
    }
};
struct AddValue
{
    CategoryList * m_file;
    void operator() (char const* str, char const* end) const
    {
        string value(str, end);
        // AddName has already run for this entry, so at least one category exists
        int last_cat_id = static_cast<int>(m_file->size()) - 1;
        int last_entry_id = static_cast<int>(m_file->at(last_cat_id).entries.size()) - 1;
        if (last_entry_id == -1)
        {
            Entry entry;
            entry.name = "";
            m_file->at(last_cat_id).entries.push_back(entry);
            last_entry_id = 0;
        }
        m_file->at(last_cat_id).entries[last_entry_id].value = value;
    }
};
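The "Globals" fallback is easiest to see by calling the two functors by hand, in the order the l_entry rule would fire them. The sketch below uses condensed variants of AddName and AddValue (with constructors added as an assumption) and a hypothetical driver that simulates an entry appearing before any category line; spirit itself is not involved.

```cpp
#include <string>
#include <vector>

struct Entry { std::string name; std::string value; };
struct Category { std::string name; std::vector<Entry> entries; };
typedef std::vector<Category> CategoryList;

// Condensed variant of AddName: same behavior, written with empty()/back().
struct AddName
{
    CategoryList* m_file;
    explicit AddName(CategoryList* file) : m_file(file) {}
    void operator()(char const* str, char const* end) const
    {
        if (m_file->empty())
        {
            Category cat;
            cat.name = "Globals";   // default category for stray entries
            m_file->push_back(cat);
        }
        Entry entry;
        entry.name = std::string(str, end);
        m_file->back().entries.push_back(entry);
    }
};

// Condensed variant of AddValue: relies on AddName having run first.
struct AddValue
{
    CategoryList* m_file;
    explicit AddValue(CategoryList* file) : m_file(file) {}
    void operator()(char const* str, char const* end) const
    {
        m_file->back().entries.back().value = std::string(str, end);
    }
};

// Hypothetical driver: simulates "Offset=0x1204" with no [category] before it.
CategoryList demo_entry_without_category()
{
    CategoryList file;
    AddName addName(&file);
    AddValue addValue(&file);

    const char* line = "Offset=0x1204";
    addName(line, line + 6);         // the ident match: "Offset"
    addValue(line + 7, line + 13);   // the value_p match: "0x1204"
    return file;
}
```

After the two calls the list holds a single category named Globals with one entry, Offset, whose value is 0x1204.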
Almost there - last steps
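We are almost done; we just need to piece all the rules together and call the parser.
rule<> lines = l_comment | b_comment | l_category | l_entry | l_empty;
rule<> ini_file = lexeme_d [ * lines ] ;
// full is true only if the parser consumed the whole buffer
bool parsed_ok = parse(buffer, ini_file).full;
The lexeme_d directive turns off white space skipping, so blanks are matched only where our rules ask for them. Note that the full member of the value returned by parse is a success flag, not an error code: it is true when the entire buffer was parsed.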
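The snippet above does not show where buffer comes from. One possible way to obtain it, assuming the whole file fits comfortably in memory, is to read the file into a std::string first. The helper name read_file is hypothetical.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: read a whole file into a string.
// Returns false when the file cannot be opened.
bool read_file(const std::string& filename, std::string& out)
{
    std::ifstream in(filename.c_str(), std::ios::in | std::ios::binary);
    if (!in)
        return false;

    std::ostringstream contents;
    contents << in.rdbuf();   // copy the complete stream buffer
    out = contents.str();
    return true;
}
```

The result can then be handed to the parser as a null-terminated string, for example parse(text.c_str(), ini_file), where text is the string filled in by read_file.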
CIniFile class
We have described the parser of the INI file, but we are still some way from having a library. The CIniFile class needs functions like Load, Save, GetEntry and so on.
Here is the complete interface for the class:
class CIniFile
{
    CategoryList m_file;
    int m_crt_category;
    int m_crt_entry;
private:
    int b_CategoryIsValid();
    int b_EntryIsValid();
public:
    CIniFile();
    void ClearAll();
    int m_status;
    void LoadFile(string filename);
    void SaveToFile(string filename);
    int GetNumCategories();
    void SetCategory(int cat);
    int SetCategory(string name);
    string GetCategoryName();
    void AddCategoryUnique(string name);
    void AddCategory(string name);
    int GetNumEntries();
    void SetEntry(int entry);
    int SetEntry(string name);
    int SetEntryByValue(string value);
    string GetEntryName();
    string GetEntryValue();
    string GetEntryValueByName(string name);
    void AddEntryUnique(string name, string value);
    void AddEntry(string name, string value);
    void ChangeEntry(string name, string value);
    void SetValue(string value);
    void DeleteEntry(string name);
    void DeleteEntry(int entry);
    void DeleteByValue(string value);
};
Implementation of these functions is straightforward and is included in the demo source project.
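To give an idea of how simple the writing side can be, here is a minimal sketch of the text a SaveToFile-style function could produce from the in-memory structures. It is not the implementation from the demo project; the names to_ini_text and demo_to_ini_text are hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct Entry { std::string name; std::string value; };
struct Category { std::string name; std::vector<Entry> entries; };
typedef std::vector<Category> CategoryList;

// Hypothetical sketch: format a CategoryList as INI text,
// one [category] line followed by its name=value entries.
std::string to_ini_text(const CategoryList& file)
{
    std::ostringstream out;
    for (CategoryList::size_type c = 0; c < file.size(); ++c)
    {
        out << '[' << file[c].name << "]\n";
        const std::vector<Entry>& entries = file[c].entries;
        for (std::vector<Entry>::size_type e = 0; e < entries.size(); ++e)
            out << entries[e].name << '=' << entries[e].value << '\n';
    }
    return out.str();
}

// Hypothetical driver: one category with one entry.
std::string demo_to_ini_text()
{
    CategoryList file(1);
    file[0].name = "Cat_3";

    Entry entry;
    entry.name = "Usage";
    entry.value = "39";
    file[0].entries.push_back(entry);

    return to_ini_text(file);
}
```

Since the structures shown in this article store no comments, a file written this way would not contain the comments of the file that was loaded.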
Conclusions
As you can see, it was quite easy to build a custom INI file reader using the spirit library, and the resulting code is clean and easy to extend, change and maintain. It also requires only an installation of the boost library, with no other external tools or complicated build dependencies.
Parsing data from a buffer is a common task, and the spirit parser makes it quite easy. Although it has many advantages, spirit has some drawbacks: compiler support is limited, compile times are quite long, and error messages are sometimes cryptic (although I have not yet used the debugging interface it provides). As time goes on and C++ compilers implement templates more completely, spirit will be supported on more platforms, so if your compiler supports it, I'd definitely recommend you use it.