INI file reader using the spirit library

3 Jan 2006
A simple implementation of an INI file reader using the boost::spirit framework

Introduction

This article shows how the spirit framework can serve as the parser for a simple library that reads and writes INI files. It is not intended as a crash course in parsing or in the spirit library, but it can be used as an example for such purposes.

Background

Applications usually need some way to store and retrieve configuration data. On UNIX systems it is customary to use configuration files; on Microsoft Windows, the registry is most commonly used. However, the registry cannot be treated as a file, so backup, sharing and source control systems can have problems dealing with it. Using a configuration file solves those problems, but there is no painless way of doing it: writing the settings is easy, whereas reading them back can become a problem if the file is edited by users.

The problem with reading usually lies in white space, comments and optional items in the file structure, which can make hand-written parsing code obscure and difficult to maintain. Once ease of maintenance is taken into account, the need for a specialized parser is quite obvious.

The best-known parser generators are probably lex/flex and yacc, but others are available. For this purpose we will use the spirit framework, part of the boost library. The boost::spirit framework differs from other parser generators in a few ways. First, it is a fully object oriented lexer and parser. It is implemented as a C++ template library and uses overloaded operators to describe the grammar in a BNF-like notation. Because the grammar is written in actual C++ code and the parser is generated through template instantiation, there is no pre-compile step to generate C code from a grammar definition, and no external executable is needed for that purpose. Second, it parses LL grammars using recursive descent (vs. LALR(1) for yacc). Third, it provides some useful predefined parsers, as well as some predefined semantic actions (called actors in spirit).

Please note that because of the heavy template usage, the spirit library is supported by only a limited number of compilers. This code was tested with Visual Studio 8 (Visual Studio 2005). To compile the code you will need to download and install the boost library.

Requirements for the library

We'd like the library to parse a file like the one below and to recognize C-style comments (/* */) as well as line comments starting with ;, # or //.

Each category has its name enclosed in square brackets; the name can contain only letters, digits and underscores. Each category can have an arbitrary number of entries, each with a name and a value. Entry names follow the same conventions as category names.

[Cat1]
name=100
name=dhfj dhjgfd

/*
[Cat_2_bak]
UsagePage=9
Usage=19
Offset=0x1204
*/

[Cat_3]
UsagePage=12
Usage=39
//Usage2=39
;Usage3=39
#Usage4=39
Offset=0x12304

Grammar definition

First we'll define some helper rules:

C++
rule<> char_ident_start = alpha_p | ch_p('_') ;
rule<> char_ident_middle = alnum_p | ch_p('_') ;
rule<> ident = char_ident_start >> * char_ident_middle ;
rule<> char_start_comment = ch_p('#') | ch_p(';') | str_p("//");
rule<> blanks_p = * blank_p;
rule<> value_p = * ( alnum_p | blank_p | punct_p );

ident stands for identifier - it is either a category name or an entry name. It is composed of a start character followed by an arbitrary number of "middle" characters. Start characters are either a letter or an underscore; middle characters are letters, digits or underscores.
value_p stands for an entry value - it is any combination of letters, digits, blanks and punctuation characters.
blanks_p is a white space eater - it matches any number of spaces and tabs, including none.
Please note the use of predefined parsers such as alpha_p, alnum_p, blank_p, punct_p and ch_p.
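
To make it concrete which strings the ident rule accepts, here is a minimal hand-written sketch of the same check in plain C++. It does not use spirit at all; the function names is_ident and check_ident_examples are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hand-written equivalent of the `ident` rule (illustration only, no spirit):
//   start  : a letter or '_'
//   middle : zero or more letters, digits or '_'
bool is_ident(const std::string& s)
{
    if (s.empty())
        return false;

    unsigned char first = static_cast<unsigned char>(s[0]);
    if (!std::isalpha(first) && first != '_')
        return false;

    for (std::string::size_type i = 1; i < s.size(); ++i)
    {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (!std::isalnum(c) && c != '_')
            return false;
    }
    return true;
}

// A few names taken from the sample file, plus two that must be rejected.
void check_ident_examples()
{
    assert(is_ident("Cat_2_bak"));
    assert(is_ident("UsagePage"));
    assert(!is_ident("3Cat"));        // starts with a digit
    assert(!is_ident("Usage Page"));  // contains a blank
}
```

The spirit rule expresses the same thing in one line, which is the main attraction of describing the grammar declaratively.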

Now let's deal with empty lines and comments

C++
rule<> l_comment = blanks_p >> char_start_comment >> * print_p >> eol_p;
rule<> l_empty = blanks_p >> eol_p;
rule<> c_comment_rule = confix_p("/*", *anychar_p, "*/");

rule<> b_comment =
                blanks_p >>
                c_comment_rule >>
                blanks_p >>
                eol_p
;

Quite easily, we define a line comment as something that starts with any number of spaces, then has one of the start comment characters, then has any number of printable characters followed by a line end character.
Same for an empty line, we have any number of spaces followed by a line end character.
For the C-style comment, we use the provided confix_p parser, and require that there is only white space before and after the comment (up to line boundaries).
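
As a rough cross-check of what l_comment matches, the following plain C++ sketch mirrors the rule for a single line whose line terminator has already been removed. It is not spirit code, and is_line_comment and check_line_comment_examples are hypothetical names used only here.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Mirrors l_comment for one line (without its line terminator):
// optional blanks, a comment starter, then printable characters only.
bool is_line_comment(const std::string& line)
{
    std::string::size_type i = 0;
    while (i < line.size() && (line[i] == ' ' || line[i] == '\t'))
        ++i;                                    // blanks_p

    if (i < line.size() && (line[i] == '#' || line[i] == ';'))
        i += 1;                                 // ch_p('#') | ch_p(';')
    else if (line.compare(i, 2, "//") == 0)
        i += 2;                                 // str_p("//")
    else
        return false;

    for (; i < line.size(); ++i)                // * print_p
        if (!std::isprint(static_cast<unsigned char>(line[i])))
            return false;
    return true;
}

// Lines taken from the sample file.
void check_line_comment_examples()
{
    assert(is_line_comment("//Usage2=39"));
    assert(is_line_comment(";Usage3=39"));
    assert(is_line_comment("#Usage4=39"));
    assert(!is_line_comment("Usage=39"));
}
```

One consequence worth noticing in this sketch: a tab is a blank but not a printable character, so a tab inside the comment text makes the check fail, while tabs before the comment starter are fine.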

Here is the structure that holds the INI file in memory; it is filled in by the parser:

C++
struct Entry
{
    string name;
    string value;
};

struct Category
{
    string name;
    vector<Entry> entries;
};

typedef vector<Category> CategoryList;

class CIniFile
{
    CategoryList m_file;
    int m_crt_category;
    int m_crt_entry;
};

Now let's describe the category

C++
rule<> l_category =
                blanks_p >>
                ch_p('[') >>
                blanks_p >>
                ident [ addCategory ] >>
                blanks_p >>
                ch_p(']') >>
                blanks_p >>
                eol_p
;

Quite simple: we have an ident surrounded by square brackets, and we eat the blanks around it. However, when we encounter a category we need to take some action (save it someplace), so we attach a semantic action to ident - in our case addCategory, an instance of the AddCategory functor. The functor receives the start and end of the matched text and is defined as follows:

C++
struct AddCategory
{
    CategoryList * m_file;
    void operator() (char const* str, char const* end) const
    {
        string name(str, end);
        Category cat;
        cat.name = name;
        m_file->push_back(cat);
    }
};
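
To see what the functor does in isolation, the sketch below calls it by hand with the same kind of pointer pair the parser would pass. The struct definitions are repeated so that the example compiles on its own; demo_add_category is a hypothetical helper, and initialising the functor with braces is an assumption about how the instance could be bound to its target list.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Entry    { std::string name; std::string value; };
struct Category { std::string name; std::vector<Entry> entries; };
typedef std::vector<Category> CategoryList;

// Same shape as the functor above: the action receives the range [str, end).
struct AddCategory
{
    CategoryList* m_file;
    void operator()(char const* str, char const* end) const
    {
        Category cat;
        cat.name = std::string(str, end);
        m_file->push_back(cat);
    }
};

// Hypothetical helper: fire the action manually on the name inside "[Cat1]".
CategoryList demo_add_category()
{
    CategoryList file;
    AddCategory addCategory = { &file };  // aggregate initialisation (assumed wiring)

    char const* line = "[Cat1]";
    addCategory(line + 1, line + 5);      // the range covers "Cat1", not the brackets

    assert(file.size() == 1);
    assert(file[0].name == "Cat1");
    return file;
}
```

Because the action is attached to ident rather than to the whole rule, it only ever sees the name, never the brackets or the surrounding blanks.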

Describing the entry

C++
rule<> l_entry =
                blanks_p >>
                ident [ addName ] >>
                blanks_p >>
                ch_p('=') >>
                blanks_p >>
                value_p [ addValue ] >>
                blanks_p >>
                eol_p
;

As with the category rule, we reuse the previously defined parsers, and we need to provide two functors as semantic actions:

C++
struct AddName
{
    CategoryList * m_file;
    void operator() (char const* str, char const* end) const
    {
        string name(str, end);
        int last_cat_id = m_file->size() - 1;
        if (last_cat_id == -1)
        {
            Category cat;
            cat.name = "Globals";
            m_file->push_back(cat);
            last_cat_id = 0;
        }
        Entry entry;
        entry.name = name;
        m_file->at(last_cat_id).entries.push_back(entry);
    }
};

struct AddValue
{
    CategoryList * m_file;
    void operator() (char const* str, char const* end) const
    {
        string value(str, end);
        int last_cat_id = m_file->size() - 1;
        int last_entry_id = m_file->at(last_cat_id).entries.size() - 1;
        if (last_entry_id == -1)
        {
            Entry entry;
            entry.name = "";
            m_file->at(last_cat_id).entries.push_back(entry);
            last_entry_id = 0;
        }
        m_file->at(last_cat_id).entries[last_entry_id].value = value;
    }
};
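
The interplay of the two actions, including the "Globals" fallback for entries that appear before any category, can be tried out without the parser. The sketch below uses condensed variants of the functors (not the exact code above) and a hypothetical helper demo_entry_without_category; the condensed AddValue assumes AddName has already run for the same line, which is the order l_entry guarantees.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Entry    { std::string name; std::string value; };
struct Category { std::string name; std::vector<Entry> entries; };
typedef std::vector<Category> CategoryList;

// Condensed variant of AddName: creates "Globals" if no category exists yet.
struct AddName
{
    CategoryList* m_file;
    void operator()(char const* str, char const* end) const
    {
        if (m_file->empty())
        {
            Category cat;
            cat.name = "Globals";
            m_file->push_back(cat);
        }
        Entry entry;
        entry.name = std::string(str, end);
        m_file->back().entries.push_back(entry);
    }
};

// Condensed variant of AddValue: assumes AddName ran first for this line.
struct AddValue
{
    CategoryList* m_file;
    void operator()(char const* str, char const* end) const
    {
        m_file->back().entries.back().value = std::string(str, end);
    }
};

// Hypothetical helper: simulate the actions for the line "name=100"
// appearing before any [category] line.
CategoryList demo_entry_without_category()
{
    CategoryList file;
    AddName  addName  = { &file };
    AddValue addValue = { &file };

    char const* line = "name=100";
    addName(line, line + 4);       // "name"
    addValue(line + 5, line + 8);  // "100"

    assert(file.size() == 1 && file[0].name == "Globals");
    return file;
}
```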

Almost there - last steps

We are almost done; we just need to piece together all the rules and call the parser.

C++
rule<> lines = l_comment | b_comment | l_category | l_entry | l_empty;
rule<> ini_file =  lexeme_d [ * lines ] ;
int errcode = parse(buffer, ini_file).full;

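The lexeme_d directive turns off white space skipping. The full member of the value returned by parse is true only when the parser matched and consumed the entire buffer, so despite its name, errcode is non-zero on success and zero on failure.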
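
Every line rule above ends in eol_p, which matches a line break but not the end of the input, so a last line without a terminating line break would be left unparsed and full would come back false. One possible way around this, sketched here with a hypothetical helper with_trailing_newline, is to normalise the buffer before parsing; this is an assumption about how one might prepare the input, not part of the demo project.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: make sure a non-empty buffer ends with a line break,
// so that rules terminated by eol_p can also match the last line.
std::string with_trailing_newline(std::string text)
{
    if (!text.empty())
    {
        char last = text[text.size() - 1];
        if (last != '\n' && last != '\r')
            text += '\n';
    }
    return text;
}

void check_with_trailing_newline()
{
    assert(with_trailing_newline("Usage=39") == "Usage=39\n");
    assert(with_trailing_newline("Usage=39\n") == "Usage=39\n");
    assert(with_trailing_newline("").empty());
}
```

The result can then be handed to the parser through c_str(), as long as the string object outlives the parse call.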

CIniFile class

We have described the parser for the INI file, but we are still some way from having a library. The CIniFile class needs functions like Load, Save, GetEntry and so on.

Here is the complete interface for the class:

C++
class CIniFile
{
    CategoryList m_file;
    int m_crt_category;
    int m_crt_entry;
private:
    int b_CategoryIsValid();
    int b_EntryIsValid();

public:
    CIniFile();
    void ClearAll();

    int m_status;
    void LoadFile(string filename);
    void SaveToFile(string filename);

    int GetNumCategories();
    void SetCategory(int cat);
    int SetCategory(string name);
    string GetCategoryName();
    void AddCategoryUnique(string name);
    void AddCategory(string name);

    int GetNumEntries();
    void SetEntry(int entry);
    int SetEntry(string name);
    int SetEntryByValue(string value);

    string GetEntryName();
    string GetEntryValue();
    string GetEntryValueByName(string name);
    void AddEntryUnique(string name, string value);
    void AddEntry(string name, string value);
    void ChangeEntry(string name, string value);
    void SetValue(string value);

    void DeleteEntry(string name);
    void DeleteEntry(int entry);
    void DeleteByValue(string value);
};

Implementation of these functions is straightforward and is included in the demo source project.
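
To give an idea of how little code such functions need, here is a minimal sketch of a lookup by name and of turning the in-memory structure back into INI text. These are hypothetical free functions (find_value, to_ini_text, check_lookup_and_save) written for this illustration; they are not the member functions from the demo project, and their behaviour, such as returning an empty string for a missing entry, is an assumption.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct Entry    { std::string name; std::string value; };
struct Category { std::string name; std::vector<Entry> entries; };
typedef std::vector<Category> CategoryList;

// Hypothetical lookup: value of the first entry called `name`,
// or an empty string if the category has no such entry.
std::string find_value(const Category& cat, const std::string& name)
{
    for (std::vector<Entry>::size_type i = 0; i < cat.entries.size(); ++i)
        if (cat.entries[i].name == name)
            return cat.entries[i].value;
    return std::string();
}

// Hypothetical serializer: one "[name]" line per category, one
// "name=value" line per entry, and a blank line after each category.
std::string to_ini_text(const CategoryList& file)
{
    std::ostringstream out;
    for (CategoryList::size_type c = 0; c < file.size(); ++c)
    {
        out << '[' << file[c].name << "]\n";
        for (std::vector<Entry>::size_type e = 0; e < file[c].entries.size(); ++e)
            out << file[c].entries[e].name << '=' << file[c].entries[e].value << '\n';
        out << '\n';
    }
    return out.str();
}

void check_lookup_and_save()
{
    Entry entry;
    entry.name = "Usage";
    entry.value = "39";

    Category cat;
    cat.name = "Cat_3";
    cat.entries.push_back(entry);

    CategoryList file(1, cat);
    assert(find_value(file[0], "Usage") == "39");
    assert(find_value(file[0], "Offset").empty());
    assert(to_ini_text(file) == "[Cat_3]\nUsage=39\n\n");
}
```

Text produced this way ends every line with a line break, which matches what the grammar expects when the file is read back.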

Conclusions

As you can see, it was quite easy to build a custom INI file reader using the spirit library, and the resulting code is clean and easy to extend, change and maintain. It also requires nothing more than an installation of the boost library: no other external tools or complicated build dependencies.

Parsing data from a buffer is a common task, and spirit makes it quite easy. Although it has many advantages, spirit also has some drawbacks: compiler support is limited, compile times are quite long, and error messages are sometimes cryptic (although I have not yet used the debugging interface it provides). As time goes on and templates are implemented properly in more C++ compilers, spirit will be supported on more platforms, so if your compiler supports it, I'd definitely recommend you use it.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

