Schemeer, a Semi XML and JSON Handler

lxdfigo, 22 Jan 2013, CPOL
Schemeer is a library that can process XML or JSON documents; switching between formats only requires changing a config file.

Introduction

This article describes an application that can process any string-based tree structure, such as an XML or JSON document. The application needs a scheme file and a config file to define the structure of the target document.

Using the code

I abstract the input document into a tree structure made up of tree nodes. Each node contains a scheme phrase, and each phrase is made up of words. A word is stored as a string vector. Each character of the input document is assigned an input type before it is saved in a word. The input types are:

//
// The input object types
//
enum ObjectType{
    NORMAL_WORD,
    KEY_WORD,
    INVALID_WORD,
    SEPARATE_LETTER,
    SPECIAL_LETTER,
    IGNORE_LETTER,
    NORMAL_LETTER,
    INVALID_LETTER,
    SCHEME_PHRASE
};  

The types cover key words, invalid words, special letters, ignore letters, invalid letters, and so on. These letters and words are defined in the config file, which is read during the initialization stage of the application, and they are stored in a Dictionary class.

//The init function

bool Dictionary::init(std::string configFile){
 
    FILE *file = NULL;
    fopen_s(&file,configFile.c_str(),"r");
    if (file == NULL) return false;

    insertWords(file,KEY_WORD);
    insertLetters(file,SPECIAL_LETTER);
    insertLetters(file,SEPARATE_LETTER);
    insertLetters(file,IGNORE_LETTER);
    insertLetters(file,INVALID_LETTER);

    fclose(file);
    return true;
} 
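The article does not show how the loaded letter classes are used to group characters into words, so the following is only a minimal sketch of one plausible approach. The function `splitWords` and its behaviour (special letters become tokens of their own, separate letters only end the current word, ignore letters are dropped) are assumptions made for illustration, not Schemeer's actual code.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of Schemeer: splits text into words using
// three letter classes similar to the ones loaded by Dictionary::init.
std::vector<std::string> splitWords(const std::string& text,
                                    const std::string& special,
                                    const std::string& separators,
                                    const std::string& ignored) {
    std::vector<std::string> words;
    std::string current;
    for (char c : text) {
        // Assumption: ignore letters are dropped without ending a word.
        if (ignored.find(c) != std::string::npos) continue;
        const bool isSpecial = special.find(c) != std::string::npos;
        const bool isSeparator = separators.find(c) != std::string::npos;
        if (isSpecial || isSeparator) {
            // Both classes end the word collected so far.
            if (!current.empty()) {
                words.push_back(current);
                current.clear();
            }
            // Assumption: a special letter is kept as a one-character token.
            if (isSpecial) words.push_back(std::string(1, c));
        } else {
            current += c;  // normal letter: extend the current word
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}
```

Called with the JSON special letters `,}]{[":` and the separators space, tab, and newline, the text `{ "a": "b" }` would be split into the brace, quote, and colon tokens plus the two words `a` and `b`.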

Each line of the config file lists the entries for one type of input object. In the sample config files shown later in this article, the entries are separated by ';' and letters appear to be written as decimal character codes. After the input letters have been filtered and assembled into words, a file analysis class analyzes the input strings and organizes them into a tree model. The file analysis class uses the Scheme class to match the input document. The match equations are defined in a scheme file.

//Init Scheme class
void Scheme::init(std::string config){
    if (dictionary == NULL) return;
    FILE *file = NULL;
    fopen_s(&file,config.c_str(),"r");
    while(file != NULL && !feof(file)){
        std::string sentence;
        readSentence(file,sentence);
        if (sentence.empty() == true) continue;
        SchemePhrase equation;
        unsigned int offset = 0;
        analyzeSentence(sentence,offset,&equation);
        equations.push_back(equation);
    }
    if (file != NULL) fclose(file);
}

// Analyze the input objects into tree elements
void FileAnalysis::analyze(){
    std::vector<InputObject*> elements;
    while(true){
        InputObject* object = fileReaders->getNextObject();
        if (object == NULL) break;
        elements.push_back(object);
    }
    std::vector<Memo> &process = scheme->match(elements);
    TreeNode *curNode = tree->getRoot();
    for(unsigned int p = 0; p < process.size(); ++p){
        Memo &m = process[p];
        if (m.operation->getType() == SCHEME_PHRASE){
            SchemePhrase * phrase = dynamic_cast<SchemePhrase*>(m.operation);
            if (phrase->getOperation() == UPCHILD){
                curNode = curNode->getFather();
            }else if (phrase->getOperation() == DOWNCHILD){
                curNode = curNode->getLastChild();
            }
        }else if (m.operation->getType() == NORMAL_WORD){
            curNode->makeChild(dynamic_cast<Word*>(m.operation)->getContent(),
                dynamic_cast<Word*>(m.object)->getContent());
        }
    }
}
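The loop in `FileAnalysis::analyze` can be read as replaying a small program of three operations: create a child under the current node, step down into the last child, and step up to the parent. The sketch below isolates that idea with simplified stand-in types. `SketchNode`, `Step`, `Instruction`, and `replay` are hypothetical names introduced here and do not appear in Schemeer; the `Memo` and `SchemePhrase` classes of the real code are not reproduced.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for TreeNode. Children are heap-allocated so that
// parent pointers stay valid while the tree grows.
struct SketchNode {
    std::string name;
    std::string value;
    SketchNode* parent = nullptr;
    std::vector<std::unique_ptr<SketchNode>> children;
};

// Stand-ins for a NORMAL_WORD match and the DOWNCHILD / UPCHILD phrases.
enum class Step { MakeChild, Down, Up };

struct Instruction {
    Step step;
    std::string name;   // used only by MakeChild
    std::string value;  // used only by MakeChild
};

// Replays the instructions against the tree, tracking a current node.
void replay(SketchNode& root, const std::vector<Instruction>& program) {
    SketchNode* cur = &root;
    for (const Instruction& ins : program) {
        if (ins.step == Step::MakeChild) {
            std::unique_ptr<SketchNode> child(new SketchNode());
            child->name = ins.name;
            child->value = ins.value;
            child->parent = cur;
            cur->children.push_back(std::move(child));
        } else if (ins.step == Step::Down) {
            // Descend into the most recently created child, if there is one.
            if (!cur->children.empty()) cur = cur->children.back().get();
        } else if (cur->parent != nullptr) {
            // Step::Up: return to the parent; the root stays where it is.
            cur = cur->parent;
        }
    }
}
```

Unlike the original loop, this sketch guards against stepping up from the root or down into a node without children; whether the real `getFather` and `getLastChild` need such guards depends on code that the article does not show.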

The tree node follows the Composite design pattern. Each node has a name and a value, keeps a pointer to its father, and holds its children, which are nodes of the same type.

class BaseNode
{
public:
    BaseNode(void);
    ~BaseNode(void);


    void setName(std::string n){ name = n; }
    void setValue(std::string v){ value = v;}

    std::string getName(){return name;}
    std::string getValue(){return value;}
private:
    std::string name;
    std::string value;
};
class TreeNode :
    public BaseNode
{
public:
    TreeNode(TreeNode *father = NULL,std::string name="", std::string value = "");
    virtual ~TreeNode(void);

    TreeNode* getNode(std::string name, std::string value = "");

    void addNode(TreeNode node){elements.push_back(node);}
    TreeNode* makeChild(std::string name, std::string value = "");
    TreeNode* getFather();
    TreeNode* getLastChild();

    TreeNode* getElement(int i){return &elements[i];}
    unsigned int size(){return elements.size();}

    std::vector<TreeNode*> getNodes(std::string name, std::string value = "");
    std::string get(std::string name);
    std::vector<std::string> gets(std::string name);

    void clear();
private:
    TreeNode *father;
    std::vector<TreeNode> elements;
}; 

After the input document has been processed, the nodes of the tree can be retrieved by name and, optionally, by value.

std::vector<TreeNode*> TreeNode::getNodes(std::string name, std::string value){
    std::vector<TreeNode *> result;
    for(unsigned int i = 0; i < elements.size(); ++i){
        if (elements[i].getName() == name && (value == "" || elements[i].getValue() == value)){
            result.push_back(&(elements[i]));
        }
    }
    return result;
} 
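To make the filter semantics concrete, here is a small self-contained sketch that mirrors `getNodes` on a stripped-down node type. `MiniNode` is a hypothetical stand-in, not the article's `TreeNode`. As in the original, only direct children are searched and an empty value matches any value.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for TreeNode with the same lookup rule.
struct MiniNode {
    std::string name;
    std::string value;
    std::vector<MiniNode> children;  // stored by value, like TreeNode::elements

    // Returns pointers to the direct children whose name matches and whose
    // value matches as well, unless the requested value is empty.
    std::vector<MiniNode*> getNodes(const std::string& wantedName,
                                    const std::string& wantedValue = "") {
        std::vector<MiniNode*> result;
        for (MiniNode& child : children) {
            if (child.name == wantedName &&
                (wantedValue.empty() || child.value == wantedValue)) {
                result.push_back(&child);
            }
        }
        return result;
    }
};
```

Because the children are stored by value in a `std::vector`, the returned pointers are only safe to use until that vector is modified again; adding another child may reallocate the storage and invalidate them.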

Examples

To load the config file and the scheme file and analyze the input document:

ModelTree tree;
Dictionary dic;
Scheme sch(&dic);
FileAnalysis analysis;
TreeNode * root;

analysis.setDictionary(&dic);
analysis.setScheme(&sch);
analysis.setTree(&tree);

dic.init("config\\config.txt");
sch.init("config\\scheme.txt");
analysis.setFile("input.txt");
analysis.analyze(); 

To print the resulting tree:

root = tree.getRoot();
printNode(root,0); 

Next, process a JSON document. The input config:

meta;
44;125;93;123;91;34;58;
32;9;10;
0;1;2;3;4;5;6;7;8;11;12;13;14;15;16;17;18;19;20;21;22;23;24;25;26;27;28;29;30;31;127;
;
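The numeric lines in this config are easier to read once the codes are turned back into characters. Judging by the order of the calls in `Dictionary::init`, the lines appear to hold the key words, special letters, separate letters, ignore letters, and invalid letters, in that order. The helper below is a hypothetical sketch (`decodeCodes` is not part of Schemeer) that assumes each entry is a decimal character code followed by ';' and that the line has no trailing newline.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: converts a line such as "44;125;93;" into the
// characters those decimal codes stand for. Empty entries are skipped.
// Assumes every non-empty entry is a valid decimal number.
std::string decodeCodes(const std::string& line) {
    std::string letters;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, ';')) {
        if (field.empty()) continue;
        letters += static_cast<char>(std::stoi(field));
    }
    return letters;
}
```

With this helper, the second line of the JSON config, `44;125;93;123;91;34;58;`, decodes to the seven characters `,}]{[":`, and the third line, `32;9;10;`, decodes to space, tab, and newline.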

The input scheme:

;{"element":\d ["value"|\[ self [, self]* \]]1 \u [,"element":\d ["value"|\[ self [, self]* \]]1 \u]* }

The input file:

{ "programmers": [
{ "firstName": "Brett", "lastName":"McLaughlin", "email": "be@gmail.com" },
{ "firstName": "Jason", "lastName":"Hunter", "email": "abc@gmail.com" },
{ "firstName": "Elliotte", "lastName":"Harold", "email": "ell@163.com" }
],
"authors": [
{ "firstName": "Isaac", "lastName": "Asimov", "genre": "sciencefiction" },
{ "firstName": "Tad", "lastName": "Williams", "genre": "fantasy" },
{ "firstName": "Frank", "lastName": "Peretti", "genre": "christianfiction" }
],
"musicians": [
{ "firstName": "Eric", "lastName": "Clapton", "instrument": "guitar" },
{ "firstName": "Sergei", "lastName": "Rachmaninoff", "instrument": "piano" }
] } 

The output:

Next, process an XML document. The input config:

meta;
60;61;62;63;92;34;47;
32;9;10;
0;1;2;3;4;5;6;7;8;11;12;13;14;15;16;17;18;19;20;21;22;23;24;25;26;27;28;29;30;31;127;
;

The input scheme:

;<element\d [attribute[="\dvalue\u"]1]* >[value|self]*\u</element>
;<element\d [attribute[="\dvalue\u"]1]* \u/>

The input file:

<book title="facebook" >
<auther><lxdfigo name="lxd" >
<age value="16" />
</lxdfigo></auther>
<price value="216.0" />
<date value="2012-2-3" />
</book>
<book2 title="facebook" >
<auther><lxdfigo name="lxd" >
<age value="16" />
</lxdfigo></auther>
<price value="216.0" />
<date value="2012-2-3" />
</book2>
<book title="facebook" >
<auther><lxdfigo name="lxd" >
<age value="16" />
</lxdfigo></auther>
<price value="216.0" />
<date value="2012-2-3" />
</book>
<book2 title="facebook" >
<auther><lxdfigo name="lxd" >
<age value="16" />
</lxdfigo></auther>
<price value="216.0" />
<date value="2012-2-3" />
</book2> 

The output:

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).

About the Author

lxdfigo
Student, Institute of Software, Chinese Academy of Sciences
China

Last Updated 22 Jan 2013
Article Copyright 2012 by lxdfigo