Parser in JavaScript






4.71/5 (9 votes)
Universal text parsing by given grammar
Moony Parser 1.0 - shift-reduce algorithm implemented Moony Parser 2.0 - implemented Earley parsing algorithm Moony Parser 2.1 - bug fixes Moony Parser 2.2 - implemented “@AnyCharacter” variable - implemented substraction operator “\” - building grammar with javascript functions is marked obsolete. Preferred method is passing grammar string to compile function - changed returned parsed tree internal structure for memory and performance improvements - fixed recursive function call stack overflow error for big texts. Only memory is a limit now - cool grammar tester environment included in download bundle
1. Introduction
Parsing texts is something that every programmer encounters sooner or later. Word “parsing” in programming world denotes extraction of “abstract syntax tree” from textual input (string or file). For textual input that has some internal structure, we can use parsers to arrange fragments of that textual input over resulting syntax tree. Once built from input text, syntax tree is giving us an easy access to each syntax fragment that builds input text.
Date ( Day {@Number}, "/", Month {@Number}, "/", Year {@Number} )
Date | +---+----+----+---+ | | | | | Day “/” Month “/” Year | | | “20” “10” “2013”
Although usual date parsing would be done with three simple substring functions, we used the date example for a sake of explanation simplicity. In the same way that date is parsed, we can parse any kind of text, assuming that we provide correct grammar. These grammars can include boolean comparison definition, math expression definition or even definitions for entire programming languages.
2. About Moony Parser
3. Moony Grammar
Grammars in Moony parser are built from blocks that denote sequences (represented as tuples), choices (options determined at parsing time), variables, constants and regular expressions.
Person ( Name {@Word}, @WhiteSpace, Surname {@Word} )
This grammar succeeds at parsing strings like “John Doe”, but not “John Doe Smith”.
Vehicle {"car" | "ship" | "plane"}
In parsing time this grammar will succeed weather parsed string says “car”, “ship” or “plane”.
@CR @LF @Blank @WhiteSpace @AnyCharacter @Number @Word @Variable @String @RegExp @Null
Person ( Name {@Word}, Next {(@WhiteSpace, @Person) | @Null} )
In this example, besides built in variables we have also used a variable @Person that refers recursively to a tuple Person. Grammar in previous example can parse a name followed by any number of middle names and surnames. Variables can also make use of dots, giving us a way to denote a path to the final variable, in which case a variable “@Person.Name” would refer to its respective content.
VariableDeclaration ( Type { "string" | 'int' | "decimal" }, @WhiteSpace, Name {@Variable}, EOL {';'} )
This grammar can parse strings like “string name;” or “int num1;”.
AnyCaseWords { /[a-z]+/i, Next {(@WhiteSpace, @AnyCaseWords) | "!"} }
4. Keyword “abstract”
AbstractTest { abstract Str {"parse me there, not here"} | Test1 ("now ", @AbstractTest.Str) | Test2 ("again ", @AbstractTest.Str) }
5. Left recursion
Sum { Fact { Exp ( {@WhiteSpace | @Null}, {'-' | @Null}, Value { Num {@Number} | Var {@Variable} | Bra (Left {'('}, In {@Sum}, Right {')'}) }, {@WhiteSpace | @Null} ) | MulDiv (Left {@Fact}, In {'*' | '/'}, Right {@Exp}) } | AddSub (Left {@Sum}, In {'+' | '-'}, Right {@Fact}) }
6. Substractions
Str ( '"', Value ( Char {@AnyCharacter} \ '"', Next {@Value | @Null} ), '"' )
7. Programming interface
Grammar is defined in string format and it should be firstly compiled before parsing any text. Once when compile succeeds, we can pass “compiled” property of a result to parsing function that actually parses text. After parsing a text against compiled grammar rules, a syntax tree is returned.
// this is synchronous call to compiler and parser var rules = parser.compileSync (grammarString); if (rules.errors.length === 0) { var tree = parser.parseSync (rules.compiled, stringToParse); if (tree.errors.length === 0) { // to do something with tree.parsed } else { alert ("Text error count: " + tree.errors.length); } } else { alert ("Grammar error count: " + rules.errors.length); }
// this is asynchronous call to parser parser.compileAsync (grammarString, function (rules) { if (rules.errors.length > 0) { alert ("Grammar error count: " + tree.errors.length); } else { parser.parseAsync (rules.compiled, text, function (tree) { if (tree.errors.length > 0) { alert ("Text error count: " + tree.errors.length); } else { // to do something with tree.parsed } }, function (progress) { progresPercent = Math.round(progress * 100) + "%"; return true; } ) } }, function (progress) { progressPercent = Math.round (progress * 100) + "%"; return true; } );
- grammar: string where a grammar is stored in textual format, - done function: this function is called when parsing grammar is done. It takes compiled grammar or errors as parameter - progress function: this function is repeatedly called while parsing executes. If it returns false, parsing breaks
- rules: top rule - text: text to parse by rules - done function: this function is called when parsing text is done. It takes syntax tree or errors as a parameter - progress function: this function is repeatedly called while parsing executes. If it returns false, parsing breaks
- if a node is a token: - “value” property holds a string as a fragment of parsed text - if a node is a choice: - “value” property holds a parsed sub-tree - if a node is a tuple: - “value” property contains an array of tuple attributes’ sub-trees - in all cases - “parsedBy” property holds a rule by which a node is parsed, where: - “name” property holds a name of the rule - “type” property holds one of strings: “token”, “tuple” or “choice” - properties “from” and “to” hold integer positions of parsed node inside parsed string
function pTree (tree) { var tmp; if (tree.parsedBy.type === "choice") { alert (tree.parsedBy.name) pTree (tree.value); } else if (tree.parsedBy.type === "tuple") { alert (tree.parsedBy.name) for (var i = 0; i < tree.value.length; i++) { pTree (tree.value[i]); } } else if (tree.parsedBy.type === "token") { alert (tree.value); } } pTree (tree.parsed);
9. Errors
tree.errors[x] = { absolutePosition: ...int..., line: ...int..., column: ...int..., description: ...string... }
10. Conclusion
Moony parser is an “Earley” parser which makes it context free grammar complete. It has no problems with nullable grammars, at least none I’ve noticed. I guess it will find uses in parsing texts like database search engine queries or parsing textual file formats through AJAX calls. Basically, for parsing large files only amount of memory is the limit. 2 gigs of ram enables parsing only few thousands lines of text and bigger files tend to use virtual memory which degrades performance a lot.
Good luck!