Tokenizer and analyzer package supporting precedence prioritized rules

Alexander Berthold

1 Jan 2002

A library that lets you conveniently build a custom tokenizer and analyzer supporting precedence-prioritized rules

pkgcomplete.zip
- COMMON.CPP
- COMMON.H
- common_error.h
- common_win32.h
- COPYING
- cpAbsd.dsw
- cpp-grammar.txt
- cxAnalyzer
  - ANALYZER.H
  - cxAnalyzer.dsp
  - cxAnalyzerException.h
  - cxAnalyzerExpression.cpp
  - cxAnalyzerExpression.h
  - cxAnalyzerMain.cpp
  - cxAnalyzerMain.h
  - cxAnalyzerTree.cpp
  - cxAnalyzerTree.h
  - cxAnalyzerTypeMap.cpp
  - cxAnalyzerTypeMap.h
  - cxaParseTree.cpp
  - cxaParseTree.h
  - cxaRuleCache.cpp
  - cxaRuleCache.h
  - cxaRuleCacheElement.cpp
  - cxaRuleCacheElement.h
  - cxaToken.cpp
  - cxaToken.h
  - cxaToken.inl
  - cxaTokenStream.cpp
  - Readme.txt
  - StdAfx.cpp
  - StdAfx.h
- cxTokenizer
- cxtPackage
  - cxtPackage.cpp
  - cxtPackage.dsp
  - cxtPackage.h
  - PACKAGE.H
  - Readme.txt
  - StdAfx.cpp
  - StdAfx.h
  - todo.txt
- DEBUG.CPP
- DEBUG.H
- emptyTestApp
  - emptyTestApp.clw
  - emptyTestApp.cpp
  - emptyTestApp.dsp
  - emptyTestApp.h
  - emptyTestApp.rc
  - emptyTestAppDlg.cpp
  - emptyTestAppDlg.h
  - ReadMe.txt
  - res
    - emptyTestApp.ico
    - emptyTestApp.rc2
  - Resource.h
  - StdAfx.cpp
  - StdAfx.h
- grammarIDE
  - DlgEvaluate.cpp
  - DlgEvaluate.h
  - DlgProperties.cpp
  - DlgProperties.h
  - grammarIDE.clw
  - grammarIDE.cpp
  - grammarIDE.dsp
  - grammarIDE.h
  - grammarIDE.rc
  - grammarIDEDoc.cpp
  - grammarIDEDoc.h
  - grammarIDEView.cpp
  - grammarIDEView.h
  - LeftView.cpp
  - LeftView.h
  - ltrItemTreeWnd.cpp
  - ltrItemTreeWnd.h
  - MainFrm.cpp
  - MainFrm.h
  - PaneEditorWnd.cpp
  - PaneEditorWnd.h
  - propsItemGrammar.cpp
  - propsItemGrammar.h
  - ReadMe.txt
  - res
    - grammarIDE.ico
    - grammarIDE.rc2
    - grammarIDEDoc.ico
    - icon1.ico
    - Toolbar.bmp
    - vssver.scc
    - zoomable.ico
  - Resource.h
  - StdAfx.cpp
  - StdAfx.h
- readme.txt
- sample-grammar.txt
- simpleCalc
  - ReadMe.txt
  - simpleCalc.bmp
  - simpleCalc.cpp
  - simpleCalc.dsp
  - StdAfx.cpp
  - StdAfx.h
- tkCommon
  - ctkCheckValid.h
  - ctkEnumerator.h
  - ctkExternalObjectPointer.h
  - ctkFlagsMixin.h
  - ctkHLinkedList.cpp
  - ctkHLinkedList.h
  - ctkMisc.h
  - ctkSerializable.h
  - tkCommon.h
  - vssver.scc
- vcstl_nowarnings.h
pkgsrconly.zip
- COMMON.CPP
- COMMON.H
- common_error.h
- common_win32.h
- COPYING
- cpAbsd.dsw
- cpp-grammar.txt
- cxAnalyzer
  - ANALYZER.H
  - cxAnalyzer.dsp
  - cxAnalyzerException.h
  - cxAnalyzerExpression.cpp
  - cxAnalyzerExpression.h
  - cxAnalyzerMain.cpp
  - cxAnalyzerMain.h
  - cxAnalyzerTree.cpp
  - cxAnalyzerTree.h
  - cxAnalyzerTypeMap.cpp
  - cxAnalyzerTypeMap.h
  - cxaParseTree.cpp
  - cxaParseTree.h
  - cxaRuleCache.cpp
  - cxaRuleCache.h
  - cxaRuleCacheElement.cpp
  - cxaRuleCacheElement.h
  - cxaToken.cpp
  - cxaToken.h
  - cxaToken.inl
  - cxaTokenStream.cpp
  - Readme.txt
  - StdAfx.cpp
  - StdAfx.h
- cxTokenizer
  - cxTokenizer.cpp
  - cxTokenizer.dsp
  - cxTokenizer.h
  - cxTokenizerCharTokenRule.cpp
  - cxTokenizerCharTokenRule.h
  - cxTokenizerCommentTokenRule.cpp
  - cxTokenizerCommentTokenRule.h
  - cxTokenizerContext.cpp
  - cxTokenizerContext.h
  - cxTokenizerContextCookie.h
  - cxTokenizerContextDiags.cpp
  - cxTokenizerDiags.cpp
  - cxTokenizerException.cpp
  - cxTokenizerException.h
  - cxTokenizerInputStream.h
  - cxTokenizerMap.cpp
  - cxTokenizerMap.h
  - cxTokenizerMapData.cpp
  - cxTokenizerMapData.h
  - cxTokenizerMapDataDiags.cpp
  - cxTokenizerMapDiags.cpp
  - cxTokenizerMatchTokenRule.h
  - cxTokenizerMatchTokenRule.inl
  - cxTokenizerNumberTokenRule.cpp
  - cxTokenizerNumberTokenRule.h
  - cxTokenizerSTLInputStream.cpp
  - cxTokenizerSTLInputStream.h
  - cxTokenizerStringTokenRule.cpp
  - cxTokenizerStringTokenRule.h
  - cxTokenizerTextInputStream.cpp
  - cxTokenizerTextInputStream.h
  - cxTokenizerTokenRule.cpp
  - cxTokenizerTokenRule.h
  - Readme.txt
  - StdAfx.cpp
  - StdAfx.h
  - tokenizer.h
  - tokenizer_error.h
  - tokenizer_flags.h
  - tokenizerBase.h
- cxtPackage
  - cxtPackage.cpp
  - cxtPackage.dsp
  - cxtPackage.h
  - PACKAGE.H
  - Readme.txt
  - StdAfx.cpp
  - StdAfx.h
  - todo.txt
- DEBUG.CPP
- DEBUG.H
- emptyTestApp
  - emptyTestApp.clw
  - emptyTestApp.cpp
  - emptyTestApp.dsp
  - emptyTestApp.h
  - emptyTestApp.rc
  - emptyTestAppDlg.cpp
  - emptyTestAppDlg.h
  - ReadMe.txt
  - res
    - emptyTestApp.ico
    - emptyTestApp.rc2
  - Resource.h
  - StdAfx.cpp
  - StdAfx.h
- readme.txt
- sample-grammar.txt
- tkCommon
  - ctkCheckValid.h
  - ctkEnumerator.h
  - ctkExternalObjectPointer.h
  - ctkFlagsMixin.h
  - ctkHLinkedList.cpp
  - ctkHLinkedList.h
  - ctkMisc.h
  - ctkSerializable.h
  - tkCommon.h
  - vssver.scc
- vcstl_nowarnings.h
grammaride.zip
- cpp-grammar.txt
- grammarIDE.exe
- readme.txt
- sample-grammar.txt
- stlport_vc645.dll
cxtpackagetut_win32vc.zip
- common.cpp
- common.h
- common_error.h
- common_win32.h
- COPYING
- cpAbsd.dsw
- analyzer.h
- cxAnalyzer.dsp
- cxAnalyzer.plg
- cxAnalyzerException.h
- cxAnalyzerExpression.cpp
- cxAnalyzerExpression.h
- cxAnalyzerMain.cpp
- cxAnalyzerMain.h
- cxAnalyzerTree.cpp
- cxAnalyzerTree.h
- cxAnalyzerTypeMap.cpp
- cxAnalyzerTypeMap.h
- cxaParseTree.cpp
- cxaParseTree.h
- cxaToken.cpp
- cxaToken.h
- cxaTokenStream.cpp
- Readme.txt
- StdAfx.cpp
- StdAfx.h
- cxTokenizer.cpp
- cxTokenizer.dsp
- cxTokenizer.h
- cxTokenizer.plg
- cxTokenizerCharTokenRule.cpp
- cxTokenizerCharTokenRule.h
- cxTokenizerContext.cpp
- cxTokenizerContext.h
- cxTokenizerContextCookie.h
- cxTokenizerContextDiags.cpp
- cxTokenizerDiags.cpp
- cxTokenizerException.cpp
- cxTokenizerException.h
- cxTokenizerInputStream.h
- cxTokenizerMap.cpp
- cxTokenizerMap.h
- cxTokenizerMapData.cpp
- cxTokenizerMapData.h
- cxTokenizerMapDataDiags.cpp
- cxTokenizerMapDiags.cpp
- cxTokenizerNumberTokenRule.cpp
- cxTokenizerNumberTokenRule.h
- cxTokenizerSTLInputStream.cpp
- cxTokenizerSTLInputStream.h
- cxTokenizerStringTokenRule.cpp
- cxTokenizerStringTokenRule.h
- cxTokenizerTextInputStream.cpp
- cxTokenizerTextInputStream.h
- cxTokenizerTokenRule.cpp
- cxTokenizerTokenRule.h
- Readme.txt
- StdAfx.cpp
- StdAfx.h
- tokenizer.h
- tokenizer_error.h
- tokenizer_flags.h
- tokenizerBase.h
- cxtPackage.cpp
- cxtPackage.dsp
- cxtPackage.h
- cxtPackage.plg
- package.h
- Readme.txt
- StdAfx.cpp
- StdAfx.h
- mathTok
  - mathTok.cpp
  - mathTok.dsp
  - mathTok.plg
  - ReadMe.txt
  - StdAfx.cpp
  - StdAfx.h
  - ReadMe.txt
  - simpleCalc.bmp
  - simpleCalc.cpp
  - simpleCalc.dsp
  - simpleCalc.plg
  - StdAfx.cpp
  - StdAfx.h
  - ctkCheckValid.h
  - ctkExternalObjectPointer.h
  - ctkFlagsMixin.h
  - ctkHLinkedList.cpp
  - ctkHLinkedList.h
  - ctkMisc.h
  - ctkSerializable.h
  - tkCommon.h
cxtpackage_win32vc.zip
- common.cpp
- common.h
- common_error.h
- common_win32.h
- COPYING
- cpAbsd.dsw
- analyzer.h
- cxAnalyzer.dsp
- cxAnalyzer.plg
- cxAnalyzerException.h
- cxAnalyzerExpression.cpp
- cxAnalyzerExpression.h
- cxAnalyzerMain.cpp
- cxAnalyzerMain.h
- cxAnalyzerTree.cpp
- cxAnalyzerTree.h
- cxAnalyzerTypeMap.cpp
- cxAnalyzerTypeMap.h
- cxaParseTree.cpp
- cxaParseTree.h
- cxaToken.cpp
- cxaToken.h
- cxaTokenStream.cpp
- Debug
  - Readme.txt
  - StdAfx.cpp
  - StdAfx.h
  - cxTokenizer.cpp
  - cxTokenizer.dsp
  - cxTokenizer.h
  - cxTokenizer.plg
  - cxTokenizerCharTokenRule.cpp
  - cxTokenizerCharTokenRule.h
  - cxTokenizerContext.cpp
  - cxTokenizerContext.h
  - cxTokenizerContextCookie.h
  - cxTokenizerContextDiags.cpp
  - cxTokenizerDiags.cpp
  - cxTokenizerException.cpp
  - cxTokenizerException.h
  - cxTokenizerInputStream.h
  - cxTokenizerMap.cpp
  - cxTokenizerMap.h
  - cxTokenizerMapData.cpp
  - cxTokenizerMapData.h
  - cxTokenizerMapDataDiags.cpp
  - cxTokenizerMapDiags.cpp
  - cxTokenizerNumberTokenRule.cpp
  - cxTokenizerNumberTokenRule.h
  - cxTokenizerSTLInputStream.cpp
  - cxTokenizerSTLInputStream.h
  - cxTokenizerStringTokenRule.cpp
  - cxTokenizerStringTokenRule.h
  - cxTokenizerTextInputStream.cpp
  - cxTokenizerTextInputStream.h
  - cxTokenizerTokenRule.cpp
  - cxTokenizerTokenRule.h
  - Readme.txt
  - StdAfx.cpp
  - StdAfx.h
  - tokenizer.h
  - tokenizer_error.h
  - tokenizer_flags.h
  - tokenizerBase.h
  - cxtPackage.cpp
  - cxtPackage.dsp
  - cxtPackage.h
  - cxtPackage.plg
  - package.h
  - Readme.txt
  - StdAfx.cpp
  - StdAfx.h
  - ctkCheckValid.h
  - ctkExternalObjectPointer.h
  - ctkFlagsMixin.h
  - ctkHLinkedList.cpp
  - ctkHLinkedList.h
  - ctkMisc.h
  - ctkSerializable.h
  - tkCommon.h

grammarIDE ReadMe
-----------------

This product is far from finished, so please be forgiving if you find bugs
(this refers to the IDE, not to the tokenizer/analyzer library).

To get results quickly, extract the contents of this .ZIP archive somewhere on your
hard disk, say c:\grammarIDE. Then start the IDE, select File/Open and open
"sample-grammar.txt" from the directory you extracted the files to.

The left pane should now show the tree structure of the grammar, and the editor
pane its source code.

Now select Parse/Evaluate Expression, enter an expression such as 1*2+3-4*(5/6)*8-9 and press "Evaluate".
You should get a graphical display of the parsed expression.

Now press "Rebalance": the parse tree is instantly reorganized according to the
precedence priorities of the grammar.
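The effect of rebalancing can be pictured with a small stand-alone sketch. It is not the library's code: it is a hypothetical precedence-climbing evaluator in which every binary operator carries a priority number, following the convention of the grammar reference below (a smaller number binds more tightly). The priority table, the class name and its methods are assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical priority table. As in the grammar file, 0 is the highest
// precedence and larger numbers bind less tightly.
static int priorityOf(char op) {
    switch (op) {
        case '*': case '/': return 10;
        case '+': case '-': return 20;
        default:            return -1;  // not a binary operator
    }
}

class PrecedenceEvaluator {
public:
    explicit PrecedenceEvaluator(const std::string& text) : s_(text) {}

    double evaluate() {
        double value = parseExpression(32767);  // accept every priority
        skipSpaces();
        if (pos_ != s_.size()) throw std::runtime_error("trailing input");
        return value;
    }

private:
    const std::string s_;
    std::size_t pos_ = 0;

    void skipSpaces() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }

    // Operand: a number or a parenthesized sub-expression.
    double parseOperand() {
        skipSpaces();
        if (pos_ < s_.size() && s_[pos_] == '(') {
            ++pos_;
            double inner = parseExpression(32767);
            skipSpaces();
            if (pos_ >= s_.size() || s_[pos_] != ')') throw std::runtime_error("expected ')'");
            ++pos_;
            return inner;
        }
        std::size_t used = 0;
        double number = std::stod(s_.substr(pos_), &used);  // throws if no number
        pos_ += used;
        return number;
    }

    // Keeps combining operands while the next operator's priority number is
    // <= limit, so operators with smaller numbers end up deeper in the tree.
    double parseExpression(int limit) {
        double lhs = parseOperand();
        for (;;) {
            skipSpaces();
            if (pos_ >= s_.size()) return lhs;
            char op = s_[pos_];
            int prio = priorityOf(op);
            if (prio < 0 || prio > limit) return lhs;
            ++pos_;
            // Left associativity: the right operand may only contain
            // operators that bind strictly tighter (smaller number).
            double rhs = parseExpression(prio - 1);
            switch (op) {
                case '*': lhs *= rhs; break;
                case '/': lhs /= rhs; break;
                case '+': lhs += rhs; break;
                case '-': lhs -= rhs; break;
            }
        }
    }
};
```

With this table, the sample expression 1*2+3-4*(5/6)*8-9 evaluates to about -30.67, since 5/6 is computed in floating point here.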


C++ grammar
---------------
As an example of a fairly complex grammar, you can open the file "cpp-grammar.txt" instead.
This is a pre-beta version of the grammar of a C++ compiler, slightly modified to run without the context
of a compiler (for example, variable names are treated simply as literals).
To test it, again select Parse/Evaluate Expression, choose the rule ".globalscopeblock" in the "Select rule"
combo box and enter a C++ program without templates or preprocessor statements, for example:

struct x { int **(const *a[3])[4]; };
class t : public b {
private:
 x s;
};
void test()
{
  int *b;
  (*x->a[1])[0]=&b;
}


---
Short grammar reference:
[tokens]	- defines tokens
[seperators]	- defines separators
[rules]		- selects predefined tokenizer rules ("numbers", for example)
[grammar]	- defines the analyzer grammar

----
A typical line in the [tokens], [seperators] or [rules] sections looks like this:
xxx:zzz

Where 'xxx' is the ID assigned to the item, and 'zzz' describes the item itself.
For both [tokens] and [seperators], 'zzz' is the plain text of the token in question
and may include escape characters, using a subset of the C escape notation.
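For illustration, here is a hypothetical C++ sketch of how one such line could be split and its escapes decoded. The struct and function names are invented; splitting at the first colon is an assumption, and so is the escape subset, because this readme does not list which C escapes are actually supported.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical representation of one line from [tokens] or [seperators].
struct ItemLine {
    std::string id;    // 'xxx': the ID assigned to the item
    std::string text;  // 'zzz' after escape processing
};

// Assumed escape subset: \n \t \r \0 \\ \' \". The subset the library
// really accepts is not documented here.
static std::string unescape(const std::string& raw) {
    std::string out;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] != '\\') { out += raw[i]; continue; }
        if (++i == raw.size()) throw std::runtime_error("dangling backslash");
        switch (raw[i]) {
            case 'n':  out += '\n'; break;
            case 't':  out += '\t'; break;
            case 'r':  out += '\r'; break;
            case '0':  out += '\0'; break;
            case '\\': out += '\\'; break;
            case '\'': out += '\''; break;
            case '"':  out += '"';  break;
            default:   throw std::runtime_error("unsupported escape");
        }
    }
    return out;
}

// Splits at the FIRST colon, so the token text itself may contain ':'.
static ItemLine parseItemLine(const std::string& line) {
    const std::size_t colon = line.find(':');
    if (colon == std::string::npos || colon == 0)
        throw std::runtime_error("expected xxx:zzz");
    return ItemLine{line.substr(0, colon), unescape(line.substr(colon + 1))};
}
```

Under these assumptions, the line scope::: yields the ID scope and the token text ::, so a token consisting of colons needs no escaping.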

In the [rules] section things are different:
a 'zzz' of 'numbers' means that the number token recognition rule is included in the parser.
For more information on this topic, see http://www.subground.cc/devel.
CAUTION: The token produced by the rule 'numbers' is named 'number' (see below).
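As an illustration of what a number token rule might do, here is a hypothetical C++ sketch of a longest-match scanner. It assumes unsigned decimal numbers with an optional fractional part; the readme does not say which number formats the library's 'numbers' rule accepts (signs, exponents, hexadecimal), and the Token struct and both function names are invented. The only detail taken from the text is the naming: the rule is listed as 'numbers', while the token it yields is called 'number'.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical token type produced by the sketch.
struct Token {
    std::string name;  // always "number" for this rule
    std::string text;  // the matched characters
};

// Returns how many characters starting at 'pos' form a number (0 if none).
// Assumed syntax: one or more digits, optionally '.' followed by digits.
static std::size_t matchNumber(const std::string& s, std::size_t pos) {
    assert(pos <= s.size());
    auto isDigit = [&](std::size_t i) {
        return i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]));
    };
    std::size_t i = pos;
    while (isDigit(i)) ++i;
    if (i == pos) return 0;                       // must start with a digit
    if (i < s.size() && s[i] == '.' && isDigit(i + 1)) {
        ++i;                                      // consume '.'
        while (isDigit(i)) ++i;
    }
    return i - pos;
}

// On success, fills 'out', advances 'pos' past the number and returns true.
static bool tryNumberToken(const std::string& s, std::size_t& pos, Token& out) {
    const std::size_t len = matchNumber(s, pos);
    if (len == 0) return false;
    out = Token{"number", s.substr(pos, len)};
    pos += len;
    return true;
}
```

Scanning "3.14+x" from position 0 matches four characters, while "42." matches only "42" and leaves the '.' for another rule.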

----
A typical line in the [grammar] section looks like this:

xxx:{.rulename}=yyy:{item}[{item},...]

Where 'xxx' is again the ID, and 'rulename' (enclosed in '{}' and prefixed with a '.') is
the name of the rule being declared. 'yyy' is the precedence priority, which must be in the range
0 (maximum precedence) to 32767 (minimum precedence).

There are four classes of item:
{$hello} -> refers to the token 'hello'. CAUTION: it must be defined in the [tokens] or [seperators] section!
{!number} -> refers to the token produced by the rule 'numbers'. CAUTION: the [rules] entry is 'numbers', but here it is written 'number'!
{.rule} -> refers to the grammar rule 'rule'
{#literal#} -> refers to an undefined literal
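The line format and the four item classes can be made concrete with a small parser. This is a hypothetical C++ sketch, not the library's parser: the type and function names are invented, and because the readme does not say how consecutive items are separated, the sketch collects every {...} group after the priority and ignores anything between groups. It also assumes a literal item ends at the first "#}" (so it may contain '}'), and it only extracts the text between the '#' markers without interpreting it.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory form of one [grammar] line.
enum class ItemKind { Token, TokenizerRule, GrammarRule, Literal };

struct Item {
    ItemKind kind;
    std::string name;  // item text without its prefix/suffix markers
};

struct GrammarLine {
    std::string id;        // 'xxx'
    std::string ruleName;  // 'rulename' without "{." and "}"
    int priority = 0;      // 'yyy': 0 (max precedence) .. 32767 (min)
    std::vector<Item> items;
};

// Maps the text between '{' and '}' to one of the four item classes.
static Item classify(const std::string& body) {
    switch (body.empty() ? '\0' : body[0]) {
        case '$': return {ItemKind::Token, body.substr(1)};          // {$hello}
        case '!': return {ItemKind::TokenizerRule, body.substr(1)};  // {!number}
        case '.': return {ItemKind::GrammarRule, body.substr(1)};    // {.rule}
        case '#':                                                    // {#literal#}
            if (body.size() >= 2 && body.back() == '#')
                return {ItemKind::Literal, body.substr(1, body.size() - 2)};
            break;
        default:
            break;
    }
    throw std::runtime_error("unknown item: {" + body + "}");
}

// Collects every {...} group; a literal {#...#} is closed by "#}".
static std::vector<Item> parseItems(const std::string& text) {
    std::vector<Item> items;
    std::size_t pos = 0;
    while ((pos = text.find('{', pos)) != std::string::npos) {
        const bool literal = pos + 1 < text.size() && text[pos + 1] == '#';
        const std::size_t close = literal ? text.find("#}", pos + 2) : text.find('}', pos + 1);
        if (close == std::string::npos) throw std::runtime_error("unterminated item");
        const std::size_t end = literal ? close + 1 : close;  // index of the '}'
        assert(end < text.size());
        items.push_back(classify(text.substr(pos + 1, end - pos - 1)));
        pos = end + 1;
    }
    return items;
}

// Parses xxx:{.rulename}=yyy:{item}... and checks the priority range.
static GrammarLine parseGrammarLine(const std::string& line) {
    GrammarLine g;
    const std::size_t colon1 = line.find(':');
    if (colon1 == std::string::npos) throw std::runtime_error("missing ID");
    const std::size_t eq = line.find('=', colon1);
    if (eq == std::string::npos) throw std::runtime_error("missing '='");
    const std::size_t colon2 = line.find(':', eq);
    if (colon2 == std::string::npos) throw std::runtime_error("missing priority");
    g.id = line.substr(0, colon1);

    const std::string head = line.substr(colon1 + 1, eq - colon1 - 1);  // "{.rulename}"
    if (head.size() < 4 || head.front() != '{' || head[1] != '.' || head.back() != '}')
        throw std::runtime_error("rule name must look like {.name}");
    g.ruleName = head.substr(2, head.size() - 3);

    const std::string prio = line.substr(eq + 1, colon2 - eq - 1);
    std::size_t used = 0;
    const long value = std::stol(prio, &used);  // throws if not a number
    if (used != prio.size() || value < 0 || value > 32767)
        throw std::out_of_range("priority must be in 0..32767");
    g.priority = static_cast<int>(value);

    g.items = parseItems(line.substr(colon2 + 1));
    return g;
}
```

For example, the made-up line 7:{.sum}=20:{.product}{$plus}{.sum} (not taken from sample-grammar.txt) yields the rule "sum" with priority 20 and three items.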

License

This article has no explicit license attached to it, but the article text or the download files themselves may contain usage terms. If in doubt, please contact the author.

Written By

Alexander Berthold

Web Developer

Germany