Tokenizer and analyzer package supporting precedence prioritized rules

1 Jan 2002
A library for conveniently building a custom tokenizer and analyzer that supports precedence-prioritized rules.
' simple (incomplete) C/C++ grammar

[tokens]
1:auto
2:break
3:case
4:continue
5:default
6:do
7:else
8:enum
9:extern
10:for
11:goto
12:if
13:register
14:return
15:sizeof
16:static
17:struct
18:class
19:switch
20:typedef
21:union
22:while
23:signed
24:unsigned
25:const
26:int
27:short
28:long
29:char
30:float
31:double
32:void
33:throw
34:try
35:catch
36:new
37:delete
38:public
39:private
40:protected
[seperators]
41:!
42:%
43:^
44:&
45:*
46:-
47:+
48:=
49:~
50:|
51:.
52:<
53:>
54:/
55:?
56::
57:,
58:[
59:]
60:(
61:)
62:{
63:}
64:++
65:--
66:->
67:<<
68:>>
69:<=
70:>=
71:==
72:!=
73:*=
74:/=
75:%=
76:+=
77:-=
78:<<=
79:>>=
80:&=
81:^=
82:|=
83:&&
84:||
85:;
86:::
0:\n
0: 
0:	
0:\r
[rules]
87:numbers
88:strings
89:variable
90:function
91:datatype
92:structvar
93:label
0:cppmultilinecomments,nocollect
0:cppsinglelinecomments,nocollect
0:cpppreprocessor
[grammar]

' scope resolution
-94:{.scoperesolution}=0:{#literal#}{$::}
-95:{.scoperesolution}=0:{#literal#}{$::}{$~}
-96:{.scoperesolution}=0:{.scoperesolution}{#literal#}{$::}

' literal or scope-resolved literal
-97:{.literal}=0:{#literal#}
-98:{.literal}=0:{.scoperesolution}{#literal#}

' variable sub-expressions
99:{.vsubscripts}=0:{$[}{.expr}{$]}
100:{.vsubscripts}=0:{$[}{!number}{$]}
101:{.vsubscripts}=0:{.vsubscripts}{.vsubscripts}

' atomic variable expressions (like a, a[5])
-102:{.variable}=0:{.literal}

' atomic structure variable expressions
-103:{.structvar}=0:{.literal}

' variable expression
104:{.varexpr}=0:{.variable}
105:{.varexpr}=0:{$*}{$(}{.expr}{$)}
106:{.varexpr}=0:{$*}{.varexpr}
107:{.varexpr}=0:{$(}{.expr}{$)}
108:{.varexpr}=0:{$&}{.varexpr}
109:{.varexpr}=0:{.varexpr}{$.}{.structvar}
110:{.varexpr}=0:{.varexpr}{$->}{.structvar}
111:{.varexpr}=0:{.varexpr}{$.}{.function}
112:{.varexpr}=0:{.varexpr}{$->}{.function}
113:{.varexpr}=0:{.varexpr}{.vsubscripts}
114:{.varexpr}=0:{.function}
115:{.varexpr}=0:{$new}{.sdatatype}
116:{.varexpr}=0:{$new}{.sdatatype}{$(}{$)}
117:{.varexpr}=0:{$new}{.sdatatype}{$(}{.functionarguments}{$)}

' function/function arguments in expressions
118:{.functionarguments}=0:{.expr}
119:{.function}=0:{.literal}{$(}{.functionarguments}{$)}
120:{.function}=0:{.literal}{$(}{$)}

' data type qualifiers
-121:{.dtqualifier}=0:{$const}
-122:{.dtqualifier}=0:{$signed}
-123:{.dtqualifier}=0:{$unsigned}
-124:{.dtqualifier}=0:{$*}
-125:{.dtqualifier}=0:{$&}
-126:{.dtqualifier}=0:{.dtqualifier}{.dtqualifier}
-127:{.dtqualifier}=0:{$extern}

' data type prefixes
-128:{.sdatatype}=0:{$int}
-129:{.sdatatype}=0:{$long}
-130:{.sdatatype}=0:{$char}
-131:{.sdatatype}=0:{$float}
-132:{.sdatatype}=0:{$void}
-133:{.sdatatype}=0:{$struct}{.literal}
-134:{.sdatatype}=0:{$class}{.literal}
-135:{.sdatatype}=0:{.literal}

' data type (complete, i.e. including qualifiers)
136:{.datatype}=0:{.sdatatype}
137:{.datatype}=0:{.sdatatype}{.dtqualifier}
138:{.datatype}=0:{.dtqualifier}{.datatype}

' cast expression (like (int))
139:{.cast}=0:{$(}{.datatype}{$)}{.uexpr}

' access types
140:{.accesstype}=0:{$public}
141:{.accesstype}=0:{$private}
142:{.accesstype}=0:{$protected}

' structure body
-143:{.structblock}=0:{.globalscopestmt}
-144:{.structblock}=0:{.accesstype}{$:}
-145:{.structblock}=0:{.structblock}{.structblock}

' structure helpers
-146:{.structorcls}=0:{$struct}
-147:{.structorcls}=0:{$class}
-148:{.structdeclparam}=0:{.literal}
-149:{.structdeclparam}=0:{.literal}{$:}{.structinheritancelist}
150:{.structinheritancelist}=0:{.accesstype}{.literal}
151:{.structinheritancelist}=0:{.literal}
152:{.structinheritancelist}=0:{.structinheritancelist}{$,}{.structinheritancelist}

' structure declaration
153:{.struct-decl}=0:{.structorcls}{.literal}{$;}
154:{.struct-decl}=0:{.structorcls}{.structdeclparam}{$\{}{.structblock}{$\}}{$;}
155:{.struct-decl}=0:{.structorcls}{.structdeclparam}{$\{}{.structblock}{$\}}{.literal}{$;}
156:{.struct-decl}=0:{.structorcls}{.structdeclparam}{$\{}{.structblock}{$\}}{.dtqualifier}{.literal}{$;}
157:{.struct-decl}=0:{.dtqualifier}{.structorcls}{.structdeclparam}{$\{}{.structblock}{$\}}{.literal}{$;}
158:{.struct-decl}=0:{.dtqualifier}{.structorcls}{.structdeclparam}{$\{}{.structblock}{$\}}{.dtqualifier}{.literal}{$;}

' strings
159:{.string}=0:{!string}
160:{.string}=0:{.string}{.string}

' expressions (implicits)
161:{.uexpr}=0:{!number}
162:{.uexpr}=0:{.string}
163:{.uexpr}=0:{.varexpr}

' expressions (special)
164:{.uexpr}=0:{.cast}
165:{.uexpr}=0:{$sizeof}{$(}{.datatype}{$)}
166:{.uexpr}=0:{$sizeof}{$(}{.varexpr}{$)}

' expressions (unary)
167:{.uexpr}=5:{.uexpr}{$++}
168:{.uexpr}=5:{.uexpr}{$--}
169:{.uexpr}=5:{$++}{.uexpr}
170:{.uexpr}=5:{$--}{.uexpr}
171:{.uexpr}=5:{$~}{.uexpr}
172:{.uexpr}=5:{$+}{.uexpr}
173:{.uexpr}=5:{$-}{.uexpr}
174:{.uexpr}=5:{$!}{.uexpr}

' expressions (binary)
175:{.uexpr}=15:{.uexpr}{$+}{.uexpr}
176:{.uexpr}=15:{.uexpr}{$-}{.uexpr}
177:{.uexpr}=10:{.uexpr}{$*}{.uexpr}
178:{.uexpr}=10:{.uexpr}{$/}{.uexpr}
179:{.uexpr}=10:{.uexpr}{$%}{.uexpr}
180:{.uexpr}=10:{.uexpr}{$|}{.uexpr}
181:{.uexpr}=20:{.uexpr}{$<<}{.uexpr}
182:{.uexpr}=20:{.uexpr}{$>>}{.uexpr}
183:{.uexpr}=25:{.uexpr}{$<}{.uexpr}
184:{.uexpr}=25:{.uexpr}{$>}{.uexpr}
185:{.uexpr}=25:{.uexpr}{$<=}{.uexpr}
186:{.uexpr}=25:{.uexpr}{$>=}{.uexpr}
187:{.uexpr}=30:{.uexpr}{$==}{.uexpr}
188:{.uexpr}=30:{.uexpr}{$!=}{.uexpr}
189:{.uexpr}=35:{.uexpr}{$&}{.uexpr}
190:{.uexpr}=45:{.uexpr}{$|}{.uexpr}
191:{.uexpr}=40:{.uexpr}{$^}{.uexpr}
192:{.uexpr}=50:{.uexpr}{$&&}{.uexpr}
193:{.uexpr}=55:{.uexpr}{$||}{.uexpr}
194:{.uexpr}=60:{.uexpr}{$?}{.uexpr}{$:}{.uexpr}
195:{.uexpr}=65:{.uexpr}{$=}{.uexpr}
196:{.uexpr}=65:{.uexpr}{$+=}{.uexpr}
197:{.uexpr}=65:{.uexpr}{$-=}{.uexpr}
198:{.uexpr}=65:{.uexpr}{$*=}{.uexpr}
199:{.uexpr}=65:{.uexpr}{$-=}{.uexpr}
200:{.uexpr}=65:{.uexpr}{$/=}{.uexpr}
201:{.uexpr}=65:{.uexpr}{$%=}{.uexpr}
202:{.uexpr}=65:{.uexpr}{$<<=}{.uexpr}
203:{.uexpr}=65:{.uexpr}{$>>=}{.uexpr}
204:{.uexpr}=65:{.uexpr}{$&=}{.uexpr}
205:{.uexpr}=65:{.uexpr}{$|=}{.uexpr}
206:{.uexpr}=65:{.uexpr}{$^=}{.uexpr}
207:{.uexpr}=70:{.uexpr}{$,}{.uexpr}

' pseudo-rule used for caching
-208:C{.expr}=0:{.uexpr}

' variable declaration subscripts
-209:{.declsubscripts}=0:{$[}{!number}{$]}
-210:{.declsubscripts}=0:{.declsubscripts}{$[}{!number}{$]}

' variable declaration args
211:{.args-vardecl}=0:{.literal}
212:{.args-vardecl}=0:{.literal}{.declsubscripts}
213:{.args-vardecl}=0:{.args-vardecl}{$,}{.args-vardecl}

' variable declaration
-214:{.uvardecl}=0:{.literal}
-215:{.uvardecl}=0:{.dtqualifier}{.uvardecl}
-216:{.uvardecl}=0:{.uvardecl}{.declsubscripts}
-217:{.uvardecl}=0:{$(}{.uvardecl}{$)}
218:{.uvardecls}=0:{.uvardecl}
219:{.uvardecls}=0:{.uvardecl}{$=}{.expr}
220:{.uvardecls}=0:{.uvardecls}{$,}{.uvardecls}

' pseudo-rule used for caching
-221:C{.vardecl}=0:{.uvardecls}

' base class initialization list
222:{.ibaseinitlist}=0:{.literal}
223:{.ibaseinitlist}=0:{.literal}{$(}{$)}
224:{.ibaseinitlist}=0:{.literal}{$(}{.expr}{$)}
225:{.ibaseinitlist}=0:{.ibaseinitlist}{$,}{.ibaseinitlist}
226:{.baseinitlist}=0:{$:}{.ibaseinitlist}

' function declaration inner args
227:{.argsi-funcdecl}=0:{.datatype}{.uvardecl}
228:{.argsi-funcdecl}=0:{.argsi-funcdecl}{$,}{.argsi-funcdecl}

' function declaration args
-229:{.args-funcdecl}=0:{$(}{.argsi-funcdecl}{$)}{$const}
-230:{.args-funcdecl}=0:{$(}{$)}{$const}
-231:{.args-funcdecl}=0:{$(}{.argsi-funcdecl}{$)}
-232:{.args-funcdecl}=0:{$(}{$)}

' function declaration header
233:C{.func-header}=0:{.datatype}{.literal}{.args-funcdecl}
234:C{.func-header}=0:{.literal}{.args-funcdecl}
235:C{.func-header}=0:{.datatype}{.literal}{.args-funcdecl}{.baseinitlist}
236:C{.func-header}=0:{.literal}{.args-funcdecl}{.baseinitlist}

' function declaration
237:{.func-decl}=0:{.func-header}{$\{}{.functionscopeblock}{$\}}
238:{.func-decl}=0:{.func-header}{$\{}{$\}}

' if body
239:{.uif-block}=0:{.functionscopestmt}
240:{.uif-block}=0:{$\{}{.functionscopeblock}{$\}}
241:{.uif-block}=0:{$\{}{$\}}

' pseudo-rule used for caching
-242:C{.if-block}=0:{.uif-block}

' for part
243:{.forpart}=0:{$;}
244:{.forpart}=0:{.expr}{$;}

' statements
245:{.globalscopestmt}=0:{.datatype}{.vardecl}{$;}
246:{.globalscopestmt}=0:{.func-header}{$;}
247:{.globalscopestmt}=0:{.func-decl}
248:{.globalscopestmt}=0:{.struct-decl}
249:{.functionscopestmt}=0:{.datatype}{.vardecl}{$;}
250:{.functionscopestmt}=0:{$if}{$(}{.expr}{$)}{.if-block}
251:{.functionscopestmt}=0:{$if}{$(}{.expr}{$)}{.if-block}{$else}{.if-block}
252:{.functionscopestmt}=0:{$for}{$(}{.forpart}{.forpart}{.expr}{$)}{.if-block}
253:{.functionscopestmt}=0:{$for}{$(}{.forpart}{.forpart}{$)}{.if-block}
254:{.functionscopestmt}=0:{$for}{$(}{.forpart}{.forpart}{.expr}{$)}{$;}
255:{.functionscopestmt}=0:{$for}{$(}{.forpart}{.forpart}{$)}{$;}
256:{.functionscopestmt}=0:{$while}{$(}{.expr}{$)}{.if-block}
257:{.functionscopestmt}=0:{$do}{.if-block}{$while}{$(}{.expr}{$)}{$;}
258:{.functionscopestmt}=0:{.expr}{$;}
259:{.functionscopestmt}=0:{$return}{.expr}{$;}
260:{.functionscopestmt}=0:{$return}{$;}
261:{.functionscopestmt}=0:{$break}{$;}
262:{.functionscopestmt}=0:{$continue}{$;}
263:{.functionscopestmt}=0:{$throw}{.expr}{$;}
264:{.functionscopestmt}=0:{$delete}{.varexpr}{$;}

' function scope block
265:{.functionscopeblock}=0:{.functionscopestmt}
266:{.functionscopeblock}=0:{.functionscopeblock}{.functionscopeblock}

' global scope block
267:{.globalscopeblock}=0:{.globalscopestmt}
268:{.globalscopeblock}=0:{.globalscopeblock}{.globalscopeblock}


License

This article has no explicit license attached to it, but usage terms may appear in the article text or in the download files themselves. If in doubt, contact the author.



Written By
Web Developer
Germany
