 |
|
 |
http://www.subground.cc/devel ?
|
|
|
|
 |
|
 |
I am working on projects that perform operations on grammars and parse trees.
The project does not require any parsing.
I need data structures that represent parse trees and unrestricted grammars (open source).
I need to be able to represent rules that represent modifying an existing parse tree (changing leaves, changing ordering of non leaf branches, changing structures of branches, removing a branch, removing a branch and replacing it with another instance of that abstract branch type).
The operations I need to perform are given a word and parse tree perform mutations involving the word and its parse tree. I perform mutations based on rules that might need to use knowledge of its parse tree.
If anyone can give a lead to references or source code it would be appreciated.
|
|
|
|
 |
|
 |
This software seems excelent to me. I dont understand the comparisons with flex / bison since with them you cannot obtain a parse tree. Surely the great thing about this is being able to parse user input expressions, and then evaluate them at a later triger point.
I was wondering if you had any facility to convert the parse tree to a post fix array or any other routines that could enable making it easier to actually invoke the tree resolving symbols to data or functions in the a program.
Clearly for most applications, its only the evaluation time thats relevant.
Ed.
|
|
|
|
 |
|
 |
I keep getting a 403 error
|
|
|
|
 |
|
 |
http://absd.hypermart.net/
i am realfly8)
|
|
|
|
 |
|
 |
I have some preliminary results to share. I wrote a lex-based calculator very similar to Alexander's simpleCalc program. I then wrote a program called genfile that can generate arbitrarily-long mathematical expressions with arbitrary and random parenthesis nesting for the calculators to work on. I generated a file that was 10 MB in size with a maximum nesting depth of 20. I ran this file through my bison/flex calculator and it took 13.7 seconds for the file to parse and evaluate the expression. I then ran the file through Alexander's simpleCalc program. It is still running as I write this, 42 minutes later.
I have provided full source and projects to Alexander so that he can perform additional tests/comparisons, but it seems obvious that this system performs several orders of magnitude worse than bison/flex for an equivalent implementation.
Matthew L. Wigdahl
Software Engineer
matt@scriptpro.com
|
|
|
|
 |
|
 |
Hi,
your comparision might be quite missleading.
If Alexander's package builds a parse tree, this tree will use at least 20 to 50 times more memory than your comparable bison/flex program. The long running time in your test may therefore be a result of swapping.
Right?
The point is, does a simple calculator need a parse tree? Probably not!
Best Regards
Martin
|
|
|
|
 |
|
 |
Wow, a response 3 years after the fact!
You are correct, of course. The calculator is an oversimplified example. Obviously it doesn't need a parse tree as I was able to write my version completely in flex.
As best I can recall from the testing, no hard drive swapping was involved. I ran both programs on a development server with 1 GB of RAM. The real problem seemed to be the dynamic lexing in Alexander's program. My point was just to demonstrate that his program was not speed-neutral to or faster than equivalent parsers built using standard tools, where those tools were applicable.
There are certainly other cases where the extra features of Alexander's program make possible parsers that can't be done (easily, efficiently or even at all) in bison/flex. But for nontrivial inputs in equivalent problem domain implementations, I'm confident that properly-implemented bison/flex will outperform this package every time, often by considerable margins.
Matt
|
|
|
|
 |
|
 |
Hi,
this class looks interesting to me, but I can't find any links to how the rules/grammar is defined!
What would a basic XML(/HTML) parser rule-set look like?
Regards,
Ray
|
|
|
|
 |
|
 |
Granted lex and yacc (really, flex and bison) produce pretty ugly code. But you should never need to even look at that code. In my MFC projects that need parsing I include the clean, understandable .l and .yy files in my project with custom build steps that invoke flex and bison on them. The resultant files are included in the build without my ever needing to see them. The advantages of using the existing tools are: Flex and bison are free, available today, extremely fast, portable and almost completely rock-stable; their behavior is well-defined and lots of people know how to use them. Contrast this to your yacc replacement, which will be buggy, and will have non-standard syntax. Really, we should all be happy that flex produces ugly code. It's that static, ugly code that makes it so screamingly fast since the token map is implemented in compilable, optimizable, C code. Your solution implements lexing dynamically which almost guarantees that it will be drastically slower than flex for equivalent tasks. Matthew L. Wigdahl Software Engineer matt@scriptpro.com
|
|
|
|
 |
|
 |
Take a look at the file at http://www.subground.cc/devel/sample-expression.txt.
It consists of 13516 tokens and is 18KB large; the whole file is a large C++ expression.
Using the "complex-grammar.txt" grammar provided with the grammarIDE binary and testing for the rule ".uexpr" (a C++ expression rule), it takes on my machine (Athlon 1.33 Ghz, 512 MB DDR) approx. 79 msec for the tokenizing stage and 53 msec for the analyzing stage (standard Release build).
Memory requirements (with disabled preview window):
Before tokenizing/analyzing
the memory occupied is 4.564 KB, with the complete parse tree of the expression above in memory (after tokenizing/analyzing) the memory consumption is 6572 KB.
If you want to try, i can send you a slightly modified version which measures using QueryPerformanceCounter the time consumed for each step.
|
|
|
|
 |
|
 |
I would be very interested in performing a comparison test using a complex grammar such as C++ with a very large dataset. I'm not sure that the data you provide is directly comparable, however, for the following reason:
When a scanner and parser are generated by flex and bison, the resulting code takes the input text and analyzes it with no assumptions as to what rule will end up as terminal. Your grammarIDE program seems to require me to specify what type of syntactic element I think the input text will be, which bypasses some analysis and is unrealistic for real-world applications. I tried to paste some arbitrary C++ code into the grammarIDE and analyze it, but except for your example and one variable declaration I could not get anything to resolve. Perhaps I did not specify the correct rule to evaluate.
Is your grammar complete? Do you have a large block of sample C++ representing an entire nontrivial program that you can feed through your grammar using a rule like ".program" and get it to parse? If so, I'd be interested in timing it against an equivalent flex/bison C++ grammar.
If you like, we can use a simpler grammar and a very large amount of input (say, parsing sentences out of the contents of Project Gutenberg) -- this might make the comparison more straightforward by rendering the respective inefficiencies of the different grammar implementations less of an issue.
Matthew L. Wigdahl
Software Engineer
matt@scriptpro.com
|
|
|
|
 |
|
 |
I have made a C grammar and for the pseudo-C code at http://www.subground.cc/devel/sample-pseudo-c-prog.txt it takes 80 msec to tokenize and 67 msec to analyze and build the parse tree (".globalscopeblock"). I corrected also a small bug, if you want to try the grammar, please download again the binary.
The idea of trying a large dataset is interesting. Maybe i can contact you via mail?
|
|
|
|
 |
|
 |
Thank's for the code, good job!! Just a correction.
Mathematic equations are evaluated from right to left, not from left to right.
If the example equation shown above had been:
1-2*(3-4)+8
your program would have divided it as:
1-expr, where expr = 2*(3-4)+8
and this is wrong. It should be: expr+8, and then divide expr as: 1-expr2.
Good luck
|
|
|
|
 |
|
 |
??? What are you talking about?
Try it ... Of course the associativity is defined by the grammar.
|
|
|
|
 |
|
 |
but when I do this:
1-2*(3-4)*8
I think it should executed as:
(3-4)
2*(3-4)
2*(3-4)*8
1-2*(3-4)*8
but it seem be executed as this
(3-4)
1-2*(3-4)
1-2*(3-4)*8
can you help me?
|
|
|
|
 |
|
 |
Math equations are left to right.
1-2*(3-4)+8
Executed as:
(3-4)
2*(3-4)
1-2*(3-4)
1-2*(3-4)+8
Which is what it looks like his code is doing.
Tim Smith
Descartes Systems Sciences, Inc.
|
|
|
|
 |
|
 |
Thanks for the great work.
Now, how do you compare the final result with 'lexx' and 'yacc'?
Have you tried Antlr? If yes, how do you compare this with your sources?
Best regards,
Paul.
Paul Selormey, Bsc (Elect Eng), MSc (Mobile Communication) is currently Windows open source developer in Japan, and open for programming contract anywhere!
|
|
|
|
 |
|
 |
Hi, thanks for your comment.
Intentionally i did not try to get too deep into 'lexx' and 'yacc' before writing this library, so i do not really have experience using them.
But i think 'cxtPackage' is comparable to lexx and yacc. In fact, i am currently writing a C++ compiler built upon 'cxtPackage' and w/o templates i managed up to now to implement >70% of the language grammar using cxtPackage, even complex expressions like (int *(const *a[5])[3]) can be parsed without modifications.
I also never took a closer look at 'Antlr', but i will in the near future.
In terms of speed i'd say 'cxtPackage' could be even faster compared to lexx/yacc, but i have never tried this also.
Regards + Happy new year,
Alexander Berthold
|
|
|
|
 |
|
 |
Take a look at this site: www.bumblebee.com. Here you can find a lex/yacc IDE for WIndows. The output code can be used in cpp projects.
=================================================================
Stefan Ungureanu Software analyst
email:stefan.ungureanu@siatel.ro
|
|
|
|
 |
|
 |
"Welcome to Bumble Bee Seafoods"
|
|
|
|
 |
|
 |
Sorry, wrong link...
www.bumblebeesoftware.com
=================================================================
Stefan Ungureanu Software analyst
email:stefan.ungureanu@siatel.ro
|
|
|
|
 |
|
 |
I found that installing SP5 went a long way to getting rid of the troublesome 4768 warnings. I think you still need the #pragma directive to turn them off, but in SP5 this directive seems to work properly.
|
|
|
|
 |
|
 |
Arrgh! Why are you using these ghastly type warts ?
Do you really think they serve any purpose?
|
|
|
|
 |
|
 |
Naming conventions save everybody a lot of time when it comes to reading another person's code... I have found that people are using them less and less often and I attribute this to the large increase in the number of data types that have been "invented" with Windows programming and OOP.
It's nice to see someone that still uses them.
|
|
|
|
 |