|
if it were an Afghan could you pet it, wear it, eat it, or shake hands with it?
after many otherwise intelligent sounding suggestions that achieved nothing the nice folks at Technet said the only solution was to low level format my hard disk then reinstall my signature. Sadly, this still didn't fix the issue!
|
|
|
|
|
Despite all his rage
He was just a dog in a cage
GCS d--(d+) s-/++ a C++++ U+++ P- L+@ E-- W++ N+ o+ K- w+++ O? M-- V? PS+ PE- Y+ PGP t+ 5? X R+++ tv-- b+(+++) DI+++ D++ G e++ h--- r+++ y+++* Weapons extension: ma- k++ F+2 X
|
|
|
|
|
I've got a coding problem that I don't know if there's a solution to and it's pretty important.
This isn't a coding question. It's more of a rant.
When a bottom up parser parses input it kind of does things backward.
You get your inputs into the queue:
1. term a
2. term b
3. non-term C
Then a reduce action with a rule
4. Reduce D-> a b C
which tells me to pop C, b, a, and then push D, effectively "replacing" the first 3 entries with 1 entry from step 4
This is all well and good, except when there's errors in the input
let's say there was an error parsing step 1 above - and the error recovery "ate" 2 and 3 leaving me with
1. error #ERROR
Well even if i figured out to reduce to D (which i probably wouldn't) I have the wrong number of items on the stack.
So my tree building fails.
I've tried inserting artificial tokens. I've tried rule rewriting, nothing works. I don't have enough information at any given point to reconstruct what's left of the tree.
So I can do a pull parse on it, but the tree can't be generated from the input.
i get my info back with a class that works like XmlReader with the pull parser
but there's no way to generate a hierarchy for it because the tree is built from the leaves to the root, and a corrupt subtree will corrupt its parents due to the stack issue - and that's IF you don't wind up ending early by trying to pop from an empty stack!
This is my life right now. I never did solve this problem last time I encountered it either.
EDIT: Woo I got it working somewhat
Real programmers use butterflies
modified 20-Feb-20 10:35am.
|
|
|
|
|
Maybe I am reading it wrong, but it's confusing... are you saying that even if you hit an error you still want to continue parsing anyway? Surely if the input in invalid then it should just terminate?
|
|
|
|
|
Yes, I want to continue.
Could you imagine if C#'s compiler stopped at the first parse error?
Real programmers use butterflies
|
|
|
|
|
But that parser has an easier job.
It seems to me that it starts counting matching brackets top down, and if they match up it parses the stuff in between brackets bottom up.
modified 20-Feb-20 10:28am.
|
|
|
|
|
No, it is all bottom up. It recognizes things called viable prefixes using a state machine. When it recognizes a right hand side, it removes that from the stack and replaces it with the appropriate reduction.
Real programmers use butterflies
|
|
|
|
|
Tested a bit, and it seems you're right.
But in a file with 500 rows, one well placed bracket renders 80 errors.
Now factor in how many man years MS has poured into VS.
The lady witch doth protest too much, methinks
|
|
|
|
|
Yeah. What's interesting is a team at Microsoft Research reimplemented the C# front-end to use a GLR parser like the one I've made. It's not used in production though.
Real programmers use butterflies
|
|
|
|
|
honey the codewitch wrote: Could you imagine if C#'s compiler stopped at the first parse error?
Or the converse, which I encountered many years ago in PASCAL -- a forgotten semicolon would result it reams, and I mean REAMS of "missing ;" errors. Stupid parser.
|
|
|
|
|
I left a full stop off the end of a line of COBOL once and managed to get 82 error messages - none of which said "missing full stop".
- I would love to change the world, but they won’t give me the source code.
|
|
|
|
|
Well, my stuff kind of has that issue too in some cases, but i'm working on a better error recovery scheme. Right now it's just "panic-mode" error recovery
Real programmers use butterflies
|
|
|
|
|
I am slightly surprised. The first compiler I tried to understand thoroughly was a Pascal compiler, and I was surprised by how great efforts it made to avoid repeated error messages. E.g. if a type declaration failed, it went on with a pseudo-type-object for that type name that allowed "any" operation. If a non-existing variable was referenced, it created the variable, of a similar forgiving pseudo-type. And so on - lots of tricks to avoid repeated and false errors.
Sometimes we were frustrated by this strategy, as the "forgiving" strategy let other "real" errors pass by without an error message. This was in an age when compilation of even small projects could take minutes - it didn't take that large projects to cross the hour limit. Catching the maximum number of errors per compilation run was a quality measure of the compiler. It made perfectly sense to try to continue as far as possible after a compilation error. (And, with those compilation times, the make utility made far more sense than today!)
Btw: LALR parsers (often referred to "bottom up parsers") is far better suited for going on after errors than recursive descent parser are. But in the golden years of Pascal, at least nine out of ten Pascal compilers were recursive. I guess that some of the few Pascal compilers in use today are bottom up, though.
|
|
|
|
|
Yeah with LALR(1) is possible to do what recursive descent can't easily do in terms of error continuation.
But it's way more difficult than with an LL class parser, at least in my experience.
And then there's the stack issue, which is separate since it's building process is separate than the parse, but it relies on the parser to return the correct number of items or it fails badly stack-wise.
I implemented an ad-hoc technique where in I stop popping when I encounter an error (better the stack has too many nodes than not enough)
Real programmers use butterflies
|
|
|
|
|
Or Intellisense ...
It does not solve my Problem, but it answers my question
modified 19-Jan-21 21:04pm.
|
|
|
|
|
Intellisense doesn't use a standard parser though. It doesn't actually parse really. It is an advanced tokenizer that can nest states, so *almost* a parser, but more like a regex runner that supports nesting
Whoops i was thinking of syntax highlighting, not intellisense. More coffee
Real programmers use butterflies
|
|
|
|
|
|
Yeah, I ran into that valid point face-first when I was implementing an old parser last year.
I had to redesign the whole thing
Real programmers use butterflies
|
|
|
|
|
Error recovery merely tries to bring your parse in a valid state e.g. by adding error productions, error "repair" tries to convert the input into something that may be is not meant
by the writer of the program, but is valid.
Since you do kind of LR parsing, you are certain that what you have recognized is a valid
prefix. The "state" you are in when detecting the error gives all possibilities for an error free continuation. Add to each state such a symbol and you always can recover from the error
with a correct parse tree (although not necessarily the tree meant by the writer of the program)
(There are some cases where that does not work, where you have to do some form of state splitting as long as you do the simple forms like SLR or LALR)
Error recovery and repair was a popular topic in the late 70-ies and 80-ies, and dr Johannes Roehrich had a few good papers on it, Unfortunately these seem to be behind pay walls
|
|
|
|
|
I've got Dick Grune's book on it, and he gives it a pretty thorough treatment.
It's not really the parser that's not working
The parser is a pull-parser underneath and that handles errors okay.
The tree builder that uses the pull-parser gets confused though when there aren't the right number of right hand sides to satisfy a rule.
Real programmers use butterflies
|
|
|
|
|
Yeah, but then the state the parser is in is incorrect or - if you use a separate stack
for the building process, there is an inconsistency between the parse stack and the "semantics" stack
|
|
|
|
|
Yep, and I do have a separate step for the tree building.
The reason for the inconsistency is a rule/shift/reduce mismatch
Like if something is supposed to do
Shift a
Shift b
Shift c
Reduce D
but there's in error it might return this from the parse:
Shift a
#ERROR bc
and then there's an inconsistency, because the rule didn't match the number of shifts in this case (there should have been 3 followed by a reduce)
Real programmers use butterflies
|
|
|
|
|
Have you tried turning it off and back on again?
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
Yes.
Real programmers use butterflies
|
|
|
|
|
Hmm. Reformat your HDD and reinstall your algorithm from scratch?
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!
|
|
|
|