I've spent several days adding a bunch of foundational features to Parsley, like composition parsing, lex priorities, and additional firsts and follows hints, plus bugfixing.
The upshot is it now parses Slang, except for directives and comments, both of which it currently discards, but I'll fix that shortly. It's not critical. I'm taking a novel approach to comment preservation - or at least I haven't seen it in other parsers and lexers. A terminal has three major states: normal, hidden, or skipped. If it's skipped, the lexer still picks it up, and the parser gives you the skipped tokens on each advance as a separate list, so you can get them "out of band" without interfering with the regular parse. In my hand-rolled Slang parser I just checked for comments everywhere, which is buggy and stupid, but sometimes stupid works.
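In code terms, the shape of the idea is something like this - a minimal sketch, where the names (TerminalState, Token, ParserSketch, Advance, Skipped) are ones I'm making up here for illustration, not Parsley's actual API:

using System.Collections.Generic;

// The three states a terminal can be declared with.
enum TerminalState { Normal, Hidden, Skipped }

class Token
{
    public string Symbol;  // which terminal matched
    public string Value;   // the matched text
}

class ParserSketch
{
    // Skipped tokens the lexer picked up since the last advance.
    // They ride along "out of band" and never enter the parse itself.
    public List<Token> Skipped { get; } = new List<Token>();

    public bool Advance()
    {
        Skipped.Clear();
        // ... lex forward: Normal tokens drive the parse, Hidden ones
        // are suppressed, and Skipped ones land in the Skipped list
        // instead of being handed to the parse ...
        return false; // placeholder
    }
}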
I still have quite a bit to do before it can do all the things the hand-rolled parser can do, but at this point 80% of what remains is my grammar that needs to be coded.
The grammar parses, but I'll be putting in => actions to build the CodeDOM tree from the parse tree. That way, if I call EvaluateXXXX() on the parser, I get a CodeDOM tree back.
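To give the flavor of it, a production with an action might end up looking something like this - a purely hypothetical sketch, since I'm inventing everything here (the production, the "values" children array) except the => marker and the fact that the actions emit System.CodeDom nodes:

// Hypothetical - illustrative grammar syntax only:
AddExpr : AddExpr "+" MulExpr => {
    return new CodeBinaryOperatorExpression(
        (CodeExpression)values[0],
        CodeBinaryOperatorType.Add,
        (CodeExpression)values[2]);
};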
At that point it will do what the hand-rolled parser does, and I can swap it out and pray I got it right.
But for now, an article will have to suffice.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
Looking forward to your new post! Happy New Year!
diligent hands rule....
Thanks!
I'm always thrilled to know someone appreciates my work. My latest parser is very clever, IMO. It's not my fault. I stumbled onto an idea practically by accident and it worked out pretty well.
I've got it parsing 95% of Slang now - I'm just missing comments and directives, both of which are currently skipped, but I think that's article material.
Happy new year to you and yours as well.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
I've thought about trying to preserve comments as well, so how you do it sounds interesting.
Happy New Year and good luck with the article!
Thanks! I just posted it. I've run into this problem a lot, and the right answer depends on what you're using it for.
In previous parsers, I made the *parser*, not the *lexer*, skip over hidden terminals. That way, I could set ShowHidden to true and the parser would spit those out in the main stream without interfering with the parse.
That's great for syntax highlighting, but I needed something a bit different here for a couple of reasons.
For starters, I still want to hide some things, like whitespace.
Second, all my parser code now is *inside* the parser (where it belongs) rather than outside of it, driving it, as it was in PCK. Because of this, I needed to expose the comments by an alternate means.
"Skipped" solves both problems, even if it does mean carrying a bag of unparsed tokens around every time I advance.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
I tried my way. It works like a charm. Woo! The only issue - and this isn't because of this feature in particular; it's ongoing and getting worse - is that my gplex specs have "a lot" of code in them. Not an unreasonable amount yet, but more than lexer specs usually have, so they're a bit bigger than normal.
Oh well. Skipping comments is easy.
lineComment<skipped>='\/\/[^\n]*';
blockComment<skipped,blockEnd="*/">="/*";
blockEnd is another feature I added, though getting gplex to do it was a trick. It makes it far easier to indicate things like C block comments, HTML/XML/SGML comments, CDATA sections - basically anything with a multicharacter ending condition. Sure, you can do it with lazy matching, but not all lexers support that (I turned it off for performance) - this way is faster and clearer.
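For instance, in the same spec syntax as above, the other classic multi-character terminators fall out the same way (the symbol names here are just my own examples):

cdataSection<skipped,blockEnd="]]>">="<![CDATA[";
sgmlComment<skipped,blockEnd="-->">="<!--";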
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
I'd have to adapt your strategy because my code is totally hand-rolled. This function would probably have to assemble comments and save them in an adjunct vector of strings. It's invoked from many places and I'm loath to spend time speculatively constructing and passing around strings.
//  Returns the position of the next token at or after pos, skipping
//  whitespace, comments, and line continuations.  Returns string::npos
//  if nothing but whitespace and comments remain.
//
size_t Lexer::NextPos(size_t pos) const
{
   while(pos < size_)
   {
      auto c = source_->at(pos);

      switch(c)
      {
      case SPACE:
      case CRLF:
      case TAB:
         //  Plain whitespace: keep going.
         //
         ++pos;
         break;

      case '/':
         //  Possibly a comment.  A trailing '/' is treated as end
         //  of input.
         //
         if(++pos >= size_) return string::npos;

         switch(source_->at(pos))
         {
         case '/':
            //  A // comment: skip to the end of the line.
            //
            pos = source_->find(CRLF, pos);
            if(pos == string::npos) return pos;
            ++pos;
            break;

         case '*':
            //  A /* comment: skip past the closing delimiter.
            //
            if(++pos >= size_) return string::npos;
            pos = source_->find(COMMENT_END_STR, pos);
            if(pos == string::npos) return string::npos;
            pos += 2;
            break;

         default:
            //  A lone '/': back up and return its position.
            //
            return --pos;
         }
         break;

      case BACKSLASH:
         //  A backslash followed by a newline is a line continuation;
         //  anything else means the backslash starts the next token.
         //
         if(++pos >= size_) return string::npos;
         if(source_->at(pos) != CRLF) return pos - 1;
         ++pos;
         break;

      default:
         //  Found the start of the next token.
         //
         return pos;
      }
   }

   return string::npos;
}
Well, yours does work better than mine, since you accounted for line continuations and I forgot about them.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
A relatively recent addition.
I still have to figure out how to match multiline tokens with gplex/lex/flex.
Have you ever used those tools? Had any luck with them?
I'm using gplex now, and it works, but it's a bit clunky sometimes. I'd like to write my own scanner generator that supports Unicode, but I need to understand NFA regex first, and I only understand DFA regex.
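For reference, the textbook lex/flex answer seems to be an exclusive start condition - something like the sketch below (I'm going from the flex docs here; I haven't proven it out in gplex yet, hence the question):

%x COMMENT
%%
"/*"               BEGIN(COMMENT);
<COMMENT>"*/"      BEGIN(INITIAL);
<COMMENT>(.|\n)    ; /* swallow the comment body */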
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
No, I've never used those tools. Usually I start something that grows organically, at which point it's too late to use them. But even if they might help, I'd rather write the code from scratch than search for a tool, read its manual, get it running, and maybe discover, after investing lots of time on it, that it has show-stopping deficiencies. Mostly it's because I hate spending time configuring stuff.
Those scanner generators are really easy to use once you know how. The issue I have with building my own tokenizer/scanner by hand is that they get awfully complicated for real-world languages, as you've probably found. I prefer to use regex to define my lexemes/terminal tokens as it makes things easier - less code I have to write and debug. Regex is like nothing to me. I'd almost rather write my own scanner generator and then use that than write a scanner by hand.
I have an unrelated question for you.
I have two options with respect to parsing C#: I can parse in two passes, parsing only as far as types the first time, just to gather complete type information so I can parse the rest accurately (I need type information to disambiguate the parse, just like you do to parse C, only worse).
My other option - and what I've done with the hand-rolled parser - is simply to punt and *correct* the AST after the fact, rather than touching the parse tree. Basically, *after* I've finished building my AST out of my parse tree, I go back and correct the AST with type information.
I have this nasty method called "Patch" which visits my entire tree, with type info, looking for bits in the tree it needs to patch. Currently it's slow, but I have some ideas to speed it up.
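For concreteness, Patch amounts to something like this hypothetical sketch - only the name "Patch" is real; the rest is made up to show the shape:

using System.Collections.Generic;

// Illustrative AST node and post-hoc fixup pass - not the actual code.
class AstNode
{
    public List<AstNode> Children = new List<AstNode>();
    public bool NeedsPatch;  // e.g. carries an unresolved type reference
}

class Patcher
{
    // Walks the entire tree looking for bits to fix up with type info.
    public void Patch(AstNode node, IDictionary<string, object> typeInfo)
    {
        if (node.NeedsPatch)
        {
            // ... rewrite the node using typeInfo ...
        }
        foreach (var child in node.Children)
            Patch(child, typeInfo);
    }
}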
The first way might be more efficient, but it also might be a dead end. I've never tried it.
Any thoughts? What would you do?
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
This is after a rather large dinner with wine, so I hope it helps.
I don't understand the difference between the parse tree and the AST. I only have one tree, but maybe that's because I'm only doing C++ whereas you're converting one language to another.
Once I have a subtree for something that is "executable" (e.g. an enum, typedef, data declaration or definition, function declaration or definition), I invoke a virtual EnterBlock function on its root, which is like invoking an interpreter. Each node invokes EnterBlock on its descendants, so it proceeds depth first. A QualName (a possibly qualified name) or DataSpec (a QualName tagged with pointers, references, and/or const) implements this by resolving its name based on the current scope. I hope this is what you mean by "getting the type information". It also causes stuff to be pushed onto the operand (types) and operator stacks.
Could this wait until all of the code is parsed? I don't see why not, and I don't see how it would be more or less efficient. In fact, name resolution can occur later if there are errors during the parsing or interpretation. If you run the >check tool on the code, one of the things it does (to clean up #include lists) is to ask each file for all of the things that it uses. Any nodes that have names but that weren't "interpreted" because an error caused them to be skipped will then try to resolve their names.
Greg Utas wrote: Could this wait until all of the code is parsed? I don't see why not
That's what I've been doing. I think I just need to implement a more efficient visit on the tree I'm using.
Maybe I'll make it so it can visit only marked nodes.
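Something like this, reusing the illustrative AstNode from my earlier sketch: record the nodes that need fixing while the AST is built, then patch just those instead of walking everything. Again, all hypothetical names:

using System.Collections.Generic;

// Hypothetical refinement of Patch: visit only marked nodes.
class MarkedPatcher
{
    readonly List<AstNode> _marked = new List<AstNode>();

    // Called during AST construction whenever a node is built that
    // will need type info later.
    public void Mark(AstNode node) { _marked.Add(node); }

    // Now patching is proportional to the marked nodes, not the tree.
    public void Patch(IDictionary<string, object> typeInfo)
    {
        foreach (var node in _marked)
        {
            // ... resolve the node against typeInfo ...
        }
        _marked.Clear();
    }
}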
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
return _Compare(x.Key.Key, y.Key.Key);
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
It's sunny here too. Wall to wall sunshine and blue skies. Nary a cloud.
-15 right now, but it's a dry cold and the sun feels nice on your numb face.
Quote: -15 right now
I am glad to be a "Florida Man". It was cold this morning. 55 F. That is: PLUS 55!
I thought all the "Florida Mans" were here!
As well as everyone from every other state and walk of life, here for the holiday craziness in Summit County, Colorado.
Quote: I thought all the "Florida Mans" were here!
Are you telling me there is more than one? That I am not unique? Rats! Double Rats!
Ditto.....
A human being should be able to change a diaper, plan an invasion, butcher a hog, navigate a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects! - Lazarus Long
Living in x, it's grey and cold here. Quite the contrast from 74 and "clear with periodic clouds" in y.Key.Key
But hey, I moved to the East Coast from San Diego years ago to experience actual seasons and weather.
Who writes the subtitles on the Code Project Daily News?
They really crack me up.
Example:
Scientists say they've found a way to solve the 'oldest open question in astrophysics' (3 body problem)
OK, now do four
Kent Sharkey does.
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!