|
I had to cut back on coffee after I finally realized that drinking too much of it gave me a chronic lightheaded feeling, the kind you get after staying in bed too long while ill.
So now I spend the entire 90% talking to my rubber duck. 🦆
modified 1-Jan-20 11:02am.
|
|
|
|
|
Recursive descent parsers are the cat's ass! Others are inscrutable by comparison. In our compiler course, we had a choice of what kind of parser to implement. I was the only one who chose recursive descent. It was easy to enhance, so it ended up handling a larger subset of Algol than the others. But boy, was it slow! It was written in Simula-67, running on a PDP-10, and probably backtracked too much. Adding backtracking statistics would have revealed if it was doing something silly.
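For anyone who hasn't written one, a predictive recursive descent parser really is just one function per nonterminal. A minimal sketch for integer arithmetic (nothing like an Algol subset, and with one character of lookahead so it never backtracks; all names here are illustrative):

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Minimal predictive recursive-descent parser for integer arithmetic.
// One member function per nonterminal; peek() provides the single
// character of lookahead that picks each production, so no
// backtracking is ever needed.
struct Parser {
    std::string src;
    std::size_t pos;

    explicit Parser(std::string s) : src(std::move(s)), pos(0) {}

    char peek() const { return pos < src.size() ? src[pos] : '\0'; }

    // factor ::= number | '(' expr ')'
    int factor() {
        if (peek() == '(') {
            ++pos;                     // consume '('
            int v = expr();
            ++pos;                     // consume ')'
            return v;
        }
        int v = 0;
        while (std::isdigit(static_cast<unsigned char>(peek())))
            v = v * 10 + (src[pos++] - '0');
        return v;
    }

    // term ::= factor (('*' | '/') factor)*
    int term() {
        int v = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src[pos++];
            int r = factor();
            v = (op == '*') ? v * r : v / r;
        }
        return v;
    }

    // expr ::= term (('+' | '-') term)*
    int expr() {
        int v = term();
        while (peek() == '+' || peek() == '-') {
            char op = src[pos++];
            int r = term();
            v = (op == '+') ? v + r : v - r;
        }
        return v;
    }
};
```

The grammar is encoded directly in the call structure, which is exactly why these parsers are so easy to enhance: adding a construct means adding a function.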
|
|
|
|
|
Did you know Raymond Burr had a brother who was a lumberjack?
His name was Tim!
Ok I'm bored, I promise this will be the last joke this year!
Did a little mechanic work today.
Put a rear end in a recliner!
JaxCoder.com
|
|
|
|
|
Plausible. He was born in British Columbia!
|
|
|
|
|
|
That will be enough of that. There is an expectation of no more Raymond Burr or Ray Charles references in the lounge in the new year...
That's right - we need Ray Bans to protect our 2020 vision.
Hey Mike - get my coat while you are over there getting yours.
I, for one, like Roman Numerals.
|
|
|
|
|
Lumberjack is in fact a very honorable profession;
after all:
they help us all to see the forest by removing the trees that are in the way.
<< Signature removed due to multiple copyright violations >>
|
|
|
|
|
He had loads of brothers:
Har became a sailor.
Bom joined the air force.
Lim taught Yoga.
Fib became a politician.
Rob went to jail.
Cob moved to Oz.
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
You forgot the rest of the family:
Bar became a hair stylist
Dub became a sound technician
and Num became an accountant.
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
I knew you wood take a crack at this.
/ravi
|
|
|
|
|
Yeah I FELL for it!
Did a little mechanic work today.
Put a rear end in a recliner!
JaxCoder.com
|
|
|
|
|
As reported in Paging-Mr-Cthulhu[^] I'm dealing with "bad" HTML (not that any HTML is truly good).
The first attempt used a Regular Expression to grab the TH and TD elements (within TR elements of course). Hence the above reference. It works.
The second attempt used HtmlAgilityPack with good results, but I doubt I can get it installed on the server, and it didn't provide as much feedback as I'd like. It works.
Sooo... yesterday and today were spent implementing an HTML-to-XML process, using a Regular Expression to grab anything which looks roughly like a tag.
Then the process iterates the tags, matching end tags with start tags, and figuring out how to deal with unmatched start and end tags (and firing events for such aberrations).
The result is an XmlDocument, which enforces well-formed XML (but not necessarily valid XHTML).
It works.
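The core of that tag-matching pass might look something like this sketch (hypothetical names, std::regex standing in for whatever the real code uses): grab anything that looks roughly like a tag, pair end tags with start tags on a stack, and collect the aberrations for the caller:

```cpp
#include <cassert>
#include <regex>
#include <stack>
#include <string>
#include <vector>

// Grab anything that looks roughly like a tag, then pair end tags with
// start tags on a stack. Start tags implicitly closed by a mismatched
// end tag, and stray end tags with no open partner, are reported back
// so a caller could fire events for such aberrations.
std::vector<std::string> findUnmatched(const std::string& html) {
    static const std::regex tag(R"(<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>)");
    std::stack<std::string> open;
    std::vector<std::string> unmatched;
    for (std::sregex_iterator it(html.begin(), html.end(), tag), end;
         it != end; ++it) {
        const bool closing = (*it)[1].length() > 0;
        const std::string name = (*it)[2];
        if (!closing) {
            open.push(name);
            continue;
        }
        std::stack<std::string> probe = open;  // any matching start tag at all?
        bool found = false;
        while (!probe.empty() && !found) {
            if (probe.top() == name) found = true;
            probe.pop();
        }
        if (!found) {
            unmatched.push_back("</" + name + ">");   // stray end tag
        } else {
            while (open.top() != name) {              // implicitly closed start tags
                unmatched.push_back("<" + open.top() + ">");
                open.pop();
            }
            open.pop();                               // the matched start tag
        }
    }
    while (!open.empty()) {                           // start tags never closed
        unmatched.push_back("<" + open.top() + ">");
        open.pop();
    }
    return unmatched;
}
```

From there, emitting matched pairs into an XmlDocument gets you well-formed XML even when the input HTML wasn't.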
|
|
|
|
|
Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn!
|
|
|
|
|
PIEBALDconsult wrote:
(not that any HTML is truly good)
Only a dev could make that complaint.
Programming languages are for nerds like us; HTML makes room for everyone else.
Besides, it's HTML that made the Internet; without HTML there would be no Internet, so it's one of the most important inventions in the entire history of the world.
My God!
I just envisioned a dev-only Internet -- CP, SO, 10,000,000 blogs bitching about the latest Star Wars movie, endless rants about preferred coding standards, and everything else about why one programming language is supreme, and the rest are all garbage.
Not a pretty sight...
I wanna be a eunuchs developer! Pass me a bread knife!
|
|
|
|
|
I've spent several days adding a bunch of foundational features to Parsley, like composition parsing, lex priorities, additional firsts and follows hints, and bugfixing.
The upshot is it now parses Slang except for directives and comments, both of which it currently discards, but I'll fix that shortly. It's not critical. I'm taking a novel approach with comment preservation - or at least I haven't seen it used in other parsers and lexers. I have three major states for a terminal: normal, hidden, or skipped. If a terminal is skipped, the lexer still picks it up, and the parser gives you the skipped tokens on each advance as a separate list, so you can get them "out of band" without interfering with the regular parse. In my hand-rolled Slang parser I just checked for comments everywhere, which is buggy and stupid, but sometimes stupid works.
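A rough sketch of the skipped-terminal idea (illustrative names only, not Parsley's actual API): the lexer still produces the skipped terminals, and advance() hands them back in a separate list so the regular parse never sees them:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Sketch of the three-state terminal idea: Comment and Space are
// "skipped" terminals here. The lexer still produces them, but
// advance() hands them back out of band so the parse never sees them.
enum class Kind { Ident, Comment, Space, Eof };
struct Token { Kind kind; std::string text; };

struct Lexer {
    std::string src;
    std::size_t pos;

    explicit Lexer(std::string s) : src(std::move(s)), pos(0) {}

    // Raw token stream, skipped terminals included.
    Token next() {
        if (pos >= src.size()) return {Kind::Eof, ""};
        const std::size_t start = pos;
        if (std::isspace(static_cast<unsigned char>(src[pos]))) {
            while (pos < src.size() &&
                   std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
            return {Kind::Space, src.substr(start, pos - start)};
        }
        if (src.compare(pos, 2, "//") == 0) {      // line comment to end of line
            while (pos < src.size() && src[pos] != '\n') ++pos;
            return {Kind::Comment, src.substr(start, pos - start)};
        }
        while (pos < src.size() &&
               std::isalnum(static_cast<unsigned char>(src[pos]))) ++pos;
        if (pos == start) ++pos;                   // unknown char: emit as-is
        return {Kind::Ident, src.substr(start, pos - start)};
    }

    // The parser calls this: skipped terminals come back in a separate
    // list, so they never interfere with the regular parse.
    Token advance(std::vector<Token>& skipped) {
        for (;;) {
            Token t = next();
            if (t.kind == Kind::Space || t.kind == Kind::Comment)
                skipped.push_back(t);
            else
                return t;
        }
    }
};
```

The caller gets the comments "out of band" on every advance, instead of having to check for them at every point in the grammar.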
I still have quite a bit to do before it can do all the things the hand rolled parser can do, but at this point it's 80% my grammar that needs to be coded.
The grammar parses, but I'll be putting in => actions to build the CodeDOM tree from the parse tree. That way, if I call EvaluateXXXX() on the parser, I get a CodeDOM tree back.
At that point it will do what the hand-rolled parser does, and I can swap it out and pray I got it right.
But for now, an article will have to suffice.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
looking forward to your new post! Happy New Year!
diligent hands rule....
|
|
|
|
|
Thanks!
I'm always thrilled to know someone appreciates my work. My latest parser is very clever, IMO. It's not my fault. I stumbled onto an idea practically by accident and it worked out pretty well.
I've got it parsing 95% of Slang now - I'm just missing comments and directives, both of which are currently skipped, but I think that's article material.
Happy new year to you and yours as well.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
I've thought about trying to preserve comments as well, so how you do it sounds interesting.
Happy New Year and good luck with the article!
|
|
|
|
|
Thanks! I just posted it. I've run into the problem a lot, and the answer of how to do it depends on what you're using it for.
In previous parsers, I made the *parser*, not the *lexer*, skip over hidden terminals. That way, I could set ShowHidden to true and the parser would spit those out in the main stream without interfering with the parse.
That's great for syntax highlighting, but I needed something a bit different here, for a couple of reasons.
For starters, I still want to hide some things, like whitespace.
Second, all my parser code now is *inside* the parser (where it belongs) rather than outside of it, driving it, as it was in PCK. Because of this, I needed to expose the comments through an alternate means.
"Skipped" solves these problems, even if it does mean carrying a bag of unparsed tokens around every time I advance.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
I tried my way. It works like a charm. Woo! The only issue (not caused by this in particular, but ongoing and getting worse) is that my gplex specs have "a lot" of code in them - not an unreasonable amount yet, but more than lexer specs usually do - so they're a bit bigger than normal.
Oh well. Skipping comments is easy.
lineComment<skipped>='\/\/[^\n]*';
blockComment<skipped,blockEnd="*/">="/*";
blockEnd is another feature I added, though getting gplex to do it was a trick. It makes it far easier to indicate things like C block comments, HTML/XML/SGML comments, and CDATA sections - basically anything with a multicharacter ending condition. Sure, you can do it with lazy matching, but not all lexers support that (I turned it off for performance) - this way is faster and clearer.
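The blockEnd mechanic boils down to: let the lexer's DFA match only the short opener, then race to the multicharacter terminator with a plain substring search. A sketch of that second step (hypothetical helper, not gplex output):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// blockEnd-style scanning: the DFA matches only the short opener
// ("/*", "<!--", "<![CDATA["), then this helper races to the
// multicharacter terminator with a plain substring search - no lazy
// matching needed. Returns the index one past the end marker, or
// std::string::npos if the block is unterminated.
std::size_t scanBlockEnd(const std::string& src, std::size_t afterOpen,
                         const std::string& endMarker) {
    const std::size_t at = src.find(endMarker, afterOpen);
    if (at == std::string::npos) return std::string::npos;  // unterminated block
    return at + endMarker.size();
}
```

A linear scan for a fixed terminator is both faster and simpler than running the regex engine in lazy mode over the comment body.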
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
I'd have to adapt your strategy because my code is totally hand-rolled. This function would probably have to assemble comments and save them in an adjunct vector of strings. It's invoked from many places and I'm loath to spend time speculatively constructing and passing around strings.
size_t Lexer::NextPos(size_t pos) const
{
    while(pos < size_)
    {
        auto c = source_->at(pos);
        switch(c)
        {
        case SPACE:
        case CRLF:
        case TAB:
            ++pos;
            break;
        case '/':
            if(++pos >= size_) return string::npos;
            switch(source_->at(pos))
            {
            case '/':   // line comment: skip to end of line
                pos = source_->find(CRLF, pos);
                if(pos == string::npos) return pos;
                ++pos;
                break;
            case '*':   // block comment: skip to closing delimiter
                if(++pos >= size_) return string::npos;
                pos = source_->find(COMMENT_END_STR, pos);
                if(pos == string::npos) return string::npos;
                pos += 2;
                break;
            default:    // lone '/' is a real token
                return --pos;
            }
            break;
        case BACKSLASH: // backslash-newline is a line continuation
            if(++pos >= size_) return string::npos;
            if(source_->at(pos) != CRLF) return pos - 1;
            ++pos;
            break;
        default:
            return pos;
        }
    }
    return string::npos;
}
modified 4-Jan-20 10:04am.
|
|
|
|
|
Well, yours does work better than mine, since you accounted for line continuation and I forgot about it.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
A relatively recent addition.
|
|
|
|
|
I still have to figure out how to match multiline tokens with gplex/lex/flex.
Have you ever used those tools? Had any luck with them?
I'm using gplex now, and it works, but it's a bit clunky sometimes. I'd like to write my own scanner generator that supports Unicode, but I need to understand NFA regex first, and I only understand DFA regex.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
No, I've never used those tools. Usually I start something that grows organically, at which point it's too late to use them. But even if they might help, I'd rather write the code from scratch than search for a tool, read its manual, get it running, and maybe discover, after investing lots of time on it, that it has show-stopping deficiencies. Mostly it's because I hate spending time configuring stuff.
|
|
|
|