|
I had to cut back on coffee after I finally realized that drinking too much of it gave me a chronic lightheaded feeling, the kind you get after staying in bed too long while ill.
So now I spend the entire 90% talking to my rubber duck. 🦆
modified 1-Jan-20 11:02am.
|
|
|
|
|
Recursive descent parsers are the cat's ass! Others are inscrutable by comparison. In our compiler course, we had a choice of what kind of parser to implement. I was the only one who chose recursive descent. It was easy to enhance, so it ended up handling a larger subset of Algol than the others. But boy, was it slow! It was written in Simula-67, running on a PDP-10, and probably backtracked too much. Adding backtracking statistics would have revealed if it was doing something silly.
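For anyone who hasn't written one, a predictive recursive descent parser really is just one function per nonterminal. A minimal sketch for integer arithmetic (nothing like an Algol subset, and with one character of lookahead so it never backtracks; all names here are illustrative):

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Minimal predictive recursive-descent parser for integer arithmetic.
// One member function per nonterminal; peek() provides the single
// character of lookahead that picks each production, so no
// backtracking is ever needed.
struct Parser {
    std::string src;
    std::size_t pos;

    explicit Parser(std::string s) : src(std::move(s)), pos(0) {}

    char peek() const { return pos < src.size() ? src[pos] : '\0'; }

    // factor ::= number | '(' expr ')'
    int factor() {
        if (peek() == '(') {
            ++pos;                     // consume '('
            int v = expr();
            ++pos;                     // consume ')'
            return v;
        }
        int v = 0;
        while (std::isdigit(static_cast<unsigned char>(peek())))
            v = v * 10 + (src[pos++] - '0');
        return v;
    }

    // term ::= factor (('*' | '/') factor)*
    int term() {
        int v = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src[pos++];
            int r = factor();
            v = (op == '*') ? v * r : v / r;
        }
        return v;
    }

    // expr ::= term (('+' | '-') term)*
    int expr() {
        int v = term();
        while (peek() == '+' || peek() == '-') {
            char op = src[pos++];
            int r = term();
            v = (op == '+') ? v + r : v - r;
        }
        return v;
    }
};
```

The grammar is encoded directly in the call structure, which is exactly why these parsers are so easy to enhance: adding a construct means adding a function.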
|
|
|
|
|
Did you know Raymond Burr had a brother who was a lumberjack?
His name was Tim!
Ok I'm bored, I promise this will be the last joke this year!
Did a little mechanic work today.
Put a rear end in a recliner!
JaxCoder.com
|
|
|
|
|
Plausible. He was born in British Columbia!
|
|
|
|
|
|
That will be enough of that. There is an expectation of no more Raymond Burr or Ray Charles references in the lounge in the new year...
That's right - we need Ray Bans to protect our 2020 vision.
Hey Mike - get my coat while you are over there getting yours.
I, for one, like Roman Numerals.
|
|
|
|
|
Lumberjack is in fact a very honorable profession;
after all:
they help us all to see the forest by removing the trees that are in the way.
<< Signature removed due to multiple copyright violations >>
|
|
|
|
|
He had loads of brothers:
Har became a sailor.
Bom joined the air force.
Lim taught Yoga.
Fib became a politician.
Rob went to jail.
Cob moved to Oz.
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
You forgot the rest of the family:
Bar became a hair stylist
Dub became a sound technician
and Num became an accountant.
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
I knew you wood take a crack at this.
/ravi
|
|
|
|
|
Yeah I FELL for it!
Did a little mechanic work today.
Put a rear end in a recliner!
JaxCoder.com
|
|
|
|
|
As reported in Paging-Mr-Cthulhu[^] I'm dealing with "bad" HTML (not that any HTML is truly good).
The first attempt used a Regular Expression to grab the TH and TD elements (within TR elements of course). Hence the above reference. It works.
The second attempt used HtmlAgilityPack with good results, but I doubt I can get it installed on the server, and it didn't provide as much feedback as I'd like. It works.
Sooo... yesterday and today were spent implementing an HTML-to-XML process, using a Regular Expression to grab anything which looks roughly like a tag.
Then the process iterates the tags, matching end tags with start tags, and figuring out how to deal with unmatched start and end tags (and firing events for such aberrations).
The result is an XmlDocument, which enforces well-formed XML (but not necessarily valid XHTML).
It works.
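The core of that tag-matching pass might look something like this sketch (hypothetical names, std::regex standing in for whatever the real code uses): grab anything that looks roughly like a tag, pair end tags with start tags on a stack, and collect the aberrations for the caller:

```cpp
#include <cassert>
#include <regex>
#include <stack>
#include <string>
#include <vector>

// Grab anything that looks roughly like a tag, then pair end tags with
// start tags on a stack. Start tags implicitly closed by a mismatched
// end tag, and stray end tags with no open partner, are reported back
// so a caller could fire events for such aberrations.
std::vector<std::string> findUnmatched(const std::string& html) {
    static const std::regex tag(R"(<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>)");
    std::stack<std::string> open;
    std::vector<std::string> unmatched;
    for (std::sregex_iterator it(html.begin(), html.end(), tag), end;
         it != end; ++it) {
        const bool closing = (*it)[1].length() > 0;
        const std::string name = (*it)[2];
        if (!closing) {
            open.push(name);
            continue;
        }
        std::stack<std::string> probe = open;  // any matching start tag at all?
        bool found = false;
        while (!probe.empty() && !found) {
            if (probe.top() == name) found = true;
            probe.pop();
        }
        if (!found) {
            unmatched.push_back("</" + name + ">");   // stray end tag
        } else {
            while (open.top() != name) {              // implicitly closed start tags
                unmatched.push_back("<" + open.top() + ">");
                open.pop();
            }
            open.pop();                               // the matched start tag
        }
    }
    while (!open.empty()) {                           // start tags never closed
        unmatched.push_back("<" + open.top() + ">");
        open.pop();
    }
    return unmatched;
}
```

From there, emitting matched pairs into an XmlDocument gets you well-formed XML even when the input HTML wasn't.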
|
|
|
|
|
Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn!
|
|
|
|
|
PIEBALDconsult wrote:
(not that any HTML is truly good)
Only a dev could make that complaint.
Programming languages are for nerds like us; HTML makes room for everyone else.
Besides, it's HTML that made the Internet; without HTML there would be no Internet, so it's one of the most important inventions in the entire history of the world.
My God!
I just envisioned a dev-only Internet -- CP, SO, 10,000,000 blogs bitching about the latest Star Wars movie, endless rants about preferred coding standards, and everything else about why one programming language is supreme, and the rest are all garbage.
Not a pretty sight...
I wanna be a eunuchs developer! Pass me a bread knife!
|
|
|
|
|
I've spent several days adding a bunch of foundational features to Parsley, like composition parsing, lex priorities, additional firsts and follows hints, and bugfixing.
The upshot is it now parses Slang except for directives and comments, both of which it currently discards, but I'll fix that shortly. It's not critical. I'm taking a novel approach with comment preservation - or at least I haven't seen it used in other parsers and lexers. I have three major states for a terminal: normal, hidden, or skipped. If a terminal is skipped, the lexer still picks it up, and the parser gives you the skipped tokens on each advance as a separate list, so you can get them "out of band" without interfering with the regular parse. In my hand-rolled Slang parser I just checked for comments everywhere, which is buggy and stupid, but sometimes stupid works.
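A rough sketch of the skipped-terminal idea (illustrative names only, not Parsley's actual API): the lexer still produces the skipped terminals, and advance() hands them back in a separate list so the regular parse never sees them:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Sketch of the three-state terminal idea: Comment and Space are
// "skipped" terminals here. The lexer still produces them, but
// advance() hands them back out of band so the parse never sees them.
enum class Kind { Ident, Comment, Space, Eof };
struct Token { Kind kind; std::string text; };

struct Lexer {
    std::string src;
    std::size_t pos;

    explicit Lexer(std::string s) : src(std::move(s)), pos(0) {}

    // Raw token stream, skipped terminals included.
    Token next() {
        if (pos >= src.size()) return {Kind::Eof, ""};
        const std::size_t start = pos;
        if (std::isspace(static_cast<unsigned char>(src[pos]))) {
            while (pos < src.size() &&
                   std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
            return {Kind::Space, src.substr(start, pos - start)};
        }
        if (src.compare(pos, 2, "//") == 0) {      // line comment to end of line
            while (pos < src.size() && src[pos] != '\n') ++pos;
            return {Kind::Comment, src.substr(start, pos - start)};
        }
        while (pos < src.size() &&
               std::isalnum(static_cast<unsigned char>(src[pos]))) ++pos;
        if (pos == start) ++pos;                   // unknown char: emit as-is
        return {Kind::Ident, src.substr(start, pos - start)};
    }

    // The parser calls this: skipped terminals come back in a separate
    // list, so they never interfere with the regular parse.
    Token advance(std::vector<Token>& skipped) {
        for (;;) {
            Token t = next();
            if (t.kind == Kind::Space || t.kind == Kind::Comment)
                skipped.push_back(t);
            else
                return t;
        }
    }
};
```

The caller gets the comments "out of band" on every advance, instead of having to check for them at every point in the grammar.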
I still have quite a bit to do before it can do all the things the hand rolled parser can do, but at this point it's 80% my grammar that needs to be coded.
The grammar parses, but I'll be putting in => actions to build the CodeDOM tree from the parse tree. That way, if I call EvaluateXXXX() on the parser, I get a CodeDOM tree back.
At that point it will do what the hand-rolled parser does, and I can swap it out and pray I got it right.
But for now, an article will have to suffice.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
looking forward to your new post! Happy New Year!
diligent hands rule....
|
|
|
|
|
Thanks!
I'm always thrilled to know someone appreciates my work. My latest parser is very clever, IMO. It's not my fault. I stumbled onto an idea practically by accident and it worked out pretty well.
I've got it parsing 95% of Slang now - I'm just missing comments and directives, both of which are currently skipped, but I think that's article material.
Happy new year to you and yours as well.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
I've thought about trying to preserve comments as well, so how you do it sounds interesting.
Happy New Year and good luck with the article!
|
|
|
|
|
Thanks! I just posted it. I've run into the problem a lot, and the answer of how to do it depends on what you're using it for.
In previous parsers, I made the *parser*, not the *lexer*, skip over hidden terminals. That way, I could set ShowHidden to true and the parser would spit those out in the main stream without interfering with the parse.
That's great for syntax highlighting, but I needed something a bit different here, for a couple of reasons.
For starters, I still want to hide some things, like whitespace.
Second, all my parser code now is *inside* the parser (where it belongs) rather than outside of it, driving it, as it was in PCK. Because of this, I needed to expose the comments through an alternate means.
"Skipped" solves these problems, even if it does mean carrying a bag of unparsed tokens around every time I advance.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
I tried my way. It works like a charm. Woo! The only issue (not caused by this in particular, but ongoing and getting worse) is that my gplex specs have "a lot" of code in them - not an unreasonable amount yet, but more than lexer specs usually do - so they're a bit bigger than normal.
Oh well. Skipping comments is easy.
lineComment<skipped>='\/\/[^\n]*';
blockComment<skipped,blockEnd="*/">="/*";
blockEnd is another feature I added, though getting gplex to do it was a trick. It makes it far easier to indicate things like C block comments, HTML/XML/SGML comments, and CDATA sections - basically anything with a multicharacter ending condition. Sure, you can do it with lazy matching, but not all lexers support that (I turned it off for performance) - this way is faster and clearer.
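The blockEnd mechanic boils down to: let the lexer's DFA match only the short opener, then race to the multicharacter terminator with a plain substring search. A sketch of that second step (hypothetical helper, not gplex output):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// blockEnd-style scanning: the DFA matches only the short opener
// ("/*", "<!--", "<![CDATA["), then this helper races to the
// multicharacter terminator with a plain substring search - no lazy
// matching needed. Returns the index one past the end marker, or
// std::string::npos if the block is unterminated.
std::size_t scanBlockEnd(const std::string& src, std::size_t afterOpen,
                         const std::string& endMarker) {
    const std::size_t at = src.find(endMarker, afterOpen);
    if (at == std::string::npos) return std::string::npos;  // unterminated block
    return at + endMarker.size();
}
```

A linear scan for a fixed terminator is both faster and simpler than running the regex engine in lazy mode over the comment body.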
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
I'd have to adapt your strategy because my code is totally hand-rolled. This function would probably have to assemble comments and save them in an adjunct vector of strings. It's invoked from many places and I'm loath to spend time speculatively constructing and passing around strings.
size_t Lexer::NextPos(size_t pos) const
{
    while(pos < size_)
    {
        auto c = source_->at(pos);
        switch(c)
        {
        case SPACE:
        case CRLF:
        case TAB:
            ++pos;
            break;
        case '/':
            if(++pos >= size_) return string::npos;
            switch(source_->at(pos))
            {
            case '/':   // line comment: skip to end of line
                pos = source_->find(CRLF, pos);
                if(pos == string::npos) return pos;
                ++pos;
                break;
            case '*':   // block comment: skip to closing delimiter
                if(++pos >= size_) return string::npos;
                pos = source_->find(COMMENT_END_STR, pos);
                if(pos == string::npos) return string::npos;
                pos += 2;
                break;
            default:    // lone '/' is a real token
                return --pos;
            }
            break;
        case BACKSLASH: // backslash-newline is a line continuation
            if(++pos >= size_) return string::npos;
            if(source_->at(pos) != CRLF) return pos - 1;
            ++pos;
            break;
        default:
            return pos;
        }
    }
    return string::npos;
}
modified 4-Jan-20 10:04am.
|
|
|
|
|
Well, yours does work better than mine, since you accounted for line continuation and I forgot about it.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
A relatively recent addition.
|
|
|
|
|
I still have to figure out how to match multiline tokens with gplex/lex/flex.
Have you ever used those tools? Had any luck with them?
I'm using gplex now, and it works, but it's a bit clunky sometimes. I'd like to write my own scanner generator that supports Unicode, but I need to understand NFA regex first, and I only understand DFA regex.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
No, I've never used those tools. Usually I start something that grows organically, at which point it's too late to use them. But even if they might help, I'd rather write the code from scratch than search for a tool, read its manual, get it running, and maybe discover, after investing lots of time on it, that it has show-stopping deficiencies. Mostly it's because I hate spending time configuring stuff.
|
|
|
|