I've spent several days adding a bunch of foundational features to Parsley, like composition parsing, lex priorities, and additional firsts and follows hints, plus bugfixing.
The upshot is it now parses Slang, except for directives and comments, both of which it currently discards, but I'll fix that shortly. It's not critical. I'm taking a novel approach to comment preservation - or at least I haven't seen it in other parsers and lexers. A terminal has three major states: normal, hidden, or skipped. If it's skipped, the lexer still picks it up, and the parser gives you the skipped tokens on each advance as a separate list, so you can get them "out of band" without interfering with the regular parse. In my hand-rolled Slang parser I just checked for comments everywhere, which is buggy and stupid, but sometimes stupid works.
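In code terms, the shape of the idea is something like this - a minimal sketch, where the names (TerminalState, Token, ParserSketch, Advance, Skipped) are ones I'm making up here for illustration, not Parsley's actual API:

using System.Collections.Generic;

// The three states a terminal can be declared with.
enum TerminalState { Normal, Hidden, Skipped }

class Token
{
    public string Symbol;  // which terminal matched
    public string Value;   // the matched text
}

class ParserSketch
{
    // Skipped tokens the lexer picked up since the last advance.
    // They ride along "out of band" and never enter the parse itself.
    public List<Token> Skipped { get; } = new List<Token>();

    public bool Advance()
    {
        Skipped.Clear();
        // ... lex forward: Normal tokens drive the parse, Hidden ones
        // are suppressed, and Skipped ones land in the Skipped list
        // instead of being handed to the parse ...
        return false; // placeholder
    }
}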
I still have quite a bit to do before it can do all the things the hand-rolled parser can do, but at this point 80% of what remains is my grammar that needs to be coded.
The grammar parses, but I'll be putting in => actions to build the CodeDOM tree from the parse tree. That way, if I call EvaluateXXXX() on the parser, I get a CodeDOM tree back.
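To give the flavor of it, a production with an action might end up looking something like this - a purely hypothetical sketch, since I'm inventing everything here (the production, the "values" children array) except the => marker and the fact that the actions emit System.CodeDom nodes:

// Hypothetical - illustrative grammar syntax only:
AddExpr : AddExpr "+" MulExpr => {
    return new CodeBinaryOperatorExpression(
        (CodeExpression)values[0],
        CodeBinaryOperatorType.Add,
        (CodeExpression)values[2]);
};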
At that point it will do what the hand-rolled parser does, and I can swap it out and pray I got it right.
But for now, an article will have to suffice.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
Looking forward to your new post! Happy New Year!
diligent hands rule....
Thanks!
I'm always thrilled to know someone appreciates my work. My latest parser is very clever, IMO. It's not my fault. I stumbled onto an idea practically by accident and it worked out pretty well.
I've got it parsing 95% of Slang now - I'm just missing comments and directives, both of which are currently skipped, but I think that's article material.
Happy new year to you and yours as well.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
I've thought about trying to preserve comments as well, so how you do it sounds interesting.
Happy New Year and good luck with the article!
Thanks! I just posted it. I've run into this problem a lot, and the right answer depends on what you're using it for.
In previous parsers, I made the *parser*, not the *lexer*, skip over hidden terminals. That way, I could set ShowHidden to true and the parser would spit those out in the main stream without interfering with the parse.
That's great for syntax highlighting, but I needed something a bit different here for a couple of reasons.
For starters, I still want to hide some things, like whitespace.
Second, all my parser code now is *inside* the parser (where it belongs) rather than outside of it, driving it, as it was in PCK. Because of this, I needed to expose the comments by an alternate means.
"Skipped" solves both problems, even if it does mean carrying a bag of unparsed tokens around every time I advance.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
I tried my way. It works like a charm. Woo! The only issue - and this isn't because of this feature in particular; it's ongoing and getting worse - is that my gplex specs have "a lot" of code in them. Not an unreasonable amount yet, but more than lexer specs usually have, so they're a bit bigger than normal.
Oh well. Skipping comments is easy.
lineComment<skipped>='\/\/[^\n]*';
blockComment<skipped,blockEnd="*/">="/*";
blockEnd is another feature I added, though getting gplex to do it was a trick. It makes it far easier to indicate things like C block comments, HTML/XML/SGML comments, CDATA sections - basically anything with a multicharacter ending condition. Sure, you can do it with lazy matching, but not all lexers support that (I turned it off for performance) - this way is faster and clearer.
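For instance, in the same spec syntax as above, the other classic multi-character terminators fall out the same way (the symbol names here are just my own examples):

cdataSection<skipped,blockEnd="]]>">="<![CDATA[";
sgmlComment<skipped,blockEnd="-->">="<!--";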
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
I'd have to adapt your strategy because my code is totally hand-rolled. This function would probably have to assemble comments and save them in an adjunct vector of strings. It's invoked from many places and I'm loath to spend time speculatively constructing and passing around strings.
//  Returns the position of the next token at or after pos, skipping
//  whitespace, comments, and line continuations.  Returns string::npos
//  if nothing but whitespace and comments remain.
//
size_t Lexer::NextPos(size_t pos) const
{
   while(pos < size_)
   {
      auto c = source_->at(pos);

      switch(c)
      {
      case SPACE:
      case CRLF:
      case TAB:
         //  Plain whitespace: keep going.
         //
         ++pos;
         break;

      case '/':
         //  Possibly a comment.  A trailing '/' is treated as end
         //  of input.
         //
         if(++pos >= size_) return string::npos;

         switch(source_->at(pos))
         {
         case '/':
            //  A // comment: skip to the end of the line.
            //
            pos = source_->find(CRLF, pos);
            if(pos == string::npos) return pos;
            ++pos;
            break;

         case '*':
            //  A /* comment: skip past the closing delimiter.
            //
            if(++pos >= size_) return string::npos;
            pos = source_->find(COMMENT_END_STR, pos);
            if(pos == string::npos) return string::npos;
            pos += 2;
            break;

         default:
            //  A lone '/': back up and return its position.
            //
            return --pos;
         }
         break;

      case BACKSLASH:
         //  A backslash followed by a newline is a line continuation;
         //  anything else means the backslash starts the next token.
         //
         if(++pos >= size_) return string::npos;
         if(source_->at(pos) != CRLF) return pos - 1;
         ++pos;
         break;

      default:
         //  Found the start of the next token.
         //
         return pos;
      }
   }

   return string::npos;
}
Well, yours does work better than mine, since you accounted for line continuations and I forgot about them.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
A relatively recent addition.
I still have to figure out how to match multiline tokens with gplex/lex/flex.
Have you ever used those tools? Had any luck with them?
I'm using gplex now, and it works, but it's a bit clunky sometimes. I'd like to write my own scanner generator that supports Unicode, but I need to understand NFA regex first, and I only understand DFA regex.
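For reference, the textbook lex/flex answer seems to be an exclusive start condition - something like the sketch below (I'm going from the flex docs here; I haven't proven it out in gplex yet, hence the question):

%x COMMENT
%%
"/*"               BEGIN(COMMENT);
<COMMENT>"*/"      BEGIN(INITIAL);
<COMMENT>(.|\n)    ; /* swallow the comment body */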
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
No, I've never used those tools. Usually I start something that grows organically, at which point it's too late to use them. But even if they might help, I'd rather write the code from scratch than search for a tool, read its manual, get it running, and maybe discover, after investing lots of time on it, that it has show-stopping deficiencies. Mostly it's because I hate spending time configuring stuff.
Those scanner generators are really easy to use once you know how. The issue I have with building my own tokenizer/scanner by hand is that they get awfully complicated for real-world languages, as you've probably found. I prefer to use regex to define my lexemes/terminal tokens as it makes things easier - less code I have to write and debug. Regex is like nothing to me. I'd almost rather write my own scanner generator and then use that than write a scanner by hand.
I have an unrelated question for you.
I have two options with respect to parsing C#: I can parse in two passes, parsing only as far as types the first time, just to gather complete type information so I can parse the rest accurately (I need type information to disambiguate the parse, just like you do to parse C, only worse).
My other option - and what I've done with the hand-rolled parser - is simply to punt and *correct* the AST after the fact, rather than touching the parse tree. Basically, *after* I've finished building my AST out of my parse tree, I go back and correct the AST with type information.
I have this nasty method called "Patch" which visits my entire tree, with type info, looking for bits in the tree it needs to patch. Currently it's slow, but I have some ideas to speed it up.
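For concreteness, Patch amounts to something like this hypothetical sketch - only the name "Patch" is real; the rest is made up to show the shape:

using System.Collections.Generic;

// Illustrative AST node and post-hoc fixup pass - not the actual code.
class AstNode
{
    public List<AstNode> Children = new List<AstNode>();
    public bool NeedsPatch;  // e.g. carries an unresolved type reference
}

class Patcher
{
    // Walks the entire tree looking for bits to fix up with type info.
    public void Patch(AstNode node, IDictionary<string, object> typeInfo)
    {
        if (node.NeedsPatch)
        {
            // ... rewrite the node using typeInfo ...
        }
        foreach (var child in node.Children)
            Patch(child, typeInfo);
    }
}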
The first way might be more efficient, but it also might be a dead end. I've never tried it.
Any thoughts? What would you do?
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
This is after a rather large dinner with wine, so I hope it helps.
I don't understand the difference between the parse tree and the AST. I only have one tree, but maybe that's because I'm only doing C++ whereas you're converting one language to another.
Once I have a subtree for something that is "executable" (e.g. an enum, typedef, data declaration or definition, function declaration or definition), I invoke a virtual EnterBlock function on its root, which is like invoking an interpreter. Each node invokes EnterBlock on its descendants, so it proceeds depth first. A QualName (a possibly qualified name) or DataSpec (a QualName tagged with pointers, references, and/or const) implements this by resolving its name based on the current scope. I hope this is what you mean by "getting the type information". It also causes stuff to be pushed onto the operand (types) and operator stacks.
Could this wait until all of the code is parsed? I don't see why not, and I don't see how it would be more or less efficient. In fact, name resolution can occur later if there are errors during the parsing or interpretation. If you run the >check tool on the code, one of the things it does (to clean up #include lists) is to ask each file for all of the things that it uses. Any nodes that have names but that weren't "interpreted" because an error caused them to be skipped will then try to resolve their names.
Greg Utas wrote: Could this wait until all of the code is parsed? I don't see why not
That's what I've been doing. I think I just need to implement a more efficient visit on the tree I'm using.
Maybe I'll make it so it can visit only marked nodes.
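Something like this, reusing the illustrative AstNode from my earlier sketch: record the nodes that need fixing while the AST is built, then patch just those instead of walking everything. Again, all hypothetical names:

using System.Collections.Generic;

// Hypothetical refinement of Patch: visit only marked nodes.
class MarkedPatcher
{
    readonly List<AstNode> _marked = new List<AstNode>();

    // Called during AST construction whenever a node is built that
    // will need type info later.
    public void Mark(AstNode node) { _marked.Add(node); }

    // Now patching is proportional to the marked nodes, not the tree.
    public void Patch(IDictionary<string, object> typeInfo)
    {
        foreach (var node in _marked)
        {
            // ... resolve the node against typeInfo ...
        }
        _marked.Clear();
    }
}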
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
return _Compare(x.Key.Key, y.Key.Key);
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
It's sunny here too. Wall to wall sunshine and blue skies. Nary a cloud.
-15 right now, but it's a dry cold and the sun feels nice on your numb face.
Quote: -15 right now
I am glad to be a "Florida Man". It was cold this morning. 55 F. That is: PLUS 55!
I thought all the "Florida Mans" were here!
As well as everyone from every other state and walk of life, here for the holiday craziness in Summit County, Colorado.
Quote: I thought all the "Florida Mans" were here!
Are you telling me there is more than one? That I am not unique? Rats! Double Rats!
Ditto.....
A human being should be able to change a diaper, plan an invasion, butcher a hog, navigate a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects! - Lazarus Long
Living in x, it's grey and cold here. Quite the contrast from 74 and "clear with periodic clouds" in y.Key.Key
But hey, I moved to the East Coast from San Diego years ago to experience actual seasons and weather.
Who writes the subtitles on the Code Project Daily News?
They really crack me up.
Example:
Scientists say they've found a way to solve the 'oldest open question in astrophysics' (3 body problem)
OK, now do four
Kent Sharkey does.
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!