Click here to Skip to main content
14,453,505 members

Welcome to the Lounge

   

For discussing anything related to a software developer's life but is not for programming questions. Got a programming question?

The Lounge is rated Safe For Work. If you're about to post something inappropriate for a shared office environment, then don't post it. No ads, no abuse, and no programming questions. Trolling, (political, climate, religious or whatever) will result in your account being removed.
 
GeneralRe: I got this new breakfast cereal called "Lorem ipsum" Pin
Rick York22-Jan-20 5:55
mveRick York22-Jan-20 5:55 
GeneralCCC 22/1/2019 Pin
super21-Jan-20 22:53
professionalsuper21-Jan-20 22:53 
GeneralRe: CCC 22/1/2019 Pin
OriginalGriff21-Jan-20 22:54
communityengineerOriginalGriff21-Jan-20 22:54 
GeneralRe: CCC 22/1/2019 Pin
Duncan Edwards Jones21-Jan-20 23:03
professionalDuncan Edwards Jones21-Jan-20 23:03 
GeneralRe: CCC 22/1/2019 Pin
OriginalGriff21-Jan-20 23:17
communityengineerOriginalGriff21-Jan-20 23:17 
GeneralRe: CCC 22/1/2019- we have a winner. Pin
super22-Jan-20 2:35
professionalsuper22-Jan-20 2:35 
GeneralMessage Closed Pin
22-Jan-20 2:34
Memberbay5s22-Jan-20 2:34 
GeneralA tale of two lexers Pin
honey the codewitch21-Jan-20 22:50
mvahoney the codewitch21-Jan-20 22:50 
(Reference source provided below)

So optimizing my lexer machine is slow going, but i'm feeling clever.

Introducing a new switch/case instruction worked, cutting the lexing time for the DFA version of my simple test lexer to as little as half what the NFA version costs

NFA lex: fubar bar 123 1foo bar -243 0
Lexed in 0.0217 msec
DFA lex: fubar bar 123 1foo bar -243 0
Lexed in 0.0134 msec

The trouble i'm having is
A) loading lazy quantified regexs into the NFA graph. I don't know how to represent them
B) Doing a partial, opportunistic transformation from NFA to DFA. I am running into trouble due to my use of a rangeset as my basic input alphabet element among other things.

Once I can do that though, I'll have significantly optimized this little machine. And I'll have what will be pretty close to a "real" optimizing compiler for it.

NFA version:
jmp id, int, space, error

id: ; [A-Z_a-z][0-9A-Z_a-z]*
set "A".."Z", "_", "a".."z"
id_loop: jmp id_part, id_done
id_part: set "0".."9", "A".."Z", "_", "a".."z"
jmp id_loop
id_done: save 1
match 0

int: ; (0|\-?[1-9][0-9]*)
jmp int_zero, int_nonzero
int_zero:
char "0"
jmp int_done
int_nonzero: jmp int_neg, int_pos
int_neg: char "-"
int_pos: set "1".."9"
int_loop: 
jmp int_part, int_done
int_part: set "0".."9"
jmp int_loop
int_done: save 1
match 1

space: ; (\t|\n|\v|\f|\r| )
set "\t", "\n", "\v", "\f", "\r", " "
save 1
match 2

error: ; anything not caught above returns -1
any
save 1
match -1


DFA version:
save 0
switch case "A".."Z","_","a".."z":id_part, case "0":int_done,case "-":int_neg,case "1".."9":int_digits,case "\t", "\n", "\v", "\f", "\r", " ":space,default:error

id_part:
switch case "0".."9","A".."Z","_","a".."z":id_part, default:id_done
id_done:
save 1
match 0

int_neg:
set "1".."9"
int_digits:
switch case "0".."9":int_digits, default:int_done
int_done:
save 1
match 1

space:
save 1
match 2

error:
any
save 1
match -1

Real programmers use butterflies

GeneralRe: A tale of two lexers Pin
Sander Rossel21-Jan-20 23:26
professionalSander Rossel21-Jan-20 23:26 
GeneralRe: A tale of two lexers Pin
OriginalGriff21-Jan-20 23:34
communityengineerOriginalGriff21-Jan-20 23:34 
GeneralRe: A tale of two lexers Pin
honey the codewitch21-Jan-20 23:49
mvahoney the codewitch21-Jan-20 23:49 
GeneralRe: A tale of two lexers Pin
Sander Rossel21-Jan-20 23:51
professionalSander Rossel21-Jan-20 23:51 
GeneralRe: A tale of two lexers Pin
honey the codewitch21-Jan-20 23:46
mvahoney the codewitch21-Jan-20 23:46 
GeneralRe: A tale of two lexers Pin
Sander Rossel21-Jan-20 23:52
professionalSander Rossel21-Jan-20 23:52 
GeneralRe: A tale of two lexers Pin
honey the codewitch21-Jan-20 23:54
mvahoney the codewitch21-Jan-20 23:54 
GeneralRe: A tale of two lexers Pin
Richard Deeming22-Jan-20 0:57
communityengineerRichard Deeming22-Jan-20 0:57 
GeneralRe: A tale of two lexers Pin
Rick York22-Jan-20 5:57
mveRick York22-Jan-20 5:57 
GeneralRe: A tale of two lexers Pin
Mark_Wallace22-Jan-20 0:23
MemberMark_Wallace22-Jan-20 0:23 
GeneralRe: A tale of two lexers Pin
honey the codewitch22-Jan-20 0:27
mvahoney the codewitch22-Jan-20 0:27 
GeneralFinally! A truly useful update! Pin
Mark_Wallace21-Jan-20 21:42
MemberMark_Wallace21-Jan-20 21:42 
GeneralRe: Finally! A truly useful update! Pin
Sander Rossel21-Jan-20 22:45
professionalSander Rossel21-Jan-20 22:45 
GeneralRe: Finally! A truly useful update! Pin
Mark_Wallace22-Jan-20 0:17
MemberMark_Wallace22-Jan-20 0:17 
GeneralRe: Finally! A truly useful update! Pin
Kornfeld Eliyahu Peter21-Jan-20 22:50
mveKornfeld Eliyahu Peter21-Jan-20 22:50 
GeneralWondering why all those funny names for new programming languages? Pin
Kornfeld Eliyahu Peter21-Jan-20 21:31
mveKornfeld Eliyahu Peter21-Jan-20 21:31 
GeneralRe: Wondering why all those funny names for new programming languages? Pin
Sander Rossel21-Jan-20 22:48
professionalSander Rossel21-Jan-20 22:48 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.