|
OK: I'll take it.
Making amends for
Communist RED
topless (remove header)
costume dRESS
REDRESS
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
I think make amends would be a better fit - just sayin
"We can't stop here - this is bat country" - Hunter S Thompson - RIP
|
|
|
|
|
Hey! Oi don wrote 'em, Oi jus' answer 'em!
|
|
|
|
|
No wot you meen guv
|
|
|
|
|
as any fule kno ...
|
|
|
|
|
...@DaveAuld I thought of you ... Coffee Style
|
|
|
|
|
...and best served with a deep fried Mars bar, I assume.
I am not a fan of coffee, but I do really fancy a scotch egg right now
|
|
|
|
|
|
I'm just beginning my work day right now, having a coffee, and I'm logged into a server named Haggis.
|
|
|
|
|
So Parsley is cool, but the hand-written parser I wrote for Slang is under 100K, and the generated one is 939K. The generated parser is slower, but more accurate and otherwise better, just not 10 times the size better.
For comparison, ANTLR's grammar for parsing C# 6 came to about 800K and change of C# source code, but it didn't parse, so I don't know if it was complete or what was wrong with it.
I'm considering a code synthesis approach which might make the generated code smaller, especially for loops, but I don't know how much I can really save here. At least the emitted code would look cooler =P, closer to hand-written code.
People say size doesn't matter these days (apparently it's all in how you use it), but by my rough calculations this is the difference between 40-50K of binary size and closer to 500K when it's compiled. That means cache lines and locality are going to take a hit, I think, because the working set has to grow accordingly, though I would need to do extensive comparison testing before I could be sure of any of that. Still, it doesn't look good.
hack everything.
|
|
|
|
|
Quote: People say size doesn't matter these days
They are wrong and always have been. Just because people have large disk drives and massive amounts of memory does not mean that lean and mean is not preferable to bloated code.
I have seen code that took only 100K to do the same thing that someone else did in over 1000K. Just remember: complex is easy, but simple is hard. Beautiful code is simple and easy to understand, but is often hard to create.
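On a C compiler that does not do optimization, "i = ++n" can be more efficient than "i = n++", because the post-increment has to keep the old value of n around. In C++, when dealing with classes, the difference can be much worse even with an optimizing compiler, since an overloaded n++ conventionally returns a copy of the old object (n++ is a very bad habit to fall into).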
The goal of code generation should be concise efficient code.
Sorry, I think I got off subject. Just keep in mind that even generated code should be reviewed.
INTP
"Program testing can be used to show the presence of bugs, but never to show their absence." - Edsger Dijkstra
"I have never been lost, but I will admit to being confused for several weeks. " - Daniel Boone
|
|
|
|
|
Meh - parser generators never generate reviewable code. The exception being those that can do code synthesis, but I can't even name one that does that.
Parsley, being recursive descent, generates more readable code than 90% of the generators out there. You can actually debug it and make sense of it.
However, it doesn't mean it's compact. It's just not. Generated code is generated code, and LL(1) parsing is LL(1) parsing. Being that this is the landscape, my code size is not unreasonable.
As I suggested, in generated code size it compares to ANTLR, which is a much more mature and very popular parser generator.
However, Parsley's code is somewhat readable, whereas ANTLR's isn't.
Here's an excerpt of Parsley's generated code (C# in this case):
internal static ParseNode ParseFactorExpression(ParserContext context) {
int line = context.Line;
int column = context.Column;
long position = context.Position;
if ((((((((((((((((((((((((((((((((((((ExpressionParser.add == context.SymbolId)
|| (ExpressionParser.sub == context.SymbolId))
|| (ExpressionParser.not == context.SymbolId))
|| (ExpressionParser.inc == context.SymbolId))
|| (ExpressionParser.dec == context.SymbolId))
|| (ExpressionParser.lparen == context.SymbolId))
|| (ExpressionParser.nullLiteral == context.SymbolId))
|| (ExpressionParser.typeOf == context.SymbolId))
|| (ExpressionParser.defaultOf == context.SymbolId))
|| (ExpressionParser.verbatimStringLiteral == context.SymbolId))
|| (ExpressionParser.characterLiteral == context.SymbolId))
|| (ExpressionParser.integerLiteral == context.SymbolId))
|| (ExpressionParser.floatLiteral == context.SymbolId))
|| (ExpressionParser.stringLiteral == context.SymbolId))
|| (ExpressionParser.boolLiteral == context.SymbolId))
|| (ExpressionParser.newKeyword == context.SymbolId))
|| (ExpressionParser.thisRef == context.SymbolId))
|| (ExpressionParser.baseRef == context.SymbolId))
|| (ExpressionParser.verbatimIdentifier == context.SymbolId))
|| (ExpressionParser.identifier2 == context.SymbolId))
|| (ExpressionParser.boolType == context.SymbolId))
|| (ExpressionParser.charType == context.SymbolId))
|| (ExpressionParser.stringType == context.SymbolId))
|| (ExpressionParser.floatType == context.SymbolId))
|| (ExpressionParser.doubleType == context.SymbolId))
|| (ExpressionParser.decimalType == context.SymbolId))
|| (ExpressionParser.sbyteType == context.SymbolId))
|| (ExpressionParser.byteType == context.SymbolId))
|| (ExpressionParser.shortType == context.SymbolId))
|| (ExpressionParser.ushortType == context.SymbolId))
|| (ExpressionParser.intType == context.SymbolId))
|| (ExpressionParser.uintType == context.SymbolId))
|| (ExpressionParser.longType == context.SymbolId))
|| (ExpressionParser.ulongType == context.SymbolId))
|| (ExpressionParser.objectType == context.SymbolId))) {
System.Collections.Generic.List<ParseNode> children = new System.Collections.Generic.List<ParseNode>();
children.Add(ExpressionParser.ParseUnaryExpression(context));
children.AddRange(ExpressionParser.ParseFactorExpressionPart(context).Children);
return new ParseNode(121, "FactorExpression", children.ToArray(), line, column, position);
}
throw new SyntaxException("Expecting UnaryExpression", line, column, position);
}
Most of the size here is taken up by the FIRST set comparison in the if statement, which is canonical LL(1). Note that 121 *was* a constant, but since it's only ever used internally by the parser and never seen outside of it, I removed constants like this from the list of fields in the class (there can be hundreds of these in a real-world grammar). I had to make a call between code size and readability in that instance. The constant's name is literally "FactorExpression", and the value is always followed by its string name, so I favored size over using a constant here.
Anything seen publicly has constant symbols and when it has constants, the generated code uses the constants.
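For illustration only, here is a minimal sketch of one way the FIRST set test could be emitted as data instead of as a chain of || comparisons. This is not what Parsley generates; the Symbols class, its values, and the FirstSetDemo names are all made up for the example, and only a handful of the 35 symbols are listed. Whether a lookup like this is actually faster than the comparison chain would need measuring.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the generated parser's symbol ids; the values are invented.
static class Symbols
{
    public const int Add = 0, Sub = 1, Not = 2, LParen = 3,
                     IntegerLiteral = 4, Identifier = 5, Semi = 6;
}

static class FirstSetDemo
{
    // A subset of FIRST(FactorExpression), emitted once as a table rather than
    // as one parenthesized comparison per symbol.
    static readonly HashSet<int> FactorExpressionFirst = new HashSet<int>
    {
        Symbols.Add, Symbols.Sub, Symbols.Not, Symbols.LParen,
        Symbols.IntegerLiteral, Symbols.Identifier
    };

    // True when the current symbol can start a FactorExpression.
    public static bool CanStartFactorExpression(int symbolId)
    {
        return FactorExpressionFirst.Contains(symbolId);
    }

    static void Main()
    {
        Console.WriteLine(CanStartFactorExpression(Symbols.LParen)); // True
        Console.WriteLine(CanStartFactorExpression(Symbols.Semi));   // False
    }
}
```

The trade-off is one static field per production in exchange for a one-line condition in each Parse method.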
|
|
|
|
|
Funny how the code generator uses Yoda conditions (see Wikipedia), even though it should be immune to the '==' -> '=' typo problem for which they were invented in the first place.
GOTOs are a bit like wire coat hangers: they tend to breed in the darkness, such that where there once were few, eventually there are many, and the program's architecture collapses beneath them. (Fran Poretto)
|
|
|
|
|
That's because I use them and I wrote the generator. Force of habit.
|
|
|
|
|
I should have come back earlier - I am impressed. I am not knocking your generator; keep it up, because it looks good to me.
As an actual human being I would try to avoid something that looked like that (reformat the code at minimum). Because the first thing I would do with that code is reformat it into something that a human can read without counting braces (brackets or whatever). "Trust, but verify".
There was a time when that whole 'if' statement would be one step in a debugger. You had to stop and look at every variable to figure out why it passed or failed. That is why some modern compilers (VS) allow you to actually step through the individual sub-statements in the 'if' statements. Why that was the case has to do with the language itself, which I cannot explain at the moment.
I got carried away again. In theory, if not in practice, generated code should be more precise than what a human could create. In reality, a human that really knows the language can do better. But, in my experience, that is a very small subset of humans, so keep it up.
|
|
|
|
|
Because I'm using a language-independent renderer, I have no control over the final code format.
Basically, this code was an AST, and that AST is rendered by a 3rd party into VB.NET, C# or whatever.
It's that 3rd-party renderer that puts out the extra parens.
One day I'll rebuild my custom C# renderer that eliminates this and other issues, but it's non-trivial.
Real programmers use butterflies
|
|
|
|
|
Perfect code is about the same length as a piece of string.
I wanna be a eunuchs developer! Pass me a bread knife!
|
|
|
|
|
I like that answer. Although in this case I have a big nasty tangled ball of yarn I suppose. 934K source file. woo
And that's not counting the 235k tokenizer it needs
|
|
|
|
|
Normally, writing as little code as possible is the best solution.
Important hint: this includes the usage of well tested libraries and frameworks.
Press F1 for help or google it.
Greetings from Germany
|
|
|
|
|
Umm, all this code is generated.
So I've written zero code here.
|
|
|
|
|
If it works and meets performance and -ility expectations I wouldn't worry about it. It's always nice to find novel approaches that make the code smaller, cleaner, and/or faster but you could spend months on a solution that might not even exist and might mean absolutely nothing to anyone but yourself. That's time you could have poured into another project
Disclaimer: This advice is a little hypocritical as I have to actively stop myself from doing the same thing.
|
|
|
|
|
Unfortunately I have no idea what is acceptable, because this is a tool for other people to use, not for a specific project or client.
It has competition though, so anything smaller and/or faster is going to be "better", i.e. more desirable.
So there's that.
I have my own concerns about using this parser I generated, because my CodeDomGoKit.brick.cs file is already over 500K, and that's with a parser that's hand-written and under 100K. Not 934K or whatever it was!
It's not a show stopper, but it's almost embarrassing for me to have build tools that are 1MB executables.
|
|
|
|
|
If competing with products in the market is the goal I can see the problem. Usually I try to stick to my original requirements/goals to avoid the endless cycle of "I could probably make this better" but that doesn't seem to be an option here with active competition.
Considering that, my only input would be that generated code will never be as efficient as hand-rolled. I'm running into a similar issue with a project that generates a full type-tree constructor for a given type given the data to "fill" that type tree. There are always going to be edge cases that exist in the general problem space which prevent some optimizations. You can't solve a general problem as efficiently as a specific problem.
Best of luck though! I only understand a fraction of what you post but it sounds like a really cool project
|
|
|
|
|
Jon McKee wrote: I only understand a fraction of what you post but it sounds like a really cool project
Yours too - although I can't quite wrap my head around it. Now I'm curious, are you trying to precompile something you'd have to reflect at runtime?
|
|
|
|
|
Something like that, yeah. I'm creating a delegate for a type that constructs it and all of its dependencies. I'm trying to write the data input/"parser" layer very generically so it can support a lot of cases, but the original idea related to parsing would be something like:
class User {
    [OverrideMethodBind(Method = "Add")]
    [OverrideParameterBind(Type = typeof(Address))]
    List<Address> addresses;
    [DataBind]
    string firstName;
    [DataBind(ID = "LN")]
    string lastName;
}
class Address {
    [DataBind(ID = "street")]
    string streetAddress;
    string city;
    [DataBind]
    StateAbbrev state;
    int zipCode;
}
enum StateAbbrev {
    AL,
    WA,
    MI,
}
In this example you want to create the object based on (state, streetAddress)+ (i.e. one or more times), firstName, and lastName from the data.
You can add metadata to the items of interest, supply "parsers" for various members (based on whatever condition you want - type, member, or ID), and it uses the data to then create (possibly many) Users based on that data. Removes the need for boilerplate related to loading a domain object with parsing results or whatever your data source is (another object, database, etc). So instead of writing something like:
Match m = regex.Match(input); // input: whatever text is being parsed
User u = new User(m.Groups["firstName"].Value, m.Groups["LN"].Value);
Address a = new Address(m.Groups["street"].Value, m.Groups["state"].Value);
u.addresses.Add(a);
You'd write:
ObjectGenerator g = new ObjectGenerator<User>().AddParser(regex);
// or: ObjectGenerator g = new ObjectGenerator(typeof(User)).AddParser(regex);
User user = g.GenerateObject();
User[] users = g.GenerateObjects();
Not a finalized interface, but that's the gist of it.
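To make the attribute-driven idea concrete, here is a rough sketch under heavy assumptions. It is not the project's implementation: it uses plain reflection on every call instead of building a delegate, it handles only string fields, and DataBindAttribute, SimpleBinder, and the dictionary input are all hypothetical names invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute, loosely modeled on the [DataBind] usage above.
[AttributeUsage(AttributeTargets.Field)]
sealed class DataBindAttribute : Attribute
{
    public string ID { get; set; }
}

class User
{
    [DataBind] public string firstName;
    [DataBind(ID = "LN")] public string lastName;
}

static class SimpleBinder
{
    // Creates a T and fills each [DataBind] string field from the dictionary,
    // keyed by the attribute's ID or, when no ID is given, by the field name.
    public static T Create<T>(IDictionary<string, string> data) where T : class, new()
    {
        T result = new T();
        BindingFlags flags = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
        foreach (FieldInfo field in typeof(T).GetFields(flags))
        {
            DataBindAttribute bind = field.GetCustomAttribute<DataBindAttribute>();
            if (bind == null || field.FieldType != typeof(string)) continue;
            string key = bind.ID ?? field.Name;
            string value;
            if (data.TryGetValue(key, out value)) field.SetValue(result, value);
        }
        return result;
    }
}

static class BinderDemo
{
    static void Main()
    {
        var data = new Dictionary<string, string>
        {
            { "firstName", "Ada" },
            { "LN", "Lovelace" }
        };
        User u = SimpleBinder.Create<User>(data);
        Console.WriteLine(u.firstName + " " + u.lastName); // Ada Lovelace
    }
}
```

A delegate-based version would do the same field discovery once per type and cache the resulting constructor, which is where the generated-versus-hand-rolled efficiency question comes in.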
|
|
|
|
|