|
Or significantly less in woman-years
|
|
|
|
|
It's even shorter in Cat Years.
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
She's a Witch!
|
|
|
|
|
Not to minimize what you have done, but what you have written is more a preprocessor than a compiler (think of the early C++ compilers that translated the code to C). You still rely on the underlying C# compiler to do the work of converting the translated source to executable code.
Writing the optimizer (to name just one component) of a modern compiler is a non-trivial task. I can easily see it taking 20-30 man-years to write.
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
It's not though. All I'd have to do to make it a compiler in the true sense is target IL, not the CodeDom. Maybe I'll do that as well to settle an argument before it happens.
I think you're overestimating what the CodeDOM is doing for me here. I still need to be able to get the current scope from any expression - the codedom doesn't help natively with this at all.
And frankly, if you've done much examination of IL the C# compiler does eff all for optimizing, as I think Super Lloyd and I discussed before - it left him feeling less than super once he disassembled.
Microsoft punts nearly everything to the JIT compiler, which IMO is silly because the JIT can only do peephole optimizations. As I've said before when it comes to C#, you'd be surprised at what it doesn't do in this regard.
Edit: I should add, the CodeDOM only saves me building my own AST. It doesn't parse, but it gives me something to hold my parsed data in. I still have to patch it up because foo.bar is too ambiguous for the CodeDOM - is it referencing an argument? a field? a property? The CodeDOM needs to know. Is it an implicit this reference or an implicit static reference? Or a reference to the base class?
So that means I need full type resolution, for both external and declared types, and I need scope resolution so I can tell what is a variable, what is an argument, what is a field, and where it goes.
This is what the middle tier of a compiler does.
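That middle tier can be sketched in miniature with reflection: given a scope (here just a single .NET type) and a bare name, classify what it refers to. This is illustrative only - `Classify` is a made-up helper, and a real resolver also has to consider declared types, arguments, locals, and overloads:

```csharp
using System;

class Demo {
    // Toy "what is this name?" lookup - the kind of question the middle
    // tier answers for every identifier. Real resolution also handles
    // overloads (GetMethod throws on ambiguous names) and visibility.
    static string Classify(Type scope, string name) {
        if (scope.GetField(name) != null) return "field";
        if (scope.GetProperty(name) != null) return "property";
        if (scope.GetMethod(name) != null) return "method";
        return "unknown";
    }
    static void Main() {
        Console.WriteLine(Classify(typeof(string), "Length"));   // property
        Console.WriteLine(Classify(typeof(string), "Empty"));    // field
        Console.WriteLine(Classify(typeof(object), "ToString")); // method
    }
}
```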
Where I'm really saving is in design. I'm implementing a subset of C#. The language has already been designed, and I have access to that goldmine of specs for it.
It's a well designed language. I don't have to reinvent it, and I know it's unambiguous and battle tested.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
modified 1-Dec-19 14:03pm.
|
|
|
|
|
As Tech U sophomores, in our spare time a couple of us created a "Norwegian" programming language, all keywords being in Norwegian (in the dialect based language variant called "Nynorsk"). This was as a protest against the professor teaching the "101 Programming" course, who refused to accept any hand-ins that used anything but Norwegian names for all variables, functions, user defined types etc. He was convinced that programming in a hodgepodge of English keywords and Norwegian variables would be a significant contribution to the fight against the extinction of the Norwegian language.
So we provided a "compiler" where you could write e.g.
MEDAN Vekta > Bæreevna BYRJ
TyngsteStokken := FinnTyngsteStokken;
Vekta := Vekta - TyngsteStokken.Vekt;
STOGG
We sure impressed the freshman students who had never done any programming before. When they later learned that all we did was a pure textual replacement of the uppercase words with English keywords before running the standard Pascal compiler, they felt deceived ...
(A small sidetrack: I first read the source code of a compiler in the summer vacation before I started my sophomore year. Until then, I had had a hard time understanding how it could take that much time to translate Pascal to machine code. After learning of all the things the compiler had to do, I switched completely around, finding it hard to understand how it could do it all that fast. Note: This was in 1979, when even a small student program might require a minute or two for compilation and linking.)
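The trick described above fits in a few lines. This is a sketch only - the Norwegian-to-Pascal keyword mapping is inferred from the example (MEDAN=WHILE, BYRJ=BEGIN, STOGG=END), and the original "compiler" would of course not have been C#:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Demo {
    static void Main() {
        // The whole "compiler": swap all-uppercase Norwegian keywords for
        // Pascal ones, then hand the result to the real Pascal compiler.
        var keywords = new Dictionary<string, string> {
            ["MEDAN"] = "WHILE", ["BYRJ"] = "BEGIN", ["STOGG"] = "END"
        };
        string src = "MEDAN Vekta > Baereevna BYRJ ... STOGG";
        // Only whole words written entirely in uppercase are keywords;
        // mixed-case identifiers like Vekta pass through untouched.
        string pascal = Regex.Replace(src, @"\b[A-ZÆØÅ]+\b",
            m => keywords.TryGetValue(m.Value, out var k) ? k : m.Value);
        Console.WriteLine(pascal); // WHILE Vekta > Baereevna BEGIN ... END
    }
}
```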
|
|
|
|
|
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
Not real interested in an average compiler.
|
|
|
|
|
This isn't an average compiler. For one thing it targets (multiple) other high level languages instead of machine code.
That's rather the point of it in fact. It's a language you can use to write in that is "language independent" at least in terms of .NET languages. If I write code in this it is C# code. It is VB code. It is F# code. Etc.
So what it's used for is when
A) I need to easily make some code generation stuff for end developers who may be using any .NET language.
B) I need things like my parser generators to be able to accept code in their grammars without that tying the generated parsers to a particular programming language.
B is where I really intend to get the most upfront mileage from it, but A may end up overtaking that in long haul.
It's also fun to develop, and I may even add an Eval() function to do true dynamic (compilationless) evaluation of code in .NET (right now all of .NET's dynamism works through background compilation). I'm already doing 80% of the work that goes into that.
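As an aside, "dynamism through background compilation" looks roughly like this in practice: even a tiny dynamic method is emitted as IL and JIT-compiled before it can run. A minimal sketch using `System.Reflection.Emit`:

```csharp
using System;
using System.Reflection.Emit;

class Demo {
    static void Main() {
        // Emit IL for (a, b) => a + b; the runtime JIT-compiles it when
        // the delegate is first invoked. No interpretation happens.
        var dm = new DynamicMethod("Add", typeof(int),
                                   new[] { typeof(int), typeof(int) });
        var il = dm.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Ret);
        var add = (Func<int, int, int>)dm.CreateDelegate(typeof(Func<int, int, int>));
        Console.WriteLine(add(2, 3)); // 5
    }
}
```

A true Eval() would skip this compile step and walk the tree directly.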
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
That's because the statistic comes from the days when you had to write compilers in assembly language so you could then write code in a non-assembly language.
|
|
|
|
|
That's fair.
Though I'm also "cheating" by leveraging .NET's incredible metadata system without which I'd be lost in making my type resolution stuff. That alone saves me I don't know how much time.
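For a sense of what that metadata system hands you for free: the runtime can resolve a type, including its generic structure, from nothing but a name string. A small illustration (the type name here is just an example):

```csharp
using System;

class Demo {
    static void Main() {
        // Resolving an "external" type is one metadata lookup away: the
        // runtime hands back a full description, generic arguments and all.
        var t = Type.GetType("System.Collections.Generic.List`1[System.Int32]");
        Console.WriteLine(t.IsGenericType);                 // True
        Console.WriteLine(t.GetGenericArguments()[0].Name); // Int32
    }
}
```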
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
If you could have access to all the effort that has gone into the gcc compiler, accumulated over the years, I am quite sure that it would add up to a lot more than 30 man-years.
Another side of it: In the first Systems Engineering course I took at the University, the professor claimed that if developing a program takes 1 resource unit, making a program product takes at least 3 resource units, for testing, wrapping it up, documentation, marketing, ... all the non-programming stuff.
And: If developing a standalone program, independent of any other application software, takes 1 resource unit, developing a software component that must interface to other components, link to other components, adhere to standards, satisfy requirements from multiple peer components, and so on, takes 3 resource units.
Making a program product component, it takes 3 * 3 resource units, rounded up to 10 units.
Experience has shown me that the professor was quite optimistic.
Another figure he presented, one that we never believed in as students: When your first version is out on the market, you have typically spent 10% of the total resources that the product will require over its lifetime. 90% of the resource spending is customer support, adding extensions, handling bug reports, ... Obviously, this depends a lot on the success of the product, so 10% is nothing but a ballpark figure. I think the professor was optimistic here as well: I can't think of a single successful product whose development I have been involved in that has not required significantly more than 10 times the resources after the first release.
So if making a simple program takes 1 unit, and a program product component takes 10 units, the component will require at least 100 resource units over its lifetime.
These sound like reasonable figures, according to my own experience.
|
|
|
|
|
Clearly a compiler doesn't take that much to develop.
But it has taken as long as compilers have existed to develop them to the level of sophistication they have today. In a word, this assertion is a bit of a sophistry!
|
|
|
|
|
I think when that proclamation was written it was well before compilers were advanced. I think it was made in the early 1970s?
C# - even the subset I'm supporting (which includes generics) is fairly sophisticated.
I mean, Slang isn't C#, but it's no Ada either.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
I would guess the 70s, yes.
In the early 1970s, programmers in general did not know how to write a compiler. First, the methods were poorly developed, and hardly anybody knew them. Well, not quite that bad, but lots of "best practice in compiler writing" has come, and become well known, in much later years. Today, any college course in Systems Programming teaches you lots of the basic techniques before you are out of school; that certainly wasn't the case (or only to a microscopic degree) in the 1960s (and, I would say, far into the 1980s). When Wirth published the open-source Pascal compiler in 1971, it wasn't just to provide a pedagogical programming language; it was just as much to show how to make a recursive descent compiler.
Nowadays, you have got lots of compilers you can learn from, and lots of good tools. As was already pointed out: In the old days, compilers were essentially written in assembly code. They were usually closely tied to the machine architecture. Transferring knowledge to another architecture was difficult. There was a plethora of quite different languages: Even if you have written a C compiler, how does that help you in making an APL interpreter? A SNOBOL interpreter? A Lisp interpreter? Forth? Erlang? Even writing a Fortran IV parser poses a number of problems: No reserved words, even word tokens need not be separated by spaces, ...
Today, almost everything is a variant of C/C++, syntactically speaking. (Semantically as well, mostly.) So you know a whole lot of it already. Up until around 1980, developing a compiler often involved a lot of effort in defining the language itself. Besides, expectations of languages were a lot higher in the 1970s-1980s. C is a "you asked for it, you got it" language. Older versions had a frighteningly high number of "implementation dependent" features. Compare that to e.g. the very rigorous specification of the CHILL language (ITU's language for programming digital phone switches): As a language, it is an excellent design, very advanced (especially for its time - it appeared in 1980), with a series of features that didn't appear in the C family until many years later, and then often incomplete, cumbersome and less elegant. I would guess that making a CHILL compiler, satisfying all formal requirements, could take ten times as much effort as making a C compiler.
|
|
|
|
|
I looked at the first kernel of NT 3.51; it weighed in at 90 MB. The current kernel weighs in at 20 GB.
So I suspect that "Depends" really applies here. Are those 30 man-years for the first version someone did for "Fun", or the last version when it has gone all in on being community developed?
|
|
|
|
|
I think it is closer to the first version, given that the pronouncement comes from the early '70s, IIRC. I don't remember now where I read it, but it's old.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
public CodeTypeMember Member { get; set; }
public CodeStatement Statement { get; set; }
public CodeExpression Expression { get; set; }
public CodeTypeReference TypeRef { get; set; }
public HashSet<string> VariableNames { get; } = new HashSet<string>();
public HashSet<string> MemberNames { get; } = new HashSet<string>();
public HashSet<string> ArgumentNames { get; } = new HashSet<string>();
public HashSet<string> FieldNames { get; } = new HashSet<string>();
public HashSet<string> MethodNames { get; } = new HashSet<string>();
public HashSet<string> PropertyNames { get; } = new HashSet<string>();
public HashSet<string> EventNames { get; } = new HashSet<string>();
public HashSet<string> ThisTargets { get; } = new HashSet<string>();
public HashSet<string> BaseTargets { get; } = new HashSet<string>();
*sigh*
I really hope the GC doesn't hate me for all this. The class with this gets frequently instantiated all over the place.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
You can't put some of those in a static class?
:: I'll be back to add more to this. ::
|
|
|
|
|
Heh. Nope. In fact, this object gets populated with different information every time it lands on an element of code:
var cls = CD.Class("Foo", true);
var m = CD.Method("Bar", MemberAttributes.Public, CD.Param(typeof(string), "baz"));
var slang = SlangParser.ParseExpression("baz + \"world\"");
CodeStatement anchor1 = null;
m.Statements.Add(anchor1 = CD.Call(CD.TypeRef(typeof(Console)), "WriteLine", slang));
cls.Members.Add(m);
var ns = new CodeNamespace();
ns.Types.Add(cls);
var ccu = new CodeCompileUnit();
ccu.Namespaces.Add(ns);
var res = new CodeDomResolver();
res.CompileUnits.Add(ccu);
res.Refresh();
var scope = res.GetScope(anchor1);
That is operating on this object model of abstract code:
public class Foo {
public virtual void Bar(string baz) {
System.Console.WriteLine((baz + "world"));
}
}
From here we get all of the information about which variables, fields, properties, methods, events, and arguments we have access to, returned from GetScope({statement}).
I use this to perform analysis and patch up the code tree.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
cognitive meltdown warning: insert lithium control rods into cerebral cortex to avoid further impairment
«One day it will have to be officially admitted that what we have christened reality is an even greater illusion than the world of dreams.» Salvador Dali
|
|
|
|
|
If not all members are required to exist at all times, you may want to consider using Lazy<T>.
/ravi
|
|
|
|
|
Yeah, I've considered something akin to this approach. It's not even the creation of them that worries me so much as the population of them. I *can* do it on demand, for which Lazy would be awesome here, but I'm getting ahead of myself. As much as I know an operation like this is probably a pig, I want to wait and see before I implement demand population. I've stuck a pin in it mentally (and via some comments), but I want to see what the real-world hit is going to be like. I have a low-to-middling PC to run it on, and plenty of test material to throw at it once it's ready.
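Demand population with Lazy<T> could look something like this sketch (the Scope class here is a stand-in for the real one, with a single name set):

```csharp
using System;
using System.Collections.Generic;

class Scope {
    // The expensive name-set computation runs only on first access.
    readonly Lazy<HashSet<string>> _fieldNames;
    public Scope(Func<HashSet<string>> computeFields) {
        _fieldNames = new Lazy<HashSet<string>>(computeFields);
    }
    public HashSet<string> FieldNames => _fieldNames.Value;
}

class Demo {
    static void Main() {
        int calls = 0;
        var s = new Scope(() => { calls++; return new HashSet<string> { "x" }; });
        Console.WriteLine(calls);              // 0 - nothing computed yet
        Console.WriteLine(s.FieldNames.Count); // 1 - computed on first access
        Console.WriteLine(calls);              // 1 - and only once
    }
}
```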
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
Beauty!
|
|
|
|
|
I hope so. It gets instantiated a lot. I use it during CodeDOM visitation, on any "marked" expression I need to "patch".
Basically, I'm using the CodeDOM as my abstract syntax tree to hold the results of my parse.
But without type information applied to it the tree is ambiguous. That is,
foo.bar.baz()
Could be a delegate invocation of field baz, or a delegate invocation of property baz.
foo could be a variable, a method argument, a type (where bar is a static field), an instance field, etc.
So for you to even know what to compile from this parse, you need to apply type information to the tree.
The CodeDOM has different objects for referencing fields, properties, methods and events, plus arguments and variables. So when I parse, I plug the tree with "dummies" - foo is always treated as a variable until it's patched. xxx(...) always refers to a delegate invocation until it's patched.
While I patch, I "visit" each object in the CodeDOM tree and look for these dummies I inserted. When I find one, I "get scope", which returns one of the monsters whose partial code is shown above.
I then use that data to match against the names of each of the dummies I inserted - to see what's a field, what's a method, what's a property, what's a type, etc. I then use this information to fix up the tree with the appropriate objects, creating compilable code.
Not much different than what the C# compiler does internally.
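The dummy-then-patch pass can be sketched with a toy AST standing in for the CodeDOM (all type and member names here are illustrative):

```csharp
using System;
using System.Collections.Generic;

// Toy stand-ins for CodeDOM node types.
abstract class Expr { }
class DummyRef : Expr { public string Name; } // the parser always emits this
class FieldRef : Expr { public string Name; } // the patcher upgrades to these
class VarRef   : Expr { public string Name; }

class Demo {
    // Consult the scope info to decide what a dummy reference really is.
    static Expr Patch(Expr e, HashSet<string> fieldsInScope) =>
        e is DummyRef d
            ? (fieldsInScope.Contains(d.Name)
                  ? (Expr)new FieldRef { Name = d.Name }
                  : new VarRef { Name = d.Name })
            : e;

    static void Main() {
        var fieldsInScope = new HashSet<string> { "bar" };
        Console.WriteLine(Patch(new DummyRef { Name = "bar" }, fieldsInScope)
                              .GetType().Name); // FieldRef
        Console.WriteLine(Patch(new DummyRef { Name = "foo" }, fieldsInScope)
                              .GetType().Name); // VarRef
    }
}
```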
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|