We don't think anyone wins
Wrong - the attorneys (always) win.
Yeah, but the point is, it was based on Java. Way more than VB. I think that book author is just biased. I don't have citations, but I always heard that even MS hired some top Java guys to help with the initial design of C#.
As much as we love to hate Java, we still have it to thank for what we use.
Jeremy Falcon
Anders Hejlsberg - Wikipedia - he didn't work for Sun but he worked on Microsoft's J++ language.
Quote: In 1996, Hejlsberg left Borland and joined Microsoft. One of his first achievements was the J++ programming language and the Windows Foundation Classes; he also became a Microsoft Distinguished Engineer and Technical Fellow. Since 2000, he has been the lead architect of the team developing the C# language. In 2012 Hejlsberg announced his new project, TypeScript, a superset of JavaScript.
Now is it bad enough that you let somebody else kick your butts without you trying to do it to each other? Now if we're all talking about the same man, and I think we are... it appears he's got a rather growing collection of our bikes.
modified 31-Aug-21 21:01pm.
And this move KILLED Borland. Anders was the key behind Delphi, which I still use and enjoy.
He made Delphi and C++ Builder produce interchangeable object files (compile in one, use in the other),
which also helped him with C# and VB. Same trick.
I used to have a Quick XXX and a Turbo XXX of just about everything back in the day. $99.00 bought you a lot!
Dang I feel old...
Yep, I remember using C++ Builder back in the 1990s - it was a demo application I wrote in C++ Builder that landed me my first professional programming job (after spending about 12 years in electronics).
Is it worthwhile to distinguish between "based on Java" and "based on experience with Java"?
It seems to me (own thoughts - no Wikipedia URL) that the C# designers surely made a thorough study of how Java served in practice, not just the formal language definition. Seeing how it was used, how the compiler handled it, and how well the bytecode/VM idea worked in practice made them go back to the drawing board, saying "OK, Java did it their way - some of it was successful, some of it was not. Now let us redo it in a way we believe will get a higher fraction of 'successful' elements." - Of course they did the same study of C++.
Whether you call it "based on" or "based on experience with" may be a matter of taste, but to me, the former alternative suggests much more of "a further development of ...", which C# certainly is not, relative to Java.
Brent Jenkins wrote: Now is it bad enough that you let somebody else kick your butts without you trying to do it to each other? Now if we're all talking about the same man, and I think we are... it appears he's got a rather growing collection of our bikes.
Right turn Clyde.
Michael Martin
Australia
"I controlled my laughter and simple said "No,I am very busy,so I can't write any code for you". The moment they heard this all the smiling face turned into a sad looking face and one of them farted. So I had to leave the place as soon as possible."
- Mr.Prakash One Fine Saturday. 24/04/2004
I must say that a statement that says "C# is compiled to a type of bytecode (called CIL)" (my emphasis) makes me somewhat sceptical of the competence of the writer. I doubt very much that (s)he has implemented very many compilers.
Those who can, do.. those that can't, teach.. those that can't teach write Wikipedia articles
Can you elaborate? I have to say that I agree. Isn't the .NET CIL a type of bytecode?
I haven't been working with compilers for a number of years, so maybe there are younger species out there that do things in a different way - I know the "classical" way of doing it, believing that today's compilers are roughly the same:
First, you break the source text into tokens. Then you try to identify structures in the sequence of tokens, so that you can form a tree of hierarchical groups representing e.g. functions at some intermediate level, statements at a lower level, and terms of a mathematical expression even further down. The term DAG - Directed Acyclic Graph - is commonly used for the parse tree (strictly, the tree becomes a DAG once identical sub-trees are merged). Nodes in the DAG commonly consist of 3-tuples or 4-tuples in a more or less common format for all nodes: some semantic / operation code, two or three operands, or whatever else the compiler writer finds necessary.
Many kinds of optimisation are done by restructuring the DAG: recognizing identical sub-trees (e.g. common subexpressions) that need to be done only once, identifying statements within a loop that will have identical effect in every iteration so that the sub-tree can be moved out of the loop, etc. etc. Unreachable code is pruned off the DAG. All such operations are done at an abstract level - a variable X is treated as X without regard to its location in memory, number of bits (unless the language makes special requirements) etc. The DAG is completely independent of the word length, byte ordering, 1's- or 2's-complement arithmetic, register IDs or field structure of the instruction code of any specific machine architecture. You may think of variables and locations as still being in a sort of "symbolic" form (lots of symbolic labels were never visible in the source code, so this certainly is "sort of").
Once you have done all the restructuring of the DAG that you care for, you traverse the tree to generate the actual machine instructions. (This part of the compiler is commonly called the "back end".) Now you assign memory addresses and registers, and choose the fastest sequence of machine instructions for that specific machine. You can still do some optimization, e.g. keeping values in registers (now that you know which registers you've got), but it is essentially very local. The DAG indicates which sub-trees are semantically independent of each other, so that you may reorder them, run them in parallel, or e.g. assemble six independent multiplication operations into one vector multiply if your CPU allows. All internal symbolic references can be peeled off; the only symbols retained are external entry points and references to external modules.
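The pipeline just described - tokenize, build a tree, optimize on the tree, then hand the result to a back end - can be sketched in miniature. This is a toy illustration only (the grammar, the tuple shapes and all names are invented here), not how any production compiler lays things out:

```python
import re

# Toy front end: tokenize, parse to nested op-tuples (a stand-in for
# DAG nodes), then fold constant sub-trees - one of the tree-level
# optimisations mentioned above.

def tokenize(src):
    return re.findall(r"\d+|[A-Za-z_]\w*|[+*()]", src)

def parse(tokens):
    # Grammar: expr := term ('+' term)* ; term := atom ('*' atom)*
    def atom(i):
        if tokens[i] == "(":
            node, i = expr(i + 1)
            return node, i + 1            # skip ')'
        tok = tokens[i]
        return (("const", int(tok)) if tok.isdigit() else ("var", tok)), i + 1
    def term(i):
        node, i = atom(i)
        while i < len(tokens) and tokens[i] == "*":
            rhs, i = atom(i + 1)
            node = ("*", node, rhs)
        return node, i
    def expr(i):
        node, i = term(i)
        while i < len(tokens) and tokens[i] == "+":
            rhs, i = term(i + 1)
            node = ("+", node, rhs)
        return node, i
    return expr(0)[0]

def fold(node):
    # Constant folding: rewrite a sub-tree whose children are all
    # constants into a single constant node.
    if node[0] in ("+", "*"):
        a, b = fold(node[1]), fold(node[2])
        if a[0] == b[0] == "const":
            v = a[1] + b[1] if node[0] == "+" else a[1] * b[1]
            return ("const", v)
        return (node[0], a, b)
    return node

tree = parse(tokenize("2*3+x"))
print(fold(tree))   # ('+', ('const', 6), ('var', 'x'))
```

Note that the folded tree still says nothing about registers, addresses or word length - exactly the machine-independence described above.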
The back end may produce machine instructions for a hypothetical CPU that does not exist in silicon. Yet it has (or may have) its registers, word length, binary address space etc. There could be a machine having this instruction set as its native one. Many years ago, someone wrote an alternative microcode for a PDP-11 architecture so that it could execute the Pascal P4 bytecodes directly - but it was dead slow! Usually, you make a software virtual machine that pretends to be a "real" CPU for those instructions, interpreting the bytecodes one by one. The JVM is such a virtual machine. For many years, this was The Way to run Java.
Compilers for dotNET essentially have no backend - they do not generate anything ready for execution. Essentially, their output is a linearization of the DAG, i.e. the abstract 4-tuple DAG nodes. The compiler backend is in the dotNET implementation: when a module is requested for the first time, the dotNET backend does the last stages of compilation, creating machine-specific binary code in the native instruction set of that specific machine, assigning specific locations to the named variables etc. The compiled result is stored in a cache, so that the next time the same code is requested, no new compilation is required.
So, while Java bytecode is meant to be complete, ready-to-run code (symbolic linking to other modules may still be required, but that is not code generation), dotNET assemblies are only half baked, requiring a final compilation step. This takes a little bit of time (it is surprisingly little!), but the generated code is native, requiring no interpretation.
Java compilers can generate binary code, rather than bytecode, but then it is for a specific machine. Or, the JVM may look at a (sequence of) bytecode(s) at run time and translate it to native instructions, but that is like interpreting Motorola 68000 instructions on a PowerPC (Apple did that when they switched to PowerPC, to run old binary software!), and you will always be bound by the limitations of the bytecode instruction set.
To a plain user with limited computer knowledge, there is little "visible" difference between the way the JVM and dotNET work, but at the internal level, the architectures are significantly different.
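As a concrete, easy-to-try illustration of the bytecode-plus-VM model described above: CPython works the same way in outline - source is compiled once to a hardware-independent bytecode that the CPython virtual machine then interprets. (The details differ from both the JVM and dotNET, of course; this is only an analogy.)

```python
import dis

# CPython compiles this function body to its own bytecode; the CPython
# VM interprets that bytecode instruction by instruction at call time.

def add(a, b):
    return a + b

code = add.__code__
print("raw bytecode bytes:", list(code.co_code)[:8])

# The opcode names are symbolic, machine-independent operations,
# not any CPU's native instructions:
names = [ins.opname for ins in dis.get_instructions(add)]
print(names)
```

Running this prints the raw byte values and names like `LOAD_FAST` - the same bytes on any platform CPython runs on.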
That was a lovely, clear and comprehensive elaboration. Thank you (assuming of course, that it is correct!).
All correct, but are you disagreeing with 'C# is compiled to a type of bytecode'?
My immediate reaction: Yes, I would disagree. Bytecodes are ready for execution, while the dotNET output from a C# compiler is not.
You could say that I am using a "narrow" definition of the term, but the term could have a more general meaning. Yes, it could - it could mean any code representation that is built up of bytes. Like source code. We could even generalize the "byte" concept: the old Univac mainframes could work with 9-bit bytes (4 to the word) or 6-bit bytes (6 to the word), while the DEC-10 and DEC-20 had 7-bit bytes (5 to the word and one spare bit).
But that is not the common "compiler guy" interpretation of "bytecode". The linearized DAG is not directly executable, like a bytecode is. Obviously, you could, at run time, do a just-in-time compilation into a bytecode for an interpreter, rather than compiling into native binary code. But at least as far as I know, there are no virtual machines directly interpreting dotNET assemblies with no pre-execution processing step.
In my student days, a group of us students made an attempt to build a direct interpreter for the intermediate language from another front end compiler (for the CHILL programming language) having a similar architecture. We soon realized that the data structures required to maintain the current execution state would be immensely large and complex; the task of building the interpreter would far exceed making a complete backend compiler. You couldn't do without a unified symbol table. You couldn't do without a label-to-location mapping. You couldn't do without a lot of state information for various objects. You couldn't do without ... So we never completed the project. (It was a hobby project, not a course assignment.)
Nah, op-codes are ready for execution, bytecodes are not. I like this definition: Bytecode is a form of hardware-independent machine language that is executed by an interpreter. It can also be compiled into machine code for the target platform for better performance.
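A minimal sketch of that definition - a hardware-independent instruction stream plus the interpreter loop that executes it. The opcode numbers and names here are invented for illustration:

```python
# A tiny stack-machine "bytecode" and its interpreter: each code is
# fully defined by itself and executed one by one, as discussed below.

PUSH, ADD, MUL, HALT = 0, 1, 2, 3

def run(program):
    stack, pc = [], 0
    while True:
        op = program[pc]
        if op == PUSH:                    # push the following literal
            stack.append(program[pc + 1]); pc += 2
        elif op == ADD:                   # pop two, push sum
            b, a = stack.pop(), stack.pop(); stack.append(a + b); pc += 1
        elif op == MUL:                   # pop two, push product
            b, a = stack.pop(), stack.pop(); stack.append(a * b); pc += 1
        elif op == HALT:
            return stack.pop()

# (2 + 3) * 4
prog = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
print(run(prog))   # 20
```

The same byte sequence runs unchanged wherever the interpreter runs - which is the "hardware-independent machine language" part of the definition.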
If you are right, then terminology is changing.
In my book, the op-code is that field in the binary instruction code that indicates what is to be done: Add, shift, jump, ... Usually, the rest of the binary instruction code is operand specifications, such as memory addresses or constants.
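That "field in the binary instruction code" view can be made concrete with a small decoding sketch. The 16-bit format below (a 4-bit opcode plus two 6-bit operand fields) is entirely made up for illustration; real instruction sets differ:

```python
# Hypothetical 16-bit instruction word:
#   bits 15..12  opcode   (what to do)
#   bits 11..6   src      (operand: with what)
#   bits  5..0   dst      (operand: with what)

def decode(word):
    opcode = (word >> 12) & 0xF
    src    = (word >> 6) & 0x3F
    dst    = word & 0x3F
    return opcode, src, dst

# Encode opcode 0b0010 (say, "ADD"), source register 5, destination 3:
word = (0b0010 << 12) | (5 << 6) | 3
print(decode(word))   # (2, 5, 3)
```

The opcode field alone says nothing about the operands - which is exactly the traditional separation described above.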
In more high-level contexts I have seen "op-code" used for a field in a structure, e.g. a protocol. Again, the opcode tells what is to be done (at the level of "withdraw from bank account", "turn on" etc.); the other fields tell what it is to be done with.
You suggest a new interpretation: that an op-code is both the 'what to do' and the 'what to do it with'. Maybe that is an upcoming understanding, but certainly not the traditional one.
JVM bytecodes are certainly ready for execution, once you find a machine for it. It is easier to build a virtual machine, a simulator, than to build a silicon one. So that is what we do.
You can build a translator from MC68000 instructions to 386 instructions. Or from IBM 360 instructions to AMD64 instructions. Or from JVM instructions to VAX instructions. Suggesting that the intention of compiling to MC68K instructions was to serve as an intermediate step to 386 code would be crazy - that was never the intention of the MC68K instruction set. Similarly, the intention of Java bytecodes was not to be translated into another instruction set.
If you first compile to one instruction set (including bytecode, such as Java or Pascal P4 bytecode), and then translate to another instruction set, there is generally a loss of information, so that the final code is of poorer quality than if it had been compiled directly from the DAG, which usually contains a lot of info that is lost (i.e. used and then discarded) in the backend. Some of it may be recovered by extensive code analysis, but expect to lose a significant part, in the sense that you will not utilize the target CPU fully - especially if the first/bytecode architecture has a different register philosophy (general? special?), interrupt system or I/O mechanisms. So, if at all possible, generate the target code from the intermediate level, not from some fully compiled instruction set.
You're agreeing with me: bytecodes are an abstraction; they're for the JVM, not the CPU.
The JVM IS a CPU!
Just like any microcoded processor is a CPU. There is no difference in principle between microcode breaking down the instruction code into activation of the various circuits and (compiled) C code doing the same thing.
Years ago, I was working on a machine that didn't provide BCD instructions in silicon. Cobol users could either buy a floppy disc with the microcode to give the CPU BCD instructions (microcode was kept in RAM), or they could use the software package that emulated BCD (triggered by the 'Illegal Instruction Code' interrupt).
How would you describe the BCD instructions? As "abstractions", like the Java bytecodes? Or as integral to the CPU (even though they triggered an Illegal Instruction Code if the microcode was not installed)? Are all microcoded instructions "abstractions"? If so, then this CPU, as well as a lot of others, is an abstraction too.
The Java bytecodes are just like those BCD instructions, except that they cover the complete instruction set. And I am quite sure that it would be possible to write microcode (for this machine with the BCD instructions) to make the bytecodes the "native instruction set" of the machine - it did provide logarithmic/trigonometric functions and malloc/free as instructions, and microcode was developed so that it executed Lisp more or less directly (after a tokenization, of course).
JVM bytecode is abstract - i.e. it's not your CPU's native opcodes. That's it.
Not even if I have a machine microcoded to handle them?
Years ago, someone did write microcode for a PDP-11 to directly execute Pascal P4 bytecodes. With that microcode, the machine could execute P4 instructions and nothing else. Load a P4 file into RAM, set the instruction pointer to the starting point, and run: The program would execute.
JVM bytecodes are quite similar to P4 bytecodes; there is no essential difference in principle that makes one abstract and the other machine instructions.
...Unless you say that "native opcodes" are those where each bit in the instruction corresponds directly to one physical control signal steering the transistor logic. If you do, then you reject every CPU that has any microcode at all; you accept only 100% hard-wired logic as a true CPU. Even though less microcode is used in today's CPUs than in the golden days of microcoding (like in the VAX era), almost all general CPUs today (as well as many specialized ones) are to some degree microcoded. By your logic, the binary instructions fed to those machines are not the CPU's native opcodes, but only an abstraction.
You have the right to say so, but renaming binary programs for almost all machines to "abstractions" does not contribute anything of significance.
If the definition of bytecode is that it's abstract and non-natively-executable by the CPU, and therefore requires another layer to turn it into the op-codes your given CPU can understand, then surely both your Java bytecode and your CIL/MSIL meet that definition? You say 'bytecodes are ready for execution'; I don't agree. You need the JVM - right?
If I have a VAX executable file, containing VAX instructions, I need an interpreter for those codes. There never was a CPU that interpreted VAX instructions in pure silicon; every VAX on the market ran an interpreter, implemented in microcode.
If I have a JVM executable file, containing Java bytecodes, I need an interpreter for those codes. There never was a CPU that interpreted Java bytecodes in pure silicon. But there was a CPU interpreting Pascal P4 bytecodes - a PDP-11 running an interpreter implemented in microcode. There is no reason why you couldn't do exactly the same for Java bytecodes.
So I'll agree with you on the condition that we agree that VAX instructions, IAPX 386 instructions, IBM 360 instructions, Java bytecodes and Pascal P4 bytecodes are all in the same group: none of them is natively executed by silicon; all require an interpreter implemented at a lower level.
The distinction between instructions being directly implemented in silicon and those being interpreted by some code at a lower level may be essential with regard to speed and the physical size of the silicon die. For the user, for the programmer, and for the system architecture as seen at the program interface, the difference is marginal.
A far more essential question is whether you need to do any preprocessing to a file before you submit it for execution. For a Java bytecode (or Pascal P4) file, you need not do any further processing: The interpreter, whether written in microcode or as, say, conventional PDP-11 machine code, can start churning bytecodes right away, one by one.
With CIL, you can NOT feed the tuples to an interpreter one by one and have it interpret them as you go. Actually, I have tried to do so - not with CIL, but with a very similar intermediate code coming out of the front end compiler for the CHILL language. At the outset, it looked doable - after all, each tuple indicates an operation, some operands and so on, sort of like an instruction. But the deeper you dig into it, the more you find that is yet to be determined: stuff that is dependent on the context, where that context must be built up from information in other parts of the DAG structure. There are lots of things you cannot do without traversing major parts of the graph for a single operation ... unless you do a preprocessing pass before you start any execution at all. THAT is an essential difference. You MUST do a preprocessing pass before you can start any execution, and that preprocessing reshapes the code into something else that can be interpreted one by one, like traditional binary VAX instructions.
Sure, there is a lot of preparations to be done to set up, say, a Windows process before it can execute the first instruction of the user program, but that is OS business, not CPU business. The Windows process setup does not restructure the instructions, the way CIL is transformed into a different format. All the machine instructions in the application code are perfectly valid as machine instructions even without the OS preparations.
If you have a PDP-11 with the microcode to interpret Pascal bytecodes, you are in the same situation: every single one of the bytecodes is valid and can be executed as it stands, without being restructured in any way, and is fully defined by itself, without having to analyze any large context.
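The contrast drawn above - bytecodes executable one by one versus an intermediate form that needs a whole-program pass first - can be sketched like this. The tuple format is invented for illustration and is vastly simpler than real CIL:

```python
# IR tuples with a SYMBOLIC jump target: the interpreter cannot step
# through these as they stand, because "loop" is not a location yet.

ir = [
    ("label", "loop"),
    ("add", "acc", 1),
    ("jump_if_lt", "acc", 3, "loop"),   # target is a symbol, not an address
    ("halt",),
]

def resolve(code):
    # Pass over the WHOLE program first: map every label to its index,
    # then rewrite symbolic targets into concrete locations.
    targets = {ins[1]: i for i, ins in enumerate(code) if ins[0] == "label"}
    out = []
    for ins in code:
        if ins[0] == "jump_if_lt":
            out.append(("jump_if_lt", ins[1], ins[2], targets[ins[3]]))
        else:
            out.append(ins)
    return out

def run(code):
    # Only AFTER resolution can a simple one-by-one dispatch loop run.
    env, pc = {"acc": 0}, 0
    while True:
        ins = code[pc]
        if ins[0] == "add":
            env[ins[1]] += ins[2]; pc += 1
        elif ins[0] == "jump_if_lt":
            pc = ins[3] if env[ins[1]] < ins[2] else pc + 1
        elif ins[0] == "halt":
            return env["acc"]
        else:                              # "label": no-op at run time
            pc += 1

print(run(resolve(ir)))   # 3
```

Here only one whole-program fact (label locations) has to be resolved before execution; in a real linearized DAG there are many more, which is the point being made above.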
You may still believe that even if my group failed to make a direct interpreter for that intermediate code from the CHILL compiler, it could be done, if we had been clever enough. It turned out that the language and compiler designers said that if we had been clever enough we wouldn't have started the project at all. The intermediate code was never meant for interpretation. No intermediate code is. But bytecode is.
Feel free to implement a direct interpreter for CIL! Report back when you have completed your work.
To be fair, if you type "bytecode" into Google, most of the links returned refer to Java rather than the more generic usage. The former would suggest a 'type of' definition, where the latter would not require the comparison.
If Google had existed in the early 1980s, a search for "bytecode" would have returned thousands of references to Pascal and its P4 bytecode format. The Pascal compiler was distributed as open source, with a backend for a virtual machine (also available as open source, for a couple of architectures). You could either adapt the VM to the architecture of your machine and keep the compiler unchanged, or replace the P4 code generating parts of the compiler with binary code generation for your own machine.
Actually, lots of interpreters for today's non-compiled languages do some compilation into some sort of bytecode, which is cached internally so that e.g. a loop body needs to be symbolically analyzed only on the first iteration. But Java is the only language (after Pascal and its P4) to really focus on this, making "Java Virtual Machine" a marketing concept and really pushing "write once, run anywhere" as The Selling Point of the language (more so 20 years ago than today). So you are right: Java is very prominent in bytecode references.
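That "compile once, interpret the cached form many times" pattern is easy to demonstrate with CPython, using only the standard compile()/eval() built-ins:

```python
# The expression is parsed and compiled to a code object exactly once;
# the cached code object is then evaluated repeatedly with no
# re-parsing - the same idea as a loop body cached as internal bytecode.

src = "x * x + 1"
code = compile(src, "<expr>", "eval")   # symbolic analysis happens here, once

results = [eval(code, {"x": x}) for x in range(5)]
print(results)   # [1, 2, 5, 10, 17]
```

Each eval() call reuses the already-compiled bytecode; only the value bound to `x` changes between runs.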
Member 7989122 wrote: If Google had existed in the early 1980s, a search for "bytecode" would have returned thousands of references to Pascal and its P4 bytecode format
Agreed. But that is not relevant at all.
Member 7989122 wrote: Actually, lots of interpreters for non-compiled languages of today do some compilation into some sort of bytecode
Since I have written two compilers and a number of interpreters, and taken the requisite academic courses related to compilers, I am quite familiar with how they work.
However, most programmers - certainly most of the many I have worked with - have not done either of those, so I wouldn't expect them, or the average programmer, to be familiar with that.