Understanding Common Intermediate Language (CIL)

Rishikesh_Singh, 23 Mar 2013

Introduction

This article helps you understand and get started with the Common Intermediate Language (CIL). CIL is a programming language in its own right, so we can program directly in IL, and doing so is not as difficult as most of us initially think. Understanding CIL is important when writing dynamic assemblies, it lets a developer change an assembly by editing its IL directly even when the source code is not available, and above all it gives a clear understanding of the internals of .NET languages.

Background  

When we compile code written in any .NET language, the associated compiler (such as the C# or VB compiler) generates a binary called an assembly, which contains IL code. These instructions form a low-level, human-readable language that the runtime compiler converts into machine language when the code first executes. The conversion is deferred until execution so that the compiler knows the environment it is running in and can emit machine code optimized for that platform. In the .NET Framework this compiler is called the Just-in-Time (JIT) compiler, or Jitter.

A .NET assembly consists of the following elements:

1. A Win32 file header - Identifies the kind of application (console, Windows, code library) to be hosted by the Windows operating system.

2. A CLR file header - A block of data that allows the assembly to be hosted by the CLR. It contains information such as the runtime version the assembly was built against, the public key, etc.

3. CIL code - The actual implementation of the code in terms of instructions. CIL code is described at length below.

4. Type metadata - Describes the types contained within the assembly and the format of the types referenced by that assembly.

5. An assembly manifest - Describes the modules within the assembly, the version of the assembly, and all the external assemblies referenced by that assembly.
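Some of the elements listed above can be inspected from C# through reflection. The following is a minimal sketch rather than a complete tool: the class name AssemblyInspector is made up for this illustration, and it only prints a few manifest and metadata values (name, version, runtime version, referenced assemblies, contained types).

```csharp
using System;
using System.Reflection;
using System.Text;

public static class AssemblyInspector
{
    // Builds a short text summary of an assembly's manifest and type metadata.
    public static string Describe(Assembly assembly)
    {
        var sb = new StringBuilder();

        // Manifest data: the assembly's own name and version.
        AssemblyName name = assembly.GetName();
        sb.AppendLine("Assembly: " + name.Name + ", version " + name.Version);

        // Runtime version recorded in the file when the assembly was built.
        sb.AppendLine("Runtime version: " + assembly.ImageRuntimeVersion);

        // Manifest data: the external assemblies this assembly references.
        foreach (AssemblyName reference in assembly.GetReferencedAssemblies())
            sb.AppendLine("References: " + reference.Name + " " + reference.Version);

        // Type metadata: the types defined inside the assembly.
        foreach (Type type in assembly.GetTypes())
            sb.AppendLine("Type: " + type.FullName);

        return sb.ToString();
    }

    public static void Main()
    {
        Console.Write(Describe(typeof(AssemblyInspector).Assembly));
    }
}
```

The exact lines printed depend on the compiler and runtime used to build and run the sketch.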

Common Intermediate Language  

CIL is a low-level language defined by the Common Language Infrastructure (CLI) specification (http://www.ecma-international.org/publications/standards/Ecma-335.htm). It was earlier referred to as MSIL (Microsoft Intermediate Language); the name was later changed to CIL to match the standardized specification.

CIL is a set of CPU- and platform-independent instructions that can be executed in any environment supporting the Common Language Infrastructure, such as the .NET runtime or the cross-platform Mono runtime. These instructions are later processed by the runtime compiler and converted into native code. CIL is an object-oriented assembly language and is entirely stack-based.


Before we go into the details of CIL and its stack-based nature, let's have a close look at some CIL code. The code below shows the classic Hello World console application and the corresponding IL code generated by ILDASM (the IL Disassembler, found among the Windows SDK tools).

 Hello World in C#:

static void Main(string[] args)
{
 Console.WriteLine("Hello World");
}
 

 Hello World in IL :

.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )                        
  .ver 4:0:0:0
}

.assembly HelloWorld
{
  // lines omitted
}

.module HelloWorld.exe

.class private auto ansi beforefieldinit HelloWorld.Program
       extends [mscorlib]System.Object
{
 .method private hidebysig static void  Main(string[] args) cil managed
  {
    .entrypoint
    // Code size       13 (0xd)
    .maxstack  8
    IL_0000:  nop
    IL_0001:  ldstr      "Hello World"
    IL_0006:  call       void [mscorlib]System.Console::WriteLine(string)
    IL_000b:  nop
    IL_000c:  ret
  } // end of method Program::Main


  .method public hidebysig specialname rtspecialname 
          instance void  .ctor() cil managed
  {
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
    IL_0006:  ret
  } // end of method Program::.ctor

} // end of class HelloWorld.Program
    

In the above code you can see some names (CIL tokens) prefixed with a "." (dot), e.g. .assembly, .namespace, .class, .method, .override. These are called CIL directives. (.ctor also starts with a dot, but it is the special name of a constructor method rather than a directive.)

The tokens that are used together with a CIL directive and describe how that directive should be processed are called CIL attributes.

Ex: public, extends, implements

CIL opcodes (operation codes) are the tokens used to build a type's implementation logic. This is the area we focus on in the rest of the article.

Ex: add, sub, mul, div, and, call, newobj, ldarg etc.

CIL opcodes are actually binary codes (such as 0x01), but each has a corresponding friendly mnemonic (for 0x01 the mnemonic is "break") to assist developers in understanding, debugging and writing code directly in intermediate language. Here are a few examples of binary opcodes and their mnemonics. More can be found in the ECMA document linked above.

Opcode    Instruction

0x00      nop
0x01      break
0x02      ldarg.0
0x03      ldarg.1
0x72      ldstr
0x73      newobj
0x7A      throw
0x8C      box
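The same opcode values can be read back from the System.Reflection.Emit.OpCodes class. This is a small sketch for checking the table; the class name OpCodeTable is invented for the example, and only single-byte opcodes are formatted.

```csharp
using System;
using System.Reflection.Emit;

public static class OpCodeTable
{
    // Formats a single-byte opcode as "0xNN  mnemonic".
    public static string Describe(OpCode op)
    {
        // OpCode.Value is a short; for single-byte opcodes it fits in one byte.
        return string.Format("0x{0:X2}  {1}", (byte)op.Value, op.Name);
    }

    public static void Main()
    {
        OpCode[] sample =
        {
            OpCodes.Nop, OpCodes.Break, OpCodes.Ldarg_0, OpCodes.Ldarg_1,
            OpCodes.Ldstr, OpCodes.Newobj, OpCodes.Throw, OpCodes.Box
        };

        foreach (OpCode op in sample)
            Console.WriteLine(Describe(op));
    }
}
```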

Tokens like IL_0000, IL_0001 etc. are called CIL code labels. They are just labels, and you can replace them with any text of your choice. A label is needed only where another instruction, such as a branch, refers to it; otherwise you are free to write IL without any labels.

Before we see the IL code it’s important to understand the role of Evaluation Stack in executing the CIL instructions.

Evaluation Stack  

Before understanding the evaluation stack, one should know what a stack is. In case you are new to the programming world, here is the definition.

A stack is a data structure that stores data in Last In, First Out order, i.e. the item at the top is the first one to be removed.

The process of adding an item to the stack is called Push, and removing an item from the stack is called Pop. For example, if item A is pushed on top of the stack, A is the first item to be removed when we pop.
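As a quick illustration of Push and Pop, here is a short sketch using the generic Stack class from System.Collections.Generic. The class name LifoDemo and the items pushed are arbitrary choices for this example.

```csharp
using System;
using System.Collections.Generic;

public static class LifoDemo
{
    // Pushes three items and returns the first one popped.
    public static string PushThenPop()
    {
        var stack = new Stack<string>();
        stack.Push("C");
        stack.Push("B");
        stack.Push("A");    // A is now on top of the stack

        return stack.Pop(); // the last item pushed is the first one removed
    }

    public static void Main()
    {
        Console.WriteLine(PushThenPop()); // prints "A"
    }
}
```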

The evaluation stack holds values, such as local variables or method arguments, before they are operated on. At the start of every method the evaluation stack is empty; during execution the CIL instructions add items to and remove items from it, and by the time the method returns the stack is empty again (apart from the return value, if the method has one).

Instructions that copy values from memory to the evaluation stack are called loads, and instructions that copy values from the stack back to memory are called stores. All opcodes starting with ld load an item onto the stack, and opcodes starting with st store an item in memory. A store instruction also pops the item off the stack.

At the beginning of a method we must declare the maximum number of items that can be present on the stack at any one time; this is done with the .maxstack directive. If it is not provided, .maxstack defaults to 8. The value can be determined by static analysis of the method: follow the instructions and track how many items each one pushes and pops. Note that it is not the number of variables or parameters (the add method shown later has three locals but needs a .maxstack of only 2), and it does not represent the size of the stack frame; it is just the number of items on the evaluation stack.
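The .maxstack value recorded for a compiled method can be read back through reflection. The sketch below is only an illustration: the class MaxStackInspector and its Add method are hypothetical, and the number printed depends on the compiler and build settings, so no particular output is claimed.

```csharp
using System;
using System.Reflection;

public static class MaxStackInspector
{
    // Hypothetical sample method; any method with a body would do.
    public static int Add(int a, int b)
    {
        return a + b;
    }

    // Returns the maximum evaluation stack depth recorded in the method header.
    public static int RecordedMaxStack(MethodInfo method)
    {
        MethodBody body = method.GetMethodBody();
        return body.MaxStackSize;
    }

    public static void Main()
    {
        MethodInfo add = typeof(MaxStackInspector).GetMethod("Add");
        Console.WriteLine(".maxstack recorded for Add: " + RecordedMaxStack(add));
    }
}
```

Adding two arguments needs both of them on the stack at once, so the recorded value should be at least 2.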

The table below lists some commonly used CIL instructions with their definitions.

Code            Definition

ldc.i4.m1       Pushes -1 onto the stack as int32.
ldc.i4.s num    Pushes num onto the stack as int32.
ldc.*           Loads a constant of type int32, int64, float32 or float64.
ldstr           Loads a string onto the stack.
ldfld           Loads a field of an object.
ldarg           Loads an argument onto the stack (by value).
ldarga          Loads the address of an argument (for by-reference access).
ldloc.n         Loads the local variable at index n onto the stack.
ldloca          Loads the memory address of a local variable.
stloc.n         Pops a value from the stack and stores it in the local variable at index n.
starg n         Pops a value from the stack and stores it in the method argument at index n.
pop             Pops the value off the stack and discards it.
ret             Exits the method and returns a value to the caller (if there is one).
br target       Unconditionally transfers control to target. target is a signed 4-byte offset.
br.s target     Branch to target, short form. target is a signed 1-byte offset.
clt             Compares two values and pushes 1 if value1 is less than value2, otherwise 0.
blt target      Transfers control to target if value1 is less than value2.
bgt target      Transfers control to target if value1 is greater than value2.



To understand the evaluation stack, let us look at the C# code below and its IL as shown by ILDASM.exe.

C# Code :

   static void add()
        {
            int value1 = 10;
            int value2 = 20;
            int value3 = value1 + value2;
        }

Here is the IL code; I have added comments to explain exactly what happens on the evaluation stack during execution. Notice that .maxstack is set to 2 in the method below. The comments give the number of items on the stack after each instruction, which makes it clear why it is set to 2.

.method private hidebysig static void  'add'() cil managed
{
  // Code size       12 (0xc)
  .maxstack  2
  .locals init ([0] int32 value1,
           [1] int32 value2,
           [2] int32 value3) // three int32 local variables are declared

  IL_0000:  nop            // no operation (no push or pop on the stack)

  IL_0001:  ldc.i4.s   10  // loads the int32 value 10 on the stack. Items on stack=1

  IL_0003:  stloc.0        // pops the item off the stack and stores it in the first local variable.
                           // Items on stack=0

  IL_0004:  ldc.i4.s   20  // loads the int32 value 20 on the stack. Items on stack=1

  IL_0006:  stloc.1        // pops the item off the stack and stores it in the second local variable.
                           // Items on stack=0

  IL_0007:  ldloc.0        // loads the value of the first local variable on the stack. Items on stack=1

  IL_0008:  ldloc.1        // loads the value of the second local variable on the stack. Items on stack=2

  IL_0009:  add            // pops the top two numeric values off the stack and pushes the result
                           // back on the stack. Items on stack=2-2+1=1

  IL_000a:  stloc.2        // pops the item off the stack and stores it in the third local variable [2].
                           // Items on stack=0

  IL_000b:  ret
} // end of method Program::'add'
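The stack counts in the comments above can be reproduced by a small C# sketch that walks the instructions and tracks the depth. This is a simplified model, not how the runtime verifies code: it assumes straight-line code with no branches, and the pop and push counts for each instruction are entered by hand. The names StackDepthDemo and Step are made up for the example.

```csharp
using System;

public static class StackDepthDemo
{
    // One instruction, described only by how many items it pops and pushes.
    public struct Step
    {
        public readonly string Mnemonic;
        public readonly int Pops;
        public readonly int Pushes;

        public Step(string mnemonic, int pops, int pushes)
        {
            Mnemonic = mnemonic;
            Pops = pops;
            Pushes = pushes;
        }
    }

    // Walks the instructions in order and returns the largest depth reached.
    public static int MaxDepth(Step[] steps)
    {
        int depth = 0;
        int max = 0;
        foreach (Step step in steps)
        {
            depth -= step.Pops;
            if (depth < 0)
                throw new InvalidOperationException("Stack underflow at " + step.Mnemonic);
            depth += step.Pushes;
            if (depth > max)
                max = depth;
        }
        return max;
    }

    // The instructions of the add method, in order.
    public static Step[] AddMethod()
    {
        return new[]
        {
            new Step("nop", 0, 0),
            new Step("ldc.i4.s 10", 0, 1),
            new Step("stloc.0", 1, 0),
            new Step("ldc.i4.s 20", 0, 1),
            new Step("stloc.1", 1, 0),
            new Step("ldloc.0", 0, 1),
            new Step("ldloc.1", 0, 1),
            new Step("add", 2, 1),
            new Step("stloc.2", 1, 0),
            new Step("ret", 0, 0)
        };
    }

    public static void Main()
    {
        Console.WriteLine(MaxDepth(AddMethod())); // prints 2
    }
}
```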

  

The nop instruction is simply a debug-build artifact. It lets a breakpoint be placed on the curly braces, which is why this instruction normally appears only in assemblies compiled in debug mode.

Having understood the evaluation stack, we will now go through our Hello World IL code block by block.

Hello World example.
.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
  .ver 4:0:0:0
}
 

The .assembly extern declaration references an external assembly, in our case mscorlib, which contains the definitions of System.Console and System.Object, the only types from outside our assembly that the code uses.

.assembly HelloWorld
{
  //  Removed  code inside
} 

The .assembly declaration without the extern keyword declares the name of this program's own assembly.

.method public hidebysig specialname rtspecialname 
          instance void  .ctor() cil managed
  {
    // Code size       7 (0x7)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
    IL_0006:  ret
  } // end of method Program::.ctor
 

The .ctor name denotes an instance constructor; a static constructor is named .cctor (class constructor). .ctor is always marked with the specialname and rtspecialname attributes. specialname indicates that the name has a special meaning that tools may treat in their own way; for example, in C# a constructor has no return type, but in CIL it has a return type of void. rtspecialname indicates that the runtime, too, must treat the method specially, since memory needs to be allocated when a constructor is invoked.
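The constructor names and flags described above can be observed from C# through reflection. This is a small sketch; the types Sample and ConstructorNames are invented for the illustration.

```csharp
using System;
using System.Reflection;

// Hypothetical type with one static and one instance constructor.
public class Sample
{
    static Sample() { }
    public Sample() { }
}

public static class ConstructorNames
{
    // Name of the public parameterless instance constructor.
    public static string InstanceName()
    {
        return typeof(Sample).GetConstructor(Type.EmptyTypes).Name;
    }

    // Name of the static (class) constructor.
    public static string StaticName()
    {
        return typeof(Sample).TypeInitializer.Name;
    }

    // True when the instance constructor carries both special-name flags.
    public static bool HasSpecialNameFlags()
    {
        ConstructorInfo ctor = typeof(Sample).GetConstructor(Type.EmptyTypes);
        bool rtSpecial = (ctor.Attributes & MethodAttributes.RTSpecialName) != 0;
        return ctor.IsSpecialName && rtSpecial;
    }

    public static void Main()
    {
        Console.WriteLine(InstanceName());        // prints ".ctor"
        Console.WriteLine(StaticName());          // prints ".cctor"
        Console.WriteLine(HasSpecialNameFlags()); // prints "True"
    }
}
```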

The ldarg opcode loads an argument passed to the method onto the stack; ldarg.0 here means load the first argument. If you are looking closely you might wonder where the argument for this constructor is, and what it is actually trying to load. The answer is that every instance (non-static) IL method has an implicit parameter that references the current object, similar to the this keyword in C#. So for every non-static method, argument 0 is always the current object.

The call instruction then calls the base class constructor (System.Object::.ctor) on that object.
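The implicit object argument can also be seen from the caller's side. The sketch below uses System.Reflection.Emit.DynamicMethod to build a tiny method at run time; the names ImplicitThisDemo and GetLength are made up for the example. To call the instance method get_Length of System.String, the emitted code first pushes the string itself, which the callee receives as its argument 0.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class ImplicitThisDemo
{
    // Builds a method equivalent to: int GetLength(string s) { return s.Length; }
    public static Func<string, int> BuildGetLength()
    {
        var method = new DynamicMethod("GetLength", typeof(int), new[] { typeof(string) });
        ILGenerator il = method.GetILGenerator();
        MethodInfo getLength = typeof(string).GetProperty("Length").GetGetMethod();

        il.Emit(OpCodes.Ldarg_0);         // push the string; it becomes "this" for get_Length
        il.Emit(OpCodes.Call, getLength); // pops the object reference, pushes the int32 result
        il.Emit(OpCodes.Ret);             // return the value left on the stack

        return (Func<string, int>)method.CreateDelegate(typeof(Func<string, int>));
    }

    public static void Main()
    {
        Console.WriteLine(BuildGetLength()("hello")); // prints 5
    }
}
```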

 

.method private hidebysig static void  Main(string[] args) cil managed
  {
    .entrypoint
    // Code size       13 (0xd)
    .maxstack  8
    IL_0000:  nop
    IL_0001:  ldstr      "Hello World"
    IL_0006:  call       void [mscorlib]System.Console::WriteLine(string)
    IL_000b:  nop
    IL_000c:  ret
  } // end of method Program::Main

 

The hidebysig attribute means that this method hides a base class member only when both the name and the signature match (hide by name and signature), rather than hiding every base member with the same name.
In this method the "Hello World" string is first loaded onto the stack with the ldstr opcode. System.Console.WriteLine is then invoked with the call instruction, which takes the string on the stack as its argument.
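The same three instructions can be emitted from C# at run time, which is the kind of dynamic code generation mentioned in the introduction. This is a minimal sketch using DynamicMethod rather than a full dynamic assembly; HelloEmit and SayHello are names chosen for this example.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class HelloEmit
{
    // Builds a parameterless method whose body is: ldstr, call, ret.
    public static Action Build()
    {
        var method = new DynamicMethod("SayHello", typeof(void), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();
        MethodInfo writeLine = typeof(Console).GetMethod("WriteLine", new[] { typeof(string) });

        il.Emit(OpCodes.Ldstr, "Hello World"); // push the string reference
        il.Emit(OpCodes.Call, writeLine);      // pops the string and passes it as the argument
        il.Emit(OpCodes.Ret);                  // stack is empty again at return

        return (Action)method.CreateDelegate(typeof(Action));
    }

    public static void Main()
    {
        Build()(); // prints "Hello World"
    }
}
```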

Condition and Iteration in IL 

To understand how conditions and iteration work in IL, let us look at the following C# example and its corresponding IL.

C# Code :

  static void IterationExample( )
  {
      int i = 0;
      while (i < 5)
      {
          i++;
      }
  } 

IL Code : 

.method private hidebysig static void  IterationExample() cil managed
{
  // Code size       20 (0x14)
  .maxstack  2
  .locals init ([0] int32 i,
           [1] bool CS$4$0000) //STEP1
  IL_0000:  nop                //STEP2
  IL_0001:  ldc.i4.0           //STEP3 
  IL_0002:  stloc.0            //STEP4
  IL_0003:  br.s       IL_000b //STEP5
  IL_0005:  nop                //STEP12 //STEP24--and so on....
  IL_0006:  ldloc.0            //STEP13
  IL_0007:  ldc.i4.1           //STEP14
  IL_0008:  add                //STEP15
  IL_0009:  stloc.0            //STEP16
  IL_000a:  nop                //STEP17
  IL_000b:  ldloc.0            //STEP6 //STEP18
  IL_000c:  ldc.i4.5           //STEP7 //STEP19
  IL_000d:  clt                //STEP8 //STEP20
  IL_000f:  stloc.1            //STEP9 //STEP21
  IL_0010:  ldloc.1            //STEP10//STEP22
  IL_0011:  brtrue.s   IL_0005 //STEP11//STEP23
  IL_0013:  ret
} // end of method Program::IterationExample


Explanation of the above code, following the step numbers in its comments:

STEP1:

Declares two local variables, at index 0 and 1.

STEP2:

No operation (not required).

STEP3:

Loads the int32 value 0 onto the stack. (Items on Stack=1)

STEP4:

Pops the value from the stack and stores it in local variable 0. (Items on Stack=0) [i=0]

STEP5:

Branches to the target IL_000b.

STEP6:

Load local variable 0 on the stack.(Items on Stack=1)

STEP7:

Load int32 value 5 on the stack. (Items on Stack=2)

STEP8:

Pops the two items and performs a less-than comparison on them; the lower item is the left-hand operand. clt pushes 1 (true) onto the stack because 0<5. (Items on Stack=1 (2-2+1))

STEP9:

Pops the value from the stack and stores it in local variable 1, i.e. the 1 produced in the last step is stored in memory. (Items on Stack=0)

STEP10:

Loads local variable 1 onto the stack. (Items on Stack=1) [Value=1]

STEP11:

brtrue.s pops the value and branches to IL_0005 if it is non-zero. Here the value is 1 (see STEP8), so execution goes to IL_0005. If the value were 0, execution would fall through to the next line, ret, and leave the method. (Items on Stack=0)

STEP12:

No operation (not required).

STEP13:

Loads the value of local variable 0 onto the stack. (Items on Stack=1) [Value=0]

STEP14:

Loads the value 1 onto the stack. (Items on Stack=2) [Value=1]

STEP15:

Pops two items, adds them and pushes the sum onto the stack. (Items on Stack=1) [Value=1]

STEP16:

Stores the sum in local variable 0. (Items on Stack=0) [i=1]

STEP17:

No operation (not required).

Execution continues until the value tested in STEP11 is 0. It is worth noting that this is not the only way to write this method; different authors may well write it differently. For example, clt and brtrue can be replaced by a single blt instruction.
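As a sketch of the blt variant, the loop can be emitted with System.Reflection.Emit and run directly. Two things differ from the compiler output above and are choices made for this example only: the method returns i so that the result can be observed, and the long forms br and blt are used instead of the short forms. The names BltLoopDemo and CountToFive are hypothetical.

```csharp
using System;
using System.Reflection.Emit;

public static class BltLoopDemo
{
    // Builds: int CountToFive() { int i = 0; while (i < 5) { i++; } return i; }
    public static Func<int> Build()
    {
        var method = new DynamicMethod("CountToFive", typeof(int), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();
        LocalBuilder i = il.DeclareLocal(typeof(int));
        Label condition = il.DefineLabel();
        Label body = il.DefineLabel();

        il.Emit(OpCodes.Ldc_I4_0);      // i = 0
        il.Emit(OpCodes.Stloc, i);
        il.Emit(OpCodes.Br, condition); // jump to the loop test

        il.MarkLabel(body);
        il.Emit(OpCodes.Ldloc, i);      // i = i + 1
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Stloc, i);

        il.MarkLabel(condition);
        il.Emit(OpCodes.Ldloc, i);      // push i, then 5
        il.Emit(OpCodes.Ldc_I4_5);
        il.Emit(OpCodes.Blt, body);     // pops both; branches while i < 5

        il.Emit(OpCodes.Ldloc, i);      // return i
        il.Emit(OpCodes.Ret);

        return (Func<int>)method.CreateDelegate(typeof(Func<int>));
    }

    public static void Main()
    {
        Console.WriteLine(Build()()); // prints 5
    }
}
```

With blt the comparison result never has to be stored in a second local variable, so the bool local from the compiler-generated version is not needed.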

 

Points of Interest  

Since this article is already long, I am planning to write a second part explaining how to write and compile IL code directly, followed by an explanation of different C# concepts by showing how they are implemented in IL.

References

 ECMA-335 (Common Language Infrastructure) document; Pro C# by Andrew Troelsen

    

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Rishikesh_Singh
Software Developer, NCS, Singapore

Article Copyright 2012 by Rishikesh_Singh