Introduction

The focus of this article is a hack developed to implement C# 2.0 Iterators with version 1.1 of the .NET Framework. It is simply the result of a fascination with the compiler feature and of mere curiosity to see it work with version 1.1 of the Framework. The hack is based on the Iterators implementation described in chapter 22 of the C# Specification.
Beforehand, however, it should be absolutely clear to the reader that the hack is of no practical value. It is not expected or recommended that anyone actually use it for anything, certainly not with production code. The reason is not necessarily that it will fail to get the job done or that it has major performance issues. Rather, it is unwise to expose code, especially production code, to a tool that modifies the PE file yet has not been fully tested, as is the case with this tool, even less so when the tool is nothing more than a hack! Moreover, C# has this feature straight out of the box with the next major version, 2.0, of the compiler, so why even bother. The reader has been warned!
This article is meant to amuse the reader by showing how C# 2.0 Iterators can be used with version 1.1 of the .NET Framework. It is not meant to provide the reader with anything that can be used for production purposes.
First, a very brief and weak introduction to C# 2.0 Iterators will be given for those of you who are not already familiar with the feature. This article does assume that the reader is very comfortable with the Iterator design pattern and how it is implemented within .NET. Specifically, you should completely understand the Iterator type interfaces IEnumerable and IEnumerator. This introduction is then followed by a brief outline of the steps the hack takes to implement Iterators with PE files that target version 1.1 of the .NET Framework and that can be disassembled to IL code using the ildasm.exe tool that ships with the .NET 1.1 platform SDK. Note, however, that extreme caution should be taken when using PE files generated with compilers other than the C# and VB compilers that ship with Visual Studio .NET 2003.

C# 2.0 Iterators

Go read chapter 22 of the C# Specification! Just kidding, although it is highly recommended. This introduction to C# 2.0 Iterators provides only the main characteristics of the feature and its implementation. It omits various other facts that are very important, for example, those dealing with exception handling. Also, the reader should not be surprised to discover that this introduction simply paraphrases specific content of chapter 22 of the C# Specification. With all that said, the reader is better off reading and understanding chapter 22 of the C# Specification and then skipping this section.
C# 2.0 Iterators is simply a compiler feature that greatly facilitates implementation of the Iterator design pattern. Specifically, the language is given the capability to define Iterator types through ordinary functions. These Iterator functions hold iteration logic and, at compile time, are enclosed within nested Iterator types.
An Iterator function is any function that satisfies the following main criteria, although other criteria do exist and are fully documented in chapter 22 of the C# Specification:
An example of an Iterator function that yields the values 1 through 5 is:
System.Collections.IEnumerable yieldOneToFive()
{
yield return 1;
yield return 2;
yield return 3;
yield return 4;
yield return 5;
}
The code block of an Iterator function can be referred to as a
The compiler simply does the work that otherwise the programmer would need to do when implementing the Iterator design pattern in traditional fashion. In addition, however, Iterator functions can be used as coroutines. With that said, C# 2.0 Iterators not only facilitate implementation of the Iterator design pattern but also the implementation of other design patterns that benefit from being implemented using coroutines (Wesner Moise, .NET Undocumented - Iterators, Not Just for Iteration). Fortunately for the learning experience, making note of the behavior of Iterator functions will make this evident for those of you who know what coroutines are. The behavior can be described as follows, although this is certainly not an exhaustive description; for that see chapter 22 of the C# specification:
With all the above said, the Iterator function shown in the example above will result in a nested Iterator type similar but not identical to:
class _IEnumerable : System.Collections.IEnumerable, System.Collections.IEnumerator
{
bool System.Collections.IEnumerator.MoveNext()
{
switch(_state)
{
case 0: break;
case 1: goto state_1;
case 2: goto state_2;
case 3: goto state_3;
case 4: goto state_4;
default: return false;
}
_current = 1;
_state++;
return true;
state_1:
_current = 2;
_state++;
return true;
state_2:
_current = 3;
_state++;
return true;
state_3:
_current = 4;
_state++;
return true;
state_4:
_current = 5;
_state++;
return true;
}
//...the rest of the nested class definition
}
In addition, the Iterator function itself will be rewritten to return an instance of the nested Iterator type above. Perhaps something like:
System.Collections.IEnumerable yieldOneToFive()
{
return new _IEnumerable();
}
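Because the listing above elides the rest of the nested class, the following is a minimal, complete sketch of a hand-written equivalent. It assumes the field names `_state` and `_current` from the listing; the class name `OneToFive` and the `Sum` helper are hypothetical, and the type the compiler actually generates differs in its details.

```csharp
using System;
using System.Collections;

// Hand-written stand-in for the nested Iterator type (illustrative only).
public class OneToFive : IEnumerable, IEnumerator
{
    private int _state = 0;          // which yield comes next
    private object _current = null;  // last value yielded

    public IEnumerator GetEnumerator()
    {
        // A fresh instance per enumeration keeps separate foreach loops independent.
        return new OneToFive();
    }

    public object Current
    {
        get { return _current; }
    }

    public bool MoveNext()
    {
        // States 0..4 yield the values 1..5; any other state means "finished".
        if (_state < 0 || _state > 4)
            return false;
        _current = _state + 1;
        _state++;
        return true;
    }

    public void Reset()
    {
        // Resetting is not supported in this sketch.
        throw new NotSupportedException();
    }

    // Hypothetical consumer: foreach drives GetEnumerator, MoveNext and Current.
    public static int Sum(IEnumerable values)
    {
        int total = 0;
        foreach (object value in values)
            total += (int)value;
        return total;
    }
}
```

Calling `OneToFive.Sum(new OneToFive())` walks the five states in order, which is the same sequence of calls a `foreach` over the Iterator function would make.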
C# 2.0 Iterators is a feature that greatly simplifies implementation of the Iterator design pattern. It alleviates the need to explicitly create types that implement the IEnumerable and IEnumerator interfaces.
This concludes the brief, practically nonexistent, introduction to C# 2.0 Iterators. Once again, it is highly recommended that the reader understand chapter 22 of the C# Specification. What follows now is a description of a hack that enables seeing this feature in action with version 1.1 of the .NET Framework.

IteratorsHack

To see C# 2.0 Iterators in action with version 1.1 of the Framework, the following steps must be taken, at least when dealing with the hack described herein:
An example of an Iterator function the hack will process:
[Iterators.IteratorFunction]
System.Collections.IEnumerable yieldOneToFive()
{
Iterators.Yield.Return(1);
Iterators.Yield.Return(2);
Iterators.Yield.Return(3);
Iterators.Yield.Return(4);
Iterators.Yield.Return(5);
return null;
}
At a high level, the hack performs the following steps:
Be forewarned that this hack is by no means whatsoever a tokenizer, rather just a dirty parsing routine that relies solely on native
What follows now is a brief description of some, but not all, of the processing the hack performs. Specifically, a brief rundown is given of some of the IL modifications made to the Iterator functions in the course of converting them into corresponding nested types.

Instance Member Access within Iterator Functions

If the Iterator function is an instance method, the instruction stream is updated so that each occurrence of the ldarg.0 instruction is followed by an ldfld instruction that loads the _this field of the nested type, that is, the pointer to the outer type instance. To illustrate:
//this is IL code running within a non static Iterator function
//defined within type Namespace1.Class1
//push instance (this) pointer onto stack
ldarg.0
//push field _i onto stack
ldfld int32 Namespace1.Class1::_i
After the Iterator function is enclosed within the nested type, the above IL code will look something like:
//this is IL code running within the MoveNext method
//of type Namespace1.Class1/Enumerable1 which is nested
//within type Namespace1.Class1
//and represents an Iterator function
//push instance pointer onto stack
//that is, pointer to the nested type instance
//Namespace1.Class1/Enumerable1
ldarg.0
//push field _this onto stack
//which is a pointer to the outer type instance
//Namespace1.Class1
ldfld class Namespace1.Class1 Namespace1.Class1/Enumerable1::_this
//push field _i onto stack
//which is a field of the outer type
//Namespace1.Class1
ldfld int32 Namespace1.Class1::_i
The actual code that performs this processing is:
private void processMemberAccess(StringCollection sc,
IteratorMethodInfo imi, string cls)
{
if(imi.IsStatic)
return;
int n = -1;
while(++n < sc.Count)
{
string s = sc[n].Trim();
int ndx = s.IndexOf(": ");
if(ndx != -1)
s = s.Substring(ndx + 1).Trim();
if(s.StartsWith("ldarg.0"))
sc.Insert(++n, "ldfld class " + cls + " " + cls + "/" +
imi.ShortEnumerableName + "::" + THIS_FIELD);
}
}

Read/Write Access to Local Variables and Parameters of Iterator Functions

All instructions that read or write the value of local variables and parameters are processed. All local variables and parameters will eventually be elevated to field status within the nested type that represents the Iterator function. Therefore, the instruction stream is updated so that instructions that read the values of local variables and parameters are translated to instructions that read the values of the corresponding fields. Specifically, the instructions we are looking to process are:
Basically, whenever one of these instructions is encountered, the corresponding field is located based on the argument supplied to the instruction in question. Then the instruction is replaced with either an ldfld or an ldflda instruction, preceded by an ldarg.0 instruction that pushes the instance pointer. To illustrate:
//this is IL code running within a non static Iterator function
//defined within type Namespace1.Class1
//push onto stack the value of local variable v
ldloc v
//push onto stack the value of the argument supplied to parameter p
//which is the first parameter in the method’s parameter list
ldarg 1
After the Iterator function is enclosed within the nested type, the above IL code will look something like:
//this is IL code running within the MoveNext method
//of type Namespace1.Class1/Enumerable1 which is nested
//within type Namespace1.Class1
//and represents an Iterator function
//push onto stack the value of field _v
//which corresponds to local variable v
//first though push the instance pointer
ldarg.0
ldfld int32 Namespace1.Class1/Enumerable1::_v
//push onto stack the value of field _p
//which corresponds to parameter p
//first though push the instance pointer
ldarg.0
ldfld int32 Namespace1.Class1/Enumerable1::_p
The actual code that performs this processing is:
private void processLocalRead(StringCollection sc, IteratorMethodInfo imi)
{
int n = -1;
while(++n < sc.Count)
{
string s = sc[n].Trim();
int ndx = s.IndexOf(": ");
string label = string.Empty;
if(ndx != -1)
{
label = s.Substring(0, ndx + 1);
s = s.Substring(ndx + 1).Trim();
}
FieldInfo fi = null;
bool loadAddress = false;
if(Regex.IsMatch(s, @"(?:^ldloc(?:\.| ))"))
fi = getFieldInfo(s, imi, false);
else if(s.StartsWith("ldloca"))
{
fi = getFieldInfo(s, imi, false);
loadAddress = true;
}
else if(Regex.IsMatch(s, @"(?:^ldarg(?:\.| ))") &&
(imi.IsStatic || s.IndexOf("ldarg.0") == -1))
fi = getFieldInfo(s, imi, true);
else if(s.StartsWith("ldarga"))
{
fi = getFieldInfo(s, imi, true);
loadAddress = true;
}
if(fi != null)
{
sc.Insert(n++, label + " ldarg.0");
sc[n] = sc[n].Replace(s, "ldfld" +
(loadAddress ? "a" : string.Empty) + " " + fi.Type + " "
+ imi.EnumerableName + "::" + fi.Name).Replace(label,
string.Empty).Trim();
}
}
}
Once read instructions of local variables and parameters have been processed, all instructions that write to local variables and parameters are also processed, although this latter process is not as straightforward as the former. It is necessary to make sure that write instructions that operate on local variables and parameters are reflected in the corresponding fields of the nested type. In other words, if a local variable or parameter has its value set within the Iterator function, the field of the nested type that corresponds to this local variable or parameter also needs to have its value set when the Iterator function is enclosed. Specifically, the write instructions we are looking to process are:
As was the case when reading local variables or parameters, whenever one of these instructions is encountered, the corresponding field is located based on the argument supplied to the instruction in question. However, this is where the difference comes in. Read instructions are always replaced, whereas write instructions are never replaced. Instead, they are followed by additional instructions that assign the value of the local variable or parameter to the corresponding field of the nested type via the stfld instruction. To illustrate:
//this is IL code running within a non static Iterator function
//defined within type Namespace1.Class1
//store in local variable v the value that is on top of the stack
stloc v
//store in parameter p the value that is on top of the stack
//parameter p is the first parameter of the method’s parameter list
starg 1
After the Iterator function is enclosed within the nested type, the above IL code will look something like:
//this is IL code running within the MoveNext method
//of type Namespace1.Class1/Enumerable1 which is nested
//within type Namespace1.Class1
//and represents an Iterator function
//store in local variable v the value that is on top of the stack
//notice that this instruction has not been replaced
stloc v
//now store in field _v that value of variable v
ldarg.0
ldloc v
stfld int32 Namespace1.Class1/Enumerable1::_v
//store in parameter p the value that is on top of the stack
//parameter p is the first parameter of the method’s parameter list
//notice that this instruction has not been replaced
//HOWEVER, since the MoveNext method does not have a parameter list
//all parameters of the Iterator function will become local variables
//of the MoveNext method, in addition to the local variables of the
//Iterator function, more on this to come
stloc p
//now store in field _p the value of variable p
//which in the Iterator function was actually parameter p
//more on this to come
ldarg.0
ldloc p
stfld int32 Namespace1.Class1/Enumerable1::_p
So what is going on here? Why is it that instructions that write to local variables and parameters are not simply replaced, as is the case with instructions that read local variables or parameters? Why is it that the local variables of the Iterator function must also be available within the MoveNext method?
There is one simple answer to all of these questions: what has been described so far is nothing more than a hack, and as such, it takes the easy way out of a problem that would otherwise require a much more sophisticated solution. As was mentioned earlier, instance member access, whether read or write, requires that the instance pointer be pushed onto the stack beforehand. Field read operations expect the topmost value on the stack to be the instance pointer, and that is all they expect to see. On the other hand, the field write instruction stfld expects two values on the stack: the instance pointer and, on top of it, the value to store.
So the problem is where to place the ldarg.0 instruction that pushes the instance pointer. To completely replace local write instructions with stfld, that ldarg.0 would have to be inserted before the instructions that compute the value being stored, and locating that point requires tracking the evaluation stack. The hack instead keeps the local write and copies the local variable to the field immediately afterwards.
The actual code that performs this processing is:
private void processLocalWrite(StringCollection sc, IteratorMethodInfo imi)
{
int n = -1;
while(++n < sc.Count)
{
string s = sc[n].Trim();
int ndx = s.IndexOf(": ");
if(ndx != -1)
s = s.Substring(ndx + 1).Trim();
FieldInfo fi = null;
bool starg = false;
if(s.StartsWith("stloc"))
fi = getFieldInfo(s, imi, false);
else if(s.StartsWith("starg"))
{
fi = getFieldInfo(s, imi, true);
starg = true;
}
if(fi == null)
continue;
string local = fi.LocalName;
if(starg)
sc[n] = sc[n].Replace(s, "stloc " + local);
sc.Insert(++n, "ldarg.0");
sc.Insert(++n, "ldloc " + local);
sc.Insert(++n, "stfld " + fi.Type + " " +
imi.EnumerableName + "::" + fi.Name);
}
}
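The stack order that stfld expects can be observed directly. The following is a small sketch, not part of the hack, that emits the three instructions with System.Reflection.Emit; it assumes .NET 2.0 or later (DynamicMethod is not available in version 1.1), and the names `Holder`, `HolderSetter` and `StfldOrderSketch` are hypothetical.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical type with a single field to write to.
public class Holder
{
    public int Value;
}

public delegate void HolderSetter(Holder target);

public static class StfldOrderSketch
{
    // Builds a method equivalent to: target.Value = constant;
    public static HolderSetter BuildSetter(int constant)
    {
        DynamicMethod method = new DynamicMethod(
            "SetValue", typeof(void), new Type[] { typeof(Holder) },
            typeof(StfldOrderSketch).Module);
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);           // the instance pointer is pushed first
        il.Emit(OpCodes.Ldc_I4, constant);  // then the value to store
        il.Emit(OpCodes.Stfld, typeof(Holder).GetField("Value")); // pops both
        il.Emit(OpCodes.Ret);
        return (HolderSetter)method.CreateDelegate(typeof(HolderSetter));
    }
}
```

The instance pointer has to be on the stack before the value is computed, which is why the hack stores to the local first and only then emits ldarg.0, ldloc and stfld as a self-contained group.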

Yielding Values From Iterator Functions

Now the heart of the matter has been reached, that is, the point at which the instruction stream is updated in response to all calls to the Iterators.Yield.Return method.
To illustrate:
//this is IL code running within a non static Iterator function
//defined within type Namespace1.Class1
//yield the value 1 to the caller
ldc.i4.1
box [mscorlib]System.Int32
call void [Iterators]Iterators.Yield::Return(object)
//yield the value 2 to the caller
ldc.i4.2
box [mscorlib]System.Int32
call void [Iterators]Iterators.Yield::Return(object)
After the Iterator function is enclosed within the nested type, the above IL code will look something like:
//this is IL code running within the MoveNext method
//of type Namespace1.Class1/Enumerable1 which is nested
//within type Namespace1.Class1
//and represents an Iterator function
//branch to state 1 if necessary
//that is, the state of the Iterator
//after the first yield is encountered
ldc.i4 1
ldarg.0
ldfld int32 Namespace1.Class1/Enumerable1::_state
beq _STATE_1
//otherwise, execution begins here
//yield the value 1 to the caller
//store the value in the local variable current
ldc.i4.1
box [mscorlib]System.Int32
stloc current
//set the _current field to the
//value of the local variable current
ldarg.0
ldloc current
stfld object Namespace1.Class1/Enumerable1::_current
//set the local variable result to true
//this variable holds the value returned by MoveNext
ldc.i4.1
stloc result
//set the _state field to 1
//so that next time MoveNext is invoked,
//execution will begin at the instruction labeled _STATE_1
ldarg.0
ldc.i4 1
stfld int32 Namespace1.Class1/Enumerable1::_state
//exit MoveNext
br _EXIT_ITERATOR_BLOCK
//execution will begin here when the _state field equals 1
_STATE_1: nop
//yield the value 2 to the caller in the same manner
//that the value 1 was yielded, always updating the state machine
The actual code, which is in need of refactoring, that performs this processing is:
private int processMethodYields(StringCollection sc, int index,
ref int tryBlockIndex, IteratorMethodInfo imi,
ref int state, int instrIndex)
{
bool tryBlock = sc[index == 0 ? 0 : index - 1].Trim().StartsWith(".try");
bool finallyBlock =
sc[index == 0 ? 0 : index - 1].Trim().StartsWith("finally");
bool validYldBlock = index == 0 || tryBlock;
string tryBlockLabel = string.Empty;
int tryInstrIndex = 0;
if(tryBlock)
{
tryBlockLabel = TRY_BLOCK_LABEL + (tryBlockIndex++).ToString();
sc.Insert(++index, tryBlockLabel + ": nop");
tryInstrIndex = index + 1;
}
else if(finallyBlock)
{
int i = index;
while(!sc[++i].EndsWith(" endfinally"));
string endFinallyLabel = sc[i].Substring(0, sc[i].IndexOf(":")).Trim();
sc.Insert(++index, "ldc.i4.1");
sc.Insert(++index, "ldloc " + imi.MoveNextResultLocal);
sc.Insert(++index, "beq " + endFinallyLabel);
}
while(true)
{
string s = sc[++index].Trim();
if(s == "{")
index = processMethodYields(sc, index,
ref tryBlockIndex, imi, ref state, instrIndex);
else if(s.StartsWith("}"))
return index;
else if(validYldBlock && Regex.IsMatch(s,
@"(?:call +void +\[Iterators\]Iterators\.Yield::Return\(object\))"))
{
sc[index] = s.Substring(0, s.IndexOf("call ")) +
"stloc " + imi.CurrentLocal;
sc.Insert(++index, "ldarg.0");
sc.Insert(++index, "ldloc " + imi.CurrentLocal);
sc.Insert(++index, "stfld object " +
imi.EnumerableName + "::" + CURRENT_FIELD);
sc.Insert(++index, "ldc.i4.1");
sc.Insert(++index, "stloc " + imi.MoveNextResultLocal);
sc.Insert(++index, "ldarg.0");
sc.Insert(++index, "ldc.i4 " + (++state).ToString());
sc.Insert(++index, "stfld int32 " +
imi.EnumerableName + "::" + STATE_FIELD);
sc.Insert(++index, (tryBlock ? "leave" : "br") +
" " + EXIT_ITERATOR_BLOCK_LABEL);
string stateLabel = STATE_LABEL + state.ToString();
sc.Insert(++index, stateLabel + ": nop");
sc.Insert(instrIndex++, "ldc.i4 " + (state).ToString());
index++;
tryInstrIndex++;
sc.Insert(instrIndex++, "ldarg.0");
index++;
tryInstrIndex++;
sc.Insert(instrIndex++, "ldfld int32 " +
imi.EnumerableName + "::" + STATE_FIELD);
index++;
tryInstrIndex++;
sc.Insert(instrIndex++,
"beq " + (tryBlock ? tryBlockLabel : stateLabel));
index++;
tryInstrIndex++;
if(!tryBlock)
continue;
sc.Insert(tryInstrIndex++, "ldc.i4 " + (state).ToString());
index++;
sc.Insert(tryInstrIndex++, "ldarg.0");
index++;
sc.Insert(tryInstrIndex++, "ldfld int32 " +
imi.EnumerableName + "::" + STATE_FIELD);
index++;
sc.Insert(tryInstrIndex++, "beq " + stateLabel);
index++;
}
}
}
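To see how the pieces fit together at the source level, here is a hedged C# rendering of the kind of state machine these IL rewrites produce for an Iterator function that yields the integers from one bound to another in a loop. The names `CountEnumerable`, `_from`, `_to` and `_i` are hypothetical. Unlike the hack, which keeps locals in MoveNext and copies them to fields, this sketch reads and writes the fields directly.

```csharp
using System;
using System.Collections;

// Sketch of a nested Iterator type for:
//   for (int i = from; i <= to; i++) Yield.Return(i);
public class CountEnumerable : IEnumerable, IEnumerator
{
    private int _state = 0;   // 0 = not started, 1 = resume after the yield, -1 = done
    private object _current;  // value handed to the caller
    private int _from;        // parameter elevated to a field
    private int _to;          // parameter elevated to a field
    private int _i;           // local variable elevated to a field

    public CountEnumerable(int from, int to)
    {
        _from = from;
        _to = to;
    }

    public IEnumerator GetEnumerator()
    {
        return new CountEnumerable(_from, _to);
    }

    public object Current
    {
        get { return _current; }
    }

    public void Reset()
    {
        throw new NotSupportedException();
    }

    public bool MoveNext()
    {
        bool result = false;
        // Dispatch on the saved state, as the inserted beq instructions do.
        switch (_state)
        {
            case 0: break;
            case 1: goto state_1;
            default: return false;
        }
        _i = _from;
        goto check;
    state_1:
        _i++;
    check:
        if (_i <= _to)
        {
            _current = _i;   // the yielded value
            _state = 1;      // resume at state_1 next time
            result = true;
            goto exit;
        }
        _state = -1;         // the function returned: iteration is over
        result = false;
    exit:
        return result;
    }
}
```

A `foreach` over `new CountEnumerable(2, 4)` resumes at the `state_1` label on every call after the first, so the loop variable survives between calls only because it lives in a field.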
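
Terminating Execution of Iterator Functions

The instruction stream is updated so that all ret instructions are replaced with instructions that pop the return value off the stack, set the MoveNext result to false, set the _state field to -1, and branch to the exit label.
To illustrate:
//this is IL code running within a non static Iterator function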
//defined within type Namespace1.Class1
//exit the method
//we know that either null
//or a value of type IEnumerable
//is the only value on the stack
ret
After the Iterator function is enclosed within the nested type, the above IL code will look something like:
//this is IL code running within the MoveNext method
//of type Namespace1.Class1/Enumerable1 which is nested
//within type Namespace1.Class1
//and represents an Iterator function
//pop off whatever value is on the stack,
//either null or a value of type IEnumerable
pop
//store the value false in the local variable
//that holds the result of the MoveNext method
ldc.i4.0
stloc result
//store the value -1 in the _state field
ldarg.0
ldc.i4 -1
stfld int32 Namespace1.Class1/Enumerable1::_state
//unconditionally branch to the instruction
//labeled _EXIT_ITERATOR_BLOCK
br _EXIT_ITERATOR_BLOCK
The actual code that performs this processing is:
private void processMethodReturns(StringCollection sc, IteratorMethodInfo imi)
{
int n = -1;
string result = imi.MoveNextResultLocal;
while(++n < sc.Count)
{
string s = sc[n].Trim();
int ndx = s.IndexOf(": ");
string label = string.Empty;
if(ndx != -1)
{
label = s.Substring(0, ndx + 1);
s = s.Substring(ndx + 1).Trim();
}
if(s != "ret")
continue;
sc.Insert(n++, (label == string.Empty ? label : label + " ") + "pop");
sc.Insert(n++, "ldc.i4.0");
sc.Insert(n++, "stloc " + result);
sc.Insert(n++, "ldarg.0");
sc.Insert(n++, "ldc.i4 -1");
sc.Insert(n++, "stfld int32 " + imi.EnumerableName + "::" + STATE_FIELD);
sc[n] = "br " + EXIT_ITERATOR_BLOCK_LABEL;
}
}
Here we conclude the discussion of some of the IL hacking performed by this hack.

Demos

Two versions of the same demo are provided with the source code of this article, one written in C#, the other in VB. The demo is a simple Windows application that demonstrates the use of a recursive Iterator by populating a
The C# version of the recursive Iterator function is:
[IteratorFunction]
private IEnumerable getDirectories(DirectoryInfo dir)
{
foreach(DirectoryInfo dir1 in dir.GetDirectories())
{
Yield.Return(dir1);
foreach(DirectoryInfo dir2 in getDirectories(dir1))
{
Yield.Return(dir2);
}
}
return null;
}
The VB version of the recursive Iterator function is:
<IteratorFunction()> _
Private Function getDirectories(ByVal dir As DirectoryInfo) As IEnumerable
For Each dir1 As DirectoryInfo In dir.GetDirectories()
Yield.Return(dir1)
For Each dir2 As DirectoryInfo In getDirectories(dir1)
Yield.Return(dir2)
Next
Next
End Function
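For comparison, this is roughly how the same recursive Iterator reads with the C# 2.0 compiler, where neither the attribute nor the trailing `return null` is needed. It is a sketch only; the class name `DirectoryWalker` is hypothetical and the method is made static so that it stands alone.

```csharp
using System.Collections;
using System.IO;

public static class DirectoryWalker
{
    // Yields every directory below dir, depth first, parents before children.
    public static IEnumerable GetDirectories(DirectoryInfo dir)
    {
        foreach (DirectoryInfo dir1 in dir.GetDirectories())
        {
            yield return dir1;
            // Re-yield everything produced by the nested Iterator.
            foreach (DirectoryInfo dir2 in GetDirectories(dir1))
            {
                yield return dir2;
            }
        }
    }
}
```

Each level of recursion creates its own Iterator instance, so a deep directory tree results in a chain of nested enumerations.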

Final Concluding Thoughts

C# 2.0 Iterators is a powerful programming feature that facilitates implementation of the Iterator design pattern, although there are many other uses for Iterators, ones that have nothing to do with iteration per se.
Thanks for taking the time to read the article. I hope you enjoyed it as much as I enjoyed writing it. If there are any inaccuracies, a definite possibility since I am no expert in any field, please disclose. Peace out homies and until next time, God willing of course!

References