Some Useful Concurrency Classes and A Small Testbench

Useful concurrency classes and small test bench in C#

Introduction

This is the first article I have written for this forum. I've enjoyed and benefited from the many wonderful articles on The Code Project and thought it might be a good time for me to contribute something. I've recently finished a long-term project for a client (actually I'm still tweaking DOCs and a few minor things) who has graciously granted permission to place a generic version of the developed code in the public domain. This was a very large project that employed C#/.Net (to the extent possible) to integrate legacy code in a platform-independent fashion. The application is NDE (Non-Destructive Evaluation) and involves the coordination of multiple real-time processing assets and sophisticated analysis functions that operate on 2-D, 3-D and higher-dimensional datasets. I'll talk just a bit later on about the application, but not in detail unless somebody asks me to.

Originally, I was to play the role of a project manager for an "in-house" project that would be augmented with outside contractors to supply some of the .Net programming expertise. Unfortunately, when the project actually geared up, we found a dearth of capable .Net programmers. I ended up hiring a project manager and doing most of the system architecture work myself, as an individual contributor. I also performed much of the programming of the framework that was developed to support the application. My involvement in the actual software development ended up being much more substantial than was originally anticipated. This was a tough and long-winded assignment for me. In recompense, I learned a great deal from some very brilliant (and kind) people, so I am thankful to have been involved in this major project. I will talk about the overall framework a little bit in a following section, just to give perspective on why we did (and are doing on the generic project) things in certain ways.

I've been trying to figure out which part of the project code to talk about and release first. My problem has been that I want to have everything perfect before releasing anything in order not to embarrass myself. After a few weeks of cleaning up one thing and then another, then another, I realized I might never get anything released until I was too senile to remember what the project was about. So I decided to pick a simple concept and try to write something about it that might be useful to people and put some code up on The Code Project website just to get my feet wet.

Even after this landmark decision, it's taken a while to disentangle namespace dependencies and refactor things a bit so I did not have to put all of the code up here at once in order to run anything. I've actually had to stub out a few references and/or replicate functionality locally, but the download size is now manageable, so I guess everything worked out fine. I was originally going to put the code for our service manager up here, but then I remembered that I wanted to introduce some concurrent processing so I could initialize some services on background threads. So I decided to work for a bit on concurrency management and this ended up being a project all in itself. So I guess the service manager will be article number two or maybe three.

In writing this article, I have taken a tutorial approach. I've been training folks with varied backgrounds who are new to .Net for a while now. Usually I lose people if I don't provide a bit of explanation of how .Net compares to other platforms and also explain some of its more advanced features. I've tried to provide some tutorial explanations as they are needed when I describe various code features. I hope this is useful. The code documentation is comprehensive. I use my code for teaching and thus it must be so. I should also say that there are some advanced (arcane??) constructs used at different times. We were pushing the envelope of managed code on this project to a large extent and we just needed to do some strange things. I've tried to note in the code where you probably should not do certain things unless you really need to. The reader might also note that there is a great deal of duplication in the various code examples. Sensible design practices (OO and otherwise) would dictate that duplication of code be minimized. This would entail factoring and inheriting and all the other good stuff that smart programmers do. However, it's difficult to demonstrate different variations on a theme if there is a complex inheritance or composition structure in the demonstration code. So, I like to keep things flat in the examples unless I am trying to demonstrate some specific structural design.

OK, time to get serious...

Background

The NDE field is characterized by the need to handle massive amounts of data derived from various types of sensors (ultrasound, X-ray, electro-optical, etc.) that are formed into different types of datasets for analysis. In a production environment, the NDE system will typically need to coordinate the activities of multiple processors of multiple types, all working toward the common goal of performing an accurate analysis of the quality of a certain material or part under inspection.

The NDE data processing world today involves massive parallelism. The current state-of-the-art in data processing and transfer is exemplified by products like those produced by Texas Memory Systems (http://www.texmemsys.com). A network of parallel processing nodes provides 192 GFlops of processing power with 16 GBytes/sec of shared storage bandwidth at a cost of somewhere under $250K. In such systems, data is processed quickly and moves quickly. Our framework was developed to take advantage of these modern processing assets.

We want to distinguish between "embedded" and "real-time" systems. Our NDE systems are not embedded in another system, though they have a hard real-time requirement. Not all embedded systems have a real-time requirement, though many do. Definitions differ, but in our context "hard" real-time means that a missed "deadline" is never acceptable. Embedded processors in consumer products often have a "firm" or "soft" real-time requirement, specifying that they must keep up with processing only on an average basis.

C#/.Net was selected to implement a framework for integrating/rewriting legacy image processing and analysis algorithms and also, to the extent possible, to control high-speed processing assets. Much of the time-critical data processing is done in special-purpose machines, with their own execution systems. Much functionality is also implemented in the "Host" processors, having to do with moving data and performing high-level processing, analysis and display. There is a great deal of middle ground here, where the boundary between managed code and unmanaged code was not clear. As the project has evolved, we've learned a bit more about how to decide such things. Hopefully we'll have a chance to share some of this knowledge in a few articles.

There are a number of good reasons why existing implementations of .Net are not directly suitable for many real-time operations. First, the VES imposes some additional overhead as compared to unmanaged code. Our experience with this is that some standard operations can take up to four times longer, or so. Lots of things run much faster, though, and various optimizations can be performed for specific applications.

A more important reason has to do with memory management and garbage collection. First, many hard real-time systems can't afford to have the garbage collector run during a processing loop - ever. In some time-critical applications, an object falling out of scope and becoming a candidate for garbage collection must be considered equivalent to a memory leak in an unmanaged code application. One simply cannot afford the time for a collection. Purveyors of .Net technology are rather cavalier about simply letting the garbage collector take care of things behind the scenes, since it "runs so fast". Well, there's fast and then there's FAST. Often real-time applications cannot spare even a few milliseconds. As a result, an execution environment must allow the developer to write code that effectively reuses data items without discarding too many over the life of the application run. Keep in mind that these real-time systems often run millions of iterations in a typical processing task. In an unmanaged implementation, one instance of a malloc and free inside a processing loop, even on a small data item, can ruin one's whole day. Technically, it shouldn't if the heap manager were designed right, but... Consequently, many real-time programmers spend a lot of time designing pooling schemes and custom memory managers. The facilities of a general-purpose OS are generally not designed to do this. Real-time OS's handle some of this, but one still must write the application code in such a way as to require minimal intervention by memory management facilities.

There are a variety of manufacturers who make OS extenders (some folks call them "shims" - another overused term) that are designed to augment an OS to make it more useful for real-time applications. These extenders usually handle low-level hardware synchronization details and prevent ongoing processing tasks from being interrupted by the main OS at an inopportune time. This idea works fine if the main OS has to do only a few simple things during the processing loop, like maybe send a message to the screen indicating the status of the real-time operation. It does not work if the OS needs to do serious work (like managing memory) that has to fit between strokes of a cutting head (or whatever).

Problems with implementations of garbage collectors in .Net are described in [Lutz03] and [Zerzelidis05], for example, and still have not been adequately addressed. One of the major problems with garbage collection is the problem of latency of control. When a data buffer full of scanned data needs to be unloaded or a tool head needs to be moved, the execution environment has to respond right away. If the OS is doing a garbage collection or something similar, it has to stop immediately and service the real-time request. Even if the average time the garbage collector spends in compacting memory could be tolerated, its work must be interruptible to service real-time requests almost instantaneously. [Zerzelidis05] points out this problem. An interesting report [NIST99] that predates the emergence of .Net covers some of the same issues for the Java platform. This report defines some of the terminology that has been adopted in more recent studies. It also helped drive the definition for the Real-Time Specification for Java (RTSJ).

When we were first planning this project, we looked at a number of real-time OS's and OS extenders. We found none of these technologies that supported .Net (and C#) in a platform-independent way. One solution is to do a lot of processing in unmanaged code and coordinate that processing through either Pinvoke or COM interop. That is the approach we took in our project (mostly Pinvoke, since COM support was not there on Mono) and it worked fine. One just needs to keep the amount of processing done in unmanaged code high relative to the amount of time spent in performing Pinvokes. Pinvokes are costly, but not THAT costly, especially if any data transferred is in the form of blittable Types. Packing data and control information into a single native int or byte array with mapping/unmapping helps considerably.
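To give a flavor of this, here is a minimal sketch of a Pinvoke declaration operating on blittable Types. The DLL name, entry point and packing scheme are purely illustrative assumptions, not our actual interface:

C#
using System;
using System.Runtime.InteropServices;

class PinvokeSketch
{
    // Hypothetical entry point in a native processing library. An int[]
    // is a blittable Type, so the marshaler can pin the array and pass
    // a raw pointer rather than copying the data across the boundary.
    [DllImport("NativeProcessing.dll")]
    static extern int ProcessBuffer(int[] buffer, int length, int packedControl);

    static void Main()
    {
        int[] buffer = new int[65536];
        // Pack a command code and a channel number into one native int
        // to keep the Pinvoke signature (and its cost) small.
        int packedControl = (1 << 16) | 42;
        int status = ProcessBuffer(buffer, buffer.Length, packedControl);
        Console.WriteLine("Native status: " + status);
    }
}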

In the NDE field there is a high premium placed on data integrity and correctness of results (how could a flawed data collection/analysis system be expected to perform Quality Assurance inspections?). In many cases, users of the system will expect that there will never be one single mistake in the various data manipulations performed, ever. NDE applications are not alone in this requirement. A computer-controlled machine tool must perform its operations on the correct schedule, every time, or the part being manufactured could be destroyed. This fact drove the choice of concurrency management techniques we adopted for our work.

Options for Concurrency Management

Our options for concurrency management techniques have always been limited by the fact that a major goal of the project was platform independence. We've spent significant effort over the past couple of years 1) researching the Microsoft DOCs to see how a certain thing could be accomplished in .Net, then 2) checking whether it was spelled out the same way in ECMA and 3) checking whether Mono had it yet. With regards to Mono, it was never enough to check if it was in their Docs, but whether their compiler and CLR actually implemented it fully and correctly. We've been burned a couple of times by this. For example, their compiler accepts the new [SecurityCritical] attribute from the Security namespace, but doesn't do anything with it and doesn't inform the code author of this fact!! Don't misinterpret our comment - the effort by everyone working on Mono is spectacular. To be developing an open-source version of .Net functionality substantially in parallel with Microsoft (or soon after) is a wonderful accomplishment and a noble goal. It's just that one must spend a little time checking the status of things.

When our company was involved in product development, we built a major multi-platform product under a government contract. It was so difficult that it almost brought the company to its knees. The tools just weren't there for this type of effort. The platforms were PCs running Win32 (just out) connected to SUNs running Solaris communicating through PC-NFS. Reconciling code across the two platforms was primarily a manual operation and cross-platform testing was a nightmare. The network always seemed to be down. So things are much better these days - we're not complaining :-)

Prior to .Net 2.0, things like Semaphores were not available in managed code so we relied on rather simplistic locking mechanisms using Mutexes and Monitors. As a result of the platform issues, we are still staying away from anything that is not verified as stable on Mono. We don't need anything sophisticated at the moment, since our synchronization requirements in managed code are fairly simple. Managed thread synchronization is done currently within our framework's managed code, but in a piecemeal way. We've rewritten some simple concurrency management code under 2.0 to provide something that is a bit cleaner and more reusable. To the extent possible, we've stayed entirely within the Virtual Execution System (VES) using Monitor et al. At the moment, all we need to do is run a few services that are not time-critical on background threads. However, we are anticipating the conversion of more real-time control processing into managed code, so we want the work to be reusable for this purpose. We'll see how this goes....

Retry Loops

While studying up on System.Threading under .Net 2.0, we had occasion to stumble across the methods of the Interlocked class. We'd seen them (and used them) before, but we noticed a certain exposition concerning the ADD method that we found troublesome. (The ADD method is new in 2.0.) There were apparently several other people who noticed that the description was at least peculiar, based on some forum entries that have appeared. We are starting the technical discussion with an explanation and a demonstration of the technique that was described in this piece and why it's not a good idea to use it. Or, better said, why one needs to be very careful with it. We have a bit of heartburn when we see its use advocated in a cavalier fashion, since we go through such great pains to use it safely and correctly.

The exposition in question describes how an atomic CompareExchange operation can be used in a loop to build an ADD function that is also atomic in SOME sense. For the uninitiated, the concept of an "atomic" operation is part of the jargon employed in multiprocessing (and in other sub-disciplines of the computer field) to describe an operation that is guaranteed to take place all at once. The term is commonly applied when referring to a procedure that reads a data item, modifies it somehow and then writes it back, all in one operation. What is meant by this is that no one else gets to touch the data item during this sequence of three steps. Contrast this to an ordinary sum = sum + summand; instruction in C#. This operation is not guaranteed to be atomic - in fact most compilers treat it as three separate steps of read - modify - write. In multi-threaded applications, the sequence can be pre-empted by other threads accessing the very same variable in between the steps. The idea behind atomic operations is that this cannot happen, the instruction normally being mapped to a hardware instruction for a particular processor that locks memory for the duration of the read - modify - write operation. Atomic operations (and concurrency management in general) are becoming more important with the advent of Hyperthreaded and multichip CPUs. These issues are no longer solely within the purview of the multiprocessing folks, so it's worthwhile to discuss them just a bit.
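To make the distinction concrete, here is a minimal sketch (our own illustration, not code from the exposition in question) contrasting the non-atomic C# addition with the truly atomic Interlocked.Add:

C#
using System;
using System.Threading;

static class AtomicityDemo
{
    static int s_sum;

    // NOT atomic: this compiles to separate read, modify and write
    // steps, so another thread can touch s_sum between the steps and
    // an update can be lost.
    static void UnsafeAdd(int summand)
    {
        s_sum = s_sum + summand;
    }

    // Atomic: Interlocked.Add (new in .Net 2.0) performs the whole
    // read-modify-write as one indivisible operation.
    static void SafeAdd(int summand)
    {
        Interlocked.Add(ref s_sum, summand);
    }
}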

The implementation of an ADD function suggested in the descriptive piece in question is shown in generic (not .Net Generic) form in figure 1.


Figure 1. "Retry Loop" Caller Logic.

In this figure, the CompareExchange method, which is a true atomic operation, is used in a loop to iteratively implement an addition. We call this a "pseudo-atomic" operation, since it is not truly atomic. I've taken a tiny bit of license with this diagram, since I wanted to illustrate the general idea of the retry loop. We'll present the code a bit later. The loop works as follows:

  1. The Target to be modified is read from memory and stored locally.
  2. A modification is made to the local copy of the Target (in this case the "OP" adds a number) and it is stored in another local variable.
  3. The original Target's memory location is locked.
  4. A check is made to see if the current value in the original memory location is different from the local copy.
  5. If it is not different, the modified Target's value (in this case the sum) is written back to the original Target's memory location.
  6. If the values differ, record the freshly read value of the Target and try the OP again.

This procedure was discussed at least as early as [Herlihy93]. In this paper, he discusses the use of "retry loops" (our quotes) that attempt an operation that changes some target object, then use an atomic operation to confirm and commit the change if things go as planned. If a "collision" is detected by the atomic operation, the OP is retried on the new state of the target, with the retry loop being executed again and again until it succeeds. "Back in the day", many discussions of non-blocking synchronization techniques such as this focused on database applications. It makes sense that many database operations (e.g. adding money to an account) could just be tried again if collisions were detected. Some can't, though. If the transaction is a withdrawal, for example, it's probably wise to check for an overdrawn condition before blindly making another withdrawal. Thus, it is critical to place the check for a negative balance inside the retry loop, not simply to perform it once before entering the loop (a sketch appears below).

For more general applications, one needs to understand how this technique works and when it is useful. Let's get back to the case at hand and consider the situation when multiple threads are all trying to add numbers to a common target. This might be useful when a tally is being made of the number of operations of another type each thread has performed. The add operation may function correctly, depending on one's definition of the sum. Addition is commutative and the resulting sum does not depend on the order in which the summands appear (1+2 = 2+1). If the loop executes more than once, it just updates the sum until it has detected no further collisions.

We say "MAY" function correctly. It depends on what is expected to be returned as the result of each thread's access to the integer. Is it the CURRENT state of the sum when the pseudo-atomic operation is entered? If so, the results will almost surely be incorrect if there are any collisions (unless someone is adding 0's). The only way it will be correct is if the user expects the result to be the ultimate value of the sum that has been continually updated by sidestepping any collisions that may have occurred within the loop. Furthermore, this result would be closely tied to the pseudo-atomic operation implementation. It could be made correct by capturing the FIRST sum calculated and saving it for eventual return to the caller, while still continuing with the update loop so that the final sum is calculated correctly. Why would one possibly care if the sum returned reflects the state at one instant or one millisecond before? In most cases it would be irrelevant. It is not irrelevant in our application, since the correctness of our results often depends on the timing and ordering of specific operations.
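Returning to the withdrawal example, here is a hedged sketch (hypothetical names and a plain Int32 balance, purely for illustration) of a retry loop that re-validates its precondition on every pass:

C#
using System.Threading;

static class AccountDemo
{
    // A hypothetical balance shared between threads.
    static int s_balance = 100;

    // The overdraft check sits INSIDE the loop, so it is re-applied
    // against the freshly observed balance after every collision.
    static bool TryWithdraw(int amount)
    {
        int observed = s_balance;
        while (true)
        {
            if (observed < amount)
                return false;  // would overdraw - abandon the OP.
            int updated = observed - amount;
            int actual = Interlocked.CompareExchange(
                ref s_balance, updated, observed);
            if (actual == observed)
                return true;   // no collision - the withdrawal committed.
            observed = actual; // collision - retry against the new state.
        }
    }
}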

Another point to consider is that the correctness of the concurrent addition process just described is contingent on the fact that all threads are accessing the data item with an add operation. Let's, instead, say that one thread is resetting the number to zero if it exceeds a certain value and the rest are adding, as usual. A moment's thought will convince the reader that the results will not be the same as if true atomic operations were involved. Worse yet is the case when a main thread (say) is performing a tally of how many times other threads have accessed the shared data item since the main thread's last access. Obviously, retry loops are not appropriate for performing audits of any characteristic based on timing of thread access. Some seasoned authors who embrace this technique have argued that only in rare cases will the loop execute more than once. This may be true, but it's not guaranteed. Furthermore, in our application, even one collision (as defined above) is entirely unacceptable. When we perform an operation that we are counting on to be atomic, we need to lock the data so that no one else can read or write it.

Let me reiterate that retry loops are very useful in many applications. They are a non-blocking technique and are very useful (when applied correctly) because they can eliminate deadlocks. We implement techniques similar to those described in [Anderson97] to make effective use of retry loops in our hard real-time application. One just has to think things through and make sure that the results are correct in a particular scenario. This is a very good place to start this article, because it demonstrates the need for a verification methodology that can reveal problems in the application of a given synchronization technique. We will get our start by demonstrating how our simple testbench can shed some light on the operation of retry loops.

A Simple Demonstration

Note the use of the term "MultiProcess" in what follows. This appears in various places in the code, since it is the base name of our processing abstraction Types. In this article, we use both terms "threads" and "MultiProcesses". From our perspective, a thread is but a particular implementation of a MultiProcess. MultiProcesses can also be mapped to hardware, but we obviously can't post hardware on the Code Project, now can we?? Seriously, though, we try to make the fact that an algorithm is operating on an attached piece of hardware versus a thread transparent to the user. Obviously we have implemented everything in this article with System.Threading.Threads. We use the Capitalized version (Thread) when we want to refer to managed threads (System.Threading.Threads) in .Net.

In this section, we will demonstrate a simple test of the pseudo-atomic operation style that uses a retry loop. We don't use it in our implementation, but it's a simple example to start with to demonstrate concepts. We display the code implementing our retry loop in listing 1. The method shown is part of the test class QATester_MultiProcessing and accepts a delegate to perform a binary operation on a System.Int32. The definition of the delegate (which we use quite a bit) can be found in AtomicUtils_Article.cs. For those readers unfamiliar with C# delegates, they are very similar to function pointers in the C languages. C# delegates, however, include a specification for the Types of parameters and return value (if any). Any method matching the pattern in the delegate definition (including the Types) may be passed as a parameter wherever a given delegate is called for. There is again a bit of confusion concerning nomenclature, since C#'s delegate keyword is but a compiler token that allows access to the fundamental CLR Type System.Delegate. We try to differentiate the two by using case and color (Delegate vs. delegate).

For a discussion of how the C# language compiler maps the C# delegate into the rather complex facilities of System.Delegate, see [Lowy05] (pg. 131). [Duffy06] (pg. 519) provides a deep treatment of a System.Delegate's representation inside the .Net Common Type System (CTS). Our delegate in listing 1 accepts a source parameter and a byRef target parameter which is to be updated. As can be seen in the listing, the logic of the method centers on the call to CompareExchange. The CompareExchange method provides a true atomic operation, comparing the third parameter with the first (the target) and replacing the target with the second (the opResult) if and only if the comparison succeeds. The CompareExchange method always returns the value of the first parameter at the instant it is called. This allows the operation to be tried on the updated value over and over again until it succeeds. The RetryLOOPPseudoAtomicInt32OpCaller method returns a value indicating whether or not a collision occurred (whether another Thread snuck in and changed our Int32 when we weren't looking). The code shown here allows us to assign different operations to different Threads and also wrap delays inside the OPs to simulate Thread timing and loading dynamics.

C#
// Delegate for a binary Int32 operation.
public delegate System.Int32 Int32Op(ref System.Int32 target,
                                     System.Int32 source);

// Caller for an Int32 OP.
internal static System.Boolean RetryLOOPPseudoAtomicInt32OpCaller(
    ref System.Int32 target,
    System.Int32 source, Int32Op oP, out System.Int32 lastValue)
{
    // This variable stores a local copy of the original value of the target
    // that is read before the OP and the atomic exchange. It is used to
    // compare with the most recently fetched value from the atomic exchange
    // to see if it has been modified by another Thread.
    System.Int32 workingValue = target;
    // This variable stores a local copy of the most recent value of the
    // target as returned from the atomic exchange operation.
    System.Int32 lastUpdatedValue = target;
    // This variable will be set to true if we undergo a collision.
    System.Boolean hadCollision = false;
    // Result for our oP.
    Int32 opResult;

    // Repeat until we get no more collisions.
    do {
        // Set the working value to the last update.
        workingValue = lastUpdatedValue;
        // Perform the OP on the local opResult variable.
        opResult = workingValue;
        oP(ref opResult, source);
        // Make the switch only if the target has not changed from our
        // last working value. lastUpdatedValue always receives the current
        // value of target as returned by CompareExchange.
        lastUpdatedValue
            = Interlocked.CompareExchange(ref target, opResult, workingValue);
        // Report progress if we want....
        if(QATester_MultiProcessing_Testdata.s_reportToScreen)
        {
            Console.WriteLine(" >Processed a Number on ManagedThreadID #: "
                + Thread.CurrentThread.ManagedThreadId.ToString());
        }
        lastValue = lastUpdatedValue;
        // If the two values are different, this means that we had a collision.
        if(lastUpdatedValue != workingValue) hadCollision = true;
        // Keep LOOPing and RETRYing if we have had a collision.
    } while(lastUpdatedValue != workingValue);
    return hadCollision;
}

Listing 1. Caller Method for a Binary Int32 Operation

In figure 2 we display some results of running the retry loop shown above on a few Threads. We will describe and diagram the full architecture that we employ to support such tests a bit later in the article. For the moment, we'd like to introduce it with this simple example.

Our testbench has the ability to create and start a number of worker Threads, each running a "caller" such as RetryLOOPPseudoAtomicInt32OpCaller a specified number of times, with the same target. Please forgive the long-winded name of the method. There are a large number of experimental "callers" in the testbench and the descriptive names help keep them straight. (They are all included in the download.) In the first experiment in our test set, we create a total of five "spawned" Threads, in addition to the main Thread. All our tests are designed to optionally do some processing in the main program in addition to processing on a user-specifiable number of Threads that are created and started by the main program (referred to as the "Main Thread" in what follows). This type of arrangement is also used to simulate a situation where a "host" is acting as a controller of some parallel processing assets. The "ManagedThreadID" printed in the output is nothing but the ManagedThreadID property pulled off the various executing Thread class instances. "Worker Thread #n" just indicates the n'th Thread that the main program has fired up.

The testbench includes a random number generator that is used to synthesize both random delays and random data for the tests. The random data is usually applied to the "source" parameter of the caller methods, which is done in the test shown here. In this test, random delays have been applied inside the oP delegate. This first test uses a simple replacement operation as the delegate, which simply replaces the target with a new randomly-generated source integer. The only reason we need the source of integers here is to tell if the numbers have been overwritten or not. In this case, we are not doing any examination of the results outside of the retry loop caller. This is obviously a degenerate example, since the replace operation is redundant with that of the CompareExchange method call. The point of the test is to illustrate that a retry loop does not implement an atomic operation. If Threads collide, one doesn't always get the expected result...


Figure 2. Output from Retry Loop Caller Showing Collisions.

Why would we ever need such a capability as a "testbench" with random numbers and these sorts of things for concurrency testing? The problem is that validation of any concurrency mechanism involves verifying a negative hypothesis - that data will never become corrupted or fall out of synchronization, and that threads will never deadlock, because of a design error. A program accomplishing concurrent data access may run for years in support of a given application without any sort of failure. One never knows whether the program will provide reliable operation under all conditions. It often happens that concurrent programs will fail under conditions for which they were not originally tested. In practice, this usually occurs when some strange combination of events occurs, quite often under an improbable circumstance. While it's never possible to test for all eventualities, it helps a great deal to stimulate a concurrency management architecture with statistical sources. This testing methodology can help identify problems by subjecting the design to a rather large combination of events in a reasonable time. Any statistical testbench should be capable of generating fixed (non-random) events as well, in order to subject the design to certain specific conditions that may be suspected to be troublesome. Some statistical testing of a design on the front end of a project can reduce the nightmare of debugging concurrent systems to some tolerable level (it's still awful). Concurrency problems have a well-deserved reputation of being the worst problems to debug and solve in the software world.

Object Locking in .Net

Microsoft Corporation devised a special locking mechanism for use in .Net. It involves a table of entries called a "sync table" which is populated with indicators describing which objects in the GC heap have been locked by a client that wishes exclusive access. This facility is accessed through the Base Class Library (BCL) System.Threading.Monitor class and implicitly through the C# lock keyword. [Richter06] provides a scholarly treatment of this synchronization mechanism (p. 630) as well as the .Net garbage collector (p. 457) as implemented in the Microsoft .Net 2.0 CLR. Interested readers can find a comprehensive treatment there. This facility is now also part of ECMA, although the implementation through a "sync table" is a platform-specific detail, not part of ECMA. It suffices to say here that the locking capabilities afforded by Monitor are only available for Objects on the heap. This means, unfortunately, that access to our Int32 cannot be synchronized with the Monitor class. All is not lost, however, since a ValueType can be wrapped inside a C# class and the class can, in turn (being an Object), be synchronized with Monitor. There are specific reasons why we wish to employ this technique in our work. We will describe these reasons in detail in a later section.

There are a number of additional obvious ways that almost any design could make use of Monitor and its Object-locking capabilities. Many elementary books on .Net indicate the possibility of simply locking an instance of a class when synchronized access to its members is desired. Usually these same books will admonish the user to refrain from doing so, since a hostile party could also take out a lock on the same Object (if accessible) and deny the rightful client(s) access to the Object. Things get even worse for locks taken out on a class-level object, which would occur when synchronizing access to a Type's static variables by locking the Type itself. The solution that is usually proposed is to place a private object of some sort as a member variable on a class. Methods needing to access synchronized member data can then (consistently) lock on the private Object and everything works just fine. We won't provide further detail here, but we refer the reader to [Richter06] or [Lowy05] where examples abound.
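For concreteness, here is a minimal sketch of that private-lock-object pattern (the class and member names are ours, purely illustrative):

C#
public class SynchronizedCounter
{
    // Private lock object - inaccessible to hostile parties, unlike
    // locking "this" or the Type itself.
    private readonly object p_lockObject = new object();
    private int p_count;

    public void Increment()
    {
        // The C# lock keyword expands to Monitor.Enter/Monitor.Exit
        // wrapped in a try/finally block.
        lock (p_lockObject)
        {
            p_count++;
        }
    }

    public int Count
    {
        get { lock (p_lockObject) { return p_count; } }
    }
}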

Kernel-Level Locking

There are several facilities specified within the ECMA standard which map to kernel mode operations in Microsoft's CLR. These facilities, such as Mutex and Semaphore, are wrappers for classic Win32 functions of the same variety. We have employed these facilities at various times and in various places in the past. As stated previously, we are attempting to operate in user mode to the extent possible, with plain-vanilla synchronization features of the CLR. We do this for speed and to avoid any implementation inconsistencies that tend to crop up across platforms when employing features that haven't been in use for long enough to get the bugs out. For the moment, we are attempting to go as far as we can with Monitor and lock, seeing just what we can accomplish within the CLR.

The CLR Thread Pool

There are a number of very useful facilities made available through the .Net System.Threading.ThreadPool class. We've used these in many places in the past and continue to do so. However, for the rewrite/unification/whatever of our concurrency architecture, we need to be able to abstract away the notion of threads entirely. We are trying to make our MultiProcessing classes general so that from the client's point of view, it is immaterial whether a task is carried out on a piece of hardware or on another machine or on another Thread (of course with the advent of multi-CPU machines the concept of "another machine" is becoming ill-defined). Asynchronous delegates support some of this, but we need something quite general that we can wrap and expose different functionality for different purposes. That, plus the fact that we need to do hierarchical management of MultiProcesses and support specialized MultiProcess queues and a few other things, makes it infeasible to use the standard Thread pool.

The TestBench

Our "TestBench" is a set of classes that can perform statistical tests on various concurrency management architectures to HELP gain confidence that their operation is correct. It contains various "TestRunner" methods that can be run under the NUnit testing framework. These can also be run from an ordinary command prompt, for those who do not have NUnit installed or choose not to use it. See the NUNIT_IN_USE constant at the top of QAUtils.Article.cs. It is sometimes easier to debug programs from within tools like Visual Studio when they are not run from NUnit. By providing these simple classes, We're hoping to provide folks with a starting point for concurrency work.

TestRunners

The tests are performed within an architecture that resembles that in figure 3. The "Caller" method (see listing 1) is shown in general form in the diagram, along with its internal OP delegate. Recall that the test results in figure 2 were generated with a simple replacement OP. We make extensive use of delegates in our multiprocessing architecture. The .Net System.Threading library makes extensive use of delegates and we also find them quite useful in allowing us to run different types of processes from our MultiProcessControllers. In figure 3, delegates are shown in the boxes with rounded edges, colored in red. We normally like to use UML diagrams when possible. What needed to be described here seemed a bit too convoluted to be adequately detailed with a reasonable combination of UML diagrams. Thus this diagram - a mishmash of different paradigms. If anybody can make a better diagram, send it to us and we'll be grateful...


Figure 3. Architecture of a Typical TestCaller

As can be gleaned from the figure, the Caller method is actually itself a delegate that is supplied to a "CallerProcess" that is responsible for invoking the caller and supervising its operation. The CallerProcess is, in turn, a delegate of the form called for by one overload of the Thread.Start() method. One of the convenient enhancements to the System.Threading namespace in .Net 2.0 is the ability to pass a data object into a Thread process. The new "ParameterizedThreadStart" delegate allows a System.Thread to be created with reference to a delegate that can accept a System.Object when the Thread is started. This is much more convenient than passing auxiliary static data that has to be kept reconciled with the active Threads, etc.. This is more consistent with a true multiprocessing architecture where a processor is associated with its own data store.
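A minimal sketch of this mechanism (with a hypothetical data class, not our actual CallerObject) might look like this:

C#
using System;
using System.Threading;

class CallerData
{
    // Hypothetical per-Thread data store, in the spirit of the
    // CallerObjects described in the text.
    public int Iterations;
    public int Seed;
}

class ThreadStartDemo
{
    // Matches the ParameterizedThreadStart delegate: void (object).
    static void CallerProcess(object data)
    {
        CallerData callerData = (CallerData)data;
        Console.WriteLine("Thread "
            + Thread.CurrentThread.ManagedThreadId
            + " running " + callerData.Iterations + " iterations.");
    }

    static void Main()
    {
        CallerData data = new CallerData();
        data.Iterations = 100;
        data.Seed = 1234;
        // The data object rides in with the Start() call - no static
        // state needs to be reconciled with the active Threads.
        Thread worker = new Thread(new ParameterizedThreadStart(CallerProcess));
        worker.Start(data);
        worker.Join();
    }
}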

The name QATester_MultiProcessing derives from the fact that prior to the use of NUnit, we did our own unit testing through reflection. Instead of using attributes, we employed a certain naming convention for test classes and methods. This whole thing is explained in files within the MPFramework.AppCore.QualityAssurance namespace for the interested reader. For the article, we created the set of TestRunners within QATester_MultiProcessing_Article by simplifying some of our control methods to make them more accessible. We have attempted to design the various TestRunner, CallerProcess and Caller methods in an evolutionary fashion, starting simple (just the retry loop functionality), then adding features to create more advanced versions. We provide a large variety of these methods simply because it's tough to figure out how to do certain things unless they have been seen before (been there). Hopefully, though, we've structured things so that the examples are understandable. Note that all of the NUnit test methods in QATester_MultiProcessing_Article are qualitative - they were created for this article and they just scroll results to the screen. Test methods in the accompanying QATester_ReferencedValueTypes_Article class are quantitative - they employ NUnit assertions to ensure numerically correct results.

The blue lines in figure 3 indicate the data that is passed down from the testing framework. There are higher-level caller methods in the QATester_MultiProcessing_Article class that are the actual NUnit test methods. For those unfamiliar with NUnit, the NUnit testing framework currently employs the facilities of the .Net 2.0 System.Reflection namespace to scan assemblies for methods decorated with the [Test] attribute and runs them, collecting test results as it goes along its way. Living above the TestRunners are a number of these methods. These are not on the diagram because it's too complicated already :-)

In our MultiProcessing classes, we apply various C# interfaces to the "CallerObject" indicated in figure 3 to invoke supervisory protocols. Again, in the interest of simplicity, we don't deal with any of this in QATester_MultiProcessing_Article. We did extract one very useful concept for use in the TestBench, however. As can be seen in the figure, there is an indication of a "Control" and "Status" capability which is exposed to both the TestRunner and the CallerProcess. In our full-up system, we have a set of registers that form a "virtual front panel" implementing various supervisory functions. This provides an emulation of an actual hardware interface. For these simple demonstrations, one version of CallerObject includes two Int32 variables - one a StatusRegister and one a ControlRegister. We use these in one of our experiments to demonstrate how executing Threads can report their progress and how the TestRunner can control running Threads. [Lowy05] (pg. 247) illustrates a simple version of this technique. We use it extensively in our Framework. Referring again to the figure, Callers are designed to operate in a loop, with a default number of iterations specified in an input parameter. The presence of the StatusRegister allows the TestRunner to receive reports from the CallerProcess on its progress. The TestRunner, on the other hand, can issue commands to the CallerProcess by writing to the ControlRegister. Each of these registers is normally only writable from one side of the interface. This completely eliminates the need for managing concurrent access to these data items. This is similar to the type of interaction that is afforded through a memory-mapped hardware interface. There are usually a number of other registers (e.g. programmed I/O register, DMA control register, etc.), but these two suffice to demonstrate the concept.
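The following sketch (our own simplification, not the actual Framework code) shows the one-writer-per-register convention:

C#
using System.Threading;

// Each register has exactly one writer, so no lock is ever needed.
class RegisterPair
{
    public int StatusRegister;   // written only by the CallerProcess.
    public int ControlRegister;  // written only by the TestRunner.
}

class CallerProcessSketch
{
    const int HALT_COMMAND = 1;  // an illustrative command code.

    public static void Run(RegisterPair registers, int iterations)
    {
        for (int i = 0; i < iterations; i++)
        {
            // ... perform one iteration of the real work here ...

            // Report progress - only this side writes Status.
            Thread.VolatileWrite(ref registers.StatusRegister, i + 1);
            // Poll for commands - only the TestRunner writes Control.
            if (Thread.VolatileRead(ref registers.ControlRegister) == HALT_COMMAND)
                break;
        }
    }
}

The volatile read/write calls simply ensure that each side always sees the other side's latest value; since each register has a single writer, no further synchronization is required.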

Note that the TestRunner has the capability to start multiple Threads. The developer of a test may pre-store data for individual Threads in the CallerObjects array before the test is run. This data may include a Caller delegate and an OP delegate. Thus, each Thread may have a different Caller and a different OP. In this way, it's possible to create Threads that are performing different activities, such as adding, counting, resetting and so forth, as mentioned in an earlier section. It is also possible to customize the random number generators for each Thread that is created. Each Thread maintains an independent copy of the data random number generator and the delay random number generator, as indicated in the figure. Although it's possible to customize each Thread individually, it's also possible to create some custom Threads and clone the rest from a "standard" Thread. An indication can be made to a TestRunner (not the simpler ones) that the data for certain Threads should be cloned. In this case, the TestRunner will create new random number generators for each "cloned" Thread by copying the parameters from an existing set of random number generators and using the Thread.ManagedThreadIDs as seeds for the new independent generators. This is a convenience for those cases where it is desired to create a large number of similar Threads along with one or a few unique Threads. We will describe this further in sections on the AuditingRandomNumberGenerator and the CallerObject.

The Generic IAuditingGenerator<T>

Before we discuss Generics, a comment should be made about terminology. In discussions about Types in the .Net CLI, the terminology used has been most confusing. Certain terms like value, object, reference, delegate/Delegate and class are invariably used to mean different things at different points in any discourse ever seen concerning these subjects. The problem is, of course, that some folks just take ordinary common terms and associate them with concepts that have a very specific meaning. And they do this again and again with the same term in different contexts. Even the word "context" is overloaded in many discussions!!! As a result, it's almost impossible to make sense of some of the detailed expositions on the .Net VES and similar topics. Things have gotten even worse with the introduction of Generics. More overloaded terminology. ECMA has done a bit to disambiguate terminology, but it's still not clear, even in the standards documents. Well, this is not going to change any time soon, but at least we will attempt to use terms unambiguously in what follows. Please let us know if anything is unclear, since this is an important goal for us.

We have developed an interface for a Generic random number generator that allows access to some internal bookkeeping capabilities. In our work, it has proven useful to record the sequence of numbers that have been generated, for example. This is sometimes used in an analysis of what went wrong in certain concurrent operations. We don't use these capabilities in any of our experiments in QATester_MultiProcessing_Article, but this is the origin of "Auditing" in IAuditingGenerator<T>.

This is the first time we are touching the concept of Generics. Generics are new in .Net 2.0 and have added a great deal of power to the .Net Framework. There are many areas in which Generics are helpful. For a full treatment of Generics in .Net 2.0, see [Golding05]. The main benefit of Generics in our work is the ability to deal with System.ValueType (henceforth termed ValueType) data items in the .Net Framework without boxing. In C#, these data items are either primitive Types (e.g. int, float...), structs (user-defined ValueTypes) or enums. It's mostly hidden from direct view of C# programmers, but these C# Types are all derived from the abstract System.ValueType Type (a .Net Class) within the .Net Common Type System (CTS). We use the capitalized version of the word (i.e. "Type") when we want to refer to the specific categories of data items in .Net. The ValueType Class derives directly from System.Object and contains no data members. ValueType-derived Types are unique in that they can exist in either boxed or unboxed form. The terminology gets tough here already, since the C# class keyword refers to a .Net Class that is not derived from System.ValueType - i.e., a C# class is a Reference Type. Confused? It's OK - it is VERY confusing. We try to use the blue highlighting for C# keywords when we are referring to C# concepts.

[ECMA335] (12.1.6.1) develops the concept of a "HOME" for a Type's databytes (we use the term "databytes" for a Type's data to help disambiguate terms). We capitalize HOME, since it is a critical pedagogical term from our viewpoint. In boxed form, the HOME of a ValueType's databytes is within an object slot on the managed heap. In unboxed form, a ValueType's databytes can be in a HOME somewhere outside the heap. Most C# programmers are accustomed to passing primitive Types and structs to methods. In this case, their HOME can be on the evaluation stack of the VES (if passed by copy). We use the term "passed by copy" to avoid an ambiguous use, this time of the term "value". In C# code, ValueTypes can also have a HOME in local storage, like this: int myInt = 1234;. Note that the HOME of a non-ValueType's databytes is always on the heap. This is true for all C# classes. Microsoft has tried to make it easy for developers with a background in the c languages to step up to .Net (to their credit) by using the names "struct" and "class" that are familiar to those coming from the c world. However, as soon as the underpinnings of the CTS begin to be revealed, things start to get confusing. C# compiles to a fundamentally different method of execution from the c languages and requires a bit of explanation. One of the fundamental difficulties in interacting with unmanaged real-time code is moving databytes from a HOME on the heap to a HOME outside the heap (and back again) in an efficient way.

Another issue that arises in the context of interoperability is that of "safe" code. Even though ValueTypes can have a HOME outside of the managed heap, their access is still Type-safe. In the VES, a ValueType is always accessed by a TypedReference, no matter where it lives. This is true, at least, when implementing verifiable code. In C#, this is any code which does not employ the unsafe keyword. In CIL, one has to know the rules for verifiability, but ANY CIL code should always be checked with peverify.exe. This will determine whether any unverifiable constructs (analogous to C# unsafe code) are used in the CIL program. In our work, we attempt to keep the amount of unverifiable code to a minimum. It's a good practice in general.

We deal with unmanaged code interoperability a great deal and it's important for us to be able to handle ValueTypes in an efficient manner. For instance, in C# 2.0, one can create an array of custom structs without having every single one placed in a box. It should be mentioned that much of the inefficient use of managed memory can be overcome through coding directly in CIL. However, this is a ghastly and tedious procedure - eliminating some of it is a boon to the .Net developer community. Generics have helped a lot in this regard.
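A quick sketch makes the benefit concrete (the struct is a hypothetical stand-in for one of our data items):

C#
using System.Collections;
using System.Collections.Generic;

struct Sample            // a small user-defined ValueType.
{
    public float Amplitude;
    public int Channel;
}

class BoxingDemo
{
    static void Main()
    {
        // Pre-2.0 style: every struct is boxed into an object slot
        // on the managed heap as it is added.
        ArrayList boxed = new ArrayList();
        boxed.Add(new Sample());       // boxes.
        Sample s1 = (Sample)boxed[0];  // unboxes.

        // Generic style: the closed Type List<Sample> stores the
        // structs end-to-end in memory with no boxing at all.
        List<Sample> unboxed = new List<Sample>();
        unboxed.Add(new Sample());     // no box.
        Sample s2 = unboxed[0];        // no unbox.
    }
}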

What are Generics? We'll say just a few things here, leaving the rest to the authors of the many fine textbooks that cover the subject. Generics are similar in concept to c++ Templates. They allow operations, containers, classes, etc. to be defined for a "placeholder" data item that is not fully known when the Template/Generic is written. The placeholder data item is sometimes referred to as a "parameterized type". Templates/Generics allow a common set of behaviors, structures and interrelationships to be defined for an abstraction of a data item. A single Template/Generic can be specialized to many different data items, obviating the need to replicate much of the same code over and over again for each. In this regard, c++ Templates and .Net Generics are similar.

There are significant differences, however. c++ Templates are "parameterized" at compile time or, at best, at link time. The compiler must have complete knowledge of the type of the data item that the template will operate with. In contrast, .Net Generics are "constructed" at JIT compile time, when the needed Type is loaded. A .Net Generic Type that has been constructed through the JIT compilation process is also known as a "closed" Type, as opposed to the "open" designation of an unconstructed .Net Generic Type. Additionally, the .Net community seems to favor the use of "TypeParameter" to refer to the placeholder in a Generic Type. We will use this term in what follows. There are advantages to both c++ Templates and .Net Generics. An often-mentioned advantage of .Net Generics is the ability to construct only the closed Types that are accessed during a particular invocation of an application. This avoids code "bloating". In fact, the notion of an "open" .Net Generic Type exists both at the source code level and at runtime, whereas in c++ it is a source code notion only. It's quite useful to be able to create constructed Generic Types dynamically, through reflection. The System.Reflection namespace has been updated with a full complement of capabilities for handling Generics. [Golding05] (pg. 187) covers this topic comprehensively, but there is also some useful material in [Smachia05] (pg. 410) not found elsewhere.

To be entirely accurate, we should mention that the situation is slightly more complex for .Net Generics. It's possible to supply a closure for a Generic at language compile time, before the Type is ever used. (Note that we must sometimes employ the term "language compile time", since the advent of JIT compilation has caused an ambiguity in the term "compile time" in some cases.) This can be done, for example, if a given custom .Net Type inherits from a Generic Type and supplies a concrete TypeParameter with the Type definition. This would look like: class myGenericClassClosedAsIntClass : myGenericClass<int> {}. With .Net Generics that have more than one TypeParameter (have "arity" greater than 1 in the standard .Net jargon), it would also be possible to close some TypeParameters and leave others open when inheriting from the Generic.
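As a small illustration of constructing a closed Generic Type dynamically through reflection (using the BCL's List<T> for familiarity):

C#
using System;
using System.Collections.Generic;

class ReflectionDemo
{
    static void Main()
    {
        // Start from the open Generic Type - note the empty
        // TypeParameter list in the typeof expression.
        Type openType = typeof(List<>);

        // Close it at runtime with a concrete TypeParameter...
        Type closedType = openType.MakeGenericType(typeof(int));

        // ...and instantiate the constructed Type.
        object list = Activator.CreateInstance(closedType);
        // Prints: System.Collections.Generic.List`1[System.Int32]
        Console.WriteLine(list.GetType());
    }
}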

Java Generics are also quite a bit different from .Net Generics. If .Net Generics were implemented in the same fashion as Java Generics, they wouldn't do much for us. Generics in Java are primarily a type-checking mechanism to ensure type-safety when a given pattern (the Generic) is applied to a data item. Java bytecode generated at compile time is independent of the type of the data item [Venners04]. The Java compiler acts, in effect, by ensuring that a given sequence of bytecode can be safely applied to a given data item. The bytecode is not specialized to serve the needs of a particular data item and thus no increased efficiency ensues. Contrast this to .Net, where the metadata corresponding to the Generic and the Type are combined at JIT compile time to create an efficient representation of the Type T and a customized implementation of the manipulations defined by the Generic. Actually, this is a bit of an oversimplification. JITted Generic code is shared in .Net across all Reference Types. This is done because all Reference Types are accessed through pointers into the managed heap anyway, so one really doesn't save anything by generating specialized code. However (important for us!!!), .Net Generic code is specialized for each and every ValueType that closes a given Generic. This is important, for example, because collections of ValueTypes can be laid out flat (end-to-end) in memory without being partitioned into boxes. And, of course, ValueTypes can also be moved to a HOME on the stack, which is where we need them for unmanaged calls. It is due to the existence of ValueTypes in .Net that the addition of Generics to the .Net Framework is so much more important than the incorporation of Generics into Java (Java is a closed environment with classes and primitive types only; SUN actually terms this environment the "Java Ecosystem"). To be sure, Generics are handy in Java, but there they affect syntax only, not the internal efficiency in the way Java types can be handled. For a definitive treatment of the way Generics are handled in Java vs. .Net, see [Estrada04]. The incorporation of Generics into .Net has allowed us to move large portions of our project from unmanaged to managed code. Effectively, we are able to break unmanaged code access into smaller "pieces" and move more of the control into the managed world. This will become obvious as we move forward.

Having sung the praises of .Net Generics, we must be fair and mention that there are some drawbacks compared to c++ Templates. c++ Templates can essentially be thought of as macros. At compile time the macro is filled out with a token corresponding to a specific data type (say an "int") and that string is substituted anywhere the placeholder token appears. Thus, for example, it's possible to have an indication of one data item being added to another with the standard "+" sign. We can write token3 = token1 + token2; and if it is an int or a float that we are parameterizing the tokens with, everything works out just fine - we get either an int or a float addition. If the placeholder data type does not support an addition, the compiler will generate an error. Such is not the case with .Net Generics, unfortunately. Since the language compiler does not know the TypeParameter at compile time, it can't just assume that whatever Type it is can support the "+" operator, for example. Well, what good are Generics, then, if they can't even do a simple addition? The approach .Net takes is to constrain a TypeParameter to implement interfaces. The style is to define a math library with an interface containing methods like Add(), Sub(), Or(), And() and all the rest for primitive Type manipulators. Some consider this to be a serious restriction. On legacy code conversion projects we have managed, it has been our experience that this style is quickly assimilated and provides great benefits in terms of generality. Furthermore, interfaces allow easier runtime definition of the operations we might desire a Type to support.
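Here is a minimal sketch of that interface-constrained style (the interface and Type names are ours, not from an actual math library):

C#
// A tiny "math library" interface for the operations a TypeParameter
// must support.
public interface ICalculator<T>
{
    T Add(T a, T b);
}

// A concrete calculator for Int32.
public struct Int32Calculator : ICalculator<int>
{
    public int Add(int a, int b) { return a + b; }
}

public static class Summer
{
    // TCalc is constrained to implement the math interface, so the
    // Generic body may call Add() even though it cannot use "+".
    public static T Sum<T, TCalc>(T[] items, TCalc calc)
        where TCalc : ICalculator<T>
    {
        T total = default(T);
        for (int i = 0; i < items.Length; i++)
            total = calc.Add(total, items[i]);
        return total;
    }
}

// Usage: int total = Summer.Sum<int, Int32Calculator>(
//            new int[] { 1, 2, 3 }, new Int32Calculator());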

Getting back to the generator...

Our generators are Generic for the simple reason that we want to be able to generate different data Types. Handling generators through an interface provides the usual benefits of handling Types through interfaces (in the general sense) - being able to abstract and separate certain functionality of Types from any specific implementation. An interface that is also defined generically (such as IAuditingGenerator<T>), allows a further level of abstraction, this time on the data. What the declaration "IAuditingGenerator<T>" announces to the world is that it is a definition of a set of methods, some of which contain placeholders for arguments or return values, and that those placeholders are for an (as yet) unknown Type "T". Thus, a Generic interface allows the definition of both the functional implementation and the concrete data Type to be postponed until the functionality needs to be realized in a specific application. To give a feeling of how this works, the methods of IAuditingGenerator<T> are shown in figure 4.

Class Diagram of AuditingGenerators

Figure 4. Type Relationships for AuditingRandomGenerators.

This figure was generated with the "Class Diagram" tool within VS2005. It does a very nice job when one doesn't need to diagram something as complex as the TestCaller in figure 3. This tool is accessed by selecting a VS project in the Project window and then adding a new "Class Diagram" item. Then just drag a C# file onto the design surface that comes up and the tool will create the diagrams. The "AuditingGenerators.cd" file in the download project is the result of dragging the whole "QATester_MultiProcessing_Articles.cs" file onto the design surface and then deleting everything but the AuditingGenerator-related items.

We are going to describe the use of Generic Types and Generic interfaces just a little bit in order to prepare for later work. First, the methods of IAuditingGenerator<T>, out of order...

Any sensible generator would need a "Next()" method - this gets the next number in the sequence. The numbers could be deterministic (e.g. a square-wave generator), but the two generators in our test code are random. Our generator implementations wrap the standard System.Random class that provides a pseudo-random stream of Int32's. We do internal conversions to get random Type T's out of the generators. No problem if T is Int32. No problem if T is a Single or Double - just use the standard CLR conversion functions. But wait - what if we wanted to generate floating point variables that are uniform on the interval (-10.0, 10.0)? That's the reason that interfaces are useful - we could build a different generator and stick it behind the interface if we wanted. Fortunately, our basic implementation of the generator provides a virtual conversion function that the user can override to provide a custom mapping from Int32's to Type T's. However, the entire generator Type could be replaced if it was desired. So with a Generic interface, customization of both the functional implementation and the data is possible.
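As a sketch of the customization just described, an override mapping the internal Int32 stream onto (-10.0, 10.0) might look like the following. The class name, the ConvertToT hook and its signature are assumptions for illustration - the download defines the actual virtual conversion function on the open Generic generator (AuditingRNGenerator<T>, described below):

C#
// Hypothetical subclass of the open Generic generator. We assume a
// (seed, min, max) constructor and a protected virtual ConvertToT hook;
// consult the download for the real member names.
public class UniformDoubleGenerator : AuditingRNGenerator<double>
{
    private readonly int m_min;
    private readonly int m_max;

    public UniformDoubleGenerator(int seed, int minValue, int maxValue)
        : base(seed, minValue, maxValue)
    {
        m_min = minValue;
        m_max = maxValue;
    }

    // Map an internal Int32, uniform on [m_min, m_max), onto the
    // interval (-10.0, 10.0).
    protected override double ConvertToT(int rawValue)
    {
        double fraction = (rawValue - (double)m_min) / (m_max - m_min);
        return -10.0 + 20.0 * fraction;
    }
}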

The constructors of our default generators accept an Int32 seed, an Int32 minimum value and an Int32 maximum value. The minimum and maximum values are immutable once the generator is instantiated.

The "Reset" methods allow the generator to be reset to a certain internal state. The parameterized version allows the generator to be set to an arbitrary state. The unparameterized version always returns the generator to it's state when first constructed.

The "SpawnGenerator(int initialState)" method allows a clone of an existing generator to be created. The only thing that changes is the state of the generator. The TestRunners make use of this to create identical generators with different seeds derived from individual Thread characteristics. Note that the output Type of the SpawnGenerator(...) method is another copy of IAuditingGenerator<T>. It's quite useful for interface methods to work with interface Types. Generic interfaces are no different.

"ConvertNumber(Type outputType)" allows the Int32 generated internally to be converted to an arbitrary Type, whose System.Type will be dynamically determined at runtime. It is sometimes useful in our work to be able to generate "auxiliary variables" whose type is not known at language compile time. Since the type of the output number is dynamic, it must be output as a System.Object. The databytes of ValueTypes are output in their boxed form, with the System.Object reference. There is no reason why the variable that is output could not be a C# class. Someone would have to define a conversion function internally.

The final methods defined by the interface are the "GetNumberHistory" and the "GetOperationHistory" methods. These methods are used in advanced scenarios to extract the sequence of random numbers from any generator and also to extract a sequence of results from certain operations. We don't use these in these examples. They are employed extensively in stress testing of our remote service architectures. They will show up again in future articles.
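Pulling the descriptions above together, the interface has roughly the following shape. This is an approximate reconstruction from the text - the history methods' return Types in particular are assumptions; figure 4 and the download are authoritative:

C#
using System;
using System.Collections.Generic;

// Approximate shape of IAuditingGenerator<T>, reconstructed from the
// method descriptions above.
public interface IAuditingGenerator<T>
{
    // Gets the next number in the sequence.
    T Next();

    // Returns the generator to the state it had when first constructed.
    void Reset();

    // Sets the generator to an arbitrary internal state.
    void Reset(int internalState);

    // Creates a clone of this generator; only the state differs.
    IAuditingGenerator<T> SpawnGenerator(int initialState);

    // Converts the internally generated Int32 to an arbitrary Type
    // determined at runtime; ValueTypes come back boxed.
    object ConvertNumber(Type outputType);

    // Advanced auditing support (return Types assumed here).
    IList<T> GetNumberHistory();
    IList<object> GetOperationHistory();
}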

There are two classes that provide implementations of IAuditingGenerator<T>. We do not discuss these in detail in this article, but point the reader to the C# source code, which is thoroughly documented. The first, AuditingInt32RNGenerator, is a concrete implementation of IAuditingGenerator<T> closed with Int32 as the TypeParameter. The second, AuditingRNGenerator<T>, is an open Generic implementation of the interface. Why two? We included the first because it is easier to understand and because it shows how a concrete implementation of a Generic interface can be optimized for a particular TypeParameter or set of TypeParameters. In this case, we know that the TypeParameter is the same as the underlying Int32 number, so we don't have to do any conversion to T, and the class is simpler in general. When we need the full generality of the open Generic implementation, we can switch it in behind the scenes. This is a design/implementation/refactoring/whatever technique that is used often in our framework with Generic interface implementations: provide generality first, then customize for efficiency when hotspots and/or most-used cases are identified. Another reason one would wish to provide a constructed Type at language compile time is to export functionality outside managed code. Unmanaged code cannot consume Generic Types. By providing a specific concrete closure, a Generic Type can be accessed from COM or through P/Invoke facilities. This particular generator needs a bit more work to expose to COM, though - there are a few more things besides the Generic that require modification before a COM wrapper can be built. But that's a whole different article...

Note that both of these generators inherit from an abstract class, AuditingRandomGenerator. This class provides common methods for accessing the internal System.Random random number generator from the .Net BCL. There are actually no abstract methods in AuditingRandomGenerator (C# allows this). We wanted to provide an indication that this class doesn't really do anything and it is not sensible to instantiate it. Instantiation could also be prevented by making all constructors protected. However, the abstract attribute of the class shows up nicely in Class Diagrams and also in documentation, so we usually choose that approach when we can.

There is one additional general matter that should be mentioned in conjunction with our Generic implementation in AuditingRNGenerator<T>. Recall it was stated that constraints are sometimes placed on Generic TypeParameters. This is so the language compiler knows, ahead of time, all the things that can be done with TypeParameter T. Sometimes it is not reasonable or desirable to provide full information about TypeParameter T. Consider the simple case of type conversions. Surely, it's possible to convert between a variety of simple .Net Types, and we use the standard BCL utilities that support that sort of thing. However, TypeParameter T can be absolutely anything - we need it to have this flexibility. We might want to generate an arbitrary class with some random characteristic. In this case there is really no general way to bound the nature of TypeParameter T. In other words, there is no way to make such a Generic Type entirely type-safe. This would be possible if one could limit the universe of TypeParameters that could close a Generic. It would be convenient to be able to specify, in a constraint, a list of valid concrete Types that T could take on. The current version of .Net does not support this. The next best thing is to provide this list of Types to a constructor for the Generic Type and have the constructor check for a valid Type. This doesn't provide a compile-time (either language or JIT) type check, but it at least allows the use of an unsupported Type to be discovered early, hopefully with an understandable exception message. One way to constrain the Type T would be to define an interface that it must implement - IConvertIntToType<T> or something like that. This seems a bit much for a simple test generator. Last, but not least, proper documentation assists the users of a Generic Type in establishing which TypeParameters are valid.
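A sketch of the constructor-time check (all names here are illustrative, not from the framework):

C#
using System;

// A Generic whose TypeParameter cannot be bounded by any compile-time
// constraint, checked against a list of supported Types at construction.
public class RestrictedGenerator<T>
{
    private static readonly Type[] s_supportedTypes =
        { typeof(int), typeof(float), typeof(double) };

    public RestrictedGenerator()
    {
        // Fail fast, with an understandable message, rather than failing
        // mysteriously deep inside a conversion later on.
        if (Array.IndexOf(s_supportedTypes, typeof(T)) < 0)
            throw new NotSupportedException(
                "RestrictedGenerator<T> does not support T = "
                + typeof(T).FullName);
    }
}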

The CallerObject

The CallerObject, like the delegate methods described in previous sections, comes in various flavors. Recall that this is the generic System.Object that is passed into a System.Thread process delegate (in our case the CallerProcess) upon startup. We do a lot of work with it in our system, mostly implementing a variety of control procedures, MultiProcess tasking, scheduling, etc. For our experiments here, we keep things very simple. The most complex CallerObject in QATester_MultiProcessing_Article, IRVTConcurrentOpCallerObject, is diagrammed in figure 5. Every CallerObject contains fields that expose the OP delegate and the Caller delegate that are to be used in a particular experiment. Additionally, copies of the two random number generators are supplied so that each System.Thread can have a unique source of numbers. The DataGenerator is called from within the CallerProcess and supplies the source parameter in each Caller's argument list. The DelayGenerator's numbers are applied to cause a delay either within an OP delegate or within a CallerProcess. The NumCallsOnThread Property determines the number of times the Caller will be invoked by default from within the CallerProcess. The Target field is either an initial value for the common synchronized target or a reference to it, depending on the variety of the CallerObject. IRVTConcurrentOpCallerObject adds the ControlRegister and StatusRegister fields that are shown in figure 5. The reader will note that IRVTConcurrentOpCallerObject is actually a Generic class parameterized by the Type of the target (which is the same as the source).

Sample image

Figure 5. Class Diagram of a Typical CallerObject.

Although we don't do it here, it's possible to abstract the specifics of a given CallerObject's internal implementation through the use of interfaces. A caller object might be defined for a particular MultiProcess and have specific fields associated with that MultiProcess. It's useful to provide an abstraction layer in the form of interfaces to allow manipulation by a higher-level MultiProcessController (another of our base Type names). We've defined two Properties on IRVTConcurrentOpCaller<UValueType> that indicate this sort of abstraction. It's useful for the MultiProcessController (in our examples, the TestRunner plays this role) to be able to monitor the progress of an iterative task by examining its CurrentIteration Property. Similarly, the executing MultiProcess (in our examples, the CallerProcess plays this role) needs to examine the control register to determine when to stop. The stop command is exposed in an abstracted way through the ThreadStopNow Property. In our Framework, these sorts of basic control and status functions are exposed through various interfaces. We use the two abstracted Properties here just to indicate what's possible in a concurrency management design.
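A sketch of how such an abstraction layer might be expressed follows. The grouping into a single interface is our illustration only - in the framework these Properties live on the CallerObject's own interfaces:

C#
// Hypothetical abstraction of the two Properties discussed above.
public interface ICallerObjectControl
{
    // Read by a MultiProcessController (the TestRunner in our examples)
    // to monitor the progress of an iterative task.
    int CurrentIteration { get; }

    // Read by the executing MultiProcess (the CallerProcess in our
    // examples) to discover that a STOP command has been written into
    // the ControlRegister.
    bool ThreadStopNow { get; }
}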

The ReferencedValueType - Synchronizing Wrapped ValueTypes

The architecture shown in figure 3 is not limited to working with primitive Types, such as the Int32 within our retry loop. As alluded to in the introduction, it's possible to work with Monitor and the lock statement to perform synchronization entirely within the VES. There are two limitations. Monitor can synchronize only objects - not unboxed ValueTypes. The second limitation is that a single Monitor can only take out a lock on one object. It's possible to use multiple Monitors, but it's not possible to lock multiple objects atomically in one call. Both of these limitations are overcome in a concurrency architecture we have used for some time. We do need to employ ValueTypes in many circumstances, since we need to pass certain data structures to unmanaged code. We do this by wrapping ValueTypes in a ReferenceType and providing various interfaces to access the members of the ValueType. The problem of needing to lock multiple objects is solved naturally in our design by accessing objects hierarchically. An object provides a "gateway" to objects of lower level in a hierarchy of objects that undergo concurrent access. Additionally, object access can be "ordered" in our design. In many situations, there are dependencies between tasks that are to be performed on objects. Task 4 cannot be performed on object 3 until tasks 1 and 2 have been performed on object 1 and task 3 has been performed on object 2, and so forth. In our architecture, an object often acts as a controller for objects of lower level in a hierarchy of processes. We will not describe the details of this architecture in this article. We will concentrate on the methodology for wrapping a ValueType and providing efficient access to it. We accomplish this with certain Types defined in our MPFramework.AppCore.Manufacturing.SpecializedTypes namespace called ReferencedValueTypes (RVTs). The files from this namespace that relate to RVTs are included in the download.

The approach to wrapping ValueType data items is to provide two interfaces that support placing a struct inside a class in a boxed form. A base non-Generic interface provides inheritance support, legacy code support and a method of handling heterogeneous RVT collections. A Generic interface inherits the methods of the non-Generic interface and adds support for manipulating wrapped ValueType data items as Generic Types. The methods of the two interfaces and their inheritance relationship are diagrammed in figure 6. The main Property of IReferencedValueType is BoxedValue. This is a managed pointer to an object on the managed heap which contains a boxed C# struct. Without the help of Generics, there is not much we can do with the internal ValueType, except identify its Type, compare it with other ValueTypes and swap its databytes with another. A class implementing IReferencedValueType can be specialized to support a given custom struct (think control registers, virtual front panels, etc.), without the use of Generics. This has to be done on a case-by-case basis, however, and is a bit tedious. With Generics, code to manipulate the ValueType into and out of the box can be written once, and provided in a consistent format to clients closing the Generic Type with a specific TypeParameter.

Sample image

Figure 6. IRVT interfaces and Inheritance Relationship.

Note: We had to export the Class Diagram in figure 6 into Visio and add the "where UValueType : struct" string. Constraints don't appear on VS-generated Class Diagrams for some reason. (Just so somebody doesn't pull their hair out trying to figure out why it doesn't work!!)

IReferencedValueType<UValueType> extends IReferencedValueType and provides access to both boxed and unboxed versions of the wrapped struct. Note the decorating text "where UValueType : struct" applied to the interface name. This is one form of a constraint that can be applied to a TypeParameter (see [Golding05], pg. 111, or [Troelsen05], pg. 337). In this case it simply states that the Type must be a C# struct (a Type deriving from ValueType). This is significant. As mentioned previously, the CLR treats C# structs very differently than C# classes. The struct constraint is what allows us to access the internal representation of a boxed UValueType with a ValueType reference. We won't discuss it here, but the code documentation describes why it's useful to discriminate a System.ValueType reference from an ordinary System.Object reference.
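An approximate sketch of the two interfaces follows. Member names other than BoxedValue and Value are assumptions; figure 6 and the download give the authoritative definitions:

C#
using System;

// Base non-Generic interface: inheritance support, legacy code support
// and heterogeneous RVT collections.
public interface IReferencedValueType
{
    // Reference to the boxed struct living on the managed heap.
    object BoxedValue { get; }

    // The System.Type of the wrapped struct (member name assumed).
    Type ContainedType { get; }
}

// Generic interface: adds typed access to the wrapped ValueType. The
// struct constraint is what lets implementations reach the internal
// representation of the boxed UValueType.
public interface IReferencedValueType<UValueType> : IReferencedValueType
    where UValueType : struct
{
    // Unboxed access to the wrapped struct.
    UValueType Value { get; set; }
}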

It is important for our application to be able to switch between a boxed and an unboxed version of a ValueType in an efficient manner. The most important issue for us is to be able to "paint" the inside of the box with data from the managed code side without ever having the boxed struct reallocated (and moved). A comprehensive set of sample RVTs written in the C# language is provided in ReferencedValueTypes_Articles.cs. In C#, the options for moving data items into and out of a box are somewhat limited. Unsafe code can be used to define unmanaged pointers into a ValueType's data bytes. Interfaces can be defined for a UValueType to read/write its fields within a box. Reflection can be used to access fields on a boxed ValueType, but as is noted in the code samples, that does not gain us much. It will write the inside of a boxed value all right, but we have to box an incoming variable to pass to the reflection-based field setter. If this is in a scan loop in our NDE MultiProcess, we'll get garbage collections again.
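The reflection point deserves a quick illustration, since it trips people up (SampleStruct and its field are, of course, just stand-ins):

C#
using System.Reflection;

public struct SampleStruct
{
    public int X;
}

public static class ReflectionBoxingDemo
{
    public static void Run()
    {
        object box = new SampleStruct();   // one allocation - the box

        FieldInfo field = typeof(SampleStruct).GetField("X");

        // SetValue really does write into the existing box without
        // reallocating it... but its second parameter is a
        // System.Object, so the Int32 argument is boxed on every call.
        // In a tight scan loop, that is a garbage collection generator.
        field.SetValue(box, 42);
    }
}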

The problem of the unavoidable copy/box operations associated with ValueTypes in C# is a grave one for us - in our application, they are unacceptable. The problem is not only one of overloading the garbage collector (or even running it at all). In our application, we must work out of unmanaged memory segments that our Generic ValueTypes occupy within a HOME on the managed heap. For this reason, our critical RVTs are implemented directly in CIL code through the vehicle of the Microsoft symbolic assembler, ILAsm. The VES allows data to be written directly into a ValueType's box, without creating additional allocations on the managed heap and without calling through an interface.

If there is an interest in the internal details of implementing RVT's, I'll be happy to do another article on it. We have a full set of profiling results for CIL versus C# implementations. I'm just not sure if this is a topic of general interest.

Synchronizing an Int32 with an RVT

In this section, we will provide another demonstration, this time of two concepts. First, we'd like to wrap a very simple C# struct in an RVT and synchronize the RVT with System.Threading.Monitor. The second feature of the demonstration shows how to use the StatusRegister and the ControlRegister from figure 3 to interact with executing System.Threading.Threads. In this experiment, we will set up the TestBench to allow the main Thread to do a certain amount of processing, then stop and monitor the worker Threads. The main Thread will monitor the progress of all the worker Threads through their StatusRegisters and wait until all worker Threads have completed at least 3 iterations (an arbitrary number) of their respective processing loops. The main Thread will then set a flag within each worker Thread's ControlRegister to tell it to stop processing. To make things a bit more interesting, we have created the worker Threads with different characteristics. As mentioned in the section on the generators, it's possible to spawn some generators and also create some custom versions of generators for the various worker Threads. The main Thread is a relatively fast-running Thread, with delays uniform between 1 and 100 units (a unit is 10000 Int32 multiplications). Its "NumCallsOnThread" (figure 3) is set to 10. A "LongRunningThread" is created by spawning the main Thread's CallerObject (CallerObjects can be spawned, too) and then setting its NumCallsOnThread to 1000. A "SlowRunningThread" is created by instantiating a copy of the random delay generator with a fixed delay of 2000 units and setting its NumCallsOnThread to 10, just like the main Thread. A third worker Thread has its characteristics unspecified, so it will inherit its characteristics (but not its generator seeds) from a default CallerObject passed in to the TestRunner. The test method where this is accomplished is QATester_RunTest_WorkerProcessControl_a(). Figures 7a and 7b display the first and last parts of the Console output from the experiment.

In figure 7a, we can see the main Thread firing up the various Worker Threads. As soon as LongRunningThread (ManagedThreadID#3) is started, it processes rapidly. Note, also, that LongRunningThread does a lot of processing even before the main Thread has a chance to get started. SlowRunningThread (ManagedThreadID#4) doesn't manage to get much work done before it is shut down, due to the fact that its delays are so long. Once the normal Thread (ManagedThreadID#5) gets started, it moves right along with a few passes of its own. Once the main Thread gets started, however, it completes all of its iterations before anything else runs again. This is just random chance, however. Results will vary on different machines and between different runs on the very same machine.

Sample image

Figure 7a. First Screen of WorkerProcessControl() Output.

In figure 7b, we can see the main Thread shutting down the various Worker Threads. Note that, although the main Thread finishes its work fairly quickly, it can't shut down the Worker Threads until it has verified that each Worker Thread has performed at least 3 iterations. This doesn't happen until the languishing SlowRunningThread has completed its third iteration near the bottom of the console window. Note, also, that there is a bit of latency in SlowRunningThread's response to the STOP command - it stops only after 4 iterations, not 3, because it starts another iteration after it updates its StatusRegister and before it reads the ControlRegister at the end of its processing loop. Note that Worker Thread number three never receives the STOP command, since it finishes its work early, before the main Thread broadcasts the command to all executing Threads. LongRunningThread, on the other hand, is only a small way through its 1000 iterations, so it will still be running when the STOP command is issued.

 Sample image

Figure 7b. Last Screen of WorkerProcessControl Output.

The moral of the story here is that Threads (and threads, in general) can be controlled in a variety of ways. The more structured the tasks that threads must perform, the more flexibility can be designed into the method of control. It really is a function of the logic that is implemented within a Thread's delegate (in our examples the CallerProcess) and its control process (in our examples the TestCaller).
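In skeletal form, the pattern our worker Threads follow looks something like the sketch below. The member names are stand-ins for the StatusRegister/ControlRegister machinery, not the framework's own Types:

C#
// Stand-ins for the ControlRegister and StatusRegister of figure 3.
public sealed class WorkerControl
{
    public volatile bool StopNow;   // plays the ControlRegister's role
    public int CurrentIteration;    // plays the StatusRegister's role
}

public delegate void WorkItem();

public static class WorkerLoopDemo
{
    public static void WorkerLoop(WorkerControl control,
                                  int numCallsOnThread,
                                  WorkItem doOneIteration)
    {
        for (int i = 0; i < numCallsOnThread; i++)
        {
            doOneIteration();                 // the real work
            control.CurrentIteration = i + 1; // publish status

            // The control register is polled only at the bottom of the
            // loop, which is why a STOP arriving mid-iteration is acted
            // on one iteration late - the latency observed above for
            // SlowRunningThread.
            if (control.StopNow) return;
        }
    }
}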

Our final listing is that of the Caller for an IRVT that is employed in the experiment just run. Again, this caller is designed to be simple - just enough to demonstrate the concept of using Monitor to synchronize a wrapped ValueType. Once more, the Caller uses a delegate for an operation that is designed to perform a binary operation - this time on our Generic interface, IReferencedValueType<UValueType>. The advantage to handling wrapped ValueTypes through the interface is that implementors are free to provide the functionality of an IRVT on any class. This may be accomplished by constructing special facilities on a class containing a UValueType as a member or delegating to a contained class implementing IRVT or through any other means.

C#
// Delegate for a binary UValueType - valued operation.
public delegate UValueType IRVTOp<UValueType>(
    IReferencedValueType<UValueType> target,
    IReferencedValueType<UValueType> source)
    where UValueType : struct;

// Caller for a binary IRVTOp.
public static System.Boolean IRVTConcurrentOpCaller<UValueType>(
    IReferencedValueType<UValueType> target,
    IReferencedValueType<UValueType> source,
    IRVTOp<UValueType> oP, out UValueType lastValue)
    where UValueType : struct
{
    // These variables are working storage for our internal generic
    // UValueType.
    UValueType oldValue;
    UValueType opResult;

    // This one has the usual purpose - it should never be true in this
    // method, however....
    bool hadCollision = false;

    // Lock the IRVT's underlying System.Object and perform the OP.
    lock (target)
    {
        // Save the last value.
        oldValue = target.Value;
        // Do the OP.
        opResult = oP(target, source);
        lastValue = opResult;
    }

    // Report progress if we want....
    if (QATester_MultiProcessing_Testdata.s_reportToScreen)
        Console.WriteLine(" >Processed a Number on ManagedThreadID #: "
            + Thread.CurrentThread.ManagedThreadId.ToString());

    // If the two values are different, it means that we had a collision.
    // Note that you should define a Type-specific "Equals" on your own
    // UValueType, since the CLR compares arbitrary structs through
    // reflection (slow) - but you don't have to if you have a specific
    // concrete closure!! Note also that we will be BOXing again if we
    // call something expecting a System.Object, so you would probably
    // want to define a UValueType.Equals(UValueType) via an implicit
    // interface implementation of IEquatable<UValueType> if you needed
    // a fast equality comparison.
    //
    // This obviously should never be true, since we have the target
    // locked.
    if (!Equals(oldValue, opResult)) hadCollision = true;
    return hadCollision;
}

Listing 2. Caller Method for a Binary Generic IRVT Operation Closed on an Int32.

The Caller is designed to perform the same role as the Caller in listing 1, which implements the retry loop. In this case, we are simply carrying the integer around inside of an RVT. We must, of course, employ a concrete RVT class to enclose our working Int32. This does not appear in listing 2, since this Caller manipulates the RVT through its interface. We won't detail it here, but a closure of a class in ReferencedValueTypes_Articles.cs: public class ReferencedValueType<UValueType> : ReferencedValueType, IEquatable<IReferencedValueType>, IReferencedValueType<UValueType> where UValueType : struct is used to wrap the Int32 by specifying it as the TypeParameter UValueType. This class provides only the simplest of functionality to work on the wrapped Type. In the Caller method shown here, a random UValueType is passed in, exactly as before. This time, however, it's passed in within its RVT wrapper, handled through the interface. The only thing we need to know about an RVT is how to get at its internal value - with the IReferencedValueType<UValueType>.Value Property. We need this in our loop to get an unboxed version of our (in this case) Int32 in order to compare it with the result of the OP. As explained in the code, this check should never be necessary. We leave it in here just to attempt an analogy to the retry loop case. Actually, another Caller, IRVTNonAtomicConcurrentOpCaller(), provides a closer analogy to the retry loop. It is actually a retry loop, but uses a Monitor to lock the RVT only AFTER the OP has been performed, leaving the data unlocked for the read and OP part of the code, just like the CompareExchange loop. This Caller will undergo collisions. Its test is in QATester_RunTest_IRVTNonAtomicConcurrentCaller_a().
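To close the loop on the discussion, a hypothetical usage of the Caller in listing 2 might read as follows. The ReferencedValueType<int> constructor argument and the settable Value are assumptions about the sample class - see ReferencedValueTypes_Articles.cs for its real members:

C#
public static class RVTCallerDemo
{
    public static void Run()
    {
        // Wrap two Int32s in RVTs (assumed constructor taking the
        // initial value).
        IReferencedValueType<int> target = new ReferencedValueType<int>(0);
        IReferencedValueType<int> source = new ReferencedValueType<int>(5);

        // An anonymous-method OP that adds the source into the target.
        IRVTOp<int> addOp = delegate(IReferencedValueType<int> t,
                                     IReferencedValueType<int> s)
        {
            int result = t.Value + s.Value;
            t.Value = result;   // write the result back into the box
            return result;
        };

        int lastValue;
        bool collided = IRVTConcurrentOpCaller<int>(target, source, addOp,
                                                    out lastValue);
    }
}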

The ReferencedValueType<UValueType> class has most of its methods defined as virtual for inheritance support. It is a "demonstration" class, however, filled with different examples of how to access wrapped System.ValueTypes, perform comparisons and other things. An inheritor of the class will probably want to remove any functionality that is not needed.

Moving Forward

Further Reading

Currently, there seems to be a dearth of good books on concurrency in .Net. There is a very nice free e-book by Joseph Albahari, available for download at http://www.albahari.com/threading/. There are a number of Java-related books that I like that are general in their treatment of threading and concurrency issues. Try [Goetz06] or [Oaks04], for example. Both of these treat things from the perspective of Java 1.5 or later, although Goetz's book is more recent. Goetz has a very good chapter on testing of concurrent systems. For .Net Generics, the general .Net books covering 2.0 are fine. [Golding05] is comprehensive.

Future Articles

I've enjoyed writing this article. I'd like to know if readers would like me to continue with this topic or not. I was hoping that the code posted here might give interested readers an elementary starting point for performing concurrency experiments. Our concurrency issues, being somewhat hardware-related, are not entirely mainstream. There are lots of other parts of our framework I could talk about (and release). Interop, Service Managers, Type Handling, and Remoting are parts of the Framework that are fairly clean and would be easy to develop articles on.

Please Help

I am placing this code into the public domain without restriction. Anyone can use it for any purpose, including in commercial products. I may start an open-source project at some point, but I don't have the time at this very instant. If you find anything that is wrong or that could be improved or that is even unclear, please let me know. I would like to improve the code (and its DOCs) to make it more useful. If I use your bug fix or suggestion, I will give you credit - I promise. I'll be releasing more of the Framework as time goes on, in one way or another.

References

  • [Anderson97] - Anderson, J., Ramamurthy, S., Jeffay, K.; "Real-Time Computing with Lock-Free Shared Objects"; University of North Carolina, 1997.
  • [Duffy06] - Duffy, J.; Professional .NET Framework 2.0; Wrox, 2006.
  • [ECMA334] - Standard ECMA-334, C# Language Specification; 4th Edition; ECMA International, June 2006.
  • [ECMA335] - Standard ECMA-335, Common Language Infrastructure (CLI), Partitions I to VI; 4th Edition; ECMA International, June 2006.
  • [Estrada04] - Estrada, M., Stansifer, R.; "A Comparison of Generics in Java and C#"; Florida Institute of Technology, 2004.
  • [Goetz06] - Goetz, B., Peierls, T., Bloch, J., Bowbeer, J., Holmes, D., Lea, D.; Java Concurrency in Practice; Addison-Wesley, 2006.
  • [Golding05] - Golding, T.; .NET 2.0 Generics; Wrox, 2005.
  • [Herlihy93] - Herlihy, M.; "A Methodology for Implementing Highly Concurrent Data Objects"; ACM Transactions on Programming Languages and Systems, Vol. 15, No. 5, 1993, pp. 745-770.
  • [Lowy05] - Lowy, J.; Programming .NET Components; O'Reilly, 2005.
  • [Lutz03] - Lutz, M., Laplante, P.; "C# and the .NET Framework: Ready for Real Time?"; IEEE Software, Vol. 20, No. 1, January/February 2003, pp. 74-80.
  • [NIST99] - Carnahan, L., Ruark, M. (Editors); Requirements for Real-Time Extensions for the Java Platform; National Institute of Standards and Technology, 1999.
  • [Oaks04] - Oaks, S., Wong, H.; Java Threads; O'Reilly, 2004.
  • [Richter06] - Richter, J.; CLR via C#; Microsoft Press, 2006.
  • [Smachia05] - Smacchia, P.; Practical .NET2 and C#2; Paradoxal Press, 2005.
  • [Troelsen05] - Troelsen, A.; Pro C# 2005 and the .NET 2.0 Platform; Apress, 2005.
  • [Venners04] - Venners, B., Eckel, B.; "A Conversation with Anders Hejlsberg, Part VII"; Artima Developer, January 26, 2004.
  • [Zerzelidis05] - Zerzelidis, A., Wellings, A.; "Requirements for a Real-Time .NET Framework"; Department of Computer Science, University of York, U.K., 2005.

Acknowledgments

I'd like to thank Marc Clifton for being kind enough to review the article. His many helpful suggestions have improved it a great deal.

History

  • 15 Jan 2007 - original.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Kurt R. Matis received the B.S. degree in Applied Mathematics from Empire State College in 1981 and the Ph.D. degree in Electrical Engineering from Rensselaer Polytechnic Institute in 1984. He has been involved in several companies over the past 30 years, but has been most recently involved with the Macher-Plander Software Engineering Consortium, of which he is a co-founder. The Consortium is involved with education in .Net technologies and Software Quality Management topics.

Dr. Matis is a member of IEEE and the American Historical Truck Society. Kurt lives happily in Troy, NY with his beautiful wife, two beautiful daughters and his beautiful trucks.

Dr. Matis is interested in working with companies who wish assistance in porting legacy applications of all types to .Net. He can be reached at krogerma@aol.com.



