Posted 15 Mar 2015

Threads, Processes, Memory Allocation, and Workstation Mode vs. Server Mode

Marc Clifton, 15 Mar 2015
What you may not realize about memory allocation and threads, and a little-known thing called "Server Mode"

Introduction

A few years ago I converted a single-threaded C++ application to a multithreaded one.  The application relied heavily on the STL and each thread did a lot (and I mean a lot) of memory allocations and de-allocations.  While that could have been optimized, the problem is (or was at that time) that the STL hides all the memory allocations from you in the various collection classes that my code was using, and I really wasn't interested in replacing the allocator.

Now, the interesting thing was that the multithreaded version was considerably slower than the single-threaded application.  This was very puzzling at first, because the work could be neatly divided among the available cores, the work items did not communicate with each other, and the only synchronization with the main thread was "here's some work to do," where the actual work was easily 99% of the processing time compared to the queue locking mechanism.

So, I did some digging and discovered that memory allocation in C++ (at least as implemented by the runtime I was using) is, while thread-safe, effectively a single-threaded function.  In other words, when a thread requested or released an allocation, all other threads that needed the allocator blocked.  Now, this was totally unacceptable, and my solution was to launch separate physical processes, one per CPU, to do the work, and use pipes to communicate between the main application thread and the physical processes.  Again, because the communication overhead was so low, this was not an issue.  The result was finally what I expected to see, namely that each core was now utilized at 100%, and indeed, the overall processing time dropped in proportion to the number of cores.  If I had 4 cores, the work took 1/4 of the time.  And by the way, we're talking about doing an analysis that could take days, if not weeks, on a single-core CPU.

I've always been curious how .NET behaves in a high-allocation environment.  The results are documented in this article, and (no peeking) special thanks (though he'll never know it) to Craig Peters for his post on Stack Overflow.

What Does .NET Do?

We'll first make sure things are working right.

Testing Non-Allocating Threads

Let's write a simple test case that doesn't do allocation, but instead just computes, over and over, the factorial of 100.  First, the setup:

static int FACTORIAL_OF = 100;

static void ThreadTest()
{
  List<Thread> threads = new List<Thread>();
  List<Worker> workers = new List<Worker>();
  // One worker thread per logical processor.
  int n = Environment.ProcessorCount;

  for (int i = 0; i < n; i++)
  {
    Worker worker = new Worker(FactorialTest);
    Thread thread = new Thread(worker.DoWork);
    workers.Add(worker);
    threads.Add(thread);
  }

  // Start the threads only after all of them have been created.
  threads.ForEach(t=>t.Start());

  Console.WriteLine("Press ENTER key to stop...");
  Console.ReadLine();

  // Signal every worker to stop, then wait for each thread to finish.
  workers.ForEach(w=>w.RequestStop());
  threads.ForEach(t=>t.Join());

  Console.WriteLine("Done");
}
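The Worker class itself isn't listed here.  As a rough idea of what it needs to do, here is a minimal sketch, assuming it simply wraps a delegate and calls it in a loop until told to stop.  The field names and the use of Action are my assumptions for illustration, not necessarily what the downloadable source does:

```csharp
using System;
using System.Threading;

// Hypothetical sketch of a Worker: repeats a unit of work until asked to stop.
public class Worker
{
    private readonly Action work;

    // volatile so the write made by RequestStop (on the main thread)
    // is observed by the loop running on the worker thread.
    private volatile bool stopRequested;

    public Worker(Action work)
    {
        this.work = work;
    }

    // Thread entry point; matches the ThreadStart delegate signature.
    public void DoWork()
    {
        while (!stopRequested)
        {
            work();
        }
    }

    public void RequestStop()
    {
        stopRequested = true;
    }
}
```

With this shape, new Worker(FactorialTest) and new Thread(worker.DoWork) compile as written in ThreadTest, and the stop flag is only checked between work items, so a worker always finishes the item it is on before exiting.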

And of course, the work to do:

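static void FactorialTest()
{
  // double rather than decimal: 100! is roughly 9.3e157, and decimal
  // overflows past 27!.
  double f = 1;

  // Start at 1; starting at 0 would zero the product on the first pass.
  for (int i = 1; i <= FACTORIAL_OF; i++)
  {
    f = f * i;
  }
}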

On my 8-core system, I see what I expect to see: all processors at 100%.

Testing Allocations in Threads

Now let's try the same thing, but this time allocating 10,000 16K blocks of memory on the heap (not the stack), which we immediately discard before the next round of 10,000 allocations.  Instead of initializing a worker thread to compute factorials, we tell it to do memory allocations:

Worker worker = new Worker(AllocationTest);

Implemented as:

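static int ALLOCATIONS = 10000;
static int ALLOCATION_SIZE = 16384;

static void AllocationTest()
{
  // Console.WriteLine(AppDomain.CurrentDomain.FriendlyName);
  // Storing each block in the array keeps all of them alive until the loop finishes.
  object[] objects = new object[ALLOCATIONS];

  for (int i = 0; i < ALLOCATIONS; i++)
  {
    objects[i] = new byte[ALLOCATION_SIZE];
  }
  // On return nothing references the array, so every block becomes garbage.
}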

Here's the result:

Oh my.  Only 33% CPU utilization, and only four of the cores are actually doing anything.  So, we've learned that, as with C++, the memory management in .NET (at least with its default settings) blocks other threads when we allocate memory heavily.  No big surprise, really.  By the way, the memory in use never exceeded about 1GB.  Remember, we're allocating 10,000 blocks of 16K each, or about 164MB per thread, so on my 8-core system, this would amount to about 1.3GB if every thread held a full set of blocks at the same moment, which is in line with the bouncing around I saw in memory usage.
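For anyone who wants to check the arithmetic behind those memory figures, here is a small sketch using the same two constants as AllocationTest.  The class and method names are made up for this illustration, and the numbers count only the byte[] payload, ignoring object headers and the object[] array itself:

```csharp
using System;

// Hypothetical helper, not part of the article's download.
static class AllocationFootprint
{
    // Same values as ALLOCATIONS and ALLOCATION_SIZE in AllocationTest.
    const int Allocations = 10000;
    const int AllocationSize = 16384;

    // Payload bytes held by one worker at the end of a single pass.
    public static long BytesPerThread()
    {
        return (long)Allocations * AllocationSize;
    }

    // Upper bound if every worker holds a full set of blocks at once.
    public static long BytesForAllThreads(int threadCount)
    {
        return BytesPerThread() * threadCount;
    }

    static void Main()
    {
        Console.WriteLine(BytesPerThread());       // 163840000
        Console.WriteLine(BytesForAllThreads(8));  // 1310720000
    }
}
```

That is roughly 164MB per thread and 1.3GB across 8 threads as a worst case; actual usage sits lower whenever the garbage collector reclaims a finished pass before the other threads complete theirs.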

Testing Allocations in Separate Processes

So let's try running this as separate processes.  Here's the code (including a very ungraceful Kill call to the processes):

static void ProcessTest()
{
  List<Process> processes = new List<Process>();
  // One worker process per logical processor.
  int n = Environment.ProcessorCount;

  for (int i = 0; i < n; i++)
  {
    Process p = Process.Start("ProcessWorker.exe");
    processes.Add(p);
  }

  Console.WriteLine("Press ENTER key to stop...");
  Console.ReadLine();

  // Kill is abrupt: the worker processes get no chance to clean up.
  processes.ForEach(p => p.Kill());

  Console.WriteLine("Done");
}
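ProcessWorker.exe isn't listed here either.  A minimal sketch of what such a worker might look like, assuming it just runs the same allocation workload in an endless loop until the parent kills it (the class layout and the AllocateOnce name are my own for this illustration):

```csharp
using System;

// Hypothetical stand-in for ProcessWorker.exe.
static class ProcessWorker
{
    const int Allocations = 10000;
    const int AllocationSize = 16384;

    // One pass of the same workload as AllocationTest.
    // Returns the number of blocks allocated so a pass can be checked.
    public static int AllocateOnce()
    {
        object[] objects = new object[Allocations];

        for (int i = 0; i < Allocations; i++)
        {
            objects[i] = new byte[AllocationSize];
        }

        return objects.Length;
    }

    static void Main()
    {
        // No stop signal: this runs until the parent calls Process.Kill.
        while (true)
        {
            AllocateOnce();
        }
    }
}
```

Because each copy of this program has its own managed heap and its own garbage collector, a collection in one process does not pause the others.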

Here are the results:

Ah, now, because each test is running in its own process, the memory allocations do not block across processes.

Workstation Mode vs. Server Mode

As I mentioned at the start of the article, thanks to Craig Peters for this gem.  Let's go back to the thread allocation code, but now we'll introduce this in our app.config file:

<configuration>
  <runtime>
    <gcServer enabled="true"/>
  </runtime>
</configuration>

And the result:

Oh my, look at that.  We're getting on average 75% CPU utilization, and each core is mostly busy doing its thing.
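If you try this yourself, it's worth confirming that the setting was actually picked up, since a typo in the config file fails silently.  Here is a small sketch that reports the active mode using GCSettings from System.Runtime; the GcModeReport class and its Describe method are names I made up for the example:

```csharp
using System;
using System.Runtime;

// Hypothetical diagnostic helper: reports which GC mode the process is running in.
static class GcModeReport
{
    public static string Describe()
    {
        // IsServerGC is true only when the runtime actually started in server mode.
        string mode = GCSettings.IsServerGC ? "Server" : "Workstation";

        return mode + " GC, latency mode " + GCSettings.LatencyMode
            + ", " + Environment.ProcessorCount + " logical processors";
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Calling Describe() at the top of ThreadTest would show at a glance which mode a given run used.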

Conclusion

I think you can reach your own conclusion here.  If you have threads that are very memory-allocation intensive, you are probably being deceived into thinking that you are gaining much improvement over a single-threaded application.  If you don't mind a 25% loss in performance, setting the garbage collector to Server mode is a neat trick.  But if you really want to maximize your performance, create separate processes.  Of course, all of this is irrelevant if the worker thread is doing something that doesn't require allocation and garbage collection of memory!  Regardless, this little suite of tests should give you some pause when considering how to design a multithreaded application with all those other fancy features we have in C# now, such as Task, async/await, and so forth.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Marc Clifton
United States
Marc is the creator of two open source projects, MyXaml, a declarative (XML) instantiation engine and the Advanced Unit Testing framework, and Interacx, a commercial n-tier RAD application suite.  Visit his website, www.marcclifton.com, where you will find many of his articles and his blog.

Marc lives in Philmont, NY.

Article Copyright 2015 by Marc Clifton