So I've been adding some multithreading to my game demo. It all works fine until I create more threads than my CPU has cores.

I've only got a dual core, so I have the main thread and one extra helper thread. If for whatever reason I create a third thread, performance drops considerably - like going from a supercar to your dad's first Ford.

With a single thread I get 50 fps, with two threads I get 70 fps but with three threads I get 8 fps.

Using Very Sleepy to see what's going on, I can see that when I add the third thread a lot of time is spent in WaitForSingleObject and ReleaseSemaphore. But there is only one place where I use those, and the locks get called the same number of times regardless of the number of threads, because there's only a limited amount of data.

I create the threads at startup using CreateThread, and they then wait for an event signalling that there is work for them to do. Once the work is done, they go back to waiting on the event.

I've not done much threading, so am I doing something immensely stupid that's causing the horrific performance?
Posted 18-Oct-10 13:16pm

Solution 2

Having more threads than cores will slow you down if you are already running at 100% CPU. However, I have written programs that run dozens of threads on a dual-core CPU without a problem. You just need to avoid deadlocks and race conditions.
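One way to follow this advice for CPU-bound work is to size the pool from the hardware at startup rather than hard-coding a thread count. A minimal sketch, assuming the main/game thread also does work (the function name is made up for illustration):

```cpp
// Size the helper pool from the machine: hardware_concurrency() reports
// the number of hardware threads (it may return 0 when unknown, so we
// fall back to 1). One slot is reserved for the main/game thread.
#include <thread>

unsigned HelperCountForCpuBoundWork() {
    unsigned cores = std::thread::hardware_concurrency();  // 0 if unknown
    if (cores == 0) cores = 1;
    // The main thread occupies one core, so helpers get the rest.
    return cores > 1 ? cores - 1 : 1;
}
```

On the dual-core machine from the question this yields 1 helper, matching the main-thread-plus-one-helper setup that performed best.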

Solution 1

It's hard to pin down the actual problem when no code example is given, but based on your information I come to the following:
Because you have more threads than cores in a situation with limited data, you get a lot of synchronization overhead that causes the processor to stall.

This is much clearer with a real-life example:
When doing dishes by hand, you have one person doing the washing and one doing the drying. When there is an extra person (thread) who has to compete for a lock on the only brush (and sink) available, it is clear that this will get messy without any actual performance gain. Doing dishes this way certainly won't get any faster.

But how to speed it up? Well, suppose that after drying an item you had to walk from the kitchen to the living room to put it away. Then it would help to leave the drying cloth at the sink and let another person (thread) pick it up and use it while you are away putting the item down (meaning the lock on the drying cloth has been released).

The conclusion is to use more threads than cores only when I/O operations are involved, because the time a thread spends waiting on I/O can then be used by another thread. When threads just compete for resources that are already in use, extra threads won't speed anything up. In that case you drive the processor crazy: threads are constantly suspended to and restored from main memory, which means you are thrashing the cache. The latency cost of that is enormous, and the performance of a program is ultimately bounded by the latency it encounters while executing. By adding a thread you just added more latency-intensive operations, which drops performance drastically.
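The "leave the drying cloth at the sink" idea translates to code as follows: give each thread its own private resource so the hot path takes no locks at all. A sketch of that, assuming a simple array-summing job (ParallelSum is a made-up name for illustration):

```cpp
// Lock-avoidance sketch: instead of every thread competing for one
// shared counter behind a semaphore, each thread sums its own slice of
// the data into its own accumulator, and the partial results are
// combined once at the end. No locks are taken on the hot path.
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

std::int64_t ParallelSum(const std::vector<int>& data, unsigned threads) {
    std::vector<std::int64_t> partial(threads, 0);  // one slot per thread
    std::vector<std::thread> pool;
    std::size_t chunk = (data.size() + threads - 1) / threads;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            std::size_t begin = t * chunk;
            std::size_t end = std::min(begin + chunk, data.size());
            for (std::size_t i = begin; i < end; ++i)
                partial[t] += data[i];  // private accumulator, no lock
        });
    }
    for (auto& th : pool) th.join();
    return std::accumulate(partial.begin(), partial.end(), std::int64_t{0});
}
```

Each thread writes only to its own accumulator, so no semaphore is needed while summing; the single combine step at the end is the only shared work. (For production code one would also pad the accumulators so adjacent slots don't share a cache line.)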

Good luck!
Comments
SK Genius at 19-Oct-10 8:59am
Hmm, so adding more threads is kinda like using too much memory and causing paging, or setting the max resolution in a game with the highest AA when your card doesn't have enough RAM to handle it all.

I always figured that threads would be handled better somehow. But still, like a good developer, I am limiting the number of threads created based on the number of cores available to run them.
E.F. Nijboer at 19-Oct-10 11:19am
It's not that the thread is using a huge amount of RAM. At first, threads 1 and 2 are working and thread 3 is waiting for a core to become available. Thread 1 finishes and tries to acquire the lock. With one thread per core, the kernel doesn't interrupt the thread and it can resume as soon as the lock is acquired. But with the extra thread, the kernel switches context, meaning thread 3 is scheduled to execute. The processor cache is evicted and the code and data of thread 3 are loaded. The cache controller probably didn't (and couldn't) anticipate this and is way behind on getting the necessary data to the processor. The processor is very inefficient at this point because the code and data needed to do actual work are simply missing. This happens to every thread on every iteration: each time, the thread's context is switched with the waiting thread's, and that takes a huge amount of time because the quite intelligent cache controller could not foresee this tragedy of cache thrashing.
livebytes at 15-Feb-15 13:27pm
What might be happening in your case is context switching/time-slicing rather than parallel execution, because you have more threads than processor cores.
I suggest using the Task Parallel Library (TPL), which uses your system's processor capabilities much more efficiently: it knows which processor/core is idle or available, assigns tasks accordingly, and provides built-in thread pooling and efficient memory use.
Thanks
E.F. Nijboer at 15-Feb-15 16:34pm
Your comment is very late in the discussion, as this question/answer goes back to 2010. Using TPL might be a good way to go, but this question was more about using more threads than cores when the actual tasks are processor-intensive rather than I/O-intensive. Even then, someone can make the mistake of forcing TPL to use 3 threads and still lose performance. But TPL can indeed help in a great way.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)