I should either write a full threading tutorial for you or write the code that isn't possible from what you have shown and I don't really want to do that anyway. Your problem is that the global
::bond_portfolio_valuation()
function has to be thread safe without locking but currently it isn't thread safe so it must be guarded by a lock externally.
Solution: make the
::bond_portfolio_valuation()
function threadsafe and remove the lock (mutex). You can do that by making sure that the
::bond_portfolio_valuation()
doesn't use other data than its input parameters and the location where it stores the result data plus some non-static local variables on the stack and the same must be true for all functions that it calls directly or indirectly.
Its often isn't fruitful to start more threads than the number of cores because your program wont be faster, it will rather be slower than a correct implementation. As an edge case: if you have just one core then a sequential execution is faster than a multithreaded, and the same is true for multicore systems. Don't start 16 threads where you have just 2 or 4 cores... (Of course this isn't true if most of the threads are just waiting for blocking IO but in your case this isn't true.)
In case where I have to process the same items on multicore systems I usually separate two cases:
1. the number of processable items is known (like in your case)
2. the number of processable items is unknown
In if the number of processable items is unknown I use a blocking message queue the receives the item on its input side and outputs the items to all threads on the other side.
In your case with fixed number of processable items I usually do the following:
(WARNING!!! Pseudocode written here in firefox without syntax checking - read at your own risk!!!)
InputStruct thread_input[ITEM_COUNT];
OutputStruct thread_output[ITEM_COUNT];
LONG items_left = ITEM_COUNT;
LONG running_threads = THREAD_COUNT;
HANDLE hFinishedEvent = ::CreateEvent(NULL, TRUE, FALSE, NULL);
bool GetNextItemToProcess(const InputStruct*& input, OutputStruct*& output)
{
LONG index = InterlockedDecrement(&items_left);
if (index < 0)
return false;
input = &thread_input[index];
output = &thread_output[index];
return true;
}
DWORD WINAPI ThreadProc(LPVOID arg)
{
for (;;)
{
const InputStruct* input;
OutputStruct* output;
if (!GetNextItemToProcess(input, output))
break;
}
if (0 == InterlockedDecrement(&thread_count))
::SetEvent(hFinishedEvent);
return 0;
}
void MainControllerThread()
{
if (ITEM_COUNT <= 0)
return;
DWORD thread_id;
for (int i=0; i<THREAD_COUNT; ++i)
{
HANDLE hThread = ::CreateThread(NULL, 0, ThreadProc, NULL, 0, &thread_id);
if (!hThread)
YourFatalErrorExitFunc();
else
::CloseHandle(hThread);
}
::WaitForSingleObject(hFinishedEvent, INFINITE);
}
Note that this isn't C++ code but I intentionally wrote C because it seems that your code is C too, it has just been put in a "namespace" that doesn't make it C++ at all.
Here the synchronization is done with the relative lightweight
InterlockedDecrement()
and an event on which the main thread waits. In case of a lot of relatively short tasks you can try to decrement the number of
InterlockedDecrement()
calls by packing more tasks into one task - this is dependent on the scenario, needs experimenting and profiling before releasing your app because there is no single optimal solution.