Marc Clifton wrote: but it was using Microsoft's STL
Well... I use vectors in threaded operations all the time. Your lists and trees will bog down in threading, but a vector will only kill you on an expansion. I control the expansion, thus preventing any issues with STL vectors, and that makes them the easiest and fastest to use. I have used some of the other containers, but you really have to know how the STL is storing and accessing the information to know what you have to do to use it in parallel.
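The "control the expansion" approach can be sketched in a few lines of standard C++11. This is a minimal illustration, not anyone's production code, and the function name `parallel_fill` is invented for the example: the vector is sized once before the threads start, so no reallocation (and therefore no locking) can happen while the threads write to disjoint index ranges.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// parallel_fill: size the vector ONCE, then let two threads write to
// disjoint halves. Because the size is fixed before threading starts,
// no expansion (reallocation) can occur and no locking is needed.
std::vector<int> parallel_fill(std::size_t n) {
    std::vector<int> v(n);  // single allocation up front
    auto fill = [&v](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i)
            v[i] = static_cast<int>(i) * 2;  // each thread owns its own indices
    };
    std::thread worker(fill, std::size_t(0), n / 2);  // first half on a worker
    fill(n / 2, n);                                   // second half on this thread
    worker.join();
    return v;
}
```

The same pattern extends to any number of threads, as long as the ranges stay disjoint and nothing resizes the vector while they run.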
_________________________
John Andrew Holmes "It is well to remember that the entire universe, with one trifling exception, is composed of others."
Shhhhh.... I am not really here. I am a figment of your imagination.... I am still in my cave so this must be an illusion....
El Corazon wrote: but a vector will only kill you on an expansion.
Well, that's exactly what was going on. And unfortunately, as this was an analysis routine that analyzes switch topologies for failure conditions, the modus operandi of the algorithm is expanding various vectors, maps, etc.
Marc
Marc Clifton wrote: as this was an analysis routine that analyzes switch topologies for failure conditions, the modus operandi of the algorithm is expanding various vectors, maps, etc.
not enough memory to reserve beforehand?
El Corazon wrote: not enough memory to reserve beforehand?
Ah, the problem is, it's impossible to figure out beforehand, though ballpark estimates would definitely be doable. I'll have to look into that.
Marc
Marc Clifton wrote: Ah, the problem is, it's impossible to figure out beforehand, though ballpark estimates would definitely be doable. I'll have to look into that.
If you have the memory, ballpark estimates and even overestimates will help. Most STL implementations add 50% more storage when you run out of reserved storage (though a few double it). So if you can get past most of the reallocations at the small end, you may only reallocate a few times... and I know it is a sin to use more memory than you need... but... if you must get it done fast, overestimate if you have the memory to spare.
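The reserve-an-overestimate suggestion is easy to demonstrate. This sketch (function names invented for the example) counts how many times `push_back` has to grab a new, larger block, with and without an up-front `reserve`; the exact growth factor (1.5x vs. 2x) is implementation-defined, as noted above.

```cpp
#include <cstddef>
#include <vector>

// Counts how many times push_back must reallocate when the vector is
// left to grow on its own.
std::size_t count_reallocs_naive(std::size_t n) {
    std::vector<int> v;
    std::size_t reallocs = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (v.size() == v.capacity()) ++reallocs;  // this push_back will reallocate
        v.push_back(static_cast<int>(i));
    }
    return reallocs;
}

// Same loop, but with a deliberate over-estimate reserved up front: one
// allocation, zero reallocations, as long as the estimate holds.
std::size_t count_reallocs_reserved(std::size_t n, std::size_t estimate) {
    std::vector<int> v;
    v.reserve(estimate);
    std::size_t reallocs = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (v.size() == v.capacity()) ++reallocs;
        v.push_back(static_cast<int>(i));
    }
    return reallocs;
}
```

If the estimate turns out to be too small, the vector simply falls back to its normal growth behavior, so an overestimate costs memory but never correctness.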
El Corazon wrote: and I know it is a sin to use more memory than you need... but... if you must get it done fast, overestimate if you have the memory to spare.
That doesn't bother me at all. The number of combinations of failure cases that have to be analyzed is in the billions, so there's no way to hold all of that in memory anyway, but there's a lot of state information that does fit in memory easily, and of course every iteration of a failure analysis has to clear various working lists. So thank you, I'll have to give this a try!
Marc
C++ isn't modern, and neither is MFC.
OTOH, I've used threading without problems in MFC apps - the one thing you should not try on Windows anyway is a multithreaded UI.
I am using the most primitive and most horrible way of implementing parallelism: manual thread creation/synchronization/communication. I have been doing it for 10 years now, and the more I know, the more I dislike it.
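For the record, the "primitive" style described above usually looks something like this: a hand-rolled work queue built from a mutex and a condition variable, all standard C++11. The class and function names here are illustrative only.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// A minimal hand-rolled queue: every push/pop goes through a mutex, and a
// condition variable wakes the consumer. This is the boilerplate that
// higher-level libraries exist to hide.
class WorkQueue {
    std::queue<int> items_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
public:
    void push(int v) {
        { std::lock_guard<std::mutex> lk(m_); items_.push(v); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_all();
    }
    // Returns false once the queue is closed and fully drained.
    bool pop(int& out) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !items_.empty() || done_; });
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop();
        return true;
    }
};

// Producer on the calling thread, consumer on a worker: sums 1..n.
long long sum_via_worker(int n) {
    WorkQueue q;
    long long total = 0;
    std::thread consumer([&] {
        int v;
        while (q.pop(v)) total += v;  // only the consumer touches total
    });
    for (int i = 1; i <= n; ++i) q.push(i);
    q.close();
    consumer.join();  // total is safe to read after the join
    return total;
}
```

Every piece of this (the flag, the predicate, the notify calls) has to be exactly right, which is a fair illustration of why the style grows tiresome after ten years.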
I prefer the GPU over the CPU for parallel operations. My favorite is NVIDIA CUDA, and it's much, much faster.
Is there a library that wraps the whole thing using CPU / CPU SIMD implementation if no CUDA-supporting card is found, or do you have to require a newer NVIDIA card / code it on your own?
AFAIK, CUDA is strictly for NVIDIA GPUs only. Earlier versions shipped an emulator for machines without an NVIDIA device, but it was eliminated completely in v3.0. For other hardware there is OpenCL (which has both CPU and GPU implementations). But I prefer CUDA because it's faster than OpenCL and very similar to OpenCL in syntax. On top of that, NVIDIA, ATI, and (to some extent) S3G all have OpenCL support. Moreover, NVIDIA has more customers than ATI, though over the last six months ATI's customer base has shown excellent growth.
To run CUDA, an NVIDIA 8-series or later card is required. I work with a GeForce 8500 GT (compute capability 1.1), which is a very low-end card, and its performance is a little better than my Intel Core 2 Duo E8400 @ 3.00 GHz.
modified on Monday, July 5, 2010 6:59 AM
peterchen wrote: Is there a library that wraps the whole thing using CPU / CPU SIMD implementation if no CUDA-supporting card is found, or do you have to require a newer NVIDIA card / code it on your own?
Get Thrust, and use the OpenMP back-end. Unfortunately, the back-end choice is made at compile time; I was talking with one of the developers during an NVIDIA class on Thrust, and he is considering doing a run-time version. Certainly you can set up one call to an OpenMP routine and another to a CUDA routine, but that means coding everything twice. When Thrust gets run-time OpenMP/CUDA switching working, it will be much easier.
http://code.google.com/p/thrust/wiki/DeviceBackends[^]
There are also projects that handle multiple backends like Ocelot[^]
CUDA, in and of itself, can only be compiled in emulation mode to run on a CPU, but that limits what libraries and calls you can make.
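For reference, with Thrust the back-end is picked at compile time (e.g. defining `THRUST_DEVICE_SYSTEM` as `THRUST_DEVICE_SYSTEM_OMP` on the compiler command line). Until a run-time switch exists, the CPU side of the "double coding" is typically a plain OpenMP kernel like this sketch (the function name is invented); the CUDA twin would live in a separate `.cu` file. If the compiler has no OpenMP support, the pragma is simply ignored and the loop runs serially, which makes this a safe fallback.

```cpp
#include <cstddef>
#include <vector>

// CPU fallback kernel: a parallel reduction over the input. With OpenMP
// enabled (-fopenmp), the iterations are split across cores and the
// per-thread partial sums are combined by the reduction clause. Without
// OpenMP, the pragma is ignored and the loop runs serially with the
// same result.
long long sum_openmp(const std::vector<int>& data) {
    long long total = 0;
    #pragma omp parallel for reduction(+ : total)
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(data.size()); ++i)
        total += data[i];
    return total;
}
```

The signed loop index (`ptrdiff_t`) is deliberate: older OpenMP versions require a signed induction variable in `parallel for` loops.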
I don't know if people realise this, but only certain tasks are suitable for parallelism - specifically, tasks that can be split into completely separate units with no data sharing between threads. Once you have data shared by several threads, you have thread synchronisation issues and you are in for some *pain*.
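A sketch of the kind of task that does split into completely separate units, using nothing beyond standard C++11 (the function name is invented): each thread reduces its own slice of the input and writes to its own output slot, so no data is shared between running threads and no synchronisation is needed until after the joins.

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// parallel_sum: split the input into disjoint chunks, one per thread.
// Each thread reads only its own slice and writes only partial[t], so
// there is no shared mutable state and no locking. Assumes nthreads >= 1.
long long parallel_sum(const std::vector<int>& data, unsigned nthreads) {
    std::vector<long long> partial(nthreads, 0);
    std::vector<std::thread> pool;
    std::size_t chunk = data.size() / nthreads;
    for (unsigned t = 0; t < nthreads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = (t + 1 == nthreads) ? data.size() : begin + chunk;
        pool.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (auto& th : pool) th.join();
    // Combining happens only after every worker has finished.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```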
I don't know how this is on other platforms, but on Windows the most common reason for parallelism is not performance but keeping the UI responsive (not sure if this counts as "strict" parallelism).
Also, when writing libraries to be consumed by others, you often have to design for parallelism, whether it gets used or not.
And yes, it's a pain.
I think this poll is specifically about performance. Of course, there are various other reasons why you might need multithreading.
I write image analysis software, and parallel processing is an absolute must. BUT you had better be careful - coding by the seat of your pants is asking for trouble. You have to spend some time designing how it's going to work; otherwise it probably won't, or (worse) it will work sometimes, or (even worse) it will work slightly differently each time. Yes, I have been reduced to swearing at my own software. With some careful planning it shouldn't be too painful, just tedious. And you get to watch every core running flat out, which is pretty funny. Hah! Who's making whom work now?
ed welch wrote: I don't know if people realise this, but only certain tasks are suitable for parallelism
Perhaps, perhaps... Obviously you are correct, and at the same time some programmers can write things that always prove you correct, no matter what they write. I have one programmer here who puts all of his variables, including STL iterators, in private class members; since he accesses everything through an instance pointer, he is always reaching the root object's data from multiple threads, and thus always has to mutex because he is always sharing data. No routine is re-entrant; no routine can thread without synchronization.
Some things can be rephrased, and some algorithms can be restructured so that multiple operations can occur at once. Large loops that operate on one big structure (and thus technically share data), but touch a different area of memory on every increment of the counter, can be run in parallel because no two iterations operate on the same area of memory. Other techniques, such as interleaving, allow even shared areas to be operated on simultaneously. Thus, even in shared-data systems, once you learn the techniques you can thread without synchronization blocking.
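The interleaving technique mentioned above can be sketched like this (the function name is invented): thread t touches only indices t, t+N, t+2N, ..., so even though the vector itself is shared, no two threads ever write the same element and no mutex is required.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// interleaved_square: strided partitioning of a SHARED vector. Thread t
// handles indices t, t + nthreads, t + 2 * nthreads, ... - a disjoint set
// of elements per thread, so no synchronization is needed even though all
// threads operate on the same container.
void interleaved_square(std::vector<int>& v, unsigned nthreads) {
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t)
        pool.emplace_back([&v, t, nthreads] {
            for (std::size_t i = t; i < v.size(); i += nthreads)
                v[i] *= v[i];  // each element written by exactly one thread
        });
    for (auto& th : pool) th.join();
}
```

One caveat worth knowing: strided access like this trades synchronization cost for worse cache behavior than contiguous chunking, since neighboring elements go to different threads.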
While writing code I never thought about how much CPU I used.
Maybe the threading support in .NET will help me parallelise my code.
The BackgroundWorker class also helps me do the same.
What is your way of doing that?
Rating always..... WELCOME
Be a good listener... because opportunity knocks softly... N-Joy