Hi everyone!
I'm working on a simple image processing program using OpenMP,
but I can't make my program as fast as it's supposed to be.

The processing time of the code below was 0.5 secs,
but with the OpenMP line commented out it was only 0.1 secs.
I can't see what is going on.
Please tell me what I'm missing.

Thanks in advance!
-carlos

[machine info]
CPU: Intel Core 2 Quad 2.6 GHz
Memory: 4 GB
OS: Windows XP Professional

[code]
int width = 10000;
int length = 10000;  // image height in rows
#pragma omp parallel for  // open mp line
for (int y = 0; y < length; y++) {
  // Each iteration copies one row of the source image to the destination.
  int y_offset = y * width;
  const byte* source = SOURCE_IMAGE_ADDRESS_;
  source += y_offset;
  byte* destination = DESTINATION_IMAGE_ADDRESS_;
  destination += y_offset;
  for (int x = 0; x < width; x++) {
    *destination = (*source);
    source++;
    destination++;
  }
}
Updated 16-Jan-12 1:36am

Did you gain 5 times in performance? This is a better result than I would expect. How could you expect more when you only have four cores? Do you think parallel processing is a miracle that can draw power from nowhere? :-)

—SA
 
Comments
alianzalima1978 16-Jan-12 0:28am    
Thanks for checking my question, SAKryukov!

I meant that I got worse performance using OpenMP.
-with "#pragma omp parallel for" : 0.5 secs
-without "#pragma omp parallel for" : 0.1 secs
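
An inner loop variable x is shared by default when it is declared outside the parallel loop. In that case you should make it private:
C++
#pragma omp parallel for private(x)

In the posted code, however, x is declared inside the loop, so every thread already has its own copy and this clause cannot be the whole story. I'm not sure what else is going on.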
 
"simple image process program using open mp"

That is the first mistake: Multithreading is never simple!

The second problem is memory access. I don't know how clever OMP is about it, but I doubt it recognizes that for each y you access a very specific range of memory, that this range does not overlap with the ranges used for other values of y, and, more importantly, that it does not overlap with the range you write to. If the memory accesses end up being synchronized, the majority of your code is in effect serial, whether you use OMP or not. Worse: the synchronization would probably take longer than the actual access!

I don't know OMP, but you must somehow tell it not to synchronize the memory accesses, either by declaring the memory ranges as local (or 'private', as used in solution 2?), or maybe through other options.

There's more to it, e.g. each core will try to load the memory it accesses into its cache. If two (or more) cores access memory addresses that overlap, the caches need to be synchronized every time this happens. You may not be aware of it, but if the memory addresses of your source and destination are close to each other, the cores may try to load the entire memory block that includes both addresses, and then the caches would indeed overlap...

As I said: Multithreading is never simple!
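
Guesses like the ones above are easier to check with a repeatable measurement. The following is a rough sketch, not a definitive benchmark: it runs the copy once untimed, so that one-time costs such as creating the worker threads are less likely to be counted, and then reports the best of several timed runs. The helper names are hypothetical, std::chrono is used for timing so the code also builds without OpenMP, and the OpenMP if clause switches threading on or off at run time.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: same copy loop as in the question; the `if` clause
// makes the loop run on a single thread when use_threads is false.
void copy_image(const std::uint8_t* src, std::uint8_t* dst,
                int width, int height, bool use_threads) {
    (void)use_threads;  // avoids an unused-parameter warning without OpenMP
    #pragma omp parallel for if(use_threads)
    for (int y = 0; y < height; y++) {
        const std::size_t offset = static_cast<std::size_t>(y) * width;
        for (int x = 0; x < width; x++) {
            dst[offset + x] = src[offset + x];
        }
    }
}

// Hypothetical helper: returns the best wall-clock time in seconds over
// `repeats` timed runs, or -1.0 if nothing was measured or the copy is wrong.
double time_copy_seconds(int width, int height, bool use_threads, int repeats) {
    const std::size_t size = static_cast<std::size_t>(width) * height;
    if (size == 0 || repeats <= 0) return -1.0;
    std::vector<std::uint8_t> src(size, 42), dst(size, 0);

    // Warm-up run, not timed.
    copy_image(src.data(), dst.data(), width, height, use_threads);

    double best = -1.0;
    for (int i = 0; i < repeats; i++) {
        const auto start = std::chrono::steady_clock::now();
        copy_image(src.data(), dst.data(), width, height, use_threads);
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        if (best < 0.0 || elapsed.count() < best) best = elapsed.count();
    }
    // Read the result so the copy cannot be discarded as unused work.
    if (dst[size - 1] != 42) return -1.0;
    return best;
}
```

Calling time_copy_seconds(10000, 10000, true, 5) and time_copy_seconds(10000, 10000, false, 5) in an optimized build with OpenMP enabled (for example /openmp in Visual C++ or -fopenmp in GCC) gives two numbers that can be compared directly.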
 
This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


