See more: C++ Parallel OpenMP
hi everyone!
i'm working on a simple image processing program using OpenMP,
but i can't make my program run faster, as it's supposed to.
 
the processing time of the code below was 0.5 secs.
but with the OpenMP line commented out, it was only 0.1 secs.
i can't see what is going on.
please tell me what i'm missing.
 
thanks in advance!
-carlos
 
[machine info]
cpu:intel core 2 quad 2.6ghz
memory:4gb
os:win xp professional
 
[code]
int width = 10000;
int length = 10000;
// split the rows across threads; each y is an independent iteration
#pragma omp parallel for  // open mp line
for (int y = 0; y < length; y++) {
  // start of row y in both images (10000 * 10000 still fits in an int)
  int y_offset = y * width;
  const byte* source = SOURCE_IMAGE_ADDRESS_;
  source += y_offset;
  byte* destination = DESTINATION_IMAGE_ADDRESS_;
  destination += y_offset;
  // copy one row, byte by byte
  for (int x = 0; x < width; x++) {
    *destination = (*source);
    source++;
    destination++;
  }
}
Posted 15-Jan-12 18:19pm
Edited 16-Jan-12 2:36am
v2

Solution 1

Did you gain 5 times in performance? That is a better result than I would expect. How could you expect more when you only have 4 cores? Do you think parallel processing is a miracle that can draw power from nowhere? :)
 
—SA
Comments
alianzalima1978 at 16-Jan-12 0:28am
   
thanks for checking my question SAKryukov!
 
i meant i got worse performance using open mp.
-with "#pragma omp parallel for" : 0.5secs
-without "#pragma omp parallel for" : 0.1secs
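 
For comparing the two timings above, here is a minimal sketch of a measurement harness. It assumes `byte` is an 8-bit unsigned type, and `copy_image` and `time_copy` are hypothetical names, not part of the original program. The first run is not timed, because the first parallel region has to start the thread team and the first write to a fresh buffer has to fault its pages in, and either can dominate a 0.1 sec measurement.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

typedef unsigned char byte;  // assumption: 'byte' is an 8-bit unsigned type

// Copies a width x length image row by row. The pragma is ignored when
// the compiler is not invoked with OpenMP support (e.g. /openmp or -fopenmp).
void copy_image(const byte* source, byte* destination, int width, int length) {
  #pragma omp parallel for
  for (int y = 0; y < length; y++) {
    const byte* src = source + static_cast<std::size_t>(y) * width;
    byte* dst = destination + static_cast<std::size_t>(y) * width;
    for (int x = 0; x < width; x++) {
      dst[x] = src[x];
    }
  }
}

// Hypothetical helper: runs the copy once untimed (warm-up), then returns
// the best wall-clock time in seconds over 'repeats' timed runs.
double time_copy(const std::vector<byte>& source, std::vector<byte>& destination,
                 int width, int length, int repeats) {
  copy_image(source.data(), destination.data(), width, length);  // warm-up
  double best = -1.0;
  for (int i = 0; i < repeats; i++) {
    auto start = std::chrono::steady_clock::now();
    copy_image(source.data(), destination.data(), width, length);
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    if (best < 0.0 || elapsed.count() < best) best = elapsed.count();
  }
  return best;
}
```

Building this once with and once without the OpenMP compiler switch, in a release configuration, gives two numbers that can be compared directly.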

Solution 2

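The inner loop variable x would be shared by default if it were declared outside the parallel loop, and in that case you should make it private:
#pragma omp parallel for private(x)
In your code x is declared inside the loop, so it is already private to each thread and this clause will not boost performance. I'm not sure what else is involved.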
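 
A minimal sketch of how the private clause relates to where the inner loop variable is declared. The function names are hypothetical and the row sum is only a stand-in workload. Under the OpenMP data-sharing rules, a variable declared before the pragma is shared unless it is listed in private(...), while a variable declared inside the loop body, as x is in the posted code, is private to each thread without any clause.

```cpp
#include <cstddef>
#include <vector>

// Variant A: x is declared before the pragma, so it would be shared
// (a data race) unless it is listed in private(...).
void row_sums_outer_x(const std::vector<int>& image, int width, int length,
                      std::vector<long long>& row_sums) {
  int x;
  #pragma omp parallel for private(x)
  for (int y = 0; y < length; y++) {
    long long sum = 0;  // declared inside the loop body: private already
    for (x = 0; x < width; x++) {
      sum += image[static_cast<std::size_t>(y) * width + x];
    }
    row_sums[y] = sum;
  }
}

// Variant B: x is declared in the inner for statement, as in the posted
// code, so every thread has its own x and no clause is needed.
void row_sums_inner_x(const std::vector<int>& image, int width, int length,
                      std::vector<long long>& row_sums) {
  #pragma omp parallel for
  for (int y = 0; y < length; y++) {
    long long sum = 0;
    for (int x = 0; x < width; x++) {
      sum += image[static_cast<std::size_t>(y) * width + x];
    }
    row_sums[y] = sum;
  }
}
```

Both variants produce the same result; variant B just leaves less room for mistakes.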

Solution 3

simple image process program using open mp
 
That is the first mistake: Multithreading is never simple!
 
The second problem is memory access. For each y you read from one specific range of memory and write to another; these ranges do not overlap with the ranges used for other values of y, and, more importantly, the range you read does not overlap with the range you write. OMP does not check any of this for you, and as far as I know it does not synchronize ordinary memory accesses by itself either: that only happens where you ask for it (critical sections, atomics, barriers). So the slowdown is unlikely to come from locking. A more likely culprit is that the loop does nothing but copy bytes. All cores share the same path to main memory, so the copy is limited by memory bandwidth, and extra threads add startup and scheduling overhead without adding bandwidth.
 
I don't know OMP well, but 'private' as used in solution 2 applies to variables, not to memory ranges, so it will not change how the image buffers are accessed.
 
There's more to it, e.g. each core loads the memory it accesses into its own cache, one cache line at a time. If two (or more) cores access addresses in the same cache line and at least one of them writes, the caches need to be synchronized every time this happens. You may not be aware of it, but if addresses used by different threads are close to each other, the cores may load the same block of memory that includes both addresses, and thus the caches would indeed overlap...
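 
To make the point about memory ranges concrete, here is a sketch under a few assumptions: `byte` is an 8-bit unsigned type, and `ranges_overlap` and `copy_rows` are hypothetical helper names. Each row is a separate range of `width` bytes, so rows can be copied independently with `std::memcpy`, which itself requires that source and destination do not overlap. `schedule(static)` hands each thread one contiguous block of rows, so threads only come near each other's cache lines at the block boundaries.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

typedef unsigned char byte;  // assumption: 'byte' is an 8-bit unsigned type

// Returns true if [a, a + size) and [b, b + size) share at least one byte.
// std::less gives a well-defined order even for unrelated pointers.
bool ranges_overlap(const byte* a, const byte* b, std::size_t size) {
  std::less<const byte*> before;
  return before(a, b + size) && before(b, a + size);
}

// Copies the image one row at a time. The caller must make sure that the
// source and destination images do not overlap.
void copy_rows(const byte* source, byte* destination, int width, int length) {
  #pragma omp parallel for schedule(static)
  for (int y = 0; y < length; y++) {
    std::size_t offset = static_cast<std::size_t>(y) * width;
    std::memcpy(destination + offset, source + offset,
                static_cast<std::size_t>(width));
  }
}
```

This does not remove the memory bandwidth limit; it only keeps the per-row work as cheap as the hardware allows.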
 
As I said: Multithreading is never simple!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Web01 | 2.8.1411023.1 | Last Updated 16 Jan 2012
Copyright © CodeProject, 1999-2014