Hi everyone!
I'm working on a simple image processing program using OpenMP,
but I can't get it to run as fast as it should.

The code below takes 0.5 seconds to run.
With the OpenMP line commented out, it takes only 0.1 seconds.
I can't see what is going on.
Please tell me what I'm missing.

Thanks in advance!

[machine info]
CPU: Intel Core 2 Quad 2.6 GHz
OS: Windows XP Professional

int width = 10000;
int length = 10000;
#pragma omp parallel for  // OpenMP line
for (int y = 0; y < length; y++) {
  int y_offset = y * width;  // index of the first byte of row y
  const byte* source = SOURCE_IMAGE_ADDRESS_;
  source += y_offset;
  byte* destination = DESTINATION_IMAGE_ADDRESS_;
  destination += y_offset;
  for (int x = 0; x < width; x++) {
    *destination = (*source);
    // (the rest of the loop body is cut off in the post)
  }
}
Posted 15-Jan-12 17:19pm
Updated 16-Jan-12 1:36am

Solution 1

Did you gain 5 times in performance? That is a better result than I would expect. How could you expect more with only 4 cores? Do you think parallel processing is a miracle that can draw power from nowhere? :-)

alianzalima1978 16-Jan-12 0:28am
Thanks for checking my question, SAKryukov!

I meant that I got worse performance using OpenMP.
- with "#pragma omp parallel for": 0.5 seconds
- without "#pragma omp parallel for": 0.1 seconds
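Numbers like these depend a lot on how the loop is timed. The first parallel region in a program usually also pays for creating the worker threads, so a single measurement can make OpenMP look worse than it is. Below is a minimal sketch of a timing helper, assuming a C++11 compiler; `best_seconds` is a made-up name and not part of OpenMP or of the posted program. It runs the work several times and keeps the fastest run.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: calls work() `runs` times and returns the shortest
// wall-clock time in seconds. Repeating the call keeps one-off costs
// (thread start-up, cold caches) out of the reported number.
template <typename F>
double best_seconds(int runs, F&& work) {
    assert(runs > 0);
    double best = -1.0;
    for (int i = 0; i < runs; i++) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        if (best < 0.0 || elapsed.count() < best) {
            best = elapsed.count();
        }
    }
    return best;
}
```

One would wrap the image loop in a lambda, call `best_seconds(5, ...)` once with the pragma and once without, and compare those two values rather than two single runs.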

Solution 2

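An inner loop variable is shared by default if it is declared outside the parallel loop. In that case you have to make it private:
#pragma omp parallel for private(x)

In the code as posted, x is declared inside the loop body, so every thread already has its own copy and the clause is not needed (it would not even compile, because x is not visible at the pragma). So I'm not sure this is the cause here.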
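A sketch of when the clause actually matters, assuming `byte` in the post is an 8-bit unsigned type. The function names are invented for illustration. In C and C++, a variable declared inside the body of the parallel loop is private to each thread automatically; `private(x)` is only required when `x` is declared before the pragma.

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned char byte;  // assumption: the post does not show this typedef

// x is declared inside the parallel loop body, so it is already private.
void copy_rows_local_x(const byte* src, byte* dst, int width, int length) {
    assert(width >= 0 && length >= 0);
    #pragma omp parallel for
    for (int y = 0; y < length; y++) {
        const byte* s = src + static_cast<std::size_t>(y) * width;
        byte* d = dst + static_cast<std::size_t>(y) * width;
        for (int x = 0; x < width; x++) {
            d[x] = s[x];
        }
    }
}

// x is declared outside the parallel loop, so it would be shared by all
// threads (a data race) unless it is listed in private().
void copy_rows_outer_x(const byte* src, byte* dst, int width, int length) {
    assert(width >= 0 && length >= 0);
    int x;
    #pragma omp parallel for private(x)
    for (int y = 0; y < length; y++) {
        const byte* s = src + static_cast<std::size_t>(y) * width;
        byte* d = dst + static_cast<std::size_t>(y) * width;
        for (x = 0; x < width; x++) {
            d[x] = s[x];
        }
    }
}
```

Both variants copy the same data. The second one only shows where the clause belongs, it is not expected to be faster.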

Solution 3

"simple image process program using open mp"

That is the first mistake: Multithreading is never simple!

The second problem is memory access. OpenMP does not work out which memory each y touches, and it does not synchronize ordinary reads and writes for you: it just splits the y range between the threads and trusts you that the iterations are independent. In your loop they are, because each row reads from and writes to its own range, and the source and destination ranges do not overlap. So the slowdown is not caused by locking. More likely the loop is limited by memory rather than by the CPU: it does almost no work per byte, so one core may already use most of the available memory bandwidth, and the extra threads mainly add the cost of starting them and dividing the work.

I don't know OMP well, but I would not expect a clause such as 'private' from solution 2 to change this, since there is nothing in the loop to un-synchronize. I would first check how the time changes with the number of threads and on a second run of the same loop.

There's more to it, e.g. each core will try to load the memory it accesses into its own cache. Caches work in whole cache lines (64 bytes on this CPU), so if two or more cores access addresses that fall into the same line and at least one of them writes, that line has to be kept in sync between the caches every time this happens. You may not be aware of it, but if your source and destination addresses are close to each other, or two threads work on neighbouring rows, the cores may end up loading the same block of memory, and their caches would indeed overlap...
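One way to keep the threads out of each other's cache lines is to give each thread a single contiguous block of rows, so their ranges only meet at the block boundaries. A minimal sketch, again assuming `byte` is an 8-bit unsigned type; `copy_rows_blocked` is a made-up name. `schedule(static)` without a chunk size splits the iterations into roughly equal contiguous blocks, at most one per thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef unsigned char byte;  // assumption: the post does not show this typedef

// Copies `length` rows of `width` bytes each. With schedule(static) and no
// chunk size, each thread receives one contiguous run of rows, so two
// threads can touch the same cache line only where their blocks meet.
void copy_rows_blocked(const byte* src, byte* dst, int width, int length) {
    assert(width >= 0 && length >= 0);
    #pragma omp parallel for schedule(static)
    for (int y = 0; y < length; y++) {
        const std::size_t offset = static_cast<std::size_t>(y) * width;
        std::memcpy(dst + offset, src + offset, static_cast<std::size_t>(width));
    }
}
```

For a plain copy like this the memory bus is still shared between all cores, so this may not make it any faster than the single-threaded version. It only removes cache-line sharing as a suspect.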

As I said: Multithreading is never simple!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
