Hello everyone,
I have a computer vision algorithm in which the convolution step must run in parallel on multicore processors using OpenMP. The code has the following form:
void convolve(Bitmap &src, Kernel &kernel, Bitmap &dst)
{
    // width and height are declared elsewhere
    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            kernel.response(src, x, y, dst);
        }
    }
}
Kernel is an interface:
class Kernel {
public:
    virtual ~Kernel();
    virtual void response(Bitmap &src, int x, int y, Bitmap &dst) = 0;
};
The problem is that the compiler does not know the concrete Kernel implementations, and they can be complex. Can this code still be parallelised with OpenMP?
When I run it, it is actually slower than the serial version, and it visibly stalls during real-time image recognition. NOTE: this does not happen when OpenMP is disabled. I'm using Visual Studio Express 2013 with "/openmp" enabled.
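To put numbers on "slower than the serial version", here is a minimal timing sketch. Everything in it is an assumption made for illustration: secondsTaken, invertSerial and invertParallel are hypothetical names, and the byte-inverting loop is only a stand-in for the real per-pixel work. If OpenMP is disabled, the pragma is ignored and both functions run serially.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical timing helper (not part of the original code): runs f once
// and returns the elapsed wall-clock time in seconds.
template <typename F>
double secondsTaken(F f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Stand-in workload: inverts every byte of a width x height buffer.
void invertSerial(std::vector<unsigned char> &data, int width, int height) {
    assert(data.size() == static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            data[i] = static_cast<unsigned char>(255 - data[i]);
        }
    }
}

// Same loop, with the rows split across threads. Each iteration writes only
// its own element, so the loop body needs no lock.
void invertParallel(std::vector<unsigned char> &data, int width, int height) {
    assert(data.size() == static_cast<std::size_t>(width) * height);
    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            data[i] = static_cast<unsigned char>(255 - data[i]);
        }
    }
}
```

Timing both with something like secondsTaken([&]{ invertParallel(buf, w, h); }) on a frame-sized buffer, averaged over many frames, should help separate the cost of the parallel loop itself from anything that happens inside response().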
EDIT:
The example below shows a Kernel implementation that converts a bitmap to grayscale. Convolution is an inherently parallel problem, as are most computer vision algorithms, yet parallelising such algorithms with shared memory is hard, especially because of race conditions. Maybe I should consider using GPUs for hardware acceleration instead. NOTE: the code runs in real time even on a single core, but I want to speed it up for mobile platforms. It will run on mobile devices in Augmented Reality apps, so multicore is the way to go.
Usually, if I use "omp parallel for shared(dst)", the code runs fast with minimal stalling, but I suspect the race conditions are still there. Is there a hardware mechanism for avoiding false sharing and race conditions without the expensive "critical"? I tried "atomic", but it only works for primitive operations such as addition. Why doesn't "atomic" support the assignment operator?
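To make the "atomic" limitation concrete: as far as I understand, the OpenMP 2.0 that Visual Studio implements only accepts update statements such as x += expr or x++ under "atomic", because its job is to protect a read-modify-write on one shared variable. The sketch below contrasts that case with a plain store to an element that no other iteration touches. The function names countAbove and fillEach are hypothetical and exist only for this illustration.

```cpp
#include <vector>

// Hypothetical helper: counts values above a threshold. Every iteration may
// update the same shared counter, which is the read-modify-write case that
// "atomic" is meant for.
int countAbove(const std::vector<unsigned char> &values, unsigned char threshold) {
    int count = 0;
    const int n = static_cast<int>(values.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        if (values[i] > threshold) {
            #pragma omp atomic
            count += 1;
        }
    }
    return count;
}

// Hypothetical helper: each iteration stores to its own element out[i].
// No two iterations write the same location, so no synchronisation is used.
void fillEach(std::vector<unsigned char> &out, unsigned char value) {
    const int n = static_cast<int>(out.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        out[i] = value;
    }
}
```

With the default static schedule each thread gets a contiguous block of indices, so neighbouring threads can only contend for a cache line near the block boundaries. That is why I would expect false sharing to be a minor effect in a loop like fillEach, though I have not measured it.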
I only recently came across OpenMP and was excited about it until these issues came up :-(
class GrayScale: public Kernel {
public:
    void response(Bitmap &src, int x, int y, Bitmap &dst)
    {
        Pixel &sp = src.pData[x + y*src.width];
        Pixel &dp = dst.pData[x + y*dst.width];
        unsigned char gray = static_cast<unsigned char>(0.3*sp.red + 0.5*sp.green + 0.2*sp.blue);
        #pragma omp critical
        {
            dp.red = dp.green = dp.blue = gray;
        }
    }
};
where Pixel is defined as:
struct Pixel {
    unsigned char red;
    unsigned char green;
    unsigned char blue;
    unsigned char alpha;
};
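For comparison, here is a self-contained sketch of the same kernel with the "critical" section removed. The reasoning is that each call to response() writes only the pixel at (x, y) in dst, and no two loop iterations share an (x, y), so as far as I can tell there is nothing to lock. A "critical" section, by contrast, lets only one thread at a time through the store. The Bitmap layout here is an assumption (the real type is not shown above), and GrayScaleNoLock is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Pixel {
    unsigned char red, green, blue, alpha;
};

// Assumed Bitmap layout: only width and pData appear in the original code.
struct Bitmap {
    int width;
    int height;
    std::vector<Pixel> pData;  // zero-initialised pixels, row-major order
    Bitmap(int w, int h)
        : width(w), height(h), pData(static_cast<std::size_t>(w) * h) {}
};

class Kernel {
public:
    virtual ~Kernel() {}
    virtual void response(Bitmap &src, int x, int y, Bitmap &dst) = 0;
};

// Hypothetical variant of GrayScale without "critical": reads src(x, y)
// and writes dst(x, y) only.
class GrayScaleNoLock : public Kernel {
public:
    void response(Bitmap &src, int x, int y, Bitmap &dst) {
        const Pixel &sp = src.pData[x + y * src.width];
        Pixel &dp = dst.pData[x + y * dst.width];
        unsigned char gray = static_cast<unsigned char>(
            0.3 * sp.red + 0.5 * sp.green + 0.2 * sp.blue);
        dp.red = dp.green = dp.blue = gray;
    }
};

void convolve(Bitmap &src, Kernel &kernel, Bitmap &dst) {
    assert(src.width == dst.width && src.height == dst.height);
    const int width = src.width;
    const int height = src.height;
    // Rows are divided among threads; x is declared inside the loop, so
    // every thread has its own copy.
    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            kernel.response(src, x, y, dst);
        }
    }
}
```

This only holds for kernels that write a single output pixel per call. A kernel that accumulates into neighbouring pixels of dst would still need some form of synchronisation or a per-thread buffer.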