Hi. I'm currently learning CUDA C from Udacity and I'm stuck at Lesson 1. I've written this code for color to grey-scale conversion, but it's converting only a thin strip of pixels at the top.
Please tell me where the fault lies: in the grid-size calculation or in the kernel itself?
Here's the code:
#include "reference_calc.cpp"
#include "utils.h"
#include <stdio.h>
__global__ void rgba_to_greyscale(const uchar4* const rgbaImage,
unsigned char* const greyImage,
int numRows, int numCols)
{
    int x, y, i;
    i = (blockIdx.y * blockDim.x) + blockIdx.x;
    x = (blockIdx.x * blockDim.x) + threadIdx.x;
    y = (blockIdx.y * blockDim.y) + threadIdx.y;
    if (x < numCols && y < numRows)
    {
        greyImage[i] = (0.299f * rgbaImage[y].x) + (0.587f * rgbaImage[y].y) + (0.114f * rgbaImage[y].z);
    }
}
void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage, uchar4 * const d_rgbaImage,
unsigned char* const d_greyImage, size_t numRows, size_t numCols)
{
    const dim3 blockSize(10, 10, 1);
    size_t gridSizeX, gridSizeY;
    gridSizeX = numCols + (10 - (numCols % 10));
    gridSizeY = numRows + (10 - (numRows % 10));
    const dim3 gridSize(gridSizeX, gridSizeY, 1);
    rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage, d_greyImage, numRows, numCols);
    cudaDeviceSynchronize();
    checkCudaErrors(cudaGetLastError());
}
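For comparison, a minimal host-side sketch of how a 2D pixel coordinate such as the (x, y) computed in the kernel is usually flattened into a single array index. It assumes the image is stored row by row with numCols pixels per row, which the post does not state explicitly. The names `pixel_index` and `to_grey` are hypothetical and not part of the course files.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: flatten a 2D pixel coordinate (x = column, y = row)
// into an offset in a row-major image that is numCols pixels wide.
// Assumption: the image buffer is laid out row by row.
inline std::size_t pixel_index(std::size_t x, std::size_t y, std::size_t numCols)
{
    assert(x < numCols);  // the column must lie inside the row
    return y * numCols + x;
}

// Same luminance weights as the kernel in the post, applied to one pixel.
inline unsigned char to_grey(unsigned char r, unsigned char g, unsigned char b)
{
    return static_cast<unsigned char>(0.299f * r + 0.587f * g + 0.114f * b);
}
```

Under that assumption, the kernel would presumably read and write the same offset, y * numCols + x, for both rgbaImage and greyImage, so that each thread handles exactly one pixel.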
For the grid-size calculation, I followed this strategy:
> First, fix the number of threads per block. I chose 100 in this case (dim3 blockSize(10, 10, 1);)
> Then pad each image dimension up to an integral multiple of the number of threads per block in that dimension by adding the required amount.
> Do this for both x and y dimensions.
> Divide each padded dimension by the number of threads per block in that dimension.
> The result is a 2D grid containing slightly more threads than there are pixels, which is unavoidable because the image size varies.
E.g.:
> Suppose the image is 512 x 512 pixels.
> I add 8 to both dimensions to make each an integral multiple of 10, resulting in 520 x 520.
> 520/10 in each dimension gives 52 x 52 as the grid size.
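The pad-then-divide steps above amount to a ceiling division, so a minimal host-side sketch of the strategy might look like the following. The names `ceil_div`, `kBlockDim` and `threads_covering` are hypothetical, not part of the course code. Two things are worth checking against it: the division by the block dimension is part of the calculation, and a dimension that is already a multiple of 10 needs no padding at all.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the course code): number of blocks of
// `block` threads needed to cover `n` pixels, i.e. ceil(n / block).
inline std::size_t ceil_div(std::size_t n, std::size_t block)
{
    assert(block > 0);  // a block must contain at least one thread
    return (n + block - 1) / block;
}

// Assumed block edge, matching blockSize(10, 10, 1) in the post.
const std::size_t kBlockDim = 10;

// Total threads launched along one dimension for `n` pixels:
// the padded dimension from the strategy above.
inline std::size_t threads_covering(std::size_t n)
{
    return ceil_div(n, kBlockDim) * kBlockDim;
}
```

With this sketch, a 512 x 512 image gives ceil_div(512, 10) = 52 blocks per dimension (520 threads), matching the worked example, while a 510-pixel dimension gives 51 blocks rather than 52. The grid passed to the launch would then be dim3 gridSize(ceil_div(numCols, 10), ceil_div(numRows, 10), 1).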