Hi. I'm currently learning CUDA C from Udacity and I'm stuck at Lesson 1. I've written this code for color to grey-scale conversion, but it's converting only a thin strip of pixels at the top of the image.

Please tell me where the fault lies: in the grid-size calculation or in the kernel itself.

Here's the code:


C++

    #include "reference_calc.cpp"
    #include "utils.h"
    #include <stdio.h>

    __global__
    void rgba_to_greyscale(const uchar4* const rgbaImage,
                           unsigned char* const greyImage,
                           int numRows, int numCols)
    {
        int x, y, i; // i is index for 1D array greyImage. x and y for rgbaImage
        i = (blockIdx.y * blockDim.x) + blockIdx.x;
        x = (blockIdx.x * blockDim.x) + threadIdx.x;
        y = (blockIdx.y * blockDim.y) + threadIdx.y;

        if (x < numCols && y < numRows)
        {
            greyImage[i] = (0.299f * rgbaImage[y].x) + (0.587f * rgbaImage[y].y) + (0.114f * rgbaImage[y].z);
        }
    }

    void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage, uchar4 * const d_rgbaImage,
                                unsigned char* const d_greyImage, size_t numRows, size_t numCols)
    {
        //You must fill in the correct sizes for the blockSize and gridSize
        //currently only one block with one thread is being launched
        const dim3 blockSize(10, 10, 1); //TODO
        size_t gridSizeX, gridSizeY;
        gridSizeX = numCols + (10 - (numCols % 10) ); //adding some number to make it multiple of 10
        gridSizeY = numRows + (10 - (numRows % 10) ); //adding some number to make it multiple of 10

        const dim3 gridSize( gridSizeX, gridSizeY, 1); //TODO
        rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage, d_greyImage, numRows, numCols);

        cudaDeviceSynchronize();
        checkCudaErrors(cudaGetLastError());
    }
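For comparison, a common convention for an image stored as a flat array is row-major order, where pixel (x, y) sits at offset `y * numCols + x`. The host-only C++ sketch below mimics how one thread's global coordinates and that flat offset are usually derived. It assumes the image here is row-major, which the post does not state, and `Dim2`, `globalCoord`, and `flatIndex` are illustrative names, not part of CUDA or the course code.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for CUDA's built-in blockIdx / blockDim / threadIdx values
// (hypothetical type, used only so this sketch runs on the host).
struct Dim2 { std::size_t x, y; };

// Global pixel column (x) and row (y) handled by one thread.
Dim2 globalCoord(Dim2 blockIdx, Dim2 blockDim, Dim2 threadIdx)
{
    return { blockIdx.x * blockDim.x + threadIdx.x,
             blockIdx.y * blockDim.y + threadIdx.y };
}

// Row-major offset of pixel (x, y) in a flat array with numCols columns.
// Assumption: the image is stored row by row.
std::size_t flatIndex(std::size_t x, std::size_t y, std::size_t numCols)
{
    assert(x < numCols);
    return y * numCols + x;
}
```

Under that convention, a kernel would typically use the same flat offset both to read the input pixel and to write the output pixel, whereas the code above indexes `greyImage` with `i` and `rgbaImage` with `y`.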

For the grid-size calculation, I followed this strategy:

> First, fix the number of threads per block. I chose 100 in this case (dim3 blockSize(10, 10, 1);)

> Then pad each image dimension up to an integral multiple of the number of threads per block in that dimension.

> Do this for both x and y dimensions.

> Divide each padded dimension by the number of threads in that dimension.

> The result is a 2D grid containing slightly more threads than pixels, which is inevitable because the image size varies.

E.g.:

> Suppose the image is 512 x 512 pixels.

> I add 8 to both dimensions to make each an integral multiple of 10, resulting in 520 x 520.

> 520/10 and 520/10 give 52 x 52 as the grid size.
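The arithmetic in this strategy can be sketched as a small host-side helper. This is only an illustration: `blocksNeeded` and `blocksFromPaddingRule` are hypothetical names, not part of the course code. The first uses the usual ceiling-division idiom; the second follows the padding rule as written in the code above, with the division step applied afterwards.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of blocks needed along one dimension so that
// blocks * threadsPerBlock >= pixels (ceiling division).
std::size_t blocksNeeded(std::size_t pixels, std::size_t threadsPerBlock)
{
    assert(threadsPerBlock > 0);
    return (pixels + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical helper: the padding rule from the post, followed by the
// division step. It always adds (threadsPerBlock - pixels % threadsPerBlock),
// even when pixels is already an exact multiple.
std::size_t blocksFromPaddingRule(std::size_t pixels, std::size_t threadsPerBlock)
{
    assert(threadsPerBlock > 0);
    std::size_t padded = pixels + (threadsPerBlock - pixels % threadsPerBlock);
    return padded / threadsPerBlock;
}
```

For 512 pixels and 10 threads per dimension both helpers give 52, matching the example. They differ only when the dimension is already a multiple of 10: for 520 pixels the padding rule yields 53 blocks, one more than needed. Note also that `gridSizeX` and `gridSizeY` in the code above hold the padded pixel counts (520 in this example); the division by 10 described in the strategy does not appear there.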

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
