Hi. I'm currently learning CUDA C from Udacity and I'm stuck at Lesson 1. I've written this code for color to grey-scale conversion, but it's converting only a thin strip of pixels at the top of the image.

Please tell me where the fault lies: in the grid-size calculation or in the kernel itself.

Here's the code:

#include "reference_calc.cpp"
#include "utils.h"
#include <stdio.h>

__global__ void rgba_to_greyscale(const uchar4* const rgbaImage,
                       unsigned char* const greyImage,
                       int numRows, int numCols)
{
    int x,y,i; // i is index for 1D array greyImage. x and y for rgbaImage
    i = (blockIdx.y * blockDim.x) + blockIdx.x;
    x= (blockIdx.x * blockDim.x) + threadIdx.x;
    y= (blockIdx.y * blockDim.y) + threadIdx.y;

    if(x < numCols && y < numRows)
        greyImage[i] = (0.299f * rgbaImage[y].x) + (0.587f * rgbaImage[y].y) + (0.114f * rgbaImage[y].z);
}

void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage, uchar4 * const d_rgbaImage,
                            unsigned char* const d_greyImage, size_t numRows, size_t numCols)
{
  //You must fill in the correct sizes for the blockSize and gridSize
  //currently only one block with one thread is being launched
  const dim3 blockSize(10, 10, 1);  //TODO
  size_t gridSizeX, gridSizeY;
  gridSizeX = numCols + (10 - (numCols % 10) );  //adding some number to make it multiple of 10
  gridSizeY = numRows + (10 - (numRows % 10) );  //adding some number to make it multiple of 10

  const dim3 gridSize( gridSizeX, gridSizeY, 1);  //TODO
  rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage, d_greyImage, numRows, numCols);

  cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
}
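
For comparison with the kernel listing, here is a minimal CPU-side sketch of how a 2D pixel coordinate is commonly mapped to an offset in a 1D array. It assumes the image is stored row by row (row-major); that layout is an assumption, and the names `pixelIndex`, `Rgba` and `greyscaleReference` are hypothetical, not part of the course code. In a kernel, `x` and `y` would come from `blockIdx`, `blockDim` and `threadIdx` as in the listing, and the same offset would typically be used for both the input and the output array.

```cpp
#include <cstddef>

// Hypothetical helper (not from the course code): offset of pixel (x, y)
// in an image stored row by row, where x is the column and y is the row.
// Assumes row-major storage.
inline std::size_t pixelIndex(std::size_t x, std::size_t y, std::size_t numCols) {
    return y * numCols + x;
}

// Stand-in for CUDA's uchar4 so the sketch compiles without the CUDA headers.
struct Rgba { unsigned char x, y, z, w; };

// CPU reference loop: each (x, y) iteration plays the role of one GPU thread.
void greyscaleReference(const Rgba* rgba, unsigned char* grey,
                        std::size_t numRows, std::size_t numCols) {
    for (std::size_t y = 0; y < numRows; ++y) {
        for (std::size_t x = 0; x < numCols; ++x) {
            const std::size_t i = pixelIndex(x, y, numCols);
            // Same weights as in the kernel listing; result truncated to 8 bits.
            grey[i] = static_cast<unsigned char>(0.299f * rgba[i].x +
                                                 0.587f * rgba[i].y +
                                                 0.114f * rgba[i].z);
        }
    }
}
```

Running a loop like this on a small image and comparing against the GPU output is one way to check whether a kernel reads and writes the pixels it is supposed to.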

For the grid-size calculation, I followed this strategy:

> First, fix the number of threads per block. I chose 100 in this case (dim3 blockSize(10, 10, 1);).

> Then make each image dimension an integral multiple of the number of threads per block in that dimension by adding some padding.

> Do this for both the x and y dimensions.

> Divide each padded dimension by the number of threads in that dimension.

> The result is a 2D grid that contains slightly more threads than there are pixels, which is unavoidable because the image size varies.


> Suppose the image is 512 x 512 pixels.

> I add 8 to both dimensions to make each an integral multiple of 10, resulting in 520 x 520.

> 520/10 and 520/10 gives 52x52 as the grid size.
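
The arithmetic in these steps can be sketched as a host-side ceiling division. This is only an illustration: `blocksNeeded` and `blocksPadThenDivide` are hypothetical helper names, not part of the course code. The second function writes out the "pad, then divide" steps literally so the two can be compared. They agree for 512, but differ when the dimension is already a multiple of the block size.

```cpp
#include <cstddef>

// Hypothetical helper (not from the course code): number of blocks needed
// to cover `extent` pixels with `threadsPerBlock` threads along one axis.
// This is a ceiling division: it rounds up only when there is a remainder.
inline std::size_t blocksNeeded(std::size_t extent, std::size_t threadsPerBlock) {
    return (extent + threadsPerBlock - 1) / threadsPerBlock;
}

// The "pad, then divide" steps from the post, written out literally.
// When extent is already a multiple of threadsPerBlock, the padding term
// is a full threadsPerBlock, so one extra block is produced.
inline std::size_t blocksPadThenDivide(std::size_t extent, std::size_t threadsPerBlock) {
    const std::size_t padded = extent + (threadsPerBlock - (extent % threadsPerBlock));
    return padded / threadsPerBlock;
}
```

For a 512-pixel dimension and 10 threads per block, both functions give 52, matching the worked example. In a launch, the results would feed the grid dimensions, for example `dim3 gridSize(blocksNeeded(numCols, 10), blocksNeeded(numRows, 10), 1)`. As far as the posted launch code shows, the padded pixel counts are passed to `gridSize` directly, without the final division step described in the strategy.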

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
