Lots of sample code is available to start learning how to use NVidia's CUDA development platform to create stunningly fast parallel floating-point code for execution on graphics processing units. Precious little information is available to help those getting started to understand exactly how to organize, compile and link their code. This article reviews the fundamental organization of a Visual Studio project for compiling CUDA code in the hopes that folks can get to the fun part, the crafting of new code, as quickly as possible.
The "Hello, World" program has been seen so often that it has, to some degree, become a cliche. So much so, in fact, that people forget the first "Hello, World" program had a very important point, far beyond a simple piece of sample code. In the early days of C compilers, merely installing the compiler and getting it to run was not a trivial task. Include directories had to be referenced by environment variables, folders had to exist for the runtime libraries, and all of these components had to stand in a correct mutual relationship. Today, one need only click the "Setup" icon, but in those days installing a compiler was an accomplishment in its own right.
If you can compile the "Hello, World" program, it means you have installed your compiler successfully. Another way of saying this is that programming is easy; it's understanding the environment in which you work that can be difficult! Nowhere is this more true than in the case of CUDA, where we are compiling not with one but with two separate compilers. The trouble is a square function, by the way: with two compilers, there are four times as many things that can go wrong.
Things We Won't Cover
Visual Studio Express on 64-bit Systems: Installing Visual Studio Express on a 64-bit system does not immediately provide you with a 64-bit development environment. If you want to do this, you must first ensure you have the appropriate 64-bit tools installed on your machine. Here is a place to start: Jen's Blog.
Visual Studio 2010: This is a fast-moving field, but to the best of my knowledge NVidia does not yet support VS2010 for CUDA development.
We will not discuss the installation of CUDA; NVidia provides this information. You will need CUDA-enabled drivers to run CUDA code and the CUDA SDK to develop code. This is where you can start. The default installation directory for the CUDA binaries is C:\CUDA. NVidia's excellent sample code installs in C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation. If you have successfully loaded and compiled some examples, you are already looking good. Take a look at the location of the sample projects in relationship to another folder called "common".
The NVidia samples assume that this relative relationship exists. NVidia provides a template solution which you can use as a starting point for your own projects. If you move this solution to some other location, it is up to you to ensure that the include files and library files can still be found. This cannot be done in the graphical tools of the Visual Studio IDE. You are back in the early 70s with your C compiler!
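As a rough sketch (the exact paths vary by SDK version and Windows edition, so treat this as an approximation rather than a definitive layout), the relative relationship the samples expect looks something like this:

```
NVIDIA GPU Computing SDK\
└── C\
    ├── common\            shared headers and libraries
    │   ├── inc\
    │   └── lib\
    └── src\
        └── cppIntegration\    sample projects sit two levels below "common"
```

This is why the sample commandlines reference include paths such as ../../common/inc: they climb two directories up from the project folder to reach "common". Move a project elsewhere and that relative path no longer resolves.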
The Nitty Gritty
When you compile a CUDA project, some of the code must be compiled for the CUDA device and some must be compiled for the Windows host. One NVidia sample you might like to consider is "cppintegration". In this solution, there are files with the extension ".cu" and files with the extension ".cpp". The .cpp files are good, old-fashioned C or C++. The .cu files will be compiled by NVidia's compiler, nvcc.exe. It would be too easy, however, if the cpp code simply ran on Windows and the cu code ran on the device. The cpp code is indeed compiled for Windows by the Microsoft VC compiler. The cu code, however, is often split: nvcc compiles some of it to run on the CUDA device, but some of it must be compiled to run on the Windows host, since the host must be able to load and execute device code at runtime.
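To make the split concrete, here is a minimal, hypothetical .cu file. The function names are invented for illustration; the point is that nvcc compiles the __global__ kernel for the device, while the plain C++ wrapper below it is handed off to the host compiler, even though both live in the same .cu file:

```cuda
#include <cuda_runtime.h>

// Device code: nvcc compiles this kernel for the GPU.
__global__ void scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Host code: nvcc hands this part to the Microsoft VC compiler.
// It runs on Windows, but it launches the kernel on the device.
extern "C" void scaleOnDevice(float *hostData, float factor, int n)
{
    float *devData = 0;
    size_t bytes = n * sizeof(float);

    cudaMalloc((void **)&devData, bytes);
    cudaMemcpy(devData, hostData, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover n elements.
    scaleKernel<<<(n + 255) / 256, 256>>>(devData, factor, n);

    cudaMemcpy(hostData, devData, bytes, cudaMemcpyDeviceToHost);
    cudaFree(devData);
}
```

Compiling a file like this with nvcc produces a single object file containing both halves, which is exactly why the object files it emits must later be linked alongside those produced by VS.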
For the developer just getting started with CUDA, a very important fact is that the NVidia compiler must be able to invoke cl.exe, the VC compiler. Furthermore, nvcc will invoke cl with its own commandline arguments. A significant stumbling block for developers getting started with CUDA is that the arguments passed by nvcc must be consistent with those set in your Visual Studio project, or your code will not link successfully.
The Custom Build
Visual Studio permits the definition of custom build steps, and these steps can invoke external tools. There are two approaches. NVidia has created a custom rules file and has associated this rules file with the extension cu. When you click "Build" in Visual Studio, the rules file tells VS to invoke nvcc for the cu files; any cpp files are compiled directly by VS. When compilation is complete, the Visual Studio linker links all the object files: those created by VS and those created by nvcc. Hopefully, this proceeds without error. But one of the greatest single impediments to successful compilation is that you are now dealing with two sets of include files and two sets of library specifications. The includes for the Windows portion of your code are set in the familiar property dialogs of VS. The NVidia compiler knows nothing of the settings in the Visual Studio project. To change the nvcc side of the operation, which you must do if you move the relative location of the NVidia common directory, you must modify the nvcc commandline.
The nvcc Command Line
If you are familiar with custom build rules, you could edit the rules file itself. The disadvantage is that any modifications will apply to all CUDA projects, so it may be prudent to create a new custom build rules file instead. If you are not familiar with custom build rules files, there is a further problem: Microsoft has completely revised the structure of custom build rules in Visual Studio 2010. If you learn the old rules format for your CUDA project, you are spending time on something which is already in the process of becoming outmoded.
An alternative is to associate a custom build step with each individual cu file. Though this ends up requiring a fair amount of redundant copying and pasting, it has some advantages. If you are just getting started, I think this is an easier way to get going, and it gives you a much clearer notion of what is actually happening during the build. Here we see the property pages for a cu file with its custom build step individually defined.
Remember to fill in the "Outputs" section. If you leave this blank, the build step won't run. This example also shows that additional cu files can be associated with the same step. These additional files are excluded from the build managed by Visual Studio, but they will be included in the compilation step performed by nvcc. You can use this as you see fit, but a common practice among CUDA developers is to split their cu code into two files. The "template.cu" file contains code to be compiled by nvcc but which may target the host, while the second file, "template_kernel.cu", contains only "kernel" code, that is, code which will run exclusively on the device and never on the host.
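The split might look something like this. The file names follow the NVidia template naming mentioned above, but the function bodies are invented for illustration. The kernel file is excluded from the VS build and pulled into the other file with an #include, so nvcc compiles both in a single step:

```cuda
// ---- template_kernel.cu ----
// Device-only code; excluded from the VS-managed build.
__global__ void testKernel(float *g_idata, float *g_odata)
{
    const unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    g_odata[tid] = g_idata[tid] * 2.0f;   // illustrative body
}

// ---- template.cu ----
// Compiled by nvcc; may contain host-targeted code as well.
#include <cuda_runtime.h>
#include "template_kernel.cu"   // pulls the kernel into this compilation unit

void runTest(float *d_idata, float *d_odata, int n)
{
    // Host-side launch; assumes n is a multiple of 256 for brevity.
    testKernel<<<n / 256, 256>>>(d_idata, d_odata);
}
```

Because template_kernel.cu is textually included rather than compiled on its own, only template.cu needs a custom build step, which is why the additional file is excluded from the VS build.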
Here is the actual nvcc commandline from the above property page:
"$(CUDA_BIN_PATH)\nvcc.exe" -ccbin "$(VCInstallDir)bin" -c -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/RTC1,/MTd -I"$(CUDA_INC_PATH)" -I./ -I../../common/inc -o $(ConfigurationName)\$(InputName).obj $(InputFileName)
Let's consider this in pieces:
"$(CUDA_BIN_PATH)\nvcc.exe"
This is obviously the invocation of the executable. Note that Visual Studio uses a make-style "$(...)" syntax to dereference an environment variable. If you wanted to run this from a command line manually, you would have to change to the DOS-style "%CUDA_BIN_PATH%" syntax.
-ccbin "$(VCInstallDir)bin" -c -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/RTC1,/MTd
In this section of the commandline, you will note that parameters are specified which will be passed on to VC. If these parameters are not compatible with those set in the Visual Studio project properties, your project may compile but fail to link. Pay special attention to the parameter which selects the runtime library, in this example /MTd, the debug multithreaded runtime. If the Visual Studio properties reference a different runtime library, you will likely encounter link errors saying that some procedure has already been defined; that is, the linker encounters procedures of the same name in the runtime chosen by VS and in the runtime specified to nvcc. This may also pop up as a problem when you move from a debug build to a release build: the VS settings will change to a release runtime while the nvcc commandline may still reference the debug version. Sorting this out is sometimes a matter of experimentation. In the VS linker properties, we can specify Ignore All Default Libraries or, as in this example, demand that a specific library not be linked. Note also the inclusion of the standard CUDA libraries for the linker's consideration.
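As a hypothetical illustration of the mismatch (the flags shown are invented for this example), suppose the project's Release configuration has been switched to the release runtime while a pasted nvcc commandline still carries the debug flag:

```
VS project, C/C++ -> Code Generation -> Runtime Library:   /MT    (release)
nvcc custom build step:  ... -Xcompiler /EHsc,/W3,/nologo,/O2,/MT d ...
                                                               ^^^^
                                         still the debug runtime: /MTd
```

The result is typically a stream of "already defined" link errors (LNK2005) or a defaultlib conflict warning, because the objects from nvcc reference LIBCMTD while the objects from VS reference LIBCMT. Changing /MTd to /MT in the custom build step brings the two sides back into agreement.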
Include file folder specifications:
-I"$(CUDA_INC_PATH)" -I./ -I../../common/inc
This example gives nvcc several places to look for include files; you can add as many -I parameters as required. A very common error encountered when getting started with CUDA is the failure to find an include file. When this occurs, you must determine whether it is VC or nvcc which has failed to find the include. If the error occurs during a custom build step, it is almost certainly nvcc. In this case, adding include directories in the VS project property pages will accomplish nothing; you must add them to the nvcc commandline, as we see here.
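For example, if you moved your project and wanted nvcc to find headers in a sibling folder, you would extend the commandline itself rather than the VS property pages. The folder name mylibs here is invented for illustration, and in the custom build step the whole command is entered as a single line:

```
"$(CUDA_BIN_PATH)\nvcc.exe" -ccbin "$(VCInstallDir)bin" -c
    -I"$(CUDA_INC_PATH)" -I./ -I../../common/inc -I../mylibs/inc
    -o $(ConfigurationName)\$(InputName).obj $(InputFileName)
```

The new -I../mylibs/inc entry is visible only to nvcc; the equivalent entry in the VS "Additional Include Directories" property would be visible only to cl.exe. If both compilers need the headers, the path must appear in both places.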
-o $(ConfigurationName)\$(InputName).obj $(InputFileName)
Lastly, the commandline ends with the specification of the path and filename for the output object file and the name of the input file. The macro style used here is far easier to maintain than hard-coding the filename. Once a commandline has been found to work, you can copy and paste it into the property pages of other cu files in your VS project. In a large project, files are often grouped into separate folders, and these folders can be specified in the commandline as well.
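For instance, if a project kept its cu files in a "kernels" subfolder (the folder name is invented for this sketch), the output specification might route the object files into a matching subfolder of the configuration directory:

```
"$(CUDA_BIN_PATH)\nvcc.exe" ... -o $(ConfigurationName)\kernels\$(InputName).obj kernels\$(InputFileName)
```

The output folder must exist before the build runs, and the path given in the "Outputs" field of the custom build step must match the -o path exactly, or VS will consider the step out of date on every build.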
Points of Interest
The book "CUDA by Example", by Jason Sanders and Edward Kandrot, is a very well thought-out and valuable introduction to GPU programming for NVidia hardware. People who know me personally know I dole out praise for computer books rarely, and then only sparingly. While this book has lots of good code examples, it provides little guidance for actually compiling and running the examples; this is why I thought jotting down a few notes might prove useful to those considering a look at CUDA development. By the way, it's fun, and it's well worth the effort.