Introduction
PCs come with an amazingly powerful device: a graphics processing unit
(GPU). It is mostly underutilized, often doing little more than
rendering a desktop to the user. But computing on the GPU is
refreshingly fast compared to conventional CPU processing whenever
significant portions of your program can be run in parallel. The
applications are seemingly endless, including matrix computations,
signal transformations, random number generation, molecular modeling,
and password recovery. Why are GPUs so effective? They have
hundreds, in some cases thousands, of cores available for parallel
processing. Compare this to the typical one to four CPU cores on
today's PCs. (For a more technical treatment see:
graphics.stanford.edu/~mhouston/public_talks/cs448-gpgpu.pdf.)
Here I present a way to use the power of NVidia's CUDA-enabled GPUs
for computing using Java and an Eclipse-based IDE. My platform is
Linux Mint 13 (64 and 32 bit), but the process can be reproduced on
many other Linux distributions and can be adapted (by a resourceful
reader) to a Windows install.
Background
Easily accessing the power of the GPU for general-purpose computing
requires a GPU programming utility that exposes a set of high-level
methods and does all of the granular, hardware-level work for us. The
popular choices are OpenCL and CUDA. CUDA works only with NVIDIA GPUs.
I use NVIDIA devices and this article presents a CUDA solution.
Eclipse is my favourite IDE for programming in Java, C++, and PHP.
NVIDIA provides an Eclipse-based IDE, named Nsight, which is
preconfigured for CUDA C++ development. Support for other languages,
like Java and PHP, can be added to your Nsight installation from any
available Eclipse software repository.
Direct programming with CUDA requires writing unmanaged C++ code. I
prefer programming with managed code, so I wrap the C++ functionality
of CUDA in bindings that are accessible to Java. In the past, on a
Windows 7 platform, I wrote my own wrappers for use with C# .NET code.
With Java, this is not necessary because open source wrappers are
available. I use JCuda.
There are four basic elements presented here:
- Determining if you have a compatible GPU
- Installing/configuring CUDA
- Configuring Nsight for Java
- Utilizing JCuda
Sometimes tutorials present steps that the writer followed on an
existing production machine that already had certain prerequisite
configurations in place. Consequently, when a reader follows the
steps, the procedure may fail. To avoid this, I tested the process
described below from a fresh install of Mint 13_64 bit and a separate,
fresh install of Linux Mint 13_32 bit. I chose the Mate flavour in
both cases. Here are the details of my demonstration machines:
- GeForce GTX 560 Ti GPU (My 32 bit install has a Quadro NVS 160M
GPU)
- Fully updated (with update manager)
- Added gedit for convenience
- Only in special cases (see later): installed the standard,
proprietary NVIDIA driver through Menu > All > Additional
Drivers
Just a note
Computationally intensive applications, e.g. Fourier transforms,
whether they are done on the CPU or the GPU, will give your system a
stress test. Start small and monitor system temperatures when you have
high computational overhead.
Setup
Step 1: Do you have a compatible GPU?
NVidia has an exhaustive list of CUDA-compatible GPUs on their
Developer Zone web site: http://developer.nvidia.com/cuda-gpus.
Check to see if yours is listed. Stop if it isn't and look into an
OpenCL approach instead.
Step 2: Install dependencies:
There are some prerequisites. From a terminal, run the following
commands to get them:
- sudo apt-get update
- sudo apt-get install freeglut3-dev build-essential libx11-dev
libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev
gcc
Step 3: Download the CUDA Production Release and install
Download the latest CUDA release from:
https://developer.nvidia.com/cuda-downloads. (Note: Mint 13 is
based on Ubuntu 12.04. The CUDA download page currently does not show
Ubuntu 12.04 in the list for Linux downloads but the one for Ubuntu
11.10 works fine.)
Split the installer into its three component installer scripts:
toolkit, driver, and samples. This fine-grained control is a great
benefit if/when troubles occur. Here is the syntax for splitting the
installer.
- sh cuda_5.0.35_linux_64_ubuntu11.10-1.run -extract=<theCompletePathToYourDestination>
The following three files are created:
- cudatoolkit_5.0.35_linux_64_ubuntu11.10.run (or
cudatoolkit_5.0.35_linux_32_ubuntu11.10.run)
- devdriver_5.0_linux_64_304.54.run (or
devdriver_5.0_linux_32_304.54.run)
- cuda-samples_5.0.35_linux.run
We start by installing the NVidia developer driver. Before you do
anything: print this page, save your work, and be sure you are
backed up.
You cannot have an X server running when you install the developer
drivers. Do a preliminary test to make sure you can drop to a console
and stop your X server. First try the mainstream method:
[ctrl]+[alt]+[f2]. If you are lucky your desktop shows a console
prompting you to login. If so, login and type sudo service mdm stop.
(Note: other Linux distros use different display managers, e.g. gdm,
lightdm, etc.) You will see a blank screen. Do [ctrl]+[alt]+[f2]
again. You should now see the console. Reboot or run startx.
If [ctrl]+[alt]+[f2] did not work (e.g. your system hangs), reboot and
install the standard NVidia Driver from the Additional Drivers
menu item (even though the next step will remove them). After this,
[ctrl]+[alt]+[f2] should work.
Next, edit your blacklist configuration file (gksu gedit
/etc/modprobe.d/blacklist.conf) and add these lines to the end:
- blacklist amd76x_edac
- blacklist vga16fb
- blacklist nouveau
- blacklist rivafb
- blacklist nvidiafb
- blacklist rivatv
Then, remove everything nvidia from aptitude: sudo apt-get
remove --purge nvidia*
Drop to a console ([ctrl]+[alt]+[f2]), exit the X server, and run the
installer: sudo sh devdriver_5.0_linux_64_304.54.run (or sudo
sh devdriver_5.0_linux_32_304.54.run)
- Read/accept EULA
- At question: "register kernel module sources with DKMS", I said
YES.
- At question (64 bit only): "Install 32-bit OpenGL compatibility",
I said NO.
- At question: "run the nvidia-xconfig utility", I said YES.
- (One early trial install asked me if I wanted the installer to
hack nouveau off my system. I said yes and it worked for me.)
Reboot or startx.
Next, install the toolkit with: sudo sh
cudatoolkit_5.0.35_linux_64_ubuntu11.10.run (or sudo sh
cudatoolkit_5.0.35_linux_32_ubuntu11.10.run)
Your toolkit install console will present the following text when it
is complete:
* Please make sure your PATH includes /usr/local/cuda-5.0/bin
* Please make sure your LD_LIBRARY_PATH
*   for 32-bit Linux distributions includes /usr/local/cuda-5.0/lib
*   for 64-bit Linux distributions includes /usr/local/cuda-5.0/lib64:/usr/local/cuda-5.0/lib
* OR
*   for 32-bit Linux distributions add /usr/local/cuda-5.0/lib
*   for 64-bit Linux distributions add /usr/local/cuda-5.0/lib64 and /usr/local/cuda-5.0/lib
*   to /etc/ld.so.conf and run ldconfig as root
Save time and frustration
Set your additional paths persistently by editing (creating if
necessary) the .profile file in your home directory. Add PATH=$PATH:/usr/local/cuda-5.0/bin
to the end of the file, save, then logout and login.
Use a persistent, modular approach for managing your LD_LIBRARY_PATH.
I never edit the /etc/ld.so.conf file. Rather, my ld.so.conf
file contains the line: include /etc/ld.so.conf.d/*.conf. I
create a new file in the /etc/ld.so.conf.d folder named cuda.conf
that has the following line(s):
- /usr/local/cuda-5.0/lib
- /usr/local/cuda-5.0/lib64 (64 bit installs only)
Then run sudo ldconfig.
Step 4: Test CUDA Using NVidia CUDA Samples
Install the samples by running your third, split-out installer script:
sudo sh cuda-samples_5.0.35_linux.run
Now let's run a test. From a terminal, change to the folder where the
deviceQuery sample is located (default is
/usr/local/cuda-5.0/samples/1_Utilities/deviceQuery). Make the
sample with the system compiler: sudo make. Then, run the
sample: ./deviceQuery. I see the following on my 64 bit test
system:
/usr/local/cuda-5.0/samples/1_Utilities/deviceQuery $ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 560 Ti"
etc., etc., ...
Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GTX 560 Ti
Step 5: Start the Nsight Eclipse edition
Nsight is a fork of Eclipse that is pre-configured for C++ and CUDA.
It is included in your toolkit install (you already have it). For now,
run it from a terminal: /usr/local/cuda-5.0/libnsight/nsight.
(Do not double-click the file from your file manager.) Later you can
make a desktop launcher. Go ahead and choose the default folder for
projects that it recommends.
Let's test it.
- File > New > Cuda C++ Project
- Pick Import Cuda Sample
- Name the project test
- Click Next
- In the samples list pick Bandwidth Test
- Click Next
- Basic settings - use defaults
- Click Finish
- From the Project menu: Project > Build Project
- From the Run menu: Run > Run
My output in the console window is:
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GTX 560 Ti
etc., ...
Step 6: Configure Nsight for Java Development
Nsight can be expanded through Help > Install New Software. To
add Java development, you need to add http://download.eclipse.org/releases/indigo
to your Available Software Sites. Then, install Eclipse Java
Development Tools.
Follow the install dialog and restart Nsight.
Step 7: Download and Get Started with the JCuda Bindings
Download the zip for your platform from
http://www.jcuda.org/downloads/downloads.html. Extract it to a
folder in your home directory. Then start Nsight. Create a new Java
Project (File > New > Java Project) and name it JCudaHello.
Right-click the JCudaHello project in the project explorer and
select Properties. Go to the Java Build Path tree item
and select the Libraries tab. Click Add External Jars,
navigate to the extracted folder you created, and pick jcuda-0.5.0.jar.
With the Libraries tab still open, expand the tree for the
jcuda-0.5.0.jar you added and click on Native library location
(none). Then click the Edit button. You will be asked for
a location. Click External Folder and again navigate to the
extracted folder. Click OK.
Now, right-click your src folder in the JCudaHello project from
the Project Explorer and select New > Class. Name the
class cudaTest and select the public static void main
method stub.
Click Finish. Delete the code that is pre-generated in cudaTest.java
from the editor pane and paste this in:
import jcuda.Pointer;
import jcuda.runtime.JCuda;

public class cudaTest {
    public static void main(String[] args) {
        // Allocate four bytes on the GPU, print the device pointer, then free it
        Pointer pointer = new Pointer();
        JCuda.cudaMalloc(pointer, 4);
        System.out.println("Pointer: " + pointer);
        JCuda.cudaFree(pointer);
    }
}
When you run it, you should see something like this:
Pointer: Pointer[nativePointer=0x800100000,byteOffset=0]
Using the project code
The project code is a zipped Eclipse workspace that does not include
any hidden meta-data folders or information files. When you unzip it
to your location of choice, you will see two sub-directories: JCudaFftDemo
and Notes.
First, we need to create an Nsight Java project from the existing
sources in the JCudaFftDemo folder. Start Nsight and choose your
extracted directory (the parent directory of JCudaFftDemo) when it asks
you to select a workspace. Create a new Java Project from the File
menu and give it the exact name: JCudaFftDemo. Then, click Finish.
If you expand the trees for the project in the Project Explorer
you should see the imported sources.
Next, you need to add the JCuda binaries to the Java Build Path.
Right-click the JCudaFftDemo project in the Project Explorer
and select Properties. Go to the Java Build Path tree
item and select the Libraries tab. Click Add External Jars,
navigate to the JCuda binaries you downloaded in Setup – Step 7, and
pick jcuda-0.5.0.jar, jcublas-0.5.0.jar, and jcufft-0.5.0.jar.
With the Libraries tab still open, one at a time, expand the
trees for the jars you added and click on Native library location
(none). Click the Edit button and set the location to
match your JCuda binaries directory. (We are repeating Step 7 in the
above Setup section, this time for the new project.)
Then, run it as a Java Application. Here is the output console from
my Linux Mint 13, 32 bit laptop:
Creating sin wave input data: Frequency = 11.0, N = 1048576, dt = 5.0E-5 ...
L2 Norm of original signal: 724.10583
Performing a 1D C2C FFT on GPU with JCufft...
GPU FFT time: 0.121 seconds
Performing a 1D C2C FFT on CPU...
CPU time: 3.698 seconds
GPU FFT L2 Norm: 741484.3
CPU FFT L2 Norm: 741484.4
Index at maximum in GPU power spectrum = 572, frequency = 10.910034
Index at maximum in CPU power spectrum = 572, frequency = 10.910034
Performing 1D C2C IFFT(FFT) on GPU with JCufft...
GPU time: 0.231 seconds
Performing 1D C2C IFFT(FFT) on CPU...
CPU time: 3.992 seconds
GPU FFT L2 Norm: 724.1056
CPU FFT L2 Norm: 724.10583
More about the project code
First, a word about complex data arrays: CUDA and JCuda can work
with data arrays that contain complex vectors of type float or double,
provided you construct the array as an interleaved, complex number
sequence. This is best demonstrated with an example. Let’s say we have
a complex vector of length 2: (1 + 2i, 3 + 4i). The corresponding
interleaved data array has a length of 4 and has the form: (1, 2, 3,
4). In the project code I use this format for all complex vectors that
are submitted to JCuda methods.
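To make the interleaved layout concrete, here is a minimal sketch (my
own illustration, not code from the project) that packs the real and
imaginary parts of a complex vector into the format JCuda expects:

// Pack a complex vector, given as separate real and imaginary parts,
// into the interleaved layout (re0, im0, re1, im1, ...) used by JCuda.
public class InterleaveDemo {
    static float[] interleave(float[] real, float[] imag) {
        float[] packed = new float[2 * real.length];
        for (int i = 0; i < real.length; i++) {
            packed[2 * i] = real[i];     // real part at even index
            packed[2 * i + 1] = imag[i]; // imaginary part at odd index
        }
        return packed;
    }

    public static void main(String[] args) {
        // The vector (1 + 2i, 3 + 4i) becomes the array [1.0, 2.0, 3.0, 4.0]
        float[] packed = interleave(new float[]{1, 3}, new float[]{2, 4});
        System.out.println(java.util.Arrays.toString(packed));
    }
}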
In contrast, for CPU coding simplicity, I use a ComplexFloat class
to represent complex numbers. When using this class to form a complex
vector, the vector x = (1 + 2i, 3 + 4i) has the form ComplexFloat[2] =
(x[0].Real = 1, x[0].Imaginary = 2, x[1].Real = 3, x[1].Imaginary =
4). The array, and the vector it represents, both have the same
length: 2.
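The project ships its own ComplexFloat.java; a minimal stand-in that
is consistent with the fields used above (the actual class may carry
more functionality) might look like this:

// Minimal stand-in for the project's ComplexFloat class.
public class ComplexFloat {
    public float Real;
    public float Imaginary;

    public ComplexFloat(float real, float imaginary) {
        Real = real;
        Imaginary = imaginary;
    }

    public ComplexFloat add(ComplexFloat o) {
        return new ComplexFloat(Real + o.Real, Imaginary + o.Imaginary);
    }

    public ComplexFloat subtract(ComplexFloat o) {
        return new ComplexFloat(Real - o.Real, Imaginary - o.Imaginary);
    }

    public ComplexFloat multiply(ComplexFloat o) {
        // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        return new ComplexFloat(Real * o.Real - Imaginary * o.Imaginary,
                                Real * o.Imaginary + Imaginary * o.Real);
    }
}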
Main.java is the entry point for the application. It creates a
sample signal and performs the demo. The signal produced is
sin(2*pi*FREQ*t), sampled N times in increments of dT. The demo
computes forward and inverse Fourier transforms of the test signal,
both on the GPU and the CPU, and provides execution
times and signal characteristics for the results.
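As a rough sketch of the GPU side of the demo (not the project's exact
code; it assumes the interleaved layout described above and JCufft's
convenience overload that accepts host float arrays), generating the
test signal and running a forward 1D C2C FFT looks like this:

import jcuda.jcufft.JCufft;
import jcuda.jcufft.cufftHandle;
import jcuda.jcufft.cufftType;

public class GpuFftSketch {
    public static void main(String[] args) {
        // Test-signal constants, as described above
        final float FREQ = 11.0f;
        final int N = 1048576;
        final float dT = 5.0e-5f;

        // Build sin(2*pi*FREQ*t) as an interleaved complex array:
        // real parts at even indices, imaginary parts (zero) at odd ones.
        float[] signal = new float[2 * N];
        for (int i = 0; i < N; i++) {
            signal[2 * i] = (float) Math.sin(2.0 * Math.PI * FREQ * i * dT);
            signal[2 * i + 1] = 0.0f;
        }

        // Plan and execute a 1D complex-to-complex FFT in place on the GPU
        cufftHandle plan = new cufftHandle();
        JCufft.cufftPlan1d(plan, N, cufftType.CUFFT_C2C, 1);
        JCufft.cufftExecC2C(plan, signal, signal, JCufft.CUFFT_FORWARD);
        JCufft.cufftDestroy(plan);

        System.out.println("First bin: " + signal[0] + " + " + signal[1] + "i");
    }
}

Incidentally, the peak frequencies in the console output follow from
bin k of an N-point FFT corresponding to frequency k/(N*dt):
572/(1048576 * 5.0E-5) ≈ 10.91, matching the demo.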
The CPU FFT part of the code (FftCpuFloat.java) purposely implements
the Cooley–Tukey algorithm in an awkward way that depends on instances
of the ComplexFloat.java class. Little attention is paid to memory
allocation and access. Also, although I have multi-core CPUs, my CPU
thread executes on only one core. Doing this makes the radix-2
procedure intuitive and simple, but there is an overhead cost that
will overstate the advantage of using the GPU.
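For reference, here is a bare-bones recursive radix-2 Cooley–Tukey in
the same object-per-sample style (a sketch in the spirit of
FftCpuFloat.java, not the project's actual code; it assumes the input
length is a power of two and the ComplexFloat stand-in shown earlier):

// Recursive radix-2 Cooley–Tukey FFT over ComplexFloat samples.
// Deliberately allocation-heavy, mirroring the point above: an
// object-per-sample CPU implementation carries real overhead.
public class FftCpuSketch {
    static ComplexFloat[] fft(ComplexFloat[] x) {
        int n = x.length; // must be a power of two
        if (n == 1) {
            return new ComplexFloat[]{ new ComplexFloat(x[0].Real, x[0].Imaginary) };
        }
        // Split into even- and odd-indexed halves and transform each
        ComplexFloat[] even = new ComplexFloat[n / 2];
        ComplexFloat[] odd = new ComplexFloat[n / 2];
        for (int i = 0; i < n / 2; i++) {
            even[i] = x[2 * i];
            odd[i] = x[2 * i + 1];
        }
        ComplexFloat[] e = fft(even);
        ComplexFloat[] o = fft(odd);

        // Combine with the twiddle factors w = e^(-2*pi*i*k/n)
        ComplexFloat[] result = new ComplexFloat[n];
        for (int k = 0; k < n / 2; k++) {
            double angle = -2.0 * Math.PI * k / n;
            ComplexFloat w = new ComplexFloat((float) Math.cos(angle),
                                              (float) Math.sin(angle));
            ComplexFloat t = w.multiply(o[k]);
            result[k] = e[k].add(t);
            result[k + n / 2] = e[k].subtract(t);
        }
        return result;
    }
}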
You can adjust the constants (FREQ, N, and dT) for creating the test
signal from the Main.java class. Using a Linux 32 bit
installation on an older Dell laptop I found that, by varying the
length of the test signal (N), the CPU FFT outperformed the JCuda FFT
with signals that had fewer than 4096 complex elements. Thereafter,
the JCuda FFT speeds overwhelmed my CPU FFT. At N = 4194304, JCuda was
250 times faster than the CPU FFT (CPU = 23 seconds, GPU = 0.9
seconds). Beyond that, the laptop fans blaze during the CPU
computation loop (system temp: 90 C) and fear of thermal overload
prompted me to curtail testing. (My Linux 64 bit desktop has a 6 core
AMD Phenom II on a Sabertooth motherboard, 16 GiB of memory, a GeForce
GTX 560 Ti graphics card, and some great fans. It can process FFTs
(CPU or GPU) all night provided I manage memory effectively.)
A fair amount of the speed advantage I observe is due to the
inefficiency of my poorly optimized CPU implementation. More rigorous
CPU/GPU evaluations using optimized CPU code suggest that gains are
roughly 10X. I'll take 10X over 1X, but the practical reality is:
the power of CUDA's underlying implementation efficiency, together with
the intrinsic GPU gain (whatever it really is), collectively gives me
an average 50X boost.
The Notes folder in the project download includes some tips on
how to run a deployed, runnable jar. Basically, you need to use the -Djava.library.path
switch to point to your JCuda binaries folder.
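For example, assuming the jar is named JCudaFftDemo.jar (adjust the
name and the binaries path to match your setup), the launch command
looks like:

java -Djava.library.path=/path/to/your/jcuda/binaries -jar JCudaFftDemo.jar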
In conclusion
Getting set up and becoming acquainted with CUDA, JCuda, and Nsight
takes a fair amount of work. But it's worth it. General-purpose
computing on graphics processing units (GPGPU) is a very important
tool to have in your coding toolbox. I hope this article helps make
the process more accessible to other GPGPU novices like me. I wish you
success as a cutting-edge JCuda coder!
Some references
- gpgpu.org
- Stanford presentation: graphics.stanford.edu/~mhouston/public_talks/cs448-gpgpu.pdf
- NVidia CUDA Zone
- How to Nsight
- JCuda: jcuda.org
- intel-research.net on touted GPU advantages
- Titan Supercomputer: 18,688 NVidia GPUs