Part 6: Primitive Restart and OpenGL Interoperability





5.00/5 (1 vote)
This sixth article in a series on portable multithreaded programming using OpenCL™ where Rob Farber discusses how to calculate data in OpenCL™ and render it with OpenGL within the same application.
Topics covered include configuration, buffer interoperability, and the use of primitive restart to accelerate GPU rendering. The previous article, part 5, of this series, demonstrated how to tie computation to data in a multi-device, multi-GPU environment.
Introduction
OpenGL is a common graphical programming API (Application Programming Interface) that is standards based, cross-language, and cross-platform. It can be used to create applications that can render 2D and 3D images on most visualization hardware. OpenGL and OpenCL interoperability can greatly accelerate both data generation as well as data visualization. Basically, the OpenCL application maps the OpenGL buffers so they can be modified by massively-parallel kernels running on the GPU. This keeps the data on the GPU and avoids costly PCIe bus transfers.
Primitive restart is a new feature added in the OpenGL 3.1 specification. In short, primitive restart lets the programmer define a numeric value that acts as a token that tells the OpenGL state machine to restart an OpenGL rendering instruction to begin with the next data item. This has multiple advantages:
- For the programmer, this means that multiple lines, triangle fans and irregular meshes can be rendered with a single command.
- The developers of the OpenGL state machine can optimize their code so the test for the primitive restart token happens on the GPU thus eliminating host/GPU communications bottlenecks across the PCIe bus.
- Increase rendering performance can be obtained by arranging the data to achieve the highest reuse of the cache in the texture units.
- Higher quality images can be created by alternating the direction of tessellation as noted in the primitive restart specification and illustrated in the following two figures:
This article is not intended as an OpenGL tutorial. There are a number of excellent books and tutorials available on the Internet. An excellent one is the NeHe tutorial series. The following application is intended to demonstrate OpenCL and OpenGL interoperability and the use of primitive restart.
Building the Application
Copy and paste the complete source listing for gltest.cpp into a file by that name. The complete source is provided at the end of this article after the walk through. Similarly, copy and paste the source for the OpenCL kernel, sinewave.cl.
The source can be compiled under Linux with the following command:
g++ -I $ATISTREAMSDKROOT/include -L $ATISTREAMSDKROOT/lib/x86_64 gltest.cpp -lglut -lGLEW -lOpenCL -o gltest
The application runs on the GPU by default. The program runs on the CPU when anything is specified on the command-line. This simple application requires that the OpenCL kernel be in the file sinewave.cl in the same directory.
./gltest # running on the GPU ./gltest CPU # This will run the application on the CPU
By default, the application starts by rendering a surface. Use the mouse to rotate and zoom the image. Notice that the colors as well as the shape of the image changes with time to show repeated computation by the OpenCL kernel.
Pressing ‘D’ or ‘d’ on the keyboard cycles the rendering mode from a surface, to a collection of points, to a set of lines, and finally back to a surface again as can be seen in the following three images.
Pressing ‘q’ exits the program.
A Walk Through the gltest.cpp Source Code
The initial section of the code specifies the include files, constants and a set of global variables.
//Code by Rob Farber
#include <iostream>
#include <fstream>
using namespace std;
#include <CL/cl.h>
#include <CL/cl_gl.h>
#include <GL/glew.h>
#include <GL/glut.h>
#ifndef _WIN32
#include <GL/glxew.h>
#endif //!_WIN32
#define WIDTH 1408
#define HEIGHT 1024
// Globals used in the program
const unsigned int mesh_width = 128, mesh_height = 128;
const unsigned int RestartIndex = 0xffffffff;
cl_platform_id platform;
cl_device_id device;
cl_context context;
cl_command_queue queue;
cl_program program;
cl_kernel kernel;
size_t kernelsize;
size_t global[] = {mesh_width, mesh_height};
har *pathname = NULL;
char *source = NULL;
A VBO (Vertex Buffer Object) is an OpenGL memory buffer in the high speed memory of the GPU. The following source code shows that gltest.cpp utilizes two VBOs, one that describes the coordinates of a set of vertices for the surface and another that describes the color associated with each vertex. VBOs can also store information such as normals, texcoords, indicies, and other data.
// Globals associated with the position vbo
const unsigned int p_vbo_size = mesh_width*mesh_height*4*sizeof(float);
GLuint p_vbo;
cl_mem p_vbocl;
// Globals associated with the color vbo
const unsigned int c_vbo_size = mesh_width*mesh_height*4*sizeof(unsigned char);
GLuint c_vbo;
cl_mem c_vbocl;
Primitive restart requires that a set of indices be used to point to each location of the data (or to the primitive restart token). The qIndicies vector holds these indices.
// Globals associated with the indices for primitive restart
GLuint* qIndices=NULL;
int qIndices_size = 5*(mesh_height-1)*(mesh_width-1);
This code utilizes the OpenGL Utility Toolkit GLUT, which is a portable toolkit that runs under UNIX, Windows and other operating systems. GLUT provides an easy API to specify callbacks to functions that handle mouse, window, keyboard and other events. The following section provides some global variables for those callbacks as well as forward references to the functions.
float anim = 0.0;
int drawMode=GL_TRIANGLE_FAN; // the default draw mode
const char* drawStr="fan";
const char* platformString="notset";
// Globals associated with the mouse controls
int mouse_old_x, mouse_old_y;
int mouse_buttons = 0;
float rotate_x = 0.0, rotate_y = 0.0;
float translate_z = -2.5;
// Forward references for the GLUT callbacks
void display();
void motion(int x, int y);
void mouse(int button, int state, int x, int y);
void keyboard(unsigned char key, int x, int y);
void initgl(int argc, const char** argv);
For convenience, the title of the window is used to report important characteristics such as the device (CPU or GPU) and drawing mode. This information is specified with the following helper function:
// helper routine to set the window title void setTitle() { char title[256]; sprintf(title, "GL Interop Wrapper: mode %s device %s", drawStr, platformString); glutSetWindowTitle(title); }
The first part of main
initializes the window and
provides a very simple command-line processing to select the CPU or the GPU.
int main(int argc, const char **argv)
{
initgl(argc, argv);
clGetPlatformIDs(1, &platform, NULL);
if(argc > 1) {
clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, NULL);
platformString = "CPU";
} else {
clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
platformString = "GPU";
}
The OpenCL context and queue is created. Note that the
context must be tied to the OpenGL context as highlighted by the #ifdef
preprocessor
statements. Otherwise, nothing will be displayed.
// It is necessary to add the gl context to the properties or
// nothing will display
#ifdef _WIN32
HGLRC glCtx = wglGetCurrentContext();
#else //!_WIN32
GLXContext glCtx = glXGetCurrentContext();
#endif //!_WIN32
cl_context_properties props[] = { CL_CONTEXT_PLATFORM,
(cl_context_properties)platform,
#ifdef _WIN32
CL_WGL_HDC_KHR, (intptr_t) wglGetCurrentDC(),
#else //!_WIN32
CL_GLX_DISPLAY_KHR, (intptr_t) glXGetCurrentDisplay(),
#endif //!_WIN32
CL_GL_CONTEXT_KHR, (intptr_t) glCtx, 0};
// Create the context and the queue
context = clCreateContext(props, 1, &device, NULL, NULL, NULL);
queue = clCreateCommandQueue(context, device, 0, NULL);
Both the position and color VBOs are created. Note that the
call to glBindBuffer
allocates the space on the device. VBOs are mapped
into the OpenCL memory space with a call to clCreateFromGLBuffer
, which
returns a pointer to global memory that can be passed to the OpenCL kernel.
// create position p_vbo
glGenBuffers(1, &p_vbo);
glBindBuffer(GL_ARRAY_BUFFER, p_vbo);
// initialize buffer object
glBufferData(GL_ARRAY_BUFFER, p_vbo_size, 0, GL_DYNAMIC_DRAW);
// create OpenCL buffer from GL VBO
p_vbocl = clCreateFromGLBuffer(context, CL_MEM_WRITE_ONLY, p_vbo, NULL);
// create color c_vbo (very similar to the position vbo)
glGenBuffers(1, &c_vbo);
glBindBuffer(GL_ARRAY_BUFFER, c_vbo);
glBufferData(GL_ARRAY_BUFFER, c_vbo_size, 0, GL_DYNAMIC_DRAW);
c_vbocl = clCreateFromGLBuffer(context, CL_MEM_WRITE_ONLY, c_vbo, NULL);
Even though this is a C-API OpenCL program, C++ was used to load the program source. The OpenCL program is then built, the kernel arguments are specified, and the indices for primitive restart are defined.
// For convenience use C++ to load the program source into memory
ifstream file("sinewave.cl");
string prog(istreambuf_iterator<char>(file), (istreambuf_iterator<char>()));
file.close();
const char* source = prog.c_str();
const size_t kernelsize = prog.length()+1;
program = clCreateProgramWithSource(context, 1, (const char**) &source,
&kernelsize, NULL);
// Build the program executable
int err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
if (err != CL_SUCCESS) {
size_t len;
char buffer[2048];
cerr << "Error: Failed to build program executable!" << endl;
clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
sizeof(buffer), buffer, &len);
cerr << buffer << endl;
exit(1);
}
// Create the compute kernel in the program
kernel = clCreateKernel(program, "sinewave", &err);
if (!kernel || err != CL_SUCCESS) {
cerr << "Error: Failed to create compute kernel!" << endl;
exit(1);
}
// Set the kernel arguments. Note argument 3 is set in display
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&p_vbocl);
clSetKernelArg(kernel, 1, sizeof(unsigned int), &mesh_width);
clSetKernelArg(kernel, 2, sizeof(unsigned int), &mesh_height);
clSetKernelArg(kernel, 4, sizeof(cl_mem), (void*)&c_vbocl);
// Generate the indices for primitive restart
// allocate and assign trianglefan indicies
qIndices = (GLuint *) malloc(qIndices_size*sizeof(GLint));
int index=0;
for(int i=1; i < mesh_height; i++) {
for(int j=1; j < mesh_width; j++) {
qIndices[index++] = (i)*mesh_width + j;
qIndices[index++] = (i)*mesh_width + j-1;
qIndices[index++] = (i-1)*mesh_width + j-1;
qIndices[index++] = (i-1)*mesh_width + j;
qIndices[index++] = RestartIndex;
}
}
The main
method completes by setting the window title
and calling the GLUT main loop, which does not exit.
setTitle();
glutMainLoop();
}
The initgl
method performs all the work of setting up
the window, registering the callbacks, and defining some initial viewing
conditions.
// setup the window and assign callbacks
void initgl(int argc, const char** argv)
{
glutInit(&argc, (char**)argv);
glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
glutInitWindowPosition (glutGet(GLUT_SCREEN_WIDTH)/2 - WIDTH/2,
glutGet(GLUT_SCREEN_HEIGHT)/2 - HEIGHT/2);
glutInitWindowSize(WIDTH, HEIGHT);
glutCreateWindow("");
glutDisplayFunc(display); // register GLUT callback functions
glutKeyboardFunc(keyboard);
glutMouseFunc(mouse);
glutMotionFunc(motion);
glewInit();
glClearColor(0.0, 0.0, 0.0, 1.0);
glDisable(GL_DEPTH_TEST);
glViewport(0, 0, WIDTH, HEIGHT);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
gluPerspective(10000.0, (GLfloat)WIDTH / (GLfloat)HEIGHT, 0.1, 10.0);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
return;
}
The display
method is where all the work happens.
Basically, the OpenGL buffer objects are acquired, and the kernel is queued.
The OpenGL buffers are released once the kernel completes.
// This method is called everytime the screen is redisplayed. No
// optimization is performed as it recalculates the kernel every time.
void display()
{
anim += 0.01f;
// map OpenGL buffer object for writing from OpenCL
glFinish();
clEnqueueAcquireGLObjects(queue, 1, &p_vbocl, 0,0,0);
clEnqueueAcquireGLObjects(queue, 1, &c_vbocl, 0,0,0);
// Set arg 3 and queue the kernel
clSetKernelArg(kernel, 3, sizeof(float), &anim);
clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global, NULL, 0, 0, 0);
// queue unmap buffer object
clEnqueueReleaseGLObjects(queue, 1, &c_vbocl, 0,0,0);
clEnqueueReleaseGLObjects(queue, 1, &p_vbocl, 0,0,0);
clFinish(queue);
// clear graphics
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
The newly computed data is then rendered according to the
drawing mode after which display
cleans up. Note that primitive restart
is enabled with the call to glEnableClientState
and that the unique
index is specified to the OpenGL state machine with glPrimitiveRestartIndexNV
.
Only a single glDrawElements
call is made to render all the triangle
fans.
// Apply the image transforms
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glTranslatef(0.0, 0.0, translate_z);
glRotatef(rotate_x, 1.0, 0.0, 0.0);
glRotatef(rotate_y, 0.0, 1.0, 0.0);
//render from the p_vbo
glBindBuffer(GL_ARRAY_BUFFER, p_vbo);
glVertexPointer(4, GL_FLOAT, 0, 0);
glEnableClientState(GL_VERTEX_ARRAY);
// enable colors from the c_vbo
glBindBuffer(GL_ARRAY_BUFFER, c_vbo);
glColorPointer(4, GL_UNSIGNED_BYTE, 0, 0);
glEnableClientState(GL_COLOR_ARRAY);
// draw points, lines or triangles according to the user keyboard input
switch(drawMode) {
case GL_LINE_STRIP:
for(int i=0 ; i < mesh_width*mesh_height; i+= mesh_width)
glDrawArrays(GL_LINE_STRIP, i, mesh_width);
break;
case GL_TRIANGLE_FAN:
glPrimitiveRestartIndexNV(RestartIndex);
glEnableClientState(GL_PRIMITIVE_RESTART_NV);
glDrawElements(GL_TRIANGLE_FAN, qIndices_size, GL_UNSIGNED_INT, qIndices);
break;
default:
glDrawArrays(GL_POINTS, 0, mesh_width * mesh_height);
break;
}
// handle housekeeping and redisplay
glDisableClientState(GL_COLOR_ARRAY);
glDisableClientState(GL_VERTEX_ARRAY);
glutSwapBuffers();
glutPostRedisplay();
}
The keyboard callback handles keyboard input. Pressing ‘D’ or ‘d’ cycles the drawing mode between rendering a surface, to a bunch of points, then to lines and back to a surface. Pressing a ‘q’ or ESC terminates the application.
// Keyboard events handler for GLUT
void keyboard(unsigned char key, int x, int y)
{
switch(key) {
case('q') :
case(27) :
exit(0);
break;
case 'd':
case 'D':
switch(drawMode) {
case GL_POINTS: drawMode = GL_LINE_STRIP; drawStr = "line"; break;
case GL_LINE_STRIP: drawMode = GL_TRIANGLE_FAN; drawStr = "fan"; break;
default: drawMode=GL_POINTS; drawStr = "points"; break;
}
}
setTitle();
glutPostRedisplay();
}
Following are the mouse handlers.
// Mouse event handler for GLUT
void mouse(int button, int state, int x, int y)
{
if (state == GLUT_DOWN) {
mouse_buttons |= 1<<button;
} else if (state == GLUT_UP) {
mouse_buttons = 0;
}
mouse_old_x = x;
mouse_old_y = y;
glutPostRedisplay();
}
// Motion event handler for GLUT
void motion(int x, int y)
{
float dx, dy;
dx = x - mouse_old_x;
dy = y - mouse_old_y;
if (mouse_buttons & 1) {
rotate_x += dy * 0.2;
rotate_y += dx * 0.2;
} else if (mouse_buttons & 4) {
translate_z += dy * 0.01;
}
mouse_old_x = x;
mouse_old_y = y;
}
The complete source for gltest.cpp follows:
//Code by Rob Farber
#include <iostream>
#include <fstream>
using namespace std;
#include <CL/cl.h>
#include <CL/cl_gl.h>
#include <GL/glew.h>
#include <GL/glut.h>
#ifndef _WIN32
#include <GL/glxew.h>
#endif //!_WIN32
#define WIDTH 1408
#define HEIGHT 1024
// Globals used in the program
const unsigned int mesh_width = 128, mesh_height = 128;
const unsigned int RestartIndex = 0xffffffff;
cl_platform_id platform;
cl_device_id device;
cl_context context;
cl_command_queue queue;
cl_program program;
cl_kernel kernel;
size_t kernelsize;
size_t global[] = {mesh_width, mesh_height};
char *pathname = NULL;
char *source = NULL;
// Globals associated with the position vbo
const unsigned int p_vbo_size = mesh_width*mesh_height*4*sizeof(float);
GLuint p_vbo;
cl_mem p_vbocl;
// Globals associated with the color vbo
const unsigned int c_vbo_size = mesh_width*mesh_height*4*sizeof(unsigned char);
GLuint c_vbo;
cl_mem c_vbocl;
// Globals associated with the indices for primitive restart
GLuint* qIndices=NULL;
int qIndices_size = 5*(mesh_height-1)*(mesh_width-1);
float anim = 0.0;
int drawMode=GL_TRIANGLE_FAN; // the default draw mode
const char* drawStr="fan";
const char* platformString="notset";
// Globals associated with the mouse controls
int mouse_old_x, mouse_old_y;
int mouse_buttons = 0;
float rotate_x = 0.0, rotate_y = 0.0;
float translate_z = -2.5;
// Forward references for the GLUT callbacks
void display();
void motion(int x, int y);
void mouse(int button, int state, int x, int y);
void keyboard(unsigned char key, int x, int y);
void initgl(int argc, const char** argv);
// helper routine to set the window title
void setTitle()
{
char title[256];
sprintf(title, "GL Interop Wrapper: mode %s device %s",
drawStr, platformString);
glutSetWindowTitle(title);
}
int main(int argc, const char **argv)
{
initgl(argc, argv);
clGetPlatformIDs(1, &platform, NULL);
if(argc > 1) {
clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, NULL);
platformString = "CPU";
} else {
clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
platformString = "GPU";
}
// It is necessary to add the gl context to the properties or
// nothing will display
#ifdef _WIN32
HGLRC glCtx = wglGetCurrentContext();
#else //!_WIN32
GLXContext glCtx = glXGetCurrentContext();
#endif //!_WIN32
cl_context_properties props[] = { CL_CONTEXT_PLATFORM,
(cl_context_properties)platform,
#ifdef _WIN32
CL_WGL_HDC_KHR, (intptr_t) wglGetCurrentDC(),
#else //!_WIN32
CL_GLX_DISPLAY_KHR, (intptr_t) glXGetCurrentDisplay(),
#endif //!_WIN32
CL_GL_CONTEXT_KHR, (intptr_t) glCtx, 0};
// Create the context and the queue
context = clCreateContext(props, 1, &device, NULL, NULL, NULL);
queue = clCreateCommandQueue(context, device, 0, NULL);
// create position p_vbo
glGenBuffers(1, &p_vbo);
glBindBuffer(GL_ARRAY_BUFFER, p_vbo);
// initialize buffer object
glBufferData(GL_ARRAY_BUFFER, p_vbo_size, 0, GL_DYNAMIC_DRAW);
// create OpenCL buffer from GL VBO
p_vbocl = clCreateFromGLBuffer(context, CL_MEM_WRITE_ONLY, p_vbo, NULL);
// create color c_vbo (very similar to the position vbo)
glGenBuffers(1, &c_vbo);
glBindBuffer(GL_ARRAY_BUFFER, c_vbo);
glBufferData(GL_ARRAY_BUFFER, c_vbo_size, 0, GL_DYNAMIC_DRAW);
c_vbocl = clCreateFromGLBuffer(context, CL_MEM_WRITE_ONLY, c_vbo, NULL);
// For convenience use C++ to load the program source into memory
ifstream file("sinewave.cl");
string prog(istreambuf_iterator<char>(file), (istreambuf_iterator<char>()));
file.close();
const char* source = prog.c_str();
const size_t kernelsize = prog.length()+1;
program = clCreateProgramWithSource(context, 1, (const char**) &source,
&kernelsize, NULL);
// Build the program executable
int err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
if (err != CL_SUCCESS) {
size_t len;
char buffer[2048];
cerr << "Error: Failed to build program executable!" << endl;
clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
sizeof(buffer), buffer, &len);
cerr << buffer << endl;
exit(1);
}
// Create the compute kernel in the program
kernel = clCreateKernel(program, "sinewave", &err);
if (!kernel || err != CL_SUCCESS) {
cerr << "Error: Failed to create compute kernel!" << endl;
exit(1);
}
// Set the kernel arguments. Note argument 3 is set in display
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&p_vbocl);
clSetKernelArg(kernel, 1, sizeof(unsigned int), &mesh_width);
clSetKernelArg(kernel, 2, sizeof(unsigned int), &mesh_height);
clSetKernelArg(kernel, 4, sizeof(cl_mem), (void*)&c_vbocl);
// Generate the indices for primitive restart
// allocate and assign trianglefan indicies
qIndices = (GLuint *) malloc(qIndices_size*sizeof(GLint));
int index=0;
for(int i=1; i < mesh_height; i++) {
for(int j=1; j < mesh_width; j++) {
qIndices[index++] = (i)*mesh_width + j;
qIndices[index++] = (i)*mesh_width + j-1;
qIndices[index++] = (i-1)*mesh_width + j-1;
qIndices[index++] = (i-1)*mesh_width + j;
qIndices[index++] = RestartIndex;
}
}
setTitle();
glutMainLoop();
}
// setup the window and assign callbacks
void initgl(int argc, const char** argv)
{
glutInit(&argc, (char**)argv);
glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
glutInitWindowPosition (glutGet(GLUT_SCREEN_WIDTH)/2 - WIDTH/2,
glutGet(GLUT_SCREEN_HEIGHT)/2 - HEIGHT/2);
glutInitWindowSize(WIDTH, HEIGHT);
glutCreateWindow("");
glutDisplayFunc(display); // register GLUT callback functions
glutKeyboardFunc(keyboard);
glutMouseFunc(mouse);
glutMotionFunc(motion);
glewInit();
glClearColor(0.0, 0.0, 0.0, 1.0);
glDisable(GL_DEPTH_TEST);
glViewport(0, 0, WIDTH, HEIGHT);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
gluPerspective(10000.0, (GLfloat)WIDTH / (GLfloat)HEIGHT, 0.1, 10.0);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
return;
}
// This method is called everytime the screen is redisplayed. No
// optimization is performed as it recalculates the kernel every time.
void display()
{
anim += 0.01f;
// map OpenGL buffer object for writing from OpenCL
glFinish();
clEnqueueAcquireGLObjects(queue, 1, &p_vbocl, 0,0,0);
clEnqueueAcquireGLObjects(queue, 1, &c_vbocl, 0,0,0);
// Set arg 3 and queue the kernel
clSetKernelArg(kernel, 3, sizeof(float), &anim);
clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global, NULL, 0, 0, 0);
// queue unmap buffer object
clEnqueueReleaseGLObjects(queue, 1, &c_vbocl, 0,0,0);
clEnqueueReleaseGLObjects(queue, 1, &p_vbocl, 0,0,0);
clFinish(queue);
// clear graphics
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
// Apply the image transforms
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glTranslatef(0.0, 0.0, translate_z);
glRotatef(rotate_x, 1.0, 0.0, 0.0);
glRotatef(rotate_y, 0.0, 1.0, 0.0);
//render from the p_vbo
glBindBuffer(GL_ARRAY_BUFFER, p_vbo);
glVertexPointer(4, GL_FLOAT, 0, 0);
glEnableClientState(GL_VERTEX_ARRAY);
// enable colors from the c_vbo
glBindBuffer(GL_ARRAY_BUFFER, c_vbo);
glColorPointer(4, GL_UNSIGNED_BYTE, 0, 0);
glEnableClientState(GL_COLOR_ARRAY);
// draw points, lines or triangles according to the user keyboard input
switch(drawMode) {
case GL_LINE_STRIP:
for(int i=0 ; i < mesh_width*mesh_height; i+= mesh_width)
glDrawArrays(GL_LINE_STRIP, i, mesh_width);
break;
case GL_TRIANGLE_FAN:
glPrimitiveRestartIndexNV(RestartIndex);
glEnableClientState(GL_PRIMITIVE_RESTART_NV);
glDrawElements(GL_TRIANGLE_FAN, qIndices_size, GL_UNSIGNED_INT, qIndices);
break;
default:
glDrawArrays(GL_POINTS, 0, mesh_width * mesh_height);
break;
}
// handle housekeeping and redisplay
glDisableClientState(GL_COLOR_ARRAY);
glDisableClientState(GL_VERTEX_ARRAY);
glutSwapBuffers();
glutPostRedisplay();
}
// Keyboard events handler for GLUT
void keyboard(unsigned char key, int x, int y)
{
switch(key) {
case('q') :
case(27) :
exit(0);
break;
case 'd':
case 'D':
switch(drawMode) {
case GL_POINTS: drawMode = GL_LINE_STRIP; drawStr = "line"; break;
case GL_LINE_STRIP: drawMode = GL_TRIANGLE_FAN; drawStr = "fan"; break;
default: drawMode=GL_POINTS; drawStr = "points"; break;
}
}
setTitle();
glutPostRedisplay();
}
// Mouse event handler for GLUT
void mouse(int button, int state, int x, int y)
{
if (state == GLUT_DOWN) {
mouse_buttons |= 1<<button;
} else if (state == GLUT_UP) {
mouse_buttons = 0;
}
mouse_old_x = x;
mouse_old_y = y;
glutPostRedisplay();
}
// Motion event handler for GLUT
void motion(int x, int y)
{
float dx, dy;
dx = x - mouse_old_x;
dy = y - mouse_old_y;
if (mouse_buttons & 1) {
rotate_x += dy * 0.2;
rotate_y += dx * 0.2;
} else if (mouse_buttons & 4) {
translate_z += dy * 0.01;
}
mouse_old_x = x;
mouse_old_y = y;
}
Following is the complete source code for sinewave.cl. Note that it operates on __global vectors, which also happen to be OpenGL buffers.
__kernel void sinewave(__global float4* pos, unsigned int width,
unsigned int height, float time, __global uchar4* color)
{
unsigned int x = get_global_id(0);
unsigned int y = get_global_id(1);
// calculate uv coordinates
float u = x / (float) width;
float v = y / (float) height;
u = u*2.0f - 1.0f;
v = v*2.0f - 1.0f;
// calculate simple sine wave pattern
float freq = 4.0f;
float w = sin(u*freq + time) * cos(v*freq + time) * 0.5f;
// write output vertex
pos[y*width+x] = (float4)(u, w, v, 1.0f);
color[y*width+x] = (uchar4) (
(uchar) 255.f *0.5f*(1.f+sin(w+x)),
(uchar) 255.f *0.5f*(1.f+sin(x)*cos(y)),
(uchar) 255.f *0.5f*(1.f+sin(w+time/10.f)), 0 );
}
Summary
GPGPU devices are powerful visualization as well as computational devices. Utilizing the OpenCL graphical interoperability capabilities is a fantastic way to speed visualization applications and utilize the full capabilities of these devices. Comparing the speed of visualization on the CPU vs. the GPU with the example code from this article is one way to see the difference.
Primitive restart is an excellent way to further speed visualization on GPU devices by keeping both command tokens and data on the GPU to avoid PCIe bus transfers. Further, it gives the developer tremendous flexibility in handling irregular meshes and other challenging visualization tasks. In addition to speed, primitive restart can also generate higher quality graphics.