
Cudafy Me: Part 4 of 4

12 Oct 2012, CPOL
These posts are meant to inspire you to enter into the world of graphics processor programming.



Posts in this Series

Full Source Code


Computing an 11 city solution with 40 million permutations on a Dell T7500 (Xeon X5690 @ 3.47GHz) with an NVidia Tesla C2050:

  1. Single CPU: 47.340 seconds
  2. Multi threaded: 7.965 seconds
  3. GPGPU: 0.117 seconds
    Note: with 12 cities (480 million permutations), the GPGPU took 2.206 seconds.

Computing an 11 city solution with 40 million permutations on a 2009 HP Pavilion dv7 (i7 CPU Q 720 @ 1.60GHz) with an NVidia GeForce GT 230M (48 CUDA cores):

  1. Single CPU: 162.535 seconds
  2. Multi threaded: 36.240 seconds
  3. GPGPU: 2.181 seconds

Computing a 10 city solution with 3.6 million permutations on a Dell T3400 (Core 2 Duo E6850 @ 3.00GHz) with an NVidia Quadro NVS 290:

  1. Single CPU: 5.707 seconds
  2. Multi threaded: 3.097 seconds
  3. GPGPU: 0.888 seconds
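
As a rough check on the figures above, the permutation counts are the factorials of the city counts, and the speedups follow directly from the timings. The sketch below is only illustrative arithmetic in Python; the names `permutation_count`, `speedup`, and `timings` are hypothetical and do not come from the series' source code.

```python
from math import factorial

# Timings in seconds, copied from the Dell T7500 / Tesla C2050 run above.
timings = {"single_cpu": 47.340, "multi_threaded": 7.965, "gpgpu": 0.117}


def permutation_count(cities: int) -> int:
    """Number of orderings of `cities` distinct cities (n!)."""
    return factorial(cities)


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# 10, 11 and 12 cities give roughly 3.6, 40 and 480 million permutations.
counts = {n: permutation_count(n) for n in (10, 11, 12)}

gpu_vs_single = speedup(timings["single_cpu"], timings["gpgpu"])
gpu_vs_multi = speedup(timings["multi_threaded"], timings["gpgpu"])

if __name__ == "__main__":
    for n, count in counts.items():
        print(f"{n} cities: {count:,} permutations")
    print(f"GPGPU vs single CPU: {gpu_vs_single:.1f}x")
    print(f"GPGPU vs multi threaded: {gpu_vs_multi:.1f}x")
```

On these numbers the Tesla run is a few hundred times faster than the single-threaded CPU run, while the factor on the older laptop and workstation cards is much smaller.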


There are a few things that took me a while to figure out that might be helpful to others.

  • I used VS2010 Pro. I have heard you can use the Express editions if you install both the C# and C++ versions. Microsoft seems to be making VS2010 harder to find, though.
  • You cannot use VS2012 yet, and you will probably never be able to use the Express edition of VS2012, since I doubt Express will support desktop applications.
  • Adding CL.EXE to your path is not hard. Type “environment” into the Windows 7 Start menu box labeled “Search programs and files” and pick “Edit the system environment variables”.
  • If your video card is older, try this:
  • If the GPGPU call takes more than a few seconds, then the video driver resets. I think you can tweak the timeout somewhere.
  • Things go a bit wonky if you call functions in your Cudafied code that return void.
  • It is a bit faster to call the version of launch that takes the function to call as a string.
  • I think calling CopyFromDevice forces a call to Synchronize first.
  • You cannot allocate arrays in Cudafied code, so get used to calling AllocateShared for all threads.
  • Recursion is not supported in Cudafied code.
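
Because recursion is not available, the usual recursive permutation generators cannot run inside a kernel. One common workaround is to compute each permutation directly from its index with a plain loop, using the factorial number system, so that every thread can work out its own tour independently. The Python sketch below only illustrates that idea; it is an assumption about a workable approach, not necessarily what the series' source code does, and `permutation_from_index` is a hypothetical name.

```python
from math import factorial


def permutation_from_index(index: int, n: int) -> list[int]:
    """Return the index-th permutation of 0..n-1 in lexicographic order.

    Uses a loop instead of recursion: each step peels off one
    factorial-base digit of `index` and uses it to pick the next
    element from the items that are still unused.
    """
    if not 0 <= index < factorial(n):
        raise ValueError("index out of range for n items")

    remaining = list(range(n))  # items not yet placed
    result = []
    for position in range(n, 0, -1):
        block = factorial(position - 1)  # permutations per leading choice
        digit, index = divmod(index, block)
        result.append(remaining.pop(digit))
    return result


if __name__ == "__main__":
    # On a GPU, each thread would take one index; here we just loop.
    for i in range(factorial(3)):
        print(i, permutation_from_index(i, 3))
```

In Cudafied code the `remaining` and `result` lists would have to be fixed-size buffers rather than dynamically allocated arrays, in line with the allocation restriction noted above.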

Can We Go Faster? Oh Yeah!

The GPGPU algorithm described in this series is an introductory primer meant to inspire you to enter into the world of graphics processor programming. If you are looking for phenomenally fast solutions to the traveling salesman problem, consider starting with the link from GPU Science below. The author, Dr. Kamil Rocki of the University of Tokyo, taunts us: "Anyone faster out there?"

Essential Resources


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

John Michael Hauck
Software Developer (Senior) LECO Corporation
United States
John Hauck has been developing software professionally since 1981, and has focused on Windows-based development since 1988. For the past 17 years, John has been working at LECO, a scientific laboratory instrument company, where he manages software development. John also served as the manager of software development at Zenith Data Systems, as the Vice President of software development at TechSmith, as the lead medical records developer at Instrument Makar, as the MSU student who developed the time and attendance system for Dart Container, and as the high school kid who wrote the manufacturing control system at Wohlert. John loves the Lord, his wife, their three kids, and sailing on Lake Michigan.

