*As seen in **GPU Science** and **CUDA Week in Review*

These posts are meant to inspire you to enter into the world of graphics processor programming.

__Posts in This Series__

__Full Source Code__

http://cudafytsp.codeplex.com/

__Introduction__

It may come as a surprise to many that we only use 10% of our brain capacity, regardless of what Snopes says. It is also true that we use only 10% of the computing power of our PCs. This is because our programs are not written to utilize our computer’s “Cudabral Cortex” (commonly called the GPGPU). It is time for us as developers to unleash 100% of our computer’s potential by tapping the GPGPU. By doing this, we can in some cases increase the performance of our programs by a factor of 10 or even 100.

Yet your hesitation to consider using the GPGPU may come from an innate fear of the learning curve involved. Yet the learning curve has recently been straightened out by a wonderful library called Cudafy. Cudafy allows methods written in C# to be executed on the GPGPU. This series of posts will meander down the path of creating a massively parallel program and explain some of the details along the way. After reading these posts, you will see the world as a massively parallel phenomena just begging to be modeled.

__The Problem__

Before diving into GPGPU programming, I want to first present the challenge we will be solving. It is called the Traveling Salesman Problem. Our program will be given the coordinates of several cities. Our task is to find the shortest path that connects all the cities. To keep the problem simple we will assume a flat earth, that 1 degree of latitude equals 1 degree of longitude, and that we will not constrain which city we start in or end up at. To further simplify the task, we will use a brute force method. That is, we will test the length of every permutation of cities to find the shortest one. So, if we are given 5 cities, we will be computing the length of 120 (5!) different paths; if we are given 10 cities, we will be testing 3628800 (10!) paths.

__The Method__

We will review code that solves this problem in three ways:

- A single-threaded application on the CPU,
- A multi-threaded application on a multi-core CPU, and
- A massively parallel application on the GPGPU.

As you examine the CPU-based solutions, you will no doubt notice that parameters are passed to functions that seem unneeded and that the algorithm could be optimized in many ways. These may be legitimate concerns on your part, especially if you are itching to challenge my inevitable benchmark comparisons or to delve into the world of heuristic algorithms. Yet, my goal is not to compare a performance-crafted CPU solution with a performance-crafted GPGPU solution. Instead, my goal is to show the same problem being solved in exactly the same way across all three solutions computing platforms. Where you take it from here is your business.

__PathFromRoutePermutation__

Here is a function that will be used in all three solutions. It calculates the sequence for given a permutation number. This function will be called for each permutation to retrieve the order in which the cities could be visited by our salesman.

{

for (var i = 0; i < cities; i++)

{

paths[pathIndex, i] = i;

}

for (int remaining = cities, divisor = permutations; remaining > 0; remaining--) // http://www.daniweb.com/software-development/cpp/code/274075/all-permutations-non-recursive

{

divisor /= remaining;

var index = (permutation / divisor) % remaining;

var swap = paths[pathIndex, index];

paths[pathIndex, index] = paths[pathIndex, remaining - 1];

paths[pathIndex, remaining - 1] = swap;

}

return 0;

}

This function takes a permutation number and fills in the paths array with the sequence of city indexes that make up the path. For example, for 5 cities, permutation 0, yields the path (1,2,3,4,0). Permutation 40 yields the path (4,0,3,2,1):

Ignore the fact that this function returns a 0 and that the parameter “paths” is provided as a two-dimensional array (instead of a single dimensional array).

__FindPathDistance__

Here is another function that will be used in all three solutions. It calculates the distance for given a permutation number. This function will be called for each permutation to retrieve the length that could be traveled by our salesman.

{

PathFromRoutePermutation(permutations, permutation, cities, paths, pathIndex);

float distance = 0;

var city = paths[pathIndex, 0];

var previousLatitude = latitudes[city];

var previousLongitude = longitudes[city];

for (var i = 1; i < cities; i++)

{

city = paths[pathIndex, i];

var latitude = latitudes[city];

var longitude = longitudes[city];

distance += (float)Math.Sqrt(Square(latitude - previousLatitude) * Square(longitude - previousLongitude));

previousLatitude = latitude;

previousLongitude = longitude;

}

return distance;

}

This function takes a permutation number, and then armed with the order in which the cities could be visited from PathFromRoutePermutation, computes (and returns) the total distance of the given permutation.

__The Single-Threaded CPU Solution__

Finally, here is the code that iterates through all the permutations and finds the one with the shortest distance as a single threaded CPU solution.

{

var bestDistance = float.MaxValue;

var bestPermutation = -1;

var stopWatch = Stopwatch.StartNew();

for (var permutation = 0; permutation < _permutations; permutation++)

{

var path = new int[1, Cities];

var distance = FindPathDistance(_permutations, permutation, Cities, _latitudes, _longitudes, path, 0);

if (distance < bestDistance)

{

bestDistance = distance;

bestPermutation = permutation;

}

}

return new Answer { Distance = bestDistance, Permutation = bestPermutation, Milliseconds = stopWatch.ElapsedMilliseconds };

}

__Next Step__

Our next step is to write a method that takes advantage of the code we have already written and runs it in parallel on a multi-core CPU.