Click here to Skip to main content
14,456,195 members

Cudafy Me: Part 1 of 4

Rate this:
5.00 (1 vote)
Please Sign up or sign in to vote.
5.00 (1 vote)
12 Oct 2012CPOL
These posts are meant to inspire you to enter into the world of graphics processor programming.

As seen in GPU Science and CUDA Week in Review

These posts are meant to inspire you to enter into the world of graphics processor programming.


Posts in This Series

  • Part 1 – Introduction & Single Threaded CPU Solution
  • Part 2 – Multi-Threaded CPU Solution
  • Part 3 – Massively Parallel GPGPU Solution
  • Part 4 – Discussion

    Full Source Code


    It may come as a surprise to many that we only use 10% of our brain capacity, regardless of what Snopes says. It is also true that we use only 10% of the computing power of our PCs. This is because our programs are not written to utilize our computer’s “Cudabral Cortex” (commonly called the GPGPU). It is time for us as developers to unleash 100% of our computer’s potential by tapping the GPGPU. By doing this, we can in some cases increase the performance of our programs by a factor of 10 or even 100.

    Yet your hesitation to consider using the GPGPU may come from an innate fear of the learning curve involved. Yet the learning curve has recently been straightened out by a wonderful library called Cudafy. Cudafy allows methods written in C# to be executed on the GPGPU. This series of posts will meander down the path of creating a massively parallel program and explain some of the details along the way. After reading these posts, you will see the world as a massively parallel phenomena just begging to be modeled.

    The Problem

    Before diving into GPGPU programming, I want to first present the challenge we will be solving. It is called the Traveling Salesman Problem. Our program will be given the coordinates of several cities. Our task is to find the shortest path that connects all the cities. To keep the problem simple we will assume a flat earth, that 1 degree of latitude equals 1 degree of longitude, and that we will not constrain which city we start in or end up at. To further simplify the task, we will use a brute force method. That is, we will test the length of every permutation of cities to find the shortest one. So, if we are given 5 cities, we will be computing the length of 120 (5!) different paths; if we are given 10 cities, we will be testing 3628800 (10!) paths.

    The Method

    We will review code that solves this problem in three ways:

    1. A single-threaded application on the CPU,
    2. A multi-threaded application on a multi-core CPU, and
    3. A massively parallel application on the GPGPU.

    As you examine the CPU-based solutions, you will no doubt notice that parameters are passed to functions that seem unneeded and that the algorithm could be optimized in many ways. These may be legitimate concerns on your part, especially if you are itching to challenge my inevitable benchmark comparisons or to delve into the world of heuristic algorithms. Yet, my goal is not to compare a performance-crafted CPU solution with a performance-crafted GPGPU solution. Instead, my goal is to show the same problem being solved in exactly the same way across all three solutions computing platforms. Where you take it from here is your business.


    Here is a function that will be used in all three solutions. It calculates the sequence for given a permutation number. This function will be called for each permutation to retrieve the order in which the cities could be visited by our salesman.

    public static float PathFromRoutePermutation(int permutations, int permutation, int cities, int[,] paths, int pathIndex)
        for (var i = 0; i < cities; i++)
            paths[pathIndex, i] = i;

        for (int remaining = cities, divisor = permutations; remaining > 0; remaining--) //
            divisor /= remaining;
            var index = (permutation / divisor) % remaining;

            var swap = paths[pathIndex, index];
            paths[pathIndex, index] = paths[pathIndex, remaining - 1];
            paths[pathIndex, remaining - 1] = swap;

        return 0;

    This function takes a permutation number and fills in the paths array with the sequence of city indexes that make up the path. For example, for 5 cities, permutation 0, yields the path (1,2,3,4,0). Permutation 40 yields the path (4,0,3,2,1):

    Untitled     Untitled

    Ignore the fact that this function returns a 0 and that the parameter “paths” is provided as a two-dimensional array (instead of a single dimensional array).


    Here is another function that will be used in all three solutions. It calculates the distance for given a permutation number. This function will be called for each permutation to retrieve the length that could be traveled by our salesman.

    public static float FindPathDistance(int permutations, int permutation, int cities, float[] latitudes, float[] longitudes, int[,] paths, int pathIndex)
        PathFromRoutePermutation(permutations, permutation, cities, paths, pathIndex);

        float distance = 0;
        var city = paths[pathIndex, 0];
        var previousLatitude = latitudes[city];
        var previousLongitude = longitudes[city];

        for (var i = 1; i < cities; i++)
            city = paths[pathIndex, i];
            var latitude = latitudes[city];
            var longitude = longitudes[city];
            distance += (float)Math.Sqrt(Square(latitude - previousLatitude) * Square(longitude - previousLongitude));
            previousLatitude = latitude;
            previousLongitude = longitude;

        return distance;

    This function takes a permutation number, and then armed with the order in which the cities could be visited from PathFromRoutePermutation, computes (and returns) the total distance of the given permutation.

    The Single-Threaded CPU Solution

    Finally, here is the code that iterates through all the permutations and finds the one with the shortest distance as a single threaded CPU solution.

    private Answer CpuTsp()
        var bestDistance = float.MaxValue;
        var bestPermutation = -1;
        var stopWatch = Stopwatch.StartNew();

        for (var permutation = 0; permutation < _permutations; permutation++)
            var path = new int[1, Cities];
            var distance = FindPathDistance(_permutations, permutation, Cities, _latitudes, _longitudes, path, 0);
            if (distance < bestDistance)
                bestDistance = distance;
                bestPermutation = permutation;

        return new Answer { Distance = bestDistance, Permutation = bestPermutation, Milliseconds = stopWatch.ElapsedMilliseconds };

    Next Step

    Our next step is to write a method that takes advantage of the code we have already written and runs it in parallel on a multi-core CPU.

    >>> Part 2 – Multi-Threaded CPU Solution

  • License

    This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


    About the Author

    John Michael Hauck
    Software Developer (Senior) LECO Corporation
    United States United States
    John Hauck has been developing software professionally since 1981, and focused on Windows-based development since 1988. For the past 17 years John has been working at LECO, a scientific laboratory instrument company, where he manages software development. John also served as the manager of software development at Zenith Data Systems, as the Vice President of software development at TechSmith, as the lead medical records developer at Instrument Makar, as the MSU student who developed the time and attendance system for Dart container, and as the high school kid who wrote the manufacturing control system at Wohlert. John loves the Lord, his wife, their three kids, and sailing on Lake Michigan.

    Comments and Discussions

    -- There are no messages in this forum --
    Technical Blog
    Posted 12 Oct 2012

    Tagged as


    10 bookmarked