Click here to Skip to main content
Click here to Skip to main content

Book Review - Programming Massively Parallel Processors (second edition) by Kirk and Hwu

By , 13 May 2013
 

“Programming Massively Parallel Processors (second edition)” by Kirk and Hwu is a very good second book for those interested in getting started with CUDA. A first must-read is “CUDA by Example: An Introduction to General-Purpose GPU Programming” by Jason Sanders. After reading all of Sanders work, feel free to jump right to chapters 8 and 9 of this Kirk and Hwu publication.

pmpp

In chapter 8, the authors do a nice job of explaining how to write an efficient convolution algorithm that is useful for smoothing and sharpening data sets. Their explanation of how shared memory can play a key role in improving performance is well written. They also handle the issue of “halo” data very well. Benchmark data would have served as a nice conclusion to this chapter.

In chapter 9, the authors provide the best description of the Prefix Sum algorithm I have seen to date. It describes the problem being solved in terms that I can easily relate to - food. They write, “We can illustrate the applications of inclusive scan operations using an example of cutting sausage for a group of people.” They first describe a simple algorithm, then a “work-efficient” algorithm, and then an extension for larger data sets. What puzzles me here is that the authors seem fixated on solving the problem with the least number of total operations (across all threads) as opposed to the least number of operations per thread. They do not mention that the “work-efficient” algorithm requires almost twice as many more operations for the longest-path thread than the simple algorithm. Actual performance benchmarks showing a net throughput gain would be required for a skeptical reader.

Now before moving forward, let's back up a bit. Even though we have already read CUDA by Example, it is worth reading chapter 6… at least the portion regarding the reduction algorithm starting at the top of page 128. The discussion is rather well written and insightful. Now, onward.

In chapter 13, the authors list the tree-fold goals of parallel computing: solve a given problem in less time, solve bigger problems in the same amount of time, and achieve better solutions for a given problem in a given amount of time. These all make sense, but have not been the reasons I have witnessed for the transition to parallel computing. I believe the biggest motivation for utilizing CUDA is to solve problems that would otherwise be unsolvable. For example, the rate of data generated by many scientific instruments could simply not be processed without a massively parallel computing solution. In other words, CUDA makes things possible.

Also in Chapter 13, they bring up a very important point. Solving problems with thousands of threads requires that software developers think differently. To think of the resources of a GPU as a means by which you can make a parallel-for-loop run faster completely misses the point – and the opportunity the GPU provides. These three chapters then make the book worthwhile.

The chapters on OpenCL, OpenACC, and AMP seem a bit misplaced in a book like this. The author’s coverage of these topics is a bit too superficial to make them useful for serious developers. On page 402, they list the various data types that AMP supports. It would have made sense for the authors to point out that AMP does not support byte and short. When processing large data sets of these types, AMP introduces serious performance penalties. Also, the author’s coverage of OpenCL fails to mention that it does not support overlapped kernel and data transfer operations, which can also have serious performance implications.

This then brings me to my biggest concern about this book. There is very little attention paid to the technique of overlapping data transfer operations and with kernel execution. I did happen upon a discussion of streaming in chapter 19, “Programming a Heterogeneous Computing Cluster.” However, the context of the material is with respect to MPI, and those not interested in MPI might easily miss it. Because overlapping I/O with kernel operations can easily double throughput, I believe this topic deserves at least one full chapter. Perhaps in the next edition, we can insert it between chapters 8 and 9? Oh, and let’s add “Overlapped I/O”, “Concurrent” and “Streams” as first class citizens in the index. While we are editing the index, let’s just drop the entry for “Apple’s iPhone Interfaces”. Seriously.

In summary, I believe this is a very helpful book and well written. I would consider it a good resource for CUDA developers. It is not, however, a must-have CUDA resource.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

John Michael Hauck
Software Developer (Senior) LECO Corporation
United States United States
Member
John Hauck has been developing software professionally since 1981, and focused on Windows-based development since 1988. For the past 17 years John has been working at LECO, a scientific laboratory instrument company, where he manages software development. John also served as the manager of software development at Zenith Data Systems, as the Vice President of software development at TechSmith, as the lead medical records developer at Instrument Makar, as the MSU student who developed the time and attendance system for Dart container, and as the high school kid who wrote the manufacturing control system at Wohlert. John loves the Lord, his wife, their three kids, and sailing on Lake Michigan.

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
Hint: For improved responsiveness ensure Javascript is enabled and choose 'Normal' from the Layout dropdown and hit 'Update'.
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
-- There are no messages in this forum --
Permalink | Advertise | Privacy | Mobile
Web02 | 2.6.130513.1 | Last Updated 13 May 2013
Article Copyright 2013 by John Michael Hauck
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid