Click here to Skip to main content
Click here to Skip to main content

Tagged as

Go to top

Measuring the benefits of data-oriented design

, 29 Jan 2014
Rate this:
Please Sign up or sign in to vote.
Short summary: I created a microbenchmark originally intended to compare efficiency of virtual calls on different processor architectures, which yielded some unexpected by me, yet reproducible, results. The link to source code is at the very bottom.The benchmarkOne day I wanted to measure how much o

Short summary: I created a microbenchmark originally intended to compare efficiency of virtual calls on different processor architectures, which yielded some unexpected by me, yet reproducible, results. The link to source code is at the very bottom.

The Benchmark

One day I wanted to measure how much one can gain by replacing an object-oriented hierarchy (with virtual methods) with a collection of "components", where each component is responsible for a particular piece of object.

For the object-oriented part, I created 8 classes (O1-O8) derived from each other, with a virtual Update() method. Parent's Update() method is also called, so O8::Update() calls O7::Update() and so on. Care has been taken to prevent these calls from being inlined - I moved their implementations to separate compilation units, so they are not inlined unless globally optimizing compiler is used.

For the data-oriented part, unrelated 8 structs (O1Comp-O8Comp) have been created, with a non-virtual Update() method. This method is also prevented from being inlined by moving its implementation to separate unit.  

To make this benchmark less memory performance oriented, Update() methods of object-oriented and data-oriented classes do some work. The work done by OO and DOD code is the same, because it actually gets done in inlined functions of separate Payload classes. Amount of work increases non-linearly, so O8Comp class does 128 times more work than O1Comp one (this seemed more natural for me).

The objects are managed the following way:

  • For OO part, each object is created separately on the heap (i.e. by calling new). Array of pointers to base object is then traversed and Update() (a virtual method) is called on each one.
  • For DOD part, separate linear arrays for each of component are created (number of elements is adjusted to take into account that O2 class has O2Comp and O1Comp and O8 has all 8 components). Arrays are processed one by one, with non-virtual Update() function called on each element of it.
Only loops which update the objects are measured, creation/deletion time is not.

Benchmark has some configurable options, like counting calls to ensure the same work is done, prefetching the virtual table, unrolling the loops, and - most importantly as it turns out - varying the workload.

The Results

With zero workload, gains are spectacular, with DOD code being about 3 times as fast as OO one. However, once I began increasing the amount of work done by Update(), the gains became smaller - and for some cases OO code is actually faster! I cannot explain that - I profiled one of such pathological cases, and it is probably related to the fact that processor may prefer longer (although separated with unconditional calls) code path when calling O8::Update() than doing the same work in separate 8 loops for each of O1Comp...O8Comp

The general rule seems to be - the more work is done in an Update() method, the less you gain from OO to DOD conversion (at least when using gcc). To be more exact, gains seem to become smaller (or even become negative) if

  • a) the virtual table is prefetched before making a call in OO code
  • and
  • b) the 'payload' loops are unrolled.

I ran this on several processors I own, including PowerPC 970 (G5), Cell BE (PS3), Athlon 64, Pentium IV, Atom and Cortex-A8 (Nokia N900), in 8 configurations: light workload, light workload with unrolled loops, light workload with prefetch, light workload with prefetch and unrolled loops and the same 4 configurations for heavy workload. Actual tables of data are too boring to show, so here are the summarized graphs:

This one shows gains of going data-oriented way averaged among all those 8 configurations. As you can see, on average you still are better off if you decide to drop object-orientedness:

This is maximum amount of speed up (in percents) this benchmark gained when doing things data-oriented way:

This is minimum amount of speed up (actually, often a slowdown) when doing things data-oriented way:

How to test it yourself

The resulting microbenchmark can be found in this github.com repository. For Unix machines (including Mac OS X), just do this:

git clone git://github.com/RCL/dodbench.git
cd dodbench
make benchmark
(use gmake if on FreeBSD)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

RCL_SPD

Russian Federation Russian Federation
No Biography provided

Comments and Discussions

 
-- There are no messages in this forum --
| Advertise | Privacy | Mobile
Web02 | 2.8.140905.1 | Last Updated 29 Jan 2014
Article Copyright 2014 by RCL_SPD
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid