How to Debug Your Linux C/C++ Application Faster with Time Travel Debugging

13 Nov 2020, CPOL, 4 min read
This walkthrough illustrates how time travel debugging works.

This article is in the Product Showcase section for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.

Do you feel you’re spending far too much time debugging?

We seem to insist on debugging using logging, printf, core dump analysis, or gdb; but these traditional techniques rely heavily on guesswork and joining the dots in our head. The whole process of debugging remains tedious and inefficient: we have to rerun the program repeatedly, set new breakpoints, slowly step forwards through the code - and we’re stuck in that loop until we get the piece of information we need.

What if instead of stepping forward through code... we could simply travel backwards to root cause the problem?

Thankfully, time travel debugging offers a workflow quite different from traditional debugging - one that lets us fix software bugs much faster and trace a bug back to its root cause by observing what the program actually did, rather than by guessing.

This sample application (cache calculate) crashes with a vague error message.

Image 1

Let’s open the application with UDB to analyze its execution and diagnose this failure. UDB is Undo’s extended version of GDB which supports time travel debugging.

Let’s begin with the start command, which sets a temporary breakpoint at main() and runs to that point.

Image 2

This is fairly standard GDB usage, so let’s continue now to get to the point of failure.

Image 3

As with normal GDB, we can get a backtrace to see where in the application it failed (or in this case where the assert() predicate failed).

Image 4

With normal GDB, we could inspect the stack frames at this point; but with UDB, we are able to run the execution of the application in reverse.

First, we can use the reverse-finish command to reverse up the call stack to the assert().

Note that in UDB - as in GDB - pressing Enter on an empty line repeats the previous command, so here it took 4 reverse-finish commands to get back to main().

Image 5

Back in main(), we can inspect the locals to confirm that the erroneous value is in sqroot_cache (which should be 15 as the integer square root of 255).
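The expected value can be checked by hand: 15 × 15 = 225 and 16 × 16 = 256, so 15 is the largest integer whose square does not exceed 255. A minimal C sketch of that arithmetic follows. The function name isqrt_u8 is hypothetical and is not taken from the sample application, whose source is not shown in this article.

```c
/* Hypothetical helper, NOT from the sample application: returns the
 * largest r such that r * r <= n, i.e. the integer square root of n. */
unsigned isqrt_u8(unsigned char n)
{
    unsigned r = 0;

    /* Advance r while the next candidate still squares to at most n. */
    while ((r + 1) * (r + 1) <= n) {
        r++;
    }
    return r;
}
```

Calling isqrt_u8(255) returns 15, which is the value main() should have seen in sqroot_cache.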

Image 6

So far, this has been essentially achievable with GDB by looking up the call stack and examining the frames; but with UDB, reverse execution allows us to do more without having to set many breakpoints and re-run the application (potentially many times) to get to the root cause.

We see that line 73 is where the sqroot_cache variable is set, so we can use the reverse-next and reverse-step commands to reverse execute further and reverse-step into the cache_calculate() function.

This has allowed us to get directly to the point in execution where this function returned the incorrect value, which is the 2011th time it was executed in this application. It might have taken considerably more effort to establish and set a suitable breakpoint to get there the traditional way, especially if this was a non-deterministic error case.

Within the function, we can see that the incorrect return value is coming from the cache[] array, which therefore was set with the incorrect value at some point in the past.
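To see why a bad cache entry surfaces so far from where it was written, it may help to picture what a cache lookup of this kind could look like. The sketch below is an assumption made for illustration only: the struct layout, the valid flag, CACHE_SIZE and the cache_lookup name are all hypothetical, since the article does not show the real source. Only the cache[] array name and the idea of a stored number and square root come from the walkthrough.

```c
#include <stddef.h>

#define CACHE_SIZE 8  /* assumed size, not from the original program */

/* Assumed layout: each entry remembers a number and its square root. */
struct cache_entry {
    int valid;             /* 1 once the entry has been populated */
    unsigned char number;  /* the input value this entry describes */
    unsigned char sqroot;  /* the cached result for that input */
};

static struct cache_entry cache[CACHE_SIZE];

/* Returns 1 and stores the cached result in *out on a hit, 0 on a miss.
 * The lookup trusts whatever was stored earlier: if an entry was written
 * with a wrong sqroot, every later hit returns that wrong value. */
int cache_lookup(unsigned char number, unsigned char *out)
{
    for (size_t i = 0; i < CACHE_SIZE; i++) {
        if (cache[i].valid && cache[i].number == number) {
            *out = cache[i].sqroot;
            return 1;
        }
    }
    return 0;
}
```

In a design like this, the function that returns the wrong answer is not the one at fault, which is why the next step is to find the earlier write to the array.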

Again, traditional debuggers would require a number of iterations and steps to try to track down how this array was corrupted. With UDB, this is far simpler. We can set a watchpoint on this value in the array, and then use another reverse command, this time reverse-continue, to continue running the application backwards until the watchpoint is hit.

Image 7

As mentioned, here we have set the watchpoint, so let’s do the reverse-continue to run backwards until this array element is set to the value 0.

Image 8

It’s that simple: reverse-continue has run backwards and got us to the point where the incorrect value was written into the cache[] array.

Now we can look at the local variables, and can see from the breakpoint that the parameter passed to the cache_calculate() function (the number variable) was 0.

Image 9

Finally, we have enough to root cause this failure. The sqroot2 variable is written to at line 45, where it is set to sqrt(number2), which is the square root of minus 1 (which has no real value and so cannot be represented as an integer).

This application failure ultimately happened because the range of values that the for loop iterates over to populate the cache is not checked. This led to the cache_calculate() function trying to populate the cache[] array with the square root of minus one.
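One way to add the missing range check is sketched below. It assumes, purely for illustration, that the loop caches the values from number - 1 to number + 1 and that valid inputs are 0 to 255. The article does not show the real loop bounds, and the names values_to_cache, NUMBER_MIN and NUMBER_MAX are hypothetical.

```c
#include <stddef.h>

/* Assumed valid input range: values that fit in an 8-bit unsigned char. */
#define NUMBER_MIN 0
#define NUMBER_MAX 255

/* Collects the values in [number - 1, number + 1] that are safe to cache,
 * skipping any that fall outside [NUMBER_MIN, NUMBER_MAX].
 * Returns how many values were written to out (at most 3). */
size_t values_to_cache(int number, int out[3])
{
    size_t count = 0;

    for (int candidate = number - 1; candidate <= number + 1; candidate++) {
        if (candidate < NUMBER_MIN || candidate > NUMBER_MAX) {
            continue;  /* range check: never compute or cache sqrt(-1) */
        }
        out[count++] = candidate;
    }
    return count;
}
```

With this check in place, a call with number equal to 0 yields only 0 and 1, so the square root of minus one is never computed or stored.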

Note: -1 stored in an unsigned char wraps around, hence the number element is set to 255.
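The wraparound itself is well-defined C behaviour: converting a negative int to an unsigned type reduces it modulo one more than the type's maximum value. A small sketch, assuming the usual 8-bit unsigned char (UCHAR_MAX equal to 255), with a hypothetical helper name to_uchar:

```c
/* Hypothetical helper: converts an int to unsigned char exactly as a
 * plain assignment would. The result is value modulo (UCHAR_MAX + 1),
 * so with an 8-bit char, -1 becomes 255. */
unsigned char to_uchar(int value)
{
    return (unsigned char)value;
}
```

This is why the corrupted entry appears under 255 rather than under a negative index: the out-of-range value silently becomes a valid-looking key.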

Want to try this on your own program? Undo offers a free trial of UDB on their website.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United Kingdom
Chris Croft-White is a Pre-Sales Consultant Engineer. Armed with a Computer Science degree from the University of Cambridge, UK, he acquired experience as a Field Application Engineer, a Security Engineer, and a Sales Engineer at a range of technology firms. He is particularly adept at problem-solving and resolving customer issues by getting to the root cause of pesky bugs quickly.