Systematic Debugging

Tiha Juhasz

Feb 22, 2017

MIT

How to apply a systematic approach to debugging

Introduction

Most novice programmers know the confusion and frustration of staring at a huge legacy framework, or even a simple coding-course task, trying to find an elusive bug. A senior colleague comes to the rescue and solves it in minutes, but when asked how exactly they did it, they just shrug and cannot offer a clear strategy for the novice to follow next time. Programming in general is a highly systematic activity with coding standards, design patterns and style guides. Debugging should therefore not be a magic skill of the experienced or a sequence of trial-and-error attempts by the novice, but an activity based on clearly defined strategies, best practices and worked examples. This article is an attempt in that direction.

To enjoy this article, the reader should be familiar with the basics of programming and should have felt the confusion and frustration of facing real-life bugs. Experienced programmers may benefit by becoming more conscious and more efficient debuggers. Code snippets are given in C and C++ but are written to be easy to follow for programmers of other languages too. The code examples are not simplified for presentation purposes, because real-life bugs are hidden in complex code, and that complexity contributes to the difficulty of fixing them.

For more details, see part two, where I apply this approach to debugging with the call stack.

What is a bug?

When a program is running, that is, executing a sequence of instructions, at any given moment it holds in memory a number of objects and variables with certain values, along with the chain of functions that have called each other (the call stack). We call this the program state: the actual values and content the program holds at a given execution step. Each instruction receives a program state and, once executed, produces a new program state for the next instruction. Instructions expect a certain program state in order to work properly. In Listing 1, the instruction in line 3 expects both operands to hold valid numerical values.

Listing 1

int a = 1;
string b = "2";   // b holds text, not a number
int c = a + b;    // expects both operands to be numeric

Because 'b' is a string, strange things will happen. Depending on the language, the compiler may reject the line, or the program may crash or silently convert 'a' or 'b' to some other data type in order to execute the instruction. Probably not what you wanted. Since 'b' holds a string, the instruction in line 3 encounters an unexpected state. There is a discrepancy between the expected state and the actual (unexpected) state of line 3, and a bug occurs. Most bugs are caused by such a conflict of state. Syntax errors are not covered by this definition: they are errors in the source text, not bugs in a running program.

Note: In the comments please submit types of bugs or actual examples of bugs which are not covered by this definition because they are not caused by a conflict of state.

We have a bug!

There are several ways to find out about a bug. You can receive a bug report, find that your application has crashed unexpectedly, or just see some funny things on the screen. Your task is always the same: find the bug, understand it and correct it. Program logs (log files, database logs, console output) are the best first source of information about the bug. Many programs report the line of code where something unbearable happened to them. If you have no logs and no traces of where the bug occurred, insert code that creates them, primarily by printing data values and messages about which instructions were executed to the console. The information you receive is a manifestation of the bug, not the bug itself. Remember, the bug is a conflict of state somewhere in the program. First you need to gather useful information about it.

Locate and understand the bug!

The first two steps, finding and understanding the bug, rely on each other and are best done together. The programmer must find the part of the code where the expected state conflicts with the actual state received from the earlier program flow. If they can identify:

the location in the code where the conflict of state occurred;
and what the expected state and the unexpected state were;

then the problem is identified, well described and can be dealt with easily in the further steps.

Listing 2 contains a real-life bug:

Listing 2

std::vector<unsigned char> bytes = {0x1a, 0x50, 0x60, 0x70};
std::cout << "Byte vector contains (hex values): ";
for (std::vector<unsigned char>::iterator it = bytes.begin(); it != bytes.end(); ++it)
        std::cout << std::setw(2) << std::setfill('0') << std::hex << *it << " ";

Listing 3

Program output:

Byte vector contains (hex values): 0 0P 0` 0p

The code snippet defines a vector of bytes (unsigned char) and initializes it with constant values given in hexadecimal format. The instructions in lines 3 and 4 are supposed to print the values stored in this vector to the console output (std::cout) in hex format (std::hex). There is an issue, however: the program prints characters instead of hexadecimal byte values. What could be wrong?

First of all, the characters printed instead of numeric hex data are not the bug itself, just its manifestation. The screen is not a program instruction, so it cannot have a conflict of state. Only instructions have a conflict of state. We need to dig a little deeper to locate the instruction where the conflict occurred, so we trace back the code leading up to the characters shown on the console. The vector of unsigned chars is defined properly, the for loop with the vector iterator is textbook, the standard output stream std::cout is fine, setw(2) sets the field width for the next output operation, and setfill('0') sets the character used to pad the field. After systematically verifying the instructions, backtracking from the place where the issue showed up, we get to std::hex. According to the C++ documentation on std::hex:

std::hex sets the basefield format flag for the str stream to hex.

When basefield is set to hex, integer values inserted into the stream are expressed in hexadecimal base (i.e., radix 16).

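So when std::hex sets the basefield flag to hex, integer values received by the stream are printed in hex format. Does the stream receive integer values? Well... no! We have a vector of unsigned chars. Although unsigned char is formally an integer type in C++, the stream's operator<< has a dedicated overload for character types and prints them as characters, so the basefield flag has no effect on them. There it is: the conflict of state in the stream output instruction.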

Dig the code!

After clearly identifying the location and specifics of the conflict of state, the programmer has two options. The first is to change the code where the bug occurred, either because the unexpected state is actually desirable or because the code leading to it cannot be changed (a workaround). In this case, the instruction where the bug occurred is inconsistent with the program flow and needs to be rewritten.

This is not the case, however, in Listing 2. Here the second option applies: the programmer needs to dig through the code to find out which part of the program flow leading to the instruction caused the unexpected state. Once the conflict of state has been clearly identified and located, this third step is generally easier to perform.

In Listing 2, the stream output receives unsigned chars and treats them as characters instead of integers, so no hex conversion is performed. Feeding characters to the stream output instead of integers is therefore the cause of the conflict of state. The characters need to be converted to integers, because that is the expected state of the stream output. Thus the systematic analysis of the bug leads to the simplest possible solution, which deals with the problem and nothing else. Line 4 of Listing 2 is replaced with the following code:

Listing 4

std::cout << std::setw(2) << std::setfill('0') << std::hex << (int)*it << " ";

The (int) cast forces a conversion to integer, so the stream applies the hex formatting and the bug is fixed.

Be systematic!

Final thoughts on the process of debugging: proceed step by step; check and eliminate possible causes until you find the conflict of state. Find the best documentation and read it, so that you can make informed decisions about where the bug could hide. Never guess; think logically and proceed gradually. Thinking is an energy-hungry process, but it is still the most efficient way of fixing the bug and getting on with life. The next article shows how to apply the theoretical approach outlined here (bug = conflict of state) to debugging with the call stack.

What's next?

In part 2, I will give more practical details on how to use this approach when debugging with the call stack.