Table-driven Approach

Peter Kankowski

Rate me:

4.67/5 (22 votes)

29 Sep 2009Zlib5 min read

74.6K

How to make your code shorter and easier to maintain by using arrays

Introduction

You can often avoid repeating code by placing distinct parts of it into an array:

C++

// A text adventure game
if(strcmpi(command, "north") == 0) {
    if(cur_location->north)
        GoToLocation(cur_location->north);
    else
        Print("Cannot go there");
}
else if(strcmpi(command, "east") == 0) {
    if(cur_location->east)
        GoToLocation(cur_location->east);
    else
        Print("Cannot go there");
}
else if(strcmpi(command, "south") == 0) {
    if(cur_location->south)
        GoToLocation(cur_location->south);
    else
        Print("Cannot go there");
}
else if(strcmpi(command, "west") == 0) {
    if(cur_location->west)
        GoToLocation(cur_location->west);
    else
        Print("Cannot go there");
}

A shorter equivalent:

C++

enum SIDE {SIDE_NORTH = 0, SIDE_EAST, SIDE_SOUTH, SIDE_WEST};
struct COMMAND {
   const char * name;
   SIDE side;
};
static const COMMAND commands[] = {
   {"north", SIDE_NORTH},
   {"east", SIDE_EAST},
   {"south", SIDE_SOUTH},
   {"west", SIDE_WEST},
};
for(int i = 0; i < NUM_OF(commands); i++)
    if(strcmpi(commands[i].name, command) == 0) {
        SIDE d = commands[i].side;
        if(cur_location->sides[d])
            GoToLocation(cur_location->sides[d]);
        else
            Print("Cannot go there");
    }

The second version is not only smaller, but also more abstract and hence easier to maintain. To implement short command names, you have to modify the code in just one place. If more commands will be added, you can easily replace the array with a hash table.

Such arrays are indispensable in many programs: from compilers and disassemblers to tax calculators and web applications. This article highlights some examples of their usage.

Price Calculation

In Refactoring, Martin Fowler discusses the following code:

C++

// calculating the price for renting a movie
double result = 0;
switch(movieType) {
   case Movie.REGULAR:
     result += 2;
     if(daysRented > 2)
        result += (daysRented - 2) * 1.5;
     break;
 
   case Movie.NEW_RELEASE:
     result += daysRented * 3;
     break;
 
   case Movie.CHILDRENS:
     result += 1.5;
     if(daysRented > 3)
        result += (daysRented - 3) * 1.5;
     break;
}

In order to improve it, he replaces the switch statement with inheritance (one class for calculating the regular price, one for new releases, and another one for children's movies).

A simpler solution uses arrays:

C++

enum MovieType {Regular = 0, NewRelease = 1, Childrens = 2};
 
                             // Regular   NewRelease   Childrens
const double initialCharge[] = {2,             0,        1.5};
const double initialDays[] =   {2,             0,          3};
const double multiplier[] =    {1.5,           3,        1.5};
 
double price = initialCharge[movie_type];
if(daysRented > initialDays[movie_type])
    price += (daysRented - initialDays[movie_type]) * multiplier[movie_type];

It's evident that the same formula is used in all three cases, so you don't have to make a complicated class hierarchy for them. The table-driven code is faster, much shorter, and easier to maintain.

Generally, when you think in terms of the only paradigm (in this case, OOP), you will often miss a better solution. By learning more different approaches (including table-driven approach), you can extend your mental toolbox and solve more programming problems with less lines of code.

Shipping Rates and Progressive Taxation

Another situation when the tables are useful is a simple or a piecewise linear function.

Imagine you are developing a web shop script. The total price shown to a user should include a shipping rate, which depends on the weight of items and the shipping distance. For example, the rates of Russian postal service are:

Distance	Price for each 0.5 kg
less than 601 km	27.60 rubles
601..2000 km	35.00 rubles
2001..5000 km	39.60 rubles
5001..8000 km	46.35 rubles
more than 8000 km	50.75 rubles

The dependence between distance and price cannot be represented with a formula, so the program has to search in an array to find the applicable rate:

C++

def shipping_fee(weight, distance, value):
# weight in grams, distance in kilometers, declared value in rubles
    rates = [
        # max distance, price
        ( 600, 27.60),
        (2000, 35.00),
        (5000, 39.60),
        (8000, 46.35)
    ]
    # Find the applicable rate for this distance
    for r in rates:
        if distance <= r[0]:
            rate = r[1]
            break
    else:
        rate = 50.75
    
    # For each 500 grams of the parcel
    fee = rate * math.ceil(weight / 500)
    
    # For each ruble of the declared value
    fee += 0.03 * math.ceil(value)
    
    # Apply VAT (18%)
    return round(fee * 1.18, 2)

The array is usually small, so linear scanning works faster than binary search.

For example, sending The practice of programming from Moscow to Khabarovsk (0.3 kg, more than 8000 km, 214 rubles) should cost 67.46 rubles (around $2.20).

Progressive tax calculation is a similar, but slightly more complex example:

C++

def get_tax(income):
    if income < 0:
        return income
    MAX_INCOME = 0; RATE = 1 # array indexes
    tax_brackets = [
        (  8350,  0.10),
        ( 33950,  0.15),
        ( 82250,  0.25),
        (171550,  0.28),
        (372950,  0.33)
    ]
    tax = 0
    prev_bracket = 0
    for bracket in tax_brackets:
        tax += ( min(bracket[MAX_INCOME], income) - prev_bracket ) * bracket[RATE]
        if income <= bracket[0]:
            break
        prev_bracket = bracket[MAX_INCOME]
    else:
        tax += (income - prev_bracket) * 0.35
    return tax

Disclaimer: These functions are only meant to illustrate the table-driven technique; the author doesn't claim they calculate the shipping rate in Russia and the income tax in USA correctly. For the latter, please use the calculator from IRS site, which takes into account various deductions.

Assemblers, Disassemblers, and Code Generators

Writing an assembler or disassembler is a good exercise in table-driven design. Typically, you have arrays for one-byte and two-byte instructions. Mnemonic and operand types of each instruction are stored in these arrays. Gary Shute provides a small example of RISC assembler that uses this technique.

x86 code is harder to assemble because of its irregular and complicated structure. The best assemblers/disassemblers use additional tables to deal with opcode extensions (see, for example, CADT by Ms-Rem). The Table Driven Assembler by Jan Nikitenko shows that it's possible to write a generalized assembler, which can be retargeted for any ISA.

Character Properties

The ctype.h header file in C includes macros for character classification (isdigit, isupper, isalnum, etc.) Their comparison-based implementation is clumsy and ineffective:

C++

int isalnum(int ch) {
    return 'a' <= ch && ch <= 'z' ||
           'A' <= ch && ch <= 'Z' ||
           '0' <= ch && ch <= '9';
}

If there are several character properties, you can assign a bit to each of them, and then make a 256-element array storing the properties for each character. Most C run-time libraries use this method for ctype.h macros:

C++

static const unsigned char properties[] = {
      0,  0,  0,  0,  0,  0,  0,  0,  0, 16, 16, 16, 16, 16,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
     16,160,160,160,160,160,160,160,160,160,160,160,160,160,160,160,
    204,204,204,204,204,204,204,204,204,204,160,160,160,160,160,160,
    160,202,202,202,202,202,202,138,138,138,138,138,138,138,138,138,
    138,138,138,138,138,138,138,138,138,138,138,160,160,160,160,160,
    160,201,201,201,201,201,201,137,137,137,137,137,137,137,137,137,
    137,137,137,137,137,137,137,137,137,137,137,160,160,160,160,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
};
#define islower(ch)  (properties[ch] & 0x01)
#define isupper(ch)  (properties[ch] & 0x02)
#define isdigit(ch)  (properties[ch] & 0x04)
#define isalnum(ch)  (properties[ch] & 0x08)
#define isspace(ch)  (properties[ch] & 0x10)
#define ispunct(ch)  (properties[ch] & 0x20)
#define isxdigit(ch) (properties[ch] & 0x40)
#define isgraph(ch)  (properties[ch] & 0x80)

If you have only one property to store, use a bit array, which is smaller (256 bits = 32 bytes), but needs extra operations to retrieve the value:

C++

inline int isalnum(int ch) {
    static const unsigned int alnum[] = {
        0x0, 0x3ff0000, 0x7fffffe, 0x7fffffe, 0x0, 0x0, 0x0, 0x0,
    };
    return (alnum[ch >> 5]) & (1 << (ch & 31));
}

The alnum array has ones in the positions corresponding to alphanumeric characters, and zeros in other positions.

The range of matching characters is often limited. If you have a range of 32 or less characters, you can store the bits in an integer constant:

C++

// True for a, e, i, o, u, and y
inline int isvowel(int ch) {
    int x = ch - 'a';
    return (0x1104111 >> x) & ((x & 0xffffffe0) == 0);
}

When x is negative or greater than 31, the result of shift by x is undefined in C language. An x86 processor uses the lower 5 bits of x; some other processors always return 0 for the big shift amounts. For portability, we have to do an extra check: (x & 0xffffffe0) == 0. Both MSVC++ and MinGW generate a branchless x86 code for this condition (using SBB or SETE instruction). The method is not as fast as the table-driven approach, but the code is smaller.

The arrays for these methods can be generated with a script. For Unicode character properties, use multi-stage tables. If the range of characters is continuous, a comparison should be more effective than a table lookup.

Decision Tables

A complex condition can often be replaced with a short decision table. Here is an example. A program counts the number of digraphs to determine the encoding and language of document (e.g., Windows Latin-1, German language). All characters are divided into 3 classes:

letters of the language (e.g., a to z, ä, ö, ü, and ß for German)
letters not used by the language (e.g., è or ô for German);
separators (space, period, comma, etc.)

If two adjacent separators are found, they should be ignored, because somebody may use spaces for indentation. A non-used character should not appear near a letter, so the expected frequency of this digraph is zero (e.g., if we found êt digraph, the text is probably in French, not German). A word consisting of non-used characters should be ignored: this will improve statistics for multilingual texts (e.g., English trademarks mentioned in an Arabic text).

A series of if statements for these rules would be long and hard to read (try to write it!). Such conditions are better to be represented with a decision table. Here is a fragment of code that builds the expected_frequencies matrix:

C++

enum LETTER_CLASS { ST_NONUSED = 0, ST_LETTER, ST_SEPARATOR };
enum DECISION { DT_SETDONTCARE, DT_SETZERO, DT_LEAVEASIS };
const DECISION decision_table[3][3] = {
      // Non-used        Letter        Separator
    {DT_SETDONTCARE, DT_SETZERO,   DT_SETDONTCARE},  // Non-used
    {DT_SETZERO,     DT_LEAVEASIS, DT_LEAVEASIS},    // Letter
    {DT_SETDONTCARE, DT_LEAVEASIS, DT_SETDONTCARE}}; // Separator

// Read a sample text and fill the expected_frequencies matrix with digraph frequencies
...

// Correct the expected_frequency value depending on the letter class
switch (decision_table[letter_class1][letter_class2]) {
    case DT_SETDONTCARE:
        expected_frequencies[letter1][letter2] = DONTCARE;
        break;

    case DT_SETZERO:
        if (expected_frequencies[letter1][letter2] != 0)
            OutputMessage("Illegal letter combination %c%c\n",
                letter1, letter2);
        expected_frequencies[letter1][letter2] = 0;
        break;

    case DT_LEAVEASIS:
        // No action, use previously calculated expected_frequencies value
        break;
}

History

29^th September, 2009: Initial post

License

This article, along with any associated source code and files, is licensed under The zlib/libpng License

Written By

Peter Kankowski

Software Developer

Czech Republic

Peter is the developer of Aba Search and Replace, a tool for replacing text in multiple files. He likes to program in C with a bit of C++, also in x86 assembly language, Python, and PHP.

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Table-driven Approach

Introduction

Price Calculation

Shipping Rates and Progressive Taxation

Assemblers, Disassemblers, and Code Generators

Character Properties

Decision Tables

Further Reading

Books

Web Pages

History

License

Comments and Discussions