Click here to Skip to main content
Click here to Skip to main content

Table-driven Approach

By , 29 Sep 2009
 

Introduction

You can often avoid repeating code by placing distinct parts of it into an array:

// A text adventure game
if(strcmpi(command, "north") == 0) {
    if(cur_location->north)
        GoToLocation(cur_location->north);
    else
        Print("Cannot go there");
}
else if(strcmpi(command, "east") == 0) {
    if(cur_location->east)
        GoToLocation(cur_location->east);
    else
        Print("Cannot go there");
}
else if(strcmpi(command, "south") == 0) {
    if(cur_location->south)
        GoToLocation(cur_location->south);
    else
        Print("Cannot go there");
}
else if(strcmpi(command, "west") == 0) {
    if(cur_location->west)
        GoToLocation(cur_location->west);
    else
        Print("Cannot go there");
}

A shorter equivalent:

enum SIDE {SIDE_NORTH = 0, SIDE_EAST, SIDE_SOUTH, SIDE_WEST};
struct COMMAND {
   const char * name;
   SIDE side;
};
static const COMMAND commands[] = {
   {"north", SIDE_NORTH},
   {"east", SIDE_EAST},
   {"south", SIDE_SOUTH},
   {"west", SIDE_WEST},
};
for(int i = 0; i < NUM_OF(commands); i++)
    if(strcmpi(commands[i].name, command) == 0) {
        SIDE d = commands[i].side;
        if(cur_location->sides[d])
            GoToLocation(cur_location->sides[d]);
        else
            Print("Cannot go there");
    }

The second version is not only smaller, but also more abstract and hence easier to maintain. To implement short command names, you have to modify the code in just one place. If more commands will be added, you can easily replace the array with a hash table.

Such arrays are indispensable in many programs: from compilers and disassemblers to tax calculators and web applications. This article highlights some examples of their usage.

Price Calculation

In Refactoring, Martin Fowler discusses the following code:

// calculating the price for renting a movie
double result = 0;
switch(movieType) {
   case Movie.REGULAR:
     result += 2;
     if(daysRented > 2)
        result += (daysRented - 2) * 1.5;
     break;
 
   case Movie.NEW_RELEASE:
     result += daysRented * 3;
     break;
 
   case Movie.CHILDRENS:
     result += 1.5;
     if(daysRented > 3)
        result += (daysRented - 3) * 1.5;
     break;
}

In order to improve it, he replaces the switch statement with inheritance (one class for calculating the regular price, one for new releases, and another one for children's movies).

A simpler solution uses arrays:

enum MovieType {Regular = 0, NewRelease = 1, Childrens = 2};
 
                             // Regular   NewRelease   Childrens
const double initialCharge[] = {2,             0,        1.5};
const double initialDays[] =   {2,             0,          3};
const double multiplier[] =    {1.5,           3,        1.5};
 
double price = initialCharge[movie_type];
if(daysRented > initialDays[movie_type])
    price += (daysRented - initialDays[movie_type]) * multiplier[movie_type];

It's evident that the same formula is used in all three cases, so you don't have to make a complicated class hierarchy for them. The table-driven code is faster, much shorter, and easier to maintain.

Generally, when you think in terms of the only paradigm (in this case, OOP), you will often miss a better solution. By learning more different approaches (including table-driven approach), you can extend your mental toolbox and solve more programming problems with less lines of code.

Shipping Rates and Progressive Taxation

Another situation when the tables are useful is a simple or a piecewise linear function.

Imagine you are developing a web shop script. The total price shown to a user should include a shipping rate, which depends on the weight of items and the shipping distance. For example, the rates of Russian postal service are:

Distance Price for each 0.5 kg
less than 601 km 27.60 rubles
601..2000 km 35.00 rubles
2001..5000 km 39.60 rubles
5001..8000 km 46.35 rubles
more than 8000 km 50.75 rubles

The dependence between distance and price cannot be represented with a formula, so the program has to search in an array to find the applicable rate:

def shipping_fee(weight, distance, value):
# weight in grams, distance in kilometers, declared value in rubles
    rates = [
        # max distance, price
        ( 600, 27.60),
        (2000, 35.00),
        (5000, 39.60),
        (8000, 46.35)
    ]
    # Find the applicable rate for this distance
    for r in rates:
        if distance <= r[0]:
            rate = r[1]
            break
    else:
        rate = 50.75
    
    # For each 500 grams of the parcel
    fee = rate * math.ceil(weight / 500)
    
    # For each ruble of the declared value
    fee += 0.03 * math.ceil(value)
    
    # Apply VAT (18%)
    return round(fee * 1.18, 2)

The array is usually small, so linear scanning works faster than binary search.

For example, sending The practice of programming from Moscow to Khabarovsk (0.3 kg, more than 8000 km, 214 rubles) should cost 67.46 rubles (around $2.20).

Progressive tax calculation is a similar, but slightly more complex example:

def get_tax(income):
    if income < 0:
        return income
    MAX_INCOME = 0; RATE = 1 # array indexes
    tax_brackets = [
        (  8350,  0.10),
        ( 33950,  0.15),
        ( 82250,  0.25),
        (171550,  0.28),
        (372950,  0.33)
    ]
    tax = 0
    prev_bracket = 0
    for bracket in tax_brackets:
        tax += ( min(bracket[MAX_INCOME], income) - prev_bracket ) * bracket[RATE]
        if income <= bracket[0]:
            break
        prev_bracket = bracket[MAX_INCOME]
    else:
        tax += (income - prev_bracket) * 0.35
    return tax

Disclaimer: These functions are only meant to illustrate the table-driven technique; the author doesn't claim they calculate the shipping rate in Russia and the income tax in USA correctly. For the latter, please use the calculator from IRS site, which takes into account various deductions.

Assemblers, Disassemblers, and Code Generators

Writing an assembler or disassembler is a good exercise in table-driven design. Typically, you have arrays for one-byte and two-byte instructions. Mnemonic and operand types of each instruction are stored in these arrays. Gary Shute provides a small example of RISC assembler that uses this technique.

x86 code is harder to assemble because of its irregular and complicated structure. The best assemblers/disassemblers use additional tables to deal with opcode extensions (see, for example, CADT by Ms-Rem). The Table Driven Assembler by Jan Nikitenko shows that it's possible to write a generalized assembler, which can be retargeted for any ISA.

Character Properties

The ctype.h header file in C includes macros for character classification (isdigit, isupper, isalnum, etc.) Their comparison-based implementation is clumsy and ineffective:

int isalnum(int ch) {
    return 'a' <= ch && ch <= 'z' ||
           'A' <= ch && ch <= 'Z' ||
           '0' <= ch && ch <= '9';
}

If there are several character properties, you can assign a bit to each of them, and then make a 256-element array storing the properties for each character. Most C run-time libraries use this method for ctype.h macros:

static const unsigned char properties[] = {
      0,  0,  0,  0,  0,  0,  0,  0,  0, 16, 16, 16, 16, 16,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
     16,160,160,160,160,160,160,160,160,160,160,160,160,160,160,160,
    204,204,204,204,204,204,204,204,204,204,160,160,160,160,160,160,
    160,202,202,202,202,202,202,138,138,138,138,138,138,138,138,138,
    138,138,138,138,138,138,138,138,138,138,138,160,160,160,160,160,
    160,201,201,201,201,201,201,137,137,137,137,137,137,137,137,137,
    137,137,137,137,137,137,137,137,137,137,137,160,160,160,160,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
};
#define islower(ch)  (properties[ch] & 0x01)
#define isupper(ch)  (properties[ch] & 0x02)
#define isdigit(ch)  (properties[ch] & 0x04)
#define isalnum(ch)  (properties[ch] & 0x08)
#define isspace(ch)  (properties[ch] & 0x10)
#define ispunct(ch)  (properties[ch] & 0x20)
#define isxdigit(ch) (properties[ch] & 0x40)
#define isgraph(ch)  (properties[ch] & 0x80)

If you have only one property to store, use a bit array, which is smaller (256 bits = 32 bytes), but needs extra operations to retrieve the value:

inline int isalnum(int ch) {
    static const unsigned int alnum[] = {
        0x0, 0x3ff0000, 0x7fffffe, 0x7fffffe, 0x0, 0x0, 0x0, 0x0,
    };
    return (alnum[ch >> 5]) & (1 << (ch & 31));
}

The alnum array has ones in the positions corresponding to alphanumeric characters, and zeros in other positions.

The range of matching characters is often limited. If you have a range of 32 or less characters, you can store the bits in an integer constant:

// True for a, e, i, o, u, and y
inline int isvowel(int ch) {
    int x = ch - 'a';
    return (0x1104111 >> x) & ((x & 0xffffffe0) == 0);
}

When x is negative or greater than 31, the result of shift by x is undefined in C language. An x86 processor uses the lower 5 bits of x; some other processors always return 0 for the big shift amounts. For portability, we have to do an extra check: (x & 0xffffffe0) == 0. Both MSVC++ and MinGW generate a branchless x86 code for this condition (using SBB or SETE instruction). The method is not as fast as the table-driven approach, but the code is smaller.

The arrays for these methods can be generated with a script. For Unicode character properties, use multi-stage tables. If the range of characters is continuous, a comparison should be more effective than a table lookup.

Decision Tables

A complex condition can often be replaced with a short decision table. Here is an example. A program counts the number of digraphs to determine the encoding and language of document (e.g., Windows Latin-1, German language). All characters are divided into 3 classes:

  • letters of the language (e.g., a to z, ä, ö, ü, and ß for German)
  • letters not used by the language (e.g., è or ô for German);
  • separators (space, period, comma, etc.)

If two adjacent separators are found, they should be ignored, because somebody may use spaces for indentation. A non-used character should not appear near a letter, so the expected frequency of this digraph is zero (e.g., if we found êt digraph, the text is probably in French, not German). A word consisting of non-used characters should be ignored: this will improve statistics for multilingual texts (e.g., English trademarks mentioned in an Arabic text).

A series of if statements for these rules would be long and hard to read (try to write it!). Such conditions are better to be represented with a decision table. Here is a fragment of code that builds the expected_frequencies matrix:

enum LETTER_CLASS { ST_NONUSED = 0, ST_LETTER, ST_SEPARATOR };
enum DECISION { DT_SETDONTCARE, DT_SETZERO, DT_LEAVEASIS };
const DECISION decision_table[3][3] = {
      // Non-used        Letter        Separator
    {DT_SETDONTCARE, DT_SETZERO,   DT_SETDONTCARE},  // Non-used
    {DT_SETZERO,     DT_LEAVEASIS, DT_LEAVEASIS},    // Letter
    {DT_SETDONTCARE, DT_LEAVEASIS, DT_SETDONTCARE}}; // Separator

// Read a sample text and fill the expected_frequencies matrix with digraph frequencies
...

// Correct the expected_frequency value depending on the letter class
switch (decision_table[letter_class1][letter_class2]) {
    case DT_SETDONTCARE:
        expected_frequencies[letter1][letter2] = DONTCARE;
        break;

    case DT_SETZERO:
        if (expected_frequencies[letter1][letter2] != 0)
            OutputMessage("Illegal letter combination %c%c\n",
                letter1, letter2);
        expected_frequencies[letter1][letter2] = 0;
        break;

    case DT_LEAVEASIS:
        // No action, use previously calculated expected_frequencies value
        break;
}

Further Reading

Books

  • Code Complete by Steve McConnell, chapter 18 (Table-driven methods). Covers access by index and linear search in array. McConnell also criticizes the mechanical application of OOP.
  • Programming Pearls by Jon Bentley, chapter 3.1, 3.3 (A Survey Program, An Array of Examples). The chapter shows how to make a statistical program shorter by using arrays and provides more examples of where table-driven methods can be useful.

Web Pages

History

  • 29th September, 2009: Initial post

License

This article, along with any associated source code and files, is licensed under The zlib/libpng License

About the Author

Peter Kankowski
Software Developer
Russian Federation Russian Federation
Peter lives in Siberia, the land of sleeping sun, beautiful mountains, and infinitely deep snow. He recently started a wiki about algorithms and code optimization, where people could share their ideas, learn, and teach others.

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralMy vote of 5memberKeting27-Jul-10 0:42 
The author presents many good examples of Table-driven approach.
GeneralThanksmemberdanielj7-Oct-09 5:06 
Thanks for a great article, helps you think about the code your writing.
I found some switch statements that I was able to simplify.
GeneralTabels vs. if/else switch/casememberDamir Valiulin6-Oct-09 4:28 
Peter,
 
Good article. I just want to comment about the first two examples provided: GoToLocation and Price calculation.
 
From my own experience, I find that if there are only few branches in decision making, the first version of both examples is faster to write, easier to understand, and easier to maintain. If you had more direction points (NE, NW, etc), then surely the table version is better, but with 4 branches it's faster to write if/else.
 
I know you just used those examples for illustration, but I couldn't help it. Wink | ;)
GeneralRe: Tabels vs. if/else switch/casememberPeter Kankowski6-Oct-09 14:18 
Thanks Smile | :) I would use conditions for 2-4 branches, but if more cases are expected in future, tables are a better choice.
 
In a real game, there are more keywords and more complex data structures are used. See source code for the Adventure game: there is a hash table for mapping command names to codes, several arrays indexed by command code or location code (default_msg, long_desc, start, etc.), and macros to fill the tables. The tables are similar to the ones in my article.
 
For price calculation, another good solution is to allow their manager to add/edit movie types and define formulas for calculating price in admin interface, then use an expression evaluator to calculate the price based on daysRented and the defined formula. The manager will be able to change prices without bothering a programmer.
GeneralRe: Tabels vs. if/else switch/casememberDamir Valiulin7-Oct-09 3:27 
I completely agree with you
GeneralGood stuff...memberNemanja Trifunovic30-Sep-09 8:06 
Thanks for a good article. BTW, have you read Writing Solid Code[^], by Steve Maguire? He advocates table-based approach from the standpoint of robustness rather than performance.
 

GeneralRe: Good stuff...memberPeter Kankowski30-Sep-09 14:43 
Thank you! I will find and read this book.
GeneralMy vote of 1memberMember 1869130-Sep-09 2:00 
seriously ?
GeneralRe: My vote of 1memberKochise3-Oct-09 22:09 
Seems you just haven't read his article. This is golden to hardcore coders, because that's just the way it works.
 
Kochise
 
In Code we trust !

GeneralRe: My vote of 1membernbr5-Oct-09 18:37 
Member 18691: Articles Submitted=0
 
NBR

General"Generally, when you think in terms of the only paradigm (in this case, OOP)"memberArash Partow30-Sep-09 1:02 
Perhaps it should be "If All You Have Is a Hammer, Everything Looks Like a Nail" Laugh | :laugh:
 

 
good stuff btw.
GeneralRe: "Generally, when you think in terms of the only paradigm (in this case, OOP)"memberPeter Kankowski30-Sep-09 15:19 
Laugh | :laugh: Yes, exactly. Paul Graham says even rougher:
 
[OOP] is a good tool if you want to convince yourself, or someone else, that you are doing a lot of work.

 
See also Your Code: OOP or POO? by Jeff Atwood.
GeneralRe: "Generally, when you think in terms of the only paradigm (in this case, OOP)"memberCurtainDog5-Oct-09 17:47 
"If All You Have Is a Hammer, Everything Looks Like a Nail" is insufficiently OO. I prefer "If All You Have Is a Hammer, Everything Looks Like a Hammerable"
 
But seriously, I disagree that Fowler advocates replacing switches with inheritence but rather with polymorphism. It ain't the same thing. I enjoyed the rest of the article though, despite being an OO zealot.
GeneralRe: "Generally, when you think in terms of the only paradigm (in this case, OOP)"memberCurtainDog6-Oct-09 19:38 
Sorry I'll take that back, he does say to replace switches with inheritence. I guess it might help to read stuff in future before sounding off on it. But the inheritence is there in order to get the polymorphic behaviour, rather than any of the other effects that come along with inheritence.
GeneralRe: "Generally, when you think in terms of the only paradigm (in this case, OOP)"memberStefan636-Oct-09 23:14 
Actually, I still believe that Fowler's OO approach is better maintainable than a table. The example in this case just happens to be suitable for packing it into a single formula.
 
But what if new switch cases are added? Every single case will potentially make the formula more complicated and more error prone. What's worse: if you make an error, you won't know which part of the formula is erroneous, because suddenly all cases might be calculated wrong!
 
If you go with inheritance, whenever you introduce a new case, only this case is prone to errors and you can quickly and efficiently find and fix that error.
 
And what if conditions are changed? You have to pinpoint the correct factor inside the formula. If you do it right, then fine. But if you make an error, again, all cases will be affected, and you will have to go over the whole formula again to find the error and fix it. Whereas an OO approach lets you pinpoint the correct point of change before even looking at the actual code.
 
Not that I'd advice to use inheritance in such a simple case, but if I expect an application to be used over a long time and maybe being expanded and changed a lot, I still would prefer the 'long term maintenance' approach. And that is OOP.
 
In practice, I often find multiple conditional branches to handle several cases, and often find the code to be repetitive. But in most cases this isn't just about changing or calculating a single value, it is about several values, creating or destroying data structures, setting error conditions, generating output, and the like. Nothing you can put in a single formula, or three.
 
When I look at the repetitive parts, I can easily see how thess cases might originally have evolved from somthing that could indeed have been emulated by a table and formula. But the code I see now would never work by that approach!
GeneralRe: "Generally, when you think in terms of the only paradigm (in this case, OOP)"memberPeter Kankowski7-Oct-09 4:29 
The original code is:
 
double charge(int daysRented) {
    double result = 0;
    switch (priceCode()) {
        case REGULAR:
            result += 2;
            if (daysRented > 2)
                result += (daysRented - 2) * 1.5;
            break;
        case NEW_RELEASE:
            result += daysRented * 3;
            break;
        case CHILDRENS:
            result += 1.5;
            if (daysRented > 3)
                result += (daysRented - 3) * 1.5;
            break;
 
    }
    return result;
}
 
The refactored code is:
 
abstract class Price {
    static Price regular() {
        return _regular;
    }
    static Price childrens() {
        return _childrens;
    }
    static Price newRelease() {
        return _newRelease;
    }
        
    private static Price _childrens = new ChildrensPrice();
    private static Price _newRelease = new NewReleasePrice();
    private static Price _regular = new RegularPrice();
    
    abstract double charge(int daysRented);
    abstract int priceCode();
}
 
class RegularPrice {
    double charge(int daysRented){
        double result = 2;
        if (daysRented > 2)
            result += (daysRented - 2) * 1.5;
        return result;
    }
    int priceCode() {
        return Movie.REGULAR;
    }
}
 
class ChildrensPrice {
    double charge(int daysRented){
        double result = 1.5;
        if (daysRented > 3)
            result += (daysRented - 3) * 1.5;
        return result;
    }
    int priceCode() {
        return Movie.CHILDRENS;
    }
}
 
class NewReleasePrice {
    double charge(int daysRented){
        return daysRented * 3;
    }
    int priceCode() {
        return Movie.NEW_RELEASE;
    }
}
 
It's 2.5 times longer and much more complicated, so there are more possibilities to overlook a bug. There is more code to read and understand when doing code review. OOP has no advantages here; it's just useless wordiness.
 
If the formula is fixed, I would use tables. If changes are expected, the original switch statement would be a better choice, IMHO.
 
You can also allow the user to change the formula without recompiling the program:
  • use a generalized formula (see my table-driven code) and allow the user to change initialCharge, initialDays, and multiplier for each movie type in the GUI;
  • implement an expression evaluator and allow the user to enter any price formula he or she wants.
 
I used table-driven code for many things in real programs and it was short and easy to maintain in long-term perspective. I remember only one case when I had to add a hack (an additional condition), because it was impossible to handle a non-standard case with the generalized code.
GeneralRe: "Generally, when you think in terms of the only paradigm (in this case, OOP)"memberStefan637-Oct-09 5:27 
Yes the code is longer. But you omit that the classes involved are also being used for other purposes, such as calculating frequent renter points, generating text statements and they are used in conjunction with various other objects. When you look at the whole picture there isn't really that much more code than before, and much of it is formalism, such as class bodies - which really should be viewed as documentation rather than actual code. The effective code that actually does the calculation has not magically grown by even a single line!
 
As to maintenance - sure, you might need a longer look to spot the code segment you need to modify if the code is longer. If you're using Notepad as editor that is! Don't forget that you can look up classes and methods in every modern IDE! You cannot look up the right case statement though.
 
So to maintain the original code, you will have to lookup your charge method, then look through that function to find the case statements that are affected by your change, and then change them. In the new code, you look up the class::method in your IDE and will have the relevant line of code immediately in front of you. Aren't IDE's useful?
 
In comparison: if you want to change your table lookup, you have to either know what arrays the relevant values are stored in, or you'll first have to look up the function to see how they're called, then look up the arrays and change them. This might be easy enough when the relevant arrays are just on top of the function, but the bigger your applications get, the less likely this will be the case.
GeneralRe: "Generally, when you think in terms of the only paradigm (in this case, OOP)"memberPeter Kankowski7-Oct-09 14:20 
It's certainly a matter of taste, but I prefer shorter solutions for small problems. As Paul Graham wrote in his essays, "succinctness is power" and "the easiest program to change is one that's very short".
 
OOP is useful for more complex things. For example, you are writing an MP3/OGG tag editor. There are 3 tag formats (ID3v1, ID3v2, and Vorbis comment), which are very different in implementation, but similar in interface. For each tag format, you want to parse the file, get/set fields (artist, song title, album, genre, etc.), and save the modified file. Three classes derived from an abstract class would perfectly hide the complexity of tag formats.
 
However, a class with 1-2 very short methods seems to be an overkill for me. OOP is not the only approach to programming, you can try table-driven techniques, too.
 
In Code Complete, McConnell describes a complex program for printing messages received from hardware. Object-oriented approach would require writing 20 classes (one class for each message type). Table-driven approach requires just one function for reading all types of messages. He says that OOP hides the logic in class hierarchy, but doesn't make the program structure simpler. It's worth reading for you.
Generalgood'ol timesmemberasthalas29-Sep-09 21:00 
Thanks for reminding me of the early nineties, when I was writting Assembler on the ARM... And, most of all, of the *cool* tricks we used for code reduction and speed optimizations. You are very right when you say that OO is not always the answer.
In fact, I've found myself falling in the same mistake you mention with inheritance vs. table aproach.
Thank you for reminding me of that approach!
Please keep on it Smile | :)

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web01 | 2.6.130617.1 | Last Updated 29 Sep 2009
Article Copyright 2009 by Peter Kankowski
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid