Click here to Skip to main content
Click here to Skip to main content

Something You May Not Know About the Switch Statement in C/C++

By , 19 Dec 2012
 
Prize winner in Competition "Best C++/MFC article of August 2010"

Introduction

Many programming languages such as C/C++, C#, Java, and Pascal provide the switch statement to let us implement selection logic. In some scenarios, it's a good alternative to if-then-else, making code clearer and more readable. When using switch in practice, you may want to know:

  • How the switch block is executed at runtime?  
  • Is it running faster than an if-then-else for a long list of conditions?
  • For n conditions, what is the switch time complexity?

The C/C++ standard defines the specification of language elements, but it doesn't say anything about how to implement the switch statement. Every vendor is free to use any implementation as long as it fits the standard. This article discusses what happens when running a switch statement in Visual C++, with a few examples under different conditions. We'll analyze these examples by using the Microsoft Visual Studio IDE, since it can generate the corresponding assembly listing on compiling. Thus, a general understanding of Intel (x86) assembly language is assumed. As you can see shortly, all results here are based on reverse engineering, and so the article is never a comprehensive description of switch implementations in compilers. If you are learning Assembly Language Programming, this article might be a study material to read.

Our first example is switch1.cpp, a commonly used simple block as below:

#include "functions.h"

int main()
{
    int i =3;    // or i =20

    switch (i)
    {
        case 1: f1(); break;
        case 2: f2(); break;
        case 5: f1(); break;
        case 7: f2(); break;
        case 10: f1(); break;
        case 11: f2(); break;
        case 12: f2(); break;
        case 17: f1(); break;
        case 18: f1(); break;

        default: f3(); 
    }

    return 0;
}

where three functions are defined in functions.cpp:

void f1() { cout  "f1 called\n"; }
void f2() { cout  "f2 called\n"; }
void f3() { cout  "f3 called\n"; }

Could the worst case be thought of as i=3 or i=20? How does it execute: does it exhaust all nine cases and finally reach the default to call f3? Let's answer it from the assembly translations.

In order to generate an assembly listing in Visual Studio, open the switch1.cpp Property dialog and select the Output Files category under C/C++. On the right pane, choose the Assembly With Source Code (/FAs) option as shown below:

The option in the switch1.cpp Properties

Then, when you compile switch1.cpp, an assembly file called switch1.asm will be generated. Using this option, the listing includes the C++ source code, which is commented by the semi-colon with a line number as shown in the next section.

The Two-Level Jump Table

Let's analyze the assembly listing from top down. Here is where the switch starts:

; 5    :     int i =3;    // or i =20

    mov    DWORD PTR _i$[ebp], 3

; 6    : 
; 7    :     switch (i)

    mov    eax, DWORD PTR _i$[ebp]
    mov    DWORD PTR tv64[ebp], eax
    mov    ecx, DWORD PTR tv64[ebp]
    sub    ecx, 1
    mov    DWORD PTR tv64[ebp], ecx
    cmp    DWORD PTR tv64[ebp], 17            ; 00000011H
    ja     SHORT $LN1@main
    mov    edx, DWORD PTR tv64[ebp]
    movzx  eax, BYTE PTR $LN15@main[edx]
    jmp    DWORD PTR $LN16@main[eax*4]

Assuming that the symbol _i$[ebp] is an alias of i, tv64[ebp] is another name of i2, $LN15@main is renamed as table1, $LN16@main as table2, and $LN1@main is a label named Default_Label. The snippet merely does this in pseudocode:

i2 = i;
i2 = i2-1;
if i2 > 17 goto Default_Label;
goto table2[4*table1[i2]];

Here, 17 indicates the last case condition value, because the integers 1 to 18 are mapped from 0 to 17. This is why i2 is decremented to make it a zero-based integer as an index. Now, if i2 is greater than 17 (e.g., n=20), the control goes to the default. Otherwise, it goes to where table2[4*table1[i2]] is pointed.

How about when i is zero? Then i2 becomes -1. Worry about the index out of range error? No, it will never happen. Back to the assembly listing, you can see that -1 is saved as DWORD, the double word as an unsigned 4-byte integer. Hence, it must be greater than 17 and goes to default.

Let's look at two tables and see how they work together. The table1 is pretty simple, with a starting address at $LN15@main, which you can just think of as an array name.

$LN15@main:
    DB    0
    DB    1
    DB    9
    DB    9
    DB    2
    DB    9
    DB    3
    DB    9
    DB    9
    DB    4
    DB    5
    DB    6
    DB    9
    DB    9
    DB    9
    DB    9
    DB    7
    DB    8

With this array, table1[0] is 0, table1[1] is 1, table1[2] and table1[3] are 9, etc. These values are created for the calculation of the index of table2, which starts at $LN16@main:

$LN16@main:
    DD    $LN10@main
    DD    $LN9@main
    DD    $LN8@main
    DD    $LN7@main
    DD    $LN6@main
    DD    $LN5@main
    DD    $LN4@main
    DD    $LN3@main
    DD    $LN2@main
    DD    $LN1@main

The above labels, from $LN10@main to $LN1@main, are ten calling targets in C++, for nine cases plus one default. Notice that DB represents defining byte (8 bits), while DD defines the double word type of four bytes (32 bits). This is why we need to multiply 4 in table2[4*table1[i2]]. By this formula, we calculate the calling address via table1 and table2:

  • if i equals 1, i2 is 0 and table1[0] is 0, jump to $LN10@main by table2[0], the first case.
  • if i equals 2, i2 is 1 and table1[1] is 1, jump to $LN9@main by table2[4*1], the second case.
  • if i equals 3, i2 is 2 and table1[2] is 9, jump to $LN1@main by table2[4*9], the default.
  • ... ...

Now we come to the code segment labeled by LN10@main to $LN1@main as the calling targets:

$LN10@main:
; 8    :     {
; 9    :         case 1: f1(); break;

    call        ?f1@@YAXXZ            ; f1
    jmp    SHORT $LN11@main

$LN9@main:
; 10   :         case 2: f2(); break;

    call    ?f2@@YAXXZ                ; f2
    jmp    SHORT $LN11@main

$LN8@main:
; 11   :         case 5: f1(); break;

    call    ?f1@@YAXXZ                ; f1
    jmp    SHORT $LN11@main

; ... ...

$LN2@main:
; 17   :         case 18: f1(); break;

    call    ?f1@@YAXXZ                ; f1
    jmp    SHORT $LN11@main

$LN1@main:
; 18   : 
; 19   :         default: f3(); 

    call    ?f3@@YAXXZ                ; f3

$LN11@main:
; 20   :     }
; 21   : 
; 22   :     return 0;

Functions f1, f2, and f3 are converted to the decorated assembly procedures, and prefixed with a question mark, like ?f1@@YAXXZ, ?f2@@YAXXZ, and ?f3@@YAXXZ. Notice that $LN10@main is a label of case 1 to call f1, $LN9@main for case 2 to call f2, and $LN1@main for default to call f3.

An additional label is $LN11@main, pointing to the location after the last default clause. As soon as each case operation is done, the control jumps to $LN11@main. This implements the break statement. Without it, the control goes to the next case. This is why the break statement is necessary in the C/C++ switch block.

Obviously, based on such a two-level table mechanism, we have one comparison, one multiplication, and two address jumps. The time complexity of this switch pattern should be O(1). Put this all together, we have a big picture like this:

The Two-Level Jump Table Diagram

Recall that we use DB to define index values in table1, and DD for table2. In this example, there are only 18 case values for table1. But DB defining byte means the range is only from zero to 255. What if there are more cases in switch, with the number of values over 255?

The One-Level Jump Table

Here comes the second example, switch2.cpp with 1000 cases. For simplicity, we start from the condition value zero:

int main2()
{
    int i =1000;

    switch (i)
    {
        case 0: f1(); break;
        case 1: f1(); break;
        case 2: f2(); break;
        case 3: f1(); break;
      // ... ...
        case 995: f1(); break;
        case 996: f1(); break;
        case 997: f1(); break;
        case 998: f2(); break;
        case 999: f1(); break;

        default: f3(); 
    }

    return 0;
}

Don't think it makes things complicated. On the contrary, it becomes simpler. This is the switch2.asm that Visual Studio generates for us:

; 5    :     int i =1000;

    mov    DWORD PTR _i$[ebp], 1000        ; 000003e8H

; 6    : 
; 7    :     switch (i)

    mov    eax, DWORD PTR _i$[ebp]
    mov    DWORD PTR tv64[ebp], eax
    cmp    DWORD PTR tv64[ebp], 999        ; 000003e7H
    ja     $LN1@main2
    mov    ecx, DWORD PTR tv64[ebp]
    jmp    DWORD PTR $LN1006@main2[ecx*4]

Likewise, assume that the symbol _i$[ebp] is an alias of i. Because the first case starts from zero, no mapping is needed. So i2 (from tv64[ebp]) equals i. The array $LN1006@main2 is an only jump table, and the label $LN1@main2 is Default_Label. This snippet simply does this:

i2 = i;
if i2 > 999 goto Default_Label;
goto table[4*i2];

By searching $LN1006@main2 in switch2.asm, we find the code labels all defined by double words:

$LN1006@main2:
    DD    $LN1001@main2
    DD    $LN1000@main2
    DD    $LN999@main2
    DD    $LN998@main2
    DD    $LN997@main2
    DD    $LN996@main2
   ; ... ...
    DD    $LN5@main2
    DD    $LN4@main2
    DD    $LN3@main2
    DD    $LN2@main2

Totally 1000 calling targets, each label is followed by a procedure call, downwards from $LN1001@main2 to $LN2@main2:

$LN1001@main2:
; 8    :     {
; 9    :         case 0: f1(); break;
    call    ?f1@@YAXXZ                ; f1
    jmp    $LN1002@main2

$LN1000@main2:
; 10   :         case 1: f1(); break;

    call    ?f1@@YAXXZ                ; f1
    jmp    $LN1002@main2

$LN999@main2:
; 11   :         case 2: f2(); break;

    call    ?f2@@YAXXZ                ; f2
    jmp    $LN1002@main2

; ... ...

$LN3@main2:
; 1106 :         case 998: f2(); break;

    call    ?f2@@YAXXZ                ; f2
    jmp    SHORT $LN1002@main2

$LN2@main2:
; 1107 :         case 999: f1(); break;

    call    ?f1@@YAXXZ                ; f1
    jmp    SHORT $LN1002@main2

$LN1@main2:
; 1108 : 
; 1109 :         default: f3(); 

    call    ?f3@@YAXXZ                ; f3
$LN1002@main2:

; 1110 :     }
; 1111 : 
; 1112 :     return 0;

Here, $LN1@main2 directs to default, and the additional label $LN1002@main implements break. Although the complexity is also O(1), this mechanism should be a little bit efficient since only one address jump is required. The following is a picture of this example:

The One-Level Jump Table Diagram

Until now, we see the switch illustrations executed so well. Is it really better to use switch than if-then-else? For n conditions without a match, definitely, if-then-else must check each condition until the last, which is the worst case of O(n). But for switch, do we always expect its complexity to be O(1)? Or is there some kind of code that would lead its execution beyond that?

Using Binary Search

We will give the third example showing the large gap between case condition values in switch3.cpp, where the switch execution behaves as the binary search:

int main3()
{
    int i =1;

    switch (i)
    {
        case 100: f1(); break;
        case 200: f2(); break;
        case 250: f2(); break;
        case 500: f1(); break;
        case 700: f2(); break;
        case 750: f2(); break;
        case 800: f2(); break;
        case 900: f1(); break;

        default: f3(); 
    }

    return 0;
}

By generating the assembly listing switch3.asm, this time, we take a look from bottom up by checking the case operations first:

$LN9@main3:

; 8    :     {
; 9    :         case 100: f1(); break;
    call    ?f1@@YAXXZ                ; f1
    jmp    SHORT $LN10@main3

$LN8@main3:
; 10   :         case 200: f2(); break;
    call    ?f2@@YAXXZ                ; f2
    jmp    SHORT $LN10@main3

$LN7@main3:
; 11   :         case 250: f2(); break;
    call    ?f2@@YAXXZ                ; f2
    jmp    SHORT $LN10@main3

$LN6@main3:
; 12   :         case 500: f1(); break;
    call    ?f1@@YAXXZ                ; f1
    jmp    SHORT $LN10@main3

$LN5@main3:
; 13   :         case 700: f2(); break;
    call    ?f2@@YAXXZ                ; f2
    jmp    SHORT $LN10@main3

$LN4@main3:
; 14   :         case 750: f2(); break;
    call    ?f2@@YAXXZ                ; f2
    jmp    SHORT $LN10@main3

$LN3@main3:
; 15   :         case 800: f2(); break;
    call    ?f2@@YAXXZ                ; f2
    jmp    SHORT $LN10@main3

$LN2@main3:
; 16   :         case 900: f1(); break;
    call    ?f1@@YAXXZ                ; f1
    jmp    SHORT $LN10@main3

$LN1@main3:
; 17   : 
; 18   :         default: f3(); 
    call    ?f3@@YAXXZ                ; f3

$LN10@main3:
; 19   :     }
; 20   : 
; 21   :     return 0;

Pretty neat, with eight cases plus one default, we can easily write this part into a list of nine rows, each corresponding to a target label, a C++ case, and a procedure name:

The Worst Case Diagram

Besides $LN9@main3 to $LN1@main3, we also have $LN10@main3 implement the break statement. Next, how do we jump to these calling targets? Look at the following code:

; 5    :     int i =1;

    mov    DWORD PTR _i$[ebp], 1

; 6    : 
; 7    :     switch (i)

    mov    eax, DWORD PTR _i$[ebp]
    mov    DWORD PTR tv64[ebp], eax
    cmp    DWORD PTR tv64[ebp], 700        ; 000002bcH
    jg     SHORT $LN14@main3
    cmp    DWORD PTR tv64[ebp], 700        ; 000002bcH
    je     SHORT $LN5@main3
    cmp    DWORD PTR tv64[ebp], 250        ; 000000faH
    jg     SHORT $LN15@main3
    cmp    DWORD PTR tv64[ebp], 250        ; 000000faH
    je     SHORT $LN7@main3
    cmp    DWORD PTR tv64[ebp], 100        ; 00000064H
    je     SHORT $LN9@main3
    cmp    DWORD PTR tv64[ebp], 200        ; 000000c8H
    je     SHORT $LN8@main3
    jmp    SHORT $LN1@main3
$LN15@main3:
    cmp    DWORD PTR tv64[ebp], 500        ; 000001f4H
    je     SHORT $LN6@main3
    jmp    SHORT $LN1@main3
$LN14@main3:
    cmp    DWORD PTR tv64[ebp], 750        ; 000002eeH
    je     SHORT $LN4@main3
    cmp    DWORD PTR tv64[ebp], 800        ; 00000320H
    je     SHORT $LN3@main3
    cmp    DWORD PTR tv64[ebp], 900        ; 00000384H
    je     SHORT $LN2@main3
    jmp    SHORT $LN1@main3

The logic is not too hard to understand. At first, the condition value i is saved in both _i$[ebp] and tv64[ebp]. To simplify, we just make use of all the labels here by removing the leading "$" and the tailing "@main3". Rewrite the snippet like this:

    i2 = i;
    if i2 > 700 goto LN14;
    if i2 == 700 goto LN5;
    if i2 > 250 goto LN15;
    if i2 == 250 goto LN7;
    if i2 == 100 goto LN9;
    if i2 == 200 goto LN8;
    goto LN1;
LN15:
    if i2 == 500 goto LN6;
    goto LN1;
LN14:
    if i2 == 750 goto LN4;
    if i2 == 800 goto LN3;
    if i2 == 900 goto LN2;
    goto LN1;

Surely, the compiler optimizes the code, because it first chooses 700 to compare, instead of the beginning case value 100 in the switch block. Although this switch is implemented with ten if-then statements for nine conditions, it actually applies the binary search mechanism. The following shows the equivalent decision tree of above code, a Binary Search Tree that you may be familiar with:

The binary search mechanism Diagram

For all internal nodes colored yellow in circle, we make the comparisons, while all leave nodes in oval represent the successful ending. Each failure goes to LN1 by default. The inorder traversal sequence for comparison nodes is 100, 200, 250, 500, 700, 750, 800, and 900; while the sequence for all leaves in the inorder traversal is exactly LN9, LN8, LN7, LN6, LN5, LN4, LN3, and LN2, as ordered in the previous table. Essentially, this is a binary search algorithm with the complexity of O(log n). When i=1, it will pass through the first six comparisons and then reach default at LN1. But when i=500, just four comparisons are required.

As we know, the prerequisite of binary search is that the input data is sorted. I wonder if this is just because I use the ascending case values in switch in switch3.cpp. The curiosity brings out the following disordered conditions in switch4.cpp.

int main4()
{
    int i =1;

    switch (i)
    {
    case 750: f2(); break;
    case 700: f2(); break;
    case 250: f2(); break;
    case 500: f1(); break;
    case 800: f2(); break;
    case 900: f1(); break;
    case 100: f1(); break;
    case 200: f2(); break;

        default: f3(); 
    }

    return 0;
}

To my surprise, the same binary search strategy appears in code of switch4.asm, with exact the same decision tree as shown above. Only difference is that the labels are renumbered - that's quite reasonable, because we have just reordered them! You can examine the attached switch4.asm for details.

This experiment definitely brings us some hint to find out how the compiler does its magic to sort the case condition values. A sorting algorithm is over O(log n) and not worthwhile to have it at runtime. Notice that sorting is not visible from generated assemblies. This means that sorting is not contained in the assembly instructions which will execute at runtime. Also notice that all the condition values are constant and available before the compiler translates the C++ to machine code. Now it’s reasonable to think of the preprocessor; the sorting with all known case values could be simply done in compilation. This is why the translated assembly code only reflects the binary search without sorting code. The static sorting behavior (as opposed to dynamic behavior at runtime), could be implemented with a macro procedure, assembly directives and operators. Such an preprocessing example can be found in References at the end of the article.

The Hybrid

At this moment, I can show an example of combination of a jump table and the binary search as in switch5.cpp.

int main5()
{
    int i =1;

    switch (i)
    {
        case 3: f1(); break;
        case 5: f2(); break;
        case 6: f2(); break;
        case 7: f2(); break;
        case 100: f1(); break;
        case 200: f2(); break;
        case 250: f2(); break;
        case 700: f2(); break;
        case 750: f2(); break;
        case 900: f2(); break;

        default: f3(); 
    }

    return 0;
}

The following is how this switch block is implemented in the corresponding switch5.asm:

; 5    :     int i =1;

    mov    DWORD PTR _i$[ebp], 1

; 6    : 
; 7    :     switch (i)

    mov    eax, DWORD PTR _i$[ebp]
    mov    DWORD PTR tv64[ebp], eax
    cmp    DWORD PTR tv64[ebp], 100        ; 00000064H
    jg     SHORT $LN16@main5
    cmp    DWORD PTR tv64[ebp], 100        ; 00000064H
    je     $LN7@main5
    mov    ecx, DWORD PTR tv64[ebp]
    sub    ecx, 3
    mov    DWORD PTR tv64[ebp], ecx
    cmp    DWORD PTR tv64[ebp], 4
    ja     $LN1@main5
    mov    edx, DWORD PTR tv64[ebp]
    jmp    DWORD PTR $LN18@main5[edx*4]

$LN16@main5:
    cmp    DWORD PTR tv64[ebp], 700        ; 000002bcH
    jg     SHORT $LN17@main5
    cmp    DWORD PTR tv64[ebp], 700        ; 000002bcH
    je     SHORT $LN4@main5
    cmp    DWORD PTR tv64[ebp], 200        ; 000000c8H
    je     SHORT $LN6@main5
    cmp    DWORD PTR tv64[ebp], 250        ; 000000faH
    je     SHORT $LN5@main5
    jmp    SHORT $LN1@main5
$LN17@main5:
    cmp    DWORD PTR tv64[ebp], 750        ; 000002eeH
    je     SHORT $LN3@main5
    cmp    DWORD PTR tv64[ebp], 900        ; 00000384H
    je     SHORT $LN2@main5
    jmp    SHORT $LN1@main5

It consists of two parts, a one-level jump table first and then the binary search code. Using the previous notations of i, i2 and label naming, let $LN18@main5 as a table and $LN1@main5 as default. This snippet does this:

    i2 = i;
    if i2 > 100 goto LN16;
    if i2 == 100 goto LN7;    // go case 100
                              // now i2 < 100
    i2 = i2-3;                // map to zero-based index
    if i2 > 4 goto LN1;       // go default over 7 
    goto table[4*i2];         // go jump table

LN16:                         // binary search
    if i2 > 700 goto LN17;
    if i2 == 700 goto LN4;    // go case 700
    if i2 == 200 goto LN6;    // go case 200
    if i2 == 250 goto LN5;    // go case 250
    goto LN1;                 // go default
LN17:
    if i2 == 750 goto LN3;    // go case 750
    if i2 == 900 goto LN2;    // go default    
    goto LN1;

where the table is defined as:

$LN18@main5:
    DD    LN11    ; index 0, go case 3
    DD    LN1     ; index 1 for value 4, go default
    DD    LN10    ; index 2, go case 5
    DD    LN9     ; index 3, go case 6
    DD    LN8     ; index 4, go case 7

An interesting change is to remove the case 6 in switch5.cpp. Just because of this, the hybrid becomes the combination of a two-level jump table and the binary search. For details, look at switch6.cpp and switch6.asm in downloadable samples. If you try to add extra cases in some order, you can find two individual jump tables combined with the binary search.

More Questions and Beyond

Now you have learned something about switch from some typical examples. You should understand now why the C/C++ switch only supports an integral data type for its conditional expression, such as char and enumeration type, rather than the float point or string type. Besides these, more intuitive questions might come to your head:

  • Is it necessary to maintain the cases in the order of condition values for a jump table?
  • How about if we use negative integers as case values?
  • What if the default label is missing, or appears anywhere in the block not at the last?

I believe you can answer these just by analyzing an assembly listing of the C++ code containing these questions. For convenience, I attached the following switch7.cpp with its switch7.asm in samples.

int main7()
{
    int i =15;

    switch (i)
    {
        case 2: f2(); break;
        case 1: f1(); break;
        case 5: f1(); break;
        case 10: f1(); break;
        case 7: f2(); break;
        default: f3(); break;
        case -3: f2(); break;
        case 12: f1(); break;
        case 17: f2(); break;
        case 18: f1(); break;
    }

    return 0;
}

Although the switch performance looks better than if-than-else in these examples, we still have questions unanswered. Obviously, it's not possible to enumerate all the switch executions in practice. A comprehensive analysis of switch implementations should be written by a compiler developer, not by a blind black-box tester. So, we are not able to determine the worst case in the switch execution, whether it would reach O(n) or not. Also we don't know under which conditions, the compiler chooses one implementation instead of another or even other methods unmentioned here.

As an example from a reader in discussion, one concern is about the memory waste of above said jump tables when implementing sparsely populated switch cases. How about an extreme example with only two case entries 1 and 0x7fffffff as below? Would the compiler consume most useless entries in a jump table? 

int main8()
{
    int i =3;
 
    switch (i)
    {
        case 1: f1(); break;
        case 0x7fffffff: f2(); break;
        default: f3(); break;
    }
 
    return 0;
} 

This is not true for our smart compiler. As mentioned, we don't know how the compiler chooses one implementation instead of the other. Please see here, this two-case example doesn't choose the jump table. It just simply converts switch to if-then without consuming any more memory: 

; 5    :     int i =3;
	mov	DWORD PTR _i$[ebp], 3
 
; 6    : 
; 7    :     switch (i)

	mov	eax, DWORD PTR _i$[ebp]
	mov	DWORD PTR tv64[ebp], eax
	cmp	DWORD PTR tv64[ebp], 1
	je	SHORT $LN3@main8
	cmp	DWORD PTR tv64[ebp], 2147483647		; 7fffffffH
	je	SHORT $LN2@main8
	jmp	SHORT $LN1@main8
 
$LN3@main8:
; 8    :     {
; 9    :         case 1: f1(); break;

	call	?f1@@YAXXZ				; f1
	jmp	SHORT $LN4@main8
 
$LN2@main8:
; 10   :         case 0x7fffffff: f2(); break;

	call	?f2@@YAXXZ				; f2
	jmp	SHORT $LN4@main8
 
$LN1@main8:
; 11   :         default: f3(); break;

	call	?f3@@YAXXZ				; f3
$LN4@main8:
 
; 12   :     }
; 13   : 
; 14   :     return 0; 

Again, no way to affect or predict the compiler's choices with such a black box testing.  Depending on the problem, either sparsely or densely populated switch cases, the compiler adapts them obviously very well. 

Summary  

By scrutinizing the above examples, we expose something that you may not know about the switch at runtime. To analyze a C/C++ program in Visual Studio, we can have both static and dynamic analyses. For this article, we make use of the assembly listings generated by the compiler. On the other hand, we can also monitor binary executions at runtime with the VS Disassembly window in debug, where the labels and procedures in the listing are translated to memory addresses. This way, you can trace registers and memory allocations to understand data endianness, stack frames, etc. Here, for our purpose, the assembly listing seems sufficient and easy to indicate a jump from here to there. The downloadable zip file contains seven examples, with both the .cpp source code and the .asm listings in VS 2010. The same listings can be generated by an earlier version of VS.

This article does not involve hot technologies like .NET or C#. Assembly Language is comparatively traditional without much sensation these days. However, in applications, different types of assembly languages play an important role in new device development. Academically, Assembly Language Programming is considered as a demanding course in Computer Science. I am teaching Assembly Language Programming, CSCI 241, at Fullerton College, where the concept of high-level language interfacing with low-level execution is an essential part in teaching. In particular, one of the main topics that students are interested in is how C/C++ or Java runs on a machine, represented by low-level data structures and algorithms. Therefore, I hope this article could also serve as one of the basic problem-solving examples for students. Any comments and suggestions are welcome.

Acknowledgment

I would like to sincerely thank wtwhite, Charvak Karpe, and Harold Aptroot for their valuable discussions about the binary search here. Without them, the section "Using Binary Search" would not be like it as today.

References

History

  • 9 August, 2010 -- Original version posted
  • 8 September, 2010 -- Modified and rewrote some sections:
    • Replaced the section "The Worst Case of the switch Statement" with "Using Binary Search". Added the Binary Search Tree image.
    • Added the section "The Hybrid" 
    • Rewrote the section "More Questions and beyond"
    • Adjusted the paragraphs between first two sections
    • Added the explanation for a negative index in the section "The Two-Level Jump Table"
  • 28 September, 2010 -- Article updated
    • Clarified the unanswered question about sort prerequisite for the binary search.  
    • Added Acknowledgement and References
  •  19 December, 2012 -- Incorporated the memory discussion example from Christian Scholze

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Zuoliu Ding
Software Developer
United States United States
An Adjunct Faculty and Software Developer in Los Angeles and Orange County, CA
 
* Typical articles published in Dr. Dobb’s Journal and Windows Developer Magazine:

- A Silent Component Update for Internet Explorer
- Silent Application Update
- An MDI-Style Web Browser and Load Spy Monitor
- Implementing Wireless Print for WinNT/Win2K
- Multi-State Checkbox Tree Views
- A Generic Tool Tip Class
- An Easy Way to Add Tool Tips to Any MFC Control

- More from Google...

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
QuestionFantastic!membergcode112-Mar-13 18:02 
I learned a lot about the switch statement and how exactly it functions with your article. Thanks a lot!
 
Rahul
GeneralMy vote of 5memberClebson Derivan8-Jan-13 22:49 
great article (:
GeneralGreat Article!memberWiStRoM21-Dec-12 0:18 
Great Article! What a comprehensive analysis it is! Smile | :)
GeneralMy vote of 5memberaydinsahin20-Dec-12 22:40 
good article
GeneralGreat article. One suggestion.memberSoMad20-Dec-12 10:45 
I like this article and it addresses something I have heard several times from developers: "don't use switch-case because it is less efficient than an if-else" (I am para-phrasing and simplifying of course).
 
I missed this article when it was first published, so it is too late to jump in on any of the previous discussion, but the "Memory vs time" discussion made me think. In my CPU intensive projects, I set the optimization settings to "Maximize speed" and "Favor fast code" - the memory footprint is not important to me, but I think Christian Scholze does have a point in bringing it up.
I don't see any mentioning of the optimization settings you use, so I assume you just used the default (I know I can check the samples, but others might benefit from this discussion too).
 
My suggestion is that you recompile your samples using both "fast code" and "small code" optimizations. If there is no difference in how the compiler handles it, just make a note of that in the article. If there is a difference, I would like to see a table at the end of the article summarizing for each sample, which approach the compiler selects based on the optimization setting.
 

Thank you for your contribution.
 
Soren Madsen
GeneralRe: Great article. One suggestion.memberZuoliu Ding20-Dec-12 13:07 
Thank you so much for your suggestion of optimization idea. I saw the topic mentioned in earlier discussions already and looks definitely worth writing something here.
 
Yes, you are right that all code generated in this article with optimization disabled in compilation and link, so that compare and analyze the code under the same conditions originally.
 
For sure, any optimization either for speed or size, will be reflected in the disassemble code that has been demonstrated in my Assembly Programming class. To fully consider it, I have to take time to write another article in this topic. Hopefully I could get chance to think about soon.
 
Thanks for your interest and discussion, Happy holidays!
GeneralMy vote of 5mvpPaulo Zemek20-Dec-12 8:03 
Very good analysis of the generated result. I also liked that you've shown sequential and non-sequential situations. Keep the good work!
GeneralMy vote of 5memberH.Brydon19-Dec-12 16:31 
Great article. Thanks for writing (and updating) it. I knew about some of these details from books, other articles and courses I have taken, but you have provided the most details and done more research than anybody else I have seen.
GeneralRe: My vote of 5memberZuoliu Ding19-Dec-12 19:22 
Thanks for your nice comments. My pleasure to receive the encouragement from you, from a seasoned and experienced software veteran!
GeneralMy vote of 5memberhari1911330-Nov-12 15:25 
None has discussed Switch-Case in so detail...Good One...
GeneralInteresting Articlememberalibabo19-May-11 0:57 
The other day I was looking at my code to save few CPU clocks. It was very interesting for me that ternary operator takes extra 'mov' as compare to ordinary 'if else'. To my understanding it always takes extra mov for base pointer in case of ternary operator and I have no idea why it is so. Would love to know if someone can share why there is an extra 'mov'.
 

Below is food for thought:
 
Consider 'r' and 'i' are 'int' type variables.
r = i==0 ? 2 : 3;
cmpl $0x0,-0x4(%rbp)
jne 0x100000f2c <main+32>
movl $0x2,-0x14(%rbp)
jmp 0x100000f33 <main+39>
movl $0x3,-0x14(%rbp)
mov -0x14(%rbp),%eax
mov %eax,-0x8(%rbp)
 
And here we do again with classic 'if else' structure.
 
if(i==0)
cmpl $0x0,-0x4(%rbp)
jne 0x100000f30 <main+32>
r = 2;
movl $0x2,-0x8(%rbp)
jmp 0x100000f37 <main+39>
r =3;
movl $0x3,-0x8(%rbp)
 
I have complied this code via latest gcc.
GeneralMy vote of 5membermagicpapacy9-Jan-11 15:59 
I've learned a lot
GeneralMy vote of 5memberRobert Yuan25-Oct-10 18:09 
So observant.
GeneralMy vote of 5membersun-woo chung4-Oct-10 16:37 
inside details for better programming.
Questionquestion?memberetkins1-Oct-10 3:27 
is this true for all compilers? if this was compiled with a gnu compiler, would the thesis hold true? kind regards, david
David

AnswerRe: question?memberZuoliu Ding1-Oct-10 7:26 
Not sure what happens to the GNU compiler. To verify, just find the dissembler tool there to check code. While the implementation could be different, the basic idea should be similar, for example, the Case statement in Delphi.
GeneralMy vote of 5memberCorey Fournier29-Sep-10 4:37 
I love stuff like this.
GeneralMy vote of 5memberJohn Buller28-Sep-10 6:05 
More than just an exercise well worth doing, which it certainly is. Very well written and comprehensive. Will require several readings to extract all the ideas here.
GeneralMy vote of 5membersimon841028-Sep-10 0:39 
concept clear
GeneralMemory versus time usagememberChristian Scholze28-Sep-10 0:06 
A completely missed point is the memory footprint. (And don't tell me on modern systems this is not an issue!)
 
The jump tables described at the beginning of the article would implement O(1) behavior under all circumstances. But they are very memory intensive for sparsely populated switch cases. The perfect tradeoff is the binary search which reduces the memory requirements linear to the number of cases at the cost of a binary search with O(log n) behavior. There is no worse case than o(log n) for this type of switch implementation!
 
Only under rare circumstances a programmer would like to control the compiler decision which method to use (time versus memory). The majority of us should let the compiler decide what method to implement. If you definitely need O(1) behavior, write your own jump tables.
 
As an example for a really bad memory footprint imagine a switch statement with only two case entries 0 and 0x7fffffff. A jump table would consume 2 G entries which is more than any 32-bit computer can handle. Memory can be an issue on modern systems when wasted like this.
GeneralRe: Memory versus time usage - About your two-case examplememberZuoliu Ding29-Sep-10 8:34 
This is not true for the two-case example with a bit misunderstanding. As mentioned, we don't know under which conditions, the compiler chooses one implementation instead of the other. Please see below, your two-case example doesn't choose the jump table. It just converts switch to if-then without consuming any more memory here:
 
int main8()
{
    int i =3;
 
    switch (i)
    {
        case 1: f1(); break;
        case 0x7fffffff: f2(); break;
        default: f3(); break;
    }
 
    return 0;
}
 
VS generates:
 
; 5    :     int i =3;
	mov	DWORD PTR _i$[ebp], 3
 
; 6    : 
; 7    :     switch (i)

	mov	eax, DWORD PTR _i$[ebp]
	mov	DWORD PTR tv64[ebp], eax
	cmp	DWORD PTR tv64[ebp], 1
	je	SHORT $LN3@main8
	cmp	DWORD PTR tv64[ebp], 2147483647		; 7fffffffH
	je	SHORT $LN2@main8
	jmp	SHORT $LN1@main8
 
$LN3@main8:
; 8    :     {
; 9    :         case 1: f1(); break;

	call	?f1@@YAXXZ				; f1
	jmp	SHORT $LN4@main8
 
$LN2@main8:
; 10   :         case 0x7fffffff: f2(); break;

	call	?f2@@YAXXZ				; f2
	jmp	SHORT $LN4@main8
 
$LN1@main8:
; 11   :         default: f3(); break;

	call	?f3@@YAXXZ				; f3
$LN4@main8:
 
; 12   :     }
; 13   : 
; 14   :     return 0;

GeneralRe: Memory versus time usage - About your two-case examplememberChristian Scholze29-Sep-10 10:29 
You are definitely right. I did not check what the compiler makes out of my two case select statement. I actually assumed it does what it did. But this was not my point.
 
I wanted to emphasize that one can always make a tradeoff between memory and time. This was never mentioned in all the other threads. It looks like the compiler makes a really optimized choice, depending on the problem. I even assume you cannot force this compiler to do a bad decision what method to use by any example.
 
My two case example was more for people who argued against the binary search and preferred the jump table under all circumstances. Therefore I started a new thread and didn't answer to any other. An extremely sparse populated table is a waste of memory. It always was and it always will be. Mathemeticians found good alternatives for their implementation and the compiler adapts these methods obviously very well.
 
Thanks for the reply anyhow. I really enjoyed reading your initial post, because most people don't care about these implementation details. From management you often hear that memory is not an issue and you shouldn't optimize your code regarding memory. This examples show, that memory would be an issue, if someone (the makers of the compiler) didn't take care of that.
GeneralRe: Memory versus time usage - About your two-case examplememberZuoliu Ding29-Sep-10 12:43 
I see your point. Yes, we have no way to affect how a compiler choose which implementations just with such a black box testing.
 
Memory is really a factor in software design, while the importance somehow depends on prospective applications. However, a well-developed project definitely needs to consider memory consumption, especially on the lower-level or machine oriented design such as communications and devices.
 
Thank you for your interests and comments.
GeneralRe: Memory versus time usagememberpeterchen11-Dec-12 4:50 
Christian Scholze wrote:
The perfect tradeoff is the binary search

 
It's not perfect, as it involves O(log N) conditional jumps, compared to one indirect jump for th table. If code cache is saturated, but the table is stuck in L1, table wins.
 
I'd guess the condition for binary search being a sparsely populated large range, whereas a table might be better for a dense Population, even if the range is large.

Generalassemble languagememberd3orn27-Sep-10 22:42 
Hey very nice article about a thing we all on a daily frequency, also it was good to see how switch looks in assemble language, I'm not that good in it but it was clear and good to understand
Keep up the good work
cheers d3orn (from www.d3orn.ch)
GeneralMy vote of 5memberMember 460292827-Sep-10 10:16 
An extremely well thought out and thurough analysis.
 
Job well done!
GeneralSlightly confusedmemberharold aptroot27-Sep-10 10:09 
You mention that, in case the cases are not in order, a sort is not worth while. But it would happen at compile-time, and it is very cheap compared to some of the other things the compiler would be doing. To make something like that slow, you'd need millions of cases.
So I'm not sure what you mean there..
GeneralRe: Slightly confusedmemberZuoliu Ding27-Sep-10 12:38 
Hi Harold,
 
You are right. The sorting would be simply preprocessed in compilation, not necessary at run time, so that the translated assembly code only reflected the binary search. This really helps to clarify the unanswered question there. I'll update to include this.
 
Thank you very much for your nice comments,
Zuoliu Ding
Generalit`s deeply analyzed articlememberCarlos_never17-Sep-10 5:13 
very good!! reverse engineering, assemble language teaching is easily understand, and you also talk about the arithmetic, even the IDE`s optimization. maybe I shoult vote for you,because I want to learn reverse engineering and assemble recently.
GeneralThank you all for your support and encouragementmemberZuoliu Ding15-Sep-10 19:09 
Also thanks to the Code Project giving us such an excellent online forum Laugh | :laugh:
GeneralMy vote of 5memberBigdeak13-Sep-10 23:00 
Really nice article! This will help to understand the difference between if and switch Smile | :)
GeneralMy vote of 4memberDookie12-Sep-10 20:35 
Very interesting, easy to read and understand.
Question[My vote of 2] Switch or if then ...C/C++ or assembly?memberVaclav_Sal20-Aug-10 17:55 
The article has an academic value only for an assembly language study.
In practice switch will eventually process the intended code ( with > conditions evaluations) , however, "if then" when coded in improper sequence ( same > evaluation) will not.
My point has been proven indirectly and by accident in the original code already ( default: without break )
I do not advocate sloppy code, but I am against adding complexity to multi-branch code.
AnswerRe: [My vote of 2] Switch or if then ...C/C++ or assembly?memberScot Brennecke27-Sep-10 18:31 
Huh? I don't think you really know what you are talking about. Switch is almost alwys going to be more efficient when there are more than a couple of conditions to test. "default without break"? Maybe he corrected it before I read it, but I don't see any default without break that matters here.
Scot Brennecke
Software Developer
VC++ MVP

GeneralRe: [My vote of 2] Switch or if then ...C/C++ or assembly?memberVaclav_Sal28-Sep-10 7:38 
Scot,
Starting a reply with " you do not know what are you talking about" is IMHO arrogant and not worth continuing in the discussion with you.
Have a swell day.
 
Computer hobbyist
Vaclav
GeneralRe: [My vote of 2] Switch or if then ...C/C++ or assembly?memberScot Brennecke28-Sep-10 18:07 
My tone was simply a rebuttal to your insult of "The article has an academic value only for an assembly language study."
Scot Brennecke
Software Developer
VC++ MVP

GeneralRe: [My vote of 2] Switch or if then ...C/C++ or assembly?memberCorey Fournier29-Sep-10 4:45 
I agree, I don't see what this guy is talking about. First off, if your default is not the last, then it will always execute before anything after it. Because of that, you should always put the default last and there is no need for a break.
GeneralThe complexity in the last example is O(log n) not O(n)memberwtwhite18-Aug-10 5:25 
The fact that the initial comparison is with 700 shows that the compiler is (rather intelligently) attempting to pick approximate half-way points at each step, reducing the problem by (roughly) half at each point, down to a minimum problem size. So I fully expect that if you double the number of cases tested, you will find that the worst-case number of comparisons goes up by only 1 (or perhaps 2), instead of doubling as your O(n) analysis would have it.
 
It's pretty awesome that the compiler does this! And besides this mistake, your article is otherwise very interesting. Smile | :)
GeneralRe: The complexity in the last example is O(log n) not O(n)memberZuoliu Ding18-Aug-10 6:15 
Very good point! Yes, this would be why the compiler first compare 700. With this pattern, we can think of it as a binary search, so that the complexity is O(log n). Now the next question would be: is this the worst-case? Or is there any possibility of O(n) in switch execution? If not, it means that switch is always better than if-than-else.
 
Thank you very much.
GeneralRe: The complexity in the last example is O(log n) not O(n)memberwtwhite18-Aug-10 6:24 
No problem Smile | :) Yes, it's a binary search.
 
As to whether this means switch is always better than chains of if-elses, you would have to test it. My hypothesis: if the compiler is smart enough to turn a switch into a binary search when it is efficient to do so, it's likely that it can also analyse a chain of if-elses to see whether it is equivalent to some switch statement, and if so, perform that transformation followed by any of the transformations (2-layer jump table, 1-layer jump table, or binary search) that it can apply to switch statements. The level of analysis required for this (or at least, required to catch most cases) is pretty low when compared to other analyses performed for code optimisation. But that's just my guess.
GeneralRe: The complexity in the last example is O(log n) not O(n)membersupercat923-Aug-10 8:05 
I would not expect a compiler to rearrange 'if' statements. If the tests do not involve volatile variables and do not have side-effects, I suppose it could do resequence the tests, but sometimes programmers have reasons for writing things a certain way.
 
For example:
  if (var == 0)
    do_this();
  else if (var == 0x512B54A3)
    do_this();
  ... etc. for 30 more cases
if all cases are equally likely, using a binary search will reduce the worst-case number of tests from 31 to 5-6. On the other hand, it will also increase the best-case number of tests from 1 to 5-6. If the first tested value occurs 90%+ of the time, rewriting to use a binary search is unlikely to improve performance, and may make it a fair bit worse.
 
Incidentally, in such cases, if one or two values will occur much more often than any other, my tendency would be to use explicit 'if' tests for the very common cases, and a switch statement for the rest.
GeneralRe: The complexity in the last example is O(log n) not O(n)memberwtwhite23-Aug-10 17:37 
Good point, the programmer often has a much better idea than the compiler about which tests will most commonly succeed. So I agree it would be sensible for the compiler not to reorder if statements, even if it could.
GeneralRe: The complexity in the last example is O(log n) not O(n)memberCharvak Karpe24-Aug-10 12:30 
Maybe the lesson learned here is to use switch-case statements when you want the binary optimisation, and use if-then-else when you want to specify the order of testing.
 
Zuoliu, can you please correct your article to highlight that switch-case statements are O(log n)? I began the article thinking that I'd implement it using an O(n*log n) sort at compile time, giving O(log n) performance at runtime, and kept expecting to see that. By the end, I didn't catch that your O(n) example was actually using an O(log n) binary search and the only reason I wouldn't have checked the comments if I hadn't thought it could be better than O(n).
 
You can prove that the only situations in which the switch-case construct will require O(n) time are ones in which there are side effects to the tests. I'm sure it's poor programming practise to rely on side-effects of switch-case comparisons. As long as there are no side effects, the cases can always be sorted into ascending or descending order, and therefore can always be binary searched.
GeneralRe: The complexity in the last example is O(log n) not O(n)memberZuoliu Ding24-Aug-10 13:33 
Thank you for your suggestion. I'll update the third example to explain its binary search features soon. Hope to do around this Labor Day weekend. While I still cannot catch an example that requires O(n). Any idea? What do you mean that "side effects"?
GeneralRe: The complexity in the last example is O(log n) not O(n)memberwtwhite24-Aug-10 15:38 
I mostly agree, but I would warn against inferring anything about the behaviour of C++ compilers in general from a handful of results from a particular version of a particular compiler, as interesting as they may be! The behaviour could change in any new version. Also, many compilers can now perform profile-guided optimisation, which I suspect renders this kind of analysis irrelevant.
 
BTW, switch tests (unlike if tests) are always side-effect-free -- the language spec mandates that they can only ever be an equality comparison between an expression (which will be evaluated only once) and several literal values. So I expect Visual C++ will never generate an O(n) switch statement -- the only decision the compiler needs to make is whether to use an O(log n) test, or one of two kinds of O(1) jump-table operations that are even faster but can only work when certain conditions are satisfied.
GeneralRe: The complexity in the last example is O(log n) not O(n)memberCharvak Karpe25-Aug-10 4:01 
Side-effects occur when you have something like:
 
case ++i: dosomething(); break;
case --i: dosomethingelse(); break;
 
which is possible with an if-then-else setup, but apparently not possible with switch-case because the cases cannot be expressions or functions. So, it looks like there's never a situation in which switch-case would require O(n) time.
GeneralRe: The complexity in the last example is O(log n) not O(n)memberZuoliu Ding25-Aug-10 5:46 
Yes, I see. Thanks!
GeneralRe: The complexity in the last example is O(log n) not O(n)memberdojohansen24-Aug-10 4:11 
Zuoliu Ding wrote:
it means that switch is always better than if-than-else

 
Except of course that in nearly all cases where you would use either an if-then-else or a switch structure, the number of comparisons is so low that the asymptotic behavior is completely irrelevant! That said, the coefficients for if or switch has got to be rather similar, since they both do the same comparisons; and log n is clearly not greater than n for n >= 1.
 
Still, if we're in the academical corner, I think it's worth mentioning that asymptotic behavior isn't always the relevant metric for real-world performance. Smile | :)
GeneralRe: The complexity in the last example is O(log n) not O(n)memberZuoliu Ding24-Aug-10 6:45 
The problem is that with the analyzing method used here, we cannot trace an exact logic how the compiler rearranges the comparison sequence since there is no such operation found in assembly listing like switch3.asm. Actually, this rearrangement does make difference and can lead to sequential search or binary search. Unfortunately, the information is hidden and any discussion here is just hypothesis.
Questionbreak; after default?memberBlake Miller17-Aug-10 11:25 
In your fourth example:
 
        
        default: f3(); 
        case -3: f2(); break;
 
Won't case -3 be handled if the default is processed, since there is no break; to avoid falling into case -3 D'Oh! | :doh:

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web02 | 2.6.130617.1 | Last Updated 19 Dec 2012
Article Copyright 2010 by Zuoliu Ding
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid