Introduction
Many programming languages such as C/C++, C#, Java, and Pascal provide the switch
statement to let us implement selection logic. In some scenarios, it's a good alternative to if-then-else
, making code clearer and more readable. When using switch
in practice, you may want to know:
- How the
switch
block is executed at runtime? - Is it running faster than an
if-then-else
for a long list of conditions? - For n conditions, what is the
switch
time complexity?
The C/C++ standard defines the specification of language elements, but it doesn't say anything about how to implement the switch
statement. Every vendor is free to use any implementation as long as it fits the standard. This article discusses what happens when running a switch
statement in Visual C++, with a few examples under different conditions. We'll analyze these examples by using the Microsoft Visual Studio IDE, since it can generate the corresponding assembly listing on compiling. Thus, a general understanding of Intel (x86) assembly language is assumed. As you can see shortly, all results here are based on reverse engineering, and so the article is never a comprehensive description of switch
implementations in compilers. If you are learning Assembly Language Programming, this article might be a study material to read.
Our first example is switch1.cpp, a commonly used simple block as below:
#include "functions.h"
int main()
{
int i =3;
switch (i)
{
case 1: f1(); break;
case 2: f2(); break;
case 5: f1(); break;
case 7: f2(); break;
case 10: f1(); break;
case 11: f2(); break;
case 12: f2(); break;
case 17: f1(); break;
case 18: f1(); break;
default: f3();
}
return 0;
}
where three functions are defined in functions.cpp:
void f1() { cout "f1 called\n"; }
void f2() { cout "f2 called\n"; }
void f3() { cout "f3 called\n"; }
Could the worst case be thought of as i=3
or i=20
? How does it execute: does it exhaust all nine cases and finally reach the default
to call f3
? Let's answer it from the assembly translations.
In order to generate an assembly listing in Visual Studio, open the switch1.cpp Property dialog and select the Output Files category under C/C++. On the right pane, choose the Assembly With Source Code (/FAs) option as shown below:
Then, when you compile switch1.cpp, an assembly file called switch1.asm will be generated. Using this option, the listing includes the C++ source code, which is commented by the semi-colon with a line number as shown in the next section.
The Two-Level Jump Table
Let's analyze the assembly listing from top down. Here is where the switch
starts:
mov DWORD PTR _i$[ebp], 3
mov eax, DWORD PTR _i$[ebp]
mov DWORD PTR tv64[ebp], eax
mov ecx, DWORD PTR tv64[ebp]
sub ecx, 1
mov DWORD PTR tv64[ebp], ecx
cmp DWORD PTR tv64[ebp], 17
ja SHORT $LN1@main
mov edx, DWORD PTR tv64[ebp]
movzx eax, BYTE PTR $LN15@main[edx]
jmp DWORD PTR $LN16@main[eax*4]
Assuming that the symbol _i$[ebp]
is an alias of i
, tv64[ebp]
is another name of i2
, $LN15@main
is renamed as table1
, $LN16@main
as table2
, and $LN1@main
is a label named Default_Label
. The snippet merely does this in pseudocode:
i2 = i;
i2 = i2-1;
if i2 > 17 goto Default_Label;
goto table2[4*table1[i2]];
Here, 17 indicates the last case condition value, because the integers 1 to 18 are mapped from 0 to 17. This is why i2
is decremented to make it a zero-based integer as an index. Now, if i2
is greater than 17 (e.g., n=20
), the control goes to the default
. Otherwise, it goes to where table2[4*table1[i2]]
is pointed.
How about when i
is zero? Then i2
becomes -1. Worry about the index out of range error? No, it will never happen. Back to the assembly listing, you can see that -1 is saved as DWORD
, the double word as an unsigned 4-byte integer. Hence, it must be greater than 17 and goes to default
.
Let's look at two tables and see how they work together. The table1
is pretty simple, with a starting address at $LN15@main
, which you can just think of as an array name.
$LN15@main:
DB 0
DB 1
DB 9
DB 9
DB 2
DB 9
DB 3
DB 9
DB 9
DB 4
DB 5
DB 6
DB 9
DB 9
DB 9
DB 9
DB 7
DB 8
With this array, table1[0]
is 0, table1[1]
is 1, table1[2]
and table1[3]
are 9, etc. These values are created for the calculation of the index of table2
, which starts at $LN16@main
:
$LN16@main:
DD $LN10@main
DD $LN9@main
DD $LN8@main
DD $LN7@main
DD $LN6@main
DD $LN5@main
DD $LN4@main
DD $LN3@main
DD $LN2@main
DD $LN1@main
The above labels, from $LN10@main
to $LN1@main
, are ten calling targets in C++, for nine cases plus one default
. Notice that DB
represents defining byte (8 bits), while DD
defines the double word type of four bytes (32 bits). This is why we need to multiply 4 in table2[4*table1[i2]]
. By this formula, we calculate the calling address via table1
and table2
:
- if
i
equals 1, i2
is 0 and table1[0]
is 0, jump to $LN10@main
by table2[0]
, the first case. - if
i
equals 2, i2
is 1 and table1[1]
is 1, jump to $LN9@main
by table2[4*1]
, the second case. - if
i
equals 3, i2
is 2 and table1[2]
is 9, jump to $LN1@main
by table2[4*9]
, the default. - ... ...
Now we come to the code segment labeled by LN10@main
to $LN1@main
as the calling targets:
$LN10@main:
call ?f1@@YAXXZ
jmp SHORT $LN11@main
$LN9@main:
call ?f2@@YAXXZ
jmp SHORT $LN11@main
$LN8@main:
call ?f1@@YAXXZ
jmp SHORT $LN11@main
$LN2@main:
call ?f1@@YAXXZ
jmp SHORT $LN11@main
$LN1@main:
call ?f3@@YAXXZ
$LN11@main:
Functions f1
, f2
, and f3
are converted to the decorated assembly procedures, and prefixed with a question mark, like ?f1@@YAXXZ
, ?f2@@YAXXZ
, and ?f3@@YAXXZ
. Notice that $LN10@main
is a label of case 1 to call f1
, $LN9@main
for case 2 to call f2
, and $LN1@main
for default
to call f3
.
An additional label is $LN11@main
, pointing to the location after the last default
clause. As soon as each case operation is done, the control jumps to $LN11@main
. This implements the break
statement. Without it, the control goes to the next case. This is why the break
statement is necessary in the C/C++ switch
block.
Obviously, based on such a two-level table mechanism, we have one comparison, one multiplication, and two address jumps. The time complexity of this switch
pattern should be O(1). Put this all together, we have a big picture like this:
Recall that we use DB
to define index values in table1
, and DD
for table2
. In this example, there are only 18 case values for table1
. But DB
defining byte
means the range is only from zero to 255. What if there are more cases in switch
, with the number of values over 255?
The One-Level Jump Table
Here comes the second example, switch2.cpp with 1000 cases. For simplicity, we start from the condition value zero:
int main2()
{
int i =1000;
switch (i)
{
case 0: f1(); break;
case 1: f1(); break;
case 2: f2(); break;
case 3: f1(); break;
case 995: f1(); break;
case 996: f1(); break;
case 997: f1(); break;
case 998: f2(); break;
case 999: f1(); break;
default: f3();
}
return 0;
}
Don't think it makes things complicated. On the contrary, it becomes simpler. This is the switch2.asm that Visual Studio generates for us:
mov DWORD PTR _i$[ebp], 1000
mov eax, DWORD PTR _i$[ebp]
mov DWORD PTR tv64[ebp], eax
cmp DWORD PTR tv64[ebp], 999
ja $LN1@main2
mov ecx, DWORD PTR tv64[ebp]
jmp DWORD PTR $LN1006@main2[ecx*4]
Likewise, assume that the symbol _i$[ebp]
is an alias of i
. Because the first case starts from zero, no mapping is needed. So i2
(from tv64[ebp]
) equals i
. The array $LN1006@main2
is an only jump table, and the label $LN1@main2
is Default_Label
. This snippet simply does this:
i2 = i;
if i2 > 999 goto Default_Label;
goto table[4*i2];
By searching $LN1006@main2
in switch2.asm, we find the code labels all defined by double words:
$LN1006@main2:
DD $LN1001@main2
DD $LN1000@main2
DD $LN999@main2
DD $LN998@main2
DD $LN997@main2
DD $LN996@main2
DD $LN5@main2
DD $LN4@main2
DD $LN3@main2
DD $LN2@main2
Totally 1000 calling targets, each label is followed by a procedure call, downwards from $LN1001@main2
to $LN2@main2
:
$LN1001@main2:
call ?f1@@YAXXZ
jmp $LN1002@main2
$LN1000@main2:
call ?f1@@YAXXZ
jmp $LN1002@main2
$LN999@main2:
call ?f2@@YAXXZ
jmp $LN1002@main2
$LN3@main2:
call ?f2@@YAXXZ
jmp SHORT $LN1002@main2
$LN2@main2:
call ?f1@@YAXXZ
jmp SHORT $LN1002@main2
$LN1@main2:
call ?f3@@YAXXZ
$LN1002@main2:
Here, $LN1@main2
directs to default
, and the additional label $LN1002@main
implements break
. Although the complexity is also O(1), this mechanism should be a little bit efficient since only one address jump is required. The following is a picture of this example:
Until now, we see the switch
illustrations executed so well. Is it really better to use switch
than if-then-else
? For n
conditions without a match, definitely, if-then-else
must check each condition until the last, which is the worst case of O(n). But for switch
, do we always expect its complexity to be O(1)? Or is there some kind of code that would lead its execution beyond that?
Using Binary Search
We will give the third example showing the large gap between case condition values in switch3.cpp, where the switch
execution behaves as the binary search:
int main3()
{
int i =1;
switch (i)
{
case 100: f1(); break;
case 200: f2(); break;
case 250: f2(); break;
case 500: f1(); break;
case 700: f2(); break;
case 750: f2(); break;
case 800: f2(); break;
case 900: f1(); break;
default: f3();
}
return 0;
}
By generating the assembly listing switch3.asm, this time, we take a look from bottom up by checking the case operations first:
$LN9@main3:
call ?f1@@YAXXZ
jmp SHORT $LN10@main3
$LN8@main3:
call ?f2@@YAXXZ
jmp SHORT $LN10@main3
$LN7@main3:
call ?f2@@YAXXZ
jmp SHORT $LN10@main3
$LN6@main3:
call ?f1@@YAXXZ
jmp SHORT $LN10@main3
$LN5@main3:
call ?f2@@YAXXZ
jmp SHORT $LN10@main3
$LN4@main3:
call ?f2@@YAXXZ
jmp SHORT $LN10@main3
$LN3@main3:
call ?f2@@YAXXZ
jmp SHORT $LN10@main3
$LN2@main3:
call ?f1@@YAXXZ
jmp SHORT $LN10@main3
$LN1@main3:
call ?f3@@YAXXZ
$LN10@main3:
Pretty neat, with eight cases plus one default
, we can easily write this part into a list of nine rows, each corresponding to a target label, a C++ case, and a procedure name:
Besides $LN9@main3
to $LN1@main3
, we also have $LN10@main3
implement the break
statement. Next, how do we jump to these calling targets? Look at the following code:
mov DWORD PTR _i$[ebp], 1
mov eax, DWORD PTR _i$[ebp]
mov DWORD PTR tv64[ebp], eax
cmp DWORD PTR tv64[ebp], 700
jg SHORT $LN14@main3
cmp DWORD PTR tv64[ebp], 700
je SHORT $LN5@main3
cmp DWORD PTR tv64[ebp], 250
jg SHORT $LN15@main3
cmp DWORD PTR tv64[ebp], 250
je SHORT $LN7@main3
cmp DWORD PTR tv64[ebp], 100
je SHORT $LN9@main3
cmp DWORD PTR tv64[ebp], 200
je SHORT $LN8@main3
jmp SHORT $LN1@main3
$LN15@main3:
cmp DWORD PTR tv64[ebp], 500
je SHORT $LN6@main3
jmp SHORT $LN1@main3
$LN14@main3:
cmp DWORD PTR tv64[ebp], 750
je SHORT $LN4@main3
cmp DWORD PTR tv64[ebp], 800
je SHORT $LN3@main3
cmp DWORD PTR tv64[ebp], 900
je SHORT $LN2@main3
jmp SHORT $LN1@main3
The logic is not too hard to understand. At first, the condition value i
is saved in both _i$[ebp]
and tv64[ebp]
. To simplify, we just make use of all the labels here by removing the leading "$" and the tailing "@main3". Rewrite the snippet like this:
i2 = i;
if i2 > 700 goto LN14;
if i2 == 700 goto LN5;
if i2 > 250 goto LN15;
if i2 == 250 goto LN7;
if i2 == 100 goto LN9;
if i2 == 200 goto LN8;
goto LN1;
LN15:
if i2 == 500 goto LN6;
goto LN1;
LN14:
if i2 == 750 goto LN4;
if i2 == 800 goto LN3;
if i2 == 900 goto LN2;
goto LN1;
Surely, the compiler optimizes the code, because it first chooses 700 to compare, instead of the beginning case value 100 in the switch
block. Although this switch
is implemented with ten if-then
statements for nine conditions, it actually applies the binary search mechanism. The following shows the equivalent decision tree of above code, a Binary Search Tree that you may be familiar with:
For all internal nodes colored yellow in circle, we make the comparisons, while all leave nodes in oval represent the successful ending. Each failure goes to LN1
by default. The inorder traversal sequence for comparison nodes is 100, 200, 250, 500, 700, 750, 800, and 900; while the sequence for all leaves in the inorder traversal is exactly LN9
, LN8
, LN7
, LN6
, LN5
, LN4
, LN3
, and LN2
, as ordered in the previous table. Essentially, this is a binary search algorithm with the complexity of O(log n). When i
=1, it will pass through the first six comparisons and then reach default
at LN1
. But when i
=500, just four comparisons are required.
As we know, the prerequisite of binary search is that the input data is sorted. I wonder if this is just because I use the ascending case values in switch
in switch3.cpp. The curiosity brings out the following disordered conditions in switch4.cpp.
int main4()
{
int i =1;
switch (i)
{
case 750: f2(); break;
case 700: f2(); break;
case 250: f2(); break;
case 500: f1(); break;
case 800: f2(); break;
case 900: f1(); break;
case 100: f1(); break;
case 200: f2(); break;
default: f3();
}
return 0;
}
To my surprise, the same binary search strategy appears in code of switch4.asm, with exact the same decision tree as shown above. Only difference is that the labels are renumbered - that's quite reasonable, because we have just reordered them! You can examine the attached switch4.asm for details.
This experiment definitely brings us some hint to find out how the compiler does its magic to sort the case condition values. A sorting algorithm is over O(log n) and not worthwhile to have it at runtime. Notice that sorting is not visible from generated assemblies. This means that sorting is not contained in the assembly instructions which will execute at runtime. Also notice that all the condition values are constant and available before the compiler translates the C++ to machine code. Now it’s reasonable to think of the preprocessor; the sorting with all known case values could be simply done in compilation. This is why the translated assembly code only reflects the binary search without sorting code. The static sorting behavior (as opposed to dynamic behavior at runtime), could be implemented with a macro procedure, assembly directives and operators. Such an preprocessing example can be found in References at the end of the article.
The Hybrid
At this moment, I can show an example of combination of a jump table and the binary search as in switch5.cpp.
int main5()
{
int i =1;
switch (i)
{
case 3: f1(); break;
case 5: f2(); break;
case 6: f2(); break;
case 7: f2(); break;
case 100: f1(); break;
case 200: f2(); break;
case 250: f2(); break;
case 700: f2(); break;
case 750: f2(); break;
case 900: f2(); break;
default: f3();
}
return 0;
}
The following is how this switch
block is implemented in the corresponding switch5.asm:
mov DWORD PTR _i$[ebp], 1
mov eax, DWORD PTR _i$[ebp]
mov DWORD PTR tv64[ebp], eax
cmp DWORD PTR tv64[ebp], 100
jg SHORT $LN16@main5
cmp DWORD PTR tv64[ebp], 100
je $LN7@main5
mov ecx, DWORD PTR tv64[ebp]
sub ecx, 3
mov DWORD PTR tv64[ebp], ecx
cmp DWORD PTR tv64[ebp], 4
ja $LN1@main5
mov edx, DWORD PTR tv64[ebp]
jmp DWORD PTR $LN18@main5[edx*4]
$LN16@main5:
cmp DWORD PTR tv64[ebp], 700
jg SHORT $LN17@main5
cmp DWORD PTR tv64[ebp], 700
je SHORT $LN4@main5
cmp DWORD PTR tv64[ebp], 200
je SHORT $LN6@main5
cmp DWORD PTR tv64[ebp], 250
je SHORT $LN5@main5
jmp SHORT $LN1@main5
$LN17@main5:
cmp DWORD PTR tv64[ebp], 750
je SHORT $LN3@main5
cmp DWORD PTR tv64[ebp], 900
je SHORT $LN2@main5
jmp SHORT $LN1@main5
It consists of two parts, a one-level jump table first and then the binary search code. Using the previous notations of i
, i2
and label naming, let $LN18@main5
as a table and $LN1@main5
as default
. This snippet does this:
i2 = i;
if i2 > 100 goto LN16;
if i2 == 100 goto LN7; // go case 100
// now i2 < 100
i2 = i2-3; // map to zero-based index
if i2 > 4 goto LN1; // go default over 7
goto table[4*i2]; // go jump table
LN16: // binary search
if i2 > 700 goto LN17;
if i2 == 700 goto LN4; // go case 700
if i2 == 200 goto LN6; // go case 200
if i2 == 250 goto LN5; // go case 250
goto LN1; // go default
LN17:
if i2 == 750 goto LN3; // go case 750
if i2 == 900 goto LN2; // go default
goto LN1;
where the table is defined as:
$LN18@main5:
DD LN11
DD LN1
DD LN10
DD LN9
DD LN8
An interesting change is to remove the case 6 in switch5.cpp. Just because of this, the hybrid becomes the combination of a two-level jump table and the binary search. For details, look at switch6.cpp and switch6.asm in downloadable samples. If you try to add extra cases in some order, you can find two individual jump tables combined with the binary search.
More Questions and Beyond
Now you have learned something about switch
from some typical examples. You should understand now why the C/C++ switch
only supports an integral data type for its conditional expression, such as char
and enumeration type, rather than the float
point or string
type. Besides these, more intuitive questions might come to your head:
- Is it necessary to maintain the cases in the order of condition values for a
jump
table? - How about if we use negative integers as case values?
- What if the
default
label is missing, or appears anywhere in the block not at the last?
I believe you can answer these just by analyzing an assembly listing of the C++ code containing these questions. For convenience, I attached the following switch7.cpp with its switch7.asm in samples.
int main7()
{
int i =15;
switch (i)
{
case 2: f2(); break;
case 1: f1(); break;
case 5: f1(); break;
case 10: f1(); break;
case 7: f2(); break;
default: f3(); break;
case -3: f2(); break;
case 12: f1(); break;
case 17: f2(); break;
case 18: f1(); break;
}
return 0;
}
Although the switch
performance looks better than if-than-else
in these examples, we still have questions unanswered. Obviously, it's not possible to enumerate all the switch
executions in practice. A comprehensive analysis of switch
implementations should be written by a compiler developer, not by a blind black-box tester. So, we are not able to determine the worst case in the switch
execution, whether it would reach O(n) or not. Also we don't know under which conditions, the compiler chooses one implementation instead of another or even other methods unmentioned here.
As an example from a reader in discussion, one concern is about the memory waste of above said jump tables when implementing sparsely populated switch cases. How about an extreme example with only two case entries 1
and 0x7fffffff
as below? Would the compiler consume most useless entries in a jump table?
int main8()
{
int i =3;
switch (i)
{
case 1: f1(); break;
case 0x7fffffff: f2(); break;
default: f3(); break;
}
return 0;
}
This is not true for our smart compiler.
As mentioned, we don't know how the compiler chooses
one implementation instead of the other. Please see here, this
two-case example doesn't choose the jump table. It just simply converts switch
to if-then
without consuming any more memory:
mov DWORD PTR _i$[ebp], 3
mov eax, DWORD PTR _i$[ebp]
mov DWORD PTR tv64[ebp], eax
cmp DWORD PTR tv64[ebp], 1
je SHORT $LN3@main8
cmp DWORD PTR tv64[ebp], 2147483647
je SHORT $LN2@main8
jmp SHORT $LN1@main8
$LN3@main8:
call ?f1@@YAXXZ
jmp SHORT $LN4@main8
$LN2@main8:
call ?f2@@YAXXZ
jmp SHORT $LN4@main8
$LN1@main8:
call ?f3@@YAXXZ
$LN4@main8:
Again, no way to affect or predict the compiler's choices with such a black box testing. Depending on the problem, either sparsely or densely populated switch cases, the compiler adapts them obviously very well.
Summary
By scrutinizing the above examples, we expose something that you may not know about the switch
at runtime. To analyze a C/C++ program in Visual Studio, we can have both static and dynamic analyses. For this article, we make use of the assembly listings generated by the compiler. On the other hand, we can also monitor binary executions at runtime with the VS Disassembly window in debug, where the labels and procedures in the listing are translated to memory addresses. This way, you can trace registers and memory allocations to understand data endianness, stack frames, etc. Here, for our purpose, the assembly listing seems sufficient and easy to indicate a jump from here to there. The downloadable zip file contains seven examples, with both the .cpp source code and the .asm listings in VS 2010. The same listings can be generated by an earlier version of VS.
This article does not involve hot technologies like .NET or C#. Assembly Language is comparatively traditional without much sensation these days. However, in applications, different types of assembly languages play an important role in new device development. Academically, Assembly Language Programming is considered as a demanding course in Computer Science. I am teaching Assembly Language Programming, CSCI 241, at Fullerton College, where the concept of high-level language interfacing with low-level execution is an essential part in teaching. In particular, one of the main topics that students are interested in is how C/C++ or Java runs on a machine, represented by low-level data structures and algorithms. Therefore, I hope this article could also serve as one of the basic problem-solving examples for students. Any comments and suggestions are welcome.
Acknowledgment
I would like to sincerely thank wtwhite, Charvak Karpe, and Harold Aptroot for their valuable discussions about the binary search here. Without them, the section "Using Binary Search" would not be like it as today.
References
History
- 9 August, 2010 -- Original version posted
- 8 September, 2010 -- Modified and rewrote some sections:
- Replaced the section "The Worst Case of the
switch
Statement" with "Using Binary Search". Added the Binary Search Tree image. - Added the section "The Hybrid"
- Rewrote the section "More Questions and beyond"
- Adjusted the paragraphs between first two sections
- Added the explanation for a negative index in the section "The Two-Level
Jump
Table"
- 28 September, 2010 -- Article updated
- Clarified the unanswered question about sort prerequisite for the binary search.
- Added Acknowledgement and References
- 19 December, 2012 -- Incorporated the memory discussion example from Christian Scholze
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.