I like this article, and it addresses something I have heard several times from developers: "don't use switch-case because it is less efficient than an if-else" (I am paraphrasing and simplifying, of course).
I missed this article when it was first published, so it is too late to jump in on any of the previous discussion, but the "Memory vs time" discussion made me think. In my CPU intensive projects, I set the optimization settings to "Maximize speed" and "Favor fast code" - the memory footprint is not important to me, but I think Christian Scholze does have a point in bringing it up.
I don't see any mention of the optimization settings you used, so I assume you just used the defaults (I know I could check the samples, but others might benefit from this discussion too).
My suggestion is that you recompile your samples using both "fast code" and "small code" optimizations. If there is no difference in how the compiler handles it, just make a note of that in the article. If there is a difference, I would like to see a table at the end of the article summarizing for each sample, which approach the compiler selects based on the optimization setting.
Thank you for the suggestion about optimization settings. The topic came up in earlier discussions as well, and it definitely looks worth writing something about here.
Yes, you are right: all the code in this article was generated with optimization disabled for both compilation and linking, so that the code could be compared and analyzed under the same conditions.
Any optimization, whether for speed or for size, will certainly be reflected in the disassembled code, as I have demonstrated in my Assembly Programming class. To cover it properly, I would need to take the time to write another article on this topic. Hopefully I will get the chance to think about it soon.
Thanks for your interest and discussion. Happy holidays!
Great article. Thanks for writing (and updating) it. I knew about some of these details from books, other articles and courses I have taken, but you have provided the most details and done more research than anybody else I have seen.
The other day I was looking at my code trying to save a few CPU clocks. It was very interesting to me that the ternary operator takes an extra 'mov' compared to an ordinary 'if-else'. To my understanding it always takes an extra mov through the base pointer in the case of the ternary operator, and I have no idea why. I would love to know if someone can explain where that extra 'mov' comes from.
Below is food for thought:
Consider 'r' and 'i' are 'int' type variables.
r = i==0 ? 2 : 3;
cmpl $0x0,-0x4(%rbp)
jne 0x100000f2c <main+32>
movl $0x2,-0x14(%rbp)
jmp 0x100000f33 <main+39>
movl $0x3,-0x14(%rbp)
mov -0x14(%rbp),%eax
mov %eax,-0x8(%rbp)
And here we do again with classic 'if else' structure.
if(i==0)
cmpl $0x0,-0x4(%rbp)
jne 0x100000f30 <main+32>
r = 2;
movl $0x2,-0x8(%rbp)
jmp 0x100000f37 <main+39>
r = 3;
movl $0x3,-0x8(%rbp)
I'm not sure what the GNU compiler does here. To verify, just find a disassembler tool on that platform and check the generated code. While the implementation may differ between compilers, the basic idea should be similar; the same goes for, say, the case statement in Delphi.
This is more than just an exercise, though it is certainly well worth doing. Very well written and comprehensive. It will take several readings to extract all the ideas here.
A completely missed point is the memory footprint. (And don't tell me on modern systems this is not an issue!)
The jump tables described at the beginning of the article give O(1) behavior under all circumstances. But they are very memory intensive for sparsely populated switch cases. The good tradeoff is the binary search, which reduces the memory requirement to linear in the number of cases at the cost of O(log n) lookup time. There is no case worse than O(log n) for this type of switch implementation!
Only under rare circumstances would a programmer want to control the compiler's decision about which method to use (time versus memory). The majority of us should let the compiler decide which method to implement. If you definitely need O(1) behavior, write your own jump table.
As an example of a really bad memory footprint, imagine a switch statement with only two case entries, 0 and 0x7fffffff. A jump table would consume 2G entries, which is more than any 32-bit computer can handle. Memory can be an issue on modern systems when it is wasted like this.
That is not quite true for the two-case example; there is a bit of a misunderstanding. As mentioned, we don't know under which conditions the compiler chooses one implementation over the other. As you can see below, for your two-case example the compiler does not choose a jump table. It simply converts the switch to if-then without consuming any extra memory:
void f1(); void f2(); void f3();

int main8()
{
    int i = 3;
    switch (i)
    {
    case 1: f1(); break;
    case 0x7fffffff: f2(); break;
    default: f3(); break;
    }
    return 0;
}
VS generates:
; 5 :    int i = 3;
mov DWORD PTR _i$[ebp], 3
; 6 :
; 7 :    switch (i)
mov eax, DWORD PTR _i$[ebp]
mov DWORD PTR tv64[ebp], eax
cmp DWORD PTR tv64[ebp], 1
je SHORT $LN3@main8
cmp DWORD PTR tv64[ebp], 2147483647; 7fffffffH
je SHORT $LN2@main8
jmp SHORT $LN1@main8
$LN3@main8:
; 8 : {
; 9 : case 1: f1(); break;
call ?f1@@YAXXZ ; f1
jmp SHORT $LN4@main8
$LN2@main8:
; 10 : case 0x7fffffff: f2(); break;
call ?f2@@YAXXZ ; f2
jmp SHORT $LN4@main8
$LN1@main8:
; 11 : default: f3(); break;
call ?f3@@YAXXZ ; f3
$LN4@main8:
; 12 : }
; 13 :
; 14 : return 0;
You are definitely right. I did not check what the compiler makes of my two-case switch statement; I simply assumed it would do what I expected. But that was not my point.
I wanted to emphasize that one can always make a tradeoff between memory and time. This was never mentioned in any of the other threads. It looks like the compiler makes a well-optimized choice depending on the problem. I would even assume you cannot force this compiler into a bad decision about which method to use with any example.
My two-case example was aimed more at people who argued against the binary search and preferred the jump table under all circumstances. That is why I started a new thread instead of replying to any other. An extremely sparsely populated table is a waste of memory. It always was and it always will be. Mathematicians found good alternatives for its implementation, and the compiler obviously adopts these methods very well.
Thanks for the reply anyhow. I really enjoyed reading your initial post, because most people don't care about these implementation details. From management you often hear that memory is not an issue and that you shouldn't optimize your code for memory. These examples show that memory would be an issue if someone (the makers of the compiler) didn't take care of it.
I see your point. Yes, with this kind of black-box testing we have no way to affect how a compiler chooses between implementations.
Memory really is a factor in software design, although its importance depends on the intended application. A well-developed project certainly needs to consider memory consumption, especially in lower-level or machine-oriented designs such as communications and device software.
It's not perfect, as it involves O(log N) conditional jumps, compared to one indirect jump for the table. If the code cache is saturated but the table stays in L1, the table wins.
I'd guess the condition for choosing the binary search is a sparsely populated large range, whereas a table might be better for a dense population, even if the range is large.
Hey, very nice article about something we all use on a daily basis. It was also good to see how switch looks in assembly language; I'm not that good at it, but the article was clear and easy to understand.
Keep up the good work
cheers d3orn (from www.d3orn.ch)