this post is a bit low level. I’ve been writing a lot of C & C++ recently, and found a few concepts eye opening. This post is dedicated to all the people that think that switch
is a glorified if
.
I’m going to do my best to avoid writing a lot of assembly, and make this post as easy as possible. Don’t worry if you don’t know any assembly, I’ll explain everything!
The if statement
First, we need to understand how if
statements are transformed from c
to assembly
.
We’ll use gcc’s -S
flag to generate the assembly of our functions:
gcc -O2 -S -c foo.c |
Here’s a simple example of a max
implementation:
int max(int x, int y) { |
disassembly | assembly | ||
---|---|---|---|
|
|
move
move 8(R8), R3 |
x
and y
are saved to registers R3
and R1
respectively.
Special Registers
R8
A register that holds the pointer to the beginning of the frame. For example:
move 8(R8), R3 |
- get the address of X at
R8 + sizeof(pointer) * 8
- all variables are stacked after the frame pointer
- move the value that is addressed by the above, and put it in
R3
.
R1
A register dedicated for return values
. y
is already saved to R1, so if x < y
, there’s nothing to do. otherwise, we move the content of R3
to R1
and return.
compare
compare R1,R3 |
compare R1
and R3
. compare
sets a register flag that is used by jle
to determine if it should jump to L9
label or not.
jle
jle L9 |
jump lower or equal. In other words: jump to label L9
if x < y
. otherwise, go to the next instruction.
! Remember that the value that’s currently saved in R1
is the one that’s used as the return value.
The switch statement
If switch
was a glorified if
, then the following would’ve compiled into a bunch of move-compare-jle-move statements.
typedef enum {ADD, MULT, MINUS, DIV, MOD, BAD} op_type; |
but actually, it’s compiled to this weird beast:
unparse_symbol: |
What’s going on here?
Jump Table
Each block is labeled, and a table is created at compile time to bind an op to a label. These “bind table” is also called a “Jump Table”:
.section .rodata |
As you can see, there are five blocks. One for each case:
.L51: |
If the following was C code, It would probably look similar to this:
// get the pointer to the block of code that belongs to 'op' |
and back to the assembly…
move 8(R8),R1 // move the switch operation to R1 |
Lets talk a bit about the last instruction, because it’s the most complicated one:
jmp *.L57(,R1,4) |
.L57
is address of the jump table.Lxx
is translated into an address by the assembler.- The jump table is layout sequentially in memory, so all the cells are squeezed next to each other.
- The
switch
operation is the index in the table. In 32 bit architectures, We’re talking about 4 bytes.
All in all, the address of the op’s cell will be at: L57 + op * 4
in a 32bit machine.
In conclusion, we only need one instruction to:
- access the jump table
- find the right cell
- take the label address that’s saved in the cell
- jump to that location (which holds the code block)
cool right?!
Switch optimizations
int div111(int x) { |
Now the cases aren’t laid out sequentially in memory. would that produce a 1000 cell jump table?
Theoretically, without any optimization, the jump table would’ve contained many empty cells. Instead, the compiler performs a neat optimization: It produces a binary search tree to find the right case on runtime:
move 8(R8),R1 |
If you look closely, you’ll see that the compiler actually removed the jump table and replaced it with compare
& jump equal
calls. In other words, it replaced the switch statement with a “regular” if.
That’s kind of awesome no? I find it really cool that compilers can perform such neat tricks.