printf is magical. Have you ever stopped to ask yourself how it works?
Unlike most functions, it accepts a variable number of arguments and somehow transforms them into a formatted string! The GNU code of printf is pretty simple:
printf (const char *format, ...) { |
If you look closely, it uses a weird ... syntax, performs a couple of va_ calls, and makes one vfprintf call.
To understand printf, we first need to understand how va_ works, and then move on to printf.
If you’re ready for some hard-core C and assembly, start by reading how va_ works!
VA_
The va_ family of macros manipulates a pointer into the stack, which points to the beginning of the variable argument “list”. This pointer is calculated from the argument passed to va_start, and then va_arg “pops” values from the “stack” as it iterates.
That was a lot to process. Let’s look at a concrete example to see what’s really going on.
|
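For reference while reading the walkthrough, here is a minimal sketch of a variadic sum function of the kind discussed here. It is a reconstruction from the surrounding text (the names sum, numOfArgs and args come from the prose), not necessarily the exact listing, and it assumes every variable argument is an int.

```c
#include <assert.h>
#include <stdarg.h>

/* Sketch: adds up numOfArgs int values passed after the count.
   The caller must pass exactly numOfArgs extra ints. */
int sum(int numOfArgs, ...) {
    assert(numOfArgs >= 0);

    va_list args;
    va_start(args, numOfArgs);        /* start right after the last named argument */

    int total = 0;
    for (int i = 0; i < numOfArgs; i++) {
        total += va_arg(args, int);   /* read one int and advance */
    }

    va_end(args);                     /* always pair va_start with va_end */
    return total;
}
```

With the values used in the walkthrough, sum(3, 14, 29, 46) evaluates to 89.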
First, main is called. The following is simplified assembly code for main:
push 46 |
Those push operations fill up the stack:
+------+ |
- sp is the real stack pointer.
- 14, 29 and 46 are the arguments.
- ret is the return address: where to jump to when the function is done.
Next, va_start(args, numOfArgs) takes the address of numOfArgs and uses it to calculate the position of the first variable argument.
+------+ |
Next, va_arg(args, int) returns the value that the ap stack pointer points to, and then increments it to point at the next argument.
+------+ |
And so on, until we’re done. Of course this is simplified, and the real code is more complex.
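The pointer walk described above can be modelled in portable C by treating a plain int array as the “stack”. This is only a toy model: toy_va_list, toy_va_start, toy_va_arg and toy_sum are hypothetical names, and real implementations differ (on x86-64, for example, the first arguments usually travel in registers rather than on the stack).

```c
#include <assert.h>

/* Toy model: the "stack" is an int array laid out as
   [numOfArgs, arg1, arg2, ...], mirroring the diagrams above. */
typedef const int *toy_va_list;

/* Like va_start: point just past the last named argument. */
static toy_va_list toy_va_start(const int *last_named) {
    return last_named + 1;
}

/* Like va_arg(ap, int): return the current slot, then advance. */
static int toy_va_arg(toy_va_list *ap) {
    int value = **ap;
    (*ap)++;
    return value;
}

/* Sums the values that follow the count stored in frame[0]. */
int toy_sum(const int *frame) {
    int numOfArgs = frame[0];
    assert(numOfArgs >= 0);

    toy_va_list ap = toy_va_start(&frame[0]);
    int total = 0;
    for (int i = 0; i < numOfArgs; i++) {
        total += toy_va_arg(&ap);
    }
    return total;
}
```

Given the frame {3, 14, 29, 46}, toy_sum reads 14, 29 and 46 in turn and returns 89. Nothing stops toy_va_arg from walking past the end of the array, which is the same weakness the real macros have.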
Dangers
You’ve probably noticed that va_ relies on the programmer to provide a way to figure out how many arguments were passed. Users can easily misuse a variadic function and introduce a security vulnerability if they keep calling va_arg and read data beyond the arguments that were actually passed.
Assembly
Let’s recap the code we’re talking about:
|
Done reading? Awesome. The following assembly is a simplified version of the above, without unnecessary boilerplate.
It was generated using gcc:
gcc -m32 -S sum.c
#################### |
Now that we understand how va_ works, we can talk about printf.
printf
Again, let’s recap:
printf (const char *format, ...) { |
See those va_ calls? In our sum function, we used the first argument to indicate how many arguments we have. printf uses the format argument for the same purpose.
Actually, most of the magic is done in vfprintf. printf is only a wrapper around vfprintf that writes the output string to stdout. I suggest you read vfprintf’s GNU implementation; it only has 2278 lines of code ;)
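The wrapper pattern described above can be sketched in a few lines. my_printf is a hypothetical name used for illustration; this shows the general shape of such a wrapper built on the standard vfprintf, and is not a copy of the GNU source.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Sketch of a printf-style wrapper: collect the variable arguments
   into a va_list and hand them to vfprintf, targeting stdout. */
int my_printf(const char *format, ...) {
    assert(format != NULL);

    va_list args;
    va_start(args, format);                        /* format is the last named argument */
    int written = vfprintf(stdout, format, args);  /* the real work happens here */
    va_end(args);

    return written;  /* number of characters written, or negative on error */
}
```

A call such as my_printf("%d-%s\n", 7, "ok") behaves like the equivalent printf call and returns the number of characters written.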
I said earlier that the format string is used as an indicator of the number of arguments. Actually, it serves two more purposes:
- It gives the type of each argument, which determines the argument’s size and therefore the position of the next argument.
- It gives the type so the function knows how to convert the argument to characters.
So when parsing the format, vfprintf recognizes the % tokens, and for each token it loads one more argument from the stack. Then it runs some magical transformation code and keeps going. That’s basically it.
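To make the parsing loop concrete, here is a heavily reduced sketch. mini_format and append are hypothetical names, and the function understands only %d, %s, %c and %%, with no widths, flags or length modifiers. It writes into a caller-supplied buffer instead of stdout so the result is easy to inspect.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append src to out, never exceeding cap bytes including the NUL. */
static void append(char *out, size_t cap, const char *src) {
    size_t used = strlen(out);
    if (used + 1 >= cap) return;            /* buffer already full */
    strncat(out, src, cap - used - 1);
}

/* Toy formatter: the format string decides how many arguments are
   read and which type each va_arg call asks for. */
void mini_format(char *out, size_t cap, const char *format, ...) {
    assert(out != NULL && cap > 0);
    out[0] = '\0';

    va_list args;
    va_start(args, format);
    for (const char *p = format; *p != '\0'; p++) {
        char piece[32];
        if (*p != '%') {                    /* ordinary character: copy it */
            piece[0] = *p; piece[1] = '\0';
            append(out, cap, piece);
            continue;
        }
        p++;                                /* look at the conversion letter */
        if (*p == '\0') break;              /* lone '%' at the end: stop */
        switch (*p) {
        case 'd':                           /* int -> decimal digits */
            snprintf(piece, sizeof piece, "%d", va_arg(args, int));
            append(out, cap, piece);
            break;
        case 's':                           /* pointer to a C string */
            append(out, cap, va_arg(args, const char *));
            break;
        case 'c':                           /* char is promoted to int */
            piece[0] = (char)va_arg(args, int); piece[1] = '\0';
            append(out, cap, piece);
            break;
        case '%':                           /* literal percent, no argument */
            append(out, cap, "%");
            break;
        default:                            /* unknown token: copy verbatim */
            piece[0] = '%'; piece[1] = *p; piece[2] = '\0';
            append(out, cap, piece);
            break;
        }
    }
    va_end(args);
}
```

For example, mini_format(buf, sizeof buf, "%s has %d items", "cart", 3) leaves "cart has 3 items" in buf. Note that %% consumes no argument, so the number of % characters alone does not give the argument count.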
P.S. Remember we talked about the dangers of variadic functions? Well, the Format String Attack is considered one of the Top 25 Most Dangerous Software Errors a programmer can make.
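The usual defence is to never pass untrusted text as the format argument. The sketch below uses a hypothetical helper, render_untrusted, built on the standard snprintf: the format is a fixed "%s", so any % tokens inside the user's text are copied as plain characters instead of being interpreted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Copies user_input into out as data, never as a format string.
   Returns what snprintf returns: the length the full text needs. */
int render_untrusted(char *out, size_t cap, const char *user_input) {
    assert(out != NULL && cap > 0 && user_input != NULL);

    /* Unsafe alternative (do not do this):
       snprintf(out, cap, user_input);
       Here "%x" or "%n" in user_input would make snprintf read or
       write through arguments that were never passed. */
    return snprintf(out, cap, "%s", user_input);
}
```

Passing the string "%x %x %n" to render_untrusted simply yields the same eight characters in the output buffer.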