# How does printf really work?

printf is magical. Did you ever stop and ask yourself how it works?

Unlike most functions, it accepts a variable number of arguments and somehow transforms them into a formatted string! The GNU code of printf is pretty simple:

If you look closely, it uses a weird ... syntax, performs a couple of va_ calls, and makes one vfprintf call.
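As a rough sketch of that shape (this is not the actual GNU source, and my_printf is a made-up name for illustration), a printf-style wrapper could look something like this:

```c
#include <stdarg.h>
#include <stdio.h>

/* my_printf is a hypothetical name; this illustrates the shape only,
   it is not glibc's implementation. */
int my_printf(const char *format, ...)
{
    va_list args;
    int written;

    va_start(args, format);                   /* begin reading after `format` */
    written = vfprintf(stdout, format, args); /* the real formatting happens here */
    va_end(args);                             /* every va_start needs a va_end */

    return written; /* number of characters written, negative on error */
}
```

Calling my_printf("%d-%s\n", 7, "ok") should behave just like the same printf call, because all the formatting is delegated to vfprintf.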

To understand printf, we first need to understand how va_ works, and then move on to printf itself.

If you’re ready for some hard-core C and assembly, start by reading how va_ works!

## VA_

The va_ family of macros manipulates a pointer into the stack, which points to the beginning of the variable argument “list”. This pointer is calculated from the argument passed to va_start, and va_arg then “pops” values off the “stack” as it iterates.

That was a lot to process. Let’s look at a concrete example to see what’s really going on.
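Here is a minimal sketch of the kind of function this walkthrough is about. The names sum and numOfArgs come from the text below; the body is my assumption of what such a function would look like:

```c
#include <stdarg.h>

/* Adds up the numOfArgs int arguments that follow the count.
   The body is an assumed reconstruction, for illustration only. */
int sum(int numOfArgs, ...)
{
    va_list args;
    int total = 0;

    va_start(args, numOfArgs);          /* position args after numOfArgs */
    for (int i = 0; i < numOfArgs; i++) {
        total += va_arg(args, int);     /* read one int, advance to the next */
    }
    va_end(args);

    return total;
}
```

With this sketch, a call such as sum(3, 14, 29, 46) adds the three values after the count.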

First, main is called. The following is simplified assembly code for main:

Those push operations fill up the stack:

• sp is the real stack pointer.
• 14, 29 and 46 are the arguments.
• ret is the return address: where to jump to when the function is done.

Next, va_start(args, numOfArgs) takes the address of numOfArgs and uses it to calculate the position of the first argument.

Next, va_arg(args, int) returns what the ap stack pointer points to, and increments it to point at the next argument.

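And so on, until we’re done. Of course, this is a simplified picture, and the real code is more complex. On x86-64, for example, the first few arguments are passed in registers rather than on the stack.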
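To make the pointer walk concrete, here is a toy model that uses a plain int array as the “stack”. Everything prefixed with toy_ is hypothetical; a real va_list is compiler- and ABI-specific and does not look like this:

```c
/* Toy model only: a plain pointer stands in for va_list. */
typedef const int *toy_va_list;

/* Models va_start: point just past the last named argument. */
static toy_va_list toy_va_start(const int *last_named)
{
    return last_named + 1;
}

/* Models va_arg(ap, int): read the current slot, then advance. */
static int toy_va_arg(toy_va_list *ap)
{
    int value = **ap;
    (*ap)++;
    return value;
}

/* Walks a fake "stack" laid out as: numOfArgs, arg1, arg2, ... */
int toy_sum(const int *stack)
{
    int numOfArgs = stack[0];
    toy_va_list ap = toy_va_start(&stack[0]);
    int total = 0;

    for (int i = 0; i < numOfArgs; i++) {
        total += toy_va_arg(&ap);
    }
    return total;
}
```

Feeding it an array holding 3, 14, 29, 46 mirrors the layout described above: the count comes first, and the pointer then steps over each argument in turn.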

### Dangers

You’ve probably noticed that va_ relies on the programmer to provide a way to figure out how many arguments were passed. Users can easily misuse a variadic function and introduce a security vulnerability if they keep calling va_arg past the arguments that were actually passed.
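Doing this with real va_arg is undefined behavior, so here is a safe toy model of the same mistake. The function name and the array layout are assumptions for illustration; the point is that the callee blindly trusts the count it was given:

```c
/* Toy model: `slots` stands in for the stack, `claimed_count` for the
   count the caller passes. Like va_arg, nothing here can check whether
   the count is honest. */
int toy_sum_n(const int *slots, int claimed_count)
{
    int total = 0;

    for (int i = 0; i < claimed_count; i++) {
        total += slots[i]; /* reads whatever sits in the next slot */
    }
    return total;
}
```

If the array holds three arguments followed by unrelated data, a claimed count of 4 quietly pulls that neighbor into the result. With real variadic functions the neighbor could be a return address or some other sensitive value.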

### Assembly

Let’s recap the code we’re talking about:

Done reading? Awesome. The following assembly is a simplified version of the above, without the unnecessary boilerplate.

It was generated using gcc:

Now that we understand how va_ works, we can talk about printf.

## printf

Again, let’s recap:

See those va_ calls? In our sum function, we used the first argument as an indicator of how many arguments we have. printf uses the format argument as that indicator.

Actually, most of the magic is done in vprintf. printf is only a wrapper for vprintf, which writes the output string to stdout. I suggest you read vprintf’s GNU implementation; it only has 2278 lines of code ;)

I said earlier that the format string is used as an indicator of the number of arguments. Actually, it serves two more purposes:

1. Figure out the type of each argument in order to calculate the position of the next argument.
2. Figure out the type in order to know how to convert the argument to characters.

So when parsing the format, vprintf recognizes the % tokens, and for each token it loads one more argument from the stack. Then it runs some magical transformation code and keeps going. That’s basically it.
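To give a feel for that loop, here is a heavily simplified sketch. mini_vformat and mini_format are hypothetical names, they only understand %d, %s, %c and %%, and they write into a buffer instead of stdout; the real implementation handles far more:

```c
#include <stdarg.h>
#include <stddef.h>

/* Appends one character if there is room, always keeping space for '\0'. */
static void put_char(char *out, size_t cap, size_t *len, char c)
{
    if (*len + 1 < cap) {
        out[*len] = c;
        (*len)++;
    }
}

/* Hypothetical mini formatter: walks the format and fetches one
   argument per conversion, using the letter to pick the type. */
size_t mini_vformat(char *out, size_t cap, const char *format, va_list args)
{
    size_t len = 0;

    for (const char *p = format; *p != '\0'; p++) {
        if (*p != '%') {               /* ordinary character: copy it */
            put_char(out, cap, &len, *p);
            continue;
        }
        p++;                           /* move to the conversion letter */
        if (*p == '\0') {
            break;                     /* lone '%' at the end: stop */
        }
        switch (*p) {
        case 'd': {                    /* 'd' means the next argument is an int */
            int value = va_arg(args, int);
            unsigned int mag = value < 0 ? 0u - (unsigned int)value
                                         : (unsigned int)value;
            char digits[16];
            int n = 0;
            do {                       /* collect digits, least significant first */
                digits[n++] = (char)('0' + mag % 10);
                mag /= 10;
            } while (mag != 0);
            if (value < 0) put_char(out, cap, &len, '-');
            while (n > 0) put_char(out, cap, &len, digits[--n]);
            break;
        }
        case 's': {                    /* 's' means a pointer to a string */
            const char *s = va_arg(args, const char *);
            while (*s != '\0') put_char(out, cap, &len, *s++);
            break;
        }
        case 'c':                      /* a char argument arrives promoted to int */
            put_char(out, cap, &len, (char)va_arg(args, int));
            break;
        case '%':
            put_char(out, cap, &len, '%');
            break;
        default:                       /* unknown conversion: emit it verbatim */
            put_char(out, cap, &len, '%');
            put_char(out, cap, &len, *p);
            break;
        }
    }
    if (cap > 0) out[len] = '\0';
    return len;                        /* characters stored, excluding '\0' */
}

/* Variadic front end, mirroring how printf wraps vprintf. */
size_t mini_format(char *out, size_t cap, const char *format, ...)
{
    va_list args;
    va_start(args, format);
    size_t len = mini_vformat(out, cap, format, args);
    va_end(args);
    return len;
}
```

Notice that nothing checks whether the caller really passed an argument for each % token. The format string is the only source of truth, which is exactly why a wrong format is dangerous.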

P.S.: Remember we talked about the dangers of variadic functions? Well, the Format String Attack is considered one of the Top 25 Most Dangerous Software Errors a programmer can make.
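The classic form of this bug is passing untrusted text as the format itself. A minimal sketch of the difference, where log_message is a hypothetical helper and user_text stands for input you don’t control:

```c
#include <stdio.h>

/* log_message is a hypothetical helper for illustration. */
int log_message(char *out, size_t cap, const char *user_text)
{
    /* Unsafe variant (do not do this):
     *     snprintf(out, cap, user_text);
     * Any % tokens in user_text would be parsed as conversions, and
     * arguments that were never passed would be read. */

    /* Safe variant: the format is a constant, user_text is only data. */
    return snprintf(out, cap, "%s", user_text);
}
```

With the safe variant, input such as "%x%x%n" is copied out literally instead of being interpreted.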