JeePuzzle #1

Added by jcw over 5 years ago

Time for a little diversion. Let’s dive into a perhaps-less-obvious detail of something mentioned on the weblog … see if you can figure this one out :)

Line 24 of the bare blinker example consists of the following line:

    while (true) ;

In other words: loop forever. The resulting firmware image compiles to exactly 400 bytes (of which 192 are for the fixed interrupt vector table, btw).
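A quick sanity check on the 192-byte figure: each vector table entry is one 32-bit word, so 192 bytes is 48 entries. Assuming a Cortex-M0/M0+ (ARMv6-M) part, which the Thumb instructions in the listings further down are consistent with, that matches 16 core entries plus 32 interrupt slots. The constant and function names below are made up for illustration:

```cpp
#include <cstdint>

// Hypothetical constants, assuming an ARMv6-M (Cortex-M0/M0+) target.
constexpr uint32_t kEntryBytes  = 4;   // each vector is one 32-bit word
constexpr uint32_t kCoreEntries = 16;  // initial SP, Reset, NMI, HardFault, ..., SysTick
constexpr uint32_t kIrqEntries  = 32;  // ARMv6-M allows up to 32 external interrupts

// Size of the full vector table in bytes.
constexpr uint32_t vectorTableBytes() {
    return (kCoreEntries + kIrqEntries) * kEntryBytes;
}

// Bytes left for code and constants in an image of the given size.
constexpr uint32_t remainingBytes(uint32_t imageBytes) {
    return imageBytes - vectorTableBytes();
}
```

So in the 400-byte image, everything that is not the vector table fits in 208 bytes.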

Why does the code size *increase* to 404 bytes when that while() loop is removed?

Replies (9)

RE: JeePuzzle #1 - Added by over 5 years ago

Because you then return from main() (despite the missing return statement, so the return value will be whatever happens to be in a register) and then start running whatever the startup code does after calling main(). Perhaps the C library function abort()?

RE: JeePuzzle #1 - Added by jcw over 5 years ago

Getting warm - but why would the compiled code be different if it’s a runtime effect?
Note also that there is no abort() - all the code for this project is on GitHub, and there is no additional runtime code.

RE: JeePuzzle #1 - Added by jcw over 5 years ago

I’ve added the return statement for completeness. It makes no difference for this puzzle: the code is still 400 bytes with the while loop and 404 bytes without it.

RE: JeePuzzle #1 - Added by jfklein over 5 years ago

In the case without “while(true)”, the compiler needs to add another instruction at the end of main to return to its calling function.
This will add at least 2 bytes to the program size.
When we have a while(true), all the code after it gets stripped away by the compiler, because it can never run anyway.

That still leaves us with 2 bytes to explain, which will boil down to while(true) vs. while(1).
We need another byte to store the constant 1, and another byte for the comparison instruction (>= 0) that tests whether we need to keep running the body of this while loop.

This would be my best guess.

RE: JeePuzzle #1 - Added by JohnO over 5 years ago

My money is on you, jfklein.

RE: JeePuzzle #1 - Added by jcw over 5 years ago

Bingo, Jasper! Some bonus points missed, but here is the exact story:

The while (true) ; loop is optimised by the compiler as:

    4a:   e7fe            b.n     4a 

That’s just two bytes (notice that no constant is needed, nor a test, the compiler is clever).
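To see why no constant or test is needed, the two bytes e7fe can be decoded by hand. This is a minimal sketch based on my reading of the ARMv6-M Thumb encoding of the unconditional branch (B, encoding T2); the function names are hypothetical:

```cpp
#include <cstdint>

// Thumb 16-bit unconditional branch (B, encoding T2): bits 15..11 are 0b11100,
// bits 10..0 hold a signed 11-bit offset counted in halfwords.
constexpr bool isThumbBranchT2(uint16_t insn) {
    return (insn >> 11) == 0x1C;
}

// Sign-extend the 11-bit immediate and scale it to a byte offset.
constexpr int32_t branchOffset(uint16_t insn) {
    return ((insn & 0x400) ? int32_t(insn & 0x7FF) - 0x800
                           : int32_t(insn & 0x7FF)) * 2;
}

// In Thumb state the PC reads as the instruction's own address + 4.
constexpr uint32_t branchTarget(uint32_t addr, uint16_t insn) {
    return uint32_t(int32_t(addr) + 4 + branchOffset(insn));
}
```

The offset works out to -4, which exactly cancels the +4 PC offset, so the instruction at 0x4a branches to 0x4a: an infinite loop in a single 2-byte instruction.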

As you point out, the compiler also knows that anything after that is unreachable code, and hence does not generate the normal return code that pops saved registers off the stack. That explains why adding the infinite loop leads to better optimisation.

Here’s the compiled code without the while loop:

    4a:   2000            movs    r0, #0
    4c:   bd10            pop     {r4, pc}

In this case, the compiler has to generate code to return 0, which takes two instructions, 4 bytes total. Note that leaving out the return in the source code generates a C warning, but it still causes the compiler to emit exactly the same code.
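The same hand-decoding applies to the two instructions of the return path: movs r0, #0 loads the return value into r0, and pop {r4, pc} restores r4 and returns by popping the saved return address straight into the PC. A sketch under the same assumption (my reading of the Thumb T1 encodings of MOVS and POP); the names are hypothetical:

```cpp
#include <cstdint>

// MOVS Rd, #imm8 (Thumb encoding T1): 00100 Rd(3) imm8(8)
constexpr bool isMovsImm(uint16_t insn)  { return (insn >> 11) == 0x04; }
constexpr unsigned movsRd(uint16_t insn) { return (insn >> 8) & 0x7; }
constexpr unsigned movsImm(uint16_t insn){ return insn & 0xFF; }

// POP {reglist} (Thumb encoding T1): 1011110 P reglist(8).
// The P bit means the PC is popped as well, which performs the return.
constexpr bool isPop(uint16_t insn)          { return (insn >> 9) == 0x5E; }
constexpr bool popsPc(uint16_t insn)         { return (insn >> 8) & 1; }
constexpr unsigned popLowRegs(uint16_t insn) { return insn & 0xFF; } // bit n set = rn popped
```

Both are 16-bit opcodes, so the return path costs 4 bytes where the branch-to-self costs 2.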

FYI: the above information can easily be obtained using the arm-none-eabi-objdump -S main.o command, which dumps out a “disassembly” listing of the main.o file created by the compiler.

So that’s 4 bytes instead of 2. We still need to explain why the final code is 400 vs 404 bytes. One bonus point left to go… ;)

RE: JeePuzzle #1 - Added by jcw over 5 years ago

Here’s a hint…

The best way to mark unreachable code is to insert this statement (it is GCC-specific, not standard C/C++):


Then no code is emitted at all, neither for an infinite while loop nor for a return statement.
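The statement itself is missing from this copy of the post. As an assumption only: GCC (and Clang) provide `__builtin_unreachable()`, which fits the description. Here is a minimal sketch of its effect on a hypothetical helper, not on the blinker's main() (it needs C++14 or later, for the loop inside a constexpr function):

```cpp
// Hypothetical helper: index of the lowest set bit in a 32-bit value.
// Assumed precondition: x != 0, so the loop always returns.
constexpr int lowestSetBit(unsigned x) {
    for (int i = 0; i < 32; ++i) {
        if (x & (1u << i))
            return i;
    }
    // Tells the compiler control never gets here, so it need not emit
    // any fall-through return code after the loop (and gives no
    // "missing return" warning). Reaching it is undefined behaviour.
    __builtin_unreachable();
}
```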

But even in this case, the generated code still includes this instruction:

    4a:   46c0            nop                     ; (mov r8, r8)
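Decoding 46c0 confirms what the disassembler comment says: it is a register-to-register move from r8 to r8, which changes nothing. A sketch based on my reading of the Thumb MOV (register) encoding T1, which allows high registers; the names are hypothetical:

```cpp
#include <cstdint>

// MOV Rd, Rm (Thumb encoding T1): 01000110 D Rm(4) Rd(3),
// where the destination register number is D:Rd (4 bits).
constexpr bool isMovReg(uint16_t insn)    { return (insn >> 8) == 0x46; }
constexpr unsigned movDest(uint16_t insn) { return ((insn >> 4) & 0x8) | (insn & 0x7); }
constexpr unsigned movSrc(uint16_t insn)  { return (insn >> 3) & 0xF; }

// Copying a register onto itself (this encoding sets no flags) has no effect,
// so such an instruction acts as a 2-byte no-op.
constexpr bool isSelfMove(uint16_t insn) {
    return isMovReg(insn) && movDest(insn) == movSrc(insn);
}
```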

Why is the compiler inserting these two bytes of code into the generated data?

RE: JeePuzzle #1 - Added by jfklein over 5 years ago

Probably the compiler is constructing some sort of busy wait by endlessly writing the value of register 8 back to register 8.

The compiler tells us it’s a nop operation, but that small CPU probably doesn’t have a nop instruction, so the compiler “creates” or “hacks” something into existence that does the same.

Again, this is a best guess, because I don’t have the toolchain at hand atm.

RE: JeePuzzle #1 - Added by jcw over 5 years ago

Nah - it’s simpler than that: the next generated “code” consists of some 32-bit constants, which are aligned to 4-byte boundaries.
The “nop” is just a filler to reach 32-bit alignment in flash memory.
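Putting the pieces together, here is a small sketch of the alignment arithmetic. It assumes, as the listings suggest, that the tail of main() starts at offset 0x4a and that the constants which follow must start on a 4-byte boundary; the names are hypothetical:

```cpp
#include <cstdint>

// Round an offset up to the next multiple of `align` (a power of two).
constexpr uint32_t alignUp(uint32_t offset, uint32_t align) {
    return (offset + align - 1) & ~(align - 1);
}

// Assumed: main()'s tail code starts at 0x4a in all three variants.
constexpr uint32_t kTailStart = 0x4a;

// Offset where the 4-byte-aligned constants start, given the tail size in bytes.
constexpr uint32_t constantsStart(uint32_t tailBytes) {
    return alignUp(kTailStart + tailBytes, 4);
}

// Filler bytes (such as a 2-byte nop) needed to reach that boundary.
constexpr uint32_t paddingBytes(uint32_t tailBytes) {
    return constantsStart(tailBytes) - (kTailStart + tailBytes);
}
```

With the loop, the 2-byte b.n ends at 0x4c, which is already aligned, so no filler is needed. With the return path, the 4 bytes end at 0x4e, so 2 bytes of padding push the constants to 0x50: 2 extra bytes of code plus 2 bytes of padding, consistent with the 4-byte difference. With only the unreachable marker, nothing is emitted, the code ends at 0x4a, and the 2-byte nop fills the gap to 0x4c.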

But yeah, I’m getting a bit ahead of myself. Posts about the compiler toolchain are scheduled for next week.