Causes and benefits of this improvement on gcc version >= 4.9.0 vs gcc version < 4.9?

Question

256 views Asked by lzutao At 23 December 2016 at 13:47

I have recently exploited a dangerous program and found something interesting about the difference between versions of gcc on x86-64 architecture.

Note:

Wrongful usage of gets is not the issue here.
If we replace gets with any other function, the problem doesn't change.

This is the source code I use:

#include <stdio.h>
int main()
{
    char buf[16];
    gets(buf);
    return 0;
}

I use gcc.godbolt.org to disassemble the program with the flags -m32 -fno-stack-protector -z execstack -g.

In the disassembly, GCC versions >= 4.9.0 produce:

lea     ecx, [esp+4]            # begin of main
and     esp, -16
push    DWORD PTR [ecx-4]       # push esp
push    ebp
mov     ebp, esp
/* between these comment is not related to the question
push    ecx
sub     esp, 20
sub     esp, 12
lea     eax, [ebp-24]
push    eax
call    gets
add     esp, 16
mov     eax, 0
*/
mov     ebp, esp            
mov     ecx, DWORD PTR [ebp-4]  # ecx = saved esp
leave
lea     esp, [ecx-4]
ret                             # end of main

But GCC versions < 4.9.0 produce just:

push    ebp                     # begin of main
mov     ebp, esp
/* between these comment is not related to the question
and     esp, -16
sub     esp, 32
lea     eax, [esp+16]
mov     DWORD PTR [esp], eax
call    gets
mov     eax, 0
*/
leave
ret                             # end of main

My question is: What causes this difference in the disassembled code, and what are its benefits? Is there a name for this technique?

There are 2 answers

**Olivier** · Answer 1 · 2016-12-23T16:40:59+00:00

I can't say for sure without the actual values in:

and     esp, 0xXX               # XX is a number

but this looks a lot like extra code to align the stack to a larger value than the ABI requires.

Edit: The value is -16, which is 32-bit 0xFFFFFFF0 or 64-bit 0xFFFFFFFFFFFFFFF0 so this is indeed stack alignment to 16 bytes, likely meant for use of SSE instructions. As mentioned in comments, there is more code in the >= 4.9.0 version because it also aligns the frame pointer instead of only the stack pointer.

**Margaret Bloom** · Answer 2 · 2018-05-06T19:21:36+00:00

The i386 ABI, used for 32-bit programs, requires that a process, immediately after being loaded, have its stack aligned on 32-bit values:

%esp Performing its usual job, the stack pointer holds the address of the bottom of the stack, which is guaranteed to be word aligned.

Compare this with the x86_64 ABI¹ used for 64-bit programs:

%rsp The stack pointer holds the address of the byte with lowest address which is part of the stack. It is guaranteed to be 16-byte aligned at process entry

The introduction of AMD's 64-bit technology was an opportunity to rewrite the old i386 ABI, allowing a number of optimizations that backward compatibility had ruled out, among them a bigger (stricter?) stack alignment.
I won't dwell on the benefits of stack alignment, but suffice it to say that if a 4-byte alignment was good, so is a 16-byte one.
So much so that it is worth spending some instructions aligning the stack.

That's what GCC 4.9.0+ does: it aligns the stack to 16 bytes.
That explains the and esp, -16 but not the other instructions.

Aligning the stack with and esp, -16 is the fastest way to do it when the compiler only knows that the stack is 4-byte aligned (since esp MOD 16 can be 0, 4, 8 or 12).
However, it is a destructive method: the compiler loses the original esp value.

But now comes a chicken-and-egg problem: if we save the original esp on the stack before aligning it, we lose track of the saved value, because we don't know how far the alignment lowers the stack pointer. If we try to save it after the alignment, we can't: the alignment has already destroyed it.
So the only possible solution is to save it in a register, align the stack and then save said register on the stack.

;Save the stack pointer in ECX (actually ESP+4, but that works just as well)
lea     ecx, [esp+4]            #ECX = ESP+4

;Align the stack
and     esp, -16                #This lowers ESP by 0, 4, 8 or 12

;IGNORE THIS FOR NOW
push    DWORD PTR [ecx-4]  

;Usual prolog
push    ebp
mov     ebp, esp

;Save the original ESP (before alignment); actually ESP+4, but that is fine
push    ecx

GCC saves esp+4 in ecx, I don't know why², but this value still does the trick.

The only mystery left is the push DWORD PTR [ecx-4].
But it turns out to be a simple one: for debugging purposes GCC pushes a copy of the return address just before the old frame pointer (before push ebp), since this is where 32-bit tools expect it to be.
Since ecx = esp_o+4, where esp_o is the original stack pointer pre-alignment, [ecx-4] = [esp_o] = return address.

Note that 12 bytes have now been pushed since the alignment (the return address copy, ebp and ecx), so esp is 4 modulo 16; the local variable area must therefore be of size 16*k+4 to have the stack aligned at 16 bytes again.
In your example k is 1 and the area is 20 bytes in size.

The subsequent sub esp, 12 aligns the stack for the call to gets (the requirement is that the stack be aligned at the point of the call): together with the push eax that follows, it lowers esp by exactly 16 bytes.

Finally, the code

mov     ebp, esp
mov     ecx, DWORD PTR [ebp-4]  # ecx = saved esp
leave
lea     esp, [ecx-4]
ret

The first instruction is a copy-paste error.
One could check the actual compiler output or simply reason that, if it were there, [ebp-4] would be below the stack pointer (and there is no red zone in the i386 ABI).

The rest just undoes what was done in the prolog:

;Get the original stack pointer
mov     ecx, DWORD PTR [ebp-4]          ;ecx = esp_o+4

;Standard epilog
leave                                   ;mov esp, ebp / pop ebp
                                        ;The stack pointer points to the copied return address                

;Restore the original stack pointer
lea     esp, [ecx-4]                    ;esp = esp_o
ret

GCC first has to get the original stack pointer (+4) saved on the stack, then restore the old frame pointer (ebp) and finally restore the original stack pointer.
The copied return address is on the top of the stack when lea esp, [ecx-4] is executed, so in theory GCC could just return, but it has to restore the original esp: main is not the first function to be executed in a C program, so it cannot leave the stack unbalanced.

¹ This is not the latest version, but the quoted text is unchanged in later editions.
² This has been discussed here on SO but I can't remember if in some comment or in an answer.
