Why does GCC not do function dead-code elimination with LTO when the object file is compiled with -O0?


Example:

notmain.c

int __attribute__ ((noinline)) notmain(int i) {
    return i + 1;
}

int notmain2(int i) {
    return i + 2;
}

main.c

int notmain(int);

int main(int argc, char **argv) {
    return notmain(argc);
}

I use noinline to make sure that the result is not a side effect of whether or not notmain gets inlined.

Compile and disassemble with -O1:

gcc -c -flto -O1 notmain.c
gcc -flto -O3 notmain.o main.c
objdump -d a.out

Outcome: notmain is present and notmain2 is not:

0000000000001040 <main>:
    1040:       f3 0f 1e fa             endbr64
    1044:       e9 f0 00 00 00          jmp    1139 <notmain>
    1049:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

0000000000001139 <notmain>:
    1139:       8d 47 01                lea    0x1(%rdi),%eax
    113c:       c3                      ret

However, if I instead do:

gcc -c -flto -O0 notmain.c
gcc -flto -O3 notmain.o main.c
objdump -d a.out

then both are present:

0000000000001040 <main>:
    1040:       f3 0f 1e fa             endbr64
    1044:       e9 f0 00 00 00          jmp    1139 <notmain>
    1049:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

0000000000001139 <notmain>:
    1139:       f3 0f 1e fa             endbr64
    113d:       55                      push   %rbp
    113e:       48 89 e5                mov    %rsp,%rbp
    1141:       89 7d fc                mov    %edi,-0x4(%rbp)
    1144:       8b 45 fc                mov    -0x4(%rbp),%eax
    1147:       83 c0 01                add    $0x1,%eax
    114a:       5d                      pop    %rbp
    114b:       c3                      ret

000000000000114c <notmain2>:
    114c:       f3 0f 1e fa             endbr64
    1150:       55                      push   %rbp
    1151:       48 89 e5                mov    %rsp,%rbp
    1154:       89 7d fc                mov    %edi,-0x4(%rbp)
    1157:       8b 45 fc                mov    -0x4(%rbp),%eax
    115a:       83 c0 02                add    $0x2,%eax
    115d:       5d                      pop    %rbp
    115e:       c3                      ret
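A control case can help when comparing disassemblies like the two above. The sketch below is a hypothetical variant, not part of the original experiment: notmain2 is made reachable through a function-pointer table indexed by a run-time value, so the compiler cannot prove it unused and it should appear in the output at any optimization level. The names `handlers` and `dispatch` are made up for this sketch, and the two functions are copied from notmain.c so the block compiles on its own.

```c
/* Hypothetical control variant: notmain2 is reachable through a
   function-pointer table, so it is not dead code. */
int __attribute__ ((noinline)) notmain(int i) {
    return i + 1;
}

int notmain2(int i) {
    return i + 2;
}

/* Table of both functions; taking their addresses keeps them referenced. */
static int (*const handlers[])(int) = { notmain, notmain2 };

/* Pick a handler with a run-time index (e.g. derived from argc in main),
   so the compiler cannot prove that either entry is unused. */
int dispatch(int which, int i) {
    return handlers[which & 1](i);
}
```

A main.c built on this could `return dispatch(argc, argc);`, which calls notmain2 whenever argc is odd. If notmain2 still disappeared in such a build, that would point to a bug rather than dead-code elimination.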

So my question is: what does -O1 change in the notmain.o object file that allows this optimization to happen, or equivalently, what about -O0 prevents it?

I also tried to bisect which exact -O1 optimization is responsible. man gcc lists all the flags that -O1 enables, so I passed them explicitly:

gcc -c -flto -fauto-inc-dec -fbranch-count-reg -fcombine-stack-adjustments -fcompare-elim -fcprop-registers -fdce -fdefer-pop -fdelayed-branch -fdse -fforward-propagate -fguess-branch-probability -fif-conversion -fif-conversion2 -finline-functions-called-once -fipa-modref -fipa-profile -fipa-pure-const -fipa-reference -fipa-reference-addressable -fmerge-constants -fmove-loop-invariants -fmove-loop-stores -fomit-frame-pointer -freorder-blocks -fshrink-wrap -fshrink-wrap-separate -fsplit-wide-types -fssa-backprop -fssa-phiopt -ftree-bit-ccp -ftree-ccp -ftree-ch -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-phiprop -ftree-pta -ftree-scev-cprop -ftree-sink -ftree-slsr -ftree-sra -ftree-ter -funit-at-a-time notmain.c
gcc -flto -O3 notmain.o main.c
objdump -d a.out

but notmain2 is still present.

I tried to inspect the LTO bytecode with:

lto-dump -dump-body=notmain2 notmain.o

but I don't see anything that would clearly make a difference. With -O1:

Gimple Body of Function: notmain2
int notmain2 (int i)
{
  int _2;

  <bb 2> [local count: 1073741824]:
  _2 = i_1(D) + 2;
  return _2;

}

With -O0:

Gimple Body of Function: notmain2
int notmain2 (int i)
{
  int D.4724;
  int _2;

  <bb 2> :
  _2 = i_1(D) + 2;

  <bb 3> :
<L0>:
  return _2;

}

Tested on Ubuntu 23.04, GCC 12.2.0.

Answer by n. m. could be an AI:

If all else fails, read the manual.

Note that it is generally ineffective to specify an optimization level option only at link time and not at compile time, for two reasons. First, compiling without optimization suppresses compiler passes that gather information needed for effective optimization at link time. Second, some early optimization passes can be performed only at compile time and not at link time.