Understanding GCC's Register Allocation

179 views Asked by At

I am trying to understand how GCC decides register allocation. Given the expression "x = a * b * c * d * e * f * g * h", its GIMPLE representation would be:

    _1 = a * b;
    _2 = c * _1;
    _3 = d * _2;
    _4 = e * _3;
    _5 = f * _4;
    _6 = g * _5;
    x = h * _6;

And the resulting ARM64 assembly is:

    ldr w1, [sp, 44]; a
    ldr w0, [sp, 40]; b
    mul w1, w1, w0
    ldr w0, [sp, 36]; c
    mul w1, w1, w0
    ldr w0, [sp, 32]; d
    mul w1, w1, w0
    ldr w0, [sp, 28]; e
    mul w1, w1, w0
    ldr w0, [sp, 24]; f
    mul w1, w1, w0
    ldr w0, [sp, 20]; g
    mul w0, w1, w0
    ldr w1, [sp, 16]; h
    mul w0, w1, w0
    str w0, [sp, 12]; x

However, for similar GIMPLE like "_4 = e * _3" and "_6 = g * _5", I found that the order of register allocation in the resulting assembly is different.

For example, the ARM64 assembly for the expression "_4 = e * _3" is:

    ldr w0, [sp, 28]; e
    mul w1, w1, w0

However, for "_6 = g * _5" it's:

    ldr w1, [sp, 16]; h
    mul w0, w1, w0

W1 register is mapped to the virtual register _1, _2, _3, _4, and _5, but W0 register is mapped to the virtual register _6.

In contrast, Clang seems to assign registers in a more consistent manner:

    ldr w9, [sp, #24]; e
    mul w8, w8, w9
    ...
    ldr w9, [sp, #12]; h
    mul w8, w8, w9

Let's consider the another expression:

    x = a + b + c + a * b + b * c + c * a + a * b * c;

When paired with GIMPLE and assembly, it looks like:

    ; _1 = a + b
    ldr w1, [sp, 12]; a
    ldr w0, [sp, 8];  b
    add w1, w1, w0

    ; _2 = c + _1
    ldr w0, [sp, 4];  c
    add w1, w1, w0

    ; _3 = a * b
    ldr w2, [sp, 12]; a
    ldr w0, [sp, 8];  b
    mul w0, w2, w0

    ; _4 = _2 + _3
    add w1, w1, w0

    ; _5 = b * c
    ldr w2, [sp, 8];  b
    ldr w0, [sp, 4];  c
    mul w0, w2, w0

    ; _6 = _4 + _5
    add w1, w1, w0

    ; _7 = c * a
    ldr w2, [sp, 4];  c
    ldr w0, [sp, 12]; a
    mul w0, w2, w0

    ; _8 = _6 + _7
    add w1, w1, w0

    ; _9 = a * b
    ldr w2, [sp, 12]; a
    ldr w0, [sp, 8];  b
    mul w2, w2, w0

    ; _10 = c * _9
    ldr w0, [sp, 4];  c
    mul w0, w2, w0

    ; x = _8 + _10
    add w0, w1, w0
    str w0, [sp];     x

Let's analyze the above GIMPLE and assembly in order.

    ; _1 = a + b
    ldr w1, [sp, 12]; a
    ldr w0, [sp, 8];  b
    add w1, w1, w0

(1) The register for the right operand is allocated first, followed by the left operand.

(2) The register used for the left operand W1 is mapped to the virtual register _1.

    ; _2 = c + _1
    ldr w0, [sp, 4];  c
    add w1, w1, w0

Since the W1 register is already mapped to the virtual register _1, c is allocated the W0 register. After this, the virtual register _1 is no longer used, so the W1 register is mapped to the virtual register _2.

    ; _3 = a * b
    ldr w2, [sp, 12]; a
    ldr w0, [sp, 8];  b
    mul w0, w2, w0

Similar to (1), the right operand gets the W0 register first. Since the W1 register is already mapped to the virtual register _1, the left operand gets the w2 register.

I expected, based on (2), that the W2 register used for the left operand would be mapped to the virtual register _3, but in reality, W0 was used.

It doesn't appear as though GCC linearly generates the assembly from GIMPLE.

Could someone provide more insight into the underlying register allocation logic that GCC follows?

I would appreciate if someone could explain the process and steps GCC would take to convert the GIMPLE representation of the complex expression "x = a + b + c + a * b + b * c + c * a + a * b * c" into ARM64 assembly.

  • When I dumped the GIMPLE tree and the assembly code, I used the compile option -O0 (No Optimization).
0

There are 0 answers