I am working with a team in a Computer Architecture class on a Y86 program to implement multiplication function imul. We have a block of code that works, but we are trying to make it as execution-time efficient as we can. Currently our block looks like this for imul:
imul:
# push all used registers to stack for preservation
pushq %rdi
pushq %rsi
pushq %r8
pushq %r9
pushq %r10
irmovq 0, %r9 # set 0 into r9
rrmovq %rdi, %r10 # preserve rdi in r10
subq %rsi, %rdi # compare rdi and rsi
rrmovq %r10, %rdi # restore rdi
jl continue # if rdi (looping value/count) less than rsi, don't swap
swap:
# swap rsi and rdi to make rdi smaller value of the two
rrmovq %rsi, %rdi
rrmovq %r10, %rsi
continue:
subq %r9, %rdi # check if rdi is zero
cmove %r9, %rax # if rdi = 0, rax = 0
je imulDone # if rdi = 0, jump to end
irmovq 1, %r8 # set 1 into r8
rrmovq %rsi, %rax # set rax equal to initial value from rsi
imulLoop:
subq %r8, %rdi # count - 1
je imulDone # if count = 0, jump to end
addq %rsi, %rax # add another instance of rsi into rax, looped adition
jmp imulLoop # restart loop
imulDone:
# pop all used registers from stack to original values and return
popq %r10
popq %r9
popq %r8
popq %rsi
popq %rdi
ret
Right now our best idea is using immediate arithmetic instructions (isubq, etc) instead of normal OPq instructions with settings constants into registers and using those registers. Would this method be meaningfully more efficient in this particular instance? Thanks so much!