I've been trying to implement a multiplication program in ARMv8 Assembly. I've been able to get the skeleton to work properly, but for some reason it isn't multiplying properly.
.initial: .string "multiplier = 0x%08x (%d) multiplicand = 0x%08x (%d)\n\n"
.product1: .string "product = 0x%08x multiplier = 0x%08x\n\n"
.result1: .string "64-bit result = 0x%016llx (%lld)\n\n" // Corrected format specifier
define(FALSE, 0)
define(TRUE, 1)
define(multiplier, w18)
define(multiplicand, w19)
define(product, w20)
define(i, w21)
define(negative, w22)
define(result, x18)
define(temp1, x19)
define(temp2, x20)
define(product64, x21)
define(multiplier64, x22)
define(multiplicand64, x23)
.balign 4
.global main
.text
main:
stp x29, x30, [sp, -16]!
mov x29, sp
mov multiplicand, -30
mov multiplier, 70
mov product, 0
mov i, 0
print1:
adrp x0, .initial
add x0, x0, :lo12:.initial
mov w1, multiplier
mov w2, multiplier
mov w3, multiplicand
mov w4, multiplicand
bl printf
multiplier_check:
cmp multiplier, 0
b.ge Loop1
mov negative, TRUE
Loop1:
add i, i, 1
cmp i, 32
b.gt end
tst multiplier, 0x1
b.eq Loop3
asr multiplier, multiplier, 1
Loop3:
add product, product, multiplicand
Loop2:
tst product, 0x1
b.ne Loop4
orr multiplier, multiplier, 0x80000000
Loop4:
and multiplier, multiplier, 0x7FFFFFFF
Loop5:
asr product, product, 1
negative_check:
cmp negative, TRUE
b.eq Loop6
b print2
Loop6:
sub product, product, multiplicand
b Loop5 // Corrected loop condition
print2:
adrp x0, .product1
add x0, x0, :lo12:.product1
mov w1, product
mov w2, multiplier
bl printf
sxtw product64, product
sxtw multiplier64, multiplier
mul result, product64, multiplier64 // Corrected multiplication for 64-bit result
adrp x0, .result1
add x0, x0, :lo12:.result1
mov x1, result
mov x2, result
bl printf
end:
mov w0, 0
ldp x29, x30, [sp], 16
ret
Normally its supposed to output the product and multiplicand in 8-bit hexadecimal along with the 64-bit result, but innstead of outputting it properly I keep getting this:
multiplier = 0x00000046 (70) multiplicand = 0xffffffe2 (-30)
product = 0xfffffff1 multiplier = 0x00000046
64-bit result = 0xfffffffffffffbe6 (-1050)
In order to fix this I tried to change the branch logic to see what been going wrong, but I keep getting this same output. (Although I was able to get the proper sign now.) Can anyone help me see where the logic is breaking?
Edit: For clarification, I use these commands for it to run.
m4 assign2a.asm > assign2a.s
gcc assign2a.s -o e.o
./e.o
I haven't gone through your entire program, but here are some comments. Some are definitely bugs (but may or may not be the specific bug causing your problem), some are just sketchy coding.
General coding style and practice
Comment your code! Comments should explain, not only what the code is doing, but more importantly why.
Use more descriptive label names than
Loop1,Loop2, etc. What is supposed to happen when you branch to this label? Name it accordingly.Single-step with a debugger to see what your code is actually doing, and inspect the register contents as you go. If you understand your code well, then you should know what values should be in all the registers at all times. When you see them diverge from what you expect, you have found a bug. It looks like you are building and running your code natively on some aarch64 Unix-like system, so that means you can run
gdborlldbnatively as well.Defining symbolic names for all your registers is nice in some ways, but it's risky that you've chosen registers that overlap. (You do know that, e.g.,
w18is the low 32 bits ofx18, right?) So you have to remember thatmultiplierandresultconflict with each other, and that writing to one will overwrite the other; and the same for all the other pairs. This sounds like a recipe for bugs.Calling convention violations
The call-preserved registers in the standard aarch64 calling conventions are x19-x28. So w18/x18 is not call-preserved and may be overwritten when you call
printfor any other C function.You're guaranteed that a function call will not clobber x19-x28, but by that same token, your function (even if it is
main) must not clobber them either. So push any of those registers that you use at the start ofmain, and pop them before returning.Branch logic bugs
In
The
negativeregister (reallyw22) has not been initialized before this. So ifmultiplieris nonnegative and the branch is taken,negativeretains its previous garbage value which won't necessarily be 0.This is a situation (as in many other places in your code) where a branch can be replaced with a conditional-select. You can simply do
which will set the
negativeregister to0or1according to whether the condition codelt(complement ofge) is false or true.(Actually, this one could be optimized further as the single instruction
lsr negative, multiplier, 31to just shift the sign bit ofmultiplierinto the low bit ofnegative.)This doesn't look right. If the branch is not taken, then
orrsets the sign bit ofmultiplier, but then you fall through to theandinstruction which just clears that bit again.You could put an unconditional branch after the
orrto jump over theand. However this whole sequence can also be replaced with a bitfield move:bfi multiplier, product, 31, 1to insert the low bit ofproductinto bit 31 ofmultiplier.