In Thumb-2 assembly, when r0 and r1 hold signed integers, I would like to get r1 = -1 (i.e. 0xffffffff) if r0 < r1, and r1 = 0 otherwise.
I can simply code:
4288 cmp r0, r1
bfb4 ite lt
f04f 31ff movlt.w r1, #-1
2100 movge r1, #0
But I wonder if there is a more optimized way, either in cycles or in space.
If it were an unsigned comparison, I could use the carry flag:
4288 cmp r0, r1
4189 sbcs r1, r1
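For reference, here is a minimal C sketch of what that cmp/sbcs pair computes. The helper name `ult_mask` is hypothetical. It assumes the usual ARM convention that the carry flag after cmp is 1 when there is no borrow (a >= b, unsigned).

```c
#include <stdint.h>

/* Hypothetical helper: models cmp r0, r1 followed by sbcs r1, r1.
   After cmp, C = 1 means "no borrow", i.e. a >= b as unsigned values.
   sbc computes Rn - Rm - (1 - C), so with Rn == Rm the result is
   0 when C = 1 and 0xffffffff when C = 0. */
static uint32_t ult_mask(uint32_t a, uint32_t b)
{
    uint32_t carry = (a >= b);     /* C flag after cmp a, b */
    return b - b - (1u - carry);   /* sbc r1, r1 */
}
```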
In ARM64, `cset w1, lt` would return 0 or 1, but I would like to code in Thumb-2 assembly.
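If your inputs have limited range so subtraction won't have signed overflow, you can use Jester's suggestion of using the sign bit of a subtraction: subtract, then broadcast the sign bit with an arithmetic right shift by 31 (for example `subs r1, r0, r1` followed by `asrs r1, r1, #31`).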
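As a rough C model of that sign-bit trick (the helper name `lt_mask_limited` is hypothetical, and the sketch assumes the inputs are close enough together that a - b fits in 32 bits):

```c
#include <stdint.h>

/* Hypothetical helper: models a subtract followed by an arithmetic
   shift right by 31. Only gives the right answer when a - b does not
   overflow a 32-bit signed integer. */
static int32_t lt_mask_limited(int32_t a, int32_t b)
{
    uint32_t diff = (uint32_t)a - (uint32_t)b;  /* wraps like a 32-bit subs */
    return -(int32_t)(diff >> 31);              /* sign bit 1 -> -1, 0 -> 0 */
}
```

The unsigned arithmetic avoids C's undefined behaviour on signed overflow and the implementation-defined right shift of negative values; the hardware instructions have no such concerns.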
This works as long as `r0-r1` doesn't overflow a 2's complement signed integer. Then the sign of the result will indeed be negative when `r0 < r1`.
Failing cases include `-10 < INT_MAX`, where the mathematical result is `-2147483657`, but truncated to 32-bit we get `0x7ffffff7` (+2147483639). The V flag will be set, indicating signed overflow, and the N flag (sign bit) will be clear because the truncated result is not Negative, opposite of the sign of the mathematical non-truncated result.
That's why signed compare conditions like `lt` check `N != V` instead of just `N`, so for example `cmp` / `blt` works correctly with these inputs.
If your code has to work correctly with arbitrary inputs (full-range), I don't think there's any room for improvement, not even in code-size. Using an `lt` condition, either branch or `it` predication, seems the only reasonable option. Emulating 2's complement comparison manually is not going to be shorter than this.
Even outside an IT block, ARM Thumb2 doesn't have a 2-byte instruction for setting a register to `-1`. (At least not that compilers know about or use.) `movs` doesn't sign-extend its immediate, and `mvn` / `mvns`-immediate is a 4-byte instruction. So is `orrs r0, #-1`, not that you'd want that false dependency for performance anyway. So even if we could produce the result in a different register than either input, there's no savings.
Current GCC and clang (Godbolt) prefer unconditionally setting a register and then predicating one `mov`-immediate to overwrite it. But that might just be a heuristic for Thumb mode that saves code-size if one of the constants allows a shorter instruction outside an IT block, or of predicating fewer instructions in case it fills up an IT block and needs another IT, or couldn't combine into one ITE. That could happen in a larger function, or if other things are predicated on the same condition, but isn't a problem here.
In ARM mode (`-marm`), GCC prefers `cmp` ; `movge r0, #0` ; `mvnlt r0, #0` for all `-mcpu=` that I've looked at (cortex-a8, cortex-a53, cortex-a76 and unset). (I'm looking at a function, so it returns in r0, but the inputs are r0 and r1 so it's still the same situation as yours.)
So that's exactly the same as your strategy for Thumb mode. Unless an instruction inside an IT block is slower than being outside, probably best to do what you're doing.
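To make the `N != V` point concrete, here is a small C sketch that models the N and V flags after a 32-bit compare and contrasts them with looking at the sign bit alone. All function names are hypothetical; `lt_mask` is just the kind of reference function one might feed to a compiler, not necessarily the exact source behind the Godbolt link.

```c
#include <stdint.h>

/* Reference behaviour for full-range inputs. */
static int32_t lt_mask(int32_t a, int32_t b)
{
    return a < b ? -1 : 0;
}

/* N flag alone: the sign bit of the truncated 32-bit difference. */
static int n_flag(int32_t a, int32_t b)
{
    uint32_t diff = (uint32_t)a - (uint32_t)b;
    return (int)(diff >> 31);
}

/* The lt condition: N != V, with both flags modelled after cmp a, b. */
static int lt_from_flags(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t diff = ua - ub;
    int n = (int)(diff >> 31);
    /* Signed overflow on subtraction: the operands have different signs
       and the result's sign differs from the first operand's. */
    int v = (int)(((ua ^ ub) & (ua ^ diff)) >> 31);
    return n != v;
}
```

With a = -10 and b = INT32_MAX, `n_flag` returns 0 (the truncated difference is 0x7ffffff7), while `lt_from_flags` returns 1 and `lt_mask` returns -1, matching the failing case described earlier.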