I've been looking at MMX/SSE and I am wondering. There are instructions for packed, saturated subtraction of unsigned bytes and words, but not doublewords.
Is there a way of doing this for doublewords, or if not, why is there no such instruction?
If you have SSE4.1 available, I don't think you can get better than using the `pmaxud`+`psubd` approach suggested by @harold. With AVX2, you can of course also use the corresponding 256-bit variants.

Without SSE4.1, you need to compare both arguments in some way. Unfortunately, there is no `epu32` comparison (not before AVX512), but you can simulate one by first adding `0x80000000` (which is equivalent to xor-ing in this case) to both arguments and then using the signed comparison.

In some cases, it might be better to replace the comparison by some bit-twiddling of the highest bit and broadcasting that to every bit using a shift. This replaces a `pcmpgtd` and three bit-logic operations (and having to load `0x80000000` at least once) by a `psrad` and five bit-logic operations.

Godbolt link, also including `adds_epu32` variants: https://godbolt.org/z/n4qaW1

Strangely, clang needs more register copies than gcc for the non-SSE4.1 variants. On the other hand, clang finds the `pmaxud` optimization for the `cmpgt_epu32` variant when compiled with SSE4.1: https://godbolt.org/z/3o5KCm
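
For reference, here is a rough sketch of the three variants described above, written with intrinsics. It is not necessarily identical to the code behind the Godbolt links. It assumes GCC or Clang on x86 with `<immintrin.h>`, and the function names `subs_epu32_sse41`, `subs_epu32_cmp` and `subs_epu32_shift` are made up for illustration.

```c
#include <immintrin.h>
#include <stdint.h>

#ifdef __SSE4_1__
/* SSE4.1 (only compiled with e.g. -msse4.1): max(a, b) - b is a - b
   where a > b and 0 otherwise (pmaxud + psubd). */
static inline __m128i subs_epu32_sse41(__m128i a, __m128i b)
{
    return _mm_sub_epi32(_mm_max_epu32(a, b), b);
}
#endif

/* SSE2, comparison variant: flipping the top bit of both operands turns
   the signed pcmpgtd into an unsigned "a > b"; the resulting mask keeps
   the difference only in lanes without underflow. */
static inline __m128i subs_epu32_cmp(__m128i a, __m128i b)
{
    const __m128i bias = _mm_set1_epi32(INT32_MIN); /* 0x80000000 per lane */
    __m128i gt = _mm_cmpgt_epi32(_mm_xor_si128(a, bias),
                                 _mm_xor_si128(b, bias));
    return _mm_and_si128(gt, _mm_sub_epi32(a, b));
}

/* SSE2, shift variant: compute the borrow-out of a - b in the top bit,
   broadcast it to the whole lane with psrad, and clear those lanes.
   The borrow formula (~a & b) | (~(a ^ b) & d) is one possible choice. */
static inline __m128i subs_epu32_shift(__m128i a, __m128i b)
{
    __m128i d = _mm_sub_epi32(a, b);
    __m128i borrow = _mm_or_si128(
        _mm_andnot_si128(a, b),                     /* ~a & b       */
        _mm_andnot_si128(_mm_xor_si128(a, b), d));  /* ~(a ^ b) & d */
    __m128i mask = _mm_srai_epi32(borrow, 31);      /* all ones if a < b */
    return _mm_andnot_si128(mask, d);               /* ~mask & d    */
}
```

Which of the two SSE2 variants is faster likely depends on the surrounding code, for example on whether the `0x80000000` constant is already in a register, so it is worth measuring both.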