I came across this post whilst doing research for my next project. Being able to shift 8- and 16-bit integers by a vector of per-element shift counts using SIMD would be very useful to me, and I think to many other people here.
Unfortunately for me, the platform my project will be running on will have at most SSE2 capabilities.
Swapping the _mm256_*** intrinsics for their _mm_*** counterparts is not going to cut it, as these need newer instruction sets:
_mm_shuffle_epi8() //Requires SSSE3
_mm_blendv_epi8() //Requires SSE4.1
_mm_blend_epi16() //Requires SSE4.1
_mm_sllv_epi32() //Requires AVX2
So you see my dilemma. It may be impossible to achieve with just SSE2, but I would be very happy (and frankly amazed) to be proven wrong.
Thanks in advance.
It's not the nicest code going, and I can't really say whether it's faster or slower than processing each element as a scalar uint16. You could save a few operations if you can guarantee the shift amount is always less than 16, but it's still not great.
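A minimal sketch of one SSE2-only approach for 16-bit lanes, not necessarily the code this answer originally used: split each lane's shift count into its bits (1, 2, 4, 8), apply a fixed _mm_slli_epi16 for each, and keep the shifted value only in lanes where that bit is set. Because _mm_blendv_epi8 is SSE4.1, the per-lane select is built from _mm_and_si128, _mm_andnot_si128 and _mm_or_si128, and each lane mask is made by moving the relevant count bit into the sign position and smearing it with _mm_srai_epi16. The helper names select_si128 and sllv_epi16_sse2 are hypothetical, and making counts of 16 or more produce 0 is an assumption (it mirrors how AVX2's _mm_sllv_epi32 treats out-of-range counts).

```c
#include <emmintrin.h> /* SSE2 only */

/* Hypothetical SSE2 stand-in for _mm_blendv_epi8:
   lanes where mask is all-ones take b, lanes where it is zero take a. */
static inline __m128i select_si128(__m128i mask, __m128i a, __m128i b)
{
    return _mm_or_si128(_mm_and_si128(mask, b), _mm_andnot_si128(mask, a));
}

/* Hypothetical helper: shift each uint16 lane of v left by the matching lane
   of count. Assumption: lanes with count >= 16 become 0. */
static inline __m128i sllv_epi16_sse2(__m128i v, __m128i count)
{
    __m128i mask;

    /* Bit 0 of count -> bit 15, then arithmetic shift gives 0x0000 or 0xFFFF. */
    mask = _mm_srai_epi16(_mm_slli_epi16(count, 15), 15);
    v = select_si128(mask, v, _mm_slli_epi16(v, 1));

    /* Bit 1 of count: shift by 2 where set. */
    mask = _mm_srai_epi16(_mm_slli_epi16(count, 14), 15);
    v = select_si128(mask, v, _mm_slli_epi16(v, 2));

    /* Bit 2 of count: shift by 4 where set. */
    mask = _mm_srai_epi16(_mm_slli_epi16(count, 13), 15);
    v = select_si128(mask, v, _mm_slli_epi16(v, 4));

    /* Bit 3 of count: shift by 8 where set. */
    mask = _mm_srai_epi16(_mm_slli_epi16(count, 12), 15);
    v = select_si128(mask, v, _mm_slli_epi16(v, 8));

    /* Range check: keep only lanes whose count has no bits above bit 3. */
    mask = _mm_cmpeq_epi16(_mm_and_si128(count, _mm_set1_epi16((short)0xFFF0)),
                           _mm_setzero_si128());
    return _mm_and_si128(v, mask);
}
```

The final AND is the range check: if the counts are guaranteed to be below 16 it can be dropped. For a logical right shift, replacing the four _mm_slli_epi16(v, ...) calls with _mm_srli_epi16(v, ...) should be the only change.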
And for 8-bit:
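A similar sketch for 8-bit lanes, again an illustration under stated assumptions rather than the answer's exact code. SSE2 has no 8-bit shift at all, so each step shifts 16-bit lanes with _mm_slli_epi16 and then clears the low k bits of every byte, which removes the bits that leaked in from the neighbouring low byte. Only three steps (1, 2, 4) are needed, and the per-byte masks come from _mm_cmpeq_epi8. The names select_si128 and sllv_epi8_sse2 are hypothetical, and counts of 8 or more are assumed to produce 0.

```c
#include <emmintrin.h> /* SSE2 only */

/* Hypothetical SSE2 stand-in for _mm_blendv_epi8 (mask is all-ones -> b). */
static inline __m128i select_si128(__m128i mask, __m128i a, __m128i b)
{
    return _mm_or_si128(_mm_and_si128(mask, b), _mm_andnot_si128(mask, a));
}

/* Hypothetical helper: shift each uint8 lane of v left by the matching lane
   of count. Assumption: lanes with count >= 8 become 0. */
static inline __m128i sllv_epi8_sse2(__m128i v, __m128i count)
{
    __m128i bit, mask, shifted;

    /* Bit 0: shift by 1. The 16-bit shift moves bit 7 of each low byte into
       bit 0 of the high byte, so clear bit 0 of every byte afterwards. */
    bit = _mm_set1_epi8(1);
    mask = _mm_cmpeq_epi8(_mm_and_si128(count, bit), bit);
    shifted = _mm_and_si128(_mm_slli_epi16(v, 1), _mm_set1_epi8((char)0xFE));
    v = select_si128(mask, v, shifted);

    /* Bit 1: shift by 2, clear the low 2 bits of every byte. */
    bit = _mm_set1_epi8(2);
    mask = _mm_cmpeq_epi8(_mm_and_si128(count, bit), bit);
    shifted = _mm_and_si128(_mm_slli_epi16(v, 2), _mm_set1_epi8((char)0xFC));
    v = select_si128(mask, v, shifted);

    /* Bit 2: shift by 4, clear the low 4 bits of every byte. */
    bit = _mm_set1_epi8(4);
    mask = _mm_cmpeq_epi8(_mm_and_si128(count, bit), bit);
    shifted = _mm_and_si128(_mm_slli_epi16(v, 4), _mm_set1_epi8((char)0xF0));
    v = select_si128(mask, v, shifted);

    /* Range check: keep only bytes whose count has no bits above bit 2. */
    mask = _mm_cmpeq_epi8(_mm_and_si128(count, _mm_set1_epi8((char)0xF8)),
                          _mm_setzero_si128());
    return _mm_and_si128(v, mask);
}
```

Each step costs a shift, an AND, a compare and a three-instruction select, so whether this beats a scalar loop over 16 bytes is worth benchmarking on the target machine.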