Can a register hold multiple values at a time?

Question

2.3k views Asked by Luke Davis At 11 April 2020 at 23:46

In the case of a 64-bit x86 register, is it possible to hold more than one value at a time in the same register, if the size of a value is small enough that multiple instructions could fit into a register? For example, fitting two 32-bit ints into one register. Would this be a bad thing to do if it is possible? I've been reading up on registers and I'm quite new to the concept.

There are 2 answers

paxdiablo On 12 April 2020 at 00:07

Registers don't tend to hold instructions; they instead hold data to be worked on by instructions.

However, if you wanted to store instructions as data, an x86 instruction can be up to fifteen bytes, or 120 bits, long. So, no, the longest ones won't fit into a single 64-bit register.

In terms of holding multiple data values in a single register, that is certainly possible. This is even supported by the hardware, with even the earliest x86 chips having ah and al which together formed the ax register.

Even without that, you can certainly insert/extract "sub-registers" into/from registers, by using the bitwise operations (like and, or, not and xor), and the bit shift operations (like shl, shr, rol, and ror).
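As a minimal sketch of that shift-and-mask approach, the C below packs two 32-bit values into one 64-bit integer and pulls them back out. The function names (pack2x32, unpack_hi, unpack_lo) are made up for illustration and are not from the answer; a compiler would typically turn them into shl/or/shr or plain register moves.

```c
#include <stdint.h>

/* Pack two 32-bit values into one 64-bit value:
   hi lands in bits 63..32, lo in bits 31..0. (Hypothetical helper names.) */
static uint64_t pack2x32(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Extract the upper 32-bit "sub-register" with a right shift. */
static uint32_t unpack_hi(uint64_t packed)
{
    return (uint32_t)(packed >> 32);
}

/* Extract the lower 32-bit "sub-register" with a mask. */
static uint32_t unpack_lo(uint64_t packed)
{
    return (uint32_t)(packed & 0xFFFFFFFFu);
}
```

Calling pack2x32(0xDEADBEEF, 42) and then unpack_hi / unpack_lo on the result gives back the two original values. Whether this is worth doing depends on the code; every access pays for the extra shift or mask.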

**Peter Cordes** · Accepted Answer · 2020-04-12T01:32:44+00:00

Registers don't hold instructions, but I'll assume you meant fitting multiple values into one register, so that you can add them both with one instruction.

Yes, this is called SIMD (Single Instruction, Multiple Data). On x86-64, SSE2 (Streaming SIMD Extensions 2) is guaranteed to be available, so you have sixteen 16-byte registers (xmm0..15). You have instructions that can do packed FP add/sub/mul/div/sqrt/cmp on 4x 32-bit float or 2x 64-bit double, and packed-integer add/sub/cmp/shift/etc. for byte, word, dword, and qword element sizes.

(With some gaps: SSE2 is not very orthogonal, e.g. the narrowest shift is 16-bit, and packed min/max is only available for certain sizes. Some of these gaps are filled in by SSE4.1.)

There are also bitwise-boolean instructions, where element width is irrelevant (until AVX512 with mask registers...)

See https://www.felixcloutier.com/x86/. Instructions whose mnemonics start with p, like paddw, are packed-integer. Mnemonics ending in ps or pd are floating-point packed-single or packed-double.
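From C, these instructions are usually reached through compiler intrinsics rather than hand-written asm. The sketch below is an illustration, not code from the answer: it assumes an x86 target with SSE2 and the standard emmintrin.h header, and the wrapper name add4_i32 is made up. _mm_add_epi32 is the intrinsic that corresponds to paddd.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; x86 / x86-64 only */
#include <stdint.h>

/* Add four pairs of 32-bit integers with one packed add.
   add4_i32 is a hypothetical wrapper name for this example. */
static void add4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* unaligned 16-byte load (movdqu) */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vsum = _mm_add_epi32(va, vb);              /* paddd: 4 independent 32-bit adds */
    _mm_storeu_si128((__m128i *)out, vsum);            /* unaligned 16-byte store */
}
```

Each of the four elements is added independently; a carry out of one element never reaches its neighbour.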

Compilers frequently use SSE/SSE2 instructions like movdqa to zero or copy memory in 16-byte chunks, as well as to "vectorize" (use SIMD computations for) loops over arrays. GCC 7 or 8 and later also knows how to coalesce loads/stores of adjacent struct members or array elements into a single wider scalar load or store, using RAX for example.

e.g. this sum of an array:

int sumarr(const int *arr)
{
    int sum = 0;
    for(int i=0; i < 10240; i++) {
        sum += arr[i];
    }
    return sum;
}

compiles like this with GCC 9.3 -O3 for x86-64 on the Godbolt compiler explorer:

sumarr:
        lea     rax, [rdi+40960]            # endp = arr + size
        pxor    xmm0, xmm0
.L2:                                        # do {
        movdqu  xmm2, XMMWORD PTR [rdi]        # v = arr[i + 0..3]
        add     rdi, 16                        # p += 4
        paddd   xmm0, xmm2                     # sum += v  // packed addition of 4 elements
        cmp     rax, rdi
        jne     .L2                         # }while(p != endp)
   ... then a horizontal vector sum ...
        movd    eax, xmm0
        ret

Vectorization is sort of like parallelization, and for a reduction like this (summing an array down to a scalar) it requires the operation to be associative, because the elements get added in a different order. For example, an FP version would only vectorize with -ffast-math or with OpenMP.
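To make the reordering concrete, here is a scalar C model of what the vectorized loop above computes: four independent partial sums (one per 32-bit lane of xmm0), combined once at the end. This is only a sketch; the name sumarr_4lanes, the SUMARR_LEN constant, and the exact order of the final horizontal sum are assumptions for illustration, since the asm listing elides that part.

```c
#include <stddef.h>

enum { SUMARR_LEN = 10240 };  /* same fixed length as sumarr() above */

/* Scalar model of the SIMD loop: lane[j] plays the role of element j of xmm0.
   (Hypothetical function name; assumes the sums don't overflow int.) */
int sumarr_4lanes(const int *arr)
{
    int lane[4] = {0, 0, 0, 0};                 /* pxor xmm0, xmm0 */
    for (size_t i = 0; i < SUMARR_LEN; i += 4) {
        for (int j = 0; j < 4; j++)
            lane[j] += arr[i + j];              /* one paddd does all four of these */
    }
    /* horizontal sum: fold the four lanes down to one scalar */
    return (lane[0] + lane[2]) + (lane[1] + lane[3]);
}
```

For integers this gives the same result as the simple loop because addition is associative. With float or double, regrouping the additions like this can change the rounding, which is why the compiler needs permission to do it.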

When you pack multiple values into a general-purpose register like RAX, which doesn't have instructions to do SIMD addition without carry across element boundaries (like paddb xmm0, xmm1 would), it's called SWAR (SIMD within a register).
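As an illustration of the kind of workaround SWAR needs, the sketch below adds eight packed bytes inside one 64-bit integer while stopping carries at byte boundaries, roughly what paddb does for 16 bytes in one instruction. This is a commonly used trick rather than something taken from the answer, and the name swar_add_u8x8 is made up.

```c
#include <stdint.h>

/* Byte-wise wrapping add of eight packed uint8 values in a uint64_t.
   (Hypothetical helper name.) */
static uint64_t swar_add_u8x8(uint64_t a, uint64_t b)
{
    const uint64_t high = 0x8080808080808080ULL;  /* top bit of every byte */

    /* Add only the low 7 bits of each byte: the largest per-byte result is
       0x7F + 0x7F = 0xFE, so nothing carries into the next byte. */
    uint64_t low7 = (a & ~high) + (b & ~high);

    /* Bit 7 of each byte in low7 is the carry into the top bit; XOR in the
       two original top bits to finish the add without a carry-out. */
    return low7 ^ ((a ^ b) & high);
}
```

For example, adding 0x01 to a byte holding 0xFF wraps that byte to 0x00 and leaves the neighbouring byte untouched, whereas a plain 64-bit add would carry into it.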

This technique was more useful in the past, on ISAs without a proper SIMD instruction set like Alpha or MIPS64. But it's still possible, and SWAR techniques can be useful as part of something like a popcount without the popcnt instruction, e.g. masking out every other bit and shifting so you're effectively doing 32 separate additions (that can't overflow into each other) into 2-bit accumulators.

The popcnt bithack shown in "How to count the number of set bits in a 32-bit integer?" does that, widening to 4-bit counters then 8-bit, then using a multiply to shift-and-add by 4 different shifts and produce the sum in the high byte.
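A sketch of that bithack in C, written from the description above (the linked question may present it slightly differently, e.g. using a subtract for the first step; the name popcount32_swar is made up here):

```c
#include <stdint.h>

/* SWAR population count of a 32-bit value. (Hypothetical function name.) */
static uint32_t popcount32_swar(uint32_t x)
{
    /* 16 separate 1-bit + 1-bit additions into 2-bit accumulators */
    x = (x & 0x55555555u) + ((x >> 1) & 0x55555555u);
    /* widen to 4-bit counters: each holds 0..4 */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    /* widen to 8-bit counters: each holds 0..8, so one mask after the add is enough */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    /* multiply = shift-and-add by 0, 8, 16 and 24 bits; the total lands in the high byte */
    return (x * 0x01010101u) >> 24;
}
```

On a CPU with the popcnt instruction, a compiler builtin or intrinsic is normally the better choice; this version is useful when that instruction can't be assumed.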
