On a 64-bit x86 register, is it possible to hold more than one value at a time in the same register, if the size of a value is small enough that multiple instructions could fit into the register? For example, fitting two 32-bit ints into one register. Would this be a bad thing to do, if it is possible? I've been reading up on registers and I'm quite new to the concept.
Can a register hold multiple values at a time?
2.3k views. Asked by Luke Davis.
There are 2 answers
Answer by paxdiablo
Registers don't tend to hold instructions, they instead hold data to be worked on by instructions.
However, if you wanted to store instructions as data, the longest possible x86 instruction is fifteen bytes, or 120 bits. So, no, an instruction of that length won't fit into a single 64-bit register.
In terms of holding multiple data values in a single register, that is certainly possible. This is even supported by the hardware, with even the earliest x86 chips having ah and al which together formed the ax register.
Even without that, you can certainly insert/extract "sub-registers" into/from registers, by using the bitwise operations (like and, or, not and xor), and the bit shift operations (like shl, shr, rol, and ror).
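As a rough illustration of the shift-and-mask approach described above, here is a minimal C sketch that packs two 32-bit values into one 64-bit integer (which a compiler would normally keep in a single 64-bit register) and extracts them again. The helper names pack2x32, unpack_hi and unpack_lo are made up for this example.

```c
#include <stdint.h>

/* Hypothetical helpers, named for illustration only. */

/* Put `hi` in the upper 32 bits and `lo` in the lower 32 bits. */
static uint64_t pack2x32(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Extract the upper half with a right shift. */
static uint32_t unpack_hi(uint64_t packed)
{
    return (uint32_t)(packed >> 32);
}

/* Extract the lower half by masking off the upper 32 bits. */
static uint32_t unpack_lo(uint64_t packed)
{
    return (uint32_t)(packed & 0xFFFFFFFFu);
}
```

For instance, pack2x32(1, 2) yields 0x0000000100000002, and the two unpack helpers return 1 and 2 respectively. Whether this is worthwhile depends on the situation: the extra shifts and masks cost instructions, so it mostly pays off when registers or memory bandwidth are the bottleneck.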
Registers don't hold instructions, but I'll assume you meant fitting multiple values into one register, so that you can add them both with one instruction.
Yes, this is called SIMD (Single Instruction, Multiple Data). On x86-64, SSE2 (Streaming SIMD Extensions) is guaranteed to be available, so you have sixteen different 16-byte registers (xmm0..15). And you have instructions that can do packed FP add/sub/mul/div/sqrt/cmp of 4x 32-bit floats or 2x 64-bit doubles, and packed integer add/sub/cmp/shift/etc for byte, word, dword, and qword operand-sizes.
(With some gaps; SSE2 is not very orthogonal, e.g. narrowest shift is 16-bit, packed min/max only available for certain sizes. Some of these gaps are filled in by SSE4.1).
And bitwise-boolean stuff where element width is irrelevant (until AVX512 with mask registers...)
See https://www.felixcloutier.com/x86/.
p... instructions like paddw are packed-integer. ...ps and ...pd are floating point packed-single or packed-double.
Compilers frequently use SSE/SSE2 instructions like movdqa to zero or copy memory in 16-byte chunks, as well as to "vectorize" (use SIMD computations) for loops over arrays. And GCC 7 or 8 and later know how to coalesce loads/stores of adjacent struct members or array elements into a scalar load or store using RAX, for example.
e.g. this sum of an array:
compiles like this with GCC 9.3 -O3 for x86-64 on the Godbolt compiler explorer
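A minimal sketch of the kind of array-sum loop being described. The function name sum_array and the choice of unsigned 32-bit elements are assumptions for illustration, not the original listing. With optimization enabled (e.g. -O3), GCC typically auto-vectorizes a loop of this shape into packed 32-bit adds, though the exact output depends on compiler version and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example function: a plain scalar reduction.
 * Unsigned addition wraps around and is associative, so the compiler
 * is free to sum several elements per instruction (one per SIMD lane)
 * and combine the partial sums at the end. */
uint32_t sum_array(const uint32_t *arr, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += arr[i];
    }
    return sum;
}
```

Nothing in the source asks for SIMD explicitly. The loop is written as ordinary scalar code, and vectorization is left to the compiler.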
Vectorization is sort of like parallelization, and for a reduction like this (summing an array down to a scalar) it requires associative operations. e.g. an FP version would only vectorize with -ffast-math or with OpenMP.
In a general purpose register like RAX, which doesn't have instructions to do SIMD addition without carry across byte boundaries (like paddb xmm0, xmm1 would), it's called SWAR (SIMD within a register).
This technique was more useful in the past, on ISAs without a proper SIMD instruction set like Alpha or MIPS64. But it's still possible, and SWAR techniques can be useful as part of something like a popcount without the popcnt instruction, e.g. masking out every other bit and shifting so you're effectively doing 16 separate additions on a 32-bit integer (32 on a 64-bit register) that can't overflow into each other, into 2-bit accumulators.
The popcnt bithack shown in How to count the number of set bits in a 32-bit integer? does that, widening to 4-bit counters then 8-bit, then using a multiply to shift-and-add by 4 different shifts and produce the sum in the high byte.
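For reference, a sketch of that widely known 32-bit SWAR popcount in C. The function name popcount32 is chosen here for illustration. The steps follow the description above: 2-bit counters, then 4-bit, then 8-bit, then a multiply that sums the four bytes into the high byte.

```c
#include <stdint.h>

/* Hypothetical name; the well-known SWAR bit-count for 32-bit values. */
static uint32_t popcount32(uint32_t v)
{
    /* Each 2-bit field now holds the count of set bits in that pair. */
    v = v - ((v >> 1) & 0x55555555u);
    /* Add neighbouring 2-bit counts into 4-bit fields (max value 4). */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
    /* Add neighbouring 4-bit counts into 8-bit fields (max value 8). */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;
    /* Multiplying by 0x01010101 adds the four bytes shifted by 0, 8,
     * 16 and 24 bits, leaving the total in the top byte. */
    return (v * 0x01010101u) >> 24;
}
```

On CPUs that have a popcnt instruction, a compiler builtin or intrinsic is normally preferable. This version is mainly useful as an illustration of SWAR, or as a fallback when the instruction isn't available.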