As I understand it, the LSU (Load/Store Unit) in a RISC architecture like Arm handles load/store instructions, and the DMA (Direct Memory Access) unit is responsible for moving data independently of the processor: memory to memory, peripheral to memory, etc. What I am confused about is which one handles the prefetching of instructions or data for the branch predictor or the instruction/data cache. Since prefetching is not an instruction but an automatic process to speed up the processor, is this job handled by DMA? I am confused because the DMA unit is shown as an external unit in the example design given in the Arm Cortex-M85 Technical Reference Manual.
Based on the comment question to Jake's answer
DMA is generally specific to the chip, not the core (so not an ARM thing, as already answered). A number of MCUs have DMA built in, so that, for example, you can set up some sort of data transfer and the peripheral can go get the data for you, rather than you having to service interrupts within a certain amount of time or poll. Due to limited resources and/or continuous data transfer, it may have a buffer with a watermark or ping-pong buffers, which gives you time to prepare the next buffer while the peripheral uses DMA to transfer from the current one.
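To make the ping-pong idea concrete, here is a toy software model, not real driver code. The names `PingPong`, `cpu_fill`, `dma_complete` and `stream` are made up for illustration; on a real MCU the swap would be driven by the DMA controller's transfer-complete interrupt, and the fill would overlap in time with the transfer.

```python
class PingPong:
    """Toy model of two buffers shared by a CPU (producer) and a DMA engine."""

    def __init__(self, size):
        self.size = size
        self.buffers = [[0] * size, [0] * size]
        self.dma_index = 0  # buffer currently owned by the DMA engine

    @property
    def cpu_index(self):
        # The CPU always owns the buffer the DMA engine is not using.
        return 1 - self.dma_index

    def cpu_fill(self, data):
        # CPU prepares the next block in the buffer it owns.
        if len(data) != self.size:
            raise ValueError("block must match the buffer size")
        self.buffers[self.cpu_index][:] = data

    def swap(self):
        # Exchange ownership of the two buffers.
        self.dma_index = self.cpu_index

    def dma_complete(self):
        # Stand-in for the "transfer complete" interrupt: report what the
        # peripheral received, then hand the other buffer to the DMA engine.
        drained = list(self.buffers[self.dma_index])
        self.swap()
        return drained


def stream(blocks, size):
    """Push blocks through the buffer pair; return what the peripheral saw."""
    pp = PingPong(size)
    sent = []
    pp.cpu_fill(blocks[0])   # prime the first buffer
    pp.swap()                # hand it to the DMA engine
    for block in blocks[1:]:
        pp.cpu_fill(block)   # in hardware this overlaps with the transfer
        sent.append(pp.dma_complete())
    sent.append(pp.dma_complete())
    return sent
```

The point of the model is only the ownership rule: at any moment the CPU writes one buffer and the DMA engine reads the other, so neither has to wait for the other mid-block.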
Do not assume that DMA is free or fast. Many folks make that mistake; it is very much based on the system design. Sometimes the DMA transfers happen during unused bus slots and for the most part feel free. Some designs intentionally leave slots just in case you are doing DMA. I think it is wasteful, but I have seen that. And also there are designs (ARM-based even) where the DMA takes over the bus for a period of time and the CPU essentially is stalled: as soon as it needs to touch that bus (fetching or load/store) it is stalled until DMA completes.
Ask yourself: in your design, do you have data transfers in/out of a peripheral that you do not have storage for in the peripheral, so that you want to use the SRAM used by the processor? Call it DMA or just an arbiter, but you will then want to design your SRAM interface so that either the ARM or the peripheral can access the SRAM. Ideally this happens without too much performance pain on either one, and/or you let the programmer choose some rate: DMA gets only one transfer every X clocks...
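The trade-off between "DMA hogs the bus" and "DMA gets one transfer every X clocks" can be sketched with a small cycle-by-cycle model. This is a simplification under stated assumptions: a single-ported SRAM, one access per clock, DMA winning arbitration on its slot, and a hypothetical function name `simulate`. Real arbiters and bus fabrics are more involved.

```python
def simulate(cpu_ops, dma_words, dma_interval):
    """Model a single-port SRAM shared by a CPU and a DMA engine.

    cpu_ops      -- list of bools, True if that CPU step needs the SRAM
    dma_words    -- number of words the DMA engine has to move
    dma_interval -- DMA may take the bus only on every Nth clock (N >= 1)

    Returns (total_cycles, cpu_stall_cycles).
    """
    if dma_interval < 1:
        raise ValueError("dma_interval must be at least 1")
    cycle = 0
    stalls = 0
    pc = 0                    # index of the next CPU step
    remaining = dma_words
    while pc < len(cpu_ops) or remaining > 0:
        dma_slot = remaining > 0 and cycle % dma_interval == 0
        cpu_active = pc < len(cpu_ops)
        cpu_wants_bus = cpu_active and cpu_ops[pc]
        if dma_slot:
            remaining -= 1    # DMA wins arbitration on its slot
            if cpu_wants_bus:
                stalls += 1   # CPU waits for the bus this cycle
            elif cpu_active:
                pc += 1       # CPU step that does not touch the SRAM
        elif cpu_active:
            pc += 1           # bus is free for the CPU (or not needed)
        cycle += 1
    return cycle, stalls
```

With `dma_interval=1` and a CPU that needs the bus on every step, `simulate([True] * 4, 4, 1)` gives 8 cycles with 4 stalls: the CPU is held off until the DMA burst finishes. Spreading the DMA out with `dma_interval=4` cuts the stalls to 2 but the transfer takes 13 cycles to finish. If the CPU is not using the bus at all, `simulate([False] * 4, 4, 1)` completes in 4 cycles with no stalls, which is the case where DMA "feels free".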
Or do you have storage on the peripheral for a whole transfer, but moving that transfer to/from SRAM for the processor to operate would burn a fair amount of load/store operations on the processor? And that may also want a DMA transfer capability so that the processor can fire and forget and poll or wait for an interrupt to know the transfer has completed.
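As a rough illustration of "fire and forget, then poll or wait for an interrupt", here is a toy model of a DMA engine. `DmaEngine`, its `busy` flag, `tick` and the `on_done` callback are hypothetical names standing in for a status register, the clock, and a completion interrupt; no real controller's programming model is implied.

```python
class DmaEngine:
    """Toy DMA engine: copies one word per clock once started."""

    def __init__(self, on_done=None):
        self.busy = False        # stand-in for a status register bit
        self._on_done = on_done  # stand-in for a completion interrupt
        self._src = None
        self._dst = None
        self._pos = 0

    def start(self, src, dst):
        # "Fire": the CPU programs the transfer and returns immediately.
        if self.busy:
            raise RuntimeError("transfer already in progress")
        if not src or len(dst) < len(src):
            raise ValueError("need a non-empty source and a large enough destination")
        self._src, self._dst, self._pos = src, dst, 0
        self.busy = True

    def tick(self):
        # Advance one clock: move one word if a transfer is active.
        if not self.busy:
            return
        self._dst[self._pos] = self._src[self._pos]
        self._pos += 1
        if self._pos == len(self._src):
            self.busy = False
            if self._on_done:
                self._on_done()


def copy_with_polling(src):
    """Start a transfer, then poll the busy flag while doing other work."""
    dst = [0] * len(src)
    dma = DmaEngine()
    dma.start(src, dst)
    other_work = 0
    while dma.busy:
        other_work += 1  # the CPU is free to do something useful here
        dma.tick()       # in hardware the clock advances by itself
    return dst, other_work
```

In the interrupt style, the processor would pass `on_done` instead of polling `busy`; either way the load/store unit is not spending instructions on moving each word.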
ARM docs just get you the ARM bus; your system is not necessarily ARM bus. Your SRAM doesn't have an ARM bus (nor does your DDR controller on a larger system), nor do the peripherals, etc., generally. The interface is often driven by the peripheral or SRAM, so you are already gluing it all together, as you know. That is usually where the DMA lives. You would buffer up ARM transfers in your logic (you would anyway), as well as peripheral-driven ones if the peripheral can be a bus master, and then arbitrate the shared resource.
Recommendations for resources are certainly not what this site is for, and asking for them is a quick way to get a question closed.
I'm confused as to why you are asking this because if you have the resources to actually build a chip, this is all basic chip design stuff. And to build something with an ARM in it (I guess other than educational FPGA work) really adds to the cost.
At the end of the day, do you have peripherals/transfers that you don't want to overly burden the processor with, or that the processor cannot handle due to bus timing, interrupt latency, etc.? "Overly burdened" would start off with senior members of the software team warning you that if you try to go into production with this design, they will not write software to support it and it will fail. Historically there was a wall between these teams, but these days, with pretty much all chip startups failing, silicon, hardware, and software teams all need to work together from the inception of the chip and through simulation and emulation.
Knowing your partners allows for give and take: if you give me DMA on this one then your FIFO can be smaller or slower; I want to be able to poll my way through it for various reasons but also have an interrupt with at least a 50% watermark (or ping pong buffers). So I can offer you some logic that makes this software task much easier if you are interested, a CRC engine or hashing, etc. - trivial for me, time consuming for you. And so on.
The real bottom line is to work with your software and hardware folks (hardware meaning PCB, putting the part on a board with other components, packaging, electrical specs, etc.). Between your thoughts/experience on peripheral implementation and the software/hardware teams' experience, you should quickly converge on the data transfer solutions for all the peripherals inside and outside the chip. Not all of them will necessarily want DMA, nor use the same engine if you make DMA its own engine.