Constant cache vs Texture cache for broadcasting behaviour in CUDA

I am interested in the differences between the constant cache and the texture cache for devices of compute capability 3.5, particularly the broadcasting behaviour. When all threads in a warp request the same data element from constant memory and the request hits in the cache, the value is broadcast to all threads in a single cycle. What is the behaviour of the texture cache in this case? Do the loads get serialised?

Also, am I correct in thinking that both the constant and texture caches are per multiprocessor and hence shared by multiple blocks?

There is 1 answer

Greg Smith On

NVIDIA does not provide additional details on the size or location of the constant cache.

The number of texture caches varies with compute capability (CC):

  • CC 2.0: 1 texture unit per SM
  • CC 2.1: 2 texture units per SM (1 per warp scheduler)
  • CC 3.0/3.5: 4 texture units per SM (1 per warp scheduler)
  • CC 3.2/GK208: 2 texture units per SM (1 per 2 warp schedulers)

The warps of a block are distributed across the warp schedulers in an SM.

If all 32 threads in a warp perform an indexed constant read to the same address, the read is performed in 1 instruction issue, provided the request hits in the cache.
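As an illustration of the uniform constant read described above, here is a minimal sketch. The names `coeffs`, `scale_uniform`, and `run_scale_uniform` are hypothetical and not part of the original answer. The index is passed as a kernel argument, so every thread in a warp requests the same constant address; the sketch only checks the result and does not measure issue counts.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical lookup table placed in constant memory.
__constant__ float coeffs[16];

// idx is a kernel argument, so it is identical for all threads:
// all 32 lanes of a warp read the same element of coeffs.
__global__ void scale_uniform(const float* in, float* out, int n, int idx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * coeffs[idx];
}

// Host helper: runs the kernel and returns true if every output is correct.
bool run_scale_uniform(int n, int idx)
{
    float h_coeffs[16];
    for (int k = 0; k < 16; ++k) h_coeffs[k] = 1.0f + k;
    if (cudaMemcpyToSymbol(coeffs, h_coeffs, sizeof(h_coeffs)) != cudaSuccess)
        return false;

    std::vector<float> h_in(n, 2.0f), h_out(n, 0.0f);
    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    scale_uniform<<<(n + 255) / 256, 256>>>(d_in, d_out, n, idx);

    // The blocking copy also surfaces any error from the kernel launch.
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (float v : h_out)
        if (v != 2.0f * h_coeffs[idx]) return false;
    return true;
}

int main()
{
    std::printf("uniform constant read: %s\n",
                run_scale_uniform(1024, 3) ? "ok" : "failed");
    return 0;
}
```

If the index instead depended on `threadIdx.x`, the threads of a warp could request different constant addresses, which is no longer the single-address case discussed here.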

If all 32 threads in a warp perform an LDG to the same address through the CC 3.5 texture cache, the data will be requested and returned over 8 cycles.
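For comparison, a same-address load through the read-only (texture) data path can be sketched with the `__ldg()` intrinsic, which is available on devices of compute capability 3.5 and higher. The kernel and helper names below (`scale_ldg`, `run_scale_ldg`) are hypothetical, and the sketch only verifies the result; it makes no attempt to confirm the cycle count quoted above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// All threads load table[idx] through the read-only data cache.
// __ldg requires compute capability 3.5 or higher.
__global__ void scale_ldg(const float* __restrict__ in,
                          const float* __restrict__ table,
                          float* __restrict__ out, int n, int idx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * __ldg(&table[idx]);
}

// Host helper: runs the kernel and returns true if every output is correct.
bool run_scale_ldg(int n, int idx)
{
    std::vector<float> h_table(16), h_in(n, 2.0f), h_out(n, 0.0f);
    for (int k = 0; k < 16; ++k) h_table[k] = 1.0f + k;

    float *d_in = nullptr, *d_table = nullptr, *d_out = nullptr;
    bool ok = cudaMalloc(&d_in, n * sizeof(float)) == cudaSuccess
           && cudaMalloc(&d_table, 16 * sizeof(float)) == cudaSuccess
           && cudaMalloc(&d_out, n * sizeof(float)) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(d_table, h_table.data(), 16 * sizeof(float),
                   cudaMemcpyHostToDevice);

        scale_ldg<<<(n + 255) / 256, 256>>>(d_in, d_table, d_out, n, idx);

        // The blocking copy also surfaces any error from the kernel launch.
        ok = cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    // cudaFree accepts null pointers, so partial allocation is handled.
    cudaFree(d_in);
    cudaFree(d_table);
    cudaFree(d_out);
    if (!ok) return false;

    for (float v : h_out)
        if (v != 2.0f * h_table[idx]) return false;
    return true;
}

int main()
{
    std::printf("uniform __ldg read: %s\n",
                run_scale_ldg(1024, 3) ? "ok" : "failed");
    return 0;
}
```

Assuming a toolkit that still supports the target, this would be built with something like `nvcc -arch=sm_35`; on later architectures `__ldg` still compiles, but the cache organisation differs from the CC 3.5 case discussed in the answer.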