TechQA.

Question

On today's GPUs, can warps be recombined dynamically?

score 47 · Answer 1 · 2023-11-05T18:00:44.897000

47 views Asked by Armin Rigo At 05 November 2023 at 18:00

score 178 · Answer 2 · 2023-09-28T00:19:29.357000

CUDA __shfl_down_sync does not work with __match_any_sync

178 views Asked by SnowSR At 28 September 2023 at 00:19

score 2151 · Answer 3 · 2023-04-27T18:54:47.483000

What is warp shuffling in CUDA and why is it useful?

2.1k views Asked by gonidelis At 27 April 2023 at 18:54

score 159 · Answer 4 · 2022-11-15T12:20:19.553000

Compute per-warp histogram without shared memory

159 views Asked by pem At 15 November 2022 at 12:20

score 619 · Answer 5 · 2022-07-08T04:42:50.273000

Why is my CUDA warp shuffle sum using the wrong offset for one shuffle step?

619 views Asked by nanofarad At 08 July 2022 at 04:42

score 546 · Answer 6 · 2022-06-17T21:31:26.767000

Are threads in a multi-dimensional CUDA kernel blocks packed to fill warps?

546 views Asked by einpoklum At 17 June 2022 at 21:31

score 645 · Answer 7 · 2022-03-22T18:41:50.587000

In CUDA, how can I get this warp's thread mask in conditionally executed code (in order to execute e.g., __shfl_sync or <cg>.shfl)?

645 views Asked by sg_man At 22 March 2022 at 18:41

score 442 · Answer 8 · 2022-01-26T16:40:36.710000

Monitor active warps and threads during a divergent CUDA run

442 views Asked by Silicomancer At 26 January 2022 at 16:40

score 259 · Answer 9 · 2021-10-16T14:23:05.947000

Pre 8.x equivalent of __reduce_max_sync() in CUDA

259 views Asked by Serge Rogatch At 16 October 2021 at 14:23

score 1025 · Answer 10 · 2020-01-23T13:06:01.373000

What's the alternative for __match_any_sync on compute capability 6?

1k views Asked by Johan At 23 January 2020 at 13:06

score 299 · Answer 11 · 2019-07-23T12:36:34.463000

Why use thread blocks larger than the number of cores per multiprocessor

299 views Asked by Numaerius At 23 July 2019 at 12:36

score 2035 · Answer 12 · 2019-01-08T11:27:34.523000

CUDA shared memory and warp synchronization

2k views Asked by nglee At 08 January 2019 at 11:27

score 4572 · Answer 13 · 2019-01-05T18:54:03.450000

activemask() vs ballot_sync()

4.5k views Asked by Fabio T. At 05 January 2019 at 18:54

score 1013 · Answer 14 · 2018-12-08T18:06:21.190000

OpenGL compute shader mapping to nVidia warps

1k views Asked by Danol At 08 December 2018 at 18:06

score 84 · Answer 15 · 2018-05-08T07:32:26.283000

Warp scheduling in Kepler GPU

84 views Asked by StrikeW At 08 May 2018 at 07:32

score 3871 · Answer 16 · 2018-03-09T16:19:24.490000

Warp shuffling for CUDA

3.8k views Asked by Timocafé At 09 March 2018 at 16:19

score 1803 · Answer 17 · 2018-03-08T00:24:47.633000

CUDA Reduction: Warp Unrolling (School)

1.8k views Asked by Michael Choi At 08 March 2018 at 00:24

score 315 · Answer 18 · 2018-02-07T00:11:02.677000

How do I do the converse of shfl.idx (i.e. warp scatter instead of warp gather)?

315 views Asked by einpoklum At 07 February 2018 at 00:11

score 734 · Answer 19 · 2018-01-04T16:29:39.490000

Do modern nVIDIA GPUs perform sub-warp scheduling of work?

734 views Asked by einpoklum At 04 January 2018 at 16:29

score 645 · Answer 20 · 2017-09-27T22:15:17.227000

Some intrinsics named with `_sync()` appended in CUDA 9; semantics same?

645 views Asked by einpoklum At 27 September 2017 at 22:15
