Nvidia visual profiler not showing cudaMalloc() after kernel launch

Question


176 views Asked by progammer At 23 October 2018 at 09:43

I am trying to write a program that runs almost entirely on the GPU, with very little interaction with the host. initKernel is the first kernel launched from the host. From initKernel, I use dynamic parallelism to launch successive kernels, two of which are thrust::sort(thrust::device,...).

Before launching initKernel, I call cudaMalloc() in the host code, and that call is shown under Runtime API in the Visual Profiler. However, none of the cudaMalloc() calls made in the __device__ functions and successive kernels (after initKernel is launched) are shown there. Can someone help me understand why I cannot see the cudaMallocs in the Visual profiler?

Thank you for your time.


There is 1 answer

**Robert Crovella** · Accepted Answer · 2018-10-23T13:22:01+00:00


> Can someone help me understand why I cannot see the cudaMallocs in the Visual profiler?

Because it is a documented limitation of the tool. From the documentation:

> The Visual Profiler timeline does not display CUDA API calls invoked from within device-launched kernels.
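To make the distinction concrete, here is a minimal sketch of the situation described in the question. It is an illustration under stated assumptions, not the asker's actual code: only `initKernel` comes from the question, while `childKernel`, `scratch`, and `runDemo` are hypothetical names. It also assumes a build with relocatable device code, for example `nvcc -rdc=true demo.cu -o demo -lcudadevrt`, on a GPU that supports dynamic parallelism.

```cuda
// Sketch only: contrasts a host-side cudaMalloc with a device-side one.
// Assumed build line: nvcc -rdc=true demo.cu -o demo -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical child kernel, launched from the device.
// It uses a buffer that was allocated on the device, then releases it.
__global__ void childKernel(int *scratch, int *out)
{
    scratch[0] = 42;
    *out = scratch[0];
    cudaFree(scratch);  // device-side API call
}

// First kernel, launched from the host (name taken from the question).
__global__ void initKernel(int *out)
{
    int *scratch = nullptr;

    // Device-side cudaMalloc: per the limitation quoted above, API calls
    // like this one are not displayed on the Visual Profiler timeline.
    if (cudaMalloc((void **)&scratch, sizeof(int)) != cudaSuccess ||
        scratch == nullptr) {
        *out = -1;  // report allocation failure to the host
        return;
    }

    // Device-side kernel launch (dynamic parallelism).
    childKernel<<<1, 1>>>(scratch, out);
}

// Hypothetical host helper: returns the value written by the child kernel,
// or a negative code if something failed.
int runDemo()
{
    int *d_out = nullptr;

    // Host-side cudaMalloc: this is the kind of call the question reports
    // seeing under Runtime API in the profiler.
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -2;

    initKernel<<<1, 1>>>(d_out);

    // The copy waits for initKernel and the child grid it launched.
    int h_out = -3;
    cudaError_t err =
        cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? h_out : -4;
}

int main()
{
    printf("result = %d\n", runDemo());
    return 0;
}
```

Profiling a program shaped like this should show the host-side `cudaMalloc`, `cudaMemcpy`, and `cudaFree` calls, while the calls made inside `initKernel` and `childKernel` fall under the limitation quoted above. Note that memory obtained from a device-side `cudaMalloc` has to be freed on the device as well, which is why the child kernel releases `scratch` itself.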
