NvidiaGpuDriverLinux fails to install on NC6 instance


Pretty much what the title says. The VM is a "Standard NC6s v3" running Linux (Ubuntu 20.04), a size that comes with an NVIDIA Tesla V100. I added the NVIDIA GPU Driver Extension when I provisioned the machine.

The deployment itself is stuck in the "Transitioning" state.


I'm able to connect to the VM and can confirm that there's a background apt-get task running:

> ps -aux | grep 2736
0:01 apt-get -o Dpkg::Options::=--force-overwrite --no-install-recommends install -y cuda-drivers
0:00 /usr/bin/perl -w /usr/share/debconf/frontend /usr/lib/dkms/common.postinst nvidia 530.30.02 /usr/share/nvidia x86_64

It's been more than 40 mins. How long should this take to complete (if it would complete at all)?

Answer by HowAreYou (best answer)

The issue with the NvidiaGpuDriverLinux extension being stuck in a transition state seems to be intermittent. I tried provisioning a Linux VM with the same extension and configuration in my environment. The first attempt failed, but when I tried again with the same configuration, it succeeded.

"It's been more than 40 mins. How long should this take to complete (if it would complete at all)?"

The deployment usually takes 10-15 minutes, sometimes up to 30 minutes. However, if the extension remains in the transitioning state for more than 30 minutes, it is possible that the deployment of the extension has failed.
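If you would rather not watch the portal, a polling loop along these lines could apply the 30-minute rule of thumb for you. This is only a sketch: it assumes the Azure CLI is installed and logged in, that `az vm extension show` reports a `provisioningState` field for the extension, and that anything other than `Succeeded` or `Failed` means "still in progress". The function names, `AZ`, and `POLL_SECONDS` are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: poll the NvidiaGpuDriverLinux extension until it finishes or a timeout passes.
# AZ can be overridden (e.g. for a dry run); POLL_SECONDS is the delay between checks.
AZ="${AZ:-az}"
POLL_SECONDS="${POLL_SECONDS:-60}"

# Print the extension's provisioning state for resource group $1 and VM $2.
extension_state() {
  "$AZ" vm extension show \
    --resource-group "$1" --vm-name "$2" \
    --name NvidiaGpuDriverLinux \
    --query provisioningState --output tsv
}

# Return 0 on Succeeded, 1 on Failed, 2 if max_minutes (default 30) elapse first.
wait_for_extension() {
  rg="$1"; vm="$2"; max_minutes="${3:-30}"
  deadline=$(( $(date +%s) + max_minutes * 60 ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    state="$(extension_state "$rg" "$vm")"
    echo "state: $state"
    case "$state" in
      Succeeded) return 0 ;;
      Failed)    return 1 ;;
    esac
    sleep "$POLL_SECONDS"
  done
  echo "timed out after ${max_minutes} minutes" >&2
  return 2
}
```

For example, `wait_for_extension myResourceGroup myVM 30` (both names are placeholders) would give up after 30 minutes, at which point redeploying the extension is probably the next step.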

You can try redeploying the extension from the 'Extensions + applications' tab of the VM by following the steps below, or create a new VM:

1. Delete the failed extension.
2. Install the extension again.
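The same delete-and-reinstall steps can probably be done from the Azure CLI instead of the portal. The following is a rough sketch, not a tested procedure: the resource group and VM names are placeholders, the function name and the `AZ` override are hypothetical, and the publisher `Microsoft.HpcCompute` is taken from the extension documentation referenced below.

```shell
#!/usr/bin/env bash
# Sketch: remove and reinstall the NVIDIA GPU driver extension with the Azure CLI.
# AZ can be overridden (e.g. AZ=echo) to print the commands instead of running them.
AZ="${AZ:-az}"

# Usage: redeploy_gpu_extension <resource-group> <vm-name>
redeploy_gpu_extension() {
  # Step 1: delete the failed extension; stop here if that does not work.
  "$AZ" vm extension delete \
    --resource-group "$1" --vm-name "$2" \
    --name NvidiaGpuDriverLinux || return 1

  # Step 2: install the extension again (no --version given, so the CLI picks one).
  "$AZ" vm extension set \
    --resource-group "$1" --vm-name "$2" \
    --name NvidiaGpuDriverLinux \
    --publisher Microsoft.HpcCompute
}
```

Running `AZ=echo; redeploy_gpu_extension myResourceGroup myVM` first is a cheap way to see what would be executed before pointing it at a real VM.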


References: NVIDIA GPU Driver Extension for Linux | Microsoft Documentation