gcloud GPU VM ERROR: Unable to load the kernel module 'nvidia.ko'

I created a VM instance on Google Cloud with the following configuration:

  • 1 Nvidia V100 GPU
  • Deep Learning on Linux system
  • Deep Learning VM for Pytorch 2.1 with CUDA 12.1 M118

On the first start of the machine, I am asked:

This VM requires Nvidia drivers to function correctly.   Installation takes ~1 minute.
Would you like to install the Nvidia driver? [y/n] y

After 30 seconds I get:

ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release.

and in Python I can see that CUDA is not available:

>>> import torch
>>> torch.__version__
'2.0.0+cu118'
>>> torch.cuda.is_available()
False
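
To narrow down which of the causes listed in the error message applies, one option is to check which kernel modules are actually loaded. The following is only a minimal sketch, not a tool from Google or NVIDIA: the function names `loaded_modules` and `diagnose` are hypothetical, and the only thing it assumes is the standard Linux `/proc/modules` file, where the first field of each line is the module name.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def diagnose(modules):
    """Map the loaded-module set to one of the causes named in the error.

    This is a rough heuristic (an assumption of this sketch), not a
    definitive diagnosis.
    """
    if "nvidia" in modules:
        return "nvidia module is loaded; the driver itself is probably fine"
    if "nouveau" in modules:
        return "nouveau is loaded and may be blocking the NVIDIA module"
    return "neither nvidia nor nouveau is loaded; check for a kernel/driver build mismatch"


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    if proc_modules.exists():
        print(diagnose(loaded_modules(proc_modules.read_text())))
    else:
        print("/proc/modules not found; this check only works on Linux")
```

If neither module shows up, the "wrong or improperly configured kernel sources" part of the error would be the next thing to look at, for example by comparing the running kernel version with the one the driver was built against.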

I am using the VM images provided by Google. What should I do to get the driver installed?
