CUDA 11.1, Torch 1.8 + Conda: issues trying to build Apex. TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'


I started from a clean environment (I think) and removed cuda and the nvidia-related drivers. From a fresh conda environment, I did the following:

conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge

Following this, I installed nvidia-driver-450 with apt and restarted my computer. torch.cuda.is_available() returns True; however, when I try to build apex I get the following error:

  TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
  error: subprocess-exited-with-error
  

Supposedly this occurs when you try to build the repo without a GPU, but I don't understand why that would apply here.

Side question that I can't figure out: I deleted cuda from /usr/local because I kept having issues, and for the most part the cudatoolkit installed by conda seems to work fine. Do I need to apt install cuda? I would really like not to, if possible.

I attempted multiple times to install the relevant drivers and wound up restarting from scratch as described above. I also tried setting TORCH_CUDA_ARCH_LIST to "compute capability", but this did not work.

EDIT: Figured it out. conda install cudatoolkit does NOT give you all the CUDA binaries. After that install you have to set CUDA_HOME to CONDA_PREFIX; however, nvcc won't be in the conda bin directory. So I need to install a development version like the one described here: ibm.com/docs/en/wmlce/…. I'll just keep this open for future people.
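For future readers, here is a quick way to check whether the environment actually has nvcc before attempting the build. This is only a minimal sketch: find_nvcc is a hypothetical helper name, and falling back from CUDA_HOME to CONDA_PREFIX is an assumption that mirrors the setup described above. The TypeError is consistent with a build script concatenating a CUDA path that resolved to None with a string, so a missing CUDA_HOME or a missing nvcc is worth ruling out first.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_nvcc(cuda_home: Optional[str]) -> Optional[str]:
    """Return the path to <cuda_home>/bin/nvcc, or None if it does not exist."""
    if not cuda_home:
        # An unset or empty CUDA_HOME cannot contain nvcc.
        return None
    candidate = Path(cuda_home) / "bin" / "nvcc"
    return str(candidate) if candidate.is_file() else None


# Assumption: use CUDA_HOME if set, otherwise try the active conda prefix.
cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CONDA_PREFIX")

print("CUDA home candidate:", cuda_home)
print("nvcc under that directory:", find_nvcc(cuda_home))
# Also report any nvcc that happens to be on PATH.
print("nvcc on PATH:", shutil.which("nvcc"))
```

If both nvcc lines print None, the toolkit in the environment is runtime-only and a compiler-bearing (development) CUDA package is still needed before Apex can compile its extensions.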
