The PyTorch documentation on torch.save and torch.load contains the following section:
Save on CPU, Load on GPU
Save:
torch.save(model.state_dict(), PATH)
Load:
device = torch.device("cuda")
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH, map_location="cuda:0")) # Choose whatever GPU device number you want
model.to(device)
When loading a model on a GPU that was trained and saved on CPU, set the map_location argument in the torch.load() function to cuda:device_id. This loads the model to a given GPU device. Next, be sure to call model.to(torch.device('cuda')) to convert the model’s parameter tensors to CUDA tensors.
Isn't calling .to(device) in this example redundant, since calling torch.load with map_location must have already placed it on the GPU?
Moreover, the text says "Next, be sure to call model.to(torch.device('cuda')) to convert the model’s parameter tensors to CUDA tensors". But wasn't that conversion already done during load with map_location?
Using torch.load with map_location set to a CUDA device will load the state dict onto that device. However, when load_state_dict is called, the loaded tensors are copied into the model's existing parameters, which stay on whatever device the model is already on, in this case the CPU. If you test your code without the model.to(device), you will notice your model is, in fact, not on the GPU: the state was on the GPU, but the model never was. You can read more here. This means you are still required to call model.to(device).
You can avoid that by first transferring the model to the CUDA device and then loading your state. But overall the number of transfers to the GPU is the same, two: first the initialized model, then the state.
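For instance, you can verify this with a quick check (a minimal sketch; TheModelClass, args, and PATH are the placeholders from the question above):
model = TheModelClass(*args, **kwargs)
state = torch.load(PATH, map_location="cuda:0")  # state dict tensors are on the GPU
model.load_state_dict(state)                     # copied into the CPU-resident parameters
print(next(model.parameters()).is_cuda)          # False: the model itself is still on the CPU

model = TheModelClass(*args, **kwargs)
model.to(torch.device("cuda"))                   # 1st transfer: the freshly initialized parameters
model.load_state_dict(torch.load(PATH, map_location="cuda:0"))  # 2nd transfer: the saved state
print(next(model.parameters()).is_cuda)          # True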
You should be able to reduce the number of transfers by first loading the state dict on the CPU and then transferring the model to the GPU:
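Something along these lines should do it (again only a sketch, using the same placeholders):
device = torch.device("cuda")
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH, map_location="cpu"))  # state stays on the CPU, no GPU copy yet
model.to(device)                                             # single transfer: parameters and loaded state move together
This way the only host-to-device copy happens once, when the fully loaded model is moved to the GPU.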