Vilt Model causing RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

The ViLT model is causing RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu) after the new update.

I have already placed the model on the GPU and am then running the code below.
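
For context, the model is loaded and moved to the device roughly like this (the checkpoint name here is just an illustrative placeholder, not necessarily the one I use):

import torch
from transformers import ViltProcessor, ViltForQuestionAnswering

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Checkpoint name is illustrative; any ViLT checkpoint follows the same pattern
processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model.to(device)  # place the model on the GPU before training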

The same code runs fine in my other environments, but in my current environment I freshly installed the Hugging Face transformers library, and since then I have been facing a lot of issues with the same code.

Any insights would be helpful. I checked the SO solutions for this error on other models, but none of them helped, so I'm raising a new question.

Here is the fine-tuning code I have:

import torch
from tqdm import tqdm

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
torch.set_grad_enabled(True)  # Context-manager
model.train()

epochList, accList = [], []

for epoch in tqdm(range(20)):

    print(f"Epoch: {epoch}")

    for idx, batch in enumerate(train_dataloader):

        # Move every tensor in the batch to the same device as the model
        batch = {k: v.to(device) for k, v in batch.items()}

        optimizer.zero_grad()

        outputs = model(**batch)
        loss = outputs.loss

        print(idx, "-> Loss:", loss.item())

        loss.backward()
        optimizer.step()

        # Evaluate every 200 steps
        if (idx != 0) and (idx % 200 == 0):

            model.eval()

            acc_score_test = calculateAccuracyTest()
            acc_score_val = calculateAccuracyVal()

            print(f'\nValidation Accuracy: {acc_score_val}, Test Accuracy: {acc_score_test} \n')

            epochList.append((epoch * tot_number_of_steps) + idx)
            accList.append((acc_score_test, acc_score_val))

            model.train()
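
To rule out an obvious mismatch on my side, a quick sanity check along these lines (using the same variables as above) confirms where the model and a batch actually live:

# Sanity check (sketch): model parameters and every batch tensor should report the same CUDA device
print("model device:", next(model.parameters()).device)

sample = next(iter(train_dataloader))
sample = {k: v.to(device) for k, v in sample.items()}
for name, tensor in sample.items():
    print(name, "->", tensor.device, tuple(tensor.shape))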

The stack trace is huge; the bottom of it is:

~/miniconda3/envs/yolo/lib/python3.11/site-packages/transformers/models/vilt/modeling_vilt.py:219, in ViltEmbeddings.forward(self, input_ids, attention_mask, token_type_ids, pixel_values, pixel_mask, inputs_embeds, image_embeds, image_token_type_idx)
217 # PART 2: patch embeddings (with interpolated position encodings)
218 if image_embeds is None:
--> 219     image_embeds, image_masks, patch_index = self.visual_embed(
220         pixel_values, pixel_mask, max_image_length=self.config.max_image_length
221     )
222 else:
223     image_masks = pixel_mask.flatten(1)


~/miniconda3/envs/yolo/lib/python3.11/site-packages/transformers/models/vilt/modeling_vilt.py:186, in ViltEmbeddings.visual_embed(self, pixel_values, pixel_mask, max_image_length)
184 x = x[select[:, 0], select[:, 1]].view(batch_size, -1, num_channels)
185 x_mask = x_mask[select[:, 0], select[:, 1]].view(batch_size, -1)
--> 186 patch_index = patch_index[select[:, 0], select[:, 1]].view(batch_size, -1, 2)
187 pos_embed = pos_embed[select[:, 0], select[:, 1]].view(batch_size, -1, num_channels)
189 cls_tokens = self.cls_token.expand(batch_size, -1, -1)

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
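
Since the only change in this environment is the freshly installed transformers, the library versions are probably relevant for reproducing this; they can be dumped like so (I'm not listing the exact numbers here):

import torch
import transformers

# Compare these between the working envs and the failing one
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("cuda available:", torch.cuda.is_available())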