I am looking for a way to use a custom patch embedding layer with a vanilla ViT while keeping the rest of the ViT from a pre-trained model. Is there a way to do this in PyTorch?
I can load a pre-trained model or write the ViT code from scratch, but I want a setup where I can reuse the pre-trained weights for all the layers after the patch embedding.
Thanks,
It depends a lot on the architecture of the two models, but you can do something like this:
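A minimal sketch of that idea, where `CombinedModel` is just an illustrative wrapper name and you pass in your own modules:

```python
import torch.nn as nn

class CombinedModel(nn.Module):
    """Custom patch embedding followed by the pretrained ViT layers."""
    def __init__(self, embedding, vit_model):
        super().__init__()
        self.embedding = embedding    # your custom patch-embedding module
        self.vit_model = vit_model    # pretrained ViT layers after the embedding

    def forward(self, x):
        x = self.embedding(x)         # e.g. (batch, num_patches, embed_dim)
        return self.vit_model(x)      # run the pretrained trunk on the embedded patches
```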
In this example, `embedding` would be your custom embedding, and `vit_model` would be all the trained layers of the ViT model except the embedding. Depending on how the ViT model is structured, you may need to hack into it to extract the non-embedding layers in a way that allows you to simply pass an input to them.
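For instance, with torchvision's ViT you may not need to extract anything: the patch embedding lives in the `conv_proj` attribute, so you can swap that attribute for your own module as long as it produces the same `(batch, hidden_dim, H/patch, W/patch)` output. A sketch under that assumption (`MyPatchEmbed` is a hypothetical placeholder for your custom layer):

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

class MyPatchEmbed(nn.Module):
    """Hypothetical custom patch embedding; must output (B, 768, 14, 14) for ViT-B/16 at 224x224."""
    def __init__(self, in_channels=3, hidden_dim=768, patch_size=16):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, hidden_dim,
                              kernel_size=patch_size, stride=patch_size)
        # ... your custom logic here ...

    def forward(self, x):
        return self.proj(x)

model = vit_b_16(weights=ViT_B_16_Weights.DEFAULT)  # downloads pretrained weights
model.conv_proj = MyPatchEmbed()                    # replace only the patch embedding

out = model(torch.randn(1, 3, 224, 224))            # the rest of the ViT is reused as-is
print(out.shape)                                    # torch.Size([1, 1000])
```

If you are using timm instead, the analogous attribute is `patch_embed` on its ViT models, but either way check the source of the implementation you load to see where the embedding ends and the encoder begins.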