from_pretrained not loading custom fine-tuned model correctly: "encoder weights were not tied to the decoder"


In Google Colab I loaded a BERT model using the Hugging Face transformers library, fine-tuned it with Seq2SeqTrainer, and then saved the model to my Google Drive using model.save_pretrained("folder/path"). However, when I load this model in another Google Colab notebook using EncoderDecoderModel.from_pretrained(), I get this message:

 The following encoder weights were not tied to the decoder ['bert/pooler']

(this line is printed four times)
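For reference, the reload step in the second notebook looks roughly like this (the Drive path is the same placeholder as in the save call):

from transformers import EncoderDecoderModel, BertTokenizer

# Reload the fine-tuned encoder-decoder model saved with save_pretrained()
model = EncoderDecoderModel.from_pretrained("/folder/path")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")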

Now, here's where it gets weird: the model seems to work the first time it's run (though the output differs somewhat from what I get when running it in the same Colab notebook it was fine-tuned in). But when I take that output and feed it back into the model, I get the exact same output back. For example, I input "apple" and get "banana"; then I input "banana" and get "banana" again! Is this normal, or is it because the pooler weights haven't been set properly?
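To make that concrete, this is roughly the generation loop I'm using (the input strings are just illustrative):

# First pass: generate from an input string
inputs = tokenizer("apple", return_tensors="pt")
output_ids = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask)
first_output = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Second pass: feed the first output back in; it returns the same text again
inputs = tokenizer(first_output, return_tensors="pt")
output_ids = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask)
second_output = tokenizer.decode(output_ids[0], skip_special_tokens=True)  # equals first_output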

Here is my minimal code sample:

from transformers import (
    EncoderDecoderModel,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Warm-start a BERT2BERT model with tied encoder/decoder weights
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased", tie_encoder_decoder=True
)

# Generation settings
model.config.max_length = 512
model.config.min_length = 10
model.config.no_repeat_ngram_size = 0
model.config.early_stopping = True
model.config.length_penalty = 2.0
model.config.num_beams = 4

training_args = Seq2SeqTrainingArguments(
    predict_with_generate=True,
    fp16=True,
    output_dir="./",
    logging_steps=2,
    save_steps=10,
)

# `tokenizer` and `train` (the training dataset) are defined earlier in the notebook
trainer = Seq2SeqTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train,
)

trainer.train()

model.save_pretrained("/folder/path")

1 Answer

Answer from thewaterbuffalo:

So I figured out that I was getting the "The following encoder weights were not tied to the decoder ['bert/pooler']" message because I had passed tie_encoder_decoder=True when warm-starting the model before fine-tuning and saving it. When I removed that option, the message went away.
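In other words, the warm-start call becomes the following, with the rest of the training code unchanged:

from transformers import EncoderDecoderModel

# Warm-start without tying the encoder and decoder weights
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)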

I'm still having trouble with the output being nearly identical the second time the model is run, but there are a few differences now, so it's better than before. If someone could explain why I get almost the same output on the second run, I'd appreciate it, but things have improved over where they were.