PyTorch Lightning Learning Rate Tuner Giving Unexpected Results

I'm trying to find an optimal learning rate with PyTorch Lightning's pl.tuner.Tuner, but the results aren't what I expected.

The model I am running is a linear classifier on top of a BertForSequenceClassification AutoModel.

I want to find the optimum learning rate while the BERT backbone is frozen.
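For reference, I freeze the backbone with the usual requires_grad toggle, roughly like this (the self.bert attribute name is just how I happen to store the pretrained model):

    # Freeze the pretrained backbone so only my linear classifier head trains.
    # self.bert is where the LightningModule stores the BertForSequenceClassification model.
    for param in self.bert.parameters():
        param.requires_grad = False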

To find the learning rate I run this code:

  
    tuner = pl.tuner.Tuner(trainer)
    results = tuner.lr_find(
        model,
        # optimizer = optimizer,
        train_dataloaders=data_module,
        min_lr=10e-8,
        max_lr=10.0,
    )
    # Plot with
    fig = results.plot(suggest=True)
    fig.show()

My optimizer is configured like this in the model:

    def configure_optimizers(self):
        """Return AdamW plus a linear warmup schedule that is stepped every batch."""
        optimizer = torch.optim.AdamW(self.parameters(), lr=self.learning_rate)

        # get_linear_schedule_with_warmup is imported from the transformers library
        scheduler = get_linear_schedule_with_warmup(
            optimizer,
            num_warmup_steps=self.n_warmup_steps,
            num_training_steps=self.n_training_steps,
        )
        return dict(optimizer=optimizer, lr_scheduler=dict(scheduler=scheduler, interval="step"))

This produces:

[Chart of loss against learning rate]

I'm confused as to why the loss is rising in the low learning rate region; this is not what I was expecting.

I have tried:

  • Removing the scheduler
  • Freezing/unfreezing the weights
  • Changing the initial learning rate

I was expecting a chart like this: https://github.com/comhar/pytorch-learning-rate-tuner/blob/master/images/learning_rate_tuner_plot.png

Any help appreciated

Many thanks

There are 2 answers

user22926078

I'm not sure whether you've solved this already, but I suggest using a larger num_training in tuner.lr_find().

According to the source code, the default value is 100:

    def _lr_find(
        trainer: "pl.Trainer",
        model: "pl.LightningModule",
        min_lr: float = 1e-8,
        max_lr: float = 1,
        num_training: int = 100,
        mode: str = "exponential",
        early_stop_threshold: Optional[float] = 4.0,
        update_attr: bool = False,
        attr_name: str = "",
    ) -> Optional[_LRFinder]:
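
For example, something along these lines; the 300 is just an illustrative value, and depending on your Lightning version the argument may be exposed on tuner.lr_find() as num_training or num_training_steps:

    results = tuner.lr_find(
        model,
        train_dataloaders=data_module,
        min_lr=10e-8,
        max_lr=10.0,
        num_training=300,  # illustrative value, larger than the default of 100
    )

Testing more points between min_lr and max_lr gives the finder a better-resolved curve to pick a suggestion from.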

Ricky

I'm getting a similar plot to yours, and there is a similar question on Stack Overflow: Unusual Learning Rate Finder Curve: Loss Lowest at Smallest Learning Rate

It may be due to the issue reported here: https://github.com/Lightning-AI/pytorch-lightning/issues/14167

That is, some moving-average smoothing may be applied whose running average starts at 0, so the first few loss values are averaged together with 0, producing the artificially low loss at the left of the plot.
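
A toy illustration of that effect (not Lightning's actual code, just a plain exponential moving average initialised at 0, with made-up loss values):

    # Toy example: an exponential moving average that starts at 0 drags the
    # first few smoothed losses well below the raw values.
    losses = [2.5, 2.4, 2.5, 2.6, 2.4]  # made-up raw losses at the smallest LRs
    beta = 0.98                         # made-up smoothing factor
    avg = 0.0                           # the running average starts at 0
    smoothed = []
    for loss in losses:
        avg = beta * avg + (1 - beta) * loss
        smoothed.append(avg)
    print(smoothed)  # first entries sit far below the raw ~2.5 and climb back up,
                     # mimicking the rise at the left of the plot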

However, that doesn't explain why there are many images online of results without this behaviour, unless they were generated with different versions of Lightning. If this is the cause, though, the practical workaround is to make the lowest learning rate tested far too low to be near the optimum, and then ignore the left-hand side of the resulting plot.