
Conversation

@KAVYANSHTYAGI (Contributor) commented on Sep 7, 2025

Fixes #21141

Summary

  • Fix the LR Finder’s suggestion by giving torch.gradient the correct sample points (see the sketch below):
      • Exponential mode: differentiate w.r.t. log10(lr) → spacing=[torch.log10(lrs)]
      • Linear mode: differentiate w.r.t. lr → spacing=[lrs]
  • This matches how the points are sampled and avoids the bias that comes from treating log-spaced LRs as linear coordinates.
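A minimal sketch of the intended computation; the function name, the skip_begin/skip_end trimming, and the lack of smoothing are illustrative assumptions, not the exact code in src/lightning/pytorch/tuner/lr_finder.py:

```python
import torch

def suggest_lr(lrs, losses, mode="exponential", skip_begin=10, skip_end=1):
    # Hypothetical helper for illustration; the real logic lives in
    # src/lightning/pytorch/tuner/lr_finder.py and differs in detail.
    lrs = torch.as_tensor(lrs, dtype=torch.float64)
    losses = torch.as_tensor(losses, dtype=torch.float64)

    # Differentiate w.r.t. the coordinate the LRs were actually sampled on.
    if mode == "exponential":
        spacing = [torch.log10(lrs)]  # d(loss) / d(log10 lr)
    else:
        spacing = [lrs]               # d(loss) / d(lr)

    grads = torch.gradient(losses, spacing=spacing)[0]

    # Suggested LR: point of steepest descent, ignoring the noisy ends.
    idx = int(torch.argmin(grads[skip_begin:-skip_end])) + skip_begin
    return float(lrs[idx])
```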

Motivation

torch.gradient(y, spacing=x) expects the coordinates of the samples. With log-spaced LRs, the intended derivative is ∂loss/∂log10(lr). The previous behavior passed the linear LR values as coordinates, which is inconsistent with how the points were sampled and can skew the suggested LR.
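A short self-contained demonstration of what the spacing argument means (the loss curve below is synthetic):

```python
import torch

# Log-spaced learning rates, as used in exponential mode (values illustrative).
lrs = torch.logspace(-6, -1, steps=100)
losses = torch.randn(100).cumsum(0)  # stand-in loss curve

# spacing takes the sample coordinates; for log-spaced LRs the natural
# coordinate is log10(lr), so this approximates d(loss)/d(log10 lr).
grad_log = torch.gradient(losses, spacing=[torch.log10(lrs)])[0]

# Passing the raw LRs instead computes d(loss)/d(lr) on a highly
# non-uniform grid, which is what biased the previous suggestions.
grad_lin = torch.gradient(losses, spacing=[lrs])[0]
```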

Tests

  • Use torch.logspace for exponential mode.
  • Assert finite gradients and correct index selection using proper spacing.
  • Remove the brittle “with vs. without spacing must differ” check (a test sketch is shown below).
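A hedged sketch of such a test; the real test in tests/tests_pytorch/tuner/test_lr_finder.py differs in structure and assertions, and the loss curve here is synthetic:

```python
import torch

def test_suggestion_uses_log_spacing():
    # Hypothetical test for illustration only.
    lrs = torch.logspace(-6, -1, steps=50)
    x = torch.log10(lrs)
    losses = torch.sigmoid(-4 * (x + 3))  # steepest descent near lr = 1e-3

    grads = torch.gradient(losses, spacing=[x])[0]
    assert torch.isfinite(grads).all()

    # With the correct spacing, the steepest-descent index lands near the
    # middle of the sweep (lr around 1e-3), not at the tiny-LR end.
    idx = int(torch.argmin(grads))
    assert abs(float(x[idx]) + 3.0) < 0.2
```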

Impact

  • User-visible: Suggested LR may shift in mode="exponential" (now mathematically correct).
  • API: No changes.
  • Performance: Negligible overhead (single log10 over ~10²–10³ points).

📚 Documentation preview 📚: https://pytorch-lightning--21171.org.readthedocs.build/en/21171/

@github-actions bot added the pl (Generic label for PyTorch Lightning package) label on Sep 7, 2025
@Copilot (Copilot AI, Contributor) left a comment


Pull Request Overview

This PR fixes the LR Finder's gradient computation to use the correct sample spacing for torch.gradient, ensuring mathematical correctness when suggesting learning rates. The key change is differentiating with respect to log10(lr) in exponential mode rather than treating log-spaced learning rates as linear coordinates.

  • Updates gradient computation to use log10(lr) spacing for exponential mode and linear lr spacing for linear mode
  • Fixes test to use torch.logspace for exponential mode testing and validates the suggestion against expected gradient-based selection
  • Removes brittle test assertion comparing gradients with and without spacing

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

Changed files:
  • src/lightning/pytorch/tuner/lr_finder.py: updates the suggestion method to use the correct spacing parameter based on mode
  • tests/tests_pytorch/tuner/test_lr_finder.py: fixes the test to use log-spaced LRs and validates gradient computation correctness


@bhimrazy added the tuner label on Sep 8, 2025
@SkafteNicki (Collaborator) commented

@KAVYANSHTYAGI thanks for the PR. Does this solve the problems mentioned in issue #21141?

@KAVYANSHTYAGI (Contributor, Author) left a comment


It mitigates the problem from #21141 but does not fully implement the “step-based gradient” behavior some commenters requested.

What this PR fixes

  • In mode="exponential", gradients are now computed w.r.t. log10(lr) (i.e., the actual sampling coordinate), not the linear LR.
  • This removes the non-uniform-grid pathology introduced in ≥2.5.3, which amplified early-step noise and biased suggestions toward tiny LRs.

What it does not change

  • It does not switch the derivative to the training-step index (uniform step spacing). Linear mode remains unchanged (see the illustration below).
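For illustration only, a sketch contrasting the two coordinate choices; the step-index variant is what some commenters asked for and is not part of this PR:

```python
import torch

lrs = torch.logspace(-6, -1, steps=100)
losses = torch.sigmoid(-4 * (torch.log10(lrs) + 3))  # synthetic loss curve

# This PR: differentiate w.r.t. log10(lr), the coordinate the LRs are sampled on.
grad_loglr = torch.gradient(losses, spacing=[torch.log10(lrs)])[0]

# Not part of this PR: differentiate w.r.t. the training-step index
# (uniform spacing of 1 between consecutive points).
grad_step = torch.gradient(losses, spacing=1)[0]

# For an ideal exponential sweep, log10(lr) is linear in the step index, so the
# two gradients differ only by a constant factor and pick the same argmin.
```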

@SkafteNicki (Collaborator) commented

Got confirmation that issue #21141 can be closed when this is merged, so we can move forward with this.

@daidahao commented

With all due respect, this essentially reverts #21055 and is mathematically equivalent to how the gradient was calculated before, which is also what I suggested in #21141. To be clear, I am not blaming anyone here. Rather, I think we can all learn from this experience that robustly finding a “good” learning rate is a hard problem, deserving in-depth discussion and real-world testing before breaking changes are made. In light of this, I would recommend that any future changes to LR Finder behaviour require a good amount of academic literature and/or empirical evidence.

@SkafteNicki @dennisbader @KAVYANSHTYAGI Thank you for your contributions on addressing this issue so quickly.

Successfully merging this pull request may close these issues.

[BUG] Learning Rate finder does not find optimal lr anymore since version 2.5.3