Fix min-epochs and early-stopping triggering too many validation runs #16719
* Add .git-blame-ignore-revs (#16709) Co-authored-by: Jirka Borovec <[email protected]>
* Fix strategy type validation in connectors (#16693)
* Disable strict loading in multiprocessing launcher (#16365) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jirka <[email protected]>
* Fix min-epochs and early-stopping triggering too many validation runs (#16719) Co-authored-by: Jirka Borovec <[email protected]>
* Update hydra-core requirement from <1.3.0,>=1.0.5 to >=1.0.5,<1.4.0 in /requirements (#16736) Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [App] Add support for private data (#16738) Co-authored-by: thomas <[email protected]>
* [App] Add rm one level below project level (#16740) Co-authored-by: Ethan Harris <[email protected]> Co-authored-by: Justus Schock <[email protected]> Co-authored-by: thomas <[email protected]>
* ci: cleaning caches (#16752)
* CI: Update colossalai version (#16747) Co-authored-by: Carlos Mocholí <[email protected]> type
* Update version and changelog for 1.9.2
---------
Co-authored-by: Akihiro Nitta <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: thomas chaton <[email protected]>
Co-authored-by: thomas <[email protected]>
Co-authored-by: Ethan Harris <[email protected]>
Co-authored-by: Justus Schock <[email protected]>
What does this PR do?
Fixes #15708
The early-stopping trigger interacts badly with min_epochs. When early stopping fires before min_epochs has been reached, training continues, and validation is then triggered on every subsequent training step because of this condition:
https://github.com/Lightning-AI/lightning/blob/5196eaa5264c7b95316718aa2d173dd42c5d9936/src/lightning/pytorch/loops/epoch/training_epoch_loop.py#L392-L394
The result is a large increase in runtime for the subsequent epochs.
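The interaction can be illustrated with a toy training loop. This is a simplified sketch, not Lightning's actual code: the function `validation_runs_per_epoch`, its parameters, and the `gate_on_min_epochs` switch are hypothetical names introduced for illustration. It assumes validation normally runs once at the end of each epoch, that an early-stopping callback sets a stop flag after validation, and that the loop additionally validates whenever the stop flag is set. Gating that extra check on the minimum having been met is one plausible way to avoid the repeated runs; whether it matches the exact change in this PR is an assumption.

```python
def validation_runs_per_epoch(
    min_epochs: int,
    batches_per_epoch: int,
    stop_epoch: int,
    gate_on_min_epochs: bool,
) -> list:
    """Count validation runs per epoch in a toy loop (not Lightning's real loop)."""
    should_stop = False  # stands in for a trainer-level stop flag
    runs_per_epoch = []
    epoch = 0  # number of fully completed epochs

    while True:
        runs = 0
        for batch_idx in range(batches_per_epoch):
            is_last_batch = batch_idx == batches_per_epoch - 1

            # The stop request may only be honored once min_epochs is met.
            can_stop = epoch >= min_epochs

            # Ungated: any pending stop request forces validation on this step.
            # Gated: the stop request forces validation only if stopping is allowed.
            stop_check = should_stop and (can_stop or not gate_on_min_epochs)

            if is_last_batch or stop_check:
                runs += 1
                # Toy early-stopping callback: requests a stop from stop_epoch on.
                if epoch >= stop_epoch:
                    should_stop = True

        runs_per_epoch.append(runs)
        epoch += 1

        # Training ends only when a stop was requested AND min_epochs is met.
        if should_stop and epoch >= min_epochs:
            break

    return runs_per_epoch


ungated = validation_runs_per_epoch(3, 4, stop_epoch=0, gate_on_min_epochs=False)
gated = validation_runs_per_epoch(3, 4, stop_epoch=0, gate_on_min_epochs=True)
print(ungated)  # [1, 4, 4]
print(gated)    # [1, 1, 1]
```

With min_epochs set to 3, four batches per epoch, and a stop requested after the first epoch, the ungated loop validates on every step of the second and third epochs, while the gated loop keeps to one validation run per epoch.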
cc @Borda @carmocca @awaelchli @justusschock