Closed
Description & Motivation
I wasted a lot of GPU compute and time trying to figure out why I was getting issues similar to #15708, #17813, and #18063. Because the SLURM cluster I work on is a bit iffy, I assumed my checkpoint had been corrupted. The cause turned out to be a simple unhandled conflict between early stopping and min_epochs, and I never expected something that simple to be the source of the problem.
Pitch
I see that the fix (#16719) was only shipped in the 1.9.2 release, so it only reaches users on 1.9.2 and later.
It would be nice to at least warn users on 1.8 that combining min_epochs with early stopping causes unwanted behaviour, so they don't waste time and resources.
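Something like the following is the kind of warning I have in mind. It is only a minimal sketch written as a standalone helper: the function name `warn_on_min_epochs_with_early_stopping` and the message text are hypothetical, not part of Lightning's API. Callbacks are matched by class name so the snippet runs without `pytorch_lightning` installed; a real check inside the `Trainer` would use `isinstance` instead.

```python
import warnings
from typing import Iterable, Optional


def warn_on_min_epochs_with_early_stopping(
    min_epochs: Optional[int], callbacks: Iterable[object]
) -> bool:
    """Hypothetical check: warn if min_epochs is combined with an EarlyStopping callback.

    Returns True if a warning was emitted, False otherwise.
    """
    # No minimum requested (None or 0): nothing can conflict with early stopping.
    if not min_epochs:
        return False
    # Match by class name so this sketch does not need pytorch_lightning installed.
    if any(type(cb).__name__ == "EarlyStopping" for cb in callbacks):
        warnings.warn(
            "Using `min_epochs` together with `EarlyStopping` can cause unwanted "
            "behaviour in this version (fixed in 1.9.2 by #16719). "
            "Consider upgrading.",
            UserWarning,
        )
        return True
    return False
```

For example, calling it with `min_epochs=5` and a callback list containing an `EarlyStopping` instance emits one `UserWarning` and returns `True`, while `min_epochs=None` returns `False` without warning.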
Alternatives
Backporting the fix to 1.8 as well would of course be nicer.
Additional context
No response