Closed
Description
🐛 Bug
If a trainer with MLFlowLogger raises an error, the user should be able to see on the MLflow screen that the training has failed.
In the current implementation, MLflow's status remains "RUNNING" even after trainer.fit raises an error, so the user cannot tell whether the training is still in progress or has failed.
Current behavior when training finishes with an error: the run's status in MLflow stays RUNNING.
Expected behavior: the run's status in MLflow is FAILED.
To Reproduce
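from pytorch_lightning import Trainer
from pytorch_lightning.loggers import MLFlowLogger
# The BoringModel import path differs between Lightning versions; adjust if needed.
from pytorch_lightning.demos.boring_classes import BoringModel

class CustomModel(BoringModel):
    def training_step(self, batch, batch_idx):
        super().training_step(batch, batch_idx)
        raise BaseException  # simulate a failure during training

trainer = Trainer(logger=MLFlowLogger("test"))
try:
    trainer.fit(CustomModel())
finally:
    # Look up the run's status in MLflow after fit() has raised.
    print(trainer.logger.experiment.get_run(trainer.logger.run_id).info.status)  # This should be 'FAILED'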
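Until the logger handles this itself, one possible user-side workaround is to mark the run as failed manually when fit raises. The sketch below is only an illustration, not how Lightning does or should implement it: mark_failed_on_error is a hypothetical helper name, and it assumes the object passed as client exposes MLflow's MlflowClient.set_terminated(run_id, status=...), which trainer.logger.experiment should.

```python
from contextlib import contextmanager


@contextmanager
def mark_failed_on_error(client, run_id):
    """Hypothetical helper: mark an MLflow run as FAILED if the wrapped block raises.

    `client` is assumed to behave like mlflow.tracking.MlflowClient,
    i.e. it provides set_terminated(run_id, status=...).
    """
    try:
        yield
    except BaseException:
        # Record the failure in MLflow, then let the original error propagate.
        client.set_terminated(run_id, status="FAILED")
        raise


# Intended usage with the reproduction above (not executed here):
#
# with mark_failed_on_error(trainer.logger.experiment, trainer.logger.run_id):
#     trainer.fit(CustomModel())
```

Catching BaseException rather than Exception mirrors the reproduction, which raises BaseException directly; the error is re-raised, so the caller still sees the original failure.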
cc @Borda