MLFlowLogger's status is "RUNNING" even after training failed #12291

@ritsuki1227

🐛 Bug

If a trainer with MLFlowLogger raises an error, the user should be able to check in the MLflow UI that the training has failed.
In the current implementation, the MLflow run's status remains "RUNNING" even after trainer.fit raises an error, so the user cannot tell whether the training is still in progress or has failed.

Current behavior when training ends with an error:

Screenshot 2022-03-10 21 41 18

Expected behavior:

Screenshot 2022-03-10 21 31 23

To Reproduce

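from pytorch_lightning import Trainer
from pytorch_lightning.loggers import MLFlowLogger
# BoringModel's import path varies across Lightning versions; adjust as needed.
from pytorch_lightning.demos.boring_classes import BoringModel

class CustomModel(BoringModel):
    def training_step(self, batch, batch_idx):
        super().training_step(batch, batch_idx)
        raise BaseException  # simulate a failure in the middle of training

trainer = Trainer(logger=MLFlowLogger("test"))
try:
    trainer.fit(CustomModel())
finally:
    # This should be 'FAILED'
    print(trainer.logger.experiment.get_run(trainer.logger.run_id).info.status)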
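
Until the logger handles this itself, a possible user-side workaround might look like the sketch below. It is not the library's fix: `run_and_mark_failed` is a hypothetical helper name, and it assumes that `trainer.logger.experiment` is an `MlflowClient`, whose `set_terminated(run_id, status=...)` method ends a run with the given status.

```python
from typing import Any, Callable


def run_and_mark_failed(fit: Callable[[], Any], client: Any, run_id: str) -> Any:
    """Call fit(); if it raises, mark the MLflow run as FAILED and re-raise.

    `client` is assumed to expose MlflowClient.set_terminated(run_id, status=...).
    """
    try:
        return fit()
    except BaseException:
        # Catch BaseException so KeyboardInterrupt and the reproduction's
        # bare `raise BaseException` are covered as well.
        client.set_terminated(run_id, status="FAILED")
        raise
```

With the reproduction above, this would be called as `run_and_mark_failed(lambda: trainer.fit(CustomModel()), trainer.logger.experiment, trainer.logger.run_id)`. The exception still propagates to the caller; only the run status changes.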

cc @Borda
