
Upload pytorch lightning config.yaml as artifact for WandB logger #14628

@jlperla

Description


🚀 Feature

As far as I can tell, the PyTorch Lightning `config.yaml`, whether generated directly or with the CLI, is not uploaded as an artifact when using the `WandbLogger`.

It would be nice to have this as an option, both for full reproducibility and to avoid saving a temporary file when many processes are running at the same time.

Motivation

Saving the `config.yaml` file when using the CLI is a great way to make things reproducible. However, when using `WandbLogger` to store assets, those files are not available in wandb unless you upload and manage them manually.

Pitch

Just as `WandbLogger` has a `log_model` option (https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.loggers.wandb.html), what about adding a `log_config` option to its constructor? If enabled, it would upload the config as a WandB artifact, just like the model. Given the importance of reproducibility, I think defaulting this to `True` would be best. For example, using the CLI I could do something like

    from pytorch_lightning.cli import LightningCLI
    cli = LightningCLI(MyModule, run=False)

called with a config file like

    trainer:
      logger:
        class_path: pytorch_lightning.loggers.WandbLogger
        init_args:
          log_model: false
          log_config: true
          project: my_project_name

and it would log the generated config file. Since I didn't change `save_config_callback`, it might still save the config to a local file as well, but with `LightningCLI(MyModule, run=False, save_config_callback=None)` it would not write a local file and would only log the config to wandb.

As for the implementation, as a user I don't really care, but my understanding is that this could override the `save_config_callback` callback. Keep in mind, though, that having the config file stored locally can still be helpful. So my gut says it is best if this is orthogonal to that callback, so that users can still turn file logging on or off, or swap in their own `save_config_callback`. But what happens behind the scenes is none of my business.

One other thing: among the stored assets, wandb already keeps its own `config.yaml`, which is confusing, so the uploaded filename would need to be something different, or maybe even customizable, e.g. `log_config = None` vs. `log_config = "my_filename_in_artifacts.yaml"`.
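To make the naming question concrete, here is a minimal sketch of how a `log_config` value might be resolved to an artifact filename. Everything in it is hypothetical: `log_config` is only the option proposed in this issue, and `resolve_config_filename` and the default name `lightning_config.yaml` are illustrative, not part of Lightning or wandb. It also assumes one reading of the example above, namely that `None` means "use the default name" while a string picks a custom one.

```python
from typing import Optional, Union

# Name of the file wandb already keeps for each run; the Lightning config
# must not be uploaded under the same name.
WANDB_OWN_CONFIG = "config.yaml"
# Hypothetical default name; not an existing Lightning or wandb setting.
DEFAULT_ARTIFACT_NAME = "lightning_config.yaml"


def resolve_config_filename(log_config: Union[bool, str, None]) -> Optional[str]:
    """Map a hypothetical `log_config` value to the artifact filename.

    False -> no upload (returns None); True or None -> default name;
    a non-empty string -> that custom filename.
    """
    if log_config is False:
        return None
    if log_config is True or log_config is None:
        return DEFAULT_ARTIFACT_NAME
    if not isinstance(log_config, str):
        raise TypeError("log_config must be a bool, None, or a filename string")
    if not log_config or log_config == WANDB_OWN_CONFIG:
        # Reject empty names and the name wandb already uses for its own file.
        raise ValueError(f"invalid config filename {log_config!r}")
    return log_config
```

Rejecting `config.yaml` outright, rather than silently renaming it, keeps the uploaded file from being confused with wandb's own config.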

Alternatives

I tried to set up code that uploads the config contents directly to the run, but it was just too messy. Focusing on `LightningCLI`, note that if I have

    cli = LightningCLI(MyModule, run=False, save_config_callback=None)

with a config file like

    trainer:
      logger:
        class_path: pytorch_lightning.loggers.WandbLogger
        init_args:
          dir: .wandb
          log_model: false
          project: my_project_name
          tags:
            - my_tag

Then at least it won't save a config file in the `.wandb` directory, which would otherwise lead to clashes between multiple concurrent processes.

I imagine this could be done by implementing a save_config_callback manually, but I think it might be a lot of boilerplate.
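Whatever form such a callback takes, it needs some way to hand the config to wandb without every process writing to the same file in a shared directory. Below is a minimal, standard-library-only sketch of that one piece. It assumes the config has already been serialized to a YAML string, and that `upload` is a hypothetical caller-supplied function that sends a local file to wandb and has finished reading it before returning. `upload_config_text` is illustrative and is not an existing Lightning or wandb API.

```python
import os
import tempfile
from typing import Callable


def upload_config_text(
    config_text: str,
    filename: str,
    upload: Callable[[str], None],
) -> None:
    """Write `config_text` to a private temp dir, upload it, then delete it.

    Each call gets its own uniquely named directory, so concurrent processes
    never write to the same path (unlike a shared `.wandb` directory).
    """
    with tempfile.TemporaryDirectory(prefix="lightning-config-") as tmp_dir:
        path = os.path.join(tmp_dir, filename)
        with open(path, "w", encoding="utf-8") as f:
            f.write(config_text)
        # `upload` must be done with the file before the directory is removed.
        upload(path)
    # Leaving the `with` block deletes the directory and the file in it.
```

Because `tempfile.TemporaryDirectory` creates a fresh directory on each call and removes it on exit, nothing is left behind in `.wandb`; which wandb call to plug in as `upload` is left open here.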

cc @Borda @awaelchli @edward-io @ananthsub @rohitgr7 @Blaizzy @carmocca @mauvilsa

Metadata


Assignees

No one assigned

Labels

feature (Is an improvement or enhancement), lightningcli (pl.cli.LightningCLI), logger (Related to the Loggers)
