Description
🚀 Feature
As far as I can tell, the PyTorch Lightning `config.yaml`, whether generated directly or with the CLI, is not uploaded as an artifact when using the `WandbLogger`. It would be nice to have that as an option for full reproducibility, and to avoid saving out a temporary file in cases where many processes are running at the same time.
Motivation
Saving out the `config.yaml` file for CLI usage is a great way to make things reproducible. However, when using `WandbLogger` to store assets etc., you don't have access to those files without managing them manually.
Pitch
Just as there is a `log_model` option in the `WandbLogger` (https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.loggers.wandb.html), what about having a `log_config` option in the constructor? If set, it would upload the config as a WandB artifact, just like the model. Given the importance of reproducibility, I think that defaulting this to `True` would be best. So, for example, using the CLI I could do something like
cli = LightningCLI(MyModule, run=False)
called with a config file like
trainer:
  logger:
    class_path: pytorch_lightning.loggers.WandbLogger
    init_args:
      log_model: false
      log_config: true
      project: my_project_name
and it would log the generated config file. Since I didn't change the `save_config_callback`, it would presumably still save to a file as well, but if I used `LightningCLI(MyModule, run=False, save_config_callback=None)` then it wouldn't save out a file but would still log it to wandb.
As for the implementation, as a user I don't really care. My understanding is that this could override the `save_config_callback`, but keep in mind that it might still be helpful to have the config file stored locally. So my gut says it is best if this is orthogonal to that callback, so users can still turn the file logging on or off, or swap in their own `save_config_callback`. But it is none of my business what happens behind the scenes.
One other thing: in the assets, wandb has its own `config.yaml` stored in the artifacts, which is confusing, so the filename would need to be something different, or maybe even customizable, e.g. `log_config = None` vs. `log_config = "my_filename_in_artifacts.yaml"`.
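To make the naming idea concrete, here is a minimal sketch of how such an argument might be interpreted. The helper `resolve_config_artifact_name` and the default name `lightning_config.yaml` are hypothetical and chosen only for illustration; none of this is an existing `WandbLogger` API.

```python
from typing import Optional, Union

# Filename that wandb already uses for its own config (as described above).
WANDB_RESERVED_NAME = "config.yaml"
# Hypothetical default name; the real choice would be up to the maintainers.
DEFAULT_ARTIFACT_NAME = "lightning_config.yaml"


def resolve_config_artifact_name(log_config: Union[bool, str, None]) -> Optional[str]:
    """Map a hypothetical `log_config` argument to an artifact filename.

    Returns None when the config should not be uploaded at all.
    """
    if log_config is None or log_config is False:
        return None
    if log_config is True:
        return DEFAULT_ARTIFACT_NAME
    if not isinstance(log_config, str):
        raise TypeError("log_config must be a bool, None, or a filename string")
    if not log_config or log_config == WANDB_RESERVED_NAME:
        raise ValueError(f"{log_config!r} is not usable as the uploaded config filename")
    return log_config
```

Under these assumptions, `log_config=True` uploads under the default name, a string picks a custom name, and `"config.yaml"` is rejected so the upload cannot be confused with wandb's own file.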
Alternatives
I tried to set up the code to upload the content directly for the run, but it was just too messy. Focusing on the `LightningCLI`, note that if I have
cli = LightningCLI(MyModule, run=False, save_config_callback=None)
with a config file like
trainer:
  logger:
    class_path: pytorch_lightning.loggers.WandbLogger
    init_args:
      dir: .wandb
      log_model: false
      project: my_project_name
      tags:
        - my_tag
then at least it won't save a file in the `.wandb` directory, which otherwise leads to clashes between multiple concurrent processes.
I imagine this could be done by implementing a `save_config_callback` manually, but I think it might be a lot of boilerplate.
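For a rough idea of what that manual route might involve, here is a sketch that uses stand-ins so it runs without Lightning or wandb installed. `UploadConfigCallback`, its constructor arguments, and `fake_upload` are all hypothetical; a real version would presumably subclass Lightning's own save-config callback and receive the parsed config from the CLI rather than a plain string.

```python
import os
import tempfile
from typing import Callable


class UploadConfigCallback:
    """Hypothetical stand-in for a custom save_config_callback (not Lightning's class)."""

    def __init__(self, config_text: str, upload: Callable[[str], None],
                 filename: str = "lightning_config.yaml") -> None:
        self.config_text = config_text
        self.upload = upload  # assumed to read or copy the file before returning
        self.filename = filename
        self.already_saved = False

    def setup(self, trainer=None, pl_module=None, stage=None) -> None:
        if self.already_saved:
            return  # upload only once, even if setup runs for several stages
        # A private temporary directory per call, so concurrent processes
        # never write to the same path.
        with tempfile.TemporaryDirectory() as tmp_dir:
            path = os.path.join(tmp_dir, self.filename)
            with open(path, "w", encoding="utf-8") as f:
                f.write(self.config_text)
            self.upload(path)
        self.already_saved = True


# Usage with a fake uploader that just records what it was given.
uploaded = {}


def fake_upload(path: str) -> None:
    with open(path, encoding="utf-8") as f:
        uploaded[os.path.basename(path)] = f.read()


callback = UploadConfigCallback("trainer:\n  max_epochs: 1\n", fake_upload)
callback.setup()
callback.setup()  # second call is a no-op
```

With wandb, the `upload` function might create a `wandb.Artifact`, add the file with `add_file`, and log it on the run via `log_artifact`; whether the file is fully read before the temporary directory is removed is an assumption that would need checking.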
cc @Borda @awaelchli @edward-io @ananthsub @rohitgr7 @Blaizzy @carmocca @mauvilsa