Closed
Labels: breaking change · discussion · strategy: ddp (DistributedDataParallel)
Description
🚀 Feature
Currently Lightning defaults to setting find_unused_parameters=True
when using PyTorch's DistributedDataParallel. https://github.com/PyTorchLightning/pytorch-lightning/blob/39274273a4c0a8d0b6bf3ee862a02ec4f5bf705a/pytorch_lightning/plugins/training_type/ddp.py#L221-L225
I propose that we change the default back to False.
Motivation
Changing the default to False offers these benefits:
- We stay in sync with upstream PyTorch, where find_unused_parameters defaults to False.
- For the vast majority of models, which use all of their parameters on every iteration, we save time and compute resources ($): find_unused_parameters=True forces DDP to traverse the autograd graph after every forward pass. Users then don't need to discover this setting from a performance tip: https://pytorch-lightning.readthedocs.io/en/latest/benchmarking/performance.html#when-using-ddp-set-find-unused-parameters-false
- Example: users migrating from a custom PyTorch training loop to Lightning can easily miss this flag difference, causing confusion about why performance doesn't match.
Cons:
- Models that legitimately leave some parameters unused during the forward/backward pass will raise a RuntimeError under DDP.
Mitigations:
- In these cases, we can document the error message clearly and explain that users need to instantiate a DDP plugin with find_unused_parameters=True and pass it to the Trainer like so:

```python
from pytorch_lightning.plugins import DDPPlugin

ddp = DDPPlugin(find_unused_parameters=True)
trainer = Trainer(..., plugins=[ddp])
```

Or by using the new registry.
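For context, below is a minimal single-process sketch (gloo backend, world_size=1) of the kind of model that needs find_unused_parameters=True: one of its submodules never participates in the forward pass. The module and its names are made up for illustration; this uses plain PyTorch DDP, not Lightning.

```python
# Sketch: a model with a parameter that never receives a gradient.
# With find_unused_parameters=True, DDP traverses the autograd graph
# after each forward pass and marks the unused parameters as ready,
# so the gradient reduction can complete. With False, the reducer
# would wait for gradients that never arrive and error out.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


class PartlyUsed(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.used = torch.nn.Linear(4, 4)
        self.unused = torch.nn.Linear(4, 4)  # never called in forward()

    def forward(self, x):
        return self.used(x)


# Single-process process group, just for demonstration.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29531")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(PartlyUsed(), find_unused_parameters=True)
for _ in range(2):  # two iterations to exercise the reducer across steps
    model(torch.randn(2, 4)).sum().backward()

dist.destroy_process_group()
```

Flipping the flag to False in this sketch reproduces the crash that the documentation would need to explain.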
cc @Borda @justusschock @kaushikb11 @awaelchli @akihironitta @rohitgr7 @Queuecumber
Pitch
Remove this default here: https://github.com/PyTorchLightning/pytorch-lightning/blob/39274273a4c0a8d0b6bf3ee862a02ec4f5bf705a/pytorch_lightning/plugins/training_type/ddp.py#L221-L225
Alternatives
Additional context