26 changes: 11 additions & 15 deletions docs/source-pytorch/accelerators/gpu_faq.rst
@@ -20,19 +20,15 @@ Let's say you have a batch size of 7 in your dataloader.
def train_dataloader(self):
return Dataset(..., batch_size=7)

In DDP, DDP_SPAWN, Deepspeed, DDP_SHARDED your effective batch size will be 7 * devices * num_nodes.
Whenever you use multiple devices and/or nodes, your effective batch size will be 7 * devices * num_nodes.

.. code-block:: python

# effective batch size = 7 * 8
Trainer(accelerator="gpu", devices=8, strategy="ddp")
Trainer(accelerator="gpu", devices=8, strategy="ddp_spawn")
Trainer(accelerator="gpu", devices=8, strategy="ddp_sharded")
Trainer(accelerator="gpu", devices=8, strategy=...)

# effective batch size = 7 * 8 * 10
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp")
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp_spawn")
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp_sharded")
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy=...)


.. note:: Huge batch sizes are actually really bad for convergence. Check out:
@@ -45,21 +41,21 @@ In DDP, DDP_SPAWN, Deepspeed, DDP_SHARDED your effective batch size will be 7 *
How do I use multiple GPUs on Jupyter or Colab notebooks?
*********************************************************

To use multiple GPUs on notebooks, use the *DDP_SPAWN* or *DDP_NOTEBOOK* mode.
To use multiple GPUs on notebooks, use the *DDP_NOTEBOOK* mode.

.. code-block:: python

Trainer(accelerator="gpu", devices=4, strategy="ddp_notebook" | "ddp_spawn")
Trainer(accelerator="gpu", devices=4, strategy="ddp_notebook")

If you want to use other models, please launch your training via the command-shell.
If you want to use other strategies, please launch your training via the command-shell.

----

*****************************************************
I'm getting errors related to Pickling. What do I do?
*****************************************************

Pickle is Python's mechanism for serializing and unserializing data. A majority of distributed modes require that your code is fully pickle compliant. If you run into an issue with pickling try the following to figure out the issue
Pickle is Python's mechanism for serializing and unserializing data. Some distributed modes require that your code is fully pickle compliant. If you run into an issue with pickling, try the following to figure out the issue.

.. code-block:: python

@@ -68,14 +64,14 @@ Pickle is Python's mechanism for serializing and unserializing data. A majority
model = YourModel()
pickle.dumps(model)

If you `ddp` your code doesn't need to be pickled.
For example, the `ddp_spawn` strategy has the pickling requirement. This is a limitation of Python.

.. code-block:: python

Trainer(accelerator="gpu", devices=4, strategy="ddp")
Trainer(accelerator="gpu", devices=4, strategy="ddp_spawn")

If you use `ddp_spawn` the pickling requirement remains. This is a limitation of Python.
If you use `ddp`, your code doesn't need to be pickled:

.. code-block:: python

Trainer(accelerator="gpu", devices=4, strategy="ddp_spawn")
Trainer(accelerator="gpu", devices=4, strategy="ddp")
8 changes: 2 additions & 6 deletions docs/source-pytorch/accelerators/gpu_intermediate.rst
@@ -26,7 +26,7 @@ Lightning supports multiple ways of doing distributed training.
- Notebook/Fork (``strategy='ddp_notebook'``)

.. note::
If you request multiple GPUs or nodes without setting a mode, DDP Spawn will be automatically used.
If you request multiple GPUs or nodes without setting a strategy, DDP will be automatically used.
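For example (a minimal sketch using standard ``Trainer`` arguments):

.. code-block:: python

    from pytorch_lightning import Trainer

    # no strategy given: with multiple devices, DDP is selected automatically
    trainer = Trainer(accelerator="gpu", devices=2)
    # which is equivalent to
    trainer = Trainer(accelerator="gpu", devices=2, strategy="ddp")
    # in interactive environments (Jupyter/Colab) the fork-based "ddp_fork"
    # variant is chosen instead, since notebooks cannot spawn new processes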

For a deeper understanding of what Lightning is doing, feel free to read this
`guide <https://medium.com/@_willfalcon/9-tips-for-training-lightning-fast-neural-networks-in-pytorch-8e63a502f565>`_.
@@ -196,13 +196,9 @@ Comparison of DDP variants and tradeoffs
- No
- Yes
- No
* - Is the guard ``if __name__=="__main__"`` required?
- Yes
- Yes
- No
* - Limitations in the main process
- None
- None
- The state of objects is not up-to-date after returning to the main process (`Trainer.fit()` etc). Only the model parameters get transferred over.
- GPU operations such as moving tensors to the GPU or calling ``torch.cuda`` functions before invoking ``Trainer.fit`` are not allowed.
* - Process creation time
- Slow
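As a short, hedged illustration of the fork-strategy limitation listed in the table above (standard ``Trainer`` arguments assumed), CUDA must not be initialized in the main process before ``Trainer.fit`` when a fork-based strategy is used:

.. code-block:: python

    from pytorch_lightning import Trainer

    # With "ddp_notebook"/"ddp_fork", avoid GPU operations in the main process
    # before fitting, e.g. do NOT call `torch.cuda.current_device()` or move
    # tensors to the GPU here; CUDA would be initialized and forking fails.

    trainer = Trainer(accelerator="gpu", devices=2, strategy="ddp_notebook")
    # trainer.fit(model)  # CUDA is then initialized inside the worker processes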
21 changes: 6 additions & 15 deletions docs/source-pytorch/common/trainer.rst
@@ -1055,32 +1055,23 @@ By setting to False, you have to add your own distributed sampler:
strategy
^^^^^^^^

Supports passing different training strategies with aliases (ddp, ddp_spawn, etc) as well as custom strategies.
Supports passing different training strategies with aliases (ddp, fsdp, etc) as well as configured strategies.

.. code-block:: python

# Training with the DistributedDataParallel strategy on 4 GPUs
# Data-parallel training with the DDP strategy on 4 GPUs
trainer = Trainer(strategy="ddp", accelerator="gpu", devices=4)

# Training with the DDP Spawn strategy using 4 cpu processes
trainer = Trainer(strategy="ddp_spawn", accelerator="cpu", devices=4)
# Model-parallel training with the FSDP strategy on 4 GPUs
trainer = Trainer(strategy="fsdp", accelerator="gpu", devices=4)

.. note:: Additionally, you can pass your custom strategy to the ``strategy`` argument.
Additionally, you can pass a strategy object.

.. code-block:: python

from pytorch_lightning.strategies import DDPStrategy


class CustomDDPStrategy(DDPStrategy):
def configure_ddp(self):
self._model = MyCustomDistributedDataParallel(
self.model,
device_ids=...,
)


trainer = Trainer(strategy=CustomDDPStrategy(), accelerator="gpu", devices=2)
trainer = Trainer(strategy=DDPStrategy(static_graph=True), accelerator="gpu", devices=2)

See Also:
- :ref:`Multi GPU Training <multi_gpu>`.
2 changes: 1 addition & 1 deletion docs/source-pytorch/extensions/strategy.rst
@@ -74,7 +74,7 @@ The below table lists all relevant strategies available in Lightning with their
- Strategy for Fully Sharded Data Parallel training. :ref:`Learn more. <advanced/model_parallel:Fully Sharded Training>`
* - ddp_spawn
- :class:`~pytorch_lightning.strategies.DDPSpawnStrategy`
- Spawns processes using the :func:`torch.multiprocessing.spawn` method and joins processes after training finishes. :ref:`Learn more. <accelerators/gpu_intermediate:Distributed Data Parallel Spawn>`
- Spawns processes using the :func:`torch.multiprocessing.spawn` method and joins processes after training finishes. Useful for debugging. :ref:`Learn more. <accelerators/gpu_intermediate:Distributed Data Parallel Spawn>`
* - ddp
- :class:`~pytorch_lightning.strategies.DDPStrategy`
- Strategy for multi-process single-device training on one or multiple nodes. :ref:`Learn more. <accelerators/gpu_intermediate:Distributed Data Parallel>`
1 change: 0 additions & 1 deletion docs/source-pytorch/starter/style_guide.rst
@@ -202,7 +202,6 @@ DataLoaders
Lightning uses :class:`~torch.utils.data.DataLoader` to handle all the data flow through the system. Whenever you structure dataloaders,
make sure to tune the number of workers for maximum efficiency.

.. warning:: Make sure not to use ``Trainer(strategy="ddp_spawn")`` with ``num_workers>0`` in the DataLoader or you will bottleneck you code.

DataModules
===========
3 changes: 3 additions & 0 deletions src/lightning/fabric/CHANGELOG.md
@@ -40,6 +40,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

- Changed arguments for precision settings (from [64|32|16|bf16] to ["64-true"|"32-true"|"16-mixed"|"bf16-mixed"]) ([#16767](https://github.com/Lightning-AI/lightning/pull/16767))

- The selection `Fabric(strategy="ddp_spawn", ...)` no longer falls back to "ddp" when a cluster environment gets detected ([#16780](https://github.com/Lightning-AI/lightning/pull/16780))


### Deprecated

-
8 changes: 0 additions & 8 deletions src/lightning/fabric/connector.py
@@ -415,14 +415,6 @@ def _check_strategy_and_fallback(self) -> None:
# TODO this logic should apply to both str and object config
strategy_flag = "" if isinstance(self._strategy_flag, Strategy) else self._strategy_flag

if strategy_flag == "ddp_spawn" and (
TorchElasticEnvironment.detect()
or KubeflowEnvironment.detect()
or SLURMEnvironment.detect()
or LSFEnvironment.detect()
or MPIEnvironment.detect()
):
strategy_flag = "ddp"
if strategy_flag == "dp" and self._accelerator_flag == "cpu":
rank_zero_warn(f"{strategy_flag!r} is not supported on CPUs, hence setting `strategy='ddp'`.")
strategy_flag = "ddp"
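A short, hedged sketch of what this removal means for Fabric users (arguments shown are standard ``Fabric`` parameters): an explicit choice of "ddp_spawn" is now kept even when a cluster environment such as SLURM or TorchElastic is detected.

.. code-block:: python

    from lightning.fabric import Fabric

    # previously swapped for "ddp" under SLURM/TorchElastic/Kubeflow/LSF/MPI,
    # now respected as-is
    fabric = Fabric(strategy="ddp_spawn", accelerator="cpu", devices=2)
    # fabric.launch()  # spawns the worker processes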
7 changes: 7 additions & 0 deletions src/lightning/pytorch/CHANGELOG.md
@@ -109,6 +109,13 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

- Changed arguments for precision settings (from [64|32|16|bf16] to ["64-true"|"32-true"|"16-mixed"|"bf16-mixed"]) ([#16783](https://github.com/Lightning-AI/lightning/pull/16783))


- When using multiple devices, the strategy now defaults to "ddp" instead of "ddp_spawn" when none is set ([#16780](https://github.com/Lightning-AI/lightning/pull/16780))


- The selection `Trainer(strategy="ddp_spawn", ...)` no longer falls back to "ddp" when a cluster environment gets detected ([#16780](https://github.com/Lightning-AI/lightning/pull/16780))


### Deprecated

-
@@ -450,12 +450,9 @@ def _choose_strategy(self) -> Union[Strategy, str]:
device = "cpu"
# TODO: lazy initialized device, then here could be self._strategy_flag = "single_device"
return SingleDeviceStrategy(device=device) # type: ignore
if len(self._parallel_devices) > 1:
if _IS_INTERACTIVE:
return "ddp_fork"
return "ddp_spawn"

return DDPStrategy.strategy_name
if len(self._parallel_devices) > 1 and _IS_INTERACTIVE:
return "ddp_fork"
return "ddp"

def _check_strategy_and_fallback(self) -> None:
"""Checks edge cases when the strategy selection was a string input, and we need to fall back to a
@@ -464,18 +461,6 @@ def _check_strategy_and_fallback(self) -> None:
# TODO this logic should apply to both str and object config
strategy_flag = "" if isinstance(self._strategy_flag, Strategy) else self._strategy_flag

if strategy_flag in (
"ddp_spawn",
"ddp_spawn_find_unused_parameters_false",
"ddp_spawn_find_unused_parameters_true",
) and (
TorchElasticEnvironment.detect()
or KubeflowEnvironment.detect()
or SLURMEnvironment.detect()
or LSFEnvironment.detect()
or MPIEnvironment.detect()
):
strategy_flag = "ddp"
if (
strategy_flag in FSDPStrategy.get_registered_strategies() or isinstance(self._strategy_flag, FSDPStrategy)
) and self._accelerator_flag not in ("cuda", "gpu"):
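A simplified, illustrative restatement of the new default selection implemented above (the function below is a sketch only, not the actual implementation):

.. code-block:: python

    def choose_default_strategy(num_parallel_devices: int, is_interactive: bool) -> str:
        """Sketch of the selection logic: DDP by default, fork in notebooks."""
        if num_parallel_devices > 1 and is_interactive:
            return "ddp_fork"  # notebooks cannot spawn worker processes
        return "ddp"  # new default; previously "ddp_spawn"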
36 changes: 8 additions & 28 deletions tests/tests_fabric/test_connector.py
@@ -534,9 +534,9 @@ def test_strategy_choice_ddp_spawn(*_):

@mock.patch("lightning.fabric.accelerators.cuda.num_cuda_devices", return_value=2)
@pytest.mark.parametrize("job_name,expected_env", [("some_name", SLURMEnvironment), ("bash", LightningEnvironment)])
@pytest.mark.parametrize("strategy", ["ddp", DDPStrategy])
@pytest.mark.parametrize("strategy", [None, "ddp", DDPStrategy])
def test_strategy_choice_ddp_slurm(_, strategy, job_name, expected_env):
if not isinstance(strategy, str):
if strategy and not isinstance(strategy, str):
strategy = strategy()

with mock.patch.dict(
@@ -571,35 +571,15 @@ def test_strategy_choice_ddp_slurm(_, strategy, job_name, expected_env):
)
@mock.patch("lightning.fabric.accelerators.cuda.num_cuda_devices", return_value=2)
@mock.patch("lightning.fabric.accelerators.mps.MPSAccelerator.is_available", return_value=False)
def test_strategy_choice_ddp_te(*_):
connector = _Connector(strategy="ddp", accelerator="gpu", devices=2)
def test_strategy_choice_ddp_torchelastic(*_):
connector = _Connector(accelerator="gpu", devices=2)
assert isinstance(connector.accelerator, CUDAAccelerator)
assert isinstance(connector.strategy, DDPStrategy)
assert isinstance(connector.strategy.cluster_environment, TorchElasticEnvironment)
assert connector.strategy.cluster_environment.local_rank() == 1
assert connector.strategy.local_rank == 1


@mock.patch.dict(
os.environ,
{
"WORLD_SIZE": "2",
"LOCAL_WORLD_SIZE": "2",
"RANK": "1",
"LOCAL_RANK": "1",
"GROUP_RANK": "0",
"TORCHELASTIC_RUN_ID": "1",
},
)
def test_strategy_choice_ddp_cpu_te():
connector = _Connector(strategy="ddp_spawn", accelerator="cpu", devices=2)
assert isinstance(connector.accelerator, CPUAccelerator)
assert isinstance(connector.strategy, DDPStrategy)
assert isinstance(connector.strategy.cluster_environment, TorchElasticEnvironment)
assert connector.strategy.cluster_environment.local_rank() == 1
assert connector.strategy.local_rank == 1


@mock.patch.dict(
os.environ,
{
@@ -611,10 +591,10 @@ def test_strategy_choice_ddp_cpu_te():
"RANK": "1",
},
)
@mock.patch("lightning.fabric.accelerators.cuda.num_cuda_devices", return_value=1)
@mock.patch("lightning.fabric.accelerators.cuda.num_cuda_devices", return_value=2)
@mock.patch("lightning.fabric.accelerators.mps.MPSAccelerator.is_available", return_value=False)
def test_strategy_choice_ddp_kubeflow(*_):
connector = _Connector(strategy="ddp", accelerator="gpu", devices=1)
connector = _Connector(accelerator="gpu", devices=2)
assert isinstance(connector.accelerator, CUDAAccelerator)
assert isinstance(connector.strategy, DDPStrategy)
assert isinstance(connector.strategy.cluster_environment, KubeflowEnvironment)
@@ -633,7 +613,7 @@ def test_strategy_choice_ddp_kubeflow(*_):
},
)
def test_strategy_choice_ddp_cpu_kubeflow():
connector = _Connector(strategy="ddp_spawn", accelerator="cpu", devices=2)
connector = _Connector(accelerator="cpu", devices=2)
assert isinstance(connector.accelerator, CPUAccelerator)
assert isinstance(connector.strategy, DDPStrategy)
assert isinstance(connector.strategy.cluster_environment, KubeflowEnvironment)
@@ -653,7 +633,7 @@ def test_strategy_choice_ddp_cpu_kubeflow():
"SLURM_LOCALID": "0",
},
)
@pytest.mark.parametrize("strategy", ["ddp", DDPStrategy()])
@pytest.mark.parametrize("strategy", [None, "ddp", DDPStrategy()])
def test_strategy_choice_ddp_cpu_slurm(strategy):
connector = _Connector(strategy=strategy, accelerator="cpu", devices=2)
assert isinstance(connector.accelerator, CPUAccelerator)
2 changes: 1 addition & 1 deletion tests/tests_pytorch/accelerators/test_common.py
@@ -20,7 +20,7 @@
from lightning.pytorch.strategies import DDPStrategy


def test_pluggable_accelerator():
def test_pluggable_accelerator(mps_count_0, cuda_count_2):
class TestAccelerator(Accelerator):
def setup_device(self, device: torch.device) -> None:
pass
4 changes: 2 additions & 2 deletions tests/tests_pytorch/accelerators/test_tpu.py
@@ -236,7 +236,7 @@ def forward(self, x):
assert torch.all(torch.eq(model.net_a.layer.weight, model.net_b.layer.weight))


def test_tpu_invalid_raises(tpu_available):
def test_tpu_invalid_raises(tpu_available, mps_count_0):
strategy = XLAStrategy(accelerator=TPUAccelerator(), precision_plugin=PrecisionPlugin())
with pytest.raises(ValueError, match="TPUAccelerator` can only be used with a `TPUPrecisionPlugin"):
Trainer(strategy=strategy, devices=8)
@@ -246,7 +246,7 @@ def test_tpu_invalid_raises(tpu_available):
Trainer(strategy=strategy, devices=8)


def test_tpu_invalid_raises_set_precision_with_strategy(tpu_available):
def test_tpu_invalid_raises_set_precision_with_strategy(tpu_available, mps_count_0):
accelerator = TPUAccelerator()
strategy = XLAStrategy(accelerator=accelerator, precision_plugin=PrecisionPlugin())
with pytest.raises(ValueError, match="`TPUAccelerator` can only be used with a `TPUPrecisionPlugin`"):
2 changes: 1 addition & 1 deletion tests/tests_pytorch/models/test_amp.py
@@ -94,7 +94,7 @@ def test_amp_gpus(tmpdir, strategy, precision, devices):
max_epochs=1,
accelerator="gpu",
devices=devices,
strategy=strategy,
strategy=("ddp_spawn" if strategy is None and devices > 1 else strategy),
precision=precision,
)

2 changes: 2 additions & 0 deletions tests/tests_pytorch/models/test_gpu.py
@@ -43,6 +43,7 @@ def test_multi_gpu_none_backend(tmpdir):
limit_train_batches=0.2,
limit_val_batches=0.2,
accelerator="gpu",
strategy="ddp_spawn",
devices=2,
)

@@ -62,6 +63,7 @@ def test_single_gpu_model(tmpdir, devices):
limit_val_batches=0.1,
accelerator="gpu",
devices=devices,
strategy="ddp_spawn",
)

model = BoringModel()
2 changes: 1 addition & 1 deletion tests/tests_pytorch/strategies/test_ddp.py
@@ -75,7 +75,7 @@ def test_torch_distributed_backend_invalid(cuda_count_2, tmpdir):
@RunIf(skip_windows=True)
@mock.patch("torch.cuda.set_device")
@mock.patch("lightning.pytorch.accelerators.cuda._check_cuda_matmul_precision")
def test_ddp_torch_dist_is_available_in_setup(_, __, cuda_count_1, tmpdir):
def test_ddp_torch_dist_is_available_in_setup(_, __, cuda_count_1, mps_count_0, tmpdir):
"""Test to ensure torch distributed is available within the setup hook using ddp."""

class TestModel(BoringModel):
7 changes: 2 additions & 5 deletions tests/tests_pytorch/strategies/test_ddp_spawn_strategy.py
@@ -46,14 +46,11 @@ def validation_step(self, batch, batch_idx):
@RunIf(skip_windows=True)
def test_ddp_cpu():
"""Tests if device is set correctly when training for DDPSpawnStrategy."""
trainer = Trainer(devices=2, accelerator="cpu", fast_dev_run=True)
trainer = Trainer(devices=2, strategy="ddp_spawn", accelerator="cpu", fast_dev_run=True)
# assert strategy attributes for device setting

assert isinstance(trainer.strategy, DDPSpawnStrategy)
assert trainer.strategy.root_device == torch.device("cpu")

model = BoringModelDDPCPU()

trainer.fit(model)


@@ -125,7 +122,7 @@ def test_ddp_spawn_configure_ddp(tmpdir):


@mock.patch("torch.distributed.init_process_group")
def test_ddp_spawn_strategy_set_timeout(mock_init_process_group):
def test_ddp_spawn_strategy_set_timeout(mock_init_process_group, cuda_count_2, mps_count_0):
"""Test that the timeout gets passed to the ``torch.distributed.init_process_group`` function."""
test_timedelta = timedelta(seconds=30)
model = BoringModel()