
Commit 732fdf9

Remove legacy device arguments in Trainer (#16171)
1 parent 1ad68e5 commit 732fdf9

19 files changed: 75 additions & 405 deletions

docs/source-pytorch/common/trainer.rst

Lines changed: 0 additions & 138 deletions
@@ -492,8 +492,6 @@ devices
 ^^^^^^^

 Number of devices to train on (``int``), which devices to train on (``list`` or ``str``), or ``"auto"``.
-It will be mapped to either ``gpus``, ``tpu_cores``, ``num_processes`` or ``ipus``,
-based on the accelerator type (``"cpu", "gpu", "tpu", "ipu", "auto"``).

 .. code-block:: python

@@ -624,56 +622,6 @@ impact to subsequent runs. These are the changes enabled:
 - Disables the Tuner.
 - If using the CLI, the configuration file is not saved.

-.. _gpus:
-
-gpus
-^^^^
-
-.. warning:: ``gpus=x`` has been deprecated in v1.7 and will be removed in v2.0.
-    Please use ``accelerator='gpu'`` and ``devices=x`` instead.
-
-.. raw:: html
-
-    <video width="50%" max-width="400px" controls
-    poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/gpus.jpg"
-    src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/gpus.mp4"></video>
-
-|
-
-- Number of GPUs to train on (int)
-- or which GPUs to train on (list)
-- can handle strings
-
-.. testcode::
-
-    # default used by the Trainer (ie: train on CPU)
-    trainer = Trainer(gpus=None)
-
-    # equivalent
-    trainer = Trainer(gpus=0)
-
-Example::
-
-    # int: train on 2 gpus
-    trainer = Trainer(gpus=2)
-
-    # list: train on GPUs 1, 4 (by bus ordering)
-    trainer = Trainer(gpus=[1, 4])
-    trainer = Trainer(gpus='1, 4')  # equivalent
-
-    # -1: train on all gpus
-    trainer = Trainer(gpus=-1)
-    trainer = Trainer(gpus='-1')  # equivalent
-
-    # combine with num_nodes to train on multiple GPUs across nodes
-    # uses 8 gpus in total
-    trainer = Trainer(gpus=2, num_nodes=4)
-
-    # train only on GPUs 1 and 4 across nodes
-    trainer = Trainer(gpus=[1, 4], num_nodes=4)
-
-See Also:
-    - :ref:`Multi GPU Training <multi_gpu>`

 gradient_clip_val
 ^^^^^^^^^^^^^^^^^
@@ -951,33 +899,6 @@ Number of GPU nodes for distributed training.
     # to train on 8 nodes
     trainer = Trainer(num_nodes=8)

-num_processes
-^^^^^^^^^^^^^
-
-.. warning:: ``num_processes=x`` has been deprecated in v1.7 and will be removed in v2.0.
-    Please use ``accelerator='cpu'`` and ``devices=x`` instead.
-
-.. raw:: html
-
-    <video width="50%" max-width="400px" controls
-    poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/num_processes.jpg"
-    src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/num_processes.mp4"></video>
-
-|
-
-Number of processes to train with. Automatically set to the number of GPUs
-when using ``strategy="ddp"``. Set to a number greater than 1 when
-using ``accelerator="cpu"`` and ``strategy="ddp"`` to mimic distributed training on a
-machine without GPUs. This is useful for debugging, but **will not** provide
-any speedup, since single-process Torch already makes efficient use of multiple
-CPUs. While it would typically spawns subprocesses for training, setting
-``num_nodes > 1`` and keeping ``num_processes = 1`` runs training in the main
-process.
-
-.. testcode::
-
-    # Simulate DDP for debugging on your GPU-less laptop
-    trainer = Trainer(accelerator="cpu", strategy="ddp", num_processes=2)

 num_sanity_val_steps
 ^^^^^^^^^^^^^^^^^^^^
@@ -1320,65 +1241,6 @@ track_grad_norm
     # track the 2-norm
     trainer = Trainer(track_grad_norm=2)

-.. _tpu_cores:
-
-tpu_cores
-^^^^^^^^^
-
-.. warning:: ``tpu_cores=x`` has been deprecated in v1.7 and will be removed in v2.0.
-    Please use ``accelerator='tpu'`` and ``devices=x`` instead.
-
-.. raw:: html
-
-    <video width="50%" max-width="400px" controls
-    poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/tpu_cores.jpg"
-    src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/tpu_cores.mp4"></video>
-
-|
-
-- How many TPU cores to train on (1 or 8).
-- Which TPU core to train on [1-8]
-
-A single TPU v2 or v3 has 8 cores. A TPU pod has
-up to 2048 cores. A slice of a POD means you get as many cores
-as you request.
-
-Your effective batch size is batch_size * total tpu cores.
-
-This parameter can be either 1 or 8.
-
-Example::
-
-    # your_trainer_file.py
-
-    # default used by the Trainer (ie: train on CPU)
-    trainer = Trainer(tpu_cores=None)
-
-    # int: train on a single core
-    trainer = Trainer(tpu_cores=1)
-
-    # list: train on a single selected core
-    trainer = Trainer(tpu_cores=[2])
-
-    # int: train on all 8 cores
-    trainer = Trainer(tpu_cores=8)
-
-    # for 8+ cores must submit via xla script with
-    # a max of 8 cores specified. The XLA script
-    # will duplicate script onto each TPU in the POD
-    trainer = Trainer(tpu_cores=8)
-
-To train on more than 8 cores (ie: a POD),
-submit this script using the xla_dist script.
-
-Example::
-
-    python -m torch_xla.distributed.xla_dist
-    --tpu=$TPU_POD_NAME
-    --conda-env=torch-xla-nightly
-    --env=XLA_USE_BF16=1
-    -- python your_trainer_file.py

 val_check_interval
 ^^^^^^^^^^^^^^^^^^
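
The deprecation warnings in the sections removed above already name the replacement: a single `accelerator` plus `devices` pair. A minimal migration sketch based on those warnings (the device counts and indices are illustrative):

    from pytorch_lightning import Trainer

    # was: Trainer(gpus=2)
    trainer = Trainer(accelerator="gpu", devices=2)

    # was: Trainer(gpus=[1, 4], num_nodes=4)
    trainer = Trainer(accelerator="gpu", devices=[1, 4], num_nodes=4)

    # was: Trainer(tpu_cores=8)
    trainer = Trainer(accelerator="tpu", devices=8)

    # was: Trainer(accelerator="cpu", strategy="ddp", num_processes=2)
    trainer = Trainer(accelerator="cpu", strategy="ddp", devices=2)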

src/lightning_fabric/accelerators/tpu.py

Lines changed: 22 additions & 22 deletions
@@ -41,7 +41,7 @@ def teardown(self) -> None:
     @staticmethod
     def parse_devices(devices: Union[int, str, List[int]]) -> Optional[Union[int, List[int]]]:
         """Accelerator device parsing logic."""
-        return _parse_tpu_cores(devices)
+        return _parse_tpu_devices(devices)

     @staticmethod
     def get_parallel_devices(devices: Union[int, List[int]]) -> List[int]:
@@ -128,13 +128,13 @@ def _tpu_distributed() -> bool:
     return xm.xrt_world_size() > 1


-def _parse_tpu_cores(tpu_cores: Optional[Union[int, str, List[int]]]) -> Optional[Union[int, List[int]]]:
+def _parse_tpu_devices(devices: Optional[Union[int, str, List[int]]]) -> Optional[Union[int, List[int]]]:
     """
-    Parses the tpu_cores given in the format as accepted by the
-    :class:`~pytorch_lightning.trainer.Trainer`.
+    Parses the TPU devices given in the format as accepted by the
+    :class:`~pytorch_lightning.trainer.Trainer` and :class:`~lightning_fabric.Fabric`.

     Args:
-        tpu_cores: An int of 1 or string '1' indicates that 1 core with multi-processing should be used
+        devices: An int of 1 or string '1' indicates that 1 core with multi-processing should be used
             An int 8 or string '8' indicates that all 8 cores with multi-processing should be used
             A list of ints or a strings containing a list of comma separated integers
             indicates the specific TPU core to use.
@@ -143,37 +143,37 @@ def _parse_tpu_cores(tpu_cores: Optional[Union[int, str, List[int]]]) -> Optional[Union[int, List[int]]]:
         A list of tpu_cores to be used or ``None`` if no TPU cores were requested

     Raises:
-        MisconfigurationException:
-            If TPU cores aren't 1, 8 or [<1-8>]
+        TypeError:
+            If TPU devices aren't 1, 8 or [<1-8>]
     """
-    _check_data_type(tpu_cores)
+    _check_data_type(devices)

-    if isinstance(tpu_cores, str):
-        tpu_cores = _parse_tpu_cores_str(tpu_cores.strip())
+    if isinstance(devices, str):
+        devices = _parse_tpu_devices_str(devices.strip())

-    if not _tpu_cores_valid(tpu_cores):
-        raise TypeError("`tpu_cores` can only be 1, 8 or [<1-8>]")
+    if not _tpu_devices_valid(devices):
+        raise TypeError("`devices` can only be 1, 8 or [<1-8>] for TPUs.")

-    return tpu_cores
+    return devices


-def _tpu_cores_valid(tpu_cores: Any) -> bool:
+def _tpu_devices_valid(devices: Any) -> bool:
     # allow 1 or 8 cores
-    if tpu_cores in (1, 8, None):
+    if devices in (1, 8, None):
         return True

     # allow picking 1 of 8 indexes
-    if isinstance(tpu_cores, (list, tuple, set)):
-        has_1_tpu_idx = len(tpu_cores) == 1
-        is_valid_tpu_idx = 1 <= list(tpu_cores)[0] <= 8
+    if isinstance(devices, (list, tuple, set)):
+        has_1_tpu_idx = len(devices) == 1
+        is_valid_tpu_idx = 1 <= list(devices)[0] <= 8

         is_valid_tpu_core_choice = has_1_tpu_idx and is_valid_tpu_idx
         return is_valid_tpu_core_choice

     return False


-def _parse_tpu_cores_str(tpu_cores: str) -> Union[int, List[int]]:
-    if tpu_cores in ("1", "8"):
-        return int(tpu_cores)
-    return [int(x.strip()) for x in tpu_cores.split(",") if len(x) > 0]
+def _parse_tpu_devices_str(devices: str) -> Union[int, List[int]]:
+    if devices in ("1", "8"):
+        return int(devices)
+    return [int(x.strip()) for x in devices.split(",") if len(x) > 0]
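
A rough sketch of the parsing rules above, based on the code shown in this diff (`_parse_tpu_devices` is a private helper; this assumes the module imports on a machine without a TPU attached):

    from lightning_fabric.accelerators.tpu import _parse_tpu_devices

    _parse_tpu_devices(1)     # -> 1: one core, with multi-processing
    _parse_tpu_devices("8")   # -> 8: all eight cores
    _parse_tpu_devices("2,")  # -> [2]: one specific core index
    _parse_tpu_devices([5])   # -> [5]: same, given as a list
    _parse_tpu_devices(3)     # raises TypeError: `devices` can only be 1, 8 or [<1-8>] for TPUs.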

src/pytorch_lightning/CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -18,6 +18,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
   * Removed the `pytorch_lightning.utilities.enums.AMPType` enum
   * Removed the `DeepSpeedPrecisionPlugin(amp_type=..., amp_level=...)` arguments

+- Removed legacy device arguments in Trainer ([#16171](https://github.com/Lightning-AI/lightning/pull/16171))
+  * Removed the `Trainer(gpus=...)` argument
+  * Removed the `Trainer(tpu_cores=...)` argument
+  * Removed the `Trainer(ipus=...)` argument
+  * Removed the `Trainer(num_processes=...)` argument
+

 ## [unreleased] - 202Y-MM-DD
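
Because these arguments are removed from the `Trainer` signature rather than merely deprecated, passing one of them now fails at construction time; roughly (the exact error text depends on the Python version):

    from pytorch_lightning import Trainer

    trainer = Trainer(gpus=2)
    # TypeError: __init__() got an unexpected keyword argument 'gpus'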

src/pytorch_lightning/accelerators/tpu.py

Lines changed: 2 additions & 2 deletions
@@ -15,7 +15,7 @@

 import torch

-from lightning_fabric.accelerators.tpu import _parse_tpu_cores, _XLA_AVAILABLE
+from lightning_fabric.accelerators.tpu import _parse_tpu_devices, _XLA_AVAILABLE
 from lightning_fabric.accelerators.tpu import TPUAccelerator as LiteTPUAccelerator
 from lightning_fabric.utilities.types import _DEVICE
 from pytorch_lightning.accelerators.accelerator import Accelerator
@@ -58,7 +58,7 @@ def teardown(self) -> None:
     @staticmethod
     def parse_devices(devices: Union[int, str, List[int]]) -> Optional[Union[int, List[int]]]:
         """Accelerator device parsing logic."""
-        return _parse_tpu_cores(devices)
+        return _parse_tpu_devices(devices)

     @staticmethod
     def get_parallel_devices(devices: Union[int, List[int]]) -> List[int]:
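
The public entry point keeps its behaviour; only the helper it delegates to was renamed. A small usage sketch (parsing alone does not require a TPU to be present):

    from pytorch_lightning.accelerators import TPUAccelerator

    TPUAccelerator.parse_devices(8)     # -> 8
    TPUAccelerator.parse_devices("1,")  # -> [1]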
