
Commit 79dbefc

awaelchli authored and carmocca committed
Remove legacy device arguments in Trainer (#16171)
1 parent 730d2bc commit 79dbefc
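
Editor's note: the removed arguments all map onto the unified ``accelerator``/``devices`` API that this commit keeps. A minimal migration sketch, based on the deprecation messages deleted below (the ``pytorch_lightning`` import path is assumed, not shown in this diff):

    from pytorch_lightning import Trainer

    # Removed by this commit      ->  replacement
    # Trainer(gpus=2)             ->  Trainer(accelerator="gpu", devices=2)
    # Trainer(tpu_cores=8)        ->  Trainer(accelerator="tpu", devices=8)
    # Trainer(ipus=4)             ->  Trainer(accelerator="ipu", devices=4)
    # Trainer(num_processes=2)    ->  Trainer(accelerator="cpu", devices=2)

    trainer = Trainer(accelerator="gpu", devices=2)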

14 files changed: +45 -368 lines

docs/source-pytorch/common/trainer.rst

Lines changed: 0 additions & 138 deletions
@@ -492,8 +492,6 @@ devices
 ^^^^^^^
 
 Number of devices to train on (``int``), which devices to train on (``list`` or ``str``), or ``"auto"``.
-It will be mapped to either ``gpus``, ``tpu_cores``, ``num_processes`` or ``ipus``,
-based on the accelerator type (``"cpu", "gpu", "tpu", "ipu", "auto"``).
 
 .. code-block:: python
 
@@ -624,56 +622,6 @@ impact to subsequent runs. These are the changes enabled:
 - Disables the Tuner.
 - If using the CLI, the configuration file is not saved.
 
-.. _gpus:
-
-gpus
-^^^^
-
-.. warning:: ``gpus=x`` has been deprecated in v1.7 and will be removed in v2.0.
-    Please use ``accelerator='gpu'`` and ``devices=x`` instead.
-
-.. raw:: html
-
-    <video width="50%" max-width="400px" controls
-    poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/gpus.jpg"
-    src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/gpus.mp4"></video>
-
-|
-
-- Number of GPUs to train on (int)
-- or which GPUs to train on (list)
-- can handle strings
-
-.. testcode::
-
-    # default used by the Trainer (ie: train on CPU)
-    trainer = Trainer(gpus=None)
-
-    # equivalent
-    trainer = Trainer(gpus=0)
-
-Example::
-
-    # int: train on 2 gpus
-    trainer = Trainer(gpus=2)
-
-    # list: train on GPUs 1, 4 (by bus ordering)
-    trainer = Trainer(gpus=[1, 4])
-    trainer = Trainer(gpus='1, 4')  # equivalent
-
-    # -1: train on all gpus
-    trainer = Trainer(gpus=-1)
-    trainer = Trainer(gpus='-1')  # equivalent
-
-    # combine with num_nodes to train on multiple GPUs across nodes
-    # uses 8 gpus in total
-    trainer = Trainer(gpus=2, num_nodes=4)
-
-    # train only on GPUs 1 and 4 across nodes
-    trainer = Trainer(gpus=[1, 4], num_nodes=4)
-
-See Also:
-    - :ref:`Multi GPU Training <multi_gpu>`
 
 gradient_clip_val
 ^^^^^^^^^^^^^^^^^
@@ -951,33 +899,6 @@ Number of GPU nodes for distributed training.
     # to train on 8 nodes
     trainer = Trainer(num_nodes=8)
 
-num_processes
-^^^^^^^^^^^^^
-
-.. warning:: ``num_processes=x`` has been deprecated in v1.7 and will be removed in v2.0.
-    Please use ``accelerator='cpu'`` and ``devices=x`` instead.
-
-.. raw:: html
-
-    <video width="50%" max-width="400px" controls
-    poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/num_processes.jpg"
-    src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/num_processes.mp4"></video>
-
-|
-
-Number of processes to train with. Automatically set to the number of GPUs
-when using ``strategy="ddp"``. Set to a number greater than 1 when
-using ``accelerator="cpu"`` and ``strategy="ddp"`` to mimic distributed training on a
-machine without GPUs. This is useful for debugging, but **will not** provide
-any speedup, since single-process Torch already makes efficient use of multiple
-CPUs. While it would typically spawns subprocesses for training, setting
-``num_nodes > 1`` and keeping ``num_processes = 1`` runs training in the main
-process.
-
-.. testcode::
-
-    # Simulate DDP for debugging on your GPU-less laptop
-    trainer = Trainer(accelerator="cpu", strategy="ddp", num_processes=2)
 
 num_sanity_val_steps
 ^^^^^^^^^^^^^^^^^^^^
@@ -1320,65 +1241,6 @@ track_grad_norm
     # track the 2-norm
     trainer = Trainer(track_grad_norm=2)
 
-.. _tpu_cores:
-
-tpu_cores
-^^^^^^^^^
-
-.. warning:: ``tpu_cores=x`` has been deprecated in v1.7 and will be removed in v2.0.
-    Please use ``accelerator='tpu'`` and ``devices=x`` instead.
-
-.. raw:: html
-
-    <video width="50%" max-width="400px" controls
-    poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/tpu_cores.jpg"
-    src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/tpu_cores.mp4"></video>
-
-|
-
-- How many TPU cores to train on (1 or 8).
-- Which TPU core to train on [1-8]
-
-A single TPU v2 or v3 has 8 cores. A TPU pod has
-up to 2048 cores. A slice of a POD means you get as many cores
-as you request.
-
-Your effective batch size is batch_size * total tpu cores.
-
-This parameter can be either 1 or 8.
-
-Example::
-
-    # your_trainer_file.py
-
-    # default used by the Trainer (ie: train on CPU)
-    trainer = Trainer(tpu_cores=None)
-
-    # int: train on a single core
-    trainer = Trainer(tpu_cores=1)
-
-    # list: train on a single selected core
-    trainer = Trainer(tpu_cores=[2])
-
-    # int: train on all cores few cores
-    trainer = Trainer(tpu_cores=8)
-
-    # for 8+ cores must submit via xla script with
-    # a max of 8 cores specified. The XLA script
-    # will duplicate script onto each TPU in the POD
-    trainer = Trainer(tpu_cores=8)
-
-To train on more than 8 cores (ie: a POD),
-submit this script using the xla_dist script.
-
-Example::
-
-    python -m torch_xla.distributed.xla_dist
-    --tpu=$TPU_POD_NAME
-    --conda-env=torch-xla-nightly
-    --env=XLA_USE_BF16=1
-    -- python your_trainer_file.py
-
 
 val_check_interval
 ^^^^^^^^^^^^^^^^^^
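
Editor's note: the removed ``gpus``, ``num_processes``, and ``tpu_cores`` sections all point at the same replacement. A hedged sketch of the equivalent new-style calls, derived from the deprecation warnings above rather than from text kept in this commit:

    from pytorch_lightning import Trainer

    # was: Trainer(gpus=[1, 4], num_nodes=4)
    trainer = Trainer(accelerator="gpu", devices=[1, 4], num_nodes=4)

    # was: Trainer(accelerator="cpu", strategy="ddp", num_processes=2)
    trainer = Trainer(accelerator="cpu", strategy="ddp", devices=2)

    # was: Trainer(tpu_cores=8)
    trainer = Trainer(accelerator="tpu", devices=8)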

src/pytorch_lightning/CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -27,6 +27,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 - Removed `Trainer(strategy='horovod')` support ([#16150](https://github.com/Lightning-AI/lightning/pull/16150))
 
+- Removed legacy device arguments in Trainer ([#16171](https://github.com/Lightning-AI/lightning/pull/16171))
+    * Removed the `Trainer(gpus=...)` argument
+    * Removed the `Trainer(tpu_cores=...)` argument
+    * Removed the `Trainer(ipus=...)` argument
+    * Removed the `Trainer(num_processes=...)` argument
+
 
 ## [unreleased] - 202Y-MM-DD

src/pytorch_lightning/trainer/connectors/accelerator_connector.py

Lines changed: 7 additions & 83 deletions
@@ -99,10 +99,6 @@ def __init__(
         replace_sampler_ddp: bool = True,
         deterministic: Optional[Union[bool, _LITERAL_WARN]] = False,
         auto_select_gpus: Optional[bool] = None,  # TODO: Remove in v1.10.0
-        num_processes: Optional[int] = None,  # deprecated
-        tpu_cores: Optional[Union[List[int], str, int]] = None,  # deprecated
-        ipus: Optional[int] = None,  # deprecated
-        gpus: Optional[Union[List[int], str, int]] = None,  # deprecated
     ) -> None:
         """The AcceleratorConnector parses several Trainer arguments and instantiates the Strategy including other
         components such as the Accelerator and Precision plugins.
@@ -157,7 +153,6 @@ def __init__(
 
         # Raise an exception if there are conflicts between flags
         # Set each valid flag to `self._x_flag` after validation
-        # For devices: Assign gpus, ipus, etc. to the accelerator flag and devices flag
         self._strategy_flag: Optional[Union[Strategy, str]] = None
         self._accelerator_flag: Optional[Union[Accelerator, str]] = None
         self._precision_flag: Optional[Union[int, str]] = None
@@ -175,9 +170,6 @@ def __init__(
             plugins=plugins,
             sync_batchnorm=sync_batchnorm,
         )
-        self._check_device_config_and_set_final_flags(
-            devices=devices, num_nodes=num_nodes, num_processes=num_processes, gpus=gpus, ipus=ipus, tpu_cores=tpu_cores
-        )
         # 2. Instantiate Accelerator
         self._set_accelerator_if_ipu_strategy_is_passed()
 
@@ -187,6 +179,7 @@ def __init__(
         elif self._accelerator_flag == "gpu":
             self._accelerator_flag = self._choose_gpu_accelerator_backend()
 
+        self._check_device_config_and_set_final_flags(devices=devices, num_nodes=num_nodes)
         self._set_parallel_devices_and_init_accelerator()
 
         # 3. Instantiate ClusterEnvironment
@@ -360,10 +353,6 @@ def _check_device_config_and_set_final_flags(
         self,
         devices: Optional[Union[List[int], str, int]],
         num_nodes: int,
-        num_processes: Optional[int],
-        gpus: Optional[Union[List[int], str, int]],
-        ipus: Optional[int],
-        tpu_cores: Optional[Union[List[int], str, int]],
     ) -> None:
         self._num_nodes_flag = int(num_nodes) if num_nodes is not None else 1
         self._devices_flag = devices
@@ -379,76 +368,12 @@ def _check_device_config_and_set_final_flags(
                 f" using {accelerator_name} accelerator."
             )
 
-        # TODO: Delete this method when num_processes, gpus, ipus and tpu_cores gets removed
-        self._map_deprecated_devices_specific_info_to_accelerator_and_device_flag(
-            devices, num_processes, gpus, ipus, tpu_cores
-        )
-
         if self._devices_flag == "auto" and self._accelerator_flag is None:
             raise MisconfigurationException(
                 f"You passed `devices={devices}` but haven't specified"
                 " `accelerator=('auto'|'tpu'|'gpu'|'ipu'|'cpu'|'hpu'|'mps')` for the devices mapping."
             )
 
-    def _map_deprecated_devices_specific_info_to_accelerator_and_device_flag(
-        self,
-        devices: Optional[Union[List[int], str, int]],
-        num_processes: Optional[int],
-        gpus: Optional[Union[List[int], str, int]],
-        ipus: Optional[int],
-        tpu_cores: Optional[Union[List[int], str, int]],
-    ) -> None:
-        """Emit deprecation warnings for num_processes, gpus, ipus, tpu_cores and set the `devices_flag` and
-        `accelerator_flag`."""
-        if num_processes is not None:
-            rank_zero_deprecation(
-                f"Setting `Trainer(num_processes={num_processes})` is deprecated in v1.7 and will be removed"
-                f" in v2.0. Please use `Trainer(accelerator='cpu', devices={num_processes})` instead."
-            )
-        if gpus is not None:
-            rank_zero_deprecation(
-                f"Setting `Trainer(gpus={gpus!r})` is deprecated in v1.7 and will be removed"
-                f" in v2.0. Please use `Trainer(accelerator='gpu', devices={gpus!r})` instead."
-            )
-        if tpu_cores is not None:
-            rank_zero_deprecation(
-                f"Setting `Trainer(tpu_cores={tpu_cores!r})` is deprecated in v1.7 and will be removed"
-                f" in v2.0. Please use `Trainer(accelerator='tpu', devices={tpu_cores!r})` instead."
-            )
-        if ipus is not None:
-            rank_zero_deprecation(
-                f"Setting `Trainer(ipus={ipus})` is deprecated in v1.7 and will be removed"
-                f" in v2.0. Please use `Trainer(accelerator='ipu', devices={ipus})` instead."
-            )
-        self._gpus: Optional[Union[List[int], str, int]] = gpus
-        self._tpu_cores: Optional[Union[List[int], str, int]] = tpu_cores
-        deprecated_devices_specific_flag = num_processes or gpus or ipus or tpu_cores
-        if deprecated_devices_specific_flag and deprecated_devices_specific_flag not in ([], 0, "0"):
-            if devices:
-                # TODO improve error message
-                rank_zero_warn(
-                    f"The flag `devices={devices}` will be ignored, "
-                    f"instead the device specific number {deprecated_devices_specific_flag} will be used"
-                )
-
-            if [(num_processes is not None), (gpus is not None), (ipus is not None), (tpu_cores is not None)].count(
-                True
-            ) > 1:
-                # TODO: improve error message
-                rank_zero_warn("more than one device specific flag has been set")
-            self._devices_flag = deprecated_devices_specific_flag
-
-            if self._accelerator_flag is None:
-                # set accelerator type based on num_processes, gpus, ipus, tpu_cores
-                if ipus:
-                    self._accelerator_flag = "ipu"
-                if tpu_cores:
-                    self._accelerator_flag = "tpu"
-                if gpus:
-                    self._accelerator_flag = "cuda"
-                if num_processes:
-                    self._accelerator_flag = "cpu"
-
     def _set_accelerator_if_ipu_strategy_is_passed(self) -> None:
         # current logic only apply to object config
         # TODO this logic should apply to both str and object config
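
Editor's note: after this change, device validation reduces to the kept lines above: record `num_nodes`, record `devices`, and reject `devices="auto"` when no accelerator was given. A standalone sketch of that remaining logic, outside the real class (the function name and the local `MisconfigurationException` stand-in are illustrative, not part of the commit):

    from typing import List, Optional, Tuple, Union

    class MisconfigurationException(Exception):
        """Stand-in for the Lightning exception of the same name (assumed for this sketch)."""

    def check_device_config(
        devices: Optional[Union[List[int], str, int]],
        num_nodes: int,
        accelerator: Optional[str],
    ) -> Tuple[Optional[Union[List[int], str, int]], int]:
        # Mirrors the surviving body of _check_device_config_and_set_final_flags:
        # record the flags, then reject `devices="auto"` without an accelerator.
        num_nodes_flag = int(num_nodes) if num_nodes is not None else 1
        devices_flag = devices
        if devices_flag == "auto" and accelerator is None:
            raise MisconfigurationException(
                f"You passed `devices={devices}` but haven't specified"
                " `accelerator=('auto'|'tpu'|'gpu'|'ipu'|'cpu'|'hpu'|'mps')` for the devices mapping."
            )
        return devices_flag, num_nodes_flag

    print(check_device_config(2, 1, "gpu"))  # -> (2, 1)
    # check_device_config("auto", 1, None)   # -> raises MisconfigurationException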
@@ -501,12 +426,7 @@ def _set_parallel_devices_and_init_accelerator(self) -> None:
             )
 
         self._set_devices_flag_if_auto_passed()
-
-        self._gpus = self._devices_flag if not self._gpus else self._gpus
-        self._tpu_cores = self._devices_flag if not self._tpu_cores else self._tpu_cores
-
         self._set_devices_flag_if_auto_select_gpus_passed()
-
         self._devices_flag = accelerator_cls.parse_devices(self._devices_flag)
         if not self._parallel_devices:
             self._parallel_devices = accelerator_cls.get_parallel_devices(self._devices_flag)
@@ -521,9 +441,13 @@ def _set_devices_flag_if_auto_select_gpus_passed(self) -> None:
                 "The Trainer argument `auto_select_gpus` has been deprecated in v1.9.0 and will be removed in v1.10.0."
                 " Please use the function `pytorch_lightning.accelerators.find_usable_cuda_devices` instead."
             )
-        if self._auto_select_gpus and isinstance(self._gpus, int) and isinstance(self.accelerator, CUDAAccelerator):
+        if (
+            self._auto_select_gpus
+            and isinstance(self._devices_flag, int)
+            and isinstance(self.accelerator, CUDAAccelerator)
+        ):
             self._devices_flag = pick_multiple_gpus(
-                self._gpus,
+                self._devices_flag,
                 # we already show a deprecation message when user sets Trainer(auto_select_gpus=...)
                 _show_deprecation=False,
             )
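
Editor's note: the deprecation message above already names the replacement for `auto_select_gpus`. A hedged usage sketch, assuming `find_usable_cuda_devices` is importable from `pytorch_lightning.accelerators` as the message states:

    from pytorch_lightning import Trainer
    from pytorch_lightning.accelerators import find_usable_cuda_devices

    # Instead of Trainer(auto_select_gpus=True, ...), pick usable CUDA devices explicitly
    trainer = Trainer(accelerator="cuda", devices=find_usable_cuda_devices(2))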
