Commit 1c50c74

Remove the deprecated Trainer device arguments (#16171)
1 parent bc20405 commit 1c50c74

File tree

14 files changed: +45 −367 lines changed


docs/source-pytorch/common/trainer.rst

Lines changed: 0 additions & 138 deletions
@@ -492,8 +492,6 @@ devices
 ^^^^^^^
 
 Number of devices to train on (``int``), which devices to train on (``list`` or ``str``), or ``"auto"``.
-It will be mapped to either ``gpus``, ``tpu_cores``, ``num_processes`` or ``ipus``,
-based on the accelerator type (``"cpu", "gpu", "tpu", "ipu", "auto"``).
 
 .. code-block:: python
 
@@ -624,56 +622,6 @@ impact to subsequent runs. These are the changes enabled:
 - Disables the Tuner.
 - If using the CLI, the configuration file is not saved.
 
-.. _gpus:
-
-gpus
-^^^^
-
-.. warning:: ``gpus=x`` has been deprecated in v1.7 and will be removed in v2.0.
-    Please use ``accelerator='gpu'`` and ``devices=x`` instead.
-
-.. raw:: html
-
-    <video width="50%" max-width="400px" controls
-    poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/gpus.jpg"
-    src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/gpus.mp4"></video>
-
-|
-
-- Number of GPUs to train on (int)
-- or which GPUs to train on (list)
-- can handle strings
-
-.. testcode::
-
-    # default used by the Trainer (ie: train on CPU)
-    trainer = Trainer(gpus=None)
-
-    # equivalent
-    trainer = Trainer(gpus=0)
-
-Example::
-
-    # int: train on 2 gpus
-    trainer = Trainer(gpus=2)
-
-    # list: train on GPUs 1, 4 (by bus ordering)
-    trainer = Trainer(gpus=[1, 4])
-    trainer = Trainer(gpus='1, 4') # equivalent
-
-    # -1: train on all gpus
-    trainer = Trainer(gpus=-1)
-    trainer = Trainer(gpus='-1') # equivalent
-
-    # combine with num_nodes to train on multiple GPUs across nodes
-    # uses 8 gpus in total
-    trainer = Trainer(gpus=2, num_nodes=4)
-
-    # train only on GPUs 1 and 4 across nodes
-    trainer = Trainer(gpus=[1, 4], num_nodes=4)
-
-See Also:
-    - :ref:`Multi GPU Training <multi_gpu>`
 
 gradient_clip_val
 ^^^^^^^^^^^^^^^^^
@@ -952,33 +900,6 @@ Number of GPU nodes for distributed training.
     # to train on 8 nodes
     trainer = Trainer(num_nodes=8)
 
-num_processes
-^^^^^^^^^^^^^
-
-.. warning:: ``num_processes=x`` has been deprecated in v1.7 and will be removed in v2.0.
-    Please use ``accelerator='cpu'`` and ``devices=x`` instead.
-
-.. raw:: html
-
-    <video width="50%" max-width="400px" controls
-    poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/num_processes.jpg"
-    src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/num_processes.mp4"></video>
-
-|
-
-Number of processes to train with. Automatically set to the number of GPUs
-when using ``strategy="ddp"``. Set to a number greater than 1 when
-using ``accelerator="cpu"`` and ``strategy="ddp"`` to mimic distributed training on a
-machine without GPUs. This is useful for debugging, but **will not** provide
-any speedup, since single-process Torch already makes efficient use of multiple
-CPUs. While it would typically spawns subprocesses for training, setting
-``num_nodes > 1`` and keeping ``num_processes = 1`` runs training in the main
-process.
-
-.. testcode::
-
-    # Simulate DDP for debugging on your GPU-less laptop
-    trainer = Trainer(accelerator="cpu", strategy="ddp", num_processes=2)
 
 num_sanity_val_steps
 ^^^^^^^^^^^^^^^^^^^^
@@ -1321,65 +1242,6 @@ track_grad_norm
     # track the 2-norm
     trainer = Trainer(track_grad_norm=2)
 
-.. _tpu_cores:
-
-tpu_cores
-^^^^^^^^^
-
-.. warning:: ``tpu_cores=x`` has been deprecated in v1.7 and will be removed in v2.0.
-    Please use ``accelerator='tpu'`` and ``devices=x`` instead.
-
-.. raw:: html
-
-    <video width="50%" max-width="400px" controls
-    poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/tpu_cores.jpg"
-    src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/tpu_cores.mp4"></video>
-
-|
-
-- How many TPU cores to train on (1 or 8).
-- Which TPU core to train on [1-8]
-
-A single TPU v2 or v3 has 8 cores. A TPU pod has
-up to 2048 cores. A slice of a POD means you get as many cores
-as you request.
-
-Your effective batch size is batch_size * total tpu cores.
-
-This parameter can be either 1 or 8.
-
-Example::
-
-    # your_trainer_file.py
-
-    # default used by the Trainer (ie: train on CPU)
-    trainer = Trainer(tpu_cores=None)
-
-    # int: train on a single core
-    trainer = Trainer(tpu_cores=1)
-
-    # list: train on a single selected core
-    trainer = Trainer(tpu_cores=[2])
-
-    # int: train on all cores few cores
-    trainer = Trainer(tpu_cores=8)
-
-    # for 8+ cores must submit via xla script with
-    # a max of 8 cores specified. The XLA script
-    # will duplicate script onto each TPU in the POD
-    trainer = Trainer(tpu_cores=8)
-
-To train on more than 8 cores (ie: a POD),
-submit this script using the xla_dist script.
-
-Example::
-
-    python -m torch_xla.distributed.xla_dist
-    --tpu=$TPU_POD_NAME
-    --conda-env=torch-xla-nightly
-    --env=XLA_USE_BF16=1
-    -- python your_trainer_file.py
-
 
 val_check_interval
 ^^^^^^^^^^^^^^^^^^
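
Note (not part of the diff): the deleted `gpus`/`num_processes` sections all redirect users to the `accelerator`/`devices` pair. A minimal migration sketch of the removed doc examples, assuming PyTorch Lightning >= 2.0:

    from pytorch_lightning import Trainer

    # was: Trainer(gpus=[1, 4]) -- train only on GPUs 1 and 4 (by bus ordering)
    trainer = Trainer(accelerator="gpu", devices=[1, 4])

    # was: Trainer(gpus=-1) -- train on all available GPUs
    trainer = Trainer(accelerator="gpu", devices=-1)

    # was: Trainer(gpus=2, num_nodes=4) -- 8 GPUs in total across nodes
    trainer = Trainer(accelerator="gpu", devices=2, num_nodes=4)

    # was: Trainer(accelerator="cpu", strategy="ddp", num_processes=2)
    trainer = Trainer(accelerator="cpu", strategy="ddp", devices=2)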

src/pytorch_lightning/CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -27,6 +27,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 - Removed `Trainer(strategy='horovod')` support ([#16150](https://github.com/Lightning-AI/lightning/pull/16150))
 
+- Removed legacy device arguments in Trainer ([#16171](https://github.com/Lightning-AI/lightning/pull/16171))
+  * Removed the `Trainer(gpus=...)` argument
+  * Removed the `Trainer(tpu_cores=...)` argument
+  * Removed the `Trainer(ipus=...)` argument
+  * Removed the `Trainer(num_processes=...)` argument
+
 
 ## [1.9.0] - 2023-01-12
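
For context (not in the diff), each removed argument maps onto `accelerator` plus `devices`, following the old deprecation messages; a hedged sketch:

    from pytorch_lightning import Trainer

    trainer = Trainer(accelerator="gpu", devices=2)   # was Trainer(gpus=2)
    trainer = Trainer(accelerator="tpu", devices=8)   # was Trainer(tpu_cores=8)
    trainer = Trainer(accelerator="ipu", devices=4)   # was Trainer(ipus=4)
    trainer = Trainer(accelerator="cpu", devices=2)   # was Trainer(num_processes=2)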

src/pytorch_lightning/trainer/connectors/accelerator_connector.py

Lines changed: 7 additions & 83 deletions
@@ -102,10 +102,6 @@ def __init__(
         replace_sampler_ddp: bool = True,
         deterministic: Optional[Union[bool, _LITERAL_WARN]] = False,
         auto_select_gpus: Optional[bool] = None,  # TODO: Remove in v2.0.0
-        num_processes: Optional[int] = None,  # TODO: Remove in v2.0.0
-        tpu_cores: Optional[Union[List[int], str, int]] = None,  # TODO: Remove in v2.0.0
-        ipus: Optional[int] = None,  # TODO: Remove in v2.0.0
-        gpus: Optional[Union[List[int], str, int]] = None,  # TODO: Remove in v2.0.0
     ) -> None:
         """The AcceleratorConnector parses several Trainer arguments and instantiates the Strategy including other
         components such as the Accelerator and Precision plugins.
@@ -159,7 +155,6 @@ def __init__(
 
         # Raise an exception if there are conflicts between flags
         # Set each valid flag to `self._x_flag` after validation
-        # For devices: Assign gpus, ipus, etc. to the accelerator flag and devices flag
         self._strategy_flag: Optional[Union[Strategy, str]] = None
         self._accelerator_flag: Optional[Union[Accelerator, str]] = None
         self._precision_flag: _PRECISION_INPUT_STR = "32"
@@ -177,9 +172,6 @@ def __init__(
             plugins=plugins,
             sync_batchnorm=sync_batchnorm,
         )
-        self._check_device_config_and_set_final_flags(
-            devices=devices, num_nodes=num_nodes, num_processes=num_processes, gpus=gpus, ipus=ipus, tpu_cores=tpu_cores
-        )
         # 2. Instantiate Accelerator
         self._set_accelerator_if_ipu_strategy_is_passed()
 
@@ -189,6 +181,7 @@ def __init__(
         elif self._accelerator_flag == "gpu":
             self._accelerator_flag = self._choose_gpu_accelerator_backend()
 
+        self._check_device_config_and_set_final_flags(devices=devices, num_nodes=num_nodes)
         self._set_parallel_devices_and_init_accelerator()
 
         # 3. Instantiate ClusterEnvironment
@@ -376,10 +369,6 @@ def _check_device_config_and_set_final_flags(
         self,
         devices: Optional[Union[List[int], str, int]],
         num_nodes: int,
-        num_processes: Optional[int],
-        gpus: Optional[Union[List[int], str, int]],
-        ipus: Optional[int],
-        tpu_cores: Optional[Union[List[int], str, int]],
     ) -> None:
         self._num_nodes_flag = int(num_nodes) if num_nodes is not None else 1
         self._devices_flag = devices
@@ -395,76 +384,12 @@ def _check_device_config_and_set_final_flags(
                 f" using {accelerator_name} accelerator."
             )
 
-        # TODO: Delete this method when num_processes, gpus, ipus and tpu_cores gets removed
-        self._map_deprecated_devices_specific_info_to_accelerator_and_device_flag(
-            devices, num_processes, gpus, ipus, tpu_cores
-        )
-
         if self._devices_flag == "auto" and self._accelerator_flag is None:
             raise MisconfigurationException(
                 f"You passed `devices={devices}` but haven't specified"
                 " `accelerator=('auto'|'tpu'|'gpu'|'ipu'|'cpu'|'hpu'|'mps')` for the devices mapping."
             )
 
-    def _map_deprecated_devices_specific_info_to_accelerator_and_device_flag(
-        self,
-        devices: Optional[Union[List[int], str, int]],
-        num_processes: Optional[int],
-        gpus: Optional[Union[List[int], str, int]],
-        ipus: Optional[int],
-        tpu_cores: Optional[Union[List[int], str, int]],
-    ) -> None:
-        """Emit deprecation warnings for num_processes, gpus, ipus, tpu_cores and set the `devices_flag` and
-        `accelerator_flag`."""
-        if num_processes is not None:
-            rank_zero_deprecation(
-                f"Setting `Trainer(num_processes={num_processes})` is deprecated in v1.7 and will be removed"
-                f" in v2.0. Please use `Trainer(accelerator='cpu', devices={num_processes})` instead."
-            )
-        if gpus is not None:
-            rank_zero_deprecation(
-                f"Setting `Trainer(gpus={gpus!r})` is deprecated in v1.7 and will be removed"
-                f" in v2.0. Please use `Trainer(accelerator='gpu', devices={gpus!r})` instead."
-            )
-        if tpu_cores is not None:
-            rank_zero_deprecation(
-                f"Setting `Trainer(tpu_cores={tpu_cores!r})` is deprecated in v1.7 and will be removed"
-                f" in v2.0. Please use `Trainer(accelerator='tpu', devices={tpu_cores!r})` instead."
-            )
-        if ipus is not None:
-            rank_zero_deprecation(
-                f"Setting `Trainer(ipus={ipus})` is deprecated in v1.7 and will be removed"
-                f" in v2.0. Please use `Trainer(accelerator='ipu', devices={ipus})` instead."
-            )
-        self._gpus: Optional[Union[List[int], str, int]] = gpus
-        self._tpu_cores: Optional[Union[List[int], str, int]] = tpu_cores
-        deprecated_devices_specific_flag = num_processes or gpus or ipus or tpu_cores
-        if deprecated_devices_specific_flag and deprecated_devices_specific_flag not in ([], 0, "0"):
-            if devices:
-                # TODO improve error message
-                rank_zero_warn(
-                    f"The flag `devices={devices}` will be ignored, "
-                    f"instead the device specific number {deprecated_devices_specific_flag} will be used"
-                )
-
-            if [(num_processes is not None), (gpus is not None), (ipus is not None), (tpu_cores is not None)].count(
-                True
-            ) > 1:
-                # TODO: improve error message
-                rank_zero_warn("more than one device specific flag has been set")
-            self._devices_flag = deprecated_devices_specific_flag
-
-            if self._accelerator_flag is None:
-                # set accelerator type based on num_processes, gpus, ipus, tpu_cores
-                if ipus:
-                    self._accelerator_flag = "ipu"
-                if tpu_cores:
-                    self._accelerator_flag = "tpu"
-                if gpus:
-                    self._accelerator_flag = "cuda"
-                if num_processes:
-                    self._accelerator_flag = "cpu"
-
     def _set_accelerator_if_ipu_strategy_is_passed(self) -> None:
         # current logic only apply to object config
         # TODO this logic should apply to both str and object config
@@ -517,12 +442,7 @@ def _set_parallel_devices_and_init_accelerator(self) -> None:
             )
 
         self._set_devices_flag_if_auto_passed()
-
-        self._gpus = self._devices_flag if not self._gpus else self._gpus
-        self._tpu_cores = self._devices_flag if not self._tpu_cores else self._tpu_cores
-
         self._set_devices_flag_if_auto_select_gpus_passed()
-
         self._devices_flag = accelerator_cls.parse_devices(self._devices_flag)
         if not self._parallel_devices:
             self._parallel_devices = accelerator_cls.get_parallel_devices(self._devices_flag)
@@ -537,9 +457,13 @@ def _set_devices_flag_if_auto_select_gpus_passed(self) -> None:
                 "The Trainer argument `auto_select_gpus` has been deprecated in v1.9.0 and will be removed in v2.0.0."
                 " Please use the function `pytorch_lightning.accelerators.find_usable_cuda_devices` instead."
            )
-        if self._auto_select_gpus and isinstance(self._gpus, int) and isinstance(self.accelerator, CUDAAccelerator):
+        if (
+            self._auto_select_gpus
+            and isinstance(self._devices_flag, int)
+            and isinstance(self.accelerator, CUDAAccelerator)
+        ):
             self._devices_flag = pick_multiple_gpus(
-                self._gpus,
+                self._devices_flag,
                 # we already show a deprecation message when user sets Trainer(auto_select_gpus=...)
                 _show_deprecation=False,
            )
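
With `self._gpus` removed, the `auto_select_gpus` path now filters on `self._devices_flag`, and its deprecation message points users at `find_usable_cuda_devices`. A usage sketch (assuming PyTorch Lightning >= 1.9 and at least two idle CUDA devices):

    from pytorch_lightning import Trainer
    from pytorch_lightning.accelerators import find_usable_cuda_devices

    # replacement for the deprecated Trainer(auto_select_gpus=True, gpus=2):
    # pick two CUDA devices that are currently unoccupied and pass them explicitly
    trainer = Trainer(accelerator="cuda", devices=find_usable_cuda_devices(2))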
