
Commit 3232354

awaelchli authored and carmocca committed
Remove legacy device arguments in Trainer (#16171)
1 parent feff486 · commit 3232354

14 files changed: +45, -368 lines


docs/source-pytorch/common/trainer.rst

Lines changed: 0 additions & 138 deletions
@@ -492,8 +492,6 @@ devices
 ^^^^^^^

 Number of devices to train on (``int``), which devices to train on (``list`` or ``str``), or ``"auto"``.
-It will be mapped to either ``gpus``, ``tpu_cores``, ``num_processes`` or ``ipus``,
-based on the accelerator type (``"cpu", "gpu", "tpu", "ipu", "auto"``).

 .. code-block:: python

@@ -624,56 +622,6 @@ impact to subsequent runs. These are the changes enabled:
 - Disables the Tuner.
 - If using the CLI, the configuration file is not saved.

-.. _gpus:
-
-gpus
-^^^^
-
-.. warning:: ``gpus=x`` has been deprecated in v1.7 and will be removed in v2.0.
-    Please use ``accelerator='gpu'`` and ``devices=x`` instead.
-
-.. raw:: html
-
-    <video width="50%" max-width="400px" controls
-    poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/gpus.jpg"
-    src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/gpus.mp4"></video>
-
-|
-
-- Number of GPUs to train on (int)
-- or which GPUs to train on (list)
-- can handle strings
-
-.. testcode::
-
-    # default used by the Trainer (ie: train on CPU)
-    trainer = Trainer(gpus=None)
-
-    # equivalent
-    trainer = Trainer(gpus=0)
-
-Example::
-
-    # int: train on 2 gpus
-    trainer = Trainer(gpus=2)
-
-    # list: train on GPUs 1, 4 (by bus ordering)
-    trainer = Trainer(gpus=[1, 4])
-    trainer = Trainer(gpus='1, 4') # equivalent
-
-    # -1: train on all gpus
-    trainer = Trainer(gpus=-1)
-    trainer = Trainer(gpus='-1') # equivalent
-
-    # combine with num_nodes to train on multiple GPUs across nodes
-    # uses 8 gpus in total
-    trainer = Trainer(gpus=2, num_nodes=4)
-
-    # train only on GPUs 1 and 4 across nodes
-    trainer = Trainer(gpus=[1, 4], num_nodes=4)
-
-See Also:
-    - :ref:`Multi GPU Training <multi_gpu>`

 gradient_clip_val
 ^^^^^^^^^^^^^^^^^
@@ -951,33 +899,6 @@ Number of GPU nodes for distributed training.
     # to train on 8 nodes
     trainer = Trainer(num_nodes=8)

-num_processes
-^^^^^^^^^^^^^
-
-.. warning:: ``num_processes=x`` has been deprecated in v1.7 and will be removed in v2.0.
-    Please use ``accelerator='cpu'`` and ``devices=x`` instead.
-
-.. raw:: html
-
-    <video width="50%" max-width="400px" controls
-    poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/num_processes.jpg"
-    src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/num_processes.mp4"></video>
-
-|
-
-Number of processes to train with. Automatically set to the number of GPUs
-when using ``strategy="ddp"``. Set to a number greater than 1 when
-using ``accelerator="cpu"`` and ``strategy="ddp"`` to mimic distributed training on a
-machine without GPUs. This is useful for debugging, but **will not** provide
-any speedup, since single-process Torch already makes efficient use of multiple
-CPUs. While it would typically spawns subprocesses for training, setting
-``num_nodes > 1`` and keeping ``num_processes = 1`` runs training in the main
-process.
-
-.. testcode::
-
-    # Simulate DDP for debugging on your GPU-less laptop
-    trainer = Trainer(accelerator="cpu", strategy="ddp", num_processes=2)

 num_sanity_val_steps
 ^^^^^^^^^^^^^^^^^^^^
@@ -1320,65 +1241,6 @@ track_grad_norm
     # track the 2-norm
     trainer = Trainer(track_grad_norm=2)

-.. _tpu_cores:
-
-tpu_cores
-^^^^^^^^^
-
-.. warning:: ``tpu_cores=x`` has been deprecated in v1.7 and will be removed in v2.0.
-    Please use ``accelerator='tpu'`` and ``devices=x`` instead.
-
-.. raw:: html
-
-    <video width="50%" max-width="400px" controls
-    poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/tpu_cores.jpg"
-    src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/tpu_cores.mp4"></video>
-
-|
-
-- How many TPU cores to train on (1 or 8).
-- Which TPU core to train on [1-8]
-
-A single TPU v2 or v3 has 8 cores. A TPU pod has
-up to 2048 cores. A slice of a POD means you get as many cores
-as you request.
-
-Your effective batch size is batch_size * total tpu cores.
-
-This parameter can be either 1 or 8.
-
-Example::
-
-    # your_trainer_file.py
-
-    # default used by the Trainer (ie: train on CPU)
-    trainer = Trainer(tpu_cores=None)
-
-    # int: train on a single core
-    trainer = Trainer(tpu_cores=1)
-
-    # list: train on a single selected core
-    trainer = Trainer(tpu_cores=[2])
-
-    # int: train on all cores few cores
-    trainer = Trainer(tpu_cores=8)
-
-    # for 8+ cores must submit via xla script with
-    # a max of 8 cores specified. The XLA script
-    # will duplicate script onto each TPU in the POD
-    trainer = Trainer(tpu_cores=8)
-
-To train on more than 8 cores (ie: a POD),
-submit this script using the xla_dist script.
-
-Example::
-
-    python -m torch_xla.distributed.xla_dist
-    --tpu=$TPU_POD_NAME
-    --conda-env=torch-xla-nightly
-    --env=XLA_USE_BF16=1
-    -- python your_trainer_file.py
-

 val_check_interval
 ^^^^^^^^^^^^^^^^^^
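
Migration sketch (not part of the diff above; the device counts are illustrative, and each line assumes the corresponding hardware is available): the removed ``gpus``, ``tpu_cores`` and ``num_processes`` sections map onto the unified ``accelerator``/``devices`` pair, exactly as the old deprecation warnings suggested.

    from pytorch_lightning import Trainer

    # gpus=2           ->  accelerator="gpu", devices=2
    trainer = Trainer(accelerator="gpu", devices=2)

    # gpus=[1, 4]      ->  accelerator="gpu", devices=[1, 4]
    trainer = Trainer(accelerator="gpu", devices=[1, 4])

    # tpu_cores=8      ->  accelerator="tpu", devices=8
    trainer = Trainer(accelerator="tpu", devices=8)

    # num_processes=2  ->  accelerator="cpu", devices=2
    trainer = Trainer(accelerator="cpu", devices=2)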

src/pytorch_lightning/CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -18,6 +18,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
   * Removed the `pytorch_lightning.utilities.enums.AMPType` enum
   * Removed the `DeepSpeedPrecisionPlugin(amp_type=..., amp_level=...)` arguments

+- Removed legacy device arguments in Trainer ([#16171](https://github.com/Lightning-AI/lightning/pull/16171))
+  * Removed the `Trainer(gpus=...)` argument
+  * Removed the `Trainer(tpu_cores=...)` argument
+  * Removed the `Trainer(ipus=...)` argument
+  * Removed the `Trainer(num_processes=...)` argument
+

 ## [unreleased] - 202Y-MM-DD
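
Note (not part of the changelog itself): because the keywords are removed from the `Trainer` signature rather than kept as deprecated no-ops, passing one of them now fails at construction time with an ordinary `TypeError`, assuming no compatibility shim intercepts the argument. A minimal sketch:

    from pytorch_lightning import Trainer

    try:
        Trainer(gpus=2)  # keyword removed by #16171
    except TypeError as err:
        # e.g. "__init__() got an unexpected keyword argument 'gpus'"
        print(f"use accelerator='gpu', devices=2 instead: {err}")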

src/pytorch_lightning/trainer/connectors/accelerator_connector.py

Lines changed: 7 additions & 83 deletions
@@ -99,10 +99,6 @@ def __init__(
         replace_sampler_ddp: bool = True,
         deterministic: Optional[Union[bool, _LITERAL_WARN]] = False,
         auto_select_gpus: Optional[bool] = None,  # TODO: Remove in v1.10.0
-        num_processes: Optional[int] = None,  # deprecated
-        tpu_cores: Optional[Union[List[int], str, int]] = None,  # deprecated
-        ipus: Optional[int] = None,  # deprecated
-        gpus: Optional[Union[List[int], str, int]] = None,  # deprecated
     ) -> None:
         """The AcceleratorConnector parses several Trainer arguments and instantiates the Strategy including other
         components such as the Accelerator and Precision plugins.
@@ -157,7 +153,6 @@ def __init__(

         # Raise an exception if there are conflicts between flags
         # Set each valid flag to `self._x_flag` after validation
-        # For devices: Assign gpus, ipus, etc. to the accelerator flag and devices flag
         self._strategy_flag: Optional[Union[Strategy, str]] = None
         self._accelerator_flag: Optional[Union[Accelerator, str]] = None
         self._precision_flag: Optional[Union[int, str]] = None
@@ -177,9 +172,6 @@ def __init__(
             plugins=plugins,
             sync_batchnorm=sync_batchnorm,
         )
-        self._check_device_config_and_set_final_flags(
-            devices=devices, num_nodes=num_nodes, num_processes=num_processes, gpus=gpus, ipus=ipus, tpu_cores=tpu_cores
-        )
         # 2. Instantiate Accelerator
         self._set_accelerator_if_ipu_strategy_is_passed()

@@ -189,6 +181,7 @@ def __init__(
         elif self._accelerator_flag == "gpu":
             self._accelerator_flag = self._choose_gpu_accelerator_backend()

+        self._check_device_config_and_set_final_flags(devices=devices, num_nodes=num_nodes)
         self._set_parallel_devices_and_init_accelerator()

         # 3. Instantiate ClusterEnvironment
@@ -362,10 +355,6 @@ def _check_device_config_and_set_final_flags(
         self,
         devices: Optional[Union[List[int], str, int]],
         num_nodes: int,
-        num_processes: Optional[int],
-        gpus: Optional[Union[List[int], str, int]],
-        ipus: Optional[int],
-        tpu_cores: Optional[Union[List[int], str, int]],
     ) -> None:
         self._num_nodes_flag = int(num_nodes) if num_nodes is not None else 1
         self._devices_flag = devices
@@ -381,76 +370,12 @@ def _check_device_config_and_set_final_flags(
                 f" using {accelerator_name} accelerator."
             )

-        # TODO: Delete this method when num_processes, gpus, ipus and tpu_cores gets removed
-        self._map_deprecated_devices_specific_info_to_accelerator_and_device_flag(
-            devices, num_processes, gpus, ipus, tpu_cores
-        )
-
         if self._devices_flag == "auto" and self._accelerator_flag is None:
             raise MisconfigurationException(
                 f"You passed `devices={devices}` but haven't specified"
                 " `accelerator=('auto'|'tpu'|'gpu'|'ipu'|'cpu'|'hpu'|'mps')` for the devices mapping."
             )

-    def _map_deprecated_devices_specific_info_to_accelerator_and_device_flag(
-        self,
-        devices: Optional[Union[List[int], str, int]],
-        num_processes: Optional[int],
-        gpus: Optional[Union[List[int], str, int]],
-        ipus: Optional[int],
-        tpu_cores: Optional[Union[List[int], str, int]],
-    ) -> None:
-        """Emit deprecation warnings for num_processes, gpus, ipus, tpu_cores and set the `devices_flag` and
-        `accelerator_flag`."""
-        if num_processes is not None:
-            rank_zero_deprecation(
-                f"Setting `Trainer(num_processes={num_processes})` is deprecated in v1.7 and will be removed"
-                f" in v2.0. Please use `Trainer(accelerator='cpu', devices={num_processes})` instead."
-            )
-        if gpus is not None:
-            rank_zero_deprecation(
-                f"Setting `Trainer(gpus={gpus!r})` is deprecated in v1.7 and will be removed"
-                f" in v2.0. Please use `Trainer(accelerator='gpu', devices={gpus!r})` instead."
-            )
-        if tpu_cores is not None:
-            rank_zero_deprecation(
-                f"Setting `Trainer(tpu_cores={tpu_cores!r})` is deprecated in v1.7 and will be removed"
-                f" in v2.0. Please use `Trainer(accelerator='tpu', devices={tpu_cores!r})` instead."
-            )
-        if ipus is not None:
-            rank_zero_deprecation(
-                f"Setting `Trainer(ipus={ipus})` is deprecated in v1.7 and will be removed"
-                f" in v2.0. Please use `Trainer(accelerator='ipu', devices={ipus})` instead."
-            )
-        self._gpus: Optional[Union[List[int], str, int]] = gpus
-        self._tpu_cores: Optional[Union[List[int], str, int]] = tpu_cores
-        deprecated_devices_specific_flag = num_processes or gpus or ipus or tpu_cores
-        if deprecated_devices_specific_flag and deprecated_devices_specific_flag not in ([], 0, "0"):
-            if devices:
-                # TODO improve error message
-                rank_zero_warn(
-                    f"The flag `devices={devices}` will be ignored, "
-                    f"instead the device specific number {deprecated_devices_specific_flag} will be used"
-                )
-
-            if [(num_processes is not None), (gpus is not None), (ipus is not None), (tpu_cores is not None)].count(
-                True
-            ) > 1:
-                # TODO: improve error message
-                rank_zero_warn("more than one device specific flag has been set")
-            self._devices_flag = deprecated_devices_specific_flag
-
-            if self._accelerator_flag is None:
-                # set accelerator type based on num_processes, gpus, ipus, tpu_cores
-                if ipus:
-                    self._accelerator_flag = "ipu"
-                if tpu_cores:
-                    self._accelerator_flag = "tpu"
-                if gpus:
-                    self._accelerator_flag = "cuda"
-                if num_processes:
-                    self._accelerator_flag = "cpu"
-
     def _set_accelerator_if_ipu_strategy_is_passed(self) -> None:
         # current logic only apply to object config
         # TODO this logic should apply to both str and object config
@@ -503,12 +428,7 @@ def _set_parallel_devices_and_init_accelerator(self) -> None:
             )

         self._set_devices_flag_if_auto_passed()
-
-        self._gpus = self._devices_flag if not self._gpus else self._gpus
-        self._tpu_cores = self._devices_flag if not self._tpu_cores else self._tpu_cores
-
         self._set_devices_flag_if_auto_select_gpus_passed()
-
         self._devices_flag = accelerator_cls.parse_devices(self._devices_flag)
         if not self._parallel_devices:
             self._parallel_devices = accelerator_cls.get_parallel_devices(self._devices_flag)
@@ -523,9 +443,13 @@ def _set_devices_flag_if_auto_select_gpus_passed(self) -> None:
                 "The Trainer argument `auto_select_gpus` has been deprecated in v1.9.0 and will be removed in v1.10.0."
                 " Please use the function `pytorch_lightning.accelerators.find_usable_cuda_devices` instead."
             )
-        if self._auto_select_gpus and isinstance(self._gpus, int) and isinstance(self.accelerator, CUDAAccelerator):
+        if (
+            self._auto_select_gpus
+            and isinstance(self._devices_flag, int)
+            and isinstance(self.accelerator, CUDAAccelerator)
+        ):
             self._devices_flag = pick_multiple_gpus(
-                self._gpus,
+                self._devices_flag,
                 # we already show a deprecation message when user sets Trainer(auto_select_gpus=...)
                 _show_deprecation=False,
             )
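
Usage sketch for the replacement named in the deprecation message above (assumes `find_usable_cuda_devices` is importable from `pytorch_lightning.accelerators` as the message states, and that at least two CUDA devices are free):

    from pytorch_lightning import Trainer
    from pytorch_lightning.accelerators import find_usable_cuda_devices

    # Pick two unoccupied CUDA devices and pass them through the unified
    # `devices` flag instead of combining auto_select_gpus with gpus=N.
    trainer = Trainer(accelerator="cuda", devices=find_usable_cuda_devices(2))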
