@@ -492,8 +492,6 @@ devices
 ^^^^^^^
 
 Number of devices to train on (``int``), which devices to train on (``list`` or ``str``), or ``"auto"``.
-It will be mapped to either ``gpus``, ``tpu_cores``, ``num_processes`` or ``ipus``,
-based on the accelerator type (``"cpu", "gpu", "tpu", "ipu", "auto"``).
 
 .. code-block:: python
 
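For context, a minimal sketch of how ``devices`` pairs with ``accelerator`` after this change (not part of the patch), assuming the post-1.7 Trainer API that the deprecation notes removed below point to::

    from pytorch_lightning import Trainer

    # int: number of devices of the selected accelerator
    trainer = Trainer(accelerator="gpu", devices=2)

    # list or str: which devices to use
    trainer = Trainer(accelerator="gpu", devices=[1, 4])
    trainer = Trainer(accelerator="gpu", devices="1,4")

    # "auto": let Lightning choose the devices for the detected accelerator
    trainer = Trainer(accelerator="auto", devices="auto")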
@@ -624,56 +622,6 @@ impact to subsequent runs. These are the changes enabled:
 - Disables the Tuner.
 - If using the CLI, the configuration file is not saved.
 
-.. _gpus:
-
-gpus
-^^^^
-
-.. warning:: ``gpus=x`` has been deprecated in v1.7 and will be removed in v2.0.
-    Please use ``accelerator='gpu'`` and ``devices=x`` instead.
-
-.. raw:: html
-
-    <video width="50%" max-width="400px" controls
-    poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/gpus.jpg"
-    src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/gpus.mp4"></video>
-
-|
-
-- Number of GPUs to train on (int)
-- or which GPUs to train on (list)
-- can handle strings
-
-.. testcode::
-
-    # default used by the Trainer (ie: train on CPU)
-    trainer = Trainer(gpus=None)
-
-    # equivalent
-    trainer = Trainer(gpus=0)
-
-Example::
-
-    # int: train on 2 gpus
-    trainer = Trainer(gpus=2)
-
-    # list: train on GPUs 1, 4 (by bus ordering)
-    trainer = Trainer(gpus=[1, 4])
-    trainer = Trainer(gpus='1, 4')  # equivalent
-
-    # -1: train on all gpus
-    trainer = Trainer(gpus=-1)
-    trainer = Trainer(gpus='-1')  # equivalent
-
-    # combine with num_nodes to train on multiple GPUs across nodes
-    # uses 8 gpus in total
-    trainer = Trainer(gpus=2, num_nodes=4)
-
-    # train only on GPUs 1 and 4 across nodes
-    trainer = Trainer(gpus=[1, 4], num_nodes=4)
-
-See Also:
-    - :ref:`Multi GPU Training <multi_gpu>`
 
 
 gradient_clip_val
 ^^^^^^^^^^^^^^^^^
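As a hedged migration sketch, not part of the patch, the removed ``gpus`` examples map onto the ``accelerator``/``devices`` API named in the deprecation warning roughly as follows::

    from pytorch_lightning import Trainer

    # previously Trainer(gpus=2): train on 2 GPUs
    trainer = Trainer(accelerator="gpu", devices=2)

    # previously Trainer(gpus=[1, 4]) or Trainer(gpus='1, 4'): train on GPUs 1 and 4
    trainer = Trainer(accelerator="gpu", devices=[1, 4])

    # previously Trainer(gpus=-1): train on all available GPUs
    trainer = Trainer(accelerator="gpu", devices=-1)

    # previously Trainer(gpus=2, num_nodes=4): 8 GPUs in total across 4 nodes
    trainer = Trainer(accelerator="gpu", devices=2, num_nodes=4)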
@@ -951,33 +899,6 @@ Number of GPU nodes for distributed training.
     # to train on 8 nodes
     trainer = Trainer(num_nodes=8)
 
-num_processes
-^^^^^^^^^^^^^
-
-.. warning:: ``num_processes=x`` has been deprecated in v1.7 and will be removed in v2.0.
-    Please use ``accelerator='cpu'`` and ``devices=x`` instead.
-
-.. raw:: html
-
-    <video width="50%" max-width="400px" controls
-    poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/num_processes.jpg"
-    src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/num_processes.mp4"></video>
-
-|
-
-Number of processes to train with. Automatically set to the number of GPUs
-when using ``strategy="ddp"``. Set to a number greater than 1 when
-using ``accelerator="cpu"`` and ``strategy="ddp"`` to mimic distributed training on a
-machine without GPUs. This is useful for debugging, but **will not** provide
-any speedup, since single-process Torch already makes efficient use of multiple
-CPUs. While it would typically spawn subprocesses for training, setting
-``num_nodes > 1`` and keeping ``num_processes = 1`` runs training in the main
-process.
-
-.. testcode::
-
-    # Simulate DDP for debugging on your GPU-less laptop
-    trainer = Trainer(accelerator="cpu", strategy="ddp", num_processes=2)
 
 
 num_sanity_val_steps
 ^^^^^^^^^^^^^^^^^^^^
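The removed ``num_processes`` example above has a rough equivalent under the ``accelerator``/``devices`` API; a sketch, not part of the patch::

    from pytorch_lightning import Trainer

    # previously Trainer(accelerator="cpu", strategy="ddp", num_processes=2):
    # simulate DDP on a machine without GPUs by running 2 CPU processes
    trainer = Trainer(accelerator="cpu", strategy="ddp", devices=2)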
@@ -1320,65 +1241,6 @@ track_grad_norm
     # track the 2-norm
     trainer = Trainer(track_grad_norm=2)
 
-.. _tpu_cores:
-
-tpu_cores
-^^^^^^^^^
-
-.. warning:: ``tpu_cores=x`` has been deprecated in v1.7 and will be removed in v2.0.
-    Please use ``accelerator='tpu'`` and ``devices=x`` instead.
-
-.. raw:: html
-
-    <video width="50%" max-width="400px" controls
-    poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/tpu_cores.jpg"
-    src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/tpu_cores.mp4"></video>
-
-|
-
-- How many TPU cores to train on (1 or 8).
-- Which TPU core to train on [1-8]
-
-A single TPU v2 or v3 has 8 cores. A TPU pod has
-up to 2048 cores. A slice of a pod means you get as many cores
-as you request.
-
-Your effective batch size is ``batch_size`` * total TPU cores.
-
-This parameter can be either 1 or 8.
-
-Example::
-
-    # your_trainer_file.py
-
-    # default used by the Trainer (ie: train on CPU)
-    trainer = Trainer(tpu_cores=None)
-
-    # int: train on a single core
-    trainer = Trainer(tpu_cores=1)
-
-    # list: train on a single selected core
-    trainer = Trainer(tpu_cores=[2])
-
-    # int: train on all 8 cores
-    trainer = Trainer(tpu_cores=8)
-
-    # for 8+ cores must submit via xla script with
-    # a max of 8 cores specified. The XLA script
-    # will duplicate script onto each TPU in the pod
-    trainer = Trainer(tpu_cores=8)
-
-To train on more than 8 cores (ie: a pod),
-submit this script using the xla_dist script.
-
-Example::
-
-    python -m torch_xla.distributed.xla_dist
-    --tpu=$TPU_POD_NAME
-    --conda-env=torch-xla-nightly
-    --env=XLA_USE_BF16=1
-    -- python your_trainer_file.py
 
 
 val_check_interval
 ^^^^^^^^^^^^^^^^^^
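Likewise, the removed ``tpu_cores`` examples map roughly onto ``accelerator='tpu'`` with ``devices``; a sketch, not part of the patch::

    from pytorch_lightning import Trainer

    # previously Trainer(tpu_cores=8): train on all 8 cores of a single TPU
    trainer = Trainer(accelerator="tpu", devices=8)

    # previously Trainer(tpu_cores=[2]): train on a single selected core
    trainer = Trainer(accelerator="tpu", devices=[2])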