docs/source-pytorch/common/precision_intermediate.rst: 1 addition & 40 deletions
@@ -58,6 +58,7 @@ FP16 Mixed Precision
 ********************
 
 In most cases, mixed precision uses FP16. Supported `PyTorch operations <https://pytorch.org/docs/stable/amp.html#op-specific-behavior>`__ automatically run in FP16, saving memory and improving throughput on the supported accelerators.
+Since computation happens in FP16, there is a chance of numerical instability during training. This is handled internally by a dynamic grad scaler which skips invalid steps and adjusts the scaler to ensure subsequent steps fall within a finite range. For more information `see the autocast docs <https://pytorch.org/docs/stable/amp.html#gradient-scaling>`__.
 
 .. note::
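
(Not part of the diff.) The added sentence refers to the dynamic grad scaler that native AMP runs under the hood. Below is a minimal sketch of that mechanism using ``torch.cuda.amp`` directly; the toy model, optimizer, and data are illustrative assumptions, not taken from the documentation.

.. code-block:: python

    import torch
    from torch import nn

    # Toy setup (illustrative only, not from this diff).
    model = nn.Linear(32, 2).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    batches = [torch.randn(8, 32, device="cuda") for _ in range(4)]

    # GradScaler is the "dynamic grad scaler" the docs mention: it scales the loss
    # so small FP16 gradients do not underflow, skips the optimizer step when the
    # scaled gradients contain inf/NaN, and re-tunes the scale factor over time.
    scaler = torch.cuda.amp.GradScaler()

    for batch in batches:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():  # supported ops run in FP16 here
            loss = model(batch).sum()
        scaler.scale(loss).backward()    # backward pass on the scaled loss
        scaler.step(optimizer)           # step is skipped if grads are not finite
        scaler.update()                  # adjust the scale for the next iteration

This is roughly what Lightning automates when ``precision=16`` is set on the ``Trainer``.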
@@ -69,46 +70,6 @@ In most cases, mixed precision uses FP16. Supported `PyTorch operations <https:/
-PyTorch 1.6 release introduced mixed precision functionality into their core as the AMP package, `torch.cuda.amp <https://pytorch.org/docs/stable/amp.html>`__. It is more flexible and intuitive compared to `NVIDIA APEX <https://github.com/NVIDIA/apex>`__.
-
-Since computation happens in FP16, there is a chance of numerical instability during training. This is handled internally by a dynamic grad scaler which skips invalid steps and adjusts the scaler to ensure subsequent steps fall within a finite range. For more information `see the autocast docs <https://pytorch.org/docs/stable/amp.html#gradient-scaling>`__.
-
-Lightning uses native amp by default with ``precision=16|"bf16"``. You can also set it using:
-
-.. testcode::
-
-    Trainer(precision=16, amp_backend="native")
-
-
-NVIDIA APEX
------------
-
-.. warning::
-
-    We strongly recommend using the above native mixed precision rather than NVIDIA APEX unless you require more refined control.
-
-`NVIDIA APEX <https://github.com/NVIDIA/apex>`__ offers additional flexibility in setting mixed precision. This can be useful when trying out different precision configurations, such as keeping most of your weights in FP16 and running computation in FP16.
-
-.. testcode::
-    :skipif: not _APEX_AVAILABLE or not torch.cuda.is_available()
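
(Not part of the diff.) The deleted ``.. testcode::`` directive above is cut off before its body. For orientation, here is a minimal sketch of how the APEX backend was selected through the same ``Trainer`` flags shown for native AMP, assuming a Lightning 1.x release where ``amp_backend`` is still a ``Trainer`` argument:

.. code-block:: python

    from pytorch_lightning import Trainer

    # Sketch only: pick the APEX backend instead of native AMP.
    # Assumes a Lightning 1.x release that still accepts ``amp_backend``.
    trainer = Trainer(accelerator="gpu", devices=1, precision=16, amp_backend="apex")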
-2. Set the ``precision`` trainer flag to 16. You can customize the `Apex optimization level <https://nvidia.github.io/apex/amp.html#opt-levels>`_ by setting the ``amp_level`` flag
-   in the precision plugin.
-
-.. testcode::
-    :skipif: not _APEX_AVAILABLE or not torch.cuda.is_available()
-
-    from pytorch_lightning.plugins import ApexMixedPrecisionPlugin
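
(Not part of the diff.) The body that followed this import is not visible here. As a hedged sketch of what the deleted instruction describes, the Apex optimization level is passed through the precision plugin; the ``amp_level`` value and the ``Trainer`` arguments below are assumptions, not taken from the PR:

.. code-block:: python

    from pytorch_lightning import Trainer
    from pytorch_lightning.plugins import ApexMixedPrecisionPlugin

    # Sketch: customize the Apex optimization level via the precision plugin.
    # "O2" is only an example value; the level used in the deleted docs is not
    # visible in this diff.
    apex_plugin = ApexMixedPrecisionPlugin(amp_level="O2")
    trainer = Trainer(accelerator="gpu", devices=1, precision=16, plugins=[apex_plugin])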