
RFC: Deprecate auto_select_gpus Trainer argument #13079

@awaelchli

Proposed refactor

Deprecate the Trainer(auto_select_gpus=True|False) option and instead always enable the behavior when devices=int is passed.
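
As a sketch of what the deprecation path could look like (the sentinel default, message wording, and helper name are assumptions, not something this RFC decides):

```python
import warnings

def _check_deprecated_auto_select_gpus(auto_select_gpus):
    # Hypothetical helper inside Trainer.__init__: warn whenever the user
    # still passes the flag explicitly (None acting as the "not set" sentinel).
    if auto_select_gpus is not None:
        warnings.warn(
            "Trainer(auto_select_gpus=...) is deprecated; the availability check "
            "now runs automatically whenever devices is an int.",
            DeprecationWarning,
        )
```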

Motivation

  • The name is misleading: we don't actually have an algorithm that selects GPUs automatically in a smart way. User confusion about this was raised recently: Auto_select_gpus always choose gpus [0,1] when devices=2, strategy = dp, accelarator = 'auto' #13012
  • The flag only applies to the GPU accelerator. There are no equivalent flags for TPUs, IPUs, etc.
  • What auto_select_gpus does is a niche use case that is barely worth its own flag. The implementation just runs through all available GPUs and tests whether it can place a tensor in each one's memory, which essentially checks whether the GPU is in exclusive mode or not. See the implementation here, and the sketch after this list.
  • Internally, the flag is framed as "tuning" since the functions live under the tuner module, but in my opinion this does not fall under the term "tuning".
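
For reference, here is a minimal sketch of that check, assuming a CUDA-capable PyTorch install. This is illustrative, not the actual Lightning code; the function name and signature are made up:

```python
import torch

def pick_available_gpus(num_devices: int) -> list:
    """Return the indices of the first `num_devices` GPUs that accept a tensor."""
    picked = []
    for index in range(torch.cuda.device_count()):
        try:
            # Placing a tiny tensor fails with a RuntimeError if the GPU is in
            # exclusive compute mode and already claimed by another process.
            torch.ones(1, device=f"cuda:{index}")
        except RuntimeError:
            continue
        picked.append(index)
        if len(picked) == num_devices:
            return picked
    raise RuntimeError(f"Could not find {num_devices} available GPU(s).")
```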

Pitch

Keep the feature, but simply enable it by default and remove the flag from the Trainer. The check against exclusivity is still useful, for example on managed clusters, and I can't think of a reason why it would be undesired. It would only apply when devices=int is passed, not when explicit indices are given; see the example below.
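
Illustratively, the proposed behavior would look like this (assuming the Trainer signature otherwise stays the same):

```python
from pytorch_lightning import Trainer

# Requesting a device count would trigger the availability check ...
trainer = Trainer(accelerator="gpu", devices=2)       # picks 2 GPUs that accept a tensor

# ... while passing explicit indices would bypass it entirely.
trainer = Trainer(accelerator="gpu", devices=[0, 1])  # no check, use GPUs 0 and 1 as given
```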

Alternatives

  • Keep it and come up with sophisticated methods to determine whether a GPU should be selected based on memory profile, utilization, etc. In my opinion, this is not feasible because not enough information is available before training starts.

  • Make it clearer in the documentation what this actually does.

Additional context

#13012


cc @Borda @justusschock @awaelchli @rohitgr7 @tchaton @kaushikb11 @carmocca
