Proposed refactor
Deprecate the Trainer(auto_select_gpus=True|False) option and simply enable the behavior whenever devices=int is passed.
Motivation
- The name is misleading: we don't actually have an algorithm that selects GPUs automatically in a smart way. A user recently raised this confusion: Auto_select_gpus always choose gpus [0,1] when devices=2, strategy = dp, accelarator = 'auto' #13012
- The flag only applies to the GPU accelerator. There are no equivalent flags for TPUs, IPUs, etc.
- What auto_select_gpus does is such a niche use case that it is barely worth a flag. The implementation simply iterates over all available GPUs and tests whether it can place a tensor in each one's memory; this essentially checks whether the GPU is in exclusive mode and already claimed by another process. See the implementation here, and the sketch after this list.
- Internally, the flag is framed as "tuning" because the functions live under the tuner module, but in my opinion this does not fall under the term "tuning".
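
For reference, a minimal sketch of the probing described in the list above. This is not the actual Lightning implementation, and the helper name pick_free_gpus is made up for illustration:

```python
import torch


def pick_free_gpus(num_gpus: int) -> list:
    """Return indices of GPUs that accept a new CUDA context.

    Hypothetical helper mirroring what auto_select_gpus effectively does:
    try to place a tiny tensor on each visible GPU and keep the ones
    where allocation succeeds.
    """
    free = []
    for i in range(torch.cuda.device_count()):
        try:
            # Creating a tensor forces CUDA context creation on device i;
            # this raises RuntimeError if the GPU is in exclusive mode
            # and already occupied by another process.
            torch.ones(1, device=f"cuda:{i}")
            free.append(i)
        except RuntimeError:
            continue
        if len(free) == num_gpus:
            return free
    raise RuntimeError(f"Found only {len(free)} of the requested {num_gpus} GPUs.")
```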
Pitch
Keep the feature, but simply enable it by default and remove the flag from the Trainer. The check against exclusivity is still useful, for example on managed clusters, and I can't think of a reason why it would be undesired. This would only apply when passing devices=int, not when explicit indices get passed.
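
To illustrate the proposed distinction (a sketch of intended usage under this proposal, not final behavior):

```python
from pytorch_lightning import Trainer

# devices passed as an int: under this proposal, Lightning would always
# probe for usable GPUs (the old auto_select_gpus behavior).
trainer = Trainer(accelerator="gpu", devices=2)

# devices passed as explicit indices: no probing, use exactly these GPUs.
trainer = Trainer(accelerator="gpu", devices=[0, 1])
```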
Alternatives
- Keep it and come up with sophisticated methods to determine whether a GPU should be selected, based on memory profile, utilization, etc. In my opinion this is not feasible because not enough information is available before training starts.
- Make it clearer in the documentation what the flag actually does.
Additional context
If you enjoy Lightning, check out our other projects! ⚡
- Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
- Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
- Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
- Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
- Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.
cc @Borda @justusschock @awaelchli @rohitgr7 @tchaton @kaushikb11 @carmocca