
Distributed training across multiple nodes with different GPU counts #13506

@wangleiofficial

Description

🐛 Bug

I have four nodes:

g-1-0: 2*A100
g-1-1: 4*A40
g-1-2: 8*3090
g-1-3: 8*3090

I get this error:

RuntimeError: Timed out initializing process group in store based barrier on rank: 0, for key: store_based_barrier_key:1 (world_size=8, worker_count=2, timeout=0:30:00)

I guess it may be because the wrong world size is being computed.
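
To illustrate my guess (this is just my own arithmetic, not actual Lightning code): with devices=-1, each node seems to resolve the device count from its own local GPUs and multiply it by num_nodes, so the four nodes end up with different world sizes.

```python
# Sketch of my assumption about how each node derives the world size when
# devices=-1 (auto-detect local GPUs). Illustrative only, not Lightning code.
node_gpus = {"g-1-0": 2, "g-1-1": 4, "g-1-2": 8, "g-1-3": 8}
num_nodes = 4

for node, local_gpus in node_gpus.items():
    # devices=-1 appears to resolve to the local GPU count on each node
    world_size = num_nodes * local_gpus
    print(f"{node}: world_size={world_size}")

# g-1-0 computes 8 (matching world_size=8 in the error), while the other
# nodes compute 16 and 32, so only the 2 local workers on g-1-0 ever join
# the store-based barrier (worker_count=2) and rank 0 times out.
```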

To Reproduce

trainer = pl.Trainer(devices=-1, num_nodes=4, strategy="fsdp", accelerator='gpu')
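
As a possible workaround sketch (assuming every node must use the same number of devices), pinning devices to the smallest per-node GPU count should make all nodes agree on the world size, at the cost of leaving the extra GPUs idle:

```python
import pytorch_lightning as pl

# Workaround sketch (my assumption): use the same device count on every node
# so all nodes agree on world_size = 4 nodes * 2 GPUs = 8. The extra GPUs on
# g-1-1, g-1-2, and g-1-3 stay idle with this configuration.
trainer = pl.Trainer(devices=2, num_nodes=4, strategy="fsdp", accelerator="gpu")
```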

Expected behavior

Training initializes successfully across all four nodes and uses every available GPU.

Environment

  • PyTorch Lightning Version: 1.5.10
  • PyTorch Version: 1.10
  • Python version: 3.8
  • OS: Linux
  • CUDA/cuDNN version: 11.2
  • GPU models and configuration: 2*A100, 4*A40, 8*3090, 8*3090 (one node each)
  • How you installed PyTorch: conda

Additional context

cc @justusschock @kaushikb11 @awaelchli @akihironitta @rohitgr7

Labels

question (Further information is requested), strategy: ddp (DistributedDataParallel)
