Commit 6e52562

awaelchli authored and lantiga committed
Fix comm initialization in MPIEnvironment (#19074)
(cherry picked from commit 197b225)
1 parent eb108f0 commit 6e52562

5 files changed: 12 additions and 2 deletions

.github/workflows/ci-tests-fabric.yml

Lines changed: 1 addition & 0 deletions
@@ -164,6 +164,7 @@ jobs:
         working-directory: tests/tests_fabric
         # NOTE: do not include coverage report here, see: https://github.com/nedbat/coveragepy/issues/1003
         run: |
+          echo $GITHUB_RUN_ID
           python -m coverage run --source ${{ env.COVERAGE_SCOPE }} \
             -m pytest -v --timeout=30 --durations=50 --random-order-seed=$GITHUB_RUN_ID

src/lightning/fabric/CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -17,6 +17,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 -


+- Fixed broadcast at initialization in `MPIEnvironment` ([#19074](https://github.com/Lightning-AI/lightning/pull/19074))
+
+
+
 ## [2.1.2] - 2023-11-15

 ### Fixed

src/lightning/fabric/plugins/environments/mpi.py

Lines changed: 2 additions & 2 deletions
@@ -107,9 +107,9 @@ def _get_main_port(self) -> int:

     def _init_comm_local(self) -> None:
         hostname = socket.gethostname()
-        all_hostnames = self._comm_world.gather(hostname, root=0)
+        all_hostnames = self._comm_world.gather(hostname, root=0)  # returns None on non-root ranks
         # sort all the hostnames, and find unique ones
-        unique_hosts = sorted(set(all_hostnames))
+        unique_hosts = sorted(set(all_hostnames)) if all_hostnames is not None else []
         unique_hosts = self._comm_world.bcast(unique_hosts, root=0)
         # find the index for this host in the list of hosts:
         self._node_rank = unique_hosts.index(hostname)
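
Why the guard is needed can be sketched without an MPI installation. The snippet below is only an illustration, not code from this commit: `simulated_gather`, `unique_hosts_before_fix`, and `unique_hosts_after_fix` are hypothetical helper names, and the hostnames are made up. It assumes the gather semantics noted in the diff comment, where only the root rank receives the list and every other rank receives `None`.

```python
from typing import List, Optional


def simulated_gather(rank: int, hostnames: List[str], root: int = 0) -> Optional[List[str]]:
    """Hypothetical stand-in for gather: only the root rank receives the list."""
    return list(hostnames) if rank == root else None


def unique_hosts_before_fix(all_hostnames: Optional[List[str]]) -> List[str]:
    # Pre-fix expression: set(None) raises TypeError on non-root ranks.
    return sorted(set(all_hostnames))


def unique_hosts_after_fix(all_hostnames: Optional[List[str]]) -> List[str]:
    # Post-fix expression: non-root ranks pass an empty placeholder,
    # which the subsequent bcast from the root is expected to overwrite.
    return sorted(set(all_hostnames)) if all_hostnames is not None else []


hostnames = ["host2", "host1", "host1", "host2"]  # one entry per rank (made-up values)
on_root = simulated_gather(rank=0, hostnames=hostnames)
on_worker = simulated_gather(rank=1, hostnames=hostnames)

root_result = unique_hosts_after_fix(on_root)
worker_result = unique_hosts_after_fix(on_worker)

try:
    unique_hosts_before_fix(on_worker)
    old_code_failed = False
except TypeError:
    old_code_failed = True
```

In this sketch the root computes the sorted unique list, the worker produces an empty placeholder, and the pre-fix expression fails on the worker before the broadcast is ever reached.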

src/lightning/pytorch/CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -32,6 +32,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed an edge case where `ModelCheckpoint` would alternate between versioned and unversioned filename ([#19064](https://github.com/Lightning-AI/lightning/pull/19064))


+- Fixed broadcast at initialization in `MPIEnvironment` ([#19074](https://github.com/Lightning-AI/lightning/pull/19074))
+
+
 ## [2.1.2] - 2023-11-15

 ### Fixed

tests/tests_fabric/plugins/environments/test_mpi.py

Lines changed: 2 additions & 0 deletions
@@ -80,11 +80,13 @@ def test_init_local_comm(monkeypatch):
     env = MPIEnvironment()

     hostname_mock.return_value = "host1"
+    env._comm_world.gather.return_value = ["host1", "host2"]
     env._comm_world.bcast.return_value = ["host1", "host2"]
     assert env.node_rank() == 0

     env._node_rank = None
     hostname_mock.return_value = "host2"
+    env._comm_world.gather.return_value = None
     env._comm_world.bcast.return_value = ["host1", "host2"]
     assert env.node_rank() == 1
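
The mocking pattern in this test can be tried in isolation. The following is a minimal sketch under stated assumptions, not the project's test: `node_rank_from_comm` is a hypothetical free function that mirrors the three patched lines of `_init_comm_local`, and `comm` is a plain `unittest.mock.MagicMock` standing in for the communicator. Setting `gather.return_value = None` emulates a non-root rank.

```python
from unittest.mock import MagicMock


def node_rank_from_comm(comm, hostname: str) -> int:
    """Hypothetical helper mirroring the patched logic of _init_comm_local."""
    all_hostnames = comm.gather(hostname, root=0)  # None on non-root ranks
    unique_hosts = sorted(set(all_hostnames)) if all_hostnames is not None else []
    unique_hosts = comm.bcast(unique_hosts, root=0)
    return unique_hosts.index(hostname)


# Emulate the root rank: gather returns the full list of hostnames.
root_comm = MagicMock()
root_comm.gather.return_value = ["host1", "host2"]
root_comm.bcast.return_value = ["host1", "host2"]
root_rank = node_rank_from_comm(root_comm, "host1")

# Emulate a non-root rank: gather returns None, bcast delivers the root's list.
worker_comm = MagicMock()
worker_comm.gather.return_value = None
worker_comm.bcast.return_value = ["host1", "host2"]
worker_rank = node_rank_from_comm(worker_comm, "host2")

# The non-root rank should have handed an empty placeholder to bcast.
worker_comm.bcast.assert_called_once_with([], root=0)
```

Checking the arguments passed to `bcast` is an optional extra here; the test in the diff only asserts on the resulting node rank.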
