debug failing tests for Fabric with `ddp_fork` on PT 2.8 #21093
…ware platforms (#21057)
* support more hardware platforms and no longer hard code cuda when call _get_default_process_group_backend_for_device
* Apply suggestions from code review
---------
Signed-off-by: taozhiwei <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nicki Skafte Detlefsen <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
(cherry picked from commit 119a640)
⚡ Required checks status: All passing 🟢
Groups summary
🟢 pytorch_lightning: Tests workflow
🟢 pytorch_lightning: Azure GPU
🟢 pytorch_lightning: Benchmarks
🟢 fabric: Docs
🟢 lightning_fabric: CPU workflow
🟢 lightning_fabric: Azure GPU
🟢 mypy
🟢 install
Thank you for your contribution! 💜
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #21093   +/-   ##
=======================================
- Coverage      87%      87%    -0%
=======================================
  Files         269      269
  Lines       23595    23604    +9
=======================================
- Hits        20582    20581    -1
- Misses       3013     3023   +10
for more information, see https://pre-commit.ci
* let `_get_default_process_group_backend_for_device` support more hardware platforms (#21057)
* support more hardware platforms and no longer hard code cuda when call _get_default_process_group_backend_for_device
* Apply suggestions from code review
---------
* try it
* chlog
---------
Signed-off-by: taozhiwei <[email protected]>
Co-authored-by: taozhiwei <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nicki Skafte Detlefsen <[email protected]>
(cherry picked from commit 3c81316)
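For readers unfamiliar with the change referenced in the commit message above, here is a minimal sketch of the general idea: replacing a hard-coded `cuda` check with a lookup keyed on device type. This is not the code from #21057. The names `pick_backend_hardcoded`, `pick_backend`, `_DEFAULT_BACKENDS`, and the `mydevice`/`mybackend` pair are hypothetical and exist only for illustration. The only assumption taken from PyTorch is that `nccl` is the usual backend for CUDA devices and `gloo` for CPU.

```python
from typing import Mapping, Optional

# Hypothetical table for illustration: device type -> process group backend.
# "cuda" -> "nccl" and "cpu" -> "gloo" mirror PyTorch's usual defaults.
_DEFAULT_BACKENDS = {"cpu": "gloo", "cuda": "nccl"}


def pick_backend_hardcoded(device_type: str) -> str:
    """Hard-coded variant: every non-CUDA device silently gets gloo."""
    return "nccl" if device_type == "cuda" else "gloo"


def pick_backend(
    device_type: str,
    registry: Optional[Mapping[str, str]] = None,
    fallback: str = "gloo",
) -> str:
    """Lookup variant: other platforms can register their own backend."""
    table = _DEFAULT_BACKENDS if registry is None else registry
    return table.get(device_type, fallback)


if __name__ == "__main__":
    # A made-up accelerator and backend, added without touching pick_backend.
    extended = {**_DEFAULT_BACKENDS, "mydevice": "mybackend"}
    for dev in ("cpu", "cuda", "mydevice"):
        print(dev, pick_backend_hardcoded(dev), pick_backend(dev, extended))
```

With the lookup variant, supporting a new platform means adding a table entry rather than editing a conditional. Unknown device types still fall back to `gloo` in this sketch, which is an assumption of the example, not a statement about Lightning's behavior.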
What does this PR do?
This needs to have a permanent resolution.
deepspeed >=0.14.1,<=0.15.0 #21076 (comment)
`_get_default_process_group_backend_for_device` support more hardware platforms #21057: it was failing constantly
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:
Reviewer checklist
📚 Documentation preview 📚: https://pytorch-lightning--21093.org.readthedocs.build/en/21093/