
Conversation

tohtana
Contributor

@tohtana tohtana commented Jun 14, 2025

This PR improves `pad_tensors` in `deepspeed/compile/util.py`, which pads tensors so that all ranks have tensors of the same shape.
Previously, the function only adjusted tensor shapes; tensor strides could still differ across ranks, causing recompilation on only some of them. Because DeepCompile inserts communication operators into the graph, such partial recompilation can easily leave the communication collectives stuck.

To address this, the PR replaces the use of `torch.nn.functional.pad` with a new approach that also ensures consistent strides across ranks, avoiding communication hangs during distributed operations.
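
For illustration, here is a minimal sketch of the stride-consistent padding idea, not necessarily the exact implementation in this PR: instead of calling `torch.nn.functional.pad`, allocate a fresh contiguous tensor of the target shape and copy the original data into it, so every rank ends up with identical shapes and identical (contiguous) strides. The helper name `pad_to_shape` and its target-shape argument are hypothetical.

```python
import torch

def pad_to_shape(t: torch.Tensor, target_shape) -> torch.Tensor:
    """Hypothetical sketch: zero-pad `t` up to `target_shape` while
    guaranteeing a contiguous result, so strides match on every rank."""
    if tuple(t.shape) == tuple(target_shape):
        # Even when no padding is needed, normalize the layout so all
        # ranks see identical (contiguous) strides.
        return t.contiguous()
    padded = torch.zeros(target_shape, dtype=t.dtype, device=t.device)
    # Copy the original data into the leading slice of each dimension.
    padded[tuple(slice(0, s) for s in t.shape)].copy_(t)
    return padded

# Example: a rank holding a (3, 8) tensor is padded to the common
# (4, 8) shape with a contiguous layout.
x = torch.randn(3, 8)
y = pad_to_shape(x, (4, 8))
assert y.shape == (4, 8) and y.is_contiguous()
```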

Signed-off-by: Masahiro Tanaka <[email protected]>
@tohtana tohtana requested review from loadams and tjruwase as code owners June 14, 2025 01:33
@tohtana tohtana merged commit 600d280 into master Jun 14, 2025
9 checks passed
@tohtana tohtana deleted the tohtana/fix_padding_for_compile branch June 14, 2025 23:58
deepcharm pushed a commit to deepcharm/DeepSpeed that referenced this pull request Jun 16, 2025
Antlera pushed a commit to Antlera/DeepSpeed that referenced this pull request Jun 27, 2025