
Conversation

tohtana
Contributor

@tohtana tohtana commented Jun 14, 2025

This PR improves `pad_tensors` in `deepspeed/compile/util.py`, which pads tensors so that all ranks have tensors of the same shape.
Previously, the function only adjusted tensor shapes; tensor strides could still differ across ranks, causing recompilation on only some of them. Because DeepCompile inserts communication operators into the graph, such partial recompilation can easily leave the communication collectives stuck.

To address this, the PR replaces the use of `torch.nn.functional.pad` with a new approach that also ensures consistent strides across ranks, avoiding communication hangs during distributed operations.
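
For illustration, here is a minimal sketch of the stride-consistent padding idea, not necessarily the exact implementation in this PR: instead of calling `torch.nn.functional.pad`, allocate a fresh contiguous tensor of the target shape and copy the original data into it, so every rank ends up with identical shapes and identical (contiguous) strides. The helper name `pad_to_shape` and its target-shape argument are hypothetical.

```python
import torch

def pad_to_shape(t: torch.Tensor, target_shape) -> torch.Tensor:
    """Hypothetical sketch: zero-pad `t` up to `target_shape` while
    guaranteeing a contiguous result, so strides match on every rank."""
    if tuple(t.shape) == tuple(target_shape):
        # Even when no padding is needed, normalize the layout so all
        # ranks see identical (contiguous) strides.
        return t.contiguous()
    padded = torch.zeros(target_shape, dtype=t.dtype, device=t.device)
    # Copy the original data into the leading slice of each dimension.
    padded[tuple(slice(0, s) for s in t.shape)].copy_(t)
    return padded

# Example: a rank holding a (3, 8) tensor is padded to the common
# (4, 8) shape with a contiguous layout.
x = torch.randn(3, 8)
y = pad_to_shape(x, (4, 8))
assert y.shape == (4, 8) and y.is_contiguous()
```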

Signed-off-by: Masahiro Tanaka <[email protected]>
@tohtana tohtana requested review from loadams and tjruwase as code owners June 14, 2025 01:33
@tohtana tohtana merged commit 600d280 into master Jun 14, 2025
9 checks passed
@tohtana tohtana deleted the tohtana/fix_padding_for_compile branch June 14, 2025 23:58
deepcharm pushed a commit to deepcharm/DeepSpeed that referenced this pull request Jun 16, 2025
Antlera pushed a commit to Antlera/DeepSpeed that referenced this pull request Jun 27, 2025