
Conversation

@limjcst (Contributor) commented on Apr 8, 2025

I want to reuse a composed module in the pipeline. For example, the following `MyModule` has a member `linear`, which is itself a module.

```python
class MyModule(torch.nn.Module):
    def __init__(self, n_in: int, n_out: int):
        super().__init__()
        self.linear = torch.nn.Linear(n_in, n_out)
        self.layer_norm = torch.nn.LayerNorm(n_out)

    def forward(self, data: torch.Tensor) -> torch.Tensor:
        hidden = self.linear(data)
        hidden = self.layer_norm(hidden)
        return hidden
```

`MyModule.linear.weight` should be synchronized among the related ranks, so I add `linear.weight` to `TiedLayerSpec.tied_weight_attr` (a hypothetical spec is sketched below).
As an aside, I generate the whole `tied_weight_attr` list with the following expression.

```python
tied_weight_attr = [name for name, p in layer.named_parameters() if p.numel() > 1]
```
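
For context, here is a hypothetical layer spec using such a nested attribute path. The tie key, the positional module arguments, and passing `tied_weight_attr` as a list are illustrative assumptions, not code from this PR:

```python
from deepspeed.pipe import TiedLayerSpec

# Hypothetical spec: pipeline stages sharing the "tied_linear" key would
# synchronize MyModule.linear.weight. Exact argument layout may differ.
spec = TiedLayerSpec("tied_linear", MyModule, n_in, n_out,
                     tied_weight_attr=["linear.weight"])
```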

However, the builtin `getattr` used by `PipelineModule` cannot resolve a nested attribute such as `linear.weight`.
Hence, this PR first extends the builtin `getattr` into a recursive version, `PipelineModule._recursive_getattr`, which walks the dotted path one attribute segment at a time (see the sketch below).

Meanwhile, the order in which tied weights are synchronized matters: if ranks iterate the tie keys in different orders, their collective calls mismatch and training hangs. This PR therefore sorts `tie_keys` in `PipelineModule._index_tied_modules` so every rank processes the tied groups in the same order.
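
For illustration, a minimal sketch of what such a recursive lookup could look like; this is an assumption about the helper's shape, not necessarily the exact code merged in this PR:

```python
def _recursive_getattr(module, attr_path):
    """Resolve a dotted attribute path such as "linear.weight" one segment at a time."""
    obj = module
    for segment in attr_path.split("."):
        obj = getattr(obj, segment)
    return obj
```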

Extend the builtin `getattr` to a recursive version `PipelineModule._recursive_getattr`
for nested tied weights, e.g., "linear.weight".
Meanwhile, sort tie_keys in `PipelineModule._index_tied_modules` to avoid hanging.

Signed-off-by: Mingjie Li <[email protected]>
@limjcst requested review from loadams and tohtana as code owners on April 8, 2025 at 06:33
@tohtana (Contributor) commented on Apr 8, 2025

Thank you @limjcst for the contribution! This is a significant improvement.

@loadams added this pull request to the merge queue on Apr 8, 2025
github-merge-queue bot removed this pull request from the merge queue due to failed status checks on Apr 8, 2025
@loadams added this pull request to the merge queue on Apr 9, 2025
github-merge-queue bot removed this pull request from the merge queue due to failed status checks on Apr 9, 2025
@limjcst (Contributor, Author) commented on Apr 9, 2025

nv-accelerate-v100 failed, raising "invalid command 'bdist_wheel'". However, this job succeeded in another run.

Note that the failed job used "cached wheel-0.46.1-py3-none-any.whl.metadata".
@agronholm suggested setuptools >= 70.1 for wheel==0.46.0. Is this suggestion meaningful for DeepSpeed?

pypa/wheel#660 (comment)

@loadams (Collaborator) commented on Apr 9, 2025

> nv-accelerate-v100 failed, raising "invalid command 'bdist_wheel'". However, this job succeeded in another run.
>
> Note that the failed job used "cached wheel-0.46.1-py3-none-any.whl.metadata". @agronholm suggested setuptools >= 70.1 for wheel==0.46.0. Is this suggestion meaningful for DeepSpeed?
>
> pypa/wheel#660 (comment)

Thanks @limjcst - I saw this failure on another PR and will take a look and merge the fixes into your PR when ready.

@loadams added this pull request to the merge queue on Apr 9, 2025
@loadams (Collaborator) commented on Apr 9, 2025

> nv-accelerate-v100 failed, raising "invalid command 'bdist_wheel'". However, this job succeeded in another run.
> Note that the failed job used "cached wheel-0.46.1-py3-none-any.whl.metadata". @agronholm suggested setuptools >= 70.1 for wheel==0.46.0. Is this suggestion meaningful for DeepSpeed?
> pypa/wheel#660 (comment)
>
> Thanks @limjcst - I saw this failure on another PR and will take a look and merge the fixes into your PR when ready.

@limjcst - it looks like the wheel team has yanked the problematic wheel version, so tests should be passing again.

@agronholm commented

Nevertheless, upgrading to setuptools >= 70.1 should prevent any future issues. I noticed that this project lacks a standard pyproject.toml which would be able to specify build requirements.

@loadams (Collaborator) commented on Apr 9, 2025

> Nevertheless, upgrading to setuptools >= 70.1 should prevent any future issues. I noticed that this project lacks a standard pyproject.toml which would be able to specify build requirements.

@agronholm - yes, we have a PR for one here which we will prioritize merging as we know this is needed.
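
For reference, a minimal `[build-system]` table along the lines @agronholm describes could look like the following; the exact pins are illustrative assumptions, not necessarily what DeepSpeed adopted:

```toml
[build-system]
# Illustrative pins only; setuptools >= 70.1 is the floor suggested above.
requires = ["setuptools>=70.1", "wheel"]
build-backend = "setuptools.build_meta"
```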

Merged via the queue into deepspeedai:master with commit 185330c on Apr 9, 2025. 10 checks passed.
ys950902 pushed a commit to ys950902/DeepSpeed that referenced this pull request May 21, 2025
deepcharm pushed a commit to deepcharm/DeepSpeed that referenced this pull request Jun 16, 2025