Fix AutoTP gathering replaced layer params when bias is not None #7257
Hi @Yejing-Lai, can you also take a look at this PR?
LGTM, thanks!
Fixed the formatting issue.
The CI error seems to be caused by the environment rather than this PR:
Yes @HollowMan6 - thanks for following up on this PR. This is a known CI issue I am working on and hope to have resolved ASAP.
@HollowMan6 - the CI is working now and we see this error:
Head branch was pushed to by a user without write access
I made a fix following the suggestion in the error message; hopefully this makes things work.
Hi @HollowMan6, you might consider printing the outputs. If they appear equal, try slightly relaxing the tolerance in the allclose check. I've run into cases where allclose passed on my device but failed in CI environments, which may indicate that the values sit right at the edge of the threshold.
Thanks, I will check!
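To illustrate the suggestion above, here is a minimal sketch of printing the actual difference and then relaxing the tolerance. It uses only the standard library as a stand-in for a tensor comparison; the helper names `allclose` and `max_abs_diff` and the sample values are hypothetical, and `math.isclose` does not use exactly the same formula as `torch.allclose`, so treat the thresholds as illustrative only.

```python
import math


def allclose(a, b, rel_tol=1e-5, abs_tol=1e-8):
    """Element-wise closeness check over two equal-length sequences (hypothetical helper)."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b))


def max_abs_diff(a, b):
    """Largest absolute element-wise difference, useful to print before tuning tolerances."""
    return max(abs(x - y) for x, y in zip(a, b))


# Made-up values: one element differs by about 3e-5.
expected = [1.0, 2.0, 3.0]
actual = [1.0, 2.00003, 3.0]

# Print the difference first, so the tolerance is chosen from evidence.
print("max abs diff:", max_abs_diff(expected, actual))

strict = allclose(expected, actual)                 # tight tolerance
relaxed = allclose(expected, actual, rel_tol=1e-4)  # slightly relaxed tolerance
print("strict:", strict, "relaxed:", relaxed)
```

If the printed difference is far above the tolerance, relaxing the check would only hide a real mismatch; relaxing is reasonable only when the values are borderline.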
Some params are one-dimensional; this PR adds support for these params.

```log
    with deepspeed.module_inject.layers.GatherReplacedLayerParams([param], model, enabled=True):
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "deepspeed/module_inject/layers.py", line 359, in __enter__
    self.params[0].gather_params(self.params)
  File "torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "deepspeed/module_inject/layers.py", line 473, in gather_params
    param.shape[1],
    ~~~~~~~~~~~^^^
IndexError: tuple index out of range
```

Signed-off-by: Hollow Man <[email protected]>
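The failure mode in the traceback can be reproduced without DeepSpeed. The sketch below uses plain tuples to stand in for `torch.Size` (which is a tuple subclass) and reuses the two shapes logged elsewhere in this thread. The function names, and the assumption that shards are gathered along dimension 0, are hypothetical and for illustration only; this is not the code in `layers.py`.

```python
def gathered_shape_indexed(local_shape, world_size):
    """Assumes a 2-D parameter: indexing dimension 1 fails for a 1-D bias."""
    return (local_shape[0] * world_size, local_shape[1])


def gathered_shape_any_rank(local_shape, world_size):
    """Slicing never raises: local_shape[1:] is () for a 1-D parameter."""
    return (local_shape[0] * world_size, *local_shape[1:])


weight_shape = (768, 1536)  # 2-D weight
bias_shape = (768,)         # 1-D bias

# The indexed version raises the same error as in the traceback.
try:
    gathered_shape_indexed(bias_shape, world_size=2)
    indexed_failed = False
except IndexError:
    indexed_failed = True

print(indexed_failed)                                        # True
print(gathered_shape_any_rank(weight_shape, world_size=2))  # (1536, 1536)
print(gathered_shape_any_rank(bias_shape, world_size=2))    # (1536,)
```

Slicing the trailing dimensions rather than indexing them is one way to keep a shape computation independent of the parameter's rank.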
The CI failure appears to be a separate issue; I will send a patch on this branch to fix the CI.
https://github.com/HollowMan6/DeepSpeed/pull/1 @HollowMan6, you can merge this; I verified that the CI passes. I hope this is helpful. Thanks for your effort on this issue.
Signed-off-by: inkcherry <[email protected]>
Signed-off-by: Hollow Man <[email protected]>
Signed-off-by: inkcherry <[email protected]>
…pspeedai#7257)

Some params are one-dimensional, this PR adds support for these params.

Resolve deepspeedai#7249

```log
param.shape torch.Size([768, 1536])
param.shape torch.Size([768])
...
```

```log
    with deepspeed.module_inject.layers.GatherReplacedLayerParams([param], model, enabled=True):
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "deepspeed/module_inject/layers.py", line 359, in __enter__
    self.params[0].gather_params(self.params)
  File "torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "deepspeed/module_inject/layers.py", line 473, in gather_params
    param.shape[1],
    ~~~~~~~~~~~^^^
IndexError: tuple index out of range
```

---------

Signed-off-by: Hollow Man <[email protected]>
Signed-off-by: inkcherry <[email protected]>
Co-authored-by: Hongwei Chen <[email protected]>
Co-authored-by: inkcherry <[email protected]>
Signed-off-by: Max Kovalenko <[email protected]>
Some params are one-dimensional, this PR adds support for these params.
Resolve #7249