Fix the GPU memory usage of ZeRO-Offload (only update stage_1_and_2.py) #7309
Conversation
Why is the DCO check so difficult to pass?
Sorry about that. Is it okay if I fix it for you?
Thank you very much! Please fix it if it's convenient for you.
Okay, I will fix it as the last step, when the PR is ready for merging.
Thanks, sir.
- The `else` branch is for the configuration `pin_memory = false`.
- It can already achieve the same shape as `self.single_partition_of_fp32_groups`.
A single buffer can save CPU memory. But if I haven't misunderstood, `self.single_partition_of_fp32_groups` is already a single partition.
@sfc-gh-truwase Sir, how do I fix the formatting?
https://github.com/deepspeedai/DeepSpeed/blob/master/CONTRIBUTING.md#prerequisites
Thanks! I think I need a little time to study this.
@sfc-gh-truwase I really have no idea which format is right. I used pre-commit, but it wants to change 1000+ lines.
Running pre-commit directly scans every file, while you only want to scan the files you changed. The trick is to run […]. Start from the following link and ignore the 'run against all files' part.
@delock Thank you very much! I enabled the
format
I'm very sorry. I just discovered the file
@sfc-gh-truwase @delock I think the format is OK now. Do we need to restart the tests?
Can you re-run the DCO test?
@PKUWZP Maybe that is difficult now, because it seems I need a new commit, and a new commit requires re-running all the tests.
@sfc-gh-truwase Sir, I think I need your help to fix the DCO failure. If we decide to merge the commit and it is convenient for you, please help me fix it. Thank you very much!
…y) (deepspeedai#7309) Signed-off-by: Armin Zhu <[email protected]> Fix the memory usage of ZeRO-Offload with stages 1 and 2. Before the fix, GPU memory usage was about 3x the size of params_FP16, because the H2D data copy used a different data type. Now the GPU memory usage is about 1x params_FP16, and the H2D memory copy needs a 16-bit pinned memory buffer. Signed-off-by: Max Kovalenko <[email protected]>
Signed-off-by: Armin Zhu [email protected]
Fix the memory usage of ZeRO-Offload with stages 1 and 2. Before the fix, GPU memory usage was about 3x the size of params_FP16, because the H2D data copy used a different data type. Now the GPU memory usage is about 1x params_FP16, and the H2D memory copy needs a 16-bit pinned memory buffer.
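One plausible way to account for the 3x and 1x figures is sketched below. It assumes that, before the fix, the fp32 master copy was moved to the GPU as a full fp32 temporary before being cast into the 16-bit parameters, and that after the fix the data arrives already in 16-bit form through the pinned buffer. The function name `peak_gpu_bytes` and the parameter count are hypothetical and chosen only for illustration; the numbers are arithmetic, not measurements from this PR.

```python
BYTES_FP16 = 2
BYTES_FP32 = 4


def peak_gpu_bytes(num_params: int, h2d_dtype_bytes: int) -> int:
    """Estimate peak GPU bytes while refreshing 16-bit params from the CPU copy.

    Assumption (not taken from the PR): if the H2D copy arrives in a dtype
    other than 16-bit, a full-size temporary in that dtype lives on the GPU
    until the cast into the 16-bit params completes.
    """
    resident = num_params * BYTES_FP16  # the 16-bit params themselves
    if h2d_dtype_bytes == BYTES_FP16:
        transient = 0  # data can be copied straight into the 16-bit params
    else:
        transient = num_params * h2d_dtype_bytes  # temporary on the GPU
    return resident + transient


num_params = 1_000_000_000  # hypothetical model size
params_fp16 = num_params * BYTES_FP16

before = peak_gpu_bytes(num_params, BYTES_FP32)  # fp32 H2D copy
after = peak_gpu_bytes(num_params, BYTES_FP16)   # 16-bit H2D copy

print(before / params_fp16)  # 3.0
print(after / params_fp16)   # 1.0
```

Under this accounting the fp32 temporary alone costs twice the size of the 16-bit parameters, which is where the extra 2x would come from. The cast to 16-bit then has to happen on the CPU side, which matches the description's note that a 16-bit pinned buffer is needed.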