
Conversation

arminzhu (Contributor)

Signed-off-by: Armin Zhu [email protected]

Fix the memory usage of ZeRO-Offload with stages 1 and 2. Before the fix, GPU memory usage is about 3x that of params_FP16, because the H2D data copy uses a different data type. After the fix, GPU memory usage is about 1x params_FP16, and the H2D copy goes through a 16-bit pinned memory buffer.
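For illustration only, here is a minimal PyTorch sketch of the idea described above. It is not the actual DeepSpeed code; the function and tensor names are made up, and the "before" pattern reflects one reading of the description.

```python
import torch

def h2d_copy_then_cast(fp32_cpu_partition: torch.Tensor) -> torch.Tensor:
    # Pattern before the fix (as described above): the FP32 partition is copied
    # to the GPU and cast there, so the device briefly holds both the FP32 copy
    # and the FP16 result in addition to the live FP16 params.
    fp32_on_gpu = fp32_cpu_partition.to("cuda", non_blocking=True)
    return fp32_on_gpu.half()

def h2d_via_pinned_fp16_buffer(fp32_cpu_partition: torch.Tensor,
                               pinned_fp16_buffer: torch.Tensor) -> torch.Tensor:
    # Pattern after the fix: cast on the host into a pre-allocated 16-bit pinned
    # buffer, then do a dtype-matched asynchronous H2D copy, so only FP16 bytes
    # ever land on the GPU.
    pinned_fp16_buffer.copy_(fp32_cpu_partition)  # FP32 -> FP16 cast on the CPU
    return pinned_fp16_buffer.to("cuda", non_blocking=True)

if torch.cuda.is_available():
    partition = torch.randn(1 << 20, dtype=torch.float32)  # CPU FP32 partition
    staging = torch.empty_like(partition, dtype=torch.float16).pin_memory()
    gpu_fp16 = h2d_via_pinned_fp16_buffer(partition, staging)
    print(gpu_fp16.dtype, gpu_fp16.device)
```

Keeping the staging buffer in FP16 and pinned is what corresponds to the roughly 1x params_FP16 usage described above.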

arminzhu requested review from tjruwase and tohtana as code owners, May 23, 2025 04:04
@arminzhu (Contributor Author)

Why is the DCO so difficult to pass?

@sfc-gh-truwase (Collaborator)

Sorry about that. Is it okay that I fix it for you?

@arminzhu (Contributor Author)

Thank you very much! Please fix it if it's convenient for you.

@sfc-gh-truwase (Collaborator)

Okay, I will fix it as the last step when the PR is ready for merging.

@arminzhu (Contributor Author)

Thanks, sir.

@arminzhu (Contributor Author) left a comment

1. The else is for the configuration pin_memory = false.

2. It can already achieve the same shape as self.single_partition_of_fp32_groups. A single buffer could save CPU memory, but if I haven't misunderstood, self.single_partition_of_fp32_groups is already a single partition.


@arminzhu (Contributor Author)

@sfc-gh-truwase Sir, how do I fix the formatting?

@sfc-gh-truwase (Collaborator)

https://github.com/deepspeedai/DeepSpeed/blob/master/CONTRIBUTING.md#prerequisites

@arminzhu (Contributor Author)

Thanks! I think I need a little time to study this.

@arminzhu (Contributor Author)

@sfc-gh-truwase I really have no idea what the right format is. I ran pre-commit, but it reports 1000+ lines that need to be changed. Is there some pre-commit configuration I should be using?

@delock (Collaborator) commented May 26, 2025

Running pre-commit directly will scan all files, while you only want to scan your changed files. The trick is to run pre-commit install so that pre-commit is triggered when you commit, then follow from there.

Start from the following link and ignore the 'run against all files' part.
https://pre-commit.com/#3-install-the-git-hook-scripts

@arminzhu (Contributor Author)

@delock Thank you very much! I enabled black in .pre-commit-config.yaml and checked the formatting of stage_1_and_2.py, but it still reports 1000+ lines that need to be changed. (I ran pre-commit on Windows 11.) This really stumped me >_<.

@arminzhu (Contributor Author)

I'm very sorry. I just discovered the .pre-commit-config.yaml file in this repository, and it seems that .flake8, .style.yapf and the scripts are needed too. I think the formatting is OK now. Can we restart the tests?

@arminzhu (Contributor Author)

@sfc-gh-truwase @delock I think the formatting is OK now. Do we need to restart the tests?

@PKUWZP (Collaborator) commented May 27, 2025

Can you re-run the DCO test?

@arminzhu (Contributor Author) commented May 27, 2025

@PKUWZP Maybe it is difficult now, because it seems I would need a new commit, and a new commit re-runs all the tests.

@arminzhu (Contributor Author)

@sfc-gh-truwase Sir, I think I need your help to fix the DCO failure. If we decide to merge the commit and it is convenient for you, please help me fix it. Thank you very much!

sfc-gh-truwase added this pull request to the merge queue May 27, 2025
Merged via the queue into deepspeedai:master with commit 17c8be0 May 27, 2025
13 checks passed
deepcharm pushed a commit to deepcharm/DeepSpeed that referenced this pull request Jun 16, 2025: …y) (deepspeedai#7309)

Antlera pushed a commit to Antlera/DeepSpeed that referenced this pull request Jun 27, 2025: …y) (deepspeedai#7309)
