
Conversation

arminzhu (Contributor)

Signed-off-by: Armin Zhu [email protected]

Fix the memory usage of ZeRO-Offload with stages 1 and 2. Before the fix, GPU memory usage is about 3x that of params_FP16, because the H2D data copy uses a different data type. After the fix, GPU memory usage is about 1x params_FP16, and the H2D copy goes through a 16-bit pinned memory buffer.
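For illustration only, here is a minimal PyTorch sketch of the idea described above. It is not the actual DeepSpeed code; the function and tensor names are made up, and the "before" pattern reflects one reading of the description.

```python
import torch

def h2d_copy_then_cast(fp32_cpu_partition: torch.Tensor) -> torch.Tensor:
    # Pattern before the fix (as described above): the FP32 partition is copied
    # to the GPU and cast there, so the device briefly holds both the FP32 copy
    # and the FP16 result in addition to the live FP16 params.
    fp32_on_gpu = fp32_cpu_partition.to("cuda", non_blocking=True)
    return fp32_on_gpu.half()

def h2d_via_pinned_fp16_buffer(fp32_cpu_partition: torch.Tensor,
                               pinned_fp16_buffer: torch.Tensor) -> torch.Tensor:
    # Pattern after the fix: cast on the host into a pre-allocated 16-bit pinned
    # buffer, then do a dtype-matched asynchronous H2D copy, so only FP16 bytes
    # ever land on the GPU.
    pinned_fp16_buffer.copy_(fp32_cpu_partition)  # FP32 -> FP16 cast on the CPU
    return pinned_fp16_buffer.to("cuda", non_blocking=True)

if torch.cuda.is_available():
    partition = torch.randn(1 << 20, dtype=torch.float32)  # CPU FP32 partition
    staging = torch.empty_like(partition, dtype=torch.float16).pin_memory()
    gpu_fp16 = h2d_via_pinned_fp16_buffer(partition, staging)
    print(gpu_fp16.dtype, gpu_fp16.device)
```

Keeping the staging buffer in FP16 and pinned is what corresponds to the roughly 1x params_FP16 usage described above.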

arminzhu requested review from tjruwase and tohtana as code owners, May 23, 2025 04:04
@arminzhu (Contributor Author)

Why is the DCO so difficult to pass?

@sfc-gh-truwase (Collaborator)

Sorry about that. Is it okay that I fix it for you?

@arminzhu (Contributor Author)

Thank you very much! Please fix it if it's convenient for you.

@sfc-gh-truwase (Collaborator)

Okay, I will fix it as the last step when the PR is ready for merging.

@arminzhu (Contributor Author)

Thanks, sir.

@arminzhu (Contributor Author) left a comment

1. The else is for the configuration pin_memory = false.

2. It can already achieve the same shape as self.single_partition_of_fp32_groups. A single buffer could save CPU memory, but if I haven't misunderstood, self.single_partition_of_fp32_groups is already a single partition.


@arminzhu (Contributor Author)

@sfc-gh-truwase Sir, how do I fix the formatting?

@sfc-gh-truwase (Collaborator)

https://github.com/deepspeedai/DeepSpeed/blob/master/CONTRIBUTING.md#prerequisites

@arminzhu (Contributor Author)

Thanks! I think I need a little time to study this.

@arminzhu (Contributor Author)

@sfc-gh-truwase I really have no idea what the right format is. I ran pre-commit, but it reports 1000+ lines that need to be changed. Is there some pre-commit configuration I should be using?

@delock (Collaborator) commented May 26, 2025

Running pre-commit directly will scan all files, while you only want to scan your changed files. The trick is to run pre-commit install so that pre-commit is triggered when you commit, then follow from there.

Start from the following link and ignore the 'run against all files' part.
https://pre-commit.com/#3-install-the-git-hook-scripts

@arminzhu (Contributor Author)

@delock Thank you very much! I enabled black in .pre-commit-config.yaml and checked the formatting of stage_1_and_2.py, but it still reports 1000+ lines that need to be changed. (I ran pre-commit on Windows 11.) This really stumped me >_<.

@arminzhu (Contributor Author)

I'm very sorry. I just discovered the .pre-commit-config.yaml file in this repository, and it seems that .flake8, .style.yapf and the scripts are needed too. I think the formatting is OK now. Can we restart the tests?

@arminzhu (Contributor Author)

@sfc-gh-truwase @delock I think the formatting is OK now. Do we need to restart the tests?

@PKUWZP (Collaborator) commented May 27, 2025

Can you re-run the DCO test?

@arminzhu (Contributor Author) commented May 27, 2025

@PKUWZP Maybe it is difficult now, because it seems I would need a new commit, and a new commit re-runs all the tests.

@arminzhu (Contributor Author)

@sfc-gh-truwase Sir, I think I need your help to fix the DCO failure. If we decide to merge the commit and it is convenient for you, please help me fix it. Thank you very much!

sfc-gh-truwase added this pull request to the merge queue May 27, 2025
Merged via the queue into deepspeedai:master with commit 17c8be0 May 27, 2025
13 checks passed
deepcharm pushed a commit to deepcharm/DeepSpeed that referenced this pull request Jun 16, 2025: …y) (deepspeedai#7309)

Antlera pushed a commit to Antlera/DeepSpeed that referenced this pull request Jun 27, 2025: …y) (deepspeedai#7309)
