fix(tests): Ensure reliable CUDA cache clearing in MoE test #23416
Conversation
In `test_mixtral_moe`, `torch.cuda.empty_cache()` is called immediately after an `F.pad` operation. Due to the asynchronous nature of CUDA kernels, there's no guarantee that the padding operation has completed on the GPU before the cache is cleared. This can make the `empty_cache()` call ineffective. This commit adds `torch.cuda.synchronize()` before `torch.cuda.empty_cache()` to ensure all pending GPU work is finished, making memory management in the test more reliable and deterministic. Signed-off-by: AzizCode92 <[email protected]>
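The surrounding test code is not quoted in this thread, so the following is only a minimal sketch of the pattern being fixed; the tensor name, shape, and padding width are assumptions rather than the actual `test_mixtral_moe` code:

```python
import torch
import torch.nn.functional as F

# Illustrative sketch only: the tensor name and shape are assumptions,
# not the actual test_mixtral_moe code.
w = torch.randn(8, 512, 256, device="cuda")

# F.pad launches an asynchronous CUDA kernel and returns immediately.
w_padded = F.pad(w, (0, 128), mode="constant", value=0.0)

# Without this barrier the CPU may issue empty_cache() while the pad
# kernel is still running on the GPU, which can leave cached blocks
# unreleased; synchronize() waits for all pending work to finish first.
torch.cuda.synchronize()
torch.cuda.empty_cache()
```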
Code Review
This pull request correctly addresses a potential race condition in the test_mixtral_moe test by adding torch.cuda.synchronize() before torch.cuda.empty_cache(). This ensures that asynchronous CUDA operations complete before attempting to clear the cache, making the test more reliable. I have one suggestion to further improve efficiency by reducing the number of synchronization points.
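The concrete suggestion is not quoted above, so the sketch below is only one plausible reading of "reducing the number of synchronization points", assuming the test pads several tensors; the loop structure and shapes are hypothetical:

```python
import torch
import torch.nn.functional as F

# Hypothetical structure: the real test may pad only a single tensor.
weights = [torch.randn(512, 256, device="cuda") for _ in range(4)]

# All pads are queued asynchronously on the current stream.
padded = [F.pad(w, (0, 128)) for w in weights]

# One barrier after the whole batch, rather than one per pad, keeps the
# cache clear reliable while adding only a single synchronization point.
torch.cuda.synchronize()
torch.cuda.empty_cache()
```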
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs do not trigger a full CI run by default; only a limited set of checks runs automatically, and you can ask your reviewers to trigger select CI tests on top of those. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Michael Goin <[email protected]>
Thanks for the find! Although I haven't seen this test flaky before, it makes sense.
…ject#23416) Signed-off-by: AzizCode92 <[email protected]> Signed-off-by: Michael Goin <[email protected]> Co-authored-by: Michael Goin <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ject#23416) Signed-off-by: AzizCode92 <[email protected]> Signed-off-by: Michael Goin <[email protected]> Co-authored-by: Michael Goin <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Xiao Yu <[email protected]>
Purpose
This PR improves the reliability of memory management in the test_mixtral_moe test by addressing a potential race condition between the CPU and GPU.
Currently, torch.cuda.empty_cache() is called immediately after an asynchronous CUDA operation (F.pad). There is no guarantee that the GPU has finished the operation before the cache-clearing command is issued, which can render empty_cache() ineffective.
This change introduces a torch.cuda.synchronize() call before empty_cache() to ensure all pending GPU kernels are complete, making the test's memory handling more robust and deterministic.
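As a rough illustration (not part of the PR or its test plan), one way to observe whether empty_cache() actually released memory is to compare the allocator's reserved bytes around the call; the tensor sizes below are arbitrary:

```python
import torch
import torch.nn.functional as F

# Hypothetical check: watch the allocator's reserved memory around the
# cache clear to see whether empty_cache() took effect.
x = torch.randn(1024, 1024, device="cuda")
y = F.pad(x, (0, 256))
del x, y  # the tensors become cached (free) blocks in the allocator

torch.cuda.synchronize()                 # wait for pending kernels
before = torch.cuda.memory_reserved()
torch.cuda.empty_cache()                 # return cached blocks to the driver
after = torch.cuda.memory_reserved()
print(f"reserved bytes: {before} -> {after}")
```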
Test Plan
Test Result
(Optional) Documentation Update
Essential Elements of an Effective PR Description Checklist
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.