fix(tests): Ensure reliable CUDA cache clearing in MoE test #23416
Conversation
In `test_mixtral_moe`, `torch.cuda.empty_cache()` is called immediately after an `F.pad` operation. Due to the asynchronous nature of CUDA kernels, there's no guarantee that the padding operation has completed on the GPU before the cache is cleared. This can make the `empty_cache()` call ineffective. This commit adds `torch.cuda.synchronize()` before `torch.cuda.empty_cache()` to ensure all pending GPU work is finished, making memory management in the test more reliable and deterministic. Signed-off-by: AzizCode92 <[email protected]>
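The surrounding test code is not quoted in this thread, so the following is only a minimal sketch of the pattern being fixed; the tensor name, shape, and padding width are assumptions rather than the actual `test_mixtral_moe` code:

```python
import torch
import torch.nn.functional as F

# Illustrative sketch only: the tensor name and shape are assumptions,
# not the actual test_mixtral_moe code.
w = torch.randn(8, 512, 256, device="cuda")

# F.pad launches an asynchronous CUDA kernel and returns immediately.
w_padded = F.pad(w, (0, 128), mode="constant", value=0.0)

# Without this barrier the CPU may issue empty_cache() while the pad
# kernel is still running on the GPU, which can leave cached blocks
# unreleased; synchronize() waits for all pending work to finish first.
torch.cuda.synchronize()
torch.cuda.empty_cache()
```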
Code Review
This pull request correctly addresses a potential race condition in the test_mixtral_moe test by adding torch.cuda.synchronize() before torch.cuda.empty_cache(). This ensures that asynchronous CUDA operations complete before attempting to clear the cache, making the test more reliable. I have one suggestion to further improve efficiency by reducing the number of synchronization points.
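The concrete suggestion is not quoted above, so the sketch below is only one plausible reading of "reducing the number of synchronization points", assuming the test pads several tensors; the loop structure and shapes are hypothetical:

```python
import torch
import torch.nn.functional as F

# Hypothetical structure: the real test may pad only a single tensor.
weights = [torch.randn(512, 256, device="cuda") for _ in range(4)]

# All pads are queued asynchronously on the current stream.
padded = [F.pad(w, (0, 128)) for w in weights]

# One barrier after the whole batch, rather than one per pad, keeps the
# cache clear reliable while adding only a single synchronization point.
torch.cuda.synchronize()
torch.cuda.empty_cache()
```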
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs do not trigger a full CI run by default; only a limited set of checks runs automatically, and you can ask your reviewers to trigger select CI tests on top of those. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Michael Goin <[email protected]>
Thanks for the find! Although I haven't seen this test flaky before, it makes sense.
…ject#23416) Signed-off-by: AzizCode92 <[email protected]> Signed-off-by: Michael Goin <[email protected]> Co-authored-by: Michael Goin <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ject#23416) Signed-off-by: AzizCode92 <[email protected]> Signed-off-by: Michael Goin <[email protected]> Co-authored-by: Michael Goin <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Xiao Yu <[email protected]>
Purpose
This PR improves the reliability of memory management in the test_mixtral_moe test by addressing a potential race condition between the CPU and GPU.
Currently, torch.cuda.empty_cache() is called immediately after an asynchronous CUDA operation (F.pad). There is no guarantee that the GPU has finished the operation before the cache-clearing command is issued, which can render empty_cache() ineffective.
This change introduces a torch.cuda.synchronize() call before empty_cache() to ensure all pending GPU kernels are complete, making the test's memory handling more robust and deterministic.
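As a rough illustration (not part of the PR or its test plan), one way to observe whether empty_cache() actually released memory is to compare the allocator's reserved bytes around the call; the tensor sizes below are arbitrary:

```python
import torch
import torch.nn.functional as F

# Hypothetical check: watch the allocator's reserved memory around the
# cache clear to see whether empty_cache() took effect.
x = torch.randn(1024, 1024, device="cuda")
y = F.pad(x, (0, 256))
del x, y  # the tensors become cached (free) blocks in the allocator

torch.cuda.synchronize()                 # wait for pending kernels
before = torch.cuda.memory_reserved()
torch.cuda.empty_cache()                 # return cached blocks to the driver
after = torch.cuda.memory_reserved()
print(f"reserved bytes: {before} -> {after}")
```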
Test Plan
Test Result
(Optional) Documentation Update
Essential Elements of an Effective PR Description Checklist
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.