
Conversation

AzizCode92 (Contributor) commented Aug 22, 2025

In test_mixtral_moe, torch.cuda.empty_cache() is called immediately after an F.pad operation.

Due to the asynchronous nature of CUDA kernels, there's no guarantee that the padding operation has completed on the GPU before the cache is cleared. This can make the empty_cache() call ineffective.

This commit adds torch.cuda.synchronize() before torch.cuda.empty_cache() to ensure all pending GPU work is finished, making memory management in the test more reliable and deterministic.

Purpose

This PR improves the reliability of memory management in the test_mixtral_moe test by addressing a potential race condition between the CPU and GPU.

Currently, torch.cuda.empty_cache() is called immediately after an asynchronous CUDA operation (F.pad). There is no guarantee that the GPU has finished the operation before the cache-clearing command is issued, which can render empty_cache() ineffective.

This change introduces a torch.cuda.synchronize() call before empty_cache() to ensure all pending GPU kernels are complete, making the test's memory handling more robust and deterministic.
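The pattern this PR applies can be sketched as follows. This is a minimal, hedged illustration of the synchronize-before-empty_cache idiom, not the actual vLLM test code; the function and variable names (`pad_and_release`, `w`) are illustrative.

```python
import torch
import torch.nn.functional as F

def pad_and_release(w: torch.Tensor, pad: int) -> torch.Tensor:
    # F.pad launches an asynchronous CUDA kernel when `w` lives on the GPU;
    # the Python call returns before the kernel has necessarily finished.
    padded = F.pad(w, (0, pad), mode="constant", value=0.0)
    if torch.cuda.is_available():
        # Block until all queued kernels (including the pad) complete...
        torch.cuda.synchronize()
        # ...so empty_cache() can actually release unused cached blocks.
        torch.cuda.empty_cache()
    return padded

out = pad_and_release(torch.ones(2, 3), 2)
print(out.shape)  # torch.Size([2, 5])
```

Without the `synchronize()`, `empty_cache()` may run while the pad kernel still holds its workspace, which is why the call could be ineffective.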

Test Plan

pytest tests/kernels/test_moe.py -k test_mixtral_moe

Test Result

(Optional) Documentation Update


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.


Signed-off-by: AzizCode92 <[email protected]>
gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request correctly addresses a potential race condition in the test_mixtral_moe test by adding torch.cuda.synchronize() before torch.cuda.empty_cache(). This ensures that asynchronous CUDA operations complete before attempting to clear the cache, making the test more reliable. I have one suggestion to further improve the efficiency by reducing the number of synchronization points.
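The reviewer's efficiency suggestion can be illustrated with a hypothetical sketch: rather than pairing every `empty_cache()` with its own `synchronize()`, one barrier can cover several queued operations. The tensors and loop here are illustrative, not taken from the test.

```python
import torch
import torch.nn.functional as F

# Several asynchronous pad launches in a row (on CUDA these are queued,
# not yet complete, when the Python calls return).
tensors = [torch.ones(4, 4) for _ in range(3)]
padded = [F.pad(t, (0, 1)) for t in tensors]

if torch.cuda.is_available():
    torch.cuda.synchronize()   # one barrier covers all launches above
    torch.cuda.empty_cache()   # now both safe and effective

print([tuple(p.shape) for p in padded])  # [(4, 5), (4, 5), (4, 5)]
```

Each `torch.cuda.synchronize()` stalls the host until the device drains its queue, so fewer synchronization points generally means less CPU idle time.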


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Michael Goin <[email protected]>
mgoin (Member) commented Aug 22, 2025

Thanks for the find! Although I haven't seen this test be flaky before, the fix makes sense.

@mgoin mgoin enabled auto-merge (squash) August 22, 2025 16:09
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 22, 2025
@mgoin mgoin merged commit 341923b into vllm-project:main Aug 22, 2025
24 of 26 checks passed
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
…ject#23416)

Signed-off-by: AzizCode92 <[email protected]>
Signed-off-by: Michael Goin <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
…ject#23416)

Signed-off-by: Xiao Yu <[email protected]>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
…ject#23416)
mengxingkongzhouhan pushed a commit to mengxingkongzhouhan/vllm that referenced this pull request Aug 30, 2025
…ject#23416)
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Sep 3, 2025
…ject#23416)
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…ject#23416)