[BugFix][easy] Fix flaky test test_gpt_oss_multi_turn_chat #24549

lacora · 2025-09-10T01:48:14Z

Purpose

The test test_gpt_oss_multi_turn_chat is currently flaky and fails more than 50% of the time.

Example of a failed run, where the model asks whether to use Celsius or Fahrenheit instead of calling the function:
root:test_serving_chat.py:187 lacora: response ChatCompletion(id='chatcmpl-9596f8584920486b8e0e22468e55e606', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Sure! Would you like the temperature in Celsius or Fahrenheit?', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content='We need to call the function get_current_weather with city "Dallas", state "TX", unit maybe default? The user didn't specify unit. We can ask for unit? But we can choose default. Probably ask for unit? The user didn't specify. We can ask: "Would you like Celsius or Fahrenheit?" But we can also default to Fahrenheit for US. Let's ask.'), stop_reason=None, token_ids=None)], created=1757467045, model='openai/gpt-oss-20b', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=97, prompt_tokens=153, total_tokens=250, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)

Example of a successful run, where the model outputs a function call:
root:test_serving_chat.py:187 lacora: response ChatCompletion(id='chatcmpl-04de7729d5a1444fbc7072b029b4a945', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content='{"city":"Dallas","state":"TX","unit":"celsius"}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageFunctionToolCall(id='chatcmpl-tool-3190042ee74c466db8b019aa3431dfc0', function=Function(arguments='{"city":"Dallas","state":"TX","unit":"celsius"}', name='get_current_weather'), type='function')], reasoning_content='We need to call the function get_current_weather with city "Dallas", state "TX", unit "celsius".'), stop_reason=200012, token_ids=None)], created=1757466163, model='openai/gpt-oss-20b', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=57, prompt_tokens=157, total_tokens=214, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)
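
The difference between the two logged responses can be sketched as a small check. This is only an illustration, not the change made in this PR: `classify_response` is a hypothetical helper, and the dictionaries are hand-trimmed copies of the two logged choices, keeping only the `finish_reason`, `content`, and `tool_calls` fields.

```python
import json


def classify_response(choice: dict) -> str:
    """Return 'tool_call' if the choice carries a tool call, else 'text'.

    Hypothetical helper for illustration; it mirrors the fields visible
    in the logged ChatCompletion objects, not the actual test code.
    """
    message = choice.get("message", {})
    tool_calls = message.get("tool_calls") or []
    if choice.get("finish_reason") == "tool_calls" and tool_calls:
        return "tool_call"
    return "text"


# Trimmed from the failed run: the model asks a clarifying question.
failed_choice = {
    "finish_reason": "stop",
    "message": {
        "content": "Sure! Would you like the temperature in Celsius or Fahrenheit?",
        "tool_calls": [],
    },
}

# Trimmed from the successful run: the model emits a function call.
passed_choice = {
    "finish_reason": "tool_calls",
    "message": {
        "content": '{"city":"Dallas","state":"TX","unit":"celsius"}',
        "tool_calls": [
            {
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "arguments": '{"city":"Dallas","state":"TX","unit":"celsius"}',
                },
            }
        ],
    },
}

failed_kind = classify_response(failed_choice)
passed_kind = classify_response(passed_choice)
# The arguments field is a JSON string, so it has to be parsed before use.
passed_args = json.loads(
    passed_choice["message"]["tool_calls"][0]["function"]["arguments"]
)
```

In the failed run the reasoning text shows the model hesitating because the user did not specify a unit, so any assertion that expects a tool call on that turn will only pass when the model happens to pick a unit itself.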

Test Plan & Test Result

Applied the fix and ran the test multiple times; all runs succeeded.

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

yeqcharlotte · 2025-09-10T04:00:22Z

thanks! cc: @heheda12345

lacora marked this pull request as ready for review September 10, 2025 01:48

lacora requested review from DarkLight1337, robertgshaw2-redhat, simon-mo and aarnphm as code owners September 10, 2025 01:48

mergify bot added the gpt-oss Related to GPT-OSS models label Sep 10, 2025

lacora changed the title ~~Fix flaky test test_gpt_oss_multi_turn_chat~~ [BugFix][easy] Fix flaky test test_gpt_oss_multi_turn_chat Sep 10, 2025

heheda12345 approved these changes Sep 10, 2025

heheda12345 enabled auto-merge (squash) September 10, 2025 04:08

github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 10, 2025

lacora2017 added 3 commits September 9, 2025 21:59

Fix flaky test for serving_chat

26ce34a

Signed-off-by: lacora2017 <[email protected]>

Fix flaky test for serving_chat fix

a8426b4

Signed-off-by: lacora2017 <[email protected]>

Fix flaky test for serving_chat fix

12a32a6

Signed-off-by: lacora2017 <[email protected]>

auto-merge was automatically disabled September 10, 2025 04:59
Head branch was pushed to by a user without write access

lacora force-pushed the fix_test branch from d5dd5c3 to 12a32a6 Compare September 10, 2025 04:59

DarkLight1337 merged commit 0b9a612 into vllm-project:main Sep 10, 2025
16 checks passed

lacora deleted the fix_test branch September 12, 2025 22:24

skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025

[BugFix][easy] Fix flaky test test_gpt_oss_multi_turn_chat (vllm-proj…

69ffdb6

…ect#24549) Signed-off-by: lacora2017 <[email protected]> Co-authored-by: lacora2017 <[email protected]>

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[BugFix][easy] Fix flaky test test_gpt_oss_multi_turn_chat (vllm-proj…

d6bc820

…ect#24549) Signed-off-by: lacora2017 <[email protected]> Co-authored-by: lacora2017 <[email protected]>
