lacora
Contributor

@lacora lacora commented Sep 10, 2025

Purpose

The current test test_gpt_oss_multi_turn_chat is flaky and fails more than 50% of the time.

Example of a failed run, where the model asks whether to use Celsius or Fahrenheit instead of calling the tool:
root:test_serving_chat.py:187 lacora: response ChatCompletion(id='chatcmpl-9596f8584920486b8e0e22468e55e606', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Sure! Would you like the temperature in Celsius or Fahrenheit?', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content='We need to call the function get_current_weather with city "Dallas", state "TX", unit maybe default? The user didn't specify unit. We can ask for unit? But we can choose default. Probably ask for unit? The user didn't specify. We can ask: "Would you like Celsius or Fahrenheit?" But we can also default to Fahrenheit for US. Let's ask.'), stop_reason=None, token_ids=None)], created=1757467045, model='openai/gpt-oss-20b', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=97, prompt_tokens=153, total_tokens=250, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)

Example of a successful run, where the model outputs a function call:
root:test_serving_chat.py:187 lacora: response ChatCompletion(id='chatcmpl-04de7729d5a1444fbc7072b029b4a945', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content='{"city":"Dallas","state":"TX","unit":"celsius"}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageFunctionToolCall(id='chatcmpl-tool-3190042ee74c466db8b019aa3431dfc0', function=Function(arguments='{"city":"Dallas","state":"TX","unit":"celsius"}', name='get_current_weather'), type='function')], reasoning_content='We need to call the function get_current_weather with city "Dallas", state "TX", unit "celsius".'), stop_reason=200012, token_ids=None)], created=1757466163, model='openai/gpt-oss-20b', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=57, prompt_tokens=157, total_tokens=214, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)
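
The two logged responses differ in `finish_reason` (`'stop'` vs `'tool_calls'`) and in whether `tool_calls` is empty. A minimal sketch of telling the two outcomes apart is below. The dataclasses are hypothetical stand-ins that mirror only the fields visible in the logs; they are not the client library's types, and `classify_choice` is an illustrative helper, not code from this PR.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins mirroring only the fields seen in the logged responses.
@dataclass
class ToolCall:
    name: str
    arguments: str  # JSON-encoded argument string


@dataclass
class Message:
    content: Optional[str]
    tool_calls: List[ToolCall] = field(default_factory=list)


@dataclass
class Choice:
    finish_reason: str
    message: Message


def classify_choice(choice: Choice) -> str:
    """Return 'tool_call' if the model emitted a tool call, else 'text_reply'."""
    if choice.finish_reason == "tool_calls" and choice.message.tool_calls:
        return "tool_call"
    return "text_reply"


# Values copied from the failed run: a clarifying question and no tool call.
failed = Choice(
    finish_reason="stop",
    message=Message(
        content="Sure! Would you like the temperature in Celsius or Fahrenheit?"
    ),
)

# Values copied from the successful run: a get_current_weather call.
succeeded = Choice(
    finish_reason="tool_calls",
    message=Message(
        content='{"city":"Dallas","state":"TX","unit":"celsius"}',
        tool_calls=[
            ToolCall(
                name="get_current_weather",
                arguments='{"city":"Dallas","state":"TX","unit":"celsius"}',
            )
        ],
    ),
)

failed_kind = classify_choice(failed)
succeeded_kind = classify_choice(succeeded)
succeeded_args = json.loads(succeeded.message.tool_calls[0].arguments)
```

A test that asserts on the tool call unconditionally fails whenever the model takes the first path, which matches the flakiness described above.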

Test Plan & Test Result

With the fix applied, the test was run multiple times and all runs succeeded.
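
The exact command and number of runs are not given here. As a rough sketch of how a pass rate over repeated runs could be measured, assuming a hypothetical `run_once` callable that returns `True` when a single test run passes:

```python
from typing import Callable


def pass_rate(run_once: Callable[[], bool], attempts: int = 20) -> float:
    """Call run_once `attempts` times and return the fraction of passing runs."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    passes = sum(1 for _ in range(attempts) if run_once())
    return passes / attempts


# Deterministic stand-in for a flaky test: passes on every other call.
_outcomes = iter([True, False] * 5)
flaky_rate = pass_rate(lambda: next(_outcomes), attempts=10)

# Stand-in for a stable test: always passes.
stable_rate = pass_rate(lambda: True, attempts=10)
```

In practice `run_once` would wrap a real invocation of the test; the lambdas here only stand in for that.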


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@lacora lacora marked this pull request as ready for review September 10, 2025 01:48
@mergify mergify bot added the gpt-oss Related to GPT-OSS models label Sep 10, 2025
@lacora lacora changed the title Fix flaky test test_gpt_oss_multi_turn_chat [BugFix][easy] Fix flaky test test_gpt_oss_multi_turn_chat Sep 10, 2025
@yeqcharlotte
Collaborator

thanks! cc: @heheda12345

@heheda12345 heheda12345 enabled auto-merge (squash) September 10, 2025 04:08
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 10, 2025
auto-merge was automatically disabled September 10, 2025 04:59

Head branch was pushed to by a user without write access

@DarkLight1337 DarkLight1337 merged commit 0b9a612 into vllm-project:main Sep 10, 2025
16 checks passed
@lacora lacora deleted the fix_test branch September 12, 2025 22:24
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025