Conversation

@csahithi (Contributor) commented Aug 29, 2025

Purpose

  • Reduce CI time for entrypoint tests by creating a shared server for grouped tests (see the sketch after this list)
  • Remove v0 references in entrypoint tests
  • Replace large models with smaller ones (hmellor/tiny-random-LlamaForCausalLM, microsoft/DialoGPT-small)
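
For context, a minimal sketch of the shared-server pattern (assuming pytest and the repo's RemoteOpenAIServer test helper; the fixture scope and args here are illustrative, not the exact ones in this PR):

import pytest

from ...utils import RemoteOpenAIServer

MODEL_NAME = "hmellor/tiny-random-LlamaForCausalLM"


@pytest.fixture(scope="package")
def server():
    # One server process is started for the whole test package and reused
    # by every test in it, instead of starting one server per module.
    args = ["--enforce-eager", "--max-model-len", "2048"]
    with RemoteOpenAIServer(MODEL_NAME, args) as remote_server:
        yield remote_server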

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@robertgshaw2-redhat (Collaborator) commented:

wow! great job!

"--reasoning-parser", "deepseek_r1",
"--enable-auto-tool-choice",
"--tool-call-parser", "hermes",
"--disable-log-stats",

Review comment (Collaborator):

you can remove --disable-log-stats and --disable-log-requests
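
For reference, a hypothetical trimmed version of the quoted args after dropping those two flags (the surrounding fixture is assumed, not shown in the diff):

args = [
    "--reasoning-parser", "deepseek_r1",
    "--enable-auto-tool-choice",
    "--tool-call-parser", "hermes",
    # --disable-log-stats and --disable-log-requests removed per review
]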

mergify bot commented Aug 29, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @csahithi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Aug 29, 2025
@njhill njhill self-requested a review August 29, 2025 16:04
@csahithi csahithi marked this pull request as ready for review August 29, 2025 16:07
@njhill njhill changed the title Optimize entrypoints API server tests [CI] Optimize entrypoints API server tests Aug 29, 2025
@njhill (Member) left a comment:

Thanks @csahithi this is great!!

Replaced large models with smaller ones - hmellor/tiny-random-LlamaForCausalLM, microsoft/DialoGPT-small

Is the reason for the latter that the former doesn't have a chat template?
If so we can just ask @hmellor to add the llama 3.2 chat template and replace them all with that.

Oh sorry I see that it does already have a chat template. Then I'm curious what's the reason for using microsoft/DialoGPT-small too?

I know you have ideas for possible further streamlining but in the interests of incremental improvement could we get this merged first?

Could you fix the merge conflicts and we can see what the new CI timings are like after that too.

@hmellor (Member) commented Aug 30, 2025

If anything needs changing about hmellor/tiny-random-LlamaForCausalLM to make it more useful for our tests, do let me know; vLLM testing is what I made it for and it's easy to update!

@csahithi csahithi force-pushed the entrypoint-tests-optimize branch from 4732592 to e799966 on September 2, 2025 23:25
@mergify mergify bot removed the needs-rebase label Sep 2, 2025
@csahithi csahithi force-pushed the entrypoint-tests-optimize branch 2 times, most recently from aac1623 to fba4775 on September 5, 2025 13:33
@njhill (Member) commented Sep 5, 2025

CI failures look related

mergify bot commented Sep 7, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @csahithi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 7, 2025
@csahithi csahithi force-pushed the entrypoint-tests-optimize branch from fba4775 to bedeffd on September 8, 2025 01:29
@mergify mergify bot removed the needs-rebase label Sep 8, 2025
@csahithi csahithi force-pushed the entrypoint-tests-optimize branch from bedeffd to d43d48b on September 8, 2025 01:40

with RemoteOpenAIServer(MODEL_NAME, args) as remote_server:
    yield remote_server

MODEL_NAME = "hmellor/tiny-random-LlamaForCausalLM"

Review comment (Member):

Could this also be made a fixture so that we don't have to keep anything in sync with conftest?
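
A rough sketch of that suggestion (the fixture and test names here are hypothetical, assuming a package-level conftest.py):

# conftest.py
import pytest


@pytest.fixture(scope="package")
def model_name():
    return "hmellor/tiny-random-LlamaForCausalLM"


# in a test module: request the fixture instead of keeping a MODEL_NAME
# constant that has to stay in sync with conftest.py
def test_example(model_name, server):
    ...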


with RemoteOpenAIServer(MODEL_NAME, args) as remote_server:
    yield remote_server

MODEL_NAME = "hmellor/tiny-random-LlamaForCausalLM"

Review comment (Member):

Could this also be made a fixture so that we don't have to keep anything in sync with conftest?

from ...utils import RemoteOpenAIServer

MODEL_NAME = "Qwen/Qwen2.5-1.5B-Instruct"
MODEL_NAME = "hmellor/tiny-random-LlamaForCausalLM"

Review comment (Member):

Could this also be made a fixture so that we don't have to keep anything in sync with conftest?

]
with RemoteOpenAIServer(MODEL_NAME, args) as remote_server:
    yield remote_server

MODEL_NAME = "hmellor/tiny-random-LlamaForCausalLM"

Review comment (Member):

Could this also be made a fixture so that we don't have to keep anything in sync with conftest?

@hmellor (Member) left a comment:

To make paths shorter, should we rename the following directories:

  • basic_tests -> basic
  • correctness (unchanged)
  • embedding_tests -> embedding
  • individual_tests -> individual
  • lora_tests -> lora
  • multimodal_tests -> multimodal


# Use a small embeddings model for faster startup and smaller memory footprint.
# Since we are not testing any chat functionality,
# using a chat capable model is overkill.
MODEL_NAME = "intfloat/multilingual-e5-small"
MODEL_NAME = "hmellor/tiny-random-LlamaForCausalLM"

Review comment (Member):

Could this also be made a fixture so that we don't have to keep anything in sync with conftest?

Review comment (Member):

The video_server fixture is not used here, is that intentional?

Review comment (Member):

The vision_server fixture is not used here, is that intentional?

Review comment (Member):

The vision_server fixture is not used here, is that intentional?

mergify bot commented Sep 8, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @csahithi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 8, 2025
@njhill (Member) commented Sep 8, 2025

@csahithi is out this week; we'll see if someone else can take this over.

@njhill (Member) commented Sep 11, 2025

@debroy-rh has offered to work on this
