
Conversation

@frieda-huang (Contributor) commented Apr 23, 2025

FIX #13567
FIX #17415


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify bot added the documentation (Improvements or additions to documentation) and frontend labels Apr 23, 2025
@DarkLight1337 (Member) left a comment

Some initial comments. Can you also add an example script to examples/online_serving to show how to use this endpoint?

@frieda-huang (Contributor, Author) commented Apr 23, 2025

> Some initial comments. Can you also add an example script to examples/online_serving to show how to use this endpoint?

Yes! It's openai_classification_client.py under examples/online_serving.
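For readers who land here without opening that example file, a minimal client along those lines might look like the sketch below. It is not the PR's actual script: the payload shape follows vLLM's /classify request (a model name plus one or more input strings), while the server address and model name are placeholders.

```python
import json
from urllib import request


def build_classify_request(model: str, texts: list[str]) -> dict:
    """Build the JSON body for the /classify endpoint:
    a model name plus one or more input strings."""
    return {"model": model, "input": texts}


def classify(base_url: str, model: str, texts: list[str]) -> dict:
    """POST the request to {base_url}/classify and return the parsed
    JSON response, which carries per-input classification results."""
    body = json.dumps(build_classify_request(model, texts)).encode("utf-8")
    req = request.Request(
        f"{base_url}/classify",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


# Example (requires a running vLLM server with a classification model;
# "my-classifier" is a placeholder for the served model name):
#   classify("http://localhost:8000", "my-classifier", ["vLLM is great!"])
```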

@DarkLight1337 (Member) commented

> examples/online_serving to show how to use this endpoint?

Oh, right. Sorry I missed that

@frieda-huang (Contributor, Author) commented Apr 23, 2025

> examples/online_serving to show how to use this endpoint?
>
> Oh, right. Sorry I missed that

Really appreciate the feedback! I’ve implemented your suggestions in the latest commit.

I’ve noticed that serving_classification.py, serving_score.py, serving_transcription.py, and serving_embedding.py all share the same pattern of input preprocessing, request scheduling, and result‑generator aggregation. Should we move that pipeline into a shared helper or abstract base class?

@DarkLight1337 (Member) commented

> I’ve noticed that serving_classification.py, serving_score.py, serving_transcription.py, and serving_embedding.py all share the same pattern of input preprocessing, request scheduling, and result‑generator aggregation. Should we move that pipeline into a shared helper or abstract base class?

Sure!

@DarkLight1337 (Member) commented

encoding_format doesn't make much sense outside of pooling requests. Maybe we should have a separate mixin class for pooling request handlers?

@frieda-huang (Contributor, Author) commented Apr 25, 2025

> encoding_format doesn't make much sense outside of pooling requests. Maybe we should have a separate mixin class for pooling request handlers?

Yeah. It looks like only serving_pooling.py and serving_embedding.py are using it.

serving_embedding.py and serving_classification.py are straightforward to refactor; however, transcription and score each have some twists on the base pattern. I left the other endpoints untouched to avoid a large-scale overhaul.
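As an aside for readers, the mixin being discussed could be sketched like this. The class and method names are illustrative, not the PR's actual code, and packing base64 embeddings as little-endian float32 is an assumption that mirrors common embedding APIs rather than something stated in this thread.

```python
import base64
import struct
from typing import Literal, Union


class PoolingResponseMixin:
    """Illustrative mixin for handlers whose responses carry pooled vectors.

    Only pooling-style endpoints (e.g. embeddings) care about
    encoding_format, so the logic lives in a mixin instead of the
    shared base class.
    """

    def encode_embedding(
        self,
        values: list[float],
        encoding_format: Literal["float", "base64"],
    ) -> Union[list[float], str]:
        if encoding_format == "float":
            return values
        # Pack as little-endian float32 before base64-encoding
        # (assumed wire format, matching common embedding APIs).
        packed = struct.pack(f"<{len(values)}f", *values)
        return base64.b64encode(packed).decode("utf-8")
```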

@frieda-huang (Contributor, Author) commented Apr 26, 2025

Hi @DarkLight1337 The latest commit includes the following changes:

  • Added specialized contexts:
    • ClassificationServeContext
    • EmbeddingServeContext
  • The OpenAIServing class now takes care of the common processing logic via:
    • handle()
    • _pipeline()
    • _validate_request()
    • _prepare_generators()
    • _collect_batch()
  • Added two abstract methods, _preprocess() and _build_response(), which I also added to the following classes to avoid mypy errors:
    • serving_chat.py
    • serving_completion.py
    • serving_tokenization.py
    • serving_score.py
    • serving_pooling.py
    • serving_transcription.py
  • I didn't add a separate mixin for encoding_format because it's already accessible via ctx.request.encoding_format, and a mixin felt like overkill.
  • I've introduced RequestT for request handling. I'm considering adding ResponseT as well, and updating OpenAIServing to class OpenAIServing(Generic[RequestT, ResponseT]). What are your thoughts on this approach? That way, we could do something like the following for all the subclasses:

class OpenAIServingEmbedding(OpenAIServing[EmbeddingRequest, EmbeddingResponse]):

    async def handle(
        self, ctx: ServeContext[EmbeddingRequest]
    ) -> Union[EmbeddingResponse, ErrorResponse]:
        # Process the embedding request
        return EmbeddingResponse(...)  # or ErrorResponse if there's an error
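The Generic[RequestT, ResponseT] idea can be made concrete with a small self-contained sketch. The request, response, and context classes below are toy stand-ins for vLLM's real protocol models, and the "embedding" logic is a dummy, but the typing pattern is the one proposed above.

```python
import asyncio
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

RequestT = TypeVar("RequestT")
ResponseT = TypeVar("ResponseT")


@dataclass
class ErrorResponse:
    message: str


@dataclass
class ServeContext(Generic[RequestT]):
    # Carries the typed request through the shared pipeline.
    request: RequestT


class OpenAIServing(Generic[RequestT, ResponseT]):
    async def handle(
        self, ctx: ServeContext[RequestT]
    ) -> Union[ResponseT, ErrorResponse]:
        raise NotImplementedError


# Stand-ins for the real embedding request/response models.
@dataclass
class EmbeddingRequest:
    input: str


@dataclass
class EmbeddingResponse:
    data: list[float]


class OpenAIServingEmbedding(OpenAIServing[EmbeddingRequest, EmbeddingResponse]):
    async def handle(
        self, ctx: ServeContext[EmbeddingRequest]
    ) -> Union[EmbeddingResponse, ErrorResponse]:
        if not ctx.request.input:
            return ErrorResponse(message="empty input")
        # Toy "embedding": a single float derived from the input length.
        return EmbeddingResponse(data=[float(len(ctx.request.input))])
```

With this shape, mypy can check that each subclass's handle() returns its own response type, which is the payoff of parameterizing the base class on both RequestT and ResponseT.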

@DarkLight1337 (Member) commented

Yes that's fine

@DarkLight1337 (Member) commented

Are you going to address these in this PR?

@frieda-huang (Contributor, Author) commented

I'm thinking of addressing these in a different PR, since this PR is specifically about /classify. What do you think?

@DarkLight1337 (Member) commented Apr 27, 2025

If you can't do this in this PR, then I suggest using a mixin for now until the other serving classes have also been migrated, so that we don't have dead code.

@frieda-huang (Contributor, Author) commented

For completeness, let me do the migration on the rest of the classes as well! Is there anything else regarding the endpoints that I need to be aware of?

@DarkLight1337 (Member) commented

Not that I'm aware of. As long as the tests pass it should be fine

@frieda-huang (Contributor, Author) commented

@DarkLight1337 Sorry for the delay. I was migrating the rest of the subclasses and ran into a lot of issues running tests on my local machine (M2 chip). test_chat.py for the original implementation fails with RuntimeError: Server exited unexpectedly, but works just fine on a cloud GPU instance. This will take substantially more hours than I expected. Given the demand for the endpoint, I will update the current PR using the mixin approach and push the change tomorrow.


mergify bot commented Apr 30, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @frieda-huang.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label Apr 30, 2025
Signed-off-by: Frieda (Jingying) Huang <[email protected]>
@mergify bot removed the needs-rebase label Apr 30, 2025
@DarkLight1337 (Member) left a comment

Thanks for your time! LGTM if the tests pass

@DarkLight1337 (Member) commented

There is a failing test that's relevant to this PR, please fix it

@frieda-huang (Contributor, Author) commented

> There is a failing test that's relevant to this PR, please fix it

Ok. I'm on it.

Signed-off-by: Frieda (Jingying) Huang <[email protected]>
@frieda-huang (Contributor, Author) commented

> There is a failing test that's relevant to this PR, please fix it

Hi @DarkLight1337. Do you know when my PR will get merged?

@DarkLight1337 enabled auto-merge (squash) May 10, 2025 02:41
@github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) May 10, 2025
@DarkLight1337 (Member) commented

Starting the merge process now, sorry for the delay!

@frieda-huang (Contributor, Author) commented

> Starting the merge process now, sorry for the delay!

Thank you! Looks like some tests are failing. I'll fix them!

@DarkLight1337 merged commit 9cea90e into vllm-project:main May 11, 2025
60 checks passed
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
Labels: documentation (Improvements or additions to documentation), frontend, ready (ONLY add when PR is ready to merge/full CI is needed)
Projects: None yet

Successfully merging this pull request may close these issues:

- [Usage]: How to use LLM.classify(...) through OpenAI endpoint?
- [Feature]: Support for Running Classification Task in Online Server

3 participants