
Conversation

@frieda-huang (Contributor) commented Apr 23, 2025

FIX #13567
FIX #17415


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify bot added the documentation (Improvements or additions to documentation) and frontend labels Apr 23, 2025
@DarkLight1337 (Member) left a comment

Some initial comments. Can you also add an example script to examples/online_serving to show how to use this endpoint?

@frieda-huang (Contributor, Author) commented Apr 23, 2025

> Some initial comments. Can you also add an example script to examples/online_serving to show how to use this endpoint?

Yes! It's openai_classification_client.py under examples/online_serving.
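For readers who land here without opening that example file, a minimal client along those lines might look like the sketch below. It is not the PR's actual script: the payload shape follows vLLM's /classify request (a model name plus one or more input strings), while the server address and model name are placeholders.

```python
import json
from urllib import request


def build_classify_request(model: str, texts: list[str]) -> dict:
    """Build the JSON body for the /classify endpoint:
    a model name plus one or more input strings."""
    return {"model": model, "input": texts}


def classify(base_url: str, model: str, texts: list[str]) -> dict:
    """POST the request to {base_url}/classify and return the parsed
    JSON response, which carries per-input classification results."""
    body = json.dumps(build_classify_request(model, texts)).encode("utf-8")
    req = request.Request(
        f"{base_url}/classify",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


# Example (requires a running vLLM server with a classification model;
# "my-classifier" is a placeholder for the served model name):
#   classify("http://localhost:8000", "my-classifier", ["vLLM is great!"])
```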

@DarkLight1337 (Member) commented

> examples/online_serving to show how to use this endpoint?

Oh, right. Sorry I missed that

@frieda-huang (Contributor, Author) commented Apr 23, 2025

> examples/online_serving to show how to use this endpoint?
>
> Oh, right. Sorry I missed that

Really appreciate the feedback! I’ve implemented your suggestions in the latest commit.

I’ve noticed that serving_classification.py, serving_score.py, serving_transcription.py, and serving_embedding.py all share the same pattern of input preprocessing, request scheduling, and result‑generator aggregation. Should we move that pipeline into a shared helper or abstract base class?

@DarkLight1337 (Member) commented

> I’ve noticed that serving_classification.py, serving_score.py, serving_transcription.py, and serving_embedding.py all share the same pattern of input preprocessing, request scheduling, and result‑generator aggregation. Should we move that pipeline into a shared helper or abstract base class?

Sure!

@DarkLight1337 (Member) commented

encoding_format doesn't make much sense outside of pooling requests. Maybe we should have a separate mixin class for pooling request handlers?

@frieda-huang (Contributor, Author) commented Apr 25, 2025

> encoding_format doesn't make much sense outside of pooling requests. Maybe we should have a separate mixin class for pooling request handlers?

Yeah. It looks like only serving_pooling.py and serving_embedding.py are using it.

serving_embedding.py and serving_classification.py are straightforward to refactor; however, transcription and score each have some twists on the base pattern. I left the other endpoints untouched to avoid a large-scale overhaul.
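As an aside for readers, the mixin being discussed could be sketched like this. The class and method names are illustrative, not the PR's actual code, and packing base64 embeddings as little-endian float32 is an assumption that mirrors common embedding APIs rather than something stated in this thread.

```python
import base64
import struct
from typing import Literal, Union


class PoolingResponseMixin:
    """Illustrative mixin for handlers whose responses carry pooled vectors.

    Only pooling-style endpoints (e.g. embeddings) care about
    encoding_format, so the logic lives in a mixin instead of the
    shared base class.
    """

    def encode_embedding(
        self,
        values: list[float],
        encoding_format: Literal["float", "base64"],
    ) -> Union[list[float], str]:
        if encoding_format == "float":
            return values
        # Pack as little-endian float32 before base64-encoding
        # (assumed wire format, matching common embedding APIs).
        packed = struct.pack(f"<{len(values)}f", *values)
        return base64.b64encode(packed).decode("utf-8")
```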

@frieda-huang (Contributor, Author) commented Apr 26, 2025

Hi @DarkLight1337 The latest commit includes the following changes:

  • Added specialized contexts:
    • ClassificationServeContext
    • EmbeddingServeContext
  • The OpenAIServing class now takes care of the common processing logic via:
    • handle()
    • _pipeline()
    • _validate_request()
    • _prepare_generators()
    • _collect_batch()
  • Added two abstract methods, _preprocess() and _build_response(), which I also added to the following classes to avoid mypy errors:
    • serving_chat.py
    • serving_completion.py
    • serving_tokenization.py
    • serving_score.py
    • serving_pooling.py
    • serving_transcription.py
  • I didn't add a separate mixin for encoding_format because it's already accessible via ctx.request.encoding_format, and a mixin felt like overkill.
  • I've introduced RequestT for request handling. I'm considering adding ResponseT as well, and updating OpenAIServing to class OpenAIServing(Generic[RequestT, ResponseT]). What are your thoughts on this approach? That way, we could do something like the following for all the subclasses:

class OpenAIServingEmbedding(OpenAIServing[EmbeddingRequest, EmbeddingResponse]):

    async def handle(
        self, ctx: ServeContext[EmbeddingRequest]
    ) -> Union[EmbeddingResponse, ErrorResponse]:
        # Process the embedding request
        return EmbeddingResponse(...)  # or ErrorResponse if there's an error
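The Generic[RequestT, ResponseT] idea can be made concrete with a small self-contained sketch. The request, response, and context classes below are toy stand-ins for vLLM's real protocol models, and the "embedding" logic is a dummy, but the typing pattern is the one proposed above.

```python
import asyncio
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

RequestT = TypeVar("RequestT")
ResponseT = TypeVar("ResponseT")


@dataclass
class ErrorResponse:
    message: str


@dataclass
class ServeContext(Generic[RequestT]):
    # Carries the typed request through the shared pipeline.
    request: RequestT


class OpenAIServing(Generic[RequestT, ResponseT]):
    async def handle(
        self, ctx: ServeContext[RequestT]
    ) -> Union[ResponseT, ErrorResponse]:
        raise NotImplementedError


# Stand-ins for the real embedding request/response models.
@dataclass
class EmbeddingRequest:
    input: str


@dataclass
class EmbeddingResponse:
    data: list[float]


class OpenAIServingEmbedding(OpenAIServing[EmbeddingRequest, EmbeddingResponse]):
    async def handle(
        self, ctx: ServeContext[EmbeddingRequest]
    ) -> Union[EmbeddingResponse, ErrorResponse]:
        if not ctx.request.input:
            return ErrorResponse(message="empty input")
        # Toy "embedding": a single float derived from the input length.
        return EmbeddingResponse(data=[float(len(ctx.request.input))])
```

With this shape, mypy can check that each subclass's handle() returns its own response type, which is the payoff of parameterizing the base class on both RequestT and ResponseT.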

@DarkLight1337 (Member) commented

Yes that's fine

@DarkLight1337 (Member) commented

Are you going to address these in this PR?

@frieda-huang (Contributor, Author) commented

I'm thinking of addressing these in a different PR, since this PR is specifically about /classify. What do you think?

@DarkLight1337 (Member) commented Apr 27, 2025

If you can't do this in this PR, then I suggest using a mixin for now until the other serving classes have also been migrated, so that we don't have dead code.

@frieda-huang (Contributor, Author) commented

For completeness, let me do the migration on the rest of the classes as well! Is there anything else regarding the endpoints that I need to be aware of?

@DarkLight1337 (Member) commented

Not that I'm aware of. As long as the tests pass it should be fine

@frieda-huang (Contributor, Author) commented

@DarkLight1337 Sorry for the delay. I was migrating the rest of the subclasses and ran into a lot of issues running tests on my local machine (M2 chip). test_chat.py for the original implementation fails with RuntimeError: Server exited unexpectedly, but works just fine on a cloud GPU instance. This will take substantially more hours than I expected. Given the demand for the endpoint, I will update the current PR using the mixin approach and push the change tomorrow.


mergify bot commented Apr 30, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @frieda-huang.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label Apr 30, 2025
Signed-off-by: Frieda (Jingying) Huang <[email protected]>
@mergify bot removed the needs-rebase label Apr 30, 2025
@DarkLight1337 (Member) left a comment

Thanks for your time! LGTM if the tests pass

@DarkLight1337 (Member) commented

There is a failing test that's relevant to this PR, please fix it

@frieda-huang (Contributor, Author) commented

> There is a failing test that's relevant to this PR, please fix it

Ok. I'm on it.

Signed-off-by: Frieda (Jingying) Huang <[email protected]>
@frieda-huang (Contributor, Author) commented

> There is a failing test that's relevant to this PR, please fix it

Hi @DarkLight1337. Do you know when my PR will get merged?

@DarkLight1337 enabled auto-merge (squash) May 10, 2025 02:41
@github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) May 10, 2025
@DarkLight1337 (Member) commented

Starting the merge process now, sorry for the delay!

@frieda-huang (Contributor, Author) commented

> Starting the merge process now, sorry for the delay!

Thank you! Looks like some tests are failing. I'll fix them!

@DarkLight1337 merged commit 9cea90e into vllm-project:main May 11, 2025
60 checks passed
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
Labels: documentation (Improvements or additions to documentation), frontend, ready (ONLY add when PR is ready to merge/full CI is needed)
Projects: None yet

Successfully merging this pull request may close these issues:

- [Usage]: How to use LLM.classify(...) through OpenAI endpoint?
- [Feature]: Support for Running Classification Task in Online Server

3 participants