[Frontend] Add /classify endpoint #17032
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: add 🚀 …
Some initial comments. Can you also add an example script to examples/online_serving to show how to use this endpoint?
Yes! It's …

Oh, right. Sorry I missed that.

Really appreciate the feedback! I've implemented your suggestions in the latest commit. I've noticed that …

Sure!
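As a rough sketch of what such an example script might look like (the request fields, model name, and response shape here are assumptions modeled on the other OpenAI-compatible vLLM endpoints, not the confirmed /classify schema):

```python
# Hypothetical client for the proposed /classify endpoint, using only the
# standard library. The "model"/"input" fields are assumptions, not a
# confirmed schema.
import json
import urllib.request


def build_classify_request(model: str, texts: list[str]) -> dict:
    """Assemble the JSON body for a /classify call."""
    return {"model": model, "input": texts}


def classify(base_url: str, model: str, texts: list[str]) -> dict:
    """POST to <base_url>/classify on a running vLLM server."""
    body = json.dumps(build_classify_request(model, texts)).encode()
    req = urllib.request.Request(
        f"{base_url}/classify",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# The network call needs a live server; the payload alone looks like:
payload = build_classify_request("my-classifier", ["Loved it!", "Hated it."])
print(json.dumps(payload))
```

Running `classify("http://localhost:8000", ...)` would then hit a locally started `vllm serve` instance.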
`encoding_format` doesn't make much sense outside of pooling requests. Maybe we should have a separate mixin class for pooling request handlers?
Yeah. It looks like only `serving_pooling.py` and `serving_embedding.py` are using it. `serving_embedding.py` and `serving_classification.py` are straightforward to refactor; however, `transcription` and `score` each have some twists on the base pattern. I left the other endpoints untouched to avoid a large-scale overhaul.
Hi @DarkLight1337 The latest commit includes the following changes:

- Added specialized contexts like `ClassificationServeContext` and `EmbeddingServeContext`.
- The `OpenAIServing` class has the following methods to take care of common processing logic: `handle()`, `_pipeline()`, `_validate_request()`, `_prepare_generators()`, `_collect_batch()`.
- Added two abstract methods, `_preprocess()` and `_build_response()`, which I also added to the following classes to avoid mypy errors: `serving_chat.py`, `serving_completion.py`, `serving_tokenization.py`, `serving_score.py`, `serving_pooling.py`, `serving_transcription.py`.
- I didn't add a separate mixin for `encoding_format` because I found it's already accessible via `ctx.request.encoding_format`, and I thought creating a mixin would be overkill.
- I've introduced `RequestT` for request handling. I'm considering adding `ResponseT` as well, and updating `OpenAIServing` to `class OpenAIServing(Generic[RequestT, ResponseT])`. What are your thoughts on this approach? That way, we can do something like the following for all the subclasses:

```python
class OpenAIServingEmbedding(OpenAIServing[EmbeddingRequest, EmbeddingResponse]):
    ...

    async def handle(self, ctx: ServeContext[EmbeddingRequest]) -> Union[EmbeddingResponse, ErrorResponse]:
        # Process the embedding request
        return EmbeddingResponse(...)  # or ErrorResponse if there's an error
```
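For readers following along, here is a minimal, self-contained sketch of how the `handle()`/`_pipeline()` split described above could fit together. The method names follow the comment; the hook bodies and the toy subclass are illustrative, not the actual vLLM implementation:

```python
# Sketch of a generic serving base class with a shared pipeline.
# Only the method names come from the PR discussion; everything else
# (context fields, toy subclass, labels) is illustrative.
import asyncio
from typing import Generic, Optional, TypeVar, Union

RequestT = TypeVar("RequestT")
ResponseT = TypeVar("ResponseT")


class ErrorResponse:
    def __init__(self, message: str) -> None:
        self.message = message


class ServeContext(Generic[RequestT]):
    def __init__(self, request: RequestT) -> None:
        self.request = request
        self.inputs = None


class OpenAIServing(Generic[RequestT, ResponseT]):
    async def handle(self, ctx: "ServeContext[RequestT]") -> Union[ResponseT, ErrorResponse]:
        return await self._pipeline(ctx)

    async def _pipeline(self, ctx):
        # Shared processing order for all endpoints.
        error = self._validate_request(ctx)
        if error is not None:
            return error
        await self._preprocess(ctx)        # abstract: per-endpoint input prep
        await self._prepare_generators(ctx)
        await self._collect_batch(ctx)
        return self._build_response(ctx)   # abstract: per-endpoint response

    def _validate_request(self, ctx) -> Optional[ErrorResponse]:
        return None

    async def _preprocess(self, ctx) -> None:
        raise NotImplementedError

    async def _prepare_generators(self, ctx) -> None:
        pass

    async def _collect_batch(self, ctx) -> None:
        pass

    def _build_response(self, ctx):
        raise NotImplementedError


# Toy subclass showing where the per-endpoint hooks plug in.
class ToyClassificationServing(OpenAIServing[str, dict]):
    async def _preprocess(self, ctx):
        ctx.inputs = ctx.request.lower()

    def _build_response(self, ctx):
        return {"label": "positive" if "love" in ctx.inputs else "neutral"}


result = asyncio.run(ToyClassificationServing().handle(ServeContext("I LOVE it")))
print(result)  # {'label': 'positive'}
```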
Yes that's fine
Are you going to address these in this PR?
I'm thinking of addressing these in a different PR, since this PR is specifically about /classify, what do you think?
If you can't do this in this PR, then I suggest using a mixin for now, until the other serving classes have also been migrated, so that we don't have dead code.
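One possible shape for the suggested mixin (a sketch under the assumption that pooling-style handlers share float/base64 output encoding; the class and method names here are hypothetical, not the merged vLLM code):

```python
# Hypothetical EncodingFormatMixin: only pooling-style handlers inherit
# it, so other endpoints never see encoding_format at all.
import base64
import struct


class EncodingFormatMixin:
    """Shared float/base64 encoding for pooling-style responses."""

    @staticmethod
    def _encode_embedding(values: list[float], encoding_format: str):
        if encoding_format == "float":
            return values
        if encoding_format == "base64":
            # Pack as little-endian float32, matching the OpenAI-style
            # base64 embedding convention.
            packed = struct.pack(f"<{len(values)}f", *values)
            return base64.b64encode(packed).decode("ascii")
        raise ValueError(f"unknown encoding_format: {encoding_format}")


print(EncodingFormatMixin._encode_embedding([1.0, 2.0], "float"))  # [1.0, 2.0]
```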
For completeness, let me do the migration on the rest of the classes as well! Is there anything else regarding all the endpoints that I need to be aware of?
Not that I'm aware of. As long as the tests pass it should be fine
@DarkLight1337 Sorry for the delay. I was migrating the rest of the subclasses and ran into a lot of issues running tests on my local machine (M2 chip): `test_chat.py` for the original implementation fails with `RuntimeError: Server exited unexpectedly`, but works just fine on a cloud GPU instance. It will take substantially longer than I expected. Given the demand for the endpoint, I will update the current PR using the mixin approach and push the change tomorrow.
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Frieda (Jingying) Huang <[email protected]>
Thanks for your time! LGTM if the tests pass
There is a failing test that's relevant to this PR, please fix it.

Ok. I'm on it.
Hi @DarkLight1337. Do you know when my PR will get merged?

Starting the merge process now, sorry for the delay!

Thank you! Looks like some tests are failing. I'll fix them!
FIX #13567
FIX #17415