[Refactor] Introduce basic Renderer for completion-style request #24010
/gemini review
Code Review
This pull request introduces a Renderer class to centralize input processing, which is a good step towards simplifying the codebase. The refactoring of the pooling and tokenization endpoints to use this new component is well-executed. However, I've found a few critical issues in the new Renderer implementation and its integration. One issue can lead to incorrect prompt ordering in batched requests with mixed input types. Another is a regression where handling for a specific parameter value is missing. Additionally, there's a bug in the logging logic that will cause runtime errors. Addressing these issues will ensure the new component is robust and correct.
Signed-off-by: sfeng33 <[email protected]>
Purpose
This PR introduces a basic Renderer component that consolidates vLLM's fragmented input processing pipeline, implementing the design outlined in RFC #22880. The Renderer serves as a single entry point for converting high-level API requests into tokenized formats ready for engine consumption.
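As a rough illustration of what a single entry point means here, the sketch below takes the prompt shapes a completion-style request commonly accepts and normalizes all of them into one output form: a list of token-ID lists. Those shapes are a string, a single list of token IDs, or a list mixing strings and token-ID lists. The function name render_prompt_sketch, the PromptInput alias, and the tokenize callback are assumptions for illustration. The PR's actual render_prompt may have a different signature and behavior.

```python
from typing import Callable, Union

# Hypothetical input type covering common completion-style prompt shapes.
PromptInput = Union[str, list[int], list[Union[str, list[int]]]]


def render_prompt_sketch(
    prompt: PromptInput,
    tokenize: Callable[[str], list[int]],
) -> list[list[int]]:
    """Normalize any supported prompt shape into a list of token-ID lists."""
    if isinstance(prompt, str):
        # One text prompt.
        return [tokenize(prompt)]
    if not prompt:
        raise ValueError("prompt must not be empty")
    if all(isinstance(t, int) for t in prompt):
        # One prompt given directly as token IDs.
        return [list(prompt)]
    # A batch: each element is either text or a token-ID list.
    rendered: list[list[int]] = []
    for item in prompt:
        if isinstance(item, str):
            rendered.append(tokenize(item))
        else:
            rendered.append(list(item))
    return rendered
```

Downstream code then only ever handles one shape (list[list[int]]), which is the consolidation the RFC is after. Each endpoint no longer needs its own branching over input types.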
Changes
Core Implementation
Integration Updates
Updated two endpoints to use render_prompt for input processing:
Test Plan