[Refactor] Introduce basic Renderer for completion-style request #24010

sfeng33 · 2025-09-01T00:40:07Z

Purpose

This PR introduces a basic Renderer component that aims to consolidate vLLM's fragmented input processing pipeline, implementing the design outlined in RFC #22880. The Renderer will serve as a single unified entry point for converting high-level API requests into tokenized formats ready for engine consumption.

Changes

Core Implementation

BaseRenderer: Abstract base class defining the unified input processing interface.
Renderer.render_prompt: Concrete implementation for completion-style requests handling text/token inputs, truncation, and validation.
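
To make the render_prompt responsibilities listed above concrete, here is a minimal standalone sketch of the same idea: accept a batch of text or token prompts, tokenize the text, optionally truncate, validate length, and return results in the original input order. It is not the code from this PR. The function name render_prompt_sketch, its parameters, and toy_tokenize are hypothetical, and the truncation rule (keep the last k tokens) is an assumption made for illustration.

```python
from typing import Callable, Optional, Sequence, Union

# Hypothetical alias for this sketch: a prompt is raw text or token IDs.
PromptInput = Union[str, Sequence[int]]


def render_prompt_sketch(
    prompts: Sequence[PromptInput],
    tokenize: Callable[[str], list],
    max_length: int,
    truncate_prompt_tokens: Optional[int] = None,
) -> list:
    """Return one token ID list per prompt, in the same order as the input."""
    # This sketch only accepts positive truncation values; any special
    # values the real parameter may support are out of scope here.
    if truncate_prompt_tokens is not None and truncate_prompt_tokens < 1:
        raise ValueError("truncate_prompt_tokens must be >= 1 in this sketch")

    rendered = []
    for index, prompt in enumerate(prompts):
        # Text is tokenized; pre-tokenized input is copied through as-is.
        if isinstance(prompt, str):
            token_ids = list(tokenize(prompt))
        else:
            token_ids = list(prompt)

        # Assumption: truncation keeps the last k tokens.
        if truncate_prompt_tokens is not None:
            token_ids = token_ids[-truncate_prompt_tokens:]

        # Validation: reject prompts that still exceed the length limit.
        if len(token_ids) > max_length:
            raise ValueError(
                f"prompt {index} has {len(token_ids)} tokens, "
                f"limit is {max_length}"
            )
        rendered.append(token_ids)
    return rendered


def toy_tokenize(text: str) -> list:
    """Stand-in tokenizer: one 'token' per word, equal to the word length."""
    return [len(word) for word in text.split()]


mixed_batch = ["First sentence", [101, 2028, 102]]
print(render_prompt_sketch(mixed_batch, toy_tokenize, max_length=8))
print(render_prompt_sketch(mixed_batch, toy_tokenize, max_length=8,
                           truncate_prompt_tokens=1))
```

Processing each prompt inside a single loop, rather than splitting the batch into a text group and a token group, is one simple way to keep outputs aligned with inputs when a request mixes both kinds.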

Integration Updates

Updated two endpoints to use render_prompt for input processing:

Pooling endpoint
Tokenize endpoint

Test Plan

Unit test

python -m pytest tests/entrypoints/test_renderer.py -v
python -m pytest tests/entrypoints/openai/test_tokenization.py -v
python -m pytest tests/entrypoints/openai/test_pooling.py -v

Manual test

python -m vllm.entrypoints.openai.api_server \
  --model BAAI/bge-base-en-v1.5 \
  --port 8000

curl -X POST http://localhost:8000/pooling \
-H "Content-Type: application/json" \
-d '{
  "model": "BAAI/bge-base-en-v1.5",
  "input": ["First sentence", "Second sentence", "Third sentence"]
}'

curl -X POST http://localhost:8000/pooling \
-H "Content-Type: application/json" \
-d '{
  "model": "BAAI/bge-base-en-v1.5",
  "input": [[101, 2028, 102], [101, 2048, 102], [101, 2093, 102]],
  "truncate_prompt_tokens": 1
}'
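
For scripted checks, the two curl calls above can be reproduced from Python with only the standard library. This is a sketch, not part of the PR's test plan: build_pooling_request and send are hypothetical helper names, the URL and model simply mirror the manual test, and nothing is assumed about the shape of the response beyond it being JSON.

```python
import json
import urllib.request
from typing import Any, Optional


def build_pooling_request(
    base_url: str,
    model: str,
    inputs: Any,
    truncate_prompt_tokens: Optional[int] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /pooling endpoint."""
    payload = {"model": model, "input": inputs}
    # Only include the truncation field when the caller sets it.
    if truncate_prompt_tokens is not None:
        payload["truncate_prompt_tokens"] = truncate_prompt_tokens
    return urllib.request.Request(
        url=f"{base_url}/pooling",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> Any:
    """Send the request to a running server and decode the JSON body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Text inputs, as in the first curl call.
text_request = build_pooling_request(
    "http://localhost:8000",
    "BAAI/bge-base-en-v1.5",
    ["First sentence", "Second sentence", "Third sentence"],
)

# Token inputs with truncation, as in the second curl call.
token_request = build_pooling_request(
    "http://localhost:8000",
    "BAAI/bge-base-en-v1.5",
    [[101, 2028, 102], [101, 2048, 102], [101, 2093, 102]],
    truncate_prompt_tokens=1,
)

# With the server from the manual test running, call for example:
#   result = send(text_request)
```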


DarkLight1337 · 2025-09-01T03:58:00Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a Renderer class to centralize input processing, which is a good step towards simplifying the codebase. The refactoring of the pooling and tokenization endpoints to use this new component is well-executed. However, I've found a few critical issues in the new Renderer implementation and its integration. One issue can lead to incorrect prompt ordering in batched requests with mixed input types. Another is a regression where handling for a specific parameter value is missing. Additionally, there's a bug in the logging logic that will cause runtime errors. Addressing these issues will ensure the new component is robust and correct.

vllm/entrypoints/openai/serving_engine.py

vllm/entrypoints/renderer.py

mergify · 2025-09-01T06:16:26Z

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @sfeng33.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


Signed-off-by: sfeng33 <[email protected]>


mergify bot added the frontend label Sep 1, 2025

sfeng33 changed the title from "[Refactor] Implement basic Renderer" to "[Refactor] Introduce basic Renderer for completion-style request" Sep 1, 2025

sfeng33 marked this pull request as ready for review September 1, 2025 01:15

sfeng33 requested review from DarkLight1337, robertgshaw2-redhat, simon-mo and aarnphm as code owners September 1, 2025 01:15