feat(router): Parallel Acompletions #14462
Open
+761
−42
feat(router): Parallel Acompletions
Summary
Adds parallel_acompletions in the router to fan out concurrent sub-requests and aggregate results.
Motivation
Diagram
Return shape
parallel_acompletions(...) -> List[RouterParallelResult] (optionally preserve_order=True).
iter_parallel_acompletions(...) -> AsyncIterator[RouterParallelResult] (yields in completion order).
What’s Included
Router
litellm/router_utils/parallel_acompletion.py: orchestration helper.
litellm/router.py: gated behind an experimental flag.
Docs/Tests
docs/my-website/docs/guides/parallel_acompletions.md + sidebar entry.
tests/router/test_parallel_acompletions.py.
tests/router/test_parallel_acompletions_live_gemini.py.
Scope/Impact
Validation
pytest -n auto.
Risks/Follow-ups
Checklist
Ancillary: Tokenizer Stability
Disables HF_HUB_ENABLE_HF_TRANSFER during HF loads (avoids timeouts/noise when hf_transfer isn’t installed).
Falls back to tiktoken when HF is unavailable or download/signature fails.
fix(tokenizer): stabilize create_pretrained_tokenizer.
Links
Examples
Gather in one shot (preserve order)
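The snippet for this example did not survive on this page. As a stand-in, here is a minimal, self-contained sketch of the gather-in-order pattern using plain asyncio. It does not call the router: `fake_completion`, `ParallelResult`, and `gather_in_order` are hypothetical names for illustration, and the real `parallel_acompletions` signature and `RouterParallelResult` fields may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, List, Optional


@dataclass
class ParallelResult:
    """Hypothetical stand-in for the PR's RouterParallelResult."""
    index: int
    response: Optional[Any] = None
    exception: Optional[BaseException] = None


async def fake_completion(prompt: str, delay: float) -> str:
    """Placeholder for a real model call: sleeps, then echoes the prompt."""
    await asyncio.sleep(delay)
    return f"echo: {prompt}"


async def gather_in_order(
    calls: List[Callable[[], Awaitable[Any]]]
) -> List[ParallelResult]:
    # asyncio.gather returns results in input order, regardless of which
    # call finishes first. return_exceptions=True hands a failed
    # sub-request back as a value instead of raising it.
    outcomes = await asyncio.gather(*(call() for call in calls), return_exceptions=True)
    results = []
    for i, outcome in enumerate(outcomes):
        if isinstance(outcome, BaseException):
            results.append(ParallelResult(index=i, exception=outcome))
        else:
            results.append(ParallelResult(index=i, response=outcome))
    return results


def demo_gather() -> List[ParallelResult]:
    calls = [
        lambda: fake_completion("slow", 0.2),
        lambda: fake_completion("fast", 0.01),
    ]
    return asyncio.run(gather_in_order(calls))
```

In this sketch the "slow" request is listed first and stays first in the returned list even though it finishes last, which is the behavior the preserve-order mode describes.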
Iterate as each finishes (completion order)
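The snippet for this example is also missing here. The sketch below shows the completion-order pattern with plain asyncio, under the same caveats: `fake_completion` and `iter_as_completed` are hypothetical helpers, not the router's `iter_parallel_acompletions`, whose actual arguments and result type are not shown on this page.

```python
import asyncio
from typing import AsyncIterator, Awaitable, List, Tuple


async def fake_completion(prompt: str, delay: float) -> str:
    """Placeholder for a real model call: sleeps, then echoes the prompt."""
    await asyncio.sleep(delay)
    return f"echo: {prompt}"


async def iter_as_completed(
    coros: List[Awaitable[str]],
) -> AsyncIterator[Tuple[int, str]]:
    async def tagged(index: int, coro: Awaitable[str]) -> Tuple[int, str]:
        # Carry the original position along so callers can correlate results.
        return index, await coro

    tasks = [asyncio.ensure_future(tagged(i, c)) for i, c in enumerate(coros)]
    # as_completed yields awaitables in the order the tasks finish.
    for next_done in asyncio.as_completed(tasks):
        yield await next_done


async def collect() -> List[Tuple[int, str]]:
    coros = [fake_completion("slow", 0.2), fake_completion("fast", 0.01)]
    return [item async for item in iter_as_completed(coros)]


def demo_iter() -> List[Tuple[int, str]]:
    return asyncio.run(collect())
```

Here the "fast" request is yielded first even though it was submitted second. Error handling is left out for brevity: an exception from any sub-request would propagate out of the iterator in this sketch.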