Conversation


grahama1970 commented Sep 11, 2025

feat(router): Parallel Acompletions

Summary

  • Adds parallel_acompletions in the router to fan out concurrent sub-requests and aggregate results.

Motivation

  • Many production calls need parallel fanout (tooling, ensembles, multi-provider redundancy). Providing a first-class helper in the router reduces duplicated user logic and improves observability.

Diagram

flowchart TD
  A[Client Request] --> B[Router parallel acompletions]
  B --> C[Build requests and batch id]
  C --> S[Semaphore]
  S --> T1[run one 0]
  S --> T2[run one 1]
  T1 --> G[Gather results]
  T2 --> G
  G --> R[Aggregated results]
  R --> O[index, request, response or exception]
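
A minimal sketch of the fanout pattern the diagram describes, with assumed names (fan_out, run_one) rather than the actual helper in router_utils/parallel_acompletion.py: a semaphore bounds concurrency, each sub-request runs as its own task, and results are gathered alongside their index and originating request.

import asyncio
from typing import Any, Awaitable, Callable, List, Tuple

async def fan_out(
    requests: List[Any],
    run_one: Callable[[Any], Awaitable[Any]],
    concurrency: int = 5,
) -> List[Tuple[int, Any, Any]]:
    # Bound the number of in-flight sub-requests.
    sem = asyncio.Semaphore(concurrency)

    async def _bounded(i: int, req: Any) -> Tuple[int, Any, Any]:
        async with sem:
            try:
                resp = await run_one(req)
                return (i, req, resp)
            except Exception as exc:
                # Keep the exception with its index instead of failing the whole batch.
                return (i, req, exc)

    # gather() preserves input order, so results[i] corresponds to requests[i].
    return await asyncio.gather(*(_bounded(i, r) for i, r in enumerate(requests)))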

Return shape

  • parallel_acompletions(...) -> List[RouterParallelResult] (optionally preserve_order=True).
  • iter_parallel_acompletions(...) -> AsyncIterator[RouterParallelResult] (yields in completion order).
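
Roughly, each result pairs the input index and request with either a response or an exception. The sketch below is inferred from the return shape described above; field names and types in the actual helper may differ.

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class RouterParallelResult:
    index: int                                  # position of the originating request
    request: Any                                # the RouterParallelRequest that was sent
    response: Optional[Any] = None              # completion response on success
    exception: Optional[BaseException] = None   # set when return_exceptions=True and the call failed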

What’s Included

Router

  • New litellm/router_utils/parallel_acompletion.py orchestration helper.
  • Integration in litellm/router.py gated behind an experimental flag.
  • Minor auth/route compat tweaks in proxy code paths.
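
For context, the experimental gate boils down to an env-var check along these lines (hypothetical sketch based on the LITELLM_ENABLE_PARALLEL_ACOMPLETIONS variable used in the examples below; the actual check in litellm/router.py may differ):

import os

def _parallel_acompletions_enabled() -> bool:
    # Feature is off unless the experimental flag is explicitly set.
    return os.getenv("LITELLM_ENABLE_PARALLEL_ACOMPLETIONS", "0") == "1"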

Docs/Tests

  • Guide: docs/my-website/docs/guides/parallel_acompletions.md + sidebar entry.
  • Unit test: tests/router/test_parallel_acompletions.py.
  • Live test (Gemini): tests/router/test_parallel_acompletions_live_gemini.py.

Scope/Impact

  • Default behavior unchanged; feature is gated via experimental flag.
  • No broad formatting or unrelated refactors.

Validation

  • Unit tests pass locally with pytest -n auto.
  • Router behavior verified via unit + live tests.

Risks/Follow-ups

  • Parallel fanout introduces concurrency. Isolated via helper and flag-gated; feedback on API ergonomics and metrics surface is welcome.
  • If CI enforces repo-wide formatting, maintainers may prefer “Squash and merge” to keep history tidy.

Checklist

  • Feature behind experimental flag
  • Docs and sidebar updated
  • Unit + live tests added

Ancillary: Tokenizer Stability

  • Rationale: stabilize tokenizer loading behavior in CI/offline environments without affecting the public API. These changes exist only so the parallel acompletions tests pass reliably in CI; they are not part of the feature surface.
  • Changes
    • Use single-argument HF tokenizer loading to match mocks/tests.
    • Temporarily disable HF_HUB_ENABLE_HF_TRANSFER during HF loads (avoid timeouts/noise when hf_transfer isn’t installed).
    • Preserve robust fallback to tiktoken when HF is unavailable or download/signature fails.
  • Validation
    • Verified that tests exercising the tokenizer path no longer flake; the fallback path is exercised when HF is unavailable.
    • Non-breaking; same return types and behavior when HF is present.
    • Lives in its own commit: fix(tokenizer): stabilize create_pretrained_tokenizer.
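
For reference, the hf_transfer toggle described above amounts to a small context manager along these lines (illustrative sketch with an assumed name; the committed fix may be structured differently):

import os
from contextlib import contextmanager

@contextmanager
def _hf_transfer_disabled():
    # Temporarily force HF_HUB_ENABLE_HF_TRANSFER off, then restore the prior value.
    prev = os.environ.get("HF_HUB_ENABLE_HF_TRANSFER")
    os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "0"
    try:
        yield
    finally:
        if prev is None:
            os.environ.pop("HF_HUB_ENABLE_HF_TRANSFER", None)
        else:
            os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = prev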

Links

Examples

Gather in one shot (preserve order)

import os
import asyncio
from litellm import Router
from litellm.router_utils.parallel_acompletion import RouterParallelRequest

os.environ["LITELLM_ENABLE_PARALLEL_ACOMPLETIONS"] = "1"  # enable feature

async def main():
    router = Router(
        model_list=[{
            "model_name": "prod",
            "litellm_params": {"model": "gpt-3.5-turbo", "api_key": "sk-..."},
        }]
    )

    requests = [
        RouterParallelRequest(model="prod", messages=[{"role": "user", "content": "A"}]),
        RouterParallelRequest(model="prod", messages=[{"role": "user", "content": "B"}]),
        RouterParallelRequest(model="prod", messages=[{"role": "user", "content": "C"}]),
    ]

    results = await router.parallel_acompletions(
        requests,
        concurrency=2,
        preserve_order=True,    # returned list matches input order
        return_exceptions=True, # keep errors in result.exception
    )

    for r in results:
        if r.exception:
            print("error", r.index, r.exception)
        else:
            print("ok", r.index, r.response)

asyncio.run(main())

Iterate as each finishes (completion order)

import os
import asyncio
from litellm import Router
from litellm.router_utils.parallel_acompletion import RouterParallelRequest

os.environ["LITELLM_ENABLE_PARALLEL_ACOMPLETIONS"] = "1"

async def main():
    router = Router(model_list=[{"model_name": "prod", "litellm_params": {"model": "gpt-3.5-turbo", "api_key": "sk-..."}}])

    requests = [
        RouterParallelRequest(model="prod", messages=[{"role": "user", "content": "X"}]),
        RouterParallelRequest(model="prod", messages=[{"role": "user", "content": "Y"}]),
        RouterParallelRequest(model="prod", messages=[{"role": "user", "content": "Z"}]),
    ]

    try:
        async for r in router.iter_parallel_acompletions(requests, concurrency=3, return_exceptions=False):
            # if any call fails, iteration raises immediately (fail-fast)
            print("ok", r.index, r.response)
    except Exception as e:
        print("aborted due to:", e)

asyncio.run(main())
