[Performance] Use _PROXY_MaxParallelRequestsHandler_v3 by default again #14450
Merged: ishaan-jaff merged 7 commits into main from litellm_/performance/proxy-parallel-request-handler-v3 on Sep 13, 2025
Conversation
Force-pushed: 602dcba → 96bdf20, 5240e27 → 432fdee, 432fdee → 0be84c9, 0be84c9 → d6d36c8
(cherry picked from commit f3fa45cf8fbd5f5cce2f45a7312776d5005fb08e) (cherry picked from commit 5b680bb)
The rate limiter was incorrectly rejecting requests when the limit was met, but not exceeded. The check in `is_cache_list_over_limit` was `int(counter_value) + 1 > current_limit`, which caused the first request to be rejected if the limit was 1. This commit removes the `+ 1`, changing the logic to `int(counter_value) > current_limit`. The check now correctly allows requests up to the specified parallel limit.
Force-pushed: d6d36c8 → e62f0ec
Should be ready to merge, please review.
Follow up on #14420 and #14352

The rate limiter was incorrectly rejecting requests when the limit was met but not exceeded. The check in `is_cache_list_over_limit` was `int(counter_value) + 1 > current_limit`, which caused the first request to be rejected if the limit was 1. This PR removes the `+ 1`, changing the logic to `int(counter_value) > current_limit`, so the check now correctly allows requests up to the specified parallel limit. It also adds several tests to ensure correct behavior when handling parallel and sequential requests from multiple users.
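A minimal sketch of the off-by-one described above, assuming the counter has already been incremented for the incoming request before the check runs; the function and variable names here are illustrative, not copied from the proxy code:

```python
def is_over_limit_old(counter_value: int, current_limit: int) -> bool:
    # Old check: with current_limit = 1 and counter_value = 1 (the first
    # request already counted), 1 + 1 > 1 is True, so it gets rejected.
    return int(counter_value) + 1 > current_limit


def is_over_limit_new(counter_value: int, current_limit: int) -> bool:
    # New check: 1 > 1 is False, so the first request is allowed and only
    # requests beyond the configured parallel limit are rejected.
    return int(counter_value) > current_limit


assert is_over_limit_old(1, 1) is True   # first request wrongly rejected
assert is_over_limit_new(1, 1) is False  # first request allowed
assert is_over_limit_new(2, 1) is True   # request over the limit rejected
```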
Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR:

- Added testing in the `tests/` directory
- `make test-unit`
- `pytest tests/test_end_users.py::test_aaaend_user_specific_region tests/local_testing/test_pass_through_endpoints.py -k 'rpm or specific_region' -n 6 -vv`
Type
🆕 New Feature
🚄 Infrastructure
✅ Test
Changes