-
-
Notifications
You must be signed in to change notification settings - Fork 10.4k
[BugFix] Threadsafe close async zmq sockets #22877
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Ensure async zmq sockets are closed from the event loop - Set LINGER=0 on stats update sockets to ensure they don't block the context being closed Signed-off-by: Nick Hill <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request addresses a potential hanging issue when closing asynchronous ZMQ sockets by ensuring they are closed from within their event loop. The changes introduce a close_sockets
helper function and modify the BackgroundResources
finalizer to use loop.call_soon_threadsafe
for closing async sockets and cancelling related tasks. This is the correct approach for thread-safe cleanup of asyncio resources. Additionally, linger=0
is set on stats update sockets to prevent them from blocking context termination. The changes appear correct and well-implemented to fix the described bug.
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
As @Isotr0py did in his original PR Signed-off-by: Nick Hill <[email protected]> Co-authored-by: Isotr0py <[email protected]> Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Unblocking 4 GPUs test |
Passed as well, merging |
@Isotr0py you probably saw, I incorporated your changes after all for explicitly closing the other sockets because they did appear to be needed. I wasn't able to reproduce locally but the CI test was still hanging without them. |
Yes, I can't reproduce the hanging issue locally but it will stuck at
I wonder if there are some issues at ZMQ side... |
Signed-off-by: Nick Hill <[email protected]> Co-authored-by: Isotr0py <[email protected]>
Signed-off-by: Nick Hill <[email protected]> Co-authored-by: Isotr0py <[email protected]>
Signed-off-by: Nick Hill <[email protected]> Co-authored-by: Isotr0py <[email protected]> Signed-off-by: Duncan Moss <[email protected]>
Signed-off-by: Nick Hill <[email protected]> Co-authored-by: Isotr0py <[email protected]>
Signed-off-by: Nick Hill <[email protected]> Co-authored-by: Isotr0py <[email protected]> Signed-off-by: Xiao Yu <[email protected]>
Signed-off-by: Nick Hill <[email protected]> Co-authored-by: Isotr0py <[email protected]>
FIX #22831
FIX #22839
LINGER=0
on stats update sockets to ensure they don't block the context being closed