add an env var for path to pre-downloaded flashinfer cubin files #22675
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
This pull request was exported from Phabricator. Differential Revision: D80035005
Code Review
This pull request adds a new environment variable, `FLASHINFER_CUBIN_DIR`, to allow specifying a local directory for pre-downloaded flashinfer cubin files. This is a useful feature for environments without network access to NVIDIA's artifactory. My review focuses on improving the robustness of the implementation and ensuring it aligns with the project's conventions for managing environment variables. I've provided two comments: one to centralize the new environment variable definition for better maintainability, and another to add a more robust check for the provided directory path to prevent potential runtime errors.
vllm/utils/flashinfer.py
Outdated
The current check for `FLASHINFER_CUBIN_DIR` is not robust enough. It will return `True` even if the environment variable is set to an empty string (`''`), which is not a valid path and will likely cause failures later. It's better to also verify that the path exists and is a directory.
Suggested change:
-    # Since FLASHINFER_CUBIN_DIR defines the pre-downloaded cubins path, when
-    # it's true, we could assume the cubins are available.
-    if FLASHINFER_CUBIN_DIR is not None:
-        return True
+    # If FLASHINFER_CUBIN_DIR is set and points to a valid directory,
+    # we can assume the cubins are available.
+    if FLASHINFER_CUBIN_DIR and os.path.isdir(FLASHINFER_CUBIN_DIR):
+        return True
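To make the difference between the two checks concrete, here is a minimal, self-contained sketch. The helper names `naive_check` and `cubin_dir_is_usable` are hypothetical and exist only for this illustration; they are not functions in vLLM or flashinfer.

```python
import os
import tempfile


def naive_check(path):
    """Mirror of the original check: anything other than None passes."""
    return path is not None


def cubin_dir_is_usable(path):
    """Mirror of the suggested check: non-empty and an existing directory."""
    return bool(path) and os.path.isdir(path)


with tempfile.TemporaryDirectory() as tmp:
    # Unset, empty string, a path that does not exist, and a real directory.
    candidates = [None, "", os.path.join(tmp, "missing"), tmp]
    results = [(naive_check(p), cubin_dir_is_usable(p)) for p in candidates]

for candidate, (naive, robust) in zip(candidates, results):
    print(f"{candidate!r}: naive={naive}, robust={robust}")
```

Only the last candidate, an existing directory, passes the stricter check; the empty string and the missing path pass the naive one.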
vllm/utils/flashinfer.py
Outdated
For consistency and maintainability, environment variables should be managed centrally in `vllm/envs.py`. Please add `FLASHINFER_CUBIN_DIR` to `vllm/envs.py` and import it from there. This aligns with how other environment variables are handled in the project.
After adding it to `vllm/envs.py`, you can change this line to:
Suggested change:
- FLASHINFER_CUBIN_DIR = os.environ.get("FLASHINFER_CUBIN_DIR", None)
+ FLASHINFER_CUBIN_DIR = envs.FLASHINFER_CUBIN_DIR
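As a rough illustration of what "managed centrally" can look like, the sketch below keeps one registry of environment variables and resolves each one lazily on attribute access. This is an assumption-laden stand-in, not a copy of `vllm/envs.py`: the names `ENVIRONMENT_VARIABLES` and `Envs` are hypothetical and chosen only for this example.

```python
import os
from typing import Any, Callable, Dict

# Hypothetical registry: each variable name maps to a function that reads it.
# Defaults and parsing for every variable live in this one place.
ENVIRONMENT_VARIABLES: Dict[str, Callable[[], Any]] = {
    "FLASHINFER_CUBIN_DIR": lambda: os.environ.get("FLASHINFER_CUBIN_DIR", None),
}


class Envs:
    """Resolve registered variables at access time, not at import time."""

    def __getattr__(self, name: str) -> Any:
        if name in ENVIRONMENT_VARIABLES:
            return ENVIRONMENT_VARIABLES[name]()
        raise AttributeError(f"unknown environment variable: {name}")


# Hypothetical stand-in for a central `envs` module object.
envs = Envs()
```

Callers then read `envs.FLASHINFER_CUBIN_DIR` instead of calling `os.environ.get` themselves, so the default value and any validation are defined once.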
There is also an RFC PR on flashinfer to add the same env var: flashinfer-ai/flashinfer#1462.
Thanks for the PR. The idea looks good to me; please address gemini's comments, they seem reasonable.
4f0961a to e992997
@pavanimajety updated the PR, please take another look!
e992997 to 6db3c6b
6db3c6b to fe167d0
LGTM, please rebase your branch for the CI
@mgoin for CI and further comments
Looks good, thanks
Head branch was pushed to by a user without write access
2312211 to 0d79764
0d79764 to 23d05e3
…m-project#22675) Summary: Previously vllm-project#21893 added a check for network access to NV cubin artifact endpoint. And if the check failed, we would opt out from trtllm attn backend. This PR added a env var `FLASHINFER_CUBIN_DIR` to allow people specify a cubin dir which contains pre-downloaded cubin files. When `FLASHINFER_CUBIN_DIR` is specified, we will directly return True in `has_nvidia_artifactory()`. Reviewed By: frank-wei, Adolfo-Karim Differential Revision: D80035005
23d05e3 to 24ede43
…m-project#22675) Signed-off-by: Xiao Yu <[email protected]>
…m-project#22675) Signed-off-by: Ekagra Ranjan <[email protected]>
Summary:
Previously, #21893 added a check for network access to the NVIDIA cubin artifact endpoint; if the check failed, we would opt out of the trtllm attention backend.
This PR adds an env var, `FLASHINFER_CUBIN_DIR`, that lets users specify a directory containing pre-downloaded cubin files. When `FLASHINFER_CUBIN_DIR` is specified, `has_nvidia_artifactory()` returns True directly.
Differential Revision: D80035005
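The short-circuit described in the summary can be sketched as follows. This is a hedged illustration rather than the merged implementation: `has_cubin_source` and `network_probe` are hypothetical names, the network check is injected as a callable so the example runs offline, and the directory validation follows the reviewer's suggestion above, which may differ from the final code.

```python
import os
from typing import Callable, Optional


def has_cubin_source(cubin_dir: Optional[str],
                     network_probe: Callable[[], bool]) -> bool:
    """Return True if cubins can be obtained locally or over the network.

    `cubin_dir` plays the role of FLASHINFER_CUBIN_DIR; `network_probe` is a
    hypothetical stand-in for the artifactory reachability check.
    """
    # A usable pre-downloaded directory makes the network check unnecessary.
    if cubin_dir and os.path.isdir(cubin_dir):
        return True
    # Otherwise fall back to probing the remote endpoint.
    return network_probe()
```

In an air-gapped deployment, setting the variable to an existing directory means the probe is never called, so the backend is not disabled just because the endpoint is unreachable.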