
Conversation

@842974287 (Contributor)

Summary:
Previously, #21893 added a check for network access to the NV cubin artifact endpoint; if the check failed, we would opt out of the trtllm attn backend.

This PR adds an env var FLASHINFER_CUBIN_DIR that lets users specify a directory containing pre-downloaded cubin files. When FLASHINFER_CUBIN_DIR is specified, has_nvidia_artifactory() directly returns True.

Differential Revision: D80035005
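
As a rough illustration of the behavior described in the summary, here is a minimal sketch of the short-circuit. The names `has_nvidia_artifactory` and `FLASHINFER_CUBIN_DIR` come from this PR; `_can_reach_artifactory` is a hypothetical stub standing in for the network check from #21893, not vLLM's actual code.

```python
import os

# FLASHINFER_CUBIN_DIR points at a directory of pre-downloaded cubin files.
FLASHINFER_CUBIN_DIR = os.environ.get("FLASHINFER_CUBIN_DIR", None)


def _can_reach_artifactory() -> bool:
    # Hypothetical placeholder for the network-access check added in #21893;
    # the real probe lives in vLLM, not here.
    return False


def has_nvidia_artifactory() -> bool:
    # If a local cubin directory is supplied, skip the network check and
    # report the artifacts as available.
    if FLASHINFER_CUBIN_DIR is not None:
        return True
    # Otherwise fall back to probing the NV cubin artifact endpoint.
    return _can_reach_artifactory()
```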


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D80035005

@gemini-code-assist (Contributor, bot) left a comment


Code Review

This pull request adds a new environment variable FLASHINFER_CUBIN_DIR to allow specifying a local directory for pre-downloaded flashinfer cubin files. This is a useful feature for environments without network access to NVIDIA's artifactory. My review focuses on improving the robustness of the implementation and ensuring it aligns with the project's conventions for managing environment variables. I've provided two comments: one to centralize the new environment variable definition for better maintainability, and another to add a more robust check for the provided directory path to prevent potential runtime errors.

Comment on lines 136 to 138

critical

The current check for FLASHINFER_CUBIN_DIR is not robust enough. It will return True even if the environment variable is set to an empty string (''), which is not a valid path and will likely cause failures later. It's better to also verify that the path exists and is a directory.

Suggested change:
- # Since FLASHINFER_CUBIN_DIR defines the pre-downloaded cubins path, when
- # it's true, we could assume the cubins are available.
- if FLASHINFER_CUBIN_DIR is not None:
-     return True
+ # If FLASHINFER_CUBIN_DIR is set and points to a valid directory,
+ # we can assume the cubins are available.
+ if FLASHINFER_CUBIN_DIR and os.path.isdir(FLASHINFER_CUBIN_DIR):
+     return True
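
A small, self-contained illustration of why the stricter check matters, under the assumed misconfiguration where the variable is set to an empty string:

```python
import os

# Simulate a misconfigured environment: the variable is set but empty.
os.environ["FLASHINFER_CUBIN_DIR"] = ""
FLASHINFER_CUBIN_DIR = os.environ.get("FLASHINFER_CUBIN_DIR", None)

# Original check: passes even though there is no usable directory.
print(FLASHINFER_CUBIN_DIR is not None)  # True

# Suggested check: rejects the empty/invalid path.
print(bool(FLASHINFER_CUBIN_DIR and os.path.isdir(FLASHINFER_CUBIN_DIR)))  # False
```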


high

For consistency and maintainability, environment variables should be managed centrally in vllm/envs.py. Please add FLASHINFER_CUBIN_DIR to vllm/envs.py and import it from there. This aligns with how other environment variables are handled in the project.

After adding it to vllm/envs.py, you can change this line to:

Suggested change:
- FLASHINFER_CUBIN_DIR = os.environ.get("FLASHINFER_CUBIN_DIR", None)
+ FLASHINFER_CUBIN_DIR = envs.FLASHINFER_CUBIN_DIR
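
For reference, a hedged sketch of what the centralized definition could look like, assuming vllm/envs.py keeps its usual pattern of one lambda per variable in an `environment_variables` mapping; the surrounding file contents are not shown in this thread.

```python
# vllm/envs.py (sketch, not the actual file)
import os

environment_variables = {
    # ... existing entries ...
    # Directory containing pre-downloaded flashinfer cubin files; when set,
    # the network check for the NV cubin artifact endpoint is skipped.
    "FLASHINFER_CUBIN_DIR": lambda: os.environ.get("FLASHINFER_CUBIN_DIR", None),
}
```

Call sites would then read the value as `envs.FLASHINFER_CUBIN_DIR`, matching the suggested change above.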

@842974287 (Contributor, Author)

There is an RFC PR on flashinfer adding the same env var as well: flashinfer-ai/flashinfer#1462.

@pavanimajety (Contributor)

Thanks for the PR. The idea looks good to me; please address Gemini's comments, they seem reasonable.

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D80035005

@842974287 (Contributor, Author)

@pavanimajety updated the PR, please take another look!

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D80035005

@pavanimajety (Contributor) left a comment


LGTM, please rebase your branch for the CI

@mgoin for CI and further comments

@houseroad (Collaborator) left a comment


Looks good, thanks

@houseroad added the performance (Performance-related issues) and ready (ONLY add when PR is ready to merge/full CI is needed) labels on Aug 20, 2025
@houseroad enabled auto-merge (squash) on August 20, 2025 16:54
auto-merge was automatically disabled August 20, 2025 21:44

Head branch was pushed to by a user without write access

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D80035005

…m-project#22675)

Summary:

Previously, vllm-project#21893 added a check for network access to the NV cubin artifact endpoint; if the check failed, we would opt out of the trtllm attn backend.

This PR adds an env var `FLASHINFER_CUBIN_DIR` that lets users specify a directory containing pre-downloaded cubin files. When `FLASHINFER_CUBIN_DIR` is specified, `has_nvidia_artifactory()` directly returns True.

Reviewed By: frank-wei, Adolfo-Karim

Differential Revision: D80035005
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D80035005

@houseroad enabled auto-merge (squash) on August 22, 2025 07:37
@houseroad merged commit da65bec into vllm-project:main on Aug 22, 2025
37 checks passed