Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 #15160
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
This pull request was exported from Phabricator. Differential Revision: D71355454
Force-pushed from 4625c88 to b065ee3
LGTM. Also cc @tlrmchlsmth
Force-pushed from b065ee3 to 98a5a45
Can you update this after #15282 has been merged?
Force-pushed from 98a5a45 to 3e5e7b6
Rebased. Hope the CI can be fixed. Keep running into timeouts... sigh.
Kernels tests aren't failing on main, so maybe there is indeed something wrong with this PR?
kernel test-2:

Attempt 1, failed on:
[2025-03-22T08:06:54Z] kernels/test_marlin_gemm.py::test_gptq_marlin_gemm[False-False-False-False-mnk_factors1-128-quant_type1-256-128] PASSED
[2025-03-22T09:27:08Z] kernels/test_marlin_gemm.py::test_gptq_marlin_gemm[False-False-False-False-mnk_factors2--1-quant_type1-256-128] # Received cancellation signal, interrupting

Attempt 2, failed on:
[2025-03-23T00:44:47Z] kernels/test_marlin_gemm.py::test_gptq_marlin_gemm[False-False-False-False-mnk_factors1-128-quant_type1-256-128] PASSED
[2025-03-23T02:03:09Z] kernels/test_marlin_gemm.py::test_gptq_marlin_gemm[False-False-False-False-mnk_factors2--1-quant_type1-256-128] # Received cancellation signal, interrupting

Attempt 3, failed on:
[2025-03-23T06:29:36Z] kernels/test_marlin_gemm.py::test_gptq_marlin_gemm[False-False-False-False-mnk_factors1-64-quant_type0-256-128] PASSED
[2025-03-23T06:29:36Z] kernels/test_marlin_gemm.py::test_gptq_marlin_gemm[False-False-False-False-mnk_factors1-128-quant_type1-256-128] PASSED

Seems problematic, let me take a close look.
Force-pushed from c3d7cd1 to 1d5811f
I think I found the problematic place: `int delta_first = iters * blockIdx.x - col_first;` — this one shouldn't be changed to `auto`, since the type would deduce to `uint`: `iters` and `col_first` are `int`, but `blockIdx.x` is `uint`.
…/awq_marlin_repack.cu +10

Summary: CUDA kernel variables matching the pattern `(thread|block|grid).(Idx|Dim).(x|y|z)` [have the data type `uint`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#built-in-variables). Many programmers mistakenly rely on implicit casts to turn these data types into `int`. In fact, the [CUDA Programming Guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/) itself is inconsistent and incorrect in its use of data types in programming examples. The result of these implicit casts is that our kernels may give unexpected results when exposed to large datasets, i.e., those exceeding ~2B items. While we now have linters in place to prevent simple mistakes (D71236150), our codebase has many problematic instances. This diff fixes some of them.

Differential Revision: D71355454

Signed-off-by: Lu Fang <[email protected]>
Force-pushed from 1d5811f to 130a5d0
The problems should be fixed, cc: @DarkLight1337
Pre-commit is failing
It shows all 60 checks passed
Weird, the mobile app showed that it's failing. I'm back on my PC now and everything looks fine, sorry for the confusion!
…/awq_marlin_repack.cu +10 (vllm-project#15160) Signed-off-by: Lu Fang <[email protected]> Co-authored-by: Richard Barnes <[email protected]>