Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 #15160
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
This pull request was exported from Phabricator. Differential Revision: D71355454
Force-pushed from 4625c88 to b065ee3
LGTM. Also cc @tlrmchlsmth
Force-pushed from b065ee3 to 98a5a45
Can you update this after #15282 has been merged?
Force-pushed from 98a5a45 to 3e5e7b6
Rebased. Hope the CI can be fixed. Keep running into timeouts... sigh.
Kernels tests aren't failing on main, so maybe there is indeed something wrong with this PR?
kernel test-2:

Attempt 1, failed on:
[2025-03-22T08:06:54Z] kernels/test_marlin_gemm.py::test_gptq_marlin_gemm[False-False-False-False-mnk_factors1-128-quant_type1-256-128] PASSED
[2025-03-22T09:27:08Z] kernels/test_marlin_gemm.py::test_gptq_marlin_gemm[False-False-False-False-mnk_factors2--1-quant_type1-256-128] # Received cancellation signal, interrupting

Attempt 2, failed on:
[2025-03-23T00:44:47Z] kernels/test_marlin_gemm.py::test_gptq_marlin_gemm[False-False-False-False-mnk_factors1-128-quant_type1-256-128] PASSED
[2025-03-23T02:03:09Z] kernels/test_marlin_gemm.py::test_gptq_marlin_gemm[False-False-False-False-mnk_factors2--1-quant_type1-256-128] # Received cancellation signal, interrupting

Attempt 3, failed on:
[2025-03-23T06:29:36Z] kernels/test_marlin_gemm.py::test_gptq_marlin_gemm[False-False-False-False-mnk_factors1-64-quant_type0-256-128] PASSED
[2025-03-23T06:29:36Z] kernels/test_marlin_gemm.py::test_gptq_marlin_gemm[False-False-False-False-mnk_factors1-128-quant_type1-256-128] PASSED

Seems problematic, let me take a close look.
Force-pushed from c3d7cd1 to 1d5811f
I think I found the problematic place: `int delta_first = iters * blockIdx.x - col_first;` — this one shouldn't be changed to `auto`, since the type would deduce to `uint`: `iters` and `col_first` are `int`, but `blockIdx.x` is `uint`.
…/awq_marlin_repack.cu +10

Summary: CUDA kernel variables matching the pattern `(thread|block|grid).(Idx|Dim).(x|y|z)` [have the data type `uint`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#built-in-variables). Many programmers mistakenly rely on implicit casts to turn these data types into `int`. In fact, the [CUDA Programming Guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/) itself is inconsistent and incorrect in its use of data types in programming examples. The result of these implicit casts is that our kernels may give unexpected results when exposed to large datasets, i.e., those exceeding ~2B items. While we now have linters in place to prevent simple mistakes (D71236150), our codebase has many problematic instances. This diff fixes some of them.

Differential Revision: D71355454

Signed-off-by: Lu Fang <[email protected]>
Force-pushed from 1d5811f to 130a5d0
The problems should be fixed, cc: @DarkLight1337
Pre-commit is failing
It shows all 60 checks passed
Weird, the mobile app showed that it's failing. I'm back on my PC now and everything looks fine, sorry for the confusion!
…/awq_marlin_repack.cu +10 (vllm-project#15160) Signed-off-by: Lu Fang <[email protected]> Co-authored-by: Richard Barnes <[email protected]>