
Conversation

@yewentao256 (Collaborator) commented on Aug 26, 2025

Purpose

Fix the following narrowing-conversion warnings emitted when compiling csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::_256, cute::_128>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 272

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::C<256>, cute::C<64>>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 276

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::C<256>, cute::C<32>>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 280

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::C<256>, cute::C<16>>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 284

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::_128, cute::_256>, ClusterShape_MNK=cute::tuple<cute::_2, cute::_1, cute::_1>]" at line 288

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::_128, cute::_256>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 292

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::_128, cute::_128>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 296

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::C<128>, cute::C<64>>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 300

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::C<128>, cute::C<32>>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 304

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::C<128>, cute::C<16>>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 308

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<256>, cute::C<128> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:272:30:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
  213 |     MainloopArguments mainloop_arguments{
      |                                                                                                   ^         
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<256>, cute::C<64> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:276:29:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<256>, cute::C<32> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:280:29:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<256>, cute::C<16> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:284:29:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<128>, cute::C<256> >; ClusterShape_MNK = cute::tuple<cute::C<2>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:288:30:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<128>, cute::C<256> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:292:30:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<128>, cute::C<128> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:296:30:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<128>, cute::C<64> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:300:29:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<128>, cute::C<32> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:304:29:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<128>, cute::C<16> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:308:29:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]

Test

After this change, the build completes without the warnings:

[2/3] Install the project...
-- Install configuration: "Release"
-- Up-to-date: /home/wentao/vllm/vllm/cumem_allocator.abi3.so
-- Installing: /home/wentao/vllm/vllm/_C.abi3.so
-- Set non-toolchain portion of runtime path of "/home/wentao/vllm/vllm/_C.abi3.so" to ""
...
-- Up-to-date: /home/wentao/vllm/vllm/vllm_flash_attn/flash_attn_interface.py
-- Up-to-date: /home/wentao/vllm/vllm/vllm_flash_attn/__init__.py

Signed-off-by: yewentao256 <[email protected]>
@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request addresses a compilation warning in w4a8_mm_entry.cu caused by a narrowing conversion from int64_t to int for the group_size parameter. The solution implements a runtime check to validate that group_size is within the representable range of an int before casting it. This change is correct, safe, and effectively resolves the compiler warning. The updated code is clean and I have no further suggestions for improvement.

@yewentao256 (Collaborator, Author) commented

@mgoin CC

@mgoin (Member) left a comment

LGTM, thanks

@mgoin mgoin enabled auto-merge (squash) August 29, 2025 18:57
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 29, 2025
@DarkLight1337
Copy link
Member

Please merge from main to fix CI

@vllm-bot vllm-bot merged commit c4ed78b into vllm-project:main Sep 3, 2025
68 of 71 checks passed
mayuyuace pushed a commit to mayuyuace/vllm that referenced this pull request Sep 3, 2025
845473182 pushed a commit to 845473182/vllm that referenced this pull request Sep 3, 2025
* 'main' of https://github.com/845473182/vllm: (457 commits)
  [BugFix] Fix routed_scaling_factor double mul for dots1 and glm4 MoE models (vllm-project#24132)
  [Misc] Add check for dual_chunk_attention (vllm-project#24070)
  [Doc]: fix typos in Python comments (vllm-project#24115)
  [Doc]: fix typos in Python comments (vllm-project#24093)
  [Compile] Fix Compile Warning for `w4a8_mm_entry.cu` (vllm-project#23660)
  fix some typos (vllm-project#24071)
  [V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing (vllm-project#23656)
  Upgrade xgrammar to 0.1.23 (vllm-project#22988)
  Update release pipeline post PyTorch 2.8.0 update (vllm-project#24073)
  [XPU] Fix the bug of LoRA logits on the XPU platform (vllm-project#24081)
  [CI/Build] Disable SiluMul NVFP4 quant fusion tests (vllm-project#24121)
  [Bug] R1 Accuracy: Fix `routed_scaling_factor` Double Mul Issue (vllm-project#24119)
  [AMD][Kernel][Bugfix] Cast offsets tensor bn to tl.int64 to avoid GPU segfault (vllm-project#23692)
  [CI] Enable all hf transformers baselines in test_hybrid (vllm-project#23936)
  [Log] Only Print Profiler Results on Rank 0 (vllm-project#23370)
  Fix weights loading for Apertus (vllm-project#24100)
  [Metrics] Deprecate TPOT in favor of ITL (vllm-project#24110)
  [Bugfix] Fix packed_factor missing attribute error (vllm-project#23902)
  Run ruff format on a few files. (vllm-project#24075)
  [Bugfix] Fix transform_config parsing in Compressed Tensors (vllm-project#23945)
  ...
842974287 pushed a commit to 842974287/vllm that referenced this pull request Sep 3, 2025

Signed-off-by: yewentao256 <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: Shiyan Deng <[email protected]>
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
LopezCastroRoberto pushed a commit to LopezCastroRoberto/vllm that referenced this pull request Sep 11, 2025

Signed-off-by: yewentao256 <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: LopezCastroRoberto <[email protected]>
cboss6 pushed a commit to cboss6/vllm that referenced this pull request Sep 16, 2025

Signed-off-by: yewentao256 <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: bruceszchen <[email protected]>