
Conversation

@yewentao256 (Collaborator) commented on Aug 26, 2025

Purpose

Fix the following narrowing-conversion warnings emitted when compiling csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::_256, cute::_128>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 272

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::C<256>, cute::C<64>>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 276

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::C<256>, cute::C<32>>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 280

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::C<256>, cute::C<16>>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 284

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::_128, cute::_256>, ClusterShape_MNK=cute::tuple<cute::_2, cute::_1, cute::_1>]" at line 288

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::_128, cute::_256>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 292

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::_128, cute::_128>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 296

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::C<128>, cute::C<64>>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 300

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::C<128>, cute::C<32>>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 304

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu(215): warning #2361-D: invalid narrowing conversion from "signed long" to "int"
          S_ptr, stride_S, group_size};
                           ^
          detected during instantiation of "at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor &, const at::Tensor &, const at::Tensor &, int64_t, const at::Tensor &, const at::Tensor &, const std::optional<c10::ScalarType> &) [with TileShape_MN=cute::tuple<cute::C<128>, cute::C<16>>, ClusterShape_MNK=cute::tuple<cute::_1, cute::_1, cute::_1>]" at line 308

/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<256>, cute::C<128> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:272:30:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
  213 |     MainloopArguments mainloop_arguments{
      |                                                                                                   ^         
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<256>, cute::C<64> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:276:29:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<256>, cute::C<32> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:280:29:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<256>, cute::C<16> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:284:29:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<128>, cute::C<256> >; ClusterShape_MNK = cute::tuple<cute::C<2>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:288:30:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<128>, cute::C<256> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:292:30:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<128>, cute::C<128> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:296:30:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<128>, cute::C<64> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:300:29:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<128>, cute::C<32> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:304:29:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu: In instantiation of ‘static at::Tensor vllm::cutlass_w4a8::W4A8GemmKernel<TileShape_MN, ClusterShape_MNK>::mm(const at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, const std::optional<c10::ScalarType>&) [with TileShape_MN = cute::tuple<cute::C<128>, cute::C<16> >; ClusterShape_MNK = cute::tuple<cute::C<1>, cute::C<1>, cute::C<1> >; int64_t = long int]’:
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:308:29:   required from here
/home/wentao/vllm/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu:213:99: warning: narrowing conversion of ‘group_size’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Wnarrowing]

Test

After this change, the build completes without the warnings:

[2/3] Install the project...
-- Install configuration: "Release"
-- Up-to-date: /home/wentao/vllm/vllm/cumem_allocator.abi3.so
-- Installing: /home/wentao/vllm/vllm/_C.abi3.so
-- Set non-toolchain portion of runtime path of "/home/wentao/vllm/vllm/_C.abi3.so" to ""
...
-- Up-to-date: /home/wentao/vllm/vllm/vllm_flash_attn/flash_attn_interface.py
-- Up-to-date: /home/wentao/vllm/vllm/vllm_flash_attn/__init__.py

Signed-off-by: yewentao256 <[email protected]>
@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request addresses a compilation warning in w4a8_mm_entry.cu caused by a narrowing conversion from int64_t to int for the group_size parameter. The solution implements a runtime check to validate that group_size is within the representable range of an int before casting it. This change is correct, safe, and effectively resolves the compiler warning. The updated code is clean and I have no further suggestions for improvement.

@yewentao256 (Collaborator, Author) commented

@mgoin CC

@mgoin (Member) left a comment

LGTM, thanks

@mgoin mgoin enabled auto-merge (squash) August 29, 2025 18:57
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 29, 2025
@DarkLight1337
Copy link
Member

Please merge from main to fix CI

@vllm-bot vllm-bot merged commit c4ed78b into vllm-project:main Sep 3, 2025
68 of 71 checks passed
mayuyuace pushed a commit to mayuyuace/vllm that referenced this pull request Sep 3, 2025
845473182 pushed a commit to 845473182/vllm that referenced this pull request Sep 3, 2025
* 'main' of https://github.com/845473182/vllm: (457 commits)
  [BugFix] Fix routed_scaling_factor double mul for dots1 and glm4 MoE models (vllm-project#24132)
  [Misc] Add check for dual_chunk_attention (vllm-project#24070)
  [Doc]: fix typos in Python comments (vllm-project#24115)
  [Doc]: fix typos in Python comments (vllm-project#24093)
  [Compile] Fix Compile Warning for `w4a8_mm_entry.cu` (vllm-project#23660)
  fix some typos (vllm-project#24071)
  [V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing (vllm-project#23656)
  Upgrade xgrammar to 0.1.23 (vllm-project#22988)
  Update release pipeline post PyTorch 2.8.0 update (vllm-project#24073)
  [XPU] Fix the bug of LoRA logits on the XPU platform (vllm-project#24081)
  [CI/Build] Disable SiluMul NVFP4 quant fusion tests (vllm-project#24121)
  [Bug] R1 Accuracy: Fix `routed_scaling_factor` Double Mul Issue (vllm-project#24119)
  [AMD][Kernel][Bugfix] Cast offsets tensor bn to tl.int64 to avoid GPU segfault (vllm-project#23692)
  [CI] Enable all hf transformers baselines in test_hybrid (vllm-project#23936)
  [Log] Only Print Profiler Results on Rank 0 (vllm-project#23370)
  Fix weights loading for Apertus (vllm-project#24100)
  [Metrics] Deprecate TPOT in favor of ITL (vllm-project#24110)
  [Bugfix] Fix packed_factor missing attribute error (vllm-project#23902)
  Run ruff format on a few files. (vllm-project#24075)
  [Bugfix] Fix transform_config parsing in Compressed Tensors (vllm-project#23945)
  ...
842974287 pushed a commit to 842974287/vllm that referenced this pull request Sep 3, 2025

Signed-off-by: yewentao256 <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: Shiyan Deng <[email protected]>
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
LopezCastroRoberto pushed a commit to LopezCastroRoberto/vllm that referenced this pull request Sep 11, 2025

Signed-off-by: yewentao256 <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: LopezCastroRoberto <[email protected]>
cboss6 pushed a commit to cboss6/vllm that referenced this pull request Sep 16, 2025

Signed-off-by: yewentao256 <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: bruceszchen <[email protected]>