
Conversation

noemotiovon (Collaborator)

Fixes #15330

Adjust the allocation size of `acl_rstd`. The parameter `dims` is set to 3 according to the CANN documentation.

Co-authored-by: Yuchuan <[email protected]>
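
For orientation, a minimal sketch of what the sizing change amounts to is shown below. The names mirror the call reviewed further down, but the code is an assumption rather than the upstream diff: `rstd` stores one f32 per row of `src` (RMS norm reduces over `ne0`), so it is described as a contiguous 3-D tensor.

```cpp
// Hedged sketch, not the exact upstream code: the rstd cache needs
// ne1*ne2*ne3 floats, laid out as a contiguous 3-D ACL tensor.
int64_t acl_rstd_ne[3] = {src->ne[1], src->ne[2], src->ne[3]};
size_t  acl_rstd_nb[3];
acl_rstd_nb[0] = sizeof(float);                       // element stride
for (int i = 1; i < 3; i++) {
    acl_rstd_nb[i] = acl_rstd_nb[i - 1] * acl_rstd_ne[i - 1];  // contiguous
}
size_t rstd_bytes = acl_rstd_nb[2] * acl_rstd_ne[2];  // total allocation
```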

github-actions bot added labels on Sep 3, 2025: ggml (changes relating to the ggml tensor library for machine learning), Ascend NPU (issues specific to Ascend NPUs)
noemotiovon (Collaborator, Author)

@yuchuan-cao, could you review my code?

noemotiovon (Collaborator, Author)

Op test (`test-backend-ops` output):

```
ggml_backend_cann_context: device 0 async operator submission is OFF
ggml_backend_cann_context: device 0 execution mode is GRAPH (acl graph enabled)
Backend 1/2: CANN0
  Device description: Ascend910B4
  Device memory: 30196 MB (29848 MB free)

new_pool_for_device: device 0 use vmm pool
  RMS_NORM(type=f32,ne=[64,5,4,3],v=0,eps=0.000000): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=1,eps=0.000000): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=0,eps=0.000001): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=1,eps=0.000001): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=0,eps=0.000100): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=1,eps=0.000100): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=0,eps=0.100000): OK
  RMS_NORM(type=f32,ne=[64,5,4,3],v=1,eps=0.100000): OK
  11837/11837 tests passed
  Backend CANN0: OK
Backend 2/2: CPU
  Skipping
2/2 backends passed
OK
```

A review comment was left on this call in `ggml_cann_rms_norm`:

```cpp
}
aclTensor* acl_rstd = get_f32_cache_acl_tensor(
    ctx,
    &ctx.rms_norm_zero_tensor_cache.cache,
    ctx.rms_norm_zero_tensor_cache.size,
    src->ne,
    acl_rstd_ne,
```
A collaborator commented:

You can just use `src->ne` as `rstd_ne`; the `acl_rstd_ne` array is not necessary.

noemotiovon (Collaborator, Author) replied:

That approach won't work. If we simply reuse `src->ne`, then for a 3-D tensor the resulting dimensions would be `{ne0, ne1, ne2}`, whereas what we actually need is `{ne1, ne2, ne3}`.
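
To illustrate that indexing concretely, here is a small self-contained sketch (hedged: the names and standalone framing are assumptions; the shape comes from the test log above):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // src shape from the test log: ne = {64, 5, 4, 3}; RMS norm reduces
    // over ne0, so rstd keeps one value per row.
    const int64_t src_ne[4] = {64, 5, 4, 3};

    // Reusing src->ne directly would give {64, 5, 4}; shifting by one
    // yields the shape the 3-D rstd tensor actually needs: {5, 4, 3}.
    int64_t acl_rstd_ne[3];
    int64_t n = 1;
    for (int i = 0; i < 3; i++) {
        acl_rstd_ne[i] = src_ne[i + 1];
        n *= acl_rstd_ne[i];
    }
    printf("rstd ne = {%lld, %lld, %lld}, %lld elements\n",
           (long long) acl_rstd_ne[0], (long long) acl_rstd_ne[1],
           (long long) acl_rstd_ne[2], (long long) n);  // {5, 4, 3}, 60
    return 0;
}
```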

yuchuan-cao (Contributor) left a comment:

I suppose preparing a `src->ne`-sized buffer would work fine, and preparing an `acl_rstd_ne`-sized buffer is the more memory-efficient solution.

I suggest you double-check the change with llama-bench. The confusing thing is that the memory problem in #15330 doesn't occur in test-backend-ops or llama-cli; it only emerges when running llama-bench. I haven't been able to find out why.
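
(For scale, using the shape from the test log above, `ne = [64, 5, 4, 3]`: a `src->ne`-sized f32 buffer holds 64·5·4·3 = 3840 elements, while the `{ne1, ne2, ne3}` buffer holds 5·4·3 = 60, i.e. a factor of ne0 = 64 less memory per cached tensor.)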

noemotiovon (Collaborator, Author) commented on Sep 4, 2025:

@yuchuan-cao, thanks for the suggestion. I ran the llama-bench test, and it works fine.

Script:

```sh
./bin/llama-bench -fa 0 -n 0 -p 512 -r 50 -m /home/lichenguang25/.ollama/models/blobs/sha256-6f96e01a3f550ca08aea1e5725bb8d5a7eccc6f281c30417e9d380b8c46467bd -ngl 99
```

Log:

| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen2 1B F16                   | 942.43 MiB |   494.03 M | CANN       |  99 |           pp512 |      3261.30 ± 18.48 |

@hipudding merged commit 239b60e into ggml-org:master on Sep 4, 2025 (90 of 91 checks passed).
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request on Sep 4, 2025 (a merge of 72 commits from origin/master, including ggml-org#15760).
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Sep 5, 2025
…g-model-disabled-agent-prefill

* origin/master: (84 commits)
CUDA: fastdiv, launch bounds for mmvq + q8_1 quant (ggml-org#15802)
tests : add --list-ops and --show-coverage options (ggml-org#15745)
gguf: gguf_writer refactor (ggml-org#15691)
kv-cache : fix SWA checks + disable cacheless iSWA (ggml-org#15811)
model-conversion : add --embeddings flag to modelcard.template [no ci] (ggml-org#15801)
chat : fixed crash when Hermes 2 <tool_call> had a newline before it (ggml-org#15639)
chat : nemotron thinking & toolcalling support (ggml-org#15676)
scripts : add Jinja tester PySide6 simple app (ggml-org#15756)
llama : add support for EmbeddingGemma 300m (ggml-org#15798)
metal : Add template specialization for mul_mm_id w/ ne20 == 10 (ggml-org#15799)
llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (ggml-org#15791)
CANN: Refactor ND to NZ workspace to be per-device (ggml-org#15763)
server: add exceed_context_size_error type (ggml-org#15780)
Document the new max GPU layers default in help (ggml-org#15771)
ggml: add ops for WAN video model (cuda && cpu) (ggml-org#15669)
CANN: Fix precision issue on 310I DUO multi-devices (ggml-org#15784)
opencl: add hs=40 to FA (ggml-org#15758)
CANN: fix acl_rstd allocation size in ggml_cann_rms_norm (ggml-org#15760)
vulkan: fix mmv subgroup16 selection (ggml-org#15775)
vulkan: don't use std::string in load_shaders, to improve compile time (ggml-org#15724)
...
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request on Sep 7, 2025; its commit message repeats this PR's description.
Labels: Ascend NPU (issues specific to Ascend NPUs), ggml (changes relating to the ggml tensor library for machine learning)

Closed issue: Misc. bug: ggml_cann_rms_norm causes CANN kernel crash when running llama-bench (#15330)

3 participants