CANN: Refactor ND to NZ workspace to be per-device in Ascend backend #15763
Merged
Conversation
- Replaced the previous single global ND→NZ workspace with a per-device cache using `unordered_map` keyed by device ID.
- The functions `release_nz_workspace`, `relloc_nz_workspace`, and `get_nz_workspace` now manage the workspace independently for each device, preventing memory conflicts in multi-device / pipeline-parallel scenarios.
- This change fixes potential precision issues caused by workspace overwrites when multiple devices perform ND→NZ conversions concurrently.

Co-authored-by: hipudding <[email protected]>
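The per-device cache described above can be sketched roughly as follows. This is a simplified illustration, not the actual CANN backend code: the struct, the `std::malloc`/`std::free` calls (the real backend would use the ACL runtime allocator, e.g. `aclrtMalloc`/`aclrtFree`), and the bare `int` device parameter are all assumptions made for the sketch; only the three function names come from the PR description.

```cpp
#include <cstdlib>
#include <unordered_map>

// Hypothetical per-device scratch buffer (simplified stand-in for the
// real workspace object in the CANN backend).
struct nz_workspace {
    void  *ptr  = nullptr;
    size_t size = 0;
};

// One workspace per device ID, replacing the former single global buffer.
static std::unordered_map<int, nz_workspace> g_nz_workspaces;

// Free the workspace belonging to one device only; other devices' buffers
// are untouched, which is the point of the refactor.
static void release_nz_workspace(int device) {
    auto it = g_nz_workspaces.find(device);
    if (it != g_nz_workspaces.end()) {
        std::free(it->second.ptr);
        g_nz_workspaces.erase(it);
    }
}

// Grow the workspace for one device if the requested size exceeds the
// current allocation. The old contents are scratch data, so no copy is done.
static void relloc_nz_workspace(int device, size_t size) {
    nz_workspace &ws = g_nz_workspaces[device];
    if (ws.size < size) {
        std::free(ws.ptr);
        ws.ptr  = std::malloc(size);
        ws.size = size;
    }
}

// Fetch the current workspace pointer for a device (nullptr if none yet).
static void *get_nz_workspace(int device) {
    auto it = g_nz_workspaces.find(device);
    return it == g_nz_workspaces.end() ? nullptr : it->second.ptr;
}
```

Because each device ID maps to its own entry, two devices doing ND→NZ conversions concurrently no longer overwrite each other's buffer, which is the precision bug the PR fixes.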
Model test on 2 devices with nd2nz:

# script
GGML_CANN_WEIGHT_NZ=1 ./bin/llama-cli -m /home/lichenguang25/.ollama/models/blobs/sha256-6f96e01a3f550ca08aea1e5725bb8d5a7eccc6f281c30417e9d380b8c46467bd \
    -p "Building a website can be done in 10 steps:" -ngl 32

# log
build: 6362 (a83dc461) with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for aarch64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device CANN0 (Ascend910B4) - 29848 MiB free
llama_model_load_from_file_impl: using device CANN1 (Ascend910B4) - 29843 MiB free
......
user
Building a website can be done in 10 steps:
assistant
Here’s a list of 10 steps to build a website, categorized by functionality and complexity:
### 1. **Define Your Objectives and Audience**
- **Objective**: Understand what you want to achieve with the website.
- **Audience**: Identify who your target audience is, their needs, and what they expect from your website.
### 2. **Set Up Your Development Environment**
- **Software**: Choose an appropriate web development platform. Popular options include HTML5, CSS3, and JavaScript.
- **Hardware**: Ensure you have a reliable computer and necessary software installed.
### 3. **Choose a Domain Name**
- **Purpose**: Ensure your domain name is unique and reflects your brand.
- **Features**: Consider features like SSL, email support, and domain registration.
### 4. **Design Your Website**
- **Website Structure**: Plan the layout and structure of your website.
- **Content Creation**: Create high-quality, relevant content.
- **Design Elements**: Incorporate design elements that enhance user experience.
### 5. **Choose a Content Management System (CMS)**
- **Features**: Select a CMS that suits your needs (e.g., WordPress, Joomla, Drupal).
- **Ease of Use**: Make sure the CMS is easy to use and maintain.
### 6. **Develop Your Website**
- **Coding**: Use the CMS to design your website and build the pages.
- **Testing**: Test your website thoroughly to ensure all functionalities are working as expected.
### 7. **Implement User Authentication**
- **Features**: Implement a secure login system and ensure user privacy.
- **Authentication**: Use secure password hashing and encryption methods.
### 8. **Deploy Your Website**
- **Hosting**: Choose a hosting provider that suits your needs.
- **Backup**: Regularly backup your website and data.
### 9. **Launch Your Website**
- **Launch Date**: Set a launch date for your website.
- **Launch Event**: Organize a launch event to introduce your website to your audience.
### 10. **Monitor and Update**
- **Regular Updates**: Regularly update your website to fix bugs and improve functionality.
- **Analytics**: Use analytics tools to track user behavior and improve your website.
### Additional Steps
- **SEO Optimization**: Optimize your website for search engines to increase visibility.
- **Social Media Integration**: Integrate social media features to increase engagement.
- **Content Marketing**: Publish valuable content to attract visitors.
- **Analytics**: Use analytics tools to track website performance and improve user experience.
### Best Practices
- **SEO**: Use optimized content, meta tags, and mobile-friendly design.
- **Security**: Ensure your website is secure with strong authentication and encryption.
- **User Experience**: Keep the website user-friendly and responsive.
By following these steps, you can build a robust and functional website that meets your needs and expectations.
>
llama_perf_sampler_print: sampling time = 166.28 ms / 627 runs ( 0.27 ms per token, 3770.66 tokens per second)
llama_perf_context_print: load time = 1726.04 ms
llama_perf_context_print: prompt eval time = 36.21 ms / 20 tokens ( 1.81 ms per token, 552.35 tokens per second)
llama_perf_context_print: eval time = 3310.65 ms / 606 runs ( 5.46 ms per token, 183.05 tokens per second)
llama_perf_context_print: total time = 4368.10 ms / 626 tokens
llama_perf_context_print: graphs reused = 603
hipudding reviewed on Sep 3, 2025
Signed-off-by: noemotiovon <[email protected]>
hipudding reviewed on Sep 4, 2025
Signed-off-by: noemotiovon <[email protected]>
hipudding reviewed on Sep 4, 2025
Signed-off-by: noemotiovon <[email protected]>
hipudding approved these changes on Sep 4, 2025
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request on Sep 4, 2025

…upport

* origin/master: (72 commits)
  metal : Add template specialization for mul_mm_id w/ ne20 == 10 (ggml-org#15799)
  llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (ggml-org#15791)
  CANN: Refactor ND to NZ workspace to be per-device (ggml-org#15763)
  server: add exceed_context_size_error type (ggml-org#15780)
  Document the new max GPU layers default in help (ggml-org#15771)
  ggml: add ops for WAN video model (cuda && cpu) (ggml-org#15669)
  CANN: Fix precision issue on 310I DUO multi-devices (ggml-org#15784)
  opencl: add hs=40 to FA (ggml-org#15758)
  CANN: fix acl_rstd allocation size in ggml_cann_rms_norm (ggml-org#15760)
  vulkan: fix mmv subgroup16 selection (ggml-org#15775)
  vulkan: don't use std::string in load_shaders, to improve compile time (ggml-org#15724)
  vulkan : update ggml_vk_instance_validation_ext_available (ggml-org#15666)
  ggml vulkan: add hardsigmoid and hardswish operations (ggml-org#15762)
  CUDA: Optimize `rms_norm_f32` kernel and its fused variants, giving 1-6% perf E2E (ggml-org#15715)
  model-conversion : fix pyright errors (ggml-org#15770)
  sampling : optimize dist sampler (ggml-org#15704)
  llama : fix incorrect model type for Gemma 270M (ggml-org#15764)
  model-conversion : remove hardcoded /bin/bash shebangs [no ci] (ggml-org#15765)
  CANN: Add RoPE contiguous check for 310I DUP device (ggml-org#15735)
  ggml-cpu : optimize RVV kernels (ggml-org#15720)
  ...
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request on Sep 5, 2025

…g-model-disabled-agent-prefill

* origin/master: (84 commits)
  CUDA: fastdiv, launch bounds for mmvq + q8_1 quant (ggml-org#15802)
  tests : add --list-ops and --show-coverage options (ggml-org#15745)
  gguf: gguf_writer refactor (ggml-org#15691)
  kv-cache : fix SWA checks + disable cacheless iSWA (ggml-org#15811)
  model-conversion : add --embeddings flag to modelcard.template [no ci] (ggml-org#15801)
  chat : fixed crash when Hermes 2 <tool_call> had a newline before it (ggml-org#15639)
  chat : nemotron thinking & toolcalling support (ggml-org#15676)
  scripts : add Jinja tester PySide6 simple app (ggml-org#15756)
  llama : add support for EmbeddingGemma 300m (ggml-org#15798)
  metal : Add template specialization for mul_mm_id w/ ne20 == 10 (ggml-org#15799)
  llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (ggml-org#15791)
  CANN: Refactor ND to NZ workspace to be per-device (ggml-org#15763)
  server: add exceed_context_size_error type (ggml-org#15780)
  Document the new max GPU layers default in help (ggml-org#15771)
  ggml: add ops for WAN video model (cuda && cpu) (ggml-org#15669)
  CANN: Fix precision issue on 310I DUO multi-devices (ggml-org#15784)
  opencl: add hs=40 to FA (ggml-org#15758)
  CANN: fix acl_rstd allocation size in ggml_cann_rms_norm (ggml-org#15760)
  vulkan: fix mmv subgroup16 selection (ggml-org#15775)
  vulkan: don't use std::string in load_shaders, to improve compile time (ggml-org#15724)
  ...
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request on Sep 7, 2025

* CANN: Refactor ND to NZ workspace to be per-device in Ascend backend
  - Replaced the previous single global ND→NZ workspace with a per-device cache using `unordered_map` keyed by device ID.
  - Functions `release_nz_workspace`, `relloc_nz_workspace`, and `get_nz_workspace` now manage the workspace independently for each device, preventing memory conflicts in multi-device / pipeline-parallel scenarios.
  - This change fixes potential precision issues caused by workspace overwrites when multiple devices perform ND→NZ conversions concurrently.
  Co-authored-by: hipudding <[email protected]>
* refactor
  Signed-off-by: noemotiovon <[email protected]>
* rename
  Signed-off-by: noemotiovon <[email protected]>
* fix review comments
  Signed-off-by: noemotiovon <[email protected]>
---------
Signed-off-by: noemotiovon <[email protected]>
Co-authored-by: hipudding <[email protected]>
Labels
- Ascend NPU: issues specific to Ascend NPUs
- ggml: changes relating to the ggml tensor library for machine learning