noemotiovon
Collaborator

  • Replaced the previous single global ND→NZ workspace with a per-device cache using unordered_map keyed by device ID.
  • Functions release_nz_workspace, relloc_nz_workspace, and get_nz_workspace now manage workspace independently for each device, preventing memory conflicts in multi-device / pipeline parallel scenarios.
  • This change fixes potential precision issues caused by workspace overwrites when multiple devices perform ND→NZ conversions concurrently.
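
The per-device cache described above could look roughly like the following. This is a minimal host-only sketch, not the code from this PR: the `nz_workspace` struct, the `g_nz_workspaces` map, and the `int device` parameter on each function are assumptions made for illustration, and `std::malloc`/`std::free` stand in for the device allocator so the example runs without an NPU.

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Hypothetical stand-in for one device's ND->NZ scratch buffer.
// A real backend would hold device memory here; host memory is used
// only so that this sketch is runnable anywhere.
struct nz_workspace {
    void * ptr       = nullptr;  // start of the scratch buffer
    size_t allocated = 0;        // current capacity in bytes
};

// One workspace per device ID instead of a single global buffer.
static std::unordered_map<int, nz_workspace> g_nz_workspaces;

// Free the workspace that belongs to `device`, if one exists.
void release_nz_workspace(int device) {
    auto it = g_nz_workspaces.find(device);
    if (it == g_nz_workspaces.end()) {
        return;
    }
    std::free(it->second.ptr);
    g_nz_workspaces.erase(it);
}

// Grow the workspace of `device` to at least `new_size` bytes.
// Old contents are discarded because the buffer is scratch space.
void relloc_nz_workspace(int device, size_t new_size) {
    nz_workspace & ws = g_nz_workspaces[device];
    if (new_size <= ws.allocated) {
        return;
    }
    std::free(ws.ptr);
    ws.ptr       = std::malloc(new_size);
    ws.allocated = ws.ptr ? new_size : 0;
}

// Return the workspace pointer for `device`, or nullptr if none exists.
void * get_nz_workspace(int device) {
    auto it = g_nz_workspaces.find(device);
    return it == g_nz_workspaces.end() ? nullptr : it->second.ptr;
}
```

Because each device ID maps to its own buffer, a conversion on device 1 can no longer overwrite data that device 0 is still reading. The sketch does no locking, so it assumes the map is only touched from one host thread at a time.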

Co-authored-by: hipudding [email protected]

@noemotiovon
Collaborator Author

Model test on 2 devices with nd2nz:

# script
GGML_CANN_WEIGHT_NZ=1 ./bin/llama-cli -m /home/lichenguang25/.ollama/models/blobs/sha256-6f96e01a3f550ca08aea1e5725bb8d5a7eccc6f281c30417e9d380b8c46467bd -p "Building a website can be done in 10 steps:" -ngl 32
# log
build: 6362 (a83dc461) with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for aarch64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device CANN0 (Ascend910B4) - 29848 MiB free
llama_model_load_from_file_impl: using device CANN1 (Ascend910B4) - 29843 MiB free
......
user
Building a website can be done in 10 steps:
assistant
Here’s a list of 10 steps to build a website, categorized by functionality and complexity:

### 1. **Define Your Objectives and Audience**
   - **Objective**: Understand what you want to achieve with the website.
   - **Audience**: Identify who your target audience is, their needs, and what they expect from your website.

### 2. **Set Up Your Development Environment**
   - **Software**: Choose an appropriate web development platform. Popular options include HTML5, CSS3, and JavaScript.
   - **Hardware**: Ensure you have a reliable computer and necessary software installed.

### 3. **Choose a Domain Name**
   - **Purpose**: Ensure your domain name is unique and reflects your brand.
   - **Features**: Consider features like SSL, email support, and domain registration.

### 4. **Design Your Website**
   - **Website Structure**: Plan the layout and structure of your website.
   - **Content Creation**: Create high-quality, relevant content.
   - **Design Elements**: Incorporate design elements that enhance user experience.

### 5. **Choose a Content Management System (CMS)**
   - **Features**: Select a CMS that suits your needs (e.g., WordPress, Joomla, Drupal).
   - **Ease of Use**: Make sure the CMS is easy to use and maintain.

### 6. **Develop Your Website**
   - **Coding**: Use the CMS to design your website and build the pages.
   - **Testing**: Test your website thoroughly to ensure all functionalities are working as expected.

### 7. **Implement User Authentication**
   - **Features**: Implement a secure login system and ensure user privacy.
   - **Authentication**: Use secure password hashing and encryption methods.

### 8. **Deploy Your Website**
   - **Hosting**: Choose a hosting provider that suits your needs.
   - **Backup**: Regularly backup your website and data.

### 9. **Launch Your Website**
   - **Launch Date**: Set a launch date for your website.
   - **Launch Event**: Organize a launch event to introduce your website to your audience.

### 10. **Monitor and Update**
   - **Regular Updates**: Regularly update your website to fix bugs and improve functionality.
   - **Analytics**: Use analytics tools to track user behavior and improve your website.

### Additional Steps
- **SEO Optimization**: Optimize your website for search engines to increase visibility.
- **Social Media Integration**: Integrate social media features to increase engagement.
- **Content Marketing**: Publish valuable content to attract visitors.
- **Analytics**: Use analytics tools to track website performance and improve user experience.

### Best Practices
- **SEO**: Use optimized content, meta tags, and mobile-friendly design.
- **Security**: Ensure your website is secure with strong authentication and encryption.
- **User Experience**: Keep the website user-friendly and responsive.

By following these steps, you can build a robust and functional website that meets your needs and expectations.

> 
llama_perf_sampler_print:    sampling time =     166.28 ms /   627 runs   (    0.27 ms per token,  3770.66 tokens per second)
llama_perf_context_print:        load time =    1726.04 ms
llama_perf_context_print: prompt eval time =      36.21 ms /    20 tokens (    1.81 ms per token,   552.35 tokens per second)
llama_perf_context_print:        eval time =    3310.65 ms /   606 runs   (    5.46 ms per token,   183.05 tokens per second)
llama_perf_context_print:       total time =    4368.10 ms /   626 tokens
llama_perf_context_print:    graphs reused =        603

noemotiovon changed the title from "CANN:Refactor ND to NZ workspace to be per-device in Ascend backend" to "CANN: Refactor ND to NZ workspace to be per-device in Ascend backend" on Sep 3, 2025
noemotiovon added the "Ascend NPU" label (issues specific to Ascend NPUs) on Sep 3, 2025
github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) on Sep 3, 2025
Signed-off-by: noemotiovon <[email protected]>
Signed-off-by: noemotiovon <[email protected]>
Signed-off-by: noemotiovon <[email protected]>
hipudding merged commit c1c354e into ggml-org:master on Sep 4, 2025
49 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Sep 4, 2025
…upport

* origin/master: (72 commits)
metal : Add template specialization for mul_mm_id w/ ne20 == 10 (ggml-org#15799)
llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (ggml-org#15791)
CANN: Refactor ND to NZ workspace to be per-device (ggml-org#15763)
server: add exceed_context_size_error type (ggml-org#15780)
Document the new max GPU layers default in help (ggml-org#15771)
ggml: add ops for WAN video model (cuda && cpu) (ggml-org#15669)
CANN: Fix precision issue on 310I DUO multi-devices (ggml-org#15784)
opencl: add hs=40 to FA (ggml-org#15758)
CANN: fix acl_rstd allocation size in ggml_cann_rms_norm (ggml-org#15760)
vulkan: fix mmv subgroup16 selection (ggml-org#15775)
vulkan: don't use std::string in load_shaders, to improve compile time (ggml-org#15724)
vulkan : update ggml_vk_instance_validation_ext_available (ggml-org#15666)
ggml vulkan: add hardsigmoid and hardswish operations (ggml-org#15762)
CUDA: Optimize `rms_norm_f32` kernel and its fused variants, giving 1-6% perf E2E (ggml-org#15715)
model-conversion : fix pyright errors (ggml-org#15770)
sampling : optimize dist sampler (ggml-org#15704)
llama : fix incorrect model type for Gemma 270M (ggml-org#15764)
model-conversion : remove hardcoded /bin/bash shebangs [no ci] (ggml-org#15765)
CANN: Add RoPE contiguous check for 310I DUP device (ggml-org#15735)
ggml-cpu : optimize RVV kernels (ggml-org#15720)
...
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Sep 5, 2025
…g-model-disabled-agent-prefill

* origin/master: (84 commits)
CUDA: fastdiv, launch bounds for mmvq + q8_1 quant (ggml-org#15802)
tests : add --list-ops and --show-coverage options (ggml-org#15745)
gguf: gguf_writer refactor (ggml-org#15691)
kv-cache : fix SWA checks + disable cacheless iSWA (ggml-org#15811)
model-conversion : add --embeddings flag to modelcard.template [no ci] (ggml-org#15801)
chat : fixed crash when Hermes 2 <tool_call> had a newline before it (ggml-org#15639)
chat : nemotron thinking & toolcalling support (ggml-org#15676)
scripts : add Jinja tester PySide6 simple app (ggml-org#15756)
llama : add support for EmbeddingGemma 300m (ggml-org#15798)
metal : Add template specialization for mul_mm_id w/ ne20 == 10 (ggml-org#15799)
llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (ggml-org#15791)
CANN: Refactor ND to NZ workspace to be per-device (ggml-org#15763)
server: add exceed_context_size_error type (ggml-org#15780)
Document the new max GPU layers default in help (ggml-org#15771)
ggml: add ops for WAN video model (cuda && cpu) (ggml-org#15669)
CANN: Fix precision issue on 310I DUO multi-devices (ggml-org#15784)
opencl: add hs=40 to FA (ggml-org#15758)
CANN: fix acl_rstd allocation size in ggml_cann_rms_norm (ggml-org#15760)
vulkan: fix mmv subgroup16 selection (ggml-org#15775)
vulkan: don't use std::string in load_shaders, to improve compile time (ggml-org#15724)
...
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025
* CANN:Refactor ND to NZ workspace to be per-device in Ascend backend

- Replaced the previous single global ND→NZ workspace with a per-device
  cache using unordered_map keyed by device ID.
- Functions `release_nz_workspace`, `relloc_nz_workspace`, and
  `get_nz_workspace` now manage workspace independently for each device,
  preventing memory conflicts in multi-device / pipeline parallel scenarios.
- This change fixes potential precision issues caused by workspace
  overwrites when multiple devices perform ND→NZ conversions concurrently.

Co-authored-by: hipudding <[email protected]>

* refactor

Signed-off-by: noemotiovon <[email protected]>

* rename

Signed-off-by: noemotiovon <[email protected]>

* fix review comments

Signed-off-by: noemotiovon <[email protected]>

---------

Signed-off-by: noemotiovon <[email protected]>
Co-authored-by: hipudding <[email protected]>