CANN: Refactor ND to NZ workspace to be per-device in Ascend backend #15763
Merged
Conversation
- Replaced the previous single global ND→NZ workspace with a per-device cache using `unordered_map` keyed by device ID.
- The functions `release_nz_workspace`, `relloc_nz_workspace`, and `get_nz_workspace` now manage the workspace independently for each device, preventing memory conflicts in multi-device / pipeline-parallel scenarios.
- This change fixes potential precision issues caused by workspace overwrites when multiple devices perform ND→NZ conversions concurrently.

Co-authored-by: hipudding <[email protected]>
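The per-device cache described above can be sketched roughly as follows. This is a simplified illustration, not the actual CANN backend code: the struct, the `std::malloc`/`std::free` calls (the real backend would use the ACL runtime allocator, e.g. `aclrtMalloc`/`aclrtFree`), and the bare `int` device parameter are all assumptions made for the sketch; only the three function names come from the PR description.

```cpp
#include <cstdlib>
#include <unordered_map>

// Hypothetical per-device scratch buffer (simplified stand-in for the
// real workspace object in the CANN backend).
struct nz_workspace {
    void  *ptr  = nullptr;
    size_t size = 0;
};

// One workspace per device ID, replacing the former single global buffer.
static std::unordered_map<int, nz_workspace> g_nz_workspaces;

// Free the workspace belonging to one device only; other devices' buffers
// are untouched, which is the point of the refactor.
static void release_nz_workspace(int device) {
    auto it = g_nz_workspaces.find(device);
    if (it != g_nz_workspaces.end()) {
        std::free(it->second.ptr);
        g_nz_workspaces.erase(it);
    }
}

// Grow the workspace for one device if the requested size exceeds the
// current allocation. The old contents are scratch data, so no copy is done.
static void relloc_nz_workspace(int device, size_t size) {
    nz_workspace &ws = g_nz_workspaces[device];
    if (ws.size < size) {
        std::free(ws.ptr);
        ws.ptr  = std::malloc(size);
        ws.size = size;
    }
}

// Fetch the current workspace pointer for a device (nullptr if none yet).
static void *get_nz_workspace(int device) {
    auto it = g_nz_workspaces.find(device);
    return it == g_nz_workspaces.end() ? nullptr : it->second.ptr;
}
```

Because each device ID maps to its own entry, two devices doing ND→NZ conversions concurrently no longer overwrite each other's buffer, which is the precision bug the PR fixes.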
Model test on 2 devices with nd2nz:

# script
GGML_CANN_WEIGHT_NZ=1 ./bin/llama-cli -m /home/lichenguang25/.ollama/models/blobs/sha256-6f96e01a3f550ca08aea1e5725bb8d5a7eccc6f281c30417e9d380b8c46467bd \
    -p "Building a website can be done in 10 steps:" -ngl 32

# log
build: 6362 (a83dc461) with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for aarch64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device CANN0 (Ascend910B4) - 29848 MiB free
llama_model_load_from_file_impl: using device CANN1 (Ascend910B4) - 29843 MiB free
......
user
Building a website can be done in 10 steps:
assistant
Here’s a list of 10 steps to build a website, categorized by functionality and complexity:
### 1. **Define Your Objectives and Audience**
- **Objective**: Understand what you want to achieve with the website.
- **Audience**: Identify who your target audience is, their needs, and what they expect from your website.
### 2. **Set Up Your Development Environment**
- **Software**: Choose an appropriate web development platform. Popular options include HTML5, CSS3, and JavaScript.
- **Hardware**: Ensure you have a reliable computer and necessary software installed.
### 3. **Choose a Domain Name**
- **Purpose**: Ensure your domain name is unique and reflects your brand.
- **Features**: Consider features like SSL, email support, and domain registration.
### 4. **Design Your Website**
- **Website Structure**: Plan the layout and structure of your website.
- **Content Creation**: Create high-quality, relevant content.
- **Design Elements**: Incorporate design elements that enhance user experience.
### 5. **Choose a Content Management System (CMS)**
- **Features**: Select a CMS that suits your needs (e.g., WordPress, Joomla, Drupal).
- **Ease of Use**: Make sure the CMS is easy to use and maintain.
### 6. **Develop Your Website**
- **Coding**: Use the CMS to design your website and build the pages.
- **Testing**: Test your website thoroughly to ensure all functionalities are working as expected.
### 7. **Implement User Authentication**
- **Features**: Implement a secure login system and ensure user privacy.
- **Authentication**: Use secure password hashing and encryption methods.
### 8. **Deploy Your Website**
- **Hosting**: Choose a hosting provider that suits your needs.
- **Backup**: Regularly backup your website and data.
### 9. **Launch Your Website**
- **Launch Date**: Set a launch date for your website.
- **Launch Event**: Organize a launch event to introduce your website to your audience.
### 10. **Monitor and Update**
- **Regular Updates**: Regularly update your website to fix bugs and improve functionality.
- **Analytics**: Use analytics tools to track user behavior and improve your website.
### Additional Steps
- **SEO Optimization**: Optimize your website for search engines to increase visibility.
- **Social Media Integration**: Integrate social media features to increase engagement.
- **Content Marketing**: Publish valuable content to attract visitors.
- **Analytics**: Use analytics tools to track website performance and improve user experience.
### Best Practices
- **SEO**: Use optimized content, meta tags, and mobile-friendly design.
- **Security**: Ensure your website is secure with strong authentication and encryption.
- **User Experience**: Keep the website user-friendly and responsive.
By following these steps, you can build a robust and functional website that meets your needs and expectations.
>
llama_perf_sampler_print: sampling time = 166.28 ms / 627 runs ( 0.27 ms per token, 3770.66 tokens per second)
llama_perf_context_print: load time = 1726.04 ms
llama_perf_context_print: prompt eval time = 36.21 ms / 20 tokens ( 1.81 ms per token, 552.35 tokens per second)
llama_perf_context_print: eval time = 3310.65 ms / 606 runs ( 5.46 ms per token, 183.05 tokens per second)
llama_perf_context_print: total time = 4368.10 ms / 626 tokens
llama_perf_context_print: graphs reused = 603
hipudding reviewed on Sep 3, 2025
Signed-off-by: noemotiovon <[email protected]>
hipudding reviewed on Sep 4, 2025
Signed-off-by: noemotiovon <[email protected]>
hipudding reviewed on Sep 4, 2025
Signed-off-by: noemotiovon <[email protected]>
hipudding approved these changes on Sep 4, 2025
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request on Sep 4, 2025

…upport

* origin/master: (72 commits)
  metal : Add template specialization for mul_mm_id w/ ne20 == 10 (ggml-org#15799)
  llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (ggml-org#15791)
  CANN: Refactor ND to NZ workspace to be per-device (ggml-org#15763)
  server: add exceed_context_size_error type (ggml-org#15780)
  Document the new max GPU layers default in help (ggml-org#15771)
  ggml: add ops for WAN video model (cuda && cpu) (ggml-org#15669)
  CANN: Fix precision issue on 310I DUO multi-devices (ggml-org#15784)
  opencl: add hs=40 to FA (ggml-org#15758)
  CANN: fix acl_rstd allocation size in ggml_cann_rms_norm (ggml-org#15760)
  vulkan: fix mmv subgroup16 selection (ggml-org#15775)
  vulkan: don't use std::string in load_shaders, to improve compile time (ggml-org#15724)
  vulkan : update ggml_vk_instance_validation_ext_available (ggml-org#15666)
  ggml vulkan: add hardsigmoid and hardswish operations (ggml-org#15762)
  CUDA: Optimize `rms_norm_f32` kernel and its fused variants, giving 1-6% perf E2E (ggml-org#15715)
  model-conversion : fix pyright errors (ggml-org#15770)
  sampling : optimize dist sampler (ggml-org#15704)
  llama : fix incorrect model type for Gemma 270M (ggml-org#15764)
  model-conversion : remove hardcoded /bin/bash shebangs [no ci] (ggml-org#15765)
  CANN: Add RoPE contiguous check for 310I DUP device (ggml-org#15735)
  ggml-cpu : optimize RVV kernels (ggml-org#15720)
  ...
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request on Sep 5, 2025

…g-model-disabled-agent-prefill

* origin/master: (84 commits)
  CUDA: fastdiv, launch bounds for mmvq + q8_1 quant (ggml-org#15802)
  tests : add --list-ops and --show-coverage options (ggml-org#15745)
  gguf: gguf_writer refactor (ggml-org#15691)
  kv-cache : fix SWA checks + disable cacheless iSWA (ggml-org#15811)
  model-conversion : add --embeddings flag to modelcard.template [no ci] (ggml-org#15801)
  chat : fixed crash when Hermes 2 <tool_call> had a newline before it (ggml-org#15639)
  chat : nemotron thinking & toolcalling support (ggml-org#15676)
  scripts : add Jinja tester PySide6 simple app (ggml-org#15756)
  llama : add support for EmbeddingGemma 300m (ggml-org#15798)
  metal : Add template specialization for mul_mm_id w/ ne20 == 10 (ggml-org#15799)
  llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (ggml-org#15791)
  CANN: Refactor ND to NZ workspace to be per-device (ggml-org#15763)
  server: add exceed_context_size_error type (ggml-org#15780)
  Document the new max GPU layers default in help (ggml-org#15771)
  ggml: add ops for WAN video model (cuda && cpu) (ggml-org#15669)
  CANN: Fix precision issue on 310I DUO multi-devices (ggml-org#15784)
  opencl: add hs=40 to FA (ggml-org#15758)
  CANN: fix acl_rstd allocation size in ggml_cann_rms_norm (ggml-org#15760)
  vulkan: fix mmv subgroup16 selection (ggml-org#15775)
  vulkan: don't use std::string in load_shaders, to improve compile time (ggml-org#15724)
  ...
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request on Sep 7, 2025

* CANN: Refactor ND to NZ workspace to be per-device in Ascend backend
  - Replaced the previous single global ND→NZ workspace with a per-device cache using `unordered_map` keyed by device ID.
  - Functions `release_nz_workspace`, `relloc_nz_workspace`, and `get_nz_workspace` now manage the workspace independently for each device, preventing memory conflicts in multi-device / pipeline-parallel scenarios.
  - This change fixes potential precision issues caused by workspace overwrites when multiple devices perform ND→NZ conversions concurrently.
  Co-authored-by: hipudding <[email protected]>
* refactor
  Signed-off-by: noemotiovon <[email protected]>
* rename
  Signed-off-by: noemotiovon <[email protected]>
* fix review comments
  Signed-off-by: noemotiovon <[email protected]>
---------
Signed-off-by: noemotiovon <[email protected]>
Co-authored-by: hipudding <[email protected]>
Labels
- Ascend NPU: issues specific to Ascend NPUs
- ggml: changes relating to the ggml tensor library for machine learning