
Conversation

ggerganov (Member) commented on Mar 20, 2024

fix #6173

We were padding kv_self.n but not n_ctx, leading to unaligned memory access with Metal.
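
For reference, a minimal sketch of the round-up the new title describes (padding a size to the next multiple of 32, in the spirit of ggml's GGML_PAD-style helpers); the function name and sample sizes below are illustrative, not the actual patch:

```c
#include <stdint.h>
#include <stdio.h>

// Classic round-up idiom: ((x + n - 1) / n) * n rounds x up to a multiple of n.
// pad_to_multiple is a hypothetical stand-in for the padding applied to n_ctx.
static uint32_t pad_to_multiple(uint32_t x, uint32_t n) {
    return (x + n - 1) / n * n;
}

int main(void) {
    // Illustrative context sizes: after padding, every value is a multiple of 32,
    // so the Metal kernels see aligned dimensions instead of an odd-sized n_ctx.
    const uint32_t sizes[] = { 512, 4096, 4097, 2000 };
    for (int i = 0; i < 4; i++) {
        printf("n_ctx %4u -> padded %4u\n", sizes[i], pad_to_multiple(sizes[i], 32));
    }
    return 0;
}
```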

ggerganov changed the title from "metal : require ne00 >= 128 for mat-mat kernels" to "metal : pad n_ctx by 32" on Mar 21, 2024
ggerganov merged commit 95d576b into master on Mar 22, 2024
ggerganov deleted the gg/metal-fix-mm branch on Mar 22, 2024 at 07:36
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* metal : require ne00 >= 128 for mat-mat kernels

ggml-ci

* llama : pad n_ctx by 32

ggml-ci
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 3, 2024
tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024

Development

Successfully merging this pull request may close the following issue:

Regression: llama.cpp produces nonsensical outputs when using batched decoding on Metal (#6173)