Add LLaDA-7b-MoE diffusion model #16003
Merged
Add support for https://huggingface.co/inclusionAI/LLaDA-MoE-7B-A1B-Instruct, an MoE diffusion model similar to OLMoE (except for the QK norm). Also added two GGUFs: BF16 and Q8_0.
Example command:
./llama-diffusion-cli -m llada-moe-7B-instruct-BF16.gguf -p "Write code to train MNIST in PyTorch" -ngl 99 --diffusion-block-length 32 --diffusion-steps 256 -ub 256 --diffusion-algorithm 4 -fa 0 --temp 0 -sys "You are a helpful AI assistant"