2 files changed: +3 −3 lines changed

@@ -261,6 +261,8 @@ huggingface-cli download ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w2g128-GPTQ --
python tools/run_pipeline.py -o ${model_dir} -m llama-3-8b-2bit
```
+ > Use the `-p` or `-s` argument to select the steps you want to run, and use the `-u` argument to use our prebuilt kernels for ARM.
+
An example output:
```
@@ -288,8 +290,6 @@ Running STEP.6: Run inference
Check logs/2024-07-15-17-10-11.log for inference output
```
- Check [e2e.md](docs/e2e.md) for the purpose of each step.
-
## Upcoming Features
We will soon:
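For reference, a minimal sketch of how the flags described in the added note compose with the pipeline command. The step id passed to `-s` is an assumption (check the script's help output for the real step list); the log filename is copied verbatim from the sample output above.

```
# Fetch the prebuilt ARM kernels (-u) instead of compiling them locally:
python tools/run_pipeline.py -o ${model_dir} -m llama-3-8b-2bit -u

# Re-run a single step with -s (the step id "6" is an assumption,
# mirroring "STEP.6: Run inference" from the sample output):
python tools/run_pipeline.py -o ${model_dir} -m llama-3-8b-2bit -s 6

# Read the recorded inference output (filename from the sample log above):
cat logs/2024-07-15-17-10-11.log
```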
@@ -1,4 +1,4 @@
- # End-2-End Inference Through llama.cpp
+ # End-2-End Inference Through llama.cpp (legacy)
> The following guide uses BitNet-3B. We will add instructions on how to use GPTQ/GGUF/BitDistiller models or even your customized models.