2 files changed: +3 −3 lines changed

@@ -261,6 +261,8 @@ huggingface-cli download ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w2g128-GPTQ --
python tools/run_pipeline.py -o ${model_dir} -m llama-3-8b-2bit
```
+ > Use the `-p` or `-s` argument to select the steps you want to run, and use the `-u` argument to use our prebuilt kernels for ARM.
+
An example output:
```
@@ -288,8 +290,6 @@ Running STEP.6: Run inference
Check logs/2024-07-15-17-10-11.log for inference output
```
- Check [e2e.md](docs/e2e.md) for the purpose of each step.
-
## Upcoming Features
We will soon:
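For reference, a minimal sketch of how the flags described in the added note compose with the pipeline command. The step id passed to `-s` is an assumption (check the script's help output for the real step list); the log filename is copied verbatim from the sample output above.

```
# Fetch the prebuilt ARM kernels (-u) instead of compiling them locally:
python tools/run_pipeline.py -o ${model_dir} -m llama-3-8b-2bit -u

# Re-run a single step with -s (the step id "6" is an assumption,
# mirroring "STEP.6: Run inference" from the sample output):
python tools/run_pipeline.py -o ${model_dir} -m llama-3-8b-2bit -s 6

# Read the recorded inference output (filename from the sample log above):
cat logs/2024-07-15-17-10-11.log
```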
@@ -1,4 +1,4 @@
- # End-2-End Inference Through llama.cpp
+ # End-2-End Inference Through llama.cpp (legacy)
> The following guide uses BitNet-3B. We will add instructions on how to use GPTQ/GGUF/BitDistiller models or even your customized models.