
Commit cc24b3b

[Doc] Add T-MAN news and update release link
1 parent 039feb6 commit cc24b3b

File tree

2 files changed: +3 / -5 lines

README.md

Lines changed: 2 additions & 4 deletions

@@ -12,6 +12,8 @@
 
 ## News
 
+- 10/10/2024 🚀🚀: The idea of T-MAC extends its capabilities to NPU! For more information, check out the [t-man README](t-man/README.md) and try BitNet/Qwen3/Llama3 with the demo app!
+
 - 10/21/2024 🎉🎉: [BitNet](https://github.com/microsoft/BitNet), powered by T-MAC, is open-sourced.
 
 - 10/10/2024 🚀🚀: By updating and rebasing our llama.cpp version, T-MAC now supports more models (e.g., qwen2), and end-to-end performance is further improved by 10~15%! Try qwen2 using [the official GPTQ model](https://huggingface.co/Qwen/Qwen2-7B-Instruct-GPTQ-Int4).
@@ -24,10 +26,6 @@
 
 - 07/27/2024 ✨: We've noted that T-MAC is even faster than the NPU in token generation speed on the latest Snapdragon X Elite chipset! Check [Compared to NPU](#compared-to-npu) for more details.
 
-- 07/23/2024 🚀🚀: We've enabled the execution of any 2-bit quantized Llama model in GPTQ format via T-MAC! Test it using the pretrained models released by [EfficientQAT](https://github.com/OpenGVLab/EfficientQAT).
-
-- 07/22/2024 🚀🚀: We've added native deployment support for Windows on ARM. T-MAC demonstrates a substantial 5x speedup on the Surface Laptop 7.
-
 ## Introduction
 
 T-MAC is a kernel library to directly support mixed-precision matrix multiplication (int1/2/3/4 x int8/fp16/fp32) without the need for dequantization by utilizing lookup tables. T-MAC aims to boost low-bit LLM inference on CPUs. T-MAC already offers support for various low-bit models, including W4A16 from GPTQ/gguf, W2A16 from [BitDistiller](https://github.com/DD-DuDa/BitDistiller)/[EfficientQAT](https://github.com/OpenGVLab/EfficientQAT), and W1(.58)A8 from [BitNet](https://huggingface.co/1bitLLM/bitnet_b1_58-3B) on OSX/Linux/Windows equipped with ARM/Intel CPUs.
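The introduction above describes T-MAC's core idea: replacing dequantize-then-multiply with table lookups. As a rough illustration only (not T-MAC's actual kernels, which operate on packed low-bit weights with SIMD in-register lookups), here is a minimal NumPy sketch of lookup-table-based matrix-vector multiplication for 1-bit (±1) weights; the function name and group size `g` are hypothetical:

```python
import numpy as np

def lut_matvec_1bit(W_bits, x, g=4):
    """Multiply a 1-bit weight matrix (entries ±1, stored as 0/1 bits) by a
    float vector x using per-group lookup tables instead of dequantizing.
    W_bits: (rows, n) array of 0/1; x: (n,) floats; n must be divisible by g."""
    rows, n = W_bits.shape
    assert n % g == 0
    num_groups = n // g
    # Precompute, for every group of g activations, its dot product with
    # every possible g-bit weight pattern: 2**g table entries per group.
    patterns = np.array([[1.0 if (p >> i) & 1 else -1.0 for i in range(g)]
                         for p in range(2 ** g)])           # (2**g, g)
    tables = patterns @ x.reshape(num_groups, g).T          # (2**g, num_groups)
    # For each output row, turn each g-bit weight group into a table index
    # and sum the looked-up partial products -- no dequantization needed.
    weights = 1 << np.arange(g)
    out = np.zeros(rows)
    for r in range(rows):
        idx = (W_bits[r].reshape(num_groups, g) * weights).sum(axis=1)
        out[r] = tables[idx, np.arange(num_groups)].sum()
    return out
```

The table-build cost is amortized across all weight rows, which is why the approach pays off for the tall weight matrices in LLM inference; the real kernels additionally pack indices into bytes and use vectorized byte-shuffle lookups.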

t-man/README.md

Lines changed: 1 addition & 1 deletion

@@ -17,7 +17,7 @@ By achieving up to 50 t/s token generation for [BitNet-2B-4T](https://huggingfac
 
 ### Use the Android App
 
-- Get the apk from the [release page]().
+- Get the apk from the [release page](https://github.com/microsoft/T-MAC/releases).
 - Select a model (e.g., Qwen3-8B) in the settings. The model files will be downloaded automatically (requires internet access).
 - Load the model.
 - Enjoy your conversation!

0 commit comments