You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
> The following process could be more complicated. However, if your deployment scenerio doesn't require a native build, you can use WSL/docker and follow the Ubuntu guide.
190
197
@@ -240,6 +247,8 @@ pip install wmi # To detect the native ARM64 CPU within x86_64 python
240
247
pip install . -v # or pip install -e . -v
241
248
```
242
249
250
+
</details>
251
+
243
252
### Verification
244
253
245
254
After that, you can verify the installation through: `python -c "import t_mac; print(t_mac.__version__); from tvm.contrib.clang import find_clang; print(find_clang())"`.
@@ -317,7 +326,6 @@ Our method exhibits several notable characteristics:
317
326
318
327
1. T-MAC shows a linear scaling ratio of FLOPs and inference latency relative to the number of bits. This contrasts with traditional convert-based methods, which fail to achieve additional speedup when reducing from 4 bits to lower bits.
319
328
2. T-MAC inherently supports bit-wise computation for int1/2/3/4, eliminating the need for dequantization. Furthermore, it accommodates all types of activations (e.g., fp8, fp16, int8) using fast table lookup and add instructions, bypassing the need for poorly supported fused-multiply-add instructions.
320
-
3. T-MAC holds the potential to realize performance gains across all processing units (PUs).
321
329
322
330
## Cite
323
331
If you find this repository useful, please use the following BibTeX entry for citation.
0 commit comments