
Commit 7ddd9d5

Merge pull request #29 from kaleid-liner/main
Add Android cross-compilation support (#12, #18)
2 parents 21cedbf + 775eeb6 commit 7ddd9d5

7 files changed: +328 -126 lines changed

README.md

Lines changed: 14 additions & 1 deletion
@@ -277,6 +277,16 @@ pip install . -v # or pip install -e . -v
277277

278278
</details>
279279

280+
</details>
281+
<details>
282+
<summary><h3>Android</h3></summary>
283+
284+
First, follow the normal workflow to install T-MAC on your PC (OSX/Ubuntu recommended).
285+
286+
Then, refer to [Android Cross Compilation Guidance](docs/android.md).
287+
288+
</details>
289+
280290
### Verification
281291

282292
After that, you can verify the installation through: `python -c "import t_mac; print(t_mac.__version__); from tvm.contrib.clang import find_clang; print(find_clang())"`.
@@ -335,13 +345,16 @@ Running STEP.6: Run inference
335345
Check logs/2024-07-15-17-10-11.log for inference output
336346
```
337347

348+
Please note that `main` is used here to demo token generation output. Use `3rdparty/llama.cpp/build/bin/llama-bench` to benchmark performance. A benchmark script is also provided at `tools/bench_e2e.py`.
349+
338350
## Upcoming Features
339351

340352
We will soon:
341353

342354
- [x] Add `I4` format to simplify the deployment of 4-bit models.
343355
- [x] Embed T-MAC GEMM kernels into llama.cpp to accelerate prefill/prompt.
344-
- [ ] Android cross-compilation guidance
356+
- [x] Android cross-compilation guidance
357+
- [ ] Merge latest llama.cpp for more functionalities
345358
- [ ] Optimize for ARMv9 CPU with SME2 through LUTI4
346359

347360
## Techniques

deploy/tvmrpc-release.apk

18.3 MB
Binary file not shown.

docs/android.md

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
1+
## Android Cross Compilation Guidance
2+
3+
### Pre-requisites
4+
5+
Install platform-tools and ndk from [Android Studio](https://developer.android.com/studio) or [command line tools](https://developer.android.com/studio#command-line-tools-only). Please make sure that `adb` can be found in PATH and set `NDK_HOME`.
6+
7+
For example, on my PC:
8+
9+
```
10+
export PATH="$HOME/Library/Android/sdk/platform-tools:$PATH"
11+
export NDK_HOME="$HOME/Library/Android/sdk/ndk/26.1.10909125"
12+
```
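
Before running the pipeline, it can help to confirm that both prerequisites are visible to Python. The following is a minimal sketch, not part of T-MAC: the helper name `check_android_prereqs` and its injectable `env` and `which` parameters are assumptions made for illustration and testing.

```python
import os
import shutil


def check_android_prereqs(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    # `adb` must be resolvable through PATH, as the guide requires.
    if which("adb") is None:
        problems.append("adb not found in PATH")
    # NDK_HOME must be set and point at an existing directory.
    ndk_home = env.get("NDK_HOME", "")
    if not ndk_home:
        problems.append("NDK_HOME is not set")
    elif not os.path.isdir(ndk_home):
        problems.append(f"NDK_HOME is not a directory: {ndk_home}")
    return problems


if __name__ == "__main__":
    for problem in check_android_prereqs():
        print("WARNING:", problem)
```

The check only verifies that the variables are set and the directory exists; it does not validate the NDK version.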
13+
14+
**There are three options to cross-compile T-MAC for Android, from simple to complex**:
15+
16+
### Option.1: Use Prebuilt Kernels
17+
18+
Using prebuilt kernels is the simplest solution.
19+
20+
```
21+
python tools/run_pipeline.py -o ~/Downloads/test_models/llama-2-7b-eqat-w2g128-gptq -m llama-2-7b-2bit -d android -ndk $NDK_HOME -u
22+
```
23+
24+
Please note these arguments:
25+
- `-as`, `--adb_serial`: If multiple ADB devices are connected to your host computer, specify the serial of the target device as listed by `adb devices -l`.
26+
- `-rd`, `--remote_dir`: Our binaries and models are pushed to `/data/local/tmp` for execution. Alter this argument to change the directory.
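
To decide whether `--adb_serial` is needed, the output of `adb devices -l` can be parsed. This is a hedged sketch that works on the text alone; the function names and the sample serials are hypothetical, and the layout assumed is the usual `serial state ...` form printed by `adb`.

```python
def parse_adb_devices(output: str):
    """Parse the text printed by `adb devices -l` into (serial, state) pairs."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and adb daemon start-up messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            devices.append((fields[0], fields[1]))
    return devices


def pick_serial(output: str):
    """Return the serial of the single ready device, or None if ambiguous."""
    ready = [serial for serial, state in parse_adb_devices(output) if state == "device"]
    return ready[0] if len(ready) == 1 else None


# Hypothetical sample output; real serials and fields will differ.
SAMPLE = """List of devices attached
R58M123ABC             device usb:1-1 product:x model:Pixel device:y transport_id:1
emulator-5554          offline
"""

if __name__ == "__main__":
    print(pick_serial(SAMPLE))
```

If `pick_serial` returns `None`, either no device is ready or more than one is, and the serial should be passed explicitly with `-as`.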
27+
28+
Here, we specify `-u` to use the prebuilt kernels. The performance may not be optimal.
29+
30+
### Option.2: Cross Compilation without Tuning
31+
32+
```
33+
cd $NDK_HOME/build/tools
34+
python make_standalone_toolchain.py --arch arm64 --install-dir /opt/android-toolchain-arm64
35+
export TVM_NDK_CC=/opt/android-toolchain-arm64/bin/clang++
36+
# Back to T-MAC root dir
37+
python tools/run_pipeline.py -o ~/Downloads/test_models/llama-2-7b-eqat-w2g128-gptq -m llama-2-7b-2bit -d android -ndk $NDK_HOME -dt
38+
```
39+
40+
Here, we specify `-dt` to disable tuning. The performance may not be optimal.
41+
42+
### Option.3: Tuning (experimental)
43+
44+
Install TVM RPC APK:
45+
46+
```
47+
# Back to T-MAC root dir
48+
adb install deploy/tvmrpc-release.apk
49+
```
50+
51+
Start RPC tracker:
52+
53+
```
54+
python -m tvm.exec.rpc_tracker
55+
```
56+
57+
Connect to the tracker in the TVM RPC APK by setting the fields:
58+
59+
- Address: Make sure the Android device and your host PC are on the same network (e.g., WLAN). Enter the IP address of your host PC here.
60+
- Port: 9190
61+
- Key: android
62+
63+
Then toggle on `Enable RPC`.
64+
65+
Verify the RPC setup with:
66+
```
67+
python -m tvm.exec.query_rpc_tracker
68+
```
69+
70+
The setup is successful if you get something like this:
71+
72+
```
73+
Tracker address 0.0.0.0:9190
74+
75+
Server List
76+
------------------------------
77+
server-address key
78+
------------------------------
79+
192.168.67.86:5001 server:android
80+
------------------------------
81+
82+
Queue Status
83+
-------------------------------
84+
key total free pending
85+
-------------------------------
86+
android 1 1 0
87+
-------------------------------
88+
```
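
The success condition above can also be checked in a script. The sketch below parses the "Queue Status" table from the text printed by `python -m tvm.exec.query_rpc_tracker`. It assumes the four-column layout shown above, which may differ between TVM versions, and the helper names are hypothetical.

```python
def parse_queue_status(summary: str):
    """Extract {key: (total, free, pending)} from the tracker's text summary."""
    status = {}
    in_queue = False
    for line in summary.splitlines():
        if line.strip() == "Queue Status":
            in_queue = True
            continue
        fields = line.split()
        # Data rows have exactly four fields: key, total, free, pending.
        if not in_queue or len(fields) != 4:
            continue
        key, *counts = fields
        # The header row ("key total free pending") is skipped here.
        if all(c.isdigit() for c in counts):
            status[key] = tuple(int(c) for c in counts)
    return status


def has_free_device(summary: str, key: str = "android") -> bool:
    """True if at least one device registered under `key` is free."""
    _, free, _ = parse_queue_status(summary).get(key, (0, 0, 0))
    return free > 0


SAMPLE = """Tracker address 0.0.0.0:9190

Server List
------------------------------
server-address key
------------------------------
192.168.67.86:5001 server:android
------------------------------

Queue Status
-------------------------------
key total free pending
-------------------------------
android 1 1 0
-------------------------------
"""

if __name__ == "__main__":
    print(parse_queue_status(SAMPLE), has_free_device(SAMPLE))
```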
89+
90+
Finally:
91+
92+
```
93+
cd $NDK_HOME/build/tools
94+
python make_standalone_toolchain.py --arch arm64 --install-dir /opt/android-toolchain-arm64
95+
export TVM_NDK_CC=/opt/android-toolchain-arm64/bin/clang++
96+
# Back to T-MAC root dir
97+
python tools/run_pipeline.py -o ~/Downloads/test_models/llama-2-7b-eqat-w2g128-gptq -m llama-2-7b-2bit -d android -ndk $NDK_HOME
98+
```

python/t_mac/ops/base.py

Lines changed: 3 additions & 1 deletion
@@ -96,7 +96,9 @@ def tuning(
9696
tuner = autotvm.tuner.GridSearchTuner(task)
9797

9898
def _preload_function(remote: rpc.RPCSession, build_result: tvm.runtime.Module):
99-
remote.get_function("runtime.config_threadpool")(thread_affinity, self.num_threads)
99+
# remote.get_function("runtime.config_threadpool")(thread_affinity, self.num_threads)
100+
# TODO: fix this in Android RPC
101+
pass
100102

101103
if self.remote_kwargs is not None:
102104
measure_option = autotvm.measure_option(

python/t_mac/platform.py

Lines changed: 117 additions & 0 deletions
@@ -2,6 +2,8 @@
22
import platform
33
from typing import Tuple, List
44
import logging
5+
import copy
6+
import os
57

68
logger = logging.getLogger("platform")
79

@@ -68,3 +70,118 @@ def is_win() -> bool:
6870
def is_arm() -> bool:
6971
"""Check whether the machine architecture is aarch64 (ARM64)"""
7072
return get_system_info()[1] == "aarch64"
73+
74+
75+
_device_kwargs = {
76+
"m2": {
77+
"target": "llvm -mtriple=arm64-apple-darwin23.1.0 -mcpu=apple-m2",
78+
"eval_kwargs": {
79+
"min_repeat_ms": 50,
80+
"repeat": 100,
81+
},
82+
# "remote_kwargs": {
83+
# "key": "local",
84+
# "host": os.environ.get("TVM_TRACKER_HOST", "0.0.0.0"),
85+
# "port": int(os.environ.get("TVM_TRACKER_PORT", 9190)),
86+
# "build_func": "default",
87+
# "timeout": 600,
88+
# },
89+
"remote_kwargs": None,
90+
"cc_opts": ["-O3", "-std=c++17", "-mcpu=apple-m2", "-mllvm", "-inline-threshold=10000"] + get_osx_isysroot(),
91+
"out_dtype": "float16",
92+
"aggregation_dtype": "int32",
93+
},
94+
"android": {
95+
"target": "llvm -device=arm_cpu -mtriple=aarch64-linux-gnu -mattr=+v8.2a,+fullfp16,+fp-armv8,+neon",
96+
"eval_kwargs": {
97+
"number": 10,
98+
"repeat": 10,
99+
},
100+
"remote_kwargs": {
101+
"key": "android",
102+
"host": os.environ.get("TVM_TRACKER_HOST", "0.0.0.0"),
103+
"port": int(os.environ.get("TVM_TRACKER_PORT", 9190)),
104+
"build_func": "ndk",
105+
"timeout": 600,
106+
},
107+
"cc": os.environ.get("TVM_NDK_CC", "clang++"),
108+
"cc_opts": ["-O3", "-march=armv8.2a+fp16", "-mllvm", "-inline-threshold=10000"],
109+
"out_dtype": "float16",
110+
"aggregation_dtype": "int32",
111+
},
112+
"intel_win": {
113+
"target": "llvm -mtriple=x86_64-pc-windows-msvc -mcpu=core-avx2",
114+
"eval_kwargs": {
115+
"number": 10,
116+
"repeat": 10,
117+
},
118+
"remote_kwargs": None,
119+
# TODO: check if inline-threshold is needed for other devices
120+
"cc_opts": ["-O3", "-march=native", "-mllvm", "-inline-threshold=10000"],
121+
"out_dtype": "float32",
122+
"aggregation_dtype": "int32",
123+
},
124+
"intel_linux": {
125+
"target": "llvm -mtriple=x86_64-unknown-linux-gnu -mcpu=core-avx2",
126+
"eval_kwargs": {
127+
"number": 10,
128+
"repeat": 10,
129+
},
130+
"remote_kwargs": None,
131+
# TODO: check if inline-threshold is needed for other devices
132+
"cc_opts": ["-O3", "-march=native", "-mllvm", "-inline-threshold=10000"],
133+
"out_dtype": "float32",
134+
"aggregation_dtype": "int32",
135+
},
136+
"jetson": {
137+
"target": "llvm -device=arm_cpu -mtriple=aarch64-linux-gnu -mattr=+v8.2a,+fullfp16,+fp-armv8,+neon",
138+
"eval_kwargs": {
139+
"number": 10,
140+
"repeat": 10,
141+
},
142+
"remote_kwargs": None,
143+
"cc_opts": ["-O3", "-std=c++17", "-march=armv8.2a+fp16", "-mllvm", "-inline-threshold=10000"],
144+
"out_dtype": "float16",
145+
"aggregation_dtype": "int32",
146+
},
147+
"arm_win": {
148+
"target": "llvm -device=arm_cpu -mtriple=aarch64-pc-windows-msvc -mattr=+v8.2a,+fullfp16,+fp-armv8,+neon",
149+
"eval_kwargs": {
150+
"number": 10,
151+
"repeat": 10,
152+
},
153+
"remote_kwargs": None,
154+
"cc_opts": ["-O3", "-std=c++17", "-march=armv8.2a+fp16", "-mllvm", "-inline-threshold=10000"],
155+
"out_dtype": "float16",
156+
"aggregation_dtype": "int32",
157+
},
158+
}
159+
160+
161+
def get_devices():
162+
return list(_device_kwargs.keys())
163+
164+
165+
_platform_device_default_map = {
166+
("Darwin", "aarch64"): "m2",
167+
("Linux", "aarch64"): "jetson",
168+
("Windows", "x86_64"): "intel_win",
169+
("Linux", "x86_64"): "intel_linux",
170+
("Windows", "aarch64"): "arm_win",
171+
}
172+
173+
174+
def get_default_device_kwargs(device: str = ""):
175+
if not device:
176+
device = _platform_device_default_map[get_system_info()]
177+
return copy.deepcopy(_device_kwargs.get(device, {}))
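
Reviewer note: `get_default_device_kwargs` returns a deep copy, which matters because the table holds nested lists and dicts (`cc_opts`, `remote_kwargs`). The following standalone sketch uses a hypothetical two-field table, not the real `_device_kwargs`, to contrast a shallow copy with a deep copy.

```python
import copy

# Hypothetical miniature of the device table, for illustration only.
_TABLE = {"android": {"cc_opts": ["-O3"], "remote_kwargs": {"key": "android"}}}


def get_kwargs_shallow(device: str = "android"):
    # Copies only the outer dict; nested lists are still shared.
    return dict(_TABLE.get(device, {}))


def get_kwargs_deep(device: str = "android"):
    # Copies nested containers too, as the function in this commit does.
    return copy.deepcopy(_TABLE.get(device, {}))


# Mutating a deep copy leaves the shared table untouched.
deep = get_kwargs_deep()
deep["cc_opts"].append("-g")
table_after_deep = list(_TABLE["android"]["cc_opts"])

# Mutating a shallow copy leaks the change into the shared table.
shallow = get_kwargs_shallow()
shallow["cc_opts"].append("-g")
table_after_shallow = list(_TABLE["android"]["cc_opts"])

print(table_after_deep, table_after_shallow)
```

With the deep copy, callers can append compiler flags or override tracker settings without affecting later calls.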
178+
179+
180+
def get_arch(device: str = ""):
181+
if not device:
182+
return get_system_info()[1]
183+
elif device == "android":
184+
return "aarch64"
185+
else:
186+
_, arch = next(key for key, value in _platform_device_default_map.items() if value == device)
187+
return arch
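
Reviewer note: `get_arch` finds the architecture by reverse lookup in `_platform_device_default_map`. Because `next()` is called without a default, an unknown device name raises `StopIteration`. The sketch below reproduces the lookup on a copy of the map and adds a default; the name `arch_for_device` and the `default` parameter are hypothetical, not part of this commit.

```python
# Copy of the mapping added in this commit, reproduced for a standalone demo.
_PLATFORM_DEVICE_MAP = {
    ("Darwin", "aarch64"): "m2",
    ("Linux", "aarch64"): "jetson",
    ("Windows", "x86_64"): "intel_win",
    ("Linux", "x86_64"): "intel_linux",
    ("Windows", "aarch64"): "arm_win",
}


def arch_for_device(device: str, default=None):
    """Reverse-lookup the architecture for a device name."""
    # "android" has no host-platform entry, so it is special-cased.
    if device == "android":
        return "aarch64"
    matches = (
        arch for (_, arch), name in _PLATFORM_DEVICE_MAP.items() if name == device
    )
    # Supplying a default avoids StopIteration for unknown devices.
    return next(matches, default)


if __name__ == "__main__":
    for name in ("m2", "intel_linux", "android", "unknown"):
        print(name, arch_for_device(name))
```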

python/t_mac/utils.py

Lines changed: 1 addition & 107 deletions
@@ -1,112 +1,6 @@
1-
import os
2-
import copy
31
import numpy as np
42

5-
from .platform import get_osx_isysroot, get_system_info
6-
7-
8-
_device_kwargs = {
9-
"m2": {
10-
"target": "llvm -mtriple=arm64-apple-darwin23.1.0 -mcpu=apple-m2",
11-
"eval_kwargs": {
12-
"min_repeat_ms": 50,
13-
"repeat": 100,
14-
},
15-
# "remote_kwargs": {
16-
# "key": "local",
17-
# "host": os.environ.get("TVM_TRACKER_HOST", "0.0.0.0"),
18-
# "port": int(os.environ.get("TVM_TRACKER_PORT", 9190)),
19-
# "build_func": "default",
20-
# "timeout": 600,
21-
# },
22-
"remote_kwargs": None,
23-
"cc_opts": ["-O3", "-std=c++17", "-mcpu=apple-m2", "-mllvm", "-inline-threshold=10000"] + get_osx_isysroot(),
24-
"out_dtype": "float16",
25-
"aggregation_dtype": "int32",
26-
},
27-
"android": {
28-
"target": "llvm -device=arm_cpu -mtriple=aarch64-linux-gnu -mattr=+v8.2a,+fullfp16,+fp-armv8,+neon",
29-
"eval_kwargs": {
30-
"number": 10,
31-
"repeat": 10,
32-
},
33-
"remote_kwargs": {
34-
"key": "android",
35-
"host": os.environ.get("TVM_TRACKER_HOST", "0.0.0.0"),
36-
"port": int(os.environ.get("TVM_TRACKER_PORT", 9190)),
37-
"build_func": "ndk",
38-
"timeout": 600,
39-
},
40-
"cc_opts": ["-O3", "-march=armv8.2a+fp16", "-mllvm", "-inline-threshold=10000"],
41-
"out_dtype": "float16",
42-
"aggregation_dtype": "int32",
43-
},
44-
"intel_win": {
45-
"target": "llvm -mtriple=x86_64-pc-windows-msvc -mcpu=core-avx2",
46-
"eval_kwargs": {
47-
"number": 10,
48-
"repeat": 10,
49-
},
50-
"remote_kwargs": None,
51-
# TODO: check if inline-threshold is needed for other devices
52-
"cc_opts": ["-O3", "-march=native", "-mllvm", "-inline-threshold=10000"],
53-
"out_dtype": "float32",
54-
"aggregation_dtype": "int32",
55-
},
56-
"intel_linux": {
57-
"target": "llvm -mtriple=x86_64-unknown-linux-gnu -mcpu=core-avx2",
58-
"eval_kwargs": {
59-
"number": 10,
60-
"repeat": 10,
61-
},
62-
"remote_kwargs": None,
63-
# TODO: check if inline-threshold is needed for other devices
64-
"cc_opts": ["-O3", "-march=native", "-mllvm", "-inline-threshold=10000"],
65-
"out_dtype": "float32",
66-
"aggregation_dtype": "int32",
67-
},
68-
"jetson": {
69-
"target": "llvm -device=arm_cpu -mtriple=aarch64-linux-gnu -mattr=+v8.2a,+fullfp16,+fp-armv8,+neon",
70-
"eval_kwargs": {
71-
"number": 10,
72-
"repeat": 10,
73-
},
74-
"remote_kwargs": None,
75-
"cc_opts": ["-O3", "-std=c++17", "-march=armv8.2a+fp16", "-mllvm", "-inline-threshold=10000"],
76-
"out_dtype": "float16",
77-
"aggregation_dtype": "int32",
78-
},
79-
"arm_win": {
80-
"target": "llvm -device=arm_cpu -mtriple=aarch64-pc-windows-msvc -mattr=+v8.2a,+fullfp16,+fp-armv8,+neon",
81-
"eval_kwargs": {
82-
"number": 10,
83-
"repeat": 10,
84-
},
85-
"remote_kwargs": None,
86-
"cc_opts": ["-O3", "-std=c++17", "-march=armv8.2a+fp16", "-mllvm", "-inline-threshold=10000"],
87-
"out_dtype": "float16",
88-
"aggregation_dtype": "int32",
89-
},
90-
}
91-
92-
93-
def get_devices():
94-
return list(_device_kwargs.keys())
95-
96-
97-
_platform_device_default_map = {
98-
("Darwin", "aarch64"): "m2",
99-
("Linux", "aarch64"): "jetson",
100-
("Windows", "x86_64"): "intel_win",
101-
("Linux", "x86_64"): "intel_linux",
102-
("Windows", "aarch64"): "arm_win",
103-
}
104-
105-
106-
def get_default_device_kwargs(device: str = ""):
107-
if device == "":
108-
device = _platform_device_default_map[get_system_info()]
109-
return copy.deepcopy(_device_kwargs.get(device, {}))
3+
from .platform import * # for backward compatibility
1104

1115

1126
def get_bits_alphas(bits: int):
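
Reviewer note on the backward-compatibility import: `from module import *` skips names that begin with an underscore unless the module lists them in `__all__`. Assuming `platform.py` defines no `__all__` (the diff does not show one), code that imported `_device_kwargs` from `t_mac.utils` would no longer find it there, while `get_devices` and `get_default_device_kwargs` remain available. The sketch below demonstrates this rule with a throwaway module named `fake_platform`, which is purely hypothetical.

```python
import sys
import types

# Build a throwaway module with one private and one public name.
fake = types.ModuleType("fake_platform")
exec(
    "_device_kwargs = {'m2': {}}\n"
    "def get_devices():\n"
    "    return list(_device_kwargs)\n",
    fake.__dict__,
)
sys.modules["fake_platform"] = fake

# Run a star import into a fresh namespace and inspect what arrived.
namespace = {}
exec("from fake_platform import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))

print(exported)
```

The public function still works after the star import because it resolves `_device_kwargs` in its defining module, not in the importing one.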
