
Conversation


@abhishek-singh591 commented on Aug 23, 2025

Optimized ONNX Transform via Class Merging and Thread Pooling

This PR follows up on #539 – Optimized ONNX transform class via multithreading.

It merges the FP16 and Split ONNX transform classes into a single implementation to eliminate redundant tensor loading and iteration. Additionally, the transform logic has been refactored to use a thread pool in place of the previous sequential loop, parallelizing the per-tensor operations.

Performance benchmarks:

| Model | Original Duration (s) | Optimized Duration (s) |
| --- | --- | --- |
| LLaMA 3.1 8B | 88.35 | 58.55 |
| LLaMA 3.1 70B | 1029.82 | 727.37 |

Note: Thread count is set to os.cpu_count() * 4 to better handle I/O-bound workloads. Performance may vary depending on system hardware and threading capabilities.
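The thread-pool approach described above can be sketched roughly as follows. This is an illustrative sketch only, not the PR's actual code: the `process_tensor` helper and the tensor names are hypothetical stand-ins for the real FP16/split work.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def process_tensor(name):
    # Placeholder for per-tensor work (e.g. FP16 clipping and/or
    # external-data splitting); here it just returns the name's length.
    return name, len(name)

tensor_names = [
    "model.layers.0.q_proj.weight",
    "model.layers.0.k_proj.weight",
]

# Oversubscribe threads relative to CPU count for I/O-bound work,
# as the PR note suggests.
max_workers = (os.cpu_count() or 1) * 4
with ThreadPoolExecutor(max_workers=max_workers) as pool:
    results = dict(pool.map(process_tensor, tensor_names))
```

Because the per-tensor work is dominated by disk I/O (loading external tensor data), a thread pool larger than the CPU count can keep the disk busy even under the GIL.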

@ochougul (Contributor) left a comment:


LGTM. Can merge if CI is passing.

@abhishek-singh591 force-pushed the optimized_onnx_tranform branch 2 times, most recently from 00fcaf2 to 9b9c41d, on September 5, 2025.
```python
if not apply_clip and not apply_split:
    warnings.warn("Both apply_clip and apply_split are False. Skipping transformation.")
    return model, False

external_data_helper.load_external_data_for_model(model, onnx_base_dir)
```
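A minimal sketch of a merged entry point around the guard clause above; the function name and signature here are assumptions for illustration, not the PR's actual API.

```python
import warnings

def transform(model, apply_clip=True, apply_split=True):
    # Guard clause: if neither transform is requested, warn and return
    # the model unchanged, with the "transformed" flag set to False.
    if not apply_clip and not apply_split:
        warnings.warn("Both apply_clip and apply_split are False. Skipping transformation.")
        return model, False
    # ... a single pass over the model's (already loaded) tensors would
    # run here, applying clip and/or split work per tensor ...
    return model, True
```

Returning a `(model, transformed)` pair lets callers skip saving when nothing changed.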
A Contributor left a review comment on this diff:

Even though combining the transforms saves time in this case, it also reduces the flexibility we have over multiple transforms. In the future, if we need to add more transforms, the condition would become more complex, and a new transform would need to load the tensors again. I have added a few changes as part of #538 for memory cleanup and reducing peak memory usage. Can you check whether the same concepts can be used here?
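One way to keep the single tensor-load pass while preserving per-transform flexibility, as the reviewer's concern suggests, is to express each transform as a per-tensor function and compose them. This is a hypothetical sketch, not code from this PR or from #538; the function names are illustrative.

```python
def clip_transform(tensor):
    # Placeholder for FP16 clipping logic; identity here.
    return tensor

def split_transform(tensor):
    # Placeholder for external-data splitting logic; identity here.
    return tensor

def apply_transforms(tensors, transforms):
    # Single pass over the loaded tensors: each registered transform runs
    # on every tensor, so adding a new transform never re-loads the data.
    for name, tensor in tensors.items():
        for t in transforms:
            tensor = t(tensor)
        tensors[name] = tensor
    return tensors
```

With this shape, the `if apply_clip / apply_split` condition collapses into building the list of transforms to register, rather than branching inside the loop.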

@abhishek-singh591 (Author) replied:

Yeah, it's kind of a tradeoff between time and flexibility. Let me check that; I'll get back to you.
