
Conversation


@abhishek-singh591 commented on Aug 23, 2025

Optimized ONNX Transform via Class Merging and Thread Pooling

This PR follows up on #539 – Optimized ONNX transform class via multithreading.

It merges the FP16 and Split ONNX transform classes into a single implementation to eliminate redundant tensor loading and iteration. Additionally, the transform logic has been refactored to use a thread pool in place of the previous sequential loop, parallelizing the per-tensor operations.

Performance benchmarks:

| Model | Original Duration (s) | Optimized Duration (s) |
| --- | --- | --- |
| LLaMA 3.1 8B | 88.35 | 58.55 |
| LLaMA 3.1 70B | 1029.82 | 727.37 |

Note: Thread count is set to os.cpu_count() * 4 to better handle I/O-bound workloads. Performance may vary depending on system hardware and threading capabilities.
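The thread-pool approach described above can be sketched roughly as follows. This is an illustrative sketch only, not the PR's actual code: the `process_tensor` helper and the tensor names are hypothetical stand-ins for the real FP16/split work.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def process_tensor(name):
    # Placeholder for per-tensor work (e.g. FP16 clipping and/or
    # external-data splitting); here it just returns the name's length.
    return name, len(name)

tensor_names = [
    "model.layers.0.q_proj.weight",
    "model.layers.0.k_proj.weight",
]

# Oversubscribe threads relative to CPU count for I/O-bound work,
# as the PR note suggests.
max_workers = (os.cpu_count() or 1) * 4
with ThreadPoolExecutor(max_workers=max_workers) as pool:
    results = dict(pool.map(process_tensor, tensor_names))
```

Because the per-tensor work is dominated by disk I/O (loading external tensor data), a thread pool larger than the CPU count can keep the disk busy even under the GIL.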

@ochougul (Contributor) left a comment:


LGTM. Can merge if CI is passing.

@abhishek-singh591 force-pushed the optimized_onnx_tranform branch 2 times, most recently from 00fcaf2 to 9b9c41d, on September 5, 2025.
```python
if not apply_clip and not apply_split:
    warnings.warn("Both apply_clip and apply_split are False. Skipping transformation.")
    return model, False

external_data_helper.load_external_data_for_model(model, onnx_base_dir)
```
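A minimal sketch of a merged entry point around the guard clause above; the function name and signature here are assumptions for illustration, not the PR's actual API.

```python
import warnings

def transform(model, apply_clip=True, apply_split=True):
    # Guard clause: if neither transform is requested, warn and return
    # the model unchanged, with the "transformed" flag set to False.
    if not apply_clip and not apply_split:
        warnings.warn("Both apply_clip and apply_split are False. Skipping transformation.")
        return model, False
    # ... a single pass over the model's (already loaded) tensors would
    # run here, applying clip and/or split work per tensor ...
    return model, True
```

Returning a `(model, transformed)` pair lets callers skip saving when nothing changed.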
A Contributor left a review comment on this diff:

Even though combining the transforms saves time in this case, it also reduces the flexibility we have over multiple transforms. In the future, if we need to add more transforms, the condition would become more complex, and a new transform would need to load the tensors again. I have added a few changes as part of #538 for memory cleanup and reducing peak memory usage. Can you check whether the same concepts can be used here?
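One way to keep the single tensor-load pass while preserving per-transform flexibility, as the reviewer's concern suggests, is to express each transform as a per-tensor function and compose them. This is a hypothetical sketch, not code from this PR or from #538; the function names are illustrative.

```python
def clip_transform(tensor):
    # Placeholder for FP16 clipping logic; identity here.
    return tensor

def split_transform(tensor):
    # Placeholder for external-data splitting logic; identity here.
    return tensor

def apply_transforms(tensors, transforms):
    # Single pass over the loaded tensors: each registered transform runs
    # on every tensor, so adding a new transform never re-loads the data.
    for name, tensor in tensors.items():
        for t in transforms:
            tensor = t(tensor)
        tensors[name] = tensor
    return tensors
```

With this shape, the `if apply_clip / apply_split` condition collapses into building the list of transforms to register, rather than branching inside the loop.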

@abhishek-singh591 (Author) replied:

Yeah, it's kind of a tradeoff between time and flexibility. Let me check that; I'll get back to you.
