Collaborator

@petrex petrex commented Sep 18, 2025

TLDR: Refactor setup.py for lazy loading and build optimization

This pull request refactors the setup.py build process to significantly reduce import overhead and improve build-time performance, especially for users who do not need to build C++/CUDA extensions. The main strategy is to defer heavy imports (like torch and torch.utils.cpp_extension) and submodule checks until they are actually needed at build time, rather than at import time. Additionally, the extension discovery and build logic is restructured for efficiency and maintainability.

Key changes include:

Build-time Import Optimization:

  • Defers heavy imports (such as torch and torch.utils.cpp_extension) from the top-level module scope to the specific functions or build steps where they are actually required. This reduces unnecessary overhead when running non-build commands or simply importing the package. [1] [2] [3] [4]
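A minimal sketch of the deferred-import idea, not the code from this PR. The helper names `lazy_module` and `serialize_sources` are hypothetical, and `json` stands in for a heavy dependency such as `torch` so the snippet runs anywhere:

```python
import functools
import importlib


@functools.lru_cache(maxsize=None)
def lazy_module(name):
    """Import `name` on first use and cache the resulting module object."""
    return importlib.import_module(name)


def serialize_sources(sources):
    # The dependency is resolved here, inside the step that needs it,
    # rather than at module scope. `json` is only a stand-in for `torch`.
    json = lazy_module("json")
    return json.dumps(sorted(sources))


serialized = serialize_sources(["b.cpp", "a.cpp"])
```

Commands that never call `serialize_sources` never pay for the import, which is the effect described above for non-build commands.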

Custom Build Extension Logic:

  • Introduces a new LazyTorchAOBuildExt class that inherits from setuptools's build_ext and dynamically morphs into the real BuildExtension from torch.utils.cpp_extension only when building extensions. This ensures that submodule checks and extension discovery happen only when necessary. [1] [2] [3]
  • Moves submodule checks and extension discovery from import time to build time, further reducing unnecessary work for non-build operations.
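One way to get this "morph at build time" behaviour is to reassign the instance's class inside `run()`. The sketch below illustrates that pattern only: `BuildCommand`, `RealBuildCommand`, `load_real_command`, and `discover_extensions` are hypothetical stand-ins for `build_ext`, `BuildExtension`, the deferred import, and `get_extensions()`, and the PR's actual mechanism may differ.

```python
calls = []  # records when the deferred work actually happens


class BuildCommand:
    """Simplified stand-in for a setuptools build_ext command."""

    def __init__(self, extensions=None):
        self.extensions = list(extensions or [])

    def run(self):
        return [f"built {name}" for name in self.extensions]


class RealBuildCommand(BuildCommand):
    """Stand-in for the heavy implementation (e.g. torch's BuildExtension)."""

    def run(self):
        return [f"built {name} with real backend" for name in self.extensions]


def load_real_command():
    calls.append("load")  # the expensive import would happen here
    return RealBuildCommand


def discover_extensions():
    calls.append("discover")  # submodule checks and source scans would go here
    return ["_C"]


class LazyBuildCommand(BuildCommand):
    def run(self):
        real_cls = load_real_command()
        self.extensions = discover_extensions()
        # Morph this instance into the real command, then delegate to it.
        self.__class__ = real_cls
        return self.run()


command = LazyBuildCommand()       # cheap: nothing has been loaded yet
calls_before_run = list(calls)
result = command.run()             # the deferred work happens only now
```

Constructing the command stays cheap, so an empty `ext_modules` list at `setup()` time costs nothing until a build is actually requested.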

Extension Discovery and Build Improvements:

  • Defers extension discovery (get_extensions()) to build time rather than at setup import, and initializes ext_modules as an empty list in the setup() call.
  • Improves logic for locating the torch CMake directory by using importlib.util.find_spec instead of distutils.sysconfig.get_python_lib, making it more robust and future-proof.
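For illustration, a package directory can be located through `importlib.util.find_spec` without importing the package. The helper name `package_subdir` is hypothetical, and the `share/cmake` layout mentioned in the comment is an assumption about how torch is installed, not something taken from this PR:

```python
import importlib.util
from pathlib import Path


def package_subdir(package, *parts):
    """Return <package root>/<parts...> without importing the package,
    or None when the package cannot be found."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    root = Path(list(spec.submodule_search_locations)[0])
    return root.joinpath(*parts)


# Stand-in lookups so the sketch runs without torch installed.
json_dir = package_subdir("json")
missing = package_subdir("surely_not_an_installed_package_xyz")

# Assumed usage for torch: package_subdir("torch", "share", "cmake")
```

Returning `None` for a missing package lets the caller decide whether to skip the CMake step or fail with a clear message.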

Debugging and Verbosity Enhancements:

  • Adds more granular and conditional debug print statements throughout the build process, controlled by the VERBOSE_BUILD environment variable or debug mode. This helps with diagnosing build issues without cluttering normal output. [1] [2] [3]
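A small sketch of environment-gated debug output. Only the `VERBOSE_BUILD` variable name comes from the description above; the accepted values, the message prefix, and the helper names are assumptions:

```python
import os
import sys


def verbose_enabled(environ=None):
    """Return True when VERBOSE_BUILD is set to a truthy value (assumed set)."""
    environ = os.environ if environ is None else environ
    value = environ.get("VERBOSE_BUILD", "0").strip().lower()
    return value in {"1", "true", "yes", "on"}


def debug_print(message, environ=None):
    """Print `message` to stderr only in verbose mode; report whether it printed."""
    if verbose_enabled(environ):
        print(f"[build debug] {message}", file=sys.stderr)
        return True
    return False


printed_quiet = debug_print("scanning sources", environ={})
printed_verbose = debug_print("scanning sources", environ={"VERBOSE_BUILD": "1"})
```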

Package Discovery and Metadata:

  • Changes find_packages to only include torchao* packages, making package discovery more precise.
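To see what a `torchao*` include pattern would select, the sketch below applies the same shell-style matching to a hypothetical list of package names. It uses `fnmatch` from the standard library as an approximation of how `find_packages(include=...)` filters names, so treat it as an illustration rather than the setuptools implementation:

```python
from fnmatch import fnmatchcase


def select_packages(candidates, include=("torchao*",)):
    """Keep only the dotted package names matching one of the include patterns."""
    return [
        name
        for name in candidates
        if any(fnmatchcase(name, pattern) for pattern in include)
    ]


# Hypothetical package names found in a repository checkout.
candidates = ["torchao", "torchao.quantization", "test", "benchmarks", "torchao_extras"]
selected = select_packages(candidates)
```

Note that the pattern has no dot before the wildcard, so a sibling package such as the hypothetical `torchao_extras` would also match.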

These changes together make the build process faster and less error-prone for users who do not need to build extensions, while retaining full build functionality for those who do.

- Introduced lazy imports for heavy dependencies like `torch` and `torch.utils.cpp_extension` to reduce initial import overhead.
- Replaced the existing `TorchAOBuildExt` class with `LazyTorchAOBuildExt` to defer submodule checks and extension discovery until build time.
- Updated the `setup()` function to set `ext_modules` to an empty list, deferring extension discovery for performance improvements.
- Enhanced debug output for build processes based on environment variables.

This refactor aims to streamline the build process and improve performance during package setup.

pytorch-bot bot commented Sep 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3024

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures

As of commit d0e86c2 with merge base 9e5059e (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Sep 18, 2025
@petrex petrex added the topic: not user facing and enhancement labels Sep 18, 2025
- Added blank lines for improved separation of code blocks.
- Reformatted list comprehensions for better clarity.
- Adjusted line breaks in function calls to enhance readability.

These changes aim to make the code more maintainable and easier to navigate.
@msaroufim
Member

@petrex what kinds of build time speedups are you seeing? The import time for ao is indeed insane and our setup.py has gotten really complicated in the last year and could use some love

@petrex
Collaborator Author

petrex commented Sep 18, 2025

In a quick test in my environment, I am seeing a 0.5–1.5s faster cold start.
This comes from:

  • Avoided torch import
  • Avoided recursive source/glob scans
  • Avoided submodule checks unless actually building.

Build time itself is unchanged; only pre-build overhead is reduced.

@petrex
Collaborator Author

petrex commented Sep 18, 2025

Looks like I accidentally broke some other things; let me look into that later this week.

…kflow

- Introduced the `PIP_NO_BUILD_ISOLATION` environment variable in the `build_wheels_linux.yml` workflow to ensure that the `setuptools` installed in the `pre_build_script.sh` is accessible during the build process.

This change aims to improve the build process by allowing the use of the correct version of `setuptools` without isolation issues.
- Removed the `PIP_NO_BUILD_ISOLATION` environment variable from the `build_wheels_linux.yml` workflow.
- Added the `PIP_NO_BUILD_ISOLATION` export to the `env_var_script_linux.sh` to ensure pre-installed tools are accessible during the build process.

These changes aim to streamline the build environment and maintain consistency in the usage of environment variables.
- Added a check to ensure that auditwheel is only executed if the wheel contains at least one shared object (.so) file.
- Included a message to indicate when auditwheel is skipped due to the absence of shared libraries.
- Updated the wheel removal command to use `rm -f` for safer deletion.

These changes improve the robustness of the post-build process by preventing unnecessary execution of auditwheel.
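The check described in this commit can be sketched in Python as follows. The real post-build step is presumably a shell script, so this only illustrates the logic; the helper names and the wheel file names are hypothetical:

```python
import os
import tempfile
import zipfile


def wheel_has_shared_objects(wheel_path):
    """Return True when the wheel archive contains at least one .so file."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return any(name.endswith(".so") for name in wheel.namelist())


def _make_wheel(path, members):
    # Test helper: a wheel is a zip archive, so empty members are enough here.
    with zipfile.ZipFile(path, "w") as wheel:
        for name in members:
            wheel.writestr(name, b"")


with tempfile.TemporaryDirectory() as tmp:
    pure = os.path.join(tmp, "pure-0.1-py3-none-any.whl")
    native = os.path.join(tmp, "native-0.1-cp311-cp311-linux_x86_64.whl")
    _make_wheel(pure, ["pkg/__init__.py"])
    _make_wheel(native, ["pkg/__init__.py", "pkg/_C.abi3.so"])
    pure_needs_repair = wheel_has_shared_objects(pure)      # auditwheel skipped
    native_needs_repair = wheel_has_shared_objects(native)  # auditwheel runs
```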
@petrex petrex self-assigned this Sep 18, 2025
- Updated the script to determine the original wheel file produced by the build process, ensuring that the correct wheel is used for auditwheel operations.
- Changed the wheel installation command to select the most recent wheel file from the distribution directory.
- Enhanced logging messages to reflect the changes in wheel handling.

These modifications enhance the reliability and clarity of the post-build process.
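Selecting the most recent wheel can be sketched like this, assuming "most recent" means the newest modification time. The helper name `newest_wheel` and the file names are hypothetical, and the actual script may select the wheel differently:

```python
import os
import tempfile
from pathlib import Path


def newest_wheel(dist_dir):
    """Return the most recently modified .whl in dist_dir, or None if there is none."""
    wheels = list(Path(dist_dir).glob("*.whl"))
    if not wheels:
        return None
    return max(wheels, key=lambda path: path.stat().st_mtime)


with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp, "pkg-0.1-py3-none-any.whl")
    new = Path(tmp, "pkg-0.2-py3-none-any.whl")
    old.touch()
    new.touch()
    # Force distinct modification times so the result is deterministic.
    os.utime(old, (1_000, 1_000))
    os.utime(new, (2_000, 2_000))
    picked = newest_wheel(tmp).name

    empty_dir = Path(tmp, "empty")
    empty_dir.mkdir()
    empty_result = newest_wheel(empty_dir)
```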
@petrex
Collaborator Author

petrex commented Sep 18, 2025

Hey @msaroufim,

I'm seeing this in CI. Is that something you could help with?
torchao::unpack_tensor_core_tiled_layout' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions.

@petrex petrex mentioned this pull request Sep 19, 2025
- Updated the logic to skip CUDA extension compilation only if both CUDA_HOME is unset and nvcc is not found, improving compatibility with CI environments.
- Adjusted the condition for building CUDA extensions to check for the presence of either CUDA_HOME or nvcc.

These changes aim to provide clearer messaging and better support for CUDA extension compilation in various environments.
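The condition described in this commit amounts to "build CUDA extensions if either signal is present". A minimal sketch, with a hypothetical function name and injectable lookups so it can be exercised without a CUDA toolkit:

```python
import os
import shutil


def should_build_cuda(environ=None, which=shutil.which):
    """Build CUDA extensions when CUDA_HOME is set or nvcc is on PATH.

    Compilation is skipped only when both signals are missing.
    """
    environ = os.environ if environ is None else environ
    has_cuda_home = bool(environ.get("CUDA_HOME"))
    has_nvcc = which("nvcc") is not None
    return has_cuda_home or has_nvcc


# Deterministic checks using fake lookups instead of the real environment.
neither = should_build_cuda({}, which=lambda _name: None)
home_only = should_build_cuda({"CUDA_HOME": "/usr/local/cuda"}, which=lambda _name: None)
nvcc_only = should_build_cuda({}, which=lambda _name: "/usr/bin/nvcc")
```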