Refactor setup.py for lazy loading and build optimization #3024
base: main
- Introduced lazy imports for heavy dependencies like `torch` and `torch.utils.cpp_extension` to reduce initial import overhead.
- Replaced the existing `TorchAOBuildExt` class with `LazyTorchAOBuildExt` to defer submodule checks and extension discovery until build time.
- Updated the `setup()` function to set `ext_modules` to an empty list, deferring extension discovery for performance improvements.
- Enhanced debug output for build processes based on environment variables.

This refactor aims to streamline the build process and improve performance during package setup.
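A minimal sketch of the "lazy command that morphs into the real one" idea, assuming the morph is done by reassigning the instance's `__class__` at run time. It uses a plain base class instead of setuptools' `build_ext` so it runs without setuptools or torch installed; `LazyBuildExt`, `REAL_TARGET`, and `_load_real_class` are hypothetical names, not the PR's code.

```python
import importlib


class LazyBuildExt:
    """Hypothetical stand-in for a setuptools build_ext subclass.

    Only importlib is imported at module load; the real command class
    is resolved the first time run() is called.
    """

    # Assumption: (module, attribute) of the real build command. In the PR this
    # role is played by torch.utils.cpp_extension.BuildExtension.
    REAL_TARGET = ("torch.utils.cpp_extension", "BuildExtension")

    def _load_real_class(self):
        module_name, attr = self.REAL_TARGET
        # The expensive import happens here, i.e. at build time only.
        return getattr(importlib.import_module(module_name), attr)

    def run(self):
        real_cls = self._load_real_class()
        # Morph this instance into the real class. CPython allows this when both
        # classes share an instance layout (e.g. both derive from the same base).
        self.__class__ = real_cls
        # Attribute lookups, including run(), now resolve on the real class.
        return self.run()
```

In a real `setup.py` such a class would be registered through `cmdclass={"build_ext": ...}`. One caveat to verify: with `ext_modules=[]`, setuptools' `build` command normally schedules `build_ext` only when the distribution reports extension modules, so deferred discovery has to be hooked in somewhere that still runs.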
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3024
Note: Links to docs will display an error until the docs builds have been completed.
❌ 5 New Failures as of commit d0e86c2 with merge base 9e5059e.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
- Added blank lines for improved separation of code blocks.
- Reformatted list comprehensions for better clarity.
- Adjusted line breaks in function calls to enhance readability.

These changes aim to make the code more maintainable and easier to navigate.
@petrex, what kinds of build-time speedups are you seeing? The import time for ao is indeed insane, and our setup.py has gotten really complicated in the last year and could use some love.
Quick test in my env: I am seeing a 0.5–1.5 s faster cold start.
Build time itself is unchanged; only the pre-build overhead is reduced.
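One way to reproduce a cold-start comparison like this (not necessarily how it was measured here) is to time a fresh interpreter before and after the change. `time_fresh_interpreter` is a hypothetical helper; CPython's `-X importtime` flag gives a per-module breakdown if more detail is needed.

```python
import subprocess
import sys
import time


def time_fresh_interpreter(args, repeats=3):
    """Best wall-clock time (seconds) of running `python <args>` in a new process.

    A fresh process is used each time so nothing is already in sys.modules.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            [sys.executable, *args],
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        best = min(best, time.perf_counter() - start)
    return best
```

For example, `time_fresh_interpreter(["setup.py", "--version"])` from the repository root exercises the module-level code of `setup.py` without building anything, and `time_fresh_interpreter(["-c", "import torchao"])` measures package import time.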
Looks like I accidentally broke some other things; let me look into that later this week.
…kflow
- Introduced the `PIP_NO_BUILD_ISOLATION` environment variable in the `build_wheels_linux.yml` workflow so that the `setuptools` installed by `pre_build_script.sh` is available during the build.

This change lets the build use the intended version of `setuptools` instead of one installed into pip's isolated build environment.
- Removed the `PIP_NO_BUILD_ISOLATION` environment variable from the `build_wheels_linux.yml` workflow.
- Added a `PIP_NO_BUILD_ISOLATION` export to `env_var_script_linux.sh` so that pre-installed tools are accessible during the build.

These changes streamline the build environment and keep environment-variable usage consistent.
- Added a check so that auditwheel runs only if the wheel contains at least one shared object (`.so`) file.
- Added a message indicating when auditwheel is skipped because the wheel has no shared libraries.
- Changed the wheel removal command to `rm -f` for safer deletion.

These changes make the post-build process more robust by not running auditwheel when it is unnecessary.
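The post-build script itself is shell, but the check is easy to sketch in Python because a wheel is a zip archive. `wheel_has_shared_objects` and `should_run_auditwheel` are hypothetical names, the printed message is illustrative, and matching only entries whose names end in `.so` is an assumption.

```python
import io
import zipfile


def wheel_has_shared_objects(wheel) -> bool:
    """True if the wheel (a path or binary file object) contains any *.so entry."""
    with zipfile.ZipFile(wheel) as zf:
        return any(name.endswith(".so") for name in zf.namelist())


def should_run_auditwheel(wheel) -> bool:
    if wheel_has_shared_objects(wheel):
        return True
    # Illustrative message; the actual script's wording may differ.
    print("No shared libraries (.so) in wheel; skipping auditwheel.")
    return False


if __name__ == "__main__":
    # Build a tiny in-memory "wheel" that contains only Python sources.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("torchao/__init__.py", "")
    print(should_run_auditwheel(buf))
```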
- Updated the script to determine the original wheel file produced by the build, so that auditwheel operates on the correct wheel.
- Changed the wheel installation command to select the most recent wheel file in the distribution directory.
- Updated log messages to reflect the new wheel handling.

These modifications make the post-build process more reliable and its output clearer.
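The "most recent wheel" selection can be sketched in Python as picking the `*.whl` file with the newest modification time. `newest_wheel` is a hypothetical helper, and interpreting "most recent" as latest mtime is an assumption about what the script does.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def newest_wheel(dist_dir) -> Optional[Path]:
    """Return the most recently modified *.whl in dist_dir, or None if there is none."""
    wheels = list(Path(dist_dir).glob("*.whl"))
    if not wheels:
        return None
    return max(wheels, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dist:
        old = Path(dist, "torchao-0.0.1-py3-none-any.whl")
        new = Path(dist, "torchao-0.0.2-py3-none-any.whl")
        for p in (old, new):
            p.write_bytes(b"")
        # Give the files distinct, explicit modification times.
        os.utime(old, (1_000, 1_000))
        os.utime(new, (2_000, 2_000))
        print(newest_wheel(dist).name)  # torchao-0.0.2-py3-none-any.whl
```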
Hey @msaroufim, seeing this in CI; is that something you could help with?
- Updated the logic to skip CUDA extension compilation only when `CUDA_HOME` is unset and `nvcc` is not found, improving compatibility with CI environments.
- Adjusted the condition for building CUDA extensions to check for either `CUDA_HOME` or `nvcc`.

These changes provide clearer messaging and better support for CUDA extension compilation across environments.
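The condition above reduces to one boolean: build unless both signals are missing. A Python sketch, with `shutil.which` standing in for the PATH lookup; `should_build_cuda_extensions`, the printed message, and treating an empty `CUDA_HOME` as unset are assumptions. Passing `environ` and `which` as parameters keeps the check testable without a CUDA toolkit.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def should_build_cuda_extensions(
    environ: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> bool:
    """Build CUDA extensions unless CUDA_HOME is unset *and* nvcc is not on PATH."""
    env = os.environ if environ is None else environ
    # Assumption: an empty CUDA_HOME counts as unset.
    has_cuda_home = bool(env.get("CUDA_HOME"))
    has_nvcc = which("nvcc") is not None
    if not (has_cuda_home or has_nvcc):
        print("CUDA_HOME is not set and nvcc was not found; skipping CUDA extensions.")
        return False
    return True
```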
TLDR: Refactor setup.py for lazy loading and build optimization
This pull request refactors the `setup.py` build process to reduce import and pre-build overhead, especially for users who do not need to build the C++/CUDA extensions. The main strategy is to defer heavy imports (such as `torch` and `torch.utils.cpp_extension`) and submodule checks until they are actually needed at build time, rather than at import time. The extension discovery and build logic is also restructured for efficiency and maintainability. Key changes include:

Build-time Import Optimization:
- Moved heavy imports (`torch` and `torch.utils.cpp_extension`) from module scope into the specific functions or build steps that require them. This avoids the overhead when running non-build commands or simply importing the package.

Custom Build Extension Logic:
- Added a `LazyTorchAOBuildExt` class that inherits from setuptools' `build_ext` and morphs into the real `BuildExtension` from `torch.utils.cpp_extension` only when extensions are built, so submodule checks and extension discovery happen only when necessary.

Extension Discovery and Build Improvements:
- Deferred extension discovery (`get_extensions()`) to build time instead of setup import, and initialized `ext_modules` as an empty list in the `setup()` call.
- Located the `torch` CMake directory with `importlib.util.find_spec` instead of `distutils.sysconfig.get_python_lib`, which is more robust and future-proof.

Debugging and Verbosity Enhancements:
- Added extra build output controlled by the `VERBOSE_BUILD` environment variable or debug mode, which helps diagnose build issues without cluttering normal output.

Package Discovery and Metadata:
- Restricted `find_packages` to `torchao*` packages, making package discovery more precise.

Together, these changes make the build process faster and less error-prone for users who do not need to build extensions, while keeping full build functionality for those who do.
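The `find_spec`-based lookup and the `VERBOSE_BUILD`-gated output can be sketched as follows. `importlib.util.find_spec` locates a top-level package without importing it, and `distutils` was removed from the standard library in Python 3.12 (PEP 632), which is one reason to move off `distutils.sysconfig.get_python_lib`. The helper names, the `share/cmake/Torch` sub-path, and the `VERBOSE_BUILD` semantics (any value other than empty or `"0"`) are assumptions, not taken from the PR.

```python
import importlib.util
import os
from pathlib import Path
from typing import Optional


def verbose_build() -> bool:
    # Assumption: any value other than "" or "0" turns verbose output on.
    return os.environ.get("VERBOSE_BUILD", "0") not in ("", "0")


def debug_log(msg: str) -> None:
    """Print build diagnostics only when VERBOSE_BUILD is enabled."""
    if verbose_build():
        print(f"[setup.py] {msg}")


def find_package_dir(name: str) -> Optional[Path]:
    """Locate an installed top-level package's directory without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:  # not installed, or a namespace package
        return None
    return Path(spec.origin).parent


def find_torch_cmake_dir() -> Optional[Path]:
    torch_dir = find_package_dir("torch")
    if torch_dir is None:
        debug_log("torch is not installed; no CMake directory available")
        return None
    # Assumption: Torch's CMake config files live under torch/share/cmake/Torch.
    cmake_dir = torch_dir / "share" / "cmake" / "Torch"
    debug_log(f"torch CMake directory: {cmake_dir}")
    return cmake_dir
```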