potiuk
Member

@potiuk potiuk commented Sep 6, 2025

Previously we only used the prek cache from canary builds. Since we are starting to use several separate prek runs, it makes sense to install the prek hooks only once, store them in the cache, and reuse them even within the same build.

This PR removes the "only-canary" prek cache preparation. Now all builds, including all PRs from forks, prepare the prek cache once and upload it as an artifact; restoring the prek cache for every prek run should then be much faster.


@boring-cyborg boring-cyborg bot added area:dev-tools backport-to-v3-0-test Mark PR with this label to backport to v3-0-test branch labels Sep 6, 2025
@potiuk
Member Author

potiuk commented Sep 6, 2025

I need to measure the gain, but I feel it's worth it

@potiuk potiuk changed the title Improving cache usage for prek hooks Improve cache usage for prek hooks Sep 6, 2025
@potiuk potiuk force-pushed the improve-caching-for-prek branch 3 times, most recently from a90be24 to a977405 Compare September 6, 2025 16:11
@jscheffl
Contributor

jscheffl commented Sep 6, 2025

I'd be interested in the gain. Do we actually have "endless" storage? Do we need to purge this at some time? How can cache be invalidated in case of "The problem is not DNS but cache corruption"?

@potiuk
Member Author

potiuk commented Sep 6, 2025

cc: @jscheffl

> I'd be interested in the gain. Do we actually have "endless" storage? Do we need to purge this at some time? How can cache be invalidated in case of "The problem is not DNS but cache corruption"?

  • installing uv + prek without cache: 1m 18s
  • installing uv + prek with cache: 12 s

Certainly worth it.

The stash/restore action uses artifacts from the current build and falls back to the "main" build if the current one is not available. Even the "current" build initially uses the "main" build as its source of cache.
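The lookup order described here can be sketched roughly as follows. This is only an illustration of the fallback idea, not the action's actual code: `resolve_stash` and the two dictionaries standing in for artifact listings are hypothetical names.

```python
from __future__ import annotations

from typing import Mapping


def resolve_stash(
    key: str,
    current_run: Mapping[str, str],
    main_branch: Mapping[str, str],
) -> tuple[str, str] | None:
    """Return (source, artifact) for `key`, preferring the current build.

    `current_run` and `main_branch` are hypothetical stand-ins for the
    artifacts uploaded by the current build and by the latest "main" build.
    """
    if key in current_run:
        return ("current", current_run[key])
    if key in main_branch:
        # First job of a build: nothing uploaded yet, so reuse "main".
        return ("main", main_branch[key])
    # Cache miss: the hooks have to be installed from scratch.
    return None
```

Under this sketch, the first prek job of a PR build would resolve to the "main" artifact, and later jobs in the same build would pick up the artifact that build uploaded itself.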

We have automated retention of the cache: each artifact is kept for 2 days (`retention-days: '2'`), so with our "velocity" that's more than enough :)

We can invalidate the cache by bumping v6 to v7 and so on; this is how we can force cache invalidation. Also, any change to `.pre-commit-config.yaml` will change the hash and invalidate the cache. In the future, when we get the monorepo, we will hash all the files matching `**/.pre-commit-config.yaml`, so the hash will be calculated from all of those config files. :)
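As a rough illustration of that key scheme (a version prefix plus a hash over the config files), here is a minimal Python sketch. The key layout `cache-prek-<version>-<hash>`, the function name, and the 16-character truncation are assumptions made for the example; the real workflow computes its key in YAML and may differ.

```python
import hashlib
from pathlib import Path

# Bumping this (e.g. "v6" -> "v7") forces every existing cache to be ignored.
CACHE_VERSION = "v6"


def prek_cache_key(repo_root: Path, version: str = CACHE_VERSION) -> str:
    """Build a cache key from a version prefix and a hash of all prek configs.

    Hypothetical helper: the key format is an assumption, not the workflow's.
    """
    digest = hashlib.sha256()
    # Sort so the key does not depend on filesystem traversal order.
    for config in sorted(repo_root.glob("**/.pre-commit-config.yaml")):
        # Include the relative path so moving a config also changes the key.
        digest.update(config.relative_to(repo_root).as_posix().encode())
        digest.update(b"\0")
        digest.update(config.read_bytes())
        digest.update(b"\0")
    return f"cache-prek-{version}-{digest.hexdigest()[:16]}"
```

With a key like this, editing any matched config file or bumping the version yields a new key, so stale caches are simply never looked up and expire on their own through retention.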

Very good question BTW.

Contributor

@jscheffl jscheffl left a comment

Thanks for the answers and aaaah, now I see it in the YAML (in German we say "Tomatoes on my eyes..."): `retention-days: '2'` and the key for the cache.

Great!

@potiuk potiuk force-pushed the improve-caching-for-prek branch from a977405 to 3e47332 Compare September 6, 2025 21:10
@potiuk
Member Author

potiuk commented Sep 6, 2025

> Thanks for the answers and aaaah, now I see it in the YAML (in German we say "Tomatoes on my eyes..."): `retention-days: '2'` and the key for the cache.

Indeed @assignUser had done a very good job with the action we are using.

BTW, Jacob is known as Jacob Wujciak-Jens ... so you have something in common @jscheffl :D

path: /tmp/cache-prek.tar.gz
if-no-files-found: 'error'
retention-days: '2'
if: inputs.save-cache == 'true'
Member

If we now default to uploading the cache as an artifact for every build, do we still need this check?

Member

@gopidesupavan gopidesupavan left a comment

LGTM :) nice

Labels
area:dev-tools backport-to-v3-0-test Mark PR with this label to backport to v3-0-test branch
5 participants