-
Notifications
You must be signed in to change notification settings - Fork 381
Overlay: add automation ID to cache key #3080
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This commit adds automation ID to the overlay-base database cache key so that we properly distinguish different analyses in the same repo for the same language. Since I am changing the cache key format, I also moved the CodeQL bundle version to the end of the cache restore key, in case we want to remove it from the restore key sometime in the future. Note that I chose to leave CACHE_VERSION unchanged because the old and the new cache keys are sufficiently different that there should be no risk of confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances the overlay-base database caching mechanism by adding automation ID to the cache key to properly distinguish different analyses in the same repository for the same language. The change restructures the cache key format and includes hashing of additional components while moving the CodeQL bundle version to the end for future flexibility.
- Adds automation ID to cache key components for better analysis differentiation
- Introduces a hashing mechanism for cache key components to maintain manageable key length
- Restructures the cache key format and makes getCacheRestoreKey async
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
File | Description |
---|---|
src/overlay-database-utils.ts | Implements the core changes: adds automation ID to cache key, converts getCacheRestoreKey to async, and introduces component hashing |
src/overlay-database-utils.test.ts | Updates tests to mock getAutomationID function for the new async cache key generation |
lib/init-action.js | Generated JavaScript compilation output reflecting the TypeScript changes |
lib/analyze-action.js | Generated JavaScript compilation output reflecting the TypeScript changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, thank you for proactively tackling this problem! I agree with your reasoning for keeping the cache version the same as well.
I only have a few minor, non-blocking comments.
src/overlay-database-utils.ts
Outdated
function createCacheKeyHash(components: Record<string, any>): string { | ||
const componentsJson = JSON.stringify( | ||
components, | ||
Object.keys(components).sort(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: This makes sense, but it's not very obvious from the documentation for JSON.stringify
that the keys in the first argument will be stringified in the order given by the second argument, so I had to convince myself that this works as intended by actually trying it. Perhaps worth a small note here or a quick unit test that throws some permutations of an object at this function and checks that the resulting hash is the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good question. I read the JSON.stringify() documentation, and you are right that there is no mentioning of the replacer
array (2nd argument here) affecting the ordering of object properties.
What I found was a promise that object property ordering is already guaranteed to be stable without the replacer
array. So I copied that documentation into a code comment and removed the replacer
arugment.
// Technically we can also include languages and codeQlVersion in the | ||
// componentsHash, but including them explicitly in the cache key makes it | ||
// easier to debug and understand the cache key structure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: I was initially going to ask whether languages
should be hashed too (since it can potentially be a long list), but then I saw your comment here, which makes sense.
For completeness, another option to consider here might be to only hash languages
if there is more than one. That keeps the benefit of making it easier to understand for the "usual" case, while avoiding overly long keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion!
When I looked into it a while back, IIRC most analyses list 1 (most common) or 2 (much less common) languages. A handful have 3, and none have more. So I think I will keep things the way they are right now.
const componentsHash = createCacheKeyHash(cacheKeyComponents); | ||
|
||
// For a cached overlay-base database to be considered compatible for overlay | ||
// analysis, all components in the cache restore key must match: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recall if we discussed the possibility of using partial restore keys in the PR that added the overlay caches (I skimmed over it again, but didn't see anything). You have probably considered it already (since there's a comment further up talking about partial restore keys), but it could be worth adding once you are happy with how stable the overlay database functionality is (e.g. to use an existing overlay database when a new CodeQL CLI version is released without dropping the CodeQL version from the cache key).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the interesting comment! We will keep that in consideration.
src/overlay-database-utils.ts
Outdated
@@ -393,10 +395,14 @@ async function generateCacheKey( | |||
checkoutPath: string, | |||
): Promise<string> { | |||
const sha = await getCommitOid(checkoutPath); | |||
return `${getCacheRestoreKey(config, codeQlVersion)}${sha}`; | |||
const restoreKey = await getCacheRestoreKey(config, codeQlVersion); | |||
return `${restoreKey}${sha}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I am not sure I realised until now is that restoreCache
is happy to use the primary cache key as a prefix for restoring a cache. I think I was previously under the assumption that only the partial restore keys (if any) were prefix-matched. It might be worth adding a comment for this somewhere, since it might otherwise be non-obvious how restoring the cache can work if sha
is included here, but not when restoring the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it for quite a while. What is an effective way to highlight the prefix matching used for cache restore?
I ended up making three changes:
- Rename functions and variables to highlight the difference between "save key" and "restore key"
- Append "Prefix" to the restore key function and variable names to highlight the fact that they are key prefixes
- Add comment that the save key consists of the restore key prefix followed by the checkout SHA
Hopefully that will make things clearer!
src/overlay-database-utils.ts
Outdated
async function getCacheRestoreKey( | ||
config: Config, | ||
codeQlVersion: string, | ||
): Promise<string> { | ||
// The restore key (prefix) specifies which cached overlay-base databases are | ||
// compatible with the current analysis: the cached database must have the | ||
// same cache version and the same CodeQL bundle version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this comment be removed now that there's a more detailed one further down / merged with that one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right that there is a lot of overlap between the two comment blocks, but they serve somewhat different purposes.
I think of this comment as mainly about "why are we returning only one key when Actions Cache supports multiple restore keys". The comment near the end of this function, in contrast, is about what goes into the restore key.
In any case, your point is well taken, and I updated the comments to remove the overlap.
This commit updates componentsJson computation to call JSON.stringify() without the replacer array and documents why the result is stable.
This PR adds automation ID to the overlay-base database cache key so that we properly distinguish different analyses in the same repo for the same language.
Since I am changing the cache key format, I also moved the CodeQL bundle version to the end of the cache restore key, in case we want to remove it from the restore key sometime in the future.
Note that I chose to leave
CACHE_VERSION
unchanged because the old and the new cache keys are sufficiently different that there should be no risk of confusion.Changes in this PR has been validated in an internal test repository.
Risk assessment
For internal use only. Please select the risk level of this change:
Merge / deployment checklist