-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Use ccache for emscripten library build #25392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
EMCC_CORES: 16 | ||
EMCC_USE_NINJA: 1 | ||
CCACHE_BASE_DIR: "ccache_dir" | ||
EM_COMPILER_WRAPPER: "ccache" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I saw that but I don't really see why it's necessary. I figured I'd try the easy thing first, and if it looks like it works, dig more into that. It may be that since we just want to use it for library compilation, it's enough.
@juj can you say more about why you wanted ccache to wrap the entire emscripten driver rather than just the underlying clang?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @dschuff is trying to integrate ccache in another way: in the backend between emcc and clang.
My emsdk support is placed in user -> ccache -> emcc -> clang
. This looks like is doing user -> emcc -> ccache -> clang
.
Would be fantastic to see what the performance difference of this approach ends up being.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance-wise, the main difference would obviously be that we still have to run all the python code of the driver. Since this is a compile and not a link, there's not a huge amount of stuff that gets done, but certainly there's a little cost. Probably the builtin profiling support could estimate how much. But mostly I just picked this way because I didn't want to bother with a fork of ccache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason was performance. Then it would also work for final link, and e.g. wrap over binaryen invocations of wasm-opt and so on.
However I have to say that in my approach, I found performance gains to be very small, so we didn't end up deploying it at Unity. I do have an itch to re-try though, and see if I could optimize the ccache implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, @juj, we are trying this out as way to speed up our CI here in circleci. Currently there is no caching so each needs to build everything from scratch.
We are looking at some kind of shared cache in combination with heuristic_clear_cache.py
, or maybe just relying on ccache to notice when llvm changes.
BTW, how do your building handle LLVM changes? I guess you somehow clobber the build when llvm changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I was under the impression that ccache doesn't work for linking or other cases that aren't basically just source -> object file. It just falls back to the underlying compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, how do your building handle LLVM changes?
My ccache port looked at git hash if it was detected (developer installation), or if not, then emscripten-version.txt.
Additionally the contents of EM_CONFIG was hashed into the state.
The implementation can be seen here: ccache/ccache@master...juj:ccache:emscripten
Ah, I was under the impression that ccache doesn't work for linking or other cases that aren't basically just source -> object file. It just falls back to the underlying compiler.
Err actually, now that I scan through my fork, I think you are right: ccache does not cache link commands, only compile commands. So my fork didn't end up helping cache any wasm-opt calls either.
I haven't worked on the ccache fork in a while now - it looks like something has changed in CMake that it does not build with CMake >= 4 anymore. So it would need some freshening to bring up to speed again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My plan is to just key on the git version of clang or the emsdk installed by CI. Since I'm just trying to cache clang's output and not emscripten's, I think everything should be included in the clang version, the flags, the file inputs (i.e. the headers and sources).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't currently actually know whether ccache automatically takes the compiler version into account or not, but in order to have CircleCI automatically save and restore the cache across builds, I have to give it a cache key anyway.
As of eaf7d42 the proof-of-concept seems to work and appears to cut the time for a hot rebuild approximately in half (to about 13.5 minutes down from 26). |
.circleci/config.yml
Outdated
name: "Save Ccache cache" | ||
paths: | ||
- ~/.ccache | ||
key: clang-{{ checksum "~/emsdk/clang_version.txt" }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this will mean that each clang version has its own cache, so that cache effectively be invalidated on each llvm roll.
However, what about changes outside the llvm? i.e. emscripten changes? How should we hangle them? How does ccache decided if the compiler itself has changed? Has it hash the compile binary or something like that? Or is it just the command line string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ccache is only caching the output of clang itself (not the output of all of emcc). So if we invalidate the cache each time clang changes, that solves the problem of deciding whether the compiler has changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and for emscripten changes: if clang's input changes, e.g. the sources of the libraries, then ccache handles that already with it usual hashing; and if something outside of that changes, then it doesn't affect the library build and we want a cache hit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could potentially even use it for caching the compiles for tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the link time will massively dominate that tests TBH
Nice, we should probably be using this caching for other things like node_modules and pip installs, but that is separate I suppose. |
OK, I think this version is usable. |
.circleci/config.yml
Outdated
condition: | ||
and: | ||
- equal: [ "main", << pipeline.git.branch >> ] | ||
- equal: [ "https://github.com/emscripten-core/emscripten", << pipeline.project.git_url >> ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth comment here? Why this condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally had it in there to avoid the possibility of having a cache update from a forked PR from the main branch of another repo. But I just tested and I'm not sure it's actually necessary, the branch always appears as "pull/12345". So I could just take it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I meant the overall condition? What is it trying to achieve? It looks like "Only preserve the cache when building main branch"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the idea is that the cache will be written only by the main branch and read by the other branches. That prevents PRs (which are untrusted) from being able to pollute caches used by other branches.
Use ccache to cache the output of clang when building the emscripten libraries.
When building on the main branch, save the ccache directory to the CircleCI cache affter
building, using the clang version as the key. When building from other branches (including PRs,
where the branch will appear as 'pull/12345') restore the cache before building.