Use ccache for emscripten library build #25392

dschuff · 2025-09-25T20:56:00Z

Use ccache to cache the output of clang when building the emscripten libraries.
When building on the main branch, save the ccache directory to the CircleCI cache after
building, using the clang version as the key. When building from other branches (including PRs,
where the branch will appear as 'pull/12345') restore the cache before building.

sbc100 · 2025-09-25T21:08:45Z

.circleci/config.yml

      EMCC_CORES: 16
      EMCC_USE_NINJA: 1
+      CCACHE_BASE_DIR: "ccache_dir"
+      EM_COMPILER_WRAPPER: "ccache"


I don't know if this is enough. I know that @juj had a custom patch for ccache and added a specific _EMCC_CCACHE feature to support it. See #13681

Yeah I saw that but I don't really see why it's necessary. I figured I'd try the easy thing first, and if it looks like it works, dig more into that. It may be that since we just want to use it for library compilation, it's enough.

@juj can you say more about why you wanted ccache to wrap the entire emscripten driver rather than just the underlying clang?

I think @dschuff is trying to integrate ccache in another way: in the backend between emcc and clang.

My emsdk support is placed in user -> ccache -> emcc -> clang. This looks like it is doing user -> emcc -> ccache -> clang.

Would be fantastic to see what the performance difference of this approach ends up being.
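The two placements differ only in which command ccache wraps. Below is a minimal sketch of the emcc -> ccache -> clang layout used in this PR. It assumes that emcc simply prefixes the EM_COMPILER_WRAPPER value onto each clang command line; the helper `clang_command` is hypothetical and is not emcc's actual code.

```python
import os
import shlex
from typing import Mapping, Optional


def clang_command(clang: str, args: list[str],
                  env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Build a clang command line, prefixing an optional compiler wrapper.

    Hypothetical helper for illustration; not emcc's actual implementation.
    """
    if env is None:
        env = os.environ
    cmd = [clang, *args]
    wrapper = env.get("EM_COMPILER_WRAPPER")
    if wrapper:
        # e.g. "ccache" -> ["ccache", "clang", ...]; shlex.split also allows
        # a wrapper value that carries its own arguments.
        cmd = shlex.split(wrapper) + cmd
    return cmd


# user -> emcc -> ccache -> clang: only the clang step goes through ccache.
print(clang_command("clang", ["-c", "memcpy.c", "-o", "memcpy.o"],
                    {"EM_COMPILER_WRAPPER": "ccache"}))
# user -> ccache -> emcc -> clang would instead wrap the whole driver call,
# e.g. ["ccache", "emcc", "-c", "memcpy.c", "-o", "memcpy.o"].
```

In the user -> ccache -> emcc layout, ccache sees the whole emcc invocation, which is why that approach also has to fold emscripten's own version and configuration into its hash.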

Performance-wise, the main difference would obviously be that we still have to run all the Python code of the driver. Since this is a compile and not a link, the driver doesn't do a huge amount of work, but there is certainly a little cost. The driver's built-in profiling support could probably estimate how much. But mostly I just picked this way because I didn't want to bother with a fork of ccache.

The reason was performance. That way it would also work for the final link, and could e.g. wrap binaryen's wasm-opt invocations and so on.

However I have to say that in my approach, I found performance gains to be very small, so we didn't end up deploying it at Unity. I do have an itch to re-try though, and see if I could optimize the ccache implementation.

BTW, @juj, we are trying this out as a way to speed up our CI here on CircleCI. Currently there is no caching, so each CI run has to build everything from scratch.

We are looking at some kind of shared cache in combination with heuristic_clear_cache.py, or maybe just relying on ccache to notice when llvm changes.

BTW, how do your builds handle LLVM changes? I guess you somehow clobber the build when LLVM changes?

Ah, I was under the impression that ccache doesn't work for linking or other cases that aren't basically just source -> object file. It just falls back to the underlying compiler.

> BTW, how do your builds handle LLVM changes?

My ccache port looked at the git hash if one was detected (developer installation), and otherwise at emscripten-version.txt.

Additionally, the contents of EM_CONFIG were hashed into the state.

The implementation can be seen here: ccache/ccache@master...juj:ccache:emscripten
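A minimal sketch of the identity hash described above, assuming the git hash (if any), the contents of emscripten-version.txt, and the contents of the EM_CONFIG file are already available as strings. `emscripten_identity` is a hypothetical name; the fork's actual implementation is at the link above.

```python
import hashlib
from typing import Optional


def emscripten_identity(git_hash: Optional[str], version_txt: str,
                        em_config: str) -> str:
    """Hash that changes whenever the emscripten install or its config changes.

    Hypothetical helper modelling the scheme described above; not the fork's code.
    """
    h = hashlib.sha256()
    if git_hash:
        # Developer checkout: the commit identifies the exact sources.
        h.update(b"git:" + git_hash.encode())
    else:
        # Release install: fall back to the contents of emscripten-version.txt.
        h.update(b"version:" + version_txt.encode())
    # NUL separator to keep the two fields apart.
    h.update(b"\0")
    # The contents of the EM_CONFIG file are mixed in as well.
    h.update(em_config.encode())
    return h.hexdigest()


# Placeholder values, for illustration only.
release = emscripten_identity(None, "1.2.3", "LLVM_ROOT = '/path/to/llvm/bin'\n")
dev = emscripten_identity("0123abcd", "1.2.3", "LLVM_ROOT = '/path/to/llvm/bin'\n")
print(release != dev)
```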

> Ah, I was under the impression that ccache doesn't work for linking or other cases that aren't basically just source -> object file. It just falls back to the underlying compiler.

Err actually, now that I scan through my fork, I think you are right: ccache does not cache link commands, only compile commands. So my fork didn't end up helping cache any wasm-opt calls either.

I haven't worked on the ccache fork in a while now. It looks like something has changed in CMake so that it no longer builds with CMake >= 4, so it would need some freshening to bring it up to speed again.

My plan is to just key on the git version of clang or the emsdk installed by CI. Since I'm just trying to cache clang's output and not emscripten's, I think everything relevant should be captured by the clang version, the flags, and the file inputs (i.e. the headers and sources).

I don't currently actually know whether ccache automatically takes the compiler version into account or not, but in order to have CircleCI automatically save and restore the cache across builds, I have to give it a cache key anyway.
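Whatever ccache does internally, the CircleCI key decides which cache a build restores. The PR's config keys the cache as `clang-{{ checksum "~/emsdk/clang_version.txt" }}`. The sketch below approximates that in Python, assuming `checksum` is a base64-encoded SHA-256 of the file's contents as CircleCI documents it; the exact string CircleCI produces may differ, and `clang_cache_key` is a hypothetical name.

```python
import base64
import hashlib
import tempfile
from pathlib import Path


def clang_cache_key(version_file: Path) -> str:
    """Approximation of the clang-{{ checksum "..." }} cache key template."""
    digest = hashlib.sha256(version_file.read_bytes()).digest()
    return "clang-" + base64.b64encode(digest).decode("ascii")


with tempfile.TemporaryDirectory() as tmp:
    version_file = Path(tmp) / "clang_version.txt"
    version_file.write_text("clang version A\n")  # placeholder contents
    key_a = clang_cache_key(version_file)
    version_file.write_text("clang version B\n")  # simulates an LLVM roll
    key_b = clang_cache_key(version_file)

# Same file contents -> same key -> the cache is restored;
# a new clang -> a new key -> a fresh, empty cache.
print(key_a != key_b)
```

Keying on the clang version means every LLVM roll starts a new cache, which is the invalidation behavior discussed further down the thread.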

dschuff · 2025-09-26T00:05:42Z

As of eaf7d42 the proof-of-concept seems to work and appears to cut the time for a hot rebuild approximately in half (from 26 minutes down to about 13.5).
It's not quite as fast as I'd hoped for a build that should have a 100% hit rate, but it's fast!
The cold build is also slower, at 36 minutes, so it's not completely a slam dunk. But it's looking promising.
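A quick back-of-the-envelope check of those numbers, assuming 26 minutes is the uncached baseline that both the hot (13.5 minute) and cold (36 minute) builds are compared against:

```python
import math

# Timings reported above, in minutes.
baseline = 26.0  # library build without ccache
hot = 13.5       # rebuild with a warm cache
cold = 36.0      # build with an empty cache (ccache overhead, no hits)

speedup = baseline / hot            # about 1.93x faster when hot
saved_per_hot = baseline - hot      # minutes saved per hot build
cold_penalty = cold - baseline      # extra minutes per cold build

# Number of hot builds needed to pay back one cold build.
break_even = math.ceil(cold_penalty / saved_per_hot)

print(f"speedup {speedup:.2f}x, saves {saved_per_hot} min, "
      f"cold costs {cold_penalty} min, break-even after {break_even} hot build(s)")
```

Under that assumption, a single warm-cache build roughly recovers the cost of one cold build; the overall balance depends on how often the cache is invalidated, e.g. by clang rolls.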

sbc100 · 2025-09-26T00:14:34Z

.circleci/config.yml

+          name: "Save Ccache cache"
+          paths:
+            - ~/.ccache
+          key: clang-{{ checksum "~/emsdk/clang_version.txt" }}


So this means that each clang version has its own cache, so the cache will effectively be invalidated on each LLVM roll.

However, what about changes outside LLVM, i.e. emscripten changes? How should we handle them? How does ccache decide whether the compiler itself has changed? Does it hash the compiler binary or something like that? Or is it just the command-line string?

ccache is only caching the output of clang itself (not the output of all of emcc). So if we invalidate the cache each time clang changes, that solves the problem of deciding whether the compiler has changed.

(and for emscripten changes: if clang's inputs change, e.g. the sources of the libraries, then ccache already handles that with its usual hashing; and if something outside of those inputs changes, then it doesn't affect the library build and we want a cache hit)
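A toy model of that argument: if the key covers only the compiler identity, the command line, and the contents of every input file, then changes elsewhere in emscripten cannot cause a miss, while a new clang or an edited header does. This is not ccache's real algorithm: ccache hashes either the preprocessed output or, in direct mode, the source plus every included file, and by default its `compiler_check` setting identifies the compiler by the binary's mtime and size. `toy_ccache_key` and the file contents below are made up for illustration.

```python
import hashlib


def toy_ccache_key(compiler_id: str, args: list[str],
                   inputs: dict[str, bytes]) -> str:
    """Toy ccache-style key: compiler identity + command line + input contents.

    Hypothetical illustration only; ccache's real hashing is more involved.
    """
    h = hashlib.sha256()
    h.update(compiler_id.encode() + b"\0")
    for arg in args:
        h.update(arg.encode() + b"\0")
    # Sort paths so the key does not depend on dict insertion order.
    for path in sorted(inputs):
        h.update(path.encode() + b"\0")
        h.update(hashlib.sha256(inputs[path]).digest())
    return h.hexdigest()


# Made-up library source and header.
srcs = {
    "memcpy.c": b'#include "string.h"\nint f(void) { return 0; }\n',
    "string.h": b"int f(void);\n",
}
args = ["-O2", "-c", "memcpy.c", "-o", "memcpy.o"]

base = toy_ccache_key("clang-A", args, srcs)
edited = {**srcs, "string.h": b"int f(void);\nint g(void);\n"}

print(base == toy_ccache_key("clang-A", args, srcs))    # unchanged inputs: hit
print(base == toy_ccache_key("clang-A", args, edited))  # edited header: miss
print(base == toy_ccache_key("clang-B", args, srcs))    # new clang: miss
```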

We could potentially even use it for caching the compiles for tests.

I think link time will massively dominate for the tests, TBH.

sbc100 · 2025-09-26T00:36:55Z

Nice, we should probably be using this caching for other things like node_modules and pip installs, but that is separate I suppose.

dschuff · 2025-09-26T22:47:24Z

OK, I think this version is usable.
It saves the cache when building from main, and restores it otherwise, using the clang version as the key.
We could potentially do something more fancy (e.g. if you have some change in your PR that updates a header or something, and it changes the inputs for all the library files, then you'll have to rebuild everything, every time you push to that PR. In theory we could try to cache between pushes in the same PR, but that would take some careful work).
But it would be interesting to see how it works for now.
Also this means the cache will get invalidated whenever a new version of clang rolls, which will result in fairly frequent invalidations. We could reduce the rate at which we roll clang, or we could change the rate at which the emsdk version synced into GitHub CI changes (right now we just always pull 'tot'). I'm not exactly sure how we'd do the latter, but it might be better if we can; that way we'd still get the benefit of the Chromium CI covering a smaller number of LLVM changes at once.

sbc100 · 2025-09-26T23:01:01Z

.circleci/config.yml

+          condition:
+            and:
+              - equal: [ "main", << pipeline.git.branch >> ]
+              - equal: [ "https://github.com/emscripten-core/emscripten", << pipeline.project.git_url >> ]


Maybe worth a comment here? Why this condition?

I originally had it in there to avoid the possibility of a cache update coming from a forked PR whose source is the main branch of another repo. But I just tested, and I'm not sure it's actually necessary: the branch always appears as "pull/12345". So I could just take it out.

I guess I meant the overall condition? What is it trying to achieve? It looks like "Only preserve the cache when building main branch"?

Yes, the idea is that the cache will be written only by the main branch and read by the other branches. That prevents PRs (which are untrusted) from being able to pollute caches used by other branches.
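A sketch of the save condition as a plain predicate, mirroring the two clauses in the config snippet above. `should_save_cache` and `check_repo` are hypothetical names, and the fork URL in the usage lines is made up.

```python
EMSCRIPTEN_REPO = "https://github.com/emscripten-core/emscripten"


def should_save_cache(branch: str, git_url: str, check_repo: bool = True) -> bool:
    """Return True if this build should write (save) the shared ccache.

    Only main-branch builds save; every other build, including untrusted PR
    builds, only restores, so PRs cannot pollute the cache others read.
    """
    if branch != "main":
        return False
    # Second clause from the original condition: also require the upstream
    # repository. Since PR builds report their branch as "pull/<number>",
    # this check may be redundant.
    return (not check_repo) or git_url == EMSCRIPTEN_REPO


# Hypothetical fork URL, for illustration only.
FORK = "https://github.com/someone/emscripten"
for branch, url in [("main", EMSCRIPTEN_REPO), ("pull/12345", EMSCRIPTEN_REPO), ("main", FORK)]:
    print(branch, url, "save" if should_save_cache(branch, url) else "restore")
```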

dschuff added 3 commits September 25, 2025 20:55

Try ccache for lib build

39d2b25

smaller build

68d1165

absolute directory, print stats, restore cache

3bc90a5

sbc100 reviewed Sep 25, 2025

dschuff added 11 commits September 25, 2025 21:17

use a new key

7e6a05f

rename phase

7f32ac3

fix env var

4312b65

debug ccache

7ae8ffd

unset CCACHE_DIR

6fe5544

name restore step

8156345

bigger test

010f7a2

rename

98ffe31

restore all steps

54d3efc

cache based on clang version

cf6fe42

remove extraneous change

eaf7d42

zero the stats before persisting the cache

588b0f8

sbc100 reviewed Sep 26, 2025

dschuff added 3 commits September 26, 2025 00:24

conditionalize main vs PR

be30413

do a minimal build

50466f5

try it with named branch

a408894

dschuff added 8 commits September 26, 2025 00:45

try with repo match

0ec58b1

update condition

e22c0f2

switch event

831178e

use name

6ee7b08

undo temporary changes

9a5f05a

Merge branch 'main' into ccache-libs

048bdf4

update condition

1ec1225

use PR_NUMBER

4f7de3d

dschuff added 3 commits September 26, 2025 22:30

use not-main

c79c297

remove debug prints

1756551

fix posixtestsuite

e9ad070

dschuff changed the title from "Try ccache for lib build" to "Use ccache for emscripten library build" Sep 26, 2025

dschuff marked this pull request as ready for review September 26, 2025 22:41

sbc100 approved these changes Sep 26, 2025

remove unnecessary condition, and add comment

dfb4244
