Performance

The performance of ccache depends on many factors, which makes it hard to predict the improvement for a given use case. This page contains a number of performance measurements that try to give an idea of the potential speedup.

It should also be noted that if the expected hit rate is low, there may be a net performance loss when using ccache because of the overhead of cache misses (typically 5-20%). Also, if the build machine is short on memory compared to the amount of memory used by the build tools (compiler, linker, etc.), using ccache can decrease performance because ccache's cached files may flush other files from the OS's disk cache. See this mailing list post by Christopher Tate for a good write-up on this issue. To sum up: it is probably wise to perform some measurements with and without ccache for your typical use case before enabling it!

The following measurements were made on a fairly standard Linux-based desktop system: Intel Core i5-750, standard SATA disk, Ubuntu 10.04 with Linux 2.6.32 x86_64.

“ccache 3.0 direct” in the tables below means running ccache with direct mode enabled (the default), and “ccache 3.0 prepr.” means running ccache with CCACHE_NODIRECT=1, which disables direct mode and uses the preprocessor mode instead. (A side note: the performance of ccache 2.4 is very close to that of ccache 3.0 in preprocessor mode.)

ccache.c

Here are the results of building ccache's own ccache.c source code file 1000 times using perf.py with the flags -g -O2.

                                Elapsed time    Percent     Factor
Without ccache                      367.11 s   100.00 %   1.0000 x
ccache 3.0 direct, first time       385.67 s   105.06 %   0.9519 x
ccache 3.0 direct, second time        9.70 s     2.64 %  37.8464 x
ccache 3.0 prepr., first time       382.26 s   104.13 %   0.9604 x
ccache 3.0 prepr., second time       23.90 s     6.51 %  15.3603 x
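The Percent and Factor columns are derived directly from the elapsed times, and the direct-vs-preprocessor comparison is the ratio of the two second-time rows. A short Python sanity check using only the figures from the table (row labels abbreviated here):

```python
# Percent = 100 * elapsed / baseline, Factor = baseline / elapsed.
baseline = 367.11  # seconds, "Without ccache"

rows = {
    "direct, first time": 385.67,
    "direct, second time": 9.70,
    "prepr., first time": 382.26,
    "prepr., second time": 23.90,
}

for name, elapsed in rows.items():
    print(f"{name:20s} {100 * elapsed / baseline:7.2f} % "
          f"{baseline / elapsed:8.4f} x")

# Hit-time ratio between the two modes (preprocessor vs direct):
print(round(23.90 / 9.70, 2))  # about 2.5
```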

As can be seen above, cache hits in the direct mode are about 2.5 times faster than in preprocessor mode. The speedup compared to compiling without ccache is very large since the compilation costs a relatively large amount of CPU time. The overhead of cache misses can also be seen, though in this case, it's relatively small.

c++_includes.cc

This is a test that aims to measure preprocessor-intensive compilation. Here, c++_includes.cc (a file that includes nine common headers from the C++ standard library) was compiled 1000 times using perf.py with no special flags.

                                Elapsed time    Percent     Factor
Without ccache                      189.17 s   100.00 %   1.0000 x
ccache 3.0 direct, first time       225.86 s   119.40 %   0.8376 x
ccache 3.0 direct, second time        9.05 s     4.78 %  20.9028 x
ccache 3.0 prepr., first time       215.90 s   114.13 %   0.8762 x
ccache 3.0 prepr., second time       40.86 s    21.60 %   4.6297 x

The difference between direct and preprocessor mode is about a factor of 4.5, much higher than for the ccache.c test because the preprocessor overhead is higher.

Samba 3.5.3

Here is a perhaps more realistic use case. The Samba 3.5.3 source code was unpacked and ./configure was run; then make (without -j) was run and timed. Each test was run eight times and only the best result was kept. Note that the figures also include linking and other work that is not affected by ccache.

Warm disk cache

                                Elapsed time    Percent     Factor
Without ccache                      316.23 s   100.00 %   1.0000 x
ccache 3.0 direct, first time       375.16 s   118.64 %   0.8429 x
ccache 3.0 direct, second time       32.09 s    10.15 %   9.8545 x
ccache 3.0 prepr., first time       360.62 s   114.04 %   0.8769 x
ccache 3.0 prepr., second time      161.44 s    51.05 %   1.9588 x
ccache 2.4, first time              359.42 s   113.66 %   0.8798 x
ccache 2.4, second time             159.31 s    50.38 %   1.9850 x
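The miss-overhead percentages discussed below follow directly from the first-time rows of this table: elapsed time relative to the no-ccache baseline, minus one. A quick Python check using the figures above:

```python
# Cache-miss overhead = first-time elapsed / baseline - 1.
baseline = 316.23  # seconds, "Without ccache" (warm disk cache)

overhead_direct = 375.16 / baseline - 1  # direct mode miss overhead
overhead_prepr = 360.62 / baseline - 1   # preprocessor mode miss overhead

print(f"direct: {overhead_direct:.1%}")  # about 19 %
print(f"prepr.: {overhead_prepr:.1%}")   # about 14 %
```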

The cost of cache misses is relatively high: 14% for the preprocessor mode and 19% for the direct mode. The direct mode has higher overhead than the preprocessor mode for cache misses, but is much faster for cache hits. In fact, the speedup is so high in this case that it becomes interesting to optimize how the Makefile is written. The rule for compiling a C file in Samba 3.5.3 looks like this:

.c.o:
	@if (: >> $@ || : > $@) >/dev/null 2>&1; then rm -f $@; else \
	 dir=`echo $@ | sed 's,/[^/]*$$,,;s,^$$,.,'` $(MAKEDIR); fi
	@if test -n "$(CC_CHECKER)"; then \
	  echo "Checking  $*.c with '$(CC_CHECKER)'";\
	  $(CHECK_CC); \
	 fi
	@echo Compiling $*.c
	@$(COMPILE) && exit 0;\
		echo "The following command failed:" 1>&2;\
		echo "$(subst ",\",$(COMPILE_CC))" 1>&2;\
		$(COMPILE_CC) >/dev/null 2>&1

By rewriting this rule to

.c.o:
	@echo Compiling $*.c
	@$(COMPILE)

the build time goes down from 32 seconds to about 23 seconds!

Cold disk cache

To measure the speed when the disk cache is cold, the disk cache was dropped with sync; echo 3 >/proc/sys/vm/drop_caches before running make.

                                Elapsed time    Percent     Factor
Without ccache                      324.08 s   100.00 %   1.0000 x
ccache 3.0 direct, first time       391.73 s   120.87 %   0.8273 x
ccache 3.0 direct, second time       45.70 s    14.10 %   7.0919 x
ccache 3.0 prepr., first time       376.95 s   116.31 %   0.8597 x
ccache 3.0 prepr., second time      176.62 s    54.50 %   1.8349 x
ccache 2.4, first time              375.57 s   115.89 %   0.8629 x
ccache 2.4, second time             173.88 s    53.65 %   1.8638 x

A cold disk cache makes the performance gain on hits slightly lower because of the extra disk cache misses for files in the ccache directory.