Caching¶
Methanol comes with an RFC-compliant HTTP cache that supports disk & memory storage backends. There's also an extension for Redis.
Setup¶
An HttpCache is used by injecting it into a Methanol client.
// Store responses on disk. Select a size limit that's suitable for your application.
long maxSizeInBytes = 100 * 1024 * 1024; // 100 MBs
var cache = HttpCache.newBuilder()
.cacheOnDisk(Path.of(".cache"), maxSizeInBytes)
.build();
// The cache intercepts requests you send through this client.
var client = Methanol.newBuilder()
.cache(cache)
.build();
// Don't forget to close the cache when you're done!
cache.close();
// Alternatively, store responses in memory. Select a size limit that's suitable for your application.
long maxSizeInBytes = 50 * 1024 * 1024; // 50 MBs
var cache = HttpCache.newBuilder()
.cacheOnMemory(maxSizeInBytes)
.build();
// The cache intercepts requests you send through this client.
var client = Methanol.newBuilder()
.cache(cache)
.build();
// Don't forget to close the cache when you're done!
cache.close();
Hint
You can pass the builder a custom Executor for launching asynchronous tasks needed by the cache. By default, an unbounded thread pool of daemon threads is used.
Caution
To avoid surprises, make sure the disk directory is exclusively owned by a single cache instance for as long as that instance is open, and is used by nothing else. The cache enforces this to some degree by throwing an IOException if it's initialized with a directory that's already in use by another instance, in the same or a different JVM. Note that you can use the same HttpCache with multiple clients.
An HTTP client can also be configured with a chain of caches, typically in the order of decreasing locality. The chain is invoked in the given order, and a cache either returns the response if it has a suitable one, or forwards to the next cache (or finally to the network) otherwise.
var memoryCache = HttpCache.newBuilder()
.cacheOnMemory(100 * 1024 * 1024)
.build();
var diskCache = HttpCache.newBuilder()
.cacheOnDisk(Path.of(".cache"), 500 * 1024 * 1024)
.build();
var client = Methanol.newBuilder()
.cacheChain(List.of(memoryCache, diskCache))
.build();
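The lookup order described above can be pictured with a small, self-contained model. This is only a conceptual sketch, not Methanol's implementation: the caches are plain maps, the "network" is a function, and the names CacheChainExample and lookup are hypothetical. It also ignores freshness checks and how caches get populated.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class CacheChainExample {
  /** Asks each cache in the given order; falls back to the network if none has the URI. */
  static String lookup(
      List<Map<String, String>> chain, String uri, Function<String, String> network) {
    for (var cache : chain) {
      var stored = cache.get(uri);
      if (stored != null) {
        return stored; // The first cache with a stored response wins.
      }
    }
    return network.apply(uri); // No cache could answer, so go to the network.
  }

  public static void main(String[] args) {
    Map<String, String> memory = Map.of("/a", "memory:/a");
    Map<String, String> disk = Map.of("/a", "disk:/a", "/b", "disk:/b");
    Function<String, String> network = uri -> "network:" + uri;
    var chain = List.of(memory, disk);
    System.out.println(lookup(chain, "/a", network)); // memory:/a
    System.out.println(lookup(chain, "/b", network)); // disk:/b
    System.out.println(lookup(chain, "/c", network)); // network:/c
  }
}
```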
Usage¶
An HTTP cache is a transparent layer between you and the origin server. Its main goal is to save time & bandwidth by avoiding the network when requested resources are locally retrievable. It does so while preserving the typical HTTP client-server semantics. Thus, applications can start using a cache-configured HTTP client instance as a drop-in replacement without further setup.
CacheControl¶
Requests override default cache behaviour using CacheControl.
// Specify your cache directives
var cacheControl = CacheControl.newBuilder()
.maxAge(Duration.ofMinutes(30))
.staleIfError(Duration.ofSeconds(60))
.build();
// Apply the directives to your request
var request = MutableRequest.GET("...")
.cacheControl(cacheControl);
// Cache-Control headers work as well
var request = MutableRequest.GET("...")
.header("Cache-Control", "max-age=1800, stale-if-error=60");
To use CacheControl properly, it helps to understand the key attributes of a cached response.
Age¶
The age of a stored response is the time it has been resident in your cache or any other cache along the route to the origin. In other words, a response's age is the time elapsed since it was last generated by the server.
Freshness¶
A fresh response is one that is servable by the cache without contacting the origin. A server specifies how long a stored response stays fresh. This is known as the response's freshness lifetime. The freshness value of a response is its freshness lifetime minus its age. A response is fresh if its freshness value is >= 0.
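To make the arithmetic concrete, here is a minimal sketch of the freshness calculation using only java.time. The class and method names (FreshnessExample, freshnessValue, isFresh) are hypothetical and are not part of Methanol's API.

```java
import java.time.Duration;

final class FreshnessExample {
  /** Freshness value = freshness lifetime - age. */
  static Duration freshnessValue(Duration freshnessLifetime, Duration age) {
    return freshnessLifetime.minus(age);
  }

  /** A response is fresh when its freshness value is >= 0. */
  static boolean isFresh(Duration freshnessLifetime, Duration age) {
    return !freshnessValue(freshnessLifetime, age).isNegative();
  }

  public static void main(String[] args) {
    var lifetime = Duration.ofMinutes(30); // For instance, from max-age=1800
    System.out.println(freshnessValue(lifetime, Duration.ofMinutes(10))); // PT20M
    System.out.println(isFresh(lifetime, Duration.ofMinutes(10))); // true
    System.out.println(isFresh(lifetime, Duration.ofMinutes(45))); // false
  }
}
```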
CacheControl lets you override a response's freshness lifetime by setting the max-age directive.
var cacheControl = CacheControl.newBuilder()
.maxAge(Duration.ofSeconds(10)) // Override the lifetime set by the server, if any
.build();
You can specify how fresh you'd like the response to be by putting a lower bound on its freshness value.
var cacheControl = CacheControl.newBuilder()
.minFresh(Duration.ofSeconds(30)) // Accept a response that stays fresh for at least the next 30 seconds
.build();
Info
Sometimes, a response lacks an explicit freshness lifetime. In such cases, as encouraged by the standard & followed by browsers, Methanol uses a heuristic freshness lifetime of 10% of the time between the response's generation & last modification times.
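As a worked example of the 10% heuristic, the sketch below derives a lifetime from a response's generation time (its Date header) and its Last-Modified time. The names are hypothetical, and the guard against a Last-Modified later than Date is an assumption of this sketch rather than documented Methanol behaviour.

```java
import java.time.Duration;
import java.time.Instant;

final class HeuristicFreshnessExample {
  /** Returns 10% of the interval between the last modification and generation times. */
  static Duration heuristicLifetime(Instant date, Instant lastModified) {
    var sinceModified = Duration.between(lastModified, date);
    // Assumption of this sketch: treat a negative interval as "no heuristic lifetime".
    return sinceModified.isNegative() ? Duration.ZERO : sinceModified.dividedBy(10);
  }

  public static void main(String[] args) {
    var lastModified = Instant.parse("2024-01-01T00:00:00Z");
    var date = Instant.parse("2024-01-11T00:00:00Z"); // Generated 10 days after modification
    System.out.println(heuristicLifetime(date, lastModified)); // PT24H
  }
}
```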
Staleness¶
Responses with negative freshness values are said to be stale. The staleness value of a stored response is simply its freshness value negated. Normally, the cache won't serve a stale response until it's revalidated with the server. During revalidation, the cache asks the server, using conditional headers like If-None-Match & If-Modified-Since, whether it can serve the stale response at its disposal. If the server doesn't mind, the cache serves said response without re-downloading its payload. Otherwise, the response is re-fetched.
You can let the cache tolerate some staleness so it doesn't trigger revalidation.
var cacheControl = CacheControl.newBuilder()
.maxStale(Duration.ofSeconds(30)) // Allow at most 30 seconds of staleness
.build();
var cacheControl = CacheControl.newBuilder()
.anyMaxStale() // Allow any staleness
.build();
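The effect of max-stale can be sketched with the same arithmetic: the staleness value is the age minus the freshness lifetime, and a stored response is acceptable without revalidation when that value doesn't exceed the allowed staleness. The names below are hypothetical and only illustrate the rule; they are not Methanol's internals.

```java
import java.time.Duration;

final class MaxStaleExample {
  /** Staleness value = age - freshness lifetime (the freshness value negated). */
  static Duration stalenessValue(Duration freshnessLifetime, Duration age) {
    return age.minus(freshnessLifetime);
  }

  /** True if the response is fresh or its staleness is within the max-stale bound. */
  static boolean servableWithoutRevalidation(
      Duration freshnessLifetime, Duration age, Duration maxStale) {
    return stalenessValue(freshnessLifetime, age).compareTo(maxStale) <= 0;
  }

  public static void main(String[] args) {
    var lifetime = Duration.ofSeconds(60);
    var maxStale = Duration.ofSeconds(30);
    System.out.println(stalenessValue(lifetime, Duration.ofSeconds(80))); // PT20S
    System.out.println(servableWithoutRevalidation(lifetime, Duration.ofSeconds(80), maxStale)); // true
    System.out.println(servableWithoutRevalidation(lifetime, Duration.ofSeconds(100), maxStale)); // false
  }
}
```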
stale-if-error lets the cache recover from network or server failures if there's a stored response. In such cases, the cache falls back to the stored response if it satisfies the specified staleness.
var cacheControl = CacheControl.newBuilder()
.staleIfError(Duration.ofSeconds(30))
.build();
No Cache¶
You might want the cache to forward your request to the origin even if there's a fresh stored response (e.g. refreshing a page). That's what no-cache is meant for.
var cacheControl = CacheControl.newBuilder()
.noCache()
.build();
The cache still uses the network efficiently. If there's a stored response, its presence is communicated to the server, so the server can let the cache serve that response if nothing has changed.
Only If Cached¶
Use only-if-cached to avoid the network in all cases. As usual, a stored response is served if it's suitable. Otherwise, the cache immediately serves a locally generated 504 Gateway Timeout response.
var cacheControl = CacheControl.newBuilder()
.onlyIfCached()
.build();
A typical use case is when the network is down: you get a cached response if there is one, and nothing otherwise.
Prohibiting Storage¶
Use no-store if you don't want the cache to store anything about the response.
var cacheControl = CacheControl.newBuilder()
.noStore()
.build();
Note that this, however, doesn't prohibit the cache from serving an already stored response.
Asynchronous Revalidation¶
Sometimes you need a balance between responsiveness & freshness. You are willing to get a response immediately even if it's stale, but want it freshened for later access. That's exactly what stale-while-revalidate does.
If the directive is found on a stale response, the cache serves it immediately, provided it satisfies the allowed staleness. Meanwhile, an asynchronous revalidation is triggered and the response is updated in the background, keeping things fresh.
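The serve-now, refresh-later pattern can be illustrated with a tiny standalone model built on CompletableFuture. This is a conceptual sketch only: the class, its fields, and the origin supplier are hypothetical, the staleness check is omitted, and a real cache revalidates with conditional requests rather than re-fetching a value.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

final class StaleWhileRevalidateExample {
  private final Supplier<String> origin; // Stands in for contacting the origin server
  private volatile String stored;
  private volatile CompletableFuture<Void> pendingRevalidation =
      CompletableFuture.completedFuture(null);

  StaleWhileRevalidateExample(String initiallyStored, Supplier<String> origin) {
    this.stored = initiallyStored;
    this.origin = origin;
  }

  /** Returns the stored (possibly stale) value at once and refreshes it in the background. */
  String getStaleAndRevalidate() {
    var stale = stored;
    pendingRevalidation =
        CompletableFuture.supplyAsync(origin).thenAccept(fresh -> stored = fresh);
    return stale;
  }

  String stored() {
    return stored;
  }

  CompletableFuture<Void> pendingRevalidation() {
    return pendingRevalidation;
  }

  public static void main(String[] args) {
    var cache = new StaleWhileRevalidateExample("v1", () -> "v2");
    System.out.println(cache.getStaleAndRevalidate()); // v1, served without waiting
    cache.pendingRevalidation().join(); // Wait only so the demo can show the update
    System.out.println(cache.stored()); // v2
  }
}
```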
Invalidation¶
HttpCache has APIs that give you more control over what's stored.
var cache = HttpCache.newBuilder()
.cacheOnDisk(Path.of(".cache"), 500 * 1024 * 1024)
.build();
// Remove the entry mapped to a particular URI.
cache.remove(URI.create("https://i.imgur.com/NYvl8Sy.mp4"));
// Remove the response variant matching a particular request.
cache.remove(
MutableRequest.GET(URI.create("https://i.imgur.com/NYvl8Sy.mp4"))
.header("Accept-Encoding", "gzip"));
// Remove specific entries by examining their URIs.
var iterator = cache.uris();
while (iterator.hasNext()) {
  var uri = iterator.next();
  if (uri.getHost().equals("i.imgur.com")) {
    iterator.remove();
  }
}
// Remove all entries.
cache.clear();
// Dispose of the cache by deleting its entries then closing it in an atomic fashion.
// The cache is rendered unusable after this call. This is meant for applications that
// use a temporary directory for caching in case persistence isn't needed.
cache.dispose();
Cache Operation & Statistics¶
Cache operation typically involves 3 scenarios.
- Cache Hit: The desired scenario; everything was entirely served from cache and no network was used.
- Conditional Cache Hit: The cache had to contact the origin to revalidate its copy of the response and the server decided it was valid. The cache uses the server's response to update some metadata in the background. The response payload isn't downloaded, so the network is used efficiently.
- Cache Miss: Either the cache had no matching response or the server decided such a response is too stale to be served. In both cases, the whole response is fetched from the network. This is when the cache populates or updates its entries if appropriate.
CacheAwareResponse¶
CacheAwareResponse complements HttpResponse to better reflect cache interference. If a cache is installed, any HttpResponse<T> returned by Methanol is also a CacheAwareResponse<T>, which you can use to know which of the previous scenarios was the case.
var cache = HttpCache.newBuilder()
.cacheOnDisk(Path.of(".cache"), 500 * 1024 * 1024)
.build();
var client = Methanol.newBuilder()
.cache(cache)
.build();
var response = (CacheAwareResponse<Path>) client.send(
MutableRequest.GET("https://i.imgur.com/NYvl8Sy.mp4"), BodyHandlers.ofFile(Path.of("banana_cat.mp4")));
var timeElapsed = Duration.between(response.timeRequestSent(), response.timeResponseReceived());
System.out.println("Time elapsed: " + timeElapsed);
// networkResponse & cacheResponse are optional HttpResponses that you can further investigate
var networkResponse = response.networkResponse();
var cacheResponse = response.cacheResponse();
switch (response.cacheStatus()) {
  case HIT:
    assert networkResponse.isEmpty();
    assert cacheResponse.isPresent();
    break;
  case CONDITIONAL_HIT:
    assert networkResponse.isPresent();
    assert cacheResponse.isPresent();
    break;
  case MISS:
    assert networkResponse.isPresent();
    // cacheResponse can be absent or present
    break;
  case UNSATISFIABLE:
    // Network was forbidden by only-if-cached but there was no valid cache response
    assert response.statusCode() == HttpURLConnection.HTTP_GATEWAY_TIMEOUT;
    assert networkResponse.isEmpty();
    // cacheResponse can be absent or present
    break;
}
cache.close();
HttpCache.Stats¶
You can examine cache statistics to measure its effectiveness. Statistics are either global or correspond to a specific URI.
var cache = HttpCache.newBuilder()
.cacheOnDisk(Path.of(".cache"), 500 * 1024 * 1024)
.build();
var stats = cache.stats();
System.out.println(stats.hitRate());
System.out.println(stats.missRate());
// Per-URI statistics aren't recorded by default, so install a per-URI recorder
var cache = HttpCache.newBuilder()
.cacheOnDisk(Path.of(".cache"), 500 * 1024 * 1024)
.statsRecorder(StatsRecorder.createConcurrentPerUriRecorder())
.build();
var stats = cache.stats(URI.create("https://i.imgur.com/NYvl8Sy.mp4"));
System.out.println(stats.hitRate());
System.out.println(stats.missRate());
See HttpCache.Stats for all recorded statistics.
Limitations¶
- The cache only stores responses to GETs. This is typical for most caches.
- The cache never stores partial responses.
- Only the most recent response variant can be stored.
- The cache doesn't store responses that have a Vary header with any of the values: Cookie, Cookie2, Authorization, Proxy-Authorization. The first two apply if the client has a configured CookieHandler, the latter two if the client has a configured Authenticator. That's because HttpClient can implicitly add these to requests, so Methanol won't be able to access their values to match requests against.