Caching¶
Methanol comes with an RFC-compliant HTTP cache that supports both disk & memory storage backends.
Setup¶
An HttpCache is used by injecting it into a Methanol client. First, it needs to know where to store entries and how much space it can occupy.
// Disk storage: select a size limit that's suitable for your application
long maxSizeInBytes = 100 * 1024 * 1024; // 100 MB
var cache = HttpCache.newBuilder()
.cacheOnDisk(Path.of("my-cache-dir"), maxSizeInBytes)
.build();
// The cache intercepts requests you send through this client
var client = Methanol.newBuilder()
.cache(cache)
.build();
// It's important that you close the disk cache after you're done
cache.close();
// Memory storage: select a size limit that's suitable for your application
long maxSizeInBytes = 50 * 1024 * 1024; // 50 MB
var cache = HttpCache.newBuilder()
.cacheOnMemory(maxSizeInBytes)
.build();
// The cache intercepts requests you send through this client
var client = Methanol.newBuilder()
.cache(cache)
.build();
// No need to close, but doing so avoids surprises if you later switch to disk
cache.close();
Hint
You can pass the builder a custom Executor for launching asynchronous tasks needed by the cache. By default, an unbounded thread pool of daemon threads is used.
Caution
To avoid surprises, make sure the disk directory is owned exclusively by a single cache instance for as long as that instance is open. The cache enforces this to some degree by failing with an IOException if it's initialized with a directory that's already in use by another instance, in the same or a different JVM. Note that you can use the same HttpCache with multiple clients.
Usage¶
An HTTP cache is a transparent layer between you and the origin server. Its main goal is to save time & bandwidth by avoiding the network when requested resources are locally retrievable. It does so while preserving the typical HTTP client-server semantics. Thus, modules should be able to use a cache-configured Methanol (and hence HttpClient) instance as a drop-in replacement without further setup.
CacheControl¶
Requests can override the default cache behaviour using CacheControl.
// Specify your cache directives
var cacheControl = CacheControl.newBuilder()
.maxAge(Duration.ofMinutes(30))
.staleIfError(Duration.ofSeconds(60))
.build();
// Apply the directives to your request
var request = MutableRequest.GET("...")
.cacheControl(cacheControl);
// Cache-Control headers work as well
var request = MutableRequest.GET("...")
.header("Cache-Control", "max-age=1800, stale-if-error=60");
To use CacheControl properly, it helps to understand the key attributes of a cached response.
Age¶
The age of a stored response is the time it has been resident in your cache or any other cache along the route to the origin. In other words, a response's age is the time elapsed since it was last generated by the server.
Freshness¶
A fresh response is one that is servable by the cache without contacting the origin. A server specifies how long a stored response stays fresh. This is known as the response's freshness lifetime. The freshness value of a response is its freshness lifetime minus its age. A response is fresh if its freshness value is >= 0.
CacheControl lets you override a response's freshness lifetime by setting the max-age directive.
var cacheControl = CacheControl.newBuilder()
.maxAge(Duration.ofSeconds(10)) // Override the lifetime set by the server, if any
.build();
You can specify how fresh you'd like the response to be by putting a lower bound on its freshness value.
var cacheControl = CacheControl.newBuilder()
.minFresh(Duration.ofMinutes(10)) // Accept a response that stays fresh for at least the next 10 minutes
.build();
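The arithmetic behind freshness, max-age and min-fresh can be sketched with plain java.time.Duration values. This is only an illustration of the definitions above, not Methanol's internal code; the class and method names (FreshnessSketch, freshnessValue, isFresh, satisfiesMinFresh) are hypothetical.

```java
import java.time.Duration;

// Hypothetical helper illustrating the definitions above; not part of Methanol.
final class FreshnessSketch {
  /** Freshness value: freshness lifetime minus age. A negative value means stale. */
  static Duration freshnessValue(Duration freshnessLifetime, Duration age) {
    return freshnessLifetime.minus(age);
  }

  /** A response is fresh if its freshness value is >= 0. */
  static boolean isFresh(Duration freshnessLifetime, Duration age) {
    return !freshnessValue(freshnessLifetime, age).isNegative();
  }

  /** min-fresh puts a lower bound on the freshness value. */
  static boolean satisfiesMinFresh(Duration freshnessLifetime, Duration age, Duration minFresh) {
    return freshnessValue(freshnessLifetime, age).compareTo(minFresh) >= 0;
  }

  public static void main(String[] args) {
    var lifetime = Duration.ofMinutes(30); // e.g. taken from max-age=1800
    var age = Duration.ofMinutes(25);
    System.out.println(freshnessValue(lifetime, age)); // PT5M
    System.out.println(isFresh(lifetime, age)); // true
    System.out.println(satisfiesMinFresh(lifetime, age, Duration.ofMinutes(10))); // false
  }
}
```

In this example the response is still fresh, but with only 5 minutes of freshness left it wouldn't satisfy a min-fresh of 10 minutes.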
Info
Sometimes, a response lacks an explicit freshness lifetime. As encouraged by the standard & followed by browsers, Methanol uses a heuristic of 10% of the time between a response's generation & last modification times in such cases.
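As a rough illustration of the 10% heuristic, the sketch below derives a freshness lifetime from two timestamps, which would typically come from the Date and Last-Modified response headers. It is an assumption-laden sketch rather than Methanol's actual implementation; HeuristicFreshnessSketch and heuristicLifetime are hypothetical names, and clamping negative intervals to zero is a choice made here.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper; shows the 10% rule only, not Methanol's internals.
final class HeuristicFreshnessSketch {
  /** Returns 10% of the time between last modification and generation. */
  static Duration heuristicLifetime(Instant generated, Instant lastModified) {
    var sinceModified = Duration.between(lastModified, generated);
    // Assumption of this sketch: a modification time in the future yields no freshness.
    return sinceModified.isNegative() ? Duration.ZERO : sinceModified.dividedBy(10);
  }

  public static void main(String[] args) {
    var lastModified = Instant.parse("2024-01-01T00:00:00Z");
    var generated = Instant.parse("2024-01-11T00:00:00Z");
    // Modified 10 days before generation, so the heuristic lifetime is 1 day
    System.out.println(heuristicLifetime(generated, lastModified)); // PT24H
  }
}
```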
Staleness¶
Responses with negative freshness values are said to be stale. The staleness value of a stored response is simply its freshness value negated. Normally, the cache won't serve a stale response until it's revalidated with the server. During revalidation, the cache asks the server, using conditional headers like If-None-Match & If-Modified-Since, whether it can serve the stale response at its disposal. If the server doesn't mind, the cache serves said response without re-downloading its payload. Otherwise, the response is re-fetched.
You can let the cache tolerate some staleness so it doesn't trigger revalidation.
var cacheControl = CacheControl.newBuilder()
.maxStale(Duration.ofSeconds(30)) // Allow at most 30 seconds of staleness
.build();
var cacheControl = CacheControl.newBuilder()
.anyMaxStale() // Allow any staleness
.build();
stale-if-error lets the cache recover from network or server failures if there's a stored response. In such cases, the cache falls back to the stored response if it satisfies the specified staleness.
var cacheControl = CacheControl.newBuilder()
.staleIfError(Duration.ofSeconds(30))
.build();
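The staleness checks described in this section boil down to one comparison: the staleness value must not exceed the tolerated amount. The sketch below shows that comparison under the definitions given here; it is not Methanol's code, and StalenessSketch, staleness and withinTolerance are hypothetical names. The same check is used for illustration with both max-stale (normal operation) and stale-if-error (after a failure).

```java
import java.time.Duration;

// Hypothetical helper illustrating the staleness definitions; not part of Methanol.
final class StalenessSketch {
  /** Staleness value: the freshness value negated, i.e. age minus freshness lifetime. */
  static Duration staleness(Duration freshnessLifetime, Duration age) {
    return age.minus(freshnessLifetime);
  }

  /** True if the response is fresh or its staleness doesn't exceed the tolerated amount. */
  static boolean withinTolerance(Duration freshnessLifetime, Duration age, Duration tolerated) {
    return staleness(freshnessLifetime, age).compareTo(tolerated) <= 0;
  }

  public static void main(String[] args) {
    var lifetime = Duration.ofSeconds(60);
    var age = Duration.ofSeconds(80);
    System.out.println(staleness(lifetime, age)); // PT20S
    System.out.println(withinTolerance(lifetime, age, Duration.ofSeconds(30))); // true
    System.out.println(withinTolerance(lifetime, age, Duration.ofSeconds(10))); // false
    System.out.println(withinTolerance(lifetime, age, Duration.ZERO)); // false, no staleness tolerated
  }
}
```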
No Cache¶
You might want the cache to forward your request to the origin even if there's a fresh stored response (e.g. refreshing a page). That's what no-cache is meant for.
var cacheControl = CacheControl.newBuilder()
.noCache()
.build();
The cache still uses the network efficiently. If there's a stored response, its presence is communicated to the server, so the server can let the cache serve said response if nothing has changed.
Only If Cached¶
Use only-if-cached to avoid the network in all cases. As usual, a stored response is served if it's suitable. Otherwise, the cache immediately serves a locally generated 504 Gateway Timeout response.
var cacheControl = CacheControl.newBuilder()
.onlyIfCached()
.build();
A perfect use case is when the network is down or the app is offline. You'd want to get a cached response if it's there, or nothing otherwise.
Prohibiting Storage¶
Use no-store if you don't want the cache to store anything about the response.
var cacheControl = CacheControl.newBuilder()
.noStore()
.build();
Note that this, however, doesn't prohibit the cache from serving an already stored response.
Asynchronous Revalidation¶
Sometimes you need a balance between responsiveness & freshness. You're willing to get a response immediately even if it's stale, but want it freshened for later access. That's exactly what stale-while-revalidate does.
If the directive is found on a stale response, the cache serves it immediately, provided it satisfies the allowed staleness. Meanwhile, an asynchronous revalidation is triggered and the response is updated in the background, keeping things fresh.
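The serve-now, refresh-later pattern can be sketched with a CompletableFuture. This is a toy model and makes several assumptions: the stored response is just a String, the origin is a Supplier, and the staleness check is omitted entirely. StaleWhileRevalidateSketch, serve and pendingRevalidation are hypothetical names that don't correspond to anything in Methanol.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Toy model of stale-while-revalidate; not how Methanol implements it.
final class StaleWhileRevalidateSketch {
  private final AtomicReference<String> stored;
  private final Supplier<String> origin;

  /** The most recently triggered background revalidation. */
  volatile CompletableFuture<Void> pendingRevalidation = CompletableFuture.completedFuture(null);

  StaleWhileRevalidateSketch(String initial, Supplier<String> origin) {
    this.stored = new AtomicReference<>(initial);
    this.origin = origin;
  }

  /** Returns the stored (possibly stale) value at once and refreshes it asynchronously. */
  String serve() {
    String stale = stored.get();
    pendingRevalidation = CompletableFuture.supplyAsync(origin).thenAccept(stored::set);
    return stale;
  }

  public static void main(String[] args) {
    var cache = new StaleWhileRevalidateSketch("v1", () -> "v2");
    System.out.println(cache.serve()); // v1, served without waiting for the origin
    cache.pendingRevalidation.join(); // wait only to make the demo deterministic
    System.out.println(cache.serve()); // v2, the copy refreshed in the background
  }
}
```

The caller never waits on the origin; the join in main exists only so that the second call observes the refreshed value.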
Invalidation¶
HttpCache has APIs that give you more control over what's stored.
var cache = HttpCache.newBuilder()
.cacheOnDisk(Path.of("my-cache-dir"), 100 * 1024 * 1024)
.build();
// Remove the entry mapped to a particular URI
cache.remove(URI.create("https://i.imgur.com/NYvl8Sy.mp4"));
// Remove the response variant matching a particular request
cache.remove(
MutableRequest.GET(URI.create("https://i.imgur.com/NYvl8Sy.mp4"))
.header("Accept-Encoding", "gzip"));
// Remove specific entries by examining their URIs
var iterator = cache.uris();
while (iterator.hasNext()) {
  var uri = iterator.next();
  if (uri.getHost().equals("i.imgur.com")) {
    iterator.remove();
  }
}
// Remove all entries
cache.clear();
// Dispose of the cache by deleting its entries then closing it in an atomic fashion.
// The cache is rendered unusable after this call. This is meant for applications that
// use a temporary directory for caching in case persistence isn't needed.
cache.dispose();
Cache Operation & Statistics¶
Cache operation typically falls into one of three scenarios.
- Cache Hit: The blessed scenario; everything was served entirely from cache and no network was used.
- Conditional Cache Hit: The cache had to contact the origin to revalidate its copy of the response and the server decided it was valid. The cache uses the server's response to update some metadata in the background. The response payload isn't downloaded, so the network is used efficiently.
- Cache Miss: Either the cache had no matching response or the server decided such response is too stale to be served. In both cases, the whole response is fetched from the network. This is when the cache populates or updates its entries if appropriate.
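The three scenarios can be expressed as a small decision function. The sketch below is a simplified model, assuming the server signals a valid copy with a 304 Not Modified status as is standard for conditional requests; it is not Methanol's decision logic. CacheScenarioSketch, Scenario and classify are hypothetical names.

```java
import java.net.HttpURLConnection;

// Simplified model of the three scenarios; not part of Methanol.
final class CacheScenarioSketch {
  enum Scenario { HIT, CONDITIONAL_HIT, MISS }

  /**
   * @param hasStored whether the cache holds a matching response
   * @param servableAsIs whether that response may be served without contacting the origin
   * @param originStatus status code returned by the origin (ignored when no request was made)
   */
  static Scenario classify(boolean hasStored, boolean servableAsIs, int originStatus) {
    if (hasStored && servableAsIs) {
      return Scenario.HIT; // no network used
    }
    if (hasStored && originStatus == HttpURLConnection.HTTP_NOT_MODIFIED) {
      return Scenario.CONDITIONAL_HIT; // origin confirmed the stored copy
    }
    return Scenario.MISS; // nothing stored, or the origin sent a full response
  }

  public static void main(String[] args) {
    System.out.println(classify(true, true, 0)); // HIT
    System.out.println(classify(true, false, 304)); // CONDITIONAL_HIT
    System.out.println(classify(true, false, 200)); // MISS
    System.out.println(classify(false, false, 200)); // MISS
  }
}
```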
CacheAwareResponse¶
CacheAwareResponse complements HttpResponse to better reflect cache interference. If a cache is installed, any HttpResponse<T> returned by Methanol is also a CacheAwareResponse<T>, which you can use to know which of the previous scenarios was the case.
var cache = HttpCache.newBuilder()
.cacheOnDisk(Path.of("my-cache-dir"), 100 * 1024 * 1024)
.build();
var client = Methanol.newBuilder()
.cache(cache)
.build();
var response = (CacheAwareResponse<Path>) client.send(
    MutableRequest.GET("https://i.imgur.com/NYvl8Sy.mp4"),
    BodyHandlers.ofFile(Path.of("banana_cat.mp4")));
var timeElapsed = Duration.between(response.timeRequestSent(), response.timeResponseReceived());
System.out.println("Time elapsed: " + timeElapsed);
// networkResponse & cacheResponse are optional HttpResponses that you can further investigate
var networkResponse = response.networkResponse();
var cacheResponse = response.cacheResponse();
switch (response.cacheStatus()) {
  case HIT:
    assert networkResponse.isEmpty();
    assert cacheResponse.isPresent();
    break;
  case CONDITIONAL_HIT:
    assert networkResponse.isPresent();
    assert cacheResponse.isPresent();
    break;
  case MISS:
    assert networkResponse.isPresent();
    // cacheResponse can be absent or present
    break;
  case UNSATISFIABLE:
    // Network was forbidden by only-if-cached but there was no valid cache response
    assert response.statusCode() == HttpURLConnection.HTTP_GATEWAY_TIMEOUT;
    assert networkResponse.isEmpty();
    // cacheResponse can be absent or present
    break;
}
cache.close();
HttpCache.Stats¶
You can examine cache statistics to measure its effectiveness. Statistics are either global or correspond to a specific URI.
var cache = HttpCache.newBuilder()
.cacheOnDisk(Path.of("my-cache-dir"), 100 * 1024 * 1024)
.build();
var stats = cache.stats();
System.out.println(stats.hitRate());
System.out.println(stats.missRate());
// Per-URI statistics aren't recorded by default
var cache = HttpCache.newBuilder()
.cacheOnDisk(Path.of("my-cache-dir"), 100 * 1024 * 1024)
.statsRecorder(StatsRecorder.createConcurrentPerUriRecorder())
.build();
var stats = cache.stats(URI.create("https://i.imgur.com/NYvl8Sy.mp4"));
System.out.println(stats.hitRate());
System.out.println(stats.missRate());
See HttpCache.Stats for all recorded statistics.
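To make the meaning of such rates concrete, here is a minimal sketch that computes hit and miss rates as ratios over the total of hits and misses. This is the generic definition of these ratios and only an assumption about how the library computes them; HitRateSketch and its methods are hypothetical, and returning 0.0 when nothing has been recorded is a choice of this sketch that may differ from HttpCache.Stats.

```java
// Hypothetical helper; HttpCache.Stats may define its rates differently.
final class HitRateSketch {
  /** Fraction of lookups that were hits; 0.0 when nothing was recorded (sketch's choice). */
  static double hitRate(long hits, long misses) {
    long total = hits + misses;
    return total == 0 ? 0.0 : (double) hits / total;
  }

  /** Fraction of lookups that were misses; 0.0 when nothing was recorded (sketch's choice). */
  static double missRate(long hits, long misses) {
    long total = hits + misses;
    return total == 0 ? 0.0 : (double) misses / total;
  }

  public static void main(String[] args) {
    System.out.println(hitRate(3, 1)); // 0.75
    System.out.println(missRate(3, 1)); // 0.25
  }
}
```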
Limitations¶
- The cache only stores responses to GETs. This is typical for most caches.
- The cache never stores partial responses.
- Only the most recent response variant can be stored.
- The cache doesn't store responses that have a Vary header with any of the values: Cookie, Cookie2, Authorization, Proxy-Authorization. That's because the HttpClient can implicitly add these to requests, so Methanol won't be able to access their values to match responses against.