Caching

Methanol comes with an RFC-compliant HTTP cache that supports disk & memory storage backends. There's also an extension for Redis.

Setup

To use an HttpCache, inject it into a Methanol client.

// Select a size limit that's suitable for your application.
long maxSizeInBytes = 100 * 1024 * 1024; // 100 MiB
var cache = HttpCache.newBuilder()
    .cacheOnDisk(Path.of(".cache"), maxSizeInBytes)
    .build();

// The cache intercepts requests you send through this client.
var client = Methanol.newBuilder()
    .cache(cache)
    .build();

// Don't forget to close the cache when you're done!
cache.close();

// Alternatively, keep responses in memory.
// Select a size limit that's suitable for your application.
long maxSizeInBytes = 50 * 1024 * 1024; // 50 MiB
var cache = HttpCache.newBuilder()
    .cacheOnMemory(maxSizeInBytes)
    .build();

// The cache intercepts requests you send through this client.
var client = Methanol.newBuilder()
    .cache(cache)
    .build();

// Don't forget to close the cache when you're done!
cache.close();

Hint

You can pass the builder a custom Executor for launching asynchronous tasks needed by the cache. By default, an unbounded thread pool of daemon threads is used.

Caution

To avoid surprises, make sure the disk directory is used exclusively by a single cache instance, and by nothing else, for as long as that instance is open. The cache enforces this to some degree by throwing an IOException if it's initialized with a directory that's already in use by another instance in the same or a different JVM. Note that you can use the same HttpCache with multiple clients.

An HTTP client can also be configured with a chain of caches, typically in the order of decreasing locality. The chain is invoked in the given order, and a cache either returns the response if it has a suitable one, or forwards to the next cache (or finally to the network) otherwise.

var memoryCache = HttpCache.newBuilder()
    .cacheOnMemory(100 * 1024 * 1024)
    .build();
var diskCache = HttpCache.newBuilder()
    .cacheOnDisk(Path.of(".cache"), 500 * 1024 * 1024)
    .build();
var client = Methanol.newBuilder()
    .cacheChain(List.of(memoryCache, diskCache))
    .build();
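
The lookup order of a chain can be pictured with a small, library-independent sketch. The names ChainSketch and Layer are hypothetical and exist only for this illustration; they are not part of Methanol, and a real cache also stores responses and applies freshness rules.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of chain lookup; not Methanol's implementation.
final class ChainSketch {
  /** A layer that may or may not hold a response for the given URI. */
  interface Layer {
    Optional<String> lookup(String uri);
  }

  /** Asks each layer in order and falls back to the network if none answers. */
  static String fetch(List<Layer> chain, Layer network, String uri) {
    for (Layer layer : chain) {
      Optional<String> hit = layer.lookup(uri);
      if (hit.isPresent()) {
        return hit.get(); // Served by the first layer that has the response
      }
    }
    return network.lookup(uri).orElseThrow();
  }

  public static void main(String[] args) {
    Map<String, String> inMemory = Map.of("/a", "a from memory");
    Map<String, String> onDisk = Map.of("/a", "a from disk", "/b", "b from disk");
    Layer memory = uri -> Optional.ofNullable(inMemory.get(uri));
    Layer disk = uri -> Optional.ofNullable(onDisk.get(uri));
    Layer network = uri -> Optional.of(uri + " from network");

    List<Layer> chain = List.of(memory, disk); // Decreasing locality
    System.out.println(fetch(chain, network, "/a"));
    System.out.println(fetch(chain, network, "/b"));
    System.out.println(fetch(chain, network, "/c"));
  }
}
```

Putting the memory layer first means the most frequently requested responses never touch the disk.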

Usage

An HTTP cache is a transparent layer between you and the origin server. Its main goal is to save time & bandwidth by avoiding the network when requested resources are locally retrievable. It does so while preserving the typical HTTP client-server semantics. Thus, applications can start using a cache-configured HTTP client instance as a drop-in replacement without further setup.

CacheControl

Requests override default cache behaviour using CacheControl.

// Specify your cache directives
var cacheControl = CacheControl.newBuilder()
    .maxAge(Duration.ofMinutes(30))
    .staleIfError(Duration.ofSeconds(60))
    .build();

// Apply the directives to your request
var request = MutableRequest.GET("...")
    .cacheControl(cacheControl);

// Alternatively, set the Cache-Control header directly.
var request = MutableRequest.GET("...")
    .header("Cache-Control", "max-age=1800, stale-if-error=60");
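
To see how the header form maps onto individual directives, here is a minimal sketch that splits a Cache-Control value into names and arguments. DirectiveSketch is a hypothetical helper written for this page, not Methanol's parser, and it deliberately ignores quoted arguments that contain commas.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; Methanol ships its own parser.
final class DirectiveSketch {
  /** Splits a Cache-Control value into lowercase directive names and (possibly empty) arguments. */
  static Map<String, String> parse(String headerValue) {
    Map<String, String> directives = new LinkedHashMap<>();
    for (String part : headerValue.split(",")) {
      String directive = part.trim();
      if (directive.isEmpty()) {
        continue; // Tolerate stray commas
      }
      int eq = directive.indexOf('=');
      String name = (eq < 0 ? directive : directive.substring(0, eq)).trim();
      String argument = eq < 0 ? "" : directive.substring(eq + 1).trim();
      directives.put(name.toLowerCase(Locale.ROOT), argument);
    }
    return directives;
  }

  public static void main(String[] args) {
    // Arguments are delta-seconds: 1800 seconds is the 30 minutes used above.
    System.out.println(parse("max-age=1800, stale-if-error=60"));
  }
}
```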

To properly use CacheControl, it is good to understand the key attributes of a cached response.

Age

The age of a stored response is the time it has been resident in your cache or any other cache along the route to the origin. In other words, a response's age is the time elapsed since it was last generated by the server.
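
For a concrete picture, here is a minimal sketch of the age calculation laid out in RFC 9111, section 4.2.3. AgeSketch and its parameter names are illustrative and not part of Methanol's API; whether Methanol computes age in exactly this way is an assumption.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative only: follows the current_age formula from RFC 9111, section 4.2.3.
final class AgeSketch {
  /**
   * @param dateValue value of the response's Date header
   * @param ageValue value of the response's Age header (Duration.ZERO if absent)
   * @param requestTime when the request that produced the response was sent
   * @param responseTime when the response was received
   * @param now the moment the age is evaluated
   */
  static Duration currentAge(
      Instant dateValue, Duration ageValue, Instant requestTime, Instant responseTime, Instant now) {
    Duration apparentAge = max(Duration.ZERO, Duration.between(dateValue, responseTime));
    Duration responseDelay = Duration.between(requestTime, responseTime);
    Duration correctedAgeValue = ageValue.plus(responseDelay);
    Duration correctedInitialAge = max(apparentAge, correctedAgeValue);
    Duration residentTime = Duration.between(responseTime, now);
    return correctedInitialAge.plus(residentTime);
  }

  private static Duration max(Duration a, Duration b) {
    return a.compareTo(b) >= 0 ? a : b;
  }

  public static void main(String[] args) {
    Instant generated = Instant.parse("2024-01-01T00:00:00Z");
    Duration age = currentAge(
        generated,
        Duration.ofSeconds(5),        // Time already spent in an upstream cache
        generated.plusSeconds(1),     // Request sent
        generated.plusSeconds(3),     // Response received
        generated.plusSeconds(13));   // Evaluated 10 seconds after receipt
    System.out.println("Current age: " + age);
  }
}
```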

Freshness

A fresh response is one that is servable by the cache without contacting the origin. A server specifies how long a stored response stays fresh. This is known as the response's freshness lifetime. The freshness value of a response is its age subtracted from its freshness lifetime. A response is fresh if its freshness value is >= 0.
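
The arithmetic is simple enough to write down. The sketch below uses a hypothetical FreshnessSketch class that is not part of Methanol; it only restates the definition given above.

```java
import java.time.Duration;

// Illustrative only: restates the definition of the freshness value.
final class FreshnessSketch {
  /** Freshness value = freshness lifetime - age. */
  static Duration freshnessValue(Duration freshnessLifetime, Duration age) {
    return freshnessLifetime.minus(age);
  }

  /** A response is fresh if its freshness value is >= 0. */
  static boolean isFresh(Duration freshnessLifetime, Duration age) {
    return !freshnessValue(freshnessLifetime, age).isNegative();
  }

  public static void main(String[] args) {
    Duration lifetime = Duration.ofMinutes(30);
    System.out.println(isFresh(lifetime, Duration.ofMinutes(10)));
    System.out.println(isFresh(lifetime, Duration.ofMinutes(45)));
  }
}
```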

CacheControl lets you override a response's freshness lifetime by setting the max-age directive.

var cacheControl = CacheControl.newBuilder() 
    .maxAge(Duration.ofSeconds(10)) // Override the lifetime set by the server, if any
    .build();

You can specify how fresh you'd like the response to be by putting a lower bound on its freshness value.

var cacheControl = CacheControl.newBuilder()
    .minFresh(Duration.ofSeconds(30)) // Accept a response that stays fresh for at least the next 30 seconds
    .build();

Info

Sometimes, a response lacks an explicit freshness lifetime. As encouraged by the standard & followed by browsers, Methanol uses a heuristic of 10% of the time between a response's generation & last modification times in such cases.
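
As a rough sketch of that heuristic, the hypothetical HeuristicSketch class below derives a lifetime from the Date and Last-Modified header values. It is not Methanol's code; in particular, returning zero when Last-Modified is later than Date is an assumption made for this example.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative only: 10% of the time between generation and last modification.
final class HeuristicSketch {
  static Duration heuristicLifetime(Instant date, Instant lastModified) {
    Duration sinceModified = Duration.between(lastModified, date);
    // Assumption for this sketch: treat an inconsistent pair as having no lifetime.
    return sinceModified.isNegative() ? Duration.ZERO : sinceModified.dividedBy(10);
  }

  /** Parses an HTTP date such as "Tue, 15 Nov 1994 08:12:31 GMT". */
  static Instant parseHttpDate(String value) {
    return ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
  }

  public static void main(String[] args) {
    Instant date = parseHttpDate("Tue, 15 Nov 1994 08:12:31 GMT");
    Instant lastModified = parseHttpDate("Sat, 05 Nov 1994 08:12:31 GMT");
    System.out.println("Heuristic lifetime: " + heuristicLifetime(date, lastModified));
  }
}
```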

Staleness

Responses with negative freshness values are said to be stale. The staleness value of a stored response is simply its freshness value negated. Normally, the cache won't serve a stale response until it's revalidated with the server. Revalidation causes the cache to ask the server, using special headers like If-None-Match & If-Modified-Since, if it can serve the stale response at its disposal. If the server doesn't mind, the cache serves said response without re-downloading its payload. Otherwise, the response is re-fetched.
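
The cache builds revalidation requests for you, but it can help to see what one looks like. The sketch below uses only the JDK's java.net.http API; RevalidationSketch, the example URI, and the validator values are made up for illustration. A server that still considers the stored response valid typically answers such a request with 304 Not Modified and no body.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative only: what a conditional (revalidation) GET carries.
final class RevalidationSketch {
  /** Builds a GET that asks the server whether the stored response is still valid. */
  static HttpRequest conditionalGet(URI uri, String etag, String lastModified) {
    HttpRequest.Builder builder = HttpRequest.newBuilder(uri).GET();
    if (etag != null) {
      builder.header("If-None-Match", etag); // Validator taken from the stored ETag
    }
    if (lastModified != null) {
      builder.header("If-Modified-Since", lastModified); // Taken from the stored Last-Modified
    }
    return builder.build();
  }

  public static void main(String[] args) {
    HttpRequest request = conditionalGet(
        URI.create("https://example.com/resource"), // Hypothetical URI
        "\"abc123\"",
        "Tue, 15 Nov 1994 08:12:31 GMT");
    System.out.println(request.headers().map());
  }
}
```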

You can let the cache tolerate some staleness so it doesn't trigger revalidation.

var cacheControl = CacheControl.newBuilder() 
    .maxStale(Duration.ofSeconds(30)) // Allow at most 30 seconds of staleness
    .build();

// Or, to accept a response regardless of how stale it is:
var cacheControl = CacheControl.newBuilder()
    .anyMaxStale() // Allow any staleness
    .build();
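
Put together, min-fresh and max-stale bound the freshness value from below and the staleness value from above. The following sketch shows that decision with a hypothetical StalenessSketch class that is not part of Methanol; it assumes at most one of the two directives is set on a request.

```java
import java.time.Duration;

// Illustrative only: decides whether a stored response can be served without revalidation.
final class StalenessSketch {
  /**
   * @param freshness freshness lifetime minus age; negative when the response is stale
   * @param minFresh required remaining freshness (Duration.ZERO if unset)
   * @param maxStale tolerated staleness (Duration.ZERO if unset)
   */
  static boolean servableWithoutRevalidation(Duration freshness, Duration minFresh, Duration maxStale) {
    if (freshness.isNegative()) {
      Duration staleness = freshness.negated(); // Staleness is the freshness value negated
      return staleness.compareTo(maxStale) <= 0;
    }
    return freshness.compareTo(minFresh) >= 0;
  }

  public static void main(String[] args) {
    Duration thirtySeconds = Duration.ofSeconds(30);
    // Stale by 20 seconds, tolerating up to 30 seconds of staleness
    System.out.println(servableWithoutRevalidation(Duration.ofSeconds(-20), Duration.ZERO, thirtySeconds));
    // Fresh for only 10 more seconds, but at least 30 are required
    System.out.println(servableWithoutRevalidation(Duration.ofSeconds(10), thirtySeconds, Duration.ZERO));
  }
}
```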

stale-if-error lets the cache recover from network or server failures if there's a stored response. In such cases, the cache falls back to the stored response if it satisfies the specified staleness.

var cacheControl = CacheControl.newBuilder() 
    .staleIfError(Duration.ofSeconds(30))
    .build();

No Cache

You might want the cache to forward your request to the origin even if there's a fresh stored response (e.g. refreshing a page). That's what no-cache is meant for.

var cacheControl = CacheControl.newBuilder() 
    .noCache()
    .build();

The cache still uses the network efficiently. If there's a stored response, its presence is communicated to the server, so it can decide to let the cache serve said response if nothing has changed.

Only If Cached

Use only-if-cached to avoid the network in all cases. As usual, a stored response is served if it's suitable. Otherwise, however, the cache immediately serves a locally generated 504 Gateway Timeout response.

var cacheControl = CacheControl.newBuilder() 
    .onlyIfCached()
    .build();

A perfect use case is when the network is down. You may want to get a cached response if it's there, or otherwise nothing.

Prohibiting Storage

Use no-store if you don't want the cache to store anything about the response.

var cacheControl = CacheControl.newBuilder()
    .noStore()
    .build();

Note that this, however, doesn't prohibit the cache from serving an already stored response.

Asynchronous Revalidation

Sometimes you need a balance between responsiveness & freshness. You are willing to get a response immediately even if it's stale, but ensure it is freshened for later access. That's exactly what stale-while-revalidate does.

If the directive is found on a stale response, the cache serves it immediately, provided it satisfies the allowed staleness. Meanwhile, an asynchronous revalidation is triggered and the response is updated in the background, keeping things fresh.
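
The pattern itself is easy to picture outside of HTTP. The sketch below uses a hypothetical SwrSketch class (not Methanol's implementation) that hands back the stored value at once and refreshes it on a background thread; the staleness check is reduced to a boolean flag for brevity.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Illustrative only: serve the stored value now, refresh it in the background.
final class SwrSketch {
  private final AtomicReference<String> stored;
  private volatile CompletableFuture<Void> pending = CompletableFuture.completedFuture(null);

  SwrSketch(String initial) {
    stored = new AtomicReference<>(initial);
  }

  /** Returns the stored value immediately; if it is stale, triggers an asynchronous refresh. */
  String get(boolean stale, Supplier<String> revalidate) {
    String current = stored.get();
    if (stale) {
      pending = CompletableFuture.supplyAsync(revalidate).thenAccept(stored::set);
    }
    return current;
  }

  /** Completes when the most recently triggered refresh has finished. */
  CompletableFuture<Void> pendingRevalidation() {
    return pending;
  }

  String stored() {
    return stored.get();
  }

  public static void main(String[] args) {
    SwrSketch cache = new SwrSketch("stale body");
    System.out.println("Served: " + cache.get(true, () -> "fresh body"));
    cache.pendingRevalidation().join(); // Waiting here is only for demonstration
    System.out.println("Stored afterwards: " + cache.stored());
  }
}
```

The caller never waits for the refresh; only the next access benefits from it.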

Invalidation

HttpCache has APIs that give you more control over what's stored.

var cache = HttpCache.newBuilder()
    .cacheOnDisk(Path.of(".cache"), 500 * 1024 * 1024)
    .build();

// Remove the entry mapped to a particular URI.
cache.remove(URI.create("https://i.imgur.com/NYvl8Sy.mp4"));

// Remove the response variant matching a particular request.
cache.remove(
    MutableRequest.GET(URI.create("https://i.imgur.com/NYvl8Sy.mp4"))
        .header("Accept-Encoding", "gzip"));

// Remove specific entries by examining their URIs.
var iterator = cache.uris();
while (iterator.hasNext()) {
  var uri = iterator.next();  
  if (uri.getHost().equals("i.imgur.com")) {
    iterator.remove();  
  }
}

// Remove all entries.
cache.clear();

// Dispose of the cache by deleting its entries then closing it in an atomic fashion.
// The cache is rendered unusable after this call. This is meant for applications that
// use a temporary directory for caching in case persistence isn't needed.
cache.dispose();

Cache Operation & Statistics

Cache operation typically involves 3 scenarios.

  • Cache Hit: The desired scenario; everything was entirely served from cache and no network was used.
  • Conditional Cache Hit: The cache had to contact the origin to revalidate its copy of the response and the server decided it was valid. The cache uses the server's response to update some metadata in the background. The response payload isn't downloaded, so the network is used efficiently.
  • Cache Miss: Either the cache had no matching response or the server decided such a response is too stale to be served. In both cases, the whole response is fetched from the network. This is when the cache populates or updates its entries if appropriate.

CacheAwareResponse

CacheAwareResponse complements HttpResponse to better reflect cache interference. If a cache is installed, any HttpResponse<T> returned by Methanol is also a CacheAwareResponse<T>, which you can use to know which of the previous scenarios was the case.

var cache = HttpCache.newBuilder()
    .cacheOnDisk(Path.of(".cache"), 500 * 1024 * 1024)
    .build();
var client = Methanol.newBuilder()
    .cache(cache)
    .build();

var response = (CacheAwareResponse<Path>) client.send(
    MutableRequest.GET("https://i.imgur.com/NYvl8Sy.mp4"), BodyHandlers.ofFile(Path.of("banana_cat.mp4")));

var timeElapsed = Duration.between(response.timeRequestSent(), response.timeResponseReceived());
System.out.println("Time elapsed: " + timeElapsed);

// networkResponse & cacheResponse are optional HttpResponses that you can further investigate
var networkResponse = response.networkResponse();
var cacheResponse = response.cacheResponse();
switch (response.cacheStatus()) {
  case HIT:
    assert networkResponse.isEmpty();
    assert cacheResponse.isPresent();
    break;

  case CONDITIONAL_HIT:
    assert networkResponse.isPresent();
    assert cacheResponse.isPresent();
    break;

  case MISS:
    assert networkResponse.isPresent();
    // cacheResponse can be absent or present
    break;

  case UNSATISFIABLE:
    // Network was forbidden by only-if-cached but there was no valid cache response
    assert response.statusCode() == HttpURLConnection.HTTP_GATEWAY_TIMEOUT;
    assert networkResponse.isEmpty();
    // cacheResponse can be absent or present
    break;
}

cache.close();

HttpCache.Stats

You can examine cache statistics to measure its effectiveness. Statistics are either global or correspond to a specific URI.

var cache = HttpCache.newBuilder()
    .cacheOnDisk(Path.of(".cache"), 500 * 1024 * 1024)
    .build();

var stats = cache.stats();
System.out.println(stats.hitRate());
System.out.println(stats.missRate());

// Per-URI statistics aren't recorded by default.
var cache = HttpCache.newBuilder()
    .cacheOnDisk(Path.of(".cache"), 500 * 1024 * 1024)
    .statsRecorder(StatsRecorder.createConcurrentPerUriRecorder())
    .build();

var stats = cache.stats(URI.create("https://i.imgur.com/NYvl8Sy.mp4"));
System.out.println(stats.hitRate());
System.out.println(stats.missRate());

See HttpCache.Stats for all recorded statistics.
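
As a reminder of what these rates express, here is a minimal sketch that derives them from raw counts. StatsSketch is hypothetical; it assumes the rates are plain ratios over hits plus misses, and returning 0.0 when nothing has been requested is an arbitrary choice for this example that may differ from Methanol's behaviour.

```java
// Illustrative only: assumes rates are simple ratios over hits + misses.
final class StatsSketch {
  static double hitRate(long hits, long misses) {
    long requests = hits + misses;
    return requests == 0 ? 0.0 : (double) hits / requests; // 0.0 for no requests is an assumption
  }

  static double missRate(long hits, long misses) {
    long requests = hits + misses;
    return requests == 0 ? 0.0 : (double) misses / requests;
  }

  public static void main(String[] args) {
    System.out.println("Hit rate: " + hitRate(3, 1));
    System.out.println("Miss rate: " + missRate(3, 1));
  }
}
```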

Limitations

  • The cache only stores responses to GETs. This is typical for most caches.
  • The cache never stores partial responses.
  • Only the most recent response variant can be stored.
  • The cache doesn't store responses that have a Vary header with any of the values Cookie, Cookie2, Authorization, or Proxy-Authorization. The first two apply if the client has a configured CookieHandler, and the latter two if the client has a configured Authenticator. That's because HttpClient can implicitly add these headers to requests, so Methanol can't access their values to match requests against.
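
The Vary rule in the last item can be restated as a small predicate. VarySketch is a hypothetical class written for this page, not Methanol's code, and it models only this one limitation rather than every condition that affects storability.

```java
import java.util.Locale;
import java.util.Set;

// Illustrative only: models the Vary limitation described above.
final class VarySketch {
  private static final Set<String> COOKIE_FIELDS = Set.of("cookie", "cookie2");
  private static final Set<String> AUTH_FIELDS = Set.of("authorization", "proxy-authorization");

  /** Returns false if the Vary header names a field the client may add implicitly. */
  static boolean isStorable(String varyValue, boolean hasCookieHandler, boolean hasAuthenticator) {
    for (String field : varyValue.split(",")) {
      String name = field.trim().toLowerCase(Locale.ROOT); // Field names are case-insensitive
      if (hasCookieHandler && COOKIE_FIELDS.contains(name)) {
        return false;
      }
      if (hasAuthenticator && AUTH_FIELDS.contains(name)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isStorable("Accept-Encoding, Cookie", true, false));
    System.out.println(isStorable("Accept-Encoding, Cookie", false, false));
  }
}
```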