Caching

Methanol comes with an RFC-compliant HTTP cache that supports both disk & memory storage backends.

Setup

To use an HttpCache, inject it into a Methanol client. First, the cache needs to know where to store its entries and how much space it may occupy.

// Select a size limit that's suitable for your application
long maxSizeInBytes = 100 * 1024 * 1024; // 100 MiB

var cache = HttpCache.newBuilder()
    .cacheOnDisk(Path.of("my-cache-dir"), maxSizeInBytes)
    .build();

// The cache intercepts requests you send through this client
var client = Methanol.newBuilder()
    .cache(cache)
    .build();

// It's important that you close the disk cache after you're done
cache.close();

// Alternatively, store entries in memory.
// Select a size limit that's suitable for your application
long maxSizeInBytes = 50 * 1024 * 1024; // 50 MiB

var cache = HttpCache.newBuilder()
    .cacheOnMemory(maxSizeInBytes)
    .build();

// The cache intercepts requests you send through this client
var client = Methanol.newBuilder()
    .cache(cache)
    .build();

// No need to close, but doing so avoids surprises if you later switch to disk
cache.close();

Hint

You can pass the builder a custom Executor for launching asynchronous tasks needed by the cache. By default, an unbounded thread pool of daemon threads is used.

Caution

To avoid surprises, make sure the disk directory is owned exclusively by a single cache instance, and by nothing else, for as long as that instance is open. The cache enforces this to some degree by failing with an IOException if it's initialized with a directory that's already in use by another instance in the same or a different JVM. Note that you can use the same HttpCache with multiple clients.

Usage

An HTTP cache is a transparent layer between you and the origin server. Its main goal is to save time & bandwidth by avoiding the network when requested resources are locally retrievable. It does so while preserving the typical HTTP client-server semantics. Thus, it should be OK for modules to start using a cache-configured Methanol (and hence HttpClient) instance as a drop-in replacement without further setup.

CacheControl

Requests override default cache behaviour using CacheControl.

// Specify your cache directives
var cacheControl = CacheControl.newBuilder()
    .maxAge(Duration.ofMinutes(30))
    .staleIfError(Duration.ofSeconds(60))
    .build();

// Apply the directives to your request
var request = MutableRequest.GET("...")
    .cacheControl(cacheControl);

// Alternatively, Cache-Control headers work as well
var request = MutableRequest.GET("...")
    .header("Cache-Control", "max-age=1800, stale-if-error=60");
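
Both forms above express the same directives; in the header form, durations are written as whole seconds. As a rough illustration of that correspondence, here is a minimal sketch that reads numeric directives out of a header value using only the JDK. The class and method names are hypothetical and not part of Methanol, and the sketch assumes every valued directive is a plain number of seconds (it doesn't handle quoted values).

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Methanol.
final class CacheControlHeaderSketch {
  /**
   * Maps each "name=seconds" directive to a Duration. Directives without a value
   * (e.g. no-cache) are skipped. Assumes values are plain integers.
   */
  static Map<String, Duration> parseDurations(String headerValue) {
    var directives = new LinkedHashMap<String, Duration>();
    for (var part : headerValue.split(",")) {
      var pair = part.trim().split("=", 2);
      if (pair.length == 2) {
        var name = pair[0].trim().toLowerCase(Locale.ROOT);
        directives.put(name, Duration.ofSeconds(Long.parseLong(pair[1].trim())));
      }
    }
    return directives;
  }
}
```

With this sketch, "max-age=1800, stale-if-error=60" yields a max-age of 30 minutes and a stale-if-error of 60 seconds, matching the builder calls above.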

To properly use CacheControl, it is good to understand the key attributes of a cached response.

Age

The age of a stored response is the time it has been resident in your cache or any other cache along the route to the origin. In other words, a response's age is the time elapsed since it was last generated by the server.

Freshness

A fresh response is one that is servable by the cache without contacting the origin. A server specifies how long a stored response stays fresh. This is known as the response's freshness lifetime. The freshness value of a response is its age subtracted from its freshness lifetime. A response is fresh if its freshness value is >= 0.
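
The arithmetic in this definition can be sketched with java.time.Duration. This is only an illustration of the definition, not how Methanol computes freshness internally; the class and method names are hypothetical.

```java
import java.time.Duration;

// Hypothetical helper illustrating the definition above; not part of Methanol.
final class FreshnessSketch {
  /** Freshness value = freshness lifetime - age. Negative once the response has expired. */
  static Duration freshnessValue(Duration freshnessLifetime, Duration age) {
    return freshnessLifetime.minus(age);
  }

  /** A response is fresh if its freshness value is >= 0. */
  static boolean isFresh(Duration freshnessLifetime, Duration age) {
    return !freshnessValue(freshnessLifetime, age).isNegative();
  }

  public static void main(String[] args) {
    var lifetime = Duration.ofMinutes(30); // e.g. the server allowed 30 minutes
    var age = Duration.ofMinutes(10);      // the response was generated 10 minutes ago
    System.out.println(freshnessValue(lifetime, age)); // prints PT20M
    System.out.println(isFresh(lifetime, age));        // prints true
  }
}
```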

CacheControl lets you override a response's freshness lifetime by setting the max-age directive.

var cacheControl = CacheControl.newBuilder() 
    .maxAge(Duration.ofSeconds(10)) // Override the lifetime set by the server, if any
    .build();

You can specify how fresh you'd like the response to be by putting a lower bound on its freshness value.

var cacheControl = CacheControl.newBuilder() 
    .minFresh(Duration.ofMinutes(10)) // Accept a response that stays fresh for at least the next 10 minutes
    .build();

Info

Sometimes, a response lacks an explicit freshness lifetime. As encouraged by the standard & followed by browsers, Methanol uses a heuristic of 10% of the time between a response's generation & last modification times in such cases.
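
As a minimal sketch of that heuristic, assume the generation time comes from the Date header and the modification time from Last-Modified. The names below are hypothetical, and clamping a negative interval to zero is an assumption of this sketch rather than documented Methanol behaviour.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper; not part of Methanol.
final class HeuristicFreshnessSketch {
  /** Returns 10% of the interval between last modification and generation. */
  static Duration heuristicLifetime(Instant generated, Instant lastModified) {
    var sinceModified = Duration.between(lastModified, generated);
    if (sinceModified.isNegative()) {
      return Duration.ZERO; // Assumption: treat inconsistent timestamps as "no lifetime"
    }
    return sinceModified.dividedBy(10);
  }
}
```

For instance, a response generated 10 days after it was last modified would be considered fresh for 1 day under this heuristic.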

Staleness

Responses with negative freshness values are said to be stale. The staleness value of a stored response is simply its freshness value negated. Normally, the cache won't serve a stale response until it's revalidated with the server. Revalidation causes the cache to ask the server, using special headers like If-None-Match & If-Modified-Since, if it can serve the stale response at its disposal. If the server doesn't mind, the cache serves said response without re-downloading its payload. Otherwise, the response is re-fetched.
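
The cache builds revalidation requests for you, so you never need to write this yourself. Purely to show what such a conditional request looks like on the wire, here is a sketch using the JDK's HttpRequest. The helper name is hypothetical, and the validators are assumed to be the ETag and Last-Modified values of the stored response.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper showing the shape of a conditional request; not part of Methanol.
final class RevalidationSketch {
  /** Builds a GET carrying the stored response's validators, if any. */
  static HttpRequest conditionalGet(URI uri, String etag, String lastModified) {
    var builder = HttpRequest.newBuilder(uri).GET();
    if (etag != null) {
      builder.header("If-None-Match", etag); // ETag of the stored response
    }
    if (lastModified != null) {
      builder.header("If-Modified-Since", lastModified); // Last-Modified of the stored response
    }
    return builder.build();
  }
}
```

A server that considers the stored response still valid answers such a request with 304 Not Modified and no payload.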

You can let the cache tolerate some staleness so it doesn't trigger revalidation.

var cacheControl = CacheControl.newBuilder() 
    .maxStale(Duration.ofSeconds(30)) // Allow at most 30 seconds of staleness
    .build();

// Alternatively, tolerate any amount of staleness
var cacheControl = CacheControl.newBuilder()
    .anyMaxStale() // Allow any staleness
    .build();
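
To make the max-stale check concrete, here is a simplified sketch of the comparison involved. It only illustrates the definitions in this section under the assumption that no other directives apply; the real cache may take more into account, and the names are hypothetical.

```java
import java.time.Duration;

// Hypothetical helper illustrating max-stale; not part of Methanol.
final class StalenessSketch {
  /** Staleness value = age - freshness lifetime, i.e. the freshness value negated. */
  static Duration stalenessValue(Duration freshnessLifetime, Duration age) {
    return age.minus(freshnessLifetime);
  }

  /** True if the response is fresh or its staleness doesn't exceed maxStale. */
  static boolean servableWithoutRevalidation(
      Duration freshnessLifetime, Duration age, Duration maxStale) {
    return stalenessValue(freshnessLifetime, age).compareTo(maxStale) <= 0;
  }
}
```

With a 60-second lifetime and a max-stale of 30 seconds, a response aged 80 seconds (20 seconds stale) is servable as is, while one aged 100 seconds (40 seconds stale) needs revalidation.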

stale-if-error lets the cache recover from network or server failures if there's a stored response. On such occasions, the cache falls back to the stored response if it satisfies the specified staleness.

var cacheControl = CacheControl.newBuilder() 
    .staleIfError(Duration.ofSeconds(30))
    .build();

No Cache

You might want the cache to forward your request to the origin even if there's a fresh stored response (e.g. refreshing a page). That's what no-cache is meant for.

var cacheControl = CacheControl.newBuilder() 
    .noCache()
    .build();

The cache will still use the network efficiently. If there's a stored response, its presence is communicated to the server, so it can decide to let the cache serve said response if nothing has changed.

Only If Cached

Use only-if-cached to avoid the network in all cases. As usual, a stored response is served if it's suitable. Otherwise, the cache immediately serves a locally generated 504 Gateway Timeout response.

var cacheControl = CacheControl.newBuilder() 
    .onlyIfCached()
    .build();

A perfect use case is when the network is down or the app is offline. You'd want to get a cached response if it's there, or nothing otherwise.

Prohibiting Storage

Use no-store if you don't want the cache to store anything about the response.

var cacheControl = CacheControl.newBuilder()
    .noStore()
    .build();

Note that this, however, doesn't prohibit the cache from serving an already stored response.

Asynchronous Revalidation

Sometimes you need a balance between responsiveness & freshness. You are willing to get a response immediately even if it's stale, but you want it refreshed for later access. That's exactly what stale-while-revalidate does.

If the directive is found on a stale response, the cache serves it immediately provided it satisfies the allowed staleness. At the same time, an asynchronous revalidation is triggered and the response is updated in the background, keeping things fresh.
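
The decision described above can be sketched as a three-way choice. This is a simplified model that assumes stale-while-revalidate is the only directive in play. The names are hypothetical and don't mirror Methanol's internals.

```java
import java.time.Duration;

// Hypothetical model of the stale-while-revalidate decision; not part of Methanol.
final class StaleWhileRevalidateSketch {
  enum Decision {
    SERVE_FRESH,
    SERVE_STALE_AND_REVALIDATE_IN_BACKGROUND,
    REVALIDATE_BEFORE_SERVING
  }

  static Decision decide(
      Duration freshnessLifetime, Duration age, Duration staleWhileRevalidate) {
    var staleness = age.minus(freshnessLifetime);
    if (staleness.compareTo(Duration.ZERO) <= 0) {
      return Decision.SERVE_FRESH; // Still fresh, nothing to revalidate
    }
    if (staleness.compareTo(staleWhileRevalidate) <= 0) {
      // Stale, but within the window: answer now, refresh asynchronously
      return Decision.SERVE_STALE_AND_REVALIDATE_IN_BACKGROUND;
    }
    return Decision.REVALIDATE_BEFORE_SERVING; // Too stale to serve as is
  }
}
```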

Invalidation

HttpCache has APIs that give you more control over what's stored.

var cache = HttpCache.newBuilder()
    .cacheOnDisk(Path.of("my-cache-dir"), 100 * 1024 * 1024)
    .build();

// Remove the entry mapped to a particular URI
cache.remove(URI.create("https://i.imgur.com/NYvl8Sy.mp4"));

// Remove the response variant matching a particular request
cache.remove(
    MutableRequest.GET(URI.create("https://i.imgur.com/NYvl8Sy.mp4"))
        .header("Accept-Encoding", "gzip"));

// Remove specific entries by examining their URIs
var iterator = cache.uris();
while (iterator.hasNext()) {
  var uri = iterator.next();  
  if (uri.getHost().equals("i.imgur.com")) {
    iterator.remove();  
  }
}

// Remove all entries
cache.clear();

// Dispose of the cache by deleting its entries then closing it in an atomic fashion.
// The cache is rendered unusable after this call. This is meant for applications that
// use a temporary directory for caching in case persistence isn't needed.
cache.dispose();

Cache Operation & Statistics

Cache operation typically involves 3 scenarios.

  • Cache Hit: The blessed scenario; everything was entirely served from cache and no network was used.
  • Conditional Cache Hit: The cache had to contact the origin to revalidate its copy of the response and the server decided it was valid. The cache uses the server's response to update some metadata in the background. The response payload isn't downloaded, so the network is used efficiently.
  • Cache Miss: Either the cache had no matching response or the server decided such response is too stale to be served. In both cases, the whole response is fetched from the network. This is when the cache populates or updates its entries if appropriate.

CacheAwareResponse

CacheAwareResponse complements HttpResponse to better reflect cache interference. If a cache is installed, any HttpResponse<T> returned by Methanol is also a CacheAwareResponse<T>, which you can use to know which of the previous scenarios was the case.

var cache = HttpCache.newBuilder()
    .cacheOnDisk(Path.of("my-cache-dir"), 100 * 1024 * 1024)
    .build();
var client = Methanol.newBuilder()
    .cache(cache)
    .build();

var response = (CacheAwareResponse<Path>) client.send(
    MutableRequest.GET("https://i.imgur.com/NYvl8Sy.mp4"),
    BodyHandlers.ofFile(Path.of("banana_cat.mp4")));

var timeElapsed = Duration.between(response.timeRequestSent(), response.timeResponseReceived());
System.out.println("Time elapsed: " + timeElapsed);

// networkResponse & cacheResponse are optional HttpResponses that you can further investigate
var networkResponse = response.networkResponse();
var cacheResponse = response.cacheResponse();
switch (response.cacheStatus()) {
  case HIT:
    assert networkResponse.isEmpty();
    assert cacheResponse.isPresent();
    break;

  case CONDITIONAL_HIT:
    assert networkResponse.isPresent();
    assert cacheResponse.isPresent();
    break;

  case MISS:
    assert networkResponse.isPresent();
    // cacheResponse can be absent or present
    break;

  case UNSATISFIABLE:
    // Network was forbidden by only-if-cached but there was no valid cache response
    assert response.statusCode() == HttpURLConnection.HTTP_GATEWAY_TIMEOUT;
    assert networkResponse.isEmpty();
    // cacheResponse can be absent or present
    break;
}

cache.close();

HttpCache.Stats

You can examine cache statistics to measure its effectiveness. Statistics are either global or correspond to a specific URI.

var cache = HttpCache.newBuilder()
    .cacheOnDisk(Path.of("my-cache-dir"), 100 * 1024 * 1024)
    .build();

var stats = cache.stats();
System.out.println(stats.hitRate());
System.out.println(stats.missRate());

// Per-URI statistics aren't recorded by default
var cache = HttpCache.newBuilder()
    .cacheOnDisk(Path.of("my-cache-dir"), 100 * 1024 * 1024)
    .statsRecorder(StatsRecorder.createConcurrentPerUriRecorder())
    .build();

var stats = cache.stats(URI.create("https://i.imgur.com/NYvl8Sy.mp4"));
System.out.println(stats.hitRate());
System.out.println(stats.missRate());

See HttpCache.Stats for all recorded statistics.

Limitations

  • The cache only stores responses to GETs. This is typical for most caches.
  • The cache never stores partial responses.
  • Only the most recent response variant can be stored.
  • The cache doesn't store responses that have a Vary header with any of the values: Cookie, Cookie2, Authorization, Proxy-Authorization. That's because the HttpClient can implicitly add these to requests, so Methanol won't be able to access their values to match responses against.
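
To illustrate the last limitation, here is a minimal sketch of a check that rejects a response based on its Vary header values. It is an approximation for illustration only; the class, method, and constant names are hypothetical, and Methanol's actual check may differ.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; not part of Methanol.
final class VaryCheckSketch {
  // Request headers the HttpClient may add implicitly, compared case-insensitively
  private static final Set<String> IMPLICIT_FIELDS =
      Set.of("cookie", "cookie2", "authorization", "proxy-authorization");

  /** Returns false if any Vary value names one of the implicitly added headers. */
  static boolean isStorable(List<String> varyValues) {
    for (var value : varyValues) {
      for (var field : value.split(",")) {
        if (IMPLICIT_FIELDS.contains(field.trim().toLowerCase(Locale.ROOT))) {
          return false;
        }
      }
    }
    return true;
  }
}
```

Under this sketch, a response with "Vary: Accept-Encoding" is storable, while one with "Vary: Accept-Encoding, Cookie" is not.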