Stop making these mistakes with your caching proxy
How to get sub-10ms response times, real-time content expiration and automatic compression, all while offloading your backends. Enter the power of RFC 9110 and discover the underdog of reverse proxies.
You got yourself a website with static-ish content that takes a long time to generate, and you are looking to make it faster. The obvious solution is caching, but that is a surprisingly intricate and delicate topic as soon as the slightest bit of interaction starts to happen. Moreover, you would probably be happy if your cache could refresh your content automatically when you hit “Publish” in your CMS. And finally, since you have various interaction points in your website, you can’t exactly turn towards static website generators.
Your typical stack
The application that you are going to develop through this article is a very simple Flask application, designed only to explore the concepts we’re introducing without the noise of a more complex setup.
However, these concepts apply to many different configurations. Specifically, if you are reading this in 2024, chances are that you are already running some flavor of headless CMS:
A back-end/API which runs whichever CMS has a “headless mode”. This author would recommend Wagtail, but the list of such beasts is growing extremely long these days.
A front-end meta-framework which renders a first version of your content on the server side and then hydrates the HTML within the client to give full interactivity to the pages. You could use SvelteKit, Nuxt, Astro or Next.js for example.
If you were to do so, you essentially need to consider your front-end as a proxy on top of your API, one which performs a JSON-to-HTML transformation. This suggests that if you implement RFC 9110 in your front-end, the solution that you’re going to discover below should still apply. Maybe the topic of a future article!
A bit of HTTP
Here lies the secret that the reverse proxying industry doesn’t want you to learn. HTTP presents numerous cache modes — in particular through the Cache-Control header — but oftentimes you’ll end up with a solution that is time-based. You tell the cache “please keep this for an hour” and it will do so. Hell, even if you update the content, the cache will only expire when its timer says so. Of course there are techniques to alleviate this issue, background revalidation for example, but in addition to the time-based inconvenience, the higher the refresh rate, the higher the load on your back-end.
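For reference, a purely time-based policy boils down to a single header; a response carrying the following (duration picked arbitrarily) will be served from cache for an hour, whether or not the content changed in the meantime:

Cache-Control: public, max-age=3600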
On the other hand, an extremely easy way to keep the cache up-to-date is conditional validation, in particular through the ETag header. The conversation looks like this:
Client: give me /foo
Server: here’s /foo, with ETag 1234
Client: give me /foo if it is not 1234 anymore
Server: not changed, use your cache
Simply checking that the value of the ETag hasn’t changed is incredibly cheap to perform, while also making sure that your content always stays up-to-date. For example, you can imagine building this header upon the version of the page in your CMS: as soon as something new is published, all the caches get renewed.
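You can observe this conversation from code. Here is a minimal sketch using the requests library (the URL is hypothetical, any server implementing ETag/If-None-Match will do):

import requests

URL = "http://localhost:5000/"  # hypothetical server implementing ETag

# First request: the server replies 200 and hands us a validator
first = requests.get(URL)
etag = first.headers["ETag"]

# Second request: we present the validator back to the server
second = requests.get(URL, headers={"If-None-Match": etag})

# If the content did not change, the server skips the rendering
# entirely and replies 304 with an empty body
print(second.status_code)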
Story time: yours truly used to work on a private social network that had lots of interactive widgets relying heavily on polling, as websockets were not invented yet. The polling was wearing the server down at a crazy rate, but implementing an ETag-based cache relying solely on the browser cache made an utterly dramatic improvement on the server load.
Obviously this is far from being the only valid caching strategy out there, but if your target audience is geographically close enough and you want to rely only on standard HTTP mechanisms instead of implementing proprietary logic through mystical lines in your proxy’s configuration DSL, this is a fairly efficient solution which will bring you sub-10ms response generation times.
The mighty RFC 9110
The governing RFC for what you are trying to accomplish here is RFC 9110. To summarize the interesting parts, a cached resource has different states:
Fresh — The content is in cache and we know it’s still valid
Stale — The content is in cache but we need to re-validate it
Missing — No content in cache, must do the request
When you put an ETag on a resource, clients will automatically cache it as stale and re-validate it using If-None-Match, which is the mechanic described above. On the server side it’s very easy, in pseudo-code:
if 'if-none-match' in headers:
    if headers['if-none-match'] == latest_etag_for_route():
        return 304
return normal_response()
However, at the cache level, it seems to be more tricky. It’s easy to set your proxy to forward the client’s If-None-Match (INM) header, but when you start to consider the different possibilities it’s not so obvious anymore (a sketch of the expected behavior follows the list below):
What if the client doesn’t say INM but the proxy has this resource in cache?
What if the client’s INM mismatches the one in cache?
What if the client has an INM but the proxy has nothing in cache?
And so forth ad nauseam
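Putting those cases together, here is, in the same pseudo-code spirit as above, roughly what a compliant shared cache has to do for a GET request. Every helper here is hypothetical; the key point is that the proxy revalidates upstream with its own cached ETag, then separately compares the client’s INM against the resulting entry:

def cache_lookup(client_inm):
    entry = cache_get()  # hypothetical: returns (etag, body) or None
    if entry is None:
        # Nothing in cache: full fetch from the origin, whatever the client sent
        etag, body = fetch_from_origin()
        cache_store(etag, body)
    else:
        # Stale entry: revalidate upstream with *our* ETag, not the client's.
        # A 304 from the origin keeps the entry, a 200 replaces it.
        etag, body = revalidate_with_origin(*entry)
    # Only now decide what the client gets, based on what *it* sent
    if client_inm == etag:
        return 304, etag, None  # the client's copy is still good
    return 200, etag, body  # serve the cached (or freshly fetched) body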
This mechanic being so tricky, this author attempted to implement it with many different caching proxies without success:
nginx — Has many options which could probably lead up to correctly implementing RFC 9110, but it is disheartening in its complexity and uncertainty
varnish — Does the job with a bit of tweaking but will make your life hard if you have cookies
squid — Fails miserably
traefik — Maybe the enterprise version has the feature but the license is just prohibitive
caddy — Does not actually have a cache
Apache’s httpd — Honestly maybe, but I could not figure it out
You will probably be wondering at this stage which solution you can then use, as the most popular solutions of today and yesterday are all listed here. It turns out that another solution, which was barely even on the radar, has the following table in its documentation:
This is an extract from the Apache Traffic Server documentation. A by-product of an acquisition by Yahoo, which subsequently open-sourced it in 2009, ATS probably has one of the most unfriendly configuration syntaxes that you’ll ever see — especially if you look at the default files in the Debian package — which may make you want to give up immediately.
Beyond the initial intimidating look, it’s actually a strong contender:
It is used by massive CDN companies, so while it’s going to be hard to compare it directly to something like nginx, you can imagine that it is at least at the same level of performance and features.
It is explicitly a proxy and specializes in doing so. You won’t be configuring a plugin to do proxying; it is the core feature. This radically changes the ease of configuration.
Last but not least, it implements RFC 9110 correctly enough by default so that you can configure the cache behavior through standard HTTP headers and not be too surprised about the actual behavior.
You can dig deeper into ATS through this video, but you will be reading about the important bits of configuration right below.
The project itself
The goal of today is to demonstrate how you can use ETags to cache and expire your content on a proxy. To that end, you’ll be implementing the following page:
You can verify that the ETag and caching mechanisms work properly using this page:
If the expected ETag doesn’t change, it means that the server is indeed consistent with its ETag
If the random string changes, it means that the page has been re-generated, while if it stays the same, it means that the page came from the cache
The rest of this article will contain extracts of code, but the whole project can be found on GitHub and shall serve as a reference.
View logic
While there are probably many nicer ways to deal with ETags than this (for example, Django has a super-easy etag decorator), here is the logic you need to implement an ETag/If-None-Match cache:
from flask import Flask, make_response, redirect, render_template, request, url_for

from .etag import *

app = Flask(__name__)


@app.route("/", methods=["GET", "POST"])
def etag_demo():
    """This view displays a simple template that informs the user about the
    current ETag value and a random string. This allows to demonstrate how
    ETag works (cache gets refreshed when ETag changes) and to test if caching
    works (if the cache works, the random string shouldn't change)."""

    if request.method == "POST":
        new_etag = generate_random_string()
        set_etag(new_etag)
        return redirect(url_for("etag_demo"))

    current_etag = get_or_set_etag()

    if current_etag == extract_etag(request.headers.get("If-None-Match", "")):
        response = make_response("", 304)
    else:
        response = make_response(
            render_template(
                "etag.html",
                etag=current_etag,
                random_string=generate_random_string(),
            )
        )

    # We put s-maxage=0 instead of no-cache because somehow this better
    # encourages caching proxies to store the response in cache
    response.headers["Cache-Control"] = "public, must-revalidate, s-maxage=0"
    response.headers["ETag"] = f'W/"{current_etag}"'

    return response


if __name__ == "__main__":
    app.run(debug=True)
A bunch of helpers are abstracted away in the etag.py file, but the logic is basically the same as the pseudo-code listed earlier, with simply the addition of rendering the template.
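The real helpers live in the GitHub repository. For the sake of illustration, here is a minimal sketch of what they could look like, assuming the ETag is simply kept in memory (the actual project may store it differently):

import random
import string

_current_etag = None  # hypothetical in-memory storage


def generate_random_string(length=12):
    """Random alphanumeric string, used both as ETag and as page marker."""
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=length))


def set_etag(value):
    global _current_etag
    _current_etag = value


def get_or_set_etag():
    """Return the current ETag, generating one on first call."""
    if _current_etag is None:
        set_etag(generate_random_string())
    return _current_etag


def extract_etag(header_value):
    """Strip the weak marker and quotes from an If-None-Match value."""
    return header_value.removeprefix("W/").strip('"')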
You can check out the project from GitHub and run the backend:
git clone git@github.com:Xowap/cache-cache.git
cd cache-cache/backend
poetry install
make serve
This will start the server on http://localhost:5000/, which you can now visit. In theory, you can see the page, refresh it as many times as you want without seeing the random string change, and then click the button to change the ETag. That all works because you are doing this from the same browser: if you open another browser you will get a completely different output — albeit an output that stays consistent within one browser.
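You can also watch the exchange from the command line with curl. The ETag below is a placeholder: replace it with whatever value the first response actually gives you, and the second request should come back as an empty 304:

curl -i http://localhost:5000/
curl -i -H 'If-None-Match: W/"abc123"' http://localhost:5000/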
The next steps aim to configure a shared cache in front of this backend, which will allow caching the same resource for different users at the same time.
Core configuration
Most likely these days you will be deploying in a Kubernetes or at least dockerized environment. However, ATS has surprisingly few options available for Docker, leading yours truly to create a base Docker image which you can use and which will be the base of this configuration. It is based on the standard Debian package, with a bit of wrapping to help extrapolate the configuration from environment variables. It also offers a simpler way to fill up the infamous records.config file.
The file structure you need to create is the following:
├── Dockerfile
└── etc
├── compress.config
├── header_rewrite.config
├── logging.yaml
├── plugin.config
├── records.config.yaml
└── remap.tpl.config
Base configuration
It looks complex, but in truth each file manages one specific and simple aspect of the configuration. Let’s start with the only two files that you really need to edit to get started:
First is records.config, which here is records.config.yaml thanks to the Docker image’s wrapper, which will do the conversion from friendly YAML to whatever the ATS DSL is.
proxy:
  config:
    admin:
      user_id: trafficserver
    log:
      logging_enabled: 3
    http:
      server_ports: "9000"
      connect_attempts_timeout: 30
      normalize_ae: 2
    reverse_proxy:
      enabled: true
    url_remap:
      remap_required: true
      pristine_host_hdr: true
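For the curious, the wrapper flattens this YAML into the classic records.config key/value format, which looks something like the following (a few lines shown as an illustration):

CONFIG proxy.config.admin.user_id STRING trafficserver
CONFIG proxy.config.http.server_ports STRING 9000
CONFIG proxy.config.reverse_proxy.enabled INT 1
CONFIG proxy.config.url_remap.remap_required INT 1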
At the core, we’ve got the two most essential lines:
reverse_proxy.enabled — makes sure to work in reverse proxy mode
remap_required — disables the forward proxy mode
Then a bunch of stuff that will be useful now or later:
user_id — required to run it as the trafficserver user (which is the default on Debian)
logging_enabled — you’ll see the logging config later
server_ports — put there whichever port(s) you fancy
connect_attempts_timeout — always have a timeout, this sounds reasonable
normalize_ae — normalization of the Accept-Encoding HTTP header, which optimizes the caching of resources when Accept-Encoding is part of Vary (the value 2 is to have both gzip and brotli supported)
pristine_host_hdr — just forward the initial hostname to the services behind, makes your life easier
Second is remap.config, whose job is basically to route your URLs to your web servers. You will, however, write the remap.tpl.config file, which leverages the Docker image’s wrapper that can inject environment variables into it:
map / {{ BACKEND_URL }}/
Nothing fancy here. You are just routing all traffic to BACKEND_URL, an environment variable that you will have to feed into your Docker container.
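Note that remap.config happily accepts several map rules should you add more upstreams later. For instance, with a hypothetical API_URL variable pointing to a second service:

map /api/ {{ API_URL }}/
map / {{ BACKEND_URL }}/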
You could stop there in the configuration as this is already a working reverse caching proxy routing to your app! But you’ll see that there are a few more goodies to be found.
Compression
It is often advised to compress all of the text assets1 for performance reasons, and indeed a long HTML page can be much faster to download if encoded in Brotli, for example.
The web commonly uses 3 compression algorithms:
gzip — The fastest and most commonly supported; not necessarily the most efficient but it already does a good job
brotli — The newest kid on the block, made by Google; outperforms gzip by far in compression ratio but is obviously much more expensive to encode
deflate — Too similar to gzip to be interesting
When making a HTTP request, a client will specify through the Accept-Encoding header which of those algorithms it supports. Typically, all the major browsers support all of them.
However, not all servers support compression — and even when they do, the support is often complex or outright buggy. It is typically handled through middlewares that modify the rendered response on-the-fly, in a more or less accurate and standards-aware way. Not to mention the cost and complexity of running those CPU-bound algorithms in Python, Node or your favorite server-side interpreted language.
Because of that, you will get a much more consistent result if you just rely on your reverse proxy for this. It is a popular feature of Cloudflare, and if you want it with nginx you’re gonna have to go with an experimental plugin or with the paid version of nginx.
Fortunately, it’s already embedded in ATS, which is able, for each resource that you cache, to generate different alternates for different compressions, including gzip and brotli. This all happens on-the-fly, and the cache is able to convert one encoding to the other without fetching the original resource again.
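Once the proxy configured below is running on localhost:9000, you can convince yourself of this with two requests: each should report a Content-Encoding matching the advertised algorithm, both served from the same cached resource:

curl -sI -H 'Accept-Encoding: br' http://localhost:9000/ | grep -i content-encoding
curl -sI -H 'Accept-Encoding: gzip' http://localhost:9000/ | grep -i content-encoding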
Here’s what you need to do in order to perform this magic.
First, edit the plugin.config file to put the following line:
compress.so /etc/trafficserver/compress.config
You’re telling ATS to load the compress.so module with the compress.config file as its configuration. You could enable the plugin for just some routes with a different configuration, for example, but for this example it will just be global.
In compress.config, put the following:
remove-accept-encoding true
supported-algorithms br,gzip
minimum-content-length 0
compressible-content-type text/*
compressible-content-type *font*
compressible-content-type *javascript
compressible-content-type *json
compressible-content-type *ml;*
compressible-content-type *mpegURL
compressible-content-type *mpegurl
compressible-content-type *otf
compressible-content-type *ttf
compressible-content-type *type
compressible-content-type *xml
compressible-content-type application/eot
compressible-content-type application/pkix-crl
compressible-content-type application/x-httpd-cgi
compressible-content-type application/x-perl
compressible-content-type application/json
compressible-content-type image/vnd.microsoft.icon
compressible-content-type image/x-icon
You can interpret the options the following way:
remove-accept-encoding — don’t tell the server that the client accepts different encodings, as it doesn’t really matter: the work is going to be done on the proxy side
supported-algorithms — allow brotli and gzip, which as stated before are the two interesting algorithms. In order for this to work, you’ll observe that normalize_ae from records.config is set to 2, because otherwise the normalization process would just systematically strip brotli from the list of candidates
minimum-content-length — no limit on the content size, as the default value is made for gzip and brotli is more efficient
compressible-content-type — a reasonable list of content types that we’d like to compress before sending away, adjust to your needs
With this configured, you get top-of-the-line compression basically effortlessly and for free. Keep an eye on your CPU though, because this might hurt if abused: if required, you can disable brotli for routes that have lots of throughput and don’t stay in cache, as gzip still has significant gains over the absence of compression while being much faster to compress.
Logging
You probably want at least some logs, to get a glance at what is going through your server. You will be the one deciding what to put in there, following the fairly extensive documentation, but let’s consider that since you’re dealing with a Docker service, you’ll want to output everything to stdout.
You can start with the following logging.yaml file:
logging:
  formats:
    - name: access
      format: '%<cqtn> %<cqhm> %<cluc> -> %<shn>:%<nhp> %<crc>'
  logs:
    - mode: ascii
      filename: stdout
      format: access
That’s super basic but you can extend it as much as you want!
Headers
One last thing that you’ll probably want to do is add some meta information to the response headers in order to know the caching status. Add this line to plugin.config:
header_rewrite.so /etc/trafficserver/header_rewrite.config
And then put this content in header_rewrite.config:
add-header X-Cache %{CACHE}
Thanks to this, you can tell when navigating your project which pages come from the cache and which don’t.
Run it all
Now is the time to test the whole solution. Start the whole thing using Docker Compose:
docker-compose up --build
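The Compose file itself lives in the repository; conceptually, it just wires the two containers together, along these lines (service names and build paths are illustrative):

services:
  backend:
    build: ./backend
  proxy:
    build: ./proxy
    environment:
      BACKEND_URL: http://backend:5000
    ports:
      - "9000:9000"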
When it’s started, give a try to http://localhost:9000/. The same thing as with the stand-alone backend should be displayed, and if you try it from a single browser you should see exactly the same result.
The interesting part is when you open it in a different browser, or when you disable the cache in your current browser. You’ll notice that the random string stays consistent between browser instances, which means that the cache is indeed shared between all of them. Mission accomplished!
To convince yourself even further, you can inspect the X-Cache header of your HTTP responses. If you just refresh the page without changing the ETag, whether you receive a 200 or a 304 on the client — depending on your browser’s cache status — you will see in the header that you had a cache hit, which will be confirmed by the backend’s access log showing only 304 responses.
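From the command line, this check is a one-liner. The value comes from the header_rewrite plugin’s %{CACHE} condition, so expect strings such as miss on the first request and hit-fresh or hit-stale on subsequent ones:

curl -sI http://localhost:9000/ | grep -i x-cache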
Wrap up
You have explored throughout this article the power of RFC 9110 and of respecting it. It allows you to express advanced instructions regarding the caching of content, its re-validation in real time and its transformation.
Used tactically, this can greatly reduce the load on a backend server by getting most of the request results from the cache, instead of implementing proprietary logic through middlewares and obscure configuration mechanisms.
This however outlines that few reverse proxies actually implement all the necessary tools. It puts the spotlight on Apache Traffic Server, an extremely powerful piece of software quite generally ignored by the community, but which provides out of the box all the latest goodies of your dreams, with a specialized and simple configuration — if you get past the initial intimidating aspect of the configuration files.
And while respecting RFC 9110 applies here to the reverse proxy, it can also be a powerful tool for you to leverage in a typical headless CMS setup. This remains a topic to be explored further in a future article!
1. Compression and encryption are fundamentally incompatible notions, as they try to achieve strictly opposite goals. Compression will try to condense the entropy of your text, while encryption tries to drown it in as much noise as possible. As such, compressing a secret and serving it through HTTPS will lead to security issues such as BREACH. Just make sure to never, ever compress a secret.