Benefits & considerations for caching HTML at CDN edge
Sep 23, 2019
Some websites have a high Time to First Byte due to server-side issues. To check whether your website has such an issue, run it through WebPageTest from the test location geographically closest to your servers, using the fastest available connection setting. A First Byte time above 1.5 seconds in such a run points to a possible server-side issue.
The ideal fix is to find the root cause in the server's infrastructure, code, configuration or database. But legacy systems, or limited resources, expertise or time, often make that impractical, and caching entire HTML pages becomes a lucrative option to evaluate.
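WebPageTest remains the better tool for this test, but a rough first check can also be scripted from a machine near the servers. The sketch below assumes `curl` and `awk` are installed and uses curl's `--write-out` variable `time_starttransfer`, which includes DNS lookup, TCP connect and TLS setup, so it approximates rather than matches WebPageTest's First Byte time. The function names and the 1.5 second default are illustrative, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a rough Time to First Byte (TTFB) check.

# Print the seconds until the first response byte of URL arrived.
# time_starttransfer includes DNS lookup, TCP connect and TLS handshake.
measure_ttfb() {
  curl --silent --output /dev/null --write-out '%{time_starttransfer}\n' "$1"
}

# Succeed (exit 0) if a TTFB value exceeds the threshold (default 1.5 s).
is_slow_ttfb() {
  awk -v t="$1" -v max="${2:-1.5}" 'BEGIN { exit !(t > max) }'
}

# Measure URL once and print a verdict.
check_ttfb() {
  local ttfb
  ttfb=$(measure_ttfb "$1") || return 1
  if is_slow_ttfb "$ttfb"; then
    echo "$1: TTFB ${ttfb}s, possible server-side issue"
  else
    echo "$1: TTFB ${ttfb}s"
  fi
}
```

For example, `check_ttfb https://www.example.com/`. A single measurement is noisy, so running it a few times gives a more reliable picture.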
HTML Caching Considerations
Not every page can be cached this way. However, a lot of HTML pages need not change from visitor to visitor and may only be updated every few days, weeks or months. Some examples are lead generation landing pages, blog post pages (like this one) and product documentation. Often, the most popular pages of a website fall into this category and can be cached. In such cases, caching the HTML offers an easier way to work around server-side slowness.
If your website's most popular pages need not differ from visitor to visitor and are not updated every minute, caching them may offer a notable speed boost.
Caching HTML at CDN Edge
It is common practice to cache HTML at the origin with Redis or Varnish. Both enable powerful caching and cache invalidation capabilities. For example, Varnish's support for Edge Side Includes (ESI) enables caching parts of a page while fetching the rest from the origin. But not every HTML page requires such configurability, and such pages can be cached at the CDN edge instead. Caching at the CDN edge offers the following benefits over caching at the origin:
Considerations when Caching HTML at CDN Edge
Caching HTML at the CDN edge is a lucrative option, but it comes with the following caveats:
The CDN needs to be configured to cache@edge only for certain URL patterns, leaving other URL patterns as is. A strategy to invalidate the cache@edge in sync with the website's build & deploy process also needs to be worked out. Failing to do so may result in stale pages or broken functionality being served.
For every URL that isn't cached@edge, serving the request requires two SSL connections (client <-> edge and edge <-> origin). This can increase the Time to First Byte, as we observed in the following WebPageTest runs:
| HTML via CDN? | No | Yes |
|---|---|---|
| Test client location | Mumbai | Mumbai |
| Time to First Byte | 391 ms | 713 ms |
| WebPageTest run | source run | source run |
This can be mitigated by increasing the HTTP keep-alive timeout in the CDN configuration. A higher keep-alive value keeps the CDN <-> origin connection open for longer, which avoids the connection setup overhead for new requests that arrive before the connection times out.
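On Cloudfront, the relevant setting is the origin keep-alive timeout (`OriginKeepaliveTimeout`, in seconds) on a custom origin. The sketch below assumes the AWS CLI and `jq` are installed and that HTML comes from a custom (non-S3) origin; the 60-second value and the file names `dist.json` and `dist-config.json` are illustrative. It follows the CLI's read-modify-write flow: fetch the config and its ETag, edit it, then upload it with `--if-match`.

```shell
#!/usr/bin/env bash
# Read `aws cloudfront get-distribution-config` output on stdin, set
# OriginKeepaliveTimeout on every custom origin, and print only the
# DistributionConfig object (the shape update-distribution expects).
set_keepalive() {
  jq --argjson t "$1" '.DistributionConfig
    | .Origins.Items |= map(
        if .CustomOriginConfig
        then .CustomOriginConfig.OriginKeepaliveTimeout = $t
        else . end)'
}

# Fetch, modify and upload the distribution config.
# Usage: raise_keepalive DISTRIBUTION_ID [SECONDS]
raise_keepalive() {
  local dist_id=$1 timeout=${2:-60} etag
  aws cloudfront get-distribution-config --id "$dist_id" --output json > dist.json || return 1
  etag=$(jq -r '.ETag' dist.json)
  set_keepalive "$timeout" < dist.json > dist-config.json || return 1
  # --if-match guards against overwriting a config changed in the meantime.
  aws cloudfront update-distribution --id "$dist_id" --if-match "$etag" \
    --distribution-config file://dist-config.json
}
```

If the origin server itself closes idle connections sooner than this timeout, the longer CDN setting gains little, so the origin's own keep-alive timeout is worth checking too.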
Implementing HTML Caching with AWS Cloudfront CDN
I set up HTML caching for certain sections of this website a month ago. In fact, this post's HTML is being served by Cloudfront CDN (Oct 2019 update: this website has since been removed from the CDN, which was set up only for testing purposes for this blog post). Below are the high-level details of the configuration:
a. Create a Cloudfront Distribution to cache HTML
Ideally, HTML should be served by a separate Cloudfront distribution from the one that serves static files (CSS, JS, images), because the HTML cache is invalidated differently from the static-file cache. More on this later.
b. Create Behaviors for Cacheable Path Patterns
By default, Cloudfront CDN decides whether to cache an object by looking at the HTTP cache headers (such as Cache-Control) in the origin's response. To cache HTML for certain path patterns, we should create a separate behavior for each of those path patterns, in addition to the default behavior.
For each of the non-default behaviors, we should specify the TTL values. These values determine how long HTML matching these path patterns is cached, irrespective of its HTTP cache headers, when the minimum, default and maximum TTL are set to the same value. For example, to make the CDN cache URLs matching a certain pattern for 7 days, set all three TTLs to 604800 seconds.
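To reason about which URLs land in which behavior, it can help to model Cloudfront's path-pattern matching locally. The sketch below assumes Cloudfront's documented semantics as we understand them: behaviors are evaluated in precedence order, the first matching pattern wins, `*` matches zero or more characters and `?` matches exactly one. Bash `case` globs treat these two wildcards the same way (other glob syntax such as `[...]` would differ, so the sketch sticks to `*` and `?`). The `BEHAVIORS` list and behavior names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical cache behaviors in precedence order (first match wins).
# Each entry is "PATH_PATTERN BEHAVIOR"; the default "*" comes last.
BEHAVIORS=(
  "/post/* edge-cached-html"
  "/docs/* edge-cached-html"
  "* default-no-html-cache"
)

# Print the behavior a request path would fall into.
match_behavior() {
  local path=$1 entry pattern behavior
  for entry in "${BEHAVIORS[@]}"; do
    pattern=${entry%% *}   # text before the first space
    behavior=${entry#* }   # text after the first space
    # $pattern is left unquoted so 'case' treats it as a glob, not a literal.
    case $path in
      $pattern) echo "$behavior"; return 0 ;;
    esac
  done
  return 1
}
```

For example, `match_behavior /post/some-article/` prints `edge-cached-html`, while `/post` (no trailing slash) falls through to the default behavior, just as `/post/*` would not match it on the CDN.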
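Cloudfront TTLs are expressed in seconds, so the 7-day figure has to be converted. The sketch below prints only the path pattern and TTL fields of a behavior as a JSON fragment, using the `MinTTL`/`DefaultTTL`/`MaxTTL` field names of Cloudfront's legacy cache settings on a cache behavior. A real cache behavior needs further required fields such as `TargetOriginId` and `ViewerProtocolPolicy`, and newer distributions may set TTLs through a cache policy instead. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Convert whole days to seconds (Cloudfront TTLs are in seconds).
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

# Print the path pattern and TTL fields of a cache behavior as JSON.
# MinTTL = DefaultTTL = MaxTTL pins the cache duration for this pattern.
# Usage: ttl_fragment PATH_PATTERN DAYS
ttl_fragment() {
  local pattern=$1 ttl
  ttl=$(days_to_seconds "$2")
  printf '{"PathPattern": "%s", "MinTTL": %d, "DefaultTTL": %d, "MaxTTL": %d}\n' \
    "$pattern" "$ttl" "$ttl" "$ttl"
}
```

For example, `ttl_fragment '/post/*' 7` prints a fragment with all three TTLs set to 604800.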
c. Invalidating CDN Cache
We cannot use file-name-based cache busting for HTML files as we do for static files, because page URLs need to stay the same. This means we have to explicitly invalidate the HTML CDN cache whenever the website or its pages are updated. This can be done via the AWS CLI and integrated into build & deploy scripts:
aws cloudfront create-invalidation --distribution-id <DISTRIBUTION_ID> --paths "/post/*"
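To keep invalidation in sync with deployments, the command above can be wrapped in a deploy step. The sketch below assumes the AWS CLI is configured; `HTML_DISTRIBUTION_ID`, `build-and-upload.sh` and the `/post/*` and `/docs/*` paths are hypothetical placeholders. It captures the invalidation ID with `--query`/`--output text`, then uses `aws cloudfront wait invalidation-completed` to block until the invalidation finishes. Setting `DRY_RUN=1` prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Invalidate edge-cached HTML paths and wait for the invalidation to finish.
# Usage: invalidate_html DISTRIBUTION_ID PATH...
invalidate_html() {
  local dist_id=$1 inv_id
  shift
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "aws cloudfront create-invalidation --distribution-id $dist_id --paths $*"
    return 0
  fi
  # "$@" keeps wildcards like /post/* unexpanded on their way to the CLI.
  inv_id=$(aws cloudfront create-invalidation --distribution-id "$dist_id" \
    --paths "$@" --query 'Invalidation.Id' --output text) || return 1
  # Poll until the edge locations have dropped the old HTML.
  aws cloudfront wait invalidation-completed --distribution-id "$dist_id" --id "$inv_id"
}

# Hypothetical deploy step: publish first, then invalidate the HTML sections.
deploy() {
  ./build-and-upload.sh || return 1
  invalidate_html "$HTML_DISTRIBUTION_ID" '/post/*' '/docs/*'
}
```

A wildcard path such as `/post/*` counts as a single invalidation path, so invalidating a whole section keeps the script simple; the trade-off is that unchanged pages in that section are also evicted and refetched from the origin.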
If some of the most popular pages on your website are anonymous, static pages that do not change frequently, caching them at the CDN edge can improve their performance more than caching them at the origin. However, it is important to work out their cache invalidation in sync with the website's build & deploy process.