Improving HTML First Byte Time with CDN Caching

Benefits and considerations for caching HTML at the CDN edge.

Sep 23, 2019


Some websites have a high Time to First Byte due to server-side issues. To check whether your website is affected, run it through WebPageTest from the location geographically closest to your servers, using the fastest available connection setting. A First Byte time above 1.5 seconds in such a run points to a possible server-side issue.

Figure: Example of a high First Byte time from WebPageTest
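For a quick check outside WebPageTest, first byte time can also be approximated from a script. The following is a minimal sketch using only Python's standard library. The helper names `measure_ttfb` and `is_slow_first_byte` are hypothetical, and the timing includes connection setup, so the result will not match WebPageTest's number exactly. The 1.5 second threshold is the one suggested above.

```python
import http.client
import time
from urllib.parse import urlsplit

SLOW_TTFB_SECONDS = 1.5  # threshold suggested in this post


def measure_ttfb(url, timeout=10.0):
    """Return seconds from sending a GET until the response headers arrive."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    start = time.perf_counter()
    try:
        conn.request("GET", path)      # connects (DNS, TCP, TLS) and sends the request
        response = conn.getresponse()  # returns once the status line and headers are read
        elapsed = time.perf_counter() - start
        response.read()                # drain the body before closing
    finally:
        conn.close()
    return elapsed


def is_slow_first_byte(ttfb_seconds, threshold=SLOW_TTFB_SECONDS):
    """True when the first byte time exceeds the threshold."""
    return ttfb_seconds > threshold
```

Calling `is_slow_first_byte(measure_ttfb("https://example.com/"))` from a machine close to the origin gives a rough yes/no answer; `example.com` is a placeholder for your own URL.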

The ideal way to address this is to find the root cause within the infrastructure, code, config or database on the server. But legacy, resource, expertise or time constraints often make that impractical, and caching the entire HTML becomes an attractive option to evaluate.

HTML Caching Considerations

Unlike Image, JavaScript and CSS files, HTML is rarely cached because:

  • It may need to differ from visitor to visitor
  • It may have to be updated with passing time

However, a lot of HTML pages need not change from visitor to visitor and may only be updated after days, weeks or months. Examples include lead generation landing pages, blog post pages (like this one) and product documentation. Often, the most popular pages of a website fall into this category and can be cached. In such cases, caching the HTML offers an easier way to resolve server-side slowness.

If your website's most popular pages need not differ from visitor to visitor and are not updated every minute, caching them may offer a notable speed boost.

Caching HTML at CDN Edge

It is a common practice to cache HTML at the origin with Redis or Varnish. Both offer powerful caching and cache invalidation capabilities. For example, Varnish Edge Side Includes allow caching parts of a page while fetching the rest from the origin. But not every HTML page requires such configurability, and pages that don't can be cached at the CDN edge. Caching at the CDN edge offers the following benefits over caching at the origin:

  • Serving HTML from the edge reduces latency. This is notably helpful for websites with a global traffic footprint.
  • Serving part of the website traffic directly from the CDN takes load off the origin server(s).
  • During traffic spikes, HTML pages served from the CDN remain unaffected by slowness caused by load on the origin servers.

Considerations when Caching HTML at CDN Edge

Caching HTML at the CDN edge is an attractive option, but it comes with the following caveats that need to be considered:

Cache@edge Invalidation

The CDN needs to be configured to enable caching@edge for certain URL patterns while leaving other URL patterns as is. This means that a strategy for invalidating cache@edge, in sync with the website's build & deploy process, needs to be worked out. Failure to do so may result in stale pages or broken functionality being served.

HTTP Keep-Alive

For every URL that isn't cached@edge, serving the request requires establishing two SSL/TLS connections (client <-> edge and edge <-> origin) instead of one. This can increase first byte time, as we observed in the following WebPageTest runs:


HTML via CDN?         | No         | Yes
Origin Location       | Mumbai     | Mumbai
Test Client Location  | Mumbai     | Mumbai
Time to First Byte    | 391 ms     | 713 ms
WebPageTest Run URL   | source run | source run
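To put the two runs in perspective, the overhead can be worked out directly from the table. A small sketch (the function name is illustrative):

```python
def ttfb_overhead(direct_ms, via_cdn_ms):
    """Return (absolute overhead in ms, relative overhead in percent)."""
    extra_ms = via_cdn_ms - direct_ms
    return extra_ms, round(100 * extra_ms / direct_ms, 1)


# Values from the WebPageTest runs in the table above.
extra_ms, extra_pct = ttfb_overhead(391, 713)
print(extra_ms, extra_pct)  # 322 82.4
```

In this single pair of runs, routing an uncached request through the CDN added roughly 322 ms, an increase of about 82%.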

This overhead can be reduced by increasing the HTTP keep-alive value within the CDN configuration. A higher keep-alive value keeps the CDN <-> origin connection open for longer, which avoids the connection overhead every time a new request comes in.

Figure: Setting origin keep-alive for AWS CloudFront CDN
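The same setting can be changed programmatically. In the CloudFront API, a custom origin's keep-alive timeout is the `OriginKeepaliveTimeout` field of `CustomOriginConfig`. The sketch below only edits a distribution config dictionary of the shape returned by boto3's `get_distribution_config`. The helper name, the origin ID and the 60 second value are illustrative assumptions, and the allowed range should be checked against current CloudFront limits.

```python
import copy


def with_origin_keepalive(distribution_config, seconds):
    """Return a copy of the config with the keep-alive timeout set on custom origins."""
    updated = copy.deepcopy(distribution_config)
    for origin in updated.get("Origins", {}).get("Items", []):
        custom = origin.get("CustomOriginConfig")
        if custom is not None:  # S3 origins have no CustomOriginConfig
            custom["OriginKeepaliveTimeout"] = seconds
    return updated


# Trimmed-down example config; a real one has many more fields.
config = {
    "Origins": {
        "Quantity": 1,
        "Items": [
            {
                "Id": "origin-server",  # hypothetical origin ID
                "DomainName": "origin.example.com",
                "CustomOriginConfig": {"OriginKeepaliveTimeout": 5},
            }
        ],
    }
}
new_config = with_origin_keepalive(config, 60)
```

The updated dictionary would then be passed to boto3's `update_distribution`, together with the `ETag` returned by `get_distribution_config` as the `IfMatch` argument.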

Implementing HTML Caching with AWS CloudFront CDN

I set up HTML caching for certain sections of this website a month ago. In fact, this post's HTML is being served by the CloudFront CDN (Oct 2019 update: this website has since been removed from the CDN, which was set up for testing purposes for this blog post). Below are the high-level details of the configuration:

a. Create a CloudFront Distribution to cache HTML

It is ideal to use a different CloudFront distribution for HTML than the one used for static files (CSS, JS, images). This is because we invalidate the CDN cache for HTML differently than we do for static files. More on this later.

b. Create Behaviors for Cacheable Path Patterns

By default, the CloudFront CDN determines whether it should cache an object by looking at the HTTP cache headers in the origin's response. To cache HTML for certain path patterns, we should create behaviors for those path patterns, separate from the default behavior:

Figure: Creating path pattern specific caching behaviors in AWS CloudFront

For each of the non-default behaviors, we should specify the TTL values. These values determine how long HTML matching these URL patterns is cached (irrespective of the HTTP cache header values). In the example below, we ensure that the CDN caches URLs matching a certain pattern for 7 days:

Figure: Specifying the caching TTL override within AWS CloudFront
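Conceptually, the behaviors act as an ordered list of path patterns, each with its own TTL, with the default behavior as the fallback. The sketch below models that lookup in plain Python. The `/docs/*` pattern is hypothetical, and `fnmatch` is only an approximation of CloudFront's wildcard matching.

```python
from fnmatch import fnmatchcase

SEVEN_DAYS = 7 * 24 * 60 * 60  # 604800 seconds

# Checked in order; the first matching pattern wins.
CACHE_BEHAVIORS = [
    ("/post/*", SEVEN_DAYS),
    ("/docs/*", SEVEN_DAYS),  # hypothetical second pattern
]


def ttl_for_path(path):
    """Return the overriding TTL in seconds, or None to fall back to the default behavior."""
    for pattern, ttl in CACHE_BEHAVIORS:
        if fnmatchcase(path, pattern):
            return ttl
    return None  # default behavior: origin cache headers decide


print(ttl_for_path("/post/html-caching/"))  # 604800
print(ttl_for_path("/cart"))                # None
```

Pages that differ per visitor, such as a cart, match no pattern and stay under the default behavior.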

c. Invalidating CDN Cache

We cannot use file-name based cache busting for HTML files like we do for static files. This means we have to explicitly invalidate the HTML CDN cache during website updates, changes to pages, etc. This can be done via the AWS CLI and can be integrated into build & deploy scripts.

aws cloudfront create-invalidation --distribution-id <DISTRIBUTION_ID> --paths "/post/*"
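When the deploy script is written in Python rather than shell, the same invalidation can be issued through boto3. This is a sketch under assumptions: boto3 is installed, AWS credentials are configured, and the helper names are illustrative. `CallerReference` must be unique per request, so a timestamp is used here.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch structure expected by create_invalidation."""
    paths = list(paths)
    return {
        "Paths": {"Quantity": len(paths), "Items": paths},
        "CallerReference": caller_reference or f"deploy-{time.time_ns()}",
    }


def invalidate(distribution_id, paths):
    """Send the invalidation request to CloudFront."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("cloudfront")
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )
```

A deploy step could then call `invalidate("<DISTRIBUTION_ID>", ["/post/*"])` after the new build is live.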

Conclusion

If some of the most popular pages on your website are anonymous static pages that do not change frequently, caching them at the CDN edge can improve their performance more than caching them at the origin. However, it is important to work out their cache invalidation in sync with the website's build & update process.


Punit Sethi

About the Author

Punit Sethi has been working with large e-Commerce & B2C websites on improving their site speed, scalability and frontend architecture. He tweets on these topics here.