Improving HTML First Byte Time with CDN Caching


Benefits & considerations for caching HTML at the CDN edge.

Sep 23, 2019




Some websites have a high Time to First Byte (TTFB) because of server-side issues. To check whether your website has such an issue, run it through WebPageTest from the test location geographically closest to your servers, using the fastest connection setting available. A First Byte time above 1.5 seconds in such a run points to a possible server-side issue.


Figure: Example of a high First Byte time from WebPageTest
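For a quick spot check between WebPageTest runs, curl can report a comparable number from the command line. The sketch below is only an approximation under stated assumptions: it uses curl's time_starttransfer (which includes DNS lookup, connection setup, TLS and server processing) rather than a real browser, it applies the 1.5 second threshold from the paragraph above, and the script name and URL in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Rough command-line TTFB check for a single URL. Run it from a machine close
# to your origin on a fast connection, as with the WebPageTest run above.

# Threshold from this article: a First Byte time above 1.5 s under these
# conditions points to a possible server-side issue.
TTFB_THRESHOLD_SECONDS=1.5

# Print curl's time_starttransfer: seconds from the start of the request until
# the first response byte, including DNS lookup, connect, TLS and server time.
measure_ttfb() {
  curl --silent --output /dev/null --write-out '%{time_starttransfer}\n' "$1"
}

# Measure one URL and report whether its TTFB exceeds the threshold.
check_ttfb() {
  local url=$1 ttfb
  ttfb=$(measure_ttfb "$url") || { echo "$url: request failed" >&2; return 1; }
  # awk does the floating-point comparison that bash arithmetic cannot.
  if awk -v t="$ttfb" -v max="$TTFB_THRESHOLD_SECONDS" 'BEGIN { exit !(t > max) }'; then
    echo "$url TTFB ${ttfb}s: above ${TTFB_THRESHOLD_SECONDS}s, possible server-side issue"
  else
    echo "$url TTFB ${ttfb}s: within ${TTFB_THRESHOLD_SECONDS}s"
  fi
}

# When executed directly with a URL argument, check that URL.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  check_ttfb "$1"
fi
```

Usage (hypothetical file name and URL): `./ttfb.sh https://www.example.com/some-page/`. Because curl neither renders the page nor matches WebPageTest's connection profiles, treat a borderline result as a prompt to run WebPageTest rather than as a verdict.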


The ideal fix is to find and address the root cause in the server's infrastructure, code, configuration or database. Often, though, legacy systems or constraints on resources, expertise or time make that impractical, and caching the entire HTML becomes an attractive option to evaluate.


HTML Caching Considerations

Unlike images, JavaScript and CSS files, HTML is rarely cached because:


  • It may need to differ from visitor to visitor
  • It may need to be updated over time

However, many HTML pages do not need to change from visitor to visitor and may be updated only after days, weeks or months. Examples include lead-generation landing pages, blog post pages (like this one) and product documentation. Often, the most popular pages of a website fall into this category and can be cached. In such cases, caching the HTML offers an easier way to work around server-side slowness.



If your website’s most popular pages need not differ from visitor to visitor and are not updated every minute, caching them may offer a notable speed boost.



Caching HTML at CDN Edge

It is common practice to cache HTML at the origin with Redis or Varnish. Both provide powerful caching and cache invalidation capabilities. For example, Varnish's support for Edge Side Includes (ESI) lets it cache parts of a page while fetching the rest from the origin. But not every HTML page needs that level of configurability, and such pages can be cached at the CDN edge instead. Caching at the CDN edge offers the following benefits over caching at the origin:


  • Serving HTML from the edge reduces latency. This is especially helpful for websites with a global traffic footprint.
  • Serving part of the website's traffic directly from the CDN takes load off the origin server(s).
  • During traffic spikes, HTML pages served from the CDN are unaffected by slowness caused by load on the origin servers.


Limitations of HTML Caching at CDN Edge

Caching HTML at the CDN edge is an attractive option, but it comes with the following caveats that need to be considered:

Cache@edge Invalidation

The CDN has to be configured to cache certain URL patterns at the edge while leaving other URL patterns as they are. This means you need a strategy for invalidating the edge cache that stays in sync with the website's build and deploy process. Without one, stale pages or broken functionality may be served.

First Byte Time for URLs not cached@edge

For every URL that isn't cached at the edge, every request and response is now routed through an additional node (the edge). Processing these requests also requires an additional connection between the edge and the origin. For one such URL on our website, we observed the following increase in first byte time:


| HTML via CDN?        | No         | Yes        |
|----------------------|------------|------------|
| Origin location      | Mumbai     | Mumbai     |
| Test client location | Mumbai     | Mumbai     |
| Time to First Byte   | 391 ms     | 713 ms     |
| WebPageTest run      | source run | source run |

As the table shows, routing this non-cached page through the CDN added 322 ms (about 82%) to its first byte time. It is therefore important to know the first byte degradation for your own non-cached pages, and to factor in how popular and critical those pages are for the website.
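One way to estimate this degradation for your own pages is to time the same page fetched directly from the origin and through the CDN. The sketch below takes the median of several curl samples for each URL; the hostnames in the usage line (origin.example.com for the origin, www.example.com for the CDN distribution) are hypothetical, and curl timings are an approximation that will not match WebPageTest numbers exactly.

```shell
#!/usr/bin/env bash
# Compare the median TTFB of one page fetched directly from the origin and
# fetched through the CDN.

# One TTFB sample in seconds (curl's time_starttransfer).
measure_ttfb() {
  curl --silent --output /dev/null --write-out '%{time_starttransfer}\n' "$1"
}

# Median of N samples for a URL (N defaults to 5).
median_ttfb() {
  local url=$1 runs=${2:-5} i
  for ((i = 0; i < runs; i++)); do
    measure_ttfb "$url"
  done | LC_ALL=C sort -n | awk '{ v[NR] = $1 } END {
    if (NR % 2) print v[(NR + 1) / 2]
    else print (v[NR / 2] + v[NR / 2 + 1]) / 2
  }'
}

# Print both medians plus the absolute (ms) and relative (%) difference.
compare_ttfb() {
  local origin_url=$1 cdn_url=$2 runs=${3:-5} origin cdn
  origin=$(median_ttfb "$origin_url" "$runs")
  cdn=$(median_ttfb "$cdn_url" "$runs")
  awk -v o="$origin" -v c="$cdn" 'BEGIN {
    pct = (o > 0) ? (c - o) / o * 100 : 0
    printf "origin: %.0f ms, via CDN: %.0f ms, delta: %+.0f ms (%+.0f%%)\n",
      o * 1000, c * 1000, (c - o) * 1000, pct
  }'
}
```

Usage (hypothetical URLs): `compare_ttfb https://origin.example.com/contact/ https://www.example.com/contact/ 9`. With medians of 0.391 s and 0.713 s, as in the table above, it would print `origin: 391 ms, via CDN: 713 ms, delta: +322 ms (+82%)`. The median is used so that a single slow sample does not dominate the comparison.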



Implementing HTML Caching with AWS CloudFront CDN

I set up HTML caching for certain sections of this website a month ago. In fact, this post’s HTML is being served by CloudFront (Oct 2019 update: this website has since been removed from the CDN, which was set up for testing purposes for this blog post). Below are the high-level details of the configuration:

a. Create a CloudFront Distribution to Cache HTML

Ideally, HTML should be served by a separate CloudFront distribution from the one that serves static files (CSS, JS, images). This is because the HTML cache is invalidated differently from the static-file cache. More on this later.

b. Create Behaviors for Cacheable Path Patterns

By default, CloudFront decides whether to cache an object by looking at the HTTP caching headers (such as Cache-Control) in the origin’s response. To cache HTML for certain path patterns, create a separate behavior for each of those patterns, distinct from the default behavior:


Figure: Creating path-pattern-specific caching behaviors in AWS CloudFront

For each non-default behavior, specify the TTL values. These values determine how long HTML matching the behavior's URL pattern is cached, irrespective of the origin's HTTP cache headers. In the example below, the CDN caches URLs matching a certain pattern for 7 days (CloudFront TTLs are entered in seconds, so 7 × 24 × 3600 = 604,800):


Figure: Specifying the caching TTL (overriding it within AWS CloudFront)
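After the behaviors are in place, one way to confirm which URLs are actually served from the edge is to inspect the X-Cache response header that CloudFront adds, such as `Hit from cloudfront` or `Miss from cloudfront`. The helpers below are a minimal sketch; `cache_status` and `is_edge_hit` are hypothetical names, and the URL in the usage note is a placeholder.

```shell
#!/usr/bin/env bash
# Check whether CloudFront served a URL from its edge cache, based on the
# X-Cache response header it adds.

# Print the X-Cache header value for a URL. A normal GET is sent; the body is
# discarded and the response headers are written to stdout for parsing.
cache_status() {
  curl --silent --output /dev/null --dump-header - "$1" |
    tr -d '\r' |
    awk -F': *' 'tolower($1) == "x-cache" { print $2 }'
}

# Succeed only for a plain edge hit ("Hit from cloudfront"). Values such as
# "Miss from cloudfront" or "RefreshHit from cloudfront" do not count.
is_edge_hit() {
  [[ "$(cache_status "$1")" == Hit* ]]
}
```

Requesting a URL covered by a caching behavior twice, e.g. `cache_status https://www.example.com/post/some-post/`, should typically report a miss and then a hit, while URLs left on the default behavior with non-cacheable headers keep reporting misses. Each edge location keeps its own cache, so the first request routed through a different location can miss again.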

c. Invalidating CDN Cache

Unlike static files, HTML pages cannot use filename-based cache busting, because a page's URL has to stay the same. This means we have to explicitly invalidate the HTML CDN cache when the website is updated or pages change. This can be done with the AWS CLI and integrated into build and deploy scripts (quoting the path stops the local shell from expanding the wildcard):

aws cloudfront create-invalidation --distribution-id <DISTRIBUTION_ID> --paths "/post/*"
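Inside a deploy script, the invalidation command can be wrapped so that the deploy only reports success once CloudFront has finished processing the invalidation. This sketch assumes the AWS CLI is installed and configured with credentials permitted to create and read CloudFront invalidations; `invalidate_html` is a hypothetical helper name and the distribution ID in the usage note is a placeholder. It captures the new invalidation's ID with `--query`/`--output text` and then polls with `aws cloudfront wait invalidation-completed`.

```shell
#!/usr/bin/env bash
# Invalidate cached HTML after a deploy and wait for CloudFront to finish.

# Usage: invalidate_html DISTRIBUTION_ID PATH [PATH...]
# Callers should quote wildcard paths such as "/post/*" so the local shell
# passes them to the AWS CLI unexpanded.
invalidate_html() {
  local distribution_id=$1 invalidation_id
  shift
  invalidation_id=$(aws cloudfront create-invalidation \
    --distribution-id "$distribution_id" \
    --paths "$@" \
    --query 'Invalidation.Id' \
    --output text) || return 1
  echo "Created invalidation $invalidation_id"
  # Block until the invalidation's status becomes Completed.
  aws cloudfront wait invalidation-completed \
    --distribution-id "$distribution_id" \
    --id "$invalidation_id"
}
```

For example, a deploy script could call `invalidate_html E123EXAMPLE "/post/*"` right after publishing updated posts. Waiting for completion keeps later deploy steps, such as smoke tests against the CDN URL, from running while stale HTML may still be served.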

Conclusion

If some of the most popular pages on your website are anonymous, static pages that do not change frequently, caching them at the CDN edge can improve their performance more than caching them at the origin. However, it is important to keep their cache invalidation in sync with the website’s build and update process.




About the Author
Punit Sethi has been a performance engineer for a decade, working on improving the speed of websites. He tweets frequently.

