Improving Time To First Byte and Web Vitals

9/7/2021

In this post, we will cover several concepts I recently explored that influence page speed, how they relate to Core Web Vitals, and how to improve them!

Let's start by agreeing on some terminology and concepts that will be referred to often in this post!

Time To First Byte (TTFB)

TTFB measures the duration from the user or client making an HTTP request to the first byte of the page being received by the client's browser.
https://en.wikipedia.org/wiki/Time_to_first_byte

This measure is used to indicate the responsiveness of a resource: our websites, our servers and so forth. Tools usually display it in milliseconds (ms), and the rule of thumb recommended by several players in the industry is to keep it under 200ms!

This concept alone is important enough to look for improvements that will impact our customers' experience. However, it becomes more useful when we correlate it with another customer-focused metric, the Largest Contentful Paint.

Web Vitals

The Core Web Vitals (CWV) initiative is meant to help us quantify the experience of our sites and find improvements that will result in a better customer experience. Besides providing metrics to look after and improve, these factors are now considered a ranking signal for the Google Search algorithm.

From the CWV metrics, we will be focusing on Largest Contentful Paint (LCP). If you are interested in knowing more about these metrics, check the Web Vitals page.

LCP metric reports the render time of the largest image or text block visible within the viewport relative to when the page started loading.
https://web.dev/lcp/

The time it takes to render our website's largest image or text block depends on how fast we deliver our pages and how fast they download any additional assets they are made of.

Since TTFB measures the responsiveness of our websites, LCP is probably the CWV metric we can influence the most by improving it. That is why we are going to focus on improving TTFB in this post.

The element associated with LCP on spotify.com.

Now that we know what these concepts are and how to interpret them, let's see how to measure them!

Measuring where the time is spent

Before jumping into ways to improve metrics, we need to understand the current state of our applications and where the bottlenecks are.

Knowing how to measure changes is the most important step in gaining confidence in our initiatives.

It is possible to track TTFB with:

DevTools, present in every modern browser, whose timing breakdown highlights the value for every resource requested by the browser, including the page itself.
cURL, on your terminal, can tell you the TTFB of any request through the time_starttransfer variable of its --write-out option. There are plenty of gists on how to do it.
Other tools/sites like Bytecheck or KeyCDN.
Application Performance Monitoring (APM) tools, which can also help us track this from within our clients (CSR) and servers (SSR).
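If we want a quick number from a script rather than from a tool, something along these lines could work. This is a minimal sketch for Node.js, not a replacement for any of the tools above: measureTtfb is a hypothetical helper name, and it treats the arrival of the response headers as the "first byte".

```javascript
const http = require('node:http');
const https = require('node:https');
const { performance } = require('node:perf_hooks');

// Hypothetical helper: resolves with the milliseconds elapsed between
// starting the request and receiving the response headers.
// On a fresh connection this includes DNS, TCP and TLS setup.
function measureTtfb(url) {
  return new Promise((resolve, reject) => {
    const client = url.startsWith('https:') ? https : http;
    const startedAt = performance.now();

    const req = client.get(url, (res) => {
      // The callback fires once the status line and headers have arrived.
      const ttfbMs = performance.now() - startedAt;
      res.on('end', () => resolve(ttfbMs));
      res.resume(); // discard the body so the socket can be released
    });

    req.on('error', reject);
  });
}

// Usage (the URL is only an example):
// measureTtfb('https://example.com/').then((ms) => console.log(`TTFB: ${ms.toFixed(0)}ms`));
```

Running it a few times and looking at the spread is usually more informative than a single value, since the first call pays for the connection setup.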

There are also a few ways you can track LCP:

Lighthouse is available on Chrome or as a standalone app on web.dev and generates a Performance report that tells you the LCP of the page inspected.
Other websites like WebPageTest will review your website and provide useful and detailed reports on areas of improvement.
Some tools, like Calibre, help us automate and track progress over time.
Once again, Application Performance Monitoring (APM) tools can help us track this from our clients and servers!
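For a do-it-yourself check in the browser, the LCP candidates are exposed through PerformanceObserver under the largest-contentful-paint entry type. The sketch below is only an illustration: observeLcp is a hypothetical name, and the second parameter exists purely so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: calls onLcp with the latest LCP candidate,
// in milliseconds since the page started loading.
// `Observer` is injectable only to make the sketch testable.
function observeLcp(onLcp, Observer = globalThis.PerformanceObserver) {
  const observer = new Observer((list) => {
    const entries = list.getEntries();
    // The browser may report several candidates; the last one is the current LCP.
    const latest = entries[entries.length - 1];
    if (latest) onLcp(latest.startTime, latest);
  });

  // `buffered: true` also delivers candidates recorded before we subscribed.
  observer.observe({ type: 'largest-contentful-paint', buffered: true });
  return observer;
}

// Usage in a browser that supports this entry type:
// observeLcp((ms, entry) => console.log('LCP candidate', ms, entry.element));
```

The value can keep changing while the page loads, so it is worth reporting only the last candidate seen before the user interacts with or leaves the page.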

The problem can be anywhere between our routing infrastructure and the application code! Thankfully, these tools help us better understand where the issues lie.

My advice here is to start small and start early. Pick the tool we are most familiar with, or the one we find easiest to start with, and only move on once we have reached its limits!

Let's talk improvements now!

Improving TTFB for websites

CDNs are an excellent way to speed up the responsiveness of your pages, assets, etc. That is especially true when serving assets that rarely change. We should aim to have CDN caches in front of our fonts, images, data payloads, and entire pages (when possible).

This directly impacts several customer experience factors, most evidently LCP, as the customer will be downloading our pages much faster than if they had to reach the server.
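One common way to tell a CDN what it may cache is the Cache-Control response header. Here is a rough sketch of how a server could pick it per request. Everything in it is an assumption for illustration: the cacheControlFor name, the /api/ prefix, the durations, and the idea that static files have fingerprinted names (so they can be cached "forever").

```javascript
// Hypothetical policy: paths, extensions and durations are illustrative only.
function cacheControlFor(pathname) {
  // Static assets, assumed to have fingerprinted file names:
  // any cache (browser or CDN) may keep them for a year.
  if (/\.(?:woff2?|png|jpe?g|webp|svg|css|js)$/.test(pathname)) {
    return 'public, max-age=31536000, immutable';
  }

  // Data payloads: the CDN keeps them for a minute, browsers always revalidate.
  if (pathname.startsWith('/api/')) {
    return 'public, max-age=0, s-maxage=60';
  }

  // Pages: short CDN lifetime, and the CDN may serve a stale copy
  // while it refreshes in the background (where the CDN supports it).
  return 'public, max-age=0, s-maxage=300, stale-while-revalidate=600';
}

// Usage inside a Node.js request handler:
// res.setHeader('Cache-Control', cacheControlFor(new URL(req.url, 'http://localhost').pathname));
```

The s-maxage directive only applies to shared caches such as CDNs, which lets us cache aggressively at the edge without leaving stale pages in the customer's browser.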


Next, let's bring the data source closer to the server and the server closer to the customer!

Caching strategies are ineffective when the requests are unique or so widely distributed that CDNs will not get many hits. This scenario increases the importance of:

Having our server as close to the customer as possible, distributing our sites globally when possible.
Having our data stores as close as possible to the servers. If our pages fetch data from databases or APIs to render (CSR or SSR), then let's ensure these resources are in the same region as our servers.

Both of these strategies avoid round-trips between regions, which would otherwise add a lot of latency to the requests.

Improving TTFB of the assets in your websites

Occasionally we can also observe a good amount of time spent on the "pre-transfer" phase. The DNS resolution, the TCP handshake and the TLS negotiation are part of the initial setup of a request, and they can take a considerable portion of the request time.

Web request time breakdown considering HTTP + TCP + TLS as per Timing web requests.

Anecdotally, I often observe around 200ms spent on this phase across various sites and their respective resources.
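To see where that time goes for a given resource, the browser's Navigation and Resource Timing entries carry a timestamp for each of these phases. The sketch below splits one entry into them; timingBreakdown and the names of the returned fields are my own, while the entry properties are the standard ones.

```javascript
// Hypothetical helper: splits a PerformanceResourceTiming-like entry
// into the phases of the request, all in milliseconds.
function timingBreakdown(entry) {
  // secureConnectionStart is 0 when the request did not negotiate TLS.
  const hasTls = entry.secureConnectionStart > 0;
  const tcpEnd = hasTls ? entry.secureConnectionStart : entry.connectEnd;

  const dns = entry.domainLookupEnd - entry.domainLookupStart;
  const tcp = tcpEnd - entry.connectStart;
  const tls = hasTls ? entry.connectEnd - entry.secureConnectionStart : 0;

  return {
    dns,
    tcp,
    tls,
    preTransfer: dns + tcp + tls,
    // Time between sending the request and the first byte of the response.
    waiting: entry.responseStart - entry.requestStart,
    // First byte relative to the start of the fetch.
    ttfb: entry.responseStart - entry.startTime,
    download: entry.responseEnd - entry.responseStart,
  };
}

// Usage in the browser:
// console.table(performance.getEntriesByType('resource').map(timingBreakdown));
// console.log(timingBreakdown(performance.getEntriesByType('navigation')[0]));
```

Keep in mind that browsers report zeros for most of these timestamps on cross-origin resources unless the response carries a Timing-Allow-Origin header, and that a reused connection will legitimately show 0ms for DNS, TCP and TLS.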

The following rel values for link tags are good ways to reduce TTFB on our websites.

DNS prefetching: adding this rel to a link tag pointing to the domain we will download a resource from makes the browser attempt to resolve the domain before that resource is requested on the page, effectively saving time when we actually need the resource. Example: <link rel="dns-prefetch" href="https://fonts.googleapis.com/">.
Preconnect: adding this rel to a link tag makes the browser perform the DNS resolution and also the TCP handshake and, on HTTPS, the TLS negotiation ahead of time. Example: <link rel="preconnect" href="https://fonts.googleapis.com/" crossorigin>.
Preload: adding this rel to a link tag makes the browser start fetching the asset as soon as it finds that tag in the head of our documents. This makes the resource available sooner and avoids blocking or delaying the rendering! Example: <link rel="preload" href="style.css" as="style">.

⚠️ Utilising DNS-prefetch or preconnect against the same website domain is ineffective because that would already be resolved and cached by the browser. So target other domains!

⚠️ Because these are all tags included in the head of our documents, if we are already preloading assets from a domain, adding dns-prefetch or preconnect for that same domain is unlikely to bring a compounding benefit.

⚠️ Do not preload too many assets; otherwise, we can make things worse than before! Any preloaded asset will compete for bandwidth with the other resources of our pages.

💡 Consider using both preconnect and dns-prefetch together: preconnect saves time on the resolution and on the handshake, while dns-prefetch acts as a fallback that still saves the resolution time in browsers that do not support preconnect.

💡 Consider using preload for assets above the fold only to optimise LCP, for example, hero images or fonts. Additionally, consider using dns-prefetch and preconnect for resources that live in other domains and will be requested later in the page lifecycle.
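Putting the tips above together, a server-rendered page could generate these tags from a small list of origins and critical assets. This is just a sketch under a few assumptions: resourceHints is a hypothetical helper, the inputs are trusted values (nothing is HTML-escaped), and the asset paths in the usage example are made up.

```javascript
// Hypothetical helper: builds the <link> tags to place in the document head.
// `origins` are third-party origins, `preloads` are above-the-fold assets.
function resourceHints({ origins = [], preloads = [] }) {
  const tags = [];

  for (const origin of origins) {
    // preconnect first, dns-prefetch as a fallback for older browsers.
    tags.push(`<link rel="preconnect" href="${origin}" crossorigin>`);
    tags.push(`<link rel="dns-prefetch" href="${origin}">`);
  }

  for (const { href, as, crossorigin = false } of preloads) {
    // Font preloads need the crossorigin attribute, even on our own domain.
    const extra = crossorigin ? ' crossorigin' : '';
    tags.push(`<link rel="preload" href="${href}" as="${as}"${extra}>`);
  }

  return tags.join('\n');
}

// Usage (asset paths are examples only):
// resourceHints({
//   origins: ['https://fonts.googleapis.com'],
//   preloads: [
//     { href: '/hero.jpg', as: 'image' },
//     { href: '/fonts/inter.woff2', as: 'font', crossorigin: true },
//   ],
// });
```

Keeping the list in one place also makes it easier to notice when it grows too long and the preloads start competing with each other.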

Improving TTFB on server

Reviewing the connections between the servers and other data sources (databases, APIs, ...) is important because the pre-transfer phase can take a long time there too!

This can positively impact all requests on the servers and not only initial page loads.

Request breakdown when fetching extra resources from a data source on the server

The most impactful advice regarding TTFB is to utilise keep-alive when possible.

Keep-alive is a property of an HTTP connection that keeps the connection open after it has been established and used for the first time. Subsequent requests to the same destination will reuse that connection as opposed to creating a new one every time.
This is commonly supported by HTTP clients in the vast majority of frameworks and languages. For instance, in Node.js, we can enable it by creating const agent = new https.Agent({ keepAlive: true }); and passing that agent to our requests.

I hope we can now see how this saves the time spent on the pre-transfer phase on every request!
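To make the Node.js example a little more concrete, here is a minimal sketch of a server-side helper that sends all its requests through one shared keep-alive agent. The getJson name and the maxSockets value are assumptions for illustration; the right limits depend on your traffic and your upstream services.

```javascript
const https = require('node:https');

// One agent shared by the whole process, so its sockets are pooled
// and reused across requests to the same host.
const keepAliveAgent = new https.Agent({
  keepAlive: true,
  maxSockets: 50, // illustrative cap on concurrent sockets per host
});

// Hypothetical helper: fetches a URL through the shared agent and parses JSON.
function getJson(url) {
  return new Promise((resolve, reject) => {
    https
      .get(url, { agent: keepAliveAgent }, (res) => {
        let body = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => { body += chunk; });
        res.on('end', () => {
          try {
            resolve(JSON.parse(body));
          } catch (err) {
            reject(err);
          }
        });
      })
      .on('error', reject);
  });
}

// Usage (the URL is only an example):
// getJson('https://api.example.com/products').then(console.log);
```

The important part is that the agent is created once and reused. Creating a new agent per request would bring back a fresh DNS/TCP/TLS setup every time.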

⚠️ Worth pointing out that maintaining keep-alive connections on the server can impact load balancing and memory consumption, so there are valid reasons to keep it disabled. It requires testing!

When using HTTP/2, connection reuse will probably be handled for us by its clients, and it is even more powerful because multiple requests can share a single connection at the same time.

The impact

TL;DR: The higher your TTFB, the higher the LCP will be! However, I could not find a linear correlation between TTFB and LCP in my endeavours on page performance. For instance, in some experiments, I noticed:

A small delay in the request time, 50ms to 200ms, did not clearly affect the LCP.
A longer delay, 1s to 2s, correlated with an increase in the LCP time, but not by the same amount, maybe by 0.5 to 1 second?

My personal conclusion is that chasing improvements of less than 200ms is unlikely to improve LCP scores on its own, but if it improves the TTFB of your website, then it is still awesome!

The point is not to get fixated on the metrics! Depending on your website and infrastructure, different initiatives can yield many different results!

Ensuring we review our websites and APIs from first principles is important to identify potential improvements and deliver better customer experiences!

I hope this was useful, and I will see you next time 👋

Let me know what you think about this post on Twitter!