We Analyzed 5 Million Desktop and Mobile Pages. Here’s What We Learned About Page Speed

2019-09-24

Introduction

This data-driven study aims to provide a solid, empirical foundation for understanding the speed of browsing sessions. We look at different metrics and categories to identify patterns and find insights into what works and what doesn't.

By the end of the study, the reader should have gained an appreciation of the factors that affect the speed performance of a website and their relative importance. We also write from the viewpoint of user experience: how these metrics ultimately affect the browsing experience.

The study is structured as follows:

  • Part 1 sheds some light on the various page characteristics, such as Image Weight, that impact the speed performance of a website, and ranks them according to their importance.

  • Part 2 provides a more detailed picture of the various page characteristics by examining some individually. This descriptive information complements and expands on the results presented in Part 1 and helps contextualise patterns and trends.

Methodology

Part 1

The first part of the study aims to answer the following two research questions:

  1. How accurately can speed metrics such as Time to First Byte (TTFB) or Start Render be predicted based on page characteristics?

  2. What are the most important page characteristics that influence page speed?

To answer the first question, our approach is based on statistical modelling and machine learning. The method of choice is gradient boosted decision trees. Gradient boosting is a popular machine learning technique that is highly scalable and efficient, requires little data transformation, and is considered to deliver high prediction performance.

The process of training the model follows the commonly used steps: 1) split the data set into train and test sets, 2) fit the model using the training data, 3) make predictions for the unseen test set, and finally 4) evaluate the results.

In this study, the prediction accuracy of the model is measured by calculating the correlation between the observed and predicted values of the dependent variables (i.e. the different speed metrics). A correlation of 0 would indicate that the model with the given independent variables (i.e. page characteristics such as Image Weight) cannot predict a dependent variable (e.g. TTFB) better than random guessing: the independent variables contain no relevant information, or at least the model fails to take that information into account. The maximum value of 1 indicates a perfect fit, where the independent variables fully explain the observations.
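The four training steps and the correlation-based evaluation can be sketched as follows. This is a minimal illustration on synthetic data, using scikit-learn's GradientBoostingRegressor as a stand-in for the study's gradient boosted decision trees; all feature names and coefficients are invented for the example.

```python
# Sketch of the modelling pipeline: split, fit, predict, evaluate.
# Synthetic data only; feature names are illustrative, not the study's.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000

# Fake page characteristics (independent variables) and a speed metric
# (dependent variable) loosely driven by two of them plus noise.
X = rng.uniform(size=(n, 3))  # e.g. bytesImg, reqJS, numDomains (scaled)
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)  # e.g. TTFB

# 1) split, 2) fit, 3) predict on the unseen test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

# 4) evaluate: correlation between observed and predicted values.
# 0 = no better than random guessing, 1 = perfect fit.
corr = np.corrcoef(y_test, y_pred)[0, 1]
print(f"observed-vs-predicted correlation: {corr:.2f}")
```

Because the synthetic target here contains irreducible noise, even a well-fitted model lands well below a correlation of 1.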

For the second question, we chose a recently developed technique called Leave-One-Feature-Out Importance (LOFO). The idea behind LOFO is to iteratively remove one independent variable at a time from the data set and measure how much predictive power is lost compared to the full model. If the prediction accuracy is not affected at all, the feature can be considered irrelevant for the task. On the other hand, removing important features should cause a large loss of accuracy. The results provide insight into where site owners may need to look for page speed optimizations.

The HTTP Archive database served as the data source. HTTP Archive tracks how the web is built by crawling some 5 million web pages with WebPageTest, including Lighthouse results, and stores the information in BigQuery, where it is publicly available. More information can be found at https://httparchive.org/ and a getting-started guide can be accessed here: https://github.com/HTTPArchive/httparchive.org/blob/master/docs/gettingstarted_bigquery.md

We randomly sampled 100,000 rows from each of the May to July 2019 crawls. Each row contains details about a single page, including timings, the number of requests, and the types and sizes of requests. In addition, data points on the number of domains, redirects, errors, HTTPS requests, CDN usage, etc. are available. In total, we looked at 300,000 rows for both Mobile and Desktop. Please note that adding more data points to the model would not change the overall results: we did run the models with 3x more data, and the results were similar to those reported.

Part 2

We looked in more detail at the various speed metrics that contribute to a better user experience. We used First Contentful Paint (FCP), First Input Delay (FID), various image performance metrics, and Time to First Byte (TTFB) as our core metrics.

We gathered data from browser sessions for the entire month of May 2019. Here we decided to join the May 2019 HttpArchive data with the June 2019 Chrome-UX data. The Chrome-UX data reflects how Chrome users experienced TTFB, FCP, and FID in real-world conditions. For example, TTFB in the HttpArchive has traditionally been measured synthetically in the lab from a single server location; what makes the Chrome-UX dataset unique is that it reflects the real-world server response times experienced by Chrome users as they navigate the web. More information on the Chrome-UX data can be found here: https://developers.google.com/web/tools/chrome-user-experience-report/
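Conceptually, the join pairs each page's lab measurements with the field measurements for the same origin. The sketch below illustrates this with pandas on two tiny hand-made tables; the column names (`origin`, `lab_ttfb_ms`, `field_fcp_ms`) are assumptions for illustration, not the actual BigQuery schema.

```python
# Illustrative join of lab (HttpArchive) and field (Chrome-UX) data on origin.
# In the study this was done with SQL in BigQuery; column names here are invented.
import pandas as pd

http_archive = pd.DataFrame({
    "origin": ["https://a.example", "https://b.example"],
    "lab_ttfb_ms": [180, 420],        # synthetic lab measurement
})
crux = pd.DataFrame({
    "origin": ["https://a.example", "https://b.example"],
    "field_fcp_ms": [900, 2600],      # real-user field measurement
})

# Inner join: keep only origins present in both datasets.
joined = http_archive.merge(crux, on="origin", how="inner")
print(joined)
```

An inner join matches the study's setup, since only pages with both lab and field data can be compared.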

Information on defining the SQL queries to extract the desired data was partly drawn from discussions on the HTTP Archive forum (see https://discuss.httparchive.org); most of the time we expanded on those to provide novel insights. The SQL queries can be provided on request; one needs to copy-paste them into Google's BigQuery (see the getting-started guide above).

We aggregated each metric and sliced it according to various categories. We looked at metrics across devices (Desktop vs Mobile), JavaScript Frameworks, Compression, Use of Third-party Scripts, CMS and Hosting Platforms, CDN Usage and CDN Providers.

We also categorized metrics based on existing benchmarks or provisional standards. A metric is categorized as Fast, Average, or Slow. Exact numbers are useful, but users will most likely perceive a browsing experience simply as good, average, or poor.
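Such a categorization amounts to bucketing a continuous metric against two thresholds. The sketch below shows the idea for an FCP-like metric; the cutoffs (fast under 1 s, slow from 3 s) follow the style of Chrome-UX reporting around 2019 but are assumptions for illustration, not the study's exact thresholds.

```python
# Illustrative Fast / Average / Slow bucketing for a millisecond-valued metric.
# Threshold defaults are assumed, not taken from the study.
def categorize(value_ms, fast_below_ms=1000, slow_from_ms=3000):
    if value_ms < fast_below_ms:
        return "Fast"
    if value_ms < slow_from_ms:
        return "Average"
    return "Slow"

samples_ms = [640, 1800, 4200]
labels = [categorize(v) for v in samples_ms]
print(labels)  # → ['Fast', 'Average', 'Slow']
```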

Speed Metrics

To provide some context for the forthcoming sections, please find below a description of the various speed performance metrics.

Overview of Page Speed Metrics

  • Time-to-First-Byte (TTFB) is measured as the time from the start of navigation request until the time that the client receives the first byte of the response from the server. It includes network setup time (SSL, DNS, TCP) as well as server-side processing. This metric is useful as it ignores the variability of front end performance and focuses only on network setup and backend response time. Available in both HttpArchive (lab) and Chrome-UX (field) datasets.

  • StartRender / First Paint (FP) / First Contentful Paint (FCP) mark the points, immediately after navigation, when the browser renders pixels to the screen. FCP, FP, and StartRender are all slightly different, but in practice, for the vast majority of sites, they end up being the same (within measurement error). FCP and FP mark when Chrome thinks it painted content; StartRender is observed from the outside and marks when the viewport actually changed. StartRender is available in the HttpArchive (lab) dataset, while FCP is available in both the HttpArchive (lab) and Chrome-UX (field) datasets.

  • Visually Complete is a user experience metric that identifies the moment in time when users perceive that all the visual elements of a page have completely loaded. Only available in the HttpArchive dataset.

  • Speed Index is a performance metric that measures how quickly a page renders visual elements, from the user’s perspective. This is a rate-of-speed metric that is closely related to Visually Complete, which is a moment-in-time measure. Only available in the HttpArchive dataset.

  • onLoad marks the point when the processing of the page is complete and all the resources on that particular page, such as images, CSS, and other assets, have finished downloading. Onload Time is the metric usually reported by online speed-testing tools such as Pingdom. Only available in the HttpArchive dataset.

  • Fully Loaded takes the same approach to page speed as onLoad but adds a further two seconds after the onLoad trigger has fired to make sure there is no more network activity. The reasoning behind this is to ensure more consistency between tests. Only available in the HttpArchive dataset.

  • First Input Delay (FID) measures the time from when a user first interacts with your site (i.e. when they click a link, tap on a button, or use a custom, JavaScript-powered control) to the time when the browser is actually able to respond to that interaction. Only available in the Chrome-UX dataset.

References and further reading:

  • http://designingforperformance.com/basics-of-page-speed/
  • https://addyosmani.com/blog/usability/
  • https://www.youtube.com/watch?v=XvZ7-Uh0R4Q
  • https://developers.google.com/web/fundamentals/performance/user-centric-performance-metrics
  • https://www.slideshare.net/nicjansma/reliably-measuring-responsiveness

PART 1

As discussed above, in this part of the study we look at the various page characteristics and see how the different speed metrics vary across each. We also look at how well the selected page characteristics can explain the observed speed metric values.

We consider the following page characteristics:

  • Request counts
    • reqTotal
    • reqHtml
    • reqJS
    • reqCSS
    • reqImg
    • reqFont
    • reqFlash
    • reqOther
    • reqAudio
    • reqVideo
    • reqText
    • reqXml
    • reqWebp
    • reqSvg
  • Sizes // # of bytes TRANSFERRED (so may be bigger when uncompressed)
    • bytesTotal
    • bytesHtml
    • bytesJS // e.g. average total bytes of JS downloaded per page
    • bytesCSS
    • bytesImg
    • bytesFont
    • bytesFlash
    • bytesOther
    • bytesHtmlDoc // size of the main HTML document
    • bytesAudio
    • bytesVideo
    • bytesText
    • bytesXml
    • bytesWebp
    • bytesSvg
  • Script counts
    • num_scripts
    • num_scripts_sync
    • num_scripts_async
    • num_iframes
  • Other
    • numDomains // # of unique domains across all requests
    • maxage0 // # of responses with max-age=0
    • gzipTotal // # of bytes xferred for resources that COULD have been gzipped
    • gzipSavings // # of bytes that could have been saved if ALL gzippable resources were gzipped
    • cdn

We consider the following page speed metrics:

  • TTFB // time to first byte of HTML doc response
  • StartRender // when rendering started
  • Visual Complete
  • Speed Index // webpagetest.org Speed Index score
  • onLoad // window.onload
  • Fully Loaded // the page is fully done (according to webpagetest.org)

1. Which Page Characteristics Impact the TTFB Metric?

Figure 1.1