Crawl Budget Tracking Before and After an Update

December 10, 2019 - 7 min reading time - by Steve

Psst… There is a secret I want to tell you.

Your site has a “crawl budget” set by Google.

This is the secret metric used by Google to measure two things:

  1. How well your site is built
  2. How popular your site is

This article will focus on the first point.

Improving the build quality of your site will increase your crawl budget.

The bigger your crawl budget the more frequently Google will stop by and read your pages.

Let’s start by sharing an understanding of what a crawl budget is.

What is Crawl Budget?

Google uses a special software program called a web crawler (or spider) to read pages on your site.

They call this web crawler Googlebot.

Crawl budget is the term to describe how often Googlebot will crawl your pages.

By optimizing your site you can increase your crawl budget.

Google has said that your crawl budget is a combination of:

  • Crawl rate – The speed at which Googlebot can crawl your site without breaking your servers
  • Crawl demand – How important your webpage is to Google’s users

As these metrics improve, you will see Googlebot visiting more often and reading more pages on each visit.

Once Google crawls a page, it adds the content to the Google Index, which then updates the information shown in Google Search results.

By optimizing for crawl budget you can improve the speed of updates from your site to Google Search.

Why you should improve your Crawl Budget

Google has a tough task. They need to crawl and index every page on the internet.

The computing power this requires is huge, and Google cannot crawl and index every page.

Optimizing your crawl budget will give your site the best chance of appearing in search.

How to improve your Crawl Budget

Improving a site is about making Googlebot’s time on a site as efficient as possible.

We don’t want:

  • Googlebot reading pages that we don’t want in Google Search
  • Googlebot seeing server errors
  • Googlebot following broken / dead links
  • Googlebot waiting for the page to load
  • Googlebot reading duplicate content

All of the above wastes Googlebot’s limited resources and could cause your crawl rate to drop.

Crawl Budget and Technical SEO

A lot of what you need to do as part of technical SEO is the same as optimizing the Crawl Budget.

We need to:

  • Optimize robots.txt and check for errors
  • Fix any hreflang and canonical link tags
  • Resolve non-200 pages
  • Fix redirects and any redirect loops
  • Make sure any sitemaps are free from error
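As a sketch of that last point, a few lines of standard-library Python can extract the URLs from a sitemap so you can then check each one for errors. The sitemap content below is a hypothetical example:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    """Extract all <loc> URLs from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]

# Hypothetical sitemap snippet for illustration:
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

print(sitemap_urls(sample))  # ['https://example.com/', 'https://example.com/about']
```

From here you could request each URL and flag anything that does not return a 200 status.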

Let’s take a look next at creating the perfect page for Googlebot.

How to Create the Perfect Page

OK, so maybe not the perfect page, but we should try to improve each page as much as we can.

Let’s look at some common on-page issues that you can improve.

Page Issues

  • Duplicate Content – Mark any duplicate content on your site with a canonical link tag.
  • Non-SSL pages – Find any HTTP links and convert them to HTTPS. If you don’t have an SSL certificate, get a free one from Let’s Encrypt.
  • Only crawl useful pages – Use your robots.txt file to limit where Googlebot can go. For example, if you have pages used for an admin area, Disallow them in your robots.txt file.
  • Thin Content – Consider blocking the crawling of pages with thin or very low-value content. If a page has little value to a user, don’t waste Googlebot’s time on it.
  • Server Errors – Server errors are a sign of an unhealthy web server. If your site is returning 5xx errors, fixing these can increase the crawl rate.
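As an illustration of the robots.txt point, a minimal file that keeps crawlers out of a hypothetical admin area (the paths here are placeholders, not a recommendation for every site) might look like:

```
User-agent: *
Disallow: /admin/
Disallow: /search/

Sitemap: https://example.com/sitemap.xml
```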

Slow Loading Pages

One way to improve the crawl budget is to make the page fast.

Fast pages let Googlebot crawl more in the same amount of time, and they are a sign to Google that the web server is “healthy”.

Google has already said that page speed increases the crawl rate:

“Making a site faster improves the users’ experience while also increasing the crawl rate.”

  • Page Weight – This metric is the overall size of your page. This includes all the JavaScript, CSS, and images on the page. This should be under 1 MB in total.
  • Optimized Images – Images should be as small as possible in KB without losing quality. Using a tool like Squoosh can help with this.
  • Minified CSS and JS – Minify your JS and CSS files. This is the process of removing all unnecessary characters from the file. Use CSSNano and UglifyJS to minify the files.
  • Compression and Caching – Make sure GZip or Brotli compression is enabled on the server. This will speed up the time it takes to transfer a file. Add caching so that each file is only downloaded once.
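To see why compression matters, a small standard-library Python experiment (the HTML string is a made-up stand-in for a real page) shows how much a repetitive text payload shrinks under GZip:

```python
import gzip

# A repetitive HTML snippet standing in for a real page (hypothetical content).
html = ("<div class='product'><h2>Item</h2><p>Description</p></div>" * 200).encode("utf-8")

compressed = gzip.compress(html)

# HTML, CSS, and JS are highly repetitive, so they compress well.
print(len(html), "bytes raw,", len(compressed), "bytes gzipped")
```

The same principle applies on the server: the smaller the response, the less of Googlebot’s time each request consumes.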

For a list of page speed improvements have a look at this in-depth website performance review. It has 30 steps to improve website performance.

Methods to Measure Your Changes

As a smart SEO, you know that before starting any optimization you need a way to track your changes.

You need to pick a data point with two properties:

  1. You must be able to track the data point over time.
  2. You must be able to influence that data with your actions.

So what is the data point we should track for crawl budget?

We said earlier that Google uses two factors when deciding on a crawl budget:

  • Crawl rate – The speed at which Googlebot can crawl your site without breaking your servers
  • Crawl demand – How important your webpage is to Google’s users

Since we are technical SEOs, our job is to improve the crawl rate.

So this is the data point that we should track.

Tracking Crawl Rate

So how do we track the Googlebot crawl rate?

You need to use your web server access logs.

The logs store every request made to your webserver. Every time a user or Googlebot visits your site a log entry gets added to the access log file.

Here is what an entry would look like for Googlebot:

127.0.0.1 - - [11/Nov/2019:08:29:01 +0100] "GET /example HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

There are three important data points in each log entry. The date:

[11/Nov/2019:08:29:01 +0100]

The URL:

"GET /example HTTP/1.1"

And the user-agent, which tells us that it’s Googlebot making the request:

"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

The above log is from an Nginx web server. But all common web servers, such as Apache or IIS, produce a similar access log entry.
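A short Python sketch (standard library only) can pull those three data points out of a combined-format log line like the example above. Note that the user-agent string can be spoofed, so treat a "Googlebot" match as a claim rather than proof:

```python
import re

# Combined log format pattern (matches the Nginx example above).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

line = ('127.0.0.1 - - [11/Nov/2019:08:29:01 +0100] "GET /example HTTP/1.1" '
        '200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
        '+http://www.google.com/bot.html)"')

m = LOG_RE.match(line)
print(m.group("date"))                   # 11/Nov/2019:08:29:01 +0100
print(m.group("path"))                   # /example
print("Googlebot" in m.group("agent"))   # True
```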

Depending on your setup, you may have a Content Delivery Network (CDN). A CDN such as Cloudflare or Fastly will also create access logs.

Analyzing an access log manually is possible, although it is not the most fun.

You could download the access.log file and analyze it in Excel. However, I would recommend that you use a log analyzer such as the one from OnCrawl.

This will allow you to see the Googlebot crawl rate on a graph and in real-time. Once you have this monitoring set up to track the crawl rate, you can start to improve it.
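If you want a quick do-it-yourself version of that graph, a few lines of Python can count requests per day where the user-agent claims to be Googlebot. The sample lines below are made up for illustration:

```python
from collections import Counter

def googlebot_hits_per_day(lines):
    """Count log entries per day whose user-agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue
        # The date sits inside [...]; the day is the part before the first colon.
        day = line.split("[", 1)[1].split(":", 1)[0]
        counts[day] += 1
    return counts

# Hypothetical access log lines for illustration:
sample = [
    '127.0.0.1 - - [11/Nov/2019:08:29:01 +0100] "GET /a HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '127.0.0.1 - - [11/Nov/2019:09:12:44 +0100] "GET /b HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '127.0.0.1 - - [12/Nov/2019:10:01:02 +0100] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

print(googlebot_hits_per_day(sample))  # Counter({'11/Nov/2019': 2})
```

Plot those daily counts over time and you have a crawl rate trend you can compare before and after each change.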

Making Changes

Now that we know what we are tracking, we can look at making some improvements. But don’t make many changes at the same time. Be methodical and make changes one by one.

Build, Measure, Learn.

Using this technique, you can adapt the changes you are making as you learn, concentrating on the tasks that improve the crawl rate.

If you rush and change too much at once, it can be difficult to interpret the results, making it hard to tell what has and has not worked.

Over time as the page improves you will see an increase in the crawl budget as the crawl rate goes up.

Wrapping Up, Crawl Budget Tracking Before and After an Update

We have covered exactly what a Crawl Budget is.

As a Technical SEO, you have the power to increase the crawl rate of the site.

By improving your site’s technical health, you can make Googlebot’s time on your site efficient.

Track the crawl rate using your logs for accurate results.

Use Build, Measure, Learn as a technique to make one change at a time and improve as you go.

Over time your crawl rate will increase. Your pages will appear quicker in Google Search Results. And users will have a great experience on your site.

Stephen is the co-founder of PageDart a Technical SEO site dedicated to making the web stable, simple, speedy, and secure.