OnCrawl Labs: Anomaly Detection in SEO

November 16, 2020 - 4 min reading time - by Rebecca Berbel

“Anomaly detection” is a fancy way of saying “finding things that aren’t normal.” It falls under the umbrella of SEO monitoring, which is probably already part of your SEO strategy and is covered by any number of features in classic tools, from Google Analytics to OnCrawl’s SEO crawler. There is an endless list of SEO use cases for anomaly detection, whether we’re talking about website performance, indexing statistics, or search behavior.

The principle is to use the pattern-sensing abilities of an algorithm to filter out noise in order to find elements that don’t fall into the predicted pattern.
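To make the idea concrete, here is a minimal sketch of that principle in Python. The figures are invented, and scikit-learn’s IsolationForest is just one common choice for this kind of task, not necessarily the algorithm used in OnCrawl Labs:

```python
# A minimal sketch of the principle: learn what "normal" looks like from
# historical values of a metric, then flag new values that don't fit the
# pattern. The numbers are invented, and IsolationForest is just one
# common model choice, not necessarily the one used in OnCrawl Labs.
import numpy as np
from sklearn.ensemble import IsolationForest

# Daily counts of pages returning a 404 error, taken from past crawls (made-up data)
history = np.array([12, 15, 11, 14, 13, 16, 12, 15, 14, 13]).reshape(-1, 1)

model = IsolationForest(contamination=0.1, random_state=42)
model.fit(history)

new_value = np.array([[48]])  # the count from today's crawl
is_anomaly = model.predict(new_value)[0] == -1
print("Anomaly detected!" if is_anomaly else "Within the normal range.")
```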

What types of SEO metrics can show anomalies?

Any SEO metric that changes over time can show small variations that don’t make much of a difference, or big jumps that can have an impact on your SEO objectives: your website’s visibility, your organic traffic and, ultimately, your digital marketing conversions.

Depending on what you’re analyzing, anomaly detection can alert you to:

  • A spike in traffic
  • A series of new pages that aren’t indexed as fast as they usually are
  • A surge in Googlebot crawl activity, which often comes before an update that might affect the site
  • An increase in HTTP status errors, whether 404s or 5xx server errors
  • Slower response times than usual
  • A significant change in the number or type of search terms that your pages rank for

In many cases, detecting when this sort of change falls outside the range of normal variance is extremely important. It allows you to react in a timely manner, either to correct problems before they get too big, or to seize opportunities before they’ve passed.

How can you use OnCrawl data to detect anomalies?

If you run regular crawls, you create a history of many different metrics for your site. By learning from this historical data, machine learning algorithms can then examine new crawls to determine whether the metrics you want to track are within their normal range or whether they are unusual.

To give an extremely simplified example, if your payload data shows that your breaking news articles generally load in between 0.12 and 0.18 seconds, but your latest crawl shows them loading in 0.25 seconds, that’s clearly something you’ll want to look into!
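A rough illustration of that kind of check might look like the following sketch; the figures are placeholders and the three-standard-deviation rule is a deliberately simple stand-in for the approach used in the Labs project:

```python
# Compare the latest crawl's average load time for a page group against
# the range seen in previous crawls. The figures are placeholders and the
# three-standard-deviation rule is a deliberately simple stand-in.
from statistics import mean, stdev

past_load_times = [0.14, 0.12, 0.18, 0.15, 0.16, 0.13]  # seconds, from earlier crawls
latest_load_time = 0.25  # seconds, from the latest crawl

threshold = mean(past_load_times) + 3 * stdev(past_load_times)
if latest_load_time > threshold:
    print(f"Average load time {latest_load_time}s is outside the normal range (over {threshold:.2f}s).")
```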

What can you do with the results of anomaly detection?

Finding unusual behavior or unusual performances on your website can be good or bad news, but in either case, it’s news you want to know about!

This means that there can be huge benefits in running regular crawls, and then checking for anomalies each time a crawl has run. If you’re not set up to automate repeated tasks, you can still do this manually.

Once you are running regular analyses to look for anomalies, one of the best things to do with the results is to set up an alerting system. This can be as simple as printing a conclusion on the screen, or as complex as implementing certain fixes automatically. A good place to start might be somewhere in between, like sending an alert via Slack or email to yourself or to someone else in charge of SEO.
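For example, the Slack option takes only a few lines, assuming you have created an incoming webhook for your workspace; the webhook URL and the message text below are placeholders:

```python
# Send an anomaly alert to a Slack channel through an incoming webhook.
# The webhook URL and the message text are placeholders to replace with
# your own values.
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # your webhook here

def send_alert(message: str) -> None:
    """Post a short alert message to Slack."""
    requests.post(SLACK_WEBHOOK_URL, json={"text": message}, timeout=10)

# Example: call this when a check on the latest crawl flags something unusual.
send_alert("Anomaly detected in the latest crawl: 404 errors are well above their usual level.")
```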

What you need to get anomaly detection up and running

As you’ve probably realized, the OnCrawl Labs project relies on your crawl data, which it accesses through the API. You’ll therefore need API access and a history of crawls of your site, and those crawls will need to meet the following criteria:

  • The crawls must all use the same crawl profile, so we can be sure we’re comparing apples to apples.
  • The crawls must all contain the metrics you want to analyze. For example, if you intend to detect spikes in Googlebot activity, you’ll need to have log monitoring enabled for all of the crawls. If you want to monitor drops in average SERP position, you need to have connected Google Search Console for all of your crawls.
  • There must be at least five unarchived crawls with the chosen profile available (a quick way to check this is sketched just after this list). Clearly, the more history you can provide, the more accurate the results will be, but five is a good place to start.
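A quick pre-flight check of these criteria could look like the sketch below. The endpoint path, field names and token are illustrative placeholders rather than the documented OnCrawl API contract; refer to the API documentation for the exact calls:

```python
# Sanity-check the crawl history before launching the anomaly detection
# analysis. The endpoint path, field names and token below are
# illustrative placeholders: refer to the OnCrawl API documentation for
# the actual calls and response format.
import requests

API_TOKEN = "your-oncrawl-api-token"   # placeholder
PROJECT_ID = "your-project-id"         # placeholder
CRAWL_PROFILE = "weekly-monitoring"    # placeholder crawl profile name

response = requests.get(
    f"https://app.oncrawl.com/api/v2/projects/{PROJECT_ID}/crawls",  # illustrative URL
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=30,
)
crawls = response.json().get("crawls", [])

usable = [
    crawl for crawl in crawls
    if crawl.get("crawl_config", {}).get("name") == CRAWL_PROFILE
    and not crawl.get("archived", False)
]

if len(usable) < 5:
    print(f"Only {len(usable)} unarchived crawls use this profile; at least 5 are needed.")
else:
    print(f"{len(usable)} crawls available: ready to run anomaly detection.")
```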

The calculations can require a bit of computing power, so you’ll want to run them on a computer with a good GPU (graphics card).

You’ll also need a data scientist or a software engineer to run the operations following the steps and the code provided in the OnCrawl Labs project.

Why use OnCrawl Labs for anomaly detection?

We know the online life of your site is not a perfectly stable and unchanging thing.

For one thing, your ecommerce store has a slow August every year, your news site peaks with every major election in influential countries, or your e-learning site is busy at back-to-school time and before end-of-the-year tests.

For another, your site changes over time: your SEO improves, your brand becomes more well-known, your budget for web hosting increases. Normal traffic five years ago hopefully isn’t anywhere near what normal traffic looks like today.

This is why the anomaly detection project we’ve provided in OnCrawl Labs takes all of that into account. Using machine learning rather than simple statistical analysis allows for growth (or other changes) over time, and we’ve chosen an algorithm that does just that.

Additionally, because you can access pretty much any metric through our API, you can easily change the metrics we track in the code we’ve provided in order to suit your site or your particular SEO project.
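In practice, this can be as simple as editing a configuration list at the top of the code; the metric names below are illustrative placeholders rather than the exact fields used in the Labs project:

```python
# Metrics to monitor for anomalies. Edit this list to match your site and
# your crawl profile; the names below are illustrative placeholders, not
# necessarily the exact fields used in the OnCrawl Labs notebook.
TRACKED_METRICS = [
    "avg_load_time",       # page performance
    "status_404_count",    # HTTP errors
    "googlebot_hits",      # requires log monitoring
    "avg_serp_position",   # requires the Google Search Console connection
]
```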


Rebecca is the Product Marketing Manager at Oncrawl. Fascinated by NLP and machine models of language in particular, and by systems and how they work in general, Rebecca is never at a loss for technical SEO subjects to get excited about. She believes in evangelizing tech and using data to understand website performance on search engines. She regularly writes articles for the Oncrawl blog.