Back in April 2018, we attended BrightonSEO, the well-known UK SEO conference. It is a great place for networking, but also for high-level talks held by top speakers from the industry. OnCrawl attended for the second time and gave a talk about RankBrain and AI in SEO.
This article summarizes that talk and highlights strategic points to maximize your SEO efforts in a RankBrain environment. You can read the full article or skip to the part you're most interested in.
Before digging into the impact of RankBrain on SEO, it is essential to understand how search engines work.
The life of a user's query begins long before it is typed, with the exploration and indexing of the billions of documents that make up the web. Google's algorithms are computer programs designed to navigate through billions of pages, find the right clues and return exactly the answer to your question.
In doing so, Google consumes as much energy annually as the city of San Francisco. Part of this effort is what we call crawl budget: the resources Google allocates to crawling your website.
Understanding crawl behavior with log files analysis (OnCrawl data)
It is thus important to understand how Google sees your website and how often it crawls it.
Google says that:
“[…] we do not have a single term to describe everything this term seems to mean.
If you observe that new pages are usually explored the same day they are published, then you don’t really have to worry about crawl budget.
[…] if a site has fewer than a few thousand URLs, it will be crawled correctly most of the time.”
The thing is, we all know this is not accurate, and we need more precise data about how crawl budget is being spent on your website. Indeed, 100% of websites have crawl stats within their Search Console account.
While you can only access a sample of results using GSC, log files analysis uncovers your whole website’s crawl behavior. There, you can detect anomalies in the bot’s behavior, crawl spikes or any other issue.
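As a minimal sketch of what log file analysis looks like in practice (assuming a standard Apache/Nginx "combined" log format, and checking the user agent string only — a real audit should also verify Googlebot via reverse DNS, since user agents can be spoofed), counting Googlebot hits per URL could start like this:

```python
import re
from collections import Counter

# Matches a "combined" access log line:
# IP - - [timestamp] "METHOD /url HTTP/1.1" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"'
)

def googlebot_hits(lines):
    """Count Googlebot hits per URL from raw access log lines."""
    hits = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ip, ts, method, url, status, agent = m.groups()
        if "Googlebot" in agent:  # NB: spoofable; verify via reverse DNS in real audits
            hits[url] += 1
    return hits

sample = [
    '66.249.66.1 - - [10/Apr/2018:06:25:14 +0000] "GET /blog/rankbrain HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Apr/2018:06:25:15 +0000] "GET /blog/rankbrain HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]
print(googlebot_hits(sample))  # only the first line counts as a Googlebot hit
```

From counts like these, aggregated per day and per group of pages, you can spot crawl spikes and pages the bot never visits.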
Distribution of SEO visits – OnCrawl’s data
Is crawl budget related to rankings and visits? The more often the index is updated, the more likely Google is to recognize that a page fits as “the best answer to a query”.
To understand how your crawl budget is going to be spent, consider how Google itself describes the process:
“Google’s spiders regularly crawl the web to rebuild our index. Crawls are based on many factors such as PageRank, links to a page, and crawling constraints such as the number of parameters in a URL. Any number of factors can affect the crawl frequency of individual sites.
Our crawl process is algorithmic; computer programs determine which sites to crawl, how often and how many pages to fetch from each site. We don’t accept payment to crawl a site more frequently.” Google Webmaster Guidelines.
Prior to optimizing crawl frequency, you must know how it is currently distributed:
Crawl frequency regarding strategic groups of pages – OnCrawl’s data
Algorithms use variables to define how much crawl budget to spend on your website. You need to serve them the cleanest possible data to help trigger an increase in crawl budget.
Google uses internal metrics (content, entity detection, schema.org, payload, popularity, etc.) and external metrics (PageRank, Trust & Citation Flow, etc.) to decide whether or not to index a page.
Correlation between the number of links and crawl frequency – OnCrawl’s data
Web scaling principles rely on:
Once indexed, each page of your website is scored: Google uses many different computed metrics (PageRank, quality scores, website trust, etc.) and then aggregates many attributes to each page (meta data, title, schema.org, n-grams, payload, content interpretation…).
Web scaling principles also rely on:
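Google's exact weighting of these metrics is, of course, unknown. Purely as an illustration of the idea of aggregating per-page metrics into a single score (all names and weights below are invented), the mechanism could be sketched as:

```python
def page_score(metrics, weights):
    """Toy weighted aggregation of per-page metrics (weights are invented, not Google's)."""
    return sum(weights[k] * metrics.get(k, 0.0) for k in weights)

# Hypothetical metric names and weights, for illustration only
weights = {"pagerank": 0.5, "content_quality": 0.3, "trust": 0.2}
page = {"pagerank": 0.8, "content_quality": 0.6, "trust": 0.9}
print(round(page_score(page, weights), 2))  # 0.76
```

The takeaway is simply that each indexed page ends up with a composite score built from many signals, which is then used downstream for crawl prioritization and ranking.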
Crawling, indexing and ranking are now bound up with artificial intelligence. Algorithms, data aggregation, human/supervised validation, massive use of data models, and knowledge and interrelationship analysis all fall under AI. Machine learning can be part of the process by adding multiple iteration cycles.
Google RankBrain is a machine learning process that consists of re-ranking results. From interpreting a query to guessing what the user was looking for and adapting the results, RankBrain is an ongoing test-and-learn process.
To keep your rankings and match RankBrain’s expectations, you need to help Google by maximizing exploration sessions and pointing both the bot and users in the right direction:
Page importance also plays a huge role in the re-indexing process. A query-independent score (also called a document score) is computed for each URL by URL page rankers.
“[…] the page importance score is computed by considering not only the number of URLs that reference a given URL but also the page importance score of such referencing URLs”
The data about the page importance score is shared with URL managers, which pass a page importance score for each URL to robots and content processing servers.
One example of a page importance score is PageRank, which makes the crawl rate data-driven and therefore something SEO can influence.
Page importance patent US20110179178
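The recursive definition quoted from the patent — a URL's importance depends not only on how many URLs reference it, but on the importance of those referencing URLs — is exactly the shape of PageRank. A minimal power-iteration sketch (the damping factor and toy link graph are illustrative):

```python
def pagerank(links, damping=0.85, iters=50):
    """Minimal PageRank: `links` maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if not outs:  # dangling page: spread its rank evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:
                for q in outs:
                    new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

# Toy internal link graph
site = {"home": ["a", "b"], "a": ["b"], "b": ["home"]}
ranks = pagerank(site)
# "b" is referenced by both "home" and "a", so it ends up with the highest score
```

Each iteration redistributes every page's current score across its outgoing links; the scores converge to the importance values the patent describes.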
Page importance can however be optimized by playing with the right metrics:
Experimentations from the OnCrawl SEO team and insights shared by the community have shown that:
Impact of depth on crawl frequency – OnCrawl’s data
Google is sensitive to the volume of content;
Impact of word count on crawl frequency – OnCrawl’s data
Impact of number of links on average position
Lower bounce rate = more bot hits
Better CTR = better positions
Number of bots hits regarding level of depth – OnCrawl’s data
Google classifies requests into three types: transactional, informational and navigational.
Depending on the type of request a page targets, Google will crawl it more or less frequently.
Google does not understand content as such; it seeks to understand concepts by detecting named entities:
To illustrate this type of request, let’s use the following query: “converse men’s chuck taylor”.
Here the user wants to get to a website where there will be more interaction.
Pages targeting “converse men’s chuck taylor” refer to cold content and a transactional request, and receive an average crawl frequency. It’s important to plan internal linking in terms of brand/product entity relationships.
This type of query is made when the user is looking for a specific piece of information. For example: “Clinton”.
Results show pages about “Clinton” or “Trump”, as this is a hot informational subject receiving a high crawl frequency. Internal linking needs to be powerful and based on named entities.
Navigational requests are made by users looking to reach a particular website.
In this example, pages targeting “BrightonSEO speakers” are displayed because they are known to be part of the BrightonSEO website. This website receives a low crawl frequency and needs to rely on good semantic SEO optimizations, trust and citations.
Furthermore, RankBrain is based on natural language processing. This means the algorithm is able to represent strings of text in a very high-dimensional space and see how they relate to one another. Google then maintains a knowledge base of named entities and understands the relationships between them:
In a search engine algorithm, a tool is needed to calculate a “similarity” score between two documents.
This score is strategic for creating a relevant ranking, but it is used in combination with a very large number of other signals, some major (like the popularity of the page), some minor (like the presence of a keyword in the page URL).
Each entity or concept is vectorized for the machines to understand:
And then Google can evaluate the distance between two concepts. For each entity, Google knows:
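The standard way to measure the distance between two such vectors is cosine similarity. A self-contained sketch (the three-dimensional "embeddings" below are invented for the example; real ones have hundreds of dimensions):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors: 1.0 = same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy vectors: dimensions might loosely encode "footwear", "politics", "events"
converse = [0.9, 0.1, 0.3]
sneaker  = [0.8, 0.2, 0.4]
election = [0.1, 0.9, 0.2]

# "converse" is far closer to "sneaker" than to "election"
print(cosine_similarity(converse, sneaker) > cosine_similarity(converse, election))  # True
```

Two documents (or entities) whose vectors point in similar directions get a high similarity score, which is then combined with the other signals mentioned above.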
To use a more concrete example, let’s search for “how old is the wife of Bill Gates?”
To summarize, Google will tend to crawl pages by “package”:
The more content it encounters containing the expected entities (related to the theme), the deeper it will crawl. The type of entity, and its rarity or popularity, will directly impact the crawl frequency.
Internal linking must then be designed around the relationships between the entities present in your content. You can check your named entities using the OnCrawl Toolbox or directly within your crawl reports.
Editorial insights tab about named entities within OnCrawl’s crawl report
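Production tools rely on trained NLP models for entity detection. Purely to illustrate the idea of spotting brand/product entities in page content, here is a self-contained dictionary-based tagger (the entity list is invented for the example):

```python
# Toy gazetteer: in practice this comes from a trained NER model or a knowledge base
ENTITIES = {
    "converse": "BRAND",
    "chuck taylor": "PRODUCT",
    "brightonseo": "EVENT",
}

def find_entities(text):
    """Return (entity, label) pairs found in the text, longest names first."""
    found = []
    lowered = text.lower()
    for name in sorted(ENTITIES, key=len, reverse=True):
        if name in lowered:
            found.append((name, ENTITIES[name]))
    return found

print(find_entities("Converse men's Chuck Taylor sneakers"))
# [('chuck taylor', 'PRODUCT'), ('converse', 'BRAND')]
```

Once you know which entities each page contains, you can compare them against the entities your internal links and anchors actually emphasize.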
First, map your website by:
Inrank flow: Shows how internal linking popularity is spread – OnCrawl’s data
To do so, the first step would be to crawl your website in order to:
Then, use named entities in your link anchors, create packages of linked pages according to entity typology. Here is the example of a media site:
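The grouping step can be sketched very simply: given a crawl output that maps each URL to its dominant entity type (the URLs and labels below are hypothetical, in the spirit of the media-site example), build linking "packages" of pages sharing the same typology:

```python
from collections import defaultdict

# Hypothetical crawl output: URL -> dominant entity type detected on the page
pages = {
    "/news/election-results": "PERSON",
    "/news/debate-recap": "PERSON",
    "/shop/chuck-taylor": "PRODUCT",
    "/shop/all-star-low": "PRODUCT",
    "/speakers/list": "EVENT",
}

def link_packages(pages):
    """Group URLs into linking 'packages' by shared entity typology."""
    packages = defaultdict(list)
    for url, entity_type in pages.items():
        packages[entity_type].append(url)
    return dict(packages)

for entity_type, urls in link_packages(pages).items():
    print(entity_type, "->", urls)  # pages in the same package should link to one another
```

Pages inside each package are then cross-linked with entity-based anchors, concentrating crawl budget on thematically coherent clusters.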
Crawling, indexing, ranking and re-ranking are all based on artificial intelligence and machine learning principles.
These systems are not so intelligent, though: they need us to validate their models. Never forget that they are only algorithms; you have to know and work with the metrics they take into account in order to influence them!
The key is to make users want to come back, improving your CTR and bounce rate. In other words, focus on titles, meta descriptions, content, speed and UX/UI!