NLP (Natural Language Processing). Is it really starting to play a role in SEO, or is it just a geeky term used amongst over-thinking technical SEO bods?
“When people realise NLP stands for Natural Language Processing instead of some 1970s hypno-mumbo-jumbo, they’ll realise that not only is it here to stay, it’s the very bedrock of the mantra “organising the world’s information”. Information is not just about websites, so Google needs its own repository to store ideas, topics and “things”. NLP (and NLU and BERTs and BERTIEs and Word2Vec, CBOW and even nGrams and countless other mathematical approaches) are attempts at turning human ideas into machine understandable structure. Google bought Freebase for a massive sum and now it seems the company is being run by prophets of this approach to understanding human endeavour. It’s going places… but it isn’t going away!”
Dixon Jones, CEO Inlinks.net
From time to time, a change in an algorithm happens, and if you are lucky enough to be one of the first to see it and adapt your SEO, you reap massive rewards while everyone else is playing catch-up.
So, savvy SEOs are always checking down the road to see where the next significant change is coming from.
In that case, is NLP one of those game-changers? It looks like it might just be.
Our recent case study, in which we partnered with several well-respected SEOs in the industry, showed significant results when NLP optimization was implemented.
But let’s start way back at the beginning. Let me walk you through:
- What NLP is
- How it is connected to Google’s algorithm
- Why it’s important
- Practical ways you can implement NLP in your SEO
Rewinding to the end of 2019, Google announced the official BERT algorithm release. The very fact that they were announcing it meant that it was something substantial, and Google confirmed this when they stated the full rollout affected 10% of all search queries.
The SEO community did what it always does after an update. They collected data to run tests, put their collective brains together, and decided that BERT was focused on quality content, context, and natural language processing (NLP).
Why BERT and NLP go hand-in-hand
“By trying to understand the ‘context’ of search queries, and by mining the relationship between stop-words with other words in the query, Google’s BERT algorithm pushes the limits of how a traditional search engine understands a user’s needs. With Google weighing in on NLP to gain a deeper understanding of user’s queries, it implies that content-makers who get more specific, relevant and descriptive with their content and information (including links) in their pages tend to rank higher.”
Jaya Kumar, Data Scientist, Deep Learning and NLP Specialist
The best way to understand NLP from an SEO perspective is to first understand BERT.
BERT stands for Bidirectional Encoder Representations from Transformers.
BERT contains two major components: pre-trained models and a methodology (a defined way to learn and use those models).
When we talk about models, we mean what the system has learned from large sets of text. So BERT is first trained on large collections of text and then learns how to apply that knowledge to analyze content.
NLP is BERT’s brain. It’s able to understand a word or phrase in its context by looking at various signals around it: from the words before it to the words following it, and from the subsection of the page to the entire page.
If you can analyze the content on pages that Google ranks highly, and look at the content before and after phrases and words, then optimize your page to offer something similar, you would be able to provide Google with something very similar to the top-ranking pages.
That’s what NLP represents and, with the BERT algorithm update, Google is using it.
Google is no longer looking at words or phrases individually, as we assumed in the past when we ran traditional keyword research. Instead, it now looks at sentences, paragraphs, and the query as a whole. It also looks at the sentiment.
“It’s important to remember that NLP has been around for decades. It’s not “new tech” that Google all of a sudden adopted. They’ve always used it, in one form or another. Now, it’s BERT but it’s subject to change as developments in this area progress.
NLP is very useful to compare topics that you covered in your content, to your competitor’s content. I mainly use it as a way to see if I may have missed anything, or if I should add supporting content around the main one.
At the end of the day, don’t get tied up in tools. When your content is done, take a few steps back and evaluate whether your content makes sense. Search engines will never buy anything from you, users will. Give the users what they want!”
Steven van Vessum, VP of Community at ContentKing
Looking at NLP from Google’s perspective
When trying to predict and evolve with Google, we always need to be looking at the evolution of the algorithm from Google’s perspective. What are they looking to get from the update and the introduction of NLP?
The answer is search quality.
For Google, it’s the user’s experience that keeps the lights on.
Users now have a whole generation of internet use behind them. They are smarter at searching and more specific about what they want to see. They are also more impatient, so Google has to keep adjusting and improving; otherwise another search engine will, and Google’s monopoly dies.
According to Google’s blog, 15% of the search queries it sees every day are ones it has never seen before. People are using more and more long-tail searches to find an answer to their question, especially with the rise of voice search.
It means that, sometimes, the algorithm doesn’t have enough historical data to anticipate the intent behind the search term, so it’s going to struggle to understand what the user is looking for.
The key is to understand the language better. NLP is Google’s way of doing that.
Let me quote the statement from Pandu Nayak’s article:
“With the latest advancements from our research team in the science of language understanding – made possible by machine learning – we’re making a significant improvement to how we understand queries, representing the biggest leap forward in the past five years, and one of the most significant leaps forward in the history of search.”
Content is more than king; it’s the kingdom
“Google BERT is one of the most significant and enormous leaps when it comes to the overall development of Google Search within the years. The real power of Google BERT comes from the Transformer, an attention mechanism that recognizes the contextual relation between words in a text. ELMo and ULMFiT are the other 2 components of BERT – the first one resolves the problem with the polysemy in NLP and the second one significantly improves the transfer learning process.
With BERT we have attention to the next and previous word, attention between identical/related keywords, identical/related words in other sentences, attention to other words predictive of word, attention to delimiter tokens. This makes learning and examining of the words containing in a query much more sophisticated and accurate than before”
Dido Grigorov, Head of SEO, Serpact
Many sentences contain “stop words” or words that have multiple meanings, like to, in, get, go, etc. These words serve so many purposes that it is difficult for Google to understand the context, even with machine learning advancing as quickly as it is.
That’s where sentiment comes in: it is another tool Google has developed to understand content.
By sentiment, we mean the undertone or feeling represented in the content. It can be positive, negative, or neutral, and it is measured on a scale.
In layman’s terms, positive sentiment means the use of positive words, like excellent, affordable, and alleviate, or anything else that has a positive meaning or outcome in its context. For example:
“The medicine is awesome, it really works, it alleviates pain, and it’s affordable too.”
Content like this is given a sentiment score between 0.25 and 1.0, whereas, on the flip side, negative sentiment is given a score between -0.25 and -1.0.
That leaves us with neutral sentiment, which is when the sentiment score falls between those two ranges, so between -0.25 and +0.25.
We also know that Google looks at sentiment at both the page and subsection levels.
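To make the ranges concrete, here is a minimal sketch that buckets a score using the thresholds described above. The function name and the sample scores are made up for illustration, and treating exactly ±0.25 as positive or negative (rather than neutral) is an assumption, since the ranges above don't say which side the boundary falls on.

```python
def classify_sentiment(score: float) -> str:
    """Bucket a sentiment score in [-1.0, 1.0] into positive, negative, or neutral."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score must be between -1.0 and 1.0, got {score}")
    if score >= 0.25:   # assumption: the boundary value counts as positive
        return "positive"
    if score <= -0.25:  # assumption: the boundary value counts as negative
        return "negative"
    return "neutral"


# Hypothetical scores for three subsections of a page.
section_scores = {"intro": 0.6, "side effects": -0.4, "dosage": 0.1}
labels = {name: classify_sentiment(score) for name, score in section_scores.items()}
print(labels)
```

Scoring subsections separately, as in the sample data, mirrors the idea that sentiment is evaluated per section as well as per page.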
Why is sentiment important for SEOs?
Simply, if all the results on page 1 are offering positive sentiment and your page is mostly classified as having negative sentiment, there is a strong chance Google will not consider your page relevant to what the user is looking for.
You need to know what an “entity” is…
If you’re going to look into NLP further and start to work with it (which we suggest you do), you’ll come across the term “entity,” and it’s vital when understanding NLP and how it works.
An entity is a word or phrase that represents an object which can be identified, classified, and categorized.
Examples of objects are:
- consumer goods
NLP’s job is to select and evaluate these entities from your content.
Since Google distinguishes these entities, the search engine is capable of utilizing this information to satisfy the user and provide better search results.
With NLP, two additional metrics are essential: salience and category.
Category – there is not much to explain. As SEOs, we’re used to categories being important.
Salience – in NLP represents the entity’s importance in the text.
For example, the word “morning” may be more important than “evening” when we talk about breakfast. So Google would score salience higher for morning than for evening in this context.
Each entity is given a salience score that ranges from 0.0 to 1.0. The higher the salience value, the more important and relevant the entity is for the subject of the page.
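As a rough illustration of how salience orders entities, the sketch below sorts a handful of entities by score. The `Entity` type, the category labels, and every score are invented for the example; they are not output from Google.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    name: str
    category: str    # illustrative label only
    salience: float  # 0.0 (unimportant) to 1.0 (central to the page)


def rank_by_salience(entities: List[Entity], top_n: int = 3) -> List[Entity]:
    """Return the top_n entities, most salient first."""
    for entity in entities:
        if not 0.0 <= entity.salience <= 1.0:
            raise ValueError(f"salience out of range for {entity.name!r}")
    return sorted(entities, key=lambda e: e.salience, reverse=True)[:top_n]


# Hypothetical entities for a page about breakfast.
entities = [
    Entity("breakfast", "other", 0.52),
    Entity("morning", "event", 0.31),
    Entity("coffee", "consumer good", 0.12),
    Entity("evening", "event", 0.05),
]

for entity in rank_by_salience(entities):
    print(f"{entity.name}: {entity.salience:.2f}")
```

In this made-up data, "morning" outranks "evening", matching the breakfast example above.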
Google is putting words in context and ranking entities in order of importance to the context of the page.
Understanding NLP: putting it all together
We’ve covered the foundations of NLP, and, in particular, its relationship with BERT. Let’s quickly recap before moving on to talk about how we can incorporate NLP into some of our SEO strategies and processes.
NLP is essentially the process that Google has incorporated to better understand the main keywords or phrases on a page by looking at the content surrounding them.
That can be a word directly before and after the “entity” that is being analyzed, the context of the subsection, or the entire page. Then we can look at it from a sentiment perspective. What kind of emotions does a piece of content have, compared to others that are ranking?
Finally, there are category and salience. How can you categorize this piece of content? Google rates entities in relation to the surrounding content to determine importance. Some words are more important than others in a specific context.
That’s the theory of NLP in relation to Google and SEO moving forward. But what does it mean in practice?
How you can implement NLP into your SEO Strategy (Especially after the BERT update)
“NLP, natural language processing, is a term that is often used to mean “entities.” Entities are “things” such as a person, place, or object. I have seen entities effective in the field. This year, entities have consistently ranked as one of the top 15 factors in IMG’s top 100 ranking factors. The tricky part has been isolating entities in a testing environment. I haven’t had much luck there. One note though, entities are usually nouns. When you look at lists of contextual terms, such as LSI lists or lists generated from tools that use TF-IDF, you’ll notice that there is a lot of overlap between the lists. With the overlap, it will be difficult for people to continue to say that Google doesn’t use LSI if they also make the claim that Google uses entities.”
Kyle Roof, Co-founder, Internet Marketing Gold
There’s a common phrase in British English: “there is more than one way to skin a cat.” Why English people were running around skinning cats, I have no idea, but the sentiment is perfect for this section.
There are a variety of strategies that are effective in SEO. No two are ever the same, so it’s very difficult to offer advice on implementing NLP into your SEO without sounding like you think you know it all.
I don’t think that. But it would be good to use practical examples of how some of the SEO community have taken NLP on board and how you can adapt it to your own theories and strategies of SEO as you see fit.
Google’s Natural Language API demo
“It wouldn’t surprise me a bit to learn that Sergey and Larry were already talking about some sort of NLP capability when BackRub was first conceived. Sure, it was originally planned as just a way to quantify and qualify links to a given URL. But once they overcame their initial resistance to the notion of an advertising-supported search engine, I’d be willing to bet they recognized how much greater a challenge – not to mention, opportunity – it was to get under the covers and match a query to a page’s content. Natural language processing already existed – and if anyone has contributed more to its further development than Google, they’ve maintained a very low profile.
NLP can be applied to both query analysis and content analysis, but the former is where it really pays off. Determining what a searcher’s intents and needs are is, after all, the most important factor for any company professing to seek to deliver the best possible results for each search. Doing that in milliseconds and at scale is the trick, and NLP seems to be at the heart of many of Google’s recent updates. NLP and word vectors go hand-in-hand, and it’s obvious the search engine has gotten much better at understanding both queries and page content over the last few years.”
Doc Sheldon, Founder, Intrinsic Value SEO
Successful SEO nearly always starts with data. In this case, we’re fortunate that Google has a natural language API demo that you can paste any text into and examine for free, giving you a lot of the data you need to work with NLP.
You can see how Google has analyzed your page’s text and compare it to the pages that are dominating the SERP.
Keyword Research goes deeper
The BERT update has given us a “before and after” situation we can use to make comparisons. Looking at keywords that took a hit after the BERT update, and those that stayed strong or improved, can help you better understand what Google is looking for.
If you had an affiliate review page which took a hit, it might be that the new page 1 results for this search are mostly made up of eCommerce stores or large authority sites like eBay or Amazon.
In this example, Google seems to be looking at this query and deciding that searchers are looking for prices and want to buy the product. That’s why your review page took a hit; it serves a different intent.
Look at which other keywords those top pages are ranking for too. Don’t just settle for looking at traffic and thinking about what you’d like to rank for; look at the combinations of keywords that Google deems a fit for this specific query type.
The top-ranking pages are also ranking for X, Y, and Z keywords. If Google likes that combination of keywords, have your own content target that combination of keywords too.
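One simple way to find that combination is to count how many of the top pages rank for each keyword and keep the ones they share. The sketch below does that with plain sets; the domains, keywords, and the `min_pages` cut-off are all hypothetical, and in practice the keyword lists would come from whichever rank-tracking tool you use.

```python
from collections import Counter
from typing import Dict, Set


def shared_keywords(pages: Dict[str, Set[str]], min_pages: int = 2) -> Set[str]:
    """Return keywords that at least min_pages of the given pages rank for."""
    counts = Counter(keyword for keywords in pages.values() for keyword in keywords)
    return {keyword for keyword, n in counts.items() if n >= min_pages}


# Hypothetical keyword sets for three top-ranking pages.
top_pages = {
    "competitor-a.example": {"kitchen knife", "chef knife review", "best knife set"},
    "competitor-b.example": {"kitchen knife", "best knife set", "knife sharpening"},
    "competitor-c.example": {"kitchen knife", "chef knife review"},
}
my_keywords = {"kitchen knife"}

# Keywords the top pages share that our page does not target yet.
missing = shared_keywords(top_pages) - my_keywords
print(sorted(missing))  # ['best knife set', 'chef knife review']
```

"knife sharpening" is left out because only one of the sample pages ranks for it, so it is not part of the shared combination.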
Internal and external backlinks
There’s a new dimension to links. It’s not just about authority and relevance (although it still is; we’re not devaluing those metrics), it’s also about context on the page.
Thanks to NLP, link structure and placement are even more of a factor. What is the anchor text? Is it contextually relevant to the page? And, most importantly, where is it placed on the page?
We can’t just randomly put a link on the page; it has to make sense to the content. The link needs to have the right context to have full value.
Rick Lomas from Link Detective says,
“Google has re-iterated over and over again that a link should exist to enhance the reader’s experience. This is going to be more important than ever.
There is no point ‘re-inventing the wheel’. If there is adequate information about something elsewhere on the internet, then it makes complete sense to link to it.
An example of a Google friendly external link (IMHO) would be in a sentence like;
“If you are wondering how to use the space you have in your kitchen, Bella Cuisine have some excellent kitchen designs.”
This would be a legitimate link that would make sense for people to follow to find out more about kitchen designs, thus enhancing the reader’s experience.
On the other hand…
Links in sentences like;
“I decided to make this sandwich using the best stainless steel kitchen knife I could find”
This would not enhance the experience of someone who is looking for sandwich ideas because the link focuses on the knife they could use to make it.
Internal links are totally under your control, but the same rules apply – try to use the internal links to enhance the reader’s experience so that they can achieve their original search intention”.
Let’s just work off the theory that if the top pages (of similar intent) are ranking so highly, it must be because Google likes them. They’re a good fit.
Assuming all other things are equal (similar relevancy, authority, age of the site, and so on), if your on-page content is similar in terms of the metrics that matter to Google, you should expect to rank similarly.
It’s never that precise, there are always exceptions, but it’s a great starting point.
Analyzing content with measures like TF-IDF is one step, and an important one, and optimizing this way will get you results.
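For readers who haven't met the measure before, here is a minimal sketch of one common TF-IDF variant: term frequency is the term's share of the words in a document, and inverse document frequency is the natural log of (number of documents / documents containing the term). This is an illustration only; it is not a claim about how Google or any particular SEO tool computes it, and the sample documents are invented.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tf_idf(docs: List[str]) -> List[Dict[str, float]]:
    """Return one {term: score} dict per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    "jockey wins the horse race at the race track",
    "horse betting odds for the race",
    "the best wineries in australia",
]
first_page = tf_idf(docs)[0]
for term, score in sorted(first_page.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{term}: {score:.3f}")
```

A word such as "the" that appears in every document scores zero, while a term unique to one page, such as "jockey", scores higher than a term shared across pages, such as "horse".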
We know that Google incorporates other data sets, like sentiment, entities, category, and salience score. It then interprets them as part of their ranking algorithm; it therefore makes sense to analyze competitors by these metrics and identify where your page is different from those top-ranking pages. With that information, you can make changes.
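A comparison like that can be sketched in a few lines. The example below averages entity salience across competing pages and reports the entities where your page falls short. The function name, the 0.1 threshold, and all of the scores are assumptions made up for illustration; the same pattern would work for sentiment scores.

```python
from statistics import mean
from typing import Dict, List


def salience_gaps(
    my_page: Dict[str, float],
    competitors: List[Dict[str, float]],
    threshold: float = 0.1,
) -> Dict[str, float]:
    """Return entities whose average competitor salience beats ours by more than threshold."""
    all_entities = set().union(*competitors)
    gaps = {}
    for entity in all_entities:
        # An entity a page does not mention counts as salience 0.0.
        competitor_avg = mean(page.get(entity, 0.0) for page in competitors)
        difference = competitor_avg - my_page.get(entity, 0.0)
        if difference > threshold:
            gaps[entity] = round(difference, 2)
    # Largest gap first.
    return dict(sorted(gaps.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical entity salience scores for a page about horse racing.
my_page = {"horse racing": 0.6, "betting": 0.05}
competitors = [
    {"horse racing": 0.5, "jockey": 0.2, "betting": 0.15},
    {"horse racing": 0.45, "jockey": 0.3, "race track": 0.1},
]

print(salience_gaps(my_page, competitors))
```

With this sample data the only entity flagged is "jockey", which both competitors cover and the sample page does not mention at all.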
“The role of search is to connect people with answers.
To do this, intelligent, AI-powered services like search engines and voice assistants need to understand the world around us, in the same terms that we do — and they do this by scouring all of the information that exists on the web. They learn best when that information is structured for search with special schema markup that provides context and meaning, but most of the content out there right now does not exist in that format. This makes it difficult for search engines to discern between objective and subjective facts — and easy to return wrong answers.
If the same systems were able to deeply and objectively understand a topic, they may more reliably be able to tell the right answers from the wrong answers. Advancements in natural language understanding can help us unlock this future.”
Ric Rodriguez, SEO Consultant – Yext
Understanding NLP allows you to understand how Google is now looking at your page. The more traditional ranking factors still hold their importance, but additionally, Google is pushing massively towards understanding user intent and search relevancy. We should embrace that.
Now that we know what Google’s new metrics are, we can analyze our own pages, analyze the pages that are doing well, and see where there are any significant differences. We can make changes and see how Google reacts.
At Surfer, we’ve been creating a tool that automates a lot of this analysis for you, both the data collection and the interpretation. In other words, it tells you what you need to do to your page to give it the salience or sentiment of the most valued pages in the search.
It doesn’t mean NLP is the only factor that matters. But our data and the data collected by other SEO community members have demonstrated to us that it moves the needle, so it’s exciting.
Like with most things in SEO, those who react first and adapt to the changing landscape of SEO reap all the rewards. With that in mind, NLP has to be worth exploring because perhaps your competitors aren’t yet, and that gives you an advantage.
Huge thanks to Michał Suski from Surfer SEO for providing superb data and being a mentor on the topic of NLP.
Bonus quotes from the industry experts…
“Google’s Bert algorithm is ultimately looking for more context on page. It’s important to consider the main keyword layers included in the first page results and adapt your content to suit. I’ve had a lot of success analysing content on the first page for primary and secondary keywords and their parent keywords to create better content. For example one user might be searching for the ‘best wineries in Australia’. When you analyse the first page results for the most mentioned keyword and their crossovers with their competitors, you’ll notice a pattern. That pattern is what Google is noticing it’s users are looking for. For example I might get a breakdown of the top Australian wineries and what makes them the best in one article on the first page. Another result I might get some mentions of great wineries in Australia but they’ve also included the main grape varietals that Australia is known for first. If I were to produce a stronger article I would analyse the primary and secondary keyword crossovers and paint a picture of the way users need their answer written and structured. In this case I would include a layer of the regions and what grape varietals Australia is known for then I would talk about the best wine Australia has to offer. See how I’ve taken the best of 2 results and incorporated into mine? Now Google has a strong article to choose from when considering the best results.”
“As Google advances as an AI-first company, the importance of understanding how to optimize content for NLP cannot be overstated. With voice search further advancing semantic search queries from a bag of keywords to long tailed questions, there is no doubt that crafting content for both human language and machine understanding simultaneously is here to stay.”
Clarence Lam, NLP & AI Consultant
“Google’s algorithm update BERT focuses on improving language understanding. It means that Google is trying to better understand more natural language/conversational queries and better understand the nuance and context of words in searches to be able to better match those queries with helpful results. This means that with the use of natural language processing (NLP) to understand the context behind the searches, providing high-quality informational content becomes more and more important to rank higher in SERPs. SEOs will have more focus on creating high-quality content that answers very specific questions. This will also include focusing on long-tail keywords that would also be helpful to rank better for voice searches as people use longer sentences (more long-tail keywords) when using voice search in comparison to typing in a query in Google.”
Eva Lauridsen, Head of Partnerships, AccuRanker
“Clearly, there’s been a heightened interest in this whole topic area by SEOs even before the BERT update.
I think the bi-directional nature of this algorithm is interesting as it shows the additional lengths Google is striving to go to in its quest to properly understand user intent – and let’s face it – they’re only just getting started. Where are we going to be in 10 years’ time!
But asking whether NLP is a good tool for SEO, is like asking whether a hammer is a useful tool for renovating your property. It certainly could be useful, but it will of course depend on the task at hand.
If you really need to analyse a reasonable body of text and understand the context and meaning of the article then it will be useful – for example you can use Google’s Natural Language API (or IBM Watson or other open source alternatives like OpenAI) to analyse and/or categorise web pages which might be useful to you if you are looking at competitors in the SERPs at a market or keyword level to tell you what types of sites/pages you are competing with.
If you are looking to cluster relevant groups of ranking keywords and pages together to discover the best optimisation opportunities and map them to the buyer journey then there are a number of different approaches and tools you could utilise. E.g. A tool like a community detection algorithm might help you find multiple sets of closely related keywords (that are not necessarily semantically related) at scale, and you could use another tool like TF/IDF to label each cluster. You could then use a third tool (like our SERP Intent API) to understand the overwhelming user intent for this set of keywords by analysing each SERP.
So, my advice is pay attention. We can clearly see why Google is using it. Does it mean you need to fight fire with fire and also invest time and money in understanding the latest NLP or machine learning techniques? Maybe it would be fun, but it’s probably not necessary….I mean after all there are plenty of SEO software platforms out there that love doing this all day.”
Laurence O’Toole, CEO, Authoritas & Linkdex
“NLP in SEO is absolutely vital now and will become even more so as it helps Google understand and then it can determine relevancy (of a block of content, the page, the website). As you become more skilled it gives you better-trained SEO eyes, and you see things very few can see. You start to understand, visualize, and then can mimic the patterns Google is seeing and preferring for any search term. Google is simply trying to reproduce in search of what happens already, in everyday speech and life.
Remember when your mom would say your name and was also mad, vs. saying your name when she is so proud of you? Mine would be “Bryan Bloom!” (same 2 words) but I knew the tonality, even without any other words. Google understands tonality (mad vs. happy, etc.) and meaning with NLP, the words before, after, how they relate, connect, entities, etc. Further, Google looks for connections to rank relevance. For example if an article is written about horse racing and it contains the words, jockey, finish line, betting, race track, and the other article does not… which do you think will rank better?”
Bryan Bloom, Founder, Mover Search Marketing
“For the longest time, SEO folks have been banging their heads against (virtual) brick walls trying to reverse-engineer search engines, while at the same time search engines have spent vast resources trying to reverse-engineer humans.
NLP from both a query and on-page standpoint takes search engines that much closer to scalable human mimicking to better understand the what for and why of search queries and the best match of document to that query.
The evolution of smarter entity-focused & NLP-driven tools to help SEO’s understand why a current page might rank and how to improve an existing page to rank is exciting, primarily because SEO’s are inherently human (and not databases of entities, vectors and BERT computations).
The biggest challenge I see is one of educating clients and colleagues that this is more than traditional keyword research, counts, and inclusion, rather a sophisticated analysis methodology that incorporates a more ‘human-factor’ into page content and connections to deliver the most satisfying answers (to both humans and engines) for each user query.
While Google specifically has made great strides, there’s still (at least currently) an ‘art’ to support the science of NLP, with a human touch often necessary in sorting, culling or refining machine-learned suggestions.
As Kyle Roof says often… “Google’s algorithm is an algorithm”, but – I might add – becoming a little more human with every query.”
Grant Simmons, VP of Search Marketing, Homes.com
“NLP plays a massive role in understanding what a user wants when he types something, and that’s what the number one USP Google has, delivering the results matching users intent. Google, being the market leader in the search engine industry, has been doing everything that can take them closer to deliver the best possible results/experience to retain every user that they get and, BERT is the name of Google’s adaptation of the latest search patterns which aren’t really easy to understand otherwise.
The most important learning for us, the SEOs, is to understand what the user wants when landing on your website i.e the intent, and design your content elements to cater those needs, rather than focusing on some specific keywords like those old and beautiful days.”
Nitin Manchanda, Global Head of SEO, Omio.com
Is it worth your time? We’ve found that websites in highly competitive sectors or those who have lost traffic can benefit the most from integrating NLP into their SEO strategy. Using NLP can be very time consuming, but it can really help to inform content strategy and reduce wasting resources.
Risk and benefits? The biggest risk with focusing on NLP is not doing your research. We find it highly effective once we have analysed not only the client website but competitor pages too. This allows us to see what impact NLP is having within a sector with some certainty – guessing is potentially harmful and undermines the primary reason to use NLP in the first place.
We use NLP to understand where our clients will be considered experts and use it for evidence based content strategies. We can optimise content better but more importantly we can be much more aggressive in positioning the whole website within a sector.
Future of Buzz? When the SEO industry first got its hands on the tools to use NLP there was a lot of excitement. But much of that dropped off. It’s not the quick fix many in SEO were looking for. However NLP is always going to be a part of search and if you put in the time then it can become an integral part of your SEO campaigns.
Steve Bailey – Head of Technical SEO & CRO – Spike Digital
“NLP has always been a core topic for search engines. The core feature of a search engine is to provide pertinent results to input queries. This requires several steps which all involve NLP techniques at different levels:
- Queries processing: identify typographic errors, understand search intent, identify core entities
- Documents processing: understand document structures and topics addressed, find named entities and understand discourse, identify sentiments expressed and figure out human discourse markers such as irony and humour.
- Building pertinent SERP pages: by selecting documents matching the user query and ordering the results according to what would maximize user expectations. Scoring models are part of NLP techniques with TF-IDF in the first place and other scoring techniques added recently including personalization based on user experiences and understanding. At this stage, machine learning adds customisation to provide better SERPs.”
Tanguy Moal, CTO, Co-founder, OnCrawl