Natural language processing (commonly referred to as NLP) is a subfield of artificial intelligence research concerned with machine learning modeling tasks aimed at giving computer programs the ability to understand human language, both written and spoken.
Natural language processing is not only concerned with processing: recent developments in the field, such as the introduction of Large Language Models (LLMs) like GPT-3, are also aimed at language generation.
With the rise of people using machine learning in SEO, it’s time to go back to the basics and dig into the theoretical aspects of NLP, and more specifically – the five phases of NLP and how you can utilise them in your SEO projects. As part of this article, there will also be some example models that you can use in each of these, alongside sample projects or scripts to test.
The five phases presented in this article mirror the phases described in compiler design, a branch of software engineering concerned with building programs that translate a high-level language into a low-level language.
Phase I: Lexical or morphological analysis
The first phase of NLP is word structure analysis, which is referred to as lexical or morphological analysis. A lexicon is defined as a collection of words and phrases in a given language, with the analysis of this collection being the process of splitting the lexicon into components, based on what the user sets as parameters – paragraphs, phrases, words, or characters.
Similarly, morphological analysis is the process of identifying the morphemes of a word. A morpheme is a basic unit of language construction: the smallest element of a word that carries meaning. A morpheme can be either free (e.g. walk) or bound (e.g. -ing, -ed). The difference between the two is that a bound morpheme cannot stand on its own as a meaningful word, and has to be attached to a free morpheme to carry meaning.
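To make the two ideas concrete, here is a minimal sketch in plain Python. It assumes a tiny, hand-picked list of suffixes as bound morphemes, and the function names (`tokenize`, `split_morphemes`) are my own rather than from any library. Real morphological analysers handle far more cases (irregular forms, doubled consonants, prefixes), so treat this as an illustration only.

```python
import re

# Illustrative bound morphemes (suffixes). An assumption, not a full inventory.
BOUND_SUFFIXES = ("ing", "ed", "s")


def tokenize(text, unit="words"):
    """Split text into paragraphs, sentences, words, or characters."""
    if unit == "paragraphs":
        return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if unit == "sentences":
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if unit == "words":
        return re.findall(r"[a-z']+", text.lower())
    if unit == "characters":
        return [c for c in text if not c.isspace()]
    raise ValueError(f"unknown unit: {unit}")


def split_morphemes(word, min_stem=3):
    """Return (free_morpheme, bound_morpheme or None) via naive suffix stripping."""
    for suffix in BOUND_SUFFIXES:
        # Only strip when enough of the word is left to be a plausible stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)], "-" + suffix
    return word, None


for word in tokenize("She walked. He is walking!", unit="words"):
    print(word, split_morphemes(word))
```

The `min_stem` guard is there so that short words such as "is" or "bed" are not split into a meaningless stem plus a suffix.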
What can you use lexical or morphological analysis for in SEO?
There are multiple SEO projects where you can implement lexical or morphological analysis to help guide your strategy.
For instance, when doing on-page analysis, you can perform lexical and morphological analysis to understand how often the target keywords are used in their core form (as free morphemes, or when in composition with bound morphemes). This type of analysis can ensure that you have an accurate understanding of the different variations of the morphemes that are used.
Of course, this analysis can be performed on the SERP results as well, which will help you gain an understanding of the importance of certain keywords and their variations for ranking in key positions (bear in mind here that correlation does not equal causation).
Another useful way to implement this initial phase of natural language processing into your SEO work is to apply lexical and morphological analysis to your collected database of keywords during keyword research.
This can help you quantify the importance of morphemes in the context of other metrics, such as search volume or keyword difficulty, as well as gain a better understanding of what aspects of a given topic your content should address.
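As a rough sketch of that idea, the snippet below groups keywords by a naive stem and sums a search volume figure per stem. The suffix list, the `naive_stem` helper, and the sample keywords and volumes are all made up for illustration. In a real project you would swap in a proper stemmer or lemmatiser and your own keyword export.

```python
from collections import defaultdict

# Illustrative suffixes only. A real stemmer covers many more cases.
SUFFIXES = ("ing", "ers", "er", "s")


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def volume_by_stem(keywords):
    """keywords: iterable of (keyword phrase, monthly search volume) pairs."""
    totals = defaultdict(int)
    for phrase, volume in keywords:
        # Count each stem once per keyword so repeated words do not double count.
        for stem in {naive_stem(w) for w in phrase.lower().split()}:
            totals[stem] += volume
    # Highest total volume first.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical keyword research export.
sample_keywords = [
    ("walking shoes", 1000),
    ("best walkers", 300),
    ("trail walk shoe", 200),
]

print(volume_by_stem(sample_keywords))
```

Here "walking", "walkers" and "walk" all collapse into the same stem, so their volumes are added together instead of being treated as three unrelated terms.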
Morphological analysis can also be applied in transcription and translation projects, so it can be very useful in content repurposing, international SEO, and linguistic analysis.
What are some tools you can use to do lexical or morphological analysis?
Phase II: Syntax analysis (parsing)
Syntax analysis is the second phase of natural language processing. Syntax analysis, or parsing, is the process of checking grammar and word arrangement and, overall, identifying the relationships between words and whether those make sense. The process involves examining all the words and phrases in a sentence, and the structures between them.
As part of the process, a visualisation of the syntactic relationships is built, referred to as a syntax tree (similar to a knowledge graph). This process ensures that the structure, order, and grammar of sentences make sense, given the words and phrases that make up those sentences. Syntax analysis also involves tagging words and phrases with part-of-speech (POS) tags. There are two common approaches to constructing the syntax tree: top-down and bottom-up. Both check for valid sentence formation, and reject the input otherwise.
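The top-down approach can be sketched with a tiny recursive-descent parser. Everything here is a toy assumption: the grammar has three rules (S -> NP VP, NP -> DET NOUN | NOUN, VP -> VERB NP | VERB), and the lexicon is a hand-written dictionary standing in for a real POS tagger. It does show the two behaviours described above: building a tree for a well-formed sentence, and rejecting input that does not fit the grammar.

```python
# Hand-written lexicon standing in for a real POS tagger (an assumption).
LEXICON = {
    "the": "DET", "a": "DET",
    "crawler": "NOUN", "page": "NOUN", "google": "NOUN",
    "indexes": "VERB", "crawls": "VERB",
}


def pos_tag(words):
    """Tag each word with its part of speech. Raise if a word is unknown."""
    try:
        return [(w, LEXICON[w.lower()]) for w in words]
    except KeyError as err:
        raise ValueError(f"unknown word: {err.args[0]}") from None


def parse_np(tagged, i):
    """NP -> DET NOUN | NOUN. Return (subtree, next index) or None."""
    if i < len(tagged) and tagged[i][1] == "DET":
        if i + 1 < len(tagged) and tagged[i + 1][1] == "NOUN":
            return ("NP", tagged[i], tagged[i + 1]), i + 2
        return None
    if i < len(tagged) and tagged[i][1] == "NOUN":
        return ("NP", tagged[i]), i + 1
    return None


def parse_vp(tagged, i):
    """VP -> VERB NP | VERB. Return (subtree, next index) or None."""
    if i < len(tagged) and tagged[i][1] == "VERB":
        obj = parse_np(tagged, i + 1)
        if obj:
            return ("VP", tagged[i], obj[0]), obj[1]
        return ("VP", tagged[i]), i + 1
    return None


def parse_sentence(sentence):
    """S -> NP VP. Return a syntax tree, or None if the input is rejected."""
    tagged = pos_tag(sentence.split())
    np = parse_np(tagged, 0)
    if not np:
        return None
    vp = parse_vp(tagged, np[1])
    # Reject if the verb phrase is missing or words are left over.
    if not vp or vp[1] != len(tagged):
        return None
    return ("S", np[0], vp[0])


print(parse_sentence("The crawler indexes the page"))  # nested tuple tree
print(parse_sentence("The crawler the page indexes"))  # None (rejected)
```

A bottom-up parser would start from the tagged words and combine them into larger phrases until it reaches S. The accept or reject outcome is the same for this grammar.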
What can you use syntax analysis for in SEO?
Syntax analysis can be beneficial for SEO in several ways:
- Programmatic SEO: Checking whether the produced content makes sense, especially when producing content at scale using an automated or semi-automated approach.
- Semantic analysis: Once syntax analysis has been conducted, semantic analysis becomes easier, as does uncovering the relationships between the different entities recognized in the content.
What are some tools you can use to do syntax analysis?
There are multiple tools and libraries available for parsing and syntax analysis in Python. For an overview, I recommend going through the tutorial written by Gabriele Tomassetti, titled: Parsing in Python: all the tools and libraries you can use.
One approach not mentioned in the linked article is an API, used frequently by technical and data SEOs – Google’s Natural Language API, which has a module for syntax analysis. According to the documentation of this API method:
“While most Natural Language methods analyze what a given text is about, the analyzeSyntax method inspects the structure of the language itself. Syntactic Analysis breaks up the given text into a series of sentences and tokens (generally, words) and provides linguistic information about those tokens.”
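Below is a hedged sketch of calling this method over REST using only the Python standard library. The endpoint URL, the request body, and the response field names (`tokens`, `partOfSpeech.tag`, `dependencyEdge.label`) reflect my understanding of the v1 REST API, so check them against the current documentation before relying on them. The helper names and `YOUR_API_KEY` are placeholders of my own.

```python
import json
import urllib.request

# Assumed v1 REST endpoint for the analyzeSyntax method.
ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeSyntax"


def build_syntax_request(text, api_key):
    """Build (but do not send) an analyzeSyntax request for plain text."""
    payload = {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }
    return urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarise_tokens(response):
    """Return (word, POS tag, dependency label) for each token in a response dict."""
    return [
        (
            tok["text"]["content"],
            tok["partOfSpeech"]["tag"],
            tok["dependencyEdge"]["label"],
        )
        for tok in response.get("tokens", [])
    ]


# To send the request (needs a valid API key and network access):
# request = build_syntax_request("Google crawls pages.", "YOUR_API_KEY")
# with urllib.request.urlopen(request) as resp:
#     print(summarise_tokens(json.load(resp)))
```

Google also publishes client libraries for this API, which are likely the better choice for production use. The raw request is shown here only to make the inputs and outputs visible.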
Phase III: Semantic analysis
Semantic analysis is the third stage in NLP, in which the meaning of a statement is analysed. This type of analysis is focused on uncovering the definitions of words, phrases, and sentences and identifying whether the way words are organized in a sentence makes sense semantically.
This task is performed by mapping the syntactic structure, and checking for logic in the presented relationships between entities, words, phrases and sentences in the text. There are several important functions of semantic analysis, which allow for natural language understanding:
- To ensure that the data types are used in a way that’s consistent with their definition.
- To ensure that the flow of the text is consistent.
- Identification of synonyms, antonyms, homonyms, and other lexical items.
- Overall word sense disambiguation.
- Relationship extraction from the different entities identified from the text.
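Word sense disambiguation, one of the functions listed above, can be illustrated with a simplified version of the Lesk approach: pick the sense whose dictionary definition shares the most words with the surrounding sentence. The sense inventory and stopword list below are tiny and hand-written purely for the example. A real system would draw on a lexical database or a trained model.

```python
# Hand-written sense inventory (an assumption for illustration).
SENSES = {
    "bank": {
        "financial institution": "an institution that accepts deposits and lends money to customers",
        "river side": "the sloping land beside a river or lake",
    }
}
STOPWORDS = {"a", "an", "the", "of", "to", "and", "or", "that", "in", "on", "at", "by", "i", "my"}


def content_words(text):
    """Lowercase the words, strip basic punctuation, and drop stopwords."""
    return {w.strip(".,!?").lower() for w in text.split()} - STOPWORDS


def disambiguate(word, sentence):
    """Pick the sense whose definition shares the most content words with the sentence."""
    context = content_words(sentence) - {word}
    scores = {
        sense: len(context & content_words(gloss))
        for sense, gloss in SENSES[word].items()
    }
    # On a tie, the first sense listed wins.
    return max(scores, key=scores.get)


print(disambiguate("bank", "I deposited money at the bank"))
print(disambiguate("bank", "We fished from the bank of the river"))
```

The first sentence overlaps with the financial definition through "money", and the second overlaps with the river definition through "river", so the two uses of "bank" resolve to different senses.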
What can you use semantic analysis for in SEO?
There are several things you can utilise semantic analysis for in SEO. Here are some examples:
- Topic modeling and classification – sort your page content into topics (predefined or modelled by an algorithm). You can then use this for ML-enabled internal linking, where you link pages together on your website using the identified topics. Topic modeling can also be used for classifying first-party collected data such as customer service tickets, or feedback users left on your articles or videos in free form (i.e. comments).
- Entity analysis, sentiment analysis and intent classification – You can use these types of analysis to identify the entities, sentiment, and intent expressed in the content analysed. Entity identification and sentiment analysis are separate tasks, and both can be done on things like keywords, titles, meta descriptions, and page content, but they work best when analysing data like comments, feedback forms, or customer service or social media interactions. Intent classification can be done on user queries (in keyword research or traffic analysis), but can also be done in analysis of customer service interactions.
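The internal linking idea from the first bullet can be sketched as follows. Note the swap: instead of a real topic model such as LDA, this example uses plain cosine similarity over word counts as a stand-in, which is enough to show the mechanics. The URLs, page texts, stopword list, and the `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "and", "or", "to", "of", "for", "in", "on", "with", "your", "how"}


def term_vector(text):
    """Count the non-stopword terms in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_links(pages, threshold=0.3):
    """pages: dict of URL -> text. Return {url: related urls, most similar first}."""
    vectors = {url: term_vector(text) for url, text in pages.items()}
    suggestions = {}
    for url, vec in vectors.items():
        scored = [
            (cosine(vec, other), other_url)
            for other_url, other in vectors.items()
            if other_url != url
        ]
        suggestions[url] = [u for score, u in sorted(scored, reverse=True) if score >= threshold]
    return suggestions


# Hypothetical pages.
pages = {
    "/keyword-research": "keyword research tools for seo keyword ideas",
    "/keyword-tools": "best keyword research tools compared",
    "/link-building": "link building outreach tips",
}
print(suggest_links(pages))
```

With these sample pages, the two keyword pages are suggested as links to each other, while the link building page gets no suggestions because it shares no terms with them.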
What are some tools you can use to do semantic analysis?
For topic modeling, there are multiple ways to do this in Python, but for a quick, beginner-friendly app, I recommend using Cornell’s LDA analysis web application. Here’s a tutorial on how to use it on your site’s web content.
Google’s Natural Language API has modules for:
- Entity identification – inspects the given text for known entities, and returns information about those entities.
- Entity sentiment analysis – combines both entity analysis and sentiment analysis and attempts to determine the sentiment (positive or negative) expressed about entities within the text.
- Sentiment analysis – analyses text and identifies the dominant emotional opinion within it, determining whether the writer’s attitude is positive, negative, or neutral.
All of these can be used from Google Sheets, but they can be used in Python as well, which is more suitable for websites and projects where scalability is desired, or when working with big data.
Phase IV: Discourse integration
Discourse integration is the fourth phase in NLP, and simply means contextualisation. Discourse integration is the analysis and identification of the larger context for any smaller part of natural language structure (e.g. a phrase, word or sentence).
During this phase, it’s important to ensure that each phrase, word, and entity is mentioned within the appropriate context. This analysis involves considering not only sentence structure and semantics, but also sentence combination and meaning of the text as a whole.
In other words, when analyzing the structure of a text, sentences are broken up and analyzed individually, but they are also considered in the context of the sentences that precede and follow them, and in terms of the impact they have on the structure of the text. Some common tasks in this phase include information extraction, conversation analysis, text summarisation, and discourse analysis.
Here are some complexities of natural language understanding introduced during this phase:
- Understanding of the expressed motivations within the text, and its underlying meaning.
- Understanding of the relationships between entities and topics mentioned, thematic understanding, and interactions analysis.
- Understanding the social and historical context of entities mentioned.
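A small example of why context across sentences matters is pronoun resolution: "It found a broken link" only makes sense once you know what "it" refers to. The sketch below uses a deliberately crude heuristic, replacing each pronoun with the most recently mentioned entity from a list you supply. The pronoun list, the single-word entity restriction, and the function name are my own simplifications. Real coreference resolution is considerably harder and usually model-based.

```python
import re

PRONOUNS = {"it", "they", "he", "she"}


def resolve_pronouns(text, entities):
    """Replace each pronoun with the most recently mentioned known entity.

    entities: set of lowercase, single-word entity names (a simplification).
    """
    last_entity = None
    resolved = []
    # Split into words and standalone punctuation marks.
    for token in re.findall(r"\w+|[^\w\s]", text):
        lower = token.lower()
        if lower in entities:
            last_entity = token
            resolved.append(token)
        elif lower in PRONOUNS and last_entity is not None:
            resolved.append(last_entity)
        else:
            resolved.append(token)
    # Re-join tokens, keeping punctuation attached to the preceding word.
    return re.sub(r"\s+([^\w\s])", r"\1", " ".join(resolved))


text = "Googlebot crawled the page. It found a broken link."
print(resolve_pronouns(text, {"googlebot"}))
```

Run on the second sentence alone, the function leaves "It" untouched, because the information needed to resolve it lives in the preceding sentence.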
What can you use discourse integration for in SEO?
Discourse integration and analysis can be used in SEO to ensure that the appropriate tense is used, that the relationships expressed in the text make logical sense, and that there is overall coherence in the text analysed. This can be especially useful for programmatic SEO initiatives or text generation at scale. The analysis can also be used as part of international SEO localization, translation, or transcription tasks on large corpora of data.
There are some research efforts to incorporate discourse analysis into systems that detect hate speech (or, in the SEO space, for things like content and comment moderation). This technology is aimed at uncovering the intention behind a text by aligning the expression with meaning derived from other texts. This means that, theoretically, discourse analysis can also be used for modeling user intent (e.g. search intent or purchase intent) and for detecting such notions in texts.
What are some tools you can use to do discourse integration?
To build a machine learning model for discourse analysis from scratch, it is best to have a big dataset at your disposal, as most advanced techniques involve deep learning. Many researchers and developers in the field have made discourse analysis APIs available for use. However, those might not be applicable to every text or use case out of the box, which is where custom data comes in handy.
One API that is released by Google and applied in real-life scenarios is the Perspective API, which is aimed at helping content moderators host better conversations online. According to the description, the API does discourse analysis by analyzing “a string of text and predicting the perceived impact that it might have on a conversation”. You can try the Perspective API for free online as well, and incorporate it into your site for automated comment moderation.
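For automated comment moderation, a request to the Perspective API might look like the sketch below, written with the Python standard library only. The endpoint, the `requestedAttributes` body, and the `attributeScores.TOXICITY.summaryScore.value` response path follow my reading of the public documentation, so verify them before use. The helper names, the 0.8 threshold, and `YOUR_API_KEY` are assumptions of mine.

```python
import json
import urllib.request

# Assumed REST endpoint for the Perspective API analyze method.
ENDPOINT = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_perspective_request(comment, api_key):
    """Build (but do not send) a request asking for a TOXICITY score."""
    payload = {
        "comment": {"text": comment},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    return urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def needs_review(response, threshold=0.8):
    """Flag a comment for manual review if its toxicity score reaches the threshold."""
    score = response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
    return score >= threshold


# To send the request (needs a valid API key and network access):
# request = build_perspective_request("Great article, thanks!", "YOUR_API_KEY")
# with urllib.request.urlopen(request) as resp:
#     print(needs_review(json.load(resp)))
```

The threshold is a judgment call. A lower value sends more comments to manual review, while a higher value lets more through unchecked.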
Phase V: Pragmatic analysis
Pragmatic analysis is the fifth and final phase of natural language processing. As the final stage, pragmatic analysis extrapolates and incorporates the learnings from all other, preceding phases of NLP.
Pragmatic analysis involves the process of abstracting or extracting meaning from the use of language, and translating a text, using the gathered knowledge from all other NLP steps performed beforehand.
Here are some complexities that are introduced during this phase:
- Information extraction, enabling advanced text understanding functions such as question-answering.
- Meaning extraction, which allows for programs to break down definitions or documentation into a more accessible language.
- Understanding of the meaning of the words and the context in which they are used, which enables conversational functions between machine and human (e.g. chatbots).
What can you use pragmatic analysis for in SEO?
Pragmatic analysis has multiple applications in SEO. One of the most straightforward ones is programmatic SEO and automated content generation. This type of analysis can also be used for generating FAQ sections for your product, using textual analysis of product documentation, or even capitalising on the ‘People Also Ask’ feature by adding an automatically-generated FAQ section to each page you produce on your site.
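As a very rough starting point for the FAQ idea, the sketch below pulls question and answer pairs out of product documentation using a simple rule: any line ending in a question mark is a question, and the lines after it (up to the next question) are its answer. This is a rule-based stand-in, not pragmatic analysis in the full sense, and the function name and sample documentation are hypothetical. A language model could then rephrase or expand the extracted pairs.

```python
def extract_faq(doc_text):
    """Pair each line ending in '?' with the text that follows it.

    Returns a list of (question, answer) tuples. Text before the first
    question, and questions with no answer text, are ignored.
    """
    faq = []
    question, answer_lines = None, []
    for line in doc_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith("?"):
            # A new question closes the previous question and answer pair.
            if question and answer_lines:
                faq.append((question, " ".join(answer_lines)))
            question, answer_lines = line, []
        elif question:
            answer_lines.append(line)
    if question and answer_lines:
        faq.append((question, " ".join(answer_lines)))
    return faq


# Hypothetical product documentation.
docs = """Setup guide
How do I install the plugin?
Download the zip file.
Upload it in the admin panel.

Can I use it offline?
Yes."""

for question, answer in extract_faq(docs):
    print(question, "->", answer)
```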
What are some tools you can use to do pragmatic analysis?
The most accessible tool for pragmatic analysis at the time of writing is ChatGPT by OpenAI. ChatGPT is a large language model (LLM) chatbot developed by OpenAI and based on their GPT-3.5 model. The aim of this chatbot is to enable conversational interaction, and through it a more widespread use of GPT technology. Because of the large dataset on which this technology has been trained, it is able to extrapolate information, or make predictions to string words together in a convincing way.
With that said, there are also multiple limitations of using this technology for purposes like automated content generation for SEO, including text inaccuracy at best, and inappropriate or hateful content at worst.
To summarize, the five phases of natural language processing, as framed by compiler design theory, are:
- Lexical or morphological analysis
- Syntax analysis (parsing)
- Semantic analysis
- Discourse integration
- Pragmatic analysis
As the article demonstrated, there are numerous applications of each of these five phases in SEO, and a plethora of tools and technologies you can use to implement NLP into your work.