How to Get Started with Regex

July 21, 2021 - 6 min reading time - by Ashkar Gomez

Regex is a technical skill that is often overlooked in marketing. It can be used in Google Search Console, Google Analytics, and Google Data Studio to extract data for SEO.

This article focuses on the concept of Regex and how it can support data analysis as part of SEO services.

Data science and data analysis will play a major role in the future of Search Engine Optimization. In today’s practice, we can’t rely only on On-Page, Off-Page, and Technical SEO.

Data SEO plays a major role in achieving the desired keyword rankings and organic traffic.

Regex helps extract specific patterns of characters from data sets, which makes it an important skill for many people in the SEO space:

  • SEO Specialists
  • Web Operators and Web Analytics Team
  • Researchers/Data Engineers
  • Digital Marketing Experts/Consultants

What is Regex?

Regex, short for regular expression, is a tool used for pattern matching. A Regex is a string, or series of characters, that represents a pattern used to match, manage, and filter text.

A Regex string might look like this:
([0-9]+(\.[0-9]*)?)
A Regex uses sets of characters, symbols, and other elements to describe a pattern. A pattern might be a phone number, a URL, a date or time, an address, an identifier such as a product reference, or even a sentence of text within a page of code.

You can then find the sequences in any text or list that match the pattern you’ve described.
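To show what the sample pattern above actually matches, here is a minimal sketch using Python's re module. The sample sentence is invented for illustration. Google's tools use the RE2 syntax, which shares these basic operators with Python but differs in some advanced features, so treat the Python snippets in this article as approximations of how the filters behave.

```python
import re

# The sample pattern from above: one or more digits, optionally
# followed by a dot and zero or more further digits.
NUMBER_PATTERN = re.compile(r"([0-9]+(\.[0-9]*)?)")

def find_numbers(text):
    """Return every substring of text matched by the whole pattern."""
    # group(0) is the full match; findall() would return tuples here
    # because the pattern contains capturing groups.
    return [m.group(0) for m in NUMBER_PATTERN.finditer(text)]

# Hypothetical sample text
print(find_numbers("CTR rose from 2.5 to 3.75 over 14 days"))
# → ['2.5', '3.75', '14']
```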

How is Regex used in SEO?

In SEO, Regex helps filter the keywords or phrases through which a website earns traffic. In turn, these filters help analyze the behavior and search intent of your users. This has become increasingly important since Google’s BERT update, which helps Google better identify user intent using natural language processing (NLP).

Since then, search engines have focused more on understanding user intent and ranking the most relevant content on the first page of the SERP. Google Analytics and Google Search Console are both free tools widely used in SEO, and both support Regex filters.
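As an illustration of intent filtering, the sketch below keeps only question-style queries, which often signal informational intent. The query list and the choice of question words are hypothetical. In Search Console you would paste only the pattern string into the Regex filter; the Python code just reproduces the filtering.

```python
import re

# Hypothetical pattern: queries that start with a common question word.
# ^ anchors the match to the start of the query; the trailing space
# avoids matching words such as "however".
QUESTION_PATTERN = re.compile(r"^(who|what|where|when|why|how) ")

def question_queries(queries):
    """Keep the queries that look like questions."""
    return [q for q in queries if QUESTION_PATTERN.search(q)]

# Hypothetical queries exported from a keyword report
queries = [
    "what is regex",
    "regex seo",
    "how to use regex in search console",
    "however regex",
]
print(question_queries(queries))
# → ['what is regex', 'how to use regex in search console']
```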

Basic Regex Skills: Operators

Before going further, you need to understand operators to use Regex effectively. Regex operators can be grouped into five categories:

  1. Character classes
  2. Wildcards
  3. Anchors
  4. Groups
  5. Escape characters

Each operator represents a type of character or an instruction. Here are some of the main operators.

Character classes

Character classes are sets, or types, of characters.

  • \d – It matches any one digit.
  • \D – It matches any one character that is not a digit.
  • \w – It matches any one “word character” (letters, numbers, underscore).
  • \s – It matches any whitespace (spaces, tabs, …).
  • \S – It matches any character that is not whitespace.
  • (?-i) – Makes matching case-sensitive for all the characters that follow it (strictly speaking, this is a flag rather than a character class).
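Here is a quick sketch of the main character classes, using Python's re module on an invented string. One caveat: Python's \d and \w also match non-ASCII digits and letters, whereas in RE2 (the syntax used by Google's tools) they are limited to ASCII.

```python
import re

# Hypothetical sample: letters, digits, spaces, a tab and an underscore
text = "Page 2 of 10\tupdated_2021"

digits = re.findall(r"\d", text)        # \d: one digit at a time
words = re.findall(r"\w+", text)        # \w+: runs of letters, digits, underscores
spaces = re.findall(r"\s", text)        # \s: the three spaces and the tab
non_digits = re.findall(r"\D", "a1b2")  # \D: anything except a digit

print(digits)       # → ['2', '1', '0', '2', '0', '2', '1']
print(words)        # → ['Page', '2', 'of', '10', 'updated_2021']
print(len(spaces))  # → 4
print(non_digits)   # → ['a', 'b']
```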

Wildcards

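Wildcards and related operators don’t specify the exact character they match; instead, they match any character, control how many times the previous character repeats, or offer alternatives.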

  • Dot (.) – It matches any single character (a letter, number, or symbol).
  • Question Mark (?) – It matches the previous character 0 or 1 time.
  • Plus Sign (+) – It matches the previous character 1 or more times.
  • Asterisk (*) – It matches the previous character 0 or more times.
  • Pipe (|) – Creates an OR match.
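The wildcards above can be tried out with a small helper. The helper name matches() and the sample words are hypothetical. It uses re.search(), which looks for the pattern anywhere in the text.

```python
import re

def matches(pattern, text):
    """True if the pattern matches anywhere in text (a partial match)."""
    return re.search(pattern, text) is not None

# . : any single character
print(matches(r"b.g", "bag"), matches(r"b.g", "bg"))                # → True False
# ? : previous character 0 or 1 time
print(matches(r"colou?r", "color"), matches(r"colou?r", "colour"))  # → True True
# + : previous character 1 or more times
print(matches(r"go+gle", "google"), matches(r"go+gle", "ggle"))     # → True False
# * : previous character 0 or more times
print(matches(r"go*gle", "ggle"))                                   # → True
# | : either alternative
print(matches(r"seo|sem", "sem tools"))                             # → True
```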

Anchors

Anchors specify where in the string the pattern must match.

  • Caret (^) – It indicates that the Regex should match the characters at the beginning of the string or line, rather than anywhere in the string.
  • Dollar Sign ($) – It indicates that the Regex should match the characters at the end of the string or line, rather than anywhere in the string.
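This sketch shows how ^ and $ change which page paths match. The paths are hypothetical.

```python
import re

# Hypothetical page paths
paths = ["/blog/regex-guide", "/en/blog/", "/blog/"]

starts_with_blog = [p for p in paths if re.search(r"^/blog/", p)]
ends_with_blog = [p for p in paths if re.search(r"/blog/$", p)]
exactly_blog = [p for p in paths if re.search(r"^/blog/$", p)]  # both anchors

print(starts_with_blog)  # → ['/blog/regex-guide', '/blog/']
print(ends_with_blog)    # → ['/en/blog/', '/blog/']
print(exactly_blog)      # → ['/blog/']
```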

Groups

Groups are ways to group elements in the Regex.

  • Parentheses (()) – They group part of the pattern and “capture” the characters it matches, so that operators such as | or ? can apply to the whole group. You can use multiple capturing groups; they are numbered in the order in which their opening parentheses appear.
  • Square Brackets ([]) – They match any one of the characters enclosed inside them. For example, [aeiou] matches a single vowel.
  • Dash (-) – It is used within square brackets to indicate a range of characters, like 0-9 or A-Z.
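Here is a sketch of groups, brackets and ranges working together on a hypothetical URL path. In Python, match.group(n) returns the text captured by the nth group.

```python
import re

# Hypothetical URL path
path = "/blog/2021/regex-for-seo"

# ( ) capture the section and the year; [0-9] is a range inside brackets
match = re.search(r"/(blog|news)/([0-9]+)/", path)
if match:
    print(match.group(1))  # → blog  (first group)
    print(match.group(2))  # → 2021  (second group)

# [aeiou] matches exactly one of the listed characters per match
print(re.findall(r"[aeiou]", "regex"))  # → ['e', 'e']
```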

Escape

The escape character allows you to use a character literally even if it is usually interpreted as an operator.

  • Backslash (\) – Indicates that the character following it should be interpreted literally rather than as a Regex operator.
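Escaping matters most for the dot in domain names and file extensions. The domain below is a placeholder. re.escape() is a Python helper, not part of the Regex syntax itself.

```python
import re

# Unescaped, "." is a wildcard, so this also matches an unintended string
print(bool(re.search(r"example.com", "examplexcom")))            # → True
# Escaped, "\." only matches a literal dot
print(bool(re.search(r"example\.com", "examplexcom")))           # → False
print(bool(re.search(r"example\.com", "www.example.com/page")))  # → True
# re.escape() adds the needed backslashes for you
print(re.escape("example.com"))                                  # → example\.com
```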

Now let’s look at a few basic examples of how it is used in Google Analytics and Google Search Console.

How to filter tables in Google Analytics

Google Analytics is a free tool that helps you analyze the user journey on your website using data such as:

  • Audience: demographic information
  • Acquisition: how the user arrived on your site
  • Behavior: what the user does on your site
  • Conversion: whether the user accomplishes the sales or marketing goals you set for them on your site

We can use Regex to filter the data in Google Analytics and understand user behavior.

In the above image, the Regex /ebooks/|/tools/ filters two pages out of the 1,000 pages on the website using the | (pipe), which means “or”. This string can be read as: “Find only pages that contain either /ebooks/ or /tools/.”
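To reproduce this filter outside Google Analytics, here is a minimal Python sketch. The page list is hypothetical. The code assumes, as in the example above, that the filter keeps a page when the pattern matches anywhere in its path.

```python
import re

PAGE_FILTER = re.compile(r"/ebooks/|/tools/")

# Hypothetical page paths, like those in a pages report
pages = ["/ebooks/seo-guide", "/blog/regex", "/tools/log-analyzer", "/about/"]

# Keep only pages that contain /ebooks/ or /tools/ anywhere in the path
filtered = [p for p in pages if PAGE_FILTER.search(p)]
print(filtered)  # → ['/ebooks/seo-guide', '/tools/log-analyzer']
```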

In the same way, you can combine the other operators to understand user behavior and the pages users visit on your website.

How to Filter Queries in Google Search Console

Google Search Console is another important tool alongside Google Analytics. It shows how Google displays your pages in search results, helps diagnose technical SEO issues, and provides data related to user behavior in search.

In April 2021, Google Search Console added a Regex filter option, which makes much more advanced data filtering possible. You can filter for values that:

  • Match a Regex
  • Don’t match a Regex
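A common use of these two options is to separate branded from non-branded queries. The sketch below mimics both filters in Python. The brand name acme, its misspelling and the query list are all hypothetical.

```python
import re

# Hypothetical brand pattern: replace with your own brand and its misspellings
BRAND = re.compile(r"acme|acm3")

# Hypothetical queries
queries = ["acme seo tool", "regex tutorial", "acm3 login", "log file analysis"]

branded = [q for q in queries if BRAND.search(q)]          # "Matches regex"
non_branded = [q for q in queries if not BRAND.search(q)]  # "Doesn't match regex"

print(branded)      # → ['acme seo tool', 'acm3 login']
print(non_branded)  # → ['regex tutorial', 'log file analysis']
```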

GSC offers many reports, among which the Performance report stands out. It provides information such as:

  • Total Clicks
  • Total Impressions
  • Average CTR
  • Average Position
  • Queries (Keywords up to 1000)
  • Pages that are ranking
  • Countries
  • Devices
  • Search Appearance
  • Dates

At the very top of the report, there are filtering options. To use Regex, you need to click on the option “+New”.

You can use Regex to filter Queries, Pages, Countries, Devices, and Search Appearance.

Here is a basic example of filtering for the phrase “digital agency”, or for phrases with some other text between “digital” and “agency” (like “digital communication agency” and “what is the digital expertise of an SEO agency”), using the Regex digital.+agency:

Here are the results:
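To reproduce the logic of this filter outside Search Console, here is a minimal Python sketch. The query list is hypothetical. re.search() is used because the pattern can match anywhere in a query.

```python
import re

PATTERN = re.compile(r"digital.+agency")

# Hypothetical queries; the first three mirror the examples above
queries = [
    "digital agency",
    "digital communication agency",
    "what is the digital expertise of an SEO agency",
    "digitalagency",
    "seo agency",
]

matching = [q for q in queries if PATTERN.search(q)]
print(matching)
# → ['digital agency', 'digital communication agency',
#    'what is the digital expertise of an SEO agency']
```

Note that "digitalagency" is excluded: .+ requires at least one character between the two words. Using .* instead would include it.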

Why Use Regex?

Although you first have to learn how patterns and operators work, Regex can be a useful tool for many SEO practitioners. It can help with identifying search intent, analyzing content, understanding user behavior, and more.

The future of SEO depends on data and on quickly identifying the technical issues that need to be addressed.

Many tools use data filtering to provide more information about a website. These include Ahrefs and SEMrush, crawlers like Oncrawl, and also Google Analytics and Google Search Console.

To use Regex, you first need to understand its operators and characters; once you do, the ways to benefit from them become clear. Regex filters help you use the available data to figure out search intent and focus on the search queries that bring users to your website.

Yes, the goal of SEO is to get traffic and rank keywords at the top. But the top priority is to get more conversions and sales. Regex can help you turn your website into a conversion machine.

Ashkar Gomez, founder of 7 Eagles, a digital marketing company in India, has more than 7 years of experience in SEO and other digital marketing services. He also works on projects worldwide and provides SEO consulting to corporate companies and startups.