Introducing new hreflang metrics and language clusters

August 2, 2018 - 10 min reading time - by Rebecca Berbel

We’re excited to present Oncrawl’s extended hreflang metrics.

Using hreflang tags allows Google to offer users the correct regional or language version of your page. Hreflang tags send a strong signal to Google that one page is a translation or localization of another. This can help new translations rank, and it lets search engines show users content appropriate to their language or country. Hreflang tags are a key element of an internationalized website.

You can find our new metrics on the Rel alternate dashboard, in the Data explorer, and in the Dashboard builder.

Hreflang metrics in the Oncrawl report

Hreflang data analyzed by Oncrawl

Hreflang declarations for your site’s pages can be made in three ways. Whichever method you use, a declaration must be made for each localized version of the page, including the current or original page itself.

Hreflangs can be declared:

  • by including <link> tags in the page’s HTML, in the <head> section. These tags take the following form:

<link rel="alternate" hreflang="en-US" href="https://www.yoursite.com/your_translated_page" />

  • by including a reference in the page’s HTTP headers, particularly if the page is a file that isn’t in HTML format. These declarations take the following form, with each URL followed by its own rel and hreflang parameters:

Link: <https://www.yoursite.com/your_original_page>; rel="alternate"; hreflang="fr-FR", <https://www.yoursite.com/your_translated_page>; rel="alternate"; hreflang="en-US"

  • by including references in sitemaps. These references look like this:

<url>
  <loc>https://www.yoursite.com/your_original_page</loc>
  <xhtml:link
    rel="alternate"
    hreflang="en-US"
    href="https://www.yoursite.com/your_translated_page"/>
</url>
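The first two formats can be read with nothing more than Python's standard library. The following is a minimal sketch, not Oncrawl's parser. The function names are hypothetical, and the HTML side relies on html.parser. The Link header parser is deliberately simplified: it assumes that URLs contain no commas or semicolons, whereas a robust version would follow the full Link header grammar.

```python
from html.parser import HTMLParser


class HreflangLinkParser(HTMLParser):
    """Collects (hreflang, href) pairs from <link rel="alternate" hreflang=...> tags."""

    def __init__(self):
        super().__init__()
        self.declarations = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also reach this method, because
        # HTMLParser's default handle_startendtag calls handle_starttag.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        hreflang = attributes.get("hreflang")
        href = attributes.get("href")
        if "alternate" in rel_values and hreflang and href:
            self.declarations.append((hreflang, href))


def hreflang_from_html(html):
    """Return the (hreflang, href) pairs declared in an HTML document."""
    parser = HreflangLinkParser()
    parser.feed(html)
    parser.close()
    return parser.declarations


def hreflang_from_link_header(header_value):
    """Return (hreflang, url) pairs from the value of an HTTP Link header.

    Simplified: assumes the URLs contain no commas or semicolons.
    """
    declarations = []
    for entry in header_value.split(","):
        parts = [part.strip() for part in entry.split(";")]
        url = parts[0].strip("<>")
        params = {}
        for part in parts[1:]:
            if "=" in part:
                key, value = part.split("=", 1)
                params[key.strip().lower()] = value.strip().strip('"')
        if params.get("rel") == "alternate" and params.get("hreflang"):
            declarations.append((params["hreflang"], url))
    return declarations


if __name__ == "__main__":
    page = ('<head><link rel="alternate" hreflang="en-US" '
            'href="https://www.yoursite.com/your_translated_page" /></head>')
    print(hreflang_from_html(page))
    header = ('<https://www.yoursite.com/your_original_page>; rel="alternate"; hreflang="fr-FR", '
              '<https://www.yoursite.com/your_translated_page>; rel="alternate"; hreflang="en-US"')
    print(hreflang_from_link_header(header))
```

Both functions return the same (code, URL) shape. Declarations from either source can therefore be merged before any cluster analysis.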

Oncrawl examines each page with these tags and analyzes the results.
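Sitemap declarations are read from the sitemap file rather than from each page. The following minimal Python sketch maps each <loc> URL in an hreflang sitemap to the alternates declared in its <url> entry. It is an illustration, not how Oncrawl reads sitemaps, and it assumes the standard sitemap namespace plus the XHTML namespace that the xhtml:link elements require on the enclosing <urlset>.

```python
import xml.etree.ElementTree as ET

# Namespaces used by hreflang-annotated sitemaps.
NAMESPACES = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}


def hreflang_from_sitemap(xml_text):
    """Map each <loc> URL to the (hreflang, href) pairs declared for it."""
    root = ET.fromstring(xml_text)
    result = {}
    for url_entry in root.findall("sm:url", NAMESPACES):
        loc = url_entry.findtext("sm:loc", default="", namespaces=NAMESPACES).strip()
        pairs = []
        for link in url_entry.findall("xhtml:link", NAMESPACES):
            if link.get("rel") == "alternate" and link.get("hreflang"):
                pairs.append((link.get("hreflang"), link.get("href")))
        result[loc] = pairs
    return result


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.yoursite.com/your_original_page</loc>
    <xhtml:link rel="alternate" hreflang="en-US"
                href="https://www.yoursite.com/your_translated_page"/>
  </url>
</urlset>"""

if __name__ == "__main__":
    print(hreflang_from_sitemap(SAMPLE))
```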

Clusters: grouping pages by equivalent content

The objective of listing hreflangs is to indicate which pages contain equivalent content targeted for speakers of different languages or for visitors in different national regions.

Oncrawl helps make that visible. All pages that reference one another using hreflangs are grouped together in a cluster. This cluster contains all of the translations (and, of course, the original content) of a page. Ideally, each page of the cluster should contain references to every other page in the cluster, including itself:

Ideal hreflang cluster
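Grouping pages into clusters amounts to finding connected components in the graph of hreflang declarations. Here is a minimal Python sketch using union-find. It assumes the declarations are available as a dictionary mapping each page URL to the URLs it declares. That input format and the example URLs are hypothetical, not an Oncrawl export.

```python
def build_clusters(declarations):
    """Group pages that reference one another via hreflang into clusters.

    declarations maps a page URL to the URLs it declares as alternates.
    Links are treated as undirected, so a one-way declaration still puts
    both pages in the same cluster.
    """
    parent = {}

    def find(page):
        parent.setdefault(page, page)
        while parent[page] != page:
            parent[page] = parent[parent[page]]  # path halving
            page = parent[page]
        return page

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for page, targets in declarations.items():
        find(page)  # register pages even if they declare nothing
        for target in targets:
            union(page, target)

    clusters = {}
    for page in list(parent):
        clusters.setdefault(find(page), set()).add(page)
    return sorted(clusters.values(), key=lambda cluster: sorted(cluster))


if __name__ == "__main__":
    example = {
        "/en/": {"/en/", "/fr/", "/it/"},
        "/fr/": {"/en/", "/fr/"},
        "/it/": {"/en/"},
        "/about": set(),
    }
    print(build_clusters(example))
```

Treating links as undirected matches the idea that any reference, in either direction, ties two pages together. The direction of each link matters only later, when you check for missing links.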

In order to make it easy to fix potential problems, we highlight typical cluster errors for you. Common errors include:

Hreflang cluster with missing links

What we’ll tell you:

  • The cluster contains missing links.
  • Italian page (it): this page doesn’t declare an hreflang link to itself.
  • French page (fr-FR): this page is missing an inbound link from the Belgian page.
  • Belgian page (fr-BE): this page is missing an outbound link to the French page.

Hreflang cluster with missing links in both directions

What we’ll tell you:

  • The cluster contains missing links.
  • Italian page (it): this page doesn’t declare an hreflang link to itself.
  • French page (fr-FR): this page is missing an inbound link from the Belgian page, and an outbound link to the Belgian page.
  • Belgian page (fr-BE): this page is missing an inbound link from the French page, and an outbound link to the French page.

Hreflang cluster with non-indexable page

What we’ll tell you:

  • The cluster contains missing links and a non-indexable page.
  • Italian page (it): this page doesn’t declare an hreflang link to itself.
  • French page (fr-FR): this page is missing an inbound link from the Belgian page, and an outbound link to the Belgian page.
  • Belgian page (fr-BE): this page is non-indexable. It is also missing an inbound link from the French page, and an outbound link to the French page.
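The checks behind these messages can be approximated in a few lines. The following Python sketch assumes that a cluster is a set of URLs, that declarations maps each URL to the URLs it declares, and that the non-indexable URLs are already known. The URLs are hypothetical stand-ins for the Italian, French and Belgian pages in the last example. The function is an illustration, not Oncrawl's implementation.

```python
def audit_cluster(cluster, declarations, non_indexable=frozenset()):
    """Report per-page hreflang problems inside one cluster.

    In an ideal cluster every page declares every page, itself included.
    """
    report = {}
    for page in sorted(cluster):
        declared = set(declarations.get(page, ()))
        others = cluster - {page}
        issues = []
        if page not in declared:
            issues.append("no self-referencing hreflang")
        for target in sorted(others - declared):
            issues.append(f"missing outbound link to {target}")
        for source in sorted(others):
            if page not in declarations.get(source, ()):
                issues.append(f"missing inbound link from {source}")
        if page in non_indexable:
            issues.append("non-indexable page declared in cluster")
        if issues:
            report[page] = issues
    return report


if __name__ == "__main__":
    declarations = {
        "/it/": {"/fr-fr/", "/fr-be/"},
        "/fr-fr/": {"/fr-fr/", "/it/"},
        "/fr-be/": {"/fr-be/", "/it/"},
    }
    cluster = {"/it/", "/fr-fr/", "/fr-be/"}
    for page, issues in audit_cluster(cluster, declarations, {"/fr-be/"}).items():
        print(page, issues)
```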

We’ll also tell you the source of each translation in the cluster (HTTP headers, HTML tags, or the address of a sitemap). And we let you know if we find clusters that are too big. (This usually happens when a bunch of pages all list the home page as their translation.)
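Oncrawl doesn't publish its exact rule for "too big" here, so the following Python sketch uses two heuristics of its own, both assumptions:

  • a size limit, with a hypothetical default of 50 pages;
  • more than one page in the same cluster sharing an hreflang code, which a clean cluster should not need.

The page_language input, which maps each URL to the code it is declared with, is also a hypothetical format.

```python
from collections import Counter


def oversized_clusters(clusters, page_language, max_size=50):
    """Flag clusters that look too big to be a single set of translations.

    Heuristics (assumptions for this sketch, not Oncrawl's rules):
    - more than max_size pages (hypothetical threshold);
    - several pages in the cluster share the same hreflang code.
    page_language maps each URL to the hreflang code it is declared with.
    """
    flagged = []
    for cluster in clusters:
        codes = Counter(page_language.get(page) for page in cluster)
        duplicated = sorted(code for code, count in codes.items() if code and count > 1)
        if len(cluster) > max_size or duplicated:
            flagged.append((sorted(cluster), duplicated))
    return flagged


if __name__ == "__main__":
    clusters = [{"/", "/blog/a", "/blog/b", "/de/"}, {"/en/x", "/de/x"}]
    page_language = {"/": "en", "/blog/a": "en", "/blog/b": "en",
                     "/de/": "de", "/en/x": "en", "/de/x": "de"}
    print(oversized_clusters(clusters, page_language))
```

In the example, the blog posts that declare the home page as their translation end up in the same cluster as the home page. That cluster is flagged because "en" appears three times.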

Using Oncrawl to improve your internationalization

Prevent Google from ignoring your hreflang declarations

Google representatives have stated that although Google will try to resolve certain hreflang errors on its own, certain issues will make Google ignore your hreflang declarations. One example of an error that will cause Google to ignore your declarations is listing a non-indexable page.

With our new metrics, you can find these pages and correct them in just a few clicks:

  • In the Crawl Report, navigate to “Indexability”, then to the “Rel alternate” dashboard.
  • Among the first statistics is “Non-indexable pages declared as hreflang”.
  • Click on the number to go directly to the list of these pages in the Data Explorer.

If you need more information, you can click on any URL for a detailed analysis, or add columns showing possible reasons why a page might not be indexable. Useful columns include:

  • Meta robots: forbidding bots in the meta declarations in the page HTML will prevent a page from being indexed
  • Denied by robots.txt: forbidding bots via the robots.txt file will prevent a page from being indexed
  • Canonical evaluation: a non-canonical page (a page whose canonical declaration points to a different URL) will not be indexed
  • Status code: pages that return 3xx, 4xx or 5xx status codes are not indexed at that URL
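The four checks above can be combined into one function. The following minimal Python sketch assumes you already have each page's status code, meta robots content, canonical URL and the site's robots.txt text. It uses the standard urllib.robotparser module for the robots.txt check, and it treats any status other than 200 as a problem, which is a simplification. The function name and example URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def non_indexable_reasons(url, status_code, meta_robots="", canonical=None,
                          robots_txt="", user_agent="Googlebot"):
    """List possible reasons a URL may not be indexable (empty list if none)."""
    reasons = []
    if status_code != 200:
        reasons.append(f"status code {status_code}")
    directives = {d.strip().lower() for d in meta_robots.split(",")}
    if "noindex" in directives or "none" in directives:
        reasons.append("meta robots noindex")
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, url):
        reasons.append("denied by robots.txt")
    if canonical and canonical != url:
        reasons.append(f"canonical points to {canonical}")
    return reasons


if __name__ == "__main__":
    robots_txt = "User-agent: *\nDisallow: /private/\n"
    print(non_indexable_reasons("https://www.mysite.com/private/page", 301,
                                "noindex", "https://www.mysite.com/other", robots_txt))
```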

Find all translations in X language with Y problem

You can now use Oncrawl to find all of your translations that use an invalid language code.

  • In the Crawl Report, navigate to “Indexability”, then to the “Rel alternate” dashboard.
  • Scroll down to the “Hreflang issues” chart near the bottom.
  • Click on “Incorrect language code” to view a list of these pages in the Data Explorer.
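An "incorrect language code" check starts with the code's shape: a two-letter language code, optionally followed by a script and a two-letter region, or "x-default". The following Python sketch illustrates that kind of check; it is not Oncrawl's rule set. The known-code sets are deliberately tiny, hypothetical subsets. A real check would load the full ISO 639-1 language list and ISO 3166-1 alpha-2 country list.

```python
import re

# Tiny illustrative subsets; a real check would use the full ISO lists.
KNOWN_LANGUAGES = {"de", "en", "es", "fr", "it", "nl"}
KNOWN_REGIONS = {"BE", "DE", "ES", "FR", "GB", "IT", "LA", "MX", "NL", "US"}

# language, optional 4-letter script, optional 2-letter region
HREFLANG_PATTERN = re.compile(r"^([a-z]{2})(?:-([a-z]{4}))?(?:-([a-z]{2}))?$", re.IGNORECASE)


def check_hreflang_code(code):
    """Return None if the code looks valid, otherwise a short problem description."""
    if code.lower() == "x-default":
        return None
    match = HREFLANG_PATTERN.match(code)
    if not match:
        return "malformed code"
    language, _script, region = match.groups()
    if language.lower() not in KNOWN_LANGUAGES:
        return f"unknown language '{language}'"
    if region and region.upper() not in KNOWN_REGIONS:
        return f"unknown region '{region}'"
    return None


if __name__ == "__main__":
    for code in ["en-US", "en_US", "en-UK", "es-LA", "x-default"]:
        print(code, check_hreflang_code(code))
```

"en_US" fails because of the underscore. "en-UK" fails because the ISO code for the United Kingdom is GB. "es-LA" passes, because LA is a real country code.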

Sometimes a basic check of the language and country codes isn’t enough. But that’s okay: we can take you further.

For example: a common error we see is using the language and region code “es-LA” for Latin American Spanish, although LA is the country code for Laos.

Because LA is a valid country code, it’s unlikely to raise a red flag in error reports. However, you can still search for “es-LA” and find all of the pages that use it:

From the Data Explorer, use the Oncrawl Query language to set up the following rule:

Hreflangs – one of the values equals – es-LA

hreflang filter for language code

Unless you’re targeting Spanish speakers in Laos, if your hreflang declarations are in perfect order you won’t find any results.
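The same search is straightforward to reproduce offline on exported data. This Python sketch assumes a hypothetical mapping from each page URL to its list of hreflang values. It returns pages where at least one value equals the requested code, comparing case-insensitively because hreflang values are not case-sensitive.

```python
def pages_using_code(page_hreflangs, code):
    """Return pages where at least one hreflang value equals code (case-insensitive)."""
    target = code.lower()
    return sorted(
        page for page, values in page_hreflangs.items()
        if any(value.lower() == target for value in values)
    )


if __name__ == "__main__":
    data = {"/es/": ["es-ES", "es-LA", "en"], "/en/": ["en", "es-ES"], "/fr/": []}
    print(pages_using_code(data, "es-LA"))
```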

Find pages in language X without a translation in language Y

We can now help you find the pages you don’t have translations for yet.

Perhaps your site exists in Dutch and French, and now you’ve started adding German translations.

Use the Oncrawl Query Language in the Data Explorer to find pages that already declare hreflang translations (in this case, Dutch and French versions) but no German one. Set up the following rules:

Hreflangs – has at least one value

AND

Hreflangs – none of the values equals – de

Hreflang search for missing language

The resulting table will show you the pages for which you do not yet have (or for which you do not yet declare) German translations.
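The two rules translate directly into a filter over exported data. This Python sketch assumes the same kind of hypothetical page-to-hreflang-values mapping. It differs from an exact "equals" in one deliberate way: it compares only the language part of each code, so a page declaring "de-DE" or "de-AT" counts as already having a German version.

```python
def pages_missing_language(page_hreflangs, language):
    """Pages that declare at least one hreflang but none in the given language.

    Mirrors "has at least one value" AND "none of the values equals <language>",
    except that region variants such as "de-AT" also count as the language.
    """
    lang = language.lower()
    missing = []
    for page, values in page_hreflangs.items():
        if not values:
            continue  # no hreflang at all: not part of this search
        languages = {value.lower().split("-")[0] for value in values}
        if lang not in languages:
            missing.append(page)
    return sorted(missing)


if __name__ == "__main__":
    data = {"/nl/a": ["nl", "fr"], "/nl/b": ["nl", "fr", "de-DE"], "/contact": []}
    print(pages_missing_language(data, "de"))
```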

The rest is up to you!

Crawl two websites on different domains connected by hreflang

Oncrawl handles cases where each language is implemented on its own domain, for example, https://www.mysite.co.uk and https://www.mysite.de.

In order to get an analysis of the hreflang implementation, set your crawl to cover both domains. To do this, add both domains as “Start URL” in the Oncrawl crawl settings.

From the crawl settings page, click on “Start URL” to unfold the section. In the field below the statement “You can define additional start URLs to your crawl”, add start URLs for all additional languages linked by hreflang.

Then start your crawl as usual.

This will add hreflang data, as well as standard crawl data, for each of the domains listed as start URLs.
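Before launching the crawl, it can help to check which hreflang targets live on hosts that none of your start URLs cover. Those targets would fall outside a crawl limited to the start URLs' hosts. This minimal Python sketch compares hostnames only; the function name and example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def targets_outside_crawl(start_urls, hreflang_targets):
    """Return hreflang target URLs whose host matches none of the start URLs."""
    crawled_hosts = {urlsplit(url).hostname for url in start_urls}
    return sorted(
        target for target in hreflang_targets
        if urlsplit(target).hostname not in crawled_hosts
    )


if __name__ == "__main__":
    targets = ["https://www.mysite.co.uk/page", "https://www.mysite.de/seite"]
    print(targets_outside_crawl(["https://www.mysite.co.uk/"], targets))
```

Any URL this returns points to a domain that probably still needs to be added as a start URL.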

Get a global view of your internationalization

Are these examples too detailed? Did you just want an overview of what’s going on with hreflangs on your website?

Or are you an hreflang guru who just wants to check now and then to make sure everything’s still in perfect working order?

A quick glance at our charts (you can add them to a custom dashboard if that’s easier for you) may be all you need.

Hreflang metrics

Tips and tricks for getting the most out of our hreflang analysis

Exploring the hreflang details for a URL

When looking at results in the Data Explorer, you can find extensive details on a URL’s hreflang environment in the column “Hreflang error details”. If this column is not automatically present in your report, then you can add it by clicking “Add columns” and searching for “Hreflang error details”.

Oncrawl hreflang error results

Click on “View details” to open the hreflang details window.

Oncrawl hreflang details window

This window presents three types of information:

  • All pages that this page declares as translations using hreflang. You can explore these translation pages from here, and you can also download this list as a CSV file.
  • A link to all of the pages in this page’s hreflang cluster, with the number of pages the cluster contains and where we found the hreflang declaration for each page in the cluster.
  • All of the hreflang errors found for this page. Expand each error to show details, including an explanation of the error, and a list of the URLs for which something went wrong. For example, in the case of “missing outbound declarations” you’ll find the explanation that this error means the page is missing declarations to other pages in the same cluster, and a list of the pages that it does not declare.

Using the URL navigation shortcut menu

When looking at URLs in Data Explorer results, you can use the quick shortcut menu next to each URL to jump to hreflang information for that URL.

Oncrawl URL navigation shortcuts

When looking at a page with hreflang errors, you may frequently find yourself wanting to examine the other pages in the same cluster. Using the URL’s shortcut menu, choose “View all pages in its hreflang cluster” to go directly to the list.

Oncrawl URL navigation shortcut menu

You may also be interested in viewing all of the pages that list a URL as an hreflang translation. From the URL’s shortcut menu, choose “Get hreflang pointing to this page.” This is especially helpful if you have errors resulting in very large clusters: usually, this means that many pages with unrelated content declare the same page as a translation. (The page they declare as a translation is often the home page.)
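Finding the pages that point to a URL amounts to inverting the declaration map. This Python sketch assumes a hypothetical dictionary from each page to the URLs it declares. It builds that reverse index, ignoring self-references, and lists the most-declared targets; a home page that appears at the top is a likely cause of an oversized cluster.

```python
from collections import defaultdict


def hreflang_pointing_to(declarations):
    """Invert page -> declared alternates into target -> pages declaring it."""
    inbound = defaultdict(set)
    for page, targets in declarations.items():
        for target in targets:
            if target != page:  # ignore self-references
                inbound[target].add(page)
    return inbound


def most_declared(declarations, top=3):
    """Return (number of declaring pages, target) pairs, most declared first."""
    inbound = hreflang_pointing_to(declarations)
    return sorted(((len(pages), target) for target, pages in inbound.items()), reverse=True)[:top]


if __name__ == "__main__":
    declarations = {
        "/blog/a": {"/blog/a", "/"},
        "/blog/b": {"/blog/b", "/"},
        "/blog/c": {"/"},
        "/": {"/", "/de/"},
        "/de/": {"/", "/de/"},
    }
    print(most_declared(declarations))
```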

Hreflang cluster pointing to homepages

Setting up a crawl to optimize hreflang analysis

Don’t forget that your crawl reports can only show you what you’ve chosen to crawl. When you start running crawls in order to take advantage of hreflang reporting, you’ll need to keep your site’s structure in mind.

If you set your site up with international pages…

  • …in separate domains, such as mysite.fr and mysite.de, you will need to make sure to list all language or regional domains as start URLs.
  • …in separate subdomains, such as fr.mysite.com and de.mysite.com, you will need to make sure to list all language or regional subdomains as start URLs.
  • …in separate directories, such as https://www.mysite.com/fr and https://www.mysite.com/de, you will need to make sure that our bot has access to both directories. We suggest listing both directories as start URLs.
  • …using separate parameters, such as https://www.mysite.com?lang=fr and https://www.mysite.com?lang=de, be aware that this strategy is not recommended for SEO. Parameter-based URLs are hard for bots to explore, so search engines have difficulty indexing these pages correctly, and our bot will run into the same problems. However, this won’t keep you from crawling your site to find out what Google sees. You will need to make sure that our bot has access to parameters: in the crawl settings, click on “URL with parameters” to expand the section, and make sure that the “Crawl URLs with parameters” box is checked. If you use other parameters that you don’t want to crawl, you can also check the “Filter URL parameters” box to only include language parameters or to exclude non-language parameters, whichever is easier. List both URLs, parameters included, as start URLs. If your results aren’t satisfying, make sure all links on your site include the language parameter and try again.
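To see what "only include language parameters" means for a URL, here is a minimal Python sketch that strips every query parameter except the language one. It illustrates the idea behind the filter, not Oncrawl's implementation. The parameter name "lang" matches the examples above, but the set of allowed parameters is an assumption you would adapt to your site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: the query parameter(s) that select the language on your site.
LANGUAGE_PARAMS = {"lang"}


def keep_language_params(url, allowed=LANGUAGE_PARAMS):
    """Drop every query parameter except the allowed language parameter(s)."""
    parts = urlsplit(url)
    kept = [(key, value) for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key in allowed]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(keep_language_params("https://www.mysite.com/?lang=fr&sessionid=123&utm_source=x"))
```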

Availability

This feature does not require any setup. It is included in all of the plans and packages.

For more information about Oncrawl and hreflang analysis, check out this article in our knowledge base.

Rebecca Berbel
Rebecca is the Product Marketing Manager at Oncrawl. Fascinated by NLP and machine models of language in particular, and by systems and how they work in general, Rebecca is never at a loss for technical SEO subjects to get excited about. She believes in evangelizing tech and using data to understand website performance on search engines. She regularly writes articles for the Oncrawl blog.