How do you manage hreflang declarations for international content spread across a large number of country-specific sites? At OnCrawl, our Customer Success Managers have worked out a few techniques to help several clients in this position. Some of them are quite simple, thanks to built-in features that do much of the work for you.

How to set up a cross-domain crawl

The first step is to make sure all of your data is available within the same project, as the result of a single crawl. We recommend creating a separate project for this type of analysis.

Set up the project using the URL of the homepage of the most extensive or original version of the website.

Set up a first crawl with the following characteristics.

Start URL:

  • Spider mode
  • Start URL: the project URL
  • Additional start URLs: list each of the additional home pages for the country-specific sites, one per line.

Crawl limits:

  • Max URLs: make sure that this value is large enough to cover all of the pages on all of the sites you’ve listed. If you have 1000 pages per site and 14 sites to cover, set the max URLs to at least 14000 URLs.
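The start URL list and the Max URLs value above come down to simple arithmetic. Here is a minimal Python sketch of it, assuming you have a rough page estimate for each site. The domains, the `crawl_plan` helper and the `margin_percent` parameter are hypothetical and are not part of OnCrawl.

```python
# Hypothetical page estimates per country site; replace with your own figures.
estimated_pages = {
    "https://www.example.com/": 1000,
    "https://www.example.fr/": 1000,
    "https://www.example.de/": 1000,
}


def crawl_plan(pages_by_home, project_url, margin_percent=0):
    """Return (additional start URLs, one per line; suggested Max URLs)."""
    # Every home page except the project URL goes in "Additional start URLs".
    additional = [url for url in pages_by_home if url != project_url]
    total = sum(pages_by_home.values())
    # Optional safety margin, rounded up with integer arithmetic.
    max_urls = total + (total * margin_percent + 99) // 100
    return "\n".join(additional), max_urls


extra_start_urls, max_urls = crawl_plan(estimated_pages, "https://www.example.com/")
print(extra_start_urls)
print(max_urls)
```

With 14 sites of 1000 pages each and no margin, the function returns 14000, which matches the example above.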

Sitemaps (if your hreflang declarations are in one or more sitemaps):

  • Allow soft mode: make sure this option is checked, unless your sitemaps and their locations conform exactly to the sitemaps.org standard.
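If you are not sure what hreflang declarations in a sitemap look like, the sketch below reads them from a small sample using Python's standard library. It assumes the common `xhtml:link rel="alternate"` form. The sample URLs and the `hreflang_from_sitemap` helper are hypothetical, and this is not how OnCrawl itself parses sitemaps.

```python
import xml.etree.ElementTree as ET

# Namespaces used by sitemap files that carry hreflang annotations.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}

# Hypothetical sample sitemap with two language versions of one page.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.example.com/page</loc>
    <xhtml:link rel="alternate" hreflang="en-us" href="https://www.example.com/page"/>
    <xhtml:link rel="alternate" hreflang="fr-fr" href="https://www.example.fr/page"/>
  </url>
</urlset>
"""


def hreflang_from_sitemap(xml_text):
    """Map each <loc> to a {hreflang: href} dict of its declared alternates."""
    root = ET.fromstring(xml_text)
    result = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        alternates = {}
        for link in url.findall("xhtml:link", NS):
            if link.get("rel") == "alternate" and link.get("hreflang"):
                alternates[link.get("hreflang")] = link.get("href")
        result[loc] = alternates
    return result


print(hreflang_from_sitemap(SAMPLE_SITEMAP))
```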

Additional crawl settings should be adjusted in accordance with the characteristics of your sites. For example, you may want to make sure that the crawler ignores URLs with parameters.

If one or more of your sites are built with JavaScript, don’t forget to show the extra settings and enable JavaScript crawling.

How to set up segmentation for cross-domain analysis

Segmenting your site based on a criterion at the core of your analysis (such as country-specific domains) allows you to see trends and exceptions that aren’t otherwise visible.

Segmenting your site by country-specific domains allows you to:

  • Easily switch between data for one domain, the data for another domain, and data for all domains
  • Display breakdowns by domain in certain charts

OnCrawl Trick: We’ve included an automation that allows you to quickly and easily create a domain-based segmentation. Make sure the multi-domain crawl is finished before you begin.

In the Create Segmentation area, follow these steps:

  1. Choose to create a segmentation “From field automatically”.
  2. In the next step, use the drop-down menus to select the following values:
    • Crawl report used for preview: choose the multi-domain crawl that just finished
    • Create from field: URL host
  3. Provide a name for your segmentation. “Domains” works well.

When viewing crawl results, you can now use the menus at the top left of the page to change to the “Domains” segmentation.

To view data for only one site, use the base filter drop-down and select the site you want to focus on.
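Conceptually, a segmentation on URL host places every crawled page in a group named after its host. The Python sketch below shows that grouping on a plain list of URLs. The URLs and the `segment_by_host` helper are hypothetical and only illustrate the idea; OnCrawl builds the segmentation for you.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def segment_by_host(urls):
    """Group URLs by host name, one group per country-specific domain."""
    segments = defaultdict(list)
    for url in urls:
        # hostname is lowercased by urlsplit and excludes any port.
        host = urlsplit(url).hostname or ""
        segments[host].append(url)
    return dict(segments)


crawled = [
    "https://www.example.com/",
    "https://www.example.com/pricing",
    "https://www.example.fr/",
    "https://www.example.de/preise",
]
for host, pages in segment_by_host(crawled).items():
    print(host, len(pages))
```

Filtering on a single site then amounts to looking at one group only.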

How to use data from a cross-domain hreflang report

Data from a multi-domain crawl focusing on hreflang statuses can be used to accomplish many goals. Here are a few of them:

You can establish error reports for the complete ecosystem of international sites:

You now have an overview at a glance that allows you to compare the number of pages per site that implement hreflang tagging:

You can also compare the percentages of each site that are missing hreflang indications. This statistic can also be viewed by number of pages per site. As with all OnCrawl charts, you can hover for details.
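The per-site figures described above can also be reproduced outside the charts. This minimal sketch assumes you have exported a list of pages with their hreflang values. The input format and the `hreflang_coverage` helper are hypothetical, not an OnCrawl export format.

```python
from collections import Counter
from urllib.parse import urlsplit


def hreflang_coverage(pages):
    """pages: iterable of (url, list of hreflang values found for that URL)."""
    total, missing = Counter(), Counter()
    for url, hreflangs in pages:
        host = urlsplit(url).hostname or ""
        total[host] += 1
        if not hreflangs:
            missing[host] += 1
    # Report both the count and the percentage of pages without hreflang.
    return {
        host: {
            "pages": total[host],
            "missing": missing[host],
            "missing_pct": round(100 * missing[host] / total[host], 1),
        }
        for host in total
    }


sample = [
    ("https://www.example.com/", ["en-us", "fr-fr"]),
    ("https://www.example.com/a", ["en-us"]),
    ("https://www.example.com/b", []),
    ("https://www.example.com/c", ["en-us"]),
    ("https://www.example.fr/", []),
    ("https://www.example.fr/a", []),
]
print(hreflang_coverage(sample))
```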

Running regular crawls with this crawl profile will show you how your sites’ hreflang ecosystem evolves over time:

Remember that cross-domain links in your multi-domain crawl are still counted as internal links. Therefore, the internal link flow graphic will look something like this:

You can look up numbers for cross-domain links using the Data Explorer, either by clicking on the link flow diagram or by using an OnCrawl query like this:

OnCrawl Trick: If you link to all of your other versions in the footer, for instance, this will produce a staggering number of links. If the anchor text is just the name of the country, you can filter these links out with a regular expression that uses the “pipe” character (|) between the country names in each applicable language:

Filter footer anchor text for links to different language versions using a single OnCrawl Query Language expression
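To illustrate the pipe-separated pattern, here is a minimal sketch using Python's `re` module on an exported list of links. The country names, the link records and the `drop_footer_links` helper are hypothetical. The regular expression syntax accepted in an OnCrawl query may differ from Python's, so treat this as an illustration of the pattern only.

```python
import re

# Country names as they appear in footer anchors, in each applicable language.
COUNTRY_NAMES = ["France", "Frankreich", "Germany", "Allemagne", "Deutschland"]

# Alternation with the pipe character: matches an anchor that is exactly
# one of the country names, ignoring case and surrounding whitespace.
FOOTER_ANCHOR = re.compile(
    r"^\s*(?:" + "|".join(re.escape(name) for name in COUNTRY_NAMES) + r")\s*$",
    re.IGNORECASE,
)


def drop_footer_links(links):
    """Keep only links whose anchor text is not a bare country name."""
    return [link for link in links if not FOOTER_ANCHOR.match(link["anchor"])]


links = [
    {"target": "https://www.example.fr/", "anchor": "France"},
    {"target": "https://www.example.de/", "anchor": " deutschland "},
    {"target": "https://www.example.fr/tarifs", "anchor": "Pricing in France"},
]
print(drop_footer_links(links))
```

Anchoring the pattern with `^` and `$` keeps editorial links such as “Pricing in France” in the report.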

More generally, you can add the URL host to any Data Explorer report…

…such as the one produced by this OQL query for pages that list a redirected (non-indexable) page as an hreflang equivalent…

…in order to be able to use the column filters to filter the report by website:
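The logic of such a report can be sketched in Python on exported crawl data: for each page, look up the status of every hreflang target and keep the rows where the target redirects, with the host as an extra column to filter on. The data structure and the `pages_with_redirected_alternates` helper are hypothetical. This is not the OQL query itself.

```python
from urllib.parse import urlsplit


def pages_with_redirected_alternates(pages):
    """pages: {url: {"status": int, "hreflang": {lang: href}}}.

    Returns rows of (host, url, lang, href, target_status) for every
    hreflang target that was crawled and answers with a 3xx status.
    """
    report = []
    for url, data in pages.items():
        for lang, href in data["hreflang"].items():
            target = pages.get(href)
            if target is not None and 300 <= target["status"] < 400:
                host = urlsplit(url).hostname or ""
                report.append((host, url, lang, href, target["status"]))
    return report


crawl = {
    "https://www.example.com/a": {
        "status": 200,
        "hreflang": {"fr-fr": "https://www.example.fr/a-old"},
    },
    "https://www.example.fr/a-old": {"status": 301, "hreflang": {}},
    "https://www.example.fr/a": {
        "status": 200,
        "hreflang": {"en-us": "https://www.example.com/a"},
    },
}
for row in pages_with_redirected_alternates(crawl):
    print(row)
```

Filtering the rows on the first column gives the per-website view.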

It’s your turn to make managing a portfolio of international sites easy

Internationalization is a sticky topic. At OnCrawl, we’ve tried to make it just that much easier for you.

You can find all sorts of data in OnCrawl to help you manage your cross-domain hreflang strategy, whether you’re checking hreflang coverage, spotting implementation errors, or looking at the volume of links between separate domains that target different countries or regions.

Not an OnCrawl user yet? It’s a perfect time to start your free trial, gain insights from real data from your website, and benefit from expert help from the Customer Success Managers at OnCrawl.