Multilingual SEO is the practice of optimizing a website for search across multiple languages and regions. It covers the different language sections of a single website as well as a brand entity’s multiple domains for different regions and languages. It includes technical SEO, semantic SEO, local SEO, content-related metrics, and contextual search principles. Because multilingual websites are larger and costlier to process, but also more informative for search engine users, a search engine’s evaluation of them focuses on categorical quality, topical authority, and the cost of document retrieval across different languages.
ForexSuggest.com is the main example for this Multilingual SEO Guideline and Case Study. It has a tremendous number of technical SEO problems, but with a few key changes it still managed to win two of Google’s broad core algorithm updates in summer 2021. In the next sections, you will see the updated GSC performance graphics from September.
When it comes to multilingual SEO, most SEOs focus only on hreflang tags or, if they are a little more careful, on URL patterns as well. In fact, for international and multilingual SEO, there are many more things a search engine can focus on. A search engine can use multinational websites to improve its language understanding, named entity recognition and related algorithms, synonym extraction, and query expansion processes, along with SERP quality. In turn, multilingual websites can use this interest of search engines to strengthen their own authority and presence on the SERP.
An article by Danny Sullivan from 2011 that demonstrates the importance of cross-lingual information retrieval, and an announcement from Google in the same year that shows the importance of multilingual websites.
In this SEO case study, a website that does not even respond most of the time on its “www” version will be shown to win two different broad core algorithm updates in summer 2021.
“By implementing the hreflangs for the most popular and most important sections of the ForexSuggest.com, and by making the different language sections more consistent in terms of design and content amount, we have increased our organic visibility with multilingual SEO.”
Background of the SEO case study and an introduction
This multilingual SEO guideline and case study illustrates the search engines’ perspective, the broad-appeal advantage of multilingual sites, the importance and effectiveness of hreflang tags, and the symmetry between different web page sections, using ForexSuggest.com and its 41 languages as an SEO case study.
Results for the ForexSuggest.com SEO case study are below.
- 435% organic traffic increase in six months
- 164% impression increase
- 157% Active Page Increase
- 58% New Query Count Increase
- 72% CTR Increase
During this SEO case study, ForexSuggest.com published more new content than its competitors. With cross-language and cross-regional content, contextual internal links, and design consistency, and despite a tremendous number of existing technical SEO errors, asymmetrical content, and multilingual website problems, it managed to thrive by winning two of Google’s Broad Core Algorithm Updates during the summer of 2021.
Last 220 Days Comparison of ForexSuggest.com.
What are the technical SEO errors and general look of the ForexSuggest.com project?
ForexSuggest.com had every possible type of technical SEO error, including not responding on every variation of its origin name, such as the “www” version. Below, you will see a general overview of the technical SEO errors on the website.
- Thousands of canonicalization and duplicate content errors
- Hundreds of pages with thin and insufficient content
- Some of these thin content pages are perceived as soft 404
- Tens of thousands of internal redirects
- Tens of thousands of “Crawled not Indexed” URLs which is a signal for quality issues.
- Hundreds of 404 pages
- Thousands of internal 404 links
- Hundreds of discovered but not indexed pages which is a signal for quality and technical problems in the eyes of search engines
- Structured data parsing and property errors, warnings
- 404 URLs in sitemap
- Half of the website’s URLs are not in the sitemaps
- Noindexed URLs are in the sitemap
- Missing canonicals
- Missing web security response headers
- Hundreds of 500-level errors
- Hundreds of Indexed but Blocked errors
- 99% of the website fails to pass Core Web Vitals
- Not responding with the version “www”
- Having mixed content errors for thousands of pages
- The website doesn’t have a URL consolidation for trailing slash or upper-lower case letters
- 80% of the paginations are flawed.
- There are hundreds of internal nofollow
- HTML size, unused code amount, or any kind of PageSpeed and crawl efficiency optimization is missing
- More than one H1 tag, or no proper heading hierarchy for accessibility
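Several items in the list above (the “www” variation, trailing slashes, and upper-lower case letters) are URL consolidation problems. As a rough illustration only, the sketch below groups URL variations that would collapse into one consolidated form. The chosen policy (HTTPS, no “www”, lowercase paths, trailing slash) and the function names are assumptions made for this example, not the setup used on ForexSuggest.com.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Map a URL to one consolidated form.

    Assumed policy (hypothetical): https scheme, host without "www.",
    lowercase path, trailing slash, no fragment.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.lower() or "/"
    if not path.endswith("/"):
        path += "/"
    return urlunsplit(("https", host, path, parts.query, ""))


def find_duplicate_groups(urls):
    """Group crawled URLs that differ only by case, slash, scheme, or www."""
    groups = {}
    for url in urls:
        groups.setdefault(normalize_url(url), []).append(url)
    # Only groups with more than one variation need consolidation.
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Running `find_duplicate_groups` over a crawl export gives a quick list of URL sets that would need a redirect or a canonical tag pointing to a single version.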
A general view of the Excluded section of ForexSuggest.com from Google Search Console. The real error counts are much higher than in the image.
Besides the technical SEO problems, there are also content-related issues in terms of multilingual SEO, such as an asymmetrical website structure, large amounts of untranslated content, and anchor texts, web page designs, and layouts that shift between sections. To be honest, so far I haven’t covered even 40% of the general SEO problems of the case study website. For instance, there are also AMP-related problems, such as an invalid AMP HTML structure and AMP URLs that open as blank pages, for example: “https://forexsuggest.com/fr/examen-infinox/amp/”.
Organic search performance change graphic for ForexSuggest.com over the last 6 months. During these months, five big unconfirmed Google updates, two Broad Core Algorithm Updates, one Spam Update, and one PageSpeed and Page Experience Update happened.
But, I told my Oncrawl friends to keep this article short, and as you noticed already, I failed as always. But, if you want to learn more about multilingual SEO concepts, and if you have an advanced understanding of search engines, their structure, decision trees, and ecosystem, read the long version of the Multilingual SEO Guide and Case Study to encounter further concepts, perspectives, and deep understanding.
An AMP Test Example from Google’s AMP Validation tool for ForexSuggest.com and its result.
Unless you have a decent level of search engine understanding, you should read this brief version that I have prepared for you with Oncrawl instead of the long version.
An empty AMP URL sediment example for the website.
Lastly, do I need to mention that the website has lots of spammy, harmful external links and no proper disavow file yet? So, from every angle of SEO, the website has lots of problems, but it is still able to win every Google update by doubling its traffic every two months.
Checking your new client’s crawl data be like pic.twitter.com/lOytgJL561
— Izzi Smith (@izzionfire) October 19, 2020
When I see all these technical and non-technical problems, I feel like the awesome GIF that Izzi Smith created for SEOs.
Decision-making and prioritization process for this multilingual SEO case study
The ForexSuggest SEO case study is also a good example of how an SEO can decide to spend resources on the right problems to improve overall site quality, clarity, and understandability for search engines and win the broad core algorithm updates. On the content side, most of the content had already been published when I started to work on the project, and publishing has continued since.
This is an example crawl output from Bing Webmaster Tools for ForexSuggest.com. In particular, the HTML size is too large. Errors like these can cause partial indexing and partial caching while making HTML digestion harder for search engines.
To keep the international SEO guidelines tidy, I will give the decision tree that I used, along with possible improvements, to use the whole workforce at the best efficiency from the start.
- I didn’t focus on technical SEO problems because it would take a 20-person team months to solve them.
- I didn’t focus on the issues with the content because the project owner had already invested a lot of money in past content. All I could do was give the writers training and advice to keep the content release frequency higher than the competitors’ while decreasing the asymmetry between different language sections.
- The website had basically two advantages that could show its potential. The first was that being multilingual made it more important than its competitors for the search engines. The second was its broad appeal.
- To use both of these advantages, I had to focus on four main concepts for multilingual SEO:
- The symmetry between different language sections, in terms of internal links, anchor texts, content amount.
- The consistency in terms of design, web page layout, components, brand identity, brand colors between different language sections.
- Matching the different language sections, in terms of content profile, heading-text pairs, and URL count.
- Prioritizing the canonical version, popular topic, and popular language.
Prioritization, symmetry, matchability and consistency are the main principles for this multilingual SEO case study and international SEO guidelines.
In the following sections, 20 points will explain the dos, don’ts, and understandings of international SEO according to these principles, from a search engine’s perspective on multilingual websites.
Statista Language Usage statistics for web users.
1. Understand the value of the canonical version of a multilingual website
The canonical version of a multilingual website is the section written in the canonical language. A multilingual website’s main intended audience is the audience from the canonical version’s location and language. Overlooking this is one of the common errors made on multilingual websites.
Remember, the full intended audience is the targeted audience of the diversified language and region section of the website. However, it is the main intended audience’s needs that must be satisfied through the content of the website. This can be measured through click satisfaction so that the search engines can trust the same source over cross-lingual information retrieval and cross-lingual indexing and ranking.
For the canonical version, the CTR and Average Position are much higher than for the alternate versions. In our case, it is South Africa.
For multilingual websites, the things that should be followed for the canonical version can be found below.
- Content Publication Frequency and Size: The canonical version of the website should be prioritized in terms of content publication frequency and the information amount that will be served.
- Link Flow Direction: The canonical version should have more internal links pointing to it than the alternate versions.
- Symmetrical: Canonical versions of the multilingual website and alternate versions of the multilingual website should have a symmetrical structure to one another.
- Audit: Canonical version should be prioritized for relevance and quality audit in terms of content, user experience, web page layout, or page loading performance optimization.
The language distribution for content on the web.
Topical authority effect of multilingual websites and canonical versions
A quality problem, irrelevance, a topical coverage gap, or non-comprehensive content on the canonical version can harm the multilingual website’s topical authority more than similar issues on the alternate versions. Becoming a topical authority for an industry or topic is possible with the help of semantic SEO, by covering every related entity, attribute, possible search activity, and related search intent with the correct content structure and with the help of historical data. Based on this, a multilingual website can increase its topical authority through the language-agnostic nature of semantic SEO.
A question for Google Search from Lindsey Wiebe (SEO Manager of New York Times) for John Mueller of Google to show the Search Engine’s perspective for different languages, and geographies for News SEO, and Top Carousel Selection.
Covering a topic with a comprehensive content network across languages will generate better and more historical data and topical authority for the source with pre-defined quality assignment… To learn what Topical Authority is and how to use it for SEO, you can read the related SEO guide and case study.
A section from “Website quality signal generation” patent of Google.
2. Use consistent internal link structure and link flow
Link flow is the direction and intensity of the links between web pages. Consistent internal link structure between the canonical version of the multilingual website and the alternate versions includes the same site-tree and symmetrical site structure. If a multilingual website has a different internal link structure and internal link popularity than another alternate version, this might create an asymmetrical website structure between different language and region sections.
This is the screenshot from 4th of September for the ForexSuggest Multilingual SEO Project.
Having inconsistent alternate versions in terms of site-tree and crawl path for the search engines might make the crawling process harder, and consequently relating different web pages or sections to each other based on concepts, phrases, and intents will be costlier.
Same Internal Link Structure, and Design for the Homepage for every Language Section.
Using the same internal link structure, and same link flow design between alternates will create consistent, clear relevance, context, and crawling path for the search engines, and a good navigation path for the users.
Consistent link structures between alternates and canonical versions are more important when it comes to the homepage.
- Use the same internal link structure for cross-lingual sections.
- Use the same link flow and direction, popularity for different sections.
- Link the canonical version more than the alternate versions.
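The symmetry described above can be checked roughly by comparing the internal link graph of the canonical version with that of an alternate version. The sketch below assumes each page has already been mapped to a language-neutral page key (for example through its hreflang cluster), so that a link becomes a `(source_key, target_key)` pair. The keys and the function name are hypothetical.

```python
def link_symmetry(canonical_edges, alternate_edges):
    """Compare two internal link graphs given as (source_key, target_key) pairs.

    Returns the links that exist only in the canonical version ("missing"),
    the links that exist only in the alternate version ("extra"), and a
    Jaccard similarity between 0.0 and 1.0.
    """
    canonical = set(canonical_edges)
    alternate = set(alternate_edges)
    union = canonical | alternate
    return {
        "missing": sorted(canonical - alternate),
        "extra": sorted(alternate - canonical),
        "similarity": len(canonical & alternate) / len(union) if union else 1.0,
    }
```

A similarity close to 1.0 suggests the two language sections share the same site-tree, while a long "missing" list points to internal links that the alternate version lacks.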
This is the ForexSuggest.com’s Performance Report from Bing Search Engine for the last 6 months. I will include the performance report of Bing for the future SEO Case Studies too. And, during the summer, we can also tell that Bing performs its own updates, and the organic search performance increase is valid for Bing Search too.
4. Use consistently similar anchor texts across languages
Search engines process the link text to understand a link’s purpose and the relation between the link source and link target web pages. As with a consistent internal link structure, using symmetrical anchor texts across languages is important. If you use “C” as the link text from Page “A” to Page “B”, you should use the closest translation of “C” as the link text in another language.
Imagine that you are a search engine trying to crawl and understand a multilingual website. You will realize that it is costlier than a monolingual website, since it is much bigger and comes with more relevant queries across languages, more historical data, and extra ranking and quality signals. In this context, if you link to the “Children’s Beds” web page from the “Childcare” web page with the link text “Children’s bed prices”, and in an alternate language version you use the anchor text “Child sleep improvement” for the same pair of pages, this inconsistency can dilute the relevance and context of the web page and its alternates.
In the internal link popularity, the asymmetry in the multilingual website has gotten better, but still the most popular, and the canonical versions of the international website are not prioritized properly.
Using the same link structure, link flow direction, and popularity distribution, together with consistent link texts, will make crawling, understanding, and matching the symmetrical alternate versions with the canonical version easier.
Lastly, to identify the multilingual website’s main language, a search engine can also compare the cross-lingual anchor texts and their counts. Thus, prioritizing the canonical version is also important for cross-lingual anchor text usage. Also, make sure that the translated anchor text keeps its relevance to the link source web page’s title and headings and to the target web page’s title and headings, even after you translate them. Thus, this is not merely a translation process. The translated content should also contain the anchor text as-is in terms of its meaning.
- Use anchor texts with consistent meanings.
- Translate all of the anchor texts.
- Create the same relevance and navigational direction across languages.
- Translate titles and headings as-is, and be sure that the anchor text exists within the targeted content as cross-lingual.
- Prioritize the canonical version for anchor text usage.
Most of the anchor texts from the external references are in English Language. For a successful multilingual website, using audience diversification for every user-based signal is also important.
5. Translate all of the content without exception
Understanding the language of a website’s and a web page’s content might be problematic for a search engine. For example, if a website has 500 web pages, of which 250 are in English, 50 are in German, and 200 are in French, this can leave the search engine unsure about the questions below.
- What is the canonical version of this website?
- What is the main intended audience for this multilingual website?
- Are these contents equivalent to each other?
- Does it publish content for the same topics across the different languages?
- Do they have a service for France, the USA, New Zealand, Canada, UK, Turkey at the same time?
- Which content can be matched to which one?
- Is there a hreflang signal, and is it correctly used?
- Is there a URL breakdown?
- Which section is more popular in terms of internal links?
- Where do they get their links more? From the UK, or the USA?
- What is the main language of their anchor texts?
- Is there an address, phone number, or founder information for the localized versions?
- Which language do they use within their social media and corporate sections?
- What is the IP Address, DNS Zone for the canonical version?
Besides these, there are many more questions for a search engine, but for the brief version of this multilingual SEO guide, these will be enough. To understand the multilingual website’s main identity and service area, a search engine will check the language of the homepage and of the anchor texts. But if the alternate versions are not translated sufficiently, it will create a cross-lingual quality demotion.
A schema from “Augmenting queries with synonyms selected using language statistics” patent of Google.
In some cases, some sources on the web use untranslated content within the boilerplate content such as sidebars, headers, and footers. A multilingual website should translate boilerplate, and also main content at the same time. The main language of that specific section whether it is a subdomain or subfolder, or a separate domain, should be used completely for every inch of the webpage.
Search Engines will care more about the visible content, visible links, texts, and visual elements to determine the language of the web page, thus using clear design and visual signals is important.
- Translate boilerplate content.
- Translate main content.
- Translate slogans, mottos, minor sections.
- Translate all different website sections without exception unless there is a valid cultural reason.
6. Audit the translation quality with local authors and teams
An SEO can’t know every language, but within multilingual SEO projects, you will need to manage SEO performance for languages that you don’t actually know. Technical SEO is language-agnostic. In other words, the same methods are valid for every language. But multilingual SEO is based on linguistic differences. Thus, auditing the quality of a website’s content might not be easy in a foreign language.
A section from Google’s Automatically Generated Content.
In this context, the alternate versions’ language should be audited by the local teams, or the writer and the auditor should be different people that don’t know each other. If the translation quality is not enough for a language and alternate version, non-quality content will affect the topical authority, categorical quality, and relevance of the alternate versions too.
- Audit the different language content with the native speakers.
- Educate the native speakers on your SEO perspective and strategy.
- Auditors and authors shouldn’t know each other.
- Audit the auditors, if it is necessary, with other auditors.
General view of the Page Experience report of the ForexSuggest.com.
7. Localize the alternate versions of the canonical section
A multilingual asymmetric website may have poor SEO performance in some languages, making it difficult to crawl and match relevant parts. However, due to regional and cultural differences, some parts of a multilingual website may be asymmetric.
Content localization is important to keep every section of the multilingual website relevant and high-quality for the possible, related search activities. Content localization is the optimization of a web page’s content for the audience according to the local culture and language. Therefore, in some cases, a web page may not be translated into a language, or a web page may have extra sections.
A section for “Cross-language Search Patent” of Google. For local connectivity, there are two different types of entities, local entities and remote entities. An entity can be remote for a monolingual website, but for an organized multilingual website, entities from the same topic can be local, and more relevant.
However, such localization examples need not push a website to an asymmetrical point. By creating template-based content, the same structure can be reused while country names, city names, region names, or other named entities and their attributes are substituted in. Thus, when content is updated specifically for a region, it can still be marked as an alternate. Some notes related to the localized alternate versions are below.
- Use content localization to increase the quality and relevance of the alternate versions.
- If it doesn’t make sense to a region or culture, do not translate it.
- Try to keep the localized content symmetrical as much as possible with template-based content.
Query Evaluation Module for the Cross-Language, a section from the “Cross-language search” patent of Google.
8. Never skip hreflang usage
Hreflang is a multilingual SEO tag that declares the relationship between the alternate and canonical versions of a web page. Hreflang can help search engines explore web pages better, match alternate versions, and share PageRank, quality, and relevance signals between them.
Thus, hreflang is not just about matching the alternates. It is also about sharing the quality, relevance, and authority signals by helping search engines crawl the website more easily. This is why content symmetry and complete translation are important. If an alternate web page doesn’t have even half of the content of the canonical version, the hreflang can be ignored by the search engines since it carries missing and inconsistent signals.
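As a rough way to spot alternates that carry far less content than the canonical version, the sketch below compares word counts per language. The 0.5 default only mirrors the “half of the content” idea from the paragraph above and is not a threshold confirmed by any search engine. Word counts are also a crude proxy, since languages differ in how many words they need for the same content. The function name and inputs are hypothetical.

```python
def thin_alternates(word_counts, canonical, min_ratio=0.5):
    """Return language codes whose word count falls below min_ratio
    of the canonical version's word count.

    word_counts: mapping of language code to word count for one page cluster.
    canonical: the language code of the canonical version.
    """
    base = word_counts[canonical]
    if base <= 0:
        raise ValueError("canonical version has no content to compare against")
    return sorted(
        lang
        for lang, count in word_counts.items()
        if lang != canonical and count / base < min_ratio
    )
```

For example, calling `thin_alternates({"en": 1200, "fr": 1100, "de": 400}, "en")` would flag only the German version for a translation review.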
“Identifying relevant document languages through link context” patent of Google. The importance of language transition buttons for multilingual SEO can also be seen from this schema.
Hreflangs are part of multilingual SEO, but they are not the whole of multilingual and international SEO. As part of a multilingual SEO strategy, hreflang tags should be used with ISO 639-1 codes for languages and ISO 3166-1 Alpha 2 codes for regions.
- Use hreflangs from the beginning.
- Do not use relative URLs with hreflangs.
- Add hreflang for all of the alternate versions.
- Use the correct form for languages and regions within hreflang tags.
- If you haven’t used hreflangs from the beginning, check the average position change after adding the hreflangs, along with query count changes for alternates.
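The checklist above can be partly automated. The sketch below builds hreflang `<link>` tags from a mapping of codes to URLs, rejects relative URLs, and reports alternates that do not link back to the page that references them. The code pattern is a simplification assumed for this example (two-letter language, optional two-letter region, or `x-default`), and all function names and example URLs are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Simplified pattern (an assumption for this sketch): "fr", "en-ZA", "x-default".
HREFLANG_RE = re.compile(r"^(x-default|[a-z]{2}(-[A-Z]{2})?)$")


def hreflang_tags(alternates):
    """Build <link> tags from a {hreflang code: absolute URL} mapping."""
    tags = []
    for code, url in sorted(alternates.items()):
        if not HREFLANG_RE.match(code):
            raise ValueError(f"unexpected hreflang code: {code!r}")
        parts = urlsplit(url)
        # Relative URLs have no scheme or host, so they are rejected here.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"hreflang URL must be absolute: {url!r}")
        tags.append(f'<link rel="alternate" hreflang="{code}" href="{url}" />')
    return tags


def missing_return_links(declared):
    """Find alternates that do not reference the declaring page back.

    declared: mapping of page URL to the set of alternate URLs it lists.
    Returns sorted (page, alternate) pairs with a missing return link.
    """
    problems = []
    for page, alternates in declared.items():
        for alternate in alternates:
            if alternate != page and page not in declared.get(alternate, set()):
                problems.append((page, alternate))
    return sorted(problems)
```

The same tag set would be printed on every page of a cluster, and `missing_return_links` can be run on crawl data to find one-way hreflang references.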
After adding the hreflangs to the ForexSuggest.com multilingual SEO project, an organic click increase of around 100% occurred. This increase varied somewhat from region to region: for some regions it was higher, and for some it was lower.
The internal linking between different sections and the language change button should always be visible.
9. Use CDN servers or multiple local servers
A search engine can check the DNS zone and IP address of a website to see the website’s location and relate it to the intended audience. The use of a single server location for a multilingual and multi-regional website can make it difficult to mark alternative sections of the website and match alternative sections with the intended audience. Using a local server will not only facilitate the process of matching each part of the website with a different region and language but also will enable users using the relevant language in the relevant region to access the website more quickly.
Total Content Length for alternate versions, and the canonical version along with their percentages within the total content length of the multilingual website.
Using a separate server for each region and language can reduce Time to First Byte for a multilingual website. This will be beneficial for search engine crawlers that come directly from different regions to crawl your website, and the improvement in Page Experience will contribute directly and indirectly to SEO.
- Use local IP addresses and DNS zones for every language and region
- Use multiple servers for multilingual and regional audiences
During the ForexSuggest.com project, for every language and region, a different server, database, and WordPress setup are used from the local region with a localized IP. My reasoning for this is also based on experience: in the past, I have experienced situations where a Polish website mostly drew traffic from Ukraine because of the Ukrainian IP address. Even if the search engines can fix these problems later, giving consistent and clear signals from the beginning will make a website rank better over time.
Pie chart for the content amount and content length for different sections.
10. Use consistent design, brand colors, identity, and web page layout across different sections
Consistent brand identity across every URL of a website, along with consistent usability patterns, colors, and web page layout, is useful for consistent visual communication. This decreases the learning time for users and increases the confidence level of search engines, preventing them from perceiving an inorganic site structure. An inorganic site structure is a structure that is complicated and not integrated across different subdomains, domains, or subfolders. Design changes or logo changes, as well as web page layout changes, can change the meaning, context, and usability of a web page.
Multilingual and multiregional websites can have an inorganic site structure because of the multiple servers, or multiple front-end, back-end, and author, design teams. If the brand logo, baseline, addresses, phone numbers, web page layout, colors, and typography of the website change, it will make eye-tracking and branding harder while making search engines think that the website doesn’t belong to a single brand entity.
Due to the broken HTML, the extracted anchor texts are not in the best shape, but the language transition button and its internal links can be seen in a semantic order, despite “Read Review” and other types of “un-contextual” anchor texts.
To understand the problem of inorganic site structure, you can use Bing’s guidelines that explain what it is and why it should be avoided. Prior to 2017, lots of site owners rented their subdomains or subfolders to other brands. Since authoritative top-level domains are easier to rank, people rented sections of these websites for their own topics or brands. Also, in the black hat industry, differently designed subdomains are used to channel the flow of PageRank externally, even if the subdomain belongs to the main domain. In other words, if the design, topic, and identity were different, search engines thought that it was actually a different website. However, this practice continues to some extent today.
Augmenting queries with synonyms selected using language statistics patent for Google. Google can use the “Relative Frequency” to choose a source’s click satisfaction possibility, and local relevance for other countries…
Thus, using consistent design, web page layout, logo, brand name, brand baseline, localized addresses, typography, and web component placement is important to make a multilingual website consistent. This consistency also makes language transition smoother for users in terms of usability.
- Use the same design, colors, and visual elements for every sub-section.
- Use the same logo, brand name, brand motto, brand colors for every sub-section.
- Use the same web page layout, web page components, and placement for every sub-section.
During the ForexSuggest.com SEO case study, most of the websites had the same design for important sections such as the homepage, but some sections still had different designs and usability patterns that should be fixed.
Ten years after Danny Sullivan’s article on cross-lingual information retrieval and Google’s “Language Barrier Removal” announcement, Google started to translate documents to close the information gap between languages and satisfy queries from other countries and languages. This is also a signal that shows why SEOs should think about making their websites multilingual.
11. Use multiple sitemaps and separate Google Search Console accounts to make analysis easier
Using multiple sitemaps is useful for analyzing the Google Search Console’s coverage report for different sections. If a multilingual website has 41 different languages or more, analyzing every language section in terms of organic traffic, user behaviors, conversions, indexing, and technical SEO requires a high level of data filtering. Filtering and grouping all of the URLs from the beginning via different, segmented sitemaps is useful to make data analysis easier for SEO.
The errored sitemaps for ForexSuggest.com can be seen in both Google Search Console and Bing Webmaster Tools.
Data analysis and data science are an important part of SEO. Part of this process is data preparation and examining the right part of the data at each step. Another part of this process is visualization. I will show some examples of visualization later, but if you want to learn more about Data Science and Visualization for SEO, you can read the related guidelines.
Google Question Hub is about satisfying the questions that are not answered on the web. A question might be answered for a language, but might not be answered for another one. In this context, Question Hub, and Cross-lingual Information retrieval, crawling, indexing, and query expansion are connected to each other.
When preparing data, segmenting the available data according to the axes you will want to analyze is an important step. Thus, segmenting sitemaps is as important as creating different Google Search Console accounts for different subsections. Different GSC account setups for every language section of a multilingual and international website are useful to make organic search performance, Page Experience report, and Links report analysis easier.
There are 45 different sitemaps for this multilingual SEO Case Study, and the asymmetry and topical coverage differences can be seen clearly.
Having multiple but smaller sitemaps can help a search engine download the sitemaps more frequently for crawling and refreshing purposes. In news SEO, smaller sitemaps for similar site segments usually help to increase the count of indexed web pages. A multilingual website can also use multiple, smaller sitemaps for the same purposes.
- Use different Google Search Console accounts for every language and region section.
- Use different sitemaps for every language and region section.
- Make the sitemap files smaller if possible, and use the different sitemaps and GSC accounts for data analysis.
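The segmentation described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: it assumes that every language or region section lives in a first-level subfolder such as `/de/` or `/pt-br/`, and the function names `group_urls_by_section` and `build_sitemap` are hypothetical helpers, not part of any SEO tool.

```python
from collections import defaultdict
from urllib.parse import urlparse
from xml.sax.saxutils import escape


def group_urls_by_section(urls, default_section="root"):
    """Group URLs by their first path segment when it looks like a language folder."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        first = segments[0].lower() if segments else ""
        # Assumption: "de" or "pt-br" style folders mark a language/region section.
        is_language = (len(first) == 2 and first.isalpha()) or (
            len(first) == 5
            and first[2] == "-"
            and first[:2].isalpha()
            and first[3:].isalpha()
        )
        groups[first if is_language else default_section].append(url)
    return dict(groups)


def build_sitemap(urls):
    """Return a sitemap XML string (sitemaps.org 0.9 schema) for the given URLs."""
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )
```

Each group can then be written to its own file (for example `sitemap-de.xml`) and submitted separately, so that the coverage report can be read per language section. A two-letter folder that is not a language would be misclassified by this heuristic, so the grouping rule should be adapted to the real URL pattern of the site.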
For every sub-section or language-based subfolder of the multilingual website, examining the coverage report will be more efficient.
12. Understand the value of multilingual and regional websites for search engines
Multiregional and multilingual websites are valuable for search engines to perform query expansion for cross-lingual information retrieval and indexing.
A research paper from Google on the gender-related bias within the open web that affects NLP and NLU. Google discusses translating content on Wikipedia without gender bias by explaining the language-based structural differences, and it includes cultural differences too.
The goal of cross-lingual information retrieval is to understand multiple languages for a topic by recognizing entities, understanding sentence structures, and matching the synonyms. The second value of multilingual websites for a search engine is that they satisfy users for multiple languages for the same topic. Instead of crawling, rendering, understanding, evaluating, indexing, and ranking 10 different sources from the web, a search engine can choose one source with “broad appeal” for a topic with multiple languages. Quality multilingual websites help a search engine to understand a topic in a better way by examining the queries from different languages, and satisfy more users from different cultures and segments.
This is another example from an entirely different industry. The main difference is that the most important sections are translated into the other languages, and the e-commerce and informational web pages improved organic search performance for all regions together.
The third value of multilingual websites for a search engine is that they help search engines to find quality documents for other languages by clustering. A search engine can find a quality document and then cluster other similar documents around it. If a source is authoritative across languages for a topic, its documents can be used to predict the quality of new documents or new sources that resemble them.
A short list of benefits and values of multilingual websites for search engines is listed below.
- Helps for query expansion
- Helps for cross-lingual information retrieval
- Helps for cross-lingual indexing and ranking
- Helps for cross-lingual synonym extraction
- Helps for cross-lingual quality evaluation for similar documents
- Helps for cross-lingual entity extraction, and natural language understanding
The change across the different countries for the second example after the translation is reciprocal, as expected. Every improvement in organic search performance for one country also benefited the organic search performance profiles of the other countries.
13. Understand the cross-lingual indexing and ranking for search engines
Cross-lingual indexing and ranking is the process of comparative indexing and ranking across different languages based on different queries, entities, and possible search intents. With the help of semantic SEO, Google has started to understand entities in a language-agnostic way and to connect the questions and possible search activities to each other across languages.
The main obstacle to uniting all of the user data from around the world is creating a consistent list of synonyms and co-occurring phrases for every contextual domain. Entities do not need language to be expressed or to exist, but users of web search engines need phrases to express the needs behind their queries. Thus, cross-lingual indexing and ranking help a search engine to increase the SERP quality and documents’ relevance for certain entities based on cross-lingual concepts.
A search engine can collect all of the n-grams, co-occurring phrases, and entities from a multilingual website in a comparative way. Every entity will be processed with different phrases but also from different angles in every geography. Winston Churchill is the victor of the Second World War for a British newspaper, but a newspaper from India can attach a different attribute to the same entity. Thus, cross-lingual indexing and ranking will help a search engine to increase its confidence score for certain connections between phrases and entities by improving its knowledge graph full of facts between entities.
A research paper from Google, from 2018. Denny Vrandečić seeks methods to make information more available in different languages by translating Wikipedia articles into other languages while protecting objectivity. It is also a good research paper for seeing the trust signals and the subjectivity perception of a search engine.
Cross-lingual indexing and ranking are possible with cross-lingual crawling and information retrieval. Google is a semantic search engine and semantically created sources with semantic information will thrive in SERP. Google is also a multilingual search engine. A multilingual website will have the benefit of aligning its own features and structure with the search engine.
Creating a topical authority for a source for a language is useful, but doing the same thing across different languages is more useful, and it will make the source more valuable.
- Query expansion, in the cross-lingual context, is the process of translating a query into another language to retrieve quality documents, with the purpose of finding the documents and sources most similar to those in the original query’s language.
- Cross-lingual indexing can help a search engine to unite the user data and feedback across languages to generate better satisfying quality detection algorithms.
- Multilingual search engines have their own translators, and they allow users to use translated versions of the documents if the document language is different from the language of the device, the OS (operating system), or the browser.
- Google calls the original query language “source language” and calls the original query “source query”.
- Some queries are common to all languages, thus language detection might be hard. For these queries, Google gives a cross-language SERP instance. Google calls it “cross-language search”.
- Google uses the “CLIR” abbreviation for “Cross-Language Information Retrieval”.
- Within the documents, the search engine can detect a bias between languages for controversial entities, thus sources with authority, evidence, and E-A-T will be prioritized by CLIR systems.
A schema from Google’s Cross-language search patent.
14. Use Audience diversification with external references, mentions, and connections
Audience diversification and popularity for cross-lingual search are connected to each other in the eyes of search engines. Google mentions “popularity signals” for “disambiguation of a spoken query term”, “scoring authors of posts”, “enhanced identification of interesting points-of-interest”, and “displaying information related to content playing on a device”. Google can use popularity as a compass for reliability and quality. Audience diversification is the process of diversifying the IP addresses of external references (links, mentions, or visual elements) and of visitors. An IP address can show the location of a user; even if the user tries to hide his or her IP address, Google knows the ISP (Internet Service Provider) of the user. Additionally, Google DNS servers are useful for Google to keep track of users in terms of their searched terms, visited websites, or even the songs they listen to and the movies they download on the web.
A schema from Google’s “Refining location estimates and reverse geocoding based on a user profile” patent, which is used to guess the geo-location of a user.
For multilingual SEO, giving popularity, reliability, and quality signals across languages by diversifying all of the information points for relevance to a different geographical area, language, and user segment is important.
- Diversify the IP addresses of external references
- Diversify the popularity signals by diversifying the actual traffic across languages
- Increase the search demand for the navigational query of the brand across languages
- Use social media across languages to improve the popularity signals
Google can trigger different types of algorithms based on popularity signals, whether they are for a query or for a source across languages, as described in Google’s “Dynamic Language Model” patent.
To learn more about Audience diversification, you can read its importance for news SEO.
15. Do not use locale-adaptive web pages
Locale-adaptive pages are pages that users are redirected to from another URL based on their IP addresses, or pages that show different language content based on the language of the browser, OS, or device. Locale-adaptive pages can prevent a source from using its own homepage for welcoming a search engine crawler. The most important web page of a source is its homepage in terms of understanding the most important internal links, anchor texts, and the source’s context and identity. Thus, redirecting the search engine crawler to a non-homepage, or to a homepage alternate, with locale-adaptive web pages is not a search-engine-friendly communication practice.
An announcement from Google for “locale-adaptive pages” and their crawling process in 2018.
Redirecting search engine crawlers, and users to different URLs based on different IPs will dilute the confidence score of a search engine to understand the relevance of a web page to a certain geography, language, and also search intent, concept, and phrase.
- Being redirected to different URLs based on different IPs will dilute the context of the web page, and it will make it harder to understand the original content of the web page.
- Do not redirect users when they try to land on the homepage. Do not use localized homepage alternates instead of the real homepage. Reflect your main audience’s language on the homepage.
- If you block the users from a certain geographic area, also block the search engine crawlers from these certain areas.
- Be consistent with the URL and content structure with the “separate locale URLs”.
Lastly, Google supports “geo-distributed crawling” and “language-dependent crawling” by changing its IP address and its “Accept-Language” HTTP header to see the actual content of the web page. Google supports locale-adaptive web pages in order to crawl the web as it finds it, but this doesn’t change the fact that, for international SEO, locale-adaptive pages are more costly and less understandable from the search engines’ perspective.
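To check whether a homepage behaves in a locale-adaptive way, the language-dependent part of this crawling behavior can be imitated roughly. The sketch below is an assumption-laden illustration, not Google’s method: it only varies the “Accept-Language” header (it cannot imitate geo-distributed IP addresses), and the names `fetch_final_url` and `find_locale_adaptive_redirects` are hypothetical.

```python
import urllib.request


def fetch_final_url(url, accept_language, timeout=10):
    """Request `url` with an Accept-Language header and return the URL after redirects."""
    request = urllib.request.Request(url, headers={"Accept-Language": accept_language})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


def find_locale_adaptive_redirects(url, languages, fetch=fetch_final_url):
    """Return {language: final_url} for each language that ends up on a different URL."""
    redirected = {}
    for language in languages:
        final_url = fetch(url, language)
        # Ignore a trailing-slash difference; anything else counts as a redirect.
        if final_url.rstrip("/") != url.rstrip("/"):
            redirected[language] = final_url
    return redirected
```

An empty result for a call such as `find_locale_adaptive_redirects("https://example.com/", ["en", "de", "tr"])` suggests that the homepage URL stays consistent for those header values. Pages that swap content without redirecting would need a separate content comparison.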
This is the visualization of the ForexSuggest.com’s URL pattern for the different language sections, and their frequency within the site-tree. The asymmetrical structure can be seen here.
16. Always use a consistent homepage
Having a consistent homepage across different site languages is important in multilingual and international SEO because it reflects the source’s identity and the most important website sections. A search engine crawler visits the homepage right after visiting the robots.txt file. Inconsistent homepages between different alternate languages can dilute the identity, brand profile, and relevance of the source for the specific industry. There are sources on the web that do not have a homepage; instead, they use sub-directories as the homepage of a specific language and region. This situation will harm the clarity of the source and cut off the potential for better PageRank distribution and relevance signals via the homepage.
This is a visualization example of the hreflang implementation of ForexSuggest.com. The prioritized sections, the asymmetry, the delayed sections, and the general missing parts can be seen easily.
Even if an SEO project uses dynamic rendering, redirecting the user and the crawler to a different, inconsistent URL will harm the reliability of the source and the efficiency of the search engine communication. Even for an SEO A/B test with a URL change, 302 redirects should ideally stay in place for only two weeks. After two weeks, Bing assumes that the 302 redirects are actually intended to be used as 301 redirects; Google considers that 302 redirects left in place for an excessively long time are intended to deceive search engines, but it doesn’t define the exact duration for this specific algorithm trigger.
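The two-week window can be monitored with a small check over a redirect inventory. This is a minimal sketch, assuming that the redirects are tracked as `(source_url, status_code, first_seen_date)` tuples; the structure and the function name `stale_temporary_redirects` are hypothetical.

```python
from datetime import date, timedelta

# The two-week window for temporary (302) redirects mentioned in the text.
MAX_TEMPORARY_REDIRECT_AGE = timedelta(days=14)


def stale_temporary_redirects(redirects, today):
    """Return the source URLs whose 302 redirect has been live for more than two weeks."""
    return [
        source
        for source, status, first_seen in redirects
        if status == 302 and today - first_seen > MAX_TEMPORARY_REDIRECT_AGE
    ]
```

The returned URLs are candidates for either removing the test redirect or converting it to a 301, depending on the outcome of the test.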
The URL Count for the Canonical English Version and the Alternate Versions with their percentages.
The benefits of a consistent homepage for multilingual SEO can be found below.
- Improving the PageRank distribution efficiency
- Improving the crawling efficiency
- Reflecting the source’s identity and prioritized audience from the homepage
- Giving the most important terms, n-grams, anchor texts from the homepage
- Reflecting the brand’s identity to the users and also search engines.
- Having a consistent URL for the Knowledge Panel of the brand.
- Better consistent audience diversification for external references and organic traffic.
17. Use UTF-8 encoding whenever possible
UTF-8 is a variable-width encoding that uses one to four one-byte (8-bit) units to encode all 1,112,064 valid Unicode code points. The Internet Assigned Numbers Authority (IANA) registers “UTF-8” as the official name of the encoding. Additionally, it is the standard encoding for CSS, HTML, and XML documents and other web resources. Thus, Google prefers UTF-8 to make the crawling, indexing, understanding, and ranking processes easier and less costly. In multilingual, multiregional, and international SEO projects, a domain can have non-ASCII characters and non-UTF-8 content within its resources, documents, headers, or URLs (the names and paths of the documents on the server).
UTF-8 is used across the entire website that is the subject of this international SEO case study.
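The numbers behind UTF-8 can be verified with a short Python sketch. UTF-8 stores each character in one to four 8-bit units, and the 1,112,064 figure is the full Unicode range minus the surrogate code points. The helper names below are hypothetical, and the example path `/tr/döviz` is only an illustration.

```python
from urllib.parse import quote

# Valid Unicode code points: U+0000..U+10FFFF minus the surrogate range U+D800..U+DFFF.
VALID_CODE_POINTS = 0x110000 - (0xDFFF - 0xD800 + 1)


def utf8_byte_length(character):
    """Return how many bytes UTF-8 needs for a single character."""
    return len(character.encode("utf-8"))


def percent_encode_path(path):
    """Percent-encode a URL path; quote() uses UTF-8 by default and keeps '/' as-is."""
    return quote(path)
```

For example, `utf8_byte_length("a")` is 1 while `utf8_byte_length("€")` is 3, and a non-ASCII URL path is sent over the wire as percent-encoded UTF-8 bytes.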
The habits of a search engine that can be seen in its preference for UTF-8 encoding can also be seen for structured data and other types of search engine optimization-related tags. For Schema markup, Google prefers JSON-LD, because it is less costly to understand and extract data from JSON-LD than from microdata or RDFa. Using internationalized domain names can be useful to connect with the main intended audience, but it still might be hard to understand for the whole targeted audience.
To perform the conversions between ASCII and non-ASCII characters, a search engine might need to use ToASCII and ToUnicode types of algorithms. Any extra algorithmic work or stage needed to understand or index a website can make international SEO more complex. Being clearly understandable with international standards is more useful for the search engines’ perception, and also for users’ understanding.
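The ToASCII and ToUnicode conversions can be tried out with Python’s built-in `idna` codec, which implements the older IDNA 2003 rules. This is only a sketch of the conversion step itself; how a search engine performs it internally is not documented here, and the wrapper names `to_ascii` and `to_unicode` are hypothetical.

```python
def to_ascii(domain):
    """Convert an internationalized domain name to its ASCII (Punycode) form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain):
    """Convert an ASCII (Punycode) domain name back to its Unicode form."""
    return domain.encode("ascii").decode("idna")


print(to_ascii("münchen.de"))           # xn--mnchen-3ya.de
print(to_unicode("xn--mnchen-3ya.de"))  # münchen.de
```

The same brand thus exists under two spellings, which is one more mapping that every crawler, log analysis, and link audit has to resolve.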
A section from “Identifying common co-occurring elements in lists” patent of Google for entity-disambiguation and entity information expanding in Knowledge Graph.
18. Do not rely on language and region HTML tags
Language- and region-related HTML and meta tags can be used to define the content language of the HTML document or its intended language, and the geography to be served. Google ignores the HTML lang attribute and other region-related HTML tags because these are some of the most wrongly used and least consistent signals on the web.
Multilingual websites can increase their crawling efficiency by using a semantic site structure across different language sections, whether it is in the URLs, the breadcrumbs, the language transition button, or the hreflang. ForexSuggest.com has a response time of 869 ms for Googlebot, which is more than 4x Google’s suggestion.
Most HTML lang values come from the website’s theme language, not from the content’s language, and most website owners do not know how to use or change the locational meta tags. Since the search engines try to move control of their own perception away from website operators and SEOs and into their own internal systems, by turning most of the “commands” for SEO into “hints”, multilingual and regional HTML and meta tags are ignored by the search engines.
Some of the regional meta tags are listed below.
- HTML Lang attribute
- “Geo.region” meta tag for district, city, or country name.
- “Geo.position” meta tag for coordinates with “;” separator.
- “ICBM” meta tag for coordinates with “,” separator.
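Even if search engines ignore these tags, auditing them is useful for spotting theme-language leftovers that contradict the content language. A minimal sketch with Python’s standard `html.parser` module is below; the class name `LocaleTagAuditor` and the function `audit_locale_tags` are hypothetical, and the sample coordinates in the usage line are made up for illustration.

```python
from html.parser import HTMLParser

# The regional meta tag names listed above, lowercased for comparison.
REGION_META_NAMES = {"geo.region", "geo.position", "icbm"}


class LocaleTagAuditor(HTMLParser):
    """Collect the <html lang> attribute and the regional meta tags of a document."""

    def __init__(self):
        super().__init__()
        self.html_lang = None
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html":
            self.html_lang = attributes.get("lang")
        elif tag == "meta":
            name = (attributes.get("name") or "").lower()
            if name in REGION_META_NAMES:
                self.meta[name] = attributes.get("content")


def audit_locale_tags(html):
    """Return a dict with the declared language and any regional meta tag values."""
    auditor = LocaleTagAuditor()
    auditor.feed(html)
    return {"lang": auditor.html_lang, **auditor.meta}
```

Running `audit_locale_tags('<html lang="en"><meta name="ICBM" content="-26.2, 28.04"></html>')` on a page inside a German section, for instance, would reveal that the declared language does not match the section.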
19. Audit the initial-ranking and re-ranking processes for multiple regions and languages
Initial ranking and re-ranking are the two phases of the search engine ranking algorithms and processes. Initial ranking is the process of ranking an indexed document for the first time. Re-ranking is the process of modifying the ranking of an existing document on the SERP. Both processes can be performed by shared, always-on algorithms along with different, partially activated algorithms.
For this international SEO project, we have at least one impression from 241 different countries. As far as I know, there are only 195 countries, but Google has a different perspective on defining the total country count on the planet for improving the quality of the search results.
For an international SEO project, a source can rank higher for certain queries from a language than other languages, or it can rank higher for certain topics than other topics. The same situation can happen for different website sections, authors, or other types of differences such as social media shares, the inclusion of URL within the sitemap, or linking from the homepage. The initial-ranking and re-ranking process can be affected by the removed external and internal links, changing internal links or anchor texts, source’s topical authority, query demand increases or decreases, and quality, reliability, relevance signals.
In terms of re-ranking and initial ranking in the context of the importance of the canonical version: since the canonical version targets South Africa, other nearby African countries also have better average positions, because if an international SEO project is located in country X, it can also be more relevant to the neighboring countries and the audiences from these countries.
In this context, the initial-ranking and re-ranking process observation can be used for international and multilingual SEO as below.
- Watch the newly published content’s crawl delay, indexing delay, and initial ranking.
- Use all of the relevance, prominence, reliability, and quality signals to make every URL rank initially higher such as all of the indexing signals, PageRank share opportunities, and related entities with the prominent attributes to the context.
- Compare the initial rankings across different languages, topics, authors, URL patterns, or breadcrumb patterns.
- Use hreflang to make every section semantically perceived by search engines across languages to share the quality and relevance signals across different versions for better initial rankings.
- Link the published content from the main content of the homepage.
- Audit the competitors’ pages for the initially low-ranked pages to find patterns.
- Wait to see the trigger of the re-ranking process, frequency, and its direction.
- Audit how search engines re-rank the page according to the “confidence of the SERP quality”.
- Create an alarm for the sitemap changes of your competitor, and audit the re-ranking and initial-ranking processes of the competitors to see their authority and prominence for the search engine’s perception.
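The comparison of indexing delays and initial rankings across languages can be sketched as a simple aggregation. This is a minimal sketch that assumes you already log, for every new page, its language, publication date, first-indexed date, and first observed average position; the field names and the function `summarize_initial_rankings` are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def summarize_initial_rankings(pages):
    """Aggregate indexing delay (days) and initial position per language.

    Each page is a dict with the keys 'language', 'published', 'first_indexed'
    (both datetime.date) and 'initial_position' (a number).
    """
    by_language = defaultdict(list)
    for page in pages:
        by_language[page["language"]].append(page)

    summary = {}
    for language, rows in by_language.items():
        summary[language] = {
            "pages": len(rows),
            "avg_indexing_delay_days": mean(
                (row["first_indexed"] - row["published"]).days for row in rows
            ),
            "avg_initial_position": mean(row["initial_position"] for row in rows),
        }
    return summary
```

The same grouping key can be swapped for topic, author, URL pattern, or breadcrumb pattern to compare the other dimensions listed above.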
In the future, I will publish a new SEO case study and new guidelines on using the initial ranking to create an advantage in the re-ranking processes for the benefit of a website by using topical authority and semantic SEO.
The example above shows the translation of the boilerplate content, and the consistency of the design and anchor texts.
20. Prioritize the popular languages and audiences for your niche
Prioritizing the popular language and audience is the process of prioritizing the content publication and updating frequency for the language that is used most and that creates the most historical data value. A multilingual website has different language sections for its international SEO, and every section will generate a different level of value for the search engine.
The link value proposition is the value that is offered by the source that the search engine has chosen for its search engine results pages. If a source provides value for the SERP quality within the borders of the link (snippet) that it has been given, the search engine will be more confident in ranking the source higher. An international SEO project can improve its chance of being ranked higher, not only initially but also during the re-ranking process, by creating more historical data via the popular languages.
If you check the footer design comparison, you should realize that the website uses a Luxembourg address. And, with lots of African countries thanks to South Africa Canonical Signals, we also see that Luxembourg has a better average position. Some of these results are because of low competition, but some of them are because of local relevance signals.
The overall prioritization methods in terms of content publication and canonical-alternate version pairs for international SEO can be found below.
- Prioritize the canonical version of the website.
- Prioritize the central entities and central subtopics.
- Prioritize the popular languages among alternate language versions.
Last thoughts on international and multilingual SEO
While writing these multilingual and international SEO guidelines, I have tried to cover things that are not mentioned by the general SEO guidelines, but that search engines know, care about, and exploit. Most of the time, for multilingual SEO, most SEOs only focus on hreflangs, but actually, there are way more things than just hreflangs that affect the multilingual website’s international SEO performance.
The best performing countries for ForexSuggest.com. From Africa to other Continents, there is a strong Average Position and Success difference. The local relevance for a web page is also for decreasing the cost of computation for the Search Engines to create the quality SERP instances.
Multilingual SEO is connected to technical SEO, local SEO, semantic SEO, and hypertextual (link-based) connections. For search engines, multilingual websites provide great value in helping them to close the gap in their knowledge graph across different languages, to extract more synonyms, and to improve cross-lingual SERP quality via cross-language crawling, indexing, and ranking.
Becoming a topical authority for a topic can be acquired via semantic SEO. Locking in the topical authority by increasing the historical data across languages is possible via multilingual SEO.
Internal link rot also affects Googlebot’s crawling efficiency. The GSC crawling data is not complete, but even the partial section shows the effect of the unnecessary CSS and JS files along with the non-200 URLs. For a multilingual website, these problems can affect the international SEO performance and the search engine’s judgment of the multilingual website’s quality and trust.
In these multilingual SEO guidelines, I have tried to pay attention to the relation between canonical and alternate versions of multilingual and regional websites, brand identity, design consistency, hreflang’s PageRank, and the quality share effect by processing the search engines’ perspectives via their documents. To do that, I have used a website with tons of errors in terms of technical SEO along with its momentum to grow by fixing some of the key errors.
The importance of this SEO case study does not come from the amount of traffic, or the 400%+ organic click increase. It comes from being able to understand the point of view of search engines for multilingual websites.
See you again later for more new SEO case studies and experiments.