
Our data

Duplicate content causes SEO issues that can hurt your rankings. It refers to content that appears in more than one place, either within your own website or on other websites. It causes trouble for crawlers because they cannot tell which version is more relevant for a given query. For the sake of user experience, search engines will not display several pages with the same content and are forced to choose the one most likely to be the best. This leads to a significant loss of relevant results in the search engine results pages, and therefore to a loss of traffic. Duplicate content can lead to three main issues:

  • confusion about which version to index
  • difficulty deciding whether to direct the link metrics (authority, trust, anchor text, link juice) to one page or to split them between the different versions
  • inability to rank the right version for queries

With OnCrawl, you can easily spot your clusters of duplicate and near-duplicate pages. You can also see whether the canonicals are matching, not matching or simply not set, and filter your clusters by number of pages and content similarity.
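
To give an idea of what "content similarity" between two pages can mean, here is a minimal Python sketch that compares two texts by their overlapping word sequences (shingles) and computes a Jaccard score. This is only an illustration under our own assumptions, not a description of how OnCrawl computes its clusters; the function names and the shingle size of 3 are hypothetical.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Very short texts are treated as a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are considered identical
    return len(sa & sb) / len(sa | sb)
```

A score of 1.0 means the two texts share all their shingles, while a score close to 0 means they have almost nothing in common. Pages scoring above a threshold of your choice could then be treated as near duplicates.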

[Screenshot: duplicate content cluster]

Clicking on a specific cluster gives you further details about the URLs concerned.

[Screenshot: URLs with duplications]

And then, if you click on a specific canonical URL you can view which pages have similar content.

[Screenshot: cluster of pages]

You can also check whether you have duplicated HTML.

[Screenshot: OnCrawl SEO audit]

Here, you can see that 1,606 pages have a duplicate title. Clicking on it gives you the details.

However, there are different types of duplicate content. Some of them will hurt your rankings whereas others are harmless. Let’s focus on the ones that penalize your SEO.

What are the best practices?

To avoid these duplicate content issues, there are some best practices you can follow. In most cases, content that is found at different URLs should be canonicalized. This can be done with 301 redirects, rel=canonical or the parameter handling tools in Google Webmaster Central.

301 redirect

A 301 redirect is in most cases the most relevant solution, especially for URL issues. It tells search engines which version of a page is the original and sends visitors and bots from the duplicate to the primary one. Moreover, when several well-ranked pages are redirected to a single one, they no longer compete with each other and together create a stronger relevancy and popularity signal. The primary page thus tends to rank better.
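
As an illustration, the logic behind a 301 redirect can be sketched in a few lines of Python. This is a simplified sketch, not a production server: the `REDIRECTS` mapping and the `resolve` function are hypothetical names, and in practice the redirect is usually configured in your web server or CMS.

```python
# Hypothetical mapping from duplicate paths to their primary version.
REDIRECTS = {
    "/print/red-item": "/red-item",
    "/index.html": "/",
}


def resolve(path: str) -> tuple[int, dict[str, str]]:
    """Return the HTTP status and headers to send for a requested path."""
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 Moved Permanently: the duplicate permanently points to the primary URL.
        return 301, {"Location": target}
    # Not a known duplicate: serve the page normally.
    return 200, {}
```

With this mapping, a request for `/print/red-item` is answered with a 301 status and a `Location` header pointing to `/red-item`, while `/red-item` itself is served normally.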

Rel=canonical

Rel=canonical works in much the same way as a 301 redirect, except that it is easier to implement. It can also be used for content copied from other websites: it tells search engines that you know the copied article has been intentionally placed on your website and that all the weight of that page should pass to the original one. If you need further details about how rel=canonical works, we previously wrote an article on that subject.
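
In practice, rel=canonical is a `<link>` element placed in the `<head>` of the duplicate page. The small Python helper below shows what that element looks like; the function name `canonical_tag` and the https example URL are our own assumptions for the sake of illustration.

```python
from html import escape


def canonical_tag(primary_url: str) -> str:
    """Build the <link> element that declares the primary version of a page."""
    # Escape the URL so characters such as & or " cannot break the HTML attribute.
    return f'<link rel="canonical" href="{escape(primary_url, quote=True)}">'


print(canonical_tag("https://www.mywebsite.com/red-item"))
```

The printed element would go in the `<head>` of every duplicate version of the page, for instance the one carrying extra parameters.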

NoIndex, NoFollow

These combined tags are useful for pages which should not appear in a search engine’s index. Bots can crawl the pages but will not index them.
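
These directives are usually set with a robots meta tag in the `<head>` of the page. As a rough sketch, a template helper could build it as follows; the function name `robots_meta` and its parameters are hypothetical.

```python
def robots_meta(index: bool = True, follow: bool = True) -> str:
    """Build a robots meta tag from two boolean choices."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    content = ", ".join(directives)
    return f'<meta name="robots" content="{content}">'


# Tag for a page that should stay out of the index.
print(robots_meta(index=False, follow=False))
```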

Parameter handling

Google Webmaster Tools offers different services. Among them, you can set a preferred domain for your site and tell Google how to handle URL parameters. However, this applies to Google only. Your changes will not be taken into account by Bing or other search engines.

There are further methods which can be implemented:

Preferred domain

This is a very basic setting that should be implemented on every site. It simply tells search engines whether a site should be displayed with or without the www in the search engine results pages.

Internal linking

Be careful when linking internally. If you decide that the canonical version of your website is www.mywebsite.com/, then all the internal links should go to http://www.mywebsite.com/page.html and not to http://mywebsite.com/page.html.
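
One way to keep internal links consistent is to rewrite them to the preferred host before publishing. The Python sketch below assumes that www.mywebsite.com is the preferred version and that mywebsite.com is the only alternate host; the constant and function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.mywebsite.com"  # assumption: the www version is canonical
ALTERNATE_HOSTS = {"mywebsite.com"}   # assumption: other hosts serving the same site


def to_preferred(url: str) -> str:
    """Rewrite an internal link so that it points to the preferred host."""
    parts = urlsplit(url)
    if parts.netloc.lower() in ALTERNATE_HOSTS:
        parts = parts._replace(netloc=PREFERRED_HOST)
    # External links and links already on the preferred host are left unchanged.
    return urlunsplit(parts)
```

For example, `to_preferred("http://mywebsite.com/page.html")` gives the www version of the link, while links to other domains are returned as they are.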

Merging content

When merging content from several pages into one, be sure to link back to the original page.

Write unique product descriptions

It might take more time, but writing your own descriptions instead of reusing the manufacturer’s can help you rank above the other sites that use the duplicated descriptions.

How to improve your content and avoid duplicate content issues?

Here are the main situations where duplicate content happens. This is what you should avoid:

URL issues

Parameters such as click tracking or analytics codes can lead to duplicate content issues. More generally, similar URLs pointing to identical pages are a problem. Google regards www and non-www, .com and .com/index.html, http and https as different pages even if they serve the same content, and thus sees them as duplicate content.
Example:
www.mywebsite.com/red-item?color=red
www.mywebsite.com/red-item
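
To check whether two URLs like these are really the same page, you can strip the parameters that do not change the content and compare what is left. Here is a minimal Python sketch; the list of ignored parameters (`utm_*` tracking codes and the `color` parameter from the example) is an assumption that you would adapt to your own site, and the function name `normalize` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameters that never change what the page displays.
IGNORED_PARAMS = frozenset({"utm_source", "utm_medium", "utm_campaign", "color"})


def normalize(url: str, ignored: frozenset[str] = IGNORED_PARAMS) -> str:
    """Return the URL without the parameters listed in `ignored`."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Two URLs that normalize to the same string are candidates for a 301 redirect or a rel=canonical pointing to the clean version.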

Printer-friendly

Printer-friendly versions of content can cause duplicate content issues when multiple versions of the pages get indexed.
Example:
www.mywebsite.com/red-item
www.mywebsite.com/print/red-item

Session IDs

This common issue happens when each user who visits a website is assigned a different session ID that is stored in the URL.
Example:
www.mywebsite.com/red-item?SESSID=142
www.mywebsite.com/red-item

Copied or syndicated information

If you republish an article, a quote or a comment from someone you admire, or simply use it to illustrate your own articles, it will be seen as duplicate content, even if you have linked back to the source website or URL. Google will value these pieces of content poorly, and this can lead to a drop in the overall quality score of your domain.

Duplicate product information

If you own an ecommerce website, you have probably met this problem. It occurs when you use the manufacturers’ item descriptions, hosted on their websites, to describe your products. The problem is that these manufacturers may sell the product to many different sellers, so the same description appears on many different websites. This is pure duplicate content.

Sorting and multi-pages lists

An ecommerce website like Amazon offers filter options that generate unique URLs. Most categories contain a large number of product pages, whose order changes depending on how the list is sorted. For example, if you sort 30 items by price or by alphabetical order, you will end up with two pages with the same content but different URLs.
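
If you have a list of your category URLs, you can get a rough idea of how many of them differ only by their sorting options. The Python sketch below groups such URLs together; the parameter names `sort` and `order` are assumptions, since every ecommerce platform names them differently, and the function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SORT_PARAMS = {"sort", "order"}  # assumption: parameters that only reorder the list


def listing_key(url: str) -> str:
    """Return the URL without its sorting parameters, other parameters sorted by name."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in SORT_PARAMS)
    return urlunsplit(parts._replace(query=urlencode(kept)))


def group_listings(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs that show the same list of items in a different order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[listing_key(url)].append(url)
    # Keep only the groups that contain more than one URL.
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Each group returned contains URLs that are likely to display the same items, which makes them good candidates for a rel=canonical pointing to a single version of the list.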

For any questions about duplicate content, feel free to drop us a line @Oncrawl_CS