Duplicate content is one of the main issues webmasters can encounter. It refers to content that appears in more than one place, and it can exist both inside and outside of your website. It causes trouble for crawlers, since it is hard to tell which version is more relevant than the others for a given query. With user experience in mind, search engines will not display multiple identical pages and have to choose the one most likely to be the best. This leads to a loss of relevant results in search engine results pages, and therefore to a loss of traffic. Oncrawl now knows how to detect duplicate content.
Duplicate content can lead to three main issues:
- confusion about which version to index
- difficulty directing link metrics (authority, trust, anchor text, link juice) to the right page, since they can be shared between the different versions
- inability to rank the right version for queries
However, there are different types of duplicate content. Some of them will hurt your rankings whereas others are harmless. Today, we are going to focus on the ones that penalize your SEO.
What types of duplicate content are harmful?
URL issues
URL parameters like click tracking tags or analytics codes can lead to duplicate content issues, since several similar URLs end up pointing to identical pages. Google also regards www and non-www versions, .com and .com/index.html, and http and https as different pages, even when they serve the same content. Each variant is thus seen as duplicate content.
Example:
www.mywebsite.com/red-item?color=red
www.mywebsite.com/red-item
Printer-friendly
Printer-friendly versions of content can cause duplicate content issues when both versions of a page get indexed.
Example:
www.mywebsite.com/red-item
www.mywebsite.com/print/red-item
Session IDs
This common issue happens when each user who visits a website is assigned a different session ID that is stored in the URL.
Example:
www.mywebsite.com/red-item?SESSID=142
www.mywebsite.com/red-item
Copied or Syndicated Information
If you want to share an article, a quote or a comment from someone you admire, or simply to illustrate your articles, it can be seen as duplicate content, even if you have linked back to the original website or URL. Google will give this piece of content little value, and it can lead to a drop in your overall domain quality score.
Duplicate product information
If you own an ecommerce website, you have probably run into this problem. It occurs when you use the manufacturer's item description, hosted on their website, to describe your products. The problem is that the manufacturer may sell the same product to many different sellers, so the same description appears on many different websites. This is pure duplicate content.
Sorting and multi-pages lists
An ecommerce website like Amazon offers filter options that generate unique URLs. Most categories contain a large number of product pages whose order changes depending on how the list is sorted. For example, if you sort 30 items by price or alphabetically, you end up with two pages showing the same content under different URLs.
What are the best practices?
In order to avoid these duplicate content issues, there are some best practices you can follow. Most of the time, content that can be found at several different URLs should be canonicalized. This can be done using 301 redirects, rel=canonical, or the parameter handling tool in Google Webmaster Central.
301 redirect
A 301 redirect is in most cases the most relevant solution, especially for URL issues. It tells search engines which version of a page is the original and points the duplicate to the primary one. Moreover, when multiple well-ranked pages are redirected to a single one, they no longer compete with each other and instead create a stronger relevancy and popularity signal. That page is thus better ranked.
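On an Apache server, for example, this can be done with a few mod_rewrite rules in the site's .htaccess file. Here is a minimal sketch that permanently redirects the non-www version of a domain to the www one, assuming Apache with mod_rewrite enabled and www.mywebsite.com as the preferred host:

```apache
# Enable URL rewriting (requires mod_rewrite)
RewriteEngine On

# Match requests whose host is the bare domain, without www
RewriteCond %{HTTP_HOST} ^mywebsite\.com$ [NC]

# Permanently (301) redirect them to the www version,
# preserving the requested path (the query string follows automatically)
RewriteRule ^(.*)$ http://www.mywebsite.com/$1 [R=301,L]
```

The same pattern works for http-to-https redirects or for retiring old duplicate URLs: one rule per duplicate, always pointing to the canonical version.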
Rel=canonical
Rel=canonical works in much the same way as a 301 redirect, except that it is easier to implement. It can also be used for pieces of content copied from other websites: it tells search engines that you know the copied article has been intentionally placed on your website and that all the weight of that page should pass to the original one. If you need further details about how rel=canonical works, we have previously written an article on that subject.
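Concretely, rel=canonical is a single link tag placed in the head of each duplicate page, pointing to the URL that search engines should treat as the original. A sketch using the example URLs above:

```html
<!-- In the <head> of the duplicate page,
     e.g. www.mywebsite.com/red-item?color=red -->
<link rel="canonical" href="http://www.mywebsite.com/red-item" />
```

Search engines will then consolidate the ranking signals from the parameterized URL onto the canonical one.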
NoIndex, NoFollow
This combination of tags is useful for pages that should not appear in a search engine's index. Bots can crawl the page but will not index it.
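The tags are combined in a single robots meta tag in the head of the page. A minimal sketch, here applied to a printer-friendly version that should stay out of the index:

```html
<!-- In the <head> of a page such as www.mywebsite.com/print/red-item -->
<meta name="robots" content="noindex, nofollow" />
```

Use noindex alone if you still want bots to follow the links on the page.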
Parameter handling
Google Webmaster Tools offers several services. One of them lets you set a preferred domain for your site and tell Google how to handle specific URL parameters. However, this only applies to Google: your changes will not be taken into account by Bing or other search engines.
There are further methods that can be implemented:
Preferred domain
This is a very basic setting that should be implemented on every site. It simply tells search engines whether a site should be displayed with or without the www in search engine results pages.
Internal linking
Be careful with internal linking. If you decide that the canonical version of a website is www.mywebsite.com/, then all internal links should go to http://www.mywebsite.com/page.html and not to http://mywebsite.com/page.html.
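In practice this just means keeping every internal href consistent with the canonical host, as in this sketch:

```html
<!-- Consistent: points to the canonical www version -->
<a href="http://www.mywebsite.com/page.html">Red item</a>

<!-- Inconsistent: same page on the non-www host, which
     search engines may treat as a separate, duplicate URL -->
<a href="http://mywebsite.com/page.html">Red item</a>
```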
Merging content
When merging content, be sure to add a link back to the original piece.
Write unique product descriptions
It might take more time, but writing your own descriptions instead of using the manufacturer's can help you rank above the other sites whose descriptions are all duplicates.