Duplicate content causes SEO issues that can penalize your rankings. It refers to content that appears in more than one place, either within your own website or on other websites. It creates trouble for crawlers, since it is impossible to tell which piece is more relevant than the others for a given query. To keep results useful, search engines will not display multiple identical pages and are forced to choose the one most likely to be the best, which means relevant pages can drop out of search engine results and you lose traffic. Duplicate content can lead to three main issues:
- confusion about which version to index
- trouble directing link metrics (authority, trust, anchor text, link juice) to the right page, as they may be split between the different versions
- inability to rank the right version for queries
With OnCrawl, you can easily spot your clusters of duplicate and near-duplicate pages. You can also see whether the canonicals are matching, not matching, or simply not set, and filter your clusters by number of pages and by content similarity.
When you click on a specific cluster, you can access further details about the URLs concerned.
Then, if you click on a specific canonical URL, you can view which pages share similar content.
You can also check whether you have duplicated HTML tags. Here, for example, you can see that 1,606 pages have a duplicate title; clicking on the figure gives you access to the details.
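You can reproduce a rough version of this check outside of OnCrawl. Here is a minimal sketch in Python, assuming you have exported (URL, title) pairs from a crawl; the data below is hypothetical:

```python
from collections import defaultdict

# Hypothetical crawl export: (url, title) pairs.
crawled_pages = [
    ("https://www.mywebsite.com/shoes/", "Shoes | My Website"),
    ("https://www.mywebsite.com/shoes/?sort=price", "Shoes | My Website"),
    ("https://www.mywebsite.com/boots/", "Boots | My Website"),
]

# Group URLs by title; any group with more than one URL is a duplicate cluster.
clusters = defaultdict(list)
for url, title in crawled_pages:
    clusters[title].append(url)

for title, urls in clusters.items():
    if len(urls) > 1:
        print(f"duplicate title {title!r} on {len(urls)} pages:", urls)
```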
There are different types of duplicate content, however: some will hurt your rankings whereas others are harmless. Let's focus on the ones that penalize your SEO.
What are the best practices?
To avoid these duplication issues, there are some best practices you can follow. Most of the time, content that can be reached at different URLs should be canonicalized. This can be done using 301 redirects, rel=canonical, or the parameter handling tool in Google Webmaster Central.
A 301 redirect is in most cases the most relevant solution, especially for URL issues. It tells search engines which version of a page is the original and redirects the duplicates to that primary version. Moreover, when several well-ranked duplicate pages are redirected to a single one, they stop competing with each other and combine into a stronger relevancy and popularity signal, so the remaining page tends to rank better.
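In practice this is usually configured at the web server or CDN level, but as a minimal, illustrative sketch in Python (using Flask, with a hypothetical domain and routes), a 301 redirect can look like this:

```python
from flask import Flask, redirect, request

app = Flask(__name__)

CANONICAL_HOST = "www.mywebsite.com"  # hypothetical preferred domain

@app.before_request
def enforce_canonical_host():
    # Permanently redirect requests for other host variants (e.g. non-www)
    # so that a single version accumulates all the links and rankings.
    if request.host != CANONICAL_HOST:
        return redirect(request.url.replace(request.host, CANONICAL_HOST, 1), code=301)

@app.route("/old-page.html")
def old_page():
    # Permanently redirect a duplicate URL to the primary version.
    return redirect("/page.html", code=301)
```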
Rel=canonical works much the same way as a 301 redirect, except that it is easier to implement: you place a tag such as `<link rel="canonical" href="https://www.mywebsite.com/original-page/">` in the head of each duplicate page. It can also be used for pieces of content copied from other websites: it tells search engines that you know the copied article has been intentionally placed on your website and that all the weight of that page should pass to the original one. If you need further details about how rel=canonical works, we previously wrote an article on that subject.
The meta robots tag with the combined values "noindex, follow" is useful for pages which should not appear in a search engine's index: bots can crawl the pages and follow their links, but will not index them.
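If you want to verify how these two signals are set on a given page, here is a minimal sketch, assuming the `requests` and `beautifulsoup4` packages are installed; the URL is hypothetical, and OnCrawl reports the canonical side of this for you at scale:

```python
import requests
from bs4 import BeautifulSoup

def inspect_page(url):
    """Report the rel=canonical and meta robots signals of a page."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    canonical = soup.find("link", rel="canonical")
    robots = soup.find("meta", attrs={"name": "robots"})

    print("canonical:", canonical["href"] if canonical else "not set")
    print("robots:   ", robots["content"] if robots else "not set")

inspect_page("https://www.mywebsite.com/some-page/")  # hypothetical URL
```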
Google Webmaster Tools offers different services. One of them lets you set a preferred domain for your site and handle URL parameters differently. However, this only applies to Google: your changes will not be taken into account by Bing or other search engines.
There are further methods which can be implemented:
Set a preferred domain
This is a very basic setting that should be implemented on every site. It simply tells search engines whether your site should be displayed with or without the www in the search engine results pages.
Keep your internal linking consistent
Be careful when linking internally. If you decide that the canonical version of your website is www.mywebsite.com/, then all internal links should go to http://www.mywebsite.com/page.html and not to http://mywebsite.com/page.html (note the missing www). The sketch below shows a rough way to spot offending links.
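Here is a rough way to scan a page for internal links that bypass your chosen host, again assuming `requests` and `beautifulsoup4`, and using hypothetical host names:

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse

CANONICAL_HOST = "www.mywebsite.com"   # the version you chose
WRONG_HOSTS = {"mywebsite.com"}        # variants to flag

def find_inconsistent_links(page_url):
    """List links on a page that point at a non-canonical host variant."""
    html = requests.get(page_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a", href=True):
        host = urlparse(a["href"]).netloc
        if host in WRONG_HOSTS:
            print("inconsistent link:", a["href"])

find_inconsistent_links("http://www.mywebsite.com/page.html")
```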
Link back to the original content
When curating or republishing content, be sure to add a link back to the original piece.
Write unique product descriptions
It might take more time, but if you write your own descriptions instead of using the manufacturers' ones, it can help you rank above the other sites that carry the duplicated descriptions.
How to improve your content and avoid duplicate content issues?
Here are the main situations where duplicate content happens; this is what you should avoid:
URL parameters and variations
Parameters like click-tracking or analytics codes can lead to duplicate content issues, since similar URLs pointing to identical pages will compete with each other. Google treats the www and non-www versions of a domain, a page and its /index.html variant, and the http and https versions as different pages even when they serve the same content, so each pair is seen as duplicate content.
Printer-friendly pages
Printer-friendly versions of content can cause duplicate content issues when multiple versions of the same page get indexed.
Session IDs
This common issue happens when each visitor who comes to a website is assigned a different session ID that is stored in the URL.
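On the development side, a common defence against these three URL-based situations is to normalize URL variants to one canonical form before comparing or linking them. Here is a minimal sketch; the ignored parameter names are hypothetical examples rather than an exhaustive list, and it assumes https plus www is your preferred version:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Example parameters that create duplicate URLs without changing the content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "sort"}

def normalize_url(url):
    """Map duplicate URL variants onto a single canonical form."""
    scheme, host, path, query, _fragment = urlsplit(url)
    host = host.lower()
    if host.startswith("www."):            # treat www and non-www alike
        host = host[len("www."):]
    if path.endswith("/index.html"):       # fold index pages into their directory
        path = path[: -len("index.html")]
    params = [(k, v) for k, v in parse_qsl(query) if k not in IGNORED_PARAMS]
    # Assume https + www is the preferred version of the site.
    return urlunsplit(("https", "www." + host, path, urlencode(sorted(params)), ""))

print(normalize_url("http://mywebsite.com/shoes/index.html?utm_source=newsletter"))
print(normalize_url("https://www.mywebsite.com/shoes/?sessionid=42"))
# Both print: https://www.mywebsite.com/shoes/
```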
Copied or syndicated information
If you share an article, a quote, or a comment from someone you admire, or reuse content just to illustrate your articles, it can be seen as duplicate content, even if you have linked back to the original website or URL. Google will assign little value to these pieces of content, and they can lead to a drop in your overall domain quality score.
Duplicate product information
If you own an ecommerce website, you have probably run into this problem. It occurs when you use the manufacturers' item descriptions, hosted on their websites, to describe your products. The problem is that a manufacturer may sell the same product to many different sellers, so the same description appears on many different websites. This is pure duplicate content.
Sorting and multi-pages lists
Ecommerce websites like Amazon offer sorting and filter options that generate unique URLs. Most categories contain a large number of product pages whose order changes depending on how the list is sorted. For example, if you sort the same 30 items by price or alphabetically, you end up with two pages showing the same content under different URLs.
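Building on the hypothetical `normalize_url` helper sketched earlier (which drops the `sort` parameter), both sort variants collapse into a single URL, which is also the address their rel=canonical tags should point to:

```python
print(normalize_url("https://www.mywebsite.com/shoes/?sort=price"))
print(normalize_url("https://www.mywebsite.com/shoes/?sort=alphabetical"))
# Both print: https://www.mywebsite.com/shoes/
```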
For any questions about duplicate content, feel free to drop us a line @Oncrawl_CS