
Types of duplicate content e-commerce websites need to fix

April 5, 2017 - 6 min reading time - by Emma Labrador

E-commerce websites face recurring duplicate content issues, as well as thin and low-quality content. Search engines penalize these types of SEO issues heavily. In fact, Google and other search engines are becoming more and more demanding and reward high-quality, unique content. This article covers the different types of duplicate content that e-commerce websites face.

Internal duplicate content

Non-canonical URLs

Canonical URLs tell search engines which single URL should be indexed for a given piece of content. They are especially useful for tracking URLs, and for category page URLs where sorting, filtering and other functional parameters are appended to the end of the URL. Adding a canonical tag that points to the base category URL in the source code of these variants will help prevent search engines from indexing the duplicate URLs.
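As a minimal sketch, assuming a hypothetical set of parameter names (`order`, `price`, `sort` and the `utm_*` tracking tags) that only sort, filter or track without changing what the category page is about, the canonical URL can be derived by dropping those parameters and emitted as a `<link rel="canonical">` tag in the page head. The helper names are illustrative; a real CMS would usually expose its own setting for this.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed to only sort, filter or track,
# never to change which products the category page is about.
NON_CANONICAL_PARAMS = {"order", "price", "sort", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url: str) -> str:
    """Return the URL with sorting, filtering and tracking parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NON_CANONICAL_PARAMS
    ]
    # The fragment is dropped as well: it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the tag to place in the <head> of every variant of the page."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'


if __name__ == "__main__":
    print(canonical_link_tag("https://www.example.com/shoes/?order=price_asc&price=50-100"))
```

Every sorted or filtered variant of `/shoes/` then declares the base category URL as its canonical. Parameters left out of the list are kept, so deciding which filters deserve their own indexable URL stays an explicit choice.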

[Image: types of duplicate URLs on e-commerce websites]

You can also block crawling of common URL parameters in robots.txt to save crawl budget. Use this type of configuration:

User-agent: *
Disallow: /*?order=
Disallow: /*&order=
Disallow: /*?price=
Disallow: /*&price=
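Python's standard `urllib.robotparser` does simple prefix matching and does not implement the `*` and `$` wildcards that Google supports, so the sketch below hand-rolls a small matcher to check which URLs such rules would block. It assumes Google-style semantics (`*` matches any run of characters, a trailing `$` anchors the end, rules apply to the path plus query string) and deliberately ignores `Allow` rules and rule precedence. The patterns are the order and price rules written in the `/`-prefixed form.

```python
import re
from urllib.parse import urlsplit

# Order and price rules, in the "/"-prefixed form Google documents.
DISALLOW_PATTERNS = ("/*?order=", "/*&order=", "/*?price=", "/*&price=")


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end of
    the URL; everything else is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(chunk) for chunk in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_blocked(url: str, patterns=DISALLOW_PATTERNS) -> bool:
    """True if any Disallow pattern matches the URL's path and query string."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return any(pattern_to_regex(p).match(target) for p in patterns)


if __name__ == "__main__":
    for url in (
        "https://www.example.com/shoes/",
        "https://www.example.com/shoes/?order=price_asc",
        "https://www.example.com/shoes/?color=red&price=50-100",
    ):
        print(url, "blocked" if is_blocked(url) else "crawlable")
```

Pairing a `?` rule with a `&` rule for each parameter matters: a single `&order=` pattern would miss URLs where `order` is the first parameter, and a `?price=` pattern would miss it in second position.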

Duplicated URL paths

The way a CMS handles product URL structures can lead to significant duplicate content issues. For instance, say a product belongs to both category A and category B: if category directories are part of product page URLs, two different URLs are created for the same product. To avoid this type of duplication, you can use one of the following approaches:

  • Use a /product/ URL directory for all products;
  • Use root-level product page URLs;
  • Keep product URLs built on category URL structures, but give each product page a single canonical URL.
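To make the three options concrete, here is a sketch built around a hypothetical `Product` record whose `primary_category` field a merchandiser would choose. It shows how category-based paths multiply the URLs of one product, and how each option collapses them to a single address.

```python
from dataclasses import dataclass


@dataclass
class Product:
    slug: str
    categories: list[str]  # every category the product is listed in
    primary_category: str  # hypothetical: the category chosen for the canonical URL

    def __post_init__(self):
        if self.primary_category not in self.categories:
            raise ValueError("primary_category must be one of the product's categories")


def category_urls(product: Product) -> list[str]:
    """What a CMS generates when category directories are part of product URLs."""
    return [f"/{category}/{product.slug}" for category in product.categories]


def product_directory_url(product: Product) -> str:
    """Option 1: a single /product/ directory for all products."""
    return f"/product/{product.slug}"


def root_level_url(product: Product) -> str:
    """Option 2: product pages at the root of the site."""
    return f"/{product.slug}"


def canonical_category_url(product: Product) -> str:
    """Option 3: keep category URLs and declare the primary one as canonical on all of them."""
    return f"/{product.primary_category}/{product.slug}"


if __name__ == "__main__":
    shoe = Product("trail-runner-x", ["shoes/running", "sale"], primary_category="shoes/running")
    print(category_urls(shoe))  # two URLs for one product
    print(canonical_category_url(shoe))
```

Options 1 and 2 avoid the duplicates at the source; option 3 keeps the duplicate URLs but points all of them at one canonical.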

Session IDs

Session IDs help e-commerce websites track user behavior, but they also create duplicates for search engines: every session ID appended to a URL produces another copy of the main URL. The best way to avoid this is to store sessions in cookies instead of adding ID codes to each URL. You can also canonicalize session ID URLs, or block crawling of them in robots.txt as long as the CMS does not serve session IDs to search bots. For example:

User-agent: *
Disallow: /*?sid=
Disallow: /*&sid=
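The difference between the two session approaches can be sketched with Python's standard `http.cookies` module. The `sid` name matches the robots.txt rule; the helper functions are hypothetical. Session IDs in the URL yield a new address for every visitor, while a cookie keeps the page URL constant.

```python
import secrets
from http.cookies import SimpleCookie


def session_in_url(path: str, session_id: str) -> str:
    """The pattern to avoid: each session ID creates another URL for the same page."""
    return f"{path}?sid={session_id}"


def session_cookie_header(session_id: str) -> tuple[str, str]:
    """The preferred pattern: the session travels in a cookie, the URL stays clean."""
    cookie = SimpleCookie()
    cookie["sid"] = session_id
    cookie["sid"]["path"] = "/"
    cookie["sid"]["httponly"] = True
    # OutputString() renders the value plus the Path and HttpOnly attributes,
    # without the "Set-Cookie:" header name.
    return "Set-Cookie", cookie["sid"].OutputString()


if __name__ == "__main__":
    first, second = secrets.token_hex(8), secrets.token_hex(8)
    # Two visitors, two different URLs for the same category page:
    print(session_in_url("/shoes/", first))
    print(session_in_url("/shoes/", second))
    # With a cookie, the page URL is always /shoes/:
    print(session_cookie_header(first))
```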

Product review pages

Most CMSs offer built-in review features, with separate pages gathering all the reviews for a product. This can create duplicate content between product pages and review pages. Either add a canonical on each review page pointing to the main product page, or send a "noindex, follow" directive in the X-Robots-Tag header. If external websites link to your review pages and you want to keep that link equity, choose the canonical option.
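The choice between the two review-page options can be sketched as follows. `has_external_backlinks` is a hypothetical input (for example taken from a backlink report), and `PageDirectives` is an illustrative container for what a template would put in `<head>` and in the HTTP response headers.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class PageDirectives:
    head_html: str = ""  # markup to inject into the page <head>
    headers: dict[str, str] = field(default_factory=dict)  # extra HTTP response headers


def review_page_directives(product_url: str, has_external_backlinks: bool) -> PageDirectives:
    """Pick the canonical or the noindex option for a product review page."""
    if has_external_backlinks:
        # A canonical consolidates the value of links pointing at the review page.
        return PageDirectives(head_html=f'<link rel="canonical" href="{escape(product_url)}">')
    # Otherwise keep the page out of the index while letting bots follow its links.
    return PageDirectives(headers={"X-Robots-Tag": "noindex, follow"})
```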

WWW vs. non-WWW URLs

Search engines treat http://www.domain.com and http://domain.com as different addresses, so it is important to pick only one version. The best way to do so is to 301 redirect the non-preferred version to the chosen one, which avoids any duplicate content issue. You can also set a preferred domain in Webmaster Tools.
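A sketch of the redirect decision, assuming `www.example.com` is the preferred host: requests for the bare domain get a 301 to the same path and query string on the preferred host, and any other hostname is left alone. In production this usually lives in the web server or CDN configuration; the function below only illustrates the logic, and it drops any port number for brevity.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.example.com"  # hypothetical: the version you chose to keep


def preferred_redirect(url: str, preferred_host: str = PREFERRED_HOST) -> Optional[Tuple[int, str]]:
    """Return (301, location) if url uses the non-preferred host, else None."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    bare = preferred_host.removeprefix("www.")
    if host == preferred_host or host not in {bare, "www." + bare}:
        return None  # already on the preferred host, or not one of our two variants
    # Keep path and query so every page maps to its own counterpart.
    location = urlunsplit((parts.scheme, preferred_host, parts.path, parts.query, ""))
    return 301, location
```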

Internal editorial duplicate content

Category pages

Category pages are another e-commerce element that causes SEO headaches. They usually contain only a title and a product grid, so their content is thin and poor with regard to Google’s guidelines. To avoid Google penalties and maximize your chances of getting indexed, add up to 300 words at the top of each category page describing the products visitors will find there. Use this text to deep link to related sub-categories, articles or any other content that adds value to your category pages. This will enrich your internal linking and your SEO.
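To find category pages that still lack editorial text, a simple audit can count the words above the product grid. This sketch assumes each page's intro text has already been extracted into a hypothetical `intro_text` field; the 300-word default mirrors the figure in the paragraph above and is meant to be tuned.

```python
from dataclasses import dataclass

MIN_INTRO_WORDS = 300  # the article's target length; tune for your catalog


@dataclass
class CategoryPage:
    url: str
    intro_text: str  # hypothetical: editorial text shown above the product grid


def thin_category_pages(pages, min_words: int = MIN_INTRO_WORDS):
    """Return (url, word_count) for each category page whose intro is too short."""
    report = []
    for page in pages:
        count = len(page.intro_text.split())
        if count < min_words:
            report.append((page.url, count))
    return report


if __name__ == "__main__":
    pages = [CategoryPage("/shoes/", ""), CategoryPage("/bags/", "Leather and canvas bags. " * 80)]
    print(thin_category_pages(pages))
```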

Similar content between products

An e-commerce website has to handle many different products, and some of them can be very similar. This leads to nearly identical product descriptions that Google regards as near-duplicate content. Product pages should be unique, attractive and high quality, especially on websites competing for very competitive keywords.
Taking the time to write unique product descriptions can make a big difference in the battle for rankings.
Sharing short descriptions, specifications and other content between product pages makes it more likely that search engines will lower their assessment of a product page’s content quality, and then its ranking position.
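One common, generic way to measure how close two product descriptions are is to compare their sets of word n-grams ("shingles") with Jaccard similarity. The sketch below illustrates that technique only; it is not how Google evaluates duplicates, and the 3-word shingle size and 0.5 threshold are assumptions.

```python
from itertools import combinations


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Overlapping word n-grams of a description, lowercased."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i : i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles the two sets have in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(descriptions: dict[str, str], threshold: float = 0.5):
    """Yield (sku_a, sku_b, similarity) for every pair at or above the threshold."""
    sets = {sku: shingles(text) for sku, text in descriptions.items()}
    for a, b in combinations(sorted(sets), 2):
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            yield a, b, round(score, 2)
```

Two descriptions that differ only by a color word share most of their shingles and score high, while unrelated products score close to 0.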

Home page duplicate content

Your homepage is the entry point of your website and usually gets the most attention in terms of acquisition funnel, link strategies and technical optimizations. But for e-commerce websites, it is also important to make sure that unique content makes up most of the homepage and that you don’t only offer an overview of your products. Product snippets give search engines little contextual value for indexing your content and ranking your website on competitive keywords.


Off-site duplicate content

Duplicate content is an SEO concern that can occur both within a website and across different e-commerce websites. External duplicate content can make it really hard for e-commerce websites to rank on competitive keywords. Let’s look at some of the most common off-site duplicate content issues.

Manufacturer product descriptions

It is quite common to read the same product description on different e-commerce websites, because manufacturers supply one generic product description to all the websites they work with. Search engines regard these sites with identical descriptions as low quality, since they don’t offer any extra value to users. Only authoritative websites (with a lot of backlinks) have a chance to rank for such a product.
If you are facing this issue, you will need to take the time to rewrite your existing product descriptions and make sure new products also get unique texts. This is undoubtedly time-consuming, but every SEO improvement helps increase your rankings.
At a minimum, write unique content for your top products or best sellers, in other words the products you want to rank first. Also, focus your unique content on products that will stay on your website for their full lifespan, not on products that will be removed in the next few weeks.
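The prioritization above can be sketched as a small queue: keep the products whose description is still identical to the manufacturer's text, skip those scheduled for removal soon, and sort the rest by sales. The `CatalogItem` fields, the `manufacturer_texts` lookup and the 60-day cutoff are all hypothetical; the exact string comparison could be swapped for a similarity score.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CatalogItem:
    sku: str
    description: str
    monthly_sales: int
    removal_in_days: Optional[int] = None  # None: no planned removal


def rewrite_queue(
    items: List[CatalogItem],
    manufacturer_texts: Dict[str, str],
    min_days_left: int = 60,
    limit: Optional[int] = None,
) -> List[str]:
    """SKUs still using the manufacturer's description, best sellers first."""

    def uses_manufacturer_text(item: CatalogItem) -> bool:
        original = manufacturer_texts.get(item.sku)
        return original is not None and item.description.strip() == original.strip()

    def stays_long_enough(item: CatalogItem) -> bool:
        return item.removal_in_days is None or item.removal_in_days >= min_days_left

    candidates = [i for i in items if uses_manufacturer_text(i) and stays_long_enough(i)]
    candidates.sort(key=lambda i: i.monthly_sales, reverse=True)
    return [i.sku for i in candidates[:limit]]
```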

Product feeds

Some e-commerce websites also list their products on third-party shopping websites to increase their chances of conversion. But this strategy also creates off-site duplicate content.
The irony does not stop there. Most of the time, these third-party websites are more authoritative than the main website itself. Take Amazon, which sells billions of products from third-party sellers. In terms of revenue strategy, this is great because the product gets better exposure, but in terms of SEO, your e-commerce website’s search traffic is likely to suffer: between two identical versions, Google will favor the more authoritative result.
The solution is simple: make sure the descriptions on your own website differ from those used on third-party websites. To save time, you could keep the manufacturer description for the third-party websites.

Staging and test websites

I know this can sound crazy, but staging websites sometimes get indexed and found by search engines, creating an exact duplicate of your live site. Don’t panic, easy solutions exist:

  • Add a “noindex, nofollow” meta robots tag to every page on the staging site;
  • Block search engine crawlers from crawling the site with a “Disallow: /” rule in the /robots.txt file of the staging site;
  • Add the staging site as a separate property in Google Webmaster Tools and use the “Remove URLs” tool to get the entire staging site out of Google’s index;
  • Password-protect the staging site to prevent search engines from crawling it.

If search engines have already indexed your staging site, these are the solutions that will give the best results.
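Two of these measures, the password and a robots directive, can be enforced in one place at the application layer. This WSGI middleware sketch requires HTTP Basic authentication and adds an `X-Robots-Tag: noindex, nofollow` response header, the HTTP-header counterpart of the meta robots tag. The `StagingGuard` name and the credentials are placeholders, and it does not replace the robots.txt rule or the removal request.

```python
import base64
import secrets

ROBOTS_HEADER = ("X-Robots-Tag", "noindex, nofollow")


class StagingGuard:
    """WSGI middleware: HTTP Basic auth plus a noindex, nofollow header on every response."""

    def __init__(self, app, username: str, password: str):
        self.app = app
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        self.expected = f"Basic {token}".encode("latin-1")

    def __call__(self, environ, start_response):
        # WSGI exposes header values as latin-1 decoded strings, so this round-trips safely.
        supplied = environ.get("HTTP_AUTHORIZATION", "").encode("latin-1")
        if not secrets.compare_digest(supplied, self.expected):
            start_response(
                "401 Unauthorized",
                [
                    ("WWW-Authenticate", 'Basic realm="staging"'),
                    ("Content-Type", "text/plain"),
                    ROBOTS_HEADER,
                ],
            )
            return [b"Authentication required"]

        def start_with_robots(status, headers, exc_info=None):
            # Authenticated responses carry the header too, in case credentials leak.
            return start_response(status, list(headers) + [ROBOTS_HEADER], exc_info)

        return self.app(environ, start_with_robots)
```

Usage would look like `application = StagingGuard(application, "team", "change-me")`, with the real credentials loaded from deployment configuration rather than source code.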


Oncrawl has developed a powerful duplicate and near-duplicate content tool that detects all types of internal duplicate content.

You can also filter your duplicate content by page cluster and by content similarity.

We were the first to provide a near-duplicate content detector based on the Simhash method, built specifically to tackle one of the most common SEO issues on e-commerce websites.
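For readers curious about the technique, here is a generic Simhash fingerprint in the style described by Charikar: each token is hashed, each hash bit votes +1 or -1, and the sign of each bit's total becomes the fingerprint bit. Texts whose fingerprints differ in only a few bits are likely near duplicates. This is not Oncrawl's implementation; the word tokenizer, the equal token weights and the BLAKE2b-based 64-bit hash are assumptions.

```python
import hashlib
import re

BITS = 64


def token_hash(token: str) -> int:
    """64-bit hash of one token (BLAKE2b with an 8-byte digest)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(text: str) -> int:
    """Simhash fingerprint of a text, with every word token weighted equally."""
    votes = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        h = token_hash(token)
        for bit in range(BITS):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    a = simhash("Soft cotton t-shirt with a classic crew neck, available in red.")
    b = simhash("Soft cotton t-shirt with a classic crew neck, available in blue.")
    print(f"{a:016x} {b:016x} distance={hamming_distance(a, b)}")
```

Because single words are used as tokens, word order is ignored; hashing word shingles instead would make the fingerprint order-sensitive. The distance threshold for calling two pages near duplicates is a tuning choice.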

Emma Labrador
Emma was the Head of Communication & Marketing at Oncrawl for over seven years. She contributed articles about SEO and search engine updates.