OnCrawl is excited to present our duplicate content lab. Our R&D team is working on a new way of looking at duplicate and unique content on your site that will give you a more accurate way of approaching your editorial strategy.
Content is still one of the three most important ranking factors, and Google encourages websites to deliver insightful, unique and descriptive content to their visitors to offer the best user experience.
But not all content on a page carries the same weight. Google has always been pretty good, and is getting even better, at separating boilerplate (structural content such as your header, footer, navigational menus, and other repeating content) from the meat of the page.
In short, Google generally ignores the text of your template and ranks only your main content. This is why OnCrawl’s new experimental lab analyzes content block by block instead of relying on page-level word count.
Once we are done crawling your website, each web page is split into smaller blocks of text. A block of content is a set of words that occur together within a single HTML node, such as anchor text, a paragraph, or an item in a bullet list.
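To make the idea of a block concrete, here is a minimal sketch of splitting a page into blocks with Python's standard `html.parser`. The list of tags in `BLOCK_TAGS` and the names `BlockExtractor` and `extract_blocks` are assumptions made for illustration, not OnCrawl's actual implementation. Text is attributed to the innermost block-level node that contains it.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries. This list is an assumption for the sketch.
BLOCK_TAGS = {"p", "li", "a", "h1", "h2", "h3", "td"}


class BlockExtractor(HTMLParser):
    """Collects (tag, text) pairs, one per block-level HTML node."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished blocks, in closing order
        self._stack = []   # currently open block tags with their text parts

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._stack.append((tag, []))

    def handle_data(self, data):
        # Text belongs to the innermost open block, if any.
        if self._stack:
            self._stack[-1][1].append(data)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1][0] == tag:
            tag, parts = self._stack.pop()
            text = " ".join("".join(parts).split())  # collapse whitespace
            if text:
                self.blocks.append((tag, text))


def extract_blocks(html):
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


page = "<ul><li>Home</li><li>About</li></ul><p>Unique   article text.</p>"
print(extract_blocks(page))
```

This sketch assumes well-formed markup with explicit closing tags; real-world HTML would need a more forgiving parser.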
For each block, we calculate a uniqueness quotient and an occurrence ratio across the whole of your website. We continue to use the same algorithms that Google does, notably the Simhash algorithm, which allows us to compute degrees of similarity.
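For readers curious about how Simhash yields a degree of similarity, the sketch below shows the general technique: each word votes on every bit of a 64-bit fingerprint, and two fingerprints are compared by counting the bits that differ. The word-level tokenization, the choice of BLAKE2b as the underlying hash, and the function names are assumptions for illustration; OnCrawl's own parameters are not described here.

```python
import hashlib

BITS = 64  # fingerprint width (assumed for this sketch)


def simhash(text):
    """Return a 64-bit Simhash fingerprint of the words in `text`."""
    weights = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions where the two fingerprints differ."""
    return bin(a ^ b).count("1")


def similarity(a, b):
    """1.0 for identical fingerprints, lower as more bits differ."""
    return 1 - hamming_distance(a, b) / BITS


fp1 = simhash("Free delivery on all orders over 50 euros")
fp2 = simhash("Free delivery on all orders over 40 euros")
print(similarity(fp1, fp2))
```

Because similar texts share most of their words, most votes agree and the fingerprints differ in only a few bits, which is what makes near-duplicate blocks detectable without comparing the texts word by word.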
Using blocks of content, we can then identify the main content on a page: the content that is the least duplicated. This helps OnCrawl provide answers to the following questions:
Because content blocks allow us to focus on unique content only, you can now look at a page’s uniqueness in relation to other pages on your website, and find the pages that contain too little unique content.
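As a rough illustration of the reasoning above, the sketch below computes an occurrence ratio for each block (the share of pages it appears on), counts the words a page holds in rarely repeated blocks, and flags pages that fall under a minimum. Everything here is an assumption for the example: blocks are matched by exact text rather than by similarity, and the `max_ratio` and `min_unique_words` values are placeholders, not OnCrawl's actual definitions.

```python
from collections import Counter


def occurrence_ratios(pages):
    """pages maps URL -> list of block texts. Returns block -> share of pages containing it."""
    counts = Counter()
    for blocks in pages.values():
        counts.update(set(blocks))  # count each block once per page
    total = len(pages)
    return {block: n / total for block, n in counts.items()}


def unique_word_counts(pages, max_ratio):
    """Words per page found in blocks whose occurrence ratio is at most max_ratio."""
    ratios = occurrence_ratios(pages)
    return {
        url: sum(len(b.split()) for b in blocks if ratios[b] <= max_ratio)
        for url, blocks in pages.items()
    }


def thin_pages(pages, max_ratio, min_unique_words):
    """URLs whose unique-block word count is below min_unique_words."""
    counts = unique_word_counts(pages, max_ratio)
    return sorted(url for url, n in counts.items() if n < min_unique_words)


site = {
    "/a": ["Home About Contact", "alpha beta gamma delta"],
    "/b": ["Home About Contact", "one two"],
    "/c": ["Home About Contact", "x y z"],
}
print(unique_word_counts(site, max_ratio=0.5))
print(thin_pages(site, max_ratio=0.5, min_unique_words=3))
```

In this toy site, the navigation block appears on every page and is excluded from the count, so only the remaining words decide whether a page is thin.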
In the Data Explorer, you can now examine the number and percentage of words on the page per type of block:
These metrics are also available for segmenting your pages.
In the crawl report, a new dashboard is available in the sidebar: Text block analysis. The charts in this dashboard give you an overview of how your site’s content breaks down by uniqueness quotient.
These charts can also be used in custom dashboards.
Which pages still have thin content once we remove templates and boilerplate content? Check the number of pages with under 300 words in unique blocks, regardless of the total number of words on the page. These pages have very little main content to offer, even when their total word count exceeds 1,200 words:
Compare word count in unique blocks to overall page word count. Some pages with a low word count may still contain significantly more unique content than much longer pages, such as the pages in the first column on this site:
Evaluate the uniqueness of each page by examining the proportion of its words found in each type of block. This helps answer questions such as:
Understand how many words are unique per page, and how that distribution plays out across other pages. This provides answers to questions such as:
And analyze uniqueness by depth and by page group:
This new analysis comes with a visual overlay for each page crawled by OnCrawl.
The content overlay illustrates your content’s uniqueness by highlighting each block of HTML content on your web page using a color corresponding to its uniqueness.
OnCrawl uses the source code viewed by the bot at the time of the crawl, and overlays the uniqueness analysis for each block on the HTML source.
By hovering over a content block, you can view information such as:
This analysis can reveal sections of pages where content is copied and pasted, or where editorial policy has used copywriting templates without developing them. Conversely, it can also show how pages with little content manage to include originality without increasing their word count.
Go beyond word count when looking into content quality.
OnCrawl’s experimental new metrics are designed to allow deep analysis of editorial strategy:
Our R&D team aims to let you explore your content in depth and from a new angle. We hope you will enjoy playing with this new data and that it will help you take your editorial strategy to the next level.
Contact us to request access to this experimental lab, and feel free to send us your feedback using the in-app chat box.