OnCrawl is excited to announce that all users have been progressively migrated to an all-new crawler during the months of February and March 2019. Our favorite aspect of the new OnCrawl crawl technology? The out-of-this-world speed.
We’ve also taken advantage of this core update to include a constellation of new elements, including real-time crawl monitoring details and expanded data for link analysis and non-indexable pages.
This major improvement to the OnCrawl crawler produces identical crawl results. All dashboards, features, and available data are fully supported by the new crawler.
This allows you to seamlessly compare new crawls to old ones without worrying about skewed data or requiring adjustments.
There are no differences in the OnCrawl dashboards and charts. Their appearance and the calculations used to produce them have not changed. The change in crawler does not produce different numbers in the charts.
Your crawls now get off the ground with a shorter countdown and a more powerful blast-off.
Not only does the crawl initialization phase take significantly less time, but you can also use the improved crawl monitoring page to see what’s happening during this period, even before the first pages are crawled.
On the crawl monitoring page, you can follow the progression of the different pre-crawl tasks, and then track precise numbers, refreshed in real time, that let you know exactly how (and what) the crawler is doing.
Your crawls also get a continuous speed boost from start to finish. Larger sites will see enormous reductions in total crawl time, but even sites with fewer than a thousand pages can see an 8x increase in crawl speed.
Not only do we crawl each page faster, but we’ve improved global crawl speeds through improvements in three key areas:
The new crawler also takes far less time to pause or stop a crawl. With the old crawler, because of how data was processed and stored, a crawl could take a while to pause after you pressed the pause button. With the new crawler, it’s nearly instantaneous.
If you need to pause or cancel a crawl, you can now do it in a matter of seconds at most.
OnCrawl’s crawl monitoring interface received a facelift before the release of the new crawler. With the new crawler, however, there’s a galaxy of differences in crawl monitoring.
The upgrade includes:
The new OnCrawl crawler pulls information for all pages in HTML format, regardless of their indexability or their HTTP status code.
Previously, page data was only available for indexable pages with a 200 status. The new information is available in the Data Explorer and URL Details tools, as it generally concerns only pages that are not included in OnCrawl dashboards and charts. This includes the word count, title, meta description, headings (H1-HN), n-grams, Open Graph and Twitter Card data, and more.
In the URL Details, under “View source”, the full source code, including the complete page headers, is available for all pages.
Pulling data for additional pages also helps OnCrawl find more links and behave more like Googlebot. For example, OnCrawl previously disregarded the content of pages with a 3xx status, despite evidence that Google may follow links on this type of page. OnCrawl is now able, like Google, to crawl links on pages with a status other than 200.
The fact that additional links are found and additional pages are analyzed means that the total number of pages in the Data Explorer may be higher than the numbers you previously saw for the same site. For example, on a redirected page, we might now discover links to new pages. Even if these new pages happen to be non-indexable, they’ll be counted in the total number of pages in the Data Explorer.
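To make the effect on page counts concrete, here is a minimal sketch of link discovery on a redirected page. It is not OnCrawl’s implementation: the `LinkCollector` class, the `extract_links` function, and the `only_200` flag are hypothetical names used for illustration, and the sketch only looks at `<a href>` links in the response body.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (illustrative only)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the URL of the crawled page.
                self.links.append(urljoin(self.base_url, href))


def extract_links(url, status, html, only_200=False):
    """Return the links found in a response body.

    only_200=True mimics the older behaviour described above (hypothetical
    flag): bodies of pages with a status other than 200 are ignored.
    """
    if only_200 and status != 200:
        return []
    collector = LinkCollector(url)
    collector.feed(html)
    return collector.links


# A 301 response whose body still contains a link.
body = '<html><body><a href="/new-page">This page has moved</a></body></html>'
old_behaviour = extract_links("https://example.com/old", 301, body, only_200=True)
new_behaviour = extract_links("https://example.com/old", 301, body)
```

In this sketch, `old_behaviour` is empty while `new_behaviour` contains `https://example.com/new-page`, which is the kind of newly discovered URL that can raise the page total in the Data Explorer.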
We’ve improved the crawler’s ability to identify and follow links leading away from a page. OnCrawl’s crawler now takes the following links into account:
We’ve added fields related to the OnCrawl bot in the Data Explorer:
We’ve also improved reporting for certain fields that used to report default values, which gives you more precise information for all of your pages. For example, the Metarobots field in the Data Explorer previously defaulted to indicating that robots were authorized when the page provided no instruction to the contrary. Now, the Metarobots field displays the actual value of the page’s meta robots tag (the content attribute of <meta name="robots">). If the tag is missing, the field is left blank.
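The change in the Metarobots field can be sketched as follows. This is a minimal illustration under assumptions, not OnCrawl’s code: `MetaRobotsParser` and `meta_robots_field` are hypothetical names, and the sketch reads only the first meta robots tag found in the HTML.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Records the content of the first <meta name="robots"> tag, if any."""

    def __init__(self):
        super().__init__()
        self.value = None  # None means no meta robots tag has been seen

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.value is not None:
            return
        # Attribute values can be None (e.g. <meta name>), so normalise them.
        attr_map = {name: (val or "") for name, val in attrs}
        if attr_map.get("name", "").strip().lower() == "robots":
            self.value = attr_map.get("content", "").strip()


def meta_robots_field(html):
    """Return the actual meta robots value, or an empty string if the tag is absent.

    No default such as "index, follow" is substituted when the tag is missing.
    """
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.value if parser.value is not None else ""


with_tag = '<head><meta name="robots" content="noindex, follow"></head>'
without_tag = "<head><title>No robots tag here</title></head>"
```

Here `meta_robots_field(with_tag)` returns the tag’s literal content, while `meta_robots_field(without_tag)` returns an empty string instead of an assumed default.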
The new crawl scheme will be rolled out progressively to all new and existing users without any action required on your part.
If, however, you’re still using the old version and want to jump to the front of the upgrade line, please feel free to reach out to your account representative or to contact us via the blue Intercom button at the bottom right of the screen when you’re signed in to the application.
Interested in faster-than-light crawling with the strength and flexibility of a technical SEO data platform? If you’re not an OnCrawl user yet, start your free trial today.