What is Data Ingestion in OnCrawl?

Data ingestion allows you to add any additional data to analyses in our application. Once ingested, this data appears in the Data Explorer and can be used to create segmentations, which means you can generate charts based on the external data you have added to the tool.

Let’s take a simple example: an e-commerce site would like to sort its URLs based on the sales margin percentage of its products. Its SEO team can ingest the sales margin data.
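For illustration, the ingested file can be as simple as a CSV with a column of URLs, which lets OnCrawl match each row to a crawled page, plus the custom field. The sales_margin column name here is just an assumption for this example:

```
url,sales_margin
https://www.example.com/products/red-shoes,42
https://www.example.com/products/blue-hat,17
```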

Integrating this type of third-party data is explained in this article.

Ingesting position data in OnCrawl

I like to make sure I know search volumes, best keywords and current average positions before making decisions for a website.

The following procedure is based on using SEMRush to track these metrics, but you can adapt it to whatever other rank and keyword tracking tool you use.

From your SEMRush account, you’ll need to export the data to be ingested by OnCrawl:

  • URLs
  • Keywords
  • Search volumes
  • Positions in Google

This export provides you with the full set of data for the above metrics. For the rest of this article, we’ll concentrate only on a single keyword per URL.

To obtain a single keyword per URL (a scripted version of these steps follows the list), you’ll need to:

  • Sort by “best keyword” criteria
  • In another tab, list all of your URLs and use a VLOOKUP function to obtain the data for the first keyword for each URL
  • Delete excess lines
  • Save the new sheet as a CSV
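If you prefer to script these spreadsheet steps, here is a minimal pandas sketch. The file name and the column names (URL, Keyword, Search Volume, Position) are assumptions; adapt them to your actual SEMRush export:

```python
import pandas as pd

# Load the SEMRush export; adjust the file and column names to your data.
df = pd.read_csv("semrush_export.csv")

# Put the "best" keyword first for each URL:
# highest search volume, then best (lowest) position.
df = df.sort_values(["URL", "Search Volume", "Position"],
                    ascending=[True, False, True])

# Keep only the first keyword per URL -- the VLOOKUP and deduplication steps.
best = df.drop_duplicates(subset="URL", keep="first")

# Save the one-keyword-per-URL sheet as a CSV, ready for ingestion.
best.to_csv("positions_for_ingestion.csv", index=False)
```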

Next, add the modified CSV to a ZIP file (a scripted version of this step follows the list) and select it in OnCrawl:

  • In the crawl settings, scroll down to “Analysis”
  • Expand the “Data ingestion” section
  • Tick the box to activate ingestion
  • Click on “Upload files” to add your file
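If you want to script the packaging step as well, Python’s standard library is enough. The file names are the placeholders used above:

```python
import zipfile

# OnCrawl takes ingestion files in a ZIP archive: wrap the prepared CSV.
with zipfile.ZipFile("positions_for_ingestion.zip", "w",
                     compression=zipfile.ZIP_DEFLATED) as archive:
    archive.write("positions_for_ingestion.csv")
```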

Once the crawl with this ingestion has finished, you can check whether your data has been correctly taken into account in the Data Explorer. Your data will be listed in columns named “User data: [CSV column name]”.

To check that the ingested fields are present, add the columns from your file to whichever report you want.
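For a quick programmatic check, you can also export the report from the Data Explorer and look for the prefixed columns. This sketch assumes a CSV export using the naming convention above:

```python
import pandas as pd

# Load a Data Explorer export and list the ingested columns.
export = pd.read_csv("data_explorer_export.csv")
ingested = [col for col in export.columns if col.startswith("User data:")]
print(ingested)  # e.g. ['User data: Keyword', 'User data: Search Volume', ...]
```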

Analyzing position data in OnCrawl

Now that the data are correctly ingested and are accessible via the crawl report interface, they are ready to be used for various interesting types of analysis.

Tracking important keywords

When you generate a report based on your site’s metrics, it’s very easy to add a column for the high-volume keywords that you’re tracking. You can then filter to concentrate on your main keyword families.
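As a sketch of such a filter on an export, here is how you might isolate one high-volume keyword family. The volume threshold and the keyword stem are assumptions, as is the url column name:

```python
import pandas as pd

report = pd.read_csv("data_explorer_export.csv")

# Keep only high-volume keywords belonging to one keyword family.
mask = (
    (report["User data: Search Volume"] >= 1000)
    & report["User data: Keyword"].str.contains("shoes", case=False, na=False)
)
family = report[mask]
print(family[["url", "User data: Keyword", "User data: Search Volume"]])
```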

Filtered reports and reports with additional columns can, as always, be exported.

Establishing the profile of a page that gets the best positions

For the keywords that attract the highest search volumes, it is extremely revealing to look at the characteristics of the pages that rank. Based on the trends you find, you can often build a profile of the type of page that ranks well for highly competitive keywords.

For this, you’ll want to look at average values for the “classic” metrics (a scripted comparison follows the list), such as:

  • Page depth: where are these pages positioned within your site structure?
  • Number of internal links: how many links point to these pages?
  • Duplication status: if your site has duplication issues, do pages with high search volume keywords have systematically fewer duplicated elements?
  • Number of words: although word count is not a ranking factor, content length influences user experience. What word count range do these pages fall into?
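Here is a sketch of this profiling on an export, assuming the crawl metrics sit alongside the ingested fields. The depth, inlinks and word_count column names, and the volume threshold, are assumptions:

```python
import pandas as pd

report = pd.read_csv("data_explorer_export.csv")

# Flag pages that rank for high-volume keywords.
report["high_volume"] = report["User data: Search Volume"] >= 1000

# Compare average "classic" metrics between the two groups of pages.
profile = report.groupby("high_volume")[["depth", "inlinks", "word_count"]].mean()
print(profile)
```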

For a more in-depth analysis, cross-analysis is extremely useful (a log-data sketch follows the list):

  • Cross-analysis with backlinks: does the backlink profile of pages ranked for keywords with high search volumes differ from that of other pages?
  • Cross-analysis with log data: is googlebot behavior different on these pages?
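For the log-file cross-analysis, the idea is to join googlebot hit counts onto the same table. A minimal sketch, assuming you have per-URL bot hits in a separate file with url and bot_hits columns:

```python
import pandas as pd

report = pd.read_csv("data_explorer_export.csv")
logs = pd.read_csv("googlebot_hits.csv")  # assumed columns: url, bot_hits

# Join bot activity onto the crawl report by URL.
merged = report.merge(logs, on="url", how="left").fillna({"bot_hits": 0})

# Does googlebot visit pages ranking for high-volume keywords more often?
merged["high_volume"] = merged["User data: Search Volume"] >= 1000
print(merged.groupby("high_volume")["bot_hits"].mean())
```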

Defining priority pages

It goes without saying that a page in position 1 for a high-volume keyword can attract far more visitors than a page in position 1 for a low-volume keyword.

URLs that appear on the first two pages of search results are good candidates for testing optimizations that aim to win a better position.

At this point, custom segmentations can be useful.
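For example, you can bucket URLs by the SERP page they currently appear on. Segmentations are built in the OnCrawl interface, but the underlying logic boils down to something like this sketch, where the column name and page boundaries are assumptions:

```python
import pandas as pd

def serp_page(position):
    """Bucket an average Google position into a SERP page segment."""
    if pd.isna(position):
        return "Not ranking"
    if position <= 10:
        return "SERP page 1"
    if position <= 20:
        return "SERP page 2"
    return "SERP page 3+"

report = pd.read_csv("data_explorer_export.csv")
report["segment"] = report["User data: Position"].apply(serp_page)
print(report["segment"].value_counts())
```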

To take this analysis even further, it is interesting to cross-reference where a page ranks with other metrics, such as its depth within the site structure.

For example, on this website, we can see that the URLs on the first page of the SERPs are located at relatively deep levels in the structure.

What’s more, a significant number of URLs on the first two pages of the SERPs have little to no Inrank, an indicator of site-internal popularity similar to PageRank.
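To reproduce this kind of reading on your own export, a simple group-by works well. The depth and inrank column names are assumptions:

```python
import pandas as pd

report = pd.read_csv("data_explorer_export.csv")

# Bucket positions into SERP pages (same logic as the segmentation sketch).
report["serp_page"] = pd.cut(report["User data: Position"],
                             bins=[0, 10, 20, float("inf")],
                             labels=["page 1", "page 2", "page 3+"])

# Average depth and Inrank per SERP page segment.
print(report.groupby("serp_page", observed=True)[["depth", "inrank"]].mean())
```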

For this site, working on internal linking structure would most likely be a profitable optimization.

Best practices for position tracking in OnCrawl

Follow the basic principles of keyword tracking:

  • Exclude branded keywords from your analyses (a filtering sketch follows this list)
  • Exclude long tail keywords that are not pertinent for your industry
  • Focus on keywords with high search volumes
  • Start by optimizing pages where small modifications can have a large impact
  • Segment your website in a way that lets you reveal trends
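Two of these filters, branded keywords and search volume, are easy to automate when preparing your export. A sketch, where the brand terms and the volume threshold are obviously assumptions:

```python
import pandas as pd

df = pd.read_csv("semrush_export.csv")

# Drop branded queries before analysis (the brand terms are placeholders).
brand_terms = ["oncrawl", "mybrand"]
pattern = "|".join(brand_terms)
df = df[~df["Keyword"].str.contains(pattern, case=False, na=False)]

# Focus on keywords with meaningful search volume.
df = df[df["Search Volume"] >= 500]
df.to_csv("semrush_filtered.csv", index=False)
```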

Tracking position data is worthwhile for everyone and for all websites, no matter their size, their construction, or their industry.

If you’re not an OnCrawl user yet, you can test tracking your position data during a free trial.