Scrape your website’s data with our Custom Fields

January 26, 2017 - 3 min reading time - by Emma Labrador

Were you looking for a handy way to scrape, collect and organize any data from your website? Our brand new feature lets you extract any piece of content from your pages. Build your own filters with our Custom Fields and find them directly in your Data Explorer.

Why should you use our Custom Fields?

Our ‘Custom Fields’ feature offers different actionable use cases:

  • You can collect any product price or rating on a page;
  • You can collect the number of comments on an article or the number of ad formats on a page;
  • You can verify that your analytics tagging plan or advertising tools are correctly implemented;
  • You can list similar or complementary products displayed on a page.

The possibilities are unlimited, and the use cases above are just a few examples.

How to use Custom Fields?

Our Custom Fields can be set in your crawl settings:

  1. Set up your extraction rules;
  2. Launch your crawl;
  3. Find your newly created fields in the Data Explorer.

You can create as many fields as you want.

How to set up your Custom Fields?

1- Choose your expression type

We support two kinds of expressions: basic regular expressions (see the guide) or XPath expressions (see the guide). This choice is important because it determines how the rule will be processed.

1.a Using a basic regular expression

Sample: <meta itemprop="ratingValue" content="4.5">
Rules: <meta itemprop="ratingValue" content="([0-9]+(\.[0-9]*)?)">
Output: 4.5
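
As a rough illustration, here is how the same pattern behaves in plain Python (a local sketch, not how our crawler is implemented); the markup and the capture group come from the sample above:

    import re

    # Sample HTML from the example above
    html = '<meta itemprop="ratingValue" content="4.5">'

    # Same rule: capture an integer with an optional decimal part
    pattern = r'<meta itemprop="ratingValue" content="([0-9]+(\.[0-9]*)?)">'

    match = re.search(pattern, html)
    if match:
        print(match.group(1))  # -> 4.5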

1.b Using an XPath expression

Sample: <meta itemprop="ratingValue" content="4.5">
Rules: string(//meta[@itemprop='ratingValue']/@content)
Output: 4.5
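
Again purely as an illustration, the same XPath expression can be tested locally with the lxml library (assumed here only for the sketch); the sample <meta> tag is wrapped in a minimal page:

    from lxml import html

    # Minimal page wrapping the sample <meta> tag from above
    page = html.fromstring(
        '<html><head><meta itemprop="ratingValue" content="4.5"></head><body></body></html>'
    )

    # string(...) collapses the matching @content attribute into a plain string
    rating = page.xpath('string(//meta[@itemprop="ratingValue"]/@content)')
    print(rating)  # -> 4.5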

2- Choose the type of extraction

  • Mono-valued: Returns the first matching result.

This extraction type is perfect for extracting a product’s price or rating.

  • Multi-valued: Returns all matching results.

This one can be used to extract a list of similar products.

  • Check if exists: Returns true if the expression was found inside the page, false otherwise.

This type of extraction is well suited to checking that analytics or advertising tags are in place.

  • Length: Returns the length of the matched character string.
  • Number of occurrences: Returns the number of times the pattern has been found.

This rule is perfect for counting the number of comments on an article or the number of ads on a page.
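
To make the difference between these extraction types concrete, here is a small Python sketch (the page snippet, the comment pattern and the analytics URL are invented for the illustration; this is not our crawler’s code):

    import re

    # Hypothetical page body, used only for this illustration
    html = """
    <div class="comment">Nice article!</div>
    <div class="comment">Very useful, thanks.</div>
    <script src="https://analytics.example.com/tag.js"></script>
    """

    pattern = r'<div class="comment">(.*?)</div>'
    matches = re.findall(pattern, html)

    mono_valued = matches[0] if matches else None                     # first matching result
    multi_valued = matches                                            # all matching results
    check_if_exists = bool(re.search(r'analytics\.example\.com', html))  # tag present?
    length = len(matches[0]) if matches else 0                        # length of the matched string
    number_of_occurrences = len(matches)                              # how many comments were found

    print(mono_valued)            # -> Nice article!
    print(multi_valued)           # -> ['Nice article!', 'Very useful, thanks.']
    print(check_if_exists)        # -> True
    print(length)                 # -> 13
    print(number_of_occurrences)  # -> 2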

3- Choose the field format

Field formats are important because they enable query operators in our OQL (Oncrawl Query Language) as well as sorting values in the Data Explorer tables.
Note: for some extraction types, this choice is disabled: ‘Check if exists’ forces the field to be a boolean, while ‘Length’ and ‘Number of occurrences’ both force it to be an integer.

  • Value: Raw value extraction – the content is stored inside a character string and lets you use string operators such as ‘starts with’, ‘contains’, ‘does not contain’, ‘is’ and ‘is not’ in the Data Explorer.
  • Number: Raw value is cast to an integer – the content is stored as a number, which lets you use operators such as ‘equals’, ‘does not equal’, ‘is greater than’ or ‘is lower than’, and so on, in the Data Explorer.
  • Decimal: Same as Number, except that the value is cast to a floating-point number.
  • Formatted value: Captures different groups from the pattern and allows you to format them in the character string of your choice. Groups are numbered from {0} to {9} depending on the number of capture groups inside the pattern. No more than 10 capture groups are supported.

Sample: <strong class="product-price">249<sup>€99</sup></strong>
Rules: <strong[^>]+>\s*([0-9]+)€([0-9]+)\s*
Field format: Formatted value
Formatted value: {0}.{1}€
Output: 249.99€
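
Reproduced in plain Python for illustration (the pattern is slightly adjusted in this sketch to match the literal <sup> tag in the raw markup), the two capture groups are substituted into the template just like {0} and {1} above:

    import re

    # Sample price markup from the example above
    html = '<strong class="product-price">249<sup>€99</sup></strong>'

    # Group 1 = euros, group 2 = cents; the <sup> tag is matched explicitly in this sketch
    match = re.search(r'<strong[^>]+>\s*([0-9]+)<sup>€([0-9]+)', html)

    if match:
        # Equivalent of the 'Formatted value' template {0}.{1}€
        print('{0}.{1}€'.format(match.group(1), match.group(2)))  # -> 249.99€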

[Screenshot: Oncrawl Custom Fields setup]

4- Name fields

You need to give each newly created field a name so that you can easily find it in the Data Explorer.

5- Test the rule

You can test the rule directly by hitting the ‘Check’ button against a sample of different pages, or by pasting a piece of HTML code, to make sure everything works as expected.

6- Use your Custom Fields

Then, go to your Data Explorer, click on ‘add columns’ and select the Custom Field you have created.

[Screenshot: custom field filter]

[Screenshot: custom field in the Data Explorer]

You can also filter your URLs directly by your Custom Fields. Select ‘Set your filter’ and the Custom Field you’ve just created. Then, define your query (‘True’ or ‘False’ here) and hit ‘Apply Filters’.

[Screenshot: Data Explorer and Custom Fields in Oncrawl]

Your URLs are then filtered to show only those matching the requested Custom Field:

[Screenshot: custom field – soldout is true]

You are now ready to play with your new filters!

Emma Labrador
Emma was the Head of Communication & Marketing at Oncrawl for over seven years. She contributed articles about SEO and search engine updates.