Technical SEO _ OnCrawl blog

The schema.org semantic markup, a scheduled revolution

February 11, 2016 - 5 min reading time - by Erlé Alberton

For decades, the internet has gone through major transformations. Some were imposed by search engines; others, more subtle, emerged naturally from the creativity of certain communities. Between the 1990s and today, we shifted from an internet reserved for a few PhDs to a giant web accessible to everyone, anywhere and at any time. Our web holds billions of documents, each different from the others. Google is said to have indexed more than 30,000 billion pages and to process more than 3.3 billion queries per day, around 15% of which are new.

Such volumes are very hard to classify, even though search engines have greatly improved how they crawl and index pages and how they present information. They have moved from linear lists of results to enriched answer lists that also provide more data about the concept behind the query. These data come from a new field of exploration for search engines: semantic data.
On the algorithm side, this is the next big step, and you have every chance of making it profitable.

Imagine the phenomenal work done by crawl and ranking algorithms. They must understand the meaning of documents, extract key information, filter the source code, separate information from the HTML noise, compare documents in order to sort, qualify and classify them, and then return them according to the user's request (more or less well identified) so as to bring the best possible answer in the most elegant format. At a time when more and more smart programs understand natural language, capturing the deep meaning of pages and queries is obviously important for future business.

Crawling and indexing are becoming complex. 'Intelligence' is needed to do better, competition is tough, and the search leaders eventually came together around one subject: semantic markup!

The early signs of structured data

The origins of this topic have already been discussed in an article on the French website Abondance. It is worth remembering that Tim Berners-Lee had already fully described the semantic transformation of search engines in 1999.
This visionary dreamed of an autonomous, 'intelligent' machine, trained to understand the relations between physical objects and able to answer a human query efficiently thanks to a form of semantic abstraction.

Since its creation, the web has organized itself around the description of documents and their concepts through properties and relations that link them to other documents.
The RDFa standard (Resource Description Framework in Attributes) described a new type of connection, much more subtle than anchors: structured data.

Following strict, typed rules, structured data are organized in syntaxes able to enrich any concept described in an HTML element through attributes and links with known types.

Everything is there: concepts, attributes and types are today the Scope, Prop and Type of schema.org items (the itemscope, itemprop and itemtype attributes of the microdata syntax).
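To make the scope, property and type attributes concrete, here is a minimal Python sketch that reads them from a small HTML fragment using only the standard library. The HTML fragment, the `MicrodataCollector` class and the `extract_microdata` helper are invented for illustration, and the sketch only handles flat, non-nested items; it is not a full microdata parser and says nothing about how search engines actually process markup.

```python
from html.parser import HTMLParser

# Hypothetical HTML fragment, invented for illustration only.
SAMPLE_HTML = """
<div itemscope itemtype="https://schema.org/Car">
  <span itemprop="name">Example hatchback</span>
  <span itemprop="color">Blue</span>
</div>
"""


class MicrodataCollector(HTMLParser):
    """Collects itemtype URLs and itemprop text from flat (non-nested) items."""

    def __init__(self):
        super().__init__()
        self.types = []       # itemtype URLs found on itemscope elements
        self.props = {}       # itemprop name -> text content
        self._current = None  # itemprop whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # itemscope is a boolean attribute: it is present with no value.
        if "itemscope" in attrs and "itemtype" in attrs:
            self.types.append(attrs["itemtype"])
        if "itemprop" in attrs:
            self._current = attrs["itemprop"]

    def handle_data(self, data):
        # Store the first non-blank text that follows an itemprop start tag.
        if self._current and data.strip():
            self.props[self._current] = data.strip()
            self._current = None


def extract_microdata(html):
    """Return (types, props) found in the given HTML string."""
    collector = MicrodataCollector()
    collector.feed(html)
    return collector.types, collector.props


types, props = extract_microdata(SAMPLE_HTML)
print(types)
print(props)
```

The scope marks where an item begins, the type says what kind of entity it is, and each property attaches a named value to that entity.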

The specifics have evolved (they have been simplified and unified), but the foundations are the same. Links between pages are important, and nobody questions the power of internal and external links. The main concepts of your pages are easier and easier for machines to determine, and engines need to gain in efficiency. So why not take inspiration from the attribute system to recreate a standardized system that simplifies analysis and classification, and then have all developers adopt it?

Semantic markup is important for engines, and they do their best to make us adopt it massively. We had “AuthorRank”, breadcrumbs, rich snippets, the Knowledge Graph and the answer box; with schema.org and HTML5, there is now a syntax to describe any physical entity with amazing precision.

Your website is not marked up with schema.org!

It is true that between deep technical optimization and this somewhat futuristic subject, you chose quickly. After your OnCrawl audit, you had to make choices: improve your site structure, reduce duplicate content, optimize your pages' internal linking or their semantics (in the linguistic sense of the term). You were right, but what is the next step?

Give meaning to your content and enjoy enriched content, the attractive optimization that was created to win you over to the schema.org technology and that today drives a major part of the search leaders' innovations.
Look at the latest AMP (Accelerated Mobile Pages) recommendations to see that Google is steering us toward adopting semantic markup and JSON-LD.
Using customer reviews to display small stars in result lists is a minimum, and marking up your sitemap must be a habit, but today you can go further.
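As an illustration of the review stars mentioned above, the following Python sketch builds a JSON-LD snippet with the schema.org `Product` and `AggregateRating` types. The `build_review_markup` helper, the product name and the rating figures are all invented for the example, and whether a search engine actually displays stars for a given page remains its own decision.

```python
import json


def build_review_markup(name, rating_value, review_count):
    """Return a JSON-LD script tag describing a product's aggregate rating."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating_value,
            "reviewCount": review_count,
        },
    }
    body = json.dumps(data, indent=2)
    # JSON-LD is embedded in a page inside a script tag of this type.
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Made-up values, for illustration only.
print(build_review_markup("Example product", 4.4, 89))
```

Because JSON-LD lives in its own script tag, it can be generated from your data without touching the visible HTML of the page.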

Semantic markup is ready to be massively used, but what benefit should we expect?

Why mark up entities? For the beauty of the art, because for many of us that is a true source of motivation. To get rich snippets and stand out from the competition, although this will only last a while. To train developers in a syntax that should become the foundation of future interactions between machines, which is already much more appealing. To simplify the engines' work in understanding your site and to speak the language best suited to seducing a crawler or an index.
Simplify your data as much as possible and create strong semantic links between your pages: this is your new priority.

A car, for instance (https://schema.org/Car), has properties such as its length, its width, its steering angle, its color, the number of rear seats and its trunk space. Its brand is both a property and a typed entity, https://schema.org/Brand, with its own attributes. This car entity also includes an engine, https://schema.org/EngineSpecification, which is a full entity with its own schema and its own properties. The car is sold by a https://schema.org/AutoDealer which, as a https://schema.org/LocalBusiness, has opening hours, a physical address and much other information linked to its types. All these data can be described with the schema.org vocabulary and will be used more and more by engines to give you visibility.
The possibilities are unlimited. Do not miss out on this optimization, which will improve your content and your site's quality.
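The car entity described above could be expressed in JSON-LD roughly as follows. This Python sketch nests a `Brand`, an `EngineSpecification` and an `AutoDealer` inside a `Car`. The `build_car_markup` helper and every value (names, address, opening hours) are invented for illustration, and only a handful of the available properties are shown.

```python
import json


def build_car_markup():
    """Return a dict describing a car and its linked schema.org entities."""
    # The dealer is a LocalBusiness subtype, so it carries hours and an address.
    dealer = {
        "@type": "AutoDealer",
        "name": "Example Motors",
        "openingHours": "Mo-Sa 09:00-18:00",
        "address": {
            "@type": "PostalAddress",
            "streetAddress": "1 Example Street",
            "addressLocality": "Exampleville",
        },
    }
    # The engine is an entity of its own, with its own properties.
    engine = {"@type": "EngineSpecification", "fuelType": "Gasoline"}
    return {
        "@context": "https://schema.org",
        "@type": "Car",
        "name": "Example hatchback",
        "color": "Blue",
        # The brand is both a property of the car and a typed entity.
        "brand": {"@type": "Brand", "name": "ExampleBrand"},
        "vehicleEngine": engine,
        # The sale is modeled as an offer whose seller is the dealer.
        "offers": {"@type": "Offer", "seller": dealer},
    }


car = build_car_markup()
print(json.dumps(car, indent=2))
```

Each nested dictionary is a typed entity in its own right, which is what creates the strong semantic links between a page's concepts.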

Content is King, Linking is the Kingdom, Technical is the Throne

Erlé, former head of SEO for the French internet provider Orange, now works as a Customer Success Manager at OnCrawl. He has built his front-end and back-end skills over 10 years and has developed a specialty in schema.org.