Use Semantic HTML5 Tags to Focus Webpage Content


Use semantic HTML5 tags to focus your page content and avoid pollution and dilution of the page’s theme.

So, before we get down to the nitty-gritty, here is a quick refresher for those of you who are not sure exactly what the difference is between semantic HTML5 tags and other HTML tags. Generic HTML tags like <div> and <span> are just containers that give no indication of the type of content they hold. In fact, a <div> can hold just about anything and is used as the most basic building block for structuring pages.

So what then are semantic HTML5 tags?

Semantic HTML5 tags have a specific role to play and tell us what kind of content we can expect them to contain. Two that have been around forever are <head> and <body>. A browser knows for sure that everything in <head> is metadata about the page, and that everything in <body> is the visible part of the page that it will show to the user.

The tags we’ll be looking at specifically in this article are <header>, <footer>, <main>, <article>, <aside>, <section> and, to a certain extent, <nav>.

Why just those seven tags? Because they are all we need to show the search engine algorithm where the important content is.

Why do we need to do this?

So why is it important to show the search engines where this content is? Can’t Google just work it out for itself? I mean, Google is pretty smart, right? Yes, Google is smart, and getting smarter. But by flagging the important content you are not only saving it some work, YOU are also in control of the game!

Let’s consider a case where this can be particularly useful. A few years ago I worked on the SEO of a car leasing website. In theory the site was really easy to structure, as the offers were categorized by manufacturer > range > model. By the time you reached the model level, there was plenty of inherited context.

The problem was that each model page contained similar offers and related blog articles. So, for example, a page about a BMW car would contain similar offers for Mercedes, Audis and Jaguars, and blog articles about the car industry in general, that may mention Renault, Volkswagen or any other brand. If you look at the text content of the page there’s a lot of pollution and mentions of all kinds of stuff that has nothing to do with BMWs.

A case study

There is much debate about how important the semantic tags are and what Google actually does with them. So here is a screenshot from a relatively high-traffic site I worked on with Jason Barnard two years ago. The red line shows the moment the semantic HTML5 tags were integrated into the page templates, and you can see the 30% gain in traffic that resulted.

So how on earth do search engines know what your page is about?

As humans, we can analyse the page layout visually, and we know instinctively from experience what the main content is. But how do search engines see your page? They see a jumble of text that is pretty much all about cars. OK so far, but remember that your site mentions BMW, Mercedes, Audi, Jaguar and Renault on the same page, along with car insurance and other news about the car industry in general.

A machine can “guess” by looking at signals like the <title> and <h1> tags. It can also look at how often words appear in the text. This is all pretty standard analysis, and it will probably reach the correct conclusion … but why not tell it very specifically where the only content it has to consider is located?
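To make the “guessing” concrete, here is a minimal Python sketch (standard library only) of the kind of flat analysis described above. It reads the <title> and the <h1>, then counts brand names across all the text while ignoring page structure. This is an illustration under simplifying assumptions, not how Google actually works. The function `guess_signals` and the sample page are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class FlatTextParser(HTMLParser):
    """Collects the <title>, the <h1> and all other text, ignoring page structure."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1 = ""
        self.text_chunks = []
        self._current = None  # "title" or "h1" while reading those elements

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
            return  # the title is metadata, not visible text
        if self._current == "h1":
            self.h1 += data
        self.text_chunks.append(data)


def guess_signals(html, brands):
    """Returns the title, the h1 and how often each brand appears in the text."""
    parser = FlatTextParser()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[A-Za-z]+", " ".join(parser.text_chunks))
    counts = Counter(word.lower() for word in words)
    return parser.title.strip(), parser.h1.strip(), {b: counts[b.lower()] for b in brands}


# Hypothetical page: the model content plus "pollution" from offers and news.
page = """<html><head><title>BMW 1 Series Hatchback leasing</title></head>
<body><h1>BMW 1 Series Hatchback</h1>
<p>The BMW 1 Series is a compact hatchback.</p>
<p>Similar offers: Audi A3, Mercedes A-Class, Jaguar XE.</p>
<p>News: Renault and Volkswagen announce new models. Audi too.</p>
</body></html>"""

brands = ["BMW", "Audi", "Mercedes", "Jaguar", "Renault", "Volkswagen"]
title, h1, counts = guess_signals(page, brands)
print(title, "|", h1, "|", counts)
```

On this hypothetical page, the <title> and the <h1> both point to BMW. In the raw text, however, Audi is mentioned exactly as often as BMW. That is the kind of dilution the semantic tags are meant to remove.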


How can we do this?

Just as we use the <head> and <body> tags to delimit areas of the HTML code, we’ll build a structure, an invisible one. It adds only a few bytes to the weight of the page but acts like the administrative districts of a city. The bots will know exactly where they are and what the purpose of each area is.

* NOTE: do not apply classes or styles to the semantic elements. You need to be able to add them, remove them or move them around without it affecting the look of the page in any way!
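If you want to check this rule automatically, a minimal Python sketch might look like the following. It flags any of the seven semantic elements discussed here that carries a class or style attribute. The helper `find_styled_semantic_tags` is hypothetical. Because it only inspects the HTML source, it cannot catch styles applied through CSS selectors such as `main { ... }`.

```python
from html.parser import HTMLParser

# The seven semantic elements discussed in this article.
SEMANTIC_TAGS = {"header", "footer", "main", "article", "aside", "section", "nav"}


class StyledSemanticFinder(HTMLParser):
    """Records semantic elements that carry a class or style attribute."""

    def __init__(self):
        super().__init__()
        self.problems = []  # (line number, tag, offending attribute names)

    def handle_starttag(self, tag, attrs):
        if tag in SEMANTIC_TAGS:
            offending = sorted({name for name, _ in attrs if name in ("class", "style")})
            if offending:
                line, _ = self.getpos()
                self.problems.append((line, tag, offending))


def find_styled_semantic_tags(html):
    finder = StyledSemanticFinder()
    finder.feed(html)
    finder.close()
    return finder.problems


sample = """<body>
<header class="top-bar"></header>
<main>
<article style="padding: 1em"><h1>Title</h1></article>
</main>
<footer></footer>
</body>"""

for line, tag, attributes in find_styled_semantic_tags(sample):
    print(f"line {line}: <{tag}> has {', '.join(attributes)}")
```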

The first thing we need to do is to separate the header bar stuff and the footer bar stuff from the main content.

We need to split the content up into smaller chunks to organise the blocks, and until now we’ve been using <div> tags to do this. (I still shudder remembering when page layouts were done using HTML tables.) So what’s the problem with using <div> tags? Nothing, except that they tell us nothing about the role of their content.

You can give divs an id, like this:

<div id="header">
<div id="main">
<div id="footer">

but this doesn’t actually tell the machines anything. You might as well call them:

<div id="john">
<div id="paul">
<div id="george">
<div id="ringo">

We need something that tells us what the role of each block is, just as if we wrote:

<beatles>
<singer id="john"></singer>
<bassist id="paul"></bassist>
<guitarist id="george"></guitarist>
<drummer id="ringo"></drummer>
</beatles>

Luckily, there are semantic HTML5 tags to do just this: we can use <header>, <main> and <footer> tags. Like this:

<body>
<header></header>
<main></main>
<footer></footer>
</body>
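A quick way to sanity-check this skeleton is to list the <header>, <main> and <footer> elements that are not nested inside one another, and to count the <main> elements. The HTML specification allows only one visible <main> per page. The sketch below is a hypothetical Python helper. It deliberately skips nested headers, such as a <header> inside an <article>. The "matches_example" check assumes the simple header, main, footer order shown above.

```python
from html.parser import HTMLParser

LANDMARKS = ("header", "main", "footer")


class LandmarkParser(HTMLParser):
    """Finds header/main/footer elements that are not nested in one another."""

    def __init__(self):
        super().__init__()
        self.depth = 0         # number of landmark elements currently open
        self.top_level = []    # landmarks found outside any other landmark
        self.main_count = 0

    def handle_starttag(self, tag, attrs):
        if tag in LANDMARKS:
            if self.depth == 0:
                self.top_level.append(tag)
            if tag == "main":
                self.main_count += 1
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in LANDMARKS and self.depth > 0:
            self.depth -= 1


def check_skeleton(html):
    parser = LandmarkParser()
    parser.feed(html)
    parser.close()
    return {
        "top_level": parser.top_level,
        "single_main": parser.main_count == 1,
        # assumption: the simple header > main > footer order shown above
        "matches_example": parser.top_level == ["header", "main", "footer"],
    }


good = "<body><header></header><main><article><header></header></article></main><footer></footer></body>"
bad = "<body><main></main><div id=\"footer\"></div><main></main></body>"
print(check_skeleton(good))
print(check_skeleton(bad))
```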

The <header> and <footer> will probably contain some navigation menus contained in <nav> tags, but that doesn’t concern us here.

So let’s look at the <main> block.

The <main> tag

Because there is a huge range of different types of content that we can put in the <main> block, we need to be able to isolate the content that is specific to the current page and leave out everything else. To do this, we can use the <article> tag, which will contain the <h1>, like this:

<main>
<article>
<h1></h1>
Specific page content
</article>
</main>

All the specific content for this page will go into the <article> tags.
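To see what “specific page content” means to a parser, here is a minimal Python sketch. It collects the text inside the first <article> and checks that the <h1> sits inside it. The class and function names are hypothetical, and the sketch assumes a single main <article> per page.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collects the text of the first <article> and notes whether the <h1> is inside it."""

    def __init__(self):
        super().__init__()
        self.article_depth = 0     # > 0 while inside the first <article>
        self.article_done = False
        self.h1_in_article = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "article" and not self.article_done:
            self.article_depth += 1
        elif tag == "h1" and self.article_depth > 0:
            self.h1_in_article = True

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth > 0:
            self.article_depth -= 1
            if self.article_depth == 0:
                self.article_done = True

    def handle_data(self, data):
        if self.article_depth > 0:
            self.chunks.append(data)


def article_text(html):
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())  # normalise whitespace
    return text, parser.h1_in_article


page = """<main>
<article><h1>BMW 1 Series Hatchback</h1><p>Specific page content</p></article>
<p>Latest blog articles</p>
</main>"""
print(article_text(page))
```

Text outside the <article>, such as the latest blog articles here, never reaches the result.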

Note here that “article” does not necessarily mean article in the sense of a newspaper article but just a thing, like an article of clothing, a product, a blog post, an “About Us” page, a recipe…

So far, so good. But what about all the other content in the page? We need to divide it into two groups: content items that are associated in some way with the main page content and content items that are more general to the site.

<main>
<article>
<h1></h1>
Specific page content
[Additional content directly associated with the article content]
</article>
[Additional content NOT associated with the article content]
</main>

Look at the table below for some ideas about what additional content needs to be inside the article tags and what needs to stay outside.

Page type	Additional content inside the <article>	Additional content outside the <article>
Blog article	Author information, comments, ratings, associated articles	Links to other blog categories, product promotions, sign-up form, any other unrelated content
Product page	Reviews and ratings of the product, comments about the product, mentions of the product elsewhere on the web, similar products, links to blog articles associated with the product	Links to other product ranges, products on special offer, latest blog articles, sign-up forms, any other unrelated content

How do we tell the machine that this content that we have just defined as “additional content” is just that? This is where the <aside> tag comes into play.

The <aside> tag

This is what our simplified code will look like when we have included the <aside> tags:

<main>
<article>
<h1></h1>
Specific page content
<aside>
[Additional content directly associated with the article content]
</aside>
</article>
<aside>
[Additional content NOT associated with the article content]
</aside>
</main>

Now we have signalled to search engines that the content in each <aside> is secondary and should not be considered part of the main content.
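As an illustration of how a machine could honour this markup, the hypothetical Python sketch below splits a page’s text into two buckets. The first holds text inside an <article> but outside any <aside>; the second holds everything else. It assumes reasonably well-formed HTML and does not describe what any particular search engine does.

```python
from html.parser import HTMLParser


class MainContentParser(HTMLParser):
    """Separates article text that is outside any <aside> from everything else."""

    def __init__(self):
        super().__init__()
        self.article_depth = 0
        self.aside_depth = 0
        self.main_content = []
        self.other_content = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag == "aside":
            self.aside_depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth:
            self.article_depth -= 1
        elif tag == "aside" and self.aside_depth:
            self.aside_depth -= 1

    def handle_data(self, data):
        if not data.strip():
            return  # skip whitespace between tags
        if self.article_depth and not self.aside_depth:
            self.main_content.append(data.strip())
        else:
            self.other_content.append(data.strip())


def split_content(html):
    parser = MainContentParser()
    parser.feed(html)
    parser.close()
    return parser.main_content, parser.other_content


simplified_page = """<main><article><h1>Title</h1><p>Specific page content</p>
<aside>Associated content</aside></article>
<aside>Unrelated content</aside></main>"""
print(split_content(simplified_page))
```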

Back to our example

In the example I gave at the start of the article, the car leasing site, we can tell machines to ignore the pollution like this:

<main>
<article>
<h1>BMW 1 Series Hatchback</h1>
Specific page content about the BMW 1 Series Hatchback
<aside>
Similar offers mentioning Audi, Mercedes and Jaguar
Blog articles specifically about the BMW 1 Series or BMW in general*
</aside>
</article>
<aside>
Blog articles about the car industry in general
</aside>
</main>

In this way we have told the machines that:

The article is exclusively about the BMW 1 series hatchback.
There are similar offers that may help give context to the article but should not be considered as part of its content.
There is also some extra content on the page, but it should be right off the radar.
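These three tiers can be sketched in Python. The hypothetical helper below labels each piece of text as “content” (inside the <article>), “context” (in an <aside> inside the <article>) or “ignored” (anywhere else). It then counts brand mentions in each tier for the car leasing example. The tier names and the word-count approach are assumptions for illustration only.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TierParser(HTMLParser):
    """Labels each text chunk as content, context or ignored, by position."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open <article>/<aside> elements, outermost first
        self.tiers = {"content": [], "context": [], "ignored": []}

    def handle_starttag(self, tag, attrs):
        if tag in ("article", "aside"):
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def _tier(self):
        if "article" not in self.stack:
            return "ignored"       # outside the article altogether
        first_article = self.stack.index("article")
        if "aside" in self.stack[:first_article]:
            return "ignored"       # an article nested in an unrelated aside
        if "aside" in self.stack[first_article:]:
            return "context"       # an aside inside the article
        return "content"

    def handle_data(self, data):
        if data.strip():
            self.tiers[self._tier()].append(data.strip())


def brand_mentions_by_tier(html, brands):
    parser = TierParser()
    parser.feed(html)
    parser.close()
    result = {}
    for tier, chunks in parser.tiers.items():
        words = Counter(w.lower() for w in re.findall(r"[A-Za-z]+", " ".join(chunks)))
        result[tier] = {b: words[b.lower()] for b in brands if words[b.lower()]}
    return result


bmw_page = """<main>
<article>
<h1>BMW 1 Series Hatchback</h1>
<p>Specific page content about the BMW 1 Series Hatchback</p>
<aside>Similar offers mentioning Audi, Mercedes and Jaguar</aside>
</article>
<aside>Blog articles about the car industry: Renault, Volkswagen</aside>
</main>"""

brands = ["BMW", "Audi", "Mercedes", "Jaguar", "Renault", "Volkswagen"]
print(brand_mentions_by_tier(bmw_page, brands))
```

Only BMW survives in the “content” tier. The other brands stay visible to the parser but are clearly labelled as context or noise.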

* NOTE: if you have blog articles specifically about this BMW model, or about BMW in general, you can put them in the <aside> inside the article, as they will enhance the article’s context.

But what about the <section> tag?

This, IMHO, is one of the most badly-used of all the HTML5 semantic tags! I’ve seen <section>s in the <header>, <section>s in the <footer>, <section>s in other <section>s and worse.

The trouble with the <section> tag is that it logically needs to be a section of something. Just throwing a load of <section>s into the HTML is more or less the same as using <div>s because we have no idea what their purpose is.

Don’t forget that we should not use these semantic tags to structure the page layout visually, and simply putting <h2> and <h3> tags into the text will break it up into hierarchical sections. So why use <section>s at all?

It is true that <section>s allow you to legally put more than one <h1> tag on a page: look at this video from the Google Webmasters YouTube channel, made in 2017. However, outside of a few special cases you need to be very careful, or you risk making the page content more confusing for the machines again. One such case is a page that just lists blog articles, with an image, a title and a text extract for each one.

I think that the only valid use of <section>s is inside the <article> tag, when the article has chunks that make sense as self-contained blocks of information. <section> tags can tell the machine that a block of content can be indexed as a fragment that has value in itself. Look at our example again. Here I have kept just the <article> tag and added some sections:

<article>
<h1>BMW 1 Series Hatchback</h1>
<p>General page content about the BMW 1 Series Hatchback</p>
<section>
<h2>Standard equipment of the BMW 1 Series Hatchback</h2>
<p>Bla bla bla about equipment</p>
</section>
<section>
<h2>Technical specifications of the BMW 1 Series Hatchback</h2>
<p>Bla bla bla about technical specs</p>
</section>
<section>
<h2>How fast is BMW 1 Series Hatchback?</h2>
<p>Bla bla bla about speed</p>
</section>
<aside>
Similar offers mentioning Audi, Mercedes and Jaguar.
Blog articles specifically about the BMW 1 Series or BMW in general.
</aside>
</article>

In the preceding example, we can see that the section about the technical specifications of the BMW 1 Series Hatchback can be isolated and indexed as a free-standing fragment, or “Fraggle” as defined by Cindy Krum. If this is done correctly, and if the page has authority, a <section> may even end up in position 0 in the Google search results!
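To show what a “self-contained fragment” means in practice, the Python sketch below turns each <section> inside an <article> into a (heading, text) pair. It also counts any <section>s found outside an <article>, which, following the rule above, are candidates for review. The helper names are hypothetical, and nested <section>s are not handled.

```python
from html.parser import HTMLParser


class SectionFragmentParser(HTMLParser):
    """Turns each <section> inside an <article> into a (heading, text) fragment."""

    HEADINGS = {"h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.article_depth = 0
        self.in_section = False
        self.in_heading = False
        self.heading = []
        self.body = []
        self.fragments = []
        self.stray_sections = 0  # <section>s found outside any <article>

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag == "section":
            if self.article_depth:
                self.in_section = True
                self.heading, self.body = [], []
            else:
                self.stray_sections += 1
        elif tag in self.HEADINGS and self.in_section:
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth:
            self.article_depth -= 1
        elif tag == "section" and self.in_section:
            self.in_section = False
            self.fragments.append((" ".join(self.heading), " ".join(self.body)))
        elif tag in self.HEADINGS:
            self.in_heading = False

    def handle_data(self, data):
        if not self.in_section or not data.strip():
            return
        target = self.heading if self.in_heading else self.body
        target.append(" ".join(data.split()))  # collapse line breaks and spaces


def section_fragments(html):
    parser = SectionFragmentParser()
    parser.feed(html)
    parser.close()
    return parser.fragments, parser.stray_sections


page = """<header><section>Menu</section></header>
<article>
<h1>BMW 1 Series Hatchback</h1>
<section><h2>Technical specifications of the BMW 1 Series
Hatchback</h2>
<p>Bla bla bla about technical specs</p></section>
<section><h2>How fast is BMW 1 Series Hatchback?</h2>
<p>Bla bla bla about speed</p></section>
</article>"""
fragments, stray = section_fragments(page)
print(fragments, stray)
```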

Conclusion

In this article we have seen that following a few basic rules can make your page much easier for machines like search engines to understand:

Use of a simple, logical and consistent semantic HTML5 structure will enable Google’s templating algorithm to understand where the important unique content is for each page. Whatever makes life easier for search engines has to be good!
Using semantic HTML5 tags focuses attention on the real, specific content of the page and excludes other content that may be detrimental to the theme.
Good semantic HTML5 structure is key for enhancing web accessibility. Don’t forget that search engine robots are the biggest group of blind users on the internet!
Do not add any styles to the semantic tags. You need to be able to add, delete or move them without it affecting the visible page layout. Note that the <main>, <article> and <aside> tags have a default value of “block” for the CSS display property, and you may want to reset that, too.

A step further

If you want to use the full power of semantic markup, you can look at using <figure> and <figcaption> for images. You can also structure data in HTML tables using the <thead>, <th>, <tbody> and <caption> tags, to tell the machine exactly what the table is about and what the data in each column means. Google has a separate experimental database for tables, and providing it with self-contained tables makes it much more likely that your table will be displayed in position 0 in the search results.
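Here is a minimal Python sketch of why those table tags help. With a <caption> and a <thead>, a parser can pair every cell with the column header that describes it. The parser, the helper `table_records` and the table values are hypothetical and purely illustrative. The sketch assumes a single, simple table without rowspan or colspan.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Reads one HTML table into its caption, column headers and body rows."""

    def __init__(self):
        super().__init__()
        self.caption = ""
        self.headers = []
        self.rows = []
        self._section = None      # "thead" or "tbody"
        self._cell = None         # text of the cell being read, or None
        self._in_caption = False
        self._row = []

    def handle_starttag(self, tag, attrs):
        if tag in ("thead", "tbody"):
            self._section = tag
        elif tag == "caption":
            self._in_caption = True
        elif tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = ""

    def handle_endtag(self, tag):
        if tag == "caption":
            self._in_caption = False
        elif tag in ("th", "td") and self._cell is not None:
            self._row.append(self._cell.strip())
            self._cell = None
        elif tag == "tr":
            if self._section == "thead":
                self.headers = self._row
            elif self._row:
                self.rows.append(self._row)

    def handle_data(self, data):
        if self._in_caption:
            self.caption += data
        elif self._cell is not None:
            self._cell += data


def table_records(html):
    parser = TableParser()
    parser.feed(html)
    parser.close()
    # pair every body cell with the column header that describes it
    return parser.caption.strip(), [dict(zip(parser.headers, row)) for row in parser.rows]


table = """<table>
<caption>Example table: engine options (illustrative values)</caption>
<thead><tr><th>Engine</th><th>Power (hp)</th></tr></thead>
<tbody>
<tr><td>Engine A</td><td>100</td></tr>
<tr><td>Engine B</td><td>150</td></tr>
</tbody>
</table>"""
print(table_records(table))
```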

Resources

The Wikipedia page about HTML5 and the new semantic elements.

Detailed descriptions of the semantic elements can be found on the W3Schools website.

As far as I know the only tool available to inspect the semantic HTML5 structure is one that I had to write myself: you can find it here.
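For readers who want to experiment, here is a minimal Python sketch of a structure inspector. It is not the tool mentioned above. It simply prints an indented outline of the semantic elements and the headings inside them. The names `outline` and `OutlineParser` are hypothetical.

```python
from html.parser import HTMLParser

STRUCTURAL = {"header", "nav", "main", "article", "section", "aside", "footer"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineParser(HTMLParser):
    """Builds an indented outline of semantic elements and the headings inside them."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []
        self._heading = None  # [tag, text] of the heading being read

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL:
            self.lines.append("  " * self.depth + f"<{tag}>")
            self.depth += 1
        elif tag in HEADINGS:
            self._heading = [tag, ""]

    def handle_endtag(self, tag):
        if tag in STRUCTURAL and self.depth:
            self.depth -= 1
        elif self._heading and tag == self._heading[0]:
            text = " ".join(self._heading[1].split())
            self.lines.append("  " * self.depth + f"{tag}: {text}")
            self._heading = None

    def handle_data(self, data):
        if self._heading:
            self._heading[1] += data


def outline(html):
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


html = ("<body><header><nav></nav></header><main><article>"
        "<h1>BMW 1 Series Hatchback</h1><section><h2>Technical specifications</h2></section>"
        "<aside></aside></article><aside></aside></main><footer></footer></body>")
print(outline(html))
```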

Hugo Scott

Hugo is an SEO consultant specializing in technical SEO. With 25 years’ experience of SEO, coding and site development, he offers a range of services in the domain of both technical and semantic SEO that will make a real difference to your site.
