How to Optimize Your XML Sitemap to Improve Your SEO

July 19, 2019 - 5 min reading time - by Tom Donohoe

XML sitemaps should be a vital piece of your technical SEO strategy because they help search engines crawl your website more efficiently.

In this guide, I’ll cover what XML sitemaps are and how you can optimize them to improve your SEO performance.

What are XML sitemaps?

An XML sitemap is a file that provides a list of URLs for search engines, such as Google, Bing and Yahoo, to crawl. XML sitemaps can contain attributes that provide additional information about each URL for crawlers.

If you are unfamiliar with XML sitemaps I recommend you read another article on the Oncrawl blog: XML sitemaps: the Swiss army knife of technical SEO. The article gives a nice overview of XML sitemaps for beginners and some basic use cases.

Why are XML sitemaps important for SEO?

There are many reasons why XML sitemaps are important for SEO, but the bottom line is that they help search engines discover your pages.

Providing an XML sitemap helps make sure that the important pages on your website are crawled efficiently. In fact, Gary Illyes, a webmaster trends analyst at Google, said at the Search Marketing Conference in Sydney that Googlebot uses sitemaps to discover content.

80% of discovery is following links, close to 20% is just following Sitemaps.
Gary Illyes, Google

You can also read more about the importance of an XML sitemap in SEO here on the Oncrawl blog.

A quick note on crawl budget and XML sitemaps

Remember that a sitemap provides a list of URLs for search engines to crawl. So, it's critical that we discuss what crawl budget is and how it relates to XML sitemaps.
Here's how Google defines crawl budget:

Taking crawl rate and crawl demand together we define crawl budget as the number of URLs Googlebot can and wants to crawl.
Google Webmasters Blog

The important thing to understand in the context of XML sitemaps is that Googlebot will only crawl a certain number of URLs, and this may not cover all of your URLs. Providing XML sitemaps can help you use your crawl budget more efficiently, because Googlebot will know to favour the important URLs you list in the sitemap over low-value URLs.

XML sitemaps won't prevent Google from crawling low-value URLs altogether, but they provide an indication of the URLs Googlebot should focus on.

Submit your XML sitemaps to search engines

It’s best practice to submit your XML sitemaps to search engines via their webmaster tools consoles, such as Google Search Console and Bing Webmaster Tools.

By doing this you get access to handy data, such as any errors, last crawled date and how many URLs were discovered. For more details, you can read my guide to submitting your website to search engines, such as Google and Bing.

Types of XML sitemap

For web content (ex. Images & Videos) there are two types of XML sitemaps: a sitemap index and a sitemap file. I’ll briefly cover them below, but make sure you check out the major search engines' documentation.

Sitemap Index file

A sitemap index file is simply a sitemap for your sitemaps. For each sitemap file, you provide its location and, optionally, when it was last modified.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap1.xml.gz</loc>
    <lastmod>2004-10-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap2.xml.gz</loc>
    <lastmod>2005-01-01</lastmod>
  </sitemap>
</sitemapindex>
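
If you generate your sitemap index programmatically, a minimal Python sketch could look like the following. The function name `build_sitemap_index` and its input format (a list of location and last-modified pairs) are assumptions made for illustration; only the element names and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemaps):
    """Build sitemap index XML (bytes) from (loc, lastmod) pairs.

    Hypothetical helper: lastmod may be None, in which case the
    optional <lastmod> element is left out.
    """
    # Serialise the sitemap namespace as the default xmlns, as in the sample above.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in sitemaps:
        entry = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


index_xml = build_sitemap_index([
    ("http://www.example.com/sitemap1.xml.gz", "2004-10-01T18:23:17+00:00"),
    ("http://www.example.com/sitemap2.xml.gz", "2005-01-01"),
])
print(index_xml.decode("utf-8"))
```

Building the file with an XML library rather than string concatenation means special characters in URLs, such as `&`, are escaped for you.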

Sitemap file

A sitemap file is a list of URLs that you want Googlebot to crawl. The sitemap file contains additional information, such as the last modified date, how often the content changes and the priority on a scale of 0.0-1.0.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
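
To audit an existing sitemap file, it can help to read these fields back out. The sketch below parses a sitemap with Python's standard library; `parse_sitemap` and the dictionary layout it returns are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_bytes):
    """Return one dict per <url> entry; optional fields are None when absent."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for url in root.findall("sm:url", NS):
        priority = url.findtext("sm:priority", namespaces=NS)
        entries.append({
            "loc": url.findtext("sm:loc", default="", namespaces=NS).strip(),
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
            "changefreq": url.findtext("sm:changefreq", namespaces=NS),
            "priority": float(priority) if priority else None,
        })
    return entries


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>"""

for entry in parse_sitemap(SAMPLE):
    print(entry)
```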

6 tips to optimise your XML sitemap

Beyond just having an XML sitemap, there are several things you can do to optimize it to improve your SEO performance.

1. Follow the sitemaps.org protocol

Before we look at any optimization tips, it’s vital that your sitemaps follow the sitemaps.org protocol so that search engines can understand them.

For your XML sitemap to be supported by major search engines it must:

  • Begin with an opening <urlset> tag and end with a closing </urlset> tag.
  • Specify the namespace (protocol standard) within the <urlset> tag.
  • Include a <url> entry for each URL, as a parent XML tag.
  • Include a <loc> child entry for each <url> parent tag.
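
The four requirements above can be checked automatically. This is a rough sketch rather than a full protocol validator: `check_sitemap` is a hypothetical helper, and it only tests the points in the list (root element, namespace, `<url>` entries and `<loc>` children).

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def check_sitemap(xml_bytes):
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag == f"{{{SITEMAP_NS}}}urlset":
        prefix = f"{{{SITEMAP_NS}}}"
    elif root.tag == "urlset":
        # Root is right, but the protocol namespace was not declared.
        problems.append("<urlset> does not declare the sitemaps.org namespace")
        prefix = ""
    else:
        return [f"root element is {root.tag!r}, expected <urlset>"]

    urls = root.findall(f"{prefix}url")
    if not urls:
        problems.append("no <url> entries found")
    for position, url in enumerate(urls, start=1):
        loc = url.findtext(f"{prefix}loc")
        if not loc or not loc.strip():
            problems.append(f"<url> entry {position} has no <loc>")
    return problems
```

For example, `check_sitemap(open("sitemap.xml", "rb").read())` would return an empty list for a file that meets all four points, where `sitemap.xml` is a placeholder file name.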

2. Structure XML sitemaps by site section

One of my favourite use cases for XML sitemaps is to monitor valid and excluded URLs by site section in Google Search Console.
To do this you need to create sitemap index files for each section of your website. Here’s an example of how that may look:

Index.xml
  products-index.xml
    product.xml
    product1.xml
  blog-index.xml
    blog.xml
    blog-1.xml
  some-directory-index.xml
    directory.xml
    directory-1.xml
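
One way to arrive at a layout like this is to group URLs by their first path segment and split each group into files. The sketch below assumes that the first path segment identifies the site section, which may not hold for every website; `plan_section_sitemaps` and the file naming are illustrative choices. The default of 50,000 URLs per file is the limit set by the sitemaps.org protocol.

```python
from collections import defaultdict
from urllib.parse import urlparse


def plan_section_sitemaps(urls, max_urls=50000):
    """Map each section index file to its sitemap files and their URLs.

    Assumption: the first path segment names the section; URLs without
    a path segment (such as the home page) go into a "root" section.
    """
    by_section = defaultdict(list)
    for url in urls:
        segments = [part for part in urlparse(url).path.split("/") if part]
        by_section[segments[0] if segments else "root"].append(url)

    plan = {}
    for section, section_urls in sorted(by_section.items()):
        files = {}
        for start in range(0, len(section_urls), max_urls):
            part = start // max_urls
            # First file is "blog.xml", later ones "blog-1.xml", "blog-2.xml", ...
            name = f"{section}.xml" if part == 0 else f"{section}-{part}.xml"
            files[name] = section_urls[start:start + max_urls]
        plan[f"{section}-index.xml"] = files
    return plan


plan = plan_section_sitemaps(
    [
        "https://www.example.com/blog/a",
        "https://www.example.com/blog/b",
        "https://www.example.com/blog/c",
        "https://www.example.com/products/x",
    ],
    max_urls=2,  # tiny limit so the split is visible in this example
)
for index_name, files in plan.items():
    print(index_name, list(files))
```

Each section index can then be submitted separately in Google Search Console, so coverage reports can be filtered by section.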

3. Only include your ‘money pages’

You should only include URLs in your XML sitemap files that you want Googlebot to crawl, index and rank. These pages are often referred to as ‘money pages’ because they are the ones that make you money.
There is no reason to provide URLs in your sitemap that aren’t providing any SEO benefits to your website.

4. Avoid non-200 HTTP status codes

Avoid including URLs that return a non-200 HTTP response code in your XML sitemap. Including them is bad for your SEO because you’re telling Googlebot you want these URLs crawled, even though redirects (3xx) and errors (4xx, 5xx) only waste your crawl budget.

5. Avoid noindexed URLs

URLs that are marked noindex have no place in your XML sitemaps (other than specific use cases). URLs that are noindex will not bring you traffic from search engines, so there is no reason for Googlebot to crawl them after they are dropped from the index.
Note: you can use a temporary XML sitemap if you want Google to see a noindex tag on a large number of URLs quickly.
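
To spot noindexed pages before they reach your sitemap, you can check the robots meta tag and the `X-Robots-Tag` response header. This is a simplified sketch: `is_noindexed` is a hypothetical helper, it expects the page HTML and header value to be fetched elsewhere, and it assumes the header carries no user-agent prefix (such as `googlebot:`).

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def is_noindexed(html, x_robots_tag=""):
    """Return True if the meta tag or the header asks for the page not to be indexed."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    header = {part.strip().lower() for part in x_robots_tag.split(",")}
    # "none" is shorthand for "noindex, nofollow".
    return bool({"noindex", "none"} & (parser.directives | header))


print(is_noindexed('<head><meta name="robots" content="noindex, follow"></head>'))
```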

6. Avoid URLs that canonicalise to another URL

URLs that are canonicalized to another URL have no place in your XML sitemap, either. Just like noindex, you don’t want these URLs to rank on Google, so there isn’t any reason to tell Googlebot to crawl them.
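
The same kind of check works for canonical tags. The sketch below reads the first `<link rel="canonical">` in a page and compares it with the page's own URL; `canonicalises_elsewhere` is a hypothetical helper, and the comparison is a strict string match, so differences such as a trailing slash count as a different URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _CanonicalParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href") or None


def canonicalises_elsewhere(page_url, html):
    """Return True if the page declares a canonical URL other than page_url."""
    parser = _CanonicalParser()
    parser.feed(html)
    if parser.canonical is None:
        return False  # no canonical tag, so nothing points elsewhere
    # Resolve relative hrefs against the page URL before comparing.
    return urljoin(page_url, parser.canonical) != page_url


page = "https://www.example.com/shoes?colour=red"
html = '<head><link rel="canonical" href="https://www.example.com/shoes"></head>'
print(canonicalises_elsewhere(page, html))
```

URLs for which this returns `True` are candidates for removal from the sitemap, with the canonical target listed instead.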

The bottom line

XML sitemaps are an important piece of your technical SEO strategy, so you should take the time to optimize them. This guide provides you with 6 actionable tips you can implement right away.

Tom is an SEO consultant at Tom Donohoe Consulting, with experience working on SEO for enterprise brands in Australia. He is a big evangelist of all things digital, a lover of tech SEO, a data nerd and a frequent blog contributor. Follow Tom on Twitter or visit him at https://www.tomdonohoe.com.au