XML sitemaps should be a vital piece of your technical SEO strategy because they help search engines crawl your website more efficiently.
In this guide, I’ll cover what XML sitemaps are and how you can optimize them to improve your SEO performance.
An XML sitemap is a file that provides a list of URLs for search engines, such as Google, Bing and Yahoo, to crawl. XML sitemaps can contain attributes that provide additional information about each URL for crawlers.
If you are unfamiliar with XML sitemaps, I recommend you read another article on the Oncrawl blog: XML sitemaps: the Swiss army knife of technical SEO. The article gives a nice overview of XML sitemaps for beginners and some basic use cases.
There are many reasons why XML sitemaps are important for SEO, but the bottom line is that XML sitemaps help search engines discover your pages.
Providing an XML sitemap helps ensure that important pages on your website are crawled efficiently. In fact, Gary Illyes, a webmaster trends analyst at Google, said at the Search Marketing Conference in Sydney that Googlebot uses sitemaps to discover content.
80% of discovery is following links, close to 20% is just following Sitemaps.
You can also read more about the importance of an XML sitemap in SEO here on the OnCrawl blog.
Remember that a sitemap provides a list of URLs for search engines to crawl. So, it’s critical that we discuss what crawl budget is and how it affects XML sitemaps.
Here’s how Google defines crawl budget:
Taking crawl rate and crawl demand together we define crawl budget as the number of URLs Googlebot can and wants to crawl.
– Google Webmasters Blog
The important thing to understand in the context of XML sitemaps is that Googlebot will only crawl a certain number of URLs, and this may not cover all your URLs. Providing XML sitemaps can help you use your crawl budget more efficiently because Googlebot will know to prioritize the important URLs you list in the sitemap over low-value URLs.
XML sitemaps won’t prevent Google from crawling low-value URLs altogether, but they provide an indication of the URLs Googlebot should focus on.
It’s best practice to submit your XML sitemaps to search engines via their webmaster tools consoles. If you don’t have access, here are some guides to set them up:
By doing this you get access to handy data, such as any errors, last crawled date and how many URLs were discovered. For more details, you can read my guide to submitting your website to search engines, such as Google and Bing.
For web content (ex. Images & Videos) there are two types of XML sitemaps: a sitemap index and a sitemap file. I’ll briefly cover them below, but make sure you check out the major search engines’ documentation.
A sitemap index file is simply a sitemap for your sitemaps. You provide the location of a sitemap file and also when it was last modified.
<?xml version="1.0" encoding="UTF-8"?>
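To make the structure concrete, here is a minimal Python sketch that generates a sitemap index with the standard library. The example.com URLs, the dates and the function name `build_sitemap_index` are placeholders chosen for illustration; only the `sitemapindex`, `sitemap`, `loc` and `lastmod` elements and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(entries):
    """Return sitemap index XML (as bytes) for (sitemap_url, last_modified) pairs."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc          # where the sitemap file lives
        ET.SubElement(sitemap, "lastmod").text = lastmod  # when it was last modified
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


# Hypothetical sitemap locations and dates, for illustration only.
index_xml = build_sitemap_index([
    ("https://www.example.com/sitemap-products.xml", "2024-01-15"),
    ("https://www.example.com/sitemap-blog.xml", "2024-01-10"),
])
print(index_xml.decode("utf-8"))
```

Each `<sitemap>` entry in the output points to one sitemap file, which is what lets a single index cover several files.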
A sitemap file is a list of URLs that you want Googlebot to crawl. The sitemap file contains additional information, such as the last modified date, how often the content changes and the priority on a scale of 0.0-1.0.
<?xml version="1.0" encoding="UTF-8"?>
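As a rough illustration of how those attributes fit together, the Python sketch below builds a sitemap file in which `loc` is required and `lastmod`, `changefreq` and `priority` are optional. The page data and the function name `build_sitemap` are assumptions made for the example, not values from a real site.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as bytes) from a list of page dictionaries."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        # The remaining tags are optional, so only write the ones provided.
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


# Hypothetical pages, for illustration only.
sitemap_xml = build_sitemap([
    {"loc": "https://www.example.com/", "lastmod": "2024-01-15",
     "changefreq": "daily", "priority": 1.0},
    {"loc": "https://www.example.com/blog/xml-sitemaps", "priority": 0.8},
])
print(sitemap_xml.decode("utf-8"))
```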
Beyond just having an XML sitemap, there are several things you can do to optimize it to improve your SEO performance.
Before we look at any optimization tips, it’s vital that your sitemaps follow the sitemaps.org protocol so that search engines can understand them.
For your XML sitemap to be supported by major search engines it must:
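Some of the sitemaps.org rules can be checked mechanically. As a rough sketch, the Python function below tests four of them: the file is UTF-8 encoded, it is well-formed XML, its root is `<urlset>` in the sitemap namespace, and it stays within the protocol's limits of 50,000 URLs and 50 MB uncompressed. The function name `check_sitemap` and its messages are illustrative assumptions, not part of any official tool, and the checks are not exhaustive.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000        # per-file URL limit in the sitemaps.org protocol
MAX_BYTES = 52_428_800   # 50 MB uncompressed


def check_sitemap(xml_bytes):
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file exceeds 50 MB uncompressed")
    try:
        xml_bytes.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
        return problems
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        problems.append(f"not well-formed XML: {exc}")
        return problems
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append("root element is not <urlset> in the sitemap namespace")
    url_count = len(root.findall(f"{{{SITEMAP_NS}}}url"))
    if url_count > MAX_URLS:
        problems.append(f"{url_count} URLs exceeds the 50,000 URL limit")
    return problems


# Hypothetical one-URL sitemap, for illustration only.
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://www.example.com/</loc></url>"
    b"</urlset>"
)
print(check_sitemap(sample))
```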
One of my favourite use cases for XML sitemaps is to monitor valid and excluded URLs by site section in Google Search Console.
To do this you need to create sitemap index files for each section of your website. Here’s an example of how that may look:
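One way to prepare the per-section files is to group your URLs by their first path segment. The Python sketch below does that and proposes a file name for each section; the example.com URLs, the `sitemap-<section>.xml` naming pattern and both function names are assumptions for illustration, and your own site sections may need a different rule.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_urls_by_section(urls):
    """Map each first path segment (the 'section') to the URLs under it."""
    sections = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # URLs with no path segment (such as the homepage) go to 'root'.
        section = segments[0] if segments else "root"
        sections[section].append(url)
    return dict(sections)


def sitemap_filenames(sections):
    """Propose one sitemap file name per section (hypothetical naming pattern)."""
    return {name: f"sitemap-{name}.xml" for name in sections}


# Hypothetical URLs, for illustration only.
urls = [
    "https://www.example.com/",
    "https://www.example.com/blog/post-1",
    "https://www.example.com/blog/post-2",
    "https://www.example.com/products/widget",
]
sections = group_urls_by_section(urls)
print(sitemap_filenames(sections))
```

Submitting each section's file separately in Google Search Console then lets you compare valid and excluded URLs section by section.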
You should only include URLs in your XML sitemap files that you want Googlebot to crawl, index and rank. These pages are often referred to as ‘money pages’ because they are the ones that make you money.
There is no reason to provide URLs in your sitemap that aren’t providing any SEO benefits to your website.
Avoid including URLs that return a non-200 HTTP response code in your XML sitemap. Including non-200 URLs is bad for your SEO because you’re telling Googlebot to crawl URLs that waste your crawl budget.
URLs that are marked noindex have no place in your XML sitemaps (other than specific use cases). URLs that are noindex will not bring you traffic from search engines, so there is no reason for Googlebot to crawl them after they are dropped from the index.
Note: you can use a temporary XML sitemap if you want Google to see a noindex tag on a large number of URLs quickly.
URLs that are canonicalized to another URL have no place in your XML sitemap, either. Just like noindex, you don’t want these URLs to rank on Google, so there isn’t any reason to tell Googlebot to crawl them.
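The three exclusion rules above (non-200 responses, noindex and canonicalized URLs) can be applied as a simple filter over crawl data. The Python sketch below assumes you already have a record of each page's status code, noindex flag and canonical URL, for example from a crawler export; the `CrawledPage` structure, its field names and the sample pages are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawledPage:
    """Hypothetical crawl record; the field names are assumptions."""
    url: str
    status_code: int
    noindex: bool
    canonical: Optional[str]  # None when the page declares no canonical URL


def belongs_in_sitemap(page):
    """Keep only 200-status, indexable, self-canonical pages."""
    if page.status_code != 200:
        return False
    if page.noindex:
        return False
    if page.canonical is not None and page.canonical != page.url:
        return False
    return True


# Hypothetical crawl results, for illustration only.
pages = [
    CrawledPage("https://www.example.com/shoes", 200, False, "https://www.example.com/shoes"),
    CrawledPage("https://www.example.com/old-page", 404, False, None),
    CrawledPage("https://www.example.com/cart", 200, True, None),
    CrawledPage("https://www.example.com/shoes?sort=price", 200, False, "https://www.example.com/shoes"),
]
sitemap_urls = [page.url for page in pages if belongs_in_sitemap(page)]
print(sitemap_urls)
```

The temporary sitemap mentioned in the note above is a deliberate exception to the noindex rule, so a filter like this would need to be bypassed for that use case.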
XML sitemaps are an important piece of your technical SEO strategy, so you should take the time to optimize them. This guide provides you with 6 actionable tips you can implement right away.