A sitemap.xml file is a list of the pages on your website that you want indexed by search engines. It should be placed at the root of your website (not in a folder), since under the Sitemap protocol a sitemap can only reference URLs at or below its own location. By submitting a sitemap to search engines, you can inform them of new or modified pages on your site. Sitemaps are essential for large sites, for submitting new pages to Google, and for sites whose content changes frequently.
XML sitemaps follow a standard format that provides optional information for the URLs they contain, such as hreflang (language and region) declarations, last modified dates, and update frequencies for each page. Dedicated sitemap extensions can carry additional information for news, videos, or images.
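For illustration, here is a minimal sitemap entry using these optional fields; the example.com URLs and the date are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.example.com/</loc>
    <!-- Optional metadata for this URL -->
    <lastmod>2021-05-17</lastmod>
    <changefreq>weekly</changefreq>
    <!-- hreflang declarations use xhtml:link alternate elements -->
    <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/"/>
    <xhtml:link rel="alternate" hreflang="fr" href="https://www.example.com/fr/"/>
  </url>
</urlset>
```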
What pages should you include in an XML sitemap?
Even when you generate sitemaps dynamically, it’s hard to keep a global view of the URLs they include. This is why Oncrawl checks the pages in your sitemap against all of the known pages on your site.
- Exclude pages that do not need to be indexed by search engines. (Note that leaving a page out of the sitemap does not prevent search engines from accessing or indexing it! Don’t forget to prevent indexing with a meta robots noindex attribute, or to prevent crawling with directives in the robots.txt file; see the sketch after this list.)
- Include all pages you want users to find: landing pages, news, product, blog, and category pages optimized for SEO
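As an example, a page can be kept out of the index with a `<meta name="robots" content="noindex">` tag in its `<head>`, while a robots.txt rule like the following keeps compliant bots from crawling a section (the `/internal/` path is a placeholder):

```
# robots.txt at the site root — blocks crawling, not indexing
User-agent: *
Disallow: /internal/
```

Note that if a page is blocked in robots.txt, bots cannot crawl it to see its noindex tag, so choose the directive that matches your goal.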
Why use a sitemap for cross-analysis with crawl data?
Oncrawl’s cross-analysis compares the URLs in your sitemap with your crawl data, leveraging sitemap information you already maintain to spot ways to improve your SEO.
- Discover orphan pages: pages known to Google through a sitemap but not linked from your site structure
- Use Oncrawl to find pages that can be discovered by bots crawling your site but that should be noindexed (the sketch below illustrates the underlying comparison)
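Here is a minimal sketch of the idea behind this kind of cross-analysis, not Oncrawl’s implementation: it assumes a local sitemap.xml file and a hypothetical crawled_urls.txt export containing one URL per line, as discovered by following links from the site structure.

```python
import xml.etree.ElementTree as ET

# XML namespace used by the Sitemap protocol
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(path):
    """Extract all <loc> URLs from a sitemap.xml file."""
    tree = ET.parse(path)
    return {loc.text.strip() for loc in tree.iter(f"{SITEMAP_NS}loc")}

def crawled_urls(path):
    """Read one URL per line from a crawl export (hypothetical format)."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

in_sitemap = sitemap_urls("sitemap.xml")
in_crawl = crawled_urls("crawled_urls.txt")

# Orphan candidates: listed in the sitemap, but never reached by
# following links from the site structure.
orphans = in_sitemap - in_crawl

# Crawlable but not in the sitemap: reachable by bots, and worth
# reviewing in case they should carry a noindex directive.
not_in_sitemap = in_crawl - in_sitemap

print(f"{len(orphans)} orphan candidates, "
      f"{len(not_in_sitemap)} crawled pages missing from the sitemap")
```

Comparing the two sets in both directions surfaces both orphan candidates (in the sitemap only) and crawlable pages missing from the sitemap.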