XML Sitemaps Explained: What They Are and Why They Matter
An XML sitemap is a file that lists the URLs on your website that you want search engines to find and index. It is essentially a roadmap of your site, telling Google and other search engines which pages exist, when they were last updated, and how important they are relative to other pages on your site.
Search engines can discover pages without a sitemap by following links. But relying on link discovery alone means some pages may never get found, especially on larger sites where not every page is well-linked internally. A sitemap fills the gaps and gives you explicit control over which pages search engines know about.
How Search Engines Use Sitemaps
When Googlebot reads your sitemap (typically because you submitted it in Search Console or referenced it in your robots.txt file), it uses the information to guide its crawling:
URL discovery. The primary function. Every URL in your sitemap is a page Google knows about. This is critical for new pages that have not yet been linked from existing content.
Last modified date. The <lastmod> tag tells Google when a page was last changed. Google uses this as a hint for when to re-crawl the page. If a page's lastmod date has not changed since the last crawl, Google may skip it in favor of pages that have been updated.
Crawl prioritization. Google has stated that it ignores both the <priority> and <changefreq> tags. Of the optional tags, only accurate <lastmod> dates influence how Google allocates its crawl budget across your site.
Important clarification: Being in a sitemap does not guarantee indexing. Google still evaluates every page individually and decides whether it is worth indexing based on content quality, relevance, and other signals. A sitemap gets your pages discovered, not indexed.
What an XML Sitemap Looks Like
A basic XML sitemap follows a simple structure:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-03-21</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog/</loc>
    <lastmod>2026-03-20</lastmod>
  </url>
  <url>
    <loc>https://example.com/pricing/</loc>
    <lastmod>2026-03-15</lastmod>
  </url>
</urlset>
Each <url> entry can contain:
- <loc> (required): The full, absolute URL of the page
- <lastmod> (recommended): The date the page was last modified, in W3C Datetime format (for example, 2026-03-21)
- <changefreq> (optional): How often the page changes (daily, weekly, monthly, etc.)
- <priority> (optional): A value from 0.0 to 1.0 indicating relative importance
In practice, <loc> and <lastmod> are the only tags worth including. Google has publicly stated that it ignores <changefreq> and <priority>.
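If you generate a sitemap yourself, a small script is enough. The sketch below uses Python's standard-library ElementTree and emits only <loc> and <lastmod>. The function name build_sitemap and its input format (a list of URL and date pairs) are assumptions for illustration and are not taken from any particular CMS.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, date]]) -> str:
    """Return a <urlset> document with a <loc> and <lastmod> for each page.

    Hypothetical helper: `pages` is a list of (absolute URL, last-modified date).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # date.isoformat() gives YYYY-MM-DD, which is a valid W3C Datetime value
        ET.SubElement(url, "lastmod").text = last_modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap([
    ("https://example.com/", date(2026, 3, 21)),
    ("https://example.com/blog/", date(2026, 3, 20)),
    ("https://example.com/pricing/?currency=usd&plan=pro", date(2026, 3, 15)),
])
print(sitemap_xml)
```

ElementTree also escapes characters such as & in URLs (written as &amp;). The sitemap protocol requires this escaping, and string templates written by hand often miss it.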
Sitemap Index Files
A single XML sitemap has a limit of 50,000 URLs and 50 MB (uncompressed). For larger sites, you use a sitemap index file that references multiple individual sitemaps:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2026-03-21</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-03-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-03-18</lastmod>
  </sitemap>
</sitemapindex>
Splitting sitemaps by content type (posts, pages, products) is a best practice even for sites under 50,000 URLs. It makes debugging easier and lets you monitor indexing rates by content type in Google Search Console.
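When one content type grows past the 50,000-URL limit, you split it into numbered files and list each file in the index. Here is a minimal sketch of that chunking step. The helper names and the sitemap-products-N.xml naming scheme are hypothetical. The sketch also checks only the URL count, not the 50 MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # protocol limit; the 50 MB cap is not checked here


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> list[list[str]]:
    """Split a URL list into consecutive groups of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemaps: list[tuple[str, str]]) -> str:
    """Build a <sitemapindex> from (sitemap URL, lastmod) pairs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in sitemaps:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = loc
        ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical catalog of 120,000 product URLs
product_urls = [f"https://example.com/products/{n}/" for n in range(120_000)]
chunks = chunk_urls(product_urls)
# Each chunk would then be written to its own <urlset> file (not shown)
index_xml = build_index([
    (f"https://example.com/sitemap-products-{i}.xml", "2026-03-18")
    for i in range(1, len(chunks) + 1)
])
print([len(c) for c in chunks])  # [50000, 50000, 20000]
```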
Check your XML sitemap for issues
The free XML Sitemap Checker validates your sitemap structure and flags broken URLs, missing tags, and other problems.
Try Content Raptor free. No credit card required.
When You Need a Sitemap
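Not every site needs one, but most benefit from having one. Google's own documentation singles out large sites, new sites with few external links, and sites with a lot of rich media or Google News content as cases where sitemaps are especially useful. The lists below build on that guidance: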
You Definitely Need a Sitemap If:
- Your site has more than about 500 pages. The larger your site, the more likely link-based discovery is to miss some pages.
- Your site is new and has few external backlinks. Without inbound links, Google has limited ways to discover your pages.
- You have pages that are not well-linked internally. Orphan pages or pages deep in your site architecture depend on sitemaps for discovery.
- Your site uses a lot of rich media or appears in Google News. Specialized sitemap types exist for video, image, and news content.
- You publish content frequently. A sitemap with accurate lastmod dates helps Google prioritize crawling your newest content.
You Might Not Need One If:
- Your site has fewer than 500 pages and strong internal linking. If every page is reachable within a few clicks from the homepage and you have decent external links, Google will find everything through crawling.
- You have a single-page site or simple brochure site. The overhead of creating and maintaining a sitemap is not worth it for a handful of pages.
Even in cases where a sitemap is not strictly necessary, having one does not hurt. It provides an extra layer of assurance that Google knows about all your pages.
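You can test the "reachable within a few clicks" condition if you have a list of your internal links, for example from a crawler export. The sketch below runs a breadth-first search from the homepage to find each page's click depth. It also flags orphan pages that no internal link reaches. The input format and the three-click threshold are assumptions, not a Google rule.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search: minimum number of clicks from `home` to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit in BFS order is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def hard_to_reach(pages: list[str], links: dict[str, list[str]], home: str,
                  max_depth: int = 3) -> tuple[list[str], list[str]]:
    """Return (orphans, deep pages): both depend on a sitemap for reliable discovery."""
    depths = click_depths(links, home)
    orphans = sorted(p for p in pages if p not in depths)
    deep = sorted(p for p in pages if depths.get(p, 0) > max_depth)
    return orphans, deep


# Hypothetical internal-link graph: page -> pages it links to
links = {
    "/": ["/blog/", "/pricing/"],
    "/blog/": ["/blog/page/2/"],
    "/blog/page/2/": ["/blog/page/3/"],
    "/blog/page/3/": ["/blog/old-post/"],
}
pages = ["/", "/blog/", "/pricing/", "/blog/page/2/", "/blog/page/3/",
         "/blog/old-post/", "/landing/spring-sale/"]
orphans, deep = hard_to_reach(pages, links, "/")
print(orphans)  # ['/landing/spring-sale/']
print(deep)     # ['/blog/old-post/']
```

If both lists come back empty and the site is small, the "You Might Not Need One If" criteria above are likely met.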
How to Create an XML Sitemap
Dynamic Generation (Recommended)
Most modern CMS platforms and frameworks generate sitemaps automatically:
- WordPress: Plugins like Yoast SEO and Rank Math generate sitemaps that update when you publish or modify content.
- Next.js / Nuxt / Astro: Built-in sitemap generation or official plugins that build sitemaps at compile time or server-side.
- Shopify, Squarespace, Wix: Automatic sitemap generation included in the platform.
Dynamic generation is the best approach because the sitemap stays current as your content changes. You never have to remember to update it manually.
Static Generation
For simple sites, you can create a sitemap manually or use a generator tool. The Sitemap URL Extractor can help you pull URLs from an existing sitemap to audit or rebuild it.
If you generate a static sitemap, set a reminder to regenerate it whenever you add or remove pages.
Sitemap Generators
Online tools like XML-Sitemaps.com and Screaming Frog can crawl your site and generate a complete sitemap. These are useful as a starting point or for auditing, but a dynamically generated sitemap should be your production solution.
How to Submit Your Sitemap
Google Search Console
The most direct method. Go to Google Search Console, navigate to Sitemaps in the left sidebar, enter your sitemap URL, and click Submit. Google will begin processing it and report any errors.
After submission, check back periodically. The Sitemaps report shows:
- How many URLs were discovered
- How many of those URLs are indexed (through the report's link to page indexing data for that sitemap)
- Any errors encountered during processing
robots.txt
Add a sitemap directive to your robots.txt file:
Sitemap: https://example.com/sitemap.xml
This tells every crawler that reads robots.txt where to find your sitemap, not just Google. It is a passive approach: crawlers check robots.txt and follow the sitemap reference on their own schedule.
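Crawlers pick up the directive with an ordinary robots.txt parser. As an illustration, Python's standard-library urllib.robotparser (Python 3.8 or later) returns the Sitemap lines through site_maps(). The robots.txt content below is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; a real crawler would fetch https://example.com/robots.txt
robots_txt = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-posts.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
# site_maps() returns every Sitemap URL in the file, or None if there are none
print(parser.site_maps())
```

The Sitemap directive does not belong to any User-agent group. You can place it anywhere in the file and list more than one.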
Ping (Deprecated for Google)
Google used to support pinging its sitemap endpoint to trigger re-crawling. This was deprecated in 2023. Submit through Search Console or list the sitemap in robots.txt instead.
Common Sitemap Mistakes
Including Noindex Pages
If a page has a noindex meta tag or X-Robots-Tag header, it should not be in your sitemap. Including noindex URLs creates a conflict: the sitemap says "index this page" while the page itself says "do not index me." Google will honor the noindex directive, but the contradiction wastes crawl budget and generates warnings in Search Console.
Use the Crawlability Checker to verify that pages in your sitemap are actually crawlable and indexable.
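To screen sitemap URLs for this conflict in a script, look for noindex (or none, which implies it) in the X-Robots-Tag header or in a robots meta tag. The sketch below assumes you have already fetched each page's HTML and response headers. is_noindex is a hypothetical helper, and it reads only the robots and googlebot meta names.

```python
from html.parser import HTMLParser


def _blocks_indexing(value: str) -> bool:
    """True if a directive string contains noindex or none (none = noindex, nofollow)."""
    # Treat ":" as a separator so a header such as "googlebot: noindex" is also caught
    tokens = {t.strip() for t in value.lower().replace(":", ",").split(",")}
    return "noindex" in tokens or "none" in tokens


class RobotsMetaParser(HTMLParser):
    """Collects content values of <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # html.parser already lowercases tag and attribute names
        attr_map = {name: value or "" for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(attr_map.get("content", ""))


def is_noindex(html: str, headers: dict[str, str]) -> bool:
    """True if the page opts out of indexing via X-Robots-Tag or a robots meta tag."""
    # Header names are case-insensitive, so normalise them before the lookup
    lowered = {name.lower(): value for name, value in headers.items()}
    if _blocks_indexing(lowered.get("x-robots-tag", "")):
        return True
    parser = RobotsMetaParser()
    parser.feed(html)
    return any(_blocks_indexing(d) for d in parser.directives)


print(is_noindex('<meta name="robots" content="noindex, follow">', {}))  # True
```

Any URL for which this returns True should be removed from the sitemap or have its noindex removed, depending on which signal is the mistake.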
Including Redirected URLs
Every URL in your sitemap should return a 200 status code. URLs that redirect (301 or 302) should be replaced with their final destination URL. Listing redirecting URLs wastes crawl budget on an extra hop and shows Google that you are not maintaining your sitemap carefully.
Including Broken URLs
Pages that return 404 or 500 errors should never be in your sitemap. Google will eventually stop crawling them, but in the meantime, those requests consume your crawl budget and generate errors in Search Console.
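Once you have each URL's status code, you can apply both rules automatically. The sketch below assumes you have already requested every sitemap URL without following redirects (for example, with a crawler) and recorded the status and Location header. FetchResult and triage_urls are hypothetical names. Redirect targets go into a separate list because they need re-checking before you add them: a target could itself redirect or return an error.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


@dataclass
class FetchResult:
    status: int                     # status of the first response; redirects not followed
    location: Optional[str] = None  # Location header, present on redirects


def triage_urls(results: dict[str, FetchResult]) -> tuple[list[str], list[str], list[str]]:
    """Split sitemap URLs into (keep, redirect targets to re-check, drop)."""
    keep, recheck, drop = [], [], []
    for url, result in results.items():
        if result.status == 200:
            keep.append(url)
        elif result.status in REDIRECT_CODES and result.location:
            # Location may be relative, so resolve it against the original URL
            recheck.append(urljoin(url, result.location))
        else:
            drop.append(url)  # 404, 410, 5xx, or a redirect with no target
    return sorted(keep), sorted(set(recheck)), sorted(drop)


results = {
    "https://example.com/": FetchResult(200),
    "https://example.com/old-pricing/": FetchResult(301, "/pricing/"),
    "https://example.com/deleted-post/": FetchResult(404),
    "https://example.com/search/": FetchResult(500),
}
keep, recheck, drop = triage_urls(results)
print(recheck)  # ['https://example.com/pricing/']
```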
Stale lastmod Dates
Setting all lastmod dates to today's date, or never updating them, makes the tag useless. Google uses lastmod only when it is consistently and verifiably accurate, so if your dates do not reflect actual changes, it may ignore them across your whole site. Only update lastmod when the page content genuinely changes.
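One way to keep lastmod honest is to derive it from the content itself. Store a hash of each page's main content, and move the date only when the hash changes. The sketch below keeps that state in a plain dictionary and uses a hypothetical helper name. In production, the state would probably live in your database or build cache.

```python
import hashlib
from datetime import date


def next_lastmod(url: str, content: str, today: date,
                 state: dict[str, tuple[str, str]]) -> str:
    """Return the lastmod to publish for `url`, changing it only when `content` changes.

    `state` maps URL -> (content hash, lastmod string) from the previous build.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    previous = state.get(url)
    if previous is not None and previous[0] == digest:
        return previous[1]  # unchanged content keeps its old date
    state[url] = (digest, today.isoformat())
    return today.isoformat()


state: dict[str, tuple[str, str]] = {}
page = "https://example.com/pricing/"
first = next_lastmod(page, "Plans from $10", date(2026, 3, 15), state)
rebuilt = next_lastmod(page, "Plans from $10", date(2026, 3, 21), state)
edited = next_lastmod(page, "Plans from $12", date(2026, 3, 22), state)
print(first, rebuilt, edited)  # 2026-03-15 2026-03-15 2026-03-22
```

Hash the main content rather than the full rendered HTML. Otherwise, build timestamps, rotating widgets, or session tokens would change the hash on every build and recreate the stale-date problem.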
Missing the Sitemap Entirely
Surprisingly common. Sites get launched without a sitemap, or a site migration breaks the sitemap URL without anyone noticing. Check your sitemap URL regularly and verify it in Search Console.
Including Non-Canonical URLs
If a page has a canonical tag pointing to a different URL, the canonical URL should be in the sitemap, not the non-canonical version. Including non-canonical URLs sends mixed signals about which version of the page Google should index.
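In a script, the fix is to read each page's rel="canonical" link and list that URL instead. This sketch uses Python's html.parser. sitemap_url_for is a hypothetical helper that assumes the page HTML is already fetched, and it ignores canonicals sent in an HTTP Link header.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> in the document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonical = attr_map["href"]


def sitemap_url_for(page_url: str, html: str) -> str:
    """Return the URL that belongs in the sitemap: the canonical, if one is declared."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical is None:
        return page_url
    # Resolve relative canonicals such as href="/shoes/" against the page URL
    return urljoin(page_url, parser.canonical)


print(sitemap_url_for("https://example.com/shoes/?color=red",
                      '<link rel="canonical" href="/shoes/">'))  # https://example.com/shoes/
```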
Monitoring Sitemap Health
A sitemap is not a "set it and forget it" file. Regular monitoring catches problems before they affect your indexing.
Weekly Checks
- Verify your sitemap URL returns a 200 status code
- Run your sitemap through the XML Sitemap Checker to catch structural issues
- Review Google Search Console's Sitemaps report for new errors
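You can also script the structural part of the weekly check. The sketch below parses a <urlset> with ElementTree and reports:

- malformed XML
- a wrong root element
- more than 50,000 URLs
- relative or missing <loc> values
- <lastmod> values that do not look like W3C dates

validate_sitemap is a hypothetical helper. Its date check is deliberately loose, and it does not fetch any URLs.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50_000
# Loose W3C Datetime check: YYYY, YYYY-MM, or YYYY-MM-DD, optionally followed by a time part
LASTMOD_RE = re.compile(r"\d{4}(-\d{2}){0,2}(T\S+)?")


def validate_sitemap(xml_text: str) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    if root.tag != f"{{{NS['sm']}}}urlset":
        problems.append(f"unexpected root element {root.tag}")
    urls = root.findall("sm:url", NS)
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds the {MAX_URLS} limit")
    for i, url in enumerate(urls, start=1):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        parsed = urlparse(loc)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"entry {i}: missing or relative <loc> {loc!r}")
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod is not None and not LASTMOD_RE.fullmatch(lastmod.strip()):
            problems.append(f"entry {i}: unexpected <lastmod> {lastmod!r}")
    return problems


good = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc><lastmod>2026-03-21</lastmod></url>"
    "</urlset>"
)
bad = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>/pricing/</loc><lastmod>21/03/2026</lastmod></url>"
    "</urlset>"
)
print(validate_sitemap(good))  # []
for problem in validate_sitemap(bad):
    print(problem)
```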
Monthly Checks
- Compare the number of URLs in your sitemap to the number of indexed pages in Search Console. A large gap indicates indexing problems.
- Verify that new content published in the last month appears in the sitemap
- Check that deleted or redirected pages have been removed from the sitemap
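The second and third monthly checks come down to set differences. You compare the URLs in the sitemap with the URLs your CMS reports as live or retired. Here is a minimal sketch, assuming you can export those two URL lists. The published and retired inputs and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text: str) -> set[str]:
    """All <loc> values in a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.iterfind("sm:url/sm:loc", NS) if loc.text}


def audit(xml_text: str, published: list[str], retired: list[str]) -> tuple[list[str], list[str]]:
    """Return (published URLs missing from the sitemap, retired URLs still listed)."""
    listed = sitemap_locs(xml_text)
    missing = sorted(set(published) - listed)
    stale = sorted(listed & set(retired))
    return missing, stale


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old-page/</loc></url>
</urlset>"""
missing, stale = audit(
    sitemap,
    published=["https://example.com/", "https://example.com/blog/new-post/"],
    retired=["https://example.com/old-page/"],
)
print(missing)  # ['https://example.com/blog/new-post/']
print(stale)    # ['https://example.com/old-page/']
```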
After Site Changes
Any time you make significant changes to your site structure, URL patterns, or CMS, re-check your sitemap. Migrations and redesigns are the most common cause of sitemap breakage.
Specialized Sitemaps
Beyond the standard XML sitemap, Google supports specialized formats for specific content types:
- Image sitemap: Lists images for Google Image Search indexing. Can be embedded within your standard sitemap or provided as a separate file.
- Video sitemap: Helps Google find and understand video content. Each entry includes a thumbnail URL, title, description, and the video file or player URL; duration is optional.
- News sitemap: For Google News publishers. Includes publication name, language, and article title. Articles must have been published within the last two days.
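For images, embedding means adding Google's image extension namespace to your standard <urlset>. Each <url> then gets one <image:image> element per image, each containing an <image:loc>. The sketch below uses ElementTree. The function name and the input dictionary are assumptions for illustration, and it emits only <image:loc>.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Module-wide registration: default namespace for sitemap tags, "image:" for the extension
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages: dict[str, list[str]]) -> str:
    """Build a <urlset> where each page URL maps to the image URLs it contains."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_image_sitemap({"https://example.com/": ["https://example.com/images/hero.jpg"]}))
```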
Most sites only need a standard XML sitemap. Specialized sitemaps are worth the effort only if you produce significant amounts of that content type and want it indexed in the specialized search verticals.
Make sure Google can find all your pages
Use the free XML Sitemap Checker to validate your sitemap and catch issues before they affect your indexing.