XML Sitemap Checker
Enter a domain to check if it has a discoverable XML sitemap. Sitemaps help search engines crawl your site more effectively.
What is an XML Sitemap?
An XML sitemap is a file that lists the important pages of your website so that search engines like Google can find and crawl them. It acts as a roadmap for your site, pointing crawlers to the content you want discovered. Listing a URL in a sitemap does not guarantee that it will be crawled or indexed.
Why is it Important?
- Improved Crawlability: Helps search engines discover pages they might otherwise miss, especially on large or complex sites.
- Faster Indexing: Can speed up the process of getting new or updated content indexed by search engines.
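- Content Prioritization: Allows you to provide metadata about pages, such as last modification date (`lastmod`), update frequency (`changefreq`), and relative priority (`priority`). Search engines treat these values as hints; Google, for example, documents that it ignores `changefreq` and `priority` and uses `lastmod` only when it is consistently accurate.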
- Better Coverage: Particularly useful for sites with deep content archives, dynamic pages, or limited internal linking.
Common Sitemap Locations
Sitemaps are typically found at standard locations, or their location is specified in the site's `robots.txt` file. Common URLs include:
- /sitemap.xml
- /sitemap_index.xml (for sitemap index files)
- /sitemap.php (dynamically generated)
- /sitemap-*.xml (common pattern for multiple sitemaps)
Webmasters should also declare the sitemap location in their `robots.txt` file using the `Sitemap:` directive (e.g., `Sitemap: https://example.com/sitemap.xml`).
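As a rough illustration of how a checker might combine these two sources, the sketch below reads `Sitemap:` lines from a `robots.txt` body and then appends the common paths listed above. The function names and the `COMMON_PATHS` constant are hypothetical and chosen for this example; this is not a description of how this tool works internally.

```python
from typing import List
from urllib.parse import urljoin

# Candidate paths taken from the list above. The wildcard pattern
# /sitemap-*.xml cannot be probed directly, so it is left out.
COMMON_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/sitemap.php"]


def sitemaps_from_robots(robots_txt: str) -> List[str]:
    """Return every URL declared with a `Sitemap:` line in robots.txt."""
    found = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split on the first colon only,
        # because the URL itself contains colons.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def candidate_urls(base_url: str, robots_txt: str = "") -> List[str]:
    """Declared sitemap locations first, then guessed ones, without duplicates."""
    declared = sitemaps_from_robots(robots_txt)
    guessed = [urljoin(base_url, path) for path in COMMON_PATHS]
    return list(dict.fromkeys(declared + guessed))
```

A caller would then request each candidate in order and stop at the first one that returns valid sitemap XML.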
What Should Be Included?
Your sitemap should list the canonical URLs of pages you want search engines to crawl and index. This generally includes:
- Your main website pages (homepage, about, contact, etc.)
- Product pages (for e-commerce sites)
- Blog posts and articles
- Category and tag archive pages (if valuable)
- Important landing pages
Avoid including URLs that are non-canonical, blocked by `robots.txt`, require login, or are unlikely to be useful to searchers (e.g., internal search results pages).
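For readers who generate their own sitemap, here is a minimal sketch that builds a `<urlset>` document from a list of canonical URLs using Python's standard library. The `build_sitemap` name and the `(url, lastmod)` input shape are assumptions made for this example; deciding which URLs are canonical and indexable is left to the caller.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a <urlset> document from (url, lastmod) pairs.

    `lastmod` is a date string such as "2024-01-15", or None to omit it.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL for us.
        ET.SubElement(entry, "loc").text = url
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The function returns a string; encode it as UTF-8 when writing it to disk so the file matches its XML declaration.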
Sitemap Index Files
For large websites, a single sitemap can become too big: the sitemap protocol limits each file to 50,000 URLs and 50MB uncompressed. In these cases, you can use a sitemap index file, which is a file that lists other sitemaps rather than pages. This also helps organize your URLs logically (e.g., by section or type).
If this tool finds a sitemap index file (often named `sitemap_index.xml`), it means the website uses multiple sitemaps. You can open the index file to see the list of individual sitemaps.
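To tell the two file types apart programmatically, it is enough to look at the root element: `<sitemapindex>` for an index, `<urlset>` for a regular sitemap. The sketch below assumes the file declares the standard sitemap namespace; `read_sitemap` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_sitemap(xml_bytes):
    """Return ("index", child sitemap URLs) or ("urlset", page URLs)."""
    root = ET.fromstring(xml_bytes)
    # Tags arrive namespace-qualified, e.g. "{http://...}sitemapindex".
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name == "sitemapindex":
        kind, path = "index", "sm:sitemap/sm:loc"
    elif local_name == "urlset":
        kind, path = "urlset", "sm:url/sm:loc"
    else:
        raise ValueError(f"not a sitemap document: <{local_name}>")
    locs = [el.text.strip() for el in root.findall(path, NS) if el.text]
    return kind, locs
```

When the result is an index, each returned URL can be fetched and passed through the same function to list its pages.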
Troubleshooting Sitemap Issues
Common Sitemap Problems
- Not Found (404 Error): The sitemap URL is incorrect or the file doesn't exist.
- Blocked by robots.txt: The sitemap file itself is disallowed in `robots.txt`.
- Incorrect Format: The XML structure is invalid or doesn't follow the sitemap protocol.
- Contains Blocked URLs: The sitemap lists URLs that are disallowed by `robots.txt`.
- Contains Non-Canonical URLs: The sitemap includes URLs that redirect or that specify a different canonical URL.
- Too Large: The file exceeds the 50,000-URL or 50MB (uncompressed) limit.
- Compression Errors: The sitemap is gzip-compressed but the file is corrupt or cannot be decompressed.
Use Google Search Console's Sitemaps report to diagnose specific errors reported by Google.
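Several of the problems above can also be checked locally before submitting. The following sketch covers compression errors, invalid XML, the size and URL-count limits, relative URLs, and URLs blocked by `robots.txt`. It assumes the sitemap bytes and the `robots.txt` text have already been downloaded, and `check_sitemap` is a hypothetical name; it does not detect redirects or mismatched canonical tags, which require fetching each page.

```python
import gzip
import zlib
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB, measured uncompressed
LOC_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"


def check_sitemap(raw: bytes, robots_txt: str = "") -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    # Gzip files start with the magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        try:
            raw = gzip.decompress(raw)
        except (OSError, EOFError, zlib.error) as exc:
            return [f"compression error: {exc}"]
    if len(raw) > MAX_BYTES:
        problems.append(f"too large: {len(raw)} bytes uncompressed")
    try:
        root = ET.fromstring(raw)
    except ET.ParseError as exc:
        return problems + [f"incorrect format: {exc}"]
    locs = [el.text.strip() for el in root.iter(LOC_TAG) if el.text]
    if len(locs) > MAX_URLS:
        problems.append(f"too large: {len(locs)} URLs")
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    for url in locs:
        if not url.startswith(("http://", "https://")):
            problems.append(f"not an absolute URL: {url}")
        elif not robots.can_fetch("*", url):
            problems.append(f"blocked by robots.txt: {url}")
    return problems
```

The `robots.txt` check uses the generic `*` user agent; rules written for a specific crawler would need that crawler's name instead.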
Sitemap Best Practices
- Keep your sitemap up-to-date with new and removed content.
- Only include canonical URLs you want indexed.
- Ensure the sitemap is UTF-8 encoded.
- Use consistent, absolute URLs (including the protocol `http` or `https`).
- Declare the sitemap location in `robots.txt`.
- Submit your sitemap(s) via Google Search Console.
- Use a sitemap index file if you have more than 50,000 URLs or a single sitemap would exceed 50MB uncompressed.
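The last practice amounts to splitting the URL list into chunks of at most 50,000 entries, one per child sitemap. A minimal sketch follows, with a hypothetical `sitemap-{n}.xml` naming pattern; it only enforces the URL-count limit, so very long URLs could still push a file past the size limit.

```python
MAX_URLS_PER_SITEMAP = 50_000


def split_into_sitemaps(urls, name_pattern="sitemap-{n}.xml",
                        size=MAX_URLS_PER_SITEMAP):
    """Map child sitemap file names to the slice of URLs each one holds."""
    plan = {}
    for n, start in enumerate(range(0, len(urls), size), start=1):
        plan[name_pattern.format(n=n)] = urls[start:start + size]
    return plan
```

The keys of the returned mapping are the file names that the sitemap index would then list.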
Ensure Your Content Gets Seen
A valid sitemap is step one. Content Raptor helps you optimize the content itself, ensuring it ranks higher once search engines find it. Analyze your content against top competitors.