An XML sitemap is a file that lists important URLs you want search engines to discover and crawl. It helps with discovery for large, new, or complex sites. Create it, host it on your domain, validate it, then submit it so you can monitor errors and coverage.
Prerequisites
- A crawlable site: If bots can’t access pages (blocked, login-only, server errors), a sitemap won’t fix that.
- Canonical plan: Each piece of content should have one “main” URL (the canonical version).
- A generator: CMS feature, plugin, or script that produces XML and updates as content changes.
Step-by-step guide
Step 1: Decide what belongs in the sitemap
A sitemap should list indexable, canonical URLs you want in search. It should not be “every URL that exists.”
Include pages that are:
- Canonical versions of your content
- Live and accessible (normal 200 response)
Avoid URLs that are:
- Redirects (301/302)
- 404/410 pages
- Blocked by robots.txt
- Marked noindex
- Parameter duplicates (URLs that only differ by tracking parameters or endless filter combinations)
Checkpoint:
- You can explain why each listed URL deserves indexing.
- You list the preferred version of each page (not duplicates).
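The robots.txt criterion above is easy to check programmatically. Below is a minimal sketch using Python's standard-library robotparser; the domain and candidate URLs are placeholders, so swap in your own site's robots.txt and URL list.

```python
from urllib import robotparser

# Placeholder robots.txt location; replace with your own domain.
rp = robotparser.RobotFileParser("https://example.com/robots.txt")
rp.read()  # fetches and parses the live robots.txt

candidates = [
    "https://example.com/products/widget/",   # canonical product page
    "https://example.com/search?q=widget",    # internal search result (often blocked)
]

for url in candidates:
    # Keep only URLs a general-purpose crawler ("*") is allowed to fetch.
    allowed = rp.can_fetch("*", url)
    print(url, "->", "keep" if allowed else "drop: blocked by robots.txt")
```

This only covers the robots.txt rule; status codes, noindex tags, and canonical tags still need their own checks (Step 4 covers a way to automate part of that).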
Step 2: Generate the XML file
Use your CMS sitemap feature if available. If you have many URLs, use a setup that can create a sitemap index with multiple child sitemaps (the sitemap protocol caps each file at 50,000 URLs and 50 MB uncompressed).
Checkpoint:
- The sitemap URL loads in a browser and shows XML (not an error page).
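If your CMS can't generate the file, a small script can. The sketch below is one minimal approach using Python's standard library; the URL list and output filename are placeholders, and in practice you would pull canonical, indexable URLs from your CMS or database.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Placeholder list: in practice, pull canonical, indexable URLs from your CMS.
urls = [
    "https://example.com/",
    "https://example.com/about/",
    "https://example.com/products/widget/",
]

urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
for loc in urls:
    url_el = ET.SubElement(urlset, "url")
    ET.SubElement(url_el, "loc").text = loc
    ET.SubElement(url_el, "lastmod").text = date.today().isoformat()  # optional but helpful

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```

The output is a standard urlset file; Step 3 covers where to host it.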
Step 3: Host it at a stable URL
Put the sitemap on your domain at a predictable location, such as /sitemap.xml or /sitemap_index.xml. Keep the URL stable so tools and bots can rely on it.
Checkpoint:
- The sitemap returns a normal 200 response consistently.
Step 4: Validate format and URL quality
Validation means (1) the XML is well-formed and (2) the URLs are the ones you want indexed. Spot-check a handful of entries and confirm they match the canonical URL on the page.
Checkpoint:
- No redirects, blocked URLs, or obvious duplicates are listed.
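To automate the spot-check, you can parse the sitemap and confirm each listed URL returns a clean 200 without redirecting. A minimal sketch, assuming the sitemap lives at /sitemap.xml on example.com (adjust both for your site):

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://example.com/sitemap.xml"  # placeholder; use your real sitemap URL
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(SITEMAP_URL) as resp:
    tree = ET.parse(resp)  # raises ParseError if the XML is not well-formed

for loc in tree.findall(".//sm:loc", NS):
    url = loc.text.strip()
    try:
        with urllib.request.urlopen(urllib.request.Request(url, method="HEAD")) as page:
            status, final_url = page.status, page.geturl()
    except urllib.error.HTTPError as err:
        status, final_url = err.code, url
    # Flag anything that is not a clean 200 at the exact listed URL.
    if status != 200 or final_url != url:
        print(f"CHECK: {url} -> {status} (final URL: {final_url})")
```

Redirects show up because urlopen follows them, so the final URL won't match the listed one. Canonical tags and noindex directives still need an HTML-level or manual check.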
Step 5: Add it to robots.txt
Add a sitemap line so crawlers can discover it easily. Example:
Sitemap: https://example.com/sitemap.xml
Checkpoint:
- robots.txt is accessible and the sitemap URL is correct.
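A quick way to verify this checkpoint is to fetch robots.txt and print any Sitemap lines it declares. A small sketch, again using a placeholder domain:

```python
import urllib.request

# Placeholder domain; use your own robots.txt URL.
with urllib.request.urlopen("https://example.com/robots.txt") as resp:
    robots_txt = resp.read().decode("utf-8", errors="replace")

# Collect every Sitemap: line so you can confirm the declared URL is correct.
sitemap_lines = [line.strip() for line in robots_txt.splitlines()
                 if line.lower().startswith("sitemap:")]
print("\n".join(sitemap_lines) if sitemap_lines else "No Sitemap line found")
```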
Step 6: Submit it in Search Console tools
Submit the sitemap so you can see fetch status, discovered URLs, and processing errors. Treat the report as a monitoring dashboard, not a guarantee of indexing.
Checkpoint:
- The console shows the sitemap as processed or gives clear errors to fix.
Step 7: Monitor coverage and fix what the data reveals
Coverage reports often reveal canonical conflicts, duplicates, blocked pages, or weak pages that don’t earn indexing. Fix the cause, then let crawlers revisit the site.
Checkpoint:
- Exclusions have clear causes and your sitemap matches your indexing goals.
Common mistakes & troubleshooting
Use this table to match what you see in Search Console (or logs) with likely causes and practical next steps.
| Problem you see | Likely cause | What to do next |
|---|---|---|
| Sitemap submitted but shows “couldn’t fetch” | Wrong URL, server blocks, or authentication required | Open the sitemap URL, confirm it returns 200, then check server logs for blocked requests. |
| Many submitted URLs are “excluded” | Canonical conflicts, duplicates, or weak pages | Check canonicals, consolidate duplicates, and improve pages that don’t add unique value. |
| Sitemap contains redirects | Generator is listing old URLs | Update the generator and list only final 200 URLs (the destination after redirects). |
| Sitemap lists URLs blocked by robots.txt | Robots rules are too broad | Fix robots rules or remove blocked URLs from the sitemap so it matches your indexing intent. |
| New pages take time to show up | Crawling and indexing are not instant | Ensure internal links exist, keep the sitemap current, and monitor over time. |
| Multiple sitemaps disagree | Old sitemap files are still live | Remove/redirect outdated sitemaps and keep one source of truth (or a sitemap index). |
Advanced tips
- Split by type: Separate sitemaps for posts, pages, or products can make debugging easier (see the sketch after this list).
- Keep URLs clean: Avoid tracking parameters in sitemap entries.
- Don’t rely on sitemaps alone: Strong internal linking still matters for discovery.
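For the split-by-type approach, a sitemap index is just another small XML file that points at the child sitemaps. A minimal sketch, assuming hypothetical child files named sitemap-posts.xml, sitemap-pages.xml, and sitemap-products.xml already exist on your domain:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Placeholder child sitemaps, one per content type.
children = [
    "https://example.com/sitemap-posts.xml",
    "https://example.com/sitemap-pages.xml",
    "https://example.com/sitemap-products.xml",
]

index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
for child in children:
    sitemap_el = ET.SubElement(index, "sitemap")
    ET.SubElement(sitemap_el, "loc").text = child

ET.ElementTree(index).write("sitemap_index.xml", encoding="utf-8", xml_declaration=True)
```

Submit the index URL once; search engines discover the child sitemaps from it, and per-type error counts make it easier to see which section of the site is causing problems.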
Sitemap verification checklist
- Accessible: Sitemap URL loads publicly and returns 200.
- Canonical-only: Listed URLs match the canonical version of each page.
- Indexable-only: No redirects, errors, blocked URLs, or noindex pages included.
- Fresh: New/updated pages appear when expected.
- robots.txt: Sitemap line points to the correct URL.
- Console health: Submission succeeds and errors are understood.
Frequently Asked Questions
Do I need a sitemap for a small site?
Not always, but it can help with discovery and monitoring. If your site is small and well-linked, search engines can still find most pages without a sitemap.
What’s the difference between an XML sitemap and an HTML sitemap?
An XML sitemap is mainly for search engines. An HTML sitemap is a navigation page for people.
Does submitting a sitemap guarantee indexing?
No. Submission helps discovery. Indexing depends on content quality, canonical signals, and whether the page is worth showing in search.
Should I include noindex pages in my sitemap?
Usually no. Keep the sitemap focused on pages you want indexed.
How often should I update my sitemap?
Update whenever important URLs change. Many CMS systems update automatically; if yours doesn’t, update after publishing, moving, or deleting key pages.
Can I have multiple sitemaps?
Yes. Large sites often use a sitemap index that points to several sitemap files. The key is consistency: don’t list the same content under multiple URL versions.