- An XML sitemap is a file at /sitemap.xml that lists your canonical URLs so search engines can discover them efficiently.
- Required per URL: <loc> (the URL). Recommended: <lastmod> with real timestamps. <priority> and <changefreq> are mostly ignored by Google in 2026.
- Use a sitemap index when you exceed 50,000 URLs or 50MB per file. Most ecommerce stores need an index by year 2.
- Split sitemaps by content type (products, collections, blog, pages) for easier per-type indexing diagnostics in Search Console.
- Submit to both Google Search Console and Bing Webmaster Tools. Both expose per-sitemap submission status and error reporting.
Chapter 1. Before you start
XML sitemaps are the second-oldest "files at site root that search engines read" standard after robots.txt. They've been a stable spec since 2005, supported by Google and Bing without major changes for two decades. Get a sitemap shipped within the first week of any new site launch.
- Confirm your canonical URL pattern. Is your site https://example.com or https://www.example.com? Sitemap entries must match the canonical, not the alternate.
- Catalog the URL types you want indexed. Products, collections, blog posts, static pages, author profiles. Each may benefit from its own sub-sitemap.
- Confirm your CMS auto-generates sitemaps. Shopify, WordPress (with Yoast or Rank Math), Webflow, and most modern CMSs ship sitemap generation built-in. Skip writing XML by hand unless your stack requires it.
- Decide on a sitemap-update cadence. Real-time on URL publish is ideal. Daily regeneration is fine for most stores. Weekly is the minimum acceptable.
Chapter 2. What goes in an XML sitemap?
Canonical URLs you want indexed. That's it. Per the
sitemaps.org protocol,
each URL entry needs <loc> (the URL) and can optionally include
<lastmod>, <changefreq>, and
<priority>. The <urlset> root element uses the
http://www.sitemaps.org/schemas/sitemap/0.9 namespace.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2026-05-20T08:00:00-05:00</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/products/example</loc>
    <lastmod>2026-05-15T10:30:00-05:00</lastmod>
  </url>
</urlset>
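If your stack does require generating the file yourself, the structure above can be produced with Python's standard library. This is a minimal sketch, not a full generator: the function name build_urlset and the (loc, lastmod) pair format are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Return sitemap XML bytes for an iterable of (loc, lastmod) pairs.

    lastmod may be None, in which case the <lastmod> element is omitted.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes &, < and > in text content on serialization.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


xml_bytes = build_urlset([
    ("https://www.example.com/", "2026-05-20T08:00:00-05:00"),
    ("https://www.example.com/products/example", "2026-05-15T10:30:00-05:00"),
])
```

Building the tree with an XML library rather than string concatenation means characters such as & in query strings are escaped for you.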
Don't include:
- URLs that return 404, redirect, or carry a noindex directive. Google will crawl them, find they're not indexable, and waste crawl budget doing it.
- Duplicate URLs. List each canonical once.
- Non-canonical URL variants. If your canonical is https://example.com/page, don't also list https://www.example.com/page.
- URLs behind login or authentication. Search engines can't access them anyway.
- Pagination URLs, filter URLs, or search-result URLs unless they have unique indexable content.
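Most of the exclusion rules above can be applied as a filter before the sitemap is written. The sketch below assumes a hypothetical Page record with fields your CMS or crawler would have to supply (status, noindex, canonical, requires_login); none of these names come from a real library. It does not try to detect pagination or filter URLs, which depend on your own URL scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical page record; field names are illustrative only."""
    url: str
    status: int = 200
    noindex: bool = False
    canonical: Optional[str] = None  # None means the page is self-canonical
    requires_login: bool = False


def sitemap_urls(pages):
    """Return the URLs that belong in the sitemap, in input order."""
    seen = set()
    keep = []
    for page in pages:
        if page.status != 200:
            continue  # 404s, redirects, and other non-200 responses
        if page.noindex or page.requires_login:
            continue  # not indexable or not reachable by crawlers
        if page.canonical and page.canonical != page.url:
            continue  # non-canonical variant
        if page.url in seen:
            continue  # list each canonical once
        seen.add(page.url)
        keep.append(page.url)
    return keep
```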
Chapter 3. Basic sitemap vs sitemap index
Use a basic single-file sitemap when you have under 50,000 URLs and the file is under 50MB. Both are hard limits per the sitemap spec; exceed either and search engines reject the file or only read part of it. For everything larger, use a sitemap index that references multiple sub-sitemap files.
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
    <lastmod>2026-05-20</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-collections.xml</loc>
    <lastmod>2026-05-20</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-blog.xml</loc>
    <lastmod>2026-05-20</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-pages.xml</loc>
    <lastmod>2026-05-20</lastmod>
  </sitemap>
</sitemapindex>
The index lives at /sitemap.xml (same path as a basic sitemap); the sub-sitemaps
live at whatever paths you choose. Sub-sitemaps are normal sitemap files with the same
50,000 URL / 50 MB caps applied per file. Sitemap indexes can themselves reference up to
50,000 sub-sitemaps - effectively unlimited for any normal site.
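The split-and-index step can be sketched as follows. The helper names (split_urls, build_sitemap_index) and the sitemap-products-N.xml naming are assumptions for illustration. The sketch only enforces the 50,000 URL cap; a real generator would also need to check the 50 MB size of each serialized file.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL cap from the sitemap protocol


def split_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_locs):
    """Return sitemap index XML bytes referencing each sub-sitemap URL."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locs:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# Usage: 120,001 product URLs need three sub-sitemaps.
urls = [f"https://www.example.com/products/{n}" for n in range(120_001)]
chunks = split_urls(urls)
locs = [
    f"https://www.example.com/sitemap-products-{i}.xml"
    for i in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(locs)
```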
Chapter 4. lastmod, priority, and changefreq in 2026
Of the three optional fields, only lastmod matters in 2026.
Google's John Mueller has publicly stated multiple times that priority and changefreq are
largely ignored by Google's crawler. lastmod, by contrast, is actively used to determine
crawl scheduling.
| Field | Used by Google? | Ship it? |
|---|---|---|
| <lastmod> | Yes, for crawl scheduling | Yes - with REAL timestamps |
| <changefreq> | Mostly ignored | Skip - waste of effort |
| <priority> | Mostly ignored | Skip - waste of effort |
lastmod must be real. Sites that fake lastmod (regenerate every URL's lastmod to "today" on every sitemap rebuild, even when content hasn't changed) have their lastmod values devalued by Google. Real lastmod values - matching the actual modification time of the page's content - earn faster recrawl on pages that genuinely changed.
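One way to keep lastmod honest is to change it only when the page content changes. The sketch below assumes a simple in-memory dict as the store and a hypothetical update_lastmod helper; a real site would persist the digest next to the page record, and would hash only the meaningful content rather than the full rendered HTML.

```python
import hashlib


def update_lastmod(store, url, content, now):
    """Return the lastmod for `url`, bumping it only if `content` changed.

    store maps url -> (sha256 hex digest, lastmod string).
    now is the timestamp to record when a change is detected.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    previous = store.get(url)
    if previous is not None and previous[0] == digest:
        return previous[1]  # unchanged content keeps its old lastmod
    store[url] = (digest, now)
    return now


store = {}
url = "https://www.example.com/products/example"
first = update_lastmod(store, url, "Blue widget, $10", "2026-05-15")
rebuild = update_lastmod(store, url, "Blue widget, $10", "2026-05-20")
edited = update_lastmod(store, url, "Blue widget, $12", "2026-05-21")
```

Here the rebuild on 2026-05-20 keeps the 2026-05-15 value because nothing changed, and only the price edit moves lastmod forward.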
Format lastmod in W3C Datetime format: 2026-05-20 (date only) or
2026-05-20T08:00:00-05:00 (with time and timezone). Both are valid; the
time-included version is more precise and worth using when your CMS exposes it.
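Both forms can be produced from Python's datetime module. This is a small sketch with a hypothetical helper name (w3c_datetime); it rejects naive datetimes because a time without a timezone offset is not valid in this format.

```python
from datetime import date, datetime, timedelta, timezone


def w3c_datetime(value):
    """Format a date or timezone-aware datetime for <lastmod>."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("datetime must be timezone-aware")
        return value.replace(microsecond=0).isoformat()
    return value.isoformat()


date_only = w3c_datetime(date(2026, 5, 20))
with_time = w3c_datetime(
    datetime(2026, 5, 20, 8, 0, tzinfo=timezone(timedelta(hours=-5)))
)
# date_only is "2026-05-20"; with_time is "2026-05-20T08:00:00-05:00"
```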
Chapter 5. Should you split sitemaps by content type?
Yes, for sites over 1,000 URLs. Splitting by content type
(sitemap-products.xml, sitemap-collections.xml,
sitemap-blog.xml, sitemap-pages.xml) makes diagnosis vastly
easier when something goes wrong.
The killer feature: in Google Search Console, you see per-sitemap indexing status. If your products sitemap shows 1,200 of 1,500 URLs indexed but your blog sitemap shows 50 of 200 indexed, you immediately know the indexing problem is on the blog, not the products. With one combined sitemap, you see one aggregate number and have no idea where to look.
- Under 1,000 URLs: one sitemap file is fine. Splitting is overhead with no benefit.
- 1,000-50,000 URLs: split by content type for diagnostics, even though you don't strictly need a sitemap index yet.
- Over 50,000 URLs: required to use a sitemap index. Split by content type by default.
- Over 500,000 URLs: consider also splitting by date (e.g.,
sitemap-products-2026.xml) for crawl-scheduling efficiency.
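Splitting by content type usually comes down to grouping URLs by path. The sketch below assumes Shopify-style path prefixes (/products/, /collections/, /blog/); the prefix table is an assumption and should be adjusted to your own URL structure.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Assumed path prefixes; anything unmatched falls into the pages sitemap.
PREFIX_TO_SITEMAP = {
    "/products/": "sitemap-products.xml",
    "/collections/": "sitemap-collections.xml",
    "/blog/": "sitemap-blog.xml",
}
DEFAULT_SITEMAP = "sitemap-pages.xml"


def group_by_type(urls):
    """Return a dict mapping sitemap filename -> list of URLs."""
    groups = defaultdict(list)
    for url in urls:
        path = urlparse(url).path
        filename = next(
            (name for prefix, name in PREFIX_TO_SITEMAP.items()
             if path.startswith(prefix)),
            DEFAULT_SITEMAP,
        )
        groups[filename].append(url)
    return dict(groups)
```

Each resulting group can then be written as its own sub-sitemap and referenced from the index.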
Chapter 6. Submit to Google and Bing
Two submissions, both critical. Submitting in Search Console and Bing Webmaster Tools triggers a discovery scan and exposes per-sitemap reporting you can't get any other way.
- Open Google Search Console → your property → Indexing → Sitemaps.
- Enter the sitemap path (just sitemap.xml - GSC appends the verified domain) and click Submit.
- Within 5 minutes the status should flip to "Success" with a URL count. If it shows "Couldn't fetch" or errors, open the sitemap URL in a browser to confirm it loads as valid XML.
- Open Bing Webmaster Tools → your site → Sitemaps. Submit the same path. Bing reports the same data with slightly different terminology.
- Add the sitemap URL to your robots.txt as well: Sitemap: https://www.example.com/sitemap.xml. This is a backup discovery path for crawlers that read robots.txt before any sitemap submission UI.
For sitemap indexes, submit only the index URL. The sub-sitemaps are discovered automatically through the index references. Don't submit each sub-sitemap separately - it creates duplicate reporting.
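To confirm the robots.txt reference is in place, you can parse the file the way a crawler would. This sketch uses urllib.robotparser from the Python standard library (site_maps() requires Python 3.8 or later) and works on the file's text so it needs no network access; the wrapper name sitemaps_in_robots is an assumption.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt):
    """Return the Sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


robots_txt = """User-agent: *
Disallow: /cart
Sitemap: https://www.example.com/sitemap.xml
"""
found = sitemaps_in_robots(robots_txt)
```

An empty list means the backup discovery path is missing.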
Chapter 7. The breakages we see most often
Ranked by frequency across 47 ecommerce sitemap audits over the past 24 months:
- No XML sitemap at all: deep pages discovered slowly via internal links only. 14 of 47.
- Sitemap path missing from robots.txt: depends entirely on manual submission for discovery. 12 of 47.
- Sitemap contains 404 or redirected URLs: wastes crawl budget, signals stale data. 8 of 47.
- Missing canonical URLs that are earning traffic: pages indexed via discovery but absent from the official map. 6 of 47.
- Faked lastmod values (every URL "today" on every rebuild): Google devalues the field. 5 of 47.
- Wrong canonical variant in sitemap (e.g., http instead of https, or www instead of apex): treated as duplicate. 4 of 47.
- Sitemap over 50MB or 50,000 URLs without being split: only partially read by Google. 3 of 47.
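Several of these breakages can be caught offline, before submission. The sketch below is an illustrative check, not the audit tooling referred to in this chapter: the function name audit_sitemap and the message strings are assumptions, and the "identical lastmod" test is only a rough heuristic for faked values. Detecting 404s and redirects would require fetching each URL and is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_BYTES = 50 * 1024 * 1024  # 50MB uncompressed
MAX_URLS = 50_000


def audit_sitemap(xml_bytes, canonical_origin):
    """Return a list of human-readable problems found in one sitemap file."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file exceeds 50MB")
    root = ET.fromstring(xml_bytes)
    locs = [(el.text or "").strip() for el in root.iter(NS + "loc")]
    if len(locs) > MAX_URLS:
        problems.append("more than 50,000 URLs in one file")
    seen = set()
    for loc in locs:
        parts = urlparse(loc)
        if f"{parts.scheme}://{parts.netloc}" != canonical_origin:
            problems.append(f"non-canonical origin: {loc}")
        if loc in seen:
            problems.append(f"duplicate: {loc}")
        seen.add(loc)
    lastmods = {(el.text or "").strip() for el in root.iter(NS + "lastmod")}
    if len(locs) > 1 and len(lastmods) == 1:
        # Heuristic only: identical values may also be legitimate.
        problems.append("all lastmod values identical (possibly faked)")
    return problems
```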
We track sitemap completeness and validity continuously through our Sentry product's indexability rule set.
FAQ
Do I need an XML sitemap if my site is small?
Yes. Even a 20-page site benefits because the sitemap tells search engines which URLs you consider canonical and when each was last modified. The file takes about a minute to generate and ships free on every major CMS. The only sites that can reasonably skip it are single-page sites, where the homepage is the only URL there is to list.
Where exactly does the sitemap go?
Conventionally at /sitemap.xml at the site root - so
https://www.example.com/sitemap.xml. The path is configurable, and
/sitemap.xml is only a convention: don't count on search engines finding it
on their own. Submit it manually or reference it in robots.txt.
What's lastmod and why does it matter?
lastmod is the timestamp of when the URL's content was last meaningfully
changed. Google uses it for crawl scheduling - URLs with recent lastmod get recrawled
faster. The catch: lastmod must be REAL. Sites that fake it (regenerating every URL's
lastmod to "today" on every sitemap rebuild) have their lastmod values devalued.
Should I worry about priority and changefreq?
No. Google's John Mueller has publicly stated multiple times that priority and changefreq are largely ignored. Don't waste time tuning them. Some sitemap generators still emit them with default values; that's fine, it doesn't hurt, but don't optimize for them.
How big can a sitemap be?
Hard limits per the spec: 50,000 URLs or 50MB (uncompressed), whichever comes first. Exceed either and search engines reject the file or only read part of it. For larger sites, use a sitemap index that references multiple sub-sitemap files. Each sub-sitemap gets its own 50K / 50MB cap.
Should I split sitemaps by content type?
Yes for any site over 1,000 URLs. Splitting by content type (sitemap-products.xml, sitemap-blog.xml, sitemap-pages.xml) makes diagnosis vastly easier when something goes wrong - Search Console exposes per-sitemap indexing status, which immediately tells you where indexing problems live.
Should the sitemap URL go in robots.txt?
Yes. Add Sitemap: https://www.example.com/sitemap.xml as a line in
robots.txt. This is a backup discovery path - crawlers that read robots.txt before any
sitemap submission find the sitemap automatically. It's a free 30-second add that
increases redundancy.
References
- Sitemaps.org. "Sitemap protocol." sitemaps.org/protocol.html
- Google Search Central. "Build and submit a sitemap." developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap
- Google Search Central. "Sitemaps - overview." developers.google.com/search/docs/crawling-indexing/sitemaps/overview
- Google Search Central. "Manage sitemaps using the Sitemap report." support.google.com/webmasters/answer/7451001
- Bing Webmaster Tools Help. "Submit a sitemap." bing.com/webmasters/help/sitemaps-3b5cf6ed