SEO · Aug 22, 2025 · 11 min read

XML Sitemaps and lastmod Tags: Do They Still Matter for Google and AI Crawlers?

Capconvert Team

Content Strategy

TL;DR

Google ignores <priority> and <changefreq> but does use <lastmod> when it is consistently accurate, and both Google and Bing have renewed their emphasis on it. Inflated or auto-generated lastmod dates teach crawlers to distrust your sitemap. Segment sitemaps by content type for per-section indexing diagnostics, keep lastmod tied to significant content changes, reference the sitemap in robots.txt, and layer RSS feeds and IndexNow on top for faster discovery by both traditional and AI crawlers.

Every few years, someone declares the XML sitemap dead. The argument sounds convincing: Google's crawlers are sophisticated enough to discover pages through links alone, and the sitemap protocol hasn't had a major revision since its 2005 launch. Yet here we are in 2026, and both Google and Bing have recently renewed their emphasis on properly using the lastmod tag in XML sitemaps. Bing's webmaster team published a dedicated blog post in mid-2025 titled "Keeping Content Discoverable with Sitemaps in AI Powered Search." WordPress shipped native lastmod support in version 6.5. The signals are not subtle.

The question is no longer whether XML sitemaps still matter. It's whether you're extracting the full value from them, especially the lastmod tag, at a moment when search engines and AI systems are competing to surface the freshest, most authoritative content. If your sitemap is an afterthought generated by a plugin you never configured, you're leaving crawl efficiency on the table. And in an era of tightening crawl budgets and multiplying AI crawlers, efficiency isn't optional. This guide breaks down what's actually changed, what Google and Bing say on the record, and the specific moves that make your sitemap work harder for both traditional and AI-powered discovery.

What Google Actually Uses (and Ignores) in Your Sitemap

Let's start with what's settled. Google ignores <priority> and <changefreq> values. It uses the <lastmod> value if it's "consistently and verifiably (for example by comparing to the last modification of the page) accurate." The value should reflect "the date and time of the last significant update to the page": an update to main content, structured data, or links counts as significant, but changing a copyright date does not.

That official documentation hasn't changed in years, but the practical weight behind it has grown. John Mueller confirmed on Twitter that Google only cares about the URL and the last modification date. Gary Illyes from Google called the <priority> tag "a bag of noise." There's no ambiguity here: <loc> and <lastmod> are the two fields worth your attention. Everything else is filler.
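To make that concrete, here is a minimal sketch of a sitemap entry that carries only the two fields Google reads. The URL and timestamp are placeholders; the date should be the last significant content change, not the last build.

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/guides/xml-sitemaps</loc>
        <!-- Date of the last significant content update, in W3C Datetime format -->
        <lastmod>2025-08-22T09:30:00+00:00</lastmod>
      </url>
    </urlset>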

Why Accuracy Is Non-Negotiable

The phrase "consistently and verifiably" does heavy lifting. The lastmod date needs to reflect when page content actually changed, not an arbitrary refresh date. Google has been clear that faking lastmod dates can backfire by causing the system to distrust those signals for the entire site.

Google's John Mueller explained that setting every lastmod date in your sitemap to today's date won't do anything for your rankings or SEO. He said on Reddit that "setting today's date in a sitemap file isn't going to be something that works in favor of anyone, it's just lazy." His fuller response makes the consequence plain: in the early days, setting the current date was often a sign that the sitemap generator was confused or broken. It's trivial for search engines to recognize, and it only makes it harder for them to identify the pages that actually changed.

This means CMS configurations that regenerate the lastmod timestamp on every build, without checking whether content actually changed, are actively harming your crawl signaling. Review your setup. If your lastmod dates update when someone edits a sidebar widget or a footer copyright year, you have a problem worth fixing immediately.

The Crawl Budget Connection: Where lastmod Earns Its Keep

For sites under a few thousand pages, crawl budget barely matters. Google emphasizes that crawl budget is not something most publishers need to worry about. If new pages tend to be crawled the same day they're published, or if a site has fewer than a few thousand URLs, most of the time it will be crawled efficiently.

But the moment your site crosses the 10,000-URL threshold, or publishes new pages at any meaningful velocity, crawl budget becomes a real constraint. The web is a nearly infinite space, exceeding Google's ability to explore and index every available URL. As a result, there are limits to how much time Googlebot can spend crawling any single site.

This is exactly where accurate lastmod data creates measurable impact. Google's own crawl budget documentation advises: "Keep your sitemaps up to date. Google reads your sitemap regularly, so be sure to include all the content that you want Google to crawl. If your site includes updated content, we recommend including the <lastmod> tag."

How Google Uses lastmod in Scheduling

Gary Illyes explained that crawl scheduling is driven mainly by the importance of pages on a site. A URL in a sitemap will "probably be crawled sooner or more often because you deem that page more important by putting it in a sitemap." When that URL also carries an updated lastmod value, Google's scheduler has a concrete signal that the page merits recrawling.

Illyes confirmed in an SEO Office Hours session that "as soon as you change something in your sitemap, be that the URL element or last mod, the sitemap will be parsed again and generally reprocessed." He clarified that this doesn't guarantee the URLs will be crawled; they're still subject to quality evaluations.

The practical implication: treat your sitemap as a living document, not a static file you set up once and submit to Search Console. When you make a significant content update to a page, the lastmod should reflect it. When you publish something new, the URL should appear in the sitemap promptly. Dynamic sitemap generation through your CMS or a plugin like Yoast or Rank Math handles this automatically, which is the whole point.

Segmenting Sitemaps: The Diagnostic Layer Most Teams Miss

One of the highest-value and most underused sitemap techniques is segmentation. Search Console's page indexing report shows data per sitemap. If all URLs are in a single file, the indexing report gives one aggregated view. But if product pages, category pages, blog posts, and support articles each have their own sitemap, Search Console shows indexing status for each group independently.

This matters far more than it appears. When you see that 92% of your blog posts are indexed but only 63% of your product pages are, you've immediately identified where to focus your crawl optimization efforts. Without segmentation, those numbers blend into a single opaque percentage.

How to Structure a Sitemap Index

When a sitemap exceeds the protocol's size limits (50,000 URLs or 50 MB uncompressed), split it into smaller sitemaps and use a sitemap index file to manage and submit them together. A sitemap index can reference up to 50,000 child sitemaps. But you shouldn't wait until you hit the limit to split. Segment by content type proactively (a sketch of a segmented index follows the list):

  • Blog/editorial sitemap - for posts and articles
  • Product sitemap - for e-commerce catalog pages
  • Category/taxonomy sitemap - only if those pages have unique, indexable content
  • Core pages sitemap - homepage, about, service pages
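A segmented index built along those lines might look like the sketch below; the file names and dates are illustrative, not a required naming convention.

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>https://www.example.com/sitemap-blog.xml</loc>
        <lastmod>2025-08-20</lastmod>
      </sitemap>
      <sitemap>
        <loc>https://www.example.com/sitemap-products.xml</loc>
        <lastmod>2025-08-18</lastmod>
      </sitemap>
      <sitemap>
        <loc>https://www.example.com/sitemap-core-pages.xml</loc>
        <lastmod>2025-07-01</lastmod>
      </sitemap>
    </sitemapindex>

Each child sitemap can then be monitored separately in Search Console's reports, which is what unlocks the per-segment indexing view described above.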

A good rule: if you would be unhappy seeing that page in Google results, don't list it in your sitemap. Strip out noindexed pages, thin tag archives, internal search results, and anything returning a non-200 status code.

Stale sitemaps with broken URLs, removed pages, or inaccurate lastmod dates waste crawl budget and send misleading signals about the site's structure. The core maintenance tasks are straightforward: remove URLs that return 404 or redirect, update lastmod dates only when content actually changes, add new pages as they're published, and remove pages set to noindex.

AI Crawlers, Sitemaps, and the New Discovery Layer

Here's where the conversation has shifted dramatically. As AI-powered search engines like Bing Copilot continue to reshape how content is discovered and surfaced, keeping your website crawlable, fresh, and fully indexed is more important than ever. While real-time URL submission protocols such as IndexNow help notify search engines of immediate content changes, sitemaps remain a foundational signal for ensuring comprehensive URL coverage.

AI crawlers access websites through four mechanisms (seed URL discovery, link following, sitemap parsing, and direct request), but most cannot execute JavaScript, making server-side rendering essential for AI visibility. Your sitemap is one of the few structured signals that both traditional search crawlers and AI bots can reliably parse.

What AI Crawlers Actually Do With Your Sitemap

The distinction matters: AI crawlers like GPTBot, ClaudeBot, and PerplexityBot serve different purposes than Googlebot. Unlike traditional search engine crawlers, which focus primarily on indexing for search results, AI crawlers collect data for model training, real-time information retrieval, and AI-powered responses; some gather data for initial model training, while others fetch real-time information to ground AI answers.

Keeping the <lastmod> value accurate can help search engines prioritize recently updated pages, which is especially useful for AI systems that aim to surface fresh information. A sitemap won't make your content appear in AI answers by itself, but it helps ensure your pages are discoverable, indexed, and up to date, which increases their chances of being used in AI-powered search results.

Managing Access: robots.txt for the AI Era

A clean sitemap means nothing if your robots.txt blocks the crawlers that need it. Google Search Central's robots.txt guide emphasizes explicitly managing AI crawlers like GPTBot, CCBot, or anthropic-ai. You can block these while allowing Googlebot. The inverse is also true: many sites accidentally block AI crawlers without realizing it.

OpenAI's crawlers respond to separate robots.txt user agents, OAI-SearchBot and GPTBot, and each setting is independent: a webmaster can allow OAI-SearchBot so pages can appear in search results while disallowing GPTBot to prevent content from being used to train generative AI models. This granular control means you need a deliberate strategy, not a default configuration. Reference your sitemap in robots.txt with a Sitemap: directive. This helps crawlers immediately find your sitemap and discover all your important pages, leading to faster and more complete indexing.
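Putting those pieces together, a deliberately configured robots.txt might look something like the sketch below. The allow/block decisions shown are one example policy, not a recommendation, and the domain is a placeholder.

    # Traditional search crawlers: full access
    User-agent: Googlebot
    Allow: /

    User-agent: Bingbot
    Allow: /

    # Example policy: allow OpenAI's search crawler, block its training crawler
    User-agent: OAI-SearchBot
    Allow: /

    User-agent: GPTBot
    Disallow: /

    # Point all crawlers at the sitemap
    Sitemap: https://www.example.com/sitemap_index.xml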

IndexNow, RSS Feeds, and Complementary Submission Protocols

XML sitemaps are a pull mechanism. You publish the file; search engines decide when to come read it. That makes sitemaps slow for new content: you publish a post, your sitemap updates, and then you wait for Google to notice the change. That process can take anywhere from a few hours to several weeks for smaller or newer sites.

IndexNow flips this into a push model. It enables real-time URL submission, instantly notifying Bing and participating search engines when individual URLs are added, updated, or removed. This helps ensure changes are surfaced quickly, especially important for freshness in AI search.
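For illustration, an IndexNow submission is just an HTTP POST with a small JSON payload to a participating endpoint such as api.indexnow.org. The host, key, and URLs below are placeholders, and per the protocol the key must match a key file hosted on your site.

    POST https://api.indexnow.org/indexnow
    Content-Type: application/json; charset=utf-8

    {
      "host": "www.example.com",
      "key": "your-indexnow-key",
      "keyLocation": "https://www.example.com/your-indexnow-key.txt",
      "urlList": [
        "https://www.example.com/blog/new-post",
        "https://www.example.com/blog/updated-post"
      ]
    }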

But there's a critical limitation. Google does not support IndexNow. The protocol is supported by Bing, Yandex, Naver, Seznam.cz, and Yep. For Google, you still need to rely on XML sitemaps, internal linking, crawl budget optimization, and Google's own Indexing API.

XML sitemaps are downloaded less frequently than RSS/Atom feeds. For optimal crawling, Google recommends using both XML sitemaps and RSS/Atom feeds. XML sitemaps give Google information about all of the pages on your site, while RSS/Atom feeds provide all updates, helping Google keep your content fresher in its index.
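A minimal RSS feed that plays this role only needs to list recent items with publication dates; the titles and URLs below are placeholders, and most CMSs generate an equivalent feed automatically.

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
      <channel>
        <title>Example Blog</title>
        <link>https://www.example.com/blog</link>
        <description>Latest posts from Example Blog</description>
        <item>
          <title>New Post Title</title>
          <link>https://www.example.com/blog/new-post</link>
          <pubDate>Fri, 22 Aug 2025 09:30:00 GMT</pubDate>
        </item>
      </channel>
    </rss>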

The strategic play is layered: XML sitemaps for comprehensive inventory, RSS feeds for recent content signals, and IndexNow for instant notification on Bing and supported engines. These aren't competitors. They're complementary channels.

A Practitioner's Audit Checklist for XML Sitemaps

After auditing sitemaps across dozens of sites in the past year, we've seen the same errors recur with alarming frequency. Here's a condensed checklist drawn from those patterns:

1. Validate your lastmod accuracy. Pick 10 URLs at random from your sitemap. Compare the lastmod dates to the actual last significant edit. If more than 2 are wrong, your CMS configuration needs work.

2. Remove non-indexable URLs. Every URL in your sitemap should return a 200 status code, carry a self-referencing canonical, and not be blocked by robots.txt or tagged with noindex. If a page is disallowed in robots.txt, it should not appear in your XML sitemap; that sends mixed signals to search engines.

3. Segment by content type. Create separate sitemaps for blog posts, products, and static pages. Submit each through Search Console. Monitor indexing rates per segment weekly.

4. Reference your sitemap in robots.txt. A single line, Sitemap: https://yoursite.com/sitemap.xml, ensures every crawler that reads robots.txt can find your sitemap without relying on Search Console submission alone.

5. Check Search Console's Sitemaps report. Google Search Console's sitemap report and page indexing report are the primary monitoring tools. They show how many URLs were submitted, how many are indexed, and where errors are occurring.

6. Confirm AI crawler access. Review your robots.txt for explicit directives affecting GPTBot, ClaudeBot, OAI-SearchBot, PerplexityBot, and Google-Extended. Decide deliberately whether to allow, restrict, or block each one.

7. Set up an RSS or Atom feed. Ensure your CMS generates one and submit it to Search Console alongside your sitemap. This provides a faster signal for new content than the sitemap alone.

8. Use dynamic sitemap generation. Automating lastmod updates through your CMS or sitemap generator ensures accuracy, as long as those updates reflect substantial content changes. Static, hand-crafted sitemaps go stale the moment you publish a new page.

The lastmod Trust Equation: Why Precision Beats Frequency

There's a persistent misconception that updating your sitemap more frequently is inherently better. It's not. Frequency without accuracy trains search engines to distrust your signals.

If your CMS automatically updates lastmod every time someone views a page or makes a trivial formatting change, you're crying wolf. Search engines will learn to ignore your lastmod signals. Reserve lastmod changes for edits that alter the page's value to users: new sections, updated statistics, revised recommendations, corrected information.

Mueller explained that "the lastmod date should reflect the date when the content has significantly changed enough to merit being re-crawled." He added that if comments are a critical part of your page, using that date is fine; ultimately, it's a decision you can make. The flexibility is intentional. Google doesn't dictate what "significant" means for your site. But the burden of accuracy falls on you.

Bing has said it will rely more on the lastmod date for crawling purposes. Gary Illyes posted on LinkedIn that "the lastmod element in sitemaps is a signal that can help crawlers figure out how often to crawl your pages." Both major search engines are investing in this signal. If you maintain it honestly, it works in your favor. If you inflate it, you lose credibility across your entire domain.

The path forward is unglamorous but effective. Audit your sitemaps quarterly. Segment them for diagnostic clarity. Keep lastmod honest and automated. Reference them in robots.txt. And extend your discovery strategy beyond the sitemap with RSS feeds and IndexNow for engines that support it. XML sitemaps didn't become less important as the web grew more complex; they became more important precisely because the competition for crawl attention intensified. The sites that treat sitemaps as infrastructure rather than an afterthought are the ones whose new content gets crawled the same day it's published. That's the difference between a technical SEO checkbox and a genuine competitive advantage.

Ready to optimize for the AI era?

Get a free AEO audit and discover how your brand shows up in AI-powered search.

Get Your Free Audit