A SaaS brand launches a new pricing tier in March 2026. By April, their pricing page is updated, their FAQ reflects the change, and their support articles cover the migration. They run AI visibility audits. ChatGPT still describes the old pricing. Claude does too. Perplexity is closer to current but mixes old and new tiers. Gemini, integrated with Google's index, gets it right within two weeks.
The pattern is not random. Each engine has a knowledge cutoff baked into the model's training data, and each engine handles the gap between the cutoff and the current date differently. Some engines retrieve aggressively for any query that could reasonably need fresh information. Others trust parametric memory unless the query explicitly signals freshness. The publisher's content can be perfectly written and still be invisible if it falls in the wrong place in this geometry.
Understanding knowledge cutoffs is not optional in 2026. It is the difference between publishing content that AI engines find and publishing content that AI engines ignore. This explainer breaks down the two knowledge layers, the current cutoffs across the major models, and the publishing strategies that bridge the gap.
The Two Knowledge Layers: Parametric And Retrieval
Every modern AI assistant has two distinct sources of information. Parametric memory is what the model learned during training. It is fixed at the moment training ended. Retrieval is the real-time layer that lets the model fetch information from the web (or a corporate knowledge base, or an MCP server) during the conversation.
Parametric memory is fast, broad, and stale. The model can answer questions about virtually any topic it was trained on, without making any external calls. The catch is that the answer reflects the world as of the training cutoff, which is typically months before the model was released and often more than a year before the user is interacting with it.
Retrieval is slow, narrow, and current. The model makes an external call (a web search, an API lookup, an MCP query) and reads the results. The information is up to date but limited to whatever the retrieval system surfaces, and the model has to integrate the retrieved text into its answer, which adds latency and complexity.
Engines decide which layer to use on a per-query basis. The decision is influenced by the query's apparent freshness sensitivity, the user's explicit signal (some interfaces have a "search the web" toggle), the model's confidence in its parametric answer, and the engine's product policy. Different engines apply different defaults. We have unpacked the difference between training and retrieval in more depth elsewhere.
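No engine publishes its routing logic, but the decision just described can be sketched as a simple scoring function. Everything below, the signal names and the threshold included, is a hypothetical illustration, not any vendor's implementation.

```python
# Illustrative sketch of a per-query retrieval router. The signal names and
# the threshold are hypothetical; no engine publishes its actual logic.
from dataclasses import dataclass

@dataclass
class QuerySignals:
    freshness_sensitive: bool     # mentions dates, "now", recent entities
    user_forced_search: bool      # e.g. a "search the web" toggle
    parametric_confidence: float  # model's self-estimated confidence, 0..1

def should_retrieve(q: QuerySignals, confidence_floor: float = 0.8) -> bool:
    """Decide whether to hit the retrieval layer for this query."""
    if q.user_forced_search:   # an explicit user signal always wins
        return True
    if q.freshness_sensitive:  # fresh-looking queries route to retrieval
        return True
    # Otherwise trust parametric memory only when confidence is high.
    return q.parametric_confidence < confidence_floor

# An aggressive engine raises confidence_floor so more queries trigger a
# search; a conservative engine lowers it and leans on parametric memory.
print(should_retrieve(QuerySignals(False, False, 0.95)))  # False: memory
print(should_retrieve(QuerySignals(True, False, 0.95)))   # True: fresh query
```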
The visibility implication for publishers is that your new content lives in the retrieval layer. Whether the model uses retrieval at all on a given query determines whether your content has any chance of being cited.
Current Cutoffs Across The Major Models
The cutoff for each model is published by the vendor and can shift with new releases. As of mid-2026, the public state looks like this.
OpenAI's GPT-5 family, the current flagship deployed in ChatGPT, was released in 2025 with training data through late 2024. ChatGPT supplements parametric knowledge with web search via OAI-SearchBot and Operator. The web-search behavior is aggressive: ChatGPT searches by default for most queries that mention recent events, specific products, or named entities.
Anthropic's Claude 4 family, including Opus 4.7, Sonnet 4.6, and Haiku 4.5, has training cutoffs ranging from mid-2025 (Sonnet 4.6 and Haiku 4.5) to January 2026 (Opus 4.7). Claude supplements with web search via ClaudeBot, but the default behavior is more conservative than ChatGPT's. Claude tends to answer from parametric memory unless the query explicitly signals freshness or the user has enabled browsing.
Google's Gemini 2.0 family powers AI Mode and AI Overviews. The cutoff is less explicitly published because Gemini's retrieval is so tightly integrated with the live Google index that the parametric layer matters less. For most queries, Gemini will pull from real-time Google search results and synthesize.
Microsoft Copilot uses GPT-5 with Bing search integrated. The cutoff inherits from OpenAI but the retrieval layer pulls from Bing rather than OpenAI's own web search. The behavior is similar to ChatGPT for fresh queries.
Perplexity is retrieval-first by design. The model defaults to running a search for every query and synthesizes from the retrieved sources. Parametric memory plays a smaller role than in the other engines. New content has the best chance of being cited quickly on Perplexity.
The implication for publishers is that the engine matters. New content published after a model's parametric cutoff reaches users through the retrieval layer, and each engine's retrieval layer behaves differently.
Why Cutoffs Shift With Releases
Each major model release moves the cutoff forward, but the gap between the cutoff and the release date typically stays roughly constant. The gap is the time required for the vendor to train, evaluate, and harden the model after data collection ends. In practice, the gap is six to twelve months. By the time a model is in user hands, its parametric memory is already at least half a year stale, and the gap widens daily.
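The arithmetic is worth making concrete. A quick sketch, with all three dates invented for illustration:

```python
# Back-of-envelope staleness math. All three dates are invented for
# illustration; substitute the real ones for the model you care about.
from datetime import date

cutoff  = date(2024, 10, 1)  # hypothetical end of training data
release = date(2025, 5, 1)   # hypothetical public release
today   = date(2026, 6, 15)

training_gap = (release - cutoff).days  # ~7 months to train, evaluate, harden
staleness    = (today - cutoff).days    # grows by one day, every day
print(f"gap at release: {training_gap} days; staleness today: {staleness} days")
```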
When Cutoffs Help Your Citations And When They Hurt
Cutoffs interact with content age in patterns worth understanding.
For evergreen content (definitions, principles, historical facts, established practices), the parametric layer is the dominant source. A page published before the cutoff has the strongest baseline because it had a chance to be included in the training data. New evergreen content published after the cutoff has to fight to be retrieved on queries where the model might just answer from memory.
For fresh content (recent product launches, current events, new pricing, contemporary comparisons), the parametric layer is unreliable by definition. The model knows it needs retrieval. New content has a much stronger chance of being cited because the engine is actively looking for current sources.
For mid-life content (12 to 24 months old), the pattern depends on the topic. Stable topics may still rely on parametric memory. Volatile topics will trigger retrieval. The same article can be cited reliably for one query and ignored for another based on the engine's read of how time-sensitive the query is.
The practical lesson is that content that takes a clear position on something that has changed since the model's cutoff is more likely to be retrieved than content that overlaps with the model's existing parametric knowledge. New comparisons, new pricing analyses, new product reviews, and new event coverage all have a higher floor in retrieval than evergreen updates do.
The Query Classification Pattern: Evergreen Versus Fresh
Engines internally classify queries to decide whether to retrieve. The classification is not exposed in the user interface, but the behavior can be inferred by testing the same query with and without retrieval enabled.
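A probe for this might look like the following sketch. The `ask()` function and its `force_search` flag are stand-ins for whatever SDK or HTTP client your engine exposes, modeled on the toggles some interfaces offer; nothing here names a real vendor API.

```python
# Hypothetical probe for inferring an engine's default retrieval behavior.
# `ask()` and its `force_search` flag are stand-ins, not a real vendor API.

def ask(query: str, force_search: bool | None = None) -> str:
    """None = let the engine decide; True = override where the API allows it."""
    raise NotImplementedError("wire this to the engine you are testing")

def probe(query: str, post_cutoff_fact: str) -> str:
    """If the default answer contains a fact that only exists after the
    model's training cutoff, the engine must have retrieved."""
    default_answer = ask(query)  # engine decides on its own
    if post_cutoff_fact.lower() in default_answer.lower():
        return "retrieves by default for this query"
    forced = ask(query, force_search=True)
    if post_cutoff_fact.lower() in forced.lower():
        return "parametric by default; retrieval available when forced"
    return "fact not surfaced either way; check that the page is indexed"

# Example: probe("What does Acme's Pro tier cost?", "$49")
```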
Queries that trigger retrieval reliably include any reference to current dates, recent events, specific recent products, or comparisons that use words like "now" or "today" or "in 2026." Queries about specific named entities also tend to trigger retrieval because the engine wants to verify it has current information about that entity.
Queries that often do not trigger retrieval include broad conceptual questions, historical or biographical questions about long-established figures, definitional questions, and questions that match common patterns the model has seen many times in training.
The pattern is not exactly "old questions skip retrieval and new questions use it." The pattern is closer to "questions the model is confident about skip retrieval and questions where the model senses it might be wrong use it." Confidence is correlated with topic stability and the model's training exposure.
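As a toy illustration of these categories, a few regexes capture the spirit. Real engines use learned classifiers plus confidence estimates; none of these patterns is anyone's production logic.

```python
# Toy freshness classifier in the spirit of the patterns above. Real engines
# use learned classifiers plus confidence estimates; these regexes only
# illustrate the signal categories and are not anyone's production logic.
import re

FRESHNESS_PATTERNS = [
    r"\b(now|today|currently|latest|this (week|month|year))\b",
    r"\b20(2[4-9]|3\d)\b",                # recent or upcoming years
    r"\b(price|pricing|cost)s?\b",        # pricing churns constantly
    r"\b(launch|release|announce)\w*\b",  # event language
]

def looks_fresh(query: str) -> bool:
    q = query.lower()
    return any(re.search(p, q) for p in FRESHNESS_PATTERNS)

print(looks_fresh("What is a knowledge cutoff?"))  # False -> likely parametric
print(looks_fresh("Acme pricing tiers in 2026"))   # True  -> likely retrieval
```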
For publishers, this means that the queries you can reliably win citations on as a new entrant are the ones the engines treat as freshness-sensitive. Trying to displace a Wikipedia-style answer on a definitional query is hard regardless of how good your content is, because the engine may not retrieve at all. Adding fresh perspective on a recent event is easier because the engine is already looking for sources.
Content freshness and AI search is the natural counterpart topic: the freshness work is what keeps your existing content in the retrieval-eligible pool.
Publishing Strategies That Work Around The Cutoff
Publishers can shape their content strategy to take advantage of how cutoffs work.
First, lead with freshness signals. Content that is explicitly dated, references specific recent dates and events, and updates its dateModified field on substantive edits is more likely to be classified as fresh and retrieved. The signals are simple to add and the lift is real; a minimal audit sketch follows this list.
Second, write about what has changed. Articles framed as "what is new in 2026" or "what we learned from the March 2026 update" or "the latest pricing changes" are more likely to be retrieved than articles framed as timeless guides. The reason is that the engine's classifier reads the framing and routes to retrieval more often.
Third, attach your content to named recent events. If a recent industry shift happened (a product launch, a regulatory change, a major company news), articles that reference the event by name get pulled into queries about the event. This is the highest-leverage freshness strategy in practice.
Fourth, refresh existing pages strategically. We have written about the freshness rule of refreshing at least one section every six to nine months. The refresh is doubly important after a model's cutoff, because pages that have not been refreshed look stale to the classifier and get downranked in retrieval.
Fifth, treat each major engine separately. Perplexity and Gemini are retrieval-heavy and will cite fresh content quickly. ChatGPT is moderate. Claude is conservative. Publishing content with the awareness of which engine is most likely to pull it shapes which surfaces you can move first.
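To make the refresh work concrete, here is a minimal staleness audit, a sketch rather than a vendor tool. It assumes pages carry schema.org Article JSON-LD; the `site/` root and the nine-month threshold mirror the rule of thumb above and are illustrative choices, not documented engine behavior.

```python
# Minimal staleness audit for pages carrying schema.org Article JSON-LD.
# The `site/` root and the nine-month threshold are illustrative choices
# mirroring the rule of thumb above, not documented engine behavior.
import json
import re
from datetime import date, timedelta
from pathlib import Path

STALE_AFTER = timedelta(days=270)  # ~9 months, the outer edge of the rule

def date_modified(html: str) -> date | None:
    """Pull dateModified out of the first JSON-LD block, if present."""
    m = re.search(r'<script type="application/ld\+json">(.*?)</script>',
                  html, re.DOTALL)
    if not m:
        return None
    stamp = json.loads(m.group(1)).get("dateModified")
    return date.fromisoformat(stamp[:10]) if stamp else None

def stale_pages(root: str, today: date) -> list[str]:
    """Flag pages with no dateModified or one older than the threshold."""
    flagged = []
    for page in Path(root).glob("**/*.html"):
        modified = date_modified(page.read_text(encoding="utf-8"))
        if modified is None or today - modified > STALE_AFTER:
            flagged.append(str(page))  # refresh a section, then bump the field
    return flagged

print(stale_pages("site/", date.today()))
```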
The First-Week Citation Lag
New content typically does not get cited in the first 24 to 72 hours after publication. The engines need time to crawl, embed, and integrate the page into their retrieval indexes. The lag varies by engine: Perplexity often cites within 48 hours, Gemini within 72 to 96 hours, ChatGPT within 7 to 14 days, and Claude within 14 to 30 days. Patience matters more than people expect.
Six Mistakes Publishers Make Around Cutoffs
Several recurring patterns reduce citation rates around cutoffs.
- Treating all queries the same. Optimizing every page for both evergreen and fresh queries dilutes the freshness signal. Decide which queries each page is for and write accordingly.
- Failing to update the dateModified field on substantive edits. The field is read by retrieval systems as a freshness signal. Leaving it stale on updated content costs visibility.
- Burying dates in the body. The publication date and any "as of" dates should be visible in the first or second paragraph. Burying them at the end of the article weakens the signal.
- Writing in a tone that signals timelessness when the content is actually time-sensitive. Phrases like "this guide will always be relevant" or "the principles never change" push the engine toward answering from parametric memory instead of retrieving. Even when the underlying ideas are stable, fresh examples and recent dates ground the content in retrieval-friendly territory.
- Ignoring engine-specific behavior. A single content strategy applied uniformly to every engine misses the differences in retrieval aggressiveness. The brands that win adapt by engine.
- Underpublishing fresh angles. Many brands publish only occasionally and miss the retrieval window for recent events. A higher cadence of fresh-angle content (twice a month at minimum) keeps the brand visible in the retrieval pool.
Frequently Asked Questions
How can I find the exact training cutoff for ChatGPT, Claude, or Gemini?
OpenAI publishes ChatGPT cutoffs in its model documentation. Anthropic publishes Claude cutoffs in the model release notes. Gemini's cutoff is less publicly documented because of its tight integration with the Google index. For most practical work, you can check the cutoff by asking the model itself: "what is your knowledge cutoff date" usually returns a usable answer, though models sometimes misreport it, so cross-check against the vendor's documentation.
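For example, with the OpenAI Python SDK (the same question works through any vendor's SDK; the model id below is a placeholder for whichever model you are auditing):

```python
# One way to check a model's self-reported cutoff, here with the OpenAI
# Python SDK. Swap in any vendor's SDK; the model id below is a placeholder
# for whichever model you are auditing.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4o",  # substitute the model you are auditing
    messages=[{"role": "user",
               "content": "What is your knowledge cutoff date?"}],
)
print(resp.choices[0].message.content)
```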
Does ChatGPT automatically search the web for every query?
No. ChatGPT searches when its internal classifier judges the query likely to need fresh information. The classifier is not perfect. Queries about recent events, specific dated products, and current pricing reliably trigger search. Broader conceptual queries often do not. Users can force a search by enabling the relevant tool in the interface.
Can I increase the chance of my page being retrieved by a specific engine?
Indirectly. The strongest lever is making the page about something the engine treats as freshness-sensitive. References to recent events, dated content, and currently relevant comparisons all push the classifier toward retrieval. The other levers (schema, structure, citation gravity) help once retrieval happens.
Do AI engines cache content they have retrieved before?
Yes, partially. Each engine maintains its own retrieval cache or index. Pages that have been cited recently are easier to retrieve again because they are already embedded in the system. Pages that have never been retrieved have to be crawled and processed before they can be cited. The cold-start lag is real.
Will future models eliminate the cutoff entirely?
Probably not in the near term, but the gap will shrink. Continuous training models (which update parametric memory more frequently) are being researched but face stability and cost challenges. The likely path is hybrid: parametric memory updated on a quarterly or monthly cycle, plus an aggressive retrieval layer that picks up the difference. The cutoff as a hard line will soften but not disappear.
Should I publish more frequently if I want to be cited more often?
Yes, with a caveat. Frequency matters because it keeps you in the freshness-eligible pool. But frequency without quality dilutes the brand's overall authority signal. The right cadence is the maximum you can sustain at high quality. For most brands, that is two to four substantive pieces per month, not daily blasts.
Knowledge cutoffs are the single most underexamined variable in GEO planning. The engines you are trying to be cited by have a parametric layer that may already cover your topic and a retrieval layer that decides whether to look for fresh sources. New content reaches users through the retrieval layer, and the retrieval layer's behavior depends on how the engine classifies the query.
The practical work is to lean into freshness explicitly: dated content, named recent events, refreshed pages, and a publishing cadence that keeps the brand in the eligible pool. Engine-specific awareness pays off because Perplexity and Gemini move faster than ChatGPT and Claude. The brands that publish with the cutoff in mind get cited weeks faster than the brands that publish into the void.
If your team wants help building a publishing calendar that exploits the cutoff geometry, including engine-specific timing for product launches and event coverage, that work sits inside our generative engine optimization program. The brands cited on fresh queries are the brands publishing with intent, not in volume for its own sake.
Ready to optimize for the AI era?
Get a free AEO audit and discover how your brand shows up in AI-powered search.
Get Your Free Audit