GEO · Apr 8, 2026 · 12 min read

GPTBot Vs OAI-SearchBot: Why Blocking One Does Not Block The Other

Capconvert Team

Content Strategy

TL;DR

GPTBot and OAI-SearchBot are separate crawlers with separate user agents, separate IP ranges, and separate robots.txt directives. GPTBot collects training data for future OpenAI models. OAI-SearchBot fetches pages at query time to power ChatGPT search citations. Blocking GPTBot leaves OAI-SearchBot fully active, which is exactly what most publishers actually want. Conflating the two is the most common reason brands accidentally disappear from ChatGPT answers.

Most publishers who set out to opt their site out of AI training reach for one rule in robots.txt: disallow GPTBot. They paste it in, redeploy, and assume they have closed the door on OpenAI. Three months later, traffic from ChatGPT referrals quietly disappears and nobody can explain why.

The problem is that the rule worked exactly as advertised. GPTBot stopped fetching pages for training corpus collection. But OpenAI runs a second, entirely separate crawler called OAI-SearchBot, and that bot is the one that fetches your pages at query time when a ChatGPT user asks a question your content could answer. OAI-SearchBot has its own user agent, its own IP range, and its own robots.txt token. Blocking GPTBot does not block OAI-SearchBot. Blocking OAI-SearchBot is what removes you from ChatGPT search results.

The distinction is the single most important detail in OpenAI's publisher documentation, and almost every brand audit we run uncovers some confusion about which bot does what. This post pulls the two crawlers apart, explains the third and fourth that round out OpenAI's bot fleet, and lays out the robots.txt configurations that match the most common publisher goals.

Two Bots, Two Different Purposes

OpenAI publishes its bot list at developers.openai.com/api/docs/bots. As of 2026, the fleet includes at least four named crawlers. GPTBot collects training data. OAI-SearchBot fetches pages at query time for ChatGPT search and the AI search surfaces OpenAI powers. ChatGPT-User is the user-proxy bot that fetches a page on behalf of a specific ChatGPT user who pasted a URL or clicked a link inside the chat. OAI-AdsBot enforces landing-page policy for the ad system that OpenAI is rolling out across paid ChatGPT placements.

These four bots are not interchangeable. They run on different infrastructure, they respect different robots.txt directives, and they exist for different stages of the OpenAI pipeline. The fetched pages flow into different systems, and the consequences of blocking each one are different.

The most consequential distinction sits between the first two. GPTBot is a training-time crawler. Its job is to add your pages to the corpus that OpenAI uses to train the next generation of GPT models, which become commercial products at some indeterminate point in the future. OAI-SearchBot is a retrieval-time crawler. Its job is to find your pages right now, when a user asks ChatGPT a question, and pass the relevant passages to the model that writes the answer.

Blocking GPTBot is a forward-looking decision about whether you want your content shaping future model weights. Blocking OAI-SearchBot is a present-tense decision about whether you want to appear in ChatGPT search answers today. These are different decisions, and most publishers conflate them because the bot names look similar and the documentation is dense.

Why OpenAI Split Them

The split is partly a response to publisher pressure. When GPTBot first launched in mid-2023, the only robots.txt lever publishers had was binary: allow OpenAI to crawl your site for training, or block them entirely. The cost of opting out was high, because the same block removed you from any future product that OpenAI built on top of the crawled corpus. As OpenAI rolled out search and agentic products, the company introduced separate bots so that publishers could opt out of training without losing the right to appear in the live ChatGPT product. The split is now codified in the documentation and the robots.txt parsing logic on OpenAI's side respects each token independently.

GPTBot: The Training Corpus Crawler

GPTBot is the crawler that ships pages into the training pipeline. The user agent string identifies as GPTBot followed by version information. Activity from GPTBot is recurring rather than triggered by user queries, and the volume varies by site authority and update frequency.

The pages that GPTBot fetches are not used to answer your customer's next ChatGPT question. They are added to a curated corpus that OpenAI eventually uses to train future GPT models. The horizon between fetch and impact is long. A page GPTBot crawled in early 2026 might influence answers in GPT-6 sometime in 2027 or beyond, and only if the page survives the deduplication, filtering, and quality scoring stages that OpenAI applies before any data reaches the training run.

Blocking GPTBot is a defensible choice for some publishers. News organizations protecting paywalled archives, legal publishers concerned about training data provenance, and brands with proprietary research they want kept out of competitor-available models all have legitimate reasons. The robots.txt directive is straightforward:

User-agent: GPTBot
Disallow: /

The directive blocks GPTBot from fetching any URL on your site. The next time GPTBot attempts to crawl, it will see the disallow rule and stop. OpenAI publishes the IP ranges that GPTBot uses as a JSON file, which allows you to verify in your access logs that the bot honored the directive after deployment. Equivalent ranges exist for OAI-SearchBot and ChatGPT-User at the same root path.
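The log check after deployment can be scripted. A minimal sketch, using a hypothetical access-log excerpt (in practice, point LOG at your real Nginx or Apache access log):

```shell
# Hypothetical access-log excerpt; replace with your real log file.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
52.230.152.1 - - [10/Apr/2026:09:12:01 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "Mozilla/5.0; compatible; GPTBot/1.2"
40.84.180.2 - - [10/Apr/2026:09:15:44 +0000] "GET /docs HTTP/1.1" 200 2048 "-" "Mozilla/5.0; compatible; OAI-SearchBot/1.0"
EOF
# Count GPTBot fetches since the block went live; this should trend toward
# zero (the bot may still fetch robots.txt itself, which is expected).
grep -c "GPTBot" "$LOG"
```

Run the same count for OAI-SearchBot to confirm it still fetches pages normally after the GPTBot block is live.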

What Blocking GPTBot Does Not Block

The most important consequence of the GPTBot block is what it does not affect. OAI-SearchBot is a separate user agent and ignores the GPTBot rule entirely. ChatGPT-User is also separate. Blocking GPTBot has zero effect on ChatGPT's ability to cite your site when a user asks a question.

This is the single sentence that matters most in this post. If you block GPTBot, ChatGPT will continue to cite your site. If you want to stay out of training and stay in ChatGPT search, blocking GPTBot is exactly the rule you need.

OAI-SearchBot: The Live Retrieval Crawler

OAI-SearchBot operates on a fundamentally different rhythm than GPTBot. It is the crawler that builds and refreshes the index OpenAI uses to ground ChatGPT answers in current web content. When ChatGPT runs a search at query time, the search infrastructure draws on an index OAI-SearchBot has been maintaining. The bot does not necessarily fetch your page in the same instant a user asks the question, but it has fetched and indexed your page recently enough that the content is fresh in the retrieval system.

OAI-SearchBot identifies itself with its own user agent string. The IP ranges are published separately from GPTBot's. The crawl cadence is more frequent than GPTBot's for high-priority publishers because retrieval indices have to stay current. A breaking news article from a major publisher might be fetched by OAI-SearchBot within hours of publication, while a static documentation page might be revisited every few weeks.

Blocking OAI-SearchBot is the action that makes your site invisible to ChatGPT search. Once the block is in place, OpenAI's retrieval system stops fetching new pages from your domain and begins decaying existing entries from the index. Within weeks, your site will stop appearing in ChatGPT citations for queries you used to win.

For the vast majority of publishers, this is the opposite of what they want. ChatGPT referrals carry buyer-intent traffic. Brands cited in ChatGPT answers earn share-of-voice in the AI search surfaces that increasingly replace classic SERPs. The reflex to block OpenAI completely because of training-data concerns is understandable but usually misallocates the lever. Block GPTBot to opt out of training. Leave OAI-SearchBot allowed so you stay in the live product.

The Bing Dependency

OAI-SearchBot is not the only ingestion path into ChatGPT search. OpenAI's search infrastructure also leverages Bing's web index for certain query types. Even if OAI-SearchBot does not crawl your site directly for some reason, your Bing-indexed content can still surface in ChatGPT answers when the retrieval system pulls from the Bing layer. This is why brands with low Bing visibility often see lower ChatGPT citation rates than peers who appear on the same Google SERP. If you have neglected Bing Webmaster Tools setup and Bing-side indexation health, you are leaving ChatGPT visibility on the table even with OAI-SearchBot fully allowed.

How To Tell Them Apart In Server Logs

The cleanest way to distinguish the four OpenAI bots is to grep your access logs for each user agent string. A typical Nginx or Apache access log line includes the user agent in the final quoted field. From a terminal:

grep -i "GPTBot" /var/log/nginx/access.log | head
grep -i "OAI-SearchBot" /var/log/nginx/access.log | head
grep -i "ChatGPT-User" /var/log/nginx/access.log | head
grep -i "OAI-AdsBot" /var/log/nginx/access.log | head

You will see distinct patterns. GPTBot tends to fetch many URLs per session in a depth-first sweep. OAI-SearchBot tends to fetch a smaller number of URLs more frequently, refreshing pages that have been previously indexed. ChatGPT-User fetches single URLs corresponding to user actions in the chat surface. OAI-AdsBot only hits pages that are running paid placements in OpenAI's ad system.

The user agent string is the primary signal but not the only one. OpenAI publishes IP ranges for each bot. Verifying that the user agent matches an IP from OpenAI's published list is the standard defense against spoofed crawlers that impersonate legitimate bots to scrape sites without disclosing their identity. If a request claims to be GPTBot but originates from an IP outside OpenAI's published range, the request is not actually from OpenAI.
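One way to surface spoofers is to compare the source IPs of requests claiming a bot's user agent against OpenAI's published list. A sketch with a hypothetical log excerpt and a hypothetical allowlist file; in practice, populate the allowlist from the IP range file OpenAI publishes for each bot, and note that the published ranges are CIDR blocks, so a production check needs CIDR matching rather than exact string comparison:

```shell
LOG=$(mktemp); ALLOW=$(mktemp)
# Hypothetical log: one request from an allowlisted IP, one spoofer.
cat > "$LOG" <<'EOF'
52.230.152.1 - - [10/Apr/2026:09:12:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0; compatible; GPTBot/1.2"
203.0.113.9 - - [10/Apr/2026:09:13:07 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0; compatible; GPTBot/1.2"
EOF
# Hypothetical allowlist of published GPTBot IPs, one per line.
printf '52.230.152.1\n' > "$ALLOW"
# IPs claiming to be GPTBot that are absent from the published list.
grep "GPTBot" "$LOG" | awk '{print $1}' | sort -u | grep -vxF -f "$ALLOW"
```

Any IP this prints is claiming to be GPTBot from outside OpenAI's ranges and can be rate-limited or blocked without affecting the real crawler.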

For higher-traffic sites, the volume of crawler activity is informative on its own. Running periodic crawl-log analysis tells you not just whether the bots are visiting but which pages they prefer, where they trigger errors, and whether your robots.txt rules are honored as written.

A Two-Minute Sanity Check

If you suspect your robots.txt configuration has unintended effects, a fast diagnostic is to fetch your own robots.txt from each bot's perspective. Use a tool like curl with the bot's user agent string set explicitly:

curl -A "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; OAI-SearchBot/1.0)" https://your-site.com/robots.txt

The response is the same file every fetcher sees, but reading it through the lens of a specific bot helps catch mistakes. The most common error is a catch-all User-agent: * block with Disallow: / and no bot-specific group for OAI-SearchBot. Under the Robots Exclusion Protocol, a crawler that finds no group naming it falls back to the * group, so the global disallow silently applies to OAI-SearchBot as well.
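That fallback can be checked mechanically. A sketch against a hypothetical robots.txt that reproduces the mistake:

```shell
ROBOTS=$(mktemp)
# Hypothetical robots.txt left over from staging: a global block plus a
# GPTBot group, but no group for OAI-SearchBot.
cat > "$ROBOTS" <<'EOF'
User-agent: *
Disallow: /

User-agent: GPTBot
Disallow: /
EOF
# Under the Robots Exclusion Protocol, a bot with no group of its own
# falls back to the catch-all group, so this config blocks ChatGPT search.
if grep -qi '^User-agent: OAI-SearchBot' "$ROBOTS"; then
  echo "OAI-SearchBot has its own group"
else
  echo "WARNING: OAI-SearchBot falls back to User-agent: *"
fi
```

Pointing the same check at your live robots.txt (fetched with curl) turns it into a one-line deploy gate.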

The Right Robots.txt For Your Goals

Once you understand that the four bots respect independent directives, the right robots.txt for your site is a function of what you actually want. The most common goals map to four distinct configurations.

The first is the default. You want OpenAI to do everything: train on your content, fetch it for ChatGPT search, allow ChatGPT-User to load it on user request, and qualify it for OAI-AdsBot if you ever run ads. The right rule is no rule at all. Leave OpenAI's bots out of robots.txt entirely and they will fetch under the standard User-agent: * rules that apply to all crawlers.

The second is the most common publisher choice in 2026. You want to stay in ChatGPT search but opt out of training data. The right configuration blocks GPTBot only:

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

The third is full opt-out. You want OpenAI to stop interacting with your site entirely. The right configuration blocks every named OpenAI bot:

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-AdsBot
Disallow: /

The fourth is a granular opt-out where some sections of the site are open and others are closed. This is the right pattern for sites with mixed content: a marketing blog open to all crawlers, a paywalled archive closed to training but open to live search, and a member-only library closed to everything. Express the granularity through Disallow paths under each User-agent block:

User-agent: GPTBot
Disallow: /archive/
Disallow: /members/

User-agent: OAI-SearchBot
Disallow: /members/

The granular pattern is the right answer for most established publishers because it preserves visibility on the surfaces that drive traffic while protecting the content categories that need protection.

A broader robots.txt walkthrough across GPTBot, ClaudeBot, and PerplexityBot is worth a read if you are designing the policy for a multi-section site. Anthropic and Perplexity follow the same pattern of separating training-time and retrieval-time crawlers, and the directives interact with each other in ways that can surprise you on a first deployment.

Why Explicit Allow Rules Matter

The most subtle mistake in the configurations above is dropping the explicit Allow rules in the second pattern. Under the Robots Exclusion Protocol, a crawler obeys the most specific group that names it and falls back to the User-agent: * group when no group does. If your site carries a restrictive catch-all block, an explicit OAI-SearchBot group with Allow: / is what lifts the bot out of it. Rather than depending on implicit defaults and parser edge cases, write the rule you want to be true: if OAI-SearchBot should fetch your site, put Allow: / under its User-agent block. The cost is two lines of plain text. The benefit is a robots.txt that reads correctly to every parser without relying on fallback behavior.

What Goes Wrong When Publishers Conflate Them

The most common failure mode is the disappearance pattern we opened this post with: a brand wanted to opt out of training, blocked GPTBot, and then noticed weeks later that ChatGPT referral traffic had dropped because they had inadvertently blocked OAI-SearchBot through an overly broad rule. Auditing the actual robots.txt usually reveals one of a small number of root causes:

  1. A global User-agent: * Disallow: / left over from staging that affects every crawler including OAI-SearchBot.
  2. A copy-pasted rule from an outdated blog post written before OAI-SearchBot existed, when GPTBot was OpenAI's only documented crawler and a single blanket block was the standard advice.
  3. A WAF or CDN rule that blocks OpenAI IPs at the network layer regardless of what robots.txt says. Cloudflare's AI Audit and similar features can be set to challenge or block OpenAI bots indiscriminately if the dashboard toggle is on.
  4. A Disallow rule on the path that contains the most important content. The robots.txt only blocks the specific paths listed, but if your most-cited pages live under those paths, the effect is the same as a full block.
  5. A noindex meta tag injected into the HTML that some retrieval pipelines respect even though robots.txt explicitly allows the crawler.

The diagnostic for any of these is the same. Fetch a few pages with each OpenAI bot's user agent and verify that the response is a 200 OK with the expected HTML body. If the response is a 403, a 404, or an empty page, you have a block somewhere in the stack between your origin server and the bot.
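The per-bot fetch test is easy to script. This sketch prints one curl command per user agent so you can paste them against your own domain; the URL and version strings are placeholders, not OpenAI's exact user agent strings:

```shell
# Placeholder page; substitute a page that used to earn ChatGPT citations.
SITE="https://your-site.com/important-page"
CHECKS=$(mktemp)
for ua in "GPTBot/1.2" "OAI-SearchBot/1.0" "ChatGPT-User/1.0"; do
  # Each generated command prints only the HTTP status code; anything
  # other than 200 means a block somewhere between origin and bot.
  printf 'curl -s -o /dev/null -w "%%{http_code}\\n" -A "%s" "%s"\n' "$ua" "$SITE" >> "$CHECKS"
done
cat "$CHECKS"
```

Expect 200 from every user agent you intend to allow; a 403 from one bot but not the others usually points at a WAF or CDN rule rather than robots.txt.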

The remediation depends on which root cause you identify. Robots.txt mistakes are the easiest to fix because the directive is a single file. CDN or WAF blocks require dashboard changes and sometimes account-level review. Noindex meta tag injections require theme or template updates. Granular path blocks require a rethink of which sections of the site you actually want indexed versus protected.

The Time-To-Recover After A Misconfiguration

Once you fix the block, the recovery curve is not instantaneous. OAI-SearchBot has to re-crawl the now-unblocked pages, re-index them, and re-incorporate them into the retrieval system. Anecdotally, sites that fix a mistaken full-block see ChatGPT citation recovery starting within a few days but full restoration takes several weeks. The lesson is that the cost of a misconfigured robots.txt is more than the duration of the misconfiguration. There is a tail of lost citations on the way back up.

Verifying The Bots Hit You (And Behave)

After deploying a new robots.txt or correcting a previous mistake, verification matters as much as the change itself. The default assumption that the change worked because the robots.txt looks right is exactly how publishers end up in the disappearance pattern above. Three layers of verification close the loop.

The first layer is reading robots.txt from each bot's perspective. The curl command in the earlier section returns the file as the bot would see it. Confirm that the relevant User-agent block contains the directives you intend.

The second layer is access log monitoring. After deployment, watch your access logs for the next round of crawls from each bot's user agent. If GPTBot stops appearing but OAI-SearchBot continues to hit pages with 200 responses, the policy is working. If both bots disappear or if either bot starts receiving 403 responses, you have a block lower in the stack than robots.txt and need to investigate.

The third layer is empirical citation testing. The point of the whole exercise is whether ChatGPT cites your site for queries you should be winning. Run the citation tests yourself by asking ChatGPT 10 to 20 buyer-intent queries about your category and counting how often your domain appears in the source links. If the citation rate is healthy, your retrieval-time access is in good shape regardless of what GPTBot is doing for training. If the citation rate is low, the next investigation is whether OAI-SearchBot is actually fetching your pages and which pages are surfacing.

The OAI-SearchBot playbook covers the retrieval-side optimizations in depth, from sitemap submission to schema and content patterns that improve the odds of citation. The robots.txt configuration is the precondition. The content and structure of the indexed pages are what determine which of your pages get cited and for which queries.

A Simple Monitoring Setup

For brands that want continuous visibility into bot activity, a lightweight monitoring setup logs each bot's daily request volume and writes the result to a dashboard. Tools like Cloudflare Analytics, Grafana, or even a small custom job that parses access logs nightly will surface anomalies fast. A 50 percent drop in OAI-SearchBot fetch volume over a week is the kind of signal that warrants investigation before it converts to a citation drop downstream.
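The nightly counting job can be as small as a loop over the four user agents. A sketch against a hypothetical log excerpt; in practice, point LOG at your real access log and run the script from cron:

```shell
# Hypothetical access-log excerpt; replace with your real log path.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
52.230.152.1 - - [10/Apr/2026:09:12:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0; compatible; GPTBot/1.2"
52.230.152.1 - - [10/Apr/2026:09:12:05 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0; compatible; GPTBot/1.2"
40.84.180.2 - - [10/Apr/2026:09:15:44 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0; compatible; OAI-SearchBot/1.0"
23.98.142.7 - - [10/Apr/2026:09:20:10 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0; compatible; ChatGPT-User/1.0"
EOF
# One line per bot: name and daily request count. Ship this to your
# dashboard and alert on week-over-week drops.
for bot in GPTBot OAI-SearchBot ChatGPT-User OAI-AdsBot; do
  printf '%s %s\n' "$bot" "$(grep -c "$bot" "$LOG")"
done
```

A sudden zero for OAI-SearchBot in this output is the earliest warning you will get that a robots.txt, WAF, or CDN change has cut off retrieval-time access.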

Frequently Asked Questions

Does blocking GPTBot affect my Google rankings?

No. GPTBot is operated by OpenAI, not Google, and your Google rankings depend on Googlebot's crawl of your site. Google operates its own separate bot fleet with its own robots.txt tokens, including Googlebot for the main search index and Google-Extended for Gemini training opt-out. The GPTBot directive is invisible to Google and has no effect on Google search rankings or AI Overviews placement.

Will blocking OAI-SearchBot also block Microsoft Copilot or other Bing-powered AI surfaces?

Not directly. OAI-SearchBot is OpenAI's crawler. Microsoft Copilot and Bing's AI surfaces draw on Bingbot's index. Blocking OAI-SearchBot stops OpenAI from fetching your site for ChatGPT search, but Microsoft can still surface your site in Copilot through Bingbot. That said, ChatGPT search itself uses some Bing-derived content under the hood, so a site that is well-indexed in Bing has multiple paths into ChatGPT answers even if OAI-SearchBot is blocked. Most publishers benefit from keeping both bot families allowed.

How long does it take for OpenAI to honor a robots.txt change?

OpenAI does not publish a guaranteed propagation window, but observable behavior is that GPTBot and OAI-SearchBot pick up new robots.txt rules within 24 to 72 hours of the next scheduled crawl. After deployment, verify the bots have honored the rules by checking your access logs for the relevant user agents. If the bots continue to fetch blocked paths after a week, escalate through OpenAI's publisher feedback channels or check whether the robots.txt is being served correctly from your origin.

Should I block ChatGPT-User if I have already blocked GPTBot and OAI-SearchBot?

Only if you have a specific reason. ChatGPT-User fetches a single page on behalf of a specific ChatGPT user, usually because the user pasted a URL into the chat or clicked a citation link. Blocking ChatGPT-User means those user actions silently fail, which often surprises users without protecting much. A user who wanted to read your page through ChatGPT will instead see an error and either give up or open the URL in a regular browser. The asymmetry is rarely worth it for typical content sites.

What is OAI-AdsBot and do I need to allow it?

OAI-AdsBot is the crawler OpenAI uses to validate landing pages for the ad system that has been rolling out across paid ChatGPT placements. If you do not run paid placements in ChatGPT, OAI-AdsBot will not fetch your site and you do not need to address it in robots.txt. If you do plan to run paid placements, allowing OAI-AdsBot is necessary for ad approval. The bot follows landing-page policy checks similar to Google Ads' landing-page quality enforcement, and blocking it will cause your ads to be disapproved.

The robots.txt file that controls AI crawler access is one of the highest-leverage configurations on your site, and the four-bot split that OpenAI has formalized is the most common source of confusion in 2026. Get the directives right and you can hold a precise position: out of training, in live ChatGPT search, with citations flowing and traffic intact. Get them wrong and a single line in robots.txt can take you dark to the AI surface that increasingly dominates buyer-research queries in your category.

If your team wants a robots.txt audit that cross-checks the directives, the CDN layer, the WAF, and the actual access-log behavior across all major AI crawlers, that work sits inside our generative engine optimization program. The configurations are simple once you understand the split, and the cost of getting it wrong is silent and slow.

Ready to optimize for the AI era?

Get a free AEO audit and discover how your brand shows up in AI-powered search.

Get Your Free Audit