Most publishers learn about ChatGPT-User the same way: a strange spike in single-URL fetches lands in their server logs, traced back to an unfamiliar user agent string, originating from IP addresses they have never seen before. By the time they go looking for documentation, they have already been receiving these requests for weeks. The traffic is real, the source is OpenAI, and the bot has a name. It just is not the bot you probably blocked when you set up your robots.txt rules.
ChatGPT-User is OpenAI's third named crawler, sitting alongside GPTBot, OAI-SearchBot, and the more recently added OAI-AdsBot. Where GPTBot collects training data and OAI-SearchBot maintains a retrieval index for ChatGPT search, ChatGPT-User does something fundamentally different. It acts on behalf of a specific human user, fetching a specific page at a specific moment, because that user asked ChatGPT to read it. The bot is named ChatGPT-User precisely because it operates as a proxy for the user, not as a background crawler operating on OpenAI's schedule.
This piece explains what ChatGPT-User does, what triggered the December 2025 documentation change that quietly removed its robots.txt commitment, and what your real controls look like now that the easiest lever has been taken away.
Why "User-Proxy" Is The Right Mental Model
Conventional web crawlers operate on a schedule. They visit your pages in some order determined by the crawler's own priority queue, fetch the content, store it, and move on. The visit is impersonal. No specific user is waiting for the result in real time. This describes Googlebot, Bingbot, GPTBot, OAI-SearchBot, ClaudeBot, and most of the named crawlers publishers have managed for the past twenty-five years.
ChatGPT-User does not work this way. It fires when a user takes a specific action inside ChatGPT that requires fetching a specific URL. The user pastes a link into the chat. The user clicks an inline citation. The user instructs the agent to read a page they want summarized. In every case, a real person is waiting for the result. The user-proxy framing matters because it changes the legal and ethical character of the request. A scheduled crawler is OpenAI's own activity, governed by OpenAI's policies and the publisher's robots.txt directives. A user-initiated fetch is, by OpenAI's argument, the user's activity. ChatGPT-User is the technical mechanism that executes a user's request, and the user has every right to browse the open web.
The distinction has practical consequences for how publishers should think about controls. You cannot stop a determined user from copying your URL into a browser, the argument runs, so you cannot reasonably stop them from copying it into ChatGPT either. The bot named ChatGPT-User exists to make that user action work end-to-end. Whether or not you agree with the framing, it is the framing OpenAI has adopted, and the documentation as of 2026 reflects it.
Why This Matters For Compliance
If you work in a regulated industry where every page fetch is potentially auditable (healthcare, financial services, legal), the user-proxy framing creates a complication. A request from ChatGPT-User is not anonymous in the same way a Googlebot request is. The bot is acting on behalf of an identifiable person who is, in OpenAI's systems, traceable to a specific account. But the publisher does not see that user. The publisher sees only the ChatGPT-User user agent and an IP address from OpenAI's published range. Reconciling that with audit-trail requirements is a real problem, and it is one of the legitimate reasons some publishers do block ChatGPT-User at the WAF layer despite the user-proxy argument.
What Triggers ChatGPT-User
OpenAI's bot documentation lists the trigger conditions explicitly, and observable behavior in publisher logs confirms them. ChatGPT-User runs when one of three things happens inside a ChatGPT conversation.
The most common trigger is a direct URL request. A user types or pastes a URL into the chat and asks ChatGPT to read it, summarize it, extract data from it, or evaluate it. The model translates the user request into a fetch operation, and ChatGPT-User retrieves the page so the model has its content available for the next response. The user sees a summary or analysis in the chat. The publisher sees a single GET request hitting the exact URL the user shared.
The second trigger is citation traversal. ChatGPT often surfaces inline citations linking to sources it used to answer the question. When the user clicks one of those citation chips, the click can either open the source in a new browser tab or, in newer ChatGPT clients, trigger an in-app preview that ChatGPT-User fetches to render. The latter pattern is increasingly common because it keeps the user inside the chat instead of bouncing them out to a separate tab.
The third trigger is agentic activity inside ChatGPT's browsing surfaces. Atlas, OpenAI's native browser, and the Operator agent both rely on ChatGPT-User to fetch pages while completing multi-step tasks. When a user asks Operator to comparison-shop across three vendors, the agent visits each vendor's site. Those visits are ChatGPT-User fetches from the publisher's perspective.
None of these triggers can be preempted with a robots.txt rule scoped to GPTBot or OAI-SearchBot. ChatGPT-User is a completely separate user agent with its own IP range, published at openai.com/chatgpt-user.json, and the rules you write for the other OpenAI bots do not apply to it.
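To make the scoping concrete, here is a minimal robots.txt sketch (the paths are illustrative). Neither group below applies to ChatGPT-User, because each group is scoped to the agent named in its User-agent line:

# applies to GPTBot only; ChatGPT-User ignores this group entirely
User-agent: GPTBot
Disallow: /

# applies to OAI-SearchBot only
User-agent: OAI-SearchBot
Disallow: /private/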
How Often Does It Actually Fire
For most sites, ChatGPT-User volume is small relative to GPTBot or OAI-SearchBot. The bot fires once per user action, not periodically per URL. A blog post that gets cited in 1,000 ChatGPT conversations might see 1,000 ChatGPT-User fetches, but each one is a single request and the cadence is bursty rather than constant. By comparison, GPTBot might fetch the same blog post twice a week as part of its training-corpus sweep. The relative volumes shift dramatically once a brand becomes citation-worthy in popular query categories, because the user-proxy fetches grow with the brand's footprint in ChatGPT answers.
The December 2025 Policy Change
In December 2025 OpenAI quietly updated its bot documentation. The change was not announced in a blog post and did not generate headlines. The substance of the change was a deletion. OpenAI removed language that had previously stated ChatGPT-User would comply with robots.txt directives when fetching pages on behalf of users. The new documentation does not say ChatGPT-User will respect robots.txt for user-initiated fetches, and it does not say it will not. The silence is the change.
The reasoning OpenAI has offered in adjacent statements aligns with the user-proxy framing described above. A robots.txt directive controls how a crawler interacts with your site. When the entity initiating the fetch is a user rather than a crawler, the robots.txt protocol is, by OpenAI's reading, not the right tool. Users do not consult robots.txt before visiting your site in a browser, and ChatGPT-User by extension does not consult robots.txt before fetching a URL the user asked it to fetch.
Whether this interpretation is correct is contested. Some publishers and legal commentators argue that ChatGPT-User is still a programmatic crawler operated by OpenAI, regardless of whether the trigger came from a user, and that the protocol should apply. Others note that the same logic could extend to any agentic browser (Atlas, Comet, Operator), and that treating those as exempt from robots.txt creates a category of crawler publishers cannot block through the standard protocol. The debate is genuinely live in 2026 and will likely be settled by some combination of policy, regulation, and litigation before the year ends.
For now, the operative facts are these. ChatGPT-User's documentation does not state robots.txt compliance. Publishers who relied on a Disallow rule scoped to ChatGPT-User to block the bot can no longer assume the rule is honored. The block requires a different mechanism.
What Stayed The Same
GPTBot and OAI-SearchBot continue to honor robots.txt as documented. The December 2025 change applied only to ChatGPT-User. The other OpenAI crawlers respect the User-agent: GPTBot and User-agent: OAI-SearchBot blocks in robots.txt as they always have. The split exists precisely because OpenAI views training crawlers and retrieval crawlers as a different category from user-proxy fetches. The split also means the standard guidance on managing OpenAI's bot fleet, including our companion post on GPTBot vs OAI-SearchBot, still applies to those two. ChatGPT-User just is not in scope for those rules anymore.
What ChatGPT-User Means For Your Robots.txt
The practical implication is that the obvious lever no longer pulls reliably. You can still add a ChatGPT-User block to your robots.txt:
User-agent: ChatGPT-User
Disallow: /
OpenAI's documentation does not promise the rule will be honored, but our own access-log observations across client sites in early 2026 show mixed behavior. Some ChatGPT-User fetches do appear to skip URLs the publisher has explicitly disallowed, suggesting OpenAI applies the rule on a best-effort basis even though the documentation no longer commits to it. Other fetches go through regardless. The pattern is not predictable enough to rely on for compliance purposes.
The clearer interpretation is that robots.txt rules for ChatGPT-User now function as a signal, not a control. The signal communicates that the publisher does not consent to user-proxy fetches. Whether the signal is honored depends on factors outside your visibility. If you need a hard control, robots.txt is the wrong layer.
The right layer depends on what you are actually trying to achieve. Three goals are common, and they map to three different mechanisms.
Goal one: keep your site reachable to humans but exclude all programmatic fetches including user-proxy ones. The mechanism is a WAF rule at the CDN layer that challenges or blocks requests from OpenAI's published IP ranges, regardless of user agent. This catches ChatGPT-User along with GPTBot and OAI-SearchBot, and it does not depend on robots.txt being honored.
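If the enforcement point is the origin server rather than a managed WAF, the same idea can run as a small scheduled job that regenerates block rules from OpenAI's published range file. A minimal sketch for an nginx-fronted site, assuming jq is available and that chatgpt-user.json follows the prefixes/ipv4Prefix layout OpenAI uses for its other bot range files; the output path is illustrative:

# regenerate nginx deny rules from the published ChatGPT-User ranges
curl -s https://openai.com/chatgpt-user.json \
  | jq -r '.prefixes[] | (.ipv4Prefix // .ipv6Prefix // empty) | "deny \(.);"' \
  > /etc/nginx/conf.d/deny-chatgpt-user.conf
# pick up the refreshed list
nginx -s reload

The same approach extends to the range files OpenAI publishes for its other bots if the goal is broad exclusion.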
Goal two: stay open to ChatGPT search citations and discovery, but limit user-proxy fetches to specific page types. The mechanism is a URL pattern block at the WAF layer scoped to the ChatGPT-User user agent (or IP range) and applying only to the sensitive paths. Login pages, member areas, account pages, and pricing tools that should not be programmatically fetched can be blocked specifically while the rest of the site remains open.
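At the server level this can be as small as one location block. A sketch for nginx, with example paths; the same user-agent-plus-path match is a short custom rule in most managed WAFs:

# block ChatGPT-User, by user agent, from member and account paths only
location ~ ^/(members|account)/ {
    if ($http_user_agent ~* "ChatGPT-User") {
        return 403;
    }
    try_files $uri $uri/ =404;
}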
Goal three: signal disapproval without enforcing it. The mechanism is the robots.txt directive shown above. The directive will appear in your robots.txt and will be visible to OpenAI's parsing infrastructure. The fetches may or may not stop. If they continue, you have a position from which to escalate through OpenAI's publisher feedback channels.
Where The robots.txt Conversation Goes Next
The industry is not finished arguing about this. Several major publisher organizations have proposed extensions to the robots.txt protocol that would explicitly cover user-proxy fetches, with separate directives like User-agent-proxy or similar mechanisms. None of these proposals has been formally adopted. Until they are, the de facto standard is what each individual AI engine chooses to honor. The practical advice for 2026 is to write the robots.txt directive you wish were honored, and pair it with the WAF or CDN rule that actually enforces what you mean.
Detecting ChatGPT-User In Your Server Logs
The first step before deciding how to respond is confirming what you are actually seeing. ChatGPT-User identifies itself with the user agent string ChatGPT-User followed by version information. In standard Nginx or Apache access logs, the user agent appears in the final quoted field of each request line. A grep is the fastest way to find recent activity:
grep -i "ChatGPT-User" /var/log/nginx/access.log | head
Three patterns characterize ChatGPT-User traffic. First, individual fetches arrive in isolation rather than as part of a sweep. You will not see ChatGPT-User crawl 50 URLs from your site in a row. You will see one URL fetched once, then nothing for hours, then another URL. Second, the URLs tend to be article-level pages, product pages, or pages with names that look like they were shared by a user (long descriptive slugs with topical keywords). Crawlers fetch index pages and category pages too; user-proxies do not, because users do not paste category-page URLs into chats. Third, the referer field is typically empty or set to chatgpt.com depending on the surface that originated the fetch.
Compare this to GPTBot, which fetches URLs from a queue maintained by OpenAI's training pipeline and tends to walk through your sitemap or sweep depth-first through internal links. The volume patterns are visibly different in a daily access-log summary.
For higher-traffic sites, running periodic crawl-log analysis gives you a baseline against which to spot anomalies. A 20x spike in ChatGPT-User fetches over a week is the kind of signal that warrants investigation, because it usually means your brand has crossed a citation threshold that started routing more user-proxy traffic to your pages.
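One way to build that baseline is a daily count straight from the access log. A minimal sketch, assuming the default combined log format and the log path used above:

# ChatGPT-User fetches per day (timestamp is field four in the combined format)
grep -i "ChatGPT-User" /var/log/nginx/access.log \
  | awk '{print $4}' | cut -d: -f1 | tr -d '[' | sort | uniq -c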
Verifying The User Agent Is Real
Spoofed user agents are common. Anyone can claim to be ChatGPT-User in their request headers. To verify that a request is actually from OpenAI, cross-check the requesting IP against OpenAI's published range. The chatgpt-user.json file at openai.com lists the current IP allocations for the bot. A request claiming to be ChatGPT-User from an IP outside that range is not from OpenAI and should be treated as a spoofing attempt by some other actor (often a competitor scraper using ChatGPT-User as cover, because few WAFs block apparently legitimate AI traffic by default).
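A quick way to run that cross-check from the command line. This sketch assumes jq and grepcidr are installed and that the range file uses the prefixes/ipv4Prefix layout seen in OpenAI's other bot files; the IP shown is a placeholder:

# save the published ChatGPT-User ranges
curl -s https://openai.com/chatgpt-user.json \
  | jq -r '.prefixes[] | (.ipv4Prefix // .ipv6Prefix // empty)' > /tmp/chatgpt-user-ranges.txt

# test a suspect IP; prints the IP if it falls inside a published range, exits non-zero otherwise
echo "203.0.113.7" | grepcidr -f /tmp/chatgpt-user-ranges.txt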
Practical Controls If You Want To Limit It
If robots.txt no longer reliably blocks ChatGPT-User, the working controls all live higher up in the stack. The right choice depends on your business model and the granularity of restriction you actually want. Four mechanisms cover the practical range:
- WAF-level user-agent blocks. The bluntest tool but the most reliable. Major WAF providers (Cloudflare, AWS WAF, Akamai, Fastly) all support custom rules that match on user agent strings. A rule scoped to "user-agent contains ChatGPT-User" with action "challenge" or "block" will catch every request the bot sends, whether the documentation promises compliance or not. The downside is that legitimate user-proxy fetches stop too, which means users who paste your URL into ChatGPT will see an error inside the chat instead of a summary of your page.
- IP-range blocks. More durable than user-agent matching. OpenAI publishes the IP ranges for each bot at well-known URLs. A WAF rule that blocks all requests from those ranges will catch ChatGPT-User along with the other OpenAI crawlers, regardless of what user agent string the requests claim. This is the right choice if your goal is broad exclusion of OpenAI from your site rather than narrow blocking of one bot.
- Path-scoped restrictions. The most surgical option. WAF rules that combine a user-agent match with a URL pattern can block ChatGPT-User from sensitive paths while leaving the rest of the site open. Member areas, account dashboards, paid content gates, and admin pages are the natural candidates. The result is a site that remains citation-reachable for marketing content while protecting the surfaces where user-proxy fetches would expose private or paywalled material.
- Rate-limiting. A softer alternative when full blocking feels too aggressive. A WAF rule that limits ChatGPT-User to a small number of requests per minute per IP can curb burst traffic without preventing legitimate user-proxy actions entirely. The downside is that under heavy load, some users will see their requests fail while others succeed, which produces inconsistent experiences and support tickets.
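A sketch of the rate-limiting option for nginx. The map keys only ChatGPT-User requests into the limit zone (everything else gets an empty key and is never limited); the numbers are starting points, not recommendations:

# key only ChatGPT-User requests into the rate-limit zone (http context)
map $http_user_agent $chatgpt_user_key {
    default            "";
    "~*ChatGPT-User"   $binary_remote_addr;
}

# roughly six user-proxy fetches per minute per source IP
limit_req_zone $chatgpt_user_key zone=chatgpt_user:10m rate=6r/m;

server {
    listen 80;
    location / {
        limit_req zone=chatgpt_user burst=5 nodelay;
        try_files $uri $uri/ =404;
    }
}

Cloudflare, AWS WAF, and the other managed WAFs expose the same idea as a rate-limiting rule keyed on a user-agent match.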
A Decision Tree
If you do not want OpenAI fetching your site for any purpose, block IP ranges. If you want training-data exclusion but live citations, block GPTBot in robots.txt and leave OAI-SearchBot and ChatGPT-User open. If you want user-proxy fetches blocked specifically while live citations continue, run a WAF rule scoped to the ChatGPT-User user agent only. If you want robots.txt to remain your primary control even though it is no longer reliable, deploy the directive and accept the partial coverage. For most publishers, a hybrid approach (robots.txt for signal, WAF for sensitive paths) is the right balance, and it sits inside the broader OAI-SearchBot optimization playbook that covers the rest of the OpenAI bot fleet.
The Bigger Picture: Agent-Initiated Fetches Are Here To Stay
ChatGPT-User is the first prominent user-proxy crawler. It will not be the last. Anthropic's Claude has an analogous user-initiated fetch path. Perplexity uses Perplexity-User for the same purpose. Google's emerging agentic surfaces, including AI Mode's interactive citations and the experimental browsing features in Gemini, are converging on the same pattern. The model is consistent across the AI ecosystem: a small number of scheduled crawlers do the slow indexing work, and a growing number of user-proxy crawlers do the on-demand fetching for live agent interactions.
For publishers, the implication is that robots.txt is becoming a less complete control surface. The protocol was designed for the era of scheduled crawlers, when a single named user agent represented a single category of activity. In 2026 the activity has split. Scheduled crawlers still follow the protocol. User-proxy crawlers increasingly do not, by design rather than by accident. Publishers who want effective control over AI traffic need to operate at the WAF layer with IP-range awareness, not just at the robots.txt layer with user-agent matching.
The strategic question is whether to block or accept. Blocking ChatGPT-User cleanly stops user-proxy fetches but also stops the AI surfaces that drive a growing share of buyer-research traffic. Accepting ChatGPT-User keeps your brand reachable through ChatGPT but accepts a category of fetch you cannot fully control. Most brands we work with land on acceptance because the visibility upside outweighs the loss of crawler-level control. The exceptions are publishers with subscription paywalls, legally regulated content, or strong arguments about training-data provenance, all of which have legitimate reasons to limit even user-proxy activity.
The right answer for your site depends on what you are publishing, who is reading it, and how your business model translates AI visibility into revenue. The wrong answer is to assume that the robots.txt rule you wrote in 2023 is still doing what you think it is. The protocol has changed underneath that assumption, and ChatGPT-User is the first concrete example of a category of crawler the protocol does not cleanly cover.
Frequently Asked Questions
Will OpenAI ever bring back robots.txt compliance for ChatGPT-User?
There is no public roadmap indicating either direction. The December 2025 documentation change went the other way, removing the existing commitment rather than adding to it. The argument OpenAI has implicitly adopted is that user-proxy fetches are a different category from crawler-initiated fetches and should not be governed by a protocol designed for the latter. Industry pressure or regulatory action could change the position, but as of mid-2026 the operative behavior is that ChatGPT-User does not promise to honor robots.txt rules. Plan for the current behavior, not the hoped-for one.
Does ChatGPT-User respect noindex meta tags or X-Robots-Tag headers?
Inconsistently. ChatGPT-User is fetching the page on behalf of a user who wants its content, which is the opposite of what noindex was designed to prevent. Some retrieval pipelines do check noindex and skip citation generation for pages marked as such, but the fetch itself still happens. If your goal is to prevent ChatGPT-User from fetching the page at all, noindex is not the right tool. A WAF block is.
How do I tell ChatGPT-User from OAI-SearchBot or GPTBot in my logs?
The user agent string is the primary signal. ChatGPT-User identifies as "ChatGPT-User" plus version information. GPTBot identifies as "GPTBot" plus version. OAI-SearchBot identifies as "OAI-SearchBot" plus version. All three have distinct strings and distinct IP ranges published in separate JSON files at openai.com. The fetch patterns also differ: ChatGPT-User fires once per user action on isolated URLs, GPTBot fires periodically in sweeps, and OAI-SearchBot fires more frequently on a maintained retrieval index.
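For a quick side-by-side comparison in the same log, a sketch using the same format and path assumptions as the earlier examples:

# request counts per OpenAI user agent
for ua in ChatGPT-User GPTBot OAI-SearchBot; do
  printf '%-15s %s\n' "$ua" "$(grep -ci "$ua" /var/log/nginx/access.log)"
done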
Will blocking ChatGPT-User affect my ChatGPT search citations?
Not directly. ChatGPT search citations are powered by OAI-SearchBot's index and the Bing layer that ChatGPT search also draws on. Blocking ChatGPT-User does not stop OAI-SearchBot from crawling your site or citing it in search responses. What blocking ChatGPT-User does affect is the case where a user pastes your URL directly into ChatGPT or clicks an existing citation. In those cases, the user gets a failed fetch instead of a working summary. If your business model values inbound traffic from users who proactively share your URL into chats, blocking ChatGPT-User has a real downside even though it does not hit citation rates.
What about OAI-AdsBot? Is that the same thing?
No. OAI-AdsBot is the fourth named OpenAI crawler and serves a separate function: validating landing-page policy for the paid placements that OpenAI's ad system has rolled out across ChatGPT surfaces. If you do not run paid placements, OAI-AdsBot is unlikely to visit your site. If you do, allowing OAI-AdsBot is necessary for ad approval. None of the four bots overlap in function. They are four independent agents with four independent purposes, and the controls for each one are separate.
The story of ChatGPT-User is the story of how the bot ecosystem has split into categories that the original robots.txt protocol was not designed to cover. Scheduled crawlers still follow the protocol. User-proxy crawlers do not, increasingly by design. The right response is not to abandon robots.txt but to recognize what it can and cannot do, and to layer the controls that actually enforce your intent on top of it.
If your team wants a full audit of how ChatGPT-User, GPTBot, OAI-SearchBot, and the other AI bots are interacting with your site (which user agents are hitting which paths, what your WAF is letting through, and where the gaps are between intent and enforcement), that work sits inside our generative engine optimization program. The category of fetch is new. The risk of misconfiguration is real. The right control depends on what you are publishing and who you are serving.
Ready to optimize for the AI era?
Get a free AEO audit and discover how your brand shows up in AI-powered search.
Get Your Free Audit