If you manage a website with documentation, product pages, or technical content, there's a file you probably haven't created yet, but AI models are already looking for it. It's called llms-full.txt, and it represents a fundamental shift in how content gets discovered and consumed online.
In some setups, you can provide llms-full.txt, which expands on llms.txt with a more exhaustive list of ingest-worthy pages and their content. Where llms.txt points to sources, llms-full.txt contains the entire content of a website's documentation in a single Markdown file. That distinction matters. While an llms.txt file acts as a curated table of contents, the full version gives AI systems everything they need in one request: no additional fetching, no HTML parsing, no wasted tokens on navigation menus and cookie banners.
Data from Profound reveals something unexpected: LLMs are accessing llms-full.txt even more frequently than the original llms.txt. That signal alone should change how you prioritize your implementation. This guide walks through exactly how to build one that works.
What llms-full.txt Actually Is (and Why It Exists)
The llms.txt standard was proposed by Jeremy Howard of Answer.AI in September 2024 to make web content easier for artificial intelligence systems to use efficiently. The core problem it solves is straightforward: large language models increasingly rely on website information, but face a critical limitation. Context windows are too small to handle most websites in their entirety, and converting complex HTML pages with navigation, ads, and JavaScript into LLM-friendly plain text is both difficult and imprecise.
The specification actually defines two complementary files. llms.txt provides a streamlined view of your documentation navigation to help AI systems quickly understand your site's structure, while llms-full.txt is a comprehensive file containing all your documentation in one place.
Think of the relationship this way: if llms.txt is the executive brief, llms-full.txt is the full-length book. It contains flattened, chunked text from your most-referenced documentation, everything required for AI to pull the most accurate, context-rich details.
Because it strips out JavaScript, navigation menus, and cookie banners, it ensures that retrieval is clean and deterministic.
The Origin Story That Explains the Design
Mintlify originally developed llms-full.txt in collaboration with Anthropic, which needed a cleaner way to feed its entire documentation into LLMs without parsing HTML. After seeing its impact, Mintlify rolled the feature out for all customers, and it was officially adopted into the llmstxt.org standard.
That history matters because it reveals the file's primary purpose: serving as a single ingestion point for AI systems that need complete context. A language model can retrieve your entire documentation corpus in a single HTTP request, which is ideal for RAG pipelines, AI-powered support bots, and coding assistants that need context-window-friendly snapshots of your knowledge base.
The Honest Case: What the Data Shows (and Doesn't)
Before you invest time building this file, you deserve a clear-eyed view of the evidence. The landscape is genuinely mixed. On the adoption side, ProGEO.ai published research in March 2026 measuring Fortune 500 adoption rates of llms.txt, JSON-LD, and AI directives in robots.txt, three signals closely associated with GEO maturity. Their finding: only 7.4% of the Fortune 500 have implemented llms.txt. A broader study from SE Ranking tells a similar story: in their dataset of nearly 300,000 domains, just 10.13% had an llms.txt file in place, a long way from the universal adoption of standards like robots.txt or sitemaps.
On the effectiveness question, the data is even more sobering. Both statistical analysis and machine learning showed no effect of llms.txt on how often a domain is cited by LLMs. Removing this variable from the XGBoost model actually improved its accuracy.
And yet the counter-evidence is compelling:
- ChatGPT now refers around 10% of new Vercel signups.
- Anthropic, creator of Claude, specifically asked Mintlify to implement llms.txt and llms-full.txt for its documentation.
- Google included an llms.txt file in its new Agent2Agent (A2A) protocol.
The nuanced takeaway: llms-full.txt probably won't boost your AI citations tomorrow. But for documentation-heavy products, this is where the standard actually shines-if you maintain API docs, SDKs, or technical guides, llms.txt provides a token-efficient entry point for LLMs. The file's value lies less in search visibility and more in content accuracy when AI systems reference your material.
Anatomy of an Effective llms-full.txt File
The llms-full.txt file should follow a logical structure that preserves the organization of your llms.txt file while incorporating the full content of each linked document. Here's the structural skeleton:
- H1 heading with your project or site name
- Blockquote summary of the project
- H2 sections matching the categories from your llms.txt
- H3 headings for individual documents within each section
- Full Markdown content of each document beneath its heading
- Horizontal rules (---) separating individual documents
llms-full.txt follows the same Markdown-based format as llms.txt, but each linked resource is followed by its full page content rather than just a description.
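Putting those rules together, a minimal llms-full.txt might look like the fragment below. The product name, URLs, and page content are all hypothetical placeholders:

```markdown
# Acme Analytics

> Acme Analytics is a product analytics platform. This file contains the full documentation in Markdown for AI consumption.

## Getting Started

### Quickstart

Source: https://example.com/docs/quickstart

Install the SDK with your package manager, then initialize the client with your project key...

---

### Authentication

Source: https://example.com/docs/authentication

All API requests require a bearer token. Generate one from the dashboard under Settings...

---
```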
Why Markdown and Not HTML
This isn't an arbitrary formatting choice. Tokens are the currency of LLMs: every word, punctuation mark, or formatting tag costs valuable tokens in a prompt. Markdown is lighter than JSON, XML, or HTML; it conveys meaning with fewer characters.
Platforms like Fern report that serving Markdown instead of HTML reduces token consumption by 90%+.
A typical documentation page might contain 20% actual content and 80% navigation, styling, scripts, and other elements. Your llms-full.txt file eliminates that waste entirely.
File Size and Context Window Considerations
There's a practical ceiling to think about. AI crawlers have soft limits on file size: files over 100KB are often partially indexed or deprioritized. That 100KB guidance applies to the llms.txt file itself. The llms-full.txt file will naturally be larger since it contains full content, but you still need to be strategic.
Context windows are expanding rapidly; Gemini 1.5 Pro, for example, offers 1 million tokens and Claude 3.5 Sonnet 200,000. But bigger isn't always better. Context window size isn't the real constraint; focus is. Models perform better with curated information than with comprehensive dumps.
A practical rule: aim for a file that covers your most essential documentation comprehensively rather than cramming every page you've ever published. If space is a concern, focus on including your most critical and frequently accessed content.
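As a rough planning aid, a common heuristic puts English prose at about 4 characters per token. The sketch below uses that heuristic to check whether a file leaves headroom in a given context window; both the 4-chars-per-token ratio and the reserve value are approximations, not exact figures, so use a real tokenizer for precise budgeting.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # Use a real tokenizer for exact counts.
    return max(1, len(text) // 4)

def fits_context(text: str, window_tokens: int, reserve: int = 2_000) -> bool:
    """Check the file fits, leaving headroom for the prompt and the reply."""
    return estimate_tokens(text) <= window_tokens - reserve

doc = "word " * 50_000                 # ~250,000 characters of documentation
print(estimate_tokens(doc))            # 62500 tokens by this heuristic
print(fits_context(doc, 200_000))      # True: fits a 200K-token window
```

This keeps the decision explicit: if your assembled file fails the check for the models you care about, trim the lowest-priority sections rather than letting crawlers truncate arbitrarily.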
Step-by-Step: Building Your llms-full.txt File
Step 1: Audit and Prioritize Your Content
Before writing a single line, decide what belongs in the file. Start by identifying the resources that really matter: quickstart guides, authentication docs, API references, SDKs, pricing pages, security policies, SLAs, and the top 10 most common support questions.
Not everything deserves inclusion. An effective file is not a dump of your entire site. It should resemble a smart table of contents: identification and promise via the summary, then priority content including pillar pages, categories, documentation, and proof pages.
For an e-commerce site, your llms.txt might simply link to the return policy and note that product pages exist, while your llms-full.txt would include the full text explaining how the search function and checkout process work.
Step 2: Convert Content to Clean Markdown
Start by gathering the full content of all documents referenced in your llms.txt file. For each link, access the document content, convert it to Markdown format if it's not already, and organize it according to the section structure of your llms.txt file.
Several tools handle HTML-to-Markdown conversion:
- Pandoc: Command-line tool for batch converting between document formats
- Firecrawl: Generates both llms.txt and llms-full.txt files through the web interface or via API
- Mintlify's free generator: Paste a URL and get a formatted starter file
- Wetrocloud's Website to Markdown Converter: Designed specifically for LLM-optimized output
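If you want to see the core transformation without installing anything, here is a deliberately naive sketch using only Python's standard library. It keeps headings and paragraphs and drops navigation, scripts, and styling, which is exactly the waste the tools above eliminate far more robustly; a real pipeline should use one of them, since this ignores links, lists, tables, and much else.

```python
from html.parser import HTMLParser

class SimpleMarkdownConverter(HTMLParser):
    """Naive HTML-to-Markdown converter: headings and paragraphs only."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    SKIP_TAGS = {"script", "style", "nav", "footer", "header"}

    def __init__(self):
        super().__init__()
        self.out = []      # converted lines
        self.prefix = ""   # Markdown prefix for the next text node
        self.skip = 0      # depth inside tags we drop entirely

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip += 1
        elif tag in self.HEADINGS:
            self.prefix = self.HEADINGS[tag]

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self.skip = max(0, self.skip - 1)
        elif tag in self.HEADINGS or tag == "p":
            self.out.append("")  # blank line after a block element
            self.prefix = ""

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip:
            self.out.append(self.prefix + text)
            self.prefix = ""

def html_to_markdown(html: str) -> str:
    conv = SimpleMarkdownConverter()
    conv.feed(html)
    return "\n".join(conv.out).strip()

page = "<nav>Docs | Blog</nav><h1>Quickstart</h1><p>Install the SDK.</p>"
print(html_to_markdown(page))  # prints "# Quickstart", blank line, "Install the SDK."
```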
If you're on WordPress, Yoast SEO simplifies the process by generating and managing the file for you, with one-click activation from settings and weekly regeneration using WordPress cron jobs.
Step 3: Assemble the File Structure
The structure should follow: project or site name as H1, brief description as blockquote, H2 section names, H3 document titles, with full content of each document beneath.
Keep three formatting principles in mind:
- Maintain consistent formatting: use a consistent format for section headings, document titles, and source information.
- Include source URLs: always include the original URL for each document to provide proper attribution and reference.
- Preserve document structure: maintain the original heading structure and formatting of each document as much as possible.
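Those principles can be wired into a small generator. The sketch below assumes you have already collected each document's section, title, source URL, and converted Markdown; the `Doc` field names are hypothetical, not from any particular tool.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    section: str   # H2 category, matching your llms.txt
    title: str     # H3 document title
    url: str       # original source URL, for attribution
    markdown: str  # full converted page content

def build_llms_full(site: str, summary: str, docs: list[Doc]) -> str:
    """Assemble an llms-full.txt body: H1, blockquote, H2/H3 sections, --- separators."""
    lines = [f"# {site}", "", f"> {summary}", ""]
    current_section = None
    for doc in docs:
        if doc.section != current_section:  # emit each H2 section once
            lines += [f"## {doc.section}", ""]
            current_section = doc.section
        lines += [f"### {doc.title}", "",
                  f"Source: {doc.url}", "",
                  doc.markdown.strip(), "",
                  "---", ""]
    return "\n".join(lines)
```

Feeding it a list of documents already sorted by section keeps the output aligned with your llms.txt table of contents.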
Step 4: Host and Validate
Serve both files from the root of your primary domain. For documentation subdomains, publish a copy there too. Set the Content-Type header to text/plain; charset=utf-8. Enable compression (gzip/Brotli) on your server-large files benefit significantly.
Both files should be at the root of your domain, served with the correct MIME types, and your llms.txt should include a link to llms-full.txt.
Validation matters. Test that the file returns HTTP 200, uses correct encoding, and is not unexpectedly blocked by your CDN. Then go further: run a tool that expands your llms.txt file into an LLM context file and test a number of language models to see if they can answer questions about your content.
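A lightweight check along those lines can be scripted. The rules below mirror the guidance above (HTTP 200, text/plain with UTF-8, no CDN error page); in practice you would feed this function the status, Content-Type header, and body from an `urllib.request.urlopen` call against your /llms.txt and /llms-full.txt URLs.

```python
def validate_llms_response(status: int, content_type: str, body: bytes) -> list[str]:
    """Return a list of problems; an empty list means the response looks healthy."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    ct = content_type.lower()
    if not ct.startswith("text/plain"):
        problems.append(f"expected text/plain, got {content_type!r}")
    if "charset=utf-8" not in ct:
        problems.append("Content-Type is missing charset=utf-8")
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("body is not valid UTF-8")
    if b"<html" in body[:512].lower():
        problems.append("response looks like HTML (CDN error page?)")
    return problems
```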
Platform-Specific Implementation Paths
The tooling ecosystem has matured significantly. Your approach depends on your stack.

Documentation platforms with auto-generation:
- Fern automatically generates token-optimized llms.txt and llms-full.txt files whenever your documentation changes, ensuring AI agents always receive current context.
- Mintlify automatically generates and hosts /llms.txt, /llms-full.txt, and .md versions of all pages for LLM optimization.
- GitBook added llms.txt support in January 2025, with llms-full.txt and .md page support following in June 2025.
Static site generators with plugins:
- sphinx-llms-txt generates llms.txt with zero configuration. Read the Docs also serves llms.txt and llms-full.txt from the default version of your project automatically.
- Eleventy has eleventy-plugin-llms-txt. Gatsby has gatsby-plugin-llms-txt.
WordPress:
- Yoast SEO includes the five most recently updated posts, pages, and custom post types in the llms.txt file, with priority given to cornerstone content.
- The Website LLMs.txt plugin (30,000+ installs) auto-generates files and tracks if GPTBot or ClaudeBot accesses them.
Custom stacks:
- On modern stacks like Next.js, Nuxt, or SvelteKit, you can generate the file from your headless CMS or content catalogue, then expose it as a static asset.
Advanced Techniques: Getting More From Your File
Segment by Product Area
For multi-product companies, a single monolithic file may not serve you well. You can create separate files for different product areas-organizing by API navigation, complete API docs, tutorial navigation, and complete tutorials at their respective subpaths.
Both files can live at any level of your documentation hierarchy: /llms.txt, /llms-full.txt, /docs/llms.txt, /docs/ai-features/llms-full.txt, and so on.
Filter by Language or Specification
You can filter llms.txt and llms-full.txt output with query parameters like lang and excludeSpec to reduce token usage. Fern, for example, lets users get a clean, language-specific output they can feed to AI tools when writing code, and even add a dropdown in the navbar linking to different filtered versions of llms-full.txt.
Control What AI Sees vs. What Humans See
Some platforms now offer granular content controls. Within pages, use <llms-only> and <llms-ignore> tags to control what content is exposed to AI versus human readers on your documentation site.
The <llms-only> tag shows content to AI but hides it from human readers, useful for technical context that's verbose but helpful for AI, like implementation details or architecture notes.
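Platforms that support these tags implement the filtering for you, but the underlying idea is simple to sketch. Assuming the tags appear literally in your Markdown source, a pre-processor might render the two variants like this (a simplified sketch that does not handle nested tags):

```python
import re

def render_for(audience: str, text: str) -> str:
    """Render source text for 'ai' or 'human' readers.

    <llms-only> content is kept for AI and dropped for humans;
    <llms-ignore> content is kept for humans and dropped for AI."""
    if audience == "ai":
        text = re.sub(r"<llms-ignore>.*?</llms-ignore>", "", text, flags=re.S)
        text = re.sub(r"</?llms-only>", "", text)
    else:
        text = re.sub(r"<llms-only>.*?</llms-only>", "", text, flags=re.S)
        text = re.sub(r"</?llms-ignore>", "", text)
    return text
```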
Integrate With Your CI/CD Pipeline
Set up CI/CD pipelines to automatically update both files when documentation changes. This is non-negotiable for any team that updates docs more than monthly. Stale content is worse than no file. Aim to regenerate llms-full.txt whenever significant documentation changes are published.
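One simple way to wire this into CI is to fingerprint your Markdown sources and regenerate llms-full.txt only when the hash changes. The directory layout here is an assumption about your repo; adapt the glob to wherever your docs live.

```python
import hashlib
import pathlib

def docs_fingerprint(docs_dir: str) -> str:
    """Stable hash of all Markdown sources under docs_dir.

    A CI job can compare this against the hash stored from the last build
    and skip regenerating llms-full.txt when nothing has changed."""
    h = hashlib.sha256()
    for path in sorted(pathlib.Path(docs_dir).rglob("*.md")):
        h.update(path.as_posix().encode())  # include the path, so renames count
        h.update(path.read_bytes())
    return h.hexdigest()
```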
Common Mistakes That Undermine Your File
Treating it as a duplicate sitemap. Keep llms.txt at 20-50 links max; more isn't curation, it's dumping. Your llms-full.txt is the place for depth, not breadth across every URL you've ever published.

Including gated content. If your site requires authentication, llms.txt and llms-full.txt also require authentication to view, and LLMs and AI tools that cannot authenticate into your site cannot access these files. Listing pages behind login walls wastes space and confuses AI systems.

Forgetting security implications. llms-full.txt files consolidate all your documentation, potentially exposing information you didn't intend to make easily accessible.
If an attacker gains write access, they could inject malicious instructions or misleading content. Treat it as a security-sensitive file: automate generation, require code review, and monitor for changes.
Never updating the file. Maintenance is essential: an outdated file quickly loses value.
Set a cadence-monthly, or with each strategic content release-and add a change log in Git or internal documentation to track why a page was added or removed.
Creating unnecessary duplicate content. A few SEOs have started creating markdown copies of every blog article as .md files, then linking all these from their llms.txt. This approach creates unnecessary duplicate content without clear benefits.
How to Measure Whether It's Working
You can't manage what you can't measure, and measurement here remains imperfect. A few approaches worth combining:

Check server logs. Look for requests to /llms.txt and /llms-full.txt. You'll see which bots (PerplexityBot, GPTBot, and others) are fetching them and how often.
Some platforms like Fern offer built-in analytics that track traffic by LLM provider and break down bot versus human visitors at the page level.
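A few lines of Python over a raw access log can produce a similar breakdown yourself. The bot names below are user-agent substrings these crawlers are known to use; the combined log format in the example is an assumption about your server.

```python
from collections import Counter

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot", "Google-Extended")

def count_ai_fetches(log_lines, path="/llms-full.txt"):
    """Count requests for `path` per AI crawler, from raw access-log lines."""
    counts = Counter()
    for line in log_lines:
        if path not in line:
            continue  # only count fetches of the file we care about
        for bot in AI_BOTS:
            if bot in line:
                counts[bot] += 1
    return counts

logs = [
    '1.2.3.4 - - [01/Jan/2025] "GET /llms-full.txt HTTP/1.1" 200 512 "-" "GPTBot/1.0"',
    '5.6.7.8 - - [01/Jan/2025] "GET /llms-full.txt HTTP/1.1" 200 512 "-" "PerplexityBot/1.0"',
    '9.9.9.9 - - [01/Jan/2025] "GET /index.html HTTP/1.1" 200 512 "-" "GPTBot/1.0"',
]
print(count_ai_fetches(logs))  # GPTBot and PerplexityBot each counted once
```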
Test with actual AI tools. Provide the file content to your AI system by pasting the link, copying the file contents directly into your prompt, or using the AI tool's file upload feature. Go to your llms-full.txt URL, copy the contents or URL into your chat, and ask specific questions. If the model can accurately answer detailed questions about your product using only that file, it's working as intended.

Monitor AI referral traffic. Track visits from chat.openai.com, perplexity.ai, and claude.ai in your analytics. Attribution isn't always clean, but traffic from AI referrers often reflects users who've already asked a question, seen an answer, and are now acting on it.
The honest assessment: implementation costs are low (roughly 2-8 hours) with uncertain but potentially high future ROI. For documentation-heavy sites, API platforms, and developer tools, that bet makes strategic sense. For a local bakery or small service business, you'll get more value from Google Business Profile optimization and local SEO fundamentals than from a documentation manifest file.
The web has two audiences now. Humans read your pages. AI systems read your structured files. Building an effective llms-full.txt file isn't about chasing a ranking signal that may or may not exist-it's about ensuring that when an AI model does reference your content, it gets the full, accurate picture. Your audience now includes LLMs alongside humans, and optimizing for AI isn't about gaming a system-it's about ensuring your content is accurately represented.
Start with your most valuable documentation. Convert it to clean Markdown. Structure it logically. Automate the updates. Then test whether AI models can actually use it to answer real questions about your product. That last step is the one most teams skip-and it's the one that separates a file that sits harmlessly in your root directory from one that actively improves how your brand shows up in the age of AI-mediated discovery.
Ready to optimize for the AI era?
Get a free AEO audit and discover how your brand shows up in AI-powered search.