llms.txt Is the New robots.txt: What AI Crawlers Actually Do With It
The llms.txt proposal exploded across hacker forums and SEO Twitter in 2025. By mid-2026, every serious publisher has one. The catch: most of them are configured wrong, and the major AI labs are not reading them the way teams assume.
By Alex Marchetti, Growth Editor · May 20, 2026
llms.txt explained for 2026: what the file is, which AI crawlers respect it, how it differs from robots.txt, and the practical configuration most publishers get wrong.
Frequently Asked Questions
What is llms.txt and where does it live on a site?
llms.txt is a plain-text Markdown file proposed in 2024 by Jeremy Howard as a way for websites to expose curated, LLM-friendly summaries of their most important content. The file sits at the root of a domain, at the path /llms.txt, in the same location as robots.txt and sitemap.xml. Inside, the file uses Markdown headings and link lists to nominate the pages a publisher most wants AI systems to surface or cite. A companion file, /llms-full.txt, is sometimes published with concatenated cleaned content of those pages so an LLM with a long context window can ingest the full corpus in one fetch. The proposal is not a W3C standard and has no enforcement mechanism, but its simplicity made adoption fast among technical sites in 2025.
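To make the file layout concrete, here is a minimal sketch of reading an llms.txt-style file into a title, a summary, and sections of links. The sample content, the example.com URLs, and the function name parse_llms_txt are invented for illustration, and the parser handles only the common "- [title](url): description" entry shape rather than every variation a real file might contain.

```python
import re

# Hypothetical sample file: the site name, URLs, and descriptions are invented.
SAMPLE = """\
# Example Docs

> Guides and API reference for the Example platform.

## Quick Links

- [Getting Started](https://example.com/start): Install and first run
- [API Reference](https://example.com/api): Endpoints and parameters

## Optional

- [Changelog](https://example.com/changelog)
"""

# One list entry: "- [title](url)" with an optional ": description" suffix.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text):
    """Parse llms.txt-style Markdown into a title, summary, and link sections."""
    doc = {"title": None, "summary": None, "sections": {}}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()
        elif line.startswith("> ") and doc["summary"] is None:
            doc["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc["sections"][current].append(
                    {
                        "title": match["title"],
                        "url": match["url"],
                        "desc": match["desc"] or "",
                    }
                )
    return doc


parsed = parse_llms_txt(SAMPLE)
for section, links in parsed["sections"].items():
    print(section, [link["url"] for link in links])
```

The same structure is what a consuming system would have to recover from the file, which is why consistent headings and one link per list item matter more than visual polish.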
Do ChatGPT, Claude, Perplexity, and Google's AI features actually read llms.txt?
As of May 2026, the picture is uneven. Anthropic has publicly acknowledged that Claude's web fetcher considers llms.txt as one signal among many when summarizing a domain. Perplexity has discussed using llms.txt to improve citation quality. OpenAI and Google have been less explicit. Independent crawl analyses from sites like Common Crawl and from publisher logs show that the major AI labs' fetchers do request llms.txt when crawling a domain, but no lab has confirmed that the file is a primary input to training or to retrieval-augmented generation. The honest summary is that llms.txt is a low-cost hint, not a guaranteed ranking lever. Publishers who treat it as anything more are setting themselves up for misallocated effort.
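Publishers can check the fetch side of this claim against their own server logs. The sketch below counts requests for /llms.txt by AI user agent. It assumes logs in the common "combined" format, and the user-agent substrings in AI_AGENTS are assumptions that should be checked against each vendor's current documentation. The sample log lines are invented.

```python
import re
from collections import Counter

# Assumed user-agent substrings; verify against each vendor's documentation.
AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Request line, status, size, referrer, and user agent of a combined-format entry.
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def count_llms_txt_hits(lines):
    """Count /llms.txt requests per AI user agent in combined-format log lines."""
    hits = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if not match or match["path"].split("?")[0] != "/llms.txt":
            continue
        user_agent = match["ua"].lower()
        for agent in AI_AGENTS:
            if agent.lower() in user_agent:
                hits[agent] += 1
    return hits


# Invented sample entries using documentation-reserved IP ranges.
SAMPLE_LOG = [
    '203.0.113.5 - - [20/May/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [20/May/2026:10:00:05 +0000] "GET /llms.txt HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.5 - - [20/May/2026:10:01:00 +0000] "GET /pricing HTTP/1.1" '
    '200 9000 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [20/May/2026:10:02:00 +0000] "GET /llms.txt HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

hits = count_llms_txt_hits(SAMPLE_LOG)
for agent, count in hits.items():
    print(agent, count)
```

A nonzero count shows only that a fetcher requested the file. It says nothing about whether the contents influenced training, retrieval, or citation.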
How is llms.txt different from robots.txt?
robots.txt is a permissioning file. It tells crawlers which paths each user agent may fetch and which are off limits, and compliant crawlers treat it as a directive. llms.txt is a curation file. It does not block or allow anything; it tells AI crawlers which pages the publisher considers most important and best suited for citation or summarization. The two files coexist. A site can use robots.txt to block GPTBot from paywalled paths, then use llms.txt to curate which open-access pages it wants surfaced. Treating llms.txt as if it were robots.txt, for example by using it to block crawlers, is a common configuration mistake.
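The division of labor can be sketched with Python's standard urllib.robotparser module: robots.txt answers "may this agent fetch this URL?", while the llms.txt list only nominates pages. The rules, the /premium/ path, and the curated URLs below are hypothetical. The check flags any curated page that robots.txt simultaneously blocks for GPTBot, a sign that the two files disagree.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep GPTBot out of paywalled paths, allow everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /premium/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, url):
    """Return True if the parsed robots.txt rules let this agent fetch the URL."""
    return parser.can_fetch(agent, url)


# Hypothetical URLs a publisher has nominated in llms.txt.
CURATED = [
    "https://example.com/guides/setup",
    "https://example.com/premium/report",
]

# Pages that are curated for AI systems but blocked for GPTBot: a likely mistake.
blocked_but_curated = [url for url in CURATED if not allowed("GPTBot", url)]
print(blocked_but_curated)
```

Nothing in llms.txt can change the answer that allowed() gives. Blocking belongs in robots.txt, and curation belongs in llms.txt.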
What should publishers put in llms.txt and what should they leave out?
The strongest pattern is to publish a short Markdown file with a one-paragraph site overview, a Quick Links section pointing to the most-cited canonical pages, a Documentation or Knowledge Base section grouping evergreen content, and a Recent Updates section with the freshest authoritative pieces. Each entry should be a Markdown link followed by a short description. The file should be under a few hundred lines so an AI system can ingest it cheaply. Pages to leave out include thin marketing landing pages, dated promotional content, pages that duplicate other content, and any URL the publisher would not want quoted out of context. A messy llms.txt is worse than no file because it signals low editorial quality to the systems that do consume it.
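Some of these guidelines can be checked mechanically before publishing. The following is a rough lint sketch, not an official validator: the 300-line ceiling is one assumed reading of "a few hundred lines", the function name lint_llms_txt is made up, and the draft file is invented. It covers only length, the presence of a title, missing descriptions, and duplicate URLs.

```python
import re

MAX_LINES = 300  # assumed reading of "a few hundred lines"

# One list entry: "- [title](url)" with an optional ": description" suffix.
LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def lint_llms_txt(text, max_lines=MAX_LINES):
    """Return a list of human-readable problems found in an llms.txt draft."""
    problems = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        problems.append(f"file has {len(lines)} lines (limit {max_lines})")
    if not any(line.startswith("# ") for line in lines):
        problems.append("missing top-level '# ' title")
    seen = set()
    for number, line in enumerate(lines, start=1):
        match = LINK_RE.match(line.strip())
        if not match:
            continue
        _title, url, desc = match.groups()
        if not (desc or "").strip():
            problems.append(f"line {number}: link has no description")
        if url in seen:
            problems.append(f"line {number}: duplicate URL {url}")
        seen.add(url)
    return problems


# Invented draft with two deliberate problems.
DRAFT = """\
# Example Publisher

> Independent reporting on search and AI.

## Quick Links

- [Crawler Guide](https://example.com/crawlers): How AI fetchers behave
- [Pricing](https://example.com/pricing)

## Recent Updates

- [Crawler Guide](https://example.com/crawlers): Updated for 2026
"""

problems = lint_llms_txt(DRAFT)
for problem in problems:
    print(problem)
```

Editorial judgments, such as whether a page is thin or likely to be quoted out of context, still need a human review. A script like this only catches the mechanical slips.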
Does llms.txt help with AI Overviews, AI Mode, or Perplexity citations?
Google's documentation for AI Overviews and AI Mode does not list llms.txt as a requirement, and Google has stated that the same SEO foundations that drive Search drive AI features. So llms.txt is unlikely to be a direct ranking input for Google's surfaces. For Perplexity and Claude, llms.txt appears to be one of many crawl-time signals, and publishers who maintain a clean file may see modest citation lift over time. The realistic expectation is that llms.txt becomes one layer of a broader AI-friendly content stack, alongside clean HTML, accurate structured data, and comprehensive sitemaps, rather than a single lever that materially changes visibility on its own.
Will llms.txt eventually become an official standard?
There is no W3C or IETF working group adopting llms.txt as of mid-2026. The proposal remains a community convention, maintained on its original spec page and in a handful of GitHub repositories. Anthropic, Perplexity, and several smaller AI companies have publicly endorsed the format. Google and OpenAI have not committed to making it canonical. If the proposal does formalize, it is likely to happen the way sitemap.xml did: through enough industry adoption that the major search and AI vendors collectively agree to a stable schema. Publishers should treat the current spec as stable enough to implement, while expecting that conventions and best practices will continue to evolve.
Topics: SEO, AEO, llms.txt, AI Crawlers, Standards, Strategy