Jeremy Howard proposed llms.txt in September 2024. By 2026 it split into two artifacts with very different costs. A 2026 audit of 4,200 sites shows 38 percent ship the wrong one for their goal.
By Kwame Asante, Open Source & DevRel · May 25, 2026
llms.txt vs llms-full.txt in 2026: deployment patterns, crawl budget math, open source code references, and the build pipeline for shipping both without leaks.
Frequently Asked Questions
What is the difference between llms.txt and llms-full.txt?
llms.txt is a curated table of contents in markdown that points crawlers to the most important URLs on your site, usually one hundred to three hundred lines long. llms-full.txt is the full concatenated body of every page or doc listed in llms.txt, often running to tens of megabytes for documentation-heavy sites. The split emerged in late 2024 and early 2025, after Jeremy Howard's original llmstxt.org proposal, when developers realized one artifact could not serve both purposes. llms.txt optimizes for navigation and discovery, costs almost nothing in bandwidth, and lets crawlers selectively fetch the canonical URL of each section. llms-full.txt optimizes for one-shot ingestion by an LLM during retrieval or fine-tuning, costs far more in bandwidth, and exposes your entire content corpus in a single fetch. Most sites that adopt the convention now ship both files side by side.
Should I publish llms-full.txt or just llms.txt?
Publish llms.txt for almost every site. Publish llms-full.txt only if you have a defensible reason to give LLMs your entire content in one request, typically because you are documentation-first, open source, or actively trying to be cited and ingested. If your content is competitive intellectual property, behind paywalls, or expensive to crawl, skip llms-full.txt entirely and let crawlers fetch individual canonical URLs through llms.txt instead. Anthropic, Mintlify, and Cloudflare ship both files for their docs because their business model rewards LLM citations of their developer documentation. SaaS marketing sites and ecommerce stores typically should not ship llms-full.txt because they have no upside from giving the full corpus to crawlers in one shot.
Does llms.txt actually affect AI search citations in 2026?
The signal is positive but weaker than the marketing claims suggest. Cloudflare's 2026 crawler data shows ChatGPT, Perplexity, and Claude crawlers fetch llms.txt on roughly 31 percent of sites that publish it, up from 8 percent in mid-2025. Sites that ship both llms.txt and llms-full.txt see crawl-efficiency improvements of 14 to 22 percent, measured as crawler bandwidth per indexed URL. Whether this translates into a higher citation rate depends on the underlying content. A 2026 study of 3,200 documentation sites by Mintlify found a 6 to 11 percent increase in citation rate after shipping llms.txt, controlling for other variables. The mechanism is not magic; the file just makes the canonical URL set discoverable and reduces crawl wasted on navigation chrome.
How do I generate llms.txt and llms-full.txt for my site?
Use a static site generator plugin if you have one, or write a build-time script that reads your sitemap and content directory and concatenates the relevant fields. Mintlify, Docusaurus, and Nextra all ship plugins that produce both files automatically as part of the docs build. For custom sites, the pattern is straightforward: parse your sitemap.xml or content tree, extract each page's title and canonical URL, write those to llms.txt as a markdown link list, then optionally fetch each page's markdown source and concatenate it to llms-full.txt with a clear delimiter. Run the generation step in CI so the files stay synchronized with the published content. Cloudflare Workers and Vercel both offer auto-generation features as of early 2026 that build the files at the edge without requiring custom code.
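For a custom site, the build-time script described above might look roughly like the following. This is a minimal sketch, not a reference implementation: it assumes the content lives as markdown files in a local directory, that each page's title is its first level-1 heading, and that the canonical URL is the base URL plus the file's relative path. The names Page, load_pages, build_llms_txt, and build_llms_full_txt, the "## Docs" section label, and the "---" delimiter are all illustrative choices rather than anything a tool or the proposal mandates.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Page:
    title: str
    url: str   # canonical URL of the page
    body: str  # markdown source of the page


def extract_title(markdown: str, fallback: str) -> str:
    """Return the text of the first level-1 heading, or the fallback."""
    for line in markdown.splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return fallback


def load_pages(content_dir: Path, base_url: str) -> list[Page]:
    """Read every .md file under content_dir, sorted so output is stable."""
    pages = []
    for path in sorted(content_dir.rglob("*.md")):
        body = path.read_text(encoding="utf-8")
        # Assumption: the URL path mirrors the file path without ".md".
        slug = path.relative_to(content_dir).with_suffix("").as_posix()
        url = f"{base_url.rstrip('/')}/{slug}"
        pages.append(Page(extract_title(body, slug), url, body))
    return pages


def build_llms_txt(site_name: str, summary: str, pages: list[Page]) -> str:
    """Build the index: a heading, a one-line summary, and a link list."""
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Docs", ""]
    lines += [f"- [{p.title}]({p.url})" for p in pages]
    return "\n".join(lines) + "\n"


def build_llms_full_txt(pages: list[Page]) -> str:
    """Concatenate page bodies, marking each with its source URL."""
    sections = [f"<!-- source: {p.url} -->\n{p.body.strip()}" for p in pages]
    return "\n\n---\n\n".join(sections) + "\n"


def write_outputs(content_dir: Path, out_dir: Path, base_url: str,
                  site_name: str, summary: str) -> None:
    """Generate both files from the same page list so they stay in sync."""
    pages = load_pages(content_dir, base_url)
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "llms.txt").write_text(
        build_llms_txt(site_name, summary, pages), encoding="utf-8")
    (out_dir / "llms-full.txt").write_text(
        build_llms_full_txt(pages), encoding="utf-8")
```

Calling write_outputs from a CI step after the docs build keeps both files tied to the published content. To publish only the index, skip the llms-full.txt write; nothing else in the sketch depends on it.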
Can publishing llms-full.txt hurt my search rankings or crawl budget?
It can hurt crawl budget if you serve it incorrectly. The file itself does not affect Google search rankings because Googlebot does not currently use llms.txt for indexing. The risk is bandwidth amplification: if llms-full.txt is twenty megabytes and forty different AI crawlers each fetch it once a day, you are serving 800 megabytes per day of cold-cache traffic from your origin. The mitigations are CDN caching with long TTLs; gzip or brotli compression, which typically reduces text payload by 75 to 85 percent; and selective robots.txt rules that allow specific crawler user agents while blocking others. The other risk is leaking competitive intelligence: shipping your entire content corpus in one file makes it trivial for competitors to download and analyze your full information advantage.
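The bandwidth arithmetic above is easy to rerun with your own numbers. The sketch below assumes every crawler fetches the file exactly once a day and that no request is served from a CDN cache, which is the worst case described in the answer. The function name and parameters are illustrative.

```python
def daily_origin_megabytes(file_mb: float, crawlers_per_day: int,
                           compression_reduction: float = 0.0) -> float:
    """Estimate origin traffic in MB per day for one large file.

    compression_reduction is the fraction of the payload removed by
    gzip or brotli, for example 0.75 for a 75 percent reduction.
    Assumes one fetch per crawler per day and no cache hits.
    """
    if not 0.0 <= compression_reduction < 1.0:
        raise ValueError("compression_reduction must be in [0, 1)")
    return file_mb * crawlers_per_day * (1.0 - compression_reduction)


# The article's example: a 20 MB file fetched by 40 crawlers daily.
uncompressed = daily_origin_megabytes(20, 40)
best_case = daily_origin_megabytes(20, 40, compression_reduction=0.85)
worst_case = daily_origin_megabytes(20, 40, compression_reduction=0.75)

print(round(uncompressed, 1))  # 800.0
print(round(best_case, 1))     # 120.0
print(round(worst_case, 1))    # 200.0
```

Under these assumptions, compression alone brings the 800 megabytes down to somewhere between 120 and 200 megabytes per day. CDN caching with long TTLs would cut origin traffic further, but the sketch does not model it.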