Infrastructure and SEO

llms.txt: what it is, what to put in it, and whether you need one

llms.txt is a markdown index that hands LLMs a curated map of your site. Here is what to put in it and whether it actually moves AI citations.

April 26, 2026 · 7 min read

llms.txt is a proposed plain-text file at the root of a website that hands large language models a curated, markdown-formatted index of the site's most important content. It exists because context windows are too small to ingest a real website, and because HTML, ads, and JavaScript make most pages painful for an LLM to parse cleanly.

The proposal is recent. Jeremy Howard at Answer.AI published it in September 2024, and the spec lives at llmstxt.org. As of 2026, adoption is low and patchy: fewer than 11% of domains in a 300,000-site sample carry the file (Search Engine Journal, 2025), and the same dataset shows no measurable lift in citation frequency from the file alone. The honest answer to "do I need one" is more nuanced than the marketing headlines suggest.

The 30-second version

Drop a /llms.txt at your site root. Make it a markdown file with one H1, one blockquote summary, and a few H2 sections that link to the canonical, content-rich pages you want an LLM to read. Optionally publish a sibling /llms-full.txt with the full text of those pages bundled together. That is it. There is no schema validator, no Google submission step, no penalty for skipping it.

Why the proposal exists

Context windows are finite. Even a frontier model with a million-token budget cannot ingest the rendered HTML of a marketing site, a docs portal, and a blog and reason cleanly across them. Every byte spent on navigation, JavaScript, cookie banners, and tracker scripts is a byte not spent on the content the model needs to answer the user.

The robots.txt file solved a similar problem in 1994 by giving crawlers a single allow-or-deny instruction. The sitemap.xml file extended that in 2005 by giving crawlers a discoverable list of canonical URLs. Neither tells a model what your site is about or which pages matter most. That is the gap llms.txt tries to close.

What goes inside the file

The format is intentionally narrow. A valid llms.txt has four parts:

  1. One H1 with the site or project name.
  2. One blockquote summarising the site in 1 to 3 sentences.
  3. Optional prose giving extra context: who the audience is, how the site is organised, what tone the LLM should adopt when summarising it.
  4. One or more H2 sections, each containing a markdown bullet list of links. Each link follows the shape `- [Link text](https://full-url): one-sentence description`.

The convention is to group sections by intent: a "Docs" section, a "Blog" section, a "Pricing" section, sometimes an "Optional" section that the LLM is allowed to skip if context is tight. Links must be absolute URLs. Descriptions should be short, factual, and scannable, the same shape a search snippet takes.
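
Put together, a minimal file under these conventions might look like the sketch below. The project name, URLs, and descriptions are all illustrative:

```markdown
# Acme Analytics

> Acme Analytics is a self-hosted product analytics platform. The docs cover installation, the HTTP API, and the JavaScript SDK.

The documentation targets developers integrating the SDK; the blog targets data teams.

## Docs

- [Quickstart](https://acme.example/docs/quickstart): install and send a first event in five minutes.
- [HTTP API reference](https://acme.example/docs/api): every endpoint, with request and response examples.

## Blog

- [Shrinking event payloads by 60%](https://acme.example/blog/payloads): a schema-design case study.

## Optional

- [Changelog](https://acme.example/changelog): release notes, updated weekly.
```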

llms.txt vs llms-full.txt

Two files, two jobs. llms.txt is the catalogue, a thin index that points to other URLs the LLM can fetch on demand. llms-full.txt is the bundle, the full markdown text of those pages concatenated into one file an LLM can ingest in a single request.

Anthropic publishes both. Their llms.txt is small and links into the docs tree. Their llms-full.txt ships the full API documentation in markdown, hundreds of thousands of tokens. Vercel and Cloudflare follow the same pattern.

The split exists because making an LLM spend tokens to fetch every linked page is wasteful when the same content can be served pre-bundled. If your site is small, llms.txt alone is enough. If you ship serious documentation or a content corpus you actually want quoted, llms-full.txt is where the citation upside lives.

Does it actually move AI citations

This is the question the marketing posts dodge. The honest answer in 2026: the data does not support the headline claim.

SE Ranking analysed roughly 300,000 domains and found no statistical relationship between having an llms.txt file and citation frequency in major LLM answers. OtterlyAI tracked 10 sites for 90 days and saw no change in AI traffic on 8 of them. Google has confirmed that AI Overviews and AI Mode rely on traditional SEO signals, not llms.txt. In OtterlyAI's logs, only 0.1% of AI crawler requests touched the file at all.

That does not mean the file is useless. It means it does not substitute for the rest of the GEO playbook. If your pages are not already factual, citation-dense, and structured for extraction, an llms.txt index pointing at them will not raise their citation rate. We covered the levers that do move the needle in our breakdown of GEO best practices.

When to ship it anyway

Three cases where llms.txt earns its place even without proven citation lift.

Documentation-heavy products. If your users routinely paste your docs URL into ChatGPT or Claude to ask "how does this API work", a well-structured llms-full.txt makes those answers more accurate. The win is reduced support load, not raw citations. This is why Anthropic, Vercel, and Cloudflare all ship one.

Content-first sites. Blogs, news sites, knowledge bases. The cost is one file. The downside is zero. The upside, if and when AI engines start respecting the standard, is already in place.

Compliance and policy clarity. Several major publishers use llms.txt as a public signal of which content is permitted for AI training and which is not, complementing robots.txt rules for AI crawlers. Useful when legal asks where the policy is published.

When to skip it

Marketing landing pages and short brochure sites do not need one. The file would shrink to "here is our homepage and our pricing page", which an LLM can parse from the HTML in 200 tokens. Single-page sites, transactional flows, and apps that have no public reading surface have nothing to index.

Adjacent files: how llms.txt fits the picture

Three plain-text files now share the site root, each solving a different problem.

  • robots.txt tells crawlers where they may go. It is normative: allow and disallow.
  • sitemap.xml tells crawlers which URLs exist. It is discovery-oriented: a flat list.
  • llms.txt tells LLMs which URLs matter and what they are about. It is editorial: a curated index with prose.

None of them replace the others. A complete setup ships all three, plus structured data (JSON-LD) inside the HTML of each individual page. Schema markup remains the strongest signal for both classic search and AI engines, regardless of where llms.txt lands as a standard. The broader picture sits inside our explainer on GEO vs SEO.
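
If you have not wired schema markup into a page before, the pattern Next.js documents is to serialise a JSON-LD object into a script tag. A minimal sketch, with hypothetical values throughout:

```tsx
// app/blog/[slug]/page.tsx — hypothetical page embedding Article schema as JSON-LD
export default function Post() {
  const jsonLd = {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: "llms.txt: what it is and whether you need one",
    datePublished: "2026-04-26",
    author: { "@type": "Organization", name: "Acme" },
  };

  return (
    <article>
      {/* Crawlers and AI engines read this block; users never see it. */}
      <script
        type="application/ld+json"
        dangerouslySetInnerHTML={{ __html: JSON.stringify(jsonLd) }}
      />
      {/* page content */}
    </article>
  );
}
```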

Practical implementation

Add a static file at public/llms.txt in a Next.js project, or serve it through a route handler if you prefer to generate it from your CMS. Keep it under a few hundred lines so a model can read it without exceeding context. Update it when you ship significant new content. Submit your sitemap.xml to Google Search Console and Bing as usual; llms.txt does not have an equivalent submission step, since no major AI engine has built one yet.
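
For the CMS-generated variant, a route handler works because an App Router folder named llms.txt maps to the /llms.txt path. A minimal sketch, assuming a hypothetical getPublishedPages() helper in your CMS client:

```ts
// app/llms.txt/route.ts — the folder name becomes the URL path /llms.txt
import { getPublishedPages } from "@/lib/cms"; // hypothetical CMS helper

export const revalidate = 3600; // regenerate at most once an hour

export async function GET(): Promise<Response> {
  const pages = await getPublishedPages(); // assumed shape: { title, url, summary }[]

  const body = [
    "# Acme Analytics",
    "",
    "> Self-hosted product analytics. Docs, blog, and pricing below.",
    "",
    "## Pages",
    ...pages.map((p) => `- [${p.title}](${p.url}): ${p.summary}`),
    "",
  ].join("\n");

  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```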

If you want the bundled version, generate llms-full.txt at build time from your published markdown. A typical Next.js content site does this with a build script that walks the content tree and concatenates each published page's body with a leading H1 and a canonical URL line.
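
A sketch of that build step, assuming a flat content/ directory of markdown files and a hypothetical canonical origin. A real script would pull titles from frontmatter rather than reusing the slug:

```ts
// scripts/build-llms-full.ts — run before `next build`, e.g. with tsx
import { readdir, readFile, writeFile } from "node:fs/promises";
import { join } from "node:path";

const SITE = "https://acme.example"; // hypothetical canonical origin
const CONTENT_DIR = "content"; // assumes a flat content/<slug>.md layout

async function main() {
  const files = (await readdir(CONTENT_DIR)).filter((f) => f.endsWith(".md"));
  const sections: string[] = [];

  for (const file of files.sort()) {
    const slug = file.replace(/\.md$/, "");
    const body = await readFile(join(CONTENT_DIR, file), "utf8");
    // Each page gets a leading H1 and a canonical URL line, then its body.
    sections.push(`# ${slug}\n\nCanonical: ${SITE}/${slug}\n\n${body.trim()}`);
  }

  await writeFile(
    join("public", "llms-full.txt"),
    sections.join("\n\n---\n\n") + "\n"
  );
  console.log(`Wrote public/llms-full.txt (${sections.length} pages)`);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```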

The bottom line

llms.txt is low-effort, low-risk, and not yet a measurable visibility lever. Ship it if your site has reading content worth indexing. Treat it as table stakes for documentation. Do not treat it as a substitute for the harder work: factually accurate content, schema markup, citations placed next to the claims they support, and the first 200 tokens of each page doing real work.

The standard may grow. Until it does, the file is a polite gesture toward a future where LLMs read the web on the web's own terms.
