llms.txt: What It Is, How It Works, and How We Built Ours
What llms.txt does, what it doesn't, and how Nekko ships one in production — real file excerpts, setup steps, and an honest read on AI search. Start here.
By Rodrigo Diniz
Founder & Head of Search Strategy
There is a text file at nekkodigital.com/llms.txt — 158 lines, plain Markdown, public. You can open it right now. There is a second one at /llms-full.txt that runs past five thousand lines. We generate both at build time from the same content that renders this website, and we have shipped them in production for months.
That is the honest credential behind this post. Most “what is llms.txt” explainers are written secondhand from the spec by teams that have never shipped one. This one walks through our own live files — real excerpts, real ordering decisions, real automation. Here is the thesis up front, so you can decide whether to keep reading: llms.txt is cheap to implement, genuinely contested in adoption, and — for most sites — still worth shipping. The rest of this post is the evidence on all three.
What Is llms.txt? The 60-Second Version
llms.txt is a Markdown file you place at the root of your domain that hands large language models a curated, token-efficient map of your most important content. Instead of making an AI model crawl your whole site and wade through navigation, ads, and boilerplate, it offers a clean index: here are our canonical pages, grouped and described, in the format a model reads most easily.
It was proposed in September 2024 by Jeremy Howard, co-founder of Answer.AI, and the canonical spec lives at llmstxt.org. The original rationale was not about crawling at all — it was about inference-time context. Language models have limited context windows, and HTML is a noisy, expensive way to fill them. A concise Markdown manifest lets a model load the parts of your site that matter without burning tokens on markup. Keep that framing in mind: llms.txt is a reading aid for models, not a set of crawler directives.
The Spec: How an llms.txt File Is Structured
The format is deliberately minimal. The spec at llmstxt.org defines a small set of rules, and a valid file needs only a few elements:
- An H1 with the name of your site or project (the only required line).
- An optional blockquote (>) giving a one-sentence summary.
- Zero or more H2 sections, each grouping a list of links.
- Each list item is a standard Markdown link, optionally followed by a colon (:) and a short description.
A minimal, generic llms.txt example looks like this:
# Your Company
> One-sentence summary of what your company does.
## Docs
- [Getting Started](https://example.com/start): Set up the product in five minutes.
- [API Reference](https://example.com/api): Endpoints, authentication, and rate limits.
## About
- [Company](https://example.com/about): Who we are and what we believe.
That is the entire llms.txt format: there is no schema to validate, no XML, no special syntax. A companion convention, llms-full.txt, carries the full text of your pages rather than just links; we will get to ours later. If you arrived searching for an llms.txt example or the llms.txt format, the answer is the same either way: it is just structured Markdown.
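Because the format is this small, a few lines of code can read it. The following is a minimal sketch, not a reference parser: the names `Manifest`, `parse_llms_txt`, and `LINK_RE` are our own illustrations, and it assumes links sit on single lines in the `- [title](url): description` shape shown above.

```python
import re
from dataclasses import dataclass, field

# One list item: "- [Title](https://url): optional description"
LINK_RE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class Manifest:
    title: str = ""
    summary: str = ""
    # section name -> list of (title, url, description)
    sections: dict[str, list[tuple[str, str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> Manifest:
    """Parse the H1, the summary blockquote, and the H2 link sections."""
    manifest = Manifest()
    current = None  # name of the H2 section we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not manifest.title:
            manifest.title = line[2:].strip()
        elif line.startswith("> ") and current is None and not manifest.summary:
            manifest.summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            manifest.sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                manifest.sections[current].append(
                    (match["title"], match["url"], match["desc"] or "")
                )
    if not manifest.title:
        # The H1 is the only required line.
        raise ValueError("llms.txt needs an H1 with the site or project name")
    return manifest


EXAMPLE = """\
# Your Company
> One-sentence summary of what your company does.
## Docs
- [Getting Started](https://example.com/start): Set up the product in five minutes.
- [API Reference](https://example.com/api): Endpoints, authentication, and rate limits.
## About
- [Company](https://example.com/about): Who we are and what we believe.
"""

manifest = parse_llms_txt(EXAMPLE)
print(manifest.title, list(manifest.sections))
```

Free-form paragraphs between the summary and the first H2 are skipped here, which is enough for checking that a file has the required pieces.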
llms.txt vs robots.txt vs sitemap.xml: Three Files, Three Different Jobs
Because all three live at your site’s root, they get confused constantly. They do completely different jobs:
| | robots.txt | sitemap.xml | llms.txt |
|---|---|---|---|
| Who reads it | Search + AI crawlers | Search engines | LLMs / AI assistants (opt-in) |
| Format | Plain-text directives | XML | Markdown |
| What it controls | Which URLs bots may crawl | Which URLs exist and when they changed | A curated, readable map of key content |
| Required? | No, but expected | No, but expected | No — emerging convention |
The single most common misconception is worth killing directly: llms.txt is not robots.txt configuration. It does not grant or deny access to anything, and a crawler that ignores it faces no consequence. It also does not affect crawl budget — how much a search engine crawls you is governed by your robots.txt rules and your server performance, not by the presence of a reading manifest. If a tool tells you llms.txt will fix your crawl efficiency, that tool is wrong.
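To make the division of labor concrete, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content, the URLs, and the helper name `may_crawl` are hypothetical. The point is that the access question is answered from robots.txt alone; an llms.txt file never enters the function.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keeps GPTBot out of /private/, allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Decide access from robots.txt rules only; llms.txt is not consulted."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = may_crawl(ROBOTS_TXT, "GPTBot", "https://example.com/private/report")
allowed = may_crawl(ROBOTS_TXT, "ClaudeBot", "https://example.com/private/report")
print(blocked, allowed)  # prints: False True
```

Listing that same /private/report URL in llms.txt would change neither answer, which is why the two files cannot stand in for each other.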
Does Anything Actually Read It? An Honest Adoption Check
This is the section that should make you skeptical in the right places. The uncomfortable truth, as of June 2026: no major AI company (Google, OpenAI, Anthropic, Meta, or Mistral) has publicly committed to reading or acting on llms.txt in its production answer or search systems.
| Engine | Crawler user-agent(s) | Public position on llms.txt | Verified |
|---|---|---|---|
| Google | Googlebot, Google-Extended | Does not use it; John Mueller likens it to the keywords meta tag | Jun 2026 |
| OpenAI | GPTBot, OAI-SearchBot | No production commitment; aware of the convention | Jun 2026 |
| Anthropic | ClaudeBot | Publishes its own llms.txt; read by its coding tools | Jun 2026 |
| Perplexity | PerplexityBot | Indicates it reads the file; no formal production guarantee | Jun 2026 |
Google has been the loudest skeptic: its Search team has said it does not use llms.txt, and John Mueller publicly compared maintaining a separate bot-only Markdown file to the long-dead keywords meta tag. The most consistent real-world signal is narrower than the hype: llms.txt is read today mainly by developer and coding agents — tools like Claude Code and Cursor that pull a project’s docs — rather than by the consumer chat engines that answer “best Hawaii law firm.” Adoption across the open web sits in the low double digits by most measurements, concentrated in technology and documentation sites.
So the honest verdict is a cost-benefit one, not a hype one: low effort, no downside, unproven mandate, and an asymmetric upside if the convention catches on. That is a very different — and more useful — claim than “you must have llms.txt to rank in AI.” You do not. But it is cheap, and it sits squarely in the substrate of where AI search is heading, which is why we ship it.
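If you would rather measure than take anyone's word for it, your own access logs can show who fetches the file. The sketch below is illustrative only: it assumes logs in the common "combined" format, the sample lines are invented, and the crawler names are the ones from the table above. User-agent strings can be spoofed, so treat the counts as a rough signal.

```python
import re
from collections import Counter

# Crawler names from the table above, matched as user-agent substrings.
KNOWN_AGENTS = ["Googlebot", "GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]
MANIFEST_PATHS = ("/llms.txt", "/llms-full.txt")

# Combined log format: "METHOD path protocol" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_manifest_fetches(log_lines):
    """Count requests for the manifest files, grouped by known crawler name."""
    hits = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match or match["path"] not in MANIFEST_PATHS:
            continue
        agent = match["agent"].lower()
        label = next((a for a in KNOWN_AGENTS if a.lower() in agent), "other")
        hits[label] += 1
    return hits


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jun/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Jun/2026:10:05:00 +0000] "GET /llms-full.txt HTTP/1.1" 200 90210 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.7 - - [01/Jun/2026:10:06:00 +0000] "GET /llms.txt HTTP/1.1" 200 5120 "-" "curl/8.5.0"',
    '198.51.100.7 - - [01/Jun/2026:10:07:00 +0000] "GET /about/ HTTP/1.1" 200 20480 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]

fetches = count_manifest_fetches(SAMPLE_LOG)
print(dict(fetches))
```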
Inside Nekko’s Production llms.txt: A Line-by-Line Walkthrough
Here is the part no secondhand explainer can write. This is the top of our actual, live file:
# Nekko Digital
> Hawaii-based digital marketing agency — SEO, AI-search optimization (GEO/AEO), web design, and conversion strategy for hospitality, real estate, and local businesses.
This file follows the llms.txt spec and lists the canonical pages on
nekkodigital.com. The companion /llms-full.txt contains the full markdown
bodies for every item.
## Services
- [Generative Engine Optimization (GEO)](https://nekkodigital.com/generative-engine-optimization/): Build E-E-A-T excellence and topical authority so AI engines like ChatGPT, Perplexity, and Claude cite your brand as the answer.
- [Local SEO](https://nekkodigital.com/local-seo/): Hyper-local targeting, Google Business Profile optimization, and mobile-first strategies to dominate local search and Google Maps.
The header and summary blockquote
The first line is the required H1 — our name. The blockquote underneath is the one-sentence summary a model reads first to understand what we are. Then a short paragraph points to the companion file. Nothing fancy, but every element is doing the spec’s job.
Section ordering: services first, reference last
The file is organized into sections in deliberate priority order: Services first (what we sell), then Authors, Recent Articles, Case Studies, Industries, Locations, Tools, Guides, The Reef Method, Blog Topics, and finally About / Reference. The ordering is editorial, not alphabetical — we lead with the pages we most want an AI model to associate with us, and end with evergreen housekeeping like the contact and privacy pages. Every entry pairs one canonical URL with one plain-language description, so a model gets both the link and the reason to follow it.
Why we auto-generate instead of hand-writing
The most important decision is invisible in the file itself: we never type it by hand. It is generated at build time from the same content collections that render the website. Service descriptions, article titles, page summaries — all of it is pulled from each page’s existing metadata when the site builds. Add a blog post, rename a service, ship a new location page, and the manifest updates itself on the next deploy. A hand-maintained list inevitably rots — someone edits a page title and forgets the separate file — and a stale manifest is actively misleading. Generating from the single source of truth means ours cannot silently go out of date. In our Reef Method framework, this is pure Substrate work: the technical foundation that makes everything above it legible.
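The idea is easy to sketch, though what follows is not our actual build code. It assumes each page's metadata is already available as a simple record; the `Page` class, the `render_llms_txt` function, and the section names in `SECTION_ORDER` are hypothetical stand-ins for whatever your content system provides.

```python
from dataclasses import dataclass


@dataclass
class Page:
    section: str
    title: str
    url: str
    description: str


# Editorial priority order, not alphabetical (hypothetical section names).
SECTION_ORDER = ["Services", "Guides", "About"]


def render_llms_txt(site_name, summary, pages, section_order=SECTION_ORDER):
    """Render the manifest from page metadata, one H2 per non-empty section."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for section in section_order:
        entries = [p for p in pages if p.section == section]
        if not entries:
            continue  # skip sections that have no pages yet
        lines.append(f"## {section}")
        lines.append("")
        lines.extend(f"- [{p.title}]({p.url}): {p.description}" for p in entries)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


pages = [
    Page("About", "Company", "https://example.com/about", "Who we are."),
    Page("Services", "Local SEO", "https://example.com/local-seo/", "Local search work."),
]
output = render_llms_txt("Example Co", "One-sentence summary.", pages)
print(output)
```

Run as a build step, a function like this writes the file on every deploy, so a renamed page changes the manifest without anyone remembering to edit it.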
llms-full.txt: The Companion File, Explained
If llms.txt is the index, llms-full.txt is the whole book. The companion convention carries the full Markdown body of every listed item, not just the link, and ours runs to roughly 5,200 lines. Where the manifest says “here is our AI Search Manual and what it covers,” the full file contains the article’s actual text, so a model can ingest the content in a single fetch without parsing a page of HTML.
When is the companion worth it? When your value lives in long-form content — guides, documentation, reference material — that you want a model to read in full. When your site is mostly short landing pages, the manifest alone is plenty, and a giant full-text file is overhead you do not need. We ship both because our research and guides are the point; many sites should ship only the manifest.
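For a sense of how little machinery the companion file needs, here is a rough sketch. The layout it produces (an H2 per document followed by a `Source:` line) is our assumption for illustration, not something the convention mandates, and `render_llms_full_txt` is a hypothetical name.

```python
def render_llms_full_txt(site_name, documents):
    """Concatenate full Markdown bodies into one file.

    documents: iterable of (title, url, markdown_body) tuples.
    The per-document layout below is an assumed, illustrative one.
    """
    parts = [f"# {site_name}"]
    for title, url, body in documents:
        parts.append(f"## {title}\n\nSource: {url}\n\n{body.strip()}")
    return "\n\n".join(parts) + "\n"


documents = [
    ("Guide One", "https://example.com/guide-one/", "Body text.\n"),
    ("Guide Two", "https://example.com/guide-two/", "More body text.\n"),
]
full_text = render_llms_full_txt("Example Co", documents)
print(full_text)
```

One caveat with this sketch: bodies that contain their own H1 or H2 headings will blur the document boundaries, so a real generator may need to demote headings inside each body.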
How to Create an llms.txt File for Your Site
For a small site, this is a 30-minute technical SEO task. Four steps:
Step 1: Inventory your canonical pages
List the pages you actually want an AI model to know about — core services, your best articles, key reference pages. Leave out thin, duplicate, or utility pages. This is curation, not a sitemap dump.
Step 2: Write the H1 and summary
Start the file with # Your Company Name, then a one-line > blockquote summarizing what you do. This is the first thing a model reads, so make it concrete and specific.
Step 3: Group links into H2 sections
Create ## Section headings (Services, Guides, About) and, under each, list pages as Markdown links with a short description after a colon. Order by importance, not alphabetically.
Step 4: Serve it at the root and keep it current
Save the file as llms.txt and publish it at yourdomain.com/llms.txt so it is reachable at the root. Then solve the real problem: staleness. A manifest that lies about your site is worse than none. If your platform can generate it from your content (many CMS plugins and static-site setups now can), use that. Otherwise, set a recurring reminder to update it. This is one line item alongside the rest of your technical foundation; do not let it be the one that rots.
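If you do maintain the file by hand, a small check can catch rot before a model does. This is a sketch under stated assumptions: you can produce a set of URLs that currently exist on your site (from your sitemap or CMS, for example), and the names `find_stale_links` and `LINK_URL_RE` are ours.

```python
import re

# Capture the URL from each "- [Title](url)" list item.
LINK_URL_RE = re.compile(r"^- \[[^\]]+\]\(([^)\s]+)\)", re.MULTILINE)


def find_stale_links(llms_txt: str, live_urls: set[str]) -> list[str]:
    """Return manifest URLs that are not in the set of live canonical URLs."""
    listed = LINK_URL_RE.findall(llms_txt)
    return [url for url in listed if url not in live_urls]


MANIFEST = """\
# Example Co
> One-sentence summary.
## Services
- [Local SEO](https://example.com/local-seo/): Local search work.
- [Old Service](https://example.com/old-service/): A page that was removed.
"""

LIVE_URLS = {"https://example.com/local-seo/", "https://example.com/about/"}

stale = find_stale_links(MANIFEST, LIVE_URLS)
print(stale)  # prints: ['https://example.com/old-service/']
```

Wired into a deploy script or a scheduled job, a non-empty result is the reminder.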
Where This Fits in Your AI Search Strategy
Zoom out, because this is where people overinvest. llms.txt is one input at the substrate layer — useful, cheap, and entirely upstream of the thing you actually care about, which is whether AI engines cite you. A perfect manifest earns nothing on its own; citations come from authority, structure, and content that genuinely answers the questions your customers ask.
So treat llms.txt as table stakes, then spend your real effort up the stack. The AI Search Manual is the full playbook for getting cited. Generative engine optimization services are that work done for you. And the Reef Citation Index is how we measure whether any of it actually moves citations across the five major engines — because shipping a file is an input, and a tracked citation is the output that pays for it.
Frequently Asked Questions
What is llms.txt?
llms.txt is a Markdown file placed at your site’s root that gives AI models a curated, token-efficient map of your most important pages. Proposed by Jeremy Howard of Answer.AI in 2024, it is designed to help LLMs find and read your canonical content — not to control crawling.
Is llms.txt the same as robots.txt?
No. robots.txt tells crawlers which URLs they may access; llms.txt suggests which content is worth reading and links to it in Markdown. robots.txt is a permission gate that well-behaved crawlers honor; llms.txt is an opt-in content map that grants or denies nothing.
Does Google support llms.txt?
No. As of June 2026, Google says it does not use llms.txt, and John Mueller has compared it to the old keywords meta tag — something sites add that the engine ignores. Treat llms.txt as an optional convention, not a Google ranking factor.
Does llms.txt improve crawl budget?
No. llms.txt does not change how often or how deeply crawlers visit you — robots.txt rules, internal linking, site speed, and server performance govern crawl budget. llms.txt is a curated reading map for AI models, not a crawl-control file, so it neither raises nor lowers your budget.
What should an llms.txt file include?
An H1 with your site or company name, a blockquote one-line summary, and H2 sections grouping links to your canonical pages — each a Markdown link followed by a short description. Keep it to the pages that matter most; the format is plain Markdown, nothing more.
What is llms-full.txt?
llms-full.txt is the optional companion file that includes the full Markdown body of every page listed in your llms.txt, not just the links. It lets an AI model ingest your actual content in one fetch without parsing HTML. Ours runs about 5,200 lines next to the 158-line manifest.
Do ChatGPT and Claude read llms.txt?
Partly. Anthropic publishes its own llms.txt and its coding tools read the format; OpenAI has not confirmed production use in ChatGPT. As of June 2026 it is consumed mainly by developer and coding agents, not the consumer chat engines — so it is low-cost insurance, not a guaranteed citation lever.
How do I add llms.txt to my website?
Create a plain-text Markdown file named llms.txt, write your H1, summary, and sectioned links, then serve it at yourdomain.com/llms.txt. For most small sites it is a 30-minute technical SEO task; the harder part is keeping it current as pages change, so automate generation if you can.
The short version: build the file, keep it honest, then go earn the citations it can only point toward. If you want to see where your site stands today, our AI search readiness checklist scores machine-readability and the signals that actually move AI visibility — a practical next step after you have shipped your manifest.
Rodrigo Diniz
Founder & Head of Search Strategy at Nekko Digital with 15+ years in digital marketing and AI search optimization.