What is llms.txt?
llms.txt is the emerging standard for telling AI systems — ChatGPT, Claude, Perplexity, Gemini — which pages on your site matter, in what order, and how they relate. Think robots.txt for the LLM era. Lower stakes than robots.txt. Higher leverage for citation.
So what is llms.txt?
A markdown file that lives at the root of your domain, at /llms.txt. It tells AI systems which pages on your site are important, in what order, and how they relate to each other.
The file is markdown. One piece is required — an H1 with the project name. After that come an optional > blockquote summary line and any number of H2 sections containing markdown lists of links. The full open spec lives at llmstxt.org.
That's it. The format is simple on purpose. The leverage comes from how AI systems use it.
How is llms.txt different from robots.txt and sitemap.xml?
Three different jobs, often confused.
| File | Tells the engines | Format |
|---|---|---|
| robots.txt | What you allow them to crawl | Plain text directives |
| sitemap.xml | Which URLs exist on your site | XML with timestamps |
| llms.txt | Which pages matter, in what hierarchy, how they relate | Markdown with structured sections |
robots.txt is permission. sitemap.xml is inventory. llms.txt is curation.
A site can have a clean robots.txt that allows all AI crawlers, a complete sitemap.xml listing every URL, and still get cited poorly because the engines have to guess which pages are the canonical answers to category questions. llms.txt is the operator telling the engines exactly which page answers each question.
Which AI engines actually read llms.txt?
The honest answer — fewer than you'd expect today, more than you'd expect in twelve months.
As of Q1 2026, llms.txt is read by:
- Anthropic / Claude — confirmed reading and weighting llms.txt for site-context queries
- Perplexity — partial — uses llms.txt for navigation hints but doesn't strictly weight the hierarchy
- OpenAI / ChatGPT — under evaluation; not confirmed as a load-bearing input yet
- Google / Gemini — not confirmed; uses sitemap.xml + structured data + Google Search index instead
The trajectory matters more than the snapshot. Anthropic shipped first. Perplexity followed. Open-source frameworks (LangChain, LlamaIndex) read it natively. Most independent agent frameworks built in 2025–2026 read it. Even where ChatGPT and Gemini don't treat it as a load-bearing input today, they're absorbing it through training data — your llms.txt becomes part of the training corpus alongside your other public pages.
The cost of deploying llms.txt is roughly an hour. The leverage compounds across every engine that adopts it over the next 12 to 18 months.
What goes inside llms.txt?
The spec is permissive. Operator practice has converged on a specific structure.
Required line — H1 with the project name.
```
# Acme Corp
```
Recommended — a one-line blockquote summary.
```
> Acme is a B2B SaaS for inventory automation in mid-market retail.
```
Optional — a paragraph or two of context. This is where operator voice can matter. Keep it factual, not promotional.
Recommended — H2 sections grouping links.
```
## What we offer
- [Pricing](https://acme.com/pricing): Five-tier ladder
- [Free trial](https://acme.com/trial): 14-day no-card-needed trial

## Documentation
- [API reference](https://docs.acme.com/api)
- [Integration guide](https://docs.acme.com/integrations)
```
Each list item is - [name](url) or - [name](url): description. The description is optional but high-leverage — it's what the engines extract as context when citing the page.
Optional — an ## Optional section. Items in this section are explicitly de-prioritized. The engines will skip them when context is constrained. Use this for legal pages, archives, or low-priority surfaces.
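Assembled, those pieces form a complete file. The Acme names and URLs below are the placeholders from the snippets above, not a real deployment; the legal and archive links are illustrative entries for the ## Optional section:

```markdown
# Acme Corp

> Acme is a B2B SaaS for inventory automation in mid-market retail.

## What we offer

- [Pricing](https://acme.com/pricing): Five-tier ladder
- [Free trial](https://acme.com/trial): 14-day no-card-needed trial

## Documentation

- [API reference](https://docs.acme.com/api)
- [Integration guide](https://docs.acme.com/integrations)

## Optional

- [Terms of service](https://acme.com/terms)
- [Press archive](https://acme.com/press)
```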
Where does llms.txt live?
At the root of your domain. https://yourdomain.com/llms.txt. Always exactly that path.
A common mistake — putting it at /public/llms.txt or /static/llms.txt. The engines look at the root. If your file isn't at /llms.txt exactly, it's invisible.
For a Next.js site, that means putting llms.txt in your public/ directory so it serves at the root. For Hugo, it goes in static/. For static-HTML sites, it goes in the document root. For headless CMS deployments, it usually requires a redirect or a server-side route.
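As a sketch, the placements above (paths follow each framework's static-asset convention; verify against your own build output):

```text
next-app/public/llms.txt   →  served at /llms.txt
hugo-site/static/llms.txt  →  served at /llms.txt
plain-html/llms.txt        →  document root, served at /llms.txt
```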
How do you validate an llms.txt file?
Three approaches, in order of rigor.
Approach 1 — eye it. Open llmstxt.org in one tab, your llms.txt in another. Compare structure. If the file starts with #, has a > summary, and has H2 sections with - [name](url) lists, you're 90% of the way there.
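Those three eyeball signals can be sketched in stdlib Python. This is a rough sketch of the eye-check only — not the spec and not the Doxia Axis linter; the function name and sample file are illustrative:

```python
import re

def rough_llmstxt_check(text):
    """Approximate the eye-check: H1 first, at least one H2,
    link items shaped like - [name](url) or - [name](url): description."""
    problems = []
    lines = [l for l in text.splitlines() if l.strip()]
    # Signal 1: the file opens with an H1 project name.
    if not lines or not lines[0].startswith("# "):
        problems.append("missing H1 project name")
    # Signal 2: at least one H2 section exists.
    if not any(l.startswith("## ") for l in lines):
        problems.append("no H2 sections")
    # Signal 3: every list item parses as a markdown link.
    link = re.compile(r"^- \[[^\]]+\]\([^)\s]+\)(: .+)?$")
    bad = [l for l in lines if l.startswith("- ") and not link.match(l)]
    if bad:
        problems.append("%d malformed link item(s)" % len(bad))
    return problems

sample = """# Acme Corp
> Acme is a B2B SaaS for inventory automation.

## What we offer
- [Pricing](https://acme.com/pricing): Five-tier ladder
"""
print(rough_llmstxt_check(sample))  # an empty list means the eye-check passes
```

An empty return means the file clears the 90% bar; anything it flags is worth fixing before running the real linter.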
Approach 2 — run a linter. Doxia Axis publishes a free Python stdlib-only linter at github.com/Doxia-Axis/llmstxt-linter. Three input modes — local path, URL, or stdin. Validates structure against the open spec. Exits non-zero on errors. Wire into pre-commit or CI.
```shell
python3 llmstxt_lint.py https://yourdomain.com/llms.txt
```
Approach 3 — let the engines tell you. Ask Claude (or any AI assistant that reads llms.txt) a category-shaped question that should pull from your site. If the answer cites the right pages from your llms.txt hierarchy, the file is working. If the answer cites pages you de-prioritized in ## Optional, your hierarchy is fighting your intent.
What does a good llms.txt actually look like?
A few patterns from operator-grade deployments:
Pattern 1 — frontload the highest-conversion pages. The first H2 section lists your highest-intent surfaces — pricing, audit request, primary product page. Engines that truncate context honor the order.
Pattern 2 — group by use intent, not site structure. The engines read context, not navigation. Group by "what would a user ask?", not by your site's URL hierarchy. "What we offer", "Case studies", "Documentation" beat "/services", "/case-studies", "/docs".
Pattern 3 — write descriptions that pre-answer the question. - [Free audit](https://acme.com/audit): Five-business-day AI visibility audit with revenue-quantified findings beats - [Free audit](https://acme.com/audit). The description is what the engines extract.
Pattern 4 — keep it under 300 lines. Most engine context windows truncate llms.txt files past a certain length. The Doxia Axis llms.txt is under 80 lines and covers the entire site hierarchy. Under 200 is the sweet spot for most operators.
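A sketch of enforcing that ceiling in CI. The /tmp path and four-line sample stand in for your real file; only the 300-line threshold comes from the guidance above:

```shell
# Fail the build if llms.txt drifts past the 300-line ceiling.
# /tmp/llms.txt is a stand-in for your deployed file path.
printf '# Acme Corp\n> One-line summary.\n\n## Docs\n- [API](https://docs.acme.com/api)\n' > /tmp/llms.txt
lines=$(wc -l < /tmp/llms.txt | tr -d ' ')
if [ "$lines" -gt 300 ]; then
  echo "llms.txt too long: $lines lines" >&2
  exit 1
fi
echo "llms.txt length ok: $lines lines"
```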
What about llms-full.txt?
Some operators publish a second file — llms-full.txt — that contains the full text content of every page in markdown. This is more aggressive than llms.txt and more controversial.
The case for it — engines can ingest the full corpus without crawling. Faster, cheaper, more reliable than crawling.
The case against it — content cloning concerns, AI-training opt-in posture (you're literally publishing your content as a training corpus), maintenance overhead.
Most operators we audit don't ship llms-full.txt yet. The cost-benefit isn't clear, and the AI-training opt-in posture is a corporate decision that often requires legal review. We don't recommend it as a default. We recommend the standard llms.txt as the floor.
Should you ship llms.txt today?
Yes. Three reasons:
Reason 1 — the cost is roughly an hour. Audit your existing site hierarchy, write a markdown file mirroring the canonical pages, validate with the linter, deploy at root. One operator-afternoon.
Reason 2 — the engines that read it today weight it materially. Anthropic and Perplexity both use llms.txt as a load-bearing input. Citation lift on those engines after a well-formed llms.txt deployment is measurable inside thirty days.
Reason 3 — the trajectory is up. Every quarter, more engines and more agent frameworks adopt llms.txt. The deployment compounds. A brand without llms.txt today starts every adoption cycle from zero.
So what should you do this week?
Three concrete moves.
Move 1 — check whether you have one.

```shell
curl https://yourdomain.com/llms.txt
```

If it returns 404, you're missing the floor.
Move 2 — write a draft. H1, blockquote summary, three to five H2 sections with curated links. 60 minutes. Use the Doxia Axis llms.txt as a reference if helpful.
Move 3 — validate and deploy. Run the linter. Deploy at root. Test with curl. You're done.
If you want the audit to validate that your llms.txt is well-shaped against your category, /audit ships that as part of the diagnostic.
Where to go from here
- The open spec: llmstxt.org.
- The free linter: github.com/Doxia-Axis/llmstxt-linter.
- What schema matters alongside llms.txt: /answers/what-schema-matters-for-ai-visibility.
- The audit: /audit. Validates llms.txt as part of the technical-GEO scorecard.