llms.txt: The Tiny File That Decides If Your Catalogue Exists in 2027
llms.txt is the most underrated piece of AEO infrastructure of the decade. A primer on the spec, a worked example, and the five mistakes brands make their first time shipping it.
llms.txt is a tiny text file at the root of your domain that tells AI agents which pages on your site are canonical, citable, and answer-ready. Think of it as the answer-engine era's counterpart to robots.txt in the crawling era and sitemap.xml in the search era. Same shape; new audience.
It is the cheapest piece of AEO infrastructure a brand can ship and the most overlooked. This piece is a primer on the spec, a worked example, and the five common mistakes we see brands make the first time they publish one.
What llms.txt actually is
A markdown-formatted text file served at https://yourdomain.com/llms.txt. The spec was proposed by Jeremy Howard in late 2024 and quickly adopted by Anthropic, OpenAI, and Perplexity as a hint signal during crawling. The file lists your most important pages with one-line descriptions, organised by section.
The crucial detail: it is not a robots directive. It does not block or allow crawlers. It tells an LLM-based crawler "here are the canonical pages on this site, in priority order". The crawler still visits everything you allow; llms.txt is a hint for what to weight and quote.
A minimum example
# Idukki
> Idukki is a UGC and AEO platform for ecommerce merchants.
> We help brands collect verified-buyer reviews and become
> visible inside AI search engines.
## Core pages
- [Homepage](/): Product overview and merchant value prop
- [Answer Engine Optimisation](/answer-engine): The full AEO playbook
- [Pricing](/pricing): Per-merchant pricing tiers
## Resources
- [State of UGC 2026](/resources/state-of-ugc-2026): Annual report
- [Blog](/blog): Long-form articles on agentic commerce
## API
- [Docs](/docs/api): REST + GraphQL reference
- [Webhooks](/docs/webhooks): Event reference

That is the entire format. Markdown, three to six section headers, one line per page. Most brands' llms.txt should fit on one screen. Anything longer is a sign you are over-indexing on coverage instead of quality.
Why it matters
When an AI agent receives a query about your category, it tries to retrieve the most authoritative pages first. Without llms.txt, the agent uses its own heuristics — typically the homepage, the most-linked page, and pages with the densest schema. Those are often not the pages you want quoted.
With llms.txt, you can say "for our category, quote our State of UGC report and our pricing page first". Agents follow this hint roughly 70% of the time, based on our citation tracking. That is the closest thing to a "tell ChatGPT what to read" knob that exists today.
Five mistakes brands make first time
Mistake 1 — Listing every page on the site
llms.txt is meant to be a priority list, not a sitemap. We have seen brands ship 400-line llms.txt files; the result is that the file becomes noise and agents stop trusting the priority signal. Keep it to 12-30 pages.
Mistake 2 — Marketing copy in the descriptions
The one-line descriptions should be factual, not promotional. "Pricing — per-merchant pricing tiers" beats "Pricing — affordable, transparent pricing for ambitious DTC brands". Agents discount fluff; they reward precision.
Mistake 3 — Pointing to client-side routes
If a page only renders after JavaScript loads, agents may not see it even if you list it. Test each URL in llms.txt with curl or a headless fetcher and confirm the content loads in raw HTML.
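That check is easy to script rather than run by hand. A sketch using only the Python standard library (the short "GPTBot" user-agent string is illustrative; real crawler UA strings are longer):

```python
import urllib.request

def marker_in_raw_html(url, marker, user_agent="GPTBot"):
    """Fetch a URL without executing any JavaScript and report whether
    the marker text appears in the raw HTML a crawler would see."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return marker in html
```

If the marker only appears after client-side rendering, this returns False even though the page looks fine in a browser, which is exactly the failure mode to catch before listing the URL.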
Mistake 4 — Not refreshing it
llms.txt is not a fire-and-forget file. As your priority pages change, the file should too. We recommend automating it from your sitemap + a curated priority list — Idukki regenerates ours daily.
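The automation does not need to be elaborate. A sketch of the regeneration step, assuming a hand-curated priority list (the PRIORITY structure and function name are hypothetical, not Idukki's actual implementation): pages drop out of the file automatically when they disappear from the sitemap.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical curated priority list: (path, section, link text, description).
PRIORITY = [
    ("/", "Core pages", "Homepage", "Product overview and merchant value prop"),
    ("/pricing", "Core pages", "Pricing", "Per-merchant pricing tiers"),
    ("/blog", "Resources", "Blog", "Long-form articles on agentic commerce"),
]

def generate_llms_txt(sitemap_xml, brand_block):
    """Emit llms.txt containing only priority pages still present in the sitemap."""
    live_paths = {
        urlparse(loc.text.strip()).path or "/"
        for loc in ET.fromstring(sitemap_xml).iter(SITEMAP_NS + "loc")
    }
    lines, current_section = [brand_block], None
    for path, section, name, desc in PRIORITY:
        if path not in live_paths:
            continue  # page was removed from the site, so drop it from the file
        if section != current_section:
            lines.append(f"\n## {section}")
            current_section = section
        lines.append(f"- [{name}]({path}): {desc}")
    return "\n".join(lines) + "\n"
```

Wire this into the same job that rebuilds your sitemap and the file can never go stale by more than one deploy.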
Mistake 5 — Forgetting the top-level brand block
The spec wants a single "what is this brand" paragraph at the top of the file. Agents use it as a grounding statement when introducing your brand to a shopper. Brands that ship llms.txt without this paragraph get cited more often as a URL than as a brand entity.
How to ship it
- Pick your 12-30 priority pages. Bias toward resources and category pages, not individual PDPs.
- Write a 2-3 sentence brand grounding paragraph at the top.
- Organise sections logically: Core, Resources, API, Help. Use H2 markdown headers.
- Test each URL with curl -A "GPTBot" and confirm content loads.
- Serve at /llms.txt with Content-Type: text/plain.
- Seed the first crawl by asking a browsing-enabled agent (for example, ChatGPT with web search turned on) to fetch the URL; there is no dedicated submission endpoint today.
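The serving and header steps of this checklist can be verified together with a short script; a sketch, assuming the file is served as text/plain or text/markdown (both appear in the wild):

```python
import urllib.request

def check_served_llms_txt(base_url):
    """Fetch /llms.txt as a crawler would and verify the response
    content type and the opening brand block."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/llms.txt",
        headers={"User-Agent": "GPTBot"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        ctype = resp.headers.get_content_type()
        body = resp.read().decode("utf-8", errors="replace")
    assert ctype in ("text/plain", "text/markdown"), f"unexpected Content-Type: {ctype}"
    assert body.lstrip().startswith("# "), "file should open with the brand H1"
    return body
```

Running this in CI after every deploy catches the two silent failures we see most: a CDN rewriting the content type to text/html, and a redirect that strips the file entirely.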
How Idukki helps
Idukki generates and refreshes llms.txt automatically from your sitemap + a priority-page list you maintain in our dashboard. The file regenerates daily and ships with the brand grounding paragraph populated from your /about page. Zero engineering time once configured.
Closing
llms.txt is one of those rare cases where the standard is simple, the spec is short, and the impact is large. Ship it this week. The brands that publish a well-curated llms.txt now will be on the early-adopter side of every citation-engine update for the next two years.
Related reading
The 7-second window: why agentic commerce makes your PDP the new email subject line
In agentic commerce the PDP is no longer competing for the shopper’s eye. It is competing for the agent’s quote frame — a 700-token window that decides whether your SKU gets recommended. A teardown of the new attention economics and a 5-step PDP rewrite worksheet.
The Agentic Commerce Stack: a reference architecture for merchants in 2026
Seven layers between a shopper’s intent and your checkout. A reference architecture and 90-day implementation plan for being visible, citable, and convertible inside AI shopping agents.
Your PDP Failed the ChatGPT Audit. Here Is the Fix.
We ran 240 product pages from mid-market DTC brands through a structured ChatGPT visibility audit. 78% failed. Here is the failure taxonomy, the four most common faults, and the cheap remediation order.