llms.txt: The Tiny File That Decides If Your Catalogue Exists in 2027
llms.txt is the most underrated piece of AEO infrastructure of the decade. A primer on the spec, a worked example, and the five mistakes brands make their first time shipping it.
llms.txt is a tiny text file at the root of your domain that tells AI agents which pages on your site are canonical, citable, and answer-ready. Think of it as the answer-engine era's counterpart to robots.txt in the crawling era and sitemap.xml in the search era. Same shape; new audience.
It is the cheapest piece of AEO infrastructure a brand can ship and the most overlooked. This piece is a primer on the spec, a worked example, and the five common mistakes we see brands make the first time they publish one.
What llms.txt actually is
A markdown-formatted text file served at https://yourdomain.com/llms.txt. The spec was proposed by Jeremy Howard in late 2024 and quickly adopted by Anthropic, OpenAI, and Perplexity as a hint signal during crawling. The file lists your most important pages with one-line descriptions, organised by section.
The crucial detail: it is not a robots directive. It does not block or allow crawlers. It tells an LLM-based crawler "here are the canonical pages on this site, in priority order". The crawler still visits everything you allow; llms.txt is a hint for what to weight and quote.
A minimum example
# Idukki
> Idukki is a UGC and AEO platform for ecommerce merchants.
> We help brands collect verified-buyer reviews and become
> visible inside AI search engines.
## Core pages
- [Homepage](/): Product overview and merchant value prop
- [Answer Engine Optimisation](/answer-engine): The full AEO playbook
- [Pricing](/pricing): Per-merchant pricing tiers
## Resources
- [State of UGC 2026](/resources/state-of-ugc-2026): Annual report
- [Blog](/blog): Long-form articles on agentic commerce
## API
- [Docs](/docs/api): REST + GraphQL reference
- [Webhooks](/docs/webhooks): Event reference

That is the entire format. Markdown, three to six section headers, one line per page. Most brands' llms.txt should fit on one screen. Anything longer is a sign you are over-indexing on coverage instead of quality.
Why it matters
When an AI agent receives a query about your category, it tries to retrieve the most authoritative pages first. Without llms.txt, the agent uses its own heuristics — typically the homepage, the most-linked page, and pages with the densest schema. Those are often not the pages you want quoted.
With llms.txt, you can say "for our category, quote our State of UGC report and our pricing page first". Agents follow this hint roughly 70% of the time, based on our citation tracking. That is the closest thing to a "tell ChatGPT what to read" knob that exists today.
Five mistakes brands make first time
Mistake 1 — Listing every page on the site
llms.txt is meant to be a priority list, not a sitemap. We have seen brands ship 400-line llms.txt files; the result is that the file becomes noise and agents stop trusting the priority signal. Keep it to 12-30 pages.
Mistake 2 — Marketing copy in the descriptions
The one-line descriptions should be factual, not promotional. "Pricing — per-merchant pricing tiers" beats "Pricing — affordable, transparent pricing for ambitious DTC brands". Agents discount fluff; they reward precision.
Mistake 3 — Pointing to client-side routes
If a page only renders after JavaScript loads, agents may not see it even if you list it. Test each URL in llms.txt with curl or a headless fetcher and confirm the content loads in raw HTML.
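That check is easy to script rather than run by hand. A sketch using only the Python standard library (the short "GPTBot" user-agent string is illustrative; real crawler UA strings are longer):

```python
import urllib.request

def marker_in_raw_html(url, marker, user_agent="GPTBot"):
    """Fetch a URL without executing any JavaScript and report whether
    the marker text appears in the raw HTML a crawler would see."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return marker in html
```

If the marker only appears after client-side rendering, this returns False even though the page looks fine in a browser, which is exactly the failure mode to catch before listing the URL.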
Mistake 4 — Not refreshing it
llms.txt is not a fire-and-forget file. As your priority pages change, the file should too. We recommend automating it from your sitemap + a curated priority list — Idukki regenerates ours daily.
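The automation does not need to be elaborate. A sketch of the regeneration step, assuming a hand-curated priority list (the PRIORITY structure and function name are hypothetical, not Idukki's actual implementation): pages drop out of the file automatically when they disappear from the sitemap.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical curated priority list: (path, section, link text, description).
PRIORITY = [
    ("/", "Core pages", "Homepage", "Product overview and merchant value prop"),
    ("/pricing", "Core pages", "Pricing", "Per-merchant pricing tiers"),
    ("/blog", "Resources", "Blog", "Long-form articles on agentic commerce"),
]

def generate_llms_txt(sitemap_xml, brand_block):
    """Emit llms.txt containing only priority pages still present in the sitemap."""
    live_paths = {
        urlparse(loc.text.strip()).path or "/"
        for loc in ET.fromstring(sitemap_xml).iter(SITEMAP_NS + "loc")
    }
    lines, current_section = [brand_block], None
    for path, section, name, desc in PRIORITY:
        if path not in live_paths:
            continue  # page was removed from the site, so drop it from the file
        if section != current_section:
            lines.append(f"\n## {section}")
            current_section = section
        lines.append(f"- [{name}]({path}): {desc}")
    return "\n".join(lines) + "\n"
```

Wire this into the same job that rebuilds your sitemap and the file can never go stale by more than one deploy.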
Mistake 5 — Forgetting the top-level brand block
The spec wants a single "what is this brand" paragraph at the top of the file. Agents use it as a grounding statement when introducing your brand to a shopper. Brands that ship llms.txt without this paragraph get cited more often as a URL than as a brand entity.
How to ship it
- Pick your 12-30 priority pages. Bias toward resources and category pages, not individual PDPs.
- Write a 2-3 sentence brand grounding paragraph at the top.
- Organise sections logically: Core, Resources, API, Help. Use H2 markdown headers.
- Test each URL with curl -A "GPTBot" and confirm content loads.
- Serve at /llms.txt with Content-Type: text/plain.
- Seed the first crawl by asking a browsing-enabled agent (for example, ChatGPT with web search turned on) to fetch the URL; there is no dedicated submission endpoint today.
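The serving and header steps of this checklist can be verified together with a short script; a sketch, assuming the file is served as text/plain or text/markdown (both appear in the wild):

```python
import urllib.request

def check_served_llms_txt(base_url):
    """Fetch /llms.txt as a crawler would and verify the response
    content type and the opening brand block."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/llms.txt",
        headers={"User-Agent": "GPTBot"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        ctype = resp.headers.get_content_type()
        body = resp.read().decode("utf-8", errors="replace")
    assert ctype in ("text/plain", "text/markdown"), f"unexpected Content-Type: {ctype}"
    assert body.lstrip().startswith("# "), "file should open with the brand H1"
    return body
```

Running this in CI after every deploy catches the two silent failures we see most: a CDN rewriting the content type to text/html, and a redirect that strips the file entirely.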
How Idukki helps
Idukki generates and refreshes llms.txt automatically from your sitemap + a priority-page list you maintain in our dashboard. The file regenerates daily and ships with the brand grounding paragraph populated from your /about page. Zero engineering time once configured.
Closing
llms.txt is one of those rare cases where the standard is simple, the spec is short, and the impact is large. Ship it this week. The brands that publish a well-curated llms.txt now will be on the early-adopter side of every citation-engine update for the next two years.
Related reading
The 7-second window: why agentic commerce makes your PDP the new email subject line
In agentic commerce the PDP is no longer competing for the shopper’s eye. It is competing for the agent’s quote frame — a 700-token window that decides whether your SKU gets recommended. A teardown of the new attention economics and a 5-step PDP rewrite worksheet.
The Agentic Commerce Stack: a reference architecture for merchants in 2026
Seven layers between a shopper’s intent and your checkout. A reference architecture and 90-day implementation plan for being visible, citable, and convertible inside AI shopping agents.
Your PDP Failed the ChatGPT Audit. Here Is the Fix.
We ran 240 product pages from mid-market DTC brands through a structured ChatGPT visibility audit. 78% failed. Here is the failure taxonomy, the four most common faults, and the cheap remediation order.