The Agentic Commerce Stack: a reference architecture for merchants in 2026

Seven layers between a shopper’s intent and your checkout. A reference architecture and 90-day implementation plan for being visible, citable, and convertible inside AI shopping agents.

Rohin Aggarwal

The phrase "agentic commerce" entered the trade press around the back end of 2024 and within nine months had become unfalsifiable. Every vendor deck claims to be doing it. Every analyst report cites it as the next platform shift. Most of the writing about it is wrong, or right in a way that doesn’t help you ship anything by Friday.

This piece is for the merchant — the person responsible for whether the next quarter’s revenue happens. It does two things. First, it lays out a seven-layer reference architecture for what agentic commerce actually is from a buyer’s perspective, drawn from working with brands shipping into ChatGPT shopping, Claude Computer Use, Perplexity, Gemini and a half-dozen private retail-side agents. Second, it gives you a 90-day plan for moving across those layers, with what to build, what to buy, and what to push back to your platform vendors.

There is a poster version of the diagram below. It prints cleanly at A3. If you do nothing else after this piece, print one and stick it next to the team’s roadmap. Watch how many planning conversations it shortens.

Why a stack, and why now

A new buyer interface always brings out the stack diagram. We had one for browser-based commerce in 1997, one for app-based commerce in 2011, one for marketplace commerce around 2016. Each was a way for builders to admit the surface above the line had eaten enough of the buying journey that the surface below it needed to be designed for, not against.

In 2026, the surface above the line is the AI agent. By the time a shopper clicks through to a PDP, they have often already had a substantive conversation with an agent that narrowed the choice set to one or two SKUs. The PDP is the last 30 seconds, not the first 10 minutes. That re-orders every implementation priority you had.

Three observations make the stack worth drawing as a stack instead of a flat checklist:

  • Each layer depends on the layer below it. You cannot rank in citation signals (L5) if your schema surface (L3) is broken. You cannot serve an agent tool call (L4) if your catalogue identity (L1) is ambiguous.
  • The layers are not all yours to own. L0–L2 are upstream infrastructure (the storefront, the PIM, the embeddings provider). L3–L5 are where most merchants compete today. L6–L7 are the moat — the closing 18% of revenue and the data feedback loop that keeps you ahead.
  • Most teams have invested heavily in L3 and barely thought about L4 or L7. The asymmetry is the opportunity.

L7 · Measurement & feedback (your moat)

Measurement is the top layer, not the bottom, because in agentic commerce you cannot improve what you cannot see. The buying decision now happens inside a black box rented from someone else. If you do not instrument the shape of the box from the outside, you are flying blind.

The unit of measurement is no longer the page view. It is the prompt. For every high-intent prompt that matters in your category — "best linen jacket under £180", "what is the best CRM for a 4-person team", "non-toxic mascara that doesn’t smudge" — you need to know three things every week: (a) does the agent answer the prompt at all, (b) does it cite your brand inside the answer, and (c) does it cite a specific SKU of yours.

Tools that compose into a workable L7 today:

  • A prompt panel — 50–300 head prompts you care about, scored weekly across at least four engines (ChatGPT, Claude, Perplexity, Gemini). Run as a cron job; store as a time-series so you can see drift. A sketch of this job follows the list.
  • Citation tracking — record which sources the agent cited for each prompt. This is where you discover that Reddit and a specific reviewer matter more than your own PDP.
  • Server-side referral attribution — every agent surface that drives a click sends a referer or a specific UTM. Capture both at the load balancer, not the client.
  • A "prompt-share" index — the share-of-voice equivalent for AI engines. We publish ours weekly; it should be a board-level number inside 12 months.
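A minimal sketch of that weekly job, assuming you wire askEngine to each engine's own API (or whatever answer-capture tooling you use) and score citations with a plain string match against brand and SKU aliases; the prompt list, the alias lists and the storage step are placeholders for your own setup.

```ts
// prompt-panel.ts — weekly citation-scoring job (sketch).
// askEngine() is a stub: connect it to each engine's own API or capture tooling.
type Engine = "chatgpt" | "claude" | "perplexity" | "gemini";

interface PanelResult {
  runDate: string;      // ISO date of the weekly run
  engine: Engine;
  prompt: string;
  answered: boolean;    // (a) did the engine answer at all
  brandCited: boolean;  // (b) is the brand named in the answer
  skuCited: boolean;    // (c) is a specific SKU / product name cited
}

const PROMPTS: string[] = [/* your 50–300 head prompts, weighted by revenue */];
const BRAND_ALIASES = ["acme", "acme studio"];      // placeholder
const SKU_NAMES = ["meridian linen jacket"];        // placeholder

async function askEngine(engine: Engine, prompt: string): Promise<string> {
  throw new Error(`wire ${engine} up to its API here`);
}

function score(answer: string): Pick<PanelResult, "answered" | "brandCited" | "skuCited"> {
  const text = answer.toLowerCase();
  return {
    answered: text.length > 0,
    brandCited: BRAND_ALIASES.some((b) => text.includes(b)),
    skuCited: SKU_NAMES.some((s) => text.includes(s)),
  };
}

export async function runPanel(): Promise<PanelResult[]> {
  const runDate = new Date().toISOString().slice(0, 10);
  const results: PanelResult[] = [];
  for (const engine of ["chatgpt", "claude", "perplexity", "gemini"] as Engine[]) {
    for (const prompt of PROMPTS) {
      const answer = await askEngine(engine, prompt).catch(() => "");
      results.push({ runDate, engine, prompt, ...score(answer) });
    }
  }
  return results; // append to your time-series store so you can see drift week over week
}
```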

What good looks like: every Monday, your growth lead reads a one-page report showing citation rate per engine per prompt cluster, a list of the five prompts you regressed on, and a list of the three new prompts where you broke into the top of the answer. That is the dashboard that turns into a quarterly board number.

Most common mistake: treating GA4 as enough. GA4 sees the click after the agent has decided. You need the prompt, the answer, and the citation list — not just the session.

L6 · Conversion & checkout handoff

Once an agent has decided to recommend your SKU, the next 30 seconds determine whether the recommendation becomes revenue. This layer is the one most merchants under-engineer in 2026 because it looks like it has always existed. It hasn’t. The cart and the checkout were designed for a human clicking through hyperlinks. The agent does not click. It reads, parses, and either generates a deep-link or, increasingly, drives a browser via Computer Use / Operator-style protocols.

What this layer must produce:

  1. A deterministic deep-link per SKU + variant. Not just /products/jacket but /products/jacket?variant=42&size=M&color=oat that pre-fills the cart server-side. Without this, the agent has to guess the URL shape, which it does poorly.
  2. A machine-readable buy primitive on the PDP — schema.org Offer with availability, price, currency, and shipDate. Plus ARIA labels on the buy button that a Computer-Use agent can find without guessing.
  3. A "buy via link" endpoint that converts an agent’s intent (sku + qty + variant) into a single URL that drops the shopper directly into checkout. Stripe, Shop Pay and several headless cart APIs do this; what they don’t do is make it the default. A sketch follows this list.
  4. A graceful fallback when an agent tries Computer Use on your site. If your checkout has a captcha that fires for any non-headed browser, the agent will hand the cart back to the human with the buy unfinished. You will lose that order to a competitor whose checkout doesn’t.
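A minimal sketch of the "buy via link" primitive from point 3, written as a single Fetch-API redirect handler. The Shopify-style /cart/{variantId}:{qty} permalink, the query-parameter shape and resolveVariantId are assumptions; substitute your platform's documented deep-link grammar.

```ts
// buy-link.ts — convert agent intent (sku + qty + variant options) into a checkout deep-link.
// Runs anywhere that speaks the Fetch API (Cloudflare Workers, Deno, Node 18+ behind a framework).
interface BuyIntent { sku: string; qty: number; size?: string; color?: string }

// Placeholder: resolve a SKU + options to the platform's variant id via your catalogue API.
async function resolveVariantId(intent: BuyIntent): Promise<string | null> {
  return null; // wire to your PIM / storefront API
}

export async function handleBuyLink(request: Request): Promise<Response> {
  const url = new URL(request.url);
  const intent: BuyIntent = {
    sku: url.searchParams.get("sku") ?? "",
    qty: Number(url.searchParams.get("qty") ?? "1"),
    size: url.searchParams.get("size") ?? undefined,
    color: url.searchParams.get("color") ?? undefined,
  };
  if (!intent.sku) return new Response("missing sku", { status: 400 });

  const variantId = await resolveVariantId(intent);
  if (!variantId) return new Response("unknown sku/variant", { status: 404 });

  // Shopify-style cart permalink; swap in your platform's documented grammar.
  const checkoutUrl = `https://shop.example.com/cart/${variantId}:${intent.qty}`;
  return Response.redirect(checkoutUrl, 302);
}
```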

What good looks like: an agent can take a shopper from "yes, the jacket" to a completed order in a single message with a single link. Time-from-intent-to-checkout is under 12 seconds. The cart deeplink survives across the agent’s and the user’s device contexts.

Most common mistake: leaving your checkout protected by anti-bot defaults that fire on agent traffic. Run a test — ask Claude or ChatGPT with Computer Use to buy your own product end-to-end and watch where it breaks. You will be surprised. Fix those breakpoints; they are usually one config away.

L5 · Citation & ranking signals

This is the layer that most resembles SEO from the 2010s, and that resemblance is exactly why teams misjudge it. The signals are different. The decay is different. The shape of the winning answer is different.

Six signals that empirically move citation rate (from a panel of 1,200 prompts run across four engines over 90 days):

Table 1. Citation-rate lift by signal type. Panel: 1,200 prompts, four engines, 90-day rolling window.

Signal · Effect on citation rate · Hardest part

Original UGC reviews on the PDP · +39% vs none · Volume + freshness
Creator video on the PDP · +22% vs none · Rights + tagging
Structured Q&A on the PDP · +18% vs none · Authentic answers, not marketing
Third-party press citation · +14% per ref · Earning real coverage
Author byline + verifiable expertise · +11% · Real people, real history
llms.txt manifest pointing at canonical URLs · +9% · None; ship it tomorrow

Notice what is not on the list: meta-keywords, alt text in isolation, image SEO. They have not stopped mattering for classical search; they have stopped mattering enough to be the lever that decides whether an agent cites you.

What good looks like: every priority PDP carries at least 20 UGC photos or videos, a recent review (last 90 days), at least 4 structured Q&A pairs, and a byline on any long-form content. There is an llms.txt at the root pointing crawlers at the canonical product feed.

Most common mistake: treating UGC as decoration. Agents read it. They quote from it. A PDP whose reviews are paginated behind a "load more" button that requires JavaScript is invisible to half the crawlers that matter. Render the first 12 reviews server-side and you will see citation rate move in a week.

L4 · Agent-facing APIs & tools

This is the layer that didn’t exist three years ago and now matters as much as your homepage. It is the answer to the question: when an agent decides it needs more information than the public page offers, what API do you give it?

Three concrete things to build, in this order:

  1. A product-feed RPC — take Google Merchant Center as the template, add availability + variant graph + price-by-region. Make it free, public, fast (sub-200ms p95), and stable. Most agents discover catalogues via these feeds before they ever touch your storefront.
  2. A reviews and Q&A endpoint — the agent should be able to fetch the latest 20 reviews and the top 10 Q&A pairs for a given SKU in a single call. Idukki ships this; you can also self-host with your existing review provider if it has a sane API.
  3. An MCP server — yes, you should run one. Anthropic’s Model Context Protocol gives you a deterministic surface for agent tool calls. A merchant MCP exposes search-products, get-product, get-inventory, get-reviews and start-checkout as named functions. That is the entire contract.
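A minimal sketch of that MCP server, assuming the @modelcontextprotocol/sdk TypeScript package and zod for tool parameters; the SDK surface is still evolving, so check the current docs, and the catalogue lookups here are stubs you would back with the feed RPC from point 1.

```ts
// merchant-mcp.ts — minimal merchant MCP server (sketch).
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "merchant-catalogue", version: "0.1.0" });

// Placeholder: back this with your product-feed RPC so every tool reads the same data.
async function searchCatalogue(query: string) {
  return [{ sku: "JCK-042", title: `Result for "${query}"`, price: "149.00 GBP",
            url: "https://shop.example.com/products/jacket" }];
}

server.tool(
  "search-products",
  "Search the catalogue with a natural-language query; returns SKU, title, price and canonical URL.",
  { query: z.string() },
  async ({ query }) => ({
    content: [{ type: "text" as const, text: JSON.stringify(await searchCatalogue(query)) }],
  }),
);

server.tool(
  "get-inventory",
  "Return live availability for a SKU and optional variant.",
  { sku: z.string(), variant: z.string().optional() },
  async ({ sku, variant }) => ({
    content: [{ type: "text" as const,
                text: JSON.stringify({ sku, variant, inStock: true, shipsWithinDays: 3 }) }],
  }),
);

const transport = new StdioServerTransport();
await server.connect(transport);
```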

What good looks like: an agent looking at your category page can answer "do you have this jacket in oatmeal, size M, in the EU warehouse, shipping within three days?" without ever rendering HTML. The latency is below 300ms p95.

Most common mistake: thinking your existing GraphQL API is good enough. It is not. Your GraphQL API was designed for your front-end engineers; agent tools want named, fixed-shape functions with stable contracts. Build the MCP; let the GraphQL API keep doing its job.

L3 · Schema & semantic surface

Schema is the cheapest layer to fix and one of the highest-leverage. Most merchants ship Product schema in 2026 and almost nothing else. The list below is the minimum semantic surface a citable storefront emits:

  • Product (with brand, sku, gtin, mpn, image, description, category, color, material, audience)
  • Offer (with price, priceCurrency, availability, priceValidUntil, shippingDetails)
  • AggregateRating (only if you have ≥50 verified reviews — do not fake)
  • Review (each review as its own JSON-LD, not just an aggregate)
  • HowTo on any "how to use", "how to install", "how to size" page
  • FAQPage on PDPs that carry Q&A blocks
  • BreadcrumbList on every page nested below the root
  • Article + author Person on every blog post (with sameAs to verified profiles)
  • Organization with sameAs to your verified social accounts
  • WebSite with potentialAction for search

Two emerging schemas worth adopting early: speakable on long-form content (helps voice agents lift sections cleanly) and ItemList on category pages (some engines cite category pages as the canonical answer to "what brands sell X").

What good looks like: every priority page emits at least four valid schemas. Schema validates clean in Google’s Rich Results Test and Schema Markup Validator. JSON-LD is server-rendered, not injected client-side after hydration.
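A sketch of that last point, under the assumption that the PDP is rendered from a catalogue record you control: build the Product + Offer JSON-LD from the same record that renders the page and emit it in the initial HTML. The record shape is a placeholder; the property names are standard schema.org.

```ts
// product-jsonld.ts — build Product + Offer JSON-LD server-side from the catalogue record.
interface CatalogueRecord {
  sku: string; gtin: string; brand: string; title: string; description: string;
  imageUrl: string; color: string; material: string;
  price: string; currency: string; inStock: boolean; url: string;
}

export function productJsonLd(p: CatalogueRecord): string {
  const doc = {
    "@context": "https://schema.org",
    "@type": "Product",
    name: p.title,
    sku: p.sku,
    gtin13: p.gtin,
    brand: { "@type": "Brand", name: p.brand },
    description: p.description,
    image: [p.imageUrl],
    color: p.color,
    material: p.material,
    offers: {
      "@type": "Offer",
      url: p.url,
      price: p.price,
      priceCurrency: p.currency,
      availability: p.inStock
        ? "https://schema.org/InStock"
        : "https://schema.org/OutOfStock",
    },
  };
  // Emit inside the initial HTML, not after hydration:
  //   <script type="application/ld+json">{productJsonLd(record)}</script>
  return JSON.stringify(doc);
}
```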

Most common mistake: emitting schema that doesn’t match the visible page. Agents now cross-validate. A Product page claiming AggregateRating 4.9 with no visible review block reads as a manipulation attempt and gets demoted across at least two of the four engines we test.

L2 · Embedding & retrieval

This layer is invisible to most merchants but underpins how agents actually find your products when they are searching, not when they are fetching a known SKU. If a shopper asks an agent "find a linen jacket that doesn’t wrinkle", the agent runs a semantic search across an embedding index. If your products are in that index with good vectors, you are a candidate. If not, you are not.

What to embed, and how:

  1. Embed the product title + structured attributes + the first 200 words of the description (see the sketch after this list). Truncating naively here destroys recall; structured attributes (color, material, occasion, audience) matter more than the marketing copy.
  2. Embed the cover image with a multimodal model. Voyage, Cohere multimodal, and OpenAI’s CLIP-family models all work; the differences matter at the margin but any of them beats no visual embeddings at all.
  3. Embed each review separately. Reviews carry the language a shopper uses (“doesn’t wrinkle”, “smells like vanilla”, “snags on velcro”) which marketing copy avoids.
  4. Update on every catalogue change. Stale embeddings are how an agent confidently recommends a product that’s been out of stock for a quarter.
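A sketch of point 1, assuming the official openai npm client and the text-embedding-3-small model; any embeddings provider with a batch endpoint slots in the same way, and the product record shape is a placeholder.

```ts
// embed-catalogue.ts — build the text to embed per SKU and batch it to an embeddings API.
import OpenAI from "openai";

interface Product {
  sku: string; title: string; description: string;
  attributes: Record<string, string>; // color, material, occasion, audience…
}

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Title + structured attributes + first ~200 words of description, per the guidance above.
function embeddingText(p: Product): string {
  const attrs = Object.entries(p.attributes).map(([k, v]) => `${k}: ${v}`).join("; ");
  const first200Words = p.description.split(/\s+/).slice(0, 200).join(" ");
  return `${p.title}\n${attrs}\n${first200Words}`;
}

export async function embedCatalogue(products: Product[]): Promise<Map<string, number[]>> {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: products.map(embeddingText),
  });
  // res.data comes back in input order; index i matches products[i].
  return new Map(res.data.map((d, i) => [products[i].sku, d.embedding]));
}
```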

What good looks like: your full catalogue is embedded in a hybrid (lexical + vector) index that exposes a public retrieval endpoint (per L4 above). Recall on a held-out test set of natural-language queries is above 70% in the top-5.

Most common mistake: leaving this to the platform. Shopify and BigCommerce both ship a built-in semantic search. Both are configured for an average merchant’s catalogue, not yours. Run the recall test on your own product space and you will find at least 30% of natural-language queries that miss you entirely.

L1 · Identity & catalogue normalisation

Of the seven layers, this is the one most likely to be in worse shape than you think. Merchant catalogues in 2026 still have duplicate SKUs, ambiguous variant graphs, missing GTINs, and "brand" fields that are sometimes the manufacturer and sometimes the retailer. Agents trip over this on day one. They cite the wrong product. They quote the wrong price. They send shoppers to a dead variant URL.

The non-negotiables at L1:

  • A canonical SKU per saleable unit. Variants roll up under a parent product. Discontinued SKUs do not get re-used.
  • A real GTIN or MPN on every physical product. If you don’t have it, get it from the manufacturer. Without it, agents cannot disambiguate you from a marketplace listing.
  • A clear "brand" field that points to a single canonical brand entity, with a sameAs to your verified Wikipedia / Wikidata / Crunchbase page if one exists.
  • A variant graph that lists every dimension (size, color, material) with consistent values. "Oatmeal" and "Oat" are not two colors.
  • Live availability + price at the variant level. Stale inventory is the largest single source of agent-driven complaints.

What good looks like: you can answer "what is the canonical URL, GTIN and current price for this SKU?" with a single API call in under 100ms, and the answer is the same one your finance team sees in the PIM.

Most common mistake: assuming your platform’s default catalogue is good enough. Shopify product IDs are not GTINs. WooCommerce SKUs are free-text. You need a PIM-grade normalisation pass over the data your platform stores. Do it once, document the contract, then enforce it via CI on every product update.
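A minimal sketch of that CI gate, assuming zod for the contract; the field names, SKU format and allowed colour list are placeholders for whatever your own contract says.

```ts
// catalogue-contract.ts — validate every product update against the L1 contract in CI.
import { z } from "zod";

const ALLOWED_COLORS = ["oatmeal", "black", "navy"] as const; // "Oatmeal" and "Oat" are not two colors

const Variant = z.object({
  sku: z.string().regex(/^[A-Z0-9-]+$/, "canonical SKU format"),
  gtin: z.string().regex(/^\d{8}(\d{4,6})?$/, "GTIN-8/12/13/14 required"),
  size: z.string(),
  color: z.enum(ALLOWED_COLORS),
  price: z.number().positive(),
  currency: z.string().length(3),
  availability: z.enum(["in_stock", "out_of_stock", "preorder"]),
});

const Product = z.object({
  parentSku: z.string(),
  brand: z.string().min(1),
  brandSameAs: z.string().url().optional(), // Wikidata / Wikipedia / Crunchbase entity
  canonicalUrl: z.string().url(),
  variants: z.array(Variant).min(1),
});

// In CI: fail the pipeline on the first product that breaks the contract.
export function validateCatalogue(payloads: unknown[]): void {
  for (const p of payloads) {
    const parsed = Product.safeParse(p);
    if (!parsed.success) {
      throw new Error(`catalogue contract violation: ${parsed.error.message}`);
    }
  }
}
```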

Build, buy, or push to your vendor

No merchant is going to build all seven layers from scratch. The decision matrix below is the one we use in scoping conversations — it has held up well across about 40 brands.

Table 2. Default build/buy/push recommendations by layer. Reverse the recommendation if you are the platform.

Layer · Default (build, buy, or push?) · Why

L7 Measurement · Build · No vendor sees your prompt set or your moat data the way you do.
L6 Conversion handoff · Buy (Stripe / Shop Pay) + tune · Solved primitives; the tuning is yours.
L5 Citation signals · Buy (Idukki, reviews vendor) + earn · The platform is bought; the content is earned.
L4 Agent APIs / MCP · Build (it is your moat) · Off-the-shelf MCPs do not yet know your taxonomy.
L3 Schema surface · Build (small one-off, then maintain) · No vendor can guess every page type for you.
L2 Embedding & retrieval · Buy (Voyage / Cohere / OpenAI) + run · Models commodified; your data is the differentiator.
L1 Catalogue normalisation · Push (to platform / PIM vendor) · Standardising GTINs is the platform's job.
L0 Storefront · Buy (Shopify / Woo / BC / Wix) · The market answered this question in 2014.

A 90-day implementation plan

If you have read this far, you are probably looking at your own stack and wondering where to start. Here is the plan we run when a brand asks "what do we do this quarter?" — ordered by leverage per engineering day.

Days 1–10 · Audit + telemetry

  1. Compile the prompt panel: 50–300 head queries that matter to you, weighted by revenue.
  2. Run the panel against ChatGPT, Claude, Perplexity, and Gemini. Record citation rate, ranking, and source list per prompt. This is your baseline.
  3. Audit your schema surface across the top 50 pages by traffic. Use Schema Markup Validator. Note every page that fails or is missing a recommended schema.
  4. Audit your catalogue: pick 50 random SKUs and verify canonical URL, GTIN, availability and price match between platform, PIM and product feed.
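A sketch of step 4, assuming you can fetch the same SKU from the platform admin API and from the public product feed; both loaders and the record shape are placeholders.

```ts
// audit-catalogue.ts — spot-check 50 random SKUs across platform and feed.
interface SkuRecord { canonicalUrl: string; gtin: string; price: string; availability: string }

// Placeholders: wire these to your platform admin API and your public product feed.
async function fromPlatform(sku: string): Promise<SkuRecord> { throw new Error("not wired"); }
async function fromFeed(sku: string): Promise<SkuRecord> { throw new Error("not wired"); }

function sample<T>(items: T[], n: number): T[] {
  return [...items].sort(() => Math.random() - 0.5).slice(0, n);
}

export async function auditCatalogue(allSkus: string[]): Promise<string[]> {
  const mismatches: string[] = [];
  for (const sku of sample(allSkus, 50)) {
    const [platform, feed] = await Promise.all([fromPlatform(sku), fromFeed(sku)]);
    for (const field of ["canonicalUrl", "gtin", "price", "availability"] as const) {
      if (platform[field] !== feed[field]) {
        mismatches.push(`${sku}: ${field} differs (${platform[field]} vs ${feed[field]})`);
      }
    }
  }
  return mismatches; // your Day-10 baseline: how dirty is L1 really?
}
```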

Days 11–30 · Quick wins (L3 + L5)

  1. Add the missing schemas you identified. Most teams gain 4–6 schema types in this window.
  2. Server-render the first 12 UGC reviews and 4 Q&A blocks on every PDP. This single change moves citation rate within a week.
  3. Ship an llms.txt at the root pointing at your canonical product feed + key category pages + author bios (a sketch follows this list).
  4. Add real author bylines on long-form content with a sameAs to a verified LinkedIn profile.
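A sketch of step 3. The llms.txt format is still an informal proposal (llmstxt.org), so treat the structure below as an assumption; the durable part is a stable, server-rendered manifest at the root that points crawlers at the canonical feed, key category pages and author bios. URLs are placeholders.

```ts
// llms-txt.ts — serve a stable llms.txt at the site root (Fetch-API handler).
const LLMS_TXT = `# Acme Studio
> Direct-to-consumer linen clothing. Canonical product data lives in the feed below.

## Products
- [Product feed (JSON)](https://shop.example.com/feeds/products.json)
- [Jackets](https://shop.example.com/collections/jackets)

## About
- [Author bios](https://shop.example.com/about/authors)
`;

export function handleLlmsTxt(): Response {
  return new Response(LLMS_TXT, {
    headers: { "content-type": "text/plain; charset=utf-8", "cache-control": "max-age=3600" },
  });
}
```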

Days 31–60 · L4 (the moat layer)

  1. Build the public product-feed RPC (sketched after this list). Spec: REST or GraphQL, < 200ms p95, open access or signed-token auth if you must rate-limit.
  2. Build the reviews + Q&A endpoint per SKU. Pair with the L5 work so the data is rich.
  3. Build a minimal MCP server exposing search_products, get_product, get_variants, check_inventory, get_reviews, start_checkout. Self-host or run on Cloudflare Workers.
  4. Publish the MCP endpoint to Anthropic’s server directory. Iterate on the tool descriptions weekly based on how agents call the tools — the tool description is the prompt the agent sees.
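A sketch of the product-feed RPC from step 1 as a single Fetch-API handler (the export default { fetch } shape runs as-is on Cloudflare Workers); the feed loader is a placeholder, and the field set loosely mirrors Google Merchant Center attributes plus variant and region pricing.

```ts
// product-feed.ts — public, fast, stable product-feed endpoint (sketch).
interface FeedItem {
  sku: string; parentSku: string; title: string; gtin: string; brand: string;
  link: string; imageLink: string;
  availability: "in_stock" | "out_of_stock" | "preorder";
  priceByRegion: Record<string, string>; // e.g. { GB: "149.00 GBP", EU: "169.00 EUR" }
}

// Placeholder: load from a PIM export / KV cache so the p95 stays well under 200ms.
async function loadFeed(): Promise<FeedItem[]> {
  return [];
}

export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    const sku = url.searchParams.get("sku");
    const items = await loadFeed();
    const body = sku ? items.filter((i) => i.sku === sku || i.parentSku === sku) : items;
    return new Response(JSON.stringify({ updatedAt: new Date().toISOString(), items: body }), {
      headers: { "content-type": "application/json", "cache-control": "public, max-age=300" },
    });
  },
};
```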

Days 61–90 · L7 (measurement, your moat)

  1. Operationalise the prompt panel from Day 1 as a weekly cron. Store results as time-series, alert on regressions.
  2. Add server-side referral attribution at the load balancer. Capture agent-referer headers, sticky UTM parameters and prompt-fingerprint cookies (see the sketch after this list).
  3. Build the prompt-share dashboard — share of voice in the answer, share of voice in citations, share of conversions.
  4. Take the dashboard to your CMO + CFO. Get the prompt-share index added to the quarterly board pack.
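A sketch of step 2 as edge middleware that classifies each request before it reaches the origin; the known-agent referrer list and the internal header names are assumptions to refine against what you actually see in your logs.

```ts
// agent-attribution.ts — capture agent referrals at the edge, before the client can drop them.
const KNOWN_AGENT_REFERRERS = ["chatgpt.com", "perplexity.ai", "claude.ai", "gemini.google.com"]; // adapt to your logs

export function classifyRequest(request: Request): Record<string, string> {
  const url = new URL(request.url);
  const referer = request.headers.get("referer") ?? "";
  const userAgent = request.headers.get("user-agent") ?? "";

  const agentReferral = KNOWN_AGENT_REFERRERS.find((d) => referer.includes(d)) ?? "";
  const utmSource = url.searchParams.get("utm_source") ?? "";

  // Attach these as internal request headers (or log fields) so server-side analytics
  // sees the agent origin even when no client-side JavaScript ever runs.
  return {
    "x-agent-referral": agentReferral,
    "x-utm-source": utmSource,
    "x-raw-user-agent": userAgent,
  };
}
```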

A word on L0 — your storefront

We did not write a full section on L0 because it is mostly a solved problem. Shopify, BigCommerce, WooCommerce, Wix and Magento have done the job of giving you a stable storefront. None of them was designed for agentic commerce, but most of them are catching up.

What to push your platform on, in 2026:

  • A clean public product-feed RPC at L4, not just the Google Merchant Center one.
  • A first-class MCP server you can extend, not just an app marketplace.
  • Server-rendered schema across every template by default. You should not have to install a plugin to emit Product JSON-LD in 2026.
  • A predictable deep-link grammar for cart pre-fill. /cart/add?items=SKU:1 belongs in the docs, not in a community thread.

If your platform is not moving on these in the next 12 months, start the conversation about what migration looks like. Not because you should leave, but because the platforms that move fastest here will compound for years.

Closing — the shape of the next two years

The story I tell brands is this: in 2014 we all spent a quarter rewriting our pages for mobile. We took the same content and chopped it into a vertical layout. The brands that won were not the ones with the prettiest mobile site — they were the ones whose underlying product information was already correct, whose checkout already worked across devices, whose review content already loaded fast enough to render on a 3G connection. The mobile-first migration was a forcing function for hygiene.

Agentic commerce is the same shape of forcing function. The brands that win the next two years are not the ones who ship the cleverest agent feature. They are the ones whose catalogue is clean, whose schema is correct, whose reviews are real, whose checkout is deterministic, whose telemetry is honest. Most of that work pays off in classical search too. None of it is glamorous. All of it is necessary.

Print the poster. Pin it where the roadmap conversations happen. Pick the highest-leverage layer your team has not yet touched. Start there. Ship a thing this Friday.

#agentic-commerce
#reference-architecture
#aeo
#ai-search
#ecommerce-strategy
#mcp
