The 42-signal AEO scorecard: rate any PDP for citation-readiness in 10 minutes
A 42-question rubric grouped into 7 categories. Each signal scored 0/1/2 with what to fix. Pin it next to your category leads’ desks. Includes a downloadable CSV scorecard you can import into Sheets or Notion.
Every team I talk to about AEO eventually asks the same question: "how do we know which of our PDPs are ready, and which need rewriting?" The answer is usually the same shrug. They've read the principles. They have not built a way to score against them.
This piece is the scorecard. Forty-two signals across seven categories, each scored 0/1/2. Print it, walk a category lead through their top SKUs, and see which ones fall below the line. Scoring takes about ten minutes per PDP once you have done the first one. You will find your top three quick wins in the first hour.
How the scoring works
Each of the 42 signals is scored 0, 1 or 2. The rubric is the same across signals:
- 0 — the signal is missing or actively wrong
- 1 — the signal is present but partial, marketing-flavoured, or behind JavaScript
- 2 — the signal is present, server-rendered, factually correct, and matches what is visible on the page
Max score per PDP: 84. We use these thresholds across about 40 brands:
Table 1. Score interpretation and recommended action, from roughly 40 brands measured.

| Score band | Interpretation | What to do this quarter |
| --- | --- | --- |
| 68–84 | Citation-ready. Top quartile. | Maintain. Re-score quarterly. |
| 52–67 | Quotable but inconsistent. | Fix the 5 lowest-scoring signals. |
| 36–51 | Visible to crawlers, rarely cited. | Three-week rewrite. Start with categories 1, 3, 4. |
| 0–35 | Invisible to most agents. | Schedule a full audit. Likely a wider site-architecture issue. |
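The thresholds in Table 1 reduce to a small lookup. Here is a sketch in Python; the function name and band labels are this article's shorthand, not part of any published tool:

```python
def band(score: int) -> str:
    """Map a 0-84 PDP score to its Table 1 band."""
    if not 0 <= score <= 84:
        raise ValueError("score must be between 0 and 84")
    if score >= 68:
        return "Citation-ready"
    if score >= 52:
        return "Quotable but inconsistent"
    if score >= 36:
        return "Visible to crawlers, rarely cited"
    return "Invisible to most agents"
```

Drop this into whatever sheet-export pipeline you use so the band updates as scores change, rather than being read off the table by hand.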
No score is perfect. We have brands at 76 whose conversion is still flat — usually because their L4 + L7 (agent tools and measurement) are not in place yet (see the Agentic Commerce Stack reference architecture). We have brands at 58 whose citation rate is climbing because they fixed the right five signals first. The score is a focus tool, not a destination.
Category 1 · Identity & factual descriptor (6 signals)
The first 200 words of the PDP — what the page is about, factually. Agents lean on these signals more than any other category.
- 1.1 — H1 contains the product name, not a tagline. (0/1/2)
- 1.2 — Sub-headline is a factual descriptor sentence, not marketing copy. (0/1/2)
- 1.3 — Product category is stated in plain English in the first paragraph. (0/1/2)
- 1.4 — Materials or core ingredients listed in the first 200 words. (0/1/2)
- 1.5 — Care, sizing or usage constraints surfaced (not hidden in a tab). (0/1/2)
- 1.6 — Brand name appears with a sameAs link to a verified source (Wikipedia / official LinkedIn). (0/1/2)
Category 2 · Use cases & intent matching (6 signals)
Bullets that match shopper intents (occasion, season, body type, comparison context). The "works well for X" section.
- 2.1 — At least three use-case bullets are present, server-rendered. (0/1/2)
- 2.2 — Each use-case bullet contains at least one quantifiable fact. (0/1/2)
- 2.3 — Use cases map to common shopper queries in your category (do the prompt-panel test). (0/1/2)
- 2.4 — Comparison paragraph vs. an obvious alternative is on the page. (0/1/2)
- 2.5 — "Best for X" framing appears for the top 3 use cases. (0/1/2)
- 2.6 — "Not great for X" anti-recommendation present (this is rare and powerful). (0/1/2)
Category 3 · Specs & structured data (6 signals)
The machine-readable surface. JSON-LD, OpenGraph, materials list.
- 3.1 — Product JSON-LD with brand, sku, gtin/mpn, image, description, category emits server-side. (0/1/2)
- 3.2 — Offer JSON-LD with price, currency, availability, priceValidUntil, shippingDetails. (0/1/2)
- 3.3 — AggregateRating JSON-LD only if ≥50 verified reviews, and matches visible content. (0/1/2)
- 3.4 — BreadcrumbList JSON-LD. (0/1/2)
- 3.5 — Visible spec list (5–10 rows) with units and material percentages. (0/1/2)
- 3.6 — Variant graph is consistent (same dimensions across SKUs, no duplicate colour names). (0/1/2)
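For signals 3.1 and 3.2, the shape of a passing Product + Offer block looks roughly like the sketch below. Every value here is a placeholder, and your platform's schema plugin may nest shippingDetails differently; the point is which keys are present and that the block is emitted server-side inside a `script type="application/ld+json"` tag:

```python
import json

# Illustrative Product + Offer JSON-LD covering signals 3.1 and 3.2.
# All names, IDs, and prices are placeholders, not real data.
product_ld = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Merino Crew Sweater",
    "brand": {"@type": "Brand", "name": "Example Brand"},
    "sku": "MCS-001-NVY-M",
    "gtin13": "0123456789012",
    "image": "https://example.com/images/mcs-001-nvy.jpg",
    "description": "Mid-weight crew-neck sweater in 100% 18.5-micron merino wool.",
    "category": "Apparel > Knitwear > Sweaters",
    "offers": {
        "@type": "Offer",
        "price": "128.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
        "priceValidUntil": "2025-12-31",
        "shippingDetails": {
            "@type": "OfferShippingDetails",
            "shippingRate": {
                "@type": "MonetaryAmount",
                "value": "0",
                "currency": "USD",
            },
        },
    },
}

print(json.dumps(product_ld, indent=2))
```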
Category 4 · Reviews & social proof (6 signals)
UGC and reviews are the highest-leverage citation signal. This category alone moves citation rate 30%+ when fixed.
- 4.1 — At least 5 reviews server-rendered, not behind a "load more" button. (0/1/2)
- 4.2 — Reviews include verified-buyer tags where applicable. (0/1/2)
- 4.3 — Each review has its own Review JSON-LD (not just the aggregate). (0/1/2)
- 4.4 — The first review on the page addresses the most common product objection. (0/1/2)
- 4.5 — Customer photos or video appear on the PDP, not just star ratings. (0/1/2)
- 4.6 — At least one review is from the last 90 days (freshness). (0/1/2)
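Signal 4.3 asks for a Review object per review, not just the aggregate. An illustrative shape follows; the author, date, and review text are placeholders, though note how the body also satisfies 4.4 by addressing a sizing objection with a specific:

```python
import json

# Sketch of one per-review Review JSON-LD object (signal 4.3),
# emitted alongside the aggregate. All values are placeholders.
review_ld = {
    "@context": "https://schema.org",
    "@type": "Review",
    "itemReviewed": {"@type": "Product", "name": "Merino Crew Sweater"},
    "author": {"@type": "Person", "name": "Maya R."},
    "datePublished": "2025-01-14",
    "reviewRating": {"@type": "Rating", "ratingValue": "4", "bestRating": "5"},
    "reviewBody": (
        "Runs about half a size small; the 18.5-micron wool is soft "
        "enough to wear without a base layer."
    ),
}

print(json.dumps(review_ld, indent=2))
```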
Category 5 · Q&A & FAQ structure (6 signals)
- 5.1 — At least 5 Q&A pairs are present on the page. (0/1/2)
- 5.2 — FAQPage JSON-LD emits cleanly server-side. (0/1/2)
- 5.3 — Questions are real questions buyers ask (mine your CS inbox, not marketing). (0/1/2)
- 5.4 — Answers contain specifics (units, percentages, conditions). (0/1/2)
- 5.5 — At least one Q&A addresses the most common returns reason. (0/1/2)
- 5.6 — Q&A is below the first 500 words, not above (don't crowd the descriptor block). (0/1/2)
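Signal 5.2's FAQPage block can be generated from the same Q&A pairs you render on the page, which keeps the schema and the visible content in sync (and keeps 5.2 consistent with 5.1). A minimal sketch; the helper name and the example pairs are ours, and the real pairs should come from your CS inbox per 5.3:

```python
import json

def faq_jsonld(pairs):
    """Build a FAQPage JSON-LD dict from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in pairs
        ],
    }

# Illustrative pairs only; note the specifics (units, conditions)
# that signal 5.4 asks for.
pairs = [
    ("Does the sweater shrink in the wash?",
     "Machine-wash cold and lay flat to dry: expect under 2% shrinkage."),
    ("Is the wool itchy?",
     "18.5-micron merino; most wearers find it soft enough for bare skin."),
]

print(json.dumps(faq_jsonld(pairs), indent=2))
```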
Category 6 · Content, authorship & freshness (6 signals)
- 6.1 — Long-form content (style guide, sizing essay, care article) is linked from the PDP. (0/1/2)
- 6.2 — Long-form content has a real byline with sameAs to a verified profile. (0/1/2)
- 6.3 — Article JSON-LD on the long-form content carries dateModified, not just datePublished. (0/1/2)
- 6.4 — HowTo schema on care / sizing / installation content. (0/1/2)
- 6.5 — At least one third-party citation (press, expert review, magazine feature) is linked from the PDP. (0/1/2)
- 6.6 — Brand emits an llms.txt manifest at the root. (0/1/2)
Category 7 · Technical hygiene (6 signals)
- 7.1 — Page renders cleanly without JavaScript (test in Lynx or curl). (0/1/2)
- 7.2 — Canonical URL is set and matches the rendered page. (0/1/2)
- 7.3 — hreflang annotations correct for multi-region storefronts. (0/1/2)
- 7.4 — Page is in the sitemap with a recent lastmod. (0/1/2)
- 7.5 — robots.txt allows the agent crawlers you want (GPTBot, ClaudeBot, PerplexityBot, Google-Extended). (0/1/2)
- 7.6 — A public product-feed RPC exists at /api/products (any agent can fetch the canonical record). (0/1/2)
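Signal 7.5 can be spot-checked locally with Python's standard-library robots.txt parser before you ever look at crawl logs. A sketch; the bot list mirrors the one above, and the robots.txt content and URLs are placeholders for your own storefront:

```python
from urllib.robotparser import RobotFileParser

# The agent crawlers from signal 7.5.
AGENT_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def blocked_bots(robots_txt: str, page_url: str) -> list[str]:
    """Return the agent crawlers that cannot fetch page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AGENT_BOTS if not parser.can_fetch(bot, page_url)]

# Placeholder robots.txt: GPTBot is only kept out of checkout,
# and bots with no matching group default to allowed.
robots = """\
User-agent: GPTBot
Disallow: /checkout/

User-agent: BadBot
Disallow: /
"""

print(blocked_bots(robots, "https://example.com/products/merino-crew"))
```

Run it against your live robots.txt for each template URL; an empty list means all four crawlers can reach the page.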
A worked example
Here's a real (anonymised) score from a brand we ran the rubric with two months ago. Their flagship PDP scored as follows on first pass:
Table 2. A real PDP first-pass score before any rewrites. Three-week rewrite brought it to 71.
| Category | Max | Score | Notes |
| --- | --- | --- | --- |
| 1 · Identity & descriptor | 12 | 6 | H1 was a tagline. Materials buried in a tab. |
| 2 · Use cases & intent | 12 | 3 | No comparison paragraph. Vague bullets. |
| 3 · Specs & structured data | 12 | 8 | Product schema fine; Offer missing shippingDetails. |
| 4 · Reviews | 12 | 7 | Loox loads client-side; agents miss them. |
| 5 · Q&A & FAQ | 12 | 2 | Tab-based, JS-only, marketing-voiced. |
| 6 · Content & authorship | 12 | 4 | No bylines. No long-form linked. |
| TOTAL | 84 | 40 | "Visible to crawlers, rarely cited" band. |
The fix-list that came out of this was four items: rewrite the opening sentence + bullets, server-render reviews, add 5 honest Q&A pairs, ship the comparison paragraph. Three weeks of work. Citation rate moved from 8% to 23% on the head prompt set within 30 days.
Closing — score, then ship
The scorecard is not the work. The work is the rewrite that comes after. The scorecard exists so the rewrite is targeted instead of vague; so the third PDP doesn't take three weeks because your team is debating which signal mattered.
Run it on five PDPs this week. Score them in pen, fast. Find the worst-scoring category that's shared across all five. Fix that category first. Re-score in 30 days. Watch what moves.
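If you would rather generate the grid than download it, the 42-row CSV template is trivial to produce. A sketch; the column names are our suggestion, and the category labels match the seven above:

```python
import csv
import io

# The seven rubric categories, six signals each (42 rows total).
CATEGORIES = [
    "Identity & factual descriptor",
    "Use cases & intent matching",
    "Specs & structured data",
    "Reviews & social proof",
    "Q&A & FAQ structure",
    "Content, authorship & freshness",
    "Technical hygiene",
]

def scorecard_csv() -> str:
    """Return a blank 42-signal scorecard as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["signal_id", "category", "score_0_1_2", "notes"])
    for cat_idx, cat in enumerate(CATEGORIES, start=1):
        for sig_idx in range(1, 7):  # six signals per category
            writer.writerow([f"{cat_idx}.{sig_idx}", cat, "", ""])
    return buf.getvalue()

print(scorecard_csv().splitlines()[0])
```

Paste the output into Sheets or Notion, fill in the 0/1/2 column per signal, and sum it per PDP.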
Related reading
Agentic AI in Shopping — A Merchant Primer
A short, honest primer for ecommerce operators trying to separate signal from noise on agentic AI. What it is, what to actually do this quarter, and what to ignore until 2027.
The Citation Gap: We Tracked 1,200 Brands Inside ChatGPT for 90 Days
A field study of citation distribution across 1,200 DTC brands inside ChatGPT, Claude and Perplexity. The findings are starker than expected, and they predict the next 18 months of AEO competition.
GPTBot vs Googlebot: Divergence in Crawl, Citation and Conversion
GPTBot and Googlebot are not the same bot with a different name. They crawl on different cadences, weight different signals, and produce traffic with wildly different conversion profiles. The operational implications are real.