Voice Commerce Is Back. And It Is About Your Q&A Schema, Not Your Skill.
Voice commerce failed in 2018 because Alexa skills were the wrong abstraction. In 2026 it is back, on different rails — and the winning surface is your Q&A schema, not a voice app.
Voice commerce had its first wave around 2017-2019. Alexa skills, Google Actions, every DTC brand building a "voice app" that almost no one ever used. The wave broke because the abstraction was wrong: shoppers do not want a brand-specific voice app. They want their existing assistant to know about your products.
In 2026 voice is back, on different rails. ChatGPT Voice, Claude Voice, Gemini Live and Siri's LLM-powered upgrade all give shoppers spoken access to commerce queries against your catalogue — without any brand-specific app. The winning surface is your Q&A schema, not a voice skill.
Why this wave will not break
Three differences from the 2018 wave.
- Voice now runs through general LLM assistants, not brand-specific apps. Zero adoption friction.
- Latency dropped below the conversational threshold: a 2026 voice interaction runs sub-300 ms end-to-end, versus 1.5-2 seconds in 2018.
- Multimodal context. The assistant knows the user is looking at their phone, can see what is on screen, and can blend visual + voice in the same query.
The net effect is a category of interactions that did not exist in 2018: a shopper standing in front of a wardrobe asks their phone "what should I wear for a 12-degree run later", gets a recommendation, and orders the missing piece — all spoken, in roughly 40 seconds.
Where the data the assistant reads lives
Voice assistants do not "have" your product catalogue. They retrieve it on demand from the web, the same way text-based AI agents do. The retrieval favours:
- Q&A schema (QAPage) — read aloud almost verbatim.
- FAQ schema (FAQPage) — read aloud with light paraphrasing.
- Product attribute tables — referenced for specific spec questions.
- AI-summarised review blocks — quoted to convey "what shoppers say".
- Aggregate ratings — read with the count and average.
Notice what is not on this list: long-form product descriptions, marketing copy, video descriptions, hero imagery. Voice strips everything visual and pulls the structured, machine-readable text. If your PDP is 80% imagery and 20% structured data, your voice presence is small.
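To make the contrast concrete, here is what the machine-readable layer of a PDP can look like: a minimal FAQPage JSON-LD sketch, built in Python so it could be templated from catalogue data. The question and answer text are illustrative, not from a real store.

```python
import json

# Minimal FAQPage JSON-LD sketch. Field values are illustrative
# placeholders, not real catalogue data.
faq_jsonld = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "What is the return window?",
            "acceptedAnswer": {
                "@type": "Answer",
                # Front-loaded, numeral-first answer: voice assistants
                # tend to read only the first sentence aloud.
                "text": "14 days from delivery. Items must be unworn "
                        "with tags attached.",
            },
        }
    ],
}

print(json.dumps(faq_jsonld, indent=2))
```

The same shape works for QAPage; the structural point is that every answer the assistant might speak lives in a plain `text` field rather than in markup or imagery.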
Optimising for voice
Make every answer voice-friendly
Read each FAQ answer aloud. If it sounds like a marketing brochure when spoken, rewrite. If it contains acronyms a voice assistant would mispronounce ("EPDM", "GSM"), expand them. If it is longer than 30 spoken seconds, split it.
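A rough lint pass can catch the length and acronym problems before publication. A sketch, assuming roughly 150 spoken words per minute and treating any all-caps token of two or more letters as a potential acronym; the thresholds and the `KNOWN_EXPANSIONS` table are assumptions, not a standard:

```python
import re

# Assumed reading rate: ~150 spoken words per minute.
WORDS_PER_SECOND = 150 / 60
# 30-second ceiling per answer, per the guideline above.
MAX_SPOKEN_SECONDS = 30
# Expansions for acronyms a TTS engine may mispronounce.
KNOWN_EXPANSIONS = {
    "EPDM": "ethylene propylene diene monomer",
    "GSM": "grams per square metre",
}

def check_answer(text: str) -> list[str]:
    """Return a list of voice-friendliness issues for one FAQ answer."""
    issues = []
    spoken_seconds = len(text.split()) / WORDS_PER_SECOND
    if spoken_seconds > MAX_SPOKEN_SECONDS:
        issues.append(f"too long: ~{spoken_seconds:.0f}s spoken; split it")
    # Flag all-caps tokens of 2+ letters as acronyms to expand.
    for acronym in sorted(set(re.findall(r"\b[A-Z]{2,}\b", text))):
        hint = KNOWN_EXPANSIONS.get(acronym, "expand on first use")
        issues.append(f"acronym {acronym}: {hint}")
    return issues

print(check_answer("The upper is 300 GSM mesh."))
# → ['acronym GSM: grams per square metre']
```

Run it over every `acceptedAnswer.text` in the schema export; an empty list per answer is the target.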
Front-load the answer
Voice assistants quote the first 1-2 sentences and skip the rest. Get the answer in the first sentence. Reserve elaboration for sentences 2-3.
Use numerals, not spelled-out numbers
"14 days" beats "fourteen days": numerals are unambiguous, scan well when quoted in answer snippets, and modern TTS engines expand them correctly when read aloud.
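A sketch of that normalisation, using a deliberately small word list. A production pass would lean on a number-parsing library rather than this hand-rolled mapping:

```python
import re

# Deliberately small mapping for illustration; a real pass would use
# a number-parsing library to cover compounds like "twenty-one".
NUMBER_WORDS = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "ten": "10", "fourteen": "14", "thirty": "30",
}

def numeralise(text: str) -> str:
    """Replace spelled-out numbers from NUMBER_WORDS with numerals."""
    def repl(match: re.Match) -> str:
        return NUMBER_WORDS[match.group(0).lower()]
    pattern = r"\b(" + "|".join(NUMBER_WORDS) + r")\b"
    return re.sub(pattern, repl, text, flags=re.IGNORECASE)

print(numeralise("Returns accepted within fourteen days."))
# → Returns accepted within 14 days.
```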
Pronunciation overrides
If your brand name or a key product name has an unusual pronunciation, ship a pronunciation hint via the text-to-speech x-pronunciation header. Idukki, for example, would emit a hint that the stress falls on the second syllable.
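A sketch of emitting such a hint. The x-pronunciation header name comes from the description above, but its value format here is hypothetical, as is the IPA string for Idukki; verify what the assistants you target actually consume (SSML `<phoneme>` tags are the common alternative for TTS engines) before shipping:

```python
# Hypothetical header format: "<name>; ipa=\"<IPA transcription>\"".
# Neither the format nor the IPA value below is from a published spec.
def pronunciation_headers(name: str, ipa: str) -> dict[str, str]:
    """Build response headers carrying a TTS pronunciation hint."""
    return {"x-pronunciation": f'{name}; ipa="{ipa}"'}

# Example: stress on the second syllable of "Idukki".
print(pronunciation_headers("Idukki", "ɪˈdʊki"))
```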
What not to build
Three things teams get tempted to build that they should skip:
- A brand-specific voice app. Same reason it failed in 2018; nobody installs it.
- A custom wake-word. Adoption is a non-starter; users say "Hey ChatGPT" or "Hey Siri", not "Hey BrandName".
- An on-device voice assistant. Heavy engineering for a feature the OS-level assistants do better.
Measurable impact
Voice referrals are currently 1-3% of AI-engine referrals on mid-market stores — small, but growing 30-40% quarter-on-quarter through 2026. Brands that invest in voice-friendly Q&A schema now are seeing voice-driven sessions convert at 1.5-2x the rate of typed-AI sessions, presumably because the shopper has already verbally committed by the time they reach the PDP.
Closing
Voice commerce in 2026 is not a category to staff a team against. It is a property of well-structured Q&A and FAQ content. If you have already done the AEO work on schema, you are 80% of the way to a voice-friendly catalogue. The remaining 20% is half a week of editorial polish.
Related reading
The Future Trends of Conversational Commerce
Ten trends shaping conversational commerce through 2028: agent personalities, multimodal storefronts, voice-first cart, agentic loyalty, and the disappearance of the checkout button. From an Idukki product perspective.
Anatomy of a Conversational PDP: The 9 Components Every Shop Needs by Q3
The reference architecture for a conversational product page in 2026. Nine components, what each one does, and the order to ship them in.
Multilingual UGC at Scale: Why Translation Kills Conversion, and What to Do Instead
Auto-translating reviews into the buyer's language is the obvious move. It is also wrong. Here is the data, the alternative model, and the rollout playbook for ten-locale stores.