
What 8,400 UGC Pieces Told Us About Brand-Safe Content (Data Dump)

A full data dump from 8,400 user-generated photos, videos and reviews analysed for brand-safety signals. The patterns, the false positives, and the moderation thresholds that actually work.

Rohin Aggarwal

Brand safety in UGC is a topic that operators only think about when something goes wrong — usually a high-visibility offensive photo, an inappropriate review, or a screenshot circulating on social media. The rest of the time it is invisible plumbing. This piece is a data dump from 8,400 UGC pieces we moderated across customer programmes from Q4 2025 through Q1 2026, with the goal of making the plumbing visible.

We will share the patterns, the false-positive rates by classifier type, and the moderation thresholds that we have settled on after iterating against real volume.

Composition of the dataset

  • Total pieces: 8,427.
  • Photos: 5,290 (63%).
  • Videos: 1,108 (13%).
  • Text-only reviews: 2,029 (24%).
  • Source channels: WhatsApp 41%, email post-purchase 33%, native PDP capture 16%, Instagram-DM ingest 10%.
  • Categories represented: apparel 38%, beauty 22%, home 18%, food/bev 11%, fitness 6%, baby 5%.

Flag rates by classifier

We run every piece through a stack of classifiers before publication. The flag rates across the dataset:

  • Explicit content (nudity/sexual): 0.18% of photos and videos flagged.
  • Violence or weapons: 0.04%.
  • Hate speech or slurs in text: 0.31% of text reviews.
  • Profanity (mild): 4.8% of text reviews.
  • Minor faces (children): 1.6% of photos.
  • Competitor branding visible: 3.2% of photos and videos.
  • Personally identifiable information (faces with names, phone numbers, etc.): 0.9%.
  • Off-topic content (wrong product, irrelevant photo): 6.1%.
  • Misinformation or unfounded health/safety claims: 0.5% of text reviews.

The total share flagged for any reason: 14.7%. The share that escalates to human moderation: 4.2%. The share that ultimately blocks publication: 1.8%. The remainder is auto-resolved (e.g., profanity masked, faces blurred, competitor logo cropped).
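
To make the stack concrete, here is a minimal sketch of how a piece can be run through a set of classifiers and its flags collected. The classifier interface, the names and the severity bands are illustrative assumptions, not our production code.

```python
from dataclasses import dataclass

# Illustrative severity bands on a 0..1 scale; not our production values.
LOW, MEDIUM, HIGH = 0.3, 0.6, 0.85

@dataclass
class Flag:
    classifier: str  # e.g. "explicit", "hate_speech", "minor_faces", "competitor_logo"
    score: float     # severity returned by the model, 0..1

def run_stack(piece, classifiers):
    """Run every classifier over a submitted piece and keep any non-trivial flag.

    `classifiers` is assumed to map a name to an object exposing a
    .score(piece) method that returns a 0..1 severity; substitute whatever
    your vendor or in-house models actually expose.
    """
    flags = []
    for name, model in classifiers.items():
        score = model.score(piece)
        if score >= LOW:
            flags.append(Flag(name, score))
    return flags
```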

False positives

Classifiers err. The false-positive rates we observed:

  • Explicit content: ~14% false-positive rate. The biggest source of false positives is bathing-suit/swimwear UGC in summer categories.
  • Violence: ~7% false-positive. Mostly hunting/outdoors content with rifles or knives in context.
  • Hate speech: ~22% false-positive. The biggest source is reclaimed slurs and AAVE used by the original speaker community.
  • Minor faces: ~31% false-positive. Hard problem — adults with small features are often flagged.
  • PII: ~8% false-positive. Logos with text on them often trip the name-detector.

The false-positive rate matters more than the false-negative rate operationally. False positives cost moderator time and slow time-to-publish; false negatives cost reputation. The right balance depends on category — beauty and food/bev can run looser; baby and B2B need to run tighter.
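
Rates like these only exist because human decisions are logged against classifier decisions. A rough sketch of deriving per-classifier false-positive rates from a moderation log, assuming each record carries the classifier name, whether it flagged the piece, and the moderator's final verdict (the field names are hypothetical):

```python
from collections import defaultdict

def false_positive_rates(moderation_log):
    """Estimate per-classifier false-positive rates from human review outcomes.

    Each record is assumed to look like:
      {"classifier": "explicit", "flagged": True, "human_verdict": "publish"}
    A flag the moderator overturns by publishing the piece untouched counts
    as a false positive.
    """
    flagged = defaultdict(int)
    overturned = defaultdict(int)
    for record in moderation_log:
        if not record["flagged"]:
            continue
        flagged[record["classifier"]] += 1
        if record["human_verdict"] == "publish":
            overturned[record["classifier"]] += 1
    return {name: overturned[name] / count for name, count in flagged.items()}
```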

Moderation thresholds that work

After 14 months of tuning, the thresholds we settled on:

Auto-publish

Pieces with no classifier flags above the "low" threshold across any category. Roughly 85% of all submitted UGC.

Auto-resolve and publish

Pieces with cosmetic issues — mild profanity in text, minor visible logos, faces in non-primary positions. The system applies the resolution (mask, crop, blur) and publishes automatically. Roughly 9% of all UGC.

Human review

Pieces with classifier flags above the "medium" threshold but below "high". A human moderator looks at it, makes a call, and either publishes, edits or blocks. SLA: 24 hours; actual median: 4 hours. Roughly 4% of all UGC.

Auto-block

Pieces flagged "high" by any classifier (explicit, violence, hate speech, minor faces). Never published. Submitter notified. Roughly 1.8% of all UGC.
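
Taken together, the four tiers reduce to a small routing function over the worst flag a piece carries. The threshold constants and the auto-resolve actions below are stand-ins, not our production configuration; the flags are assumed to carry a classifier name and a 0..1 score as in the earlier sketch.

```python
# Same illustrative severity bands as the earlier sketch.
LOW, MEDIUM, HIGH = 0.3, 0.6, 0.85

# Cosmetic issues that can be fixed automatically instead of escalated.
AUTO_RESOLVE_ACTIONS = {
    "profanity_mild": "mask",
    "competitor_logo": "crop",
    "pii_face": "blur",
}

def route(flags):
    """Map a piece's flags to one of the four moderation tiers."""
    if not flags:
        return "auto_publish", []
    worst = max(f.score for f in flags)
    if worst < LOW:
        return "auto_publish", []
    if worst >= HIGH:
        return "auto_block", []      # never published, submitter notified
    if worst >= MEDIUM:
        return "human_review", []    # 24-hour SLA, ~4-hour median in practice
    # Flags between LOW and MEDIUM: apply any cosmetic fix and publish.
    actions = [AUTO_RESOLVE_ACTIONS[f.classifier]
               for f in flags if f.classifier in AUTO_RESOLVE_ACTIONS]
    return "auto_resolve_publish", actions
```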

Category-specific patterns

Apparel

Highest rate of "swimwear flagged as explicit" false positives. Recommend a category-aware classifier threshold for fashion brands carrying summer SKUs.

Beauty

Highest rate of close-up faces. Need to balance face-detection sensitivity against the legitimate marketing value of close-ups.

Home

Highest rate of competitor logos in the background (other furniture, appliances visible). Auto-crop sensible here.

Food and beverage

Highest rate of misinformation flags — health claims, dietary claims, allergen mentions. Tight thresholds critical for regulated markets.

Baby

Special handling. Minor-face threshold tightened; parental verification required for any photo featuring a child. Slower time-to-publish; higher trust signal.
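
One way to encode these differences is a per-category override table layered over global defaults. A hedged sketch with made-up numbers; the keys and values are illustrative, not the thresholds behind the figures above.

```python
# Global defaults (illustrative values only).
DEFAULT_THRESHOLDS = {
    "explicit": 0.85,
    "minor_faces": 0.85,
    "misinformation": 0.85,
    "competitor_logo": 0.60,
}

# Category overrides mirroring the patterns above: apparel tolerates swimwear,
# food/bev tightens claim checks, baby tightens minor faces and adds a
# verification step before anything featuring a child can publish.
CATEGORY_OVERRIDES = {
    "apparel":  {"explicit": 0.92},
    "home":     {"competitor_logo": 0.45},  # auto-crop kicks in earlier
    "food_bev": {"misinformation": 0.70},
    "baby":     {"minor_faces": 0.60, "require_parental_verification": True},
}

def thresholds_for(category):
    """Merge the global defaults with any category-specific overrides."""
    merged = dict(DEFAULT_THRESHOLDS)
    merged.update(CATEGORY_OVERRIDES.get(category, {}))
    return merged
```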

Open questions

Three things we are still tuning.

  • Cultural context in text moderation. Reclaimed language, in-group jokes, AAVE — all cause false positives at rates we are not satisfied with.
  • AI-generated UGC. As shoppers start submitting AI-generated images of "what I would look like in this", how do we treat them? We currently block; the right answer may be a different tag.
  • Cross-platform identity. When the same user submits via WhatsApp and email under different identities, do we treat the second submission as fresh or as a duplicate?

Closing

UGC moderation is plumbing, and like all plumbing, it is invisible when it works. The data above is the closest most operators will see to a real-volume distribution; the thresholds we have settled on are starting points for your own programme, not gospel. Tune to your category, your channel mix, and your brand-risk tolerance.

#brand-safety
#ugc
#moderation
#data
