Toxic Comments Experience
Overview
Between 2016 and 2019, The Coral Project (a collaboration among Mozilla, The New York Times, and The Washington Post) built open-source tools to improve online community interactions. One flagship effort integrated early machine-learning toxicity detection through Google’s Perspective API to help newsroom moderators triage potentially harmful comments faster, while keeping final editorial judgment fully human.
My Role and Scope
I led product design for the Toxic Comments experience end-to-end, partnering closely with engineering and working directly with moderators to ensure the system felt useful, legible, and trustworthy.
What I owned
- The moderation queue and review flow (from “incoming comment” to “decision”).
- The way toxicity scores were displayed, explained, and acted on.
- Interaction patterns that communicated “suggestion” versus “enforcement.”
- Usability validation with moderators and iteration based on feedback.
Who I collaborated with
- Engineers implementing the Perspective API integration.
- Product stakeholders and newsroom moderation leads.
- Teams responsible for policy guidance and community operations.
Problem
News comment sections were increasingly overwhelmed by harassment, spam, and hate speech. Many early approaches treated automation as an enforcement layer (auto-hiding or deleting content), which created two major issues:
- Trust gap: users could not understand why their content disappeared.
- Editorial risk: automation removed nuance in borderline cases (context, reclaimed language, sarcasm).
Coral took a different stance: moderation is a conversation, not a purge.
Constraints
This work sat at a tricky intersection of policy, UX, and imperfect ML signals. Key constraints shaped the design:
- False positives and false negatives: a score is not a verdict.
- Moderator time pressure: triage needed to be fast, not cognitively expensive.
- Context sensitivity: meaning changes with thread context and user history.
- Transparency expectations: moderators needed to understand what the model was “seeing.”
- Human accountability: decisions needed to be reviewable and defensible.
Solution
Instead of using AI as a gatekeeper, we used it as an assistant.
- Perspective API produced a toxicity score (a probability-style signal, not a yes/no); a request sketch follows this list.
- Moderators saw the score alongside comment and thread context.
- The experience surfaced high-risk comments for review, without auto-enforcing removal.
- Over time, moderator feedback helped calibrate how teams interpreted and acted on the scores.
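To make the “signal, not verdict” idea concrete, here is a minimal sketch of requesting a toxicity score from Perspective API. The endpoint and response fields follow Perspective’s public documentation; the function name, the single-attribute request, and the absence of batching, retries, and error handling are illustrative simplifications, not the actual Coral integration.

```ts
// Minimal sketch: request a TOXICITY score for one comment from Perspective API.
// Endpoint and field names follow the public Perspective API docs; everything
// else (function name, lack of retries/batching) is illustrative only.

const PERSPECTIVE_URL =
  "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze";

interface ToxicitySignal {
  score: number; // probability-style value in [0, 1], not a yes/no verdict
}

async function scoreComment(text: string, apiKey: string): Promise<ToxicitySignal> {
  const response = await fetch(`${PERSPECTIVE_URL}?key=${apiKey}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      comment: { text },
      languages: ["en"],
      requestedAttributes: { TOXICITY: {} },
    }),
  });
  const data = await response.json();
  // summaryScore.value is the probability-style signal shown to moderators.
  return { score: data.attributeScores.TOXICITY.summaryScore.value };
}
```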
Key Design Decisions
1) Treat the score as a signal, not a verdict
Decision: Make the interface communicate “this is guidance,” not “this is enforcement.”
- Why: Overconfident UI causes over-trust (moderators defer to the model) or backlash (moderators reject it entirely).
- Tradeoff: A more careful presentation can slow first-time comprehension, but it protects trust long-term.
- What I designed: score presentation that supported quick triage while reinforcing that humans decide (see the triage-hint sketch below).
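One way to picture the decision is the sketch below, which maps a raw score to advisory copy and queue ordering only. The labels and thresholds are hypothetical, not Coral’s actual values; the point is that nothing in the mapping publishes or removes a comment.

```ts
// Hypothetical mapping from a toxicity score to a triage hint.
// Labels and thresholds are examples for illustration, not Coral's real values.

type TriageHint = "likely fine" | "worth a look" | "review first";

function triageHint(score: number): TriageHint {
  if (score >= 0.85) return "review first"; // surfaced earlier in the queue
  if (score >= 0.6) return "worth a look";
  return "likely fine";
}

// The hint affects only ordering and presentation; approving or rejecting a
// comment stays an explicit moderator action elsewhere in the flow.
```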
2) Keep context visible at decision time
Decision: Optimize the review flow around conversation context, not just a single comment.
- Why: Toxicity is often contextual (piling-on, targeted harassment, sarcasm, reclaimed terms).
- Tradeoff: More context means more on-screen information, so the layout had to stay scannable.
- What I designed: a queue that preserved thread cues and let moderators act quickly without losing the surrounding conversation (see the data-shape sketch below).
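A rough way to express “context travels with the comment” is a queue-item shape like the one below. The field names are assumptions made for this sketch, not Coral’s actual data model.

```ts
// Illustrative queue-item shape: the scored comment plus the context a moderator
// needs to judge it. Field names are assumptions, not Coral's actual schema.

interface QueueItem {
  comment: {
    id: string;
    body: string;
    authorId: string;
    createdAt: string;
  };
  toxicityScore: number; // Perspective-style probability signal
  thread: {
    parentExcerpt?: string;   // what this comment replies to
    recentSiblings: string[]; // nearby replies, useful for spotting pile-ons
    articleTitle: string;     // the story the conversation is attached to
  };
  authorHistory: {
    priorRejections: number;
    accountAgeDays: number;
  };
}

// Sorting by toxicityScore prioritizes likely problems, while keeping thread
// and authorHistory on the same record means moderators never have to leave
// the queue to recover context.
```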
3) Support trust calibration, not blind adoption
Decision: Add lightweight cues that helped moderators learn when the score was helpful and when to be skeptical.
- Why: Moderators needed to build a mental model for the system, especially in edge cases.
- Tradeoff: Too much explanation becomes noise; too little becomes mystery.
- What I designed: interaction patterns that encouraged “check the context” behavior and reinforced accountability.
Results and Impact
This work helped demonstrate a human-centered approach to AI-assisted moderation in a newsroom setting.
- Faster triage: moderators could focus attention on higher-risk comments first.
- More defensible decisions: keeping context and avoiding auto-enforcement supported consistent, reviewable calls.
- A repeatable pattern: using ML as a prioritization signal (rather than a silent gate) later became a common approach across industry moderation tooling.
What I’d Improve Next
If I were extending this today, I’d focus on:
- Better explanations for edge cases: clearer “why this might be high” cues, without turning moderators into model debuggers.
- Calibration controls: per-community thresholds and presets aligned to policy and staffing (sketched after this list).
- Feedback loops: simpler ways for moderators to flag “model wrong” moments and see the system adapt over time.
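For the calibration-controls idea, a per-community configuration might look something like the sketch below. This is a hypothetical shape, not an existing Coral feature; the names, thresholds, and sample values are illustrative.

```ts
// Hypothetical per-community calibration config; names and values are
// illustrative only and do not describe an existing Coral feature.

interface CommunityModerationConfig {
  communityId: string;
  // Score at or above which comments are surfaced for review first.
  reviewFirstThreshold: number;
  // Optional stricter threshold for low-staffing periods (e.g. overnight).
  lowStaffingThreshold?: number;
  // Whether moderators can flag "the model got this wrong" on a decision,
  // feeding a feedback loop for future calibration.
  collectModelFeedback: boolean;
}

const exampleConfig: CommunityModerationConfig = {
  communityId: "example-newsroom",
  reviewFirstThreshold: 0.8,
  lowStaffingThreshold: 0.7,
  collectModelFeedback: true,
};
```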