
Coral: Toxic Comments Experience

Led product design for Coral's toxicity detection integration, creating moderation workflows that balanced ML scoring with human judgment. The work demonstrated an early, human-centered pattern for using ML as a review signal rather than an automatic enforcement mechanism.

  • Product Design
  • UX Strategy
  • UX Research

At a Glance

  • Problem: Moderators were overwhelmed by harassment, while heavy-handed automation risked eroding trust and losing nuance.
  • Design stance: ML as a review signal, not an enforcement mechanism.
  • My role: Led end-to-end product design with engineers and newsroom moderators.
  • Impact: Faster triage, more defensible decisions, and a reusable pattern for AI-assisted moderation.
Toxic Comments moderation queue with toxicity scores and context for human review

Toxic Comments Experience

Overview

Between 2016 and 2019, The Coral Project (a collaboration among Mozilla, The New York Times, and The Washington Post) built open-source tools to improve online community interactions. One flagship effort integrated early machine-learning toxicity detection through Google’s Perspective API to help newsroom moderators triage potentially harmful comments faster, while keeping final editorial judgment fully human.

My Role and Scope

I led product design for the Toxic Comments experience end-to-end, partnering closely with engineering and working directly with moderators to ensure the system felt useful, legible, and trustworthy.

What I owned

  • The moderation queue and review flow (from “incoming comment” to “decision”).
  • The way toxicity scores were displayed, explained, and acted on.
  • Interaction patterns that communicated “suggestion” versus “enforcement.”
  • Usability validation with moderators and iteration based on feedback.

Who I collaborated with

  • Engineers implementing the Perspective API integration.
  • Product stakeholders and newsroom moderation leads.
  • Teams responsible for policy guidance and community operations.

Problem

News comment sections were increasingly overwhelmed by harassment, spam, and hate speech. Many early approaches treated automation as an enforcement layer (auto-hiding or deleting content), which created two major issues:

  • Trust gap: users could not understand why their content disappeared.
  • Editorial risk: automation removed nuance in borderline cases (context, reclaimed language, sarcasm).

Coral took a different stance: moderation is a conversation, not a purge.

Constraints

This work sat at a tricky intersection of policy, UX, and imperfect ML signals. Key constraints shaped the design:

  • False positives and false negatives: a score is not a verdict.
  • Moderator time pressure: triage needed to be fast, not cognitively expensive.
  • Context sensitivity: meaning changes with thread context and user history.
  • Transparency expectations: moderators needed to understand what the model was “seeing.”
  • Human accountability: decisions needed to be reviewable and defensible.

Solution

Instead of using AI as a gatekeeper, we used it as an assistant.

  • The Perspective API produced a toxicity score (a probability-style signal, not a yes/no).
  • Moderators saw the score alongside comment and thread context.
  • The experience surfaced high-risk comments for review, without auto-enforcing removal.
  • Over time, moderator feedback helped calibrate how teams interpreted and acted on the scores.
Toxic Comments moderation queue showing toxicity scores alongside comments
Moderators used Perspective scores as a triage signal while keeping full conversation context.
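To make the “assistant, not gatekeeper” flow concrete, here is a minimal sketch of how a comment might be scored and attached to the review queue. It is illustrative only: the request and response shapes follow the public Perspective API, but the Coral-side names (ScoredComment, scoreComment) are hypothetical, not the production Talk code.

```ts
// Illustrative sketch only. Endpoint and payload shape follow the public
// Perspective API; the surrounding types are hypothetical, not Coral's code.

interface ScoredComment {
  id: string;
  body: string;
  toxicity: number | null; // probability-style score in [0, 1], or null if scoring failed
}

const PERSPECTIVE_URL =
  "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze";

async function scoreComment(
  comment: { id: string; body: string },
  apiKey: string
): Promise<ScoredComment> {
  try {
    const res = await fetch(`${PERSPECTIVE_URL}?key=${apiKey}`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        comment: { text: comment.body },
        languages: ["en"],
        requestedAttributes: { TOXICITY: {} },
      }),
    });
    const data = await res.json();
    const value: number | undefined =
      data?.attributeScores?.TOXICITY?.summaryScore?.value;
    return { id: comment.id, body: comment.body, toxicity: value ?? null };
  } catch {
    // A failed or missing score never blocks publication or triggers removal;
    // the comment simply enters the queue without a triage signal.
    return { id: comment.id, body: comment.body, toxicity: null };
  }
}
```

The property worth noting is in the catch branch: the score only prioritizes review, so its absence degrades gracefully instead of silently gating content.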

Key Design Decisions

1) Treat the score as a signal, not a verdict

Decision: Make the interface communicate “this is guidance,” not “this is enforcement.”

  • Why: An overconfident UI causes either over-trust (moderators defer to the model) or backlash (moderators reject it entirely).
  • Tradeoff: A more careful presentation can slow first-time comprehension, but it protects trust long-term.
  • What I designed: score presentation that supported quick triage while reinforcing that humans decide.
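As a rough illustration of the “signal, not verdict” framing, the presentation layer can translate the probability-style score into triage bands rather than a pass/fail flag. The thresholds and labels below are hypothetical, not Coral’s shipped values.

```ts
// Hypothetical triage bands: thresholds and copy are illustrative, not
// Coral's shipped values. The UI speaks in "worth a look" terms and never
// renders a verdict such as "violation detected".
type TriageBand = "likely fine" | "worth a look" | "likely needs review" | "unscored";

function triageBand(toxicity: number | null): TriageBand {
  if (toxicity === null) return "unscored";          // no signal, no implied verdict
  if (toxicity >= 0.8) return "likely needs review"; // surfaced earlier in the queue
  if (toxicity >= 0.5) return "worth a look";
  return "likely fine";
}
```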

2) Keep context visible at decision time

Decision: Optimize the review flow around conversation context, not just a single comment.

  • Why: Toxicity is often contextual (piling-on, targeted harassment, sarcasm, reclaimed terms).
  • Tradeoff: More context means more on-screen information, so the layout had to stay scannable.
  • What I designed: a queue that preserved thread cues and let moderators act quickly without losing the surrounding conversation.
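A hypothetical shape for a queue item shows the intent: the score travels with thread context and author history, so the decision screen never presents a comment in isolation. Field names here are assumptions for illustration, not Coral’s schema.

```ts
// Hypothetical queue item: field names are illustrative, not Coral's schema.
// The score is just one field among the contextual cues a moderator sees.
interface ModerationQueueItem {
  comment: { id: string; body: string; authorId: string; createdAt: string };
  toxicity: number | null; // triage signal only; never a removal trigger
  thread: {
    articleTitle: string;
    parentComment?: { id: string; body: string; authorId: string };
    recentReplies: { id: string; body: string }[]; // nearby replies for piling-on cues
  };
  authorHistory: {
    priorApprovals: number;
    priorRejections: number;
  };
}
```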

3) Support trust calibration, not blind adoption

Decision: Add lightweight cues that helped moderators learn when the score was helpful and when to be skeptical.

  • Why: Moderators needed to build a mental model for the system, especially in edge cases.
  • Tradeoff: Too much explanation becomes noise; too little becomes mystery.
  • What I designed: interaction patterns that encouraged “check the context” behavior and reinforced accountability.

Results and Impact

This work helped demonstrate a human-centered approach to AI-assisted moderation in a newsroom setting.

  • Faster triage: moderators could focus attention on higher-risk comments first.
  • More defensible decisions: keeping context and avoiding auto-enforcement supported consistent, reviewable calls.
  • A repeatable pattern: using ML as a prioritization signal (rather than a silent gate) later became a common approach across industry moderation tooling.

What I’d Improve Next

If I were extending this today, I’d focus on:

  • Better explanations for edge cases: clearer “why this might be high” cues, without turning moderators into model debuggers.
  • Calibration controls: per-community thresholds and presets aligned to policy and staffing.
  • Feedback loops: simpler ways for moderators to flag “model wrong” moments and see the system adapt over time.
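For the calibration-controls idea, a per-community configuration might look like the sketch below. Names, thresholds, and presets are hypothetical; the constraint worth preserving is that configuration changes queue ordering and emphasis, never automatic removal.

```ts
// Hypothetical per-community calibration settings: names and defaults are
// illustrative. Thresholds affect queue ordering and visual emphasis only;
// nothing here authorizes automatic removal.
interface CommunityModerationConfig {
  communityId: string;
  reviewThreshold: number;    // scores at or above this are surfaced for review first
  highlightThreshold: number; // scores at or above this get the strongest visual cue
  staffingPreset: "solo-moderator" | "small-team" | "full-desk";
}

const defaultConfig: CommunityModerationConfig = {
  communityId: "example-news-site",
  reviewThreshold: 0.7,
  highlightThreshold: 0.9,
  staffingPreset: "small-team",
};
```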

Next engagement

Shape your moderation strategy with care

Let’s explore how thoughtful tooling and messaging can help your community thrive without sacrificing safety or trust.