Content moderation

Content moderation encompasses the policies, enforcement mechanisms, and governance structures that platforms use to manage harmful, false, or policy-violating content. It covers the moderation of human-generated content and, increasingly, the governance of AI-generated content and AI-assisted moderation tools.

Key approaches

Detection-based: Identifying and removing harmful content through human review, crowdsourcing, or algorithmic detection

Prevention-based: Structural design choices that reduce the spread of false content (e.g., algorithmic transparency, friction, diversification)

Governance-based: Establishing policies, appeals processes, and external review mechanisms

Key papers

Platform governance and regulation

The platform governance triangle: conceptualising the informal regulation of online content — Analyzes informal multi-stakeholder governance arrangements for platform content, using the governance triangle model to examine Facebook's Oversight Body and similar initiatives

AI-generated content detection and mitigation

Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations — Surveys mitigation strategies for detecting and slowing the spread of AI-generated propaganda; discusses the difficulty of identifying synthetic text and the need for platform-level coordination

Offensive and abusive content detection

Toxicity in ChatGPT: Analyzing Persona-assigned Language Models — Analyzes toxic output generated by ChatGPT under persona assignment; identifies a safety vulnerability and discriminatory bias in the generated content
Comparative Studies of Detecting Abusive Language on Twitter — Benchmarks detection models for abusive language on Twitter; establishes baseline performance for automated content moderation systems
Predicting the Type and Target of Offensive Posts in Social Media — Detection and hierarchical classification of offensive content
SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media — Benchmark evaluation of offensive content detection and categorization systems across 115 teams; demonstrates feasibility of hierarchical detection, type categorization, and target identification
Mohseni & Ragan (2018) — Combating Fake News with Interpretable News Feed Algorithms — Argues that transparent news feed algorithm design is a prevention-based approach complementary to detection
Truthful AI: Developing and Governing AI That Does Not Lie — Governance frameworks for AI truthfulness as applied to platform content and AI systems
Lazer et al. (2018) — The Science of Fake News — Outlines challenges in detecting and moderating fake news
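
The hierarchical scheme mentioned in the SemEval-2019 Task 6 entry above (offensive language detection, then type categorization, then target identification) can be sketched in code. This is a minimal illustration, not the task organizers' implementation. The label codes (OFF/NOT, TIN/UNT, IND/GRP/OTH) follow the three-level annotation scheme of the dataset used in that task. The class `OffensiveLabel`, the function `classify_hierarchically`, and the three classifier callables are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Label codes from the three-level annotation scheme (levels A, B, C).
LEVEL_A = {"OFF", "NOT"}         # A: offensive vs. not offensive
LEVEL_B = {"TIN", "UNT"}         # B: targeted insult/threat vs. untargeted
LEVEL_C = {"IND", "GRP", "OTH"}  # C: individual, group, or other target


@dataclass(frozen=True)
class OffensiveLabel:
    """Hypothetical container that enforces the hierarchy between levels."""
    level_a: str
    level_b: Optional[str] = None
    level_c: Optional[str] = None

    def __post_init__(self) -> None:
        if self.level_a not in LEVEL_A:
            raise ValueError(f"unknown level A label: {self.level_a!r}")
        if self.level_a == "NOT":
            # Non-offensive posts carry no type or target label.
            if self.level_b is not None or self.level_c is not None:
                raise ValueError("NOT posts must not have level B or C labels")
            return
        if self.level_b not in LEVEL_B:
            raise ValueError("OFF posts need a level B label (TIN or UNT)")
        if self.level_b == "UNT":
            # Untargeted posts have no target to identify.
            if self.level_c is not None:
                raise ValueError("UNT posts must not have a level C label")
        elif self.level_c not in LEVEL_C:
            raise ValueError("TIN posts need a level C label (IND, GRP, OTH)")


def classify_hierarchically(
    text: str,
    detect: Callable[[str], str],
    categorize: Callable[[str], str],
    identify_target: Callable[[str], str],
) -> OffensiveLabel:
    """Run three (assumed) classifiers as a cascade, stopping early."""
    level_a = detect(text)
    if level_a == "NOT":
        return OffensiveLabel("NOT")
    level_b = categorize(text)
    if level_b == "UNT":
        return OffensiveLabel("OFF", "UNT")
    return OffensiveLabel("OFF", "TIN", identify_target(text))
```

In this sketch the later classifiers are consulted only when the earlier level applies, which mirrors the annotation scheme: type labels exist only for offensive posts, and target labels only for targeted ones. Any real system would plug trained models into the three callables.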