Content moderation
Content moderation encompasses the policies, enforcement mechanisms, and governance structures that platforms use to manage harmful, false, or policy-violating content. It covers both the moderation of human-generated content and, increasingly, the governance of AI-generated content and AI-assisted moderation tools.
Key approaches
- Detection-based: Identifying and removing harmful content through human review, crowdsourcing, or algorithmic detection
- Prevention-based: Structural design choices that reduce the spread of false content (e.g., algorithmic transparency, friction, diversification)
- Governance-based: Establishing policies, appeals processes, and external review mechanisms
Key papers
Platform governance and regulation
- The platform governance triangle: conceptualising the informal regulation of online content — analyzes informal multi-stakeholder governance arrangements for platform content, using the governance triangle model to examine Facebook's Oversight Body and similar initiatives
AI-generated content detection and mitigation
- Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations — mitigation strategies for detecting and slowing the spread of AI-generated propaganda; discusses challenges in identifying synthetic text and platform-level coordination requirements
Offensive and abusive content detection
- Toxicity in ChatGPT: Analyzing Persona-assigned Language Models — analysis of toxic output generation by ChatGPT under persona assignment; identifies a safety vulnerability and discriminatory bias in AI-generated content
- Comparative Studies of Detecting Abusive Language on Twitter — benchmarks detection models for abusive language on Twitter; establishes baseline performance for automated content moderation systems
- Predicting the Type and Target of Offensive Posts in Social Media — detection and hierarchical classification of offensive content
- SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media — benchmark evaluation of offensive content detection and categorization systems submitted by 115 teams; demonstrates feasibility of hierarchical detection, type categorization, and target identification
- Mohseni & Ragan (2018) — Combating Fake News with Interpretable News Feed Algorithms — argues that transparent news feed algorithm design is a prevention-based approach complementary to detection
- Truthful AI: Developing and Governing AI That Does Not Lie — governance frameworks for AI truthfulness as applied to platform content and AI systems
- Lazer et al. (2018) — The Science of Fake News — challenges in detection and moderation