Hate speech detection
Hate speech detection is the task of identifying hateful speech: language that attacks individuals or groups based on protected characteristics such as race, ethnicity, religion, gender identity, or other identity attributes. It is a specialized subdomain of toxicity detection with its own challenges: attacks are targeted, hatred is often expressed implicitly, and systems must distinguish harmful speech from protected speech and satire.
Key papers
- Comparative Studies of Detecting Abusive Language on Twitter — First comprehensive benchmark of models on the 100K-tweet Hate and Abusive Speech on Twitter dataset; neural approaches with latent topic clustering outperform traditional ML
- Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter — Compares expert vs. crowdsourced annotations on Twitter hate speech, showing that systems trained on expert labels significantly outperform those trained on crowdsourced labels
- Predicting the Type and Target of Offensive Posts in Social Media — OLID dataset introducing hierarchical classification of offensive language with group-targeting identification as one level
- SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media — Shared task benchmark for OLID with 115 competing systems; best systems for group-targeted hate detection (task C) achieve F1 0.660
- Hate Lingo — Linguistic and psycholinguistic analysis distinguishing directed hate speech (personal attacks) from generalized hate speech (targeting groups)
- A Survey on Hate Speech Detection using Natural Language Processing — Comprehensive survey of NLP methods for automatic hate speech detection
- Toxicity Detection with Generative Prompt-based Inference — Uses the HateXplain dataset for zero-shot detection with prompt-based methods
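The hierarchical scheme that the OLID entries above refer to can be sketched as a small validator. This is a minimal illustration, not code from any of the listed papers: the label names (OFF/NOT, TIN/UNT, IND/GRP/OTH) follow the OLID annotation levels, while `validate_labels` and `is_group_targeted` are hypothetical helper names introduced here.

```python
from typing import Optional, Tuple

# OLID annotation levels; each lower level is only annotated
# when the level above it takes its first-listed value.
LEVEL_A = {"OFF", "NOT"}         # offensive vs. not offensive
LEVEL_B = {"TIN", "UNT"}         # targeted insult/threat vs. untargeted
LEVEL_C = {"IND", "GRP", "OTH"}  # target: individual, group, or other

Labels = Tuple[str, Optional[str], Optional[str]]


def validate_labels(a: str, b: Optional[str] = None,
                    c: Optional[str] = None) -> Labels:
    """Hypothetical helper: check that a label triple respects the hierarchy."""
    if a not in LEVEL_A:
        raise ValueError(f"unknown level A label: {a!r}")
    if a == "NOT":
        # Non-offensive posts carry no level B or C label.
        if b is not None or c is not None:
            raise ValueError("NOT posts must not have level B/C labels")
        return (a, None, None)
    if b not in LEVEL_B:
        raise ValueError(f"offensive posts need a level B label, got {b!r}")
    if b == "UNT":
        # Untargeted offensive posts carry no level C label.
        if c is not None:
            raise ValueError("UNT posts must not have a level C label")
        return (a, b, None)
    if c not in LEVEL_C:
        raise ValueError(f"targeted posts need a level C label, got {c!r}")
    return (a, b, c)


def is_group_targeted(a: str, b: Optional[str] = None,
                      c: Optional[str] = None) -> bool:
    """True only for offensive, targeted posts aimed at a group."""
    return validate_labels(a, b, c) == ("OFF", "TIN", "GRP")
```

Under this sketch, only the triple `("OFF", "TIN", "GRP")` counts as group-targeted, which is the level the summaries above associate most closely with hate speech.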
Related topics
- Toxicity detection — broader offensive language detection
- Social bias — demographic biases in hate speech detection systems
- Content moderation — platform policies and moderation systems