Abusive language detection
Abusive language detection is the task of identifying any insult, vulgarity, profanity, or harmful expression that debases its target, causes aggravation, or constitutes harassment. This umbrella category includes hate speech, cyberbullying, offensive language, and toxic comments. Because judgments of abusiveness are subjective and depend heavily on context, consistent annotation is difficult and automated detection is challenging.
Key papers
- Comparative Studies of Detecting Abusive Language on Twitter — First comprehensive comparison of traditional machine learning and neural network models on the 100K-tweet Hate and Abusive Speech on Twitter dataset; a bidirectional GRU with latent topic clustering achieves 0.805 F1
- Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter — Expert vs. crowdsourced annotation quality comparison for Twitter hate speech
- A Survey on Hate Speech Detection using Natural Language Processing — Survey of NLP methods for hate speech detection
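- Predicting the Type and Target of Offensive Posts in Social Media — Introduces the OLID dataset, annotated with a three-level hierarchy: offensive vs. not offensive, targeted vs. untargeted insult, and target type (individual, group, or other)
- SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media — Overview of the OffensEval shared task built on OLID, in which 115 teams submitted systems for offensive language detection and categorization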
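The OLID hierarchy described above can be made concrete as a small validated data structure. This is a minimal sketch, assuming the label abbreviations used in the OLID release (NOT/OFF at level A, TIN/UNT at level B, IND/GRP/OTH at level C); the `OlidLabel` class and its `path` method are hypothetical names for illustration, not part of any released toolkit.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Level A: is the post offensive? (assumed OLID abbreviations)
LEVEL_A = {"NOT", "OFF"}
# Level B: for offensive posts, is the insult targeted or untargeted?
LEVEL_B = {"TIN", "UNT"}
# Level C: for targeted insults, is the target an individual, a group, or other?
LEVEL_C = {"IND", "GRP", "OTH"}


@dataclass(frozen=True)
class OlidLabel:
    """Hypothetical container for one hierarchical OLID-style annotation."""
    a: str
    b: Optional[str] = None
    c: Optional[str] = None

    def __post_init__(self) -> None:
        if self.a not in LEVEL_A:
            raise ValueError(f"unknown level A label: {self.a!r}")
        if self.a == "NOT":
            # Non-offensive posts stop at level A.
            if self.b is not None or self.c is not None:
                raise ValueError("NOT posts carry no level B/C labels")
            return
        if self.b not in LEVEL_B:
            raise ValueError(f"OFF posts need a level B label, got {self.b!r}")
        if self.b == "UNT":
            # Untargeted insults have no target type.
            if self.c is not None:
                raise ValueError("UNT posts carry no level C label")
            return
        if self.c not in LEVEL_C:
            raise ValueError(f"TIN posts need a level C label, got {self.c!r}")

    def path(self) -> Tuple[str, ...]:
        """Return the labels from the root down, omitting unused levels."""
        return tuple(x for x in (self.a, self.b, self.c) if x is not None)


print(OlidLabel("OFF", "TIN", "GRP").path())  # ('OFF', 'TIN', 'GRP')
print(OlidLabel("NOT").path())                # ('NOT',)
```

Rejecting inconsistent combinations (for example a NOT post with a target type) mirrors the way the levels nest: only offensive posts receive a level B label, and only targeted insults receive a level C label.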
Related topics
- Hate speech detection — specialized subdomain focused on abuse directed at protected groups
- Toxicity detection — broader category that also covers toxic content not aimed at a specific target
- Content moderation — platform enforcement mechanisms
- Cyberbullying — online harassment and bullying