Abusive language detection

Abusive language detection is the task of identifying abusive language: any insult, vulgarity, profanity, or harmful expression that debases its targets, causes aggravation, or constitutes harassment. The term is an umbrella category covering hate speech, cyberbullying, offensive language, and toxic comments. Whether a post counts as abusive depends on context and is partly subjective, which makes annotation difficult and automated detection challenging.
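Because abusive-language labels are subjective, one common way to quantify annotation difficulty is inter-annotator agreement. The minimal sketch below computes Cohen's kappa for two annotators in plain Python. The function name, the label names, and the ten example judgments are hypothetical illustrations, not data or code from any paper listed here.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance that two independent annotators, each keeping
    # their own label frequencies, pick the same label for an item.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label everywhere; kappa is undefined,
        # so report perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical judgments for ten posts (not from any real dataset).
expert = ["abusive", "abusive", "none", "none", "abusive",
          "none", "none", "none", "abusive", "none"]
crowd = ["abusive", "none", "none", "none", "abusive",
         "none", "abusive", "none", "abusive", "none"]

print(f"kappa = {cohens_kappa(expert, crowd):.3f}")  # kappa = 0.583
```

Here the annotators agree on 8 of 10 posts (0.8), but since each labels 40% of posts abusive, 0.52 agreement would be expected by chance alone, so kappa drops to about 0.58.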

Key papers

Comparative Studies of Detecting Abusive Language on Twitter — First comprehensive benchmark of traditional ML and neural network models on 100K-tweet Hate and Abusive Speech on Twitter dataset; bidirectional GRU with latent topic clustering achieves 0.805 F1
Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter — Expert vs. crowdsourced annotation quality comparison for Twitter hate speech
A Survey on Hate Speech Detection using Natural Language Processing — Survey of NLP methods for hate speech detection
Predicting the Type and Target of Offensive Posts in Social Media — OLID dataset with hierarchical offensive language classification
SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media — Shared task with 115 systems for offensive language detection and categorization
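The hierarchical scheme behind OLID (also used in SemEval-2019 Task 6) labels each post at up to three levels: offensive or not (OFF/NOT); for offensive posts, targeted insult or untargeted (TIN/UNT); and for targeted insults, whether the target is an individual, a group, or other (IND/GRP/OTH). The sketch below encodes those constraints and runs one classifier per level as a simple cascade. The cascade design, the names `OlidLabel` and `classify_cascade`, and the keyword-based toy predictors are hypothetical illustrations, not the models from either paper or any released API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# OLID's three-level label taxonomy.
LEVEL_A = {"OFF", "NOT"}         # offensive vs. not offensive
LEVEL_B = {"TIN", "UNT"}         # targeted insult vs. untargeted
LEVEL_C = {"IND", "GRP", "OTH"}  # target: individual, group, other


@dataclass(frozen=True)
class OlidLabel:
    a: str
    b: Optional[str] = None
    c: Optional[str] = None

    def __post_init__(self) -> None:
        # Level B exists only for OFF posts; level C only for TIN posts.
        if self.a not in LEVEL_A:
            raise ValueError(f"bad level A label: {self.a!r}")
        if self.a == "NOT":
            if self.b is not None or self.c is not None:
                raise ValueError("NOT posts take no level B/C label")
        elif self.b not in LEVEL_B:
            raise ValueError(f"OFF posts need a level B label, got {self.b!r}")
        elif self.b == "UNT":
            if self.c is not None:
                raise ValueError("UNT posts take no level C label")
        elif self.c not in LEVEL_C:
            raise ValueError(f"TIN posts need a level C label, got {self.c!r}")


Predictor = Callable[[str], str]


def classify_cascade(text: str, predict_a: Predictor,
                     predict_b: Predictor, predict_c: Predictor) -> OlidLabel:
    """Run each level's classifier only when the level above licenses it."""
    a = predict_a(text)
    if a == "NOT":
        return OlidLabel(a)
    b = predict_b(text)
    if b == "UNT":
        return OlidLabel(a, b)
    return OlidLabel(a, b, predict_c(text))


def toy_a(text: str) -> str:
    # Hypothetical stand-in for a trained level A classifier.
    return "OFF" if "idiot" in text.lower() else "NOT"


def toy_b(text: str) -> str:
    # Hypothetical rule: second-person address marks a targeted insult.
    return "TIN" if "you" in text.lower().split() else "UNT"


def toy_c(text: str) -> str:
    # Hypothetical rule: second-person targets are individuals.
    return "IND"


for post in ["have a nice day", "you are an idiot", "what an idiot move"]:
    print(post, "->", classify_cascade(post, toy_a, toy_b, toy_c))
```

Validating labels in `__post_init__` means an impossible combination, such as a NOT post with a target type, fails at construction time rather than silently entering training data.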

Related topics

Hate speech detection — specialized subdomain targeting protected groups
Toxicity detection — broader category including toxicity beyond targeted abuse
Content moderation — platform enforcement mechanisms
Cyberbullying — online harassment and bullying
