Machine-generated text detection
Detecting machine-generated text is increasingly important as language models improve and become more widely accessible. This includes both human-based detection (can people tell the difference?) and automated detection (can algorithms identify generated text by analyzing linguistic or statistical features?).
Key challenges
- Advancing generation quality: As language models improve (GPT-2 → GPT-3), generated text becomes harder to distinguish from human writing
- Human limitations: Untrained evaluators often perform at chance level, and even trained evaluators struggle with current models
- Domain variation: Detection accuracy varies significantly across domains (stories, news, recipes)
- Evaluation methodology: Consistent human evaluation practices are critical for benchmarking detection approaches
Key papers
Fake news detection with mixed human-machine content:
- Su, Cardie & Nakov (2023), "Adapting Fake News Detection to the Era of Large Language Models": Studies how fake news detectors should adapt as news shifts from purely human-written to a mix of human-written and machine-generated content; reports that detectors trained only on human-written articles can still detect machine-generated fake news well, but not vice versa; because detectors are biased against machine-generated text, recommends training on data with a lower machine-generated ratio than the test set.
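As a rough illustration of training on a controlled mix of human-written and machine-generated articles, the sketch below draws a training set with a chosen share of machine-generated items. The function name `mix_training_set`, its parameters, and the `(text, source)` tuple layout are assumptions made for illustration; they are not taken from the paper.

```python
import random


def mix_training_set(human, machine, machine_ratio, size, seed=0):
    """Sample `size` items with roughly `machine_ratio` of them machine-generated.

    Hypothetical helper: `human` and `machine` are lists of examples,
    here (text, source) tuples. Sampling is without replacement.
    """
    if not 0.0 <= machine_ratio <= 1.0:
        raise ValueError("machine_ratio must be between 0 and 1")
    n_machine = round(size * machine_ratio)
    n_human = size - n_machine
    if n_machine > len(machine) or n_human > len(human):
        raise ValueError("not enough examples for the requested mix")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    mixed = rng.sample(machine, n_machine) + rng.sample(human, n_human)
    rng.shuffle(mixed)
    return mixed


# Placeholder data, not real articles.
human_articles = [(f"human article {i}", "human") for i in range(20)]
machine_articles = [(f"machine article {i}", "machine") for i in range(20)]
train = mix_training_set(human_articles, machine_articles, machine_ratio=0.3, size=10)
```

Exposing the ratio as a parameter makes it straightforward to compare detectors trained on different mixes against the same test set.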
Watermarking & proactive detection:
- Kirchenbauer et al. (2023), "A Watermark for Large Language Models": Proposes watermarking LLM outputs by promoting a pseudo-randomly selected set of "green" tokens during decoding; the watermark is detectable with a z-test that needs no access to the model, with a false positive rate below 1.2%.
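The z-test idea can be sketched as follows: count how many tokens fall in the green list and compare that count with what unwatermarked text would produce by chance. This is a simplified toy, not the paper's implementation. The hash-based `is_green` rule, the `key` parameter, and the greedy `toy_watermarked_sequence` generator (which always picks a green token rather than softly promoting one) are all assumptions made for illustration.

```python
import hashlib
import math


def is_green(prev_token: int, token: int, gamma: float = 0.5, key: int = 42) -> bool:
    """Toy green-list rule: hash (key, previous token, token) to a number in [0, 1).

    A fraction of about `gamma` of all tokens is green for any given context.
    """
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def watermark_z_score(tokens, gamma: float = 0.5, key: int = 42) -> float:
    """z-score of the green-token count against the no-watermark expectation."""
    scored = len(tokens) - 1  # the first token has no predecessor to seed the rule
    if scored < 1:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, t, gamma, key) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))


def toy_watermarked_sequence(length: int, vocab_size: int = 1000,
                             gamma: float = 0.5, key: int = 42):
    """Hypothetical 'generator' that always emits the first green token it finds."""
    tokens = [0]
    while len(tokens) < length:
        prev = tokens[-1]
        tokens.append(next(t for t in range(vocab_size) if is_green(prev, t, gamma, key)))
    return tokens


z = watermark_z_score(toy_watermarked_sequence(101))
```

Only the key and the token sequence are needed to compute the score, which is why detection of this kind does not require the model itself. A large positive z-score indicates far more green tokens than chance would explain.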
Zero-shot detection via probabilistic analysis:
- Mitchell et al. (2023), "DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature": Observes that LLM-generated text tends to occupy negative-curvature regions of the model's log-probability function; proposes a zero-shot detector that estimates this curvature by comparing a passage's log probability with that of perturbed versions, reaching 0.95 AUROC on fake news articles generated by the 20B-parameter GPT-NeoX.
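A minimal sketch of perturbation-based curvature estimation: score a passage by how much its log probability drops, on average, when it is lightly perturbed. The paper scores text with the source language model and perturbs it with a mask-filling model; the unigram scorer and word-swap perturbation below are toy stand-ins, and every name in the block is an assumption made for illustration.

```python
import math
import random
import statistics


def perturbation_discrepancy(text, log_prob, perturb, n_perturbations=20) -> float:
    """Log probability of `text` minus the mean log probability of its perturbations.

    Larger positive values mean the text sits near a local maximum of
    `log_prob`, which is the signal used to flag machine-generated text.
    """
    original = log_prob(text)
    perturbed = [log_prob(perturb(text)) for _ in range(n_perturbations)]
    return original - statistics.fmean(perturbed)


# Toy stand-ins (hypothetical): a four-word unigram "model" and a word-swap perturbation.
TOY_LOG_PROBS = {w: math.log(p) for w, p in
                 {"the": 0.5, "cat": 0.3, "sat": 0.15, "zyx": 0.05}.items()}
_rng = random.Random(0)


def toy_log_prob(text: str) -> float:
    return sum(TOY_LOG_PROBS[w] for w in text.split())


def toy_perturb(text: str) -> str:
    words = text.split()
    i = _rng.randrange(len(words))
    # Replace one word with a different vocabulary word.
    words[i] = _rng.choice([w for w in TOY_LOG_PROBS if w != words[i]])
    return " ".join(words)


likely = perturbation_discrepancy("the the the the", toy_log_prob, toy_perturb)
unlikely = perturbation_discrepancy("zyx zyx zyx zyx", toy_log_prob, toy_perturb)
```

In this toy, `likely` is positive because every perturbation lowers the score of the highest-probability text, while `unlikely` is negative. The paper also describes a variant that normalizes the discrepancy by the standard deviation of the perturbed log probabilities.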
Human evaluation of generated text:
- Clark et al. (2021), "All That's 'Human' Is Not Gold": Empirically demonstrates that untrained evaluators distinguish GPT-2 text from human text only slightly better than chance (57% accuracy) and GPT-3 text at chance level (50% accuracy); tests evaluator training interventions intended to improve accuracy.
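To make "chance level" concrete, the sketch below checks whether an observed accuracy on a binary human-vs-machine task differs from the 50% expected from guessing, using a normal approximation to the binomial. The counts in the usage line are hypothetical: the number of judgments behind the accuracies quoted above is not given here, and the function name is an assumption for illustration.

```python
import math


def chance_level_test(correct: int, total: int):
    """Return (z, two-sided p-value) for accuracy versus 50% guessing.

    Uses the normal approximation to the binomial, which is reasonable
    for moderately large `total`.
    """
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    z = (correct - 0.5 * total) / math.sqrt(0.25 * total)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 57 correct out of 100 judgments.
z, p = chance_level_test(57, 100)
```

The same accuracy can be indistinguishable from guessing on a small sample and clearly above chance on a large one, so reported accuracies are easier to interpret alongside the number of judgments.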
Related topics
- Text generation — source of generated text
- Human evaluation — human detection of generated content
- NLP evaluation — evaluation methodology
- Generated text detection — synonym/related topic