AI-Generated Text Detection
Detecting text generated by large language models and other AI systems has become increasingly important as generative models improve. The task is to distinguish human-authored from AI-authored text in order to limit misuse such as plagiarism, disinformation, and fraud.
Detection approaches include watermarking (embedding hidden patterns during generation), neural classifiers trained on human versus AI text, zero-shot methods based on statistical properties of the text, and retrieval-based systems. However, adversarial attacks on these detectors, including paraphrasing, prompt engineering, and other evasion techniques, expose fundamental limits on how reliable detection can be.
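To make the watermarking idea concrete, here is a minimal sketch in the spirit of green-list token watermarks: the generator favors a pseudo-random "green" subset of tokens seeded by the previous token, and the detector counts green tokens and converts the count into a z-score. Everything named here (`is_green`, `watermark_z_score`, `toy_watermarked_tokens`, the key `demo-key`, and `GAMMA = 0.5`) is an assumption for illustration, and plain strings stand in for a real tokenizer and model.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of tokens treated as "green" at each step


def is_green(prev_token: str, token: str, key: str = "demo-key") -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by `prev_token`."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to [0, 1) and compare with GAMMA.
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def watermark_z_score(tokens: list[str], key: str = "demo-key") -> float:
    """z-score of the green-token count under the 'no watermark' null hypothesis."""
    scored = len(tokens) - 1  # the first token has no predecessor to seed from
    if scored < 1:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, t, key) for p, t in zip(tokens, tokens[1:]))
    return (green - GAMMA * scored) / math.sqrt(scored * GAMMA * (1 - GAMMA))


def toy_watermarked_tokens(vocab: list[str], length: int, key: str = "demo-key") -> list[str]:
    """Toy 'generator': at each step emit the first vocabulary word that is green."""
    tokens = [vocab[0]]
    while len(tokens) < length:
        prev = tokens[-1]
        # Fall back to vocab[0] in the unlikely case that no word is green.
        tokens.append(next((w for w in vocab if is_green(prev, w, key)), vocab[0]))
    return tokens
```

A large positive z-score suggests the text was produced with the matching key. Because the green list depends on neighboring tokens, paraphrasing that replaces or reorders tokens tends to pull the score back toward zero, which is the kind of evasion mentioned above.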
Key papers
- Disinformation 2.0 in the Age of AI: A Cybersecurity Perspective — Perspective on AI-generated content detection as one layer of defense-in-depth countermeasures against disinformation 2.0; proposes device-level and platform-level detection mechanisms
- Can AI-Generated Text be Reliably Detected? — Comprehensive analysis of detector robustness showing recursive paraphrasing attacks defeat watermarking and retrieval-based detectors; establishes theoretical bounds on detection difficulty
- Mitchell et al. (2023) — DetectGPT — Zero-shot detection via probability curvature analysis
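- Mao et al. (2024) — Raidar — Detection via rewriting distance: an LLM is asked to rewrite the input, and text that changes little is treated as more likely AI-generated; reports strong performance across multiple domains using only the model's text output, with no access to token probabilities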
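The two scoring ideas named in the list, probability curvature and rewriting distance, can be sketched with the model calls abstracted away. This is a simplified illustration, not the papers' implementations: `log_prob`, `perturb`, and `rewrite` are hypothetical callables standing in for a scoring model, a perturbation model, and a rewriting LLM, and the word-level `difflib` similarity and the `threshold` value are assumptions.

```python
from difflib import SequenceMatcher
from statistics import fmean
from typing import Callable


def perturbation_discrepancy(
    text: str,
    log_prob: Callable[[str], float],
    perturb: Callable[[str], str],
    n_perturbations: int = 20,
) -> float:
    """Log-probability of `text` minus the mean log-probability of perturbed copies.

    A clearly positive value means the text sits near a local maximum of the
    model's log-probability, which is the signal curvature-based detection uses.
    """
    perturbed = [log_prob(perturb(text)) for _ in range(n_perturbations)]
    return log_prob(text) - fmean(perturbed)


def rewrite_change_ratio(text: str, rewrite: Callable[[str], str]) -> float:
    """Fraction of words that change when `rewrite` is applied (0.0 = unchanged)."""
    original = text.split()
    rewritten = rewrite(text).split()
    similarity = SequenceMatcher(None, original, rewritten, autojunk=False).ratio()
    return 1.0 - similarity


def looks_machine_generated(
    text: str, rewrite: Callable[[str], str], threshold: float = 0.2
) -> bool:
    """Flag text that the rewriter barely changes; `threshold` is an assumed cutoff."""
    return rewrite_change_ratio(text, rewrite) < threshold
```

In practice both scores would be calibrated on labeled data rather than compared with a fixed cutoff, and the rewriting approach needs only the rewriter's text output.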
Related topics
- Adversarial Machine Learning (attack methods)
- Language Models (the sources generating text)
- Watermarking (embedding detection signals)