AI-Generated Text Detection

Detecting machine-generated text from large language models and other AI systems has become increasingly important as generative models improve. The task is to distinguish human-authored text from AI-authored text in order to limit misuse such as plagiarism, disinformation, and fraud.

Detection approaches include watermarking (embedding hidden patterns during generation), neural classifiers trained on human versus AI text, zero-shot methods based on statistical properties of the text, and retrieval-based systems (matching a candidate text against a store of previously generated outputs). However, adversarial attacks on these detectors, including paraphrasing, prompt engineering, and other evasion techniques, expose fundamental limits on how reliably AI-generated text can be detected.
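To make the watermarking idea concrete, here is a minimal sketch of one possible scheme, a "green-list" watermark checked with a z-test. The page does not say which watermarking scheme it has in mind, so this choice is an assumption. The names `is_green`, `watermark_z_score`, and `toy_watermarked_tokens`, the SHA-256 partition, and the `gamma` parameter are all hypothetical and chosen for illustration only.

```python
import hashlib
import math


def is_green(prev_token: str, token: str, gamma: float = 0.5) -> bool:
    """Hypothetical partition rule: a token counts as 'green' when a hash of
    (previous token, token) lands in the lowest `gamma` fraction of values."""
    digest = hashlib.sha256(f"{prev_token}\x00{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def watermark_z_score(tokens: list[str], gamma: float = 0.5) -> float:
    """z-score of the green-token count under the null hypothesis that the
    text is unwatermarked, so each token is green with probability gamma."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    n = len(tokens) - 1  # number of (previous, current) pairs that get scored
    if n <= 0:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, t, gamma) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * n) / math.sqrt(n * gamma * (1.0 - gamma))


def toy_watermarked_tokens(vocab: list[str], length: int, gamma: float = 0.5) -> list[str]:
    """Toy 'generator' that always emits a green token when one exists.
    A real watermark would only bias sampling toward green tokens."""
    tokens = [vocab[0]]
    for _ in range(length):
        prev = tokens[-1]
        tokens.append(next((t for t in vocab if is_green(prev, t, gamma)), vocab[0]))
    return tokens


if __name__ == "__main__":
    vocab = [chr(ord("a") + i) for i in range(26)]
    marked = toy_watermarked_tokens(vocab, 50)
    print("z-score of toy watermarked sequence:", watermark_z_score(marked))
```

A large positive z-score suggests the watermark is present. The sketch also shows why paraphrasing is a concern: rewording changes the token pairs being hashed, which tends to pull the green-token count back toward what unwatermarked text would produce.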

Key papers

Disinformation 2.0 in the Age of AI: A Cybersecurity Perspective — Positions AI-generated content detection within defense-in-depth countermeasures against disinformation 2.0; proposes device-level and platform-level detection mechanisms
Can AI-Generated Text be Reliably Detected? — Comprehensive analysis of detector robustness showing recursive paraphrasing attacks defeat watermarking and retrieval-based detectors; establishes theoretical bounds on detection difficulty
Mitchell et al. (2023) — DetectGPT — Zero-shot detection via probability curvature analysis
Mao et al. (2024) — Raidar — Detection via rewriting distance; achieves strong performance across multiple domains using only symbolic output
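The two zero-shot ideas listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the papers' implementations: `log_prob`, `perturb`, and `rewrite` are hypothetical callables standing in for model calls, and the word-level Levenshtein distance is only one plausible choice of rewriting distance. No decision thresholds are implied.

```python
from statistics import mean, pstdev
from typing import Callable, Sequence


def perturbation_discrepancy(
    text: str,
    log_prob: Callable[[str], float],        # hypothetical: scoring model log p(text)
    perturb: Callable[[str], Sequence[str]],  # hypothetical: lightly reworded variants
) -> float:
    """DetectGPT-style score: log-probability of the text minus the mean
    log-probability of its perturbed variants, scaled by their spread."""
    variants = list(perturb(text))
    if not variants:
        raise ValueError("perturb() returned no variants")
    scores = [log_prob(v) for v in variants]
    spread = pstdev(scores) or 1.0  # fall back to 1.0 when all scores coincide
    return (log_prob(text) - mean(scores)) / spread


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance computed over word sequences."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def rewriting_distance(text: str, rewrite: Callable[[str], str]) -> float:
    """Raidar-style signal: fraction of words changed when a model is asked
    to rewrite the text. `rewrite` is a hypothetical stand-in for an LLM call."""
    original = text.split()
    rewritten = rewrite(text).split()
    longest = max(len(original), len(rewritten))
    if longest == 0:
        return 0.0
    return word_edit_distance(original, rewritten) / longest
```

In the first function a larger discrepancy means the text sits near a local peak of the scoring model's probability, which is the curvature signal. In the second, the working intuition is that a model asked to rewrite tends to change AI-written text less than human-written text, so a smaller distance points toward machine authorship. Both signals depend on model calls, so they inherit the evasion problems noted earlier on this page.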

Related topics

Adversarial Machine Learning (attack methods)
Language Models (the sources generating text)
Watermarking (embedding detection signals)