DeepFakes and Beyond: A Survey of Face Manipulation and Fake Detection

Authors: Ruben Tolosana, Ruben Vera-Rodriguez, Julian Fierrez, Aythami Morales, Javier Ortega-Garcia

Venue: arXiv, 2020 — arXiv:2001.00179

TL;DR

This comprehensive survey reviews face manipulation techniques and detection methods in the era of deep learning. It categorizes facial manipulations into four types—entire face synthesis, identity swap, attribute manipulation, and expression swap—and surveys both generation and detection approaches across public datasets and benchmarks. The paper emphasizes the difficulty of detecting synthesized faces, the challenges of cross-domain generalization, and open problems in the field.

Contributions

Systematic categorization of four main face manipulation groups: i) entire face synthesis, ii) identity swap, iii) attribute manipulation, and iv) expression swap
Comprehensive review of manipulation techniques, focusing on deep learning approaches such as GANs (e.g., StyleGAN) and DeepFakes
Survey of public databases for face manipulation research (100K-Faces, DFFD, iFakeFaceDB)
Review of detection methods and state-of-the-art benchmarks across manipulation types
Discussion of auxiliary manipulation techniques: face morphing, de-identification, audio-to-video and text-to-video synthesis
Identification of open challenges and future research directions

Methods

The paper surveys techniques across four manipulation categories:

Entire Face Synthesis: Uses GANs (ProGAN, StyleGAN) to generate photorealistic face images from scratch. StyleGAN achieves unsupervised separation of high-level attributes and enables intuitive control over generated image characteristics.

Identity Swap: Replaces the face of one person in a video with the face of another, using either classical computer graphics-based techniques (e.g., FaceSwap) or deep learning techniques known as DeepFakes (e.g., DeepFaceLab). Unlike entire face synthesis, where manipulation is carried out at the image level, the goal here is to generate realistic fake videos.

Attribute Manipulation: Edits or retouches facial attributes such as hair or skin color, gender, age, or the addition of glasses. This is typically carried out with GANs such as StarGAN, and the manipulation is usually framed as face attribute transfer.

Expression Swap: Modifies the facial expression of a person, typically by transferring the expression of one person onto another in a video. Techniques include Face2Face and NeuralTextures, which replace the facial expression while preserving identity.

Detection Approaches: The paper reviews detection methods including GAN-pipeline features (SVM), deep learning features (CNN), steganalysis features, and incremental learning approaches. Detection performance varies significantly with the manipulation type and the source of the fake content.

Results

The survey compiles benchmark results across studies, showing state-of-the-art performance for each manipulation type:

Entire Face Synthesis: Best performance achieved with deep learning features (CNN + attention mechanisms), reaching >99% AUC in controlled scenarios
Identity Swap: Best detection achieves 99.4% AUC using deep learning with attention mechanisms on 150K fake faces from multiple GAN architectures
Attribute Manipulation: Detection accuracy varies (83.2%–96.8%) depending on the database and manipulation type
Expression Swap: Best performance reaches 100% AUC in controlled scenarios with EER = 0.1% (FaceForensics++, ProGAN, StyleGAN)
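
The results above are reported as AUC (area under the ROC curve) and EER (equal error rate, the operating point where the false positive rate equals the false negative rate). As a minimal sketch of how these two metrics can be computed from detector scores, the following pure-Python example may help. The function names, the label convention (1 = fake, 0 = real, higher score = more likely fake), and the toy scores are assumptions made for illustration; they are not taken from the survey or from any of the benchmarks it reports.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """Sweep a decision threshold over the scores and collect ROC points.

    Assumed convention: label 1 = fake, 0 = real; higher score = more likely fake.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one fake and one real sample")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Point]) -> float:
    """Area under the ROC curve via the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def eer(points: Sequence[Point]) -> float:
    """Approximate EER: take the ROC point where FPR and FNR are closest
    and return the average of the two rates at that point."""
    fpr, tpr = min(points, key=lambda p: abs(p[0] - (1.0 - p[1])))
    return (fpr + (1.0 - tpr)) / 2.0


# Toy scores, invented for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
pts = roc_points(toy_scores, toy_labels)
print(f"AUC = {auc(pts):.4f}")  # AUC = 0.8125
print(f"EER = {eer(pts):.4f}")  # EER = 0.2500
```

This EER is a coarse approximation that picks the nearest ROC point; evaluation code used in practice often interpolates between neighbouring points instead.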

Key findings: i) current manipulations are easily detected in controlled scenarios, where fake detectors are evaluated under the same conditions they were trained on, ii) detection performance degrades sharply under unseen conditions (e.g., different compression levels or cross-database evaluation), and iii) most detection approaches depend heavily on the specific scenario they were designed for and therefore generalize poorly.
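
The gap between same-condition and unseen-condition evaluation can be illustrated with a deliberately simple sketch. Here a one-dimensional "artifact strength" feature stands in for whatever a real detector learns, and a threshold fitted on one hypothetical database is applied both to held-out data from the same database and to data from a second database where the artifact is weaker (as might happen after heavier compression). All names and numbers are invented to show the evaluation protocol; they are not results from the survey.

```python
from typing import Sequence


def accuracy(features: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Accuracy of the rule 'feature >= threshold means fake' (1 = fake, 0 = real)."""
    preds = [1 if f >= threshold else 0 for f in features]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def fit_threshold(features: Sequence[float], labels: Sequence[int]) -> float:
    """Pick the training feature value that maximizes training accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(features)):
        acc = accuracy(features, labels, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


# Hypothetical database A (training conditions): strong, well-separated artifacts.
train_a_x = [0.10, 0.15, 0.20, 0.25, 0.70, 0.75, 0.80, 0.90]
train_a_y = [0, 0, 0, 0, 1, 1, 1, 1]

# Held-out samples from the same hypothetical database A.
test_a_x = [0.12, 0.22, 0.18, 0.72, 0.85, 0.78]
test_a_y = [0, 0, 0, 1, 1, 1]

# Hypothetical database B (unseen conditions): artifacts in fakes are attenuated.
test_b_x = [0.11, 0.19, 0.24, 0.30, 0.45, 0.75]
test_b_y = [0, 0, 0, 1, 1, 1]

threshold = fit_threshold(train_a_x, train_a_y)
acc_same = accuracy(test_a_x, test_a_y, threshold)
acc_cross = accuracy(test_b_x, test_b_y, threshold)

print(f"threshold fitted on A: {threshold:.2f}")     # 0.70
print(f"same-condition accuracy:  {acc_same:.2f}")   # 1.00
print(f"cross-database accuracy:  {acc_cross:.2f}")  # 0.67
```

The detector itself is unchanged between the two evaluations; only the test distribution moves, which is the effect cross-database protocols are designed to expose.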

Connections

Related to EANN: Event Adversarial Neural Networks for Multi-Modal Fake News Detection, which addresses multimodal fake news detection; both works share an emphasis on detection methods
Related to Deepfakes and Disinformation: Exploring the Impact of Synthetic Political Video on Deception, Uncertainty, and Trust in News on deepfakes as a disinformation threat
Cited by numerous papers on face forensics and manipulation detection in the biometrics literature
Complements Defending Against Neural Fake News: both address the detection of machine-generated content, but this survey focuses on facial content rather than generated text

Notes

This is a thorough technical survey with emphasis on evaluation benchmarks and the practical challenges of detection. The paper makes clear that most detection methods fail under realistic conditions (cross-dataset evaluation, different compression levels), highlighting the gap between controlled evaluations and real-world robustness. The discussion of open challenges remains highly relevant, particularly the evolution of GAN techniques that remove fingerprints and the need for large-scale public benchmarks. A key strength is the systematic categorization of manipulation types. A limitation is that generation techniques have evolved rapidly since publication, so some of the detection methods discussed have likely been outpaced.