DeepFakes and Beyond: A Survey of Face Manipulation and Fake Detection

Authors: Ruben Tolosana, Ruben Vera-Rodriguez, Julian Fierrez, Aythami Morales, Javier Ortega-Garcia

Venue: arXiv, 2020 — arXiv:2001.00179

TL;DR

This comprehensive survey reviews face manipulation techniques and detection methods in the era of deep learning. It categorizes facial manipulations into four types—entire face synthesis, identity swap, attribute manipulation, and expression swap—and surveys both generation and detection approaches across public datasets and benchmarks. The paper emphasizes the difficulty of detecting synthesized faces, the challenges of cross-domain generalization, and open problems in the field.

Contributions

  • Systematic categorization of four main face manipulation groups: i) entire face synthesis, ii) identity swap, iii) attribute manipulation, and iv) expression swap
  • Comprehensive review of manipulation techniques, focusing on deep learning approaches such as the GAN-based StyleGAN and DeepFakes
  • Survey of public databases for face manipulation research (100K-Faces, DFFD, iFakeFaceDB)
  • Review of detection methods and state-of-the-art benchmarks across manipulation types
  • Discussion of auxiliary manipulation techniques: face morphing, de-identification, audio-to-video and text-to-video synthesis
  • Identification of open challenges and future research directions

Methods

The paper surveys techniques across four manipulation categories:

Entire Face Synthesis: Uses GANs (ProGAN, StyleGAN) to generate photorealistic face images from scratch. StyleGAN achieves unsupervised separation of high-level attributes and enables intuitive control over generated image characteristics.

Identity Swap: Replaces one person's face in a video with another's using either classical computer-graphics techniques (e.g., FaceSwap) or deep learning approaches known as DeepFakes (e.g., DeepFaceLab). Unlike entire face synthesis, where manipulation is carried out at the image level, the goal here is to generate realistic fake videos.

Attribute Manipulation: Edits facial attributes such as hair or skin color, gender, age, or the addition of glasses, typically with GANs such as StarGAN. The manipulation is usually done through face attribute transfer.

Expression Swap: Also known as face reenactment, this modifies facial expressions by transferring one person's expression to another. Popular techniques include Face2Face and NeuralTextures, which replace the facial expression in a video while preserving identity.

Detection Approaches: The paper reviews detection methods including GAN-pipeline features (SVM), deep learning features (CNN), steganalysis features, and incremental learning approaches. Detection varies significantly by manipulation type and source.

Results

The survey compiles benchmark results across studies, showing state-of-the-art performance for each manipulation type:

  • Entire Face Synthesis: Best performance achieved with deep learning features (CNN + attention mechanisms), reaching >99% AUC in controlled scenarios
  • Identity Swap: Best detection achieves 99.4% AUC using deep learning with attention mechanisms on 150K fake faces from multiple GAN architectures
  • Attribute Manipulation: Detection accuracy varies (83.2%–96.8%) depending on the database and manipulation type
  • Expression Swap: Best performance reaches 100% AUC in controlled scenarios with EER = 0.1% (FaceForensics++, ProGAN, StyleGAN)

Key findings: i) current manipulations are easily detected in controlled scenarios, where fake detectors are evaluated under the same conditions they were trained on, ii) detection fails under unseen conditions (different compression levels, cross-database evaluation), and iii) most detection approaches depend heavily on the specific scenario and therefore generalize poorly.
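The AUC and EER figures quoted above are both derived from a detector's ROC curve, so the gap between controlled and unseen conditions can be illustrated with a small sketch. The code below is a minimal pure-Python illustration, not the evaluation code used by the survey or any of the reviewed studies; the function names (`roc_points`, `auc`, `eer`) and the toy scores are assumptions made for this example only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """Return ROC points, sweeping the decision threshold from high to low.

    labels: 1 = fake (positive class), 0 = real.
    scores: higher means "more likely fake".
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one real and one fake sample")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Point]) -> float:
    """Area under the ROC curve, computed with the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def eer(points: List[Point]) -> float:
    """Approximate Equal Error Rate: the ROC point where the false positive
    rate and the false negative rate (1 - TPR) are closest."""
    fpr, tpr = min(points, key=lambda p: abs(p[0] - (1.0 - p[1])))
    return (fpr + (1.0 - tpr)) / 2.0


# Toy scores (hypothetical, not taken from the survey).
labels = [1, 1, 0, 0]
same_condition_scores = [0.9, 0.8, 0.2, 0.1]   # fakes and reals well separated
unseen_condition_scores = [0.9, 0.4, 0.6, 0.1]  # one fake now scores below a real

for name, scores in [("same conditions", same_condition_scores),
                     ("unseen conditions", unseen_condition_scores)]:
    pts = roc_points(scores, labels)
    print(f"{name}: AUC={auc(pts):.2f}, EER={eer(pts):.2f}")
```

With so few samples the EER is only a coarse approximation; real evaluations interpolate over thousands of scores. The point of the sketch is that the same detector, scored under a shifted condition, yields a lower AUC and a higher EER.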

Notes

This is a thorough technical survey with emphasis on evaluation benchmarks and the practical challenges of detection. The paper makes clear that most detection methods fail under realistic conditions (cross-dataset, different compression levels), highlighting the gap between controlled evaluations and real-world robustness. The discussion of open challenges—particularly the evolution of GAN techniques that remove fingerprints and the need for large-scale public benchmarks—remains highly relevant. A key strength is the systematic categorization of manipulation types; a limitation is the rapid evolution of generation techniques that has likely surpassed some of the detection methods discussed.