Synthetic Media Detection

Synthetic media detection encompasses technical approaches for identifying media (images, video, audio) that have been created using generative models (GANs, diffusion models, etc.) or manipulated using facial reenactment, face-swapping, or other synthesis techniques. Detection is fundamentally an arms race: as generation quality improves, detection becomes harder.

Detection approaches

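Frequency-domain analysis: Many generated images carry artifacts that are visible in the Fourier spectrum, largely because the upsampling layers in generators (for example, transposed convolutions) leave periodic, grid-like patterns and distort the high-frequency end of the spectrum. Analyzing spectral statistics, such as the azimuthally averaged power spectrum, can reveal these generation artifacts.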
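
As a rough illustration of the spectral approach, the sketch below computes an azimuthally averaged log power spectrum with NumPy. The function name radial_power_spectrum and the bin count are assumptions made for this example, not part of any particular published detector; in practice the resulting profile would be fed to a classifier rather than inspected by hand.

```python
import numpy as np

def radial_power_spectrum(image: np.ndarray, n_bins: int = 64) -> np.ndarray:
    """Azimuthally averaged log power spectrum of a 2-D grayscale image.

    Hypothetical helper for illustration; n_bins is an arbitrary choice.
    """
    # 2-D FFT with the zero-frequency component shifted to the center.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    power = np.log1p(np.abs(spectrum) ** 2)

    # Distance of every frequency bin from the center of the spectrum.
    h, w = image.shape
    yy, xx = np.indices((h, w))
    radius = np.hypot(yy - h // 2, xx - w // 2)

    # Average the log power inside n_bins concentric rings.
    edges = np.linspace(0.0, radius.max() + 1e-9, n_bins + 1)
    ring = np.clip(np.digitize(radius.ravel(), edges) - 1, 0, n_bins - 1)
    sums = np.bincount(ring, weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(ring, minlength=n_bins)
    return sums / np.maximum(counts, 1)
```

The last entries of the returned profile summarize the highest spatial frequencies, which is where checkerboard-like upsampling patterns tend to show up as excess energy.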

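Learned features (deep learning): Neural networks (typically CNNs or vision transformers) are trained to distinguish real from synthetic media, learning discriminative features directly from data. XceptionNet is a widely used baseline, and such models report >95% accuracy on benchmark datasets when tested on the same manipulation methods and compression levels they were trained on; accuracy is typically lower on unseen methods and heavily compressed media.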

Behavioral inconsistencies: Checking for unnatural eye movements, irregular blinking patterns, asymmetric facial expressions, or impossible head pose trajectories that reveal synthesis.
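
One common way to quantify blinking is the eye aspect ratio (EAR), computed from six landmarks around each eye. The sketch below assumes the landmarks are already available from some face landmark detector (not shown) and are ordered p1..p6 with p1 and p4 at the eye corners; the threshold of 0.2 and the minimum run of 2 frames are illustrative assumptions that would need tuning per dataset.

```python
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """EAR for six (x, y) landmarks ordered p1..p6 around one eye."""
    p1, p2, p3, p4, p5, p6 = np.asarray(eye, dtype=float)
    # Two vertical eyelid distances over twice the horizontal eye width.
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return float(vertical / (2.0 * horizontal))

def count_blinks(ear_series, threshold: float = 0.2, min_frames: int = 2) -> int:
    """Count runs of at least min_frames consecutive frames with EAR below threshold.

    threshold and min_frames are assumed values for illustration.
    """
    blinks = 0
    run = 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    # Count a blink that is still in progress when the clip ends.
    if run >= min_frames:
        blinks += 1
    return blinks
```

A blink rate far below or above what is plausible for the clip length is only a weak cue on its own, so a count like this is better treated as one feature among several.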

Audio-visual synchronization: Detecting mismatches between lip movements and speech by analyzing temporal alignment of visual and acoustic features.
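
A very simple version of this check can be sketched as a cross-correlation between a per-frame mouth-opening signal and a per-frame audio energy envelope. Both inputs are assumed to be precomputed and sampled at the video frame rate; the function name estimate_av_offset and the max_lag default are hypothetical. Learned audio-visual embeddings are generally used in practice, so this hand-crafted version only illustrates the idea.

```python
import numpy as np

def estimate_av_offset(mouth_open, audio_energy, max_lag: int = 15):
    """Return (lag, correlation) maximizing the normalized cross-correlation.

    A positive lag means the audio trails the video by that many frames.
    Inputs are assumed to be equal-length 1-D sequences.
    """
    v = np.asarray(mouth_open, dtype=float)
    a = np.asarray(audio_energy, dtype=float)
    if v.shape != a.shape or v.ndim != 1:
        raise ValueError("inputs must be 1-D sequences of equal length")
    if max_lag >= v.size:
        raise ValueError("max_lag must be smaller than the sequence length")

    # Standardize both signals so the score behaves like a correlation.
    v = (v - v.mean()) / (v.std() + 1e-12)
    a = (a - a.mean()) / (a.std() + 1e-12)

    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = v[: v.size - lag], a[lag:]   # pairs v[t] with a[t + lag]
        else:
            x, y = v[-lag:], a[: a.size + lag]  # pairs v[t] with a[t + lag], lag < 0
        corr = float(np.mean(x * y))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr
```

A large estimated offset, or a low peak correlation at every lag, would both suggest that the lip motion and the speech track do not belong together.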

Forensic signals: Detecting camera noise patterns, sensor artifacts, lighting inconsistencies, or compression signatures that differ between synthetic and real videos.
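
To make the sensor-noise idea concrete, the sketch below extracts a noise residual (the image minus a smoothed copy) and correlates it with a reference camera fingerprint. This is a deliberately crude stand-in: it uses a 3x3 box filter as the denoiser and an additive fingerprint, which are assumptions chosen to keep the example short. The function names are hypothetical.

```python
import numpy as np

def noise_residual(image) -> np.ndarray:
    """Image minus a 3x3 box-filtered copy (a crude denoiser, assumed here)."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    padded = np.pad(img, 1, mode="reflect")
    smoothed = np.zeros_like(img)
    # Sum the nine shifted copies that make up the 3x3 neighborhood.
    for dy in range(3):
        for dx in range(3):
            smoothed += padded[dy:dy + h, dx:dx + w]
    smoothed /= 9.0
    return img - smoothed

def fingerprint_correlation(image, fingerprint) -> float:
    """Normalized correlation between an image's residual and a reference fingerprint."""
    r = noise_residual(image).ravel()
    f = np.asarray(fingerprint, dtype=float).ravel()
    r = r - r.mean()
    f = f - f.mean()
    denom = np.linalg.norm(r) * np.linalg.norm(f)
    return float(r @ f / denom) if denom > 0 else 0.0
```

Under these assumptions, a frame that really came from the reference camera should correlate noticeably with its fingerprint, while a fully synthesized frame should correlate near zero.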

Face recognition confidence paradox: Some findings suggest that face recognition systems can exhibit higher confidence on deepfakes than on genuine videos. One proposed explanation is that synthetic faces are "too perfect" in unnatural ways.

Key challenges

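Generalization: Detection models trained on one generation technique (e.g., Face2Face) often fail on others (e.g., FaceSwap, DeepFakes, StyleGAN), because they tend to learn method-specific artifacts rather than general traces of synthesis.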
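
One way to measure this gap is a leave-one-method-out protocol: train on fakes from all methods but one, then test on the held-out method. The sketch below only builds the splits; the function name and the input layout (a dict mapping method name to sample IDs) are assumptions for illustration, and real videos would need their own disjoint train/test split.

```python
def leave_one_method_out(fakes_by_method):
    """Yield (held_out_method, train_ids, test_ids) for each manipulation method.

    fakes_by_method: dict mapping a method name to a list of fake sample IDs
    (hypothetical layout chosen for this example).
    """
    for held_out in sorted(fakes_by_method):
        # Train on every method except the held-out one.
        train_ids = [
            sample_id
            for method, ids in sorted(fakes_by_method.items())
            if method != held_out
            for sample_id in ids
        ]
        # Test only on the method the model has never seen.
        test_ids = list(fakes_by_method[held_out])
        yield held_out, train_ids, test_ids
```

Comparing accuracy on each held-out fold with in-distribution accuracy gives a direct estimate of how much a detector depends on method-specific artifacts.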

Compression robustness: Re-encoding and resizing by social media platforms remove or mask the subtle, often high-frequency artifacts that many detectors rely on, which degrades detection performance significantly.

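Temporal consistency: Single-frame detection is unreliable, because per-frame predictions are noisy and some artifacts (such as flicker or identity drift) only appear across frames; robust detection requires analyzing temporal consistency across video sequences.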
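
The simplest step beyond single-frame decisions is to aggregate per-frame scores over time. The sketch below smooths per-frame fake probabilities with a moving average and reports the highest smoothed value, so that a sustained suspicious segment scores high while an isolated spike is damped. The function name and window size are assumptions, and this is score aggregation only, not a learned temporal model.

```python
import numpy as np

def video_score(frame_probs, window: int = 5) -> float:
    """Video-level score: maximum of a moving average over per-frame fake probabilities.

    window is an assumed smoothing length in frames.
    """
    p = np.asarray(frame_probs, dtype=float)
    if p.size == 0:
        raise ValueError("frame_probs must not be empty")
    window = max(1, min(window, p.size))
    kernel = np.ones(window) / window
    # 'valid' keeps only windows that lie fully inside the clip.
    smoothed = np.convolve(p, kernel, mode="valid")
    return float(smoothed.max())
```

With the default window, a single frame at 0.9 among frames at 0.1 yields a score of 0.26, whereas six consecutive frames at 0.9 yield 0.9.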

Scale: Large social media platforms handle very large volumes of video every day, so detection must run at scale with acceptable computational cost per video.

Key papers in this wiki