Multimedia Forensics

Multimedia forensics is the field concerned with authenticating digital media (images, video, audio) by identifying traces of manipulation, compression, acquisition conditions, or synthesis. It combines signal processing, computer vision, and machine learning to answer forensic questions: Is this media authentic? Has it been manipulated? Where and how was it modified?

Scope

Image forensics: Detecting splicing, copy-move forgery, inpainting, and synthetic generation in still images using frequency-domain analysis, noise residuals, and CNN-based approaches.

Video forensics: Extending image forensics to the temporal domain while addressing the challenges of compression, motion, and computational complexity. Methods include frame-level analysis, optical flow, biological signals, and temporal recurrence.

Audio forensics: Detecting voice synthesis, speech deepfakes, audio splicing, and attacks on speaker verification systems.

Multimodal analysis: Detecting inconsistencies across modalities, such as audio-visual desynchronization that reveals a lip-sync mismatch.
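As an illustration of the audio-visual desynchronization check mentioned under multimodal analysis, the following is a minimal sketch, not a method taken from any particular paper. It assumes two per-frame signals have already been extracted by some upstream step: a mouth-opening measure from the video and an energy envelope from the audio. The function name `estimate_lag`, the signal names, and the synthetic data are all hypothetical. The sketch estimates the offset between the two signals by maximizing a normalized cross-correlation over candidate lags.

```python
import math
import random


def estimate_lag(video_signal, audio_signal, max_lag):
    """Return (lag, score): the shift of the audio relative to the video,
    in frames, that maximizes the normalized cross-correlation.
    A positive lag means the audio trails the video."""
    def zero_mean(xs):
        m = sum(xs) / len(xs)
        return [x - m for x in xs]

    v = zero_mean(video_signal)
    a = zero_mean(audio_signal)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Pair video frame t with audio frame t + lag, keeping only the overlap.
        pairs = [(v[t], a[t + lag]) for t in range(len(v)) if 0 <= t + lag < len(a)]
        num = sum(x * y for x, y in pairs)
        den = math.sqrt(sum(x * x for x, _ in pairs) * sum(y * y for _, y in pairs))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


# Synthetic stand-in data (an assumption, not real measurements):
# the audio envelope is the mouth-opening signal delayed by 3 frames.
rng = random.Random(0)
base = [rng.random() for _ in range(203)]
mouth_opening = base[3:]      # hypothetical per-frame mouth-opening measure
audio_envelope = base[:200]   # same pattern, arriving 3 frames later

lag, score = estimate_lag(mouth_opening, audio_envelope, max_lag=10)
print(f"estimated audio lag: {lag} frames (correlation {score:.3f})")
```

In practice a detector would compare the estimated lag, or the peak correlation, against a tolerance chosen for the frame rate and content. That threshold is a design decision the sketch leaves open.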

Key challenges

Generalization: Forensic methods trained on one generative model or compression codec often fail on newer techniques or different conditions.

Adversarial robustness: Adversarial perturbations crafted against a forensic detector can fool it into accepting manipulated media as authentic. The arms race between generators and detectors is ongoing.

Compression degradation: Video and audio compression can attenuate or destroy subtle forensic signals. Robust methods must still work on compressed media.

Computational efficiency: Real-time detection at scale is computationally expensive; simpler models sacrifice accuracy for speed.
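To make the adversarial robustness challenge concrete, here is a minimal sketch of a single fast gradient sign method (FGSM) step against a toy logistic detector. Everything in it is an assumption made for illustration: the detector, its weights `w` and bias `b`, the four-dimensional feature vector, and the budget `eps` are hypothetical, and real forensic detectors are typically deep networks operating on pixels or spectrograms rather than a linear model on a handful of features.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def detector_score(x, w, b):
    """Probability that feature vector x is 'manipulated' under a toy
    logistic detector (hypothetical, for illustration only)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm_evasion(x, w, b, eps):
    """One FGSM step that increases the cross-entropy loss for the true
    label 'manipulated' (y = 1), moving each feature by at most eps."""
    p = detector_score(x, w, b)
    # For a logistic model with y = 1, d(loss)/dx_i = (p - 1) * w_i.
    grad = [(p - 1.0) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical detector parameters and input features.
w = [2.0, -1.5, 0.5, 3.0]
b = -1.5
x = [0.6, 0.1, 0.4, 0.5]
eps = 0.2

x_adv = fgsm_evasion(x, w, b, eps)
score_before = detector_score(x, w, b)
score_after = detector_score(x_adv, w, b)
print(f"score before: {score_before:.3f}, after: {score_after:.3f}")
```

With these toy numbers, a bounded change to every feature is enough to push the score from the "manipulated" side of a 0.5 threshold to the "authentic" side, which is the failure mode the robustness challenge describes.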

Key papers in this wiki