SwapMe and FaceSwap Dataset
A face tampering dataset introduced in the paper Two-Stream Neural Networks for Tampered Face Detection. It was created with two face-swapping applications: the commercial SwapMe iOS app and the open-source FaceSwap software.
Composition
Tampered images: 2010 total
- 1005 images generated using the SwapMe iOS app
- 1005 images generated using the FaceSwap open-source software
Authentic images: 2800 total
- 1400 per subset (one subset per swapping application)
Total: 4810 images
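As a quick consistency check on the counts above, the following minimal sketch recomputes the totals. The dictionary and function names are illustrative only and are not part of the dataset or the paper.

```python
# Image counts as stated in the Composition section.
TAMPERED = {"SwapMe": 1005, "FaceSwap": 1005}
AUTHENTIC = {"SwapMe": 1400, "FaceSwap": 1400}


def dataset_totals(tampered, authentic):
    """Return (tampered_total, authentic_total, grand_total)."""
    t = sum(tampered.values())
    a = sum(authentic.values())
    return t, a, t + a


totals = dataset_totals(TAMPERED, AUTHENTIC)
print(totals)  # (2010, 2800, 4810)
```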
Creation methodology
- Selected 1005 target-source face pairs
- Generated one tampered image per pair using each application
- Applied post-processing including:
  - Boundary blurring for seamless blending
  - Image resizing to match context
  - Color/illumination blending
Only high-quality results were retained to ensure realistic tampering.
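To make the boundary-blurring step more concrete, here is a minimal sketch of feathered alpha blending on small grayscale patches. It is an illustration under stated assumptions, not the procedure used by SwapMe or FaceSwap: the function names, the linear ramp, and the `border` parameter are all hypothetical, and resizing and color matching are not covered.

```python
def feather_mask(height, width, border):
    """Alpha mask: 0 at the patch edge, rising linearly to 1 `border` pixels in.

    Assumes border > 0. This is an illustrative choice of ramp, not the
    blending used by either application.
    """
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Distance (in pixels) from this pixel to the nearest patch edge.
            dist = min(y, x, height - 1 - y, width - 1 - x)
            row.append(min(1.0, dist / border))
        mask.append(row)
    return mask


def blend(target, source, mask):
    """Per-pixel mix: mask * source + (1 - mask) * target (same-size patches)."""
    return [
        [m * s + (1 - m) * t for t, s, m in zip(trow, srow, mrow)]
        for trow, srow, mrow in zip(target, source, mask)
    ]


# Toy 5x5 patches: a uniform target region and a brighter swapped-in face.
target = [[100.0] * 5 for _ in range(5)]
source = [[200.0] * 5 for _ in range(5)]
mask = feather_mask(5, 5, border=2)
blended = blend(target, source, mask)
for row in blended:
    print(row)
```

The blended patch keeps the target's intensity at its outer edge and reaches the source's intensity only in the interior, which is what hides a hard seam around the swapped face.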
Dataset characteristics
Advantages:
- Large scale: 2010 tampered images suitable for training deep learning methods
- High visual quality: Tampering is realistic and difficult to detect through visual inspection alone
- Diverse content: Images cover diverse events (holidays, sports, conferences) and identities (different ages, genders, races)
- Two algorithms: Uses two different face-swapping techniques to avoid overfitting to algorithm-specific artifacts
- Face-specific: Focuses on facial regions rather than general image tampering
Challenges:
- Limited to face regions in images with clear faces
- Only two swapping algorithms (may not generalize to other techniques)
- Not publicly released (as of the paper's publication in 2018)
Evaluation protocol
The paper uses a cross-algorithm training and testing protocol:
- Train on FaceSwap subset (705 tampered + 1400 authentic images)
- Test on SwapMe subset (300 tampered + 300 authentic images)
Because the test images come from a swapping application never seen during training, a model that only learns FaceSwap-specific artifacts will score poorly. The protocol therefore measures generalization to a different face-swapping technique rather than within-algorithm accuracy.
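The cross-algorithm split can be sketched as follows. The record layout `(image_id, subset, label)` and the helper names are hypothetical, since the dataset has no public file format. Only the image counts come from the protocol above.

```python
from collections import Counter


def make_records(subset, n_tampered, n_authentic):
    """Build placeholder records: (image_id, subset, label), label 1 = tampered."""
    tampered = [(f"{subset}_t{i}", subset, 1) for i in range(n_tampered)]
    authentic = [(f"{subset}_a{i}", subset, 0) for i in range(n_authentic)]
    return tampered + authentic


def cross_algorithm_split(records, train_subset, test_subset):
    """Train on one subset and test on the other, so no algorithm is shared."""
    if train_subset == test_subset:
        raise ValueError("train and test subsets must differ")
    train = [r for r in records if r[1] == train_subset]
    test = [r for r in records if r[1] == test_subset]
    return train, test


# Counts taken from the evaluation protocol described above.
records = make_records("FaceSwap", 705, 1400) + make_records("SwapMe", 300, 300)
train, test = cross_algorithm_split(records, "FaceSwap", "SwapMe")

print(len(train), Counter(label for _, _, label in train))
print(len(test), Counter(label for _, _, label in test))
```

Splitting by subset rather than by random sampling is the design choice that matters here: a random split would mix both algorithms into training and testing and hide any algorithm-specific overfitting.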
Baseline results
Face-level AUC on SwapMe test set:
- Two-stream network (proposed): 0.927
- Face classification stream alone: 0.854
- Patch triplet stream alone: 0.875
- Steganalysis features + SVM: 0.794
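For readers unfamiliar with the metric, face-level AUC can be read as the probability that a randomly chosen tampered face receives a higher score than a randomly chosen authentic face. The sketch below computes it directly from that definition. The labels and scores are made-up toy values, not outputs of any model in the paper, and the function name is illustrative.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5).

    labels: 1 for tampered, 0 for authentic; scores: higher means more
    likely tampered. Quadratic in the number of samples, so this is only
    suitable for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: three tampered and three authentic faces.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

In this toy example eight of the nine tampered/authentic pairs are ranked correctly, so the AUC is 8/9. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.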
Related datasets
- FaceForensics++ (FaceForensics++: Learning to Detect Manipulated Facial Images): Larger dataset with 1.8M+ images and multiple manipulation methods
- DFDC Dataset (The DeepFake Detection Challenge (DFDC) Dataset): Video dataset from the DeepFake Detection Challenge
- CASIA 1.0 / 2.0: General image splicing and manipulation datasets (not face-specific)
- Columbia Image Splicing Detection: Small dataset (180 spliced + 183 authentic images)
Notes
The SwapMe/FaceSwap dataset addresses limitations of prior face tampering datasets by focusing specifically on faces and using realistic post-processing. However, the lack of public release limits its broader impact compared to datasets like FaceForensics++ and DFDC, which became standard benchmarks for the field.