Detecting GAN-generated Imagery using Color Cues
Authors: Scott McCloskey, Michael Albright
Affiliation: Honeywell ACST
Venue: arXiv, 2018 (arxiv:1812.08247)
TL;DR
This paper identifies two forensic cues for distinguishing GAN-generated images from real camera imagery, both derived from how generator networks form color. The first cue exploits the generator's color-channel weights, which overlap heavily and include negative values, unlike camera spectral sensitivities; the second exploits the fact that normalization inside the generator makes saturated pixels rare. The saturation-based method achieves an AUC of 0.70 on fully synthetic images and 0.61 on face-swapped images.
Contributions
- Analysis of GAN generator architecture, focusing on the color formation layer that collapses K > 3 depth channels to RGB
- Identification of two architectural properties that differ from real cameras:
  - Generator color weights overlap and include negative values, unlike camera spectral response functions
  - Normalization steps (pixel-wise or layer-wise) constrain output values to a uniform distribution, preventing the saturation and under-exposure regions present in natural images
- Two forensic detection methods based on these cues: color chromaticity histograms (r-g space) and saturation frequency features (over- and under-exposed pixel counts)
- Experimental evaluation on NIST Media Forensics Challenge 2018 datasets
Method
The paper analyzes two stages of GAN generation:
Color image formation: The generator's final layer converts K > 3 depth feature maps to RGB via a 1×1 convolution (or similar). This is conceptually similar to a camera's Bayer color filter array collapsing spectral information to RGB. However, the learned weights differ fundamentally from spectral response functions: they have negative values, exhibit high overlap across channels, and share common peaks. Real camera spectral sensitivities are non-negative, have limited overlap, and peak at different wavelengths.
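To make the color-formation step concrete, here is a minimal pure-Python sketch that treats the final layer as a 1×1 convolution (a per-pixel 3×K matrix product, with no spatial mixing) and summarizes the three weight properties discussed above: the fraction of negative weights, the pairwise overlap between channel rows (measured here by cosine similarity, which is our choice, not necessarily the paper's), and the index of each row's peak. The weight matrices are hypothetical: Gaussian curves with separated peaks stand in for camera-like sensitivities, and one shared curve shifted below zero stands in for generator-like weights. Neither comes from the paper or from a trained network, and K = 16 is arbitrary.

```python
import math


def to_rgb_1x1(features, weights, bias):
    """A 1x1 convolution from K feature channels to RGB.

    features: H x W x K nested lists (one K-vector per pixel)
    weights:  3 x K matrix, one row per output channel (R, G, B)
    bias:     length-3 list
    Each output pixel is weights @ pixel + bias; neighbouring pixels never mix.
    """
    return [
        [[sum(w * f for w, f in zip(row, pixel)) + b for row, b in zip(weights, bias)]
         for pixel in line]
        for line in features
    ]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def weight_stats(weights):
    """Properties the paper contrasts with camera spectral sensitivities."""
    flat = [w for row in weights for w in row]
    negative_fraction = sum(w < 0 for w in flat) / len(flat)
    # Pairwise shape overlap between the R, G and B rows (1.0 = identical shape).
    overlaps = [cosine(weights[i], weights[j]) for i in range(3) for j in range(i + 1, 3)]
    # Channel index of each row's largest weight (its "peak").
    peaks = [max(range(len(row)), key=row.__getitem__) for row in weights]
    return negative_fraction, overlaps, peaks


def gaussian_row(K, center, width):
    return [math.exp(-(((k - center) / width) ** 2)) for k in range(K)]


K = 16
# Hypothetical camera-like weights: non-negative curves with peaks spread apart.
camera_like = [gaussian_row(K, c, 2.0) for c in (12, 8, 4)]
# Hypothetical generator-like weights: one shared curve that dips below zero,
# rescaled per channel, so the rows overlap completely and share a peak.
base = [v - 0.3 for v in gaussian_row(K, 8, 4.0)]
generator_like = [[s * v for v in base] for s in (1.0, 0.9, 0.8)]

for name, w in (("camera-like", camera_like), ("generator-like", generator_like)):
    neg, overlaps, peaks = weight_stats(w)
    print(name, round(neg, 3), [round(o, 3) for o in overlaps], peaks)
```

The camera-like rows come out non-negative, weakly overlapping and peaked at different channels; the generator-like rows contain negative values, overlap almost perfectly and share one peak, which is the contrast the paper draws.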
Normalization effects: Both pixel-wise normalization (dividing each pixel's K-dimensional feature vector by its magnitude) and layer-wise normalization (standardizing each depth plane) constrain feature values to a fixed range before RGB conversion. The resulting output intensities are well-behaved and roughly uniformly distributed, so generated pixels rarely reach the extremes of the intensity range. Real cameras, by contrast, receive irradiance with a logarithmic distribution that often exceeds the sensor's limited dynamic range, producing regions of saturation (over-exposure) and under-exposure.
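The bounded-range argument can be checked numerically. The sketch below implements pixel-wise normalization following the Progressive GAN formulation, which divides each K-vector by its root-mean-square (this differs from dividing by the raw magnitude only by a constant factor of √K), plus layer-wise standardization of a single flattened depth plane. After pixel normalization every feature vector has norm at most √K, so by the Cauchy-Schwarz inequality a color-layer output w·a never exceeds ‖w‖√K in magnitude, however large the raw activations were. The weights, `EPS`, K = 16 and the random inputs are assumptions; the sketch ignores the color layer's bias and assumes normalization is the last operation before it.

```python
import math
import random

EPS = 1e-8  # assumed small constant guarding against division by zero


def pixel_norm(pixel):
    """Progressive-GAN-style pixel norm: divide a K-vector by its root-mean-square."""
    rms = math.sqrt(sum(a * a for a in pixel) / len(pixel) + EPS)
    return [a / rms for a in pixel]


def standardize_plane(plane):
    """Layer-wise normalization of one flattened depth plane: zero mean, unit variance."""
    n = len(plane)
    mean = sum(plane) / n
    var = sum((x - mean) ** 2 for x in plane) / n
    return [(x - mean) / math.sqrt(var + EPS) for x in plane]


def rgb_bound(weight_row, K):
    """After pixel_norm, ||a|| <= sqrt(K), so |w . a| <= ||w|| * sqrt(K) (Cauchy-Schwarz)."""
    return math.sqrt(sum(w * w for w in weight_row)) * math.sqrt(K)


rng = random.Random(0)
K = 16
w_red = [rng.uniform(-1, 1) for _ in range(K)]  # hypothetical colour-layer weights

# Raw feature vectors whose scale varies over eight orders of magnitude...
worst = 0.0
for _ in range(1000):
    scale = 10 ** rng.uniform(-4, 4)
    a = [scale * rng.gauss(0, 1) for _ in range(K)]
    out = sum(w * x for w, x in zip(w_red, pixel_norm(a)))
    worst = max(worst, abs(out))

# ...still yield colour-layer outputs inside a fixed, input-independent range.
print(f"largest |output| = {worst:.3f}, bound = {rgb_bound(w_red, K):.3f}")
```

A fixed bound like this is compatible with, but does not by itself prove, the paper's argument that generated pixels rarely clip; that also depends on how the trained layer maps this range onto 8-bit values.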
Two detection pipelines follow:
- Color forensics: Compute r-g chromaticity coordinates (r = R/(R+G+B), g = G/(R+G+B)) for each pixel and build a 2D histogram. Fine-tune a pre-trained Intensity Noise Histogram (INH) network (from prior work) on these chromaticity histograms to classify each image as GAN-generated or camera-captured.
- Saturation forensics: For each over-exposure threshold τ_o ∈ {240, 245, 250, 255}, count the pixels with intensity ≥ τ_o; for each under-exposure threshold τ_u ∈ {0, 5, 10, 15}, count the pixels with intensity ≤ τ_u. Train a linear SVM on the resulting eight features.
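A minimal sketch of the chromaticity features used by the color-forensics pipeline, assuming 8-bit RGB pixels supplied as tuples. The 32-bin resolution and the decision to skip pure-black pixels (whose chromaticity is undefined) are our assumptions; the INH network that consumes the histogram is not reproduced here.

```python
def chromaticity_histogram(pixels, bins=32):
    """2-D r-g chromaticity histogram, normalised to sum to 1.

    pixels: iterable of (R, G, B) tuples with non-negative values.
    Pixels with R + G + B == 0 have no defined chromaticity and are skipped.
    """
    hist = [[0.0] * bins for _ in range(bins)]
    total = 0
    for R, G, B in pixels:
        s = R + G + B
        if s == 0:
            continue
        r, g = R / s, G / s
        # min() keeps r == 1.0 or g == 1.0 inside the last bin.
        i = min(int(r * bins), bins - 1)
        j = min(int(g * bins), bins - 1)
        hist[i][j] += 1
        total += 1
    if total:
        hist = [[c / total for c in row] for row in hist]
    return hist


example = chromaticity_histogram([(255, 0, 0), (0, 255, 0), (10, 10, 10), (0, 0, 0)])
print(example[31][0], example[0][31], example[10][10])
```

Because r + g ≤ 1, only the lower-left triangle of the histogram can ever be populated; brightness is divided out, so the features depend only on how color is distributed across channels.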
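The saturation pipeline can be sketched end to end in pure Python. Two substitutions are ours, not the paper's: each count is expressed as a percentage of the image's pixels so that images of different sizes are comparable, and the linear SVM is trained with Pegasos (stochastic sub-gradient descent on the primal hinge-loss objective) instead of a library solver. The `synthetic_image` helper and its data are hypothetical stand-ins: "camera" images get some pixels clipped at 0 and 255 while "GAN" images get none, which mirrors the paper's hypothesis rather than real measurements. Intensities are assumed to be already-computed 8-bit values.

```python
import random

OVER_THRESHOLDS = (240, 245, 250, 255)  # tau_o
UNDER_THRESHOLDS = (0, 5, 10, 15)       # tau_u


def saturation_features(intensities):
    """Eight features: % of pixels >= each tau_o, then % of pixels <= each tau_u."""
    n = len(intensities)
    over = [100.0 * sum(v >= t for v in intensities) / n for t in OVER_THRESHOLDS]
    under = [100.0 * sum(v <= t for v in intensities) / n for t in UNDER_THRESHOLDS]
    return over + under


def train_linear_svm(X, y, lam=0.01, epochs=300, seed=0):
    """Pegasos on lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)), labels y in {+1, -1}.

    A constant 1.0 is appended to each x so the last weight acts as a bias
    (regularised like the others, a simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # gradient step on the L2 term
            if margin < 1.0:  # hinge loss active: move toward the violating example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def decision(w, x):
    """Signed SVM score; > 0 means 'GAN' under the labelling used below."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def synthetic_image(n_pixels, n_clipped, rng):
    """Hypothetical intensities: n_clipped pixels at 255, n_clipped at 0, rest in [20, 235]."""
    mid = [rng.randint(20, 235) for _ in range(n_pixels - 2 * n_clipped)]
    return mid + [255] * n_clipped + [0] * n_clipped


rng = random.Random(1)
camera = [synthetic_image(1000, 30 + 10 * i, rng) for i in range(10)]  # some clipping
gan = [synthetic_image(1000, 0, rng) for _ in range(10)]               # no clipping
X = [saturation_features(img) for img in camera + gan]
y = [-1] * len(camera) + [+1] * len(gan)
w = train_linear_svm(X, y)
print([round(decision(w, x), 2) for x in X])
```

In practice a library solver such as scikit-learn's `LinearSVC` would replace `train_linear_svm`; Pegasos is used here only to keep the sketch dependency-free.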
Results
Saturation-based detection:
- GAN Crop (fully synthetic regions): 0.70 AUC
- GAN Full (camera images with synthetic face replacements): 0.61 AUC
- Over-exposure features alone outperform combined over- and under-exposure features (0.70 vs. 0.67 AUC on GAN Crop)
Color image forensics:
- GAN Crop: 0.56 AUC
- GAN Full: 0.54 AUC
- Performance is little better than chance, likely because the pre-trained network expects different image statistics or because the evaluation set contains re-touched camera images
Training data: 1,387 GAN images generated by Progressive GAN models for 30 LSUN categories, plus camera images from ImageNet.
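All results above are reported as ROC AUC. As a reminder of what the numbers mean, the sketch below computes AUC directly as the Mann-Whitney statistic: the fraction of (GAN, camera) pairs in which the GAN image receives the higher detector score, with ties counted as half. An AUC of 0.5 is chance level, which is why 0.54 and 0.56 read as little better than random. The scores are made up for illustration, and we assume higher scores mean "more likely GAN".

```python
def roc_auc(scores_pos, scores_neg):
    """ROC AUC as the Mann-Whitney statistic: the probability that a randomly chosen
    positive scores higher than a randomly chosen negative, counting ties as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical detector scores (higher = "more likely GAN").
gan_scores = [0.9, 0.8, 0.4]
camera_scores = [0.5, 0.3]
print(roc_auc(gan_scores, camera_scores))  # 5 of 6 pairs ranked correctly
```

The pairwise loop is O(n·m) and is meant only to show the definition; sorting-based implementations compute the same value in O((n + m) log(n + m)).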
Connections
- Complementary to deepfake detection methods that target semantic artifacts (eye blinking, mismatched eye colors)
- Related to forensic methods that statistically analyze image formation; contrasts with learning-based forensic detectors, which risk being circumvented by adversarial fine-tuning of the GAN
- Addresses synthetic media detection in the context of online disinformation
Notes
Strengths: The architectural analysis is insightful; the contrast between generator color weights and camera spectral sensitivities is well illustrated. The saturation method is simple, interpretable, and achieves reasonable performance on fully synthetic images. Its focus on architectural properties common across GANs (rather than image-specific artifacts) could generalize better as generator designs evolve.
Limitations: Color forensics underperformed, likely because the pre-trained INH network expects different input statistics or because the evaluation set contains re-touched camera images. The saturation signal is diluted on partially GAN-modified images (GAN Full), where synthetic regions cover only a small part of the frame and the rest of the image carries ordinary camera statistics. The paper does not discuss robustness to image compression or other post-processing. Both methods are trained on images from a single GAN architecture (Progressive GAN) and may not transfer to other generative models.
Future work: The authors acknowledge the rapid pace of GAN innovation and the risk of their forensic cues becoming obsolete as architectures evolve. Re-training the full color-forensics network with more data, testing on other GAN variants, and examining robustness under compression are natural next steps.