Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data
Authors: Nahema Marchal, Rachel Xu, Rasmi Elasmar, Iason Gabriel, Beth Goldberg, William Isaac
Affiliation: Google DeepMind, Google, Jigsaw
Venue: arXiv, June 2024 — 2406.13843
TL;DR
This paper presents the first comprehensive taxonomy of generative AI (GenAI) misuse tactics grounded in 191 documented real-world incidents (January 2023–March 2024). The authors identify two broad categories, exploitation of GenAI capabilities and compromise of GenAI systems, which together comprise 18 distinct tactics. Despite widespread concern about sophisticated attacks, real-world misuse predominantly relies on low-tech, easily accessible tactics (such as impersonation and falsification) in pursuit of goals like opinion manipulation and monetization. Because these tactics require little technical expertise, they lower the barrier to entry for a broad pool of actors.
Contributions
- A taxonomy distinguishing between tactics that exploit GenAI capabilities (10 tactics) and those that compromise GenAI systems (8 tactics), grounded in empirical analysis of media reports
- Empirical evidence that most real-world GenAI misuse is not technically sophisticated but relies on accessible capabilities (text, image, audio, video generation)
- Identification of five primary goals driving GenAI misuse: opinion manipulation (27%), monetization & profit (21%), scam & fraud (18%), harassment (6%), and reach (3.6%)
- Documentation of how different modalities (text, image, audio, video) are leveraged across tactics and goals, with temporal snapshots of emerging patterns (January 2023–March 2024)
- Policy-relevant findings emphasizing the need for multi-faceted mitigation approaches beyond purely technical defenses
Method
The authors conducted a two-phase study: (1) a review of the academic and grey literature on GenAI misuse, and (2) a qualitative analysis of a dataset of 191 media reports documenting real-world GenAI misuse incidents published between January 2023 and March 2024. They ran keyword-based searches across social media, blogs, news outlets, and academic sources, then de-duplicated and manually reviewed each case. Two independent reviewers coded each incident for misuse tactics, actors, goals, and strategies; disagreements were resolved through consensus discussion. The taxonomy was iteratively refined as patterns emerged.
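The de-duplication and double-coding steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `Report` fields, the title-based matching rule, and the helper names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One media report. Field names are hypothetical."""
    url: str
    title: str


def normalise(title: str) -> str:
    """Lower-case and collapse whitespace so near-identical titles match."""
    return " ".join(title.lower().split())


def deduplicate(reports: list[Report]) -> list[Report]:
    """Keep the first report seen for each normalised title (assumed rule)."""
    seen: set[str] = set()
    unique: list[Report] = []
    for report in reports:
        key = normalise(report.title)
        if key not in seen:
            seen.add(key)
            unique.append(report)
    return unique


def needs_discussion(codes_a: dict[str, str], codes_b: dict[str, str]) -> list[str]:
    """Return the coding fields on which the two reviewers disagree."""
    fields = codes_a.keys() | codes_b.keys()
    return sorted(f for f in fields if codes_a.get(f) != codes_b.get(f))


# Toy inputs, invented for illustration only.
reports = [
    Report("https://example.com/a", "Deepfake robocall  targets voters"),
    Report("https://example.org/b", "deepfake robocall targets voters"),
    Report("https://example.net/c", "AI-written reviews flood marketplace"),
]
unique = deduplicate(reports)

reviewer_a = {"tactic": "Impersonation", "goal": "Scam & Fraud"}
reviewer_b = {"tactic": "Impersonation", "goal": "Opinion Manipulation"}
to_discuss = needs_discussion(reviewer_a, reviewer_b)
```

Here the two robocall reports collapse into one entry, and only the `goal` field is flagged for consensus discussion. A real study would likely need fuzzier matching than exact normalised titles.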
Taxonomy of Tactics
Exploitation of GenAI Capabilities (10 tactics):
Realistic depictions of humans:
- Impersonation: Creating real-time deepfakes of identified individuals to take action on their behalf (audio, video)
- Appropriated Likeness: Using a person's likeness without consent in static depictions
- Sockpuppeting: Creating synthetic online personas or accounts
- Non-consensual Intimate Imagery (NCII): Using GenAI to create sexually explicit content with real people's likenesses
- Child Sexual Abuse Material (CSAM): Using GenAI to create sexually explicit material depicting children
Realistic depictions of content:
- Falsification: Fabricating or falsely representing evidence (reports, IDs, documents, images of events)
- Intellectual Property (IP) Infringement: Using a person's intellectual property without permission, for example by creating unlicensed copies
- Counterfeit: Reproducing original works, brands, or styles to pass as authentic
Scaling and distribution:
- Scaling & Amplification: Automating or amplifying content workflows; high-volume distribution of synthetic content
- Targeting & Personalisation: Refining outputs to target individuals with tailored attacks
Compromise of GenAI Systems (8 tactics):
Model integrity:
- Adversarial Input: Adding imperceptible perturbations to cause model errors
- Prompt Injection: Manipulating text instructions to cause unintended outputs
- Jailbreaking: Bypassing safety filters and restrictions
- Model Diversion: Repurposing pre-trained models for unintended functionality
- Model Extraction: Obtaining model hyperparameters, architecture, or parameters
- Steganography: Hiding coded messages in model output to communicate covertly
Data integrity:
- Data Poisoning: Corrupting training data to derail learning or introduce vulnerabilities
- Privacy Compromise: Revealing sensitive or private information from training data
- Data Exfiltration: Illicitly obtaining training data used to build proprietary models
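For anyone coding incidents against this taxonomy, the category → group → tactic hierarchy maps naturally onto a nested lookup table. The sketch below is one possible encoding, not something the paper provides. It fills in only a couple of tactics per group to stay short, and `TAXONOMY` and `locate` are hypothetical names.

```python
# Partial, illustrative encoding of the taxonomy: category -> group -> tactics.
# Only a few tactics per group are listed; extend the lists as needed.
TAXONOMY: dict[str, dict[str, list[str]]] = {
    "Exploitation of GenAI Capabilities": {
        "Realistic depictions of humans": ["Impersonation", "Sockpuppeting"],
        "Realistic depictions of content": ["Falsification", "Counterfeit"],
        "Scaling and distribution": [
            "Scaling & Amplification",
            "Targeting & Personalisation",
        ],
    },
    "Compromise of GenAI Systems": {
        "Model integrity": ["Prompt Injection", "Jailbreaking"],
        "Data integrity": ["Data Poisoning", "Data Exfiltration"],
    },
}


def locate(tactic: str) -> tuple[str, str]:
    """Return the (category, group) a tactic belongs to, or raise KeyError."""
    for category, groups in TAXONOMY.items():
        for group, tactics in groups.items():
            if tactic in tactics:
                return category, group
    raise KeyError(f"unknown tactic: {tactic}")


category, group = locate("Jailbreaking")
```

Keeping the hierarchy in one table means a coded incident needs to store only the tactic name; the broader category can be derived when aggregating.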
Results
Prevalence and modalities: Nearly 90% of documented cases involved exploitation of GenAI capabilities rather than direct system compromise. Impersonation (22% of cases, primarily audio and video), Sockpuppeting (13%), Scaling & Amplification (12%), and Falsification (12%) were most prevalent. Modalities varied by tactic: impersonation leveraged audio (28 cases) and video (21 cases); sockpuppeting and scaling & amplification relied on image (17 and 15 cases, respectively) and text (18 and 24 cases, respectively); falsification mixed image (16 cases) and text (12 cases). System compromise attacks (prompt injection, adversarial input, jailbreaking) were documented in research demonstrations, but the media reported only 2 real-world instances of a deployed model being compromised.
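Prevalence figures of this kind can be derived from coded incidents with a simple tally. The sketch below uses invented toy records, not the paper's dataset, and assumes each incident may carry several tactics and modalities, so shares need not sum to 1. Whether the paper computes its percentages exactly this way is an assumption.

```python
from collections import Counter

# Hypothetical toy records for illustration; NOT the paper's data.
incidents = [
    {"tactics": ["Impersonation"], "modalities": ["audio"]},
    {"tactics": ["Impersonation", "Falsification"], "modalities": ["video"]},
    {"tactics": ["Sockpuppeting", "Scaling & Amplification"],
     "modalities": ["text", "image"]},
    {"tactics": ["Falsification"], "modalities": ["image"]},
]


def tactic_shares(records: list[dict]) -> dict[str, float]:
    """Fraction of incidents in which each tactic appears at least once."""
    counts = Counter(t for r in records for t in set(r["tactics"]))
    total = len(records)
    return {tactic: n / total for tactic, n in counts.items()}


def modality_counts(records: list[dict], tactic: str) -> Counter:
    """Number of incidents per modality among those coded with `tactic`."""
    return Counter(
        m for r in records if tactic in r["tactics"] for m in set(r["modalities"])
    )


shares = tactic_shares(incidents)
impersonation_modalities = modality_counts(incidents, "Impersonation")
```

Because one incident can count toward several tactics and several modalities, per-modality case counts for a tactic can exceed what a single-label reading of the percentages would suggest.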
Goals and strategies: Misuse operators pursued five primary goals:
1. Opinion Manipulation (27%): distorting public perception through emotional synthetic imagery, impersonating politicians, creating grassroots support campaigns; often paired with political campaigns and disinformation
2. Monetization & Profit (21%): content farming, generating NCII for sale, celebrity fraud schemes; leveraged scaling and falsification
3. Scam & Fraud (18%): identity fraud, phishing via deepfakes, investment/crypto scams; highly personalized attacks
4. Harassment (6%): bullying, NCII targeting minors, defamation of public figures; primarily NCII generation
5. Reach (3.6%): digital resurrection, "raising awareness"; creation of synthetic personas for advocacy
Actors ranged from state-sponsored entities to private citizens; most misuse did not require significant technical expertise or background.
Key Findings
- Low-tech dominates: Most real-world GenAI misuse exploits easily accessible capabilities (generation of images, text, audio) for purposes long predating GenAI (impersonation, forgery, scams). Sophisticated technical attacks on AI systems are largely limited to academic demonstrations.
- Accessibility lowers barriers: GenAI's widespread availability, ease of use, and intuitive interfaces have democratized previously costly tactics, enabling a broader pool of actors with minimal technical background to engage in misuse.
- New ethical frontiers: Beyond uses already covered by policies against CSAM and NCII, emerging use cases blur the line between authenticity and deception. Synthetic media in political communication, AI-written content at scale, and targeted manipulation do not explicitly violate terms of service but raise ethical concerns around disclosure and consent.
- Multimodal threat landscape: Text-to-image, text-to-video, and text-to-audio models enable coordinated campaigns across modalities; similar tactics deployed with different media produce different trust and social impact profiles.
- Information ecosystem burden: Mass-produced synthetic content, even when low quality, contaminates public information spaces, increases verification costs, and may erode shared understanding of facts and reality.
Connections
- Related to Deepfake Detection via shared technical underpinnings and the challenge of detecting synthetic media at scale
- Connected to Online Manipulation and Disinformation Campaigns as GenAI amplifies traditional tactics with new tools
- Builds on Misinformation and fake news detection taxonomy literature by adding AI-specific threat vectors
- Informs AI Safety and content moderation policy through empirical evidence of real-world attack patterns
- Complements Synthetic Media Generation research by documenting downstream misuse rather than technology alone
- Related to Fraud Detection via shared techniques in identity-based scams and deepfake impersonation
Notes
Strengths: The paper fills a critical gap by grounding GenAI misuse discussion in real, documented incidents rather than hypothetical scenarios. The taxonomy is clearly articulated and inductively derived from data. The finding that most real-world misuse is low-tech and accessible is policy-relevant and challenges the assumption that GenAI misuse requires adversarial machine learning expertise. The multimodal framing is timely as GenAI systems become more capable.
Limitations: The authors acknowledge that media reports are biased toward high-impact, sensational incidents, potentially underrepresenting covert or undetected misuse. System compromise attacks may be underreported due to lack of public disclosure. The dataset is time-bound (January 2023–March 2024) and represents an early phase of GenAI deployment; patterns may shift rapidly. Most cases involved text-based model prompting; as multimodal models mature, new tactics may emerge.
Follow-ups: Longitudinal monitoring of emerging tactics would strengthen forecasting. Collaboration with platform trust & safety teams and law enforcement could yield more comprehensive data on real-world incidents. Extension to non-English-language GenAI misuse and geographic variation would improve generalizability.