Security in AI systems
Security in AI systems is about protecting models and deployments from adversarial attacks, misuse, and exploitation. The main threats include:
- Adversarial attacks: Carefully crafted inputs designed to fool models into making incorrect predictions or generating harmful content.
- Model theft: Extracting model parameters or functionality through query-based attacks (model extraction). Membership inference is a related privacy attack that reveals whether a specific record was in the training data, rather than copying the model itself.
- Poisoning attacks: Introducing malicious data during training to compromise model behavior.
- Evasion: Circumventing safeguards (e.g., content filters) through adversarial prompting or jailbreaking.
- Factuality vulnerabilities: Language models can be exploited to generate false narratives or misinformation at scale.
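To make the first item concrete, here is a minimal sketch of an adversarial perturbation in the style of the fast gradient sign method (FGSM), applied to a toy logistic-regression classifier. The weights, bias, input, and `epsilon` are hypothetical values chosen for illustration; nothing here comes from the survey cited below, and real attacks target far larger models.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm_perturb(weights, bias, x, y_true, epsilon):
    """Move each feature by epsilon in the direction that increases the loss.

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input x is (p - y_true) * weights.
    """
    p = predict_proba(weights, bias, x)
    grad = [(p - y_true) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical toy model and input (assumptions for illustration only).
weights = [2.0, -1.0, 0.5]
bias = 0.0
x = [0.4, 0.2, 0.2]
y_true = 1

x_adv = fgsm_perturb(weights, bias, x, y_true, epsilon=0.3)
clean_p = predict_proba(weights, bias, x)
adv_p = predict_proba(weights, bias, x_adv)

print("clean P(class 1):", round(clean_p, 3))
print("adversarial P(class 1):", round(adv_p, 3))
```

With these values, no feature changes by more than 0.3, yet the predicted probability of the true class falls from above 0.5 to below 0.5, so the predicted label flips. The attacker in this sketch needs gradient access (a white-box setting); query-only attacks have to estimate that direction instead.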
Key papers
- A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT — discusses security vulnerabilities in generative AI systems, including factuality exploitation and adversarial threats
Related topics
- Adversarial robustness — defenses against adversarial attacks
- Adversarial learning for fake news detection — techniques for attack and defense
- Trustworthy AI — broader trustworthiness considerations