Security in AI systems

Security in AI systems concerns protecting models and deployments from adversarial attacks, misuse, and exploitation. Key concerns include:

Adversarial attacks: Carefully crafted inputs designed to fool models into making incorrect predictions or generating harmful content.

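Model theft: Extracting a model's parameters or replicating its capabilities through query-based attacks. A related privacy attack, membership inference, instead reveals whether a specific record was part of the training data.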

Poisoning attacks: Introducing malicious data during training to compromise model behavior.

Evasion: Circumventing safeguards (e.g., content filters) through adversarial prompting or jailbreaking.

Factuality vulnerabilities: Exploiting language models to generate false narratives or misinformation at scale.
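
To make the adversarial attacks entry above concrete, here is a minimal sketch of one well-known technique, the fast gradient sign method (FGSM), applied to a toy logistic-regression classifier. FGSM is chosen here for illustration only and is not named in the text above. The weights `w`, bias `b`, input `x`, and step size `eps` are hypothetical values, and real attacks target far larger models.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_perturb(w, b, x, y, eps):
    """One FGSM step: move each feature by eps in the direction that raises the loss."""
    p = predict_proba(w, b, x)
    # For cross-entropy loss on a logistic model, d(loss)/dx = (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical toy model and input (illustrative assumptions, not from any real system).
w = [2.0, -1.5, 0.5]
b = 0.1
x = [0.3, 0.1, 0.2]
y = 1  # true label

x_adv = fgsm_perturb(w, b, x, y, eps=0.25)
clean_p = predict_proba(w, b, x)
adv_p = predict_proba(w, b, x_adv)

print("clean probability of class 1:", round(clean_p, 3))
print("adversarial probability of class 1:", round(adv_p, 3))
```

With these toy values, the clean input is classified as class 1 while the perturbed input falls below the 0.5 decision threshold, even though no feature moved by more than `eps`.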

Key papers