Adversarial training

Adversarial training is a defense strategy in which a model is trained on a mixture of clean and adversarial examples to improve its robustness. Exposing the model to perturbed inputs during training mainly hardens it against the kinds of attack it has seen. It can also reduce vulnerability to some unseen attacks and, in some settings, improve generalization, although neither effect is guaranteed.
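The mixing of clean and adversarial examples can be illustrated with a minimal sketch. It assumes a toy two-feature logistic regression and uses the fast gradient sign method (FGSM) as the attack; the model, the synthetic data, and all function names (`make_blobs`, `fgsm`, `train`, `accuracy`) are illustrative choices, not something prescribed by this guide. Text models need discrete perturbations instead, but the training loop has the same shape.

```python
import math
import random


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def logit(w, b, x):
    """Linear score of the model; a positive score means class 1."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict_prob(w, b, x):
    """Probability of class 1, using a numerically stable sigmoid."""
    z = logit(w, b, x)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fgsm(w, b, x, y, eps):
    """Move every feature by eps in the direction that increases the loss."""
    # For the logistic loss, d(loss)/d(x_i) = (p - y) * w_i.
    p = predict_prob(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]


def make_blobs(n=200, seed=0):
    """Synthetic data: two Gaussian clusters, one per class (an assumption)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        y = i % 2
        center = 1.5 if y == 1 else -1.5
        data.append(([rng.gauss(center, 0.5), rng.gauss(center, 0.5)], y))
    return data


def train(data, eps=0.0, epochs=20, lr=0.1, seed=0):
    """SGD training; with eps > 0 each clean example is paired with an
    adversarial copy generated against the current weights."""
    rng = random.Random(seed)
    data = list(data)  # avoid shuffling the caller's list
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            batch = [(x, y)]
            if eps > 0:
                batch.append((fgsm(w, b, x, y, eps), y))
            for xs, ys in batch:
                g = predict_prob(w, b, xs) - ys  # gradient w.r.t. the logit
                w = [wi - lr * g * xi for wi, xi in zip(w, xs)]
                b -= lr * g
    return w, b


def accuracy(w, b, data, eps=0.0):
    """Accuracy on clean inputs (eps == 0) or on FGSM-perturbed inputs."""
    correct = 0
    for x, y in data:
        x_eval = fgsm(w, b, x, y, eps) if eps > 0 else x
        correct += int((predict_prob(w, b, x_eval) >= 0.5) == (y == 1))
    return correct / len(data)


if __name__ == "__main__":
    train_set, test_set = make_blobs(seed=0), make_blobs(seed=1)
    for name, eps in [("standard", 0.0), ("adversarial", 0.5)]:
        w, b = train(train_set, eps=eps)
        print(name, accuracy(w, b, test_set), accuracy(w, b, test_set, eps=0.5))
```

The script prints clean and attacked accuracy for a standard and an adversarially trained model. How large the gap is depends heavily on the data and on `eps`; on an easy dataset like this one the two models may score similarly.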

Key papers

TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP: Provides an end-to-end pipeline for adversarial training that automates generating adversarial examples and periodically retraining the model on the augmented data.
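The generate-then-retrain cycle described in this entry can be sketched in plain Python. This is not TextAttack's API: the bag-of-words classifier, the greedy synonym-swap attack, the `SYNONYMS` table, and the `attack_every` parameter are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import math

# Hypothetical synonym table used by the toy attack (illustration only).
SYNONYMS = {
    "good": ["fine", "decent"],
    "bad": ["poor", "awful"],
    "great": ["superb"],
    "terrible": ["dreadful"],
}


def score(weights, text):
    """Sum of per-word weights; a non-negative score means label 1."""
    return sum(weights.get(tok, 0.0) for tok in text.split())


def predict(weights, text):
    return 1 if score(weights, text) >= 0 else 0


def attack(weights, text, label):
    """Greedy word swap: replace each word with the synonym that hurts the
    true label most. Returns the perturbed text if the prediction flips,
    otherwise None."""
    if predict(weights, text) != label:
        return None  # already misclassified, nothing to attack
    direction = 1 if label == 1 else -1
    tokens = []
    for tok in text.split():
        candidates = [tok] + SYNONYMS.get(tok, [])
        tokens.append(min(candidates, key=lambda t: direction * weights.get(t, 0.0)))
    candidate = " ".join(tokens)
    return candidate if predict(weights, candidate) != label else None


def train_epoch(weights, examples, lr=0.5):
    """One pass of logistic-regression SGD over bag-of-words features."""
    for text, label in examples:
        p = 1.0 / (1.0 + math.exp(-score(weights, text)))
        for tok in text.split():
            weights[tok] = weights.get(tok, 0.0) - lr * (p - label)


def adversarial_train(train_set, epochs=6, attack_every=2):
    """Train on clean data, periodically attacking the current model and
    adding the successful adversarial examples to the training data."""
    weights, augmented = {}, []
    for epoch in range(epochs):
        if epoch > 0 and epoch % attack_every == 0:
            # Replace the previous adversarial set with a fresh one.
            augmented = []
            for text, label in train_set:
                adv = attack(weights, text, label)
                if adv is not None:
                    augmented.append((adv, label))
        train_epoch(weights, train_set + augmented)
    return weights


if __name__ == "__main__":
    toy_data = [("good movie", 1), ("great film", 1),
                ("bad movie", 0), ("terrible film", 0)]
    model = adversarial_train(toy_data)
    for text, label in toy_data:
        print(text, label, predict(model, text))
```

Regenerating the adversarial set every few epochs, rather than once up front, keeps the examples targeted at the current model's weaknesses. Whether to replace or accumulate earlier adversarial examples is a design choice; this sketch replaces them.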

Adversarial robustness (the goal of adversarial training)
Data augmentation (generation of adversarial training examples)
Model evaluation (evaluating trained model robustness)