Inverse Scaling

Inverse scaling refers to the phenomenon in which language models perform worse on certain tasks as they grow larger, contradicting the usual scaling-law expectation that performance improves with size. Instead of improving with more parameters, performance degrades: larger models give answers that contradict the truth, exhibit unwanted behaviors, or fail at tasks that smaller models handle more reliably.

This phenomenon is particularly concerning for AI safety: it suggests that scaling alone does not guarantee safer or more aligned models, and that some problems worsen with scale. Documented examples include a greater willingness to express political views, a stronger stated desire to avoid shutdown, and greater susceptibility to sycophancy (repeating back a user's stated views rather than answering truthfully).

Key papers