Selection Bias

Selection bias occurs when the data we observe do not represent the true underlying process because of non-random exposure, self-selection, or platform filtering. In social media, selection bias is ubiquitous: we observe only the posts that users chose to engage with or that algorithms chose to show them.

In the context of fake news, selection bias creates a measurement problem: if a user did not share a piece of fake news, we cannot distinguish between "the user was not interested" and "the user was not exposed." This missing-not-at-random (MNAR) pattern biases estimators of user behavior if left unaddressed.
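The ambiguity can be made concrete with a small calculation. The sketch below is illustrative only: the function name, the probabilities, and the assumption that exposure and interest are independent are all hypothetical and not taken from any particular study.

```python
import math


def expected_share_rate(p_exposed: float, p_interested: float) -> float:
    """Probability of observing a share for one item.

    Assumes (hypothetically) that a user can only share an item they were
    exposed to, and that exposure and interest are independent.
    """
    return p_exposed * p_interested


# User A: sees almost everything, but is rarely interested.
rate_a = expected_share_rate(p_exposed=0.9, p_interested=0.1)

# User B: sees very little, but is interested in most of what they see.
rate_b = expected_share_rate(p_exposed=0.1, p_interested=0.9)

# Both users produce the same observable share rate, so share counts alone
# cannot separate "not interested" from "not exposed".
same_observed_rate = math.isclose(rate_a, rate_b)
```

Under these assumed numbers the two users are indistinguishable from share data, even though their interest in the content differs by a factor of nine.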

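Standard approaches to mitigating selection bias in observational studies include inverse propensity scoring (IPS), which reweights each observed sample by the inverse of its probability of being observed, so that the weighted data approximate what a randomized experiment would have produced.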
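A minimal sketch of the reweighting idea follows, using simulated data. Everything in it is an assumption made for illustration: the two user groups, their interest and exposure probabilities, and the helper `ips_mean` are hypothetical, and the exposure propensities are treated as known exactly. In practice they usually have to be estimated, and IPS only corrects the bias when exposure depends on observed covariates.

```python
import numpy as np


def ips_mean(outcomes, observed, propensities):
    """Inverse-propensity-scored estimate of the population mean outcome.

    Each observed outcome is weighted by 1 / propensity; unobserved units
    contribute zero. The sum is divided by the total number of units.
    """
    outcomes = np.asarray(outcomes, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    propensities = np.asarray(propensities, dtype=float)
    weighted = np.where(observed, outcomes / propensities, 0.0)
    return float(weighted.sum() / len(outcomes))


rng = np.random.default_rng(0)
n = 200_000

# Hypothetical setup: group 1 is both more interested and more often exposed.
group = rng.integers(0, 2, size=n)
p_interest = np.where(group == 1, 0.6, 0.1)
p_exposure = np.where(group == 1, 0.8, 0.2)

interested = rng.random(n) < p_interest  # true interest, partly unobserved
exposed = rng.random(n) < p_exposure     # whether we get to observe it

true_rate = float(interested.mean())           # ground truth (simulation only)
naive_rate = float(interested[exposed].mean())  # mean over exposed users only
ips_rate = ips_mean(interested, exposed, p_exposure)

print(f"true={true_rate:.3f} naive={naive_rate:.3f} ips={ips_rate:.3f}")
```

In this simulated setup the naive average over exposed users overstates interest, because the more interested group is also the more exposed one, while the reweighted estimate lands close to the true rate. The weights grow large when propensities are small, so IPS estimates can have high variance; clipping or normalizing the weights is a common remedy.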

Key papers