Leveraging the Crowd to Detect and Reduce the Spread of Fake News and Misinformation
Authors: Jooyeon Kim, Behzad Tabibian, Alice Oh, Bernhard Schölkopf, Manuel Gomez-Rodriguez
Venue: arXiv preprint, 2017 — arXiv:1711.09918
TL;DR
Online social networks rely on crowd-powered flagging to identify misinformation: users flag stories they suspect are false, stories that accumulate enough flags are sent to fact-checkers, and stories found to be false are marked as disputed. This paper formulates the problem of deciding which stories to fact-check and when, subject to limited fact-checking resources, as a stochastic optimal control problem. The authors develop CURB, a scalable algorithm that minimizes the spread of misinformation while accounting for uncertain exposure dynamics, and validate it on Twitter and Weibo data.
Contributions
- Formulate the fact-checking scheduling problem using marked temporal point processes with a novel representation of exposure uncertainty.
- Model the problem as a stochastic optimal control problem for stochastic differential equations with jumps, establishing a connection to Bayesian inference on misinformation rates.
- Derive an efficient algorithm (CURB) that determines optimal fact-checking intensities in closed form and scales to multiple stories.
- Demonstrate empirically that CURB outperforms intuitive baselines on real-world Twitter and Weibo datasets.
Method
The authors model the spread of stories on social media using two key processes:
Exposure dynamics. When a story is posted or reshared, users are exposed to it. The paper captures exposures using exogenous events (initial posts, reshares) and endogenous events (users who see the story and then decide to reshare or flag it). An intensity function models the rate at which new exposures occur.
Flagging and fact-checking. Users can flag a story as misinformation; the paper models flagging probability using a Bernoulli distribution. When a story receives enough flags, it is sent for fact-checking. If fact-checked and verified as false, it is marked as disputed, reducing further exposures.
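As a rough illustration of the Bernoulli flagging model, the sketch below tracks a posterior estimate of the unknown per-exposure flagging probability from flag and exposure counts. The conjugate Beta prior, its hyperparameters `alpha` and `beta`, and the function name are assumptions made for this example, not details taken from the summary above.

```python
def flag_posterior_mean(n_flags: int, n_exposures: int,
                        alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean of the per-exposure flagging probability.

    Assumption: each exposure produces a flag independently with an
    unknown probability that has a Beta(alpha, beta) prior.
    """
    if n_flags < 0 or n_exposures < n_flags:
        raise ValueError("need 0 <= n_flags <= n_exposures")
    # Beta-Bernoulli conjugacy: flags add to alpha, non-flags add to beta.
    return (alpha + n_flags) / (alpha + beta + n_exposures)


# Example: 3 flags out of 8 exposures under a uniform Beta(1, 1) prior.
estimate = flag_posterior_mean(3, 8)
```

With few exposures the estimate stays close to the prior mean, which is one simple way to avoid overreacting to a handful of early flags.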
Optimal control formulation. The problem is to find fact-checking intensities \(u(t)\) that minimize the expected spread of misinformation in a setting where some stories go unverified and the flagging probability is unknown. The authors derive the optimal intensity in closed form; it depends on \(M(t)\), which indicates whether the story has been verified, on the flag and exposure counts \(N^f(t)\) and \(N^e(t)\), and on \(p_{m|f}\), the posterior probabilities that a story is misinformation given flagging evidence.
CURB algorithm. The algorithm iteratively samples the next exposure or flagging event, updates the state, and applies the optimal control formula. Because the optimal intensity depends linearly on the exposure intensity, each step is simple and efficient.
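The event loop described above might be sketched as follows. This is not the authors' implementation: the constant `exposure_rate`, the `scale` factor, and the use of a Beta posterior mean as the evidence term are hypothetical stand-ins for the paper's closed-form expression. The sketch only shows the general pattern of updating counts at each event and drawing a fact-checking time from an intensity that is linear in the exposure intensity.

```python
import random
from typing import Iterable, Optional, Tuple


def sample_fact_check_time(
    events: Iterable[Tuple[float, bool]],  # (time, was_flagged) per exposure
    exposure_rate: float,                  # assumed constant exposure intensity
    scale: float = 1.0,                    # hypothetical trade-off constant
    horizon: float = 10.0,
    alpha: float = 1.0,
    beta: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Optional[float]:
    """Return a sampled fact-checking time, or None if none occurs by `horizon`."""
    rng = rng or random.Random()
    n_exposures = n_flags = 0
    t = 0.0
    # A sentinel at the horizon closes the final interval.
    timeline = sorted(events) + [(horizon, False)]
    for event_time, flagged in timeline:
        event_time = min(event_time, horizon)
        # Illustrative evidence term: posterior-mean flag rate so far.
        evidence = (alpha + n_flags) / (alpha + beta + n_exposures)
        u = scale * evidence * exposure_rate  # linear in exposure intensity
        if u > 0:
            # The intensity is constant until the next event, so the
            # waiting time from t is exponential with rate u.
            candidate = t + rng.expovariate(u)
            if candidate < event_time:
                return candidate
        t = event_time
        if t >= horizon:
            break
        # Update the state with the observed exposure and possible flag.
        n_exposures += 1
        n_flags += int(flagged)
    return None
```

Redrawing the waiting time at every event is valid here because the intensity is piecewise constant and exponential waiting times are memoryless.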
Results
Datasets: Twitter (192,350 posts, 117,821 users, 111 stories) and Weibo (3,752,459 posts, 2,819,338 users, 4,663 stories).
Metrics:
- Precision: fraction of fact-checked stories that are actually false.
- Misinformation reduction: fraction of unverified exposures prevented by fact-checking.
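The two metrics can be computed along these lines. This is a minimal sketch: the function names and input conventions (a list of ground-truth labels for fact-checked stories, and exposure counts with and without fact-checking) are assumptions for illustration, not the paper's evaluation code.

```python
from typing import Sequence


def precision(checked_is_false: Sequence[bool]) -> float:
    """Fraction of fact-checked stories that are actually false."""
    if not checked_is_false:
        return 0.0  # convention chosen here for the empty case
    return sum(checked_is_false) / len(checked_is_false)


def misinformation_reduction(exposures_without: int, exposures_with: int) -> float:
    """Fraction of unverified exposures prevented by fact-checking.

    exposures_without: exposures that would occur with no fact-checking.
    exposures_with: exposures that still occur under the policy.
    """
    if exposures_without <= 0:
        return 0.0
    return (exposures_without - exposures_with) / exposures_without


# Example: 3 of 4 checked stories were false; 150 of 200 exposures prevented.
p = precision([True, True, False, True])
r = misinformation_reduction(200, 50)
```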
Key findings:
- CURB achieves comparable or superior performance to an oracle baseline with access to true flagging probabilities, and substantially outperforms simpler heuristics (flag ratio, flag sum, exposure-based selection).
- The algorithm is robust to variations in the crowd's ability to spot misinformation (true positive rate \(p_f\)); better crowds yield better performance and higher precision.
- CURB is more cautious when flagging evidence is weak, reducing exposures only when confidence in misinformation is high.
- The algorithm scales to large numbers of stories and can be run in parallel.
Connections
- Related to 2018 Vosoughi Spread False Claims on the epidemiology of misinformation spread; CURB takes an interventionist angle.
- Builds on Fact-checking and corrections and Crowdsourcing as mechanisms for reducing misinformation.
- Shares the optimal-control framing with Misinformation interventions strategies.
- Uses temporal point processes, similar to Early detection of misinformation approaches in rumor detection.
Notes
Strengths: The paper makes a novel connection between stochastic optimal control and misinformation interventions, grounding the problem in rigorous mathematics. The CURB algorithm is interpretable, since the optimal intensity is linear in the exposure intensity, and computationally efficient. Experiments on real data (Twitter and Weibo) show practical value.
Limitations: The paper assumes all users are equally reliable at flagging misinformation; in practice, some users are more trustworthy than others. The independence assumption between stories may not hold (correlated narratives, coordinated disinformation). The cost of fact-checking is modeled as a simple trade-off parameter rather than resource constraints or fact-checker availability. The paper assumes fact-checking is instantaneous; in reality, there is often a delay.
Follow-ups: Extending the framework to account for source credibility and user trustworthiness. Modeling interdependencies between stories (e.g., variants of the same false claim). Incorporating delays in fact-checking and accounting for the quality of fact-check verdicts themselves.