Computational social science and large-scale text analysis

Computational social science uses large digitized datasets and machine learning to answer questions about social behavior at scales impractical for traditional qualitative or hand-coded methods. In misinformation research, typical applications include structural topic modeling (STM) to identify discourse themes in large text corpora, social network analysis to map organizational influence and information flow, and temporal analysis of how messaging changes in response to events or funding.
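The network-analysis step can be sketched with the Python standard library alone. This sketch assumes the raw data is a list of (organization, funder) ties, and every name in it is a hypothetical placeholder. The code projects that two-mode network onto organizations, so two organizations are linked by the number of funders they share. It then ranks organizations by weighted degree. Weighted degree is a crude proxy for influence and only one of many possible centrality measures.

```python
from collections import defaultdict
from itertools import combinations


def project_onto_organizations(ties):
    """Project (organization, funder) ties onto an organization-organization
    network whose edge weights count shared funders."""
    orgs_by_funder = defaultdict(set)
    for org, funder in ties:
        orgs_by_funder[funder].add(org)

    weights = defaultdict(int)
    for orgs in orgs_by_funder.values():
        # Every pair of organizations backed by the same funder gets +1.
        for a, b in combinations(sorted(orgs), 2):
            weights[(a, b)] += 1
    return dict(weights)


def weighted_degree(edge_weights, nodes):
    """Sum edge weights per node; isolated nodes keep a degree of 0."""
    degree = {node: 0 for node in nodes}
    for (a, b), w in edge_weights.items():
        degree[a] += w
        degree[b] += w
    return degree


# Hypothetical toy data: placeholder names, not real organizations or funders.
ties = [
    ("OrgA", "Funder1"), ("OrgB", "Funder1"), ("OrgC", "Funder1"),
    ("OrgA", "Funder2"), ("OrgB", "Funder2"),
    ("OrgD", "Funder3"),
]

edges = project_onto_organizations(ties)
nodes = {org for org, _ in ties}
# Sort by descending degree, breaking ties alphabetically.
ranking = sorted(weighted_degree(edges, nodes).items(), key=lambda kv: (-kv[1], kv[0]))
print(edges)
print(ranking)
```

OrgA and OrgB share two funders, so they form the strongest tie. OrgD shares no funder with anyone and stays isolated with degree 0. A real study would more likely use a dedicated graph library and validate the funding ties against documentary sources.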

Key advantages include reproducibility through automated coding (the same procedure applied to the same corpus yields the same labels), the ability to detect patterns that human analysts may miss, and the capacity to examine entire populations of documents (e.g., all tweets on a topic, all press releases from a set of organizations) rather than samples. Limitations include the need for human interpretation of machine-discovered patterns and the risk that models encode biases present in their training data.
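A minimal sketch of dictionary-based automated coding illustrates both the reproducibility advantage and temporal analysis. A fixed codebook is applied identically to every document, so rerunning the code on the same corpus always gives the same labels. Those labels are then aggregated by year to show how the share of each theme changes. This is a deliberately simple keyword matcher, not structural topic modeling. The codebook terms, theme names, and documents are all invented for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical codebook: theme -> keywords. A real study would validate
# these terms against a hand-coded sample before trusting the output.
CODEBOOK = {
    "scientific_uncertainty": {"uncertain", "unproven", "debate"},
    "economic_cost": {"jobs", "cost", "tax"},
}

TOKEN = re.compile(r"[a-z]+")


def code_document(text):
    """Return the sorted list of themes whose keywords appear in the text."""
    tokens = set(TOKEN.findall(text.lower()))
    return sorted(theme for theme, words in CODEBOOK.items() if tokens & words)


def yearly_theme_share(docs):
    """For each year, the fraction of documents coded with each theme."""
    totals = Counter()
    hits = defaultdict(Counter)
    for year, text in docs:
        totals[year] += 1
        for theme in code_document(text):
            hits[year][theme] += 1
    return {
        year: {theme: hits[year][theme] / totals[year] for theme in CODEBOOK}
        for year in sorted(totals)
    }


# Invented example documents: (year, text).
docs = [
    (2000, "The science is uncertain and the debate continues."),
    (2000, "New regulations will cost jobs."),
    (2010, "A carbon tax would raise energy costs."),
    (2010, "Regulation threatens jobs and raises every tax."),
]

shares = yearly_theme_share(docs)
print(shares)
```

In this toy corpus, the economic-cost theme rises from half of the 2000 documents to all of the 2010 documents. The matcher does no stemming, so "costs" does not count as "cost". Small gaps like this are exactly why machine-coded patterns still need human interpretation and validation.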

Key papers

Banda et al. (2021), "A large-scale COVID-19 Twitter chatter dataset for open scientific research - an international collaboration." An international collaboration that curated more than 800 million multilingual COVID-19 tweets; illustrates computational social science infrastructure for crisis informatics, including preprocessing pipelines, metadata extraction, and open-science data governance.
Yang, Ferrara, and Menczer (2022), "Botometer 101: Social bot practicum for computational social scientists." A hands-on tutorial on Botometer, a supervised machine-learning tool for detecting social bots on Twitter; shows researchers how to apply it in large-scale analyses and how to interpret its scores responsibly.
Bail et al. (2018), "Exposure to opposing views on social media can increase political polarization." A field experiment combining survey research, bots that retweeted messages from accounts with opposing political views, and digital trace data; shows how to measure treatment compliance, address causal-inference challenges, and validate survey responses against behavioral data, contributing methods for studying social media and politics.
Farrell (2016), "Corporate funding and ideological polarization about climate change." An exemplary application of structural topic modeling to 40,785 texts, paired with organizational network analysis, with topic prevalence modeled as a function of document metadata (covariates); shows how machine learning can uncover latent thematic structure and how topic prevalence over time varies with corporate funding.
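The covariate mechanism in the Farrell (2016) entry can be written compactly, which clarifies how metadata such as funding status "enters" a topic model. The following sketches the prevalence part of the standard STM formulation (as implemented in the R package stm). It is not a reproduction of Farrell's exact specification, and content covariates on the topic-word distributions are omitted. Here $\boldsymbol{x}_d$ is a row vector of document-level covariates for document $d$ (for example, a funding indicator and publication year). $\Gamma$ is a coefficient matrix, $\Sigma$ is a covariance matrix over topics, $K$ is the number of topics, and $\boldsymbol{\beta}_k$ is the word distribution of topic $k$.

```latex
\begin{aligned}
\boldsymbol{\eta}_d &\sim \mathcal{N}_{K-1}\!\left(\boldsymbol{x}_d \Gamma,\ \Sigma\right), \\
\theta_{d,k} &= \frac{\exp(\eta_{d,k})}{\sum_{j=1}^{K} \exp(\eta_{d,j})}, \qquad \eta_{d,K} = 0, \\
z_{d,n} &\sim \operatorname{Multinomial}(\boldsymbol{\theta}_d), \\
w_{d,n} &\sim \operatorname{Multinomial}\!\left(\boldsymbol{\beta}_{z_{d,n}}\right).
\end{aligned}
```

A positive coefficient in $\Gamma$ for a funding indicator raises the log-odds of topic $k$ relative to the reference topic $K$ for funded documents. This is an association estimated from observational text, not by itself a causal effect of funding.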