GUIDE

A curated reference for researchers working on fake news, misinformation, and disinformation — covering detection methods, propagation dynamics, and the underlying psychology and social science.

Maintained by the Syracuse University DataLab. Every claim on every page traces back to a primary source.

Survey

Entry points for the field:

  • Cao et al. (2023) — A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT in J. ACM — foundational survey of generative AI covering history (GANs to transformers), technical foundations, unimodal and multimodal models, applications across domains, and critical trustworthiness concerns (factuality, security, privacy, fairness); essential context for understanding AI-enabled misinformation generation and detection.
  • Chang et al. (2023) — A Survey on Evaluation of Large Language Models in J. ACM — comprehensive survey on evaluation methodologies for LLMs across three dimensions (what, where, how to evaluate); encompasses 269 papers on natural language understanding, generation, reasoning, robustness, ethics, bias, factuality, trustworthiness, and domain-specific applications; essential for understanding how to assess LLMs used in misinformation detection and generation.
  • Liu et al. (2023) — Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models' Alignment — comprehensive survey on LLM trustworthiness across seven dimensions (reliability, safety, fairness, resistance to misuse, explainability, social norms, robustness) with 29 sub-categories; presents taxonomy, measurement studies on multiple LLMs, and case studies demonstrating that the effectiveness of alignment varies significantly across trustworthiness categories.
  • Hamborg, Donnay & Gipp (2018) — Automated identification of media bias in news articles: an interdisciplinary literature review in International Journal on Digital Libraries — bridges social science and computer science research on media bias detection; defines nine bias forms (event/source selection, labeling, placement, spin); maps manual analysis methods to computational approaches.
  • Zhou, Xu, Trajcevski & Zhang (2021) — A Survey of Information Cascade Analysis: Models, Predictions, and Recent Advances in ACM Computing Surveys — comprehensive survey of cascade prediction covering 250+ papers; taxonomy of feature-based, generative, and deep learning approaches.
  • Zhou & Zafarani (2020) — A Survey of Fake News in ACM Computing Surveys — concise, authoritative survey of detection methods and opportunities.
  • Shu et al. (2020) — Mining Disinformation and Fake News: Concepts, Methods, and Recent Advancements — comprehensive book chapter covering user engagement, weak supervision approaches, and trending issues.

Sources by type

See all papers, all articles (populated by ingest workflow)

Foundations

  • Kaddour et al. (2022) — Causal Machine Learning: A Survey and Open Problems — comprehensive 191-page survey of CausalML methods that formalize data generation as a structural causal model to enable reasoning about interventions and counterfactuals; taxonomizes 5 problem areas (causal supervised learning, generative modeling, explanations, fairness, reinforcement learning) with systematic comparison of methods and applications to computer vision, NLP, and graph learning; addresses robustness via invariant features, fairness via counterfactual constraints, and generalization across environments.
  • Brundage et al. (2018) — The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation — comprehensive threat analysis of AI-enabled attacks across digital, physical, and political security domains; identifies AI-specific threats including deepfakes, automated disinformation campaigns, and denial-of-information attacks; policy recommendations for mitigating malicious AI uses while enabling beneficial applications.
  • Bommasani et al. (2021) — On the Opportunities and Risks of Foundation Models — comprehensive 214-page Stanford CRFM report analyzing foundation models (large models trained on broad data and adapted to diverse tasks); examines capabilities across language, vision, and reasoning; applications in healthcare, law, education; and critical societal risks including misuse for misinformation/deepfakes, fairness harms, environmental costs, and security vulnerabilities; proposes frameworks for detection and mitigation.
  • Efficient Estimation of Word Representations in Vector Space — Efficient Continuous Bag-of-Words (CBOW) and Skip-gram architectures for learning high-quality word embeddings from large corpora in under a day; demonstrates that word vectors capture both syntactic and semantic regularities enabling vector arithmetic (king − man + woman ≈ queen); foundational technique widely adopted in NLP pipelines including fake news detection systems.
  • Distributed Representations of Words and Phrases and their Compositionality — Extends Skip-gram with phrase representations, negative sampling, and subsampling of frequent words; demonstrates that word vectors exhibit compositional structure via simple vector addition; achieves 72% accuracy on phrase analogy tasks, enabling embedding-based representation of multi-word units and idioms.
  • Vaswani et al. (2017) — Attention Is All You Need — introduces the Transformer architecture based entirely on self-attention mechanisms, replacing recurrence and convolution; proposes scaled dot-product and multi-head attention; achieves state-of-the-art BLEU scores on machine translation (28.4 on WMT14 En-De, 41.0 on En-Fr) with significantly faster training; demonstrates strong generalization to parsing and other NLP tasks; canonical architecture underlying BERT, GPT, and virtually all modern fake news detection models.
  • Liu et al. (2019) — RoBERTa: A Robustly Optimized BERT Pretraining Approach — systematic replication study showing BERT was undertrained; proposes RoBERTa with four key improvements: dynamic masking, removal of next-sentence-prediction loss, training on longer sequences, and larger mini-batches; achieves state-of-the-art on GLUE, RACE, and SQuAD benchmarks; became foundational pretrained language model for downstream fine-tuning across NLP applications including fake-news and misinformation detection systems.
  • Izacard et al. (2022) — Atlas: Few-shot Learning with Retrieval Augmented Language Models — jointly-trained retrieval-augmented model achieving strong few-shot performance on knowledge-intensive tasks with compact parameters; achieves 42.4% on NaturalQuestions and 80.1% on FEVER fact-checking with full data; demonstrates improved parameter efficiency and interpretability compared to dense-only language models; foundational for resource-efficient fact-checking and misinformation detection systems.
  • Waseem (2016) — Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter — empirical comparison of expert (feminist and anti-racism activists) vs. crowdsourced annotations on 6,909 tweets; demonstrates that systems trained on expert labels substantially outperform those on crowdworker data (F1 91.19 vs. 83.88); introduces intersectional annotation scheme capturing both racism and sexism simultaneously; foundational methodological work on data quality and annotator expertise in hate speech detection.
  • Pulastya et al. (2021) — Assessing the Quality of the Datasets by Identifying Mislabeled Samples — proposes AQUAVS, a supervised variational autoencoder with auxiliary discriminative network, to automatically identify mislabeled data points via outlier detection in latent space; demonstrates high-precision mislabel identification without requiring clean validation data or prior knowledge of noise type; shows significant accuracy improvements on downstream classification tasks after filtering identified mislabeled samples, especially on MNIST and CIFAR-10.
  • Zannettou et al. (2018) — The Web of False Information: Rumors, Fake News, Hoaxes, Clickbait, and Various Other Shenanigans — comprehensive typology of false information ecosystem identifying eight types of false information, twelve actor categories, and six motives; surveys 200+ papers on user perception, propagation dynamics, detection, containment, and political misinformation.
  • Anand, Chakraborty & Park (2016) — We used Neural Networks to Detect Clickbaits: You won't believe what happened Next! — bidirectional LSTM with distributed word embeddings and character-level CNN embeddings for clickbait detection; achieves 98% accuracy and 0.99 ROC-AUC on 15,000-headline dataset, 5% improvement over hand-crafted feature baselines; demonstrates effectiveness of deep learning without feature engineering for headline classification.
  • Lazer et al. (2018) — The Science of Fake News — multidisciplinary Science article synthesizing knowledge on fake news prevalence, impact, psychological mechanisms, and interventions (individual and platform-based); identifies major gaps and calls for industry-academic collaboration; canonical framework for the field.
  • Mohseni & Ragan (2018) — Combating Fake News with Interpretable News Feed Algorithms — position paper reviewing fake news detection methods and arguing that transparent, interpretable news feed algorithms could mitigate misinformation amplification by increasing user awareness of algorithmic curation; identifies echo chambers and filter bubbles as key mechanisms of harm.
  • Guess, Nagler & Tucker (2019) — Less than you think: Prevalence and predictors of fake news dissemination on Facebook — empirical study linking survey data (N=3,500) to Facebook profiles (N=1,191); establishes that fake news sharing during 2016 was rare (8.5% of users); identifies age as the strongest demographic predictor—users 65+ shared nearly 7 times as many fake news articles as those 18–29; effect persists after controlling for ideology and education.
  • Tandoc, Lim & Ling (2017) — Defining "Fake News": A typology of scholarly definitions — systematic review of 34 academic studies (2003–2017) that define and operationalize "fake news"; proposes a two-dimensional typology (facticity × intent to deceive) identifying six types: satire, parody, fabrication, manipulation, advertising, and propaganda; foundational for conceptual clarity in the field.
  • Lewandowsky et al. (2012) — Misinformation and its correction: Continued influence and successful debiasing — Psychological Science in the Public Interest foundational review synthesizing literature on why false beliefs persist despite corrections; examines cognitive mechanisms (mental models, source confusion, fluency, coherence) and proposes evidence-based debiasing strategies (warnings, alternative explanations, repeated corrections, worldview-consonant framing) with practical guidance for practitioners.
  • Zhou, Zafarani, Shu & Liu (2019) — Fake News: Fundamental Theories, Detection Strategies and Challenges — WSDM '19 tutorial survey synthesizing 20+ interdisciplinary theories (psychology, social science, economics, forensics) explaining why misinformation succeeds and why people spread it; unified framework of four detection perspectives (knowledge, style, propagation, credibility); identifies open challenges in timeliness, cross-domain transfer, and efficiency for real-world deployment.
  • Douglas, Sutton & Cichocka (2017) — The Psychology of Conspiracy Theories — Current Directions in Psychological Science review synthesizing two decades of empirical research; proposes unified taxonomy of epistemic (seeking understanding and certainty), existential (seeking control and security), and social (seeking positive in-group identity) motives driving conspiracy-theory belief; critical finding that conspiracy theories appear to frustrate rather than satisfy these underlying needs, making them a self-defeating form of motivated reasoning.
  • van Prooijen & Douglas (2018) — Belief in conspiracy theories: Basic principles of an emerging research domain — European Journal of Social Psychology special issue introduction synthesizing the emerging research domain around four foundational principles: conspiracy beliefs are consequential (with real impacts on health and relationships), universal (across cultures and historical periods), emotional (driven by sense-making rather than logic), and social (rooted in intergroup conflict); provides organizing framework for understanding why conspiracy theories appeal across diverse contexts.
  • Acerbi (2019) — Cognitive attraction and online misinformation — Palgrave Communications content analysis of 260 articles from 26 hoax websites showing that misinformation succeeds due to psychological appeal rather than media inefficiency; 86% of articles contain threat-related content (28%), negative framing (49%), social information (50%), or cognitive-preference elements; reframes misinformation as "high-quality" when measured by cognitive appeal, not truthfulness.
  • Treen, Williams & O'Neill (2020) — Online misinformation about climate change — comprehensive literature review synthesizing research across communication, psychology, computer science, and political science on climate change misinformation; defines concepts (misinformation vs. disinformation), identifies actor networks (scientists, governments, industry, media, think tanks), examines spread mechanisms via social media (homophily, echo chambers, algorithmic bias), analyzes impacts on policy and public attitudes, reviews countermeasures (inoculation, correction, detection, platform mechanisms).
  • Farrell (2016) — Corporate funding and ideological polarization about climate change — PNAS empirical study combining Structural Topic Modeling on 40,785 texts with organizational network analysis of 164 climate contrarian organizations (1993–2013); demonstrates that corporate funding from ExxonMobil and Koch foundations directly influences thematic content of polarization efforts, with funded organizations emphasizing energy-production-friendly and scientific-skepticism frames; provides empirical evidence for long-suspected dynamics of how private funding shapes public scientific discourse.
  • van der Linden et al. (2017) — Inoculating the Public against Misinformation about Climate Change — large-scale randomized experiments (N=2,167) testing attitudinal inoculation against competing consensus-related misinformation; consensus messaging increases perceived agreement 20 percentage points, but is nullified by competing claims; pre-emptive warnings and refutations preserve two-thirds of the effect across political spectrum, with no evidence of backfire.
  • Cook, Lewandowsky & Ecker (2017) — Neutralizing misinformation through inoculation: Exposing misleading argumentation techniques reduces their influence — PLOS ONE empirical study (N=714 and 392 in two experiments) testing inoculation theory on climate change misinformation; demonstrates that pre-exposure to explanations of flawed argumentation techniques (false balance, fake experts) neutralizes misinformation and reduces politically motivated polarization.
  • Pennycook et al. (2021) — Shifting attention to accuracy can reduce misinformation online — Nature paper demonstrating that subtle reminders to focus on accuracy increase sharing of accurate news across six survey experiments and a Twitter field experiment; identifies limited attention (not confusion or indifference) as the primary mechanism driving misinformation sharing; shows that reorienting people's limited cognitive attention toward accuracy can substantially improve the quality of shared information online.
  • Guess et al. (2020) — A digital media literacy intervention increases discernment between mainstream and false news in the United States and India — large-scale RCT testing Facebook's "Tips to Spot False News" platform-based intervention across three samples; finds 26.5% improvement in discernment in nationally representative US sample, 17.3% in India online, and no effect in rural face-to-face sample; effects persist ~3 weeks in US but decay over time; demonstrates that simple, scalable media literacy teaching can improve news evaluation ability but effectiveness varies by digital experience and context.
  • Roozenbeek & van der Linden (2019) — Fake news game confers psychological resistance against online misinformation — Palgrave Communications empirical study (N=15,000) demonstrating that active inoculation through a gamified ~15-minute intervention teaching six deception techniques (impersonation, polarisation, emotional manipulation, conspiracy theories, discrediting, trolling) significantly reduces perceived reliability of fake news across education, age, and political ideology; largest effect among those most vulnerable to misinformation.
  • Ecker et al. (2022) — The psychological drivers of misinformation belief and its resistance to correction — Nature Reviews Psychology comprehensive review of cognitive, social, and affective mechanisms in false belief formation; barriers to belief revision (continued influence effect); evidence-based interventions (prebunking and debunking); implications for journalists, policymakers, information consumers, and health communicators.
  • Wardle & Derakhshan (2017) — Information Disorder: Toward an Interdisciplinary Framework for Research and Policy Making — canonical framework distinguishing mis-, dis-, and mal-information by falseness and intent-to-harm; agent-message-interpreter model; creation-production-distribution lifecycle analysis; 34 policy recommendations for platforms, governments, media, and civil society.
  • Jack (2017) — Lexicon of Lies: Terms for Problematic Information — Data & Society practitioner's guide to terminology; clarifies distinctions between misinformation, disinformation, propaganda, information operations, gaslighting, and related concepts; examines challenges in establishing intent and cross-cultural complications.
  • Marwick & Lewis (2017) — Media Manipulation and Disinformation Online — Data & Society ecosystem-level analysis of internet subcultures' tactics, actors, and platform vulnerabilities; case studies of Gamergate, Pizzagate, and 2016 election manipulation.
  • Papasavva et al. (2021) — The Gospel According to Q: Understanding the QAnon Conspiracy from the Perspective of Canonical Information — empirical study of the QAnon conspiracy theory analyzing 4,961 unique Q drops from six aggregation sites, 121,956 Reddit posts, and related 4chan/8kun content; demonstrates poor canonicalization of Q drops across aggregation sites, provides stylometric evidence of multiple authors, and traces QAnon's transition from fringe imageboards to mainstream social networks via Reddit's crucial intermediary role.
  • Allcott & Gentzkow (2017) — Social Media and Fake News in the 2016 Election — first comprehensive empirical evidence on fake news exposure; database of 156 election-related false stories, web traffic data, and post-election survey of 1,208 adults; estimates average American saw 1.14 fake articles; documents 3:1 partisan asymmetry (pro-Trump articles shared 30M times vs. pro-Clinton 7.6M); economic model of fake news supply/demand; argues electoral impact smaller than single TV ad.
  • Sahly, Shao & Kwon (2019) — Social Media for Political Campaigns: An Examination of Trump's and Clinton's Frame Building and Its Effect on Audience Engagement — comparative content analysis of frame building in 2016 campaign across Twitter (3,805 Trump, 655 Clinton tweets) and Facebook (655 posts); finds Trump relied on conflict and negative emotion frames while Clinton used morality and positive frames; frame effects on engagement (retweets, shares) consistent on Twitter but platform-specific on Facebook.
  • Nelson & Taneja (2018) — The small, disloyal fake news audience — empirical audience measurement using comScore data showing fake news reaches only 675K unique monthly visitors vs. 28M for real news; applies the Law of Double Jeopardy to show fake news audiences are small and disloyal; demonstrates that audience availability (time spent online) is a stronger predictor of misinformation exposure than demographics; shows 80% of fake news traffic originates from social platforms, particularly Facebook.
  • Allen et al. (2020) — Evaluating the fake news problem at the scale of the information ecosystem — multimode national dataset (Nielsen TV, Comscore desktop/mobile) spanning 2016–2018; fake news comprises 0.15% of daily media diet; TV dominates news 5:1 over online; reframes misinformation debate toward mainstream media bias and news avoidance rather than overt fakery.
  • Helmus et al. (2018) — How to Counter Russian Social Media Influence in Eastern Europe — RAND Corporation report analyzing Russian state-sponsored social media campaigns; documents coordinated troll networks, bot accounts, fake hashtags, and nonattributed comments targeting Eastern European publics; mixed-methods approach combining quantitative social media analysis with expert interviews; identifies counter-strategies including accelerated detection, alternative narratives, and institutional capacity building.
  • Friggeri et al. (2014) — Rumor Cascades — large-scale empirical study of 16,672 rumor cascades on Facebook using Snopes.com ground truth; shows rumor cascades run deeper than typical content; finds true rumors more viral than false despite false rumors dominating uploads (62% vs. 45% on Snopes); Snopes fact-checks increase deletion likelihood 4.4× for false rumors but have minimal long-term propagation effects; demonstrates rumor mutation and variant selection over time.

Network and graph algorithms

  • Mao et al. (2024) — Advancing Graph Representation Learning with Large Language Models: A Comprehensive Survey of Techniques — comprehensive survey of integrating LLMs with graph representation learning; proposes novel taxonomy decomposing models into primary components (knowledge extractors for attributes, structures, and labels; knowledge organizers as GNN-centric, LLM-centric, or hybrid) and operation techniques (integration strategies at input, hidden, and alignment-based levels; training strategies via pre-training, prompting, and instruction tuning); essential framework for understanding emerging graph foundation models that combine graph structure with semantic information.
  • Vatter, Mayer & Jacobsen (2023) — The Evolution of Distributed Systems for Graph Neural Networks and their Origin in Graph Processing and Deep Learning: A Survey — comprehensive survey of distributed systems for scalable GNN training, bridging graph processing systems (Pregel, PowerGraph, GraphLab) and DNN training frameworks; systematically categorizes partitioning strategies, sampling techniques, inter-process communication, synchronization modes, and programming abstractions across 20+ systems (DGL, GraphSAINT, DistDGL, etc.); essential for understanding how to scale GNN-based misinformation detection to large social networks.
  • Dai et al. (2023) — A Comprehensive Survey on Trustworthy Graph Neural Networks: Privacy, Robustness, Fairness, and Explainability in ACM Computing Surveys — comprehensive survey of trustworthy GNN research covering privacy attacks (membership inference, property inference, reconstruction) and defenses (differential privacy, federated learning, machine unlearning), adversarial robustness methods, fairness approaches to prevent discrimination, and explainability techniques; essential for understanding GNN-based approaches to misinformation detection and their real-world deployment challenges.
  • Fortunato (2009) — Community detection in graphs — comprehensive 103-page survey covering algorithm taxonomy (hierarchical clustering, spectral methods, modularity optimization), theoretical foundations (NP-hardness, quality functions), benchmarking on standard networks, and applications across biological, social, and technological networks; provides foundational theory and methods for network-based approaches to misinformation detection and information diffusion analysis.

Rumour verification and stance

Stance detection

Shared tasks and benchmarks

  • The Clickbait Challenge 2017: Towards a Regression Model for Clickbait Strength — Clickbait Challenge 2017 shared task with 38,517 graded-scale annotated tweets; 13 submitted systems achieving significant performance gains over prior baselines; reformulates clickbait detection as regression to measure strength rather than binary classification; introduces Webis Clickbait Corpus 2017.
  • A Benchmark Study of Machine Learning Models for Online Fake News Detection — Comprehensive empirical benchmark comparing 19 machine learning models (8 traditional, 6 deep learning, 5 pre-trained transformers) on three fake news datasets spanning politics, health, and diverse topics; finding: BERT-based pre-trained models (RoBERTa 96% accuracy on large datasets, >90% with 500 samples) substantially outperform traditional ML and deep learning approaches; practical guidance for practitioners across resource constraints.
  • RumourEval 2019: Determining Rumour Veracity and Support for Rumours — extended shared task with Twitter and Reddit data; two subtasks: (A) SDQC stance classification on 8,574 conversation posts, (B) veracity prediction on 446 rumours; 22 system submissions (70% increase from 2017); best systems employ pre-trained contextual embeddings (BERT, GPT); demonstrates that conversation structure and ensemble approaches advance rumour verification beyond single-task specialization.
  • SemEval-2017 Task 8: RumourEval — benchmark shared task establishing foundation for rumour verification research; defines two subtasks: (a) SDQC stance classification (Support/Deny/Query/Comment) of replies to rumourous claims, and (b) veracity prediction (true/false) of source tweets; provides datasets from 10 events with 297 training threads, 28 test threads, and 1,080 tweets; 13 systems from 4 continents participated; results show stance classification achievable (78% best), but veracity prediction remains hard (below baselines).
  • Kochkina, Liakata & Augenstein (2017) — Turing at SemEval-2017 Task 8: Sequential Approach to Rumour Stance Classification with Branch-LSTM — best-performing system in RumourEval 2017 Subtask A; proposes Branch-LSTM architecture that decomposes conversation trees into linear branches and models them sequentially; achieves 78.4% accuracy using LSTM layers processing tweet sequences with word2vec and hand-crafted lexical/relational features.
  • Thorne et al. (2018) — The Fact Extraction and VERification (FEVER) Shared Task — first shared task combining evidence retrieval and natural language inference for fact verification; 23 teams, 185,445 human-generated claims verified against Wikipedia; best system achieves 64.21% FEVER score; analysis reveals three-stage pipeline architecture (document selection → sentence selection → NLI) is dominant; post-competition evidence augmentation identified 308 new evidence sets and corrected label errors.
  • Da San Martino, Barrón-Cedeño & Nakov (2019) — Findings of the NLP4IF-2019 Shared Task on Fine-Grained Propaganda Detection — shared task on propaganda technique identification in news articles; two subtasks: FLC (fragment-level with 18-way technique classification) and SLC (sentence-level binary); 90 registered teams with 39 submitting predictions; winning systems use fine-tuned BERT achieving 0.63 F1 (SLC) and 0.25 F1 (FLC); corpus of 497 annotated articles with fragment-level annotations enables interpretable propaganda analysis.

Datasets and resources

Media profiling and source credibility

Deception and behavioral detection

  • A Deep Learning Approach for Multimodal Deception Detection — Multimodal neural networks for deception detection using 3D-CNN on video, audio features, textual CNN, and micro-expressions; achieves 96.14% accuracy on courtroom trial videos, substantially outperforming traditional classifiers.

Mainstream media dissemination

COVID-19 pandemic infodemic

Synthetic media and deepfakes

  • Mirsky & Lee (2020) — The Creation and Detection of Deepfakes: A Survey — comprehensive 38-page survey covering both creation and detection methodologies; systematically reviews generative architectures (GANs, VAEs, CNNs, RNNs), technical approaches to reenactment/replacement/editing/synthesis, artifact-specific and undirected detection methods; identifies arms race dynamics and current technological limitations.
  • Tolosana et al. (2020) — DeepFakes and Beyond: A Survey of Face Manipulation and Fake Detection — comprehensive survey of facial manipulation techniques (entire face synthesis, identity swap, attribute manipulation, expression swap) and detection methods; covers GAN-based generation (StyleGAN, ProGAN), public databases, and state-of-the-art benchmarks showing detection difficulty under cross-domain conditions.
  • Rana et al. (2022) — Deepfake Detection: A Systematic Literature Review — comprehensive SLR of 112 deepfake detection papers (2018–2020) with rigorous methodology; organizes 77% deep learning (primarily CNNs), 18% machine learning, 3% statistical, and 2% blockchain-based techniques; synthesizes datasets (FaceForensics++, DFDC, DeeperForensics); evaluates 100+ detection models and 10+ feature types; finds deep learning achieves 89.7% mean accuracy vs. 85% for traditional ML; identifies standardization gaps in evaluation and future research directions.
  • Ba et al. (2024) — Exposing the Deception: Uncovering More Forgery Clues for Deepfake Detection — information-theoretic framework decomposing facial features into disentangled local representations and aggregated global representations using mutual information losses; achieves 0.983 AUC on FaceForensics++, 0.999 AUC on Celeb-DF-V2, 0.939 AUC on DFDC; demonstrates strong cross-dataset generalization (0.818-0.864 AUC on Celeb-DF when trained on FaceForensics++); addresses overfitting limitations of prior region-specific detection methods.
  • Ciftci, Demir & Yin (2019) — FakeCatcher: Detection of Synthetic Portrait Videos using Biological Signals — detects deepfakes via photoplethysmography (blood flow patterns) by analyzing spatial coherence and temporal consistency of biological signals; achieves 91%+ accuracy on FaceForensics++, Celeb-DF, and UADFV; introduces "in the wild" Deep Fakes Dataset; demonstrates biological signal inconsistencies as orthogonal detection signal to visual artifacts.
  • Sabir et al. (2019) — Recurrent Convolutional Strategies for Face Manipulation Detection in Videos — recurrent-convolutional networks exploiting temporal discrepancies for detecting Deepfake, Face2Face, and FaceSwap; combines face alignment preprocessing with bidirectional GRU cells operating on frame sequences; achieves 96.9%, 94.35%, and 96.3% accuracy respectively on FaceForensics++, improving prior state-of-the-art by up to 4.55%; shows bidirectional temporal recurrence essential while multi-level recurrence hurts due to limited training data.
  • DeepFakes: a New Threat to Face Recognition? Assessment and Detection — first publicly available GAN-based Deepfake database (620 videos from 16 VidTIMIT subject pairs); demonstrates that VGG and FaceNet face recognition systems achieve FAR of 85.62% and 95.00% on high-quality deepfakes; evaluates detection methods showing audio-visual lip-sync approaches fail entirely while image quality metrics achieve 8.97% EER.
  • Rössler et al. (2019) — FaceForensics++: Learning to Detect Manipulated Facial Images — largest facial forgery benchmark (1.8M+ images from 1K+ videos) with four manipulation methods (Face2Face, FaceSwap, DeepFakes, NeuralTextures); comprehensive evaluation of detection methods from steganalysis to CNN-based approaches; human baseline (68.7% accuracy) vs. XceptionNet (99.26%); systematic analysis of compression robustness showing significant performance degradation under realistic post-processing.
  • Li, Chang & Lyu (2018) — In Ictu Oculi: Exposing AI Generated Fake Face Videos by Detecting Eye Blinking — detects deepfakes via physiological signal absence (eye blinking); LRCN model combining CNN feature extraction with LSTM temporal modeling achieves 0.99 AUC vs. 0.98 for CNN-only and 0.79 for hand-crafted baselines; exploits fact that deepfake training datasets rarely contain closed-eye images, making natural blink sequences absent from synthesized video.
  • Zhou et al. (2018) — Two-Stream Neural Networks for Tampered Face Detection — two-stream architecture combining GoogLeNet (high-level visual artifacts) with steganalysis-based triplet network (low-level noise residuals) for detecting face swapping; achieves 0.927 AUC on SwapMe/FaceSwap dataset of 2010 high-quality tampered images; demonstrates robustness to post-processing (resizing, blurring, blending).
  • McCloskey & Albright (2018) — Detecting GAN-generated Imagery using Color Cues — forensic detection of GAN-generated images by analyzing generator network architecture; identifies two cues (color channel overlap and saturation suppression) that distinguish GANs from real cameras; saturation-based SVM achieves 0.7 AUC on fully GAN-generated images and 0.61 on face-swapped images.
  • Vaccari & Chadwick (2020) — Deepfakes and Disinformation: Exploring the Impact of Synthetic Political Video on Deception, Uncertainty, and Trust in News — experimental study (N=2,005 UK respondents) on political deepfakes using the widely-circulated Obama/Peele deepfake; finds that deepfakes increase uncertainty about content, and this uncertainty mediates reduced trust in news on social media; deepfakes threaten civic culture through epistemic erosion rather than mass deception; educational interventions showing deepfakes are synthetic can mitigate effects.
  • Fagni et al. (2020) — TweepFake: about detecting deepfake tweets — first public dataset of human vs. machine-generated tweets; 25,572 tweets from 23 bot accounts (GPT-2, RNN, LSTM, Markov, etc.) and 17 human accounts; benchmarks 13 detection methods; finds transformer-based fine-tuned models (RoBERTa) achieve 90% accuracy, character-level encodings effective for short text, but GPT-2 tweets remain challenging (65–80% accuracy).
  • Yang, Li & Lyu (2018) — Exposing Deep Fakes Using Inconsistent Head Poses — forensic detection method exploiting facial landmark misalignment in deepfake generation pipeline; compares 3D head poses estimated from all facial landmarks vs. central region only; real faces show consistent poses (cosine distance <0.02) while deepfakes exhibit large divergence (0.02–0.08); SVM classifier achieves 89.0% AUROC on deepfake videos and 84.3% on diverse face-swap dataset; demonstrates that synthesis errors invisible to human eye are detectable through geometric constraints.
  • Afchar et al. (2018) — MesoNet: A Compact Facial Video Forgery Detection Network — lightweight CNN architectures (Meso-4 and MesoInception-4) for detecting Deepfake and Face2Face forgeries at mesoscopic level; achieves 98% detection accuracy for Deepfake and 95% for Face2Face under realistic compression; introduces first publicly available Deepfake dataset with 175 videos; demonstrates that efficient networks with ~28K parameters match or exceed complex architectures while remaining computationally practical.
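
The head-pose cue in the Yang, Li & Lyu entry above can be illustrated with a few lines of Python. This is a minimal sketch, not the authors' pipeline: the entry reports an SVM classifier, whereas the fixed 0.02 cut-off used here is only the boundary quoted in the entry, and the function names and the idea of passing head orientations as plain 3D vectors are assumptions made for illustration.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(angle) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def pose_divergence_flag(whole_face_dir, central_dir, threshold=0.02):
    """Flag a frame when the two head-pose estimates disagree.

    whole_face_dir: orientation estimated from all facial landmarks.
    central_dir: orientation estimated from the central region only.
    threshold: illustrative cut-off taken from the range quoted above.
    """
    return cosine_distance(whole_face_dir, central_dir) >= threshold

# Hypothetical orientation vectors for one frame.
consistent = pose_divergence_flag((0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
divergent = pose_divergence_flag((0.0, 0.0, 1.0), (0.4, 0.0, 1.0))
```

A per-frame threshold like this is easy to inspect, but a learned classifier over the pose differences (as the entry describes) would be needed to approach the reported AUROC figures.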

Offensive AI & threat modeling

  • Mirsky et al. (2021) — The Threat of Offensive AI to Organizations — comprehensive survey of 33 offensive AI capabilities (OACs) adversaries use to attack organizations, categorized into automation, campaign resilience, credential theft, exploit development, information gathering, social engineering, and stealth. Through expert user study (N=22), ranks threats by profit/achievability/defeatability/harm, finding exploit development, social engineering, and information gathering pose the greatest risk. Develops threat model T = H × (M/D) enabling practitioners to prioritize defensive investments. Particularly relevant for understanding AI-enabled social engineering attacks (deepfakes, impersonation, phishing) and model extraction threats to misinformation detection systems.
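
The scoring formula T = H × (M/D) quoted above lends itself to a short worked example. The entry does not define the symbols, so reading H as harm, M as motivation, and D as defeatability is an assumption here, and the capability names and scores below are invented purely for illustration.

```python
def threat_score(harm, motivation, defeatability):
    """Compute T = H * (M / D) for one capability.

    Symbol meanings are assumed (harm, motivation, defeatability);
    the bibliography entry only gives the formula itself.
    """
    if defeatability <= 0:
        raise ValueError("defeatability must be positive")
    return harm * (motivation / defeatability)

# Hypothetical scores on an arbitrary 1-5 scale.
capabilities = {
    "voice-clone phishing": (4.0, 3.0, 2.0),
    "automated recon": (2.0, 4.0, 4.0),
}
ranking = sorted(
    capabilities,
    key=lambda name: threat_score(*capabilities[name]),
    reverse=True,
)
```

Under this reading, a capability that is harder to defeat (smaller D) rises in the ranking even when its harm score is unchanged.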

AI safety & governance

  • Mehrabi et al. (2023) — FLIRT: Feedback Loop In-context Red Teaming — Automated red teaming framework using in-context learning to generate adversarial prompts targeting generative models. Red language model generates prompts without fine-tuning; outputs are evaluated for safety, and feedback refines future generations. Proposes multiple attack strategies (FIFO, LIFO, Scoring, Scoring-LIFO) balancing effectiveness vs. diversity; demonstrates 80%+ attack success on vanilla Stable Diffusion and 60%+ on safeguarded variants, substantially outperforming prior manual and weakly-automated approaches. Shows attacks transfer across text-to-image models and extends to text-to-text models (GPT-Neo).
  • Perez et al. (2022) — Discovering Language Model Behaviors with Model-Written Evaluations — proposes using language models to generate high-quality evaluations for testing diverse model behaviors. Generates 154+ datasets testing 154 behaviors across personality, goal-seeking, politics, and ethics. Discovers inverse scaling phenomena where larger models exhibit worse behavior on some safety-relevant tasks (stronger political views, greater desire to avoid shutdown, increased sycophancy). Shows RLHF training amplifies political bias and can create unintended instrumental subgoals. Demonstrates that smaller preference models effectively predict RLHF model behavior, enabling early detection of safety concerns before full deployment.
  • Perez et al. (2022) — Red Teaming Language Models with Language Models — demonstrates automated red teaming to systematically discover harmful behaviors in language models. Uses one LM to generate adversarial test cases probing another LM for offensive replies, data leakage, personal information generation, and distributional biases. Explores zero-shot, few-shot, supervised, and reinforcement learning methods; RL achieves 27-42% offensive reply elicitation rates. Uncovers 1709 training data leakage instances and reveals that models discuss different demographic groups with significantly different offensiveness rates. Foundational work showing language models can complement human red teaming at scale.
  • Bang et al. (2024) — Measuring Political Bias in Large Language Models: What Is Said and How It Is Said — evaluates political bias in LLM-generated content via two-tiered framework separating political stance (extreme anchor comparison) from framing bias (content and style components). Tests 11 open-source LLMs (LLaMa-2, Yi, Vicuna, Falcon, etc.) on 14 politically divisive topics; finds models exhibit liberal bias on social issues (same-sex marriage, climate change, public education), US-centric focus despite global training claims, and nuanced issue-specific biases varying across models. Decomposition of bias into content bias (entity/topic selection) and style bias (lexical polarity) provides explainable measurement beyond left-right spectrum.
  • Goldstein et al. (2023) — Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations — threat assessment of how generative language models could expand the scale and sophistication of influence operations and propaganda campaigns. Uses ABC framework (Actors, Behaviors, Content) to analyze threats; proposes comprehensive mitigation taxonomy across four intervention points (model design & construction, model access, content dissemination, belief formation). Evaluates mitigations for technical feasibility, social feasibility, downside risks, and impact. Identifies critical unknowns about adoption rates, effectiveness, and norm-setting.
  • Shu et al. (2023) — On the Exploitability of Instruction Tuning — investigates vulnerabilities in instruction-tuned LLMs to data poisoning attacks via AutoPoison, an automated pipeline generating high-quality poisoned training data; demonstrates content injection and over-refusal attacks that scale to larger models while maintaining fluency; shows instruction tuning's low sample complexity is a double-edged sword enabling both capability learning and behavior hijacking.
  • Evans et al. (2021) — Truthful AI: Developing and Governing AI That Does Not Lie — policy and governance framework for preventing AI systems from generating false or misleading statements. Proposes conceptual distinctions between lies, negligent falsehoods, and truthfulness; describes institutional arrangements for AI truthfulness standards (industry self-regulation, co-regulation, top-down regulation); outlines technical approaches to developing truthful AI. Argues early standards-setting is crucial before AI capabilities in strategic deception exceed human capacity.
  • Wei, Haghtalab & Steinhardt (2023) — Jailbroken: How Does LLM Safety Training Fail? — analyzes fundamental failure modes in safety-trained language models through two mechanisms: competing objectives (where safety training conflicts with pretraining-induced instruction following) and mismatched generalization (where safety training fails to cover capabilities developed during pretraining). Develops 30 jailbreak methods and tests on GPT-4, Claude v1.3, and GPT-3.5 Turbo, finding vulnerabilities persist despite extensive red-teaming. Argues that scaling and additional red-teaming alone cannot resolve fundamental tensions in how LLMs are trained for safety.

Platform governance and regulation

  • Gorwa (2019) — The platform governance triangle: conceptualising the informal regulation of online content — applies Abbott and Snidal's governance triangle framework to analyze how informal multi-stakeholder arrangements regulate online platform content across Europe. Maps regulatory initiatives by actor composition (state, firm, NGO) including NetzDG, AVMSD, Code of Conduct on Terror and Hate Content, Code of Practice on Disinformation, Facebook Oversight Board, and Global Network Initiative. Identifies three key dynamics shaping effectiveness: legitimation politics (contested authority), actor competencies (divergent expertise and capacity), and power relations (asymmetries favoring firms); argues effective governance requires collaboration across all three actor types, though such arrangements face ongoing tensions over legitimacy and influence.

Propagation-based detection

  • Shu, Bernard & Liu (2018) — Studying Fake News via Network Analysis: Detection and Mitigation — comprehensive chapter surveying network properties (echo chambers, filter bubbles, malicious accounts), three homogeneous and three heterogeneous network types, feature learning via network embeddings (NMF, RNNs), detection methods (interaction embedding, temporal diffusion, credibility propagation, knowledge network matching), and mitigation strategies (provenance identification, leader selection, influence minimization, mitigating campaigns).
  • Cheng et al. (2014) — Can Cascades be Predicted? — foundational work showing cascades on social networks are highly predictable (~80% accuracy); temporal features most predictive, followed by structural, resharer, and user features; demonstrates prediction improves with observation window size; reveals fundamental differences between user-initiated and page-initiated cascades.
  • Li et al. (2016) — DeepCas: an End-to-end Predictor of Information Cascades — first end-to-end deep learning approach to cascade size prediction; represents cascade graphs as random walk paths processed through bidirectional GRU with attention mechanisms; automatically learns cascade representations without hand-crafted features; evaluated on Twitter and academic citation cascades.
  • Wang et al. (2017) — Topological Recurrent Neural Network for Diffusion Prediction — novel LSTM architecture for dynamic DAGs; models cascades as diffusion topologies showing information spread over network structure; learns topology-aware sender embeddings capturing both node properties and cascade dynamics; achieves 20–56% relative improvement over DeepCas across three real-world networks.
  • Tacchini et al. (2017) — Some Like it Hoax: Automated Fake News Detection in Social Networks — hoax detection via user interaction patterns on Facebook; proposes that user "likes" encode veracity signals independent of content; two algorithms (logistic regression, harmonic boolean crowdsourcing) achieve >99% and 99.4% accuracy respectively; demonstrates transfer learning across Facebook communities and robustness with minimal labeled training data (<1% of posts); suggests diffusion patterns are a primary detection signal.
  • Ruchansky, Seo, & Liu (2017) — CSI: A Hybrid Deep Model for Fake News Detection — three-module neural network combining text, temporal engagement response patterns, and user group behavior to identify fake news and suspicious users; Capture module uses LSTM on temporal features; Score module learns user suspiciousness from co-engagement patterns; achieves 89.2% accuracy on Twitter, outperforming text-only and propagation-only baselines; demonstrates value of joint modeling with fewer parameters than competing RNN approaches.
  • Castillo, Mendoza & Poblete (2011) — Information Credibility on Twitter — foundational work framing Twitter credibility assessment via user reputation and propagation signals; 2,500+ trending topics, human-labeled for newsworthiness and credibility; shows user features (registration age, followers, activity) and propagation structure (retweet tree depth/breadth) are stronger predictors than text alone; achieves 89% accuracy on newsworthy detection, 86% on credibility classification.
  • Ma, Gao & Wong (2017) — Detect Rumors in Microblog Posts Using Propagation Structure via Kernel Learning — kernel-based approach (PTK/cPTK) measuring structural similarity between propagation trees; soft-matches subtrees to capture high-order patterns; context-sensitive extension considers propagation paths from root; extends to four-class classification (false/true/unverified/non-rumor); achieves 75.0% accuracy on Twitter15 with superior early detection (75% within 24 hours).
  • Ma, Gao & Wong (2018) — Rumor Detection on Twitter with Tree-structured Recursive Neural Networks — applies recursive neural networks (bottom-up and top-down variants) to thread propagation trees; learns joint content-structure representations where local patterns (supportive/questioning replies) signal veracity; achieves 72.3% / 73.7% accuracy on Twitter15/16 with superior early detection (8 hours vs. 36 hours to match baseline).
  • Vosoughi, Roy & Aral (2018) — The Spread of True and False News Online — largest longitudinal study of misinformation diffusion; ~126K verified true/false cascades from Twitter (2006–2017), 3M users, 4.5M shares; falsehood diffuses 6× faster to 1.5K people, reaches depth 19 vs. truth's depth 10 in 1/10 the time; 70% higher retweet likelihood; humans (not bots) responsible; novelty perception key driver.
  • Shu et al. (2019) — Hierarchical Propagation Networks for Fake News Detection — constructs hierarchical propagation networks from both macro-level (retweet cascades) and micro-level (reply conversations) granularity; extracts structural, temporal, and linguistic features showing fake news spreads deeper, contains more bots, has shorter lifespan, and generates more negative sentiment; hierarchical network features (HPNF) achieve F1 > 0.80, outperforming prior macro-level-only approaches.
  • Zhou & Zafarani (2019) — Network-based Fake News Detection: A Pattern-driven Approach — four network-structural patterns (More-Spreader, Farther-Distance, Stronger-Engagement, Denser-Network) across five network levels; 138 interpretable features; RF 0.929/0.932 accuracy/F₁ on PolitiFact without reading content; robust to limited early-stage network information.
  • Monti et al. (2019) — Fake News Detection on Social Media using Geometric Deep Learning — graph convolutional networks on propagation cascades integrating user profiles, activity, social network structure, and spreading patterns; 92.7% ROC AUC on Twitter; user profile and network features are most important (90% AUC combined) while content marginally contributes; enables early detection within 1–2 hours with minimal cascade size (6 tweets).
  • Dou et al. (2021) — User Preference-aware Fake News Detection (UPFD): Combines endogenous user preferences (from historical tweets) with exogenous news propagation patterns via GNNs. Encodes user preferences and news content using BERT/word2vec, builds Twitter retweet cascade graphs, uses GNN message passing to integrate signals; concatenates user engagement and news textual embeddings for classification. Achieves 84.62% accuracy on PolitiFact and 97.23% on GossipCop; ablation studies show both user preference and propagation are necessary; demonstrates confirmation bias is a learnable signal in engagement data.
  • Lu & Li (2020) — GCAN: Graph-aware Co-Attention Networks for Explainable Fake News Detection on Social Media — detects fake news on Twitter using only source tweet text and retweet user sequences (no comments, no explicit network topology); models user propagation with CNN and GRU, constructs fully-connected user interaction graphs weighted by feature similarity, applies GCN and dual co-attention mechanism to jointly highlight suspicious users and informative words; achieves 87.67% accuracy on Twitter15 and 90.84% on Twitter16 (18–20% improvement over prior work); demonstrates early detection at 90% accuracy with only 10 retweets; provides interpretable explanations of suspicious user characteristics and dramatic linguistic markers.
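
Several entries above rely on structural features of a reshare cascade, such as its size, depth, and breadth. The sketch below shows one plain way to compute them from parent-child reshare pairs. The function name, the edge-list input format, and the toy cascade are assumptions for illustration and are not taken from any of the cited papers.

```python
from collections import defaultdict, deque

def cascade_metrics(edges, root):
    """Compute size, depth, and maximum breadth of a reshare tree.

    edges: iterable of (parent, child) pairs; assumed to form a tree
           rooted at `root` (no cycles, one parent per child).
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    level_sizes = defaultdict(int)  # depth -> number of nodes at that depth
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        level_sizes[depth] += 1
        for child in children[node]:
            queue.append((child, depth + 1))

    return {
        "size": sum(level_sizes.values()),
        "depth": max(level_sizes),
        "max_breadth": max(level_sizes.values()),
    }

# Toy cascade: a is reshared by b and c, and b is reshared by d.
metrics = cascade_metrics([("a", "b"), ("a", "c"), ("b", "d")], root="a")
```

Features like these can feed any downstream classifier; the learned approaches listed above (kernels, recursive networks, GNNs) replace such hand-built summaries with representations learned from the tree itself.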

Bot detection

  • Cresci et al. (2016) — DNA-Inspired Online Behavioral Modeling and Its Application to Spambot Detection — encodes user actions as character sequences and applies longest common substring analysis to detect groups of similar accounts; outperforms supervised and unsupervised baselines achieving MCC 0.952 on political bots; demonstrates paradigm shift toward group-level detection of evolved spambots
  • The Paradigm-Shift of Social Spambots: Evidence, Theories, and Tools for the Arms Race — empirical evidence of a new generation of social spambots that evade all existing detection approaches (Twitter, humans, academic tools); crowdsourced evaluation shows 0.24 accuracy vs 0.91 on traditional bots; paradigm shift toward group-level detection methods
  • Better Safe Than Sorry: an Adversarial Approach to improve Social Bot Detection — proposes GenBot, a genetic algorithm for synthesizing evolved spambots; demonstrates that evolved bots evade state-of-the-art detection (F₁ ≈ 0.26) while revealing an actionable vulnerability (entropy-based signatures); introduces proactive detection paradigm to anticipate and preemptively defend against future bot evolutions
  • A Decade of Social Bot Detection — decade-long longitudinal review (2010–2020) of social bot detection research; systematically catalogs 230+ detectors across two dimensions (individual vs. group detection, methodological approach); documents three "waves" of bot evolution and shift from individual-account to group-level detection approaches; analyzes publication trends showing exponential growth post-2014; argues modern ML detectors fail due to non-stationarity and non-neutrality assumptions
  • Detection of Novel Social Bots by Ensembles of Specialized Classifiers — ensemble of specialized classifiers; shows heterogeneous bot behaviors (spammers, fake followers, political bots) are distinguished by different feature sets; ESC trains per-type classifiers combined via maximum rule; achieves 56% improvement in F1 score for novel bots (47% → 73%); improves cross-domain recall from 42% to 84%; enables efficient adaptation with fewer labeled examples for retraining; deployed in Botometer v4 achieving AUC 0.99
  • Davis et al. (2016) — BotOrNot: A System to Evaluate Social Bots — publicly available web and API service for classifying Twitter accounts as human or bot using 1,000+ features; Random Forest classifier achieves 0.95 AUC on 15k bots and 16k legitimate accounts; served over one million API requests since 2014 launch
  • Arming the public with artificial intelligence to counter social bots — comprehensive review of social bot types, activities, and impact; case study of Botometer bot detection tool; user experience survey (N=731) revealing interpretation challenges; proposes calibration methods (Platt scaling, Complete Automation Probability) to make bot scores interpretable; demonstrates model generalization across diverse bot types
  • Scalable and Generalizable Social Bot Detection through Data Selection — scalable metadata-only bot detection framework using only 20 user features; achieves 900M tweets/day processing speed; compiles 13 labeled datasets (94K bots, 43K humans); demonstrates that strategic data selection (training on curated subset) improves generalization and consistency better than exhaustive training; achieves 0.99 AUC on unseen datasets
  • Shao et al. (2018) — Anatomy of an online misinformation network — network analysis of fact-checking vs. misinformation diffusion during 2016 U.S. election; k-core decomposition reveals strong segregation between claim and fact-check communities; fact-checking nearly disappears in dense network cores dominated by bots and misinformation spreaders; identifies efficient node-removal strategies for disrupting misinformation circulation
  • Shao et al. (2017) — The spread of low-credibility content by social bots — large-scale empirical analysis (14M messages, 400K articles) during 2016 U.S. election; shows only 6% of accounts spreading misinformation are bots but they account for 31% of tweet volume; bots employ early amplification strategy and target influential users; network dismantling analysis shows removing bots is critical for reducing misinformation spread
  • Varol et al. (2017) — Online Human-Bot Interactions: Detection, Estimation, and Characterization — large-scale machine-learning framework extracting 1,150 behavioral features to classify bots from Twitter accounts; evaluates on 14M accounts; estimates 9–15% of active users are bots; clustering analysis reveals behavioral phenotypes; demonstrates concept drift in detection systems
  • Ferrara et al. (2015) — The Rise of Social Bots — foundational survey of social bot phenomenon and detection methods; proposes taxonomy dividing approaches into graph-based detection (network structure), crowd-sourced detection (human judgment), and feature-based detection (behavioral patterns); analyzes characteristics distinguishing bots from humans and discusses arms race between sophistication and detection.
  • Ayoobi, Shahriar & Mukherjee (2023) — The Looming Threat of Fake and LLM-generated LinkedIn Profiles: Challenges and Opportunities for Detection and Prevention — Dataset of 3,600 LinkedIn profiles (1,800 legitimate, 600 human-created fake, 1,200 ChatGPT-generated) and Section and Subsection Tag Embedding (SSTE) method for detection; achieves 95% accuracy distinguishing legitimate from fake profiles, and 70%+ accuracy on unseen ChatGPT-generated profiles despite absence from training; demonstrates that minimal LLM-generated training samples suffice for generalization to diverse LLM outputs.
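
The group-level idea in the Cresci et al. (2016) entry, encoding each account's actions as a character sequence and looking for long shared substrings, can be sketched as follows. This is a simplified illustration rather than the published method: the three-letter alphabet, the function names, and the brute-force search are assumptions, and the toy sequences are invented.

```python
# Illustrative action alphabet (an assumption for this sketch).
ACTION_CODES = {"tweet": "A", "reply": "C", "retweet": "T"}

def encode(actions):
    """Turn a list of action names into a behavioral string."""
    return "".join(ACTION_CODES[a] for a in actions)

def longest_common_substring(sequences):
    """Return the longest substring shared by every sequence.

    Brute force over substrings of the shortest sequence; adequate
    for short toy inputs, not for large account groups.
    """
    if not sequences:
        return ""
    shortest = min(sequences, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in seq for seq in sequences):
                return candidate
    return ""

dna = encode(["tweet", "retweet", "retweet", "reply"])
shared = longest_common_substring(["ATTCA", "CATTC", "TATTCT"])
```

An unusually long shared substring across many accounts suggests coordinated or automated behavior, which is the intuition behind detecting groups rather than scoring accounts one at a time.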

Media manipulation & coordinated campaigns

Fringe communities and internet culture

  • Zannettou et al. (2018) — On the Origins of Memes by Means of Fringe Web Communities — large-scale empirical study of meme origins and propagation across Twitter, Reddit, /pol/, and Gab (160M images, 2016–2017); uses perceptual hashing and custom distance metrics to identify 12.6K meme clusters; employs Hawkes processes to quantify directed influence between communities; finds /pol/ and The_Donald substantially influence mainstream meme ecosystems despite modest size; documents disproportionate prevalence of hateful and anti-semitic memes on fringe communities.
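
Perceptual hashing, mentioned in the entry above, maps visually similar images to nearby bit strings so that variants of a meme can be grouped by Hamming distance. The sketch below swaps in the simpler average-hash scheme to keep the example dependency-free; it is not the hashing or the custom distance metric used in the study, and the tiny 2×2 "images" are invented.

```python
def average_hash(pixels):
    """Hash a grayscale image given as a 2D list of intensities.

    Each bit records whether a pixel is brighter than the image mean.
    Real pipelines would first downscale the image to a fixed size.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)

def hamming(h1, h2):
    """Count differing bits between two equal-length hashes."""
    return sum(a != b for a, b in zip(h1, h2))

original = [[10, 200], [220, 30]]
brighter = [[20, 210], [230, 40]]   # same picture, uniformly brighter
inverted = [[200, 10], [30, 220]]   # different picture

near = hamming(average_hash(original), average_hash(brighter))
far = hamming(average_hash(original), average_hash(inverted))
```

Clustering then amounts to grouping images whose pairwise distance falls under a chosen threshold, which is a design parameter rather than something fixed by the technique.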

LLM-generated text detection

  • Tang, Chuang & Hu (2023) — The Science of Detecting LLM-Generated Texts — comprehensive survey of black-box and white-box detection approaches for LLM-generated text, covering data collection, feature selection (statistical disparities, linguistic patterns, fact verification), classification models, watermarking strategies, benchmark datasets (HC3, Neural Fake News, etc.), and challenges including adaptive attacks, bias in training data, and threats from open-source LLMs.

Style / content-based detection

  • Wang & Chang (2022) — Toxicity Detection with Generative Prompt-based Inference — zero-shot toxicity detection using generative prompt-based classification; compares generative (estimating p(x|y)) vs. discriminative formulations; careful prompt engineering crucial for performance; demonstrates generative approach outperforms discriminative and embedding-similarity baselines on SBIC, HateXplain, and Civility datasets; qualitative analysis reveals LLMs sometimes rely on spurious correlations learned during pre-training.
  • Oshikawa, Qian, & Wang (2020) — A Survey on Natural Language Processing for Fake News Detection — comprehensive NLP survey systematically comparing task formulations (classification vs. regression), nine benchmark datasets (LIAR, FEVER, FakeNewsNet, SNS data), and five methodological approaches (preprocessing, ML models, rhetorical approaches, evidence collection); demonstrates attention-based LSTM models outperform hand-crafted linguistic features; achieves 41.5–45.7% on LIAR, 68–76% on FEVER, 94.4% on FakeNewsNet (with graph convolutional networks).
  • Singhania, Fernandez & Rao (2023) — 3HAN: A Deep Neural Network for Fake News Detection — three-level hierarchical attention network modeling articles at word, sentence, and headline-body levels. Word-level attention extracts relevant words; sentence-level attention identifies informative sentences; headline-body attention captures stance between headline and body. Achieves 96.77% accuracy with headline-based pre-training; provides interpretable attention visualizations showing which words and sentences drive fake news predictions.
  • Liu, Wang, Li & Li (2024) — TELLER: A Trustworthy Framework For Explainable, Generalizable and Controllable Fake News Detection — dual-system framework combining LLM-driven cognition system (decomposes claims into interpretable yes/no questions) with neural-symbolic decision system (learns transparent logic rules via disjunctive normal form); achieves 76% accuracy on GossipCop and 80%+ on three datasets while maintaining explainability, generalizability, and human controllability; demonstrates integration of human expertise with machine learning for trustworthy detection.
  • Potthast et al. (2017) — A Stylometric Inquiry into Hyperpartisan and Fake News — stylometric analysis via writing style features showing hyperpartisan news can be distinguished from mainstream (F1=0.78), satire from both (F1=0.81), but style-based fake news detection alone insufficient (F1=0.46); introduces Unmasking technique for assessing style similarity between text categories; corpus of 1,627 fact-checked articles from BuzzFeed.
  • Karimi & Tang (2019) — Learning Hierarchical Discourse-level Structure for Fake News Detection — proposes HDSF framework that automatically learns discourse-level dependency trees (hierarchical sentence organizations) via inter-sentential attention; identifies three structure-related properties distinguishing fake/real news: leaf node count (coherence), preorder difference (sentence ordering), parent-child distance (discourse cohesion); achieves 82.19% accuracy, outperforming linguistic baselines; real news documents exhibit statistically significant higher coherence in discourse structures.
  • Rashkin et al. (2017) — Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking — linguistic analysis of news across satire, hoax, propaganda, and trusted categories; demonstrates that fake news uses more first-person pronouns, superlatives, modal adverbs, and hedging while trusted news uses concrete language and assertive verbs; news reliability classification 65% F1; fine-grained truthfulness prediction on PolitiFact 6-point scale achieves 22% F1 (6-class) and 52% F1 (2-class).
  • Zhou et al. (2023) — Linguistic-style-aware Neural Networks for Fake News Detection — HERO hierarchical recursive neural network; constructs per-document linguistic trees integrating constituency and RST discourse structure; Bi-GRU aggregation preserves global tree topology; attribute-specific variant 0.866/0.896 AUC on ReCOVery/MM-COVID, outperforming HAN, Text-GCN, DRNN, and Transformer baselines.
  • Cao et al. (2025) — Is Less Really More? Fake News Detection with Limited Information — SLIM framework; replaces full article text with MMR-selected keywords, POS/NER sequence tags, or metadata; 30% keyword extraction achieves ~99% accuracy ratio vs. full text; XLNet_base backbone; 95.55%/97.60% accuracy on ReCOVery/Fake_And_Real_News.
  • Kaliyar, Goswami & Narang (2021) — FakeBERT: Fake News Detection in Social Media with a BERT-based Deep Learning Approach — combines BERT embeddings with parallel 1D convolutional neural networks using varying kernel sizes for multi-scale feature extraction; achieves 98.90% accuracy on real-world 2016 U.S. Presidential Election dataset, substantially outperforming CNN (92.70%) and LSTM (97.55%) baselines, demonstrating the effectiveness of contextualized transformer embeddings for fake news detection on social media.
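
The Cao et al. (2025) entry mentions MMR-selected keywords. Maximal marginal relevance greedily picks items that are relevant but not redundant with those already chosen. The sketch below shows the generic selection loop only; the relevance scores, the similarity function, and the trade-off value `lam` are placeholders, and nothing here reproduces the SLIM framework's actual configuration.

```python
def mmr_select(candidates, relevance, similarity, k, lam=0.7):
    """Greedy maximal-marginal-relevance selection.

    candidates: list of items (e.g. keywords).
    relevance: dict mapping item -> relevance score.
    similarity: function (item, item) -> similarity in [0, 1].
    k: number of items to keep.
    lam: trade-off between relevance (1.0) and diversity (0.0).
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(item):
            # Redundancy is the highest similarity to anything chosen so far.
            redundancy = max(
                (similarity(item, s) for s in selected), default=0.0
            )
            return lam * relevance[item] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical scores: two near-duplicate keywords and one distinct one.
rel = {"vaccine": 0.9, "vaccines": 0.85, "election": 0.6}

def sim(a, b):
    return 0.95 if {a, b} == {"vaccine", "vaccines"} else 0.1

keywords = mmr_select(list(rel), rel, sim, k=2)
```

With these toy numbers the second pick is "election" rather than the near-duplicate "vaccines", which is the behavior that makes MMR useful for compressing an article into a small, varied keyword set.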

Knowledge & fact-checking

Evidence-based detection

  • Popat et al. (2018) — DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning — end-to-end neural network combining claims and evidence articles via bidirectional LSTMs with claim-specific attention; automatically discovers which web articles support or refute a claim without hand-crafted features; 78.96% accuracy on Snopes, demonstrating the value of external evidence for credibility assessment.
  • Vo & Lee (2021) — Hierarchical Multi-head Attentive Network for Evidence-aware Fake News Detection — proposes MAC with hierarchical multi-head attention at both word and document levels; word-level attention identifies important phrases in claims and evidence; document-level attention weights evidence sources by relevance; jointly optimized with BiLSTM embeddings; achieves 88.7% AUC on Snopes (9.47% improvement over baselines) and 75.8% on PolitiFact; ablation studies demonstrate both attention levels are essential for evidence-aware fact-checking.
  • Jin et al. (2021) — Towards Fine-Grained Reasoning for Fake News Detection — constructs claim-evidence graphs from social media (posts, users, keywords) and uses mutual-reinforcement-based ranking to identify salient evidence; proposes bi-channel kernel graph attention network integrating textual and social signals for fine-grained reasoning; achieves 91.7% F1 on PolitiFact and 86.4% F1 on GossipCop with interpretable explanations of which evidence groups matter most for each prediction.

Multimodal

  • Jagtap et al. (2021) — Misinformation Detection on YouTube Using Video Captions — applies pre-trained word embeddings (GloVe, Word2Vec) to YouTube video captions for three-class (Misinformation, Debunking, Neutral) and binary classification; achieves 0.85–0.90 F₁ (three-class) and 0.92–0.95 F₁ (binary); demonstrates that video metadata (views, likes) alone insufficient but caption analysis with classical ML classifiers outperforms baselines across five conspiracy topics (vaccines, 9/11, chemtrails, moon landing, flat earth).
  • Alam et al. (2021) — A Survey on Multimodal Disinformation Detection — comprehensive survey of multimodal disinformation covering text, images, speech, video, network structure, and temporal information; distinguishes factuality (content falsity) from harmfulness (intent to deceive/harm); systematically reviews ~140 papers on detection approaches and identifies key challenges in combining multiple modalities.
  • Nakamura, Levy & Wang (2019) — r/Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection — large-scale multimodal dataset with 1.06M Reddit submissions (64% text+image) labeled for 2-way, 3-way, and 6-way classification; demonstrates multimodal models (BERT + ResNet50) achieve 85.88% 6-way accuracy, ~10 percentage points above text-only baselines; identifies satire and imposter content as hardest categories.
  • Wang et al. (2018) — EANN: Event Adversarial Neural Networks for Multi-Modal Fake News Detection — adversarial learning to remove event-specific features and learn event-invariant representations; feature extractor cooperates with fake news detector and tries to fool event discriminator; 71.5% / 82.7% accuracy on Twitter / Weibo; first to formulate fake news detection on newly emerged events as a transfer learning problem.
  • Yang et al. (2018) — TI-CNN: Convolutional Neural Networks for Fake News Detection — parallel CNN branches for text and image; extracts explicit features (word counts, punctuation, capital letters, negations, pronouns, face count, resolution) and learns latent representations; concatenates both feature sets for classification; achieves F₁ 0.9210 on 2016 US presidential election news, significantly outperforming text-only (0.8920) and image-only (0.4729) approaches.
  • Khattar et al. (2019) — MVAE: Multimodal Variational Autoencoder for Fake News Detection — variational autoencoder learning shared text-image representations; jointly trains encoder-decoder (VAE reconstruction) with binary classifier; 74.5% / 82.4% accuracy on Twitter / Weibo, improving ~6% over attention-based baselines by explicitly modeling cross-modal correlations.
  • A Multi-Modal Method for Satire Detection using Textual and Visual Cues — multi-modal satire detection using ViLBERT (Vision & Language BERT) on headline-image pairs from satirical and mainstream news sources; achieves 93.80% accuracy on 10,000-article dataset; demonstrates that early fusion and multi-modal pre-training outperform uni-modal and simple fusion baselines; notably, image forensics (ELA+CNN) alone underperforms, highlighting the importance of joint reasoning.
  • Zhou et al. (2020) — SAFE: Similarity-Aware Multi-Modal Fake News Detection — proposes cross-modal text-image similarity as a detection signal; modified cosine similarity between Text-CNN text and image2sentence visual representations; F₁ 0.896/0.895 on PolitiFact/GossipCop, outperforming text-only and prior multi-modal baselines.
  • Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News — first to address defending against machine-generated multimodal fake news with images and captions; proposes DIDAN, a named entity-based approach to detect visual-semantic inconsistencies by measuring named entity co-occurrence between article text and image captions; introduces NeuralNews dataset of 128K articles across four types (real/generated articles × real/generated captions); shows naive humans achieve only 46.2% detection accuracy while trained humans with visual-semantic cues reach 67.8%; identifies Type C (generated text + real images) as most deceptive.
  • Silva et al. (2021) — Embracing Domain Differences in Fake News: Cross-domain Fake News Detection using Multimodal Data — addresses practical problem that multimodal models trained on one domain fail on others (politics→entertainment, politics→COVID-19); unsupervised domain discovery via propagation network community detection; supervised domain-agnostic classifier preserves both domain-specific and cross-domain knowledge via dual decoders with adversarial loss; LSH-based instance selection reduces labeling cost; 7.55% F₁ improvement on rarely-appearing domains; 0.836–0.869 F₁ across PolitiFact/GossipCop/CoAID.
  • Zhou et al. (2020) — ReCOVery: A Multimodal Repository for COVID-19 News Credibility Research — multimodal COVID-19 news credibility dataset; 2,029 articles from 60 screened publishers with NewsGuard/MBFC labels, 140,820 tweets; benchmarks LIWC, RST, Text-CNN, and SAFE with SAFE achieving best F₁ 0.833/0.672.
  • Yang et al. (2020) — CHECKED: Chinese COVID-19 Fake News Dataset — first Chinese-language COVID-19 misinformation dataset; 2,104 Weibo microblogs with per-item expert labels, multimedia, and 1.87M repost/1.19M comment propagation graphs; TextCNN baseline macro F₁ = 0.938.
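
The cross-modal similarity signal described in the SAFE entry above can be illustrated with a small sketch. It assumes the text and image have already been embedded as equal-length vectors (the embeddings here are made-up numbers), and it rescales cosine similarity into [0, 1] so the score can be read as a rough "text and image agree" probability. The function name and the zero-vector fallback are illustrative choices, not taken from the paper.

```python
import math
from typing import Sequence


def rescaled_cosine(text_vec: Sequence[float], image_vec: Sequence[float]) -> float:
    """Cosine similarity shifted from [-1, 1] into [0, 1]."""
    if len(text_vec) != len(image_vec):
        raise ValueError("text and image vectors must have the same length")
    dot = sum(t * v for t, v in zip(text_vec, image_vec))
    norm = math.sqrt(sum(t * t for t in text_vec)) * math.sqrt(sum(v * v for v in image_vec))
    if norm == 0.0:
        # Assumption: an all-zero embedding carries no evidence either way.
        return 0.5
    return (dot + norm) / (2.0 * norm)


# Hypothetical embeddings: one matching pair and one mismatched pair.
matched = rescaled_cosine([1.0, 0.0, 2.0], [2.0, 0.0, 4.0])
mismatched = rescaled_cosine([1.0, 0.0, 2.0], [-1.0, 0.0, -2.0])
print(f"matched={matched:.3f} mismatched={mismatched:.3f}")
```

A low score flags a story whose image does not support its text, which is the kind of mismatch the entry describes as a detection signal.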

Source / social context

Susceptibility factors and public health behavior

Moral and behavioral psychology of misinformation

Adversarial & defensive

Early detection

Explainability

  • Danilevsky et al. (2020) — A Survey of the State of Explainable AI for Natural Language Processing in arXiv — comprehensive survey of explainability in NLP across 50 papers; categorizes explanations by local/global scope and self-explaining/post-hoc generation; details five core techniques (feature importance, surrogate models, example-driven, provenance-based, induction); reviews visualization methods (saliency, raw declarative, natural language); identifies evaluation gaps and future directions; foundational reference for interpretable NLP systems including fake-news detectors.
  • Shu et al. (2019) — dEFEND: Explainable Fake News Detection — hierarchical attention networks jointly encoding news content and user comments; sentence-comment co-attention identifies which sentences and comments drive the fake-news prediction; 0.904 accuracy on PolitiFact; human evaluation demonstrates dEFEND ranks check-worthy sentences better than HPA-BLSTM.

Cross-domain & transfer

  • Nan et al. (2021) — MDFEND: Multi-domain Fake News Detection — Introduces Weibo21, the first multi-domain fake news dataset from a single platform with 9 domains; proposes MDFEND using mixture-of-experts with domain gate to adaptively aggregate representations across domains; achieves 0.9137 F₁, outperforming single-domain and multi-domain baselines; directly addresses domain shift in linguistic patterns and propagation behavior.
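
The mixture-of-experts idea in the MDFEND entry above, where a domain gate decides how much each expert contributes, can be sketched in a few lines. This is a simplified illustration only: the gate is a single linear layer over a domain embedding followed by a softmax, and all names (`domain_gate`, `mix_experts`) and numbers are hypothetical rather than the authors' architecture.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def domain_gate(gate_matrix: List[List[float]], domain_embedding: List[float]) -> List[float]:
    """One weight per expert: softmax over (gate row . domain embedding)."""
    scores = [sum(w * d for w, d in zip(row, domain_embedding)) for row in gate_matrix]
    return softmax(scores)


def mix_experts(expert_outputs: List[List[float]], weights: List[float]) -> List[float]:
    """Weighted sum of expert representations, dimension by dimension."""
    dim = len(expert_outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, expert_outputs)) for d in range(dim)]


# Two hypothetical experts and a gate that strongly prefers expert 0 for this domain.
experts = [[1.0, 0.0], [0.0, 1.0]]
weights = domain_gate([[10.0, 0.0], [0.0, 0.0]], [1.0, 0.0])
print(mix_experts(experts, weights))
```

Because the weights come from a softmax they always sum to one, so the mixed representation stays on the same scale as the individual expert outputs.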

LLMs & generative-era

Real-world GenAI misuse and threat assessment:

  • Marchal et al. (2024) — Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data — First empirical taxonomy of real-world GenAI misuse based on 191 documented media incidents (Jan 2023–Mar 2024); identifies 18 distinct tactics across exploitation of GenAI capabilities (impersonation, falsification, content scaling) and technical system attacks; finds most misuse is low-tech and accessible, driven by five goals (opinion manipulation 27%, monetization 21%, fraud 18%, harassment 6%, reach 3.6%); demonstrates GenAI has democratized previously costly tactics for broader pools of actors with minimal technical expertise.

Empirical detection studies:

  • Su, Cardie & Nakov (2023) — Adapting Fake News Detection to the Era of Large Language Models — Comprehensive evaluation of fake news detectors across three stages: human-written dominance (Human Legacy), mixed human-machine (Transitional Coexistence), and machine-generated dominance. Key finding: detectors trained exclusively on human-written fake news generalize poorly to machine-generated fakes. Recommends training on balanced human-machine data to improve robustness. Benchmarks RoBERTa, BERT, ELECTRA, ALBERT, DeBERTa on GossipCop++ and PolitiFact++ datasets; reveals data distribution shifts caused by LLMs create asymmetric generalization challenges.
  • Dugan et al. (2022) — Real or Fake Text?: Boundary Detection — Investigates human ability to detect transition points where text shifts from human-written to machine-generated (boundary detection). Introduces RoFT game platform; 21,000+ annotations across four genres show humans achieve 23.4% on first guess (vs. 10% random) and 72.3% with top-3 guesses; larger models harder to detect; genre-specific error patterns; monetary incentives improve learning.
  • Can LLM-Generated Misinformation Be Detected? — Empirical evidence that LLM-generated misinformation is harder to detect for humans (9.6% vs 40.7% success) and detectors than human-written content with same semantics; builds taxonomy and LLMFake dataset.

Comprehensive surveys:

  • Combating Misinformation in the Age of LLMs: Opportunities and Challenges — Systematic review of both opportunities (detection, intervention, attribution) and challenges (hallucination, intentional generation) for using LLMs in misinformation research; examines domain-specific threats and countermeasures.

Model evaluation and understanding:

Model alignment and instruction-following:

  • Askell et al. (2021) — A General Language Assistant as a Laboratory for Alignment — interactive evaluation framework for alignment using helpfulness, honesty, and harmlessness (HHH) criteria; compares prompting, imitation learning, binary discrimination, and ranked preference modeling; finds ranked preference modeling scales better than imitation learning; introduces preference model pre-training (PMP) on public data to improve sample efficiency.

Disinformation generation and safety:

  • Toxicity in ChatGPT: Analyzing Persona-assigned Language Models — Large-scale systematic analysis of persona-induced toxicity in ChatGPT; shows safety mechanisms can be bypassed via system parameter manipulation; 6× toxicity increase possible; reveals discriminatory bias targeting certain demographic groups, countries, and entity categories.
  • Vykopal et al. (2023) — Disinformation Capabilities of Large Language Models — Comprehensive empirical study of 10 LLMs' ability to generate disinformation news articles across 20 narratives (COVID-19, Russia-Ukraine, health, elections); most models readily agree with dangerous claims; Falcon is sole exception with effective safeguards; ChatGPT shows behavioral safety; existing detectors achieve ~0.8 F1 but struggle per-sample.

Truthfulness and hallucination evaluation:

  • Ji et al. (2022) — Survey of Hallucination in Natural Language Generation — comprehensive survey of hallucination across six major NLG tasks (abstractive summarization, dialogue generation, QA, data-to-text generation, machine translation, vision-language generation) and LLMs; defines intrinsic and extrinsic hallucinations; reviews metrics (statistical, model-based, human evaluation) and mitigation methods (architecture, training, post-processing, controllable generation); identifies task-specific tolerance differences and open challenges.
  • Lin, Hilton & Evans (2021) — TruthfulQA: Measuring How Models Mimic Human Falsehoods — benchmark demonstrating larger language models are less truthful; proposes automated metric for evaluating factual accuracy.

Robustness of detectors and adversarial attacks:

  • Sadasivan et al. (2023) — Can AI-Generated Text be Reliably Detected? — Comprehensive stress-testing of four detector classes (watermarking, neural network-based, zero-shot, retrieval-based) using recursive paraphrasing attacks; reduces watermark detector AUROC from 99.8% to 80.7%, and retrieval-based detectors below 60% accuracy with only modest text quality degradation; establishes theoretical bound on detector AUROC via total variation distance between text distributions, revealing fundamental hardness as LLMs improve.
  • Mao et al. (2024) — RAIDAR: Generative AI Detection via Rewriting — Detection method leveraging rewriting behavior: LLMs preserve their own generated text while modifying human-written text when asked to rewrite. Measures three structural properties (invariance, equivariance, output uncertainty) from editing distance without requiring internal model access. Achieves 60–95 F1 across diverse domains (news, essays, code, reviews, arXiv) with 29-point improvements over prior methods; robust to adversarial rephrasing attacks even when adversaries know the detection mechanism (up to 93 F1).
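
The rewriting signal in the RAIDAR entry above (little change on rewrite suggests machine text, heavy change suggests human text) can be sketched as a single feature. This is an illustration under stated assumptions: it takes the rewritten text as an input rather than calling any LLM, it uses word-level `difflib.SequenceMatcher` as a stand-in for the edit-distance measure, and the 0.1 threshold is a made-up value rather than one from the paper.

```python
from difflib import SequenceMatcher


def rewrite_change(original: str, rewritten: str) -> float:
    """Fraction of the text altered by rewriting, in [0, 1]; 0.0 means unchanged."""
    matcher = SequenceMatcher(None, original.split(), rewritten.split())
    return 1.0 - matcher.ratio()


def looks_machine_generated(original: str, rewritten: str, threshold: float = 0.1) -> bool:
    """Hypothetical decision rule: flag text that a rewriter left almost untouched."""
    return rewrite_change(original, rewritten) <= threshold


# Hand-written examples standing in for real LLM rewrites.
print(rewrite_change("a b c d", "a b c d"))  # rewriter kept everything
print(rewrite_change("a b c d", "a x c d"))  # rewriter changed one word in four
```

In practice the change score would be one input feature to a trained classifier rather than a hard threshold, since typical rewrite rates differ by domain.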

Zero-shot detection methods:

  • Mitchell et al. (2023) — DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature — Identifies that LLM-generated text occupies negative-curvature regions of the log-probability landscape; zero-shot method using random perturbations to estimate Hessian trace; achieves 0.95 AUROC on GPT-2 detection without training data or access to model parameters.
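
The perturbation idea in the DetectGPT entry above can be sketched as a score: the log-probability of a passage minus the average log-probability of randomly perturbed copies. Text sitting at a local peak of the model's log-probability yields a positive score. Everything model-specific below is a toy assumption: `toy_log_prob` is a hand-made unigram table and `toy_perturb` swaps one random word, standing in for a real language model and a real perturbation model.

```python
import random
from statistics import mean
from typing import Callable

# Toy unigram log-probabilities (hypothetical numbers, for illustration only).
TOY_LOGP = {"the": -1.0, "cat": -3.0, "sat": -3.0, "quietly": -6.0}


def toy_log_prob(text: str) -> float:
    """Sum of per-word log-probabilities; unknown words get a low default."""
    return sum(TOY_LOGP.get(word, -10.0) for word in text.split())


def toy_perturb(text: str, rng: random.Random) -> str:
    """Replace one randomly chosen word with a random vocabulary word."""
    words = text.split()
    words[rng.randrange(len(words))] = rng.choice(sorted(TOY_LOGP))
    return " ".join(words)


def perturbation_discrepancy(
    text: str,
    log_prob: Callable[[str], float],
    perturb: Callable[[str, random.Random], str],
    n_perturbations: int = 20,
    seed: int = 0,
) -> float:
    """log p(text) minus the mean log p over perturbed copies of the text."""
    rng = random.Random(seed)
    perturbed_scores = [log_prob(perturb(text, rng)) for _ in range(n_perturbations)]
    return log_prob(text) - mean(perturbed_scores)


peak = perturbation_discrepancy("the the the", toy_log_prob, toy_perturb)
valley = perturbation_discrepancy("quietly quietly quietly", toy_log_prob, toy_perturb)
print(peak, valley)
```

Under this toy model the first passage is already the most probable three-word string, so perturbing it can only lower its log-probability and the score cannot be negative; the second passage shows the opposite case.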

Generation attacks and detection:

  • Adelani et al. (2019) — Generating Sentiment-Preserving Fake Online Reviews — Demonstrates practical attack using fine-tuned GPT-2 to generate high-quality product reviews; two-step approach (generation + BERT validation) preserves sentiment; shows humans and automated detectors (Grover, GLTR, OpenAI detector) struggle to distinguish generated from authentic reviews; sentiment preservation rates 67–71% with fine-tuning.
  • Solaiman et al. (2019) — Release Strategies and the Social Impacts of Language Models — OpenAI's report on GPT-2 staged release (124M–1.5B parameters, Feb–Nov 2019) and responsible AI publication norms. Evaluates human credibility perception of synthetic text (~75% for largest models), automated detection (RoBERTa ~95% accuracy), biases in outputs (gender, religion, language preference), and threat landscape. Conducted partnership-based risk analysis with external institutions (Cornell, Middlebury CTEC, University of Oregon, University of Texas Austin). Foundational for understanding staged release strategies and detecting GPT-2 generated fake news.
  • Gehrmann, Strobelt & Rush (2019) — GLTR: Statistical Detection and Visualization of Generated Text — interactive tool for detecting AI-generated text by analyzing language model output distribution; three statistical tests (word probability, token rank, entropy) reveal that generated text concentrates on top-ranked (most probable) tokens while humans use wider vocabulary; human-subjects study shows visual interface improves fake-text detection from 54% to 72% accuracy; widely deployed at gltr.io.
  • Ippolito et al. (2019) — Automatic Detection of Generated Text is Easiest when Humans are Fooled — empirical study contrasting human and automatic detection of GPT-2-generated text across three decoding strategies (top-k, nucleus sampling, untruncated random). Fine-tuned BERT achieves 80%+ accuracy on long (192-token) excerpts versus 71% for trained human raters. Critical finding: detectors trained on one strategy transfer poorly to others (42.5% accuracy drop when trained on top-k and tested on nucleus), whereas humans remain robust. Reveals asymmetric difficulty: detection is easiest when humans are fooled (nucleus sampling) but hardest when humans remain reliable (top-k).
  • Clark et al. (2021) — All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text — large-scale study (1,170 Amazon Mechanical Turk evaluators) examining humans' ability to assess machine-generated text from GPT-2 and GPT-3 across three domains (stories, news, recipes). Untrained evaluators achieve 57% accuracy on GPT-2 and 50% (chance) on GPT-3; three training interventions (instructions, examples, comparisons) improve accuracy modestly, with example-based training reaching 55% overall. Reveals evaluators focus on surface-level features (grammar, spelling, style) rather than content; provides evidence that consistent human evaluation methodology is critical for benchmarking NLG quality and detection systems.
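
The token-rank test mentioned in the GLTR entry above can be sketched as follows: for each position, look up where the observed token ranks in a model's predicted next-token distribution, then bucket the ranks. The distributions here are hand-written dictionaries standing in for a real language model, and the function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable

# Rank buckets: top-10, top-100, top-1000, and everything beyond.
BUCKETS = ((10, "top-10"), (100, "top-100"), (1000, "top-1000"))


def token_rank(distribution: Dict[str, float], observed: str) -> int:
    """1-based rank of the observed token; unseen tokens rank after all known ones."""
    ranked = sorted(distribution, key=distribution.get, reverse=True)
    if observed not in distribution:
        return len(ranked) + 1
    return ranked.index(observed) + 1


def rank_histogram(ranks: Iterable[int]) -> Counter:
    """Count how many tokens fall into each rank bucket."""
    counts: Counter = Counter()
    for rank in ranks:
        label = next((name for limit, name in BUCKETS if rank <= limit), "beyond-1000")
        counts[label] += 1
    return counts


# Hypothetical next-token distribution at one position.
toy_distribution = {"the": 0.5, "a": 0.3, "cat": 0.2}
print(token_rank(toy_distribution, "a"))
print(rank_histogram([1, 5, 50, 2000]))
```

A passage whose tokens fall almost entirely in the top bucket is the pattern the entry associates with generated text, while human writing tends to spread further into the lower-ranked buckets.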

Misinformation generation and ODQA poisoning:

  • On the Risk of Misinformation Pollution with Large Language Models — Investigates LLM-generated misinformation threat to ODQA systems; demonstrates GPT-3.5 can generate credible false passages that degrade retrieval-based QA performance (14–87% EM drop); proposes misinformation detection, vigilant prompting, and reader ensemble defenses.

Real-world impacts & polarization

  • Garimella et al. (2017) — Quantifying Controversy on Social Media — three-stage graph-based pipeline for measuring controversy in social media discussions via conversation topology; proposes multiple network-structure metrics with random-walk-based approach (RWC) most reliably separating controversial from non-controversial topics; validates on Twitter and external datasets.
  • Garimella et al. (2016) — Reducing Controversy by Connecting Opposing Views — algorithmic approach to mitigating echo chambers and polarization via graph-based edge recommendation; uses RWC metric to identify which edges to add; efficient algorithm (ROV) focuses on high-degree hub nodes; extends to ROV-AP incorporating acceptance probability based on user polarity; empirical validation on 10 Twitter controversy datasets.
  • Cinelli et al. (2021) — The echo chamber effect on social media — comparative analysis of 100+ million posts across Twitter, Facebook, Reddit, and Gab; quantifies echo chambers via homophily in interaction networks and bias in information diffusion; shows platform architecture (feed algorithms vs. community-based curation) determines whether homophilic clustering and polarized diffusion emerge; Facebook and Twitter exhibit strong echo chambers while Reddit shows reduced segregation despite polarization.
  • Bail et al. (2018) — Exposure to opposing views on social media can increase political polarization — field experiment on Twitter showing that repeated exposure to opposing political ideology can increase polarization (backfire effect), particularly for Republicans; challenges assumption that "breaking echo chambers" reduces polarization.
  • Soares, Recuero & Zago (2018) — Influencers in Polarized Political Networks on Twitter — social network analysis of Twitter conversations during Brazil's 2016 impeachment process; identifies three influencer types (opinion leaders, informational influencers, activists) and shows that user behavior—especially activist retweeting of in-group messages—actively reinforces echo-chamber structure and polarization beyond algorithmic curation.
  • Wilson & Wiysonge (2020) — Social media and vaccine hesitancy — large-scale cross-national study (137–166 countries) linking social media activity to public health outcomes; social media organization for offline action predicts vaccine safety skepticism (cross-sectional); foreign disinformation campaigns associated with 2-percentage-point drop in vaccination coverage year-over-year; 15% increase in negative vaccine tweets per point on disinformation scale.
  • Mills et al. (2023) — Engagement, User Satisfaction, and the Amplification of Divisive Content on Social Media — pre-registered field experiment comparing Twitter's engagement-based ranking algorithm with reverse-chronological and stated-preference baselines; finds engagement-based ranking amplifies partisan (0.24 SD), emotionally charged, and out-group hostile content beyond what users report preferring; proposes and evaluates stated-preference ranking that reduces harmful amplification while maintaining engagement and satisfaction.
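
The random-walk controversy (RWC) score named in the Garimella et al. entries above can be approximated with a short Monte Carlo sketch. It assumes the graph is connected and already partitioned into two sides, treats the highest-degree nodes on each side as the walk end points, and runs the same number of walks from each side; with P_AB the estimated probability that a walk started on side A given that it ended on side B, the score is P_XX·P_YY − P_XY·P_YX. Function and parameter names are illustrative.

```python
import random
from typing import Dict, List, Set


def rwc(graph: Dict[int, List[int]], side_x: Set[int], side_y: Set[int],
        hubs_per_side: int = 1, walks: int = 500, seed: int = 0) -> float:
    """Monte Carlo estimate of random-walk controversy for a two-sided graph."""
    rng = random.Random(seed)

    def top_hubs(side: Set[int]) -> Set[int]:
        # Highest-degree nodes on this side; ties broken by node id.
        return set(sorted(sorted(side), key=lambda n: len(graph[n]), reverse=True)[:hubs_per_side])

    hubs_x, hubs_y = top_hubs(side_x), top_hubs(side_y)
    hubs = hubs_x | hubs_y
    counts = {"X": {"X": 0, "Y": 0}, "Y": {"X": 0, "Y": 0}}  # counts[start][end]

    for start_label, side in (("X", side_x), ("Y", side_y)):
        nodes = sorted(side)
        for _ in range(walks):
            node = rng.choice(nodes)
            while node not in hubs:  # walk until a hub on either side is reached
                node = rng.choice(graph[node])
            counts[start_label]["X" if node in hubs_x else "Y"] += 1

    def p(start: str, end: str) -> float:
        ended = counts["X"][end] + counts["Y"][end]
        return counts[start][end] / ended if ended else 0.0

    return p("X", "X") * p("Y", "Y") - p("X", "Y") * p("Y", "X")


# Two 4-cliques joined by a single bridge edge (3-4): a maximally separated toy graph.
toy = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
       4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6]}
print(rwc(toy, {0, 1, 2, 3}, {4, 5, 6, 7}))
```

In the toy graph every walk must pass through its own side's hub before it could cross the bridge, so no walk ends on the opposite side and the score reaches its maximum; graphs with many cross-side edges give lower scores.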

Authors

See all authors — sorted alphabetically.

Topics

See all topics — research themes and methods.

Datasets

See all datasets.

  • NELA-GT-2022 — 1.78M news articles from 361 sources (2022); source-level MBFC labels (factuality 0–5, conspiracy/pseudoscience); 346K embedded tweets; fifth NELA release with stabilized collection; SQLite and JSON formats.
  • NELA-GT-2018 — 713K news articles from 194 sources (2018); engagement-independent collection; source-level labels from 8 assessment sites (NewsGuard, Pew Research, Wikipedia, OpenSources, MBFC, AllSides, BuzzFeed, PolitiFact); multi-dimensional ground truth.
  • NELA-GT-2019 — 1.12M news articles from 260 sources (2019); source-level labels from 7 assessment sites (MBFC, AllSides, PolitiFact, etc.); 3-class aggregated reliability label; SQLite and JSON formats.
  • NELA-GT-2020 — 1.78M news articles from 519 sources (2020); source-level MBFC labels; novel embedded tweets feature (410K tweets); covers COVID-19 and 2020 U.S. election; SQLite and JSON formats.
  • MM-COVID — 3,981 fake news pieces in six languages (English, Spanish, Portuguese, Hindi, French, Italian) with 7,192 tweets; multilingual and multimodal COVID-19 dataset enabling cross-lingual detection research.
  • FakeNewsNet — PolitiFact + GossipCop; news content + Twitter social context; labels from professional fact-checkers.
  • ReCOVery — 2,029 COVID-19 news articles from 60 publishers; publisher-level NewsGuard/MBFC credibility labels; 140,820 tweets; multimodal (text, image, social).
  • CHECKED — 2,104 Weibo microblogs with per-item expert labels (344 fake, 1,760 real); Chinese COVID-19; includes images, video, and full propagation threads.
  • Weibo21 — 9,128 Weibo microblogs (4,488 fake, 4,640 real) across 9 domains (Science, Military, Education, Disasters, Politics, Health, Finance, Entertainment, Society); first multi-domain fake news dataset from a single platform; addresses domain shift for cross-domain detection.
  • Fake And Real News — 10,558 English news articles (binary fake/real); fake articles from 2016 Kaggle election dataset; real articles from AllSides/major outlets; 50% null accuracy.

Tools & libraries

See all tools.

Videos & talks

See all videos.

  • Misinformation and Data Literacy — data literacy as a defense against misinformation; chart manipulation in mainstream media; educational interventions (Calling Bullshit); emerging synthetic-media threats.
  • Misinformation in Diaspora Communities — messaging-app misinformation in immigrant communities; platform moderation gaps; language-specific fact-checking inequality.