Where Are the Facts? Searching for Fact-checked Information to Alleviate the Spread of Fake News

Authors: Nguyen Vo, Kyumin Lee
Venue: arXiv, October 2020 — arXiv:2010.03159

TL;DR

Proposes a framework to automatically find fact-checking articles that address claims in original tweets. Uses a multimodal attention network (MAN) combining textual and visual content from tweets with fact-checking article text to rank candidate fact-checks; demonstrates that multimodal retrieval substantially outperforms text-only baselines and can warn users about fact-checked misinformation.

Contributions

First study to search for fact-checking articles to increase users' awareness of fact-checked information when exposed to fake news
Novel multimodal attention network (MAN) combining textual and visual matching signals with an attention mechanism that focuses on key word interactions
Experimental validation on Snopes and PolitiFact datasets showing effectiveness and generalizability of the retrieval approach

Method

The framework addresses two challenges: (P1) how to extract useful information from original tweets (including images) to find relevant FC-articles, and (P2) how to design a retrieval system to find and rank FC-articles.

Input Representations. Original tweets are represented as pairs of text and images. Fact-checking articles are represented primarily by their text, with article images used in the visual matching layer. Both are converted to vector representations via projection layers.

Basic Retrieval. Uses BM25 over tweet text to retrieve an initial set of candidate fact-checking articles. Because tweet text alone often misses meaning carried by attached images, a variant that also uses image information (BM25-I, see Results) is evaluated as well.
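
The candidate-retrieval step can be illustrated with a small, self-contained BM25 scorer. This is only a sketch under stated assumptions: whitespace tokenization, the common defaults `k1=1.5` and `b=0.75`, and a toy corpus are all illustrative choices, and the function names are hypothetical rather than taken from the paper's code.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system would use a proper tokenizer.
    return text.lower().split()


def bm25_rank(query, docs, k1=1.5, b=0.75, top_k=50):
    """Return (doc_index, score) pairs for the top_k documents, best first."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))

    scores = []
    for idx, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((idx, score))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy fact-checking corpus (hypothetical text, for illustration only).
fc_articles = [
    "did the senator say taxes will double next year",
    "viral photo of shark on flooded highway is fake",
    "no evidence vaccine contains microchips",
]
ranked = bm25_rank("shark swimming on highway photo", fc_articles, top_k=2)
```

The top-ranked candidates would then be passed to the neural re-ranker; `top_k=50` mirrors the Hit@50 cutoff used in the evaluation.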

Multimodal Attention Network (MAN). Three-component architecture:

Textual Matching Layer. Uses GloVe embeddings combined with contextual word embeddings from ELMo; derives a GloVe interaction matrix S and a contextual word embedding interaction matrix C; learns a dissimilarity matrix G via a sigmoid function to avoid over-reliance on raw similarities (this handles cases like "hillar" vs. "hillary", which have high similarity but different meanings).
Visual Matching Layer. Uses ResNet50 to extract image representations; computes pairwise similarity matrix V between tweet and article images.
Unifying Textual and Visual Information. Concatenates scalar similarity score (computed from visual features) with textual feature vector; trains a dense layer to learn joint representations.
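
The three components above can be sketched with toy vectors. This is an illustrative sketch, not the paper's implementation: the summary does not specify how G is parameterized, so the gate below (a sigmoid over a scaled contextual similarity, with hypothetical parameters `w` and `b`) is a placeholder, and the max-pooling choices and 2-d vectors standing in for GloVe, ELMo, and ResNet50 outputs are assumptions.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def interaction_matrix(rows, cols):
    # Entry (i, j) is the cosine similarity between rows[i] and cols[j].
    return [[cosine(r, c) for c in cols] for r in rows]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_text_features(S, C, w=4.0, b=-2.0):
    # Placeholder gate (assumption): G = sigmoid(w * C + b) down-weights static
    # matches that the contextual similarities do not support. In a trained
    # model, w and b would be learned rather than fixed.
    G = [[sigmoid(w * c + b) for c in row] for row in C]
    gated = [[s * g for s, g in zip(s_row, g_row)] for s_row, g_row in zip(S, G)]
    # Max-pool over article words: one feature per tweet word (assumed pooling).
    return [max(row) for row in gated]


def visual_score(V):
    # Collapse the image-image similarity matrix to one scalar (max pooling assumed).
    return max(max(row) for row in V)


# Toy 2-d vectors standing in for real embeddings.
tweet_glove, article_glove = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 1.0]]
tweet_ctx, article_ctx = [[1.0, 0.2], [0.1, 1.0]], [[0.9, 0.1], [-1.0, 0.5]]
tweet_imgs, article_imgs = [[0.5, 0.5]], [[1.0, 1.0], [1.0, 0.0]]

S = interaction_matrix(tweet_glove, article_glove)  # static-embedding similarities
C = interaction_matrix(tweet_ctx, article_ctx)      # contextual similarities
V = interaction_matrix(tweet_imgs, article_imgs)    # image-image similarities

# Textual feature vector concatenated with the scalar visual score;
# this joint vector would be the input to the dense layer.
joint_input = gated_text_features(S, C) + [visual_score(V)]
```

Keeping the visual signal as a single appended scalar matches the description of the unifying step, while leaving the textual features as a vector lets the dense layer weigh individual word matches.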

Training. Triplet loss minimization using original tweets paired with relevant and non-relevant FC-articles.
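
A triplet objective of this kind is often written as a margin-based hinge over the scores of a relevant and a non-relevant article for the same tweet. The sketch below assumes that form and a margin of 1.0; the summary only states that a triplet loss is minimized, so the exact formulation and the function names here are assumptions.

```python
def triplet_hinge_loss(pos_score, neg_score, margin=1.0):
    # Zero loss once the relevant article outscores the non-relevant one by `margin`.
    return max(0.0, margin - pos_score + neg_score)


def batch_loss(triplets, margin=1.0):
    # triplets: list of (score(tweet, relevant), score(tweet, non_relevant)) pairs.
    losses = [triplet_hinge_loss(p, n, margin) for p, n in triplets]
    return sum(losses) / len(losses)


# Three toy triplets: a small violation, a large violation, and a satisfied margin.
example = batch_loss([(0.9, 0.2), (0.4, 0.6), (2.0, 0.5)])
```

Only triplets that violate the margin contribute to the gradient, so training concentrates on tweets whose relevant article is not yet clearly ranked above the non-relevant one.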

Results

Experimental Setup. Split Snopes and PolitiFact datasets into train/validation/test with 80/10/10 ratio. Evaluated using NDCG@K and Hit@K metrics.
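
For reference, the two metrics can be computed per query as sketched below, assuming binary relevance (an article either fact-checks the tweet or it does not). The article ids are hypothetical, and reported figures would be averages of these per-query values over the test set.

```python
import math


def hit_at_k(ranked_ids, relevant_ids, k):
    # 1.0 if any relevant article appears in the top k, else 0.0.
    return 1.0 if any(doc in relevant_ids for doc in ranked_ids[:k]) else 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k):
    # Binary relevance: gain 1 for a relevant article, discounted by log2(position + 1).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc in enumerate(ranked_ids[:k])
        if doc in relevant_ids
    )
    # Ideal ranking places all relevant articles first.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


ranking = ["fc_17", "fc_03", "fc_42"]  # hypothetical article ids, best first
relevant = {"fc_03"}

hit1 = hit_at_k(ranking, relevant, 1)
hit3 = hit_at_k(ranking, relevant, 3)
ndcg3 = ndcg_at_k(ranking, relevant, 3)
```

Hit@K only checks whether a relevant article is retrieved at all, whereas NDCG@K also rewards placing it near the top, which is why both are reported.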

Performance of Basic Retrieval. BM25 on tweets' text achieves ~50% Hit@50 (Snopes) and ~70% (PolitiFact). Adding images (BM25-I) improves Hit@50 to ~80% (Snopes) and ~94% (PolitiFact).

Multimodal Attention Network. MAN consistently outperforms all baselines:

On Snopes: 4.7% improvement on NDCG@1 and 17.2% improvement on Hit@50
On PolitiFact: 3.9% improvement on NDCG@1, with a maximum improvement of 39.6% on Hit@50
On left-over queries: MAN-A (trained with augmented data for sparse textual signals) achieves the best results, with 8% and 11% improvement on Snopes and PolitiFact, respectively

Ablation Studies. Contextual word embeddings from ELMo are more effective than GloVe alone for capturing semantic nuances; combining GloVe and ELMo (CTM) achieves strong improvements over baselines.

Connections

Related to Hierarchical Multi-head Attentive Network for Evidence-aware Fake News Detection via multimodal and evidence-aware fact-checking approaches from the same group
Complementary to DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning, which uses evidence for debunking via deep learning
Extends MVAE: Multimodal Variational Autoencoder for Fake News Detection and EANN: Event Adversarial Neural Networks for Multi-Modal Fake News Detection by focusing specifically on fact-checking article retrieval rather than veracity classification
Applied in A Picture Paints a Thousand Lies? The Effects and Mechanisms of Multimodal Disinformation and Rebuttals Disseminated via Social Media and related multimodal fact-checking scenarios

Notes

Strengths: Novel problem formulation (finding FC-articles rather than just verifying claims); genuine multimodal design that captures word-level interactions via contextual embeddings; strong empirical results showing multimodal substantially beats text-only; publicly released code and datasets enable reproducibility.

Limitations: Limited to English-language datasets (Snopes, PolitiFact); evaluation restricted to political and consumer fact-checking domains; assumes fact-checking articles already exist for claims (reactive rather than proactive); doesn't address scalability for real-time deployment at social media scale.

Open questions: How does the approach generalize to new fact-checking sources and emerging claims? Can the framework be extended to cross-lingual retrieval? How sensitive is the model to the quality and comprehensiveness of the FC-article repository?