Where Are the Facts? Searching for Fact-checked Information to Alleviate the Spread of Fake News

Authors: Nguyen Vo, Kyumin Lee
Venue: arXiv, October 2020 — arXiv:2010.03159

TL;DR

Proposes a framework to automatically find fact-checking articles that address claims in original tweets. Uses a multimodal attention network (MAN) combining textual and visual content from tweets with fact-checking article text to rank candidate fact-checks; demonstrates that multimodal retrieval substantially outperforms text-only baselines and can warn users about fact-checked misinformation.

Contributions

  • First study to search for fact-checking articles to increase users' awareness of fact-checked information when exposed to fake news
  • Novel multimodal attention network (MAN) combining textual and visual matching signals with an attention mechanism that focuses on key word interactions
  • Experimental validation on Snopes and PolitiFact datasets showing effectiveness and generalizability of the retrieval approach

Method

The framework addresses two challenges: (P1) how to extract useful information from original tweets (including images) to find relevant FC-articles, and (P2) how to design a retrieval system to find and rank FC-articles.

Input Representations. Original tweets are represented as pairs of text and images; fact-checking articles contain only text. Both are converted to vector representations via projection layers.

Basic Retrieval. Uses BM25 to retrieve an initial candidate set of fact-checking articles from the tweet text. Because tweet text alone does not capture the information carried by images, a variant that also uses the images (BM25-I, see Results) is evaluated as well.
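
To make the candidate-retrieval step concrete, here is a minimal sketch of Okapi BM25 scoring over tokenized FC-articles. It is an illustration, not the authors' implementation: the function names `bm25_scores` and `top_k`, the parameter values `k1=1.5` and `b=0.75`, the whitespace tokenization, and the particular IDF variant are all assumptions.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against the query with Okapi BM25.

    k1 and b are common default values, assumed here for illustration.
    """
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Non-negative IDF variant (an assumption; other variants exist).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def top_k(scores, k):
    """Return the indices of the k highest-scoring documents."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Toy example: one tweet and three hypothetical FC-articles.
tweet = "obama signs new bill".split()
articles = [
    "did obama sign the new bill".split(),
    "moon landing photo is real".split(),
    "new vaccine claim checked".split(),
]
candidates = top_k(bm25_scores(tweet, articles), k=2)
```

In the framework described above, the indices in `candidates` would be the FC-articles passed on to the neural ranker.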

Multimodal Attention Network (MAN). Three-component architecture:

  1. Textual Matching Layer. Combines GloVe embeddings with contextual word embeddings from ELMo; derives a GloVe interaction matrix S and a contextual word embedding interaction matrix C; learns a dissimilarity matrix G via a sigmoid function to avoid over-reliance on raw similarities (handles cases like "hillar" vs. "hillary", which have high similarity but different meanings).

  2. Visual Matching Layer. Uses ResNet50 to extract image representations; computes pairwise similarity matrix V between tweet and article images.

  3. Unifying Textual and Visual Information. Concatenates a scalar similarity score (computed from visual features) with the textual feature vector; trains a dense layer to learn a joint representation.
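
The word-level interaction matrices and the concatenation step listed above can be illustrated with a small sketch. It only shows the shape of the computation under stated assumptions: cosine similarity as the interaction function, max-pooling over article words as a stand-in for the learned attention, and the maximum of V as the scalar visual score are all choices made here for illustration, and the helper names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def interaction_matrix(query_vecs, doc_vecs):
    """Entry [i][j] is the similarity of tweet word i and article word j."""
    return [[cosine(q, d) for d in doc_vecs] for q in query_vecs]

def unify(text_features, visual_score):
    """Append the scalar visual similarity to the textual feature vector."""
    return list(text_features) + [visual_score]

# Toy 2-d "embeddings" for two tweet words and two article words.
tweet_words = [[1.0, 0.0], [0.0, 1.0]]
article_words = [[1.0, 0.0], [1.0, 1.0]]
S = interaction_matrix(tweet_words, article_words)

# Assumed pooling: best-matching article word for each tweet word.
text_features = [max(row) for row in S]

# Assumed visual score: the largest entry of a toy image-similarity matrix V.
V = [[0.2, 0.9]]
visual_score = max(max(row) for row in V)

joint = unify(text_features, visual_score)  # input to the dense layer
```

The resulting `joint` vector has one entry per tweet word plus one visual entry; in the model described above, a trained dense layer would map such a vector to a relevance score.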

Training. Triplet loss minimization using original tweets paired with relevant and non-relevant FC-articles.
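
As a rough illustration of the training objective, the sketch below uses a margin-based hinge form of the triplet loss over relevance scores. The hinge form, the margin value of 1.0, and the function names are assumptions; the summary above does not specify the exact formulation.

```python
def triplet_loss(pos_score, neg_score, margin=1.0):
    """Hinge-style triplet loss for one (tweet, relevant, non-relevant) triplet.

    pos_score: model score for the relevant FC-article.
    neg_score: model score for the non-relevant FC-article.
    The loss is zero once the relevant article wins by at least `margin`.
    """
    return max(0.0, margin - pos_score + neg_score)

def batch_triplet_loss(score_pairs, margin=1.0):
    """Mean triplet loss over a batch of (pos_score, neg_score) pairs."""
    return sum(triplet_loss(p, n, margin) for p, n in score_pairs) / len(score_pairs)

# Toy batch: the first triplet violates the margin, the second does not.
loss = batch_triplet_loss([(0.9, 0.2), (2.0, 0.5)])
```

Minimizing this quantity pushes the score of the relevant FC-article above that of the non-relevant one for each tweet.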

Results

Experimental Setup. Split Snopes and PolitiFact datasets into train/validation/test with 80/10/10 ratio. Evaluated using NDCG@K and Hit@K metrics.
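
For readers unfamiliar with the two metrics, here is a minimal sketch of Hit@K and NDCG@K for a single query. It assumes binary relevance (an article either fact-checks the tweet or it does not) and the standard log2 discount; the function names are illustrative.

```python
import math

def hit_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any relevant article appears in the top k, else 0.0."""
    return 1.0 if any(d in relevant_ids for d in ranked_ids[:k]) else 0.0

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """NDCG@K with binary gains and a log2 position discount."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, d in enumerate(ranked_ids[:k])
        if d in relevant_ids
    )
    # Ideal ranking places all relevant articles first.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

# Toy ranking: the single relevant article "b" is ranked second.
ranked = ["a", "b", "c"]
relevant = {"b"}
hit1 = hit_at_k(ranked, relevant, 1)
hit2 = hit_at_k(ranked, relevant, 2)
ndcg3 = ndcg_at_k(ranked, relevant, 3)
```

Reported numbers would be these per-query values averaged over all test tweets.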

Performance of Basic Retrieval. BM25 on tweets' text achieves ~50% Hit@50 (Snopes) and ~70% (PolitiFact). Adding images (BM25-I) improves Hit@50 to ~80% (Snopes) and ~94% (PolitiFact).

Multimodal Attention Network. MAN consistently outperforms all baselines:

  • On Snopes: 4.7% improvement on NDCG@1, 17.2% improvement on Hit@50
  • On PolitiFact: 3.9% improvement on NDCG@1, maximum improvement of 39.6% on Hit@50
  • On left-over queries: MAN-A (with augmented training data for sparse textual signals) achieves best results with 8% and 11% improvement on Snopes/PolitiFact

Ablation Studies. Contextual word embeddings from ELMo are more effective than GloVe alone for capturing semantic nuances; combining GloVe and ELMo (CTM) achieves strong improvements over baselines.

Notes

Strengths: Novel problem formulation (finding FC-articles rather than just verifying claims); genuine multimodal design that captures word-level interactions via contextual embeddings; strong empirical results showing that multimodal retrieval substantially outperforms text-only retrieval; publicly released code and datasets enable reproducibility.

Limitations: Limited to English-language datasets (Snopes, PolitiFact); evaluation restricted to political and consumer fact-checking domains; assumes fact-checking articles already exist for claims (reactive rather than proactive); doesn't address scalability for real-time deployment at social media scale.

Open questions: How does the approach generalize to new fact-checking sources and emerging claims? Can the framework be extended to cross-lingual retrieval? How sensitive is the model to the quality and comprehensiveness of the FC-article repository?