Multimodal Emergent Fake News Detection via Meta Neural Process Networks

Authors: Yaqing Wang, Fenglong Ma, Haoyu Wang, Kishlay Jha, Jing Gao

Venue: ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), August 2021

TL;DR

Existing fake news detectors trained on labeled data perform poorly on novel, emergent events because of domain shift. The paper proposes MetaFEND, which combines meta-learning and neural processes so that a detection model can adapt quickly to a new event from a handful of its labeled posts. Key innovations include hard attention (to focus on the most informative posts despite imbalanced class distributions) and label embedding (to treat categorical labels as semantic vectors rather than scalar values).

Contributions

  • Formulates emergent fake news detection as a few-shot learning problem, recognizing that large-scale labeled datasets are unavailable for breaking news events
  • Proposes MetaFEND, a meta neural process framework unifying meta-learning's parameter adaptation with neural processes' ability to condition on context data, linked via a novel leave-one-out simulation task
  • Introduces a hard attention mechanism based on the Straight-Through Gumbel-Softmax estimator that selects the single most informative post from a class-imbalanced support set, in place of standard soft attention
  • Proposes label embedding to capture the categorical structure of the fake/real labels, improving generalization across heterogeneous events
  • Reports accuracy gains of roughly 4-5% over MAML, and outperforms ANP and the other baselines, on the Twitter and Weibo datasets in 5-shot and 10-shot settings

Method

Problem setting: Given a new event with K labeled posts (the support set), detect fake news among its unlabeled posts (the query set). Unlike in standard classification, the data are heterogeneous across events: news style, vocabulary, and class distributions vary significantly from one event to another.
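To make the K-shot setting concrete, here is a minimal sketch of how one event's labeled posts could be split into a support set and a query set. It assumes posts are `(post_id, label)` pairs with 1 = fake and 0 = real; the function name `make_episode` and the uniform random draw are illustrative choices, not the paper's sampling protocol.

```python
import random
from typing import List, Sequence, Tuple

Post = Tuple[str, int]  # (post_id, label); 1 = fake, 0 = real (assumed encoding)


def make_episode(event_posts: Sequence[Post], k: int, seed: int = 0
                 ) -> Tuple[List[Post], List[Post]]:
    """Split one event's posts into a K-shot support set and a query set.

    The support set is drawn uniformly at random (hypothetical choice), so its
    class balance follows the event's own imbalance; nothing forces K/2 fakes.
    """
    if not 0 < k < len(event_posts):
        raise ValueError("k must be between 1 and len(event_posts) - 1")
    rng = random.Random(seed)
    shuffled = list(event_posts)
    rng.shuffle(shuffled)
    return shuffled[:k], shuffled[k:]


# Usage: a toy event with 12 posts, split into a 5-shot episode.
event = [(f"post{i}", int(i % 3 == 0)) for i in range(12)]
support, query = make_episode(event, k=5)
print(len(support), len(query))  # 5 7
```

Because the draw ignores labels, the support set inherits whatever class imbalance the event has, which is the situation hard attention is meant to handle.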

Meta neural process design: The framework has two stages executed with the same architecture:

  1. Event adaptation: Simulate a learning task by treating each support post i as the target and the remaining posts as its context. Minimizing the loss on these leave-one-out predictions adapts the parameters θ → θ_e via gradient descent. This bridges meta-learning (parameter update) and neural processes (explicit conditioning on context).

  2. Detection: Use the adapted parameters θ_e to make predictions on the query set, conditioned on the full support set.
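The event-adaptation stage can be sketched as follows, assuming a generic function that returns the gradient of the detector's loss. `leave_one_out_tasks`, `adapt`, and `toy_logistic_grad` are hypothetical names. The toy gradient is a plain logistic regression that ignores the context, standing in for the real context-conditioned network. The outer meta-training loop that learns θ across training events is omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple

Post = Tuple[List[float], int]  # (feature vector, label); 1 = fake, 0 = real (assumed)
GradFn = Callable[[List[float], Post, List[Post]], List[float]]


def leave_one_out_tasks(support: Sequence[Post]) -> List[Tuple[Post, List[Post]]]:
    """Turn a K-shot support set into K simulated tasks: one target, K-1 context posts."""
    return [(support[i], [p for j, p in enumerate(support) if j != i])
            for i in range(len(support))]


def adapt(theta: List[float], support: Sequence[Post], grad_fn: GradFn,
          lr: float = 0.1, steps: int = 5) -> List[float]:
    """Event adaptation: gradient steps theta -> theta_e on the mean leave-one-out loss."""
    theta_e = list(theta)
    tasks = leave_one_out_tasks(support)
    for _ in range(steps):
        grad = [0.0] * len(theta_e)
        for target, context in tasks:
            g = grad_fn(theta_e, target, context)
            grad = [acc + gi / len(tasks) for acc, gi in zip(grad, g)]
        theta_e = [t - lr * gi for t, gi in zip(theta_e, grad)]
    return theta_e


def toy_logistic_grad(theta: List[float], target: Post, context: List[Post]) -> List[float]:
    """Hypothetical stand-in detector: logistic regression on the target only.

    The real model conditions on `context` through attention; this toy ignores it
    to keep the example small. Returns d(BCE)/d(theta) = (p - y) * x.
    """
    x, y = target
    p = 1.0 / (1.0 + math.exp(-sum(t * xi for t, xi in zip(theta, x))))
    return [(p - y) * xi for xi in x]


support = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.0, 1.0], 0), ([0.1, 0.9], 0)]
theta_e = adapt([0.0, 0.0], support, toy_logistic_grad)
```

On this symmetric toy support set, the adapted weights move toward separating the classes: positive weight on the first (fake-like) feature and negative weight on the second. The detection stage would then reuse θ_e on the query set.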

Architecture:

  • Feature extractor: CNN for text (with FastText embeddings), frozen VGG19 for images, concatenated as d-dimensional features
  • Label embedding: Fixed embeddings vec(fake) and vec(real) capture semantic meaning of labels, treating them as categorical rather than scalar
  • Aggregator: Scaled dot-product attention over the context (support) features, in either a soft or a hard variant
  • Soft attention: Standard softmax weights over all support posts; under class imbalance, the many majority-class posts dominate the weighted average
  • Hard attention (novel): Samples a single post k with Straight-Through Gumbel-Softmax, giving a discrete one-hot selection of the most similar support post while still passing gradients through a softmax relaxation
  • Detector: A fully connected layer that outputs a d-dimensional vector; each label is scored by the cosine similarity between this vector and its label embedding
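The difference between the two aggregator variants can be illustrated in plain Python. This is a sketch under stated assumptions, not the paper's implementation. The keys stand in for support-post features, and the one-dimensional values are toy stand-ins for the context representations. The temperature `tau` is a free parameter, and passing `rng=None` (no Gumbel noise, i.e. a plain argmax) is an assumed evaluation-time choice. The straight-through estimator uses the one-hot `y_hard` in the forward pass and the softmax Jacobian of `y_soft` in the backward pass.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scores(query: Vec, keys: Sequence[Vec]) -> Vec:
    """Scaled dot-product attention logits q.k / sqrt(d)."""
    return [dot(query, k) / math.sqrt(len(query)) for k in keys]


def soft_attention(query: Vec, keys: Sequence[Vec], values: Sequence[Vec]) -> Tuple[Vec, Vec]:
    """Weighted average of all values: every support post contributes."""
    w = softmax(scores(query, keys))
    agg = [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]
    return agg, w


def hard_attention(query: Vec, keys: Sequence[Vec], values: Sequence[Vec], tau: float = 1.0,
                   rng: Optional[random.Random] = None) -> Tuple[Vec, Vec, Vec]:
    """Straight-Through Gumbel-Softmax selection of a single support post.

    Forward pass returns one post's value (one-hot y_hard). With rng=None no noise
    is added, i.e. a plain argmax (assumed evaluation-time behaviour).
    """
    logits = scores(query, keys)
    if rng is not None:
        # Gumbel(0, 1) noise: -log(-log(U)), with U clipped away from 0 and 1.
        logits = [l - math.log(-math.log(min(max(rng.random(), 1e-12), 1 - 1e-12)))
                  for l in logits]
    y_soft = softmax([l / tau for l in logits])
    k = max(range(len(y_soft)), key=y_soft.__getitem__)
    y_hard = [1.0 if i == k else 0.0 for i in range(len(y_soft))]
    return list(values[k]), y_hard, y_soft


def straight_through_grad(y_soft: Vec, upstream: Vec, tau: float = 1.0) -> Vec:
    """Backward pass: reuse the softmax Jacobian as if y_hard were y_soft.

    For y = softmax(z / tau): dL/dz_i = y_i * (u_i - sum_j u_j * y_j) / tau.
    """
    s = dot(upstream, y_soft)
    return [yi * (ui - s) / tau for yi, ui in zip(y_soft, upstream)]


# Imbalanced toy support set: one fake post (most similar to the query), four real posts.
query = [1.0, 0.0]
keys = [[1.0, 0.0]] + [[0.6, 0.8]] * 4
values = [[1.0]] + [[0.0]] * 4  # toy label signal: 1 = fake, 0 = real
soft_out, w = soft_attention(query, keys, values)
hard_out, y_hard, _ = hard_attention(query, keys, values)
print(round(soft_out[0], 2), hard_out)  # 0.25 [1.0]
```

In this toy case the fake post is the closest match to the query yet receives only about a quarter of the soft-attention weight, so the soft aggregate is pulled toward the four real posts; hard attention returns the fake post's value unchanged.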

Results

Datasets: MediaEval Twitter dataset (6,934 fake / 5,683 real news) and Weibo dataset (4,050 fake / 3,558 real news), split so that no event appears in both the training and test sets.
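An event-disjoint split like this can be sketched by assigning whole events, rather than individual posts, to one side or the other. The record layout `(post_id, event_id, label)`, the `test_frac` parameter, and the random event assignment are assumptions for illustration; the exact event assignment used for the two datasets is not reproduced here.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Post = Tuple[str, str, int]  # (post_id, event_id, label): assumed record layout


def event_disjoint_split(posts: Sequence[Post], test_frac: float = 0.2, seed: int = 0
                         ) -> Tuple[List[Post], List[Post]]:
    """Assign whole events to train or test so that no event appears in both."""
    by_event: Dict[str, List[Post]] = defaultdict(list)
    for post in posts:
        by_event[post[1]].append(post)
    events = sorted(by_event)  # deterministic order before shuffling
    random.Random(seed).shuffle(events)
    n_test = max(1, round(len(events) * test_frac))
    test_events = set(events[:n_test])
    train = [p for e in events if e not in test_events for p in by_event[e]]
    test = [p for e in events if e in test_events for p in by_event[e]]
    return train, test


# Usage: 20 toy posts spread over 5 events (4 posts each).
posts = [(f"p{i}", f"event{i % 5}", i % 2) for i in range(20)]
train, test = event_disjoint_split(posts)
print(len(train), len(test))  # 16 4
```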

Baselines: Multi-modal models (VQA, att-RNN, EANN fine-tuned on support set) and few-shot methods (CNP, ANP, MAML, Meta-SGD).

Performance:

| Setting | Twitter Accuracy | Twitter F1 | Weibo Accuracy | Weibo F1 |
|---|---|---|---|---|
| 5-shot | 86.45% (+5.12% vs MAML) | 86.21% | 81.28% (+4.41% vs MAML) | 80.19% |
| 10-shot | 88.79% (+4.19% vs MAML) | 88.66% | 82.92% (+5.22% vs MAML) | 82.37% |

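MetaFEND outperforms all baselines in both settings. Its margin over MAML widens from 5-shot to 10-shot on Weibo (+4.41% to +5.22%), consistent with soft attention's limitation becoming more pronounced as the support set grows, but narrows on Twitter (+5.12% to +4.19%).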

Ablation study:

  • Hard attention vs. soft attention: Hard attention improves accuracy over soft attention, and its advantage grows as the support set gets larger
  • Label embedding vs. label values: Label embedding consistently improves accuracy in both 5- and 10-shot settings
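The label-embedding readout described for the detector (cosine similarity between the detector output and vec(fake) / vec(real)) can be sketched as below. The two embedding vectors and the detector output are hypothetical toy values, and predicting the most similar label is an assumed decision rule. A label-value variant would instead represent the two classes as scalars (e.g. 0 and 1), i.e. as points on a single axis rather than as separate directions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def classify(h: Vec, label_emb: Dict[str, Vec]) -> Tuple[str, Dict[str, float]]:
    """Score every label by cos(h, vec(label)) and return the best label plus all scores."""
    sims = {label: cosine(h, e) for label, e in label_emb.items()}
    return max(sims, key=sims.get), sims


# Hypothetical fixed label embeddings and a hypothetical detector output h.
label_emb = {"fake": [1.0, 0.0, 0.0], "real": [0.0, 1.0, 0.0]}
label, sims = classify([0.8, 0.3, 0.1], label_emb)
print(label)  # fake
```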

Notes

Strengths:

  • Well-motivated problem: emergent events are practically important and existing approaches fail due to domain shift
  • Novel integration of meta-learning and neural processes via leave-one-out simulation is elegant and principled
  • Hard attention mechanism is a clear improvement over soft attention, especially with imbalanced data; case study (Fig. 5) effectively illustrates the problem
  • Label embedding treating labels as categorical is a simple but effective insight
  • Strong empirical results on two datasets with diverse characteristics (Twitter: text-heavy; Weibo: image-heavy)

Weaknesses:

  • The improvement over MAML, while consistent, is modest (4-5%). The hard attention and label embedding contributions are somewhat incremental
  • Limited to binary classification (real/fake). Real-world misinformation is often more nuanced (satire, misleading, out-of-context, etc.)
  • Evaluation limited to two platforms (Twitter, Weibo). Generalization to other social media or news domains unclear
  • No discussion of computational cost or inference time compared to baselines
  • The leave-one-out simulation task, while novel, lacks theoretical justification for why this particular design bridges the two frameworks

Follow-ups:

  • Extension to multiclass misinformation categorization
  • Investigation of transfer across platforms and languages
  • Theoretical analysis of the leave-one-out design in the context of risk minimization