KAN: Knowledge-aware Attention Network for Fake News Detection

Authors: Yaqian Dun, Kefei Tu, Chen Chen, Chunyan Hou, Xiaojie Yuan

Affiliations: College of Computer Science, Nankai University; Tianjin Key Laboratory of Network and Data Security Technology; School of Computer Science and Engineering, Tianjin University of Technology

Venue: The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), 2021

TL;DR

Most fake news detection models rely on textual features and social context but ignore knowledge-level relationships among the entities mentioned in news. This paper proposes KAN, which extracts entity mentions from news, aligns them with a knowledge graph (Wikidata), and uses two attention mechanisms (N-E and N-E2C) to measure the importance of entities and entity contexts. KAN improves F1 over the KCNN baseline by about 7.4 points on PolitiFact, 2.8 on GossipCop, and 9.7 on PHEME.

Contributions

  1. Incorporates entities and their entity contexts (neighbors in knowledge graphs) as external knowledge for fake news detection, addressing the limitation that existing methods ignore knowledge-level relationships among entities.
  2. Proposes the Knowledge-aware Attention Network (KAN) with two attention mechanisms:
     - N-E attention: Measures semantic similarity between news content and entities to assign importance weights to entities.
     - N-E2C attention: Assigns importance weights to entity contexts based on the importance of their corresponding entities.
  3. Demonstrates through ablation studies that both knowledge components and attention mechanisms are critical to detection performance.

Method

Knowledge Extraction: Uses entity linking (TagMe tool) to identify entity mentions in news and align them with Wikidata. For each linked entity, extracts its immediate neighbors (one-hop distance) as entity context.
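The one-hop context lookup described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the triples and entity IDs (`Q_A`, `Q_B`, ...) are made-up placeholders, the entity-linking step is assumed to have already produced a list of linked IDs, and treating neighbors as undirected (an entity counts as a neighbor whether it is the head or the tail of a triple) is an assumption.

```python
from collections import defaultdict

# Toy knowledge graph as (head, relation, tail) triples. These are
# hypothetical placeholders, not data from the paper or from Wikidata.
TRIPLES = [
    ("Q_A", "member_of", "Q_B"),
    ("Q_A", "born_in", "Q_C"),
    ("Q_D", "located_in", "Q_C"),
]


def build_adjacency(triples):
    """Map each entity to the set of entities one hop away (either direction)."""
    adj = defaultdict(set)
    for head, _relation, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def entity_contexts(linked_entities, adj):
    """Return {entity: sorted one-hop neighbors}; unknown entities get []."""
    return {e: sorted(adj.get(e, ())) for e in linked_entities}


adj = build_adjacency(TRIPLES)
# "Q_X" stands in for a linked entity that is missing from the graph.
contexts = entity_contexts(["Q_A", "Q_C", "Q_X"], adj)
print(contexts)
```

An entity that is absent from the graph ends up with an empty context list, which is the sparse-coverage case raised under Limitations.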

Architecture:
- Text Encoder: Transformer encoder with positional encoding to generate news representation p.
- Knowledge Encoder: Separate transformer encoders for entity embeddings (from word2vec) and entity context embeddings to produce intermediate encodings q' and r'.
- Attention Mechanisms:
  - N-E attention computes attention weights α between news and entities to produce weighted entity representation q.
  - N-E2C attention uses news and entity representations to weight entity contexts and produce representation r.
- Classifier: Concatenates p, q, and r, feeds into a fully-connected layer with softmax and L2 regularization.
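The two attention steps and the classifier input can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the paper's implementation: it assumes single-head scaled dot-product attention with no learned projections, with the news vector p as the query and the entity encodings q' as keys in both mechanisms, and the toy vectors are invented. Under these simplifications the N-E and N-E2C weights coincide; with separate learned projections or multiple heads they would generally differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_weights(query, keys):
    """Scaled dot-product weights of one query vector over a list of keys."""
    scale = math.sqrt(len(query))
    return softmax([dot(query, k) / scale for k in keys])


def weighted_sum(weights, values):
    """Combine value vectors using the given attention weights."""
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# Toy inputs (hypothetical values chosen only for illustration).
p = [1.0, 0.0]                     # news representation p
q_enc = [[1.0, 0.0], [0.0, 1.0]]   # entity encodings q'
r_enc = [[0.5, 0.5], [0.0, 2.0]]   # entity context encodings r'

# N-E attention: query = news, keys = values = entity encodings.
alpha = attention_weights(p, q_enc)
q = weighted_sum(alpha, q_enc)

# N-E2C attention (assumed form): weights still come from news-entity
# similarity, but they are applied to the entity context encodings.
beta = attention_weights(p, q_enc)
r = weighted_sum(beta, r_enc)

# Classifier input: concatenation of p, q, and r.
z = p + q + r
print(alpha, z)
```

In this toy setup the first entity is the one aligned with p, so it receives the larger weight, and its context dominates r in the same proportion.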

Results

Experiments on three benchmark datasets:

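| Dataset    | Metric   | KAN    | KCNN   | B-TransE | GRU-2  |
|------------|----------|--------|--------|----------|--------|
| PolitiFact | F1       | 0.8539 | 0.7804 | 0.7641   | 0.7041 |
| PolitiFact | Accuracy | 0.8586 | 0.7827 | 0.7694   | 0.7109 |
| GossipCop  | F1       | 0.7713 | 0.7433 | 0.7340   | 0.7079 |
| GossipCop  | Accuracy | 0.7766 | 0.7491 | 0.7394   | 0.7180 |
| PHEME      | F1       | 0.7461 | 0.6489 | 0.6074   | 0.6917 |
| PHEME      | Accuracy | 0.7830 | 0.7265 | 0.7200   | 0.7371 |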
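The gaps implied by the table can be checked with a short script. The F1 values are copied from the table above; the helper name `gain_over_best_baseline` is introduced here only for illustration, and the comparison covers only the three baselines listed in this summary.

```python
# F1 scores copied from the results table in this summary.
F1 = {
    "PolitiFact": {"KAN": 0.8539, "KCNN": 0.7804, "B-TransE": 0.7641, "GRU-2": 0.7041},
    "GossipCop": {"KAN": 0.7713, "KCNN": 0.7433, "B-TransE": 0.7340, "GRU-2": 0.7079},
    "PHEME": {"KAN": 0.7461, "KCNN": 0.6489, "B-TransE": 0.6074, "GRU-2": 0.6917},
}


def gain_over_best_baseline(scores, model="KAN"):
    """Return (best baseline name, gain of `model` over it in F1 points)."""
    baselines = {name: f1 for name, f1 in scores.items() if name != model}
    best = max(baselines, key=baselines.get)
    return best, round((scores[model] - baselines[best]) * 100, 2)


gains = {dataset: gain_over_best_baseline(s) for dataset, s in F1.items()}

# Gap to KCNN specifically, for comparison with the TL;DR figures.
gain_vs_kcnn = {d: round((s["KAN"] - s["KCNN"]) * 100, 2) for d, s in F1.items()}

for dataset in F1:
    print(dataset, gains[dataset], gain_vs_kcnn[dataset])
```

On PolitiFact and GossipCop the strongest listed baseline is KCNN, so both calculations agree. On PHEME the strongest listed baseline is GRU-2, so the gap to the best baseline is smaller than the gap to KCNN; the 9.7 figure in the TL;DR corresponds to the KCNN comparison.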

Ablations reveal:
- Removing entity contexts (KAN\EC) degrades performance, confirming their value.
- Removing entities entirely (KAN\E) shows entities are crucial for disambiguation.
- Removing all external knowledge (KAN\EC\E) reduces F1 by 2.2% on PolitiFact, 1.2% on GossipCop, and 1.3% on PHEME.
- N-E and N-E2C attention mechanisms improve performance by 2.2% accuracy on PolitiFact and 6.2% on GossipCop when used together.

Notes

Strengths:
- Novel and well-motivated use of entity contexts from knowledge graphs: entities rarely appear in isolation, and their neighbors provide disambiguating context.
- Thorough ablation studies demonstrate each component contributes meaningfully.
- Strong empirical results across three diverse datasets (politics, entertainment, Twitter events).
- Clear architectural design with interpretable attention weights.

Limitations:
- Entity linking quality depends on the TagMe tool; errors propagate downstream. The paper does not report linking accuracy or analyze failure modes.
- Knowledge graph coverage bias is not discussed: rare entities may have sparse contexts or be absent from Wikidata.
- Comparison to other knowledge-aware methods (e.g., B-TransE) is included, but there is limited discussion of why KAN's attention design outperforms them.
- The entity context representation uses simple averaging of neighbor embeddings; richer encoding of multi-hop paths or relation types might improve performance.
- Evaluation is limited to supervised settings; generalization to out-of-domain news without retraining is unclear.