A Comprehensive Survey on Trustworthy Graph Neural Networks: Privacy, Robustness, Fairness, and Explainability

Authors: Enyan Dai, Tianxiang Zhao, Huaisheng Zhu, Junjie Xu, Zhimeng Guo, Hui Liu, Jiliang Tang, Suhang Wang

Venue: ACM Computing Surveys, 2023 — arXiv:2204.08570

TL;DR

A survey of trustworthy Graph Neural Networks (GNNs) along four dimensions: privacy (attacks such as membership inference, property inference, and model extraction, together with the defenses against them); robustness against adversarial perturbations; fairness mechanisms that prevent discrimination in graph-based predictions; and explainability methods for interpreting GNN decisions. It addresses the gap between earlier surveys, which focus on GNN architectures, and the requirements of practical deployment in high-stakes domains such as finance and healthcare.

Contributions

  • Comprehensive taxonomy of privacy attacks on GNNs including membership inference, property inference, reconstruction attacks, and model extraction
  • Categorization of privacy-preserving techniques: differential privacy, federated learning, machine unlearning, adversarial privacy-preserving approaches, and model ownership verification
  • Survey of robustness methods and defenses against adversarial attacks on graph-structured data
  • Review of fairness approaches ensuring non-discriminatory GNN predictions across protected attributes
  • Synthesis of explainability methods for interpreting GNN decision-making processes
  • Framework connecting four trustworthiness aspects to ethical principles: respect for human autonomy, prevention of harm, fairness, and explainability

Method

GNNs employ message passing: each node's representation is iteratively updated by aggregating information from its neighbors. The paper formalizes GNN computation as layer-wise transformations in which each node's representation captures both its own features and its local graph structure. Node classification is typically semi-supervised and comes in two settings: transductive, where the unlabeled test nodes are already part of the graph seen during training, and inductive, where test nodes or graphs are unseen during training. Link prediction scores candidate node pairs, typically by computing a similarity between their representations. Graph classification applies a readout function that aggregates node embeddings into a graph-level representation. Community detection identifies densely connected groups of nodes.
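
A single message-passing step can be sketched in a few lines of NumPy. This is a minimal illustration, not the survey's formulation: it assumes mean aggregation over each node's closed neighborhood (the node plus its neighbors) followed by a linear map and ReLU, and the function name `mean_aggregate_layer`, the toy graph, and the identity weights are all hypothetical choices made for readability.

```python
import numpy as np

def mean_aggregate_layer(adj, features, weight):
    """One message-passing step: average over the closed neighborhood, then transform."""
    n = adj.shape[0]
    adj_self = adj + np.eye(n)                   # self-loops keep each node's own features
    degree = adj_self.sum(axis=1, keepdims=True) # neighborhood size, self-loop included
    aggregated = (adj_self @ features) / degree  # mean of own and neighbor features
    return np.maximum(aggregated @ weight, 0.0)  # linear map followed by ReLU

# Toy path graph 0 - 1 - 2 with 2-dimensional node features (illustrative values).
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])
weight = np.eye(2)  # identity weights so the aggregation itself is easy to inspect

hidden = mean_aggregate_layer(adj, features, weight)
print(hidden)
```

Stacking k such layers lets a node's representation depend on its k-hop neighborhood, which is the same mechanism the privacy attacks described later exploit.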

Privacy Attacks

The survey categorizes four attack types:

Membership inference determines whether a sample was in the training set. The supervised variant of the attack builds shadow datasets that mimic the target model's training distribution, trains shadow models on them, and then trains a binary classifier on the resulting prediction vectors to separate members from non-members.
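
The shadow-data idea can be illustrated with a deliberately simplified attack. The sketch below replaces the learned binary classifier with a one-dimensional decision rule: a cut-off on the maximum predicted probability, calibrated on shadow posteriors whose membership is known. The function names and all posterior values are hypothetical, and it assumes that models are more confident on their training samples, which is the signal such attacks commonly rely on.

```python
import numpy as np

def fit_threshold(shadow_posteriors, shadow_is_member):
    """Pick the max-confidence cut-off that best separates shadow members from non-members."""
    confidence = shadow_posteriors.max(axis=1)
    best_threshold, best_accuracy = 0.0, -1.0
    for candidate in np.unique(confidence):      # candidates in ascending order
        predicted = confidence >= candidate
        accuracy = np.mean(predicted == shadow_is_member)
        if accuracy > best_accuracy:             # strict: keep the lowest best cut-off
            best_threshold, best_accuracy = candidate, accuracy
    return best_threshold

def infer_membership(target_posteriors, threshold):
    """Flag a query as a training member when the target model is confident enough."""
    return target_posteriors.max(axis=1) >= threshold

# Hypothetical shadow-model outputs: the first four rows come from shadow training nodes.
shadow_posteriors = np.array([
    [0.95, 0.03, 0.02], [0.90, 0.05, 0.05], [0.05, 0.88, 0.07], [0.10, 0.10, 0.80],
    [0.60, 0.30, 0.10], [0.40, 0.35, 0.25], [0.20, 0.55, 0.25], [0.30, 0.30, 0.40],
])
shadow_is_member = np.array([True, True, True, True, False, False, False, False])

threshold = fit_threshold(shadow_posteriors, shadow_is_member)

# Hypothetical posteriors returned by the target model for three queried nodes.
target_posteriors = np.array([[0.97, 0.02, 0.01],
                              [0.45, 0.30, 0.25],
                              [0.15, 0.82, 0.03]])
guesses = infer_membership(target_posteriors, threshold)
print(threshold, guesses)
```

A full attack would feed the whole prediction vector (or features derived from it) into a trained classifier; the threshold rule only shows how shadow data with known membership supplies the supervision.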

Property inference extracts dataset-level properties (e.g., the gender ratio of the training data) from the model's outputs or learned embeddings.

Reconstruction attacks recover node attributes and graph structure from learned embeddings. The unified framework uses supervised attack models trained on shadow graph data to infer private information.

Model extraction learns a surrogate model that replicates the target model's behavior; the surrogate can then serve as a stepping stone for other attacks.

Privacy-Preserving Methods

Differential privacy adds calibrated noise, typically to gradients during training. NoisySGD injects Gaussian noise whose scale is calibrated to the privacy budget ε, with a smaller ε (stronger privacy) requiring more noise; DP-GCN applies local differential privacy to perturb node features before aggregation.
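
The gradient-noising step can be sketched as follows. This is a generic clip-and-noise update in the style of differentially private SGD, not the specific procedure of any method named above: the function names, the clipping norm, and the noise multiplier are assumptions chosen for illustration. `gaussian_sigma` uses the classic Gaussian-mechanism calibration for a single release (stated for 0 < ε < 1) only to show that the noise scale grows as ε shrinks; real training would rely on a privacy accountant to track the budget across many steps.

```python
import math
import numpy as np

def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise scale of the classic Gaussian mechanism (single release, 0 < epsilon < 1)."""
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon

def privatize_gradients(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, and average."""
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))  # shrink only large gradients
    clipped = per_sample_grads * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=per_sample_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / per_sample_grads.shape[0]

# Two hypothetical per-sample gradients; the first exceeds the clipping norm.
grads = np.array([[3.0, 4.0],
                  [0.3, 0.4]])
rng = np.random.default_rng(0)

noiseless = privatize_gradients(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
noisy = privatize_gradients(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)

# Halving epsilon doubles the required noise scale.
ratio = gaussian_sigma(0.25, 1e-5, 1.0) / gaussian_sigma(0.5, 1e-5, 1.0)
print(noiseless, noisy, ratio)
```

Clipping bounds how much any single sample can move the update, which is what makes a fixed noise scale sufficient; on graphs, bounding that influence is harder because one node also affects its neighbors through aggregation.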

Federated learning trains locally on decentralized data without centralizing sensitive information. Parameter aggregation combines updates across devices while keeping raw data on-device.
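
The parameter-aggregation step can be sketched as a weighted average of client parameters. The sketch assumes FedAvg-style weighting by local dataset size, which is one common choice and not necessarily what every federated GNN method uses; the function name `federated_average` and the client values are hypothetical.

```python
import numpy as np

def federated_average(client_params, client_sizes):
    """Average client parameter vectors, weighting each by its local dataset size."""
    weights = np.asarray(client_sizes, dtype=float)
    weights = weights / weights.sum()            # normalize so the weights sum to 1
    stacked = np.stack(client_params)            # shape: (num_clients, num_params)
    return np.tensordot(weights, stacked, axes=1)

# Three hypothetical clients, each holding a locally trained 2-parameter model.
client_params = [np.array([1.0, 2.0]),
                 np.array([3.0, 4.0]),
                 np.array([5.0, 6.0])]
client_sizes = [10, 30, 60]                      # number of local training samples

global_params = federated_average(client_params, client_sizes)
print(global_params)
```

Only the parameter vectors leave each client; the raw node features and edges stay local, which is the property the paragraph above describes.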

Machine unlearning removes training sample influence through exact unlearning (retraining) or approximate methods (influence functions).

Adversarial privacy preservation trains an encoder against an adversary: the encoder learns representations that are useful for the task while the adversary tries to infer sensitive attributes from them, which pushes sensitive information out of the learned representations.

Model ownership verification uses watermarking and fingerprinting to prove ownership against model stealing.

Results

The survey systematically catalogs 100+ papers across privacy, robustness, fairness, and explainability for GNNs. Key findings:

  • Privacy attacks on GNNs exploit message-passing structure: neighborhood aggregation leaks topological information; node embeddings encode sufficient information for multiple attack types
  • Differential privacy methods provide formal privacy guarantees but with utility-privacy tradeoffs; federated learning scales to distributed settings but adds computational overhead
  • Adversarial robustness requires defending against perturbations to both node features and graph edges; certified defenses provide worst-case guarantees but limited scalability
  • Fairness approaches span pre-processing (node attribute modification), in-processing (representation learning constraints), and post-processing (prediction calibration)
  • Explainability methods include saliency-based (feature importance), example-based (prototype selection), and model-based (attention, decomposition) approaches

Notes

This survey fills a critical gap: while extensive surveys cover GNN architectures and graph mining techniques, trustworthiness research on GNNs remains scattered across venues. The four-pillar framework (privacy, robustness, fairness, explainability) aligned with ethical AI principles provides structure for understanding GNN limitations in high-stakes deployment.

Strengths: comprehensive coverage across four dimensions, clear problem formulations, taxonomy of attack/defense methods, discussion of future research directions, practical guidance for practitioners.

Limitations: the survey emphasizes message-passing GNNs (GCN) and would benefit from coverage of graph attention and pooling-based architectures; federated GNN learning and decentralized graph analysis remain emerging areas with limited foundational work; fairness definitions are still debated in the literature.

The paper notably highlights that trustworthiness aspects are not independent—for example, differential privacy noise can degrade model robustness, and fairness constraints may amplify certain adversarial vulnerabilities. Practitioners deploying GNNs in critical domains (credit scoring, recommendation systems, social network moderation) must navigate these interdependencies carefully.