Causal Machine Learning: A Survey and Open Problems

Authors: Jean Kaddour, Aengus Lynch, Qi Liu, Matt J. Kusner, Ricardo Silva

Venue: arXiv, 2022 — arXiv:2206.15475

TL;DR

Comprehensive survey of causal machine learning (CausalML), an umbrella term for methods that formalize the data-generation process as a structural causal model (SCM) in order to reason about interventions and counterfactuals. Across 191 pages, it organizes CausalML work into five groups (causal supervised learning, causal generative modeling, causal explanations, causal fairness, and causal reinforcement learning), with a systematic comparison of methods, open problems for each group, and applications to vision, NLP, and graph learning.

Contributions

Unified taxonomy of CausalML methods across five problem categories with open problems for each
Causal foundations (Chapter 2): Self-contained introduction to structural causal models, interventions, counterfactuals, and identifiability without assuming prior knowledge of causal inference
Causal supervised learning (Chapter 3): Invariant feature learning and invariant mechanism learning, both aimed at domain-robust representations that remain predictive across environments
Causal generative modeling (Chapter 4): Structural assignment learning and causal disentanglement to generate counterfactual samples
Causal explanations (Chapter 5): Feature attribution and contrastive explanations grounded in causal graphs
Causal fairness (Chapter 6): Counterfactual and interventional fairness criteria to mitigate discrimination
Causal reinforcement learning (Chapter 7): Model-based RL, off-policy evaluation, and counterfactual data augmentation
Modality-specific applications (Chapter 8): Causal computer vision, NLP, and graph representation learning
Benchmarks and open challenges (Chapters 9–10): Causal benchmarks, limitations of current approaches, and future directions

Method

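The survey adopts a causal perspective on machine learning. Rather than treating data as i.i.d. samples from a fixed distribution, CausalML formalizes the data-generation process as a structural causal model (SCM): a set of structural assignments, each setting one variable as a function of its causal parents and an exogenous noise term. The SCM induces a causal graph, typically assumed to be a directed acyclic graph (DAG) whose nodes are variables and whose edges point from causes to effects. This enables: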

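Reasoning about interventions: What happens to \(Y\) if we set treatment \(T\) to \(t\), i.e., what is \(P(Y \mid do(T = t))\)? (interventional prediction; counterfactuals go one step further and ask what \(Y\) would have been for a specific unit had \(T\) been different)
Identifying causal effects: When is a causal effect identifiable from observational data? (answered from the graph structure, e.g., via the back-door or front-door criterion)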
Invariance and robustness: Which features remain predictive under distribution shifts? (via the principle of independent mechanisms)
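The gap between conditioning and intervening, and how the back-door criterion closes it, can be seen in a toy simulation. This is a minimal sketch, assuming a hypothetical three-variable SCM (binary confounder Z, binary treatment T, Gaussian noise on Y) with illustrative coefficients that do not come from the survey; the function names are likewise hypothetical. The true effect of T on Y is 1.0, the naive observational contrast is biased upward by Z (its population value is 2.2 here), and both the simulated intervention and back-door adjustment over {Z} target 1.0.

```python
import numpy as np

def sample_scm(n, rng, do_t=None):
    """Sample from a hypothetical toy SCM with edges Z -> T, Z -> Y, T -> Y.

    If do_t is given, the assignment for T is replaced by the constant do_t
    (a hard intervention do(T = do_t)); all other assignments are unchanged.
    """
    z = rng.binomial(1, 0.5, size=n)                      # Z := N_Z (confounder)
    if do_t is None:
        t = rng.binomial(1, 0.2 + 0.6 * z)                # T := f_T(Z, N_T)
    else:
        t = np.full(n, do_t)                              # T := do_t
    y = 1.0 * t + 2.0 * z + rng.normal(0.0, 1.0, size=n)  # Y := f_Y(T, Z, N_Y)
    return z, t, y

def naive_contrast(t, y):
    # E[Y | T=1] - E[Y | T=0]: conditioning on T, not intervening on it
    return y[t == 1].mean() - y[t == 0].mean()

def backdoor_adjusted_effect(z, t, y):
    # Back-door adjustment with adjustment set {Z}:
    #   sum_z P(Z=z) * (E[Y | T=1, Z=z] - E[Y | T=0, Z=z])
    # Assumes positivity: every stratum of Z contains treated and untreated units.
    effect = 0.0
    for v in np.unique(z):
        stratum = z == v
        diff = y[stratum & (t == 1)].mean() - y[stratum & (t == 0)].mean()
        effect += stratum.mean() * diff
    return effect

def interventional_effect(n, rng):
    # E[Y | do(T=1)] - E[Y | do(T=0)], estimated by sampling the intervened SCM
    _, _, y1 = sample_scm(n, rng, do_t=1)
    _, _, y0 = sample_scm(n, rng, do_t=0)
    return y1.mean() - y0.mean()

rng = np.random.default_rng(0)
z, t, y = sample_scm(200_000, rng)
print(f"naive:    {naive_contrast(t, y):.2f}")               # population value 2.2
print(f"backdoor: {backdoor_adjusted_effect(z, t, y):.2f}")  # population value 1.0
print(f"do():     {interventional_effect(200_000, rng):.2f}")  # population value 1.0
```

The back-door estimate uses only observational samples, which is the point of identifiability: once the graph certifies {Z} as a valid adjustment set, the interventional quantity can be computed without ever running the intervention.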

Key concepts:

Spurious associations: correlations induced by unobserved confounders or selection effects rather than by a causal link; e.g., in ImageNet, bird images often have trees in the background because of photographer bias, not because trees cause birds
Style and content decomposition: Disentangle "style variables" (domain-specific features subject to interventions) from "content variables" (causal parents of the outcome)
Counterfactual invariance: Predictions should stay unchanged under counterfactual interventions on attributes that ought not to influence them (e.g., race or gender in fairness)
Causal influence: Quantify one variable's causal effect on another via KL divergence or other information-theoretic measures
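Counterfactual invariance can be checked with the standard three-step counterfactual computation (abduction, action, prediction). This is a minimal sketch, assuming a hypothetical linear SCM \(X := \beta A + U\) with a binary sensitive attribute \(A\), a latent exogenous term \(U\), and a known coefficient `BETA`; all names and numbers are illustrative assumptions. A predictor that reads \(X\) (a descendant of \(A\)) changes when \(A\) is counterfactually flipped, while one that uses only the abducted \(U\) (a non-descendant of \(A\)) does not.

```python
import numpy as np

BETA = 2.0  # hypothetical effect of sensitive attribute A on feature X

def abduct(a, x):
    # Abduction: recover the exogenous term U from the observation, X := BETA*A + U
    return x - BETA * a

def counterfactual_x(a, x, a_new):
    # Action + prediction: keep the unit's U fixed, set A := a_new, recompute X
    u = abduct(a, x)
    return BETA * a_new + u

def predictor_uses_x(a, x):
    # Reads X, a descendant of A, so its output shifts when A is flipped
    return 0.5 * x

def predictor_uses_u(a, x):
    # Reads only the abducted U, which is not a descendant of A
    return 0.5 * abduct(a, x)

def counterfactual_gap(predictor, a, x):
    # Largest per-unit change in prediction when A is counterfactually flipped
    a_cf = 1 - a
    x_cf = counterfactual_x(a, x, a_cf)
    return np.max(np.abs(predictor(a, x) - predictor(a_cf, x_cf)))

rng = np.random.default_rng(0)
a = rng.binomial(1, 0.5, size=1000)
u = rng.normal(size=1000)
x = BETA * a + u

print(counterfactual_gap(predictor_uses_x, a, x))  # approximately 0.5 * BETA = 1.0
print(counterfactual_gap(predictor_uses_u, a, x))  # zero up to floating-point error
```

Predicting from non-descendants of the sensitive attribute is similar in spirit to counterfactual fairness constructions; in practice the SCM and its coefficients are not known and must be assumed or estimated, which is where the strong assumptions noted below come in.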

The survey reviews methods across modalities: data augmentation for deconfounding (vision), contrastive learning and foundation-model fine-tuning (NLP), and causal graph learning (graph neural networks).

Results

No empirical results table; this is a methodological survey. Key findings drawn from the surveyed literature include:

Invariant feature learning outperforms standard supervised learning on out-of-distribution benchmarks (WILDS, DomainNet, ImageNet-C) when spurious associations are present
Contrastive learning implicitly enforces causal structure by comparing samples under soft interventions (style augmentations)
Causal fairness methods successfully reduce disparate impact in hiring, lending, and criminal justice domains compared to naive approaches
Causal RL achieves better off-policy evaluation and sample efficiency than model-free methods on simulated benchmarks
Open challenges include: tractable causal discovery from high-dimensional data, learning from limited observational data with hidden confounders, and generalizing across multiple environments

Connections

Causal Understanding of Fake News Dissemination on Social Media — applies causal inference (inverse propensity scoring) to fake news dissemination
Robustness in language models — causal invariance is a foundation for adversarial and domain-shift robustness
Cross-domain generalization — invariant features address distribution shift and domain adaptation
Fairness in NLP — counterfactual and interventional fairness are core to unbiased ML
Explainability in misinformation detection — causal graphs and influence measures enable transparent, interpretable predictions
Graph Neural Networks — causal structure learning on graphs
Domain adaptation — learning representations that transfer across environments via causal assumptions

Notes

This survey synthesizes a nascent and fast-growing field. The main strength is the unified vocabulary and taxonomy—CausalML comprises heterogeneous methods (data augmentation, contrastive learning, causal discovery, fairness constraints) that share common causal principles. The authors usefully distinguish between "the good" (CausalML enables more robust, interpretable, and fair models), "the bad" (identifiability and confounding assumptions are strong and often untestable in practice), and "the ugly" (causal discovery from high-dimensional data remains open, and the cost of enforcing invariance can be high in flexible models). For fake news research, causal approaches are relevant to understanding how models generalize across sources, time periods, and languages—and to designing deconfounded features that capture misinformation signals rather than spurious patterns in training data.