GNN Training
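GNN training encompasses methods and systems for learning the parameters of graph neural networks. Standard deep learning training operates on independent data samples, but GNN training must account for graph structure: each node's learned representation depends on the features of its neighbors, and with k message-passing layers it depends on the node's entire k-hop neighborhood. These dependencies mean a mini-batch cannot be built from isolated samples, which complicates distributed training.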
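To make the dependency concrete, the following is a minimal sketch (not taken from any particular system) of how the set of nodes needed to compute one node's representation grows with the number of layers. The toy graph `adj` and the helper `k_hop_neighborhood` are hypothetical names introduced only for illustration.

```python
from collections import deque


def k_hop_neighborhood(adj, seed, k):
    """Return all nodes within k hops of `seed`, including `seed` itself.

    With k message-passing layers, these are the nodes whose input
    features can influence the seed node's output representation.
    """
    visited = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nbr in adj[node]:
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, depth + 1))
    return visited


# Hypothetical toy graph as an undirected adjacency list.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2], 5: [3]}

for k in range(4):
    print(k, sorted(k_hop_neighborhood(adj, 0, k)))
```

On real graphs with high-degree nodes this set can grow very quickly with k, which is one reason neighborhood sampling is used.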
Key challenges in distributed GNN training include: (1) partitioning graphs to balance computation and minimize communication, (2) sampling neighborhoods to reduce memory and computation, (3) synchronizing updates across machines, and (4) storing and accessing node features efficiently. Frameworks such as DGL and its distributed extension DistDGL, together with sampling-based training methods such as GraphSAINT, address these challenges with specialized partitioning, sampling, and communication strategies.
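Challenges (1) and (2) can be illustrated with a small sketch in plain Python. It is an assumption-laden toy, not the algorithm used by DGL, DistDGL, or GraphSAINT: `edge_cut` counts edges that cross partitions (a rough proxy for communication cost), and `sample_neighbors` performs uniform fanout-limited neighbor sampling. The graph, the partition assignments, and both function names are hypothetical.

```python
import random


def edge_cut(adj, part):
    """Count undirected edges whose endpoints lie in different partitions."""
    return sum(1 for u in adj for v in adj[u] if u < v and part[u] != part[v])


def sample_neighbors(adj, seeds, fanout, rng):
    """For each seed, keep at most `fanout` uniformly sampled neighbors.

    Returns (src, dst) edges along which messages flow into the seeds.
    """
    edges = []
    for dst in seeds:
        nbrs = adj[dst]
        chosen = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
        edges.extend((src, dst) for src in chosen)
    return edges


# Hypothetical toy graph as an undirected adjacency list.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2], 5: [3]}

# Two hypothetical assignments of nodes to two machines.
part_good = {0: 0, 1: 0, 3: 0, 5: 0, 2: 1, 4: 1}
part_bad = {0: 0, 1: 1, 2: 1, 3: 0, 4: 0, 5: 1}
print("edge cut (good):", edge_cut(adj, part_good))
print("edge cut (bad): ", edge_cut(adj, part_bad))

# Sample at most one neighbor per seed node for a mini-batch.
rng = random.Random(0)
print("sampled edges:", sample_neighbors(adj, seeds=[0, 3], fanout=1, rng=rng))
```

A lower edge cut means fewer neighbor features must be fetched from remote machines, and a smaller fanout bounds the per-batch memory and computation at the cost of a noisier aggregation.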
Key papers
- The Evolution of Distributed Systems for Graph Neural Networks and their Origin in Graph Processing and Deep Learning: A Survey — comprehensive survey of distributed GNN training systems, covering partitioning, sampling, synchronization, and communication strategies
Related topics
- Graph Neural Networks (GNN architectures and models)
- Distributed Systems (distributed training and computation)
- Scalability (handling large graphs)