Multi-task learning¶
Multi-task learning (MTL) is a machine learning paradigm where a single model jointly learns multiple related tasks. By sharing representations between tasks, MTL can improve generalization and data efficiency, particularly when tasks are complementary or when some tasks have limited training data.
Motivation¶
Multi-task learning is motivated by the observation that learning shared representations across related tasks can:
- Reduce overfitting by increasing effective training set size
- Improve generalization through inductive bias from auxiliary tasks
- Leverage unlabeled or partially labeled data for auxiliary tasks
- Capture shared structure between tasks more efficiently than single-task models
Architecture patterns¶
Hard parameter sharing¶
Shared layers process the input for every task, and task-specific output layers then predict each task's labels. Because all task losses update the same shared parameters, the model is forced to learn representations that serve every task.
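As a rough illustration of this pattern, the sketch below runs a forward pass through one shared encoder and two task-specific heads, using only the Python standard library. The class `HardSharingModel`, the task names `veracity` and `stance`, the layer sizes, and the class counts are all assumptions made for the example; training is omitted.

```python
import math
import random


def init_layer(rng, n_in, n_out):
    """Return (weights, bias) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def dense(x, layer):
    """Apply a dense layer: one output per weight row."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class HardSharingModel:
    """Hypothetical model: one shared encoder feeding one output head per task."""

    def __init__(self, n_features, n_hidden, classes_per_task, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(rng, n_features, n_hidden)  # used by every task
        self.heads = {
            task: init_layer(rng, n_hidden, n_classes)  # used by one task only
            for task, n_classes in classes_per_task.items()
        }

    def encode(self, x):
        """Shared representation: dense layer followed by tanh."""
        return [math.tanh(v) for v in dense(x, self.shared)]

    def predict(self, x):
        """Run the shared encoder once, then every task-specific head."""
        hidden = self.encode(x)
        return {task: softmax(dense(hidden, head)) for task, head in self.heads.items()}


# Assumed task names and class counts, chosen only for illustration.
model = HardSharingModel(n_features=4, n_hidden=3,
                         classes_per_task={"veracity": 3, "stance": 4})
predictions = model.predict([0.2, -1.0, 0.5, 0.0])
```

Both heads read the same `hidden` vector, so in a trained version of this sketch the gradients from every task would flow into `self.shared`, while each head's parameters would be updated by its own task only.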
Soft parameter sharing¶
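Each task has its own model, but corresponding parameters across the task models are regularized to stay similar, for example by penalizing the distance between them. This allows more task-specific flexibility than hard sharing while still encouraging shared representations.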
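One common way to express this regularization is to add a squared L2 distance between the two models' parameters to the sum of the task losses. The sketch below assumes that formulation; the function names, the `strength` coefficient, and the example numbers are hypothetical, and other penalties are possible.

```python
def l2_distance_penalty(params_a, params_b):
    """Sum of squared differences between corresponding parameters."""
    if len(params_a) != len(params_b):
        raise ValueError("both models need the same number of parameters")
    return sum((a - b) ** 2 for a, b in zip(params_a, params_b))


def soft_sharing_loss(task_losses, params_a, params_b, strength=0.1):
    """Total objective: per-task losses plus the weighted similarity penalty."""
    return sum(task_losses) + strength * l2_distance_penalty(params_a, params_b)


# Hypothetical flattened weights of two task-specific models.
stance_params = [0.5, -1.0, 2.0]
veracity_params = [0.5, -0.5, 1.0]

penalty = l2_distance_penalty(stance_params, veracity_params)
total = soft_sharing_loss([0.75, 1.25], stance_params, veracity_params, strength=0.5)
```

With `strength` set to zero the two models train independently. Raising it pulls their parameters together, which moves the setup closer to hard parameter sharing.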
Applications in misinformation detection¶
- Rumour verification: Joint learning of veracity (main task) with stance classification and rumour detection (auxiliary tasks) — see Kochkina et al. 2018
- Fake news classification: Combining headline, body, and credibility prediction tasks
- Claim verification: Joint learning of evidence retrieval and claim-evidence relevance ranking
Effectiveness factors¶
Multi-task learning effectiveness depends on:
- Task relatedness: Highly related tasks benefit more from shared representations
- Data balance: When tasks have imbalanced dataset sizes, auxiliary tasks with abundant labels can help main tasks with sparse labels
- Label distribution properties: Tasks with lower kurtosis (more balanced class distributions) and higher entropy show greater MTL gains
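To make the entropy criterion concrete, the sketch below computes the Shannon entropy of a task's empirical label distribution. The helper `label_entropy` and the two toy label sets are assumptions for illustration. Kurtosis is left out because the text does not say how it is computed over class labels.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the empirical class distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Toy label sets: four equally frequent classes versus one dominant class.
balanced = ["support", "deny", "query", "comment"] * 25
skewed = ["comment"] * 85 + ["support"] * 5 + ["deny"] * 5 + ["query"] * 5

balanced_entropy = label_entropy(balanced)
skewed_entropy = label_entropy(skewed)
```

The balanced set reaches the maximum of 2 bits for four classes, while the skewed set scores lower. Under the criterion above, the balanced task would be the more promising candidate for MTL gains.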
Key papers¶
- Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media: applies Copula Ordinal Regression for jointly modeling outlet factuality and political ideology; auxiliary tasks at different bias granularities reduce prediction error
- Kumar & Carley (2019) — Tree LSTMs with Convolution Units to Predict Stance and Rumor Veracity in Social Media Conversations — demonstrates multi-task learning (stance + rumor veracity) with Tree LSTM architectures; alternating task training strategy; achieves 12% and 15% F1-macro improvements over single-task baselines on PHEME dataset.
- Kochkina et al. (2018) — All-in-one: Multi-task Learning for Rumour Verification — demonstrates MTL benefits for rumor veracity prediction with stance classification and rumor detection as auxiliary tasks; analyzes link between dataset properties and MTL effectiveness
See also¶
- Transfer learning for fake news detection — related paradigm for leveraging knowledge from one task to improve another
- Neural networks — foundation for most MTL implementations