Ordinal regression
Ordinal regression (or ordinal classification) is a supervised learning task where the target variable has a natural order or ranking. Unlike nominal classification (e.g., cat vs. dog, where there is no natural ordering), ordinal targets have meaningful progression: low → medium → high, or negative → neutral → positive.
Examples in misinformation research
Factuality assessment: Rating news outlets or articles on a 3-point scale (low, mixed, high) or claim veracity on a 5-point scale (false, mostly-false, mixed, mostly-true, true). Order matters: confusing "high" with "low" is worse than confusing "high" with "mixed."
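Political ideology: Rating news outlets on a 7-point left-right spectrum (extreme-left, left, center-left, center, center-right, right, extreme-right). Predicting center when the true label is extreme-right is three steps off, a much larger error than predicting right; predicting extreme-left would be the grossest error, since it lands at the opposite end of the spectrum.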
Sentiment analysis: 5-point sentiment scale (very negative → very positive). Misclassifying positive as negative is worse than misclassifying positive as neutral.
Why ordinal regression matters
Better loss functions: The standard 0-1 classification loss treats all misclassifications equally. Ordinal regression uses loss functions that penalize a prediction more heavily the farther it falls from the true class. For example, confusing class 1 with class 2 is penalized less than confusing class 1 with class 5. Common distance-aware measures include Mean Absolute Error (MAE) and other ordinal-aware metrics.
Improved predictions: A model that respects the ordering can fall back on adjacent or intermediate classes for ambiguous cases, so its mistakes tend to be near misses rather than jumps to the opposite end of the scale.
Data efficiency: Ordinal structure provides an inductive bias, which can improve generalization when labeled data is limited.
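The loss-function point above can be made concrete with a small sketch. It assumes the 5-point veracity scale from the examples, encoded as integer ranks; the names `SCALE`, `RANK`, `zero_one_loss`, and `absolute_rank_loss` are hypothetical and chosen only for illustration.

```python
# Assumption: the 5-point veracity scale, encoded as integer ranks 0..4.
SCALE = ["false", "mostly-false", "mixed", "mostly-true", "true"]
RANK = {label: i for i, label in enumerate(SCALE)}


def zero_one_loss(true_label, predicted_label):
    """Return 1 for any misclassification and 0 for a correct prediction."""
    return 0 if true_label == predicted_label else 1


def absolute_rank_loss(true_label, predicted_label):
    """Return the number of scale steps between the true and predicted labels."""
    return abs(RANK[true_label] - RANK[predicted_label])


near_miss = ("true", "mostly-true")  # one step away
far_miss = ("true", "false")         # opposite end of the scale

# The 0-1 loss cannot tell the two errors apart; the rank-based loss can.
print(zero_one_loss(*near_miss), zero_one_loss(*far_miss))            # 1 1
print(absolute_rank_loss(*near_miss), absolute_rank_loss(*far_miss))  # 1 4
```

The absolute rank difference is only one possible ordinal loss, and it treats every step on the scale as equally large, which may not hold for a given labeling scheme.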
Common approaches
Threshold-based: Learn a single regression model that predicts a continuous score, then apply K-1 ordered thresholds (cut points) to convert the score into one of K class labels. Simple and interpretable.
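Cumulative binary decomposition (an ordinal analogue of one-vs-rest): Train K-1 binary classifiers, where each distinguishes between "≤ k" and "> k" for k = 1 to K-1. The ordering is built into the binary targets, although independently trained classifiers are not guaranteed to produce mutually consistent (monotone) probabilities.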
Copula ordinal regression: Model the joint distribution of multiple ordinal variables capturing dependencies (e.g., factuality and bias jointly). Used when multiple ordinal tasks are correlated.
Neural approaches: Embed inputs in a shared representation space, then apply ordinal-aware loss functions (e.g., ordinal cross-entropy).
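As a minimal sketch of two of the approaches above, the snippet below shows how a continuous score is turned into a class with ordered cut points, and how the K-1 binary problems ("≤ k" vs. "> k") are built and decoded. Classes are assumed to be 0-indexed (0 to K-1), and the function names, cut points, and probabilities are hypothetical; the vote-counting decoder is one simple rule among several, not a prescribed method.

```python
from bisect import bisect_right


def threshold_decode(score, cut_points):
    """Map a continuous score to a class index 0..K-1.

    cut_points holds K-1 sorted thresholds; a score equal to a cut point
    is assigned to the higher class.
    """
    return bisect_right(cut_points, score)


def cumulative_targets(label, num_classes):
    """Binary targets for the K-1 sub-problems: is the label greater than k?"""
    return [int(label > k) for k in range(num_classes - 1)]


def cumulative_decode(prob_greater):
    """Decode K-1 estimates of P(y > k) into a class index 0..K-1.

    Counts how many classifiers vote "greater" (probability above 0.5).
    This stays well defined even if the probabilities are not monotone.
    """
    return sum(p > 0.5 for p in prob_greater)


# Threshold-based: 3 classes (low, mixed, high) need 2 cut points.
cuts = [1.5, 2.5]  # illustrative values, not learned here
print([threshold_decode(s, cuts) for s in (0.9, 2.0, 3.1)])  # [0, 1, 2]

# Cumulative binary decomposition for a 4-class problem.
print(cumulative_targets(2, 4))            # [1, 1, 0]
print(cumulative_decode([0.9, 0.7, 0.2]))  # 2
```

In practice the cut points and the binary classifiers would be learned from data; the sketch only covers the encoding and decoding steps around them.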
Evaluation metrics
Mean Absolute Error (MAE): Average absolute difference between the predicted and true class ranks. It respects ordering but treats the gaps between adjacent classes as equal.
Mean Absolute Error - Macro (MAE_M): Macro-averaged variant in which MAE is computed separately for each true class and then averaged across classes, making it more robust to class imbalance than plain MAE.
Ordinal-aware accuracy: Variants that assign partial credit for near-miss predictions.
Spearman correlation: Rank correlation between predictions and true labels, capturing whether the ranking is preserved.
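The metrics above can be sketched in plain Python as follows. Labels are assumed to be integer ranks, "within-k accuracy" is used here as one hypothetical form of ordinal-aware accuracy, and the toy data is invented purely to show how class imbalance separates MAE from MAE_M.

```python
from collections import defaultdict


def mae(y_true, y_pred):
    """Mean absolute difference between true and predicted ranks."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_mae(y_true, y_pred):
    """MAE computed per true class, then averaged over classes."""
    errors = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        errors[t].append(abs(t - p))
    per_class = [sum(e) / len(e) for e in errors.values()]
    return sum(per_class) / len(per_class)


def within_k_accuracy(y_true, y_pred, k=1):
    """Share of predictions at most k steps from the true rank."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= k)
    return hits / len(y_true)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(y_true, y_pred):
    """Pearson correlation of the rank vectors (undefined if either is constant)."""
    rt, rp = average_ranks(y_true), average_ranks(y_pred)
    mt, mp = sum(rt) / len(rt), sum(rp) / len(rp)
    cov = sum((a - mt) * (b - mp) for a, b in zip(rt, rp))
    var_t = sum((a - mt) ** 2 for a in rt)
    var_p = sum((b - mp) ** 2 for b in rp)
    return cov / (var_t * var_p) ** 0.5


# Toy, imbalanced data: class 0 dominates and is always predicted correctly.
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 2, 0]
print(mae(y_true, y_pred))        # 0.5
print(macro_mae(y_true, y_pred))  # 1.0
print(within_k_accuracy(y_true, y_pred, k=1))
print(spearman(y_true, y_pred))
```

On this toy data the plain MAE looks better than the macro-averaged one because the majority class hides the errors on the two rare classes.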
Key papers
- Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media: applies copula ordinal regression to jointly predict outlet trustworthiness (3-point) and political ideology (7-point); demonstrates that multi-task learning with auxiliary tasks at different ordinal granularities improves performance
- General ordinal regression foundation: Ordinal Regression for Machine Learning and sentiment analysis literature (e.g., SemEval sentiment tasks using 5-point scales)
Related topics
- Classification: more general supervised learning task
- Regression: predicting continuous values; ordinal regression bridges classification and regression
- Machine learning: broader field
- Multi-task learning: joint learning of multiple ordinal tasks (e.g., bias and factuality) can exploit their correlation