Data quality

Data quality is the reliability and correctness of training data, particularly the labels assigned by human annotators. It encompasses inter-annotator agreement, label noise, systematic biases, and the overall fitness of a dataset for a modeling task. In NLP, data quality critically affects downstream model performance: a model trained on noisy or biased labels tends to reproduce those errors, so the quality of the training data effectively limits what the model can learn.
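Inter-annotator agreement is often summarized with a chance-corrected statistic. As an illustration only, the sketch below computes Cohen's kappa for two annotators who labeled the same items; the function name `cohen_kappa` and the toy sentiment labels are assumptions made for this example, not something the entry above prescribes.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)

    # Observed agreement: fraction of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement, from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and returning 1.0 is a convention of this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations for six items.
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

In this toy case the annotators agree on 4 of 6 items (observed agreement about 0.67) while chance agreement is 0.5, so kappa is about 0.33. Raw percent agreement alone would overstate how consistent the labels are.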

Key papers