NLP evaluation

Evaluation in NLP combines automatic metrics (BLEU, ROUGE, perplexity) with human judgment. Growing emphasis falls on understanding when automatic metrics correlate with human assessments, and on best practices for collecting reliable human annotations.
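
One common way to check whether an automatic metric tracks human assessments is to compute a rank correlation between per-output metric scores and human ratings of the same outputs. The sketch below is a minimal illustration using only the Python standard library. The function names and the sample scores are hypothetical and are not taken from any paper listed here. Real studies typically also report confidence intervals and compare several correlation measures.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical data: one metric score and one human rating per system output.
    metric_scores = [0.21, 0.35, 0.30, 0.50, 0.42]
    human_ratings = [2, 3, 4, 5, 4]
    print(f"Spearman correlation: {spearman(metric_scores, human_ratings):.3f}")
```

A rank correlation is used here rather than a linear one because human ratings are often ordinal (for example, 1 to 5 scales), so only the ordering of outputs is assumed to be meaningful.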

Key papers