Quality assessment

Assessing the quality of text outputs means evaluating several dimensions (grammaticality, coherence, factuality, helpfulness), usually by combining automated metrics with human judgment. For machine-generated text, quality assessment intersects with detectability: how reliably can evaluators distinguish generated from human-written content?
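One way to make the detectability question concrete is to score an evaluator's human-vs-machine guesses against the true origin of each text and ask whether the result beats random guessing. The sketch below is only illustrative: the function names `detection_accuracy` and `binomial_p_value`, the `"human"`/`"machine"` labels, and the 50% chance baseline are assumptions made for this example, not a procedure taken from the papers listed here.

```python
from math import comb


def detection_accuracy(guesses, origins):
    """Fraction of texts whose origin the evaluator guessed correctly."""
    if len(guesses) != len(origins) or not origins:
        raise ValueError("guesses and origins must be non-empty and of equal length")
    correct = sum(g == o for g, o in zip(guesses, origins))
    return correct / len(origins)


def binomial_p_value(correct, n, chance=0.5):
    """One-sided exact binomial p-value: P(X >= correct) if the evaluator
    were guessing at the assumed chance rate."""
    return sum(
        comb(n, k) * chance**k * (1 - chance) ** (n - k)
        for k in range(correct, n + 1)
    )


# Hypothetical judgments from a single evaluator on four texts.
origins = ["human", "machine", "human", "human"]
guesses = ["human", "machine", "machine", "human"]

accuracy = detection_accuracy(guesses, origins)
p_value = binomial_p_value(correct=3, n=4)
print(f"accuracy={accuracy:.2f}, p-value vs. chance={p_value:.4f}")
```

With so few items, even a high accuracy is compatible with guessing, so a real study would need many more judgments per evaluator. The 50% baseline also assumes a balanced mix of human and machine texts.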

Key papers

Clark et al. (2021), "All That's 'Human' Is Not Gold": Analyzes which dimensions human evaluators focus on when assessing generated text (surface-level grammar and style vs. content quality) and shows that untrained evaluators struggle to tell machine-generated text from human-written text.

Human evaluation — collecting human quality judgments
NLP evaluation — evaluation methodology in NLP