Quality assessment
Assessing the quality of text outputs means evaluating several dimensions (grammaticality, coherence, factuality, helpfulness), often by combining automated metrics with human judgment. For machine-generated text, quality assessment intersects with detectability: how well can evaluators distinguish generated from human-written content?
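One way to make the detectability question concrete is to score how often an evaluator identifies the origin of a text correctly and compare that rate to chance. The sketch below is illustrative only: the function names `detection_accuracy` and `p_value_above_chance`, the label strings, and the toy data are assumptions made for this example and are not taken from any cited paper.

```python
from math import comb


def detection_accuracy(judgments, labels):
    """Fraction of items whose origin the evaluator identified correctly."""
    if len(judgments) != len(labels):
        raise ValueError("judgments and labels must have the same length")
    correct = sum(j == l for j, l in zip(judgments, labels))
    return correct / len(labels)


def p_value_above_chance(correct, total, chance=0.5):
    """One-sided exact binomial test: P(X >= correct) if the evaluator guesses at `chance`."""
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


# Hypothetical toy data: true origin of each text and one evaluator's guesses.
labels = ["human", "machine", "machine", "human", "machine", "human", "human", "machine"]
judgments = ["human", "human", "machine", "human", "human", "machine", "human", "machine"]

accuracy = detection_accuracy(judgments, labels)
n_correct = sum(j == l for j, l in zip(judgments, labels))
p_value = p_value_above_chance(n_correct, len(labels))
print(f"accuracy={accuracy:.3f}, p-value vs. chance={p_value:.3f}")
```

An accuracy near 0.5 with a large p-value suggests the evaluator cannot reliably tell the two sources apart on this sample. A real study would need far more items and would typically account for multiple evaluators.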
Key papers
- Clark et al. (2021): All That's 'Human' Is Not Gold. Analyzes which dimensions human evaluators focus on when assessing generated text (surface-level grammar and style vs. content quality) and shows that untrained evaluators struggle to distinguish machine-generated from human-written text.
Related topics
- Human evaluation — collecting human quality judgments
- NLP evaluation — evaluation methodology in NLP