Machine Translation

Machine translation (MT) systems automatically convert text from a source language into a target language. Neural machine translation (NMT), built on sequence-to-sequence architectures (attention-based encoder-decoders and transformers), has improved dramatically over statistical machine translation and produces fluent output that can rival human translation on many language pairs.

Hallucination in neural machine translation

NMT systems can hallucinate, that is, produce output that the source sentence does not support, though the phenomenon manifests differently than in other natural language generation (NLG) tasks. Common hallucination types:

  • Untranslated sequences: Model generates repetitions or copies of source-language words in the target output
  • Back-translation errors: Model generates target sequences that don't correspond to any part of the source
  • Alignment failures: Incorrect word-order or phrase-structure alignments that produce ungrammatical or nonsensical translations
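The first category above, untranslated or repetitive output, can be flagged with simple surface heuristics. The sketch below is a rough illustration, not an established detector: the function names and both thresholds are assumptions chosen for the example, and tokens are assumed to be pre-split word lists.

```python
from collections import Counter


def copy_ratio(source_tokens, target_tokens):
    """Fraction of target tokens that also appear verbatim in the source."""
    if not target_tokens:
        return 0.0
    source_vocab = set(source_tokens)
    copied = sum(1 for tok in target_tokens if tok in source_vocab)
    return copied / len(target_tokens)


def repeated_ngram_ratio(tokens, n=2):
    """Fraction of n-grams that repeat an n-gram seen earlier in the sequence."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)


def looks_untranslated(source_tokens, target_tokens,
                       copy_threshold=0.5, repeat_threshold=0.3):
    """Flag outputs that mostly copy the source or loop on the same n-grams.

    Both thresholds are illustrative assumptions and would need tuning.
    """
    return (copy_ratio(source_tokens, target_tokens) >= copy_threshold
            or repeated_ngram_ratio(target_tokens) >= repeat_threshold)
```

Names, numbers, and loanwords are often copied legitimately, so a check like this is better suited to surfacing candidates for review than to rejecting translations outright.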

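Hallucinations often stem from noisy training data, training/inference mismatch (exposure bias, where the model is trained on reference prefixes but decodes from its own predictions), and beam search decoding, which enforces source coverage only implicitly.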

Evaluation

  • Automatic metrics: BLEU, METEOR, and TER score overall translation quality but do not specifically measure hallucination
  • Human evaluation: Fluency and adequacy judgments; hallucinations reduce adequacy
  • Alignment-based metrics: Check whether translated sequences align to source phrases
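As a rough illustration of an alignment-based check, the sketch below computes how much of the source and of the target is covered by word alignments. It assumes the alignments come from some external word aligner as (source_index, target_index) pairs; the function name and the input format are hypothetical, not taken from a specific tool.

```python
def alignment_coverage(alignments, source_len, target_len):
    """Return (source_coverage, target_coverage) as fractions in [0, 1].

    alignments: iterable of (source_index, target_index) pairs, assumed to be
    produced by an external word aligner.
    """
    aligned_source = {s for s, _ in alignments}
    aligned_target = {t for _, t in alignments}
    source_cov = len(aligned_source) / source_len if source_len else 0.0
    target_cov = len(aligned_target) / target_len if target_len else 0.0
    return source_cov, target_cov


# Example: a 4-token source where only the first two tokens were translated.
src_cov, tgt_cov = alignment_coverage({(0, 0), (1, 1)}, source_len=4, target_len=2)
print(src_cov, tgt_cov)
```

Low target coverage suggests output that is detached from the source, while low source coverage suggests omitted content. What counts as "low" depends on the language pair and the aligner.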

Mitigation

  • Data cleaning: Remove low-quality, misaligned, or noisy parallel data
  • Architecture improvements: Explicit coverage mechanisms, fertility models
  • Training methods: Diversified training, back-translation, knowledge distillation
  • Inference: Constrained decoding with coverage penalties
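One widely cited form of coverage penalty is the one used in Google's GNMT system, which adds beta * sum_i log(min(sum_j a_ij, 1.0)) to a hypothesis score, where a_ij is the attention weight of target step j on source position i. The sketch below follows that formula under stated assumptions: attention is passed as a plain list of rows, the default beta is only an illustrative value, and the small eps floor is added here to avoid log(0). It is not the reference implementation of any toolkit.

```python
import math


def coverage_penalty(attention, beta=0.2, eps=1e-9):
    """GNMT-style coverage penalty for one hypothesis.

    attention[j][i] is the attention weight of target step j on source
    position i. Source positions that received little total attention
    contribute a large negative term.
    """
    if not attention:
        return 0.0
    source_len = len(attention[0])
    penalty = 0.0
    for i in range(source_len):
        total = sum(step[i] for step in attention)
        # Cap at 1.0 so over-attended positions are not rewarded;
        # floor at eps so fully ignored positions do not produce log(0).
        penalty += math.log(max(min(total, 1.0), eps))
    return beta * penalty


def rescore(log_prob, attention, beta=0.2):
    """Add the coverage penalty to a hypothesis log-probability."""
    return log_prob + coverage_penalty(attention, beta)
```

During beam search, a score like this would be used to rank candidates, so hypotheses that ignore parts of the source fall behind those that attend to all of it.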

Key papers