Skip to content
Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-based Detection

Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-based Detection

Authors: David Ifeoluwa Adelani, Haotian Mai, Fuming Fang, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen Venue: arXiv, 2019 — arXiv:1907.09177

TL;DR

The paper demonstrates a practical attack where low-skilled adversaries can generate high-quality fake product reviews at scale by fine-tuning publicly available GPT-2 models on real review datasets. A two-step approach—generation plus BERT-based validation to preserve sentiment—produces reviews indistinguishable from genuine ones. Both human raters and automated detectors (Grover, GLTR, OpenAI detector) fail to reliably identify the fake reviews.

Contributions

  • Demonstrates feasibility of sentiment-preserving fake review generation using pre-trained language models without specialized expertise
  • Proposes a two-stage pipeline: GPT-2 generation conditioned on a seed review, followed by BERT-based validation to filter reviews with incorrect sentiment
  • Comprehensive evaluation of human ability to detect generated reviews via subjective assessment with 80 participants
  • Empirical evaluation of three automated detection methods (Grover, GLTR, OpenAI GPT-2 detector) and their failure modes
  • Shows fine-tuned models outperform pre-trained models in sentiment preservation (67.0–70.7% vs. 62.1–64.3%)

Method

The attack exploits the transferability of large pre-trained language models. The authors fine-tune GPT-2 (117M parameters) on Amazon and Yelp review datasets. The generation pipeline takes a real review as a seed, generates a candidate review x', and validates it using BERT fine-tuned as a sentiment classifier. If the generated review matches the original review's sentiment (positive or negative), it enters the attack pool; otherwise it is discarded.

Fine-tuning improves substantially over the pre-trained model: adding task-specific supervision via BERT validation and explicit sentiment modeling yield the best results. The authors also experiment with multi-layer LSTM conditioning on sentiment embeddings, which achieves 70.7% sentiment preservation on Amazon and 71.0% on Yelp.

Results

Sentiment preservation: Fine-tuned GPT-2 achieves 67.0% ± 1.4% (Amazon) and 67.7% ± 1.2% (Yelp) sentiment preservation; explicit sentiment modeling reaches 70.7% ± 1.3% (Amazon) and 70.1% ± 1.2% (Yelp).

Subjective fluency evaluation: 80 human raters (39 native, 41 non-native English speakers) rated generated reviews on a 5-point Likert scale. Fake reviews achieved mean opinion scores of 3.23–3.30 (similar to 2.95–3.49 for real reviews), indicating humans find them fluent and indistinguishable from genuine reviews.

Detection performance: Single detectors achieved equal error rates (EER) of 23.5% (GPT-2PD), 40.9% (GLTR), and 11.8% (Grover). Ensemble methods combining multiple detectors reduced EER to 19.6%. Humans choosing between one real and three fake reviews had accuracy near chance (25.4%–34.6%), demonstrating humans cannot reliably identify generated reviews.

Connections

  • Related to [[2018-jindi-stay-on-topic-generating-context-specific-fake-restaurant-reviews]] via fake review generation methodology
  • Relevant to [[2019-zellers-defending-against-neural-fake-news]] on detecting model-generated text
  • Builds on [[2018-radford-language-models-are-unsupervised-multitask-learners|GPT-2's transfer learning capabilities]]
  • Connected to Sentiment Analysis and Text generation literatures

Notes

The work highlights a critical practical threat: state-of-the-art detectors are not yet reliable for distinguishing AI-generated reviews from authentic ones. The reliance on pre-trained, publicly available models makes this attack accessible to non-experts. A key limitation is the relatively modest sentiment preservation rates (67–71%), meaning many generated reviews fail validation and are discarded—a real attacker would need to generate many more candidates than humans would post. The paper does not address countermeasures beyond detection; rate-limiting or review volume constraints might mitigate the attack's practical impact.