Feature engineering for fake news detection

Feature engineering covers approaches that manually design or extract structured feature representations from news content, social context, or knowledge sources, as opposed to end-to-end deep learning methods that learn representations directly from raw input. Classical classifiers (logistic regression, random forests, SVMs) are typically trained on these hand-crafted features.

Common feature families in fake news detection include:

  • Linguistic / stylometric: RST discourse features, LIWC psycholinguistic categories, n-gram representations, readability scores, lexical/syntactic features.
  • Social-context / user-profile: account metadata, behavioral features (post frequency, follower ratios), inferred demographics (age, personality, political bias).
  • Knowledge-based: fact-check signals from external knowledge bases.
  • Network: propagation graph statistics, stance distribution.
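As a rough illustration of the linguistic / stylometric family, the sketch below turns a piece of text into a small dictionary of hand-crafted features. The function name `stylometric_features` and the particular features chosen are assumptions made for this example; they are far simpler than RST or LIWC features and are not taken from any specific paper.

```python
import re
from typing import Dict


def stylometric_features(text: str) -> Dict[str, float]:
    """Extract a few simple lexical features (hypothetical, illustrative set)."""
    # Crude tokenization: runs of letters and apostrophes count as words.
    words = re.findall(r"[A-Za-z']+", text)
    # Crude sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    lowered = [w.lower() for w in words]

    def ratio(numerator: float, denominator: float) -> float:
        # Guard against division by zero for empty input.
        return numerator / denominator if denominator else 0.0

    return {
        "n_words": float(n_words),
        "avg_word_len": ratio(sum(len(w) for w in words), n_words),
        "avg_sentence_len": ratio(n_words, len(sentences)),
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": ratio(len(set(lowered)), n_words),
        "exclamation_count": float(text.count("!")),
        # Share of words written entirely in capitals (length > 1).
        "all_caps_ratio": ratio(
            sum(1 for w in words if len(w) > 1 and w.isupper()), n_words
        ),
    }


if __name__ == "__main__":
    print(stylometric_features("SHOCKING news! You will not believe this. Not at all!"))
```

Each article then becomes one fixed-length row of numbers, which is the form that classical classifiers such as logistic regression or random forests expect as input.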

Feature importance analysis (e.g., mean decrease in Gini impurity in a random forest) is frequently used to identify which features drive classification, providing a degree of interpretability that black-box neural models lack.
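To make the Gini-based idea concrete, the sketch below computes the impurity decrease produced by a single threshold split on one feature. A random forest aggregates this kind of weighted decrease over every split and every tree; the example covers only one split, and the function names and toy data are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(
    values: Sequence[float], labels: Sequence[int], threshold: float
) -> float:
    """Impurity decrease from splitting samples at `values <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    # Child impurities are weighted by the share of samples in each child.
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


if __name__ == "__main__":
    # Toy data (made up for illustration): 1 = fake, 0 = real.
    labels = [1, 1, 1, 0, 0, 0]
    exclamation_count = [3, 2, 4, 0, 1, 0]
    n_words = [120, 300, 310, 110, 290, 305]
    print(gini_decrease(exclamation_count, labels, 1.5))
    print(gini_decrease(n_words, labels, 200))
```

On this toy data the exclamation-count split separates the two classes perfectly while the word-count split leaves both children as mixed as the parent, so the first feature would receive the higher importance score.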

Key papers

Connections

  • User profiles is the specific application of feature engineering to social media account attributes.
  • Social-context detection relies heavily on feature-engineered representations of user and network signals.