User profiles for fake news detection
User profile features are attributes derived from a news sharer's social media account — both metadata available directly from APIs (explicit features) and attributes inferred from behavioral signals such as tweet history or profile images (implicit features). The core hypothesis is that users who systematically spread fake news differ demographically and behaviorally from those who share real news, and that these differences are detectable at scale.
Key papers
- Castillo et al. (2011), Information Credibility on Twitter: foundational work; identifies user-based features (registration age, followers, followees, activity level) as key credibility signals; shows that these features, together with propagation patterns, rank among the top predictors of whether a newsworthy topic on Twitter is judged credible.
- Grinberg et al. (2019), Fake news on Twitter during the 2016 U.S. presidential election: individual-level analysis of Twitter accounts matched to voter registration records; documents extreme concentration (1% of users account for 80% of fake news exposures, and 0.1% for nearly 80% of fake news shares); finds older, conservative-leaning users overrepresented among heavy consumers.
- Guess, Nagler & Tucker (2019), Less than you think: Prevalence and predictors of fake news dissemination on Facebook: links survey demographics to Facebook sharing behavior; identifies age as the strongest predictor (users over 65 shared nearly seven times as many articles from fake news domains as the youngest age group); finds that education, income, and race do not predict sharing; users who share the most links overall are less likely to share fake news.
- Shu et al. (2019), The Role of User Profiles for Fake News Detection: first systematic study of explicit and implicit profile features; shows that user profile features (UPF) outperform the RST and LIWC content baselines on FakeNewsNet; RegisterTime and political bias rank as top features.
Key concepts
- Explicit features: directly available from API metadata — account age (RegisterTime), verified status, follower/following counts, post count, favorite count.
- Implicit features: inferred from content and behavior — age, personality (Big Five), geolocation, profile image object type, political bias score.
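- Fake news ratio (FR): \(FR(i) = n_i^{(f)} / (n_i^{(r)} + n_i^{(f)})\), where \(n_i^{(f)}\) and \(n_i^{(r)}\) are the numbers of fake and real news items shared by user \(i\); a per-user score between 0 and 1 used to identify representative fake-news and real-news propagators.
- UPF vector: news-level feature vector formed by averaging the per-user profile feature vectors of all users who shared a given news item.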
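The FR score and the UPF vector defined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from Shu et al.: the function names `fake_news_ratio` and `upf_vector`, the `(user, news_id)` share-pair layout, the toy data, and the two-element feature vectors are all hypothetical. It also assumes each user is counted once per news item when averaging.

```python
from collections import defaultdict


def fake_news_ratio(shares, labels):
    """Return {user: FR} from (user, news_id) pairs and {news_id: "fake" | "real"}."""
    fake = defaultdict(int)
    real = defaultdict(int)
    for user, news_id in shares:
        if labels[news_id] == "fake":
            fake[user] += 1
        else:
            real[user] += 1
    users = set(fake) | set(real)
    # FR(i) = n_i^(f) / (n_i^(r) + n_i^(f)); every user here has at least one share.
    return {u: fake[u] / (fake[u] + real[u]) for u in users}


def upf_vector(news_id, shares, user_features):
    """Average the feature vectors of the distinct users who shared news_id."""
    # Assumption: a user who shared the item several times is counted once.
    sharers = {u for u, n in shares if n == news_id and u in user_features}
    if not sharers:
        return None  # no profiled sharer, so no news-level vector
    dim = len(next(iter(user_features.values())))
    totals = [0.0] * dim
    for u in sharers:
        for k, value in enumerate(user_features[u]):
            totals[k] += value
    return [t / len(sharers) for t in totals]


# Toy data (hypothetical): three users, three labeled news items.
shares = [
    ("alice", "n1"), ("alice", "n2"),
    ("bob", "n1"), ("bob", "n3"),
    ("carol", "n3"),
]
labels = {"n1": "fake", "n2": "fake", "n3": "real"}
# Hypothetical per-user features: [account age in years, follower count].
user_features = {
    "alice": [2.0, 100.0],
    "bob": [4.0, 300.0],
    "carol": [6.0, 50.0],
}

fr = fake_news_ratio(shares, labels)
upf_n1 = upf_vector("n1", shares, user_features)
```

Here `alice` shares only fake items (FR of 1.0), `carol` only real ones (FR of 0.0), and `bob` sits in between at 0.5. The UPF vector for `n1` is the mean of the `alice` and `bob` feature vectors.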
Connections
- Closely related to social-context detection; user profiles are one component of the social context.
- Political bias is a particularly discriminative implicit feature.
- Bot filtering (e.g., Botometer) is typically a prerequisite before profile-based analysis to avoid noise from automated accounts.
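The bot-filtering prerequisite can be illustrated with a simple threshold on precomputed scores. This is a sketch only: it assumes each account already has a bot-likelihood score in [0, 1] from some detection tool, and it does not call or reproduce Botometer's actual API. The function name `filter_likely_bots`, the 0.5 threshold, and the sample scores are hypothetical.

```python
def filter_likely_bots(users, bot_scores, threshold=0.5):
    """Keep users whose precomputed bot score is below the threshold.

    Assumption: users without a score are dropped, since they cannot be vetted.
    """
    return [u for u in users if u in bot_scores and bot_scores[u] < threshold]


# Hypothetical scores; "dave" has no score and is therefore excluded.
users = ["alice", "bob", "carol", "dave"]
bot_scores = {"alice": 0.10, "bob": 0.92, "carol": 0.35}

kept = filter_likely_bots(users, bot_scores)
```

Profile features and FR scores would then be computed only over `kept`. Whether to drop or retain unscored accounts is a design choice that depends on how much data loss the analysis can tolerate.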