Quantifying Controversy on Social Media

Authors: Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, Michael Mathioudakis

Venue: ACM Transactions on Social Computing, Vol. 1, No. 1, January 2017 — DOI — arXiv

TL;DR

This work presents a graph-based pipeline to measure controversy in social media discussions. The authors build conversation graphs from Twitter interactions (retweets, follows, mentions), partition them to identify opposing sides, and apply multiple controversy measures—with a random-walk-based metric (RWC) emerging as the most reliable discriminator between controversial and non-controversial topics.

Contributions

A three-stage framework: conversation-graph construction, graph partitioning via METIS, and controversy measurement.
Multiple controversy quantification measures including random-walk-based, betweenness centrality, embedding-based, boundary connectivity, and dipole moment metrics.
Extensive empirical validation on Twitter topics (e.g., #beefban, #netanyahu, #russia_march) and external datasets from prior work (political blogs, election discussions).
Demonstration that random-walk-based measures outperform existing baselines and generalize across domains.
Analysis of how controversy scores evolve in response to high-impact events.

Method

The pipeline operates in three stages:

Graph Building. For a given topic (specified by hashtags or keywords), the authors construct conversation graphs from Twitter activity in multiple variants:

- Retweet graph: Edges represent retweets; two users are connected if one retweets the other at least τ=2 times on the topic.
- Follow graph: Edges represent follower relationships among users discussing the topic.
- Mention graph: Users are connected if they mention each other in posts about the topic.
- Content graph: Users are linked if they use the same hashtags or URLs.
- Hybrid approaches: Combinations of the above.
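As a rough illustration of the retweet-graph variant, the thresholding step can be sketched in a few lines of Python. The function name `build_retweet_graph`, the `(retweeter, author)` input format, and the toy data are hypothetical. Pooling retweets in both directions before applying τ is an assumption of this sketch; a stricter reading of the rule would count each direction separately.

```python
from collections import Counter


def build_retweet_graph(retweets, tau=2):
    """Return an undirected adjacency map {user: set(neighbors)}.

    retweets: iterable of (retweeter, original_author) pairs, one per retweet
    on the topic. An edge u-v is kept only if at least `tau` retweets occurred
    between u and v (assumption: both directions are pooled).
    """
    pair_counts = Counter()
    for retweeter, author in retweets:
        if retweeter == author:
            continue  # ignore self-retweets
        pair_counts[frozenset((retweeter, author))] += 1

    graph = {}
    for pair, count in pair_counts.items():
        if count >= tau:
            u, v = tuple(pair)
            graph.setdefault(u, set()).add(v)
            graph.setdefault(v, set()).add(u)
    return graph


# Made-up toy input, for illustration only.
toy_retweets = [
    ("alice", "bob"), ("alice", "bob"),    # two retweets: edge kept
    ("carol", "bob"),                      # one retweet: below the threshold
    ("dave", "carol"), ("carol", "dave"),  # two retweets across both directions
]
graph = build_retweet_graph(toy_retweets, tau=2)
```

Raising `tau` trades coverage for cleaner edges: fewer users remain in the graph, but each remaining edge reflects repeated endorsement rather than a one-off retweet.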

Graph Partitioning. The conversation graph is partitioned into two clusters using METIS, a multilevel graph-partitioning algorithm that coarsens the graph, partitions the coarse version, and refines the result so that the two parts stay balanced and the number of edges cut between them stays small. This step identifies two potential "sides" of the controversy. Visualization with a force-directed layout (Gephi) reveals a clear two-cluster structure in controversial topics; non-controversial topics often form a single dominant component.
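To make the partitioning objective concrete, here is a minimal sketch of a balanced two-way split that reduces the edge cut. It is not METIS: it swaps in a much simpler Kernighan-Lin-style greedy pair-swap refinement, which can get stuck in local optima on larger graphs. The names `bisect` and `cut_size` and the toy graph are hypothetical.

```python
def cut_size(graph, side):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for u in graph for v in graph[u] if side[u] != side[v]) // 2


def bisect(graph):
    """Split nodes into two equal-size sides by greedy pairwise swaps.

    Stand-in for a real partitioner such as METIS (assumption of this sketch).
    """
    nodes = sorted(graph)
    # Arbitrary balanced starting point: alternate nodes between sides 0 and 1.
    side = {node: i % 2 for i, node in enumerate(nodes)}

    def d_value(node):
        # External degree minus internal degree under the current split.
        ext = sum(1 for nb in graph[node] if side[nb] != side[node])
        return ext - (len(graph[node]) - ext)

    while True:
        best_gain, best_pair = 0, None
        for u in (n for n in nodes if side[n] == 0):
            for v in (n for n in nodes if side[n] == 1):
                shared = 1 if v in graph[u] else 0
                # Reduction in cut size if u and v trade sides.
                gain = d_value(u) + d_value(v) - 2 * shared
                if gain > best_gain:
                    best_gain, best_pair = gain, (u, v)
        if best_pair is None:
            return side  # no swap improves the cut any further
        u, v = best_pair
        side[u], side[v] = 1, 0


# Toy graph: two triangles joined by the single bridge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"), ("c", "d")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

side = bisect(graph)
```

On this toy graph the refinement separates the two triangles and leaves only the bridge edge in the cut.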

Controversy Measures. Six main measures are proposed and evaluated:

Random Walk Controversy (RWC): Based on the intuition that in a polarized discussion, a random walk starting from one side is less likely to cross to the other. RWC compares the probability that a random walk ends in the partition where it started with the probability that it ends in the opposite partition. The metric takes values in [−1, 1], with values close to 1 indicating controversy.
Betweenness Centrality Controversy (BCC): Uses edge betweenness on the cut between partitions. When two partitions are well separated, edges crossing the cut have high betweenness centrality; BCC measures how far the betweenness distribution of the cut edges diverges from that of the remaining edges.
Embedding Controversy (EC): Computes a low-dimensional embedding of the graph (via Gephi's ForceAtlas2) and measures the average inter-partition distance relative to intra-partition distances.
Boundary Connectivity (GMCK): Looks at boundary vertices (those with an edge to the other side) and compares their connections to internal vertices with their connections across the boundary; high controversy corresponds to few boundary connections.
Dipole Moment (MBLB): Inspired by the physics concept of dipole moments; assigns polarity values to high-degree vertices (top 5%) and computes the average signed polarization difference between partitions.
Cut-based measures: Simple conductance-like ratios of edges crossing the partition boundary.
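The random-walk intuition behind RWC can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: walks are assumed to end at the `k` highest-degree vertices ("hubs") of either side, to start uniformly from the non-hub vertices of a side, and the score is assumed to take the form RWC = P_XX·P_YY − P_XY·P_YX, where P_AB is the probability that a walk started in A given that it ended in B (equal prior on the starting side). Instead of simulating walks, the sketch computes absorption probabilities by repeated averaging. The names `top_hubs`, `rwc`, `k`, and `sweeps` are hypothetical, and the graph is assumed to be connected.

```python
def top_hubs(graph, members, k):
    """The k highest-degree vertices among `members` (ties broken by name)."""
    return set(sorted(members, key=lambda n: (-len(graph[n]), n))[:k])


def rwc(graph, side, k=1, sweeps=1000):
    """Random-walk controversy sketch for a connected graph split into sides 0/1."""
    xs = {n for n in graph if side[n] == 0}
    ys = set(graph) - xs
    hubs_x, hubs_y = top_hubs(graph, xs, k), top_hubs(graph, ys, k)

    # hit_x[n]: probability that a walk from n reaches a hub of X before a hub of Y.
    hit_x = {n: 0.5 for n in graph}
    hit_x.update({n: 1.0 for n in hubs_x})
    hit_x.update({n: 0.0 for n in hubs_y})
    transient = [n for n in graph if n not in hubs_x and n not in hubs_y]
    for _ in range(sweeps):
        for n in transient:
            # A walk moves to a uniformly random neighbor at each step.
            hit_x[n] = sum(hit_x[nb] for nb in graph[n]) / len(graph[n])

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    # Pr[end in X | start in X] and Pr[end in X | start in Y].
    end_x_from_x = mean(hit_x[n] for n in xs - hubs_x)
    end_x_from_y = mean(hit_x[n] for n in ys - hubs_y)
    end_y_from_x, end_y_from_y = 1 - end_x_from_x, 1 - end_x_from_y

    # Bayes' rule with equal prior on the starting side: Pr[start in A | end in B].
    p_xx = end_x_from_x / (end_x_from_x + end_x_from_y)
    p_yx = end_x_from_y / (end_x_from_x + end_x_from_y)
    p_yy = end_y_from_y / (end_y_from_x + end_y_from_y)
    p_xy = end_y_from_x / (end_y_from_x + end_y_from_y)
    return p_xx * p_yy - p_xy * p_yx


# Toy graph: two triangles joined by the single bridge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"), ("c", "d")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)
side = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

score = rwc(graph, side)
```

In the toy graph every walk is absorbed by the hub on its own side, so the score sits at the top of the range. On a graph where the split is arbitrary (for example a complete graph), crossing is as likely as staying and the score falls to zero.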

Additionally, the authors test content-based approaches (bag-of-words, sentiment analysis) but find them less reliable.

Results

Twitter Topics. On 20 hand-curated controversial and non-controversial Twitter hashtags:

- RWC achieves strong separation between the controversial and non-controversial groups, and correlates with human-annotated controversy scores at Pearson r = 0.51.
- BCC and EC also perform well; the GMCK and MBLB measures fail to discriminate reliably.
- Content-only approaches (bag-of-words, sentiment) show no significant separation.

External Datasets. The measures are validated on six external datasets from prior work (political blogs, Twitter politics, Brazilian soccer, gun control, Facebook university, NYC teams):

- RWC shows Pearson correlation ≥ 0.34 across datasets where ground truth is known.
- BCC and EC correlate well with RWC; simpler measures diverge.
- All measures correctly identify controversial topics as more controversial than non-controversial ones.

Temporal Evolution. Analysis of controversy scores on 56 retweet graphs from Morales et al.'s Venezuela dataset (tracking the death of Hugo Chavez, Feb–May 2013):

- RWC and EC track the evolution of controversy over time.
- Both measures show a sharp drop on the day of the death ("D"), reflecting users' shift to a unified response, followed by rising controversy as political debate resumed.
- GMCK remains nearly constant, indicating it lacks temporal sensitivity.

Synthetic Data. On random Erdős–Rényi graphs with two planted communities, RWC increases monotonically with community separation (intra-community edge probability p₁) and decreases with inter-community edge density (p₂), validating the expected behavior.
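A synthetic graph of this kind can be generated with a short planted-partition sketch. The generator `planted_partition`, its parameters, and the `cross_fraction` helper are hypothetical names; the cross-edge fraction is only a quick sanity check on the generated structure and is not RWC or any other measure from the paper.

```python
import random


def planted_partition(n_per_side, p1, p2, seed=0):
    """Random graph with two planted communities of equal size.

    p1: edge probability inside a community; p2: edge probability across.
    Returns (graph, side) with graph as {node: set(neighbors)}.
    """
    rng = random.Random(seed)
    nodes = range(2 * n_per_side)
    side = {n: 0 if n < n_per_side else 1 for n in nodes}
    graph = {n: set() for n in nodes}
    for u in nodes:
        for v in range(u + 1, 2 * n_per_side):
            p = p1 if side[u] == side[v] else p2
            if rng.random() < p:
                graph[u].add(v)
                graph[v].add(u)
    return graph, side


def cross_fraction(graph, side):
    """Fraction of edges that connect the two communities."""
    total = sum(len(nbrs) for nbrs in graph.values()) // 2
    cross = sum(1 for u in graph for v in graph[u] if side[u] != side[v]) // 2
    return cross / total if total else 0.0


# Sweep the inter-community probability while keeping p1 fixed.
for p2 in (0.0, 0.05, 0.2):
    g, s = planted_partition(50, p1=0.3, p2=p2)
    print(p2, round(cross_fraction(g, s), 3))
```

Feeding graphs from such a sweep into a controversy measure is one way to check that the score moves in the expected direction as the two communities become more or less separated.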

Connections

Related to polarization studies on Twitter via shared focus on network structure and opposing views.
Builds on earlier work on modularity and graph partitioning but extends beyond binary political graphs.
Precedes and informs research on exposure to opposing views in algorithmic recommendation.
Provides methods for identifying echo chambers and filter bubbles via conversation-graph analysis.
Referenced in later work on characterizing online echo chambers.

Notes

Strengths:

- Comprehensive empirical evaluation across multiple datasets and domains.
- Proposes a novel random-walk-based metric that outperforms baselines.
- Methods are general and domain-agnostic, applicable to any social media platform with interaction data.
- Gracefully handles the subjectivity of "controversy" through graph structure rather than content.
- Includes both static and temporal analysis.

Limitations:

- Focuses primarily on Twitter due to data availability; generalization to other platforms is assumed but not fully validated.
- Graph partitioning relies on METIS, which produces a hard two-way split; multiway controversies (3+ sides) are not addressed.
- Human annotation of controversy (the "ground truth" for validation) is subjective and limited to 20 topics for the main experiment.
- Content-based methods are dismissed early; hybrid approaches deserve deeper exploration.
- Random-walk-based measures assume two well-defined sides; performance may degrade in multi-faceted or fuzzy controversies.

Open questions:

- How do these measures perform on non-binary controversies (e.g., abortion, with religious, secular, and pragmatic viewpoints)?
- Can temporal dynamics of controversy (growth, decay, shifts) be used predictively to identify emerging controversies?
- How sensitive are the measures to graph-building parameters (e.g., retweet threshold τ, user-selection criteria)?