Studying fake news propagation in social network data
The impact of the mass propagation of misinformation on social media became glaringly obvious during the Trump era. “Fake news” marries fiction with reason to make a seemingly persuasive argument: it triggers certain emotions, affirms a demographic’s biases, and preys on its insecurities. It is deceptive manipulation that appeals to ideals. These ideas may lend an air of credibility, but they can be dismantled by logical, critical scrutiny.
A reported 62% of American adults get their news primarily from social media, which makes it the primary platform for the spread of misinformation. A social network can be represented as a graph whose nodes are people or pages and whose edges are the connections between them.
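The graph view of a social network can be made concrete with a small sketch. The user names and connections below are purely illustrative, and a plain adjacency-list dictionary stands in for whatever graph representation a real system would use:

```python
# Minimal sketch: a social network as an undirected graph stored
# as an adjacency list (user -> set of connected users).
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency list from (user, user) pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph

# Hypothetical friend/follower connections
edges = [("alice", "bob"), ("bob", "carol"),
         ("carol", "dave"), ("alice", "carol")]
graph = build_graph(edges)
print(sorted(graph["carol"]))  # ['alice', 'bob', 'dave']
```

Each edge is stored in both directions, so a node's entry lists everyone directly connected to it.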
A machine learning (ML) approach, TraceMiner, is proposed to classify messages as fake or real by studying how they diffuse through the social network. The data is high-dimensional and sparse: connections between people are complex, and only a small fraction of users spread any given message. TraceMiner addresses this by considering the proximity of nodes and their social dimensions.
The social network graph can be used to generate representative embedding vectors, which perform well in classification tasks such as labelling messages as fake or real. Proximities, or degrees of connection, are important features when representing social networks. Taking Facebook as an example, first-order and second-order proximities correspond to friends and friends of friends, respectively. The intuition is that users connected to each other tend to have similar interests, as do users with mutual friends. Though this may not always hold, such community structures form important features for the ML model.
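First- and second-order proximity can be illustrated directly on an adjacency list. This is a toy sketch with made-up users, not TraceMiner's actual feature extraction: first-order proximity is the set of direct neighbours, and second-order proximity is the set of nodes reachable only through a mutual friend:

```python
# Sketch: first- and second-order proximity on a toy graph.
# First-order = direct neighbours; second-order = neighbours of
# neighbours (mutual-friend structure), excluding direct ones.

def first_order(graph, node):
    return set(graph.get(node, ()))

def second_order(graph, node):
    direct = first_order(graph, node)
    indirect = set()
    for friend in direct:
        indirect |= set(graph.get(friend, ()))
    return indirect - direct - {node}

graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice"},
    "dave": {"bob"},
}
print(first_order(graph, "alice"))   # {'bob', 'carol'}
print(second_order(graph, "alice"))  # {'dave'}
```

Here `dave` shares a mutual friend (`bob`) with `alice`, so he sits in her second-order neighbourhood even though they are not directly connected.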
Since the same misinformation message can propagate down the edges of different disconnected subgraphs within the social network, Long Short-Term Memory recurrent neural networks (LSTM-RNNs) are suggested to capture the relations between far-apart subcommunities. These neural networks are computational models, loosely inspired by the brain, that learn from sequential data at scale and are used here to evaluate messages.
The proximity between nodes is captured by random walks within the graph, where each walk samples a random sequence of hops starting from a given node. Under the DeepWalk algorithm, nodes that frequently co-occur in walks retain their similarity when encoded into a low-dimensional space. This encoding helps alleviate the data sparsity that comes from using individual social media users as features.
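The walk-sampling step of DeepWalk can be sketched in a few lines. The graph and parameters below are toy values; in the full algorithm the resulting walks are then fed, like sentences, into a Word2Vec-style skip-gram model to learn the embeddings, a step omitted here:

```python
# Sketch: DeepWalk-style truncated random walks over a toy graph.
import random

def random_walk(graph, start, walk_length, rng):
    """Sample one truncated random walk starting at `start`."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = sorted(graph[walk[-1]])
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

def generate_walks(graph, walks_per_node, walk_length, seed=0):
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for node in sorted(graph):
            walks.append(random_walk(graph, node, walk_length, rng))
    return walks

graph = {"a": {"b", "c"}, "b": {"a"}, "c": {"a", "d"}, "d": {"c"}}
walks = generate_walks(graph, walks_per_node=2, walk_length=5)
# Nodes that co-occur often across these walks end up with similar
# low-dimensional embeddings after skip-gram training.
```

Because every hop follows an existing edge, nodes from the same community dominate each other's walks, which is exactly the co-occurrence signal the embedding step exploits.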
Next, the user sequences of information diffusion are represented, using the order in which a social media message spreads through the network. The previously mentioned LSTM-RNNs model these sequences and hence help classify propagation pathways.
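The sequence-modelling step can be sketched as a single LSTM forward pass over a diffusion trace. Everything below is illustrative: the gate parameterization is the standard LSTM formulation rather than TraceMiner's exact architecture, the weights are random rather than trained, and the input is a made-up trace of six spreaders, each represented by an 8-dimensional embedding vector:

```python
import numpy as np

def lstm_forward(xs, Wx, Wh, b):
    """Run a single-layer LSTM over a sequence of input vectors.

    xs: (T, d_in) sequence of user-embedding vectors
    Wx: (d_in, 4*h), Wh: (h, 4*h), b: (4*h,) stacked gate parameters
    Returns the final hidden state (h,), a fixed-size trace summary.
    """
    h_dim = Wh.shape[0]
    h = np.zeros(h_dim)
    c = np.zeros(h_dim)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for x in xs:
        gates = x @ Wx + h @ Wh + b
        i, f, o, g = np.split(gates, 4)          # input, forget, output, cell
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                        # update cell state
        h = o * np.tanh(c)                       # update hidden state
    return h

# Toy trace: 6 spreaders, each an 8-d node embedding (random stand-ins)
rng = np.random.default_rng(0)
T, d_in, h_dim = 6, 8, 4
xs = rng.normal(size=(T, d_in))
Wx = rng.normal(size=(d_in, 4 * h_dim)) * 0.1
Wh = rng.normal(size=(h_dim, 4 * h_dim)) * 0.1
b = np.zeros(4 * h_dim)

summary = lstm_forward(xs, Wx, Wh, b)
# A logistic layer on `summary` would give P(fake); weights are random here.
p_fake = 1.0 / (1.0 + np.exp(-summary @ rng.normal(size=h_dim)))
```

The key property is that a variable-length trace of spreaders is compressed into one fixed-size vector, which a downstream classifier can then label as fake or real.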
An important finding is that the spreaders of a message can be used to predict its category. The algorithm classifies messages as fake or real even in the absence of content information, because it exploits learned network structure; this makes it helpful for detecting fake news early in its spread. TraceMiner reaches roughly 85% accuracy, outperforming content-based classification approaches.
Misinformation (“fake news”) is more likely to be spread from similar sources in similar sequences to similar people. TraceMiner capitalizes on this idea by factoring in the sequences of diffusion. The content of fake news is less descriptive, and its intentional spreaders manipulate the content to make it look more similar to non-rumor information. TraceMiner addresses these issues by being largely content-independent.
This article is a summary of that paper.