Sunday, October 21, 2018

How Russian Twitter Trolls behave and attempt to influence mainstream media


Background:
Foreign actors, both independent and state-sponsored, are increasingly using the Internet as a tool of subversion to foment distrust in American institutions. The January 2017 Director of National Intelligence report asserts the U.S. intelligence community’s confidence that Russian President Vladimir Putin ordered an influence campaign in 2016 intended to undermine public faith in the U.S. democratic process.[i] Much of this influence campaign took place on social media platforms, leading to congressional inquiries and requests for increased transparency. In response to those requests, Twitter has released a substantial dataset related to this alleged foreign interference in political conversations on the platform, in an effort to improve, through research, our ability to detect, understand, and neutralize disinformation campaigns as quickly and robustly as technically possible.

As researcher Jonathan Albright notes, “The state of the modern information ecosystem — including our dilemma addressing misinformation, propaganda, security, and hate speech — are manifestations of intrinsic misalignments within an increasingly hybridized global information system.”[ii] As we work to find solutions to these new challenges, we must better understand the movements of the content that comprises influence operations and the behaviors of those who propagate it.

Of particular interest are those cases in which Russian trolls successfully infiltrated U.S. news and established direct engagement with American Twitter users. One such account, with the username @cassishere, posted a photo of a Putin banner on the Manhattan Bridge that earned a photo credit in the New York Daily News.[iii] There are also allegations that Michael Flynn followed Russian troll accounts and pushed their messages in the days leading up to the 2016 election.[iv] Twitter itself informed 1.4 million users that they had interacted with Russian trolls.[v] By looking at the behaviors of these accounts in closer detail, we can determine whether patterns exist that might help us defend against foreign influence operations in the future.

Two researchers at Clemson University have already begun to analyze the data. To date, they have divided the Russian Twitter trolls into five distinct categories: Right Troll, Left Troll, News Feed, Hashtag Gamer, and Fearmonger. My SNA will build on their research.

Research Question: How do identified Russian trolls engage with mainstream U.S. media accounts on Twitter?
-       Sub-question: Based on an identified number of cases (TBD[vi]) in which trolls successfully infiltrated mainstream news coverage (i.e., were quoted in a news article), do “successful” trolls exhibit unique characteristics within the troll network?
-       Sub-question [based on existing datasets or previous SNA studies]: Does the Russian Twitter troll network mirror mainstream news media Twitter networks, “normal” Twitter user networks, and/or known terrorist networks on Twitter?

Why SNA?
Social network analysis is uniquely positioned to begin exploring new questions about influence operations via social media. While this is an area of increasing academic and political inquiry, there is very little established literature due to the contemporary nature of the topic and the difficulty of studying it. While there are many limitations to any political science research about trust, influence, and behavior (see below), we can use SNA to identify patterns in the behavior of these “troll” accounts. Such patterns can help social media companies decide how to handle such accounts on their platforms, and can help the wider public learn to defend against political manipulation online.

Hypothesis:
I predict that Russian trolls engaged directly with mainstream news media accounts on a regular basis in order to push content to those entities. However, I believe the most successful trolls worked through third-party actors (“real” Twitter users) in order to erase their trace and make the information appear more believable. I believe the Russian trolls’ ultimate goal is to sway the beliefs of large portions of the American population by having their message picked up by trusted mainstream news channels. For the “successful” cases, I hypothesize that the strategies used varied widely, a deliberate attempt by Russian trolls to obfuscate and avoid pattern recognition. I also predict that the “successful” cases pushed content that was more closely aligned with public opinion or other sources, suggesting that trolls are more likely to succeed when they stick closer to reality. Finally, I hypothesize that Russian Twitter troll networks are structurally distinct from known networks (of mainstream media, “normal” Twitter users, or even terrorists) in their patterns of interaction (both mentions and retweets). I believe this comparison would give us the most useful insight into potential ways to identify troll accounts in the future.

Methodology:
For my SNA, I plan to identify a list of mainstream news media Twitter accounts, based on the news organizations Americans most commonly turn to for their news.[vii] I plan to use my larger dataset to identify “levels” of interaction: general tweets (perhaps filtered by certain keywords, yet to be determined), “mentions” that include either another identified Russian troll or one of the established mainstream news accounts I have identified, and “retweets” of any of the aforementioned users’ tweets. I will also examine the cohesion measures of the resulting network itself.
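The levels of interaction described above could be extracted from raw tweet text with a simple classifier. The sketch below is illustrative only: the handle sets and the `classify_tweet` helper are my own assumptions about how the filtering might work, not part of any established pipeline, and the real analysis would load the full handle lists rather than these placeholder examples.

```python
import re

# Assumed handle sets for illustration; in practice these would be loaded
# from the Clemson troll-handle list and the Pew-derived media account list.
TROLL_HANDLES = {"cassishere"}
MEDIA_HANDLES = {"nytimes", "cnn", "foxnews"}

MENTION_RE = re.compile(r"@(\w+)")

def classify_tweet(text):
    """Return (level, targets): level is 'retweet', 'mention', or 'general'."""
    mentions = [m.lower() for m in MENTION_RE.findall(text)]
    targets = [m for m in mentions if m in TROLL_HANDLES or m in MEDIA_HANDLES]
    if text.startswith("RT @") and targets:
        return "retweet", targets   # retweet of a troll or media account
    if targets:
        return "mention", targets   # direct mention of a troll or media account
    return "general", []            # ordinary tweet, possibly keyword-filtered later
```

Each (troll, target) pair returned here would become a directed edge in the interaction network, weighted by interaction level.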

I also plan to examine the ego networks of a few specific Twitter handles that prior research and investigative reporting have identified as having “successfully” infiltrated U.S. news or interacted with public figures (who are assumed to have wider influence).
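An ego network of this kind is straightforward to extract once the interaction graph is built; networkx provides this directly. The edges below are invented placeholders to show the mechanics, not observations from the dataset.

```python
import networkx as nx

# Toy directed interaction graph; edge direction runs from the account that
# acted (mentioned/retweeted) to the account it targeted. Edges are invented.
G = nx.DiGraph()
G.add_edges_from([
    ("cassishere", "nydailynews"),   # troll -> media outlet
    ("otheruser", "cassishere"),     # "real" user -> troll
    ("cassishere", "jenn_abrams"),   # troll -> troll
])

# One-step ego network around the handle, ignoring edge direction so that
# both who the troll reached and who amplified the troll are included.
ego = nx.ego_graph(G, "cassishere", radius=1, undirected=True)
print(sorted(ego.nodes()))
```

Comparing centrality measures inside these ego networks against the troll network at large is one way to test whether “successful” trolls are structurally unusual.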

Data Collection:
The initial dataset is publicly available on GitHub thanks to Clemson University researchers Darren Linvill and Patrick Warren. Linvill and Warren gathered this data using custom searches on Social Studio, a tool owned by Salesforce and contracted by Clemson’s Social Media Listening Center. The directory contains nearly 3 million tweets from Twitter handles that were found to be connected to the Internet Research Agency, a Russian “troll factory” that was implicated in special counsel Robert Mueller’s February 2018 indictment. Twitter provided Congress with 2,752 handles that were connected to the IRA in November 2017, and added an additional 946 handles in June 2018 (at which point they also removed 19 handles from the original list).[viii] The majority of the tweets in this data set were posted between 2015 and 2017, though I may limit this timeframe as necessary upon further review of the data. The full data file includes 2,973,371 tweets from 2,848 Twitter handles.
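Loading the published CSV files is a one-step operation in pandas. The column names below (`author`, `account_category`) are assumptions based on the published files, and a tiny inline sample stands in for the real data here; the actual analysis would concatenate all of the released CSVs.

```python
import io
import pandas as pd

# Inline stand-in for the released troll-tweet CSVs (column names assumed).
sample = io.StringIO(
    "author,publish_date,content,account_category\n"
    "CASSISHERE,10/1/2016,some text,Right Troll\n"
    "JENN_ABRAMS,11/3/2016,other text,Right Troll\n"
    "CASSISHERE,11/4/2016,more text,Right Troll\n"
)
tweets = pd.read_csv(sample)

# With the real files, something like:
#   pd.concat(pd.read_csv(f) for f in glob.glob("IRAhandle_tweets_*.csv"))
print(tweets["author"].nunique())   # count of distinct handles in the sample
```

On the full dataset the same `nunique()` call should recover a handle count close to the 2,848 reported above, a quick sanity check that the files loaded completely.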

In an exciting update, on Wednesday, October 17, 2018, Twitter released an even more substantial archive of Tweets and media that “resulted from potentially state-backed information operations” on the platform. The dataset includes information from 3,841 accounts believed to be connected to the Russian Internet Research Agency and 770 accounts believed to originate in Iran. It includes all public, non-deleted Tweets and media (e.g., images and videos) from accounts believed to be connected to state-backed information operations: more than 10 million Tweets and more than 2 million images, GIFs, videos, and Periscope broadcasts.

Limitations:
In Twitter’s recent release, some account-specific information is hashed in the dataset for accounts with fewer than 5,000 followers in order to protect user privacy. I do not expect this to affect my SNA, since I do not plan to analyze normal user accounts other than the Russian trolls themselves. Also missing from this dataset is the reciprocal Twitter data of mainstream media accounts, which would provide crucial evidence as to how exactly news organizations engaged, if at all, with Russian trolls on Twitter. Without this piece, influence can only be inferred from the limited data available.

Most importantly, the impact of this information is still largely unknown, and incredibly difficult to measure. Despite knowing that these Russian troll accounts existed (and likely still exist, in different forms, today), we do not and cannot know if and to what extent their presence influenced American beliefs, much less American voter behavior in the 2016 election. For that reason—and to limit the scope of this project—I will not look at user interaction with these Russian trolls. Rather, I will focus on the behaviors of Russian trolls and their interactions with mainstream news media. This analysis will be admittedly one-way in nature, but I can use external sources to corroborate my findings through network analysis.

One area of future research would be to track, in real time (using NodeXL), the behavior patterns of other networks of Twitter users (e.g., politicians, news media accounts, or average users engaging with a specific trending topic) surrounding specific events, namely the upcoming 2018 midterm elections. Since Twitter data is constantly changing and users are able to delete tweets, the best way to analyze it is through Twitter’s API in real time. Even this strategy is limited, though, because it only captures public accounts (not those whose owners make them private). However, public, “verified” accounts are a good proxy for potential influence due to their high follower counts. Another area of future research would be to assess the traction of the Russian trolls identified as influential by searching Media Cloud, an open-source tool for aggregating, indexing, and analyzing online information sources, for references to each username.

I am taking the second module of this course.



[i] Office of the Director of National Intelligence, Intelligence Community Assessment, Assessing Russian Activities and Intentions in U.S. Elections, January 6, 2017.
[ii] Albright, Jonathan. “Web no.point.0: rise of the splintr.net,” Medium, 17 October 2018. <https://medium.com/tow-center/web-no-point-0-rise-of-the-splintr-net-d45869aa1b8>
[iii] Shane, Scott and Mark Mazzetti. “The Plot to Subvert an Election: Unraveling the Russia Story So Far,” New York Times, 20 September 2018 <https://www.nytimes.com/interactive/2018/09/20/us/politics/russia-interference-election-trump-clinton.html>
[iv] Collins, Ben and Kevin Poulsen. “Michael Flynn Followed Russian Troll Accounts, Pushed Their Messages in Days Before Election,” Daily Beast, 1 November 2017 <https://www.thedailybeast.com/michael-flynn-followed-russian-troll-accounts-pushed-their-messages-in-days-before-election>
[v] https://blog.twitter.com/official/en_us/topics/company/2018/2016-election-update.html
[vi] Shane, Scott and Mark Mazzetti. “The Plot to Subvert an Election: Unraveling the Russia Story So Far,” New York Times, 20 September 2018 <https://www.nytimes.com/interactive/2018/09/20/us/politics/russia-interference-election-trump-clinton.html>
[vii] I plan to use the list of sources used by Pew Research Center in its 2014 study of Political Polarization and Media Habits: <http://www.pewresearch.org/wp-content/uploads/sites/8/2014/10/Political-Polarization-and-Media-Habits-FINAL-REPORT-7-27-15.pdf>
[viii] https://democrats-intelligence.house.gov/news/documentsingle.aspx?DocumentID=396

1 comment:

Christopher Tunnard said...

There's a lot to unpack from what you've written, but perhaps the best place to start is with your main Q. It's a How Q, which is fine, as you will have accepted (and have us, the readers, accept) that engagement is a given. The problem is that you then discuss a bunch of ways that you can go about visualizing and analyzing the problem, and it becomes difficult to understand how they combine to inductively address your How Q.

If you haven't already, I suggest you get together with Arik B. as soon as you can to see what's feasible, given the large amount of data from the Clemson dump and the October 17 release.