Thursday, October 19, 2017

A Light in the Dark: Using Social Network Analysis to Analyze Dark Web Groups

Background
The Internet has amplified the ability of individuals to connect across the globe.  As Internet technology proliferated, criminal organizations, terrorist networks, and other threat actors recognized its usefulness and began exploiting it; in effect, criminal and terrorist organizations became “globalized.”  One of the Internet's most useful benefits to such actors is the anonymity it provides, and users of TOR (The Onion Router) software can increase their anonymity still further.

TOR anonymizes the browsing habits of its users by encrypting their network traffic in layers and relaying it through a randomly selected path of TOR nodes before it reaches the desired server.  Users also benefit from a type of “herd immunity” (i.e., the more TOR users there are in a given area, the more difficult it is to “de-anonymize” any one of them).  Perhaps more interesting, TOR grants access to hidden parts of the Deep Web, the websites that search engines cannot find.  Deeper still is the Dark Web, a subsection of the Deep Web that is intentionally hidden and hosts illegal or illicit activity.

Malicious actors (e.g., criminal organizations and terrorist/extremist organizations) frequent the Dark Web to purchase illicit goods as well as to communicate.  This communication occurs in forums and Internet Relay Chat (IRC) channels.  IRC channels frequently require specific knowledge to find and passwords to enter; forums, by contrast, are often open to anyone who can find them, because such forums are necessary to spread the various “messages” of the groups present.

Following the process that Elizabeth Philips, Jason Nurse, Michael Goldsmith, and Sadie Creese laid out in their paper, “Applying Social Network Analysis to Security,” this analysis will explore how social network analysis techniques can provide insights into Dark Web networks.  While previous studies have analyzed Dark Web forums qualitatively or used relatively small datasets, this study will use a very large dataset that spans over a decade of collection across multiple forums.

TOR offers anonymity, but that anonymity is not absolute.  Numerous techniques exist to de-anonymize TOR users (e.g., monitoring exit nodes), but law enforcement and intelligence agencies do not have the resources to de-anonymize every potential actor.  Social network analysis provides a tool to focus the efforts of such agencies on the individuals whose removal would most disrupt extremist networks.

Research Question
I will conduct a social network analysis of Dark Web forum message and posting metadata using a dataset compiled from various English-language Dark Web extremist forums.  Each source dataset spans a different number of years and contains a different number of members and postings, but I have compiled them into a single dataset that spans 13 years and contains over 2.5 million unique posts/messages.  I want to analyze two different aspects of the network:

1.     Can groups and leaders (i.e., hierarchy) be predicted or discovered based solely on metadata?
a.     If leaders can be discovered, how connected are those leaders?  Are they cross-forum, or are they leaders of only one forum?
b.     Are posters united, or do their beliefs and posts diverge?

2.     How much interaction is there between individuals and groups across the different forums?
a.     Do individuals or groups remain on a handful of forums, or do they spread across a wider network?
b.     How connected are the various groups?  Do Dark Web criminal organizations interact, or do they establish “turf”?
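
Question 2 can be made concrete with a small sketch.  The Python below is only an illustration and not the analysis itself: the member and forum names are invented, and it assumes that the same member name on two forums is the same person, which would need to be checked in the real data.  It counts how many forums each member posts on and how many members each pair of forums shares.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (member, forum) posting records; all names are illustrative only.
posts = [
    ("alice", "forum_a"),
    ("alice", "forum_b"),
    ("bob", "forum_a"),
    ("carol", "forum_b"),
    ("carol", "forum_c"),
    ("dave", "forum_c"),
]

def forums_per_member(records):
    """Map each member to the set of forums they have posted on."""
    membership = defaultdict(set)
    for member, forum in records:
        membership[member].add(forum)
    return dict(membership)

def shared_members(membership):
    """Count the members shared by each pair of forums."""
    overlap = defaultdict(int)
    for forums in membership.values():
        for pair in combinations(sorted(forums), 2):
            overlap[pair] += 1
    return dict(overlap)

membership = forums_per_member(posts)
overlap = shared_members(membership)
# Members active on more than one forum are candidate cross-forum actors.
cross_forum = sorted(m for m, f in membership.items() if len(f) > 1)
```

A pair of forums with a shared-member count of zero would be weak evidence for “turf,” while a high count would suggest the groups mix freely.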

Theory & Hypotheses
I expect to be able to determine significant amounts of individual-level information from friend groups and communication habits.  Furthermore, I expect to be able to determine the hierarchy of Dark Web posting groups using social network analysis techniques (e.g., distinguishing “broadcasters” from “sinks”).  I predict that leaders will remain on specific forums, but that lower-level individuals will act as “bridges” connecting forums and groups together.  Moreover, I predict that groups will distinguish themselves using specific language (i.e., slang) that denotes group membership.
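
One simple way to separate “broadcasters” from “sinks” is to compare each member's out-degree and in-degree in the directed network.  The sketch below is a hedged illustration in Python: the edge list is invented, and both the `degree_roles` helper and its `ratio` threshold are my own assumptions rather than an established definition.

```python
from collections import Counter

# Hypothetical directed edges (sender, receiver); names are illustrative only.
edges = [
    ("a", "b"), ("a", "c"), ("a", "d"),
    ("b", "d"), ("c", "d"), ("d", "a"),
]

def degree_roles(edge_list, ratio=2.0):
    """Label nodes by comparing out-degree with in-degree.

    ratio is an assumed cut-off: a node is a broadcaster when it sends at
    least `ratio` times as many ties as it receives, and a sink when the
    reverse holds.  Everything else is labeled "mixed".
    """
    out_deg = Counter(src for src, _ in edge_list)
    in_deg = Counter(dst for _, dst in edge_list)
    roles = {}
    for node in set(out_deg) | set(in_deg):
        sent, received = out_deg[node], in_deg[node]
        if sent > received and sent >= ratio * received:
            roles[node] = "broadcaster"
        elif received > sent and received >= ratio * sent:
            roles[node] = "sink"
        else:
            roles[node] = "mixed"
    return roles

roles = degree_roles(edges)
```

Degree counts are only a first pass; they say who talks and who is talked to, not who leads, so the labels would need to be checked against other measures.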

Data Collection
I will use a variety of datasets created by Arizona State University’s Artificial Intelligence Laboratory.  These datasets are compiled from various English-language Dark Web extremist forums (e.g., Islamic Awakening, Islamic Network, and Turn to Islam).  Each dataset extends over time: the shortest covers 2 years, while the longest covers 8 years.  Taken together, the dataset comprises seven different Dark Web forums, spans 13 years, and contains over 48,000 members and over 2.5 million posts.  Using UCINET, a software package for social network analysis, I will map and analyze the connections between posters, groups, and the forums themselves.  The social network data will be directed and one-mode, with nodes corresponding to forum members.  I intend to analyze content from 2000 to 2013 to track how the Dark Web community groups transformed over time.  Finally, I will identify sub-groups, as well as the leaders of those sub-groups, using the attribute data (i.e., metadata) contained within the dataset (e.g., post date and member name).
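
To show how post metadata could become a directed, one-mode network, here is a minimal Python sketch.  It is not the UCINET workflow described above, and it rests on assumptions: the `thread_id` field is hypothetical, and treating every reply as a tie from the replier to the thread starter is just one possible tie definition.  The optional `year` argument gives one slice of the network at a time, which is how change over time could be tracked.

```python
from collections import Counter
from datetime import date

# Hypothetical post metadata: (thread_id, member_name, post_date).
posts = [
    ("t1", "alice", date(2005, 3, 1)),
    ("t1", "bob", date(2005, 3, 2)),
    ("t1", "carol", date(2006, 1, 5)),
    ("t2", "bob", date(2006, 2, 1)),
    ("t2", "alice", date(2006, 2, 3)),
    ("t2", "alice", date(2006, 2, 4)),
]

def reply_edges(records, year=None):
    """Build weighted directed edges of the form replier -> thread starter.

    The earliest poster in a thread is treated as its starter (an assumption).
    If year is given, only replies posted in that year are counted.
    """
    starters = {}
    weights = Counter()
    for thread, member, posted in sorted(records, key=lambda r: r[2]):
        starter = starters.setdefault(thread, member)
        if member == starter:
            continue  # skip the opening post and self-replies
        if year is None or posted.year == year:
            weights[(member, starter)] += 1
    return dict(weights)

all_edges = reply_edges(posts)
edges_2006 = reply_edges(posts, year=2006)
```

The resulting weighted edge list could then be exported to whatever format the analysis software expects.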

Conclusion

The Dark Web has been a powerful tool that has connected criminal organizations, terrorist networks, and illicit actors worldwide.  Furthermore, given that the technology used to access the Dark Web (i.e., TOR) provides substantial, though not complete, anonymity, these actors can use the Dark Web to avoid surveillance while simultaneously conducting their business in “the open.”  If the metadata scraped from Dark Web forums can provide insight into the organizational structure and leadership behind these shadow groups, then the effectiveness of law enforcement and intelligence organizations will be multiplied.  Analyzing the social network surrounding these forums will hopefully allow us to infer how communication habits reflect hierarchies and social organization.

1 comment:

Christopher Tunnard said...

Very nice job, Mr. Dark Web. As discussed in class, it needs a Key Question to focus data collection and analysis, and you could also expand a bit more on which net measures you will use, and what results they might yield. I hope that you will actually do this work as all or part of a capstone, as there's some real value to be gained from it. Also, if your job interests lie in this direction, it would look very good in the Experience section of your CV.

BTW, if you are willing to share the data set that you compiled, I'm sure others would be interested.