Thursday, December 12, 2013

Data visualization of Wikipedia edits

This isn't exactly social network analysis, but I thought our class of data visualizers might appreciate this:

Friday, December 6, 2013

Network Analysis Showing Increasing Polarization of American Congress

I came across a very interesting article with a set of network maps of the 101st (1989 session), 107th (2002) and 113th (2013) Congresses. From the article published in the Economist:

"The network maps shown here look at the degree to which senators vote the same way. Each node is a senator. Links represent instances when senators have voted similarly on substantive legislation on at least 100 occasions during the same congressional session. Their placement is determined algorithmically, based on their co-operation with other legislators—which has the effect of pushing more bipartisan ones to the centre."

The conclusion of the analysis, done by a computer science undergrad at Harvard, is not surprising: Congress and American policymaking have become much more polarized over the past two decades.

Wednesday, December 4, 2013

"How Ukrainian protestors are using Twitter and Facebook"

"Taken together, our data suggests that Ukrainian social media users are strategically using the tools available to them in the ways that seem most effective. The disparity in language use between Facebook and Twitter suggests an understanding on the part of users about the audiences consuming the content they produce in each medium. The spike in Twitter use is, to our knowledge, a previously unobserved phenomenon. It suggests a reciprocal relationship between social media and protest, where social media can serve as an important strategic tool for protest, and at the same time attract new users to online communication platforms."

Facebook activity on public pages related to Ukrainian protests [Data: NYU Social Media and Political Participation lab; Figure: Pablo Barberá and Megan Metzger]

Ukrainian protests tweets by language [Data: NYU Social Media and Political Participation lab; Figure: Pablo Barberá and Megan Metzger]

Ukrainian protests and creation of new Twitter accounts [Data: NYU Social Media and Political Participation lab; Figure: Pablo Barberá and Megan Metzger]

Tuesday, November 19, 2013

Is the U.S. Isolated mid-(Asia) Pivot? Network Showing Countries' TPP Negotiation Positions

"Political scientists often talk about dyads, by which we simply mean groups of two. In this case, a dyad refers to a pair of TPP countries. If we count up every instance that the United States appears in the same marker as, say, Australia, we can say that the U.S.-Australia dyad occurs with a certain frequency. If we did this for every possible dyad, we could compare the frequency of dyads and get a sense of how often countries’ negotiating positions overlap. The following chart displays the frequency of every possible dyad among the 12 TPP countries. For example, the U.S.-Australia dyad (AU-US) appears 83 times in the leaked text, and is the 43rd most frequent dyad. Note that the order doesn’t matter: a U.S.-Australia dyad is the same as an Australia-U.S. dyad."

Friday, November 15, 2013

Social Networks as a predictor for criminal behavior in Chicago

Chicago PD using SNA for murder forecasting

While Yale sociologists find social network patterns among victims of fatal shootings, the Chicago PD uses SNA to generate a "heat list" of likely shooters to reach out to. Fascinating and a bit troubling.

Sunday, November 10, 2013

Social Networks Critical to Understanding the Spread of Obesity

Health professionals in this study, published in The New England Journal of Medicine, conducted a social network analysis to see how person-to-person networks contribute to the obesity epidemic. Their conclusion was not very surprising: network phenomena are quite relevant to this epidemic, and its spread depends on social ties.

Tuesday, October 29, 2013

OK, this is stretching the definition of network, but...

This will reaffirm your confidence in the American educational system--or at least amaze you.

Monday, October 28, 2013

Researchers Draw Romantic Insights From Maps of Facebook Networks By STEVE LOHR

NYT upgrades its API for info on Congress

If you know what the title means, and you want to do SNA work on the U.S. Congress, you'll be excited. Yes, it's tech-speak, but the good news is that there is a lot more data not just available but easily downloadable. Have a read, then find a friend who understands how to do this. There are more around school than you might think.

Sunday, October 27, 2013

SNA project proposal: Duplicate patents - lessons to be found through SNA?

Hypothetical Project Proposal
If I were able to take the second module of the course, I would have liked to examine the following issue affecting the patent industry.
A key issue affecting the technology industry is the role of patents. Patents are key in protecting new research, commercializing new technologies, and giving firms space to reap rewards from their research. Because of the importance of patents, and the supremacy of first filing in the patent process, there is often a race to patent forward-looking ideas in order to capitalize on them.
Given the volume of ideas presented, the US Patent and Trademark Office will sometimes grant patents for already patented technologies. An increasing problem facing the industry is the role of so-called "patent trolls" and overly broad patents. Patent trolls demand payment from users of the patented technology, and because of the cost of litigation for smaller players, many give in to patent troll demands. Whether patent trolls and other players acquire or make duplicate filings because of a lack of patent research, willful exploitation of the US Patent and Trademark Office's lack of capacity to closely compare new and old patents, or any other reason, the connections in how frequent offenders approach topics and patents would be useful in beginning to characterize these players, and would inform policy makers and technology firms on how to approach them.
In a sample of US patent data on duplicate patents, are there any clear connections between duplicate filers, and are there any measures we could hypothesize that the USPTO could take to avoid granting duplicate patents?
Methodology:
Using US Patent data, and previous notable cases, a sampling of cases and players would be examined. Using the USPTO's website, and guided by qualitative research about notable disputes, a dataset could be built about patent filings on similar ideas. This project would need significant data manipulation, in terms of vetting the data, cleaning anomalies, and gathering sufficient information.
The data would form a two-mode dataset, with patent filers as one mode and patent subjects as the other. Additional attribute data collected should include year of filing, geographic location of filer, name of filer, and owners of the patents. Likely, a script would be created to gather this type of data.
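
As a rough sketch of what the two-mode data could look like once gathered, the Python below stores (filer, subject) records and projects them onto a one-mode filer network, where two filers are tied when they have filed on the same subject. The filer names, subjects, and the `project_filers` helper are all hypothetical placeholders, not real USPTO data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical two-mode records: (patent filer, subject of patent).
filings = [
    ("Filer A", "touchscreen gesture"),
    ("Filer B", "touchscreen gesture"),
    ("Filer B", "wireless charging"),
    ("Filer C", "wireless charging"),
    ("Filer A", "wireless charging"),
]

def project_filers(filings):
    """Project the filer-subject network onto filers.

    Returns {(filer1, filer2): number of subjects both have filed on}.
    """
    filers_by_subject = defaultdict(set)
    for filer, subject in filings:
        filers_by_subject[subject].add(filer)

    shared = defaultdict(int)
    for filers in filers_by_subject.values():
        for pair in combinations(sorted(filers), 2):
            shared[pair] += 1
    return dict(shared)

overlap = project_filers(filings)
print(overlap)
```

Pairs with a high overlap count would be the first candidates to check for duplicate filings.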

Saturday, October 26, 2013

Proposal: Evaluating Development Project Outcomes

Research Question:
Broadly: Can Social Network Analysis be a useful tool in evaluating development programs?
Specific to Current Proposal: What are some key predictors of household cook-stove purchases? Who are socially influential actors within a household and outside of it? Can recommendations be made to increase probabilities of purchase based on network analysis?
The field of international development is increasingly moving towards data-driven methodologies. Systematic evaluations, carried out before, during, and after implementing a program, can be useful not only in determining whether an extant project has been successful, but also by providing transferrable lessons that may reduce redundancies in future endeavors. Data driven methodologies, if combined with knowledge sharing among development organizations and governments, could lead to better overall outcomes. I believe that social network analysis has been an underutilized tool in evaluating development projects, and could become particularly useful for projects with strong socio-cultural components.
I am hoping to use pre-existing data-sets from a completed development project whose outcomes show strong social network effects.
I have chosen a multi-part study on the dissemination of healthy cook-stoves in rural Bangladesh. This study, by Mushfiq Mobarak et al., evaluates how price and influential social actors affect villagers’ purchase decisions. Mobarak and his team primarily use econometric tools to carry out evaluations. I hope to use the study’s survey and attribute data sets to evaluate the project using social network analysis tools and determine whether new decision patterns and insights arise.
Data and Methodology:
Caveat: The study mentioned above is associated with MIT’s Abdul Latif Jameel Poverty Action Lab (JPAL). My ability to carry out an SNA analysis is dependent on whether JPAL is willing to share its data-sets.
Data-sets for some of JPAL’s projects are available for free here:
The Cook Stove Study was carried out between 2006 and 2009 and focused on a sample of 4000 households in 42 villages. Villages were randomly assigned to 8 different control groups receiving different offer prices, community leader opinions, and combinations thereof. Data was also collected at the household level on cohabitant family members, relatives, the smart household member, and the “close” household member. Data was collected in two stages: in the first stage, households were asked questions relating to their willingness to purchase a stove given a relevant price point and social network constraints; in the second stage, follow-up surveys were carried out to see whether households had behaved according to their initial statements.
Attribute data will consist of the various relationship indicators and relevant community leader identifiers. A two-mode data set will be created and used to map relationship patterns and to identify individuals with high eigenvector centrality, both on a village scale and within individual households. Ego-networks of village opinion leaders will be examined to determine the strength (or lack thereof) of their recommendations. One-mode data sets will also be created to compare a household’s intentions to buy with final outcomes.
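
To make the eigenvector measure concrete, here is a minimal Python sketch that computes eigenvector centrality by power iteration on a toy village network. The `ties` list and node names are invented, not taken from the study, and in practice a package such as UCINET would do this calculation. Each node's own score is added at every step, which leaves the ranking unchanged but keeps the iteration from oscillating on two-mode (bipartite) data.

```python
import math

# Hypothetical undirected ties between a village opinion leader and households.
ties = [
    ("leader", "h1"),
    ("leader", "h2"),
    ("leader", "h3"),
    ("h1", "h2"),
    ("h3", "h4"),
]

def eigenvector_centrality(edges, iterations=200, tol=1e-9):
    """Power iteration: a node scores highly when its neighbors score highly."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    score = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        # New score = own score + sum of neighbors' scores, then normalize.
        new = {n: score[n] + sum(score[m] for m in neighbors[n]) for n in neighbors}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - score[n]) for n in neighbors) < tol:
            return new
        score = new
    return score

centrality = eigenvector_centrality(ties)
for node, value in sorted(centrality.items(), key=lambda item: -item[1]):
    print(node, round(value, 3))
```

In this toy network the leader ranks first and the household reachable only through h3 ranks last, which is the kind of pattern the village-scale analysis would look for.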
Ultimately, the hope is that potentially predictive patterns of behavior/influence emerge. Are women more likely than men (or vice versa) to purchase an environmentally friendly cook-stove? Are opinion leaders, in fact, influential in affecting a household purchase decision?
Final Thoughts:
Ultimately, this study could demonstrate an alternative method to traditional econometric analysis in evaluating development projects. I look forward to any comments or critiques of my project proposal.
Miller, Grant, and A. Mushfiq Mobarak. "Learning About New Technologies Through Opinion Leaders and Social Networks: Experimental Evidence on Non-Traditional Stoves in Rural Bangladesh." Working Paper, January 2013.

Behind the Sources - Project Proposal


On May 25, Hezbollah General Secretary Hassan Nasrallah for the first time publicly acknowledged Hezbollah’s military role in the Syria conflict and pledged to propel Syrian President Bashar Assad to victory against opposition forces. Specifically, Nasrallah addressed the role of Hezbollah in the battle for Qusair, a strategically located town near the northern border of Lebanon that connects the land corridor between Damascus, the seat of the Assad regime, and the Alawite stronghold of Latakia on the Mediterranean coast. Lebanese, Arab and Western news sources gave Nasrallah’s speech prime coverage. The media cycle just ahead of and immediately following his announcement was filled with news analyses, ranging from pieces examining Hezbollah’s decision-making process and the potential military impact its involvement would have on the Syrian war to pieces on the impact of this development on Lebanon’s and the region’s stability.

With the 24-hour news cycle and the media-driven nature of the Syrian conflict, there is huge pressure on news networks to deliver immediate coverage and on-the-spot analysis as events develop. Often, this cycle drives reporters and news sources to reach out to experts to provide insights on these developments. The availability, relationships and networks of journalists or their news outlets affect which experts are chosen, how they are chosen, and what perspectives and insights dominate the mainstream media discussion. These media debates and news outlets – and the experts to whom they provide platforms – have a huge impact on policy making and public opinion regarding the Syrian crisis. I propose to examine news analyses in the days leading up to and immediately following Nasrallah’s May 25 speech to reveal the networks of experts and journalists that exist in a snapshot of such Syria coverage.

Primary Question:

Do news sources on Syria utilize a variety of “expert” sources or are certain voices dominating the coverage? Which experts dominate the mainstream media conversation on Syria?


Working as a journalist in Lebanon I watched the way networks of reporters and experts formed and how this affected coverage. Peer connections often determine access to experts and the breadth of points of view presented by a news source. I hypothesize that a social network analysis of media coverage will illustrate these networks and show that some voices outweigh others in the coverage on Syria, often giving coverage a certain political slant. These individuals, therefore, may have greater impact on public opinion and policy making.


To create the data set for this project I will choose a set of the top news sources in the US, Europe and within the region (written in English) to analyze. For each news source, I will look at the news analyses published between May 23 and May 30, 2013 – the period just before and the days after Nasrallah’s May 25 speech – and identify the journalists writing the pieces and the experts quoted in these articles. I will also create attribute sets for the experts and journalists including information on nationality, academic or government background and whether or not the expert is located in the region or is in the US or elsewhere.

The news sources:

US media: The New York Times, The Washington Post, The Wall Street Journal, National Public Radio, Syria Deeply, Al Jazeera America, Associated Press

European media: BBC, The Guardian, The Financial Times, The Independent, Agence France Presse, Der Spiegel, Russia Today

Regional sources in English: The National (Abu Dhabi), Al Arabiya (Saudi-owned, Dubai), The Daily Star (Lebanon), Al Akhbar (Lebanon), Haaretz (Israel), Al Monitor (translations from around the region)

Methodology/Important Network Measures:

Social Network Analysis provides a unique way to visualize this two-mode network of news sources/journalists and experts that other forms of data analysis cannot provide. I will analyze the network on a few different levels: the overall network to determine the most central experts and reporters, the networks by region of news sources and the individual ego networks of different news sources, journalists and experts.

Clique and subgroup analysis can identify which groups of journalists or news sources are all utilizing the same expert analysis. Other centrality measures (betweenness, eigenvector) will help determine who the most central experts are and therefore which experts dominate the media conversation. I will also be able to compare and contrast the networks by region of news source, nationality of experts, etc.
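
Before turning to betweenness or eigenvector scores, a simpler first pass is to ask what share of outlets quote each expert, which amounts to a normalized degree in the two-mode network. The Python sketch below shows the idea. The outlet, reporter, and expert names are placeholders, not real coding of the May 2013 coverage, and `expert_reach` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical coded articles: (news outlet, journalist, expert quoted).
citations = [
    ("Outlet 1", "Reporter A", "Expert X"),
    ("Outlet 1", "Reporter B", "Expert X"),
    ("Outlet 2", "Reporter C", "Expert X"),
    ("Outlet 2", "Reporter C", "Expert Y"),
    ("Outlet 3", "Reporter D", "Expert Z"),
]

def expert_reach(citations):
    """Return, for each expert, the share of distinct outlets that quoted them."""
    outlets_by_expert = defaultdict(set)
    all_outlets = set()
    for outlet, _journalist, expert in citations:
        outlets_by_expert[expert].add(outlet)
        all_outlets.add(outlet)
    return {
        expert: len(outlets) / len(all_outlets)
        for expert, outlets in outlets_by_expert.items()
    }

reach = expert_reach(citations)
for expert, share in sorted(reach.items(), key=lambda item: -item[1]):
    print(expert, round(share, 2))
```

Counting distinct outlets, not raw quotes, keeps one outlet that quotes the same expert repeatedly from inflating that expert's score.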


This project will take a critical look at the expert inputs to the media discussion of one event in the Syria coverage. While this analysis is concerned with just a single event and time frame, the model could be applied to examining other types of media coverage and issue areas. Using social network analysis, I hope to highlight the importance of examining the media's sources and thinking critically about the production of news and its impact.

Zeynep Tufekci on social media & Gezi Park

Check out the video from this event with Prof. Zeynep Tufekci talking about the Gezi Park protests in Turkey and the influence of social media and technology on the "boom and bust" protests around the world in recent years:

Friday, October 25, 2013

Maintenance of social networks of first-time offenders to aid their integration into society upon release

Deepti Jayakrishnan

As I am not taking the second module, I propose a project that will take longer than a semester to complete, to study (a) the effect of mass incarceration on an individual’s networks and (b) how modification of rehabilitative programmes can help prison inmates, especially under-trial prisoners, maintain the positive networks that will ease their integration into society upon release.

Introduction: Incarceration disrupts an inmate’s positive networks of family, school, romantic relationships and jobs, and has the effect of filtering the inmate into a vastly different set of networks upon release. While in the short term incarceration has a negative impact on the inmate’s capacity to get a job and maintain relationships with children/siblings, it also affects the intergenerational transmission of things such as poverty. The significance of the topic under study lies in the fact that under-trial prisoners in India often languish in prison for periods longer than those they would have to serve if their trial were to result in a conviction. Upon release, whether after serving the required years upon conviction or after acquittal, the long separation from society and the associated stigma have a severe impact on a former inmate’s ability to return to the life, job and social position held prior to prison entry.

Hypothesis and objective: Helping prison inmates maintain their social networks prior to incarceration will ease their integration into society upon release.

Scope of data collection and limitations: The target is a section of the population, chosen on the criteria below, at Tihar Central Jail, New Delhi, one of the largest prisons in the world. I choose this prison as it is a front-runner of progressive prison reform in India and, therefore, its management is likely to be more open to such a study.
I limit the study to under-trial prisoners as they constituted 80% of the total population in Tihar, according to prisoner profile data in 2009. I also limit the study to those prisoners who are first-time offenders and awaiting trial for crimes not punishable with death or life imprisonment. This limitation may be removed at a later stage, if resources permit it and the pilot project is found to have some degree of acceptance/success amongst prison officials and inmates.

Methodology: Keeping in mind the fact that ideas such as social bonding, cohesion and control, opportunity structures, diffusion, trust, and peer influence have significant manifestations in social network analysis, I would conduct two surveys: one at the beginning of the period of incarceration and one upon the inmate’s release.
The first survey would include questions on personal and familial attributes including employment history, prior criminal record, family structure (such as joint or nuclear) and income levels. It would also include questions regarding affiliations to professional and social organizations (formal or informal) and current professional, romantic and platonic relationships.

The second survey would include similar questions, a question on the three persons the inmate most frequently communicated with during incarceration, and an additional component on whether and how often the inmate used the Tihar prison facilities such as the art studio, the computer lab, yoga and meditation classes, and vocational courses such as tailoring, baking, etc. It would be ideal if the second survey could be conducted three months after release, instead of immediately upon release, but given the practical difficulties involved, this can be done only with those inmates who are released on parole or bail. In that event, the second survey would include questions on the current job held and whether it was obtained through a contact in prison such as an official, a consumer of Tihar prison products or a fellow inmate.

Network measures used: Ego networks in order to determine the opportunities and constraints inmates face; factions (after the second survey) to understand new relationships created within prison and their strength; cliques, if any; and centrality measures, i.e. in-degree to determine popularity and out-degree to determine influence within prison.
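
The in-degree and out-degree measures can be computed directly from the "persons most frequently communicated with" question. The Python below is a small sketch on invented answers: the inmate labels and the `degree_counts` helper are hypothetical, and a real survey would allow up to three names per respondent.

```python
from collections import Counter

# Hypothetical survey answers: respondent -> persons they named as frequent contacts.
nominations = {
    "inmate_1": ["inmate_2", "inmate_3"],
    "inmate_2": ["inmate_3"],
    "inmate_3": ["inmate_1"],
    "inmate_4": ["inmate_3", "inmate_1"],
}

def degree_counts(nominations):
    """Return (in_degree, out_degree) dictionaries for a directed nomination network."""
    # Out-degree: how many distinct people a respondent named.
    out_degree = {person: len(set(named)) for person, named in nominations.items()}

    # In-degree: how many respondents named this person.
    in_degree = Counter()
    for named in nominations.values():
        for other in set(named):
            in_degree[other] += 1
    for person in nominations:
        in_degree.setdefault(person, 0)  # respondents nobody named still appear
    return dict(in_degree), out_degree

in_degree, out_degree = degree_counts(nominations)
print("in-degree:", in_degree)
print("out-degree:", out_degree)
```

Here inmate_3 is named by three others and would count as the most popular, while inmate_4 names two people but is named by no one.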

According to sociologists such as Andrew Papachristos, crime spreads through risky relationships and behaviors. This study will help determine if maintaining the networks a first-time offender had prior to entering prison will reduce his/her chances of building new offending relationships or gang relationships, which subsequently deter rehabilitation upon release and affect outcomes like employment, and even mortality.

Bedi, Kiran, It's Always Possible: One Woman's Transformation of India's Prison System, Sterling, First edition, 2002, New Delhi.
Papachristos, Andrew V., “The Coming of a Networked Criminology? Using Social Network Analysis in the Study of Crime and Deviance,” Advances in Criminological Theory, Vol 13, 2011.
International Center for Prison Studies, India Prison Brief; accessed on 23 October 2013
Tihar Prisons: Prisoners’ Profiles; accessed on 23 October 2013

Cinderella and Stockholm Syndrome

In light of today's awesome conversation...

Efficiency of communications in Transparency International's network of organizations

Before joining Fletcher, I had the opportunity to work for Transparency International (TI), a non-profit organization that fights corruption at different levels. The interesting thing about this NGO is how it is structured internally. Their webpage explains: "Transparency International consist of more than 100 chapters - locally established, independent organizations - that fight corruption in their respective countries." Essentially, this means that TI is a network of organizations (national chapters) that subscribe to the cause of fighting corruption but work autonomously from the Secretariat based in Berlin. Each has its own budget, Board of Directors, organizational structure and culture. TI’s success is determined by the strength of the connections the Secretariat has with the NGOs around the world that decide to become part of the network. Paradoxically, one of TI’s major internal issues is the ineffective way in which it communicates with the national chapters worldwide.

Increase efficiency in communications between Transparency International’s Secretariat and the National Chapters subscribed to the organization by understanding how the information is flowing between the different departments (regional and thematic) in TI and the National Chapters.

Research Question
How can TI be more effective in fighting corruption worldwide by strengthening its network through more efficient communication with the national chapters?

The participants would be all the units, departments, programs and special initiatives (as defined in TI’s organizational chart, made public on its website) and a selection of 25 national chapters representing the 100 chapters that TI now includes in its network.

Measures and Procedures:
The participants would be administered a questionnaire in order to identify the following centrality measures:
  • Degree Centrality: to identify the number of connections different departments have among themselves and with the national chapters and vice versa.
  • Closeness: to identify which departments can reach the chapters through the shortest communication paths and vice versa.
  • Betweenness: to understand how the flow of information is controlled.
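
To show how the three measures differ, here is a self-contained Python sketch that computes them on a tiny made-up communication network. The department and chapter names are placeholders, not TI's actual structure. Betweenness uses Brandes' algorithm for unweighted graphs, and closeness is the number of reachable nodes divided by the sum of shortest-path distances to them.

```python
from collections import deque

# Hypothetical "who communicates with whom" ties from the questionnaire.
edges = [
    ("Dept A", "Chapter 1"),
    ("Dept A", "Chapter 2"),
    ("Dept A", "Dept B"),
    ("Dept B", "Chapter 3"),
]

def build_graph(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def degree(graph):
    """Number of direct ties each node has."""
    return {node: len(nbrs) for node, nbrs in graph.items()}

def closeness(graph):
    """Reachable nodes divided by total shortest-path distance to them."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness(graph):
    """Brandes' algorithm: number of shortest paths passing through each node."""
    score = {node: 0.0 for node in graph}
    for s in graph:
        stack, dist = [], {s: 0}
        preds = {node: [] for node in graph}
        sigma = {node: 0 for node in graph}  # shortest-path counts from s
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted from both endpoints.
    return {node: value / 2 for node, value in score.items()}

graph = build_graph(edges)
print(degree(graph))
print(closeness(graph))
print(betweenness(graph))
```

In this toy network Dept A sits on five of the shortest paths between other nodes and Dept B on three, so those two control how information reaches the chapters.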

  • The connection amongst departments and chapters very often depends on the personal ties created by the employees and therefore the analysis would change if personnel leave.
  • The information drawn from the survey could potentially be incomplete, since we would be collecting data from departments that are composed of more than one person and it is difficult to ensure the participation of everyone.

The Effect of Contested Elections on Partisanship

(Posted on behalf of a student who wishes to remain unnamed on the Web)


The rise in the number of gerrymandered districts in the United States corresponds to an increasing tendency toward extremism and partisanship in American politics. The two trends are particularly disconcerting because they empower each other, resulting in Congressional sessions that are both inefficient and arguably ineffective. Public frustration has led to calls for fairer districting and more bipartisanship in Congress. One suggestion for producing more moderate politicians is to establish fairer districts – districts in which there will not be a “safe seat” in an election and candidates must appeal to swing voters to achieve a victory.

While research on the benefits of such highly competitive elections has proven inconclusive, the idea is intuitively appealing. Garnering support for fairer districting, however, will require more than intuitive sense. Consequently, this project seeks to contribute to the evolving body of research on the effects of “highly contested” elections through the employment of social network analysis. It will focus on the House of Representatives, where districts shape the election and politicians are constantly considering the next election cycle.

Objective & Research Question

Do highly contested elections produce Representatives who are more or less partisan than their House colleagues?


Because highly contested elections depend heavily on the choices of swing voters, it is hypothesized that Representatives from highly contested districts will want to “signal” that they not only share the ideological leanings of voters on both sides of the political spectrum but also support policies that produce outcomes all of their constituents desire. They will consequently form networks with stronger bonds across party lines than their “safe seat” peers.


The project will start with an analysis of a two-mode network representing affiliations between Representatives during House sessions (one network per session). Nodes will be analyzed using degree centrality, betweenness centrality, and eigenvector centrality.

Network links between Representatives can demonstrate the “strength” of connections between Representatives by highlighting how frequently they support each other or interact at a policy level. Analysis on these networks will seek to determine whether bipartisan Representatives (“highly contested” Representatives) are actually at the center of the House network or act rather as links between dense, centralized party networks.

The majority of the analysis will be conducted through network comparisons, attempting to draw conclusions from the differences in patterns and network structures. For example, the project will seek to compare the ego network of a highly contested Representative to the ego network of a “safe seat” Representative with the hope that the process will provide insight into their structure, density, and affiliations. And, in the event that enough data can be collected, the project will also attempt to compare a House network during a congressional session in which there were more highly contested seats with a House network during a congressional session in which there were fewer.


Data will be drawn from publicly available information on Representatives themselves, House voting records, committee membership, and House bills (passed and unpassed).

Primary data will be collected for House networks, including:
  • Representatives’ connections through their voting records.
  • Representatives’ connections through the bills they sponsor – with whom and on what issues.

Attribute data will be collected for each Representative, including:
  • Gender
  • Party affiliation (Republican, Democrat, other)
  • Election results (margin of victory)
  • Number of previous terms
  • Education
  • Age
  • Issues important to voting constituents during the election (jobs, healthcare, farm subsidies, etc.)
  • Committee memberships

Limitations and other considerations

Data limitations –

The project will be conducted using only information that is publicly accessible. House networks calculated through this project will therefore reflect nothing about the personal relationships between Representatives or any behind-the-scenes politicking that could very well indicate stronger bipartisanship than voting records and bill sponsorship could demonstrate.

Things to Consider –

It is going to be necessary to define “highly contested” in the context of elections so as to ensure that consistency is maintained in distinguishing highly contested Representatives over multiple House sessions. One way to define the phrase might be to establish a specific range of “margins of victory”; any Representative who wins an election by a percentage of votes within the established margin will be considered a “highly contested” Representative. Additional questions include: What does “bipartisan” look like in a social network? Is it an equal number of connections among both parties or just stronger connections across party lines (in relative terms) than party peers? The project will also require clarifying how many Congressional sessions would be necessary for valid analysis.
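
One possible way to operationalize both definitions is sketched below in Python. Everything here is an assumption made for illustration: the Representatives, their margins, the ties, the 5-point cutoff, and the helper names are invented. "Bipartisan" is measured as the share of a Representative's ties that cross party lines, which is only one of the candidate definitions raised above.

```python
# Hypothetical attribute data: party and margin of victory in percentage points.
representatives = {
    "Rep A": {"party": "D", "margin": 2.5},
    "Rep B": {"party": "R", "margin": 30.0},
    "Rep C": {"party": "R", "margin": 4.0},
    "Rep D": {"party": "D", "margin": 25.0},
}

# Hypothetical ties, e.g. co-sponsorship of at least one bill.
ties = [
    ("Rep A", "Rep C"),
    ("Rep A", "Rep D"),
    ("Rep B", "Rep C"),
    ("Rep A", "Rep B"),
]

def highly_contested(reps, max_margin):
    """Representatives whose margin of victory is at or below the cutoff."""
    return {name for name, attrs in reps.items() if attrs["margin"] <= max_margin}

def cross_party_share(reps, ties):
    """Share of each Representative's ties that connect to the other party."""
    total = {name: 0 for name in reps}
    cross = {name: 0 for name in reps}
    for a, b in ties:
        total[a] += 1
        total[b] += 1
        if reps[a]["party"] != reps[b]["party"]:
            cross[a] += 1
            cross[b] += 1
    return {name: cross[name] / total[name] for name in reps if total[name]}

contested = highly_contested(representatives, max_margin=5.0)
shares = cross_party_share(representatives, ties)
for name in sorted(representatives):
    label = "contested" if name in contested else "safe seat"
    print(name, label, round(shares.get(name, 0.0), 2))
```

Comparing the average cross-party share of the contested group with that of the safe-seat group would be a first, rough test of the hypothesis.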