Thursday, October 19, 2017

Analyzing the Arabic-Language Media Landscape


Background and Research Questions

Over the past few years, researchers have found that Americans have become more polarized in their media consumption.[1] Social media allows users to avoid exposure to opposing viewpoints, and media outlets on opposite sides of the political spectrum tend to focus on different issues.[2]

Although much research has been conducted on the topology of the U.S. media landscape,[3] I have not found any similar analyses of Arabic-language media outlets. This raises the following questions: How much polarization or overlap is there among the major Arabic-language media outlets, in terms of their news coverage and audience? Do these media outlets’ coverage and audiences vary substantially based on their location, ownership, or political leanings?

Social network analysis provides a way to visualize the degree of similarity between these media outlets’ coverage and followers. It can illuminate whether certain media organizations play a central role in the Arabic-language media ecosystem, as measured by betweenness – that is, whether they act as “brokers” that transcend national media landscapes, with audiences that consume content from very different types of media outlets. Finally, social network analysis can show whether the geopolitical divisions in the Middle East today are reflected in media outlets’ coverage and audiences.
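
To make the idea of betweenness concrete, here is a minimal sketch in Python. It uses a toy network with made-up outlet names (none of them are real data), in which one hypothetical pan-Arab outlet is the only link between two national clusters. The function is a plain implementation of Brandes' algorithm for unweighted, undirected graphs, not necessarily the routine that a network analysis package would use.

```python
from collections import deque

def betweenness(graph):
    """Unnormalized betweenness for an unweighted, undirected graph.

    graph maps each node to the set of its neighbours.
    """
    scores = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search from source, counting shortest paths.
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in graph}   # predecessors on shortest paths
        n_paths = {v: 0 for v in graph}  # number of shortest paths from source
        n_paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    # Each undirected pair was counted from both endpoints, so halve.
    return {v: score / 2 for v, score in scores.items()}

# Hypothetical toy network: two national clusters joined by one outlet.
toy_network = {
    "pan_arab": {"egypt_1", "egypt_2", "lebanon_1", "lebanon_2"},
    "egypt_1": {"egypt_2", "pan_arab"},
    "egypt_2": {"egypt_1", "pan_arab"},
    "lebanon_1": {"lebanon_2", "pan_arab"},
    "lebanon_2": {"lebanon_1", "pan_arab"},
}

scores = betweenness(toy_network)
```

In this toy case, every shortest path between an "egypt" node and a "lebanon" node passes through "pan_arab", so it is the only node with a nonzero score. That is the "broker" pattern described above.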

Hypothesis

I hypothesize that media outlets that brand themselves as pan-Arab (like Al Hayat and Al Arabiya) will cover a broader range of issues and have more diverse sets of followers than media outlets that focus on news in a particular country. The former will likely be more central nodes in the network, whereas the latter will probably cluster with one another based on the country where they are headquartered, and be relatively isolated from other outlets.

Clusters will probably also form based on media outlets’ geopolitical alignment (which, in a region with few press freedoms,[4] will usually reflect the stances of their country of origin). For example, media outlets supportive of Iran, the Syrian government, and Hezbollah may cover similar issues and have similar followers, while those that support Saudi Arabia and its major regional allies, Egypt and the United Arab Emirates, will likely form a second cluster. Media outlets that are not as strongly aligned with either grouping, such as BBC Arabic, may act as “brokers” in terms of their coverage and followers.

Data and Methodology

To gauge media outlets’ coverage and audience, I will use publicly available data from their Twitter accounts: namely, the hashtags they use and the composition of their followers on the social media network. Media outlets that cover many of the same issues, and that have significant overlap in their followers, can be said to have strong ties with one another; whereas outlets with little similarity in audience and coverage will have weaker ties.

Twitter hashtags, which are used to identify the main topic of a piece of content, can be used as a rough proxy for the issues that media outlets cover. Hashtags have been described as “the definitive way to group tweets on the same subject”[5] and as having “proven most useful for filtering conversations about events.”[6]

I will gather attribute data for each media outlet to enrich the analysis. This data will include the country where the media outlet is headquartered; whether or not its primary output is in Arabic; whether or not it is state-owned; whether it focuses on a single country’s news or is “pan-Arab”; and the media outlet’s main format (newspaper, television, or web-only).
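
One possible way to hold these attributes is sketched below in Python. The class and field names are my own illustration rather than a fixed schema, and the example record is a placeholder, not a real outlet.

```python
from dataclasses import dataclass
from enum import Enum

class Format(Enum):
    NEWSPAPER = "newspaper"
    TELEVISION = "television"
    WEB_ONLY = "web-only"

@dataclass(frozen=True)
class OutletAttributes:
    twitter_handle: str        # account the tweets and followers come from
    headquarters_country: str  # country where the outlet is headquartered
    primarily_arabic: bool     # is its primary output in Arabic?
    state_owned: bool
    pan_arab: bool             # False means it focuses on a single country
    main_format: Format

# Placeholder record for illustration only; not real data.
example = OutletAttributes(
    twitter_handle="example_outlet",
    headquarters_country="Exampleland",
    primarily_arabic=True,
    state_owned=False,
    pan_arab=True,
    main_format=Format.TELEVISION,
)
```

Keeping the attributes in one record per outlet should make it straightforward to color or group nodes by any of these fields later on.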

To measure the tie strength between any two media outlets, I will use the Jaccard index, a measure of the similarity between two sets. It is calculated by dividing the size of the intersection of two sets (that is, the number of items they have in common) by the size of their union (the total number of distinct items contained in either set).[7] The higher the value of the index, the more similar the media outlets are in terms of coverage or audience. A Jaccard index of 1 means that two media outlets use identical hashtags or have identical sets of followers, whereas a Jaccard index of 0 means that there is no overlap at all. The raw data are two-mode (outlets linked to hashtags, and outlets linked to followers), but calculating the index for each media outlet dyad converts them into two valued, one-mode, undirected graphs – one linking media outlets based on their hashtag usage, and the other connecting media outlets based on the extent to which their Twitter followers overlap. I will try dichotomizing at several values of the Jaccard index to visualize both weaker and stronger ties among media outlets.
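
The calculation described above can be sketched in a few lines of Python. The outlet names and hashtag sets are invented for illustration, the function names are my own, and treating two empty sets as having an index of 0 is an assumption I am making to avoid dividing by zero.

```python
from itertools import combinations

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention when both sets are empty
    return len(a & b) / len(union)

def valued_edges(items_by_outlet):
    """Jaccard index for every outlet dyad (valued, undirected ties)."""
    return {
        (u, v): jaccard(items_by_outlet[u], items_by_outlet[v])
        for u, v in combinations(sorted(items_by_outlet), 2)
    }

def dichotomize(edges, threshold):
    """Keep only the dyads whose tie strength meets the threshold."""
    return {pair for pair, weight in edges.items() if weight >= threshold}

# Invented toy data: the set of hashtags each hypothetical outlet used.
hashtags = {
    "outlet_a": {"syria", "yemen", "qatar"},
    "outlet_b": {"syria", "qatar", "egypt"},
    "outlet_c": {"football"},
}

edges = valued_edges(hashtags)
strong_ties = dichotomize(edges, threshold=0.25)
```

Here outlet_a and outlet_b share two of four distinct hashtags, giving an index of 0.5, while outlet_c shares nothing with either. The same functions would work on sets of follower usernames instead of hashtags.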

I was unable to find any pre-existing datasets comparing Arabic-language media outlets’ coverage and audience makeup, let alone data specifically on hashtag usage or Twitter followers. To obtain this data, I wrote a scraper in Python that downloads the 3,200 most recent tweets for any given Twitter user (the maximum that Twitter lets one download for free), and stores the content of these tweets in a spreadsheet. It also downloads a list of the 100,000 most recent followers of that Twitter user.
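
Once the tweets are stored, the hashtags still have to be pulled out of the text. The sketch below shows one way this step might look. It is not the scraper itself, and it assumes the tweets are already available as plain strings. The pattern and function names are my own, and the sample tweets are invented.

```python
import re
from collections import Counter

# "#" followed by word characters. In Python 3, \w covers Arabic letters,
# digits, and the underscore, but not combining diacritics, so a tag
# containing them would be cut short.
HASHTAG_PATTERN = re.compile(r"#(\w+)")

def extract_hashtags(tweet_text):
    """Return the hashtags in one tweet, lowercased, without the '#'."""
    return [tag.lower() for tag in HASHTAG_PATTERN.findall(tweet_text)]

def hashtag_counts(tweets):
    """Count how often each hashtag appears across a list of tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(extract_hashtags(text))
    return counts

# Invented sample tweets for illustration.
sample_tweets = [
    "عاجل #قطر #Syria",
    "More on #syria today",
    "No hashtags here",
]

counts = hashtag_counts(sample_tweets)
```

Lowercasing merges "#Syria" and "#syria" into one tag. It has no effect on Arabic script, which has no letter case.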

Limitations

Perhaps the biggest limitation is one imposed by Twitter: scraper programs cannot harvest more than 3,200 tweets from a given user, and can download the usernames of only 200 followers per minute. For a Twitter account with 10 million followers, it would take about five weeks to obtain the entire follower list. Accordingly, I will be working with samples of each outlet’s 100,000 most recent followers, which could yield imprecise estimates of the overlap between two outlets’ audiences.
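
The arithmetic behind that estimate can be checked in a few lines. This is only a back-of-the-envelope sketch: it assumes the 200-followers-per-minute rate quoted above holds constantly, and the function name is my own.

```python
FOLLOWERS_PER_MINUTE = 200  # download rate stated in the text

def minutes_to_download(n_followers, per_minute=FOLLOWERS_PER_MINUTE):
    """Minutes needed to fetch n_followers usernames at a constant rate."""
    return n_followers / per_minute

# Full follower list of a 10-million-follower account, in days.
full_list_days = minutes_to_download(10_000_000) / (60 * 24)

# A 100,000-follower sample, in hours.
sample_hours = minutes_to_download(100_000) / 60

print(f"{full_list_days:.1f} days")   # 34.7 days
print(f"{sample_hours:.1f} hours")    # 8.3 hours
```

At that rate, a 100,000-follower sample takes a little over eight hours per outlet, compared with roughly five weeks for a full list of 10 million.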

Analyzing media outlets’ coverage based on hashtag usage also presents several complications. Certain media outlets tweet, and use hashtags, more often than others. Because of this, one outlet’s most recent 3,200 tweets may span a different date range than another outlet’s, and may contain far fewer hashtags. Some media outlets also use hashtags that are specific to that organization. Furthermore, hashtags only reflect a small portion of media outlets’ output. Although many media outlets will publish a tweet for each article they publish, they may not use hashtags – and others may not tweet at all.

Finally, in many cases, using a certain hashtag will not reflect the bias or angle of the underlying content. Other hashtags, however, are explicitly political – such as “Boycott Qatar” – and media outlets that use such hashtags can be reasonably assumed to share the position for which the hashtag advocates.



[1] Amy Mitchell et al., “Political Polarization and Media Habits,” Pew Research Center, October 21, 2014, http://www.journalism.org/2014/10/21/political-polarization-media-habits/
[2] Jon Keegan, “Blue Feed, Red Feed,” Wall Street Journal, May 18, 2016, http://graphics.wsj.com/blue-feed-red-feed/
[3] “Partisan Right-Wing Websites Shaped Mainstream Press Coverage Before 2016 Election, Berkman Klein Study Finds,” Berkman Klein Center, August 16, 2017, https://cyber.harvard.edu/node/99982
[4] Dominic Dudley, “Media Clampdowns Send Middle East Countries Tumbling Down Press Freedom Index,” Forbes, April 26, 2017, https://www.forbes.com/sites/dominicdudley/2017/04/26/press-freedom-index/
[5] Dave Lee, “How Twitter changed the world, hashtag-by-hashtag,” BBC News, November 7, 2013, http://www.bbc.com/news/technology-24802766
[6] Matt Stevens, “As the Hashtag Celebrates Its 10th Birthday, Are We #Blessed?” New York Times, August 23, 2017, https://www.nytimes.com/2017/08/23/business/hashtag-anniversary-twitter.html
[7] Loet Leydesdorff, “On the normalization and visualization of author co-citation data: Salton’s Cosine versus the Jaccard index,” Journal of the Association for Information Science and Technology, October 26, 2007, http://onlinelibrary.wiley.com/doi/10.1002/asi.20732/full

1 comment:

Christopher Tunnard said...

You've clearly done some thinking about this, right down to the use of a Jaccard index to compare any two networks as a measure of similarity, which is a very nice touch. Two comments: firm up your key question to help you manage the scope of the data collection and analysis. And be a bit clearer about what you can achieve by using node, subgroup, or whole-net analysis.

Looking forward to seeing this develop!