Sunday, October 25, 2015

International Languages and Online Content Flows: A Comparison of Networks


Preamble
The more I study poverty and inequality, the more I notice that structural divisions in information sharing between countries, cultures, and people are fundamental to conflict and inequality.  Economists identify information asymmetries as a core market failure in many developing economies.  Reducing information asymmetries requires us to share information with each other and to overcome structural divisions. Put more simply, we need to communicate and listen more, and more equally.

One potential structural division in information sharing is language.  People without a shared language are simply less able to communicate with one another.  And even when people do share a common language, the flow of information may be unevenly balanced in favor of native speakers of the core languages, such as English.  One place where asymmetries in information flows might be measurable is the internet, by measuring the directionality of the flow of data between countries.

A Network Analysis
I propose comparing the network of online data flows between countries with the network of shared languages.  In particular, I would like to analyze the directionality of online content flows, and measure how strongly the information network correlates with the network of shared languages between countries.  The scope of the analysis will be narrowed to text-based online media content and shared languages, and it will attempt to identify divisions, isolates, and countries that could be groomed to be connectors in the global network of information sharing and language.
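One possible way to make "how strongly the two networks correlate" concrete is a quadratic assignment procedure (QAP) style test: compute the Pearson correlation between the off-diagonal cells of the two country-by-country matrices, then compare it with the correlations obtained after randomly relabelling the countries in one matrix. The sketch below is only an illustration under stated assumptions. The function names, the four-country matrices, and every number in them are made-up placeholders, not data from the ITU or the Internet Monitor project.

```python
import random
from math import sqrt

def off_diagonal(matrix, order):
    """Off-diagonal cells of a square matrix after relabelling rows and columns by `order`."""
    n = len(matrix)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(n) if i != j]

def pearson(xs, ys):
    """Plain Pearson correlation between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def qap_correlation(flows, language, permutations=2000, seed=0):
    """Observed correlation plus the share of permutations that match or exceed it."""
    identity = list(range(len(flows)))
    flow_values = off_diagonal(flows, identity)
    observed = pearson(flow_values, off_diagonal(language, identity))
    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel countries; rows and columns move together
        if pearson(flow_values, off_diagonal(language, order)) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / permutations

# Toy values only: flows[i][j] = data requested by country i from country j.
flows = [
    [0, 8, 2, 1],
    [5, 0, 1, 2],
    [3, 1, 0, 6],
    [1, 2, 4, 0],
]
# Toy values only: symmetric shares of people who share a language.
language = [
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.5],
    [0.1, 0.1, 0.5, 1.0],
]

r, p_value = qap_correlation(flows, language)
print(f"correlation={r:.2f}, permutation p-value={p_value:.3f}")
```

Permuting country labels rather than individual cells keeps each country's row and column together, which is why this kind of test is usually preferred over an ordinary significance test when the observations are pairs of countries.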

Why this is Important
The key motivation of this analysis is that divisions in the internet community and global language landscape are a key indicator of divisions in communication, and are likely to be a contributing factor to real economic and political divisions between countries, cultures, and people. In addition to increasing the likelihood of conflict, divisions in language and information also reduce the size of the knowledge commons and slow down the pace of our collective learning in all fields of knowledge.  Some languages are also unequally favored in terms of the content available to their speakers, as the map of English Wikipedia content below indicates, as well as the language network map further below. A better understanding of where the weak points are in the global network of language and information flows will better equip us to address them, by showing us which languages and countries are critically isolated and which are potential connectors that could be used to bridge divisions in language and information sharing.




Data Required
The network of online information flows will be built from data published by the International Telecommunication Union (ITU). I will also approach the Berkman Center for Internet and Society for the data that they have already published on international data flows through their Internet Monitor project. The dataset that I hope to obtain will contain country-level data recording the total volume of data requested by each country from each other country over the last year.

The data for the language network will be constructed using public records of the languages spoken in each country, and the populations that speak them.  The language dataset that I intend to build will be symmetrical, such that for each country pair in the matrix there will be a measure of the percentage of people in those two countries who share a language, as illustrated in the table below. In this example, 50% of people in countries A and B share a language, whereas only 10% of people in countries B and C do:

             Country A    Country B    Country C
Country A    1            0.5          0.33
Country B    0.5          1            0.1
Country C    0.33         0.1          1
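As a rough sketch of how such a symmetric matrix could be built from public records, the snippet below assumes one possible definition of "share a language": the fraction of the combined population of two countries whose primary language is spoken in both. That definition, the `speakers` structure, and all of the figures are hypothetical placeholders chosen for illustration. A real dataset would also need to handle multilingual speakers, which this sketch ignores.

```python
from itertools import combinations

# Hypothetical input: speakers[country][language] = number of people whose
# primary language it is. All figures are made-up toy values.
speakers = {
    "A": {"english": 60, "french": 40},
    "B": {"english": 50, "spanish": 50},
    "C": {"spanish": 10, "french": 30, "arabic": 60},
}

def shared_share(a, b):
    """Share of the combined population of two countries whose primary
    language is spoken in both countries (an assumed definition)."""
    common = set(a) & set(b)
    total = sum(a.values()) + sum(b.values())
    shared = sum(a[lang] + b[lang] for lang in common)
    return shared / total

def language_matrix(speakers):
    """Symmetric country-by-country matrix with 1.0 on the diagonal."""
    countries = sorted(speakers)
    matrix = {c: {c: 1.0} for c in countries}
    for x, y in combinations(countries, 2):
        value = shared_share(speakers[x], speakers[y])
        matrix[x][y] = value  # fill both cells so the matrix stays symmetric
        matrix[y][x] = value
    return matrix

matrix = language_matrix(speakers)
for row in sorted(matrix):
    print(row, [round(matrix[row][col], 2) for col in sorted(matrix)])
```

Because each pair is computed once and written to both cells, the result is symmetric by construction, which matches the undirected nature of the language network.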

A second option would be to map the network of translations between languages. This has been done before, as shown in the image below. However, this method is less suitable for the country-level comparison between internet flows and language that this study intends to make.



Details of the Network Analysis
The network analysis will focus on the following methods:
  • Sub-group analysis: The first step for conducting the analysis will be to visually compare the distribution of the sub-groups in the two networks under analysis. Within the sub-group analysis, the relative strength of the groups will be measured using the I-E measures.
  • Betweenness: Identifying countries/languages with high betweenness will be a key method for finding countries that are connectors within the network and that facilitate information sharing.  Conversely, high betweenness may also indicate that the connection between sub-groups depends too heavily on a limited number of countries, and is therefore a vulnerable connection.
  • InDirection: this will be used to identify countries whose online information is requested by many other countries. It should be noted that directionality will not be measured in the language network, because that data is symmetric.
  • OutDirection: this will be used to identify countries that request online information from many other countries.
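The measures listed above can be sketched on a toy directed network. This is only an illustration under assumptions: I read InDirection and OutDirection as in-degree and out-degree, I read the I-E measure as the E-I index (external ties minus internal ties, divided by all ties), and the betweenness function is an unweighted, unnormalised version of Brandes' algorithm. The `edges` and `groups` values are invented, not real flow data.

```python
from collections import defaultdict, deque

# Hypothetical ties: (requester, provider) means the first country requests
# online content from the second. Sub-group labels are also made up.
edges = [("A", "B"), ("C", "B"), ("D", "B"), ("B", "A"),
         ("D", "E"), ("E", "D"), ("B", "D")]
groups = {"A": "g1", "B": "g1", "C": "g1", "D": "g2", "E": "g2"}

def degrees(edges):
    """In-degree: how many countries request from you. Out-degree: how many you request from."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for requester, provider in edges:
        outdeg[requester] += 1
        indeg[provider] += 1
    return dict(indeg), dict(outdeg)

def ei_index(edges, groups):
    """E-I index: -1 means all ties are inside sub-groups, +1 means all ties cross them."""
    external = sum(1 for u, v in edges if groups[u] != groups[v])
    internal = len(edges) - external
    return (external - internal) / (external + internal)

def betweenness(edges):
    """Unnormalised betweenness for an unweighted directed graph (Brandes' algorithm)."""
    nodes = sorted({n for edge in edges for n in edge})
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
    score = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(nodes, 0)  # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source
            v = queue.popleft()
            stack.append(v)
            for w in successors[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score

indeg, outdeg = degrees(edges)
score = betweenness(edges)
print("in-degree:", indeg)
print("out-degree:", outdeg)
print("E-I index:", round(ei_index(edges, groups), 2))
print("betweenness:", score)
```

In this toy network, countries B and D are the only ones that sit on paths between the two sub-groups, which is the kind of pattern the betweenness bullet describes: they act as connectors, but the link between the groups also depends entirely on them.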

Next Steps for Further Analysis
Core issues that are unlikely to be addressed by this network analysis but that should be followed up in future analysis include:
  • Test whether there is a correlation between the internet and language network maps and political divisions, such as by looking at treaty networks. Similarly, it would be interesting to measure the correlation with the real economy, such as by looking at trade flows and trade agreements.
  • Identify key languages and countries that should be groomed to be connectors and diplomatic links between otherwise disconnected sub-groups. These will be countries that have high potential for betweenness.
  • Investigate technology-based methods for reducing internet and language divisions. This might include investing in improved automated translation services between languages of critical interest.

1 comment:

Christopher Tunnard said...

I get that you want to look at shared languages, but I still don't see a network there. Looking at the directionality of flows is a start, but there's a bit of a "so what?" attached to that. And how exactly will you "measure how strongly the information network correlates to the network of shared languages between countries"? And what are the sub-groups you'll study? Trade and treaty nets? Etc? You are using a bit of a shotgun approach--hoping you'll hit something.

This is going to take some work to get right, but you'll get there.