Preamble
The more I study poverty and inequality, the more I notice that
structural divisions in information sharing between countries,
cultures, and people are fundamental to conflict and inequality. Economists identify
information asymmetries as a core market failure in many developing economies. Reducing information asymmetries requires us
to share information with each other and to overcome structural divisions. Put more
simply, we need to communicate and listen more, and more equally.
One potential structural division in information sharing is
language. People without a shared
language are simply less able to communicate with one another. And even when people do share a common
language, the flow of information may be unevenly balanced
in favor of native speakers of the core languages, such as English. One place where asymmetries in information
flows could be measured is the internet, by measuring the
directionality of data flows between countries.
A Network Analysis
I propose comparing the network of online data flows between
countries with the network of shared languages.
In particular, I would like to analyze the directionality of online content
flows, and measure how strongly the information network correlates to the network
of shared languages between countries. The
scope of the analysis will be narrowed to text-based online media content and shared
languages, and will attempt to identify divisions, isolates, and countries that
could be groomed to be connectors in the global network of information sharing and
language.
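One way to make "how strongly the two networks correlate" concrete is a QAP-style permutation test. This technique is not named in the proposal; it is swapped in here as a common approach for comparing two networks defined over the same nodes. The sketch below assumes both networks are stored as square matrices over the same list of countries. It correlates the off-diagonal cells, then reshuffles the country labels of one matrix to see how often a correlation at least that strong appears by chance. The four-country matrices and all function names are hypothetical and for illustration only.

```python
import random
from statistics import mean


def off_diagonal(matrix):
    """Flatten a square matrix into a list, skipping the diagonal."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permute(matrix, order):
    """Relabel countries: apply the same permutation to rows and columns."""
    return [[matrix[i][j] for j in order] for i in order]


def qap_correlation(flows, languages, n_permutations=2000, seed=0):
    """Return (observed correlation, share of permutations at least as extreme)."""
    rng = random.Random(seed)
    flow_cells = off_diagonal(flows)
    observed = pearson(flow_cells, off_diagonal(languages))
    order = list(range(len(languages)))
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(order)
        r = pearson(flow_cells, off_diagonal(permute(languages, order)))
        if abs(r) >= abs(observed):
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_permutations


# Hypothetical toy data: flows[i][j] is data requested by country i from country j.
flows = [
    [0, 8, 1, 0],
    [6, 0, 2, 1],
    [1, 1, 0, 7],
    [0, 2, 5, 0],
]
# Hypothetical symmetric shared-language shares for the same four countries.
languages = [
    [1.0, 0.6, 0.1, 0.0],
    [0.6, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.5],
    [0.0, 0.1, 0.5, 1.0],
]

observed, p_value = qap_correlation(flows, languages)
print(f"correlation = {observed:.3f}, permutation p-value = {p_value:.3f}")
```

With only four countries the permutation p-value is not informative; the point is the shape of the calculation, which would be run on the full country-by-country matrices.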
Why this is Important
The key motivation of this analysis is that divisions in the
internet community and global language landscape are a key indicator of
divisions in communication, and are likely to be a contributing factor to real
economic and political divisions between countries, cultures, and people. In
addition to increasing the likelihood of conflict, divisions in language and
information also reduce the size of the knowledge commons and slow down the
pace of our collective learning in all fields of knowledge. Some languages are also unequally favored in
terms of the content available to their speakers, as indicated by the map of
English Wikipedia content below and by the language network map
further below. Gaining a better understanding of where the weak points are in
the global network of language and information flows will better equip us to
address them, by showing which languages and countries are
critically isolated and which are potential connectors that could be used to
bridge divisions in language and information sharing.
Data Required
The network of online information flows will be built from data
from the International Telecommunication Union (ITU). The Berkman Center
for Internet and Society will also be approached for the data that it has already
published on international data flows through its Internet Monitor project.
The dataset that I hope to obtain will have country-level data that records the
total volume of data requested by each country from each other country over the
last year.
The language network will be constructed using public
records of the languages spoken in each country, and the populations that speak
them. The language
dataset that I intend to build will be symmetrical, such that for each country
pair in the matrix there is a measure of the percentage of people in those
two countries who share a language, as illustrated in the table below. In
this example, 50% of people in countries A and B share a language, whereas only
10% of people in B and C share a language:
|           | Country A | Country B | Country C |
|-----------|-----------|-----------|-----------|
| Country A | 1         | 0.5       | 0.33      |
| Country B | 0.5       | 1         | 0.1       |
| Country C | 0.33      | 0.1       | 1         |
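The proposal does not pin down how the shared-language percentage for a country pair would be calculated, so the following is only one possible reading. It assumes each person is counted under a single language, and takes the pair's score to be the chance that one randomly chosen person from each country speak the same language. The speaker shares, language labels, and function names are all hypothetical; the diagonal is set to 1 by convention, as in the table.

```python
def shared_language_share(shares_a, shares_b):
    """Chance that a random person from each country share a language.

    Assumes each person is counted under exactly one language, so the
    shares within a country sum to at most 1 (an assumption, not sourced).
    """
    common = set(shares_a) & set(shares_b)
    return sum(shares_a[lang] * shares_b[lang] for lang in common)


def language_matrix(speaker_shares):
    """Build a symmetric country-by-country matrix keyed by (country, country)."""
    countries = sorted(speaker_shares)
    matrix = {}
    for a in countries:
        for b in countries:
            if a == b:
                matrix[(a, b)] = 1.0  # diagonal fixed at 1 by convention
            else:
                matrix[(a, b)] = shared_language_share(
                    speaker_shares[a], speaker_shares[b]
                )
    return countries, matrix


# Hypothetical speaker shares: fraction of each country's population per language.
speaker_shares = {
    "A": {"lang_x": 1.0},
    "B": {"lang_x": 0.5, "lang_y": 0.5},
    "C": {"lang_y": 0.2, "lang_z": 0.8},
}

countries, matrix = language_matrix(speaker_shares)
for a in countries:
    print(a, [round(matrix[(a, b)], 2) for b in countries])
```

Other definitions are possible, for example the share of the combined population that speaks any common language, and multilingual speakers would need separate handling. Whichever definition is chosen, it is symmetric by construction, which is why directionality cannot be measured in this network.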
A second option would be to map the network of translations between languages. This has been
done before, as shown in the image below. However, this method is less well suited to the
country-level comparison between internet flows and language that is intended in this study.
Details of the Network Analysis
The network analysis will focus on the following methods:
- Sub-group analysis: The first step for conducting the analysis will be to visually compare the distribution of the sub-groups in the two networks under analysis. Within the sub-group analysis, the relative strength of the groups will be measured using the I-E measures.
- Betweenness: Identifying countries/languages with high betweenness will be a key method for finding countries that are connectors within the network and that facilitate information sharing. On the other hand, high betweenness may also indicate that the connection between sub-groups depends too heavily on a limited number of countries, and is therefore a vulnerable connection.
- InDirection (in-degree): this will be used to identify countries whose online information is requested by many other countries. It should be noted that directionality will not be measured in the language network due to the symmetry of the data.
- OutDirection (out-degree): this will be used to identify countries that request online information from many other countries.
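The measures in the list above can be sketched on a toy directed network. This is a minimal illustration, not the tooling the analysis would necessarily use. It assumes an edge from X to Y means "X requests data from Y", reads InDirection and OutDirection as in-degree and out-degree, and takes the I-E measures to be the E-I index, (external ties - internal ties) / (all ties). Betweenness is computed with Brandes' algorithm on unweighted shortest paths. The four countries, their sub-groups, and the edges are hypothetical.

```python
from collections import deque

# Hypothetical network: requester -> set of countries it requests data from.
graph = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}
# Hypothetical sub-group membership (for example, a shared-language bloc).
group = {"A": 1, "B": 1, "C": 2, "D": 2}


def out_degree(graph):
    """Number of countries each country requests data from."""
    return {v: len(targets) for v, targets in graph.items()}


def in_degree(graph):
    """Number of countries requesting data from each country."""
    counts = {v: 0 for v in graph}
    for targets in graph.values():
        for w in targets:
            counts[w] += 1
    return counts


def betweenness(graph):
    """Unnormalised directed betweenness (Brandes' algorithm, unweighted)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        dist = {v: -1 for v in graph}   # -1 marks "not reached yet"
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                     # accumulate from the farthest node back
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


def ei_index(graph, group):
    """E-I index: +1 means all ties cross groups, -1 means all ties stay inside."""
    external = sum(1 for v in graph for w in graph[v] if group[v] != group[w])
    internal = sum(1 for v in graph for w in graph[v] if group[v] == group[w])
    return (external - internal) / (external + internal)


indeg, outdeg = in_degree(graph), out_degree(graph)
between = betweenness(graph)
ei = ei_index(graph, group)
print("in-degree:", indeg)
print("out-degree:", outdeg)
print("betweenness:", between)
print("E-I index:", round(ei, 3))
```

On this toy chain, B and C carry every shortest path between the two ends, which illustrates both readings of high betweenness: they are the connectors, and removing either one disconnects the sub-groups.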
Next Steps for Further Analysis
Core issues that are unlikely to be addressed by this
network analysis but that should be followed up in future analysis include:
- Test whether there is a correlation between the internet and language network maps and political divisions, such as by looking at treaty networks. Similarly, it would be interesting to measure the correlation with the real economy, such as by looking at trade flows and trade agreements.
- Identify key languages and countries that should be groomed to be connectors and diplomatic links between otherwise disconnected sub-groups. These will be countries that have high potential for betweenness.
- Investigate technology-based methods for reducing internet and language divisions. This might include investing in improved automated translation services between languages of critical interest.
1 comment:
I get that you want to look at shared languages, but I still don't see a network there. Looking at the directionality of flows is a start, but there's a bit of a "so what?" attached to that. And how exactly will you "measure how strongly the information network correlates to the network of shared languages between countries"? And what are the sub-groups you'll study? Trade and treaty nets? Etc.? You are using a bit of a shotgun approach, hoping you'll hit something.
This is going to take some work to get right, but you'll get there.