Monday, October 23, 2017

How Can Big Data Companies Working in Political Risk Benefit from SNA?


Situation

Political risk companies today use data mining to detect, more often and more accurately, events that might affect the stability of a region of interest. This involves looking at open-source information from Twitter, Facebook, websites, RSS feeds, etc.

These companies’ work is “event-centered”, i.e. the primary goal is to detect events that have occurred from posts by online sources, according to the model below. This means that most human and financial resources are spent looking for sources of information and improving the way this information is mined and organized.


Complication
At the heart of this process lies the assumption that the more often a piece of information is repeated by different sources, the more likely it is to be true. Although this quantitative approach allows us to look beyond political biases, today it can easily fall victim to bots, fake news and amplification.


Research Question
If we shift our attention from the information to the sources themselves, can we find ways of distinguishing between noise and sound?


Methodology:
  • Select an event or a set of events that have made a lot of “noise”, i.e. that have been mentioned by a large sample of sources (natural disasters, large-scale protests, terrorist attacks, etc.).
  • Using an existing company’s data mining algorithms, look for the following elements: 
  1. Name of Source
  2. Type of Source (Twitter, Facebook, website, etc.)
  3. Source Bias (based on keywords)
  4. Time of post (compared to time the event actually occurred)
  5. Link to other sources that have mentioned the same event
  6. Number of times the same source posted about the same event
  7. Source location (if possible)
  • Create a data set with the above information
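
As a rough illustration, one row of such a data set could be sketched as follows. The class name, field names and sample values are hypothetical, chosen only to mirror the seven elements listed above; they do not come from any company's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class SourceReport:
    """One source's reporting on one event (hypothetical schema)."""
    source_name: str                 # 1. name of source
    source_type: str                 # 2. twitter, facebook, website, ...
    bias: Optional[str]              # 3. keyword-based bias label, if any
    event_time: datetime             # when the event actually occurred
    post_time: datetime              # 4. when the source first posted
    linked_sources: List[str] = field(default_factory=list)  # 5. sources it links to
    post_count: int = 1              # 6. posts by this source about the event
    location: Optional[str] = None   # 7. source location, if known

    @property
    def delay_seconds(self) -> float:
        """Time elapsed between the event and the source's post."""
        return (self.post_time - self.event_time).total_seconds()


# Made-up sample data: two sources reporting the same event.
event_time = datetime(2017, 10, 1, 12, 0)
reports = [
    SourceReport("blog_a", "website", None, event_time,
                 datetime(2017, 10, 1, 12, 45), linked_sources=["wire"]),
    SourceReport("wire", "twitter", None, event_time,
                 datetime(2017, 10, 1, 12, 10)),
]

# Order sources from fastest to slowest reporter.
fastest_first = sorted(reports, key=lambda r: r.delay_seconds)
```

Storing the event time next to the post time keeps the reporting delay computable per row, which is what the time-based comparison of sources relies on.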

The Social Network Analysis
  • By creating a time-sensitive social network, one that orders sources according to the time they posted information about an event, we can identify those that are faster.
  • By combining the above network with betweenness measures, we can look at the shortest paths between the occurrence of an event and its detection by a source. Thus, companies can focus their attention on specific sources.
  • By looking at the information shared between sources (links, mentions, retweets) and combining it with centrality measures such as in-degree, out-degree and eigenvector centrality, we can identify the sources that are acting as emitters, relayers, influencers, etc.
  • Information timeliness and accuracy could be combined with the sources’ biases to determine who is amplifying a perspective and why.
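
A minimal sketch of the degree and betweenness measures mentioned above, using only the Python standard library. The source names and citation data are invented, and the edge direction (from the cited source to the source citing it, i.e. the direction information flows) is an assumption. Betweenness is computed with Brandes' algorithm on an unweighted directed graph and left unnormalized; eigenvector centrality is omitted here.

```python
from collections import deque

# Hypothetical data: each source mapped to the sources it links to or retweets.
citations = {"blog_a": ["wire"], "blog_b": ["wire"], "bot_1": ["blog_a"]}

# Information-flow graph: edge from the cited source to the citing source.
graph = {}
for citer, cited_list in citations.items():
    graph.setdefault(citer, [])
    for cited in cited_list:
        graph.setdefault(cited, []).append(citer)

# Out-degree: how many sources picked this one up (emitters score high).
out_degree = {n: len(succ) for n, succ in graph.items()}
# In-degree: how many sources this one draws on.
in_degree = {n: 0 for n in graph}
for succ in graph.values():
    for w in succ:
        in_degree[w] += 1


def betweenness(g):
    """Unnormalized betweenness centrality (Brandes) for a directed graph."""
    bc = {n: 0.0 for n in g}
    for s in g:
        stack = []
        preds = {n: [] for n in g}
        sigma = {n: 0 for n in g}   # number of shortest paths from s
        dist = {n: -1 for n in g}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in g[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in g}
        while stack:                # accumulate dependencies in reverse order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


relay_scores = betweenness(graph)
```

In this toy network, `wire` has the highest out-degree (an emitter), while `blog_a` is the only source with non-zero betweenness, since it sits on the path from `wire` to `bot_1` (a relayer).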
Outcome:
The hope is that the above Social Network Analysis will generate patterns of event reporting that will allow the company to allocate financial and human resources more efficiently by focusing on specific sources.

Shifting the organization’s focus from events to sources could pave the way for the development of new algorithms that could measure opinion on events, topics and even personalities.



I am taking the 2nd module of the course

1 comment:

Christopher Tunnard said...

We have discussed this, so you already know that I think it’s a good idea. The question seems well-formulated, and the data sources are clear. It seems to me that you would want to do a somewhat more sophisticated analysis than you outlined here, namely subgroup and clique analysis and perhaps something on triadic closure, which we can talk about.

Overall, this seems solid. Let’s hope you can get your hands on the data and agreement on project scope from the client.