Friday, June 5, 2015

Gold in the Data

With the rise of Facebook, social networks have once again become a focus of attention. Compared with traditional BBSs and blogs, a social network is a bridge linking the virtual world and the real world. In terms of classification, Facebook, Twitter, and LinkedIn represent three different kinds of social network. Facebook is built on strong ties between friends and helps people maintain and improve those friendships. Twitter is built on weak, one-way "follow" relationships, a structure that favors the emergence of opinion leaders and the spread of news. LinkedIn is a professional social network for business people that helps users draw on their social connections for business communication and recruitment.

All three kinds of network produce large amounts of user-generated content every day, on an unprecedented scale, which attracts numerous researchers hoping to discover valuable information in the disorderly data. It is like the familiar example from probability and statistics: when estimating the probability that a coin lands heads, it is hard to see any regularity in the results of a few tosses, but after tens of thousands of tosses it is easy to see that heads and tails occur almost equally often. The sheer scale of the social data these networks generate has attracted experts and scholars from computer science, psychology, sociology, and journalism and communication, who hope that more powerful social network analysis and processing capabilities will reveal things about people that have not yet been explored.
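The coin example can be tried directly. The following is only a minimal sketch: the function name `heads_fraction` and the fixed seed are my own assumptions, chosen so that the run is repeatable.

```python
import random


def heads_fraction(n_tosses, seed=0):
    """Return the fraction of heads in n_tosses simulated fair-coin tosses.

    The seed is an arbitrary choice that makes the run repeatable.
    """
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# With few tosses the fraction can be far from 0.5;
# with many tosses it settles close to 0.5.
for n in (10, 100, 10_000, 100_000):
    print(n, heads_fraction(n))
```

The same reasoning is why the volume of social data matters: patterns that are invisible in a handful of posts may become visible in millions of them.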

Social network analysis (SNA) offers many interesting research topics across a wide range of areas. Examples include identifying circles within a social network (community detection), calculating the influence of individuals in a network, modeling how information spreads through social networks, identifying false information and bots, and using social network information to forecast stock markets, elections, and infectious diseases. Social network analysis is an interdisciplinary field, so in our research we usually take basic conclusions and theories from sociology, psychology, and even medicine as a guide, and apply machine learning from artificial intelligence, graph-theory algorithms, and similar tools to simulate behavior on social networks and predict future trends.
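To make one of these topics concrete, here is a rough sketch of how the influence of individuals might be scored on a tiny "who follows whom" graph. It uses a PageRank-style iteration, which is just one possible influence measure and is my own choice for illustration. The user names and the graph are invented.

```python
def pagerank(follows, damping=0.85, iterations=100):
    """PageRank-style influence scores.

    follows maps each user to the list of users they follow, so an edge
    u -> v means "u follows v" and passes some of u's score on to v.
    """
    users = set(follows)
    for targets in follows.values():
        users.update(targets)
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in users}
        for u in users:
            targets = follows.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A user who follows nobody spreads their score evenly.
                share = damping * rank[u] / n
                for t in users:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Invented toy graph: three people follow dave, and dave follows ann.
follows = {
    "ann": ["dave"],
    "bob": ["dave"],
    "cat": ["dave", "ann"],
    "dave": ["ann"],
}
ranks = pagerank(follows)
for user in sorted(ranks, key=ranks.get, reverse=True):
    print(user, round(ranks[user], 3))
```

In this toy graph "dave" comes out on top because he has the most followers, and "ann" is close behind because she is followed by the most influential user.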

For me, the most interesting thing is using SNA to predict the future. Social networks attract hundreds of millions of people to post their own data, status, and moods on the Internet every day, and this data gives us the opportunity to discover things we need to know. For example, by monitoring public sentiment data on Twitter, researchers have found that public mood correlates strongly with many social phenomena and events. Some researchers found that expressions of emotion, whether positive emotions such as "hope" or negative ones such as "fear", herald a drop in the stock market index. These researchers believe that any sudden change in public mood on a social network reflects uncertainty that will show up in the stock market, so the signal can be used to predict the market's future direction.

However, we should not be too optimistic about our ability to make predictions from social network data. Such prediction rests on huge amounts of data, yet the algorithms for analyzing massive text data have not reached the ideal accuracy. In particular, judging mood from a text message seems a simple question, but in essence it lies at the intersection of natural language processing and the psychology of emotion. The main current methods of natural language processing rely on probability and statistics together with lexical and syntactic analysis. Sentiment classification of text likewise relies either on judgments based on a lexicon and grammatical structure or on machine learning. These methods struggle with language that is even slightly complicated, especially irony, and have difficulty determining the implied meaning. In addition, the people who use social networks do not fully represent the general population, because social network use varies greatly with age, region, ethnicity, and so on. Predictions that rely only on social network data are therefore likely to be biased, so scientific and effective sampling of the population is also a particularly important link in social network prediction.
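The irony problem is easy to see with a toy lexicon-based scorer. This is only a sketch under my own assumptions: the word lists `POSITIVE` and `NEGATIVE` are tiny, made-up examples rather than a real sentiment lexicon, and the scoring rule (positive words minus negative words) is the simplest one I could think of.

```python
import re

# Tiny invented word lists, for illustration only.
POSITIVE = {"great", "love", "hope", "good", "happy"}
NEGATIVE = {"bad", "fear", "hate", "sad", "terrible"}


def lexicon_score(text):
    """Count positive words minus negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


sincere = "I love this, what a great day"
ironic = "Oh great, my flight got cancelled again. Love that."

print(lexicon_score(sincere))  # 2
print(lexicon_score(ironic))   # 2, even though the writer is clearly unhappy
```

Both messages receive the same positive score, because the scorer only sees the words "great" and "love" and not the intent behind them.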

(Posted on behalf of Bevis Zhang)

1 comment:

Christopher Tunnard said...

OK, but this post was supposed to be about how you would use SNA to diagnose a problem that you're interested in, gather data, analyze, etc.