Thursday, June 2, 2016

Identifying High-Risk Depression Patients via the SNA Method

             Xin Li-MBA-2016



Problem/Challenge

Depression is a very common mental disorder. According to the WHO, approximately 350 million people across all age groups are diagnosed with depression. Depression is also one of the leading causes of disability. Furthermore, depression can even lead patients to suicide.
Even though there are effective therapies for depression, fewer than 50% of patients globally receive effective treatment. In many countries, fewer than 10% do. One of the main reasons is that identifying people at high risk of depression is very difficult. Therefore, finding a method to identify high-risk patients has significant meaning for public health and human wellness.
Social network analysis (SNA), together with the growth of social media and big data, provides a way to address this issue.

What data do I need? Easy/hard to get?

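Data needed: Name
Easy/hard to get: Easy
Reason why I need this data: A name is key identifying information for a subject.
How I will deal with the data (coding): Recorded as identifying information.

Data needed: Gender
Easy/hard to get: Easy
Reason why I need this data: Women are more likely to develop depression.
How I will deal with the data (coding): Weight females higher.

Data needed: Job title
Easy/hard to get: Easy
Reason why I need this data: Some high-pressure industries may carry a higher risk of depression.
How I will deal with the data (coding):
  High risk: 3
  Mid risk: 2
  Low risk: 1
  No risk: 0

Data needed: Country of residence
Easy/hard to get: Easy
Reason why I need this data: People in some cold-climate regions, such as Russia and Northern Europe, may be at higher risk. Political and national factors may also raise the risk of depression.
How I will deal with the data (coding): Add weight for high-risk countries.

Data needed: Historical posts
Easy/hard to get: Easy
Reason why I need this data: High-risk people tend to post angry or over-excited feelings on social media.
How I will deal with the data (coding): Filter keywords and rate them as a weight.

Data needed: Time spent online
Easy/hard to get: Easy
Reason why I need this data: People who spend long periods online are more likely to develop depression than those who spend less time online. Longer online time also means less sleep, which is a potential symptom of depression.
How I will deal with the data (coding):
  > 12h: 5
  8-12h: 4
  6-8h: 3
  4-6h: 2
  2-4h: 1
  0-2h: 0

Data needed: Response time and frequency of replies and likes/unlikes
Easy/hard to get: Easy
Reason why I need this data: The faster and more frequently a subject replies, likes, and unlikes, the higher the likelihood of depression.
How I will deal with the data (coding):
  Frequency = FR
  Response time = RT
  Higher FR/RT means higher risk.
  FR/RT percentile:
    99%: 3
    97%: 2
    68%: 1

Data needed: Family connections
Easy/hard to get: Easy and hard
Reason why I need this data: Genetic factors. If a subject has family connections who are identified as depression patients, the subject is at higher risk. The connection information is easy to get, but the depression patients' information is hard to get.
How I will deal with the data (coding): SNA method

Data needed: Employment status
Easy/hard to get: Easy
Reason why I need this data: Unemployed subjects are more likely to develop depression.
How I will deal with the data (coding):
  Employed: 0
  Unemployed: 1

Data needed: Heart disease
Easy/hard to get: Hard
Reason why I need this data: Heart disease patients are more likely to develop depression.
How I will deal with the data (coding):
  Yes: 1
  No: 0

Data needed: History of losing a family member or a serious accident
Easy/hard to get: Hard
Reason why I need this data: Losing a family member or suffering a serious accident can cause depression.
How I will deal with the data (coding):
  Yes: 1
  No: 0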
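
As a rough illustration of the codings listed above, here is a minimal Python sketch that turns a few of the attributes into one combined weight per subject. Several things here are my assumptions rather than part of the plan: the weights are simply added together, a boundary value such as exactly 8h goes into the higher band, and the names Subject, JOB_RISK, online_hours_score and combined_weight are made up for the example. Gender, country, post keywords and FR/RT are left out because no fixed numeric scale is given for them at the level of a single subject.

```python
from dataclasses import dataclass

# Job-title risk levels from the coding above.
JOB_RISK = {"high": 3, "mid": 2, "low": 1, "none": 0}


def online_hours_score(hours: float) -> int:
    """Map daily hours online to the 0-5 scale.

    Assumption: a boundary value (e.g. exactly 8h) falls into the higher band.
    """
    if hours > 12:
        return 5
    if hours >= 8:
        return 4
    if hours >= 6:
        return 3
    if hours >= 4:
        return 2
    if hours >= 2:
        return 1
    return 0


@dataclass
class Subject:
    name: str               # identifier only, not scored
    job_risk: str           # "high", "mid", "low" or "none"
    online_hours: float     # hours online per day
    employed: bool
    heart_disease: bool
    loss_or_accident: bool  # lost a family member or had a serious accident


def combined_weight(s: Subject) -> int:
    """Assumption: the combined weight is a plain sum of the coded values."""
    return (
        JOB_RISK[s.job_risk]
        + online_hours_score(s.online_hours)
        + (0 if s.employed else 1)
        + int(s.heart_disease)
        + int(s.loss_or_accident)
    )


# Hypothetical subject, for illustration only.
example = Subject("A", "high", 9, employed=False,
                  heart_disease=True, loss_or_accident=False)
example_weight = combined_weight(example)
```

A plain sum treats every factor as equally important per point; a real study would probably need weights fitted with professional input.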


What will be the most important network measures? What will the SNA help me do?

Once I have calculated all the weights, I will combine the weight information for each subject for a first round of analysis. I can mark highly weighted subjects, for example with a bigger node size, and analyze the social network via the SNA method.
If I find highly weighted subjects who have family connections with confirmed depression patients, those subjects might be at high risk.
The next step is to further investigate each risk factor of the selected subjects. Based on each subject's attributes, I will check his/her historical posts to evaluate further.
With healthcare professionals' help, we can identify high-risk subjects more scientifically and then proactively provide early intervention.
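
The first-round screening described above (highly weighted subjects who have a family connection to a confirmed patient) could be sketched as follows. This is only an illustration under my own assumptions: family connections are stored as undirected pairs, the cut-off for "highly weighted" is a threshold parameter I chose, and the subjects, weights and function name flag_high_risk are invented for the example.

```python
def flag_high_risk(weights, family_edges, confirmed, threshold):
    """Return subjects whose weight is at least `threshold` and who have
    a family connection to at least one confirmed depression patient."""
    # Build an undirected adjacency map from the family connections.
    neighbors = {}
    for a, b in family_edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    flagged = []
    for subject, weight in weights.items():
        if subject in confirmed:
            continue  # already identified, nothing to screen
        has_confirmed_relative = bool(neighbors.get(subject, set()) & confirmed)
        if weight >= threshold and has_confirmed_relative:
            flagged.append(subject)
    return sorted(flagged)


# Hypothetical data, for illustration only.
weights = {"A": 9, "B": 3, "C": 8, "D": 2}
family_edges = [("A", "D"), ("B", "D"), ("C", "B")]
confirmed = {"D"}

flagged = flag_high_risk(weights, family_edges, confirmed, threshold=7)
```

Here only A is flagged: B is related to the confirmed patient D but has a low weight, and C has a high weight but no confirmed relative.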

Let’s make the world happier!

1 comment:

Christopher Tunnard said...

You do a great job of laying out the data you need and evaluating its availability and how you'll deal with it. For an ordinary statistical analysis of a dependent variable, this would have been very good.

The problem is, you don't really have a network here; what you're doing is figuring out how to identify individual nodes but without any consideration (that I can see) for the network effect. You talk about calculating "all the weight," but it's unclear what that refers to, or means, and there are no network measurements mentioned other than "the SNA method," whatever that means.

You might have given some thought to what kind of networks could give researchers some insight into the incidence of depression. I'm not talking about individual node attributes, like the ones you mention, but some kind of co-occurrence (e.g. served in the military.)

You have done a really good job of laying out the data, as I said, but more consideration of the network aspects would have made an OK post much better.