Sunday, October 21, 2018

Integrating Social Network Analysis and Geospatial Analysis to Predict Latrine Ownership in India

1. Background
Sanitation is a key issue in developing countries. In 2015, 2.3 billion people lacked even basic sanitation services, and 892 million people worldwide still practiced open defecation (WHO and UNICEF, 2017). Without adequate sanitation, people contract diseases such as diarrhea by ingesting pathogens from untreated feces. Such diseases are estimated to cause 280,000 deaths every year (Prüss-Ustün et al., 2014).

Several factors affect whether a household owns a latrine (toilet). Previous studies have mainly focused on the effects of socio-economic conditions, financial conditions, and the dissemination of information about the health benefits of latrines. Fewer studies, however, have examined peer effects among households, that is, how one household's latrine ownership affects latrine ownership among neighboring households. In this research project, I therefore focus on clarifying how social interactions among households affect the decision to own a latrine.

Most research on this topic has been conducted by economists, who suggest that peer effects can be either positive or negative depending on the study area and experimental design (Pattanayak and Pfaff, 2009; Guiteras et al., 2015; Yishay et al., 2016). Shakya et al. (2015), by contrast, was the first study to use social network analysis to examine peer effects in latrine ownership. They showed that individuals are more likely to own latrines if their social contacts own latrines, and that this relationship is stronger among people of the same caste and the same education level and among those with stronger ties. In addition, individuals who are more central in the network are more likely to own latrines.

However, the analysis in Shakya et al. (2015) can be extended in two ways. First, they focused only on the social ties between households and did not consider the distance between households. By incorporating a geospatial dimension into the social network analysis, I attempt to predict interactions among households more precisely. Second, they did not consider how leadership affects the social contagion of latrine ownership. I will examine whether village leaders are more effective than other people at persuading others to own latrines.

2. Research Questions and Hypotheses
This research asks how a household's latrine ownership is affected by the latrine ownership of its social contacts and neighbors. Specifically, I focus on the following research questions.

1) Does an analysis using both social ties and distance between households reach the same conclusions as an analysis using social ties alone?

2) Does this combined method identify significant social effects that were found to be insignificant in Shakya et al. (2015) (e.g., social effects in the "Go to temple with" network)?

3) Are leaders more effective at influencing others' decisions to own a latrine?

For the first and second questions, I expect distance to be an additional factor in predicting how latrine ownership spreads across households. By combining distance with social ties, I expect to predict the social effects on latrine ownership more accurately.

For the third question, I hypothesize that leaders generally have more influence on others' decisions, but that a leader's influence on a household decreases as the distance between them increases.
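One possible way to formalize these hypotheses is sketched below. It is illustrative only: the decay function f and its parameter δ are assumptions, not something this proposal specifies. Let L_i ∈ {0, 1} indicate whether ego i owns a latrine, a_ij the tie value between i and alter j in a given network, d_ij the distance between their households, and X_i the control variables.

$$E_i = \sum_{j \neq i} a_{ij}\, f(d_{ij})\, L_j, \qquad f(d) = e^{-d/\delta}$$

$$L_i = \alpha + \beta E_i + \gamma^{\top} X_i + \varepsilon_i$$

Setting f ≡ 1 recovers a ties-only specification, so the first two questions amount to comparing the estimate of β with and without the distance weight. One way to express the third hypothesis is to split E_i into a sum over leader alters, E_i^lead, and a sum over non-leader alters, E_i^non, and to expect β_lead > β_non, with the decreasing f encoding that a distant leader exerts less influence than a nearby one.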

3. Data and Methods
I plan to use the social network data collected by Banerjee et al. (2013). Most of this dataset is publicly available on Harvard Dataverse, but the GPS coordinates of the households are not. I will apply for IRB approval and then request the GPS data from the authors of Banerjee et al. (2013).

The data cover 16,579 individuals from 6,811 households in 75 villages. There are 12 types of network data, each covering all individuals. The dataset also includes attribute data for every individual, for example whether the individual owns a latrine, has electricity, and is a leader. It also records each individual's caste, religion, language, and roof type, among other attributes.

In the analysis, I first compute the distance between every pair of households from the GPS data. I then combine the value of each network tie with the corresponding distance, using a weighting formula, to create new indices. Next, I run a regression to estimate the social effects on latrine ownership. The dependent variable is a dummy indicating whether an individual (ego) owns a latrine. The independent variables are interaction terms: the analogous dummy for whether a social contact (alter) owns a latrine, multiplied by the new indices. I also include caste, electricity, roof type, religion, and language as control variables. I then check whether the coefficient on the interaction term is statistically and economically significant. Finally, I repeat this regression for each type of network to capture heterogeneity in the results.
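The steps in the paragraph above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the author's method: the coordinates, ties, and attributes are synthetic; the "new index" is assumed to be the tie value multiplied by an exponential distance decay exp(-d / decay_km) with a hypothetical decay_km = 0.5; the alter interaction terms are summed over each ego's alters; and the regression is a household-level linear probability model with heteroskedasticity-robust (HC0) standard errors. Function names such as distance_weighted_ties are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat_deg, lon_deg):
    """Pairwise great-circle distances (km) between households from GPS coordinates."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))[:, None]
    lon = np.radians(np.asarray(lon_deg, dtype=float))[:, None]
    dlat = lat - lat.T
    dlon = lon - lon.T
    a = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(lat.T) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def distance_weighted_ties(adjacency, dist_km, decay_km=0.5):
    """Hypothetical index: each tie is down-weighted by exp(-distance / decay_km)."""
    return adjacency * np.exp(-dist_km / decay_km)


def ols_robust(y, X):
    """OLS with an intercept; returns coefficients and HC0 robust standard errors."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    xtx_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * resid[:, None] ** 2)
    return beta, np.sqrt(np.maximum(np.diag(xtx_inv @ meat @ xtx_inv), 0.0))


# Synthetic stand-in data (NOT the Banerjee et al. data)
rng = np.random.default_rng(0)
n = 200
lat = 12.9 + rng.normal(0, 0.01, n)  # arbitrary coordinates, roughly 1 km spread
lon = 77.6 + rng.normal(0, 0.01, n)
upper = np.triu(rng.random((n, n)) < 0.05, k=1)
adjacency = (upper | upper.T).astype(float)  # one undirected network type
latrine = (rng.random(n) < 0.4).astype(float)  # latrine ownership dummy
electricity = (rng.random(n) < 0.7).astype(float)
roof = (rng.random(n) < 0.5).astype(float)  # stand-in for a roof-type dummy

dist = haversine_km(lat, lon)
w = distance_weighted_ties(adjacency, dist)
# Sum over alters of (alter owns latrine) x (distance-weighted tie index)
exposure = w @ latrine

beta, se = ols_robust(latrine, np.column_stack([exposure, electricity, roof]))
print("exposure coefficient:", beta[1], "robust t-stat:", beta[1] / se[1])
```

With the real data, one would loop this over the 12 network types, add dummy-coded caste, religion, and language controls, and probably cluster standard errors by village. Setting decay_km to a very large value makes the weight close to 1, which approximates a ties-only specification for comparison.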

For the third research question, I will run this regression separately for leaders and for non-leaders, and then compare the results of the two regressions.
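A minimal sketch of the leader comparison, again with synthetic data and hypothetical names. The paragraph above does not say whether "leaders" refers to egos or alters; this sketch assumes the comparison is over alters, keeping only ties to leaders in one regression and only ties to non-leaders in the other, and reports the two exposure coefficients side by side. The weights matrix stands in for whatever distance-weighted tie index is finally chosen.

```python
import numpy as np


def ols_robust(y, X):
    """OLS with an intercept; returns coefficients and HC0 robust standard errors."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    xtx_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * resid[:, None] ** 2)
    return beta, np.sqrt(np.maximum(np.diag(xtx_inv @ meat @ xtx_inv), 0.0))


def compare_leader_alters(latrine, weights, is_leader, controls):
    """Hypothetical helper: run the exposure regression twice, once counting only
    ties to leader alters and once only ties to non-leader alters.
    Returns {group: (exposure coefficient, robust SE)}."""
    results = {}
    for group, keep in (("leader", is_leader), ("non_leader", ~is_leader)):
        # Zero out alters (columns) outside the group, then sum weight x alter latrine
        exposure = (weights * keep[None, :]) @ latrine
        beta, se = ols_robust(latrine, np.column_stack([exposure, controls]))
        results[group] = (beta[1], se[1])
    return results


# Synthetic stand-in data (NOT the Banerjee et al. data)
rng = np.random.default_rng(1)
n = 150
upper = np.triu(rng.random((n, n)) < 0.1, k=1)
ties = (upper | upper.T).astype(float)
weights = ties * rng.uniform(0.2, 1.0, (n, n))  # stand-in for distance-weighted ties
latrine = (rng.random(n) < 0.4).astype(float)
is_leader = rng.random(n) < 0.1
controls = (rng.random(n) < 0.7).astype(float)[:, None]  # e.g. an electricity dummy

for group, (b, s) in compare_leader_alters(latrine, weights, is_leader, controls).items():
    print(f"{group}: coefficient {b:.3f} (robust SE {s:.3f})")
```

Because both regressions use the same egos, their estimates are not independent, so a naive test on the difference between the two coefficients would be misleading. A formal test could instead put both exposure terms into a single regression and test whether their coefficients are equal.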

4. Limitation
Because the data used in this analysis are cross-sectional, the regressions can establish an association in latrine ownership among households but cannot show that these social effects are causal. Longitudinal data or a new experimental design will be needed to infer causality rigorously.

5. References
[1] Guiteras, R., Levinsohn, J., and Mobarak, A. M. (2015). Encouraging sanitation investment in the developing world: A cluster-randomized trial. Science, 348(6237), 903-906.
[2] Pattanayak, S. K., and Pfaff, A. (2009). Behavior, environment, and health in developing countries: Evaluation and valuation. Annual Review of Resource Economics, 1(1), 183-217.
[3] Prüss-Ustün, A., Bartram, J., Clasen, T., Colford, J. M., Cumming, O., Curtis, V., Bonjour, S., Dangour, A. D., De France, J., Fewtrell, L., Freeman, M. C., Gordon, B., Hunter, P. R., Johnston, R. B., Mathers, C., Mäusezahl, D., Medlicott, K., Neira, M., Stocks, M., Wolf, J., and Cairncross, S. (2014). Burden of disease from inadequate water, sanitation and hygiene in low- and middle-income settings: A retrospective analysis of data from 145 countries. Tropical Medicine & International Health, 19(8), 894-905.
[4] Shakya, H. B., Christakis, N. A., and Fowler, J. H. (2015). Social network predictors of latrine ownership. Social Science & Medicine, 125, 129-138.
[5] WHO and UNICEF (2017). Progress on drinking water, sanitation and hygiene: 2017 update and SDG baselines.
[6] Yishay, A. B., Fraker, A., Guiteras, R., Palloni, G., Shah, N. B., Shirrell, S., and Wang, P. (2016). Microcredit and willingness to pay for environmental quality: Evidence from a randomized-controlled trial of finance for sanitation in rural Cambodia. Journal of Environmental Economics and Management.


I am taking the second module of the course.

1 comment:

Christopher Tunnard said...

This is an interesting topic, and you are fortunate to have access to a rich supply of data, to be enhanced by the geo-spatial data you plan to add. I have some concerns: you are quite clear on how you can run regressions to look for significant factors, but this is supposed to be an exercise in social network analysis, not just statistical methods. I'm disappointed to see only regression analysis mentioned.

Perhaps you mean that you will do SNA when you say that you will "integrate the value of network ties with distance by certain formulas to create new indices?" If so, it's not clear. Also, I would hope that you'll use SNA to determine "how leadership affects the social contagion of latrine ownership." It is by doing this that you can infer the network effect of leadership, in addition perhaps to using regressions. All to be discussed.