Friday, October 21, 2016

Discovering the network behind the founder’s myth

Bin Feng Zheng (Currently not taking the second module)

Every great business has a great story and a founder who speaks to their values.  We have come to learn a great deal about these founders, specifically the personal traits that led them on their journey.  I am interested in the social network behind these founders’ myths.  I propose that success is a product of the social network around you and that the right social network can predict success.

Taking a step back, it is important to analyze the genesis point at which business ideas become products and products become wildly successful.  My specific focus will be on the commercial drone sector based on the West Coast of the United States.  The commercial drone industry is at a critical moment in its young history.  Hardware, software, regulation and capital are all aligning to potentially make drones the enabling technology of the future.  It is also a wide-open industry with no dominant players.  Thus, the networks within the industry may well be forming.  I want to focus on the West Coast to provide a geographical limit, but also to recognize the considerable advantage of access to Silicon Valley and venture capital.

Social Network Question/Research Question
This is really a test case about social networks around ideas.  I want to see if social networks have an impact on ideas becoming products in the business world.  One approach would be to look at historical data to tease out connections.  However, I think it would be much more interesting to examine networks that are still forming, to project and forecast where the industry is heading in terms of its social relations.  In this case, the context is the commercial and business drone sector on the West Coast.

The social network questions I will be asking are the following:
1.     Does a network exist?  Can we tease one out?
2.     What are the attributes of those in the industry?  How are they connecting?
3.     Who are the established and emerging leaders?
4.     Which companies are best positioned to leverage the social networks of established and emerging leaders?
5.     Do extra-company networks exist?

Additionally, this is an approach to researching an industry.  It could be important in terms of corporate intelligence. For the novice, it is a fun way to learn what is going on. 

An analogy for what we’re looking for here is the rolodex of contacts.  One collects cards through previous ties or from being at the same companies.  We’ll be looking at many rolodexes and looking for the network within that ecosystem.

My hypothesis is that such networks exist within the industry, within companies and across companies.  Additionally, the networks will congeal around certain brokers and emerging leaders who, whether or not they know it themselves right now, will go on to dominate the industry.  My goal is to identify them.

I expect the networks to be dominated by weak ties and be relatively insular in terms of education and work experience.  In fact, it is entirely likely that a common work experience is the key past link of many key actors--overall, wide ranging and distributed networks but with cliques.  In the end, leaders are those who can overcome weak ties and build more lasting relations.  Additionally, it would be interesting to look for well-connected actors, who by other measures are considered outliers. 

Data for this network analysis will be a challenge.  It would be impractical and almost impossible to conduct a network survey with a defined group of individuals in any sector.  However, I do think there is open-sourced information that can provide a creative solution. 

We’re going to build an attribute dataset of individuals in the industry, using publicly available information from LinkedIn profiles and company websites.  These are the steps:

1.     Identify target companies, limiting to West Coast-based and industry specific. 
2.     Comb their staff biography pages
3.     Search on LinkedIn for sector, region (West Coast), and keywords.
4.     Build an attribute data set of all relevant individuals—with the understanding that data will be incomplete in some columns.

Past 3 Companies (each company will be coded differently so this list may grow really large)
Connection to Tech Lab (each tech lab will be coded differently)
Any additional LinkedIn profile information that may be interesting
(This list will be modified after a more comprehensive preliminary search.)

Target population: From 100 individuals to upwards of 500 or 1000, depending on resources and free time.   If total number is smaller, there will be a preference for leadership roles at company over rank and file.

Making a One-Mode Dataset (with limitation)
With the information we pulled from the Internet, we can be creative in building a useful One-Mode dataset.  Here’s how we will do it:

Through the attribute dataset, we should have the following information relevant to the target population: Education/School, Age Range, and Concentration or Field of Study, in addition to company work history.  Each selection within an attribute category would have a distinct value coding.  We’ll translate these attributes into assumptions about ties.  If two individuals share one of these data points, then they are considered to have a tie valued at 1.  If they share two, then they are considered to have a tie valued at 2; three shares make a tie valued at 3 and four shares a tie valued at 4.  Here is a breakdown of what it might look like:

1.     If an individual went to the same educational institution as another individual, they share an undirected weak tie (valued at 1).
2.     If an individual went to the same educational institution and is within the same age range as another individual, they share an undirected medium tie (valued at 2).
3.     If an individual went to the same educational institution, is within the same age range, and shares a similar field of study with another individual, then they share an undirected strong tie (valued at 3).
4.     If an individual went to the same educational institution, is within the same age range, shares a similar field of study, and currently works or has worked at the same company as another individual, then they share an undirected very strong tie (valued at 4).
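
The valuation rule above can be sketched in a few lines of Python. This is only an illustration, not the workflow of the proposal (which relies on Excel): the names, schools, fields and companies are made-up placeholders, the record layout is an assumption, and the sketch follows the "count the shared data points" reading of the rule rather than the strictly nested breakdown. Missing values (coded as None) never count as a match.

```python
from itertools import combinations

# Hypothetical attribute records; every value here is a placeholder.
people = {
    "ana": {"school": "S1", "age_range": "30-39", "field": "robotics", "companies": {"C1", "C2"}},
    "ben": {"school": "S1", "age_range": "30-39", "field": "robotics", "companies": {"C2"}},
    "cho": {"school": "S1", "age_range": "40-49", "field": "law", "companies": {"C3"}},
    "dev": {"school": None, "age_range": "30-39", "field": "robotics", "companies": {"C4"}},
}

def tie_value(p, q):
    """Count how many of the four data points two records share (0 to 4)."""
    shared = 0
    for key in ("school", "age_range", "field"):
        # Incomplete data (None) is never treated as a shared attribute.
        if p[key] is not None and p[key] == q[key]:
            shared += 1
    if p["companies"] & q["companies"]:
        shared += 1
    return shared

def build_one_mode(records):
    """Return a symmetric actor-by-actor matrix of valued, undirected ties."""
    names = sorted(records)
    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        matrix[a][b] = matrix[b][a] = tie_value(records[a], records[b])
    return matrix

one_mode = build_one_mode(people)
for name, row in one_mode.items():
    print(name, row)
```

A nested version of the rule (school first, then age range, and so on) would only need a different tie_value function; the matrix-building step stays the same.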

A note on preparing and cleaning this dataset: Obviously, it would be challenging.  But one way to do so, after the attribute dataset has been collected, is to use Excel’s transpose, copy and paste functions to get the relevant columns next to each other and collate their data points.

Two-Mode Dataset:  
It would also be interesting to analyze individuals who have been employed at multiple companies.  This type of experience would represent invaluable institutional knowledge.
It would be valuable to look at companies that have accumulated individuals with multi-company experience and at the network of individuals who have such experience.  Specific SNA techniques will be elaborated on later.

Creating a Two-Mode dataset:
This too will require some creativity.  From the attribute dataset, we have a list of individuals as well as their last three companies of employment.  From it we can extract the list of individuals and a list of the most common companies in the individuals’ work experience.  Thus, our Two-Mode dataset will be a matrix of individuals and companies (as defined by work experience).  The values of ties would be binary: 0 for no work experience at a company, and 1 for work experience at that company.
Some limitations to keep in mind before analysis:
1.     Each individual would only be tied to three companies maximum, given that we’re only accounting for the top three companies in the Attribute dataset.  We could certainly try to get a cumulative list of every company individuals have claimed to work for; however, that could get prohibitively difficult. 
2.     If, for example, we take the top twenty companies most frequently listed by individuals in their work history, each company could have anywhere from 1 to many ties.  This is the component that will allow for further analysis.  We could certainly attempt a more complete list of companies but for leadership network analysis purposes, the top ones will probably provide a sufficient network.
3.     The Two-Mode dataset will allow us to look at companies with many individuals but not necessarily individuals with many companies.  Further Social Network analysis would need to be employed for those insights.
4.     The same caveat for the One-Mode dataset applies here.  Given that this network data is not based on a survey or network questionnaire, it would be a close approximation of a network, not the actual network.
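
As a rough sketch of the matrix described above, the following Python builds a binary person-by-company table restricted to the most frequently listed companies. The work histories are invented, and the function name and the top_n cutoff are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical work histories: person -> up to three past companies.
work_history = {
    "ana": ["C1", "C2", "C3"],
    "ben": ["C2", "C3"],
    "cho": ["C3"],
    "dev": ["C4", "C1"],
}

def build_two_mode(history, top_n):
    """Return (top companies, binary person-by-company matrix)."""
    counts = Counter(c for companies in history.values() for c in set(companies))
    # Most frequently listed first; ties broken alphabetically for stability.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    top = [company for company, _ in ranked[:top_n]]
    matrix = {
        person: {company: int(company in companies) for company in top}
        for person, companies in history.items()
    }
    return top, matrix

top_companies, two_mode = build_two_mode(work_history, top_n=3)
print(top_companies)
for person, row in two_mode.items():
    print(person, row)
```

Note how the cutoff drops company C4 entirely, which is the trade-off described in limitation 2.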

Social Network Analysis/Methodology
There are three datasets for analysis: a master attribute dataset, a One-Mode actor-by-actor network dataset and a Two-Mode actor-by-company network dataset.

Network Cohesion Measures of One-Mode Network dataset: This analysis will provide density and centralization measures of the network in the One-Mode dataset.  It is possible that no network exists at all and we get a set of islands built around companies.  The visualization through NetDraw will provide a good sense of where we’re headed.  In addition, visualizing the data at different tie strength values will help decide a good dichotomization threshold.
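
To make the two cohesion measures concrete, here is a minimal Python sketch computing density and Freeman degree centralization at different thresholds. The toy tie values and helper names are assumptions, and the sketch is a stand-in for whatever SNA software is actually used. The centralization formula needs at least three actors.

```python
def from_edges(names, edges):
    """Build a symmetric valued matrix from (actor, actor, value) triples."""
    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b, value in edges:
        matrix[a][b] = matrix[b][a] = value
    return matrix

def dichotomize(valued, threshold):
    """Recode ties: 1 if value >= threshold, else 0 (diagonal stays 0)."""
    return {a: {b: int(a != b and v >= threshold) for b, v in row.items()}
            for a, row in valued.items()}

def density(binary):
    """Share of possible undirected ties that are present."""
    n = len(binary)
    ones = sum(sum(row.values()) for row in binary.values())  # each tie counted twice
    return ones / (n * (n - 1))

def degree_centralization(binary):
    """Freeman degree centralization: 0 for an even network, 1 for a star."""
    n = len(binary)
    degrees = [sum(row.values()) for row in binary.values()]
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))

# Hypothetical valued ties between four actors.
valued = from_edges(["a", "b", "c", "d"],
                    [("a", "b", 4), ("a", "c", 3), ("a", "d", 2), ("b", "c", 1)])

for threshold in (1, 2, 3):
    binary = dichotomize(valued, threshold)
    print(threshold, round(density(binary), 3), round(degree_centralization(binary), 3))
```

Printing the measures at several thresholds is a numeric counterpart to visualizing the network at different tie strengths.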

One-Mode Network and Attribute datasets:
1.     Dichotomize One-Mode Dataset at tie strength greater than or equal to 3. The One-Mode dataset is now binary, 1 for having ties, and 0 for no ties.
2.     Add the Attribute dataset.
3.     Using E-I Index analysis, compute a homophily score for each of the attributes listed.

This analysis will give a sense of the network’s diversity and whether individuals have ties based on shared attributes.  A negative E-I index score indicates that ties fall mostly within the same attribute category.  Alternatively, if the E-I index score is positive, it indicates that individuals (who may have been a homogeneous group in the first place) are not connecting based on similar attributes, which in itself is a valuable insight for evaluating hidden networks.  It would suggest the possibility of a network based on merit rather than, for example, an old boys’ club.
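
For reference, the E-I index is (E - I) / (E + I), where E is the number of ties between actors in different attribute categories and I is the number of ties within the same category. A minimal Python sketch follows; the edge list and school codes are hypothetical, and skipping ties with missing attribute data is an assumption of this sketch.

```python
def ei_index(edges, attribute):
    """E-I index for an undirected binary network and one attribute.

    Returns a value from -1 (all ties internal) to +1 (all ties external),
    or None when no tie has attribute data on both ends.
    """
    external = internal = 0
    for a, b in edges:
        if attribute.get(a) is None or attribute.get(b) is None:
            continue  # incomplete attribute data: tie is left out
        if attribute[a] == attribute[b]:
            internal += 1
        else:
            external += 1
    total = external + internal
    return None if total == 0 else (external - internal) / total

# Hypothetical dichotomized ties and school codes.
edges = [("ana", "ben"), ("ana", "cho"), ("ben", "cho"), ("cho", "dev"), ("ana", "dev")]
school = {"ana": "S1", "ben": "S1", "cho": "S2", "dev": "S2"}

print(ei_index(edges, school))
```

In this toy network three of the five ties cross school boundaries, so the score comes out slightly positive.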

Looking for subgroups using Components, Factions, Girvan-Newman and clique analyses:  Once a network visual has been established, dichotomized and separated from isolates, we can use Components, Factions, Girvan-Newman and clique analyses to identify subgroups based on the number of inward ties.  These analyses can lead to subgroups that are not obvious from the homophily analysis.  Alternatively, we can also compare the tie-based subgroups with the subgroups based on attributes to see if there are overlaps.  Clique overlap analysis will identify the most involved individuals.  The two layers of analysis will add nuance to our interpretation of the data.  Overall, the factions in the network should be interpreted as the groups of most inter-connected individuals in the drone sector.
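
Two of these subgroup techniques, components and cliques with clique overlap, are simple enough to sketch in plain Python. The toy network is hypothetical; Factions and Girvan-Newman are not shown here and would normally come from dedicated SNA software. The clique search uses the basic Bron-Kerbosch procedure without pivoting.

```python
from collections import Counter

# Hypothetical binary, undirected network as an adjacency mapping.
adj = {
    "a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"}, "d": {"b", "c"},
    "e": {"f"}, "f": {"e"},
    "g": set(),  # an isolate
}

def components(adjacency):
    """Return the connected components as a list of sets."""
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(group)
    return groups

def maximal_cliques(adjacency, min_size=3):
    """Return all maximal cliques with at least min_size members."""
    found = []
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            if len(clique) >= min_size:
                found.append(frozenset(clique))
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & adjacency[v], excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}
    expand(set(), set(adjacency), set())
    return found

groups = components(adj)
cliques = maximal_cliques(adj)
# Clique overlap: how many cliques each actor belongs to.
membership = Counter(actor for clique in cliques for actor in clique)
print(groups)
print(cliques)
print(membership.most_common())
```

Actors that appear in several cliques are the "most involved individuals" referred to above.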

Centrality Measures and Egonet of brokers and emerging leaders: The One-Mode Network dataset at this point would be binary and undirected.  Using centrality measures can help us positively identify the most well-placed network leaders, emerging leaders and brokers (individuals who straddle various networks).  Egonet will allow us to examine the network of these specific leaders.  Our goal at this point in the analysis is to identify the promising leads for further investigation and comparison. 
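
As an illustration of how a broker shows up in the numbers, the sketch below computes degree and betweenness centrality (using Brandes' algorithm) and extracts an ego network. The network, in which "d" bridges two otherwise separate triangles, is invented for the example, and the helper names are assumptions.

```python
from collections import deque

# Hypothetical network: "d" bridges the groups {a, b, c} and {e, f, g}.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d", "f", "g"}, "f": {"e", "g"}, "g": {"e", "f"},
}

def degree(adjacency):
    """Number of direct ties per actor."""
    return {v: len(neighbors) for v, neighbors in adjacency.items()}

def betweenness(adjacency):
    """Unnormalized betweenness for an undirected, unweighted network."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        paths = {v: 0 for v in adjacency}  # number of shortest paths from source
        paths[source] = 1
        dist = {source: 0}
        preds = {v: [] for v in adjacency}
        order, queue = [], deque([source])
        while queue:  # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):  # accumulate dependencies back toward source
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return {v: s / 2 for v, s in score.items()}  # each pair was counted twice

def ego_network(adjacency, ego):
    """The ego, its direct contacts, and the ties among them."""
    members = {ego} | adjacency[ego]
    return {v: adjacency[v] & members for v in members}

bet = betweenness(adj)
print(degree(adj))
print(bet)
print(ego_network(adj, "d"))
```

Here "d" has the lowest degree among the connected actors but the highest betweenness, which is the kind of well-placed outlier the analysis is meant to surface.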

Source of Comparison in the Two-Mode Dataset:
1.     The Two-Mode Dataset is currently a binary dataset with 1 for a tie with a company and 0 for no tie.  By converting the dataset from a Two-Mode matrix to a One-Mode matrix of actors by actors, we’ll be able to analyze a valued dataset of actors who are connected through overlapping company experience.  You can choose the tie strength at which it makes sense to dichotomize the data.  One suggestion would be at greater than or equal to 1, since the network could be severely limited at this point.  Once the dataset is binary, we can run Factions, Girvan-Newman and clique analyses, centrality measures, and then Egonet.  The analyses can be interpreted in the following way:
a.     In the valued dataset, actors with high values can be considered individuals with a high reserve of industry and cross-institutional knowledge.  It would be great to collect these individuals into a list.
b.     In the binary dataset, one can identify the individuals who are most connected with others through overlapping company experiences.  We are looking for the leaders, emerging leaders and brokers with this network data too.
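
The conversion from Two-Mode to One-Mode described above amounts to counting, for each pair of actors, the companies they have in common. A minimal Python sketch, with a made-up person-by-company matrix and assumed function names:

```python
# Hypothetical binary Two-Mode data: person -> {company: 0 or 1}.
two_mode = {
    "ana": {"C1": 1, "C2": 1, "C3": 1},
    "ben": {"C1": 0, "C2": 1, "C3": 1},
    "cho": {"C1": 0, "C2": 0, "C3": 1},
    "dev": {"C1": 1, "C2": 0, "C3": 0},
}

def project_actors(matrix):
    """Actor-by-actor matrix valued by the number of shared companies."""
    actors = sorted(matrix)
    return {
        a: {b: 0 if a == b else sum(matrix[a][c] * matrix[b][c] for c in matrix[a])
            for b in actors}
        for a in actors
    }

def dichotomize(valued, threshold=1):
    """Recode ties: 1 if value >= threshold, else 0."""
    return {a: {b: int(v >= threshold) for b, v in row.items()}
            for a, row in valued.items()}

shared = project_actors(two_mode)
binary = dichotomize(shared, threshold=1)

# Valued view: total overlap per actor, highest first.
overlap = sorted(((sum(row.values()), name) for name, row in shared.items()), reverse=True)
print(overlap)
print(binary)
```

The sorted overlap list corresponds to interpretation (a), and the binary matrix is the input for the subgroup and centrality analyses in (b).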

At this point, we have two lists of emerging leaders and brokers: one from the first One-Mode Network dataset, which approximates possible ties within the industry, and a second from a separate One-Mode Network of actors based on shared company experiences.  To reiterate, the former indicates ties inferred from shared attributes; the latter indicates ties based on having worked at the same company.  It would be insightful to compare the two sets of emerging leaders and brokers.

Adding the two One-Mode datasets to see if there are overlaps:
1.     We can add the two One-Mode, binary datasets to create a value dataset with the following distinction:
a.     Coding 1 to stand for the ties in the first One-Mode Network
b.     Recoding the ties in the second One-Mode Network from 1 to 2 (in Excel) so that, after the datasets are merged, ties that exist only in this network are valued at 2 in the new valued dataset.
c.     Adding the two network matrices so that those who share both types of ties are valued at 3.
2.     At this point, we can dichotomize the new valued One-Mode dataset at tie strength greater than or equal to 3 to produce a binary dataset with which we can perform the previous leadership analysis, including Egonet.
a.      We will have clearly identified leaders, emerging leaders and brokers.  You can trace backward through the previous networks and examine their Egonets for more insight.
b.     We will also need to refer to context and industry information to evaluate if these names make sense.
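
The coding scheme in step 1 can be checked with a short Python sketch. The two binary matrices are invented, and the function names are assumptions; the point is only that coding the first network as 1 and the second as 2 makes every sum unambiguous.

```python
# Hypothetical binary One-Mode networks over the same actors.
attribute_ties = {
    "ana": {"ana": 0, "ben": 1, "cho": 1},
    "ben": {"ana": 1, "ben": 0, "cho": 0},
    "cho": {"ana": 1, "ben": 0, "cho": 0},
}
company_ties = {
    "ana": {"ana": 0, "ben": 1, "cho": 0},
    "ben": {"ana": 1, "ben": 0, "cho": 1},
    "cho": {"ana": 0, "ben": 1, "cho": 0},
}

def combine(first, second):
    """Sum the networks with the second recoded to 2.

    Resulting values: 0 = no tie, 1 = first network only,
    2 = second network only, 3 = tie present in both.
    """
    return {a: {b: first[a][b] + 2 * second[a][b] for b in first[a]} for a in first}

def dichotomize(valued, threshold):
    """Recode ties: 1 if value >= threshold, else 0."""
    return {a: {b: int(v >= threshold) for b, v in row.items()}
            for a, row in valued.items()}

combined = combine(attribute_ties, company_ties)
both = dichotomize(combined, threshold=3)
print(combined)
print(both)
```

Only the pair tied in both networks survives the threshold of 3, which is the overlap the leadership analysis is then run on.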

Revisiting the Two-Mode Dataset for company analysis:
1.     It would also be interesting to look at which companies have the most connections via shared employees.  This would represent a flow of information, institutional knowledge, and expertise from one company to the other, without specifying in which direction.  We can identify the companies with the highest number of employees who have the most connections, to suggest that these companies are best positioned to succeed moving forward, if not already because they can channel their employees’ networks.
2.     Cross-referencing this list of companies with the list of leaders, emerging leaders and brokers will provide a single indicator of companies that are well placed in the industry.
3.     Our hypothesis is that companies that employ leaders, emerging leaders, and brokers AND their Egonets may be the most well-tuned to succeed.  
4.     We also want to look for outliers.  These are companies and individuals who perhaps, have their own network and need one bridge to tap into a larger network.  It would be worth paying attention to them moving forward in monitoring the industry. 
5.     Lastly, it is important to return to the context analysis to evaluate if these findings make sense.  Some of the results will be obvious, others not so much.
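
The company-level view in step 1 is the other projection of the Two-Mode data: for each pair of companies, count the individuals who have worked at both. A minimal Python sketch with invented data and assumed function names:

```python
# Hypothetical binary Two-Mode data: person -> {company: 0 or 1}.
two_mode = {
    "ana": {"C1": 1, "C2": 1, "C3": 1},
    "ben": {"C1": 0, "C2": 1, "C3": 1},
    "cho": {"C1": 0, "C2": 0, "C3": 1},
    "dev": {"C1": 1, "C2": 0, "C3": 0},
}

def project_companies(matrix):
    """Company-by-company matrix valued by the number of shared employees."""
    companies = sorted({c for row in matrix.values() for c in row})
    shared = {c: {d: 0 for d in companies} for c in companies}
    for row in matrix.values():
        held = [c for c in companies if row.get(c)]
        for c in held:
            for d in held:
                if c != d:
                    shared[c][d] += 1  # undirected: no direction of flow implied
    return shared

company_net = project_companies(two_mode)

# Rank companies by their total number of shared-employee connections.
ranking = sorted(((sum(row.values()), name) for name, row in company_net.items()),
                 reverse=True)
print(company_net)
print(ranking)
```

The ranking is only a starting list; it still has to be cross-referenced with the leaders and brokers and checked against industry context, as the steps above describe.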


Using SNA, we would have identified emerging leaders in the commercial drone sector by selecting for the networks around them, identified companies that employ these well-connected leaders, and identified companies that may be in a position to generate a founder’s myth of their own.

1 comment:

Christopher Tunnard said...

What an intriguing idea: using SNA to test the validity (if that's the right word) of the "founder's myth." You've thought this through well, except for one thing: the network question. Yes, you have some good ideas about creating nets out of two-mode data, or attributes. But with such a large sample, I would think you could find some affective connections, like actual or aspirational collaborations ("who would you like to work with?"), on the personal level. It would seem to me that these personal connections would be very helpful, if not necessary, to support your hypothesis about the myth, no? My point is that myth-spreading is done from personal relationships, not necessarily between those who have common attributes.

But this is all in the nice-to-have category. You could come up with interesting conclusions from the data and approaches you've so nicely described.