UVM Theses and Dissertations
Format:
Print
Author:
Bliss, Catherine Anne
Dept./Program:
Mathematics and Statistics
Year:
2014
Degree:
PhD
Abstract:
Complex networks underlie a variety of social, biological, physical, and virtual systems. Understanding the topology of networks, the manner in which agents interact, and the evolutionary dynamics of the system can be challenging, both computationally and theoretically. In many settings, network data is incomplete; it is impossible to observe all nodes and all network interactions due to sampling constraints in large datasets or covert interactions between agents.
As both a test of our general methods and as a problem of scientific interest in itself, we focus our attention on over 100 million tweets from the microblogging service Twitter authored between September 2008 and February 2009. This dataset accounts for approximately 30% of all tweets authored in this timespan. The goals of our analysis are threefold: to develop a construction of social networks from replies and reciprocated replies, to predict future links in a way that elucidates evolutionary dynamics, and to scale global statistics of sampled network data to account for incomplete and missing observations.
We begin by defining Twitter reciprocal reply networks and examine the revealed social network structure and dynamics over the time scales of days, weeks, and months. At the level of user behavior, we employ our hedonometric analysis methods to investigate patterns of sentiment expression. We find users' average happiness scores to be positively and significantly correlated with those of users one, two, and three links away. We strengthen our analysis by proposing and using a null model to test the effect of network topology on the assortativity of happiness. We also find evidence that better-connected users write happier status updates, with a transition occurring around Dunbar's number. Second, we use an evolutionary algorithm to optimize the weights in a linear combination of sixteen neighborhood and node similarity indices used to predict future links. Our method exhibits fast convergence and high levels of precision for the top twenty predicted links. Based on our findings, we suggest possible factors which may be driving the evolution of Twitter reciprocal reply networks.
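As a rough illustration of scoring candidate links with a weighted linear combination of similarity indices, the sketch below combines three standard indices (common neighbors, Jaccard, and Adamic-Adar) rather than the sixteen used in the thesis. The function names, the choice of indices, and the fixed weights are assumptions made for illustration only; the abstract does not list the indices, and the evolutionary optimization of the weights is not reproduced here.

```python
import math
from itertools import combinations

# adj maps each node to the set of its neighbors (undirected graph).

def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return float(len(adj[u] & adj[v]))

def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    """Shared neighbors, each down-weighted by the log of its degree.
    A shared neighbor of two distinct nodes has degree >= 2, so log > 0."""
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])

# Hypothetical index set; the thesis uses sixteen indices not named in the abstract.
INDICES = (common_neighbors, jaccard, adamic_adar)

def combined_score(adj, u, v, weights):
    """Linear combination of the similarity indices for the pair (u, v)."""
    return sum(w * index(adj, u, v) for w, index in zip(weights, INDICES))

def top_k_links(adj, weights, k):
    """Return the k highest-scoring node pairs that are not yet linked."""
    candidates = [
        (u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]
    ]
    candidates.sort(
        key=lambda pair: combined_score(adj, pair[0], pair[1], weights),
        reverse=True,
    )
    return candidates[:k]
```

In this sketch the weights are supplied by hand; an optimizer of any kind could search over the weight vector by scoring the top-k predictions against links observed in a later time window.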
Lastly, we acknowledge that our dataset is incomplete and explore how global network statistics scale with missing data in a variety of sampling regimes. We propose scaling methods to predict true network parameters from only partial knowledge of nodes, links, or weighted interactions. We validate our analytical results with four classes of simulated networks (Erdős-Rényi, scale-free, small-world, and range-dependent) and six empirical data sets. To overcome limitations due to sampling tweets, we apply our developed methods to Twitter reply networks and suggest a characterization of the Twitter interactome for this time period.
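To make the idea of rescaling sampled statistics concrete, the sketch below covers only the simplest regime: each node is kept independently with a known probability q and the induced subgraph is observed. Under that assumption an edge survives only if both endpoints do, so the expected observed counts are qN nodes and q^2 M edges, and the mean degree shrinks by a factor of q. The function names and the uniform node-sampling assumption are illustrative; they are not the scaling methods developed in the thesis, which also treats link and weighted-interaction sampling.

```python
def rescale_node_count(n_observed, q):
    """Estimate the true node count when each node is kept with probability q."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    return n_observed / q

def rescale_edge_count(m_observed, q):
    """Estimate the true edge count from an induced subgraph.
    An edge is observed only if both endpoints are kept, hence the q**2."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    return m_observed / q ** 2

def rescale_mean_degree(k_observed, q):
    """Estimate the true mean degree; each neighbor is kept with probability q."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    return k_observed / q

# Example with hypothetical observed counts and q = 0.5.
n_hat = rescale_node_count(50, 0.5)
m_hat = rescale_edge_count(25, 0.5)
k_hat = rescale_mean_degree(2 * 25 / 50, 0.5)
```

The three estimates are mutually consistent: the rescaled mean degree equals twice the rescaled edge count divided by the rescaled node count.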