UVM Theses and Dissertations
Format:
Print
Author:
Fytilis, Nikolaos
Dept./Program:
Civil and Environmental Engineering
Year:
2014
Degree:
PhD
Abstract:
Organizing or clustering data into natural groups is one of the most fundamental aspects of understanding and mining information. The recent explosion in sensor networks and data storage associated with hydrological monitoring has created a huge potential for automating data analysis and classification of large, high-dimensional data sets. In this work, we develop a new classification tool that couples a Naive Bayesian classifier with a clustering artificial neural network (specifically, a Kohonen Self-Organizing Map (SOM)) that reduces classification error by minimizing within-class variance. Our primary motivation is the reduction of uncertainty, while leveraging prior information/evidence embedded in multiple data types and maintaining simplicity of implementation. We focus on the construction of statistical models driven by field-measured data rather than physical laws. We explore the applicability of this new SOM-Bayesian tool and Bayesian statistics on two real-world hydrological datasets to show proof-of-concept. This research is presented as a series of three manuscripts.
First, we tackle the issue of identifying tubificid worm taxa in stream communities. These taxa are the intermediate host for the causative agent of salmonid whirling disease. The main contribution is the design and development of multiplex qPCR assay probes to identify the three most common taxa found along the Madison River watershed, MT, USA. We also detect infection prevalence using previously developed parasite-specific assays. The data comprise 3000+ worms collected in 2009 from six different stream reaches. Combining the results from both assays (taxa and parasite) helps explain the transmission variability using simple Bayesian statistics. We further evaluate relationships between taxa density metrics, environmental characteristics, and fish infection risk metrics using traditional and Bayesian regression analysis, and we test the posterior predictive ability of the resulting models.
The contribution of this research focuses on the development and application of a new SOM-Bayesian classification tool to overcome challenges associated with combining multiple types of field data. As a starting point, we apply the genetic data from the taxa assays for all of the Madison River tubificid worms and compare the site-specific SOM-Bayesian taxa predictions to more traditional Bayesian approaches. This application helps improve predictions of taxa and estimates of relative abundance in future years using data from previous years. A second application uses stream geomorphic and water quality data measured at ~2500 Vermont streams to predict stream-reach habitat conditions and the associated uncertainty. This dataset demonstrates the network's ability to handle large amounts of data of multiple types and to better address issues of uncertainty. Results show that the network outperforms traditional classification and clustering methods and that, owing to its parallel architecture, it is computationally comparable to a Naive Bayesian classifier.