UVM Theses and Dissertations
Format:
Print
Author:
Huang, Lulu
Dept./Program:
Statistics Program
Year:
2007
Degree:
M.S.
Abstract:
Count data are nonnegative integer-valued data that record the number of times an event occurs. Count data are usually assumed to follow a Poisson distribution, but in practice this assumption may not be appropriate. The Poisson distribution has the characteristic that its mean and variance are equal, whereas the variance of actual counts can be greater than their mean (overdispersion). In addition, the number of zeros may exceed the number expected under a Poisson distribution (zero-inflation). Various strategies for modeling the relationship between the counts and a set of predictor variables have been proposed, such as the linear, the Poisson, the overdispersed Poisson, the negative binomial, the zero-inflated Poisson (ZIP) and the zero-inflated negative binomial (ZINB) regression models. Investigators or data analysts must decide which model to report, and may consider both the fit of a model and its ease of interpretation when selecting one. We applied these models to mammography participation data obtained from a survey of low-income minority women conducted in 1990. We were interested in these data from a modeling perspective because few low-income minority women were getting mammograms at that time. The response variable is the number of mammograms; the predictors are age, income, health insurance and geographic area.
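As a rough illustration of the two departures from the Poisson assumption described above, the following sketch computes a dispersion index (sample variance divided by sample mean) and compares the observed number of zeros with the number expected under a Poisson distribution with the same mean. The function name `poisson_diagnostics` and the counts are made up for illustration; they are not the thesis data or the author's method.

```python
import math
from statistics import mean, variance


def poisson_diagnostics(counts):
    """Simple checks for overdispersion and excess zeros in count data."""
    n = len(counts)
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    return {
        "mean": m,
        "variance": v,
        # Close to 1 under a Poisson model; well above 1 suggests overdispersion.
        "dispersion_index": v / m,
        "observed_zeros": sum(1 for c in counts if c == 0),
        # Under Poisson(m), P(Y = 0) = exp(-m), so n * exp(-m) zeros are expected.
        "expected_zeros_poisson": n * math.exp(-m),
    }


# Hypothetical counts, chosen only to illustrate the calculation.
example_counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 5]
result = poisson_diagnostics(example_counts)

for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

These are descriptive checks on the marginal counts only. In a regression setting the same questions would be asked of the fitted model, conditional on the predictors.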
For this example, the Poisson regression model did not fit very well. The negative binomial and ZIP models fit better, with the negative binomial model slightly outperforming the ZIP model. Unfortunately, our attempt to fit the ZINB model with all four predictors was unsuccessful. All models indicated that a greater number of mammograms was associated with having insurance and a lesser number of mammograms was associated with lower income. In practice, the Poisson regression model is not always a good choice for count data. If overdispersion exists, the Poisson regression model tends to underestimate standard errors, so the negative binomial or the overdispersed Poisson regression model may be more appropriate. If zero-inflation exists, the ZIP model may be used. If both zero-inflation and overdispersion exist, the ZINB, the negative binomial, and the ZIP models may be used.
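To make the link between zero-inflation and overdispersion concrete, the sketch below writes out the standard ZIP probability mass function, in which a count is a structural zero with probability `pi` and otherwise follows a Poisson distribution with rate `lam`. The parameter values are hypothetical and are not estimates from the thesis; the function names are chosen here for illustration.

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson: extra zeros with probability pi."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def zip_mean_variance(lam, pi):
    """Closed-form mean and variance of the ZIP distribution."""
    zip_mean = (1.0 - pi) * lam
    zip_var = (1.0 - pi) * lam * (1.0 + pi * lam)
    return zip_mean, zip_var


# Hypothetical parameter values, for illustration only.
lam, pi = 2.0, 0.3

p0_poisson = poisson_pmf(0, lam)
p0_zip = zip_pmf(0, lam, pi)
zip_mean, zip_var = zip_mean_variance(lam, pi)

print(f"P(Y = 0), Poisson: {p0_poisson:.4f}")
print(f"P(Y = 0), ZIP:     {p0_zip:.4f}")
print(f"ZIP mean: {zip_mean:.3f}, ZIP variance: {zip_var:.3f}")
```

Because the ZIP variance exceeds the ZIP mean whenever `pi` is greater than 0, zero-inflation on its own produces overdispersion relative to a Poisson model. This is one reason the negative binomial and ZIP models can both be candidates for the same data.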