Bayesian mixture models for count data

Chanialidis, Charalampos (2015) Bayesian mixture models for count data. PhD thesis, University of Glasgow.

Regression models for count data are usually based on the Poisson distribution. This thesis is concerned with Bayesian inference in more flexible models for count data. Two classes of models and algorithms are presented and studied. The first employs a generalisation of the Poisson distribution called the COM-Poisson distribution, which can represent both overdispersed and underdispersed data. The second is a density regression technique for count data which, although centred on the Poisson distribution, can represent arbitrary discrete distributions. The key contributions of this thesis are MCMC-based methods for posterior inference in these models.
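As a rough illustration of how one extra parameter controls dispersion, the sketch below evaluates the COM-Poisson probability mass function in its common rate/dispersion form, P(Y = y) proportional to lam^y / (y!)^nu. The function names, the truncation point `max_y` and this parameterisation are assumptions made for the example; the thesis may use a different parameterisation.

```python
import math

def com_poisson_pmf(lam, nu, max_y=200):
    """Approximate COM-Poisson pmf on 0..max_y (hypothetical helper).

    The infinite normalising sum is truncated at max_y, so this is an
    approximation, not an exact evaluation.
    """
    # log of the unnormalised weight lam**y / (y!)**nu
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_y + 1)]
    m = max(log_w)                      # subtract the max for numerical stability
    w = [math.exp(v - m) for v in log_w]
    z = sum(w)
    return [v / z for v in w]

def dispersion_index(pmf):
    """Variance divided by mean; equals 1 for a Poisson distribution."""
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return var / mean

# nu = 1 recovers the Poisson case, nu > 1 gives underdispersion,
# nu < 1 gives overdispersion.
for nu in (0.5, 1.0, 2.0):
    print(nu, dispersion_index(com_poisson_pmf(3.0, nu)))
```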

One key challenge in COM-Poisson-based models is that the normalisation constant of the COM-Poisson distribution is not known in closed form. We propose two exact MCMC algorithms which address this problem. The first is based on the idea of retrospective sampling: we first sample the uniform random variable used to decide whether to accept or reject the proposed new state of the unknown parameter, and then evaluate only bounds for the acceptance probability, in the hope that these bounds suffice to reach a decision without knowing the acceptance probability exactly. This strategy relies on an efficient scheme for computing lower and upper bounds for the normalisation constant, a procedure that can be applied to a number of discrete distributions, including the COM-Poisson distribution. The second is based on an algorithm known as the exchange algorithm, which requires sampling from the COM-Poisson distribution; we describe how this can be done efficiently using rejection sampling.
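The retrospective idea can be sketched as follows, assuming the rate/dispersion form Z(lam, nu) = sum over j of lam^j / (j!)^nu with nu > 0. The helper names `z_bounds` and `retrospective_accept`, the geometric tail bound and the doubling refinement rule are all choices made for this illustration and are not necessarily the bounding scheme developed in the thesis.

```python
import math

def z_bounds(lam, nu, k):
    """Lower and upper bounds on Z(lam, nu) from terms 0..k (hypothetical helper)."""
    def term(j):
        return math.exp(j * math.log(lam) - nu * math.lgamma(j + 1))
    partial = sum(term(j) for j in range(k + 1))      # lower bound: all terms are positive
    r = lam / (k + 2) ** nu                           # bounds the ratio of successive tail terms
    if r >= 1.0:
        return partial, math.inf                      # too few terms for a geometric tail bound
    return partial, partial + term(k + 1) / (1.0 - r)

def retrospective_accept(log_u, log_q_ratio, n, cur, prop, k0=5, max_k=10_000):
    """Accept/reject using only bounds on the log acceptance ratio.

    log_u       : log of the uniform draw, sampled before any bounds are computed
    log_q_ratio : log ratio (proposed vs current) of everything except Z
    n           : number of i.i.d. observations, so Z enters to the power n
    cur, prop   : (lam, nu) pairs
    Returns (decision, number of terms used).
    """
    k = k0
    while k <= max_k:
        lo_c, hi_c = z_bounds(*cur, k)
        lo_p, hi_p = z_bounds(*prop, k)
        if math.isfinite(hi_c) and math.isfinite(hi_p):
            log_a_lo = log_q_ratio + n * (math.log(lo_c) - math.log(hi_p))
            log_a_hi = log_q_ratio + n * (math.log(hi_c) - math.log(lo_p))
            if log_u <= log_a_lo:
                return True, k        # u is below every possible acceptance ratio
            if log_u > log_a_hi:
                return False, k       # u is above every possible acceptance ratio
        k *= 2                        # bounds too loose: refine and try again
    raise RuntimeError("bounds did not separate within max_k terms")
```

Because the decision is only returned once the uniform draw falls outside the interval of possible acceptance ratios, it coincides with the decision an exact evaluation would give; the bounds only determine how much work is needed to reach it.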

We also present simulation studies which show the advantages of the COM-Poisson regression model over the alternative models commonly used in the literature (Poisson and negative binomial). Three real-world applications are presented: the number of emergency hospital admissions in Scotland in 2010, the number of papers published by Ph.D. students, and fertility data from the second German Socio-Economic Panel.

COM-Poisson distributions are also the cornerstone of the proposed density regression technique based on Dirichlet process mixture models. Density regression can be thought of as a competitor to quantile regression. Quantile regression estimates the quantiles of the conditional distribution of the response variable given the covariates. This is especially useful when the dispersion changes across the covariates. Instead of estimating the conditional mean, quantile regression estimates the conditional quantile function across different quantiles. As a result, quantile regression models both location and shape shifts of the conditional distribution. This allows for a better understanding of how the covariates affect the conditional distribution of the response variable. Almost all quantile regression techniques deal with a continuous response, and quantile regression models for count data have so far received little attention. One technique that has been suggested is adding uniform random noise ("jittering"), thus overcoming the problem that, for a discrete distribution, the conditional quantile function is not a continuous function of the parameters of interest. Even though this enables us to estimate the conditional quantiles of the response variable, it has disadvantages. For small values of the response variable Y, the added noise can have a large influence on the estimated quantiles. In addition, the problem of "crossing quantiles" still exists for the jittering method. We avoid these problems by estimating the density of the data rather than the quantiles. Simulation studies show that the proposed approach performs better than the established jittering method. To illustrate the new method we analyse fertility data from the second German Socio-Economic Panel.
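One way to see why working from an estimated density sidesteps crossing quantiles: once a conditional probability mass function is available, every quantile is read off the same cumulative distribution, which is non-decreasing, so the quantiles are automatically ordered. The sketch below assumes the pmf has already been estimated and is passed in as a list; the function name and the example numbers are hypothetical, and the sketch does not cover the Dirichlet process mixture estimation itself.

```python
from itertools import accumulate

def quantiles_from_pmf(pmf, taus):
    """Smallest y with P(Y <= y) >= tau for each tau, where pmf[y] = P(Y = y)."""
    cdf = list(accumulate(pmf))
    out = []
    for tau in taus:
        # small tolerance guards against floating-point round-off in the running sum
        q = next((y for y, c in enumerate(cdf) if c >= tau - 1e-12), len(cdf) - 1)
        out.append(q)
    return out

# Hypothetical conditional pmf of a count response at one covariate value.
pmf = [0.1, 0.2, 0.4, 0.2, 0.1]
taus = [0.1, 0.25, 0.5, 0.75, 0.9]
print(quantiles_from_pmf(pmf, taus))
```

Since all quantiles come from a single cumulative sum, a higher tau can never yield a smaller quantile than a lower tau at the same covariate value.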

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: Quantile regression, Bayesian nonparametrics, mixture models, COM-Poisson distribution, COM-Poisson regression, Markov chain Monte Carlo.
Subjects: H Social Sciences > HA Statistics
Q Science > QA Mathematics
Colleges/Schools: College of Science and Engineering > School of Mathematics and Statistics > Statistics
Supervisor's Name: Evers, Dr. Ludger and Neocleous, Dr. Tereza
Date of Award: 2015
Depositing User: Mr Charalampos Chanialidis
Unique ID: glathesis:2015-6371
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 22 May 2015 15:15
Last Modified: 22 May 2015 15:30
