Clustering and cluster inference of complex data structures

Alruwaili, Bader Lafi Q (2019) Clustering and cluster inference of complex data structures. PhD thesis, University of Glasgow.

Printed Thesis Information: https://eleanor.lib.gla.ac.uk/record=b3348704

Abstract

Finite mixtures provide a flexible and powerful tool for fitting univariate and multivariate distributions that cannot be captured by standard statistical distributions. In particular, multivariate mixtures have been widely used to perform modelling and cluster analysis of high-dimensional data in a wide range of applications. Modes of mixture densities have been used with great success for organizing mixture components into homogeneous groups, but the results are limited to normal mixtures. Beyond the clustering application, existing research in this area has provided fundamental results regarding the upper bound on the number of modes, but these too are limited to normal mixtures.
This thesis provides new modality theorems and important analytical results on the upper bound on the number of modes for multivariate t-mixtures and compares them with existing results on normal mixtures. Graphical tools for merging t-mixtures and the effect of the degrees of freedom are also thoroughly examined.
The most important contribution of this thesis is a set of fundamental results on the modality of skew normal distributions and skew normal mixtures. First, we show that the topography of high-dimensional skew normal mixtures can be analyzed rigorously in lower dimensions by defining the corresponding ridgeline manifold, which contains all critical points as well as the ridges of the density. Unlike for normal or t-mixtures, however, we need to solve an implicit equation to obtain this manifold. The plot of the elevations on the ridgeline can still be used to develop tools to explore the number of modes and to merge mixture components. Though analytical results on the number of modes can no longer be obtained, the elevation plots lead to a new conjecture on the upper bound on the number of modes of skew normal mixtures.
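For orientation, the closed-form ridgeline known for a two-component normal mixture can be written as below. This is the standard normal-mixture result that the paragraph contrasts against, not the skew normal manifold of the thesis, which is defined implicitly and is not reproduced here. The symbols (component means $\mu_j$, covariances $\Sigma_j$, weights $\pi_j$, component densities $\phi_j$, ridgeline parameter $\alpha$) are notation assumed for this sketch.

```latex
% Ridgeline of a two-component normal mixture
%   g(x) = \pi_1 \phi_1(x) + \pi_2 \phi_2(x),
% where \phi_j is the N(\mu_j, \Sigma_j) density.
\[
  x^{*}(\alpha)
  = \bigl[(1-\alpha)\,\Sigma_1^{-1} + \alpha\,\Sigma_2^{-1}\bigr]^{-1}
    \bigl[(1-\alpha)\,\Sigma_1^{-1}\mu_1 + \alpha\,\Sigma_2^{-1}\mu_2\bigr],
  \qquad \alpha \in [0,1].
\]
% Every critical point of g lies on this one-dimensional curve, whatever
% the weights \pi_1, \pi_2. The elevation plot is the graph of
\[
  h(\alpha) = g\bigl(x^{*}(\alpha)\bigr), \qquad \alpha \in [0,1].
\]
```

Because every critical point of the mixture lies on this curve, every mode of $g$ appears as a local maximum of $h$, so the search for modes in any dimension reduces to inspecting a curve over $[0,1]$.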
Unlike the normal and t-distributions, for skew normal distributions even the one-component counterpart has very interesting modal features. First, as the modes cannot be written in closed form, we design and provide software tools to calculate the modes in any dimension. We also provide a thorough study of the relationship between the means and modes of skew normals, with fundamental results on the limiting behaviour of the mean and mode as the skewness parameter increases. A further new result shows that, although the mean can vary widely as the skewness parameter varies, the mode is a much more robust measure of central tendency, as the mode of a skew normal distribution varies only within a smaller range.
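The mean and mode comparison can be illustrated in the simplest setting with a minimal sketch. It assumes the univariate standard skew normal density f(x) = 2 φ(x) Φ(αx), with skewness parameter α. It is not the thesis's R software, which handles any dimension; the function names are hypothetical and chosen for illustration. The mean has the closed form δ·sqrt(2/π) with δ = α / sqrt(1 + α²), while the mode is found by bisection on the derivative of the log-density.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skew_normal_mean(alpha):
    """Closed-form mean of the standard skew normal with skewness alpha."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    return delta * math.sqrt(2.0 / math.pi)


def log_density_slope(x, alpha):
    """Derivative of log(2 * phi(x) * Phi(alpha * x)) with respect to x."""
    return -x + alpha * norm_pdf(alpha * x) / norm_cdf(alpha * x)


def skew_normal_mode(alpha, tol=1e-12):
    """Mode of the standard skew normal, found by bisection (illustrative)."""
    if alpha == 0:
        return 0.0  # reduces to the standard normal
    if alpha < 0:
        return -skew_normal_mode(-alpha, tol)  # reflection symmetry
    # For alpha > 0 the slope is positive at 0 and negative at 1,
    # so the root of the slope (the mode) lies in (0, 1).
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_density_slope(mid, alpha) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for a in (0.5, 1, 2, 5, 20, 50):
        print(a, skew_normal_mean(a), skew_normal_mode(a))
```

Running the loop for increasing α prints the mean and mode side by side. In this univariate standard case the mean approaches sqrt(2/π) as α grows.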
Two R packages, available on GitHub, are provided as part of this thesis; they contain the numerical tools for calculating the modes of skew normals and functions specific to merging skew normal components. Additionally, application of the merging tool developed for skew normal mixtures is demonstrated using flow cytometry data.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: Modality, number of modes, multivariate t-mixture, skew normal distribution, multivariate skew normal distribution, merging skew normal mixture.
Subjects: Q Science > Q Science (General)
Colleges/Schools: College of Science and Engineering > School of Mathematics and Statistics > Statistics
Supervisor's Name: Ray, Dr. Surajit
Date of Award: 2019
Depositing User: Mr Bader Lafi Q Alruwaili
Unique ID: glathesis:2019-72979
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 28 May 2019 13:01
Last Modified: 05 Mar 2020 22:42
Thesis DOI: 10.5525/gla.thesis.72979
URI: https://theses.gla.ac.uk/id/eprint/72979
