Topics on statistical design and analysis of cDNA microarray experiment

Zhu, Ximin (2009) Topics on statistical design and analysis of cDNA microarray experiment. PhD thesis, University of Glasgow.

Printed Thesis Information: https://eleanor.lib.gla.ac.uk/record=b2695228

Abstract

A microarray is a powerful tool for surveying the expression levels of many thousands of genes simultaneously. It belongs to the new genomics technologies which have important applications in the biological, agricultural and pharmaceutical sciences.

In this thesis, we focus on the dual-channel cDNA microarray, one of the most popular microarray technologies, and discuss three topics: optimal experimental design; dye effect normalization; and estimation of the proportion of true null hypotheses, the local false discovery rate (lFDR) and the positive false discovery rate (pFDR).

The first topic consists of four subtopics, each of which addresses an independent, practical problem in cDNA microarray experimental design. In the first subtopic, we propose an optimization strategy based on simulated annealing to find optimal or near-optimal designs with both biological and technical replicates. In the second subtopic, we discuss how to apply the Q-criterion to the factorial design of microarray experiments. In the third subtopic, we suggest an optimal way of pooling samples, which is in effect a replication scheme that minimizes the variance of the experiment subject to a fixed total cost. In the fourth subtopic, we point out that the criterion for distant pair design is not appropriate and propose an alternative criterion.
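
The cost-constrained pooling trade-off in the third subtopic can be illustrated with a small sketch. This is not the model used in the thesis: it assumes a simple variance decomposition in which the estimated mean log-expression for one condition has variance var_bio / (n_arrays * pool_size) + var_tech / n_arrays, and that each array costs cost_array plus cost_subject per pooled subject. All function names, parameter names and numbers below are hypothetical.

```python
def pooled_variance(n_arrays, pool_size, var_bio, var_tech):
    """Variance of the estimated mean under the assumed decomposition."""
    return var_bio / (n_arrays * pool_size) + var_tech / n_arrays


def best_pooling(budget, cost_subject, cost_array, var_bio, var_tech, max_pool=50):
    """Enumerate pool sizes and return (n_arrays, pool_size, variance)
    with the smallest variance that fits within the budget."""
    best = None
    for pool_size in range(1, max_pool + 1):
        cost_per_array = cost_array + pool_size * cost_subject
        n_arrays = int(budget // cost_per_array)  # spend as much as the budget allows
        if n_arrays < 1:
            break  # larger pools only cost more per array
        variance = pooled_variance(n_arrays, pool_size, var_bio, var_tech)
        if best is None or variance < best[2]:
            best = (n_arrays, pool_size, variance)
    return best


# Illustrative inputs only: arrays are ten times as expensive as subjects.
best = best_pooling(budget=1000, cost_subject=10, cost_array=100,
                    var_bio=1.0, var_tech=0.25)
print(best)
```

With these illustrative inputs, the search prefers several subjects per pool on fewer arrays over one subject per array, because biological replication is cheap relative to arrays.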

The second topic of this thesis is dye effect normalization. In cDNA microarray technology, each array compares two samples, which are usually labelled with two different dyes, Cy3 and Cy5. The technology assumes that, for a given gene (spot) on the array, if the Cy3-labelled sample has k times as much of a transcript as the Cy5-labelled sample, then the Cy3 signal should be k times as high as the Cy5 signal, and vice versa. This important assumption requires the two dyes to have the same properties. In reality, however, the Cy3 and Cy5 dyes have slightly different properties, and their relative efficiency varies across the intensity range in a "banana-shaped" way. To remove the dye effect, we propose a novel dye effect normalization method based on modeling the dye response functions and the dye effect curve. Real and simulated microarray data sets are used to evaluate the method, and the results show that its performance is satisfactory.
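
To make the dye effect concrete, the sketch below shows how unequal dye response functions produce a non-zero, intensity-dependent log-ratio even when both samples contain the same amount of transcript, and how inverting a known response function removes it. The two response functions are hypothetical and deliberately simple (they give a straight-line trend rather than the "banana" shape), and the thesis's actual estimation of the response functions is not reproduced here.

```python
import math


def cy3_response(abundance):
    """Hypothetical Cy3 response: signal proportional to abundance."""
    return abundance


def cy5_response(abundance):
    """Hypothetical Cy5 response: higher gain, slightly compressed at high abundance."""
    return 2.0 * abundance ** 0.9


def invert_cy5(signal):
    """Inverse of the hypothetical Cy5 response, mapping signal back to abundance."""
    return (signal / 2.0) ** (1.0 / 0.9)


def log_ratio(cy5_value, cy3_value):
    """Log2 ratio of the two channels (the usual M value)."""
    return math.log2(cy5_value / cy3_value)


# The same abundance in both channels, so the true log-ratio is 0.
for abundance in (16.0, 256.0, 4096.0):
    cy3 = cy3_response(abundance)
    cy5 = cy5_response(abundance)
    raw = log_ratio(cy5, cy3)                    # biased, depends on intensity
    corrected = log_ratio(invert_cy5(cy5), cy3)  # response function undone
    print(abundance, round(raw, 3), round(corrected, 3))
```

In practice the response functions are unknown and must be estimated from the data, which is the part the proposed normalization method addresses.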

The focus of the third topic is the estimation of the proportion of true null hypotheses, the lFDR and the pFDR. In a typical microarray experiment, expression levels are measured for a large number of genes. To find differentially expressed genes, these variables are usually screened simultaneously by a statistical test. Since this is a case of multiple hypothesis testing, some kind of adjustment should be made to the p-values resulting from the test. Many multiple testing error rates, such as the FDR, lFDR and pFDR, have been proposed to address this issue. A key related problem is the estimation of the proportion of true null hypotheses (i.e. non-differentially expressed genes). To model the distribution of the p-values, we propose three kinds of finite mixture with an unknown number of components (the first component corresponds to non-differentially expressed genes and the remaining components correspond to differentially expressed ones). We apply a new MCMC method called the allocation sampler to estimate the proportion of true nulls (i.e. the mixture weight of the first component). The method also provides a framework for estimating the lFDR and pFDR. Two real microarray data studies and a small simulation study are used to assess our method. We show that the performance of the proposed method is satisfactory.
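
The link between a mixture model for p-values and the two error rates can be sketched as follows. This is a simplified illustration, not the thesis's method: it assumes only two components, a Uniform(0, 1) null component with weight pi0 and a single Beta(a, 1) alternative component with a < 1, and it treats pi0 and a as known rather than estimating them with the allocation sampler. The function names and numbers are hypothetical.

```python
def mixture_density(p, pi0, a):
    """Density of a p-value: pi0 * Uniform(0,1) + (1 - pi0) * Beta(a, 1)."""
    return pi0 * 1.0 + (1.0 - pi0) * a * p ** (a - 1.0)


def mixture_cdf(t, pi0, a):
    """Probability that a p-value is at most t under the same mixture."""
    return pi0 * t + (1.0 - pi0) * t ** a


def local_fdr(p, pi0, a):
    """lFDR(p): posterior probability that the null is true given p-value p."""
    return pi0 / mixture_density(p, pi0, a)


def positive_fdr(t, pi0, a):
    """pFDR(t): probability that the null is true given p-value <= t."""
    return pi0 * t / mixture_cdf(t, pi0, a)


# Illustrative values only: 80% true nulls, alternative concentrated near 0.
pi0, a = 0.8, 0.25
for p in (0.001, 0.01, 0.05):
    print(p, round(local_fdr(p, pi0, a), 4), round(positive_fdr(p, pi0, a), 4))
```

Under this assumed model both quantities follow directly from the mixture weight of the null component, which is why estimating that weight is the key step.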

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: cDNA microarray, optimal design, dye normalization, multiple hypothesis testing, mixture model
Subjects: Q Science > QH Natural history > QH301 Biology
H Social Sciences > HA Statistics
Q Science > QA Mathematics
Colleges/Schools: College of Science and Engineering > School of Mathematics and Statistics > Statistics
Supervisor's Name: Wit, Prof. Ernst and Nobile, Dr. Agostino
Date of Award: 2009
Depositing User: Mr Ximin Zhu
Unique ID: glathesis:2009-1206
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 28 Oct 2009
Last Modified: 10 Dec 2012 13:35
URI: https://theses.gla.ac.uk/id/eprint/1206
