Model selection and model averaging in the presence of missing values

Gopal Pillay, Khuneswari (2015) Model selection and model averaging in the presence of missing values. PhD thesis, University of Glasgow.

Model averaging has been proposed as an alternative to model selection and is intended
to overcome the underestimation of standard errors that is a consequence of
model selection. Model selection and model averaging become more complicated in the
presence of missing data. Three model selection approaches (Rubin's rules (RR), STACK and
M-STACK) and model averaging using three model-building strategies (non-overlapping
variable sets, inclusive and restrictive strategies) were explored to combine results from
multiply-imputed data sets using a Monte Carlo simulation study on some simple linear
and generalized linear models. Imputation was carried out using chained equations (via
the "norm" method in the R package mice). The simulation results showed that the
STACK method performs better than RR and M-STACK in terms of model selection
and prediction, whereas model averaging performs slightly better than STACK in terms
of prediction. The inclusive and restrictive strategies perform better in terms of prediction,
but the non-overlapping variable sets strategy performs better for model selection. STACK and
model averaging using all three model-building strategies were proposed to combine the
results from a multiply-imputed data set from the Gateshead Millennium Study (GMS).
The performance of STACK and model averaging was compared using mean square error
of prediction (MSE(P)) in a 10% cross-validation test. The results showed that STACK
using an inclusive strategy provided a better prediction than model averaging. This
agrees with the results obtained from a simulation study designed to mimic the GMS data. In
addition, the inclusive strategy for building imputation and prediction models was better
than the non-overlapping variable sets and restrictive strategy. The presence of highly
correlated covariates and response is believed to have led to better prediction in this
particular context. Model averaging using non-overlapping variable sets performs better
only if an auxiliary variable is available. However, STACK using an inclusive strategy
performs well when there is no auxiliary variable available. Therefore, it is advisable to
use STACK with an inclusive model-building strategy and highly correlated covariates
(where available) to make predictions in the presence of missing data. Alternatively,
model averaging with non-overlapping variable sets can be used if an auxiliary variable
is available.
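
As an illustration of what combining results from multiply-imputed data sets involves, the following minimal sketch pools a single regression coefficient with the standard form of Rubin's rules: the pooled estimate is the mean of the per-imputation estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name pool_rubin and the input values are hypothetical and are not taken from the thesis; the RR, STACK and M-STACK selection procedures studied there are not reproduced here.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets using Rubin's rules.

    estimates: per-imputation point estimates of the coefficient
    variances: per-imputation squared standard errors of that coefficient
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from at least two imputations")
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical values for illustration only (not results from the thesis).
estimates = [1.0, 2.0, 3.0]
variances = [0.5, 0.5, 0.5]
pooled_estimate, total_variance = pool_rubin(estimates, variances)
```

The square root of the total variance gives the pooled standard error. Because the between-imputation term is added, it is never smaller than the standard error obtained by averaging the within-imputation variances alone.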

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: model selection, model averaging, missing data, STACK, M-STACK, Rubin's rules, MICE, inclusive strategy, auxiliary variable
Subjects: H Social Sciences > HA Statistics; Q Science > QA Mathematics
Colleges/Schools: College of Science and Engineering > School of Mathematics and Statistics > Statistics
Supervisor's Name: McColl, Professor John
Date of Award: 2015
Unique ID: glathesis:2015-6834
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 10 Nov 2015 12:27
Last Modified: 20 Nov 2015 16:28
