Estimating the incidence of HIV in Sub-Saharan Africa

Murphy, Neil (2012) Estimating the incidence of HIV in Sub-Saharan Africa. MSc(R) thesis, University of Glasgow.

Full text available as: PDF (2011murphyMSc.pdf, 418kB)


It is well known that HIV is a serious problem in South Africa, and KwaZulu-Natal is one of the country's worst-affected areas. Accurate measurement of HIV incidence in this area is therefore of vital importance. Unfortunately, HIV surveys in the area often contain a high proportion of missing results, which makes estimating incidence and prevalence very difficult. In this study, methods are developed to produce accurate estimates of HIV incidence from data containing a large number of missing values.

As well as developing our own method, we consider the merits of existing methods of estimating HIV incidence, particularly those that can produce incidence estimates from cross-sectional surveys. These methods use the optical density (OD) value, a measure that can be taken at the same time as an HIV test and that increases with time since HIV infection. The OD values are used to classify HIV-positive individuals as recently infected or not (i.e. infected within a pre-determined time frame), and these recency classifications are then used to estimate HIV incidence.
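To make the recency idea concrete, the sketch below classifies HIV-positive samples as recent when their OD value falls below a cutoff and then applies a simple snapshot estimator: recent infections divided by (HIV-negatives × mean window period). This is a minimal illustration under stated assumptions, not one of the specific methods assessed in the thesis. The function names, the cutoff `od_cutoff` and the window length `window_years` are hypothetical, and the estimator ignores the false-recent misclassification that more refined cross-sectional estimators correct for.

```python
from typing import List, Sequence


def classify_recent(od_values: Sequence[float], od_cutoff: float) -> List[bool]:
    """Flag each HIV-positive sample as 'recent' if its OD value is below the cutoff.

    OD rises with time since infection, so a low OD suggests a recent infection.
    """
    return [od < od_cutoff for od in od_values]


def cross_sectional_incidence_per_1000py(
    n_negative: int,
    od_values: Sequence[float],
    od_cutoff: float,
    window_years: float,
) -> float:
    """Snapshot estimate: recent infections / (HIV-negatives x mean window period).

    Hypothetical simplification: no adjustment for false-recent classifications.
    """
    n_recent = sum(classify_recent(od_values, od_cutoff))
    return 1000 * n_recent / (n_negative * window_years)


# Made-up survey: 800 HIV-negative respondents and 9 HIV-positive respondents with OD values.
od = [0.3, 0.5, 0.7, 1.2, 2.4, 0.6, 3.1, 0.75, 1.9]
print(cross_sectional_incidence_per_1000py(800, od, od_cutoff=0.8, window_years=0.5))  # → 12.5
```

Here five of the nine positives fall below the cutoff, so the estimate is 5 / (800 × 0.5) = 0.0125 infections per person-year, or 12.5 per 1000 person-years.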

The method of incidence estimation developed in this study imputes the missing data values and then applies traditional methods of incidence estimation to the imputed dataset. The imputation has two parts: deterministic and probabilistic. Deterministic imputation relies on the assumption that once an individual has tested positive for HIV, they cannot test negative in a later test. For HIV tests carried out at different times on the same individual, a positive result therefore implies that any missing later results are positive (forward-filling), and a negative result implies that any missing earlier results are negative (back-filling). The remaining missing values are imputed probabilistically, with probabilities calculated from the observed values in the data.
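The two-stage imputation can be sketched as follows. The deterministic pass implements the forward- and back-filling rules described above. The abstract does not specify how the probabilities for the remaining values are calculated, so the probabilistic pass uses a hypothetical scheme chosen for illustration: a missing first-stage result is drawn using the observed first-stage prevalence, and a later missing result following a negative is drawn using the observed proportion who tested positive among those negative at the previous stage. All function names here are illustrative, not taken from the thesis.

```python
import random
from typing import List, Optional, Sequence

Status = Optional[bool]  # True = HIV-positive, False = HIV-negative, None = missing


def deterministic_impute(history: Sequence[Status]) -> List[Status]:
    """Fill missing results implied by 'once positive, always positive'.

    Assumes the observed results are already consistent (no negative after a positive).
    """
    filled = list(history)
    # Forward-fill: every missing test after the first observed positive is positive.
    first_pos = next((i for i, s in enumerate(filled) if s is True), None)
    if first_pos is not None:
        for i in range(first_pos + 1, len(filled)):
            if filled[i] is None:
                filled[i] = True
    # Back-fill: every missing test before the last observed negative is negative.
    last_neg = next((i for i in reversed(range(len(filled))) if filled[i] is False), None)
    if last_neg is not None:
        for i in range(last_neg):
            if filled[i] is None:
                filled[i] = False
    return filled


def _proportion(values: Sequence[bool]) -> float:
    return sum(values) / len(values) if values else 0.0


def stage_probabilities(histories: Sequence[Sequence[Status]]) -> List[float]:
    """Hypothetical scheme: P(positive at stage 0) from observed stage-0 results, and
    P(positive at stage k | negative at stage k-1) from people observed at both stages."""
    n_stages = len(histories[0])
    probs = [_proportion([h[0] for h in histories if h[0] is not None])]
    for k in range(1, n_stages):
        probs.append(_proportion([h[k] for h in histories
                                  if h[k - 1] is False and h[k] is not None]))
    return probs


def probabilistic_impute(history: Sequence[Status], probs: Sequence[float],
                         rng: random.Random) -> List[bool]:
    """Draw each remaining missing value stage by stage, never allowing a negative after a positive."""
    imputed = list(history)
    for k in range(len(imputed)):
        if imputed[k] is not None:
            continue
        if k > 0 and imputed[k - 1] is True:
            imputed[k] = True  # positive at the previous stage stays positive
        else:
            imputed[k] = rng.random() < probs[k]
    return imputed


def impute(histories: Sequence[Sequence[Status]], seed: int = 0) -> List[List[bool]]:
    """Deterministic pass first, then probabilistic draws for whatever is still missing."""
    rng = random.Random(seed)
    probs = stage_probabilities(histories)  # estimated from the observed values only
    return [probabilistic_impute(deterministic_impute(h), probs, rng) for h in histories]


# Made-up results for six people tested at three stages.
histories = [
    [False, False, True],
    [False, True, True],
    [True, True, None],
    [False, None, False],
    [None, False, None],
    [False, False, False],
]
print(deterministic_impute([None, False, None]))  # → [False, False, None]
print(stage_probabilities(histories))  # → [0.2, 0.3333333333333333, 0.5]
for row in impute(histories):
    print(row)
```

Running the deterministic pass first matters: afterwards, no missing value precedes an observed negative, so drawing values stage by stage cannot produce a negative result after a positive one.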

Using our method, our best estimate of HIV incidence between the first and second stages of testing is 31.04 infections per 1000 person-years, with a 95% confidence interval of 30.25 to 31.83 infections per 1000 person-years. Our best estimate of HIV incidence between the second and third stages of testing is 30.92 infections per 1000 person-years, with a 95% confidence interval of 29.72 to 32.13 infections per 1000 person-years. Our method also gives a best estimate of HIV incidence between the first and third stages of testing of 30.96 infections per 1000 person-years, with a 95% confidence interval of 30.46 to 31.47 infections per 1000 person-years.
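For readers unfamiliar with the units, the sketch below shows one standard way to turn a completed (e.g. imputed) cohort into a rate per 1000 person-years with a 95% interval. It assumes, as a common convention, that each seroconversion occurs at the midpoint between the last negative and first positive test, and it uses a Wald-type interval that treats the infection count as Poisson. The abstract does not say how its intervals were obtained, so this should not be read as the thesis's procedure; the function names and the cohort figures are hypothetical.

```python
import math
from typing import Sequence, Tuple


def person_years(intervals: Sequence[Tuple[float, bool]]) -> float:
    """Person-time at risk between consecutive tests.

    Each interval is (years between tests, seroconverted?). Assumes a seroconversion
    happens at the midpoint of its interval, so only half that interval is counted.
    """
    return sum(t / 2 if seroconverted else t for t, seroconverted in intervals)


def incidence_per_1000py(n_infections: int, pyears: float,
                         z: float = 1.96) -> Tuple[float, float, float]:
    """Rate per 1000 person-years with a Wald-type interval treating the count as Poisson."""
    rate = n_infections / pyears
    se = math.sqrt(n_infections) / pyears
    return 1000 * rate, 1000 * (rate - z * se), 1000 * (rate + z * se)


# Made-up cohort: 100 seroconversions over 4000 person-years.
est, lo, hi = incidence_per_1000py(100, 4000.0)
print(f"{est:.2f} per 1000 person-years (95% CI {lo:.2f} to {hi:.2f})")
# → 25.00 per 1000 person-years (95% CI 20.10 to 29.90)
```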

Simulating HIV test data allows us to assess the accuracy and appropriateness of the methods considered in this study. Including missing data in these simulated datasets lets us check how each method performs under conditions similar to those seen in our original dataset. Our imputation method coped well with missing data, producing incidence estimates with consistently low bias and root mean square error. One of the methods that estimates incidence from cross-sections of the data also performed reasonably well, with generally good accuracy.
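The two performance measures mentioned above can be computed as follows: bias is the mean difference between the estimates from the simulated datasets and the true incidence used to generate them, and the root mean square error (RMSE) is the square root of the mean squared difference. This is a minimal sketch; the function name and the estimates in the example are made up for illustration and are not results from the thesis.

```python
import math
from typing import Sequence, Tuple


def bias_and_rmse(estimates: Sequence[float], true_value: float) -> Tuple[float, float]:
    """Bias = mean(estimate - truth); RMSE = sqrt(mean((estimate - truth)^2))."""
    errors = [e - true_value for e in estimates]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(err * err for err in errors) / len(errors))
    return bias, rmse


# Made-up incidence estimates (per 1000 person-years) from four simulated datasets
# generated with a true incidence of 30.
bias, rmse = bias_and_rmse([29.0, 31.0, 32.0, 28.0], true_value=30.0)
print(f"bias = {bias:.3f}, RMSE = {rmse:.3f}")  # → bias = 0.000, RMSE = 1.581
```

The example shows why both measures are reported: these estimates are unbiased on average, yet individual estimates still miss the truth by about 1.6 per 1000 person-years.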

Item Type: Thesis (MSc(R))
Qualification Level: Masters
Keywords: HIV, incidence, Africa, simulation, imputation, Statistics
Subjects: H Social Sciences > HA Statistics
Colleges/Schools: College of Science and Engineering > School of Mathematics and Statistics > Statistics
Supervisor's Name: McColl, Professor John
Date of Award: 2012
Depositing User: Mr Neil Murphy
Unique ID: glathesis:2012-2885
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 21 Jan 2013 13:54
Last Modified: 21 Jan 2013 13:57
