Investigation of non-protein-coding regions in the human cytomegalovirus genome

Hector, Ralph David (2005) Investigation of non-protein-coding regions in the human cytomegalovirus genome. PhD thesis, University of Glasgow.

Abstract

Human cytomegalovirus (HCMV) has the largest genome of the human herpesviruses, and its gene content is imperfectly understood. The gene content of HCMV strain AD169 was recently re-evaluated, discounting 51 previously proposed protein-coding open reading frames (ORFs) because they have no counterparts in chimpanzee cytomegalovirus (CCMV) and lack any other convincing evidence for expression. Some of the discounted ORFs were located in blocks, and their omission left three large 'empty' regions in the AD169 genome. One of these regions, located between genes RL1 and RL10, has been investigated previously, leading to the discovery of a new gene. The other two regions are investigated in this study. The first, termed region X, is located between genes UL105 and UL112, which are firmly accepted as encoding proteins. The second, termed region O, is located between genes UL57 and UL69. Initial analysis of the AD169 genome predicted six (UL106-UL111) and eleven (UL58-UL68) small ORFs in regions X and O, respectively, and subsequent analyses have predicted three further ORFs in each region (C-ORF16-C-ORF18 and ORF3-ORF5, respectively). Sequence data were generated from seven HCMV strains by PCR amplification and cloning of region X (approximately 6 kbp) and region O (approximately 8 kbp). Sequence comparisons were used to identify which ORFs in these regions are conserved and which are disrupted by insertions, deletions or substitutions leading to in-frame termination codons. All of the ORFs are frameshifted in certain strains, with the exception of UL108, C-ORF16 and C-ORF18 in region X, and UL66 in region O. The disrupted ORFs are unlikely to represent protein-coding genes. Moreover, the few ORFs unaffected by frameshifts remain unlikely to encode proteins, as they are predicted to encode very small proteins and lack counterparts in CCMV. Furthermore, in region X no transcripts corresponding to the cognate ORFs were detected by northern blotting, and in region O UL66 is completely overlapped by the previously identified pp67 transcript. This transcript, which is routinely detected by PCR assays, is disrupted in all the strains analysed and is therefore also unlikely to encode a protein.
Transcript mapping in region X detected a spliced 1.1 kb polyadenylated RNA and a 4.6 kb intron, which covers most of the region. The 5'- and 3'-ends of the 1.1 kb RNA were identified, the former located 25 bp from a TATA box and the latter mapped to two sites located 20 and 34 bp from a polyadenylation signal. These results consolidate the findings of two previous studies that had partially characterised the 5 kb and 1.1 kb RNAs, respectively. The splice sites, 3'-polyadenylation signal and 5'-TATA box of the 1.1 kb RNA are conserved in the corresponding region of CCMV, suggesting that a similar RNA with a large intron should be expressed in CCMV. However, this RNA is unlikely to encode a protein, as no amino acid sequences are conserved between the two genomes.
A third region of the HCMV genome where coding potential was not clear was also investigated. Region G, at the end of the unique short (Us) sequence, is located between genes US32 and TRS1, which are firmly accepted as encoding proteins. On the basis of conservation in CCMV, three originally defined ORFs (US33, US35 and US36) had been discounted and a novel ORF (US34A) identified. Again, sequence data were generated from seven HCMV strains by PCR amplification and cloning. Comparisons showed that the sequence between US34A and TRS1 is highly variable, and that US35 and US36 are frameshifted in multiple strains. US33 is conserved between strains, but no relevant transcript was detected. However, a transcript from a small, novel ORF (US33A) was detected on the opposing strand. The 5'- and 3'-ends of this RNA were identified, with the 5'-end located 31 bp downstream from a TATA box. The US33A ORF and the TATA box are also conserved in CCMV, suggesting that US33A may constitute a novel HCMV gene. The US33A transcript was shown to be 3'-co-terminal with that of US34. Transcripts from US31 and US32, which are conserved ORFs immediately to the left of region G, were also mapped. No convincing evidence was found for transcription of US34A. This work extends the understanding of the genetic content of HCMV and has identified novel transcripts in HCMV, providing a basis on which to develop future experiments aimed at determining their function.
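
As an illustration only, the kind of ORF comparison summarised in the abstract (checking whether an ORF is intact in each strain, or disrupted by a frameshift or an in-frame termination codon) might be sketched as follows. This is a minimal sketch and not the method used in the thesis: the function name orf_status, the strain labels and the short sequences are all hypothetical, and the check assumes each input is the ORF region of one strain, read from its start codon on the coding strand.

```python
# Hypothetical sketch: classify an ORF sequence from one strain as intact or disrupted.
# The sequences and strain labels below are invented for illustration.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_status(seq: str) -> str:
    """Return a short description of whether `seq` is an intact ORF."""
    # Normalise case and drop alignment gap characters, if any.
    seq = seq.upper().replace("-", "")
    if not seq.startswith("ATG"):
        return "no start codon"
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # A stop codon anywhere before the final codon truncates the ORF.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return f"premature stop at codon {index + 1}"
    # An insertion or deletion that is not a multiple of three shifts the frame.
    if len(seq) % 3 != 0:
        return "length not a multiple of three"
    if codons[-1] not in STOP_CODONS:
        return "no terminal stop codon"
    return "intact"


if __name__ == "__main__":
    strains = {
        "strain_a": "ATGGCTGAATTTTAA",  # reference-like ORF
        "strain_b": "ATGCTGAATTTTAA",   # single-base deletion
        "strain_c": "ATGGCTTAATTTTAA",  # substitution creating a stop codon
    }
    for name, sequence in strains.items():
        print(name, orf_status(sequence))
```

An ORF reported as disrupted in some strains but intact in others would, following the reasoning in the abstract, be a weak candidate for a protein-coding gene. Real analyses would work from aligned sequences and also consider conservation in CCMV and transcript evidence.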

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Additional Information: Adviser: Andrew Davison
Keywords: Genetics, virology.
Colleges/Schools: College of Medical, Veterinary and Life Sciences
Date of Award: 2005
Depositing User: Enlighten Team
Unique ID: glathesis:2005-74104
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 07 Aug 2019 14:53
Last Modified: 07 Aug 2019 14:57
URI: https://theses.gla.ac.uk/id/eprint/74104