Genomic insights into Trypanosoma brucei and Leishmania major: compartmentalised DNA replication, modified base detection by Nanopore sequencing and nucleotide composition analysis

Krasiļņikova, Marija (2025) Genomic insights into Trypanosoma brucei and Leishmania major: compartmentalised DNA replication, modified base detection by Nanopore sequencing and nucleotide composition analysis. PhD thesis, University of Glasgow.

Full text available as: PDF, 2025KrasilnikovaPhD.pdf (92MB)

Abstract

The genomes of the early-diverging eukaryotic parasites Trypanosoma brucei and Leishmania major are unusual in both their organisation and dynamics. In both vector-transmitted parasites, the majority of transcribed genes are organised in polycistronic transcription units (PTUs), and the boundaries between these units act as transcription start and termination sites. The genome of T. brucei is further shaped by the parasite’s immune evasion strategy, antigenic variation. Only one variant surface glycoprotein (VSG) is expressed on the parasite’s surface at any given time, even though the trypanosome harbours >2500 VSG (pseudo)genes in its genome. This monoallelic expression is achieved by transcribing a single VSG from a dedicated telomere-proximal site, the bloodstream-form expression site (BES), of which the parasite has ~15. The remaining VSG genes are located in transcriptionally silent arrays in the subtelomeric compartments of the larger, so-called megabase chromosomes, as well as in the smaller mini- and intermediate chromosomes.

Analysis of DNA replication patterns in T. brucei using marker frequency analysis coupled with sequencing (MFAseq) showed that, in the core of the larger chromosomes, some PTU boundaries, as well as the annotated centromeric regions, co-localise with early S phase DNA replication initiation; this pattern was consistent between two T. brucei strains, TREU 927 and Lister 427, and between mammalian-stage bloodstream-form (BSF) cells and insect-stage procyclic-form (PCF) cells. Curiously, the active BES was also an early-replicating region, but only in BSF cells; whether this replication originated at the telomere or upstream could not be determined. Furthermore, a complete, genome-wide analysis of DNA replication dynamics was not possible at the time because the genome assembly was incomplete. Until recently, the genome of this parasite remained poorly assembled outside of the megabase chromosome cores, despite its small (<50 Mb) size. In 2018, major improvements in the T. brucei genome assembly were achieved using PacBio long-read assembly, assisted by Hi-C DNA interaction data, but the chromosome ‘core’ sequences remained separate from the VSG-containing subtelomeric sequences and the BESs. In addition, none of the centromeric regions, which co-localise with the earliest S phase DNA replication sites, had been fully resolved. Moreover, MFAseq mapping to the mini- and intermediate chromosomes was compromised because much of their content comprises 177 bp repeats.
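
The principle behind MFAseq can be illustrated with a minimal sketch. It assumes, for illustration only, that read start positions from an S phase sample and from a non-replicating reference sample (for example G2 cells) are already available as integer lists for one chromosome, and that a simple ratio of library-size-normalised bin counts is enough; the function names `bin_counts` and `mfa_ratio` and the toy bin counts are hypothetical, and the pipeline used in the thesis may normalise, smooth or scale the ratio differently. Bins containing early-firing origins have been copied in more S phase cells, so their ratio rises above that of late-replicating bins.

```python
import math
from typing import List, Sequence


def bin_counts(read_starts: Sequence[int], chrom_len: int, bin_size: int) -> List[int]:
    """Count reads whose start position falls in each fixed-size bin of one chromosome."""
    n_bins = (chrom_len + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < chrom_len:  # ignore positions outside the chromosome
            counts[pos // bin_size] += 1
    return counts


def mfa_ratio(s_counts: Sequence[int], ref_counts: Sequence[int]) -> List[float]:
    """Per-bin ratio of S phase to reference read depth after library-size normalisation.

    Bins with no reference reads get NaN rather than a misleading value.
    """
    s_total = sum(s_counts)
    ref_total = sum(ref_counts)
    ratios = []
    for s, ref in zip(s_counts, ref_counts):
        if ref == 0 or s_total == 0:
            ratios.append(math.nan)
        else:
            ratios.append((s / s_total) / (ref / ref_total))
    return ratios


if __name__ == "__main__":
    # Toy data: the first bin is over-represented in S phase, as an early origin would be.
    s_phase = [40, 20, 20, 20]
    reference = [25, 25, 25, 25]
    ratios = mfa_ratio(s_phase, reference)
    peak = max(range(len(ratios)), key=lambda i: ratios[i])
    print(ratios, "peak bin:", peak)
```

Dividing each sample by its own total before taking the ratio removes differences in sequencing depth between the two libraries, so only the relative copy number along the chromosome remains.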

In Chapter 3, we describe de novo long-read assembly of the T. brucei brucei Lister 427 genome using Nanopore sequencing, aiming to improve contiguity along chromosome compartments (core, subtelomeres, BESs) and the assembly of repetitive regions and sub-megabase chromosomes. This work was motivated by the goal of a complete understanding of DNA replication, by extending MFAseq mapping to the subtelomeric compartments of the megabase chromosomes, across the boundaries between core and subtelomeres, across the entirety of the telomeric BESs, and within the sub-megabase chromosomes. Long-read assembly produced at least one bridge between previously separate genome sequences in 10 of the 11 megabase chromosomes, improved overall contiguity, and yielded full-length assemblies of the centromeric, 50 bp, 177 bp and 70 bp repeats. Additionally, the smaller, sub-megabase chromosomes of this parasite, with their characteristic 177 bp repeat region, were at least partially assembled, showing that they contain more genes than previously thought. The improved genome assembly allowed MFAseq mapping across the various genome compartments and showed that, surprisingly, megabase chromosome subtelomeres contain no detectable early-replicating regions outside of centromeric repeats. We also show that 177 bp repeats act as sites of DNA replication initiation in the sub-megabase chromosomes and are found in the centromeric regions of the megabase chromosomes, revealing these repeats to be widespread, sequence-conserved origins of replication. In addition, we demonstrate that early replication of the active BES in BSF cells initiates from the telomere. Finally, we show that genome compartmentalisation in T. brucei, already evident in gene content, organisation, transcription and DNA replication, extends to genome stability: the subtelomeric regions of the megabase chromosomes show pronounced genomic instability compared with the cores and, indeed, with the sub-megabase chromosomes.

Kinetoplastid parasite genomes also harbour an unusual DNA modification, a modified thymidine termed base J. It is thought to be generated in a two-step process: a thymidine is first converted to 5-hydroxymethyluracil (5hmU), which is then glycosylated. Previous work has indicated that this modification is present in the T. brucei genome at very low levels and is primarily detected at repetitive DNA and at PTU boundaries. Only mammalian-stage parasites appear to harbour easily detectable levels of this base; in the insect-vector stage, very low or no base J has previously been detected. Although both base J and 5hmU have previously been mapped genome-wide in T. brucei, the existing datasets have neither been re-examined in light of improved genome assemblies nor evaluated together. Beyond long-read DNA sequencing, the two main long-read sequencing technologies at the time of writing, Nanopore and PacBio, can also detect modified bases in DNA and RNA. In Chapter 4, we provide a more comprehensive analysis of base J and 5hmU ChIPseq datasets in T. brucei, together with newly generated modified base detection data obtained using Nanopore sequencing and the Tombo software. We find that the Nanopore-generated data recapitulate many base J ChIPseq enrichment patterns, specifically at polycistronic unit boundaries, in repetitive regions and around coding sequences, while also offering strand-specific, base-resolution data. To our surprise, the Nanopore data show similar levels and patterns of DNA modification in insect- and mammalian-stage parasites, suggesting that the two life cycle stages may, in fact, have similar DNA modification distributions.
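
Strand-specific, base-resolution modification data of this kind are commonly summarised as the fraction of reads called modified at each position on each strand. The sketch below assumes a hypothetical input of per-read calls as `(chromosome, position, strand, is_modified)` tuples, already produced by some modified base caller; it does not reproduce Tombo's own file formats or statistical tests, and the function name `modified_fraction` and the `min_coverage` threshold are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical call format: (chromosome, 0-based position, strand "+" or "-", read called modified?)
Call = Tuple[str, int, str, bool]
Site = Tuple[str, int, str]


def modified_fraction(calls: Iterable[Call], min_coverage: int = 1) -> Dict[Site, float]:
    """Fraction of reads called modified at each (chromosome, position, strand) site.

    Strands are kept separate, so a position can appear modified on one strand only.
    Sites covered by fewer than min_coverage reads are dropped.
    """
    totals: Dict[Site, int] = defaultdict(int)
    modified: Dict[Site, int] = defaultdict(int)
    for chrom, pos, strand, is_mod in calls:
        site = (chrom, pos, strand)
        totals[site] += 1
        if is_mod:
            modified[site] += 1
    return {
        site: modified[site] / n
        for site, n in totals.items()
        if n >= min_coverage
    }


if __name__ == "__main__":
    toy_calls = [
        ("chr1", 100, "+", True),
        ("chr1", 100, "+", True),
        ("chr1", 100, "+", False),
        ("chr1", 100, "-", False),
        ("chr1", 100, "-", False),
        ("chr1", 250, "+", True),
    ]
    for site, frac in sorted(modified_fraction(toy_calls, min_coverage=2).items()):
        print(site, round(frac, 2))
```

Keying each site by strand as well as position is what preserves the strand-specific information that ChIPseq enrichment alone cannot provide.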

In our analysis of modified DNA distribution in Chapter 4, it became clear that the broader genome-wide distribution of canonical bases has not been fully described in T. brucei. DNA strand asymmetries in prokaryotes and eukaryotes often arise from directional processes, such as transcription and DNA replication, which act asymmetrically on the two DNA strands. This often leads to an overabundance of certain nucleotides on one DNA strand relative to the other, termed nucleotide skew. In many eukaryotic and prokaryotic genomes, nucleotide skew analysis can even be used to detect origins of DNA replication. In Chapter 5, we describe the nature of nucleotide skews in T. brucei, investigate their potential contributors, and compare them with those observed in L. major and in a broad range of other trypanosomatid genomes. We found that T. brucei and L. major display clear but distinct skews associated with transcription direction, yet similar skews associated with DNA replication, indicating that different processes are responsible for their distinct transcription-associated skews. Additionally, by extending the analysis to other trypanosomatids, we show that the differences in nucleotide composition and skew between T. brucei and L. major can be explained by their evolutionary divergence.
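
The textbook skew measures are GC skew, (G - C)/(G + C), and AT skew, (A - T)/(A + T), computed in windows along one strand; in many bacteria, the minimum of the cumulative GC skew marks the replication origin. The sketch below shows these standard calculations on a plain sequence string. The function names, window and step sizes are illustrative assumptions, not the parameters or code used in the thesis.

```python
from typing import List, Tuple


def window_skews(seq: str, window: int, step: int) -> List[Tuple[int, float, float]]:
    """Return (window start, GC skew, AT skew) for each window along one strand.

    GC skew = (G - C) / (G + C); AT skew = (A - T) / (A + T).
    A window lacking both bases of a pair gets a skew of 0.0 for that pair.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        g, c, a, t = (w.count(base) for base in "GCAT")
        gc = (g - c) / (g + c) if g + c else 0.0
        at = (a - t) / (a + t) if a + t else 0.0
        results.append((start, gc, at))
    return results


def cumulative_gc_skew(seq: str) -> List[int]:
    """Running total of +1 for each G and -1 for each C along the sequence."""
    total = 0
    running = []
    for base in seq.upper():
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        running.append(total)
    return running


if __name__ == "__main__":
    print(window_skews("GGCCAATA", window=4, step=4))
    cum = cumulative_gc_skew("GCCGGA")
    print("cumulative GC skew minimum at index", cum.index(min(cum)))
```

Because the reverse complement swaps G with C and A with T, the same window read from the opposite strand gives skews of equal magnitude and opposite sign; this is why windows must be oriented relative to transcription or replication direction before skews from different loci are compared.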

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Subjects: Q Science > QR Microbiology
Q Science > QR Microbiology > QR180 Immunology
Colleges/Schools: College of Medical Veterinary and Life Sciences > School of Infection & Immunity
Supervisor's Name: McCulloch, Professor Richard and Cobbold, Professor Christina
Date of Award: 2025
Depositing User: Theses Team
Unique ID: glathesis:2025-85047
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 11 Apr 2025 14:22
Last Modified: 11 Apr 2025 14:31
Thesis DOI: 10.5525/gla.thesis.85047
URI: https://theses.gla.ac.uk/id/eprint/85047
