Genome-wide detection of transposable elements for mammalian phylogenomics

Transposable elements (TEs) are replicating genetic elements that comprise up to 50% of mammalian genomes. A specific class of TEs are retrotransposons, which proliferate by transcription into an RNA intermediate followed by reintegration at another genomic locus (the so-called “copy & paste” mechanism). Due to the lack of removal mechanisms and the rarity of parallel insertions, the presence of TE insertions at orthologous genomic loci in multiple taxa provides a virtually homoplasy-free phylogenetic marker. So far, developing phylogenetically informative markers from TE insertions has been tedious work, requiring hundreds of putative candidate loci to be tested in a trial-and-error approach with a low success rate. Hence, phylogenetic studies using TE insertions were often limited to a few dozen markers. Recently, genome sequencing of multiple species using reference mapping allowed the identification of genome-scale datasets of TE insertions and made the ad hoc development of phylogenetically informative markers possible. However, genome-scale TE detection methods have rarely been applied to non-model organisms, in which data availability and quality are comparatively limited. In this thesis, I developed the TeddyPi pipeline (TE detection and discovery for phylogenetic inference), a software tool that makes it possible to obtain reliable genome-scale TE insertion data from low-coverage genomes. This was achieved by integrating the data from multiple TE and structural variation callers and by applying a stringent filtering pipeline to exclude low-quality insertion calls. Whole-genome sequencing datasets of bears (Ursidae) and baleen whales (Mysticeti) were used to apply TE-based phylogenetic inference and to evaluate the method in comparison to sequence-based phylogenomic analyses.
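The idea of combining insertion calls from several callers and then filtering them can be illustrated with a minimal Python sketch. It is not TeddyPi's actual implementation: the `Call` record, the 100 bp clustering window, and the two thresholds (`min_callers`, `min_support`) are assumptions chosen purely for illustration, as are the caller names and coordinates in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One putative TE insertion call (hypothetical record layout)."""
    chrom: str
    pos: int
    caller: str   # name of the tool that produced the call
    support: int  # number of supporting reads


def merge_calls(calls, window=100):
    """Cluster calls on the same chromosome that lie within `window` bp
    of the previous call in sorted order."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.pos)):
        last = clusters[-1] if clusters else None
        if last and last[-1].chrom == call.chrom and call.pos - last[-1].pos <= window:
            last.append(call)
        else:
            clusters.append([call])
    return clusters


def filter_clusters(clusters, min_callers=2, min_support=3):
    """Keep clusters reported by enough distinct callers and enough reads.
    Both thresholds are illustrative assumptions."""
    kept = []
    for cluster in clusters:
        callers = {c.caller for c in cluster}
        support = sum(c.support for c in cluster)
        if len(callers) >= min_callers and support >= min_support:
            kept.append(cluster)
    return kept


# Made-up example calls from two hypothetical callers.
example_calls = [
    Call("chr1", 1000, "callerA", 4),
    Call("chr1", 1040, "callerB", 3),
    Call("chr1", 5000, "callerA", 10),
    Call("chr2", 1010, "callerB", 2),
]
example_clusters = merge_calls(example_calls)
example_kept = filter_clusters(example_clusters)
```

In this sketch only the first cluster survives, because it is the only locus reported by two different callers; a real pipeline would likely apply further filters, for example on assembly gaps or mapping quality.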
First, in the bear genomes, TeddyPi identified 150,513 high-quality transposable element (TE) insertions, which allowed me to reconstruct the evolutionary history of bears despite extensive phylogenetic conflict (Lammers et al., 2017). The large number of detected TE insertions also made detailed network analyses possible that visualize the phylogenetic conflict. Experimental polymerase chain reaction (PCR) assays validated up to 93% of the computationally identified TE loci and demonstrated the high accuracy of the dataset underlying the phylogenetic analyses. Second, I present the initial genome sequencing of six baleen whales and a detailed investigation of their evolutionary history using TE insertions and established sequence-based phylogenomic methods. The taxon sampling of baleen whales included iconic species such as the blue whale (Balaenoptera musculus) and the humpback whale (Megaptera novaeangliae) (Árnason et al., 2018). A sequence-based reconstruction of the baleen whale species tree resolved the long-debated phylogenetic position of the gray whale (Eschrichtius robustus) within rorquals (Balaenopteridae) for the first time with high statistical support. Furthermore, the genome data made it possible to identify a large extent of phylogenetic conflict for divergences during the radiation of rorquals, which occurred 7-10 million years ago (Ma). The phylogenomic analyses of 91,589 TE insertions in the whale genomes confirmed the sequence-based topology (Lammers et al., 2019). The quantification of phylogenetic signals obtained from the TE insertions revealed a high degree of discordance for the divergence of the gray whale and rorquals. Despite the large genome-scale dataset, statistical tests showed only marginal support for a bifurcating divergence of gray whales and the rorqual species.
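One simple way to quantify conflicting phylogenetic signal from TE insertions is to count how many markers are shared by each subset of the lineages in question. The Python sketch below illustrates this under stated assumptions: the function name, the input format (one set of carrier taxa per insertion), and the placeholder taxa `A`, `B`, `C` are hypothetical and do not reproduce the thesis's analysis or data.

```python
from collections import Counter


def count_shared_patterns(markers, ingroup):
    """Count insertions by the subset of ingroup taxa that carry them.

    markers: iterable of sets, each holding the taxa with the insertion.
    Only phylogenetically informative patterns are counted, i.e. insertions
    present in at least two but not all ingroup taxa.
    """
    ingroup = frozenset(ingroup)
    counts = Counter()
    for carriers in markers:
        shared = frozenset(carriers) & ingroup
        if 2 <= len(shared) < len(ingroup):
            counts[shared] += 1
    return counts


# Made-up presence patterns for three placeholder lineages.
example_markers = [
    {"A", "B"},
    {"A", "B"},
    {"B", "C"},
    {"A"},            # present in a single lineage: uninformative
    {"A", "B", "C"},  # present in all lineages: uninformative
]
example_counts = count_shared_patterns(example_markers, ingroup={"A", "B", "C"})
```

If almost all informative markers fall into one pattern, they support a single bifurcating topology; substantial counts for competing patterns indicate discordance of the kind described above, which a statistical test or a network representation can then evaluate.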
The limited statistical support for a strictly bifurcating tree obtained from genome-scale datasets of thousands of markers demonstrates the importance of including phylogenetic networks when displaying evolutionary divergences. In conclusion, this thesis shows that the identification of TE insertions from whole-genome resequencing provides plentiful and accurate phylogenomic markers. For application in non-model organisms, I provide easy-to-use software that integrates multiple datasets from TE and structural variation callers in order to obtain reliable datasets free of ascertainment bias. The detection of genome-scale datasets of TE insertions in two case studies demonstrates the applicability of this marker system for phylogenetic reconstruction and for inferring phylogenetic conflict.

Author:Fritjof Lammers
Place of publication:Frankfurt am Main
Referee:Axel Janke, Ingo Ebersberger
Document Type:Doctoral Thesis
Date of Publication (online):2019/09/19
Year of first Publication:2019
Publishing Institution:Universitätsbibliothek Johann Christian Senckenberg
Granting Institution:Johann Wolfgang Goethe-Universität
Date of final exam:2019/08/29
Release Date:2019/09/26
Page Number:146
Institutes:Biological Sciences / Biological Sciences
Dewey Decimal Classification:5 Natural sciences and mathematics / 57 Life sciences; biology / 570 Life sciences; biology
Biology Collection / Biological University Publications (Goethe-Universität)
Licence (German):Deutsches Urheberrecht