Bears are iconic mammals with a complex evolutionary history. Natural bear hybrids and studies of a few nuclear genes indicate that gene flow among bears may be more common than expected and not limited to polar and brown bears. Here we present a genome analysis of the bear family with representatives of all living species. Phylogenomic analyses of 869 megabase pairs divided into 18,621 genome fragments yielded a well-resolved coalescent species tree despite signals for extensive gene flow across species. However, genome analyses using different statistical methods show that gene flow is not limited to closely related species pairs. Strong ancestral gene flow between the Asiatic black bear and the ancestor of polar, brown and American black bear explains uncertainties in reconstructing the bear phylogeny. Gene flow across the bear clade may be mediated by intermediate species such as the geographically widespread brown bears, leading to large amounts of phylogenetic conflict. Genome-scale analyses lead to a more complete understanding of complex evolutionary processes. Evidence for extensive inter-specific gene flow, found also in other animal species, necessitates shifting the attention from speciation processes achieving genome-wide reproductive isolation to the selective processes that maintain species divergence in the face of gene flow.
Phylogenetic analyses of nuclear and mitochondrial genomes have shown that polar bears captured the mitochondrial genome of brown bears some 160,000 years ago. This hybridization event likely led to the extinction of the original polar bear mitochondrial genome. However, parts of the mitochondrial DNA occasionally integrate into the nuclear genome, forming pseudogenes called numts (nuclear mitochondrial integrations). Screening the polar bear genome for numts, we identified only 13 such integrations. Analyses of whole-genome sequences from additional polar bears, brown and American black bears as well as the giant panda indicate that the discovered numts entered the bear lineage before the initial ursid radiation some 14 million years ago. Our findings suggest a low integration rate of numts in the bear lineage and a complete loss of the original polar bear mitochondrial genome.
Phylogenetic analyses of nuclear and mitochondrial genomes indicate that polar bears captured the brown bear mitochondrial genome 160,000 years ago, leading to the extinction of the original polar bear mitochondrial genome. However, mitochondrial DNA occasionally integrates into the nuclear genome, forming pseudogenes called numts (nuclear mitochondrial integrations). Screening the polar bear genome identified only 13 numts. Genomic analyses of two additional ursine bears and the giant panda indicate that all except one of the discovered numts entered the bear lineage at least 14 million years ago. However, short-read genome assemblies might lead to an under-representation of numts or other repetitive sequences. Our findings suggest low integration rates of numts in bears and a loss of the original polar bear mitochondrial genome.
Bears are iconic mammals with a complex evolutionary history. Natural bear hybrids and studies of a few nuclear genes indicate that gene flow among bears may be more common than expected and not limited to the closely related polar and brown bears. Here we present a genome analysis of the bear family with representatives of all living species. Phylogenomic analyses of 869 megabase pairs divided into 18,621 genome fragments yielded a well-resolved coalescent species tree despite signals for extensive gene flow across species. However, genome analyses using three different statistical methods show that gene flow is not limited to closely related species pairs. Strong ancestral gene flow between the Asiatic black bear and the ancestor of polar, brown and American black bear explains numerous uncertainties in reconstructing the bear phylogeny. Gene flow across the bear clade may be mediated by intermediate species such as the geographically widespread brown bears, leading to massive amounts of phylogenetic conflict. Genome-scale analyses lead to a more complete understanding of complex evolutionary processes. The increasing evidence for extensive inter-specific gene flow, found also in other animal species, necessitates shifting the attention from speciation processes achieving genome-wide reproductive isolation to the selective processes that maintain species divergence in the face of gene flow.
Background: Ever-decreasing costs along with advances in sequencing and library preparation technologies enable even small research groups to generate chromosome-level assemblies today. Here we report the generation of an improved chromosome-level assembly for the Siamese fighting fish (Betta splendens) that was carried out during a practical university Master’s course. The Siamese fighting fish is a popular aquarium fish and an emerging model species for research on aggressive behaviour. We updated the current genome assembly by generating a new long-read nanopore-based assembly with subsequent scaffolding to chromosome level using previously published HiC data.
Findings: The use of nanopore-based long-read data sequenced on a MinION platform (Oxford Nanopore Technologies) allowed us to generate a baseline assembly of only 1,276 contigs with a contig N50 of 2.1 Mbp, and a total length of 441 Mbp. Scaffolding using previously published HiC data resulted in 109 scaffolds with a scaffold N50 of 20.7 Mbp. More than 99% of the assembly is contained in 21 scaffolds. The assembly showed the presence of 95.8% complete BUSCO genes from the Actinopterygii dataset, indicating a high quality of the assembly.
Conclusion: We present an improved full chromosome-level assembly of the Siamese fighting fish generated during a university Master’s course. The use of ~35× long-read nanopore data drastically improved the baseline assembly in terms of continuity. We show that relatively inexpensive high-throughput sequencing technologies such as the long-read MinION sequencing platform can be used in educational settings, allowing the students to gain practical skills in modern genomics and generate high-quality results that benefit downstream research projects.
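The contig and scaffold N50 values quoted in the findings are a standard continuity statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The following is a minimal sketch of that calculation. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the Betta splendens assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to half of the
        # total assembly length defines the N50.
        if running >= half_total:
            return length


# Toy lengths (hypothetical, arbitrary units); the total is 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70, because 80 + 70 = 150 reaches half of 300
```

A higher N50 at the same total length means that the assembly is split into fewer, longer pieces, which is the sense in which long reads improve continuity.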
Compared to sequence analyses, phylogenetic reconstruction from transposable elements (TEs) offers an additional perspective to study evolutionary processes. However, detecting phylogenetically informative TE insertions requires tedious experimental work, limiting the power of phylogenetic inference. Here, we analyzed the genomes of seven bear species using high-throughput sequencing data to detect thousands of TE insertions. The newly developed pipeline for TE detection called TeddyPi (TE detection and discovery for Phylogenetic Inference) obtained 150,513 high-quality TE insertions in the genomes of ursine and tremarctine bears. By integrating different TE insertion callers and using a stringent filtering approach, the TeddyPi pipeline produced highly reliable TE insertion calls, which were confirmed by extensive in vitro validation experiments. Screening for single nucleotide substitutions in the flanking regions of the TEs shows that these substitutions correlate with the phylogenetic signal from the TE insertions. Our phylogenomic analyses show that TEs are a major driver of genomic variation in bears and enabled phylogenetic reconstruction of a well-resolved species tree, even with strong signals for incomplete lineage sorting and introgression. The analyses show that the Asiatic black, sun and sloth bear form a monophyletic clade. TeddyPi is open source and can be adapted to various TE and structural variation callers. The pipeline makes it easy to confidently extract thousands of TE insertions even from low-coverage genomes of non-model organisms, opening new possibilities for biologists to study phylogenies, evolutionary processes as well as rates and patterns of (retro-)transposition and structural variation.
Phylogenetic reconstruction from transposable elements (TEs) offers an additional perspective to study evolutionary processes. However, detecting phylogenetically informative TE insertions requires tedious experimental work, limiting the power of phylogenetic inference. Here, we analyzed the genomes of seven bear species using high-throughput sequencing data to detect thousands of TE insertions. The newly developed pipeline for TE detection called TeddyPi (TE detection and discovery for Phylogenetic Inference) identified 150,513 high-quality TE insertions in the genomes of ursine and tremarctine bears. By integrating different TE insertion callers and using a stringent filtering approach, the TeddyPi pipeline produced highly reliable TE insertion calls, which were confirmed by extensive in vitro validation experiments. Analysis of single nucleotide substitutions in the flanking regions of the TEs shows that these substitutions correlate with the phylogenetic signal from the TE insertions. Our phylogenomic analyses show that TEs are a major driver of genomic variation in bears and enabled phylogenetic reconstruction of a well-resolved species tree, despite strong signals for incomplete lineage sorting and introgression. The analyses show that the Asiatic black, sun, and sloth bear form a monophyletic clade, in which phylogenetic incongruence originates from incomplete lineage sorting. TeddyPi is open source and can be adapted to various TE and structural variation callers. The pipeline makes it possible to confidently extract thousands of TE insertions even from low-coverage genomes (∼10×) of nonmodel organisms. This opens new possibilities for biologists to study phylogenies and evolutionary processes as well as rates and patterns of (retro-)transposition and structural variation.
Transposable elements (TEs) are replicating genetic elements that comprise up to 50% of mammalian genomes. A specific class of TEs are retrotransposons, which proliferate by transcription into an RNA intermediate, followed by genomic reintegration into another locus (the so-called “copy & paste” mechanism). Due to the lack of removal mechanisms and very rare parallel insertions, the presence of TE insertions at orthologous genomic loci in multiple taxa provides a virtually homoplasy-free phylogenetic marker. So far, developing phylogenetically informative markers from TE insertions has been a tedious process of testing hundreds of putative candidate loci in a trial-and-error approach with a low success rate. Hence, phylogenetic studies using TE insertions were often limited to a few dozen markers.
Recently, genome sequencing of multiple species using reference-mapping allowed the identification of genome-scale datasets of TE insertions and made the ad-hoc development of phylogenetically informative markers possible. However, genome-scale TE detection methods have rarely been applied to non-model organisms, in which data availability and quality are comparably limited. In this thesis, I developed the TeddyPi pipeline (TE detection and discovery for phylogenetic inference), a software tool that made it possible to obtain reliable genome-scale TE insertion data from low-coverage genomes. This was achieved by integrating the data from multiple TE and structural variation callers as well as applying a stringent filtering pipeline to exclude low-quality insertion calls. Whole-genome sequencing datasets of bears (Ursidae) and baleen whales (Mysticeti) were used to apply TE-based phylogenetic inference and evaluate the method in comparison to sequence-based phylogenomic analyses.
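The idea of integrating several callers and filtering out weakly supported calls can be illustrated with a small sketch. This is not TeddyPi's actual algorithm or interface. The `Call` record, the 100 bp window, and the rule that at least two distinct callers must agree are hypothetical choices made only for this example.

```python
from collections import namedtuple

# Hypothetical record for one predicted insertion site.
Call = namedtuple("Call", ["chrom", "pos", "caller"])


def cluster_calls(calls, window):
    """Group calls on the same chromosome that lie within `window` bp of the previous call."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.pos)):
        last = clusters[-1][-1] if clusters else None
        if last is not None and last.chrom == call.chrom and call.pos - last.pos <= window:
            clusters[-1].append(call)
        else:
            clusters.append([call])
    return clusters


def consensus_calls(calls, window=100, min_callers=2):
    """Keep clusters supported by at least `min_callers` distinct callers."""
    kept = []
    for cluster in cluster_calls(calls, window):
        callers = sorted({c.caller for c in cluster})
        if len(callers) >= min_callers:
            positions = sorted(c.pos for c in cluster)
            middle = positions[len(positions) // 2]  # a middle position as representative
            kept.append((cluster[0].chrom, middle, callers))
    return kept


# Toy input (made up): only the first chr1 site is reported by two different callers.
toy_calls = [
    Call("chr1", 1000, "callerA"),
    Call("chr1", 1040, "callerB"),
    Call("chr1", 5000, "callerA"),
    Call("chr2", 1010, "callerB"),
    Call("chr2", 1020, "callerB"),
]
print(consensus_calls(toy_calls))  # prints [('chr1', 1040, ['callerA', 'callerB'])]
```

Requiring agreement between independent callers trades sensitivity for specificity, which matches the stated goal of excluding low-quality insertion calls in low-coverage data.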
In the bear genomes, TeddyPi identified 150,513 high-quality transposable element (TE) insertions, which allowed me to reconstruct the evolutionary history of bears despite extensive phylogenetic conflict (Lammers et al., 2017). The large number of detected TE insertions also made possible detailed network analyses that visualize the phylogenetic conflict. Experimental polymerase chain reaction (PCR) assays validated up to 93 % of the computationally identified TE loci and demonstrated the high accuracy of the dataset underlying the phylogenetic analyses.
Second, I present the initial genome sequencing of six baleen whales and a detailed investigation of their evolutionary history using TE insertions and established sequence-based phylogenomic methods. The taxon sampling of baleen whales included iconic species like the blue whale (Balaenoptera musculus) and the humpback whale (Megaptera novaeangliae) (Árnason et al., 2018). A sequence-based reconstruction of the baleen whale species tree resolved the long-debated phylogenetic position of the gray whale (Eschrichtius robustus) within rorquals (Balaenopteridae) for the first time with high statistical support. Furthermore, the genome data made it possible to identify a large extent of phylogenetic conflict for divergences during the radiation of rorquals that occurred 7-10 million years ago (Ma).
The phylogenomic analyses of 91,589 TE insertions in the whale genomes confirmed the sequence-based topology (Lammers et al., 2019). The quantification of phylogenetic signals obtained from the TE insertions revealed a high degree of discordance for the divergence of the gray whale and rorquals. Despite the large genome-scale dataset, statistical tests showed only marginal support for a bifurcating divergence of gray whales and the rorqual species. The limited statistical support for a strictly bifurcating tree obtained from genome-scale datasets of thousands of markers demonstrates the importance of including phylogenetic networks for displaying evolutionary divergences.
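One simple way to make such discordance visible is to tally, for three lineages, how many insertions are shared by each possible pair. Under a clean bifurcation one pair should dominate, whereas similar counts for all three pairs point to conflicting signal. The sketch below illustrates only this counting step. It is not the statistical test used in the thesis, and the function name, the taxon labels A, B and C, and the marker sets are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def count_split_support(markers, taxa):
    """Count insertions present in exactly two of the three given taxa.

    `markers` is an iterable of sets, each holding the taxa that carry one insertion.
    """
    support = Counter({pair: 0 for pair in combinations(sorted(taxa), 2)})
    for carriers in markers:
        shared = tuple(sorted(set(carriers) & set(taxa)))
        # Insertions found in all three taxa or in only one do not
        # discriminate between the three possible groupings.
        if len(shared) == 2:
            support[shared] += 1
    return support


# Toy presence sets (hypothetical, not real data).
toy_markers = [
    {"A", "B"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
    {"A"},
]
support = count_split_support(toy_markers, ("A", "B", "C"))
for pair, count in sorted(support.items()):
    print(pair, count)
# prints:
# ('A', 'B') 2
# ('A', 'C') 1
# ('B', 'C') 1
```

In this toy case the A+B grouping has the most support, but the two alternative groupings are not empty, which is the kind of pattern a phylogenetic network can display and a single bifurcating tree cannot.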
In conclusion, this thesis shows that identification of TE insertions from whole-genome resequencing provides plentiful and accurate phylogenomic markers. For the application in non-model organisms, I provide an easy-to-use software tool that integrates multiple datasets from TE and structural variation callers in order to obtain reliable datasets free of ascertainment bias. Detecting genome-scale datasets of TE insertions in two case studies demonstrates the applicability of this marker system for phylogenetic reconstruction and inferring phylogenetic conflict.
The autonomous transposable element LINE-1 is a highly abundant element that makes up between 15% and 20% of therian mammal genomes. Since their origin before the divergence of marsupials and placental mammals, LINE-1 elements have contributed actively to the genome landscape. A previous in silico screen of the Tasmanian devil genome revealed a lack of functional coding LINE-1 sequences. In this study we present the results of an in vitro analysis from a partial LINE-1 reverse transcriptase coding sequence in five marsupial species. Our experimental screen supports the in silico findings of the genome-wide degradation of LINE-1 sequences in the Tasmanian devil, and identifies a high frequency of degraded LINE-1 sequences in other Australian marsupials. The comparison between the experimentally obtained LINE-1 sequences and reference genome assemblies suggests that conclusions from in silico analyses of retrotransposition activity can be influenced by incomplete genome assemblies from short reads.
Retrophylogenomics in rorquals indicate large ancestral population sizes and a rapid radiation
(2019)
Background: Baleen whales (Mysticeti) are the largest animals on earth and their evolutionary history has been studied in detail, but some relationships still remain contentious. In particular, reconstructing the phylogenetic position of the gray whales (Eschrichtiidae) has been complicated by evolutionary processes such as gene flow and incomplete lineage sorting (ILS). Here, whole-genome sequencing data of the extant baleen whale radiation allowed us to identify transposable element (TE) insertions in order to perform phylogenomic analyses and measure germline insertion rates of TEs. Baleen whales exhibit the slowest nucleotide substitution rate among mammals, hence we additionally examined the evolutionary insertion rates of TE insertions across the genomes.
Results: In eleven whole-genome sequences representing the extant radiation of baleen whales, we identified 91,859 CHR-SINE insertions that were used to reconstruct the phylogeny with different approaches as well as to perform evolutionary network analyses and a quantification of conflicting phylogenetic signals. Our results indicate that the radiation of rorquals and gray whales might not be bifurcating. The morphologically derived gray whales are placed inside the rorqual group, as the sister-species to humpback and fin whales. Detailed investigation of TE insertion rates confirms that a mutational slowdown in the whale lineage is present but less pronounced for TEs than for nucleotide substitutions.
Conclusions: Detection of TE insertions based on whole-genome sequencing showed that the speciation processes in baleen whales represent a rapid radiation. In addition, large genome-scale TE data sets make it possible to understand retrotransposition rates in non-model organisms and show the potential of TE calling methods for studying the evolutionary history of species.