Refine
Document Type
- Article (5)
Language
- English (5)
Has Fulltext
- yes (5)
Is part of the Bibliography
- no (5)
Keywords
- Laurasiatheria (1)
- Scrotifera (1)
- anomaly zone (1)
- assembly gaps (1)
- benchmarking (1)
- exon coalescence (1)
- exon concatenation (1)
- genome assembly (1)
- long sequencing reads (1)
- retrophylogenomics (1)
Institute
- Biowissenschaften (5) (remove)
Background: Long sequencing reads allow increasing contiguity and completeness of fragmented, short-read–based genome assemblies by closing assembly gaps, ideally at high accuracy. While several gap-closing methods have been developed, these methods often close an assembly gap with sequence that does not accurately represent the true sequence.
Findings: Here, we present DENTIST, a sensitive, highly accurate, and automated pipeline method to close gaps in short-read assemblies with long error-prone reads. DENTIST comprehensively determines repetitive assembly regions to identify reliable and unambiguous alignments of long reads to the correct loci, integrates a consensus sequence computation step to obtain a high base accuracy for the inserted sequence, and validates the accuracy of closed gaps. Unlike previous benchmarks, we generated test assemblies that have gaps at the exact positions where real short-read assemblies have gaps. Generating such realistic benchmarks for Drosophila (134 Mb genome), Arabidopsis (119 Mb), hummingbird (1 Gb), and human (3 Gb) and using simulated or real PacBio continuous long reads, we show that DENTIST consistently achieves a substantially higher accuracy compared to previous methods, while having a similar sensitivity.
Conclusion: DENTIST provides an accurate approach to improve the contiguity and completeness of fragmented assemblies with long reads. DENTIST's source code including a Snakemake workflow, conda package, and Docker container is available at https://github.com/a-ludi/dentist. All test assemblies as a resource for future benchmarking are at https://bds.mpi-cbg.de/hillerlab/DENTIST/.
Detecting associations between genomic changes and phenotypic differences is fundamental to understanding how phenotypes evolved. By systematically screening for parallel amino acid substitutions, we detected known as well as novel cases (Strc, Tecta, and Cabp2) of parallelism between echolocating bats and toothed whales in proteins that could contribute to high-frequency hearing adaptations. Our screen also showed that echolocating mammals exhibit an unusually high number of parallel substitutions in fast-twitch muscle fiber proteins. Both echolocating bats and toothed whales produce an extremely rapid call rate when homing in on their prey, which was shown in bats to be powered by specialized superfast muscles. We show that these genes with parallel substitutions (Casq1, Atp2a1, Myh2, and Myl1) are expressed in the superfast sound-producing muscle of bats. Furthermore, we found that the calcium storage protein calsequestrin 1 of the little brown bat and the bottlenose dolphin functionally converged in its ability to form calcium-sequestering polymers at lower calcium concentrations, which may contribute to rapid calcium transients required for superfast muscle physiology. The proteins that our genomic screen detected could be involved in the convergent evolution of vocalization in echolocating mammals by potentially contributing to both rapid Ca2+ transients and increased shortening velocities in superfast muscles.
Relationships among laurasiatherian clades represent one of the most highly disputed topics in mammalian phylogeny. In this study, we attempt to disentangle laurasiatherian interordinal relationships using two independent genome-level approaches: (1) quantifying retrotransposon presence/absence patterns, and (2) comparisons of exon datasets at the levels of nucleotides and amino acids. The two approaches revealed contradictory phylogenetic signals, possibly due to a high level of ancestral incomplete lineage sorting. The positions of Eulipotyphla and Chiroptera as the first and second earliest divergences were consistent across the approaches. However, the phylogenetic relationships of Perissodactyla, Cetartiodactyla, and Ferae, were contradictory. While retrotransposon insertion analyses suggest a clade with Cetartiodactyla and Ferae, the exon dataset favoured Cetartiodactyla and Perissodactyla. Future analyses of hitherto unsampled laurasiatherian lineages and synergistic analyses of retrotransposon insertions, exon and conserved intron/intergenic sequences might unravel the conflicting patterns of relationships in this major mammalian clade.
Background: Genome sequencing of all known eukaryotes on Earth promises unprecedented advances in biological sciences and in biodiversity-related applied fields such as environmental management and natural product research. Advances in long-read DNA sequencing make it feasible to generate high-quality genomes for many non–genetic model species. However, long-read sequencing today relies on sizable quantities of high-quality, high molecular weight DNA, which is mostly obtained from fresh tissues. This is a challenge for biodiversity genomics of most metazoan species, which are tiny and need to be preserved immediately after collection. Here we present de novo genomes of 2 species of submillimeter Collembola. For each, we prepared the sequencing library from high molecular weight DNA extracted from a single specimen and using a novel ultra-low input protocol from Pacific Biosciences. This protocol requires a DNA input of only 5 ng, permitted by a whole-genome amplification step.
Results: The 2 assembled genomes have N50 values >5.5 and 8.5 Mb, respectively, and both contain ∼96% of BUSCO genes. Thus, they are highly contiguous and complete. The genomes are supported by an integrative taxonomy approach including placement in a genome-based phylogeny of Collembola and designation of a neotype for 1 of the species. Higher heterozygosity values are recorded in the more mobile species. Both species are devoid of the biosynthetic pathway for β-lactam antibiotics known in several Collembola, confirming the tight correlation of antibiotic synthesis with the species way of life.
Conclusions: It is now possible to generate high-quality genomes from single specimens of minute, field-preserved metazoans, exceeding the minimum contig N50 (1 Mb) required by the Earth BioGenome Project.
Vampire bats are the only mammals that feed exclusively on blood. To uncover genomic changes associated with this dietary adaptation, we generated a haplotype-resolved genome of the common vampire bat and screened 27 bat species for genes that were specifically lost in the vampire bat lineage. We found previously unknown gene losses that relate to reduced insulin secretion (FFAR1 and SLC30A8), limited glycogen stores (PPP1R3E), and a unique gastric physiology (CTSE). Other gene losses likely reflect the biased nutrient composition (ERN2 and CTRL) and distinct pathogen diversity of blood (RNASE7) and predict the complete lack of cone-based vision in these strictly nocturnal bats (PDE6H and PDE6C). Notably, REP15 loss likely helped vampire bats adapt to high dietary iron levels by enhancing iron excretion, and the loss of CYP39A1 could have contributed to their exceptional cognitive abilities. These findings enhance our understanding of vampire bat biology and the genomic underpinnings of adaptations to blood feeding.