Refine
Document Type
- Article (3)
Language
- English (3)
Has Fulltext
- yes (3)
Is part of the Bibliography
- no (3)
Keywords
- Afrotheria (1)
- Animal phylogenetics (1)
- Elephants (1)
- Eutheria (1)
- Hyraxes (1)
- Mammalian genomics (1)
- Mammals (1)
- Sequence alignment (1)
- assembly gaps (1)
- benchmarking (1)
Institute
- Senckenbergische Naturforschende Gesellschaft (3) (remove)
Background: Long sequencing reads allow increasing contiguity and completeness of fragmented, short-read–based genome assemblies by closing assembly gaps, ideally at high accuracy. While several gap-closing methods have been developed, these methods often close an assembly gap with sequence that does not accurately represent the true sequence.
Findings: Here, we present DENTIST, a sensitive, highly accurate, and automated pipeline method to close gaps in short-read assemblies with long error-prone reads. DENTIST comprehensively determines repetitive assembly regions to identify reliable and unambiguous alignments of long reads to the correct loci, integrates a consensus sequence computation step to obtain a high base accuracy for the inserted sequence, and validates the accuracy of closed gaps. Unlike previous benchmarks, we generated test assemblies that have gaps at the exact positions where real short-read assemblies have gaps. Generating such realistic benchmarks for Drosophila (134 Mb genome), Arabidopsis (119 Mb), hummingbird (1 Gb), and human (3 Gb) and using simulated or real PacBio continuous long reads, we show that DENTIST consistently achieves a substantially higher accuracy compared to previous methods, while having a similar sensitivity.
Conclusion: DENTIST provides an accurate approach to improve the contiguity and completeness of fragmented assemblies with long reads. DENTIST's source code including a Snakemake workflow, conda package, and Docker container is available at https://github.com/a-ludi/dentist. All test assemblies as a resource for future benchmarking are at https://bds.mpi-cbg.de/hillerlab/DENTIST/.
Descent of testes from a position near the kidneys into the lower abdomen or into the scrotum is an important developmental process that occurs in all placental mammals, with the exception of five afrotherian lineages. Since soft-tissue structures like testes are not preserved in the fossil record and since key parts of the placental mammal phylogeny remain controversial, it has been debated whether testicular descent is the ancestral or derived condition in placental mammals. To resolve this debate, we used genomic data of 71 mammalian species and analyzed the evolution of two key genes (relaxin/insulin-like family peptide receptor 2 [RXFP2] and insulin-like 3 [INSL3]) that induce the development of the gubernaculum, the ligament that is crucial for testicular descent. We show that both RXFP2 and INSL3 are lost or nonfunctional exclusively in four afrotherians (tenrec, cape elephant shrew, cape golden mole, and manatee) that completely lack testicular descent. The presence of remnants of once functional orthologs of both genes in these afrotherian species shows that these gene losses happened after the split from the placental mammal ancestor. These “molecular vestiges” provide strong evidence that testicular descent is the ancestral condition, irrespective of persisting phylogenetic discrepancies. Furthermore, the absence of shared gene-inactivating mutations and our estimates that the loss of RXFP2 happened at different time points strongly suggest that testicular descent was lost independently in Afrotheria. Our results provide a molecular mechanism that explains the loss of testicular descent in afrotherians and, more generally, highlight how molecular vestiges can provide insights into the evolution of soft-tissue characters.
Background: Genome sequencing of all known eukaryotes on Earth promises unprecedented advances in biological sciences and in biodiversity-related applied fields such as environmental management and natural product research. Advances in long-read DNA sequencing make it feasible to generate high-quality genomes for many non–genetic model species. However, long-read sequencing today relies on sizable quantities of high-quality, high molecular weight DNA, which is mostly obtained from fresh tissues. This is a challenge for biodiversity genomics of most metazoan species, which are tiny and need to be preserved immediately after collection. Here we present de novo genomes of 2 species of submillimeter Collembola. For each, we prepared the sequencing library from high molecular weight DNA extracted from a single specimen and using a novel ultra-low input protocol from Pacific Biosciences. This protocol requires a DNA input of only 5 ng, permitted by a whole-genome amplification step.
Results: The 2 assembled genomes have N50 values >5.5 and 8.5 Mb, respectively, and both contain ∼96% of BUSCO genes. Thus, they are highly contiguous and complete. The genomes are supported by an integrative taxonomy approach including placement in a genome-based phylogeny of Collembola and designation of a neotype for 1 of the species. Higher heterozygosity values are recorded in the more mobile species. Both species are devoid of the biosynthetic pathway for β-lactam antibiotics known in several Collembola, confirming the tight correlation of antibiotic synthesis with the species way of life.
Conclusions: It is now possible to generate high-quality genomes from single specimens of minute, field-preserved metazoans, exceeding the minimum contig N50 (1 Mb) required by the Earth BioGenome Project.