Keywords
- Census of Life (1)
- Museum life (1)
- Phylogenomics (1)
- data quality (1)
- eukaryotic biodiversity (1)
- incongruence (1)
- molecular assisted systematics (1)
- species description (1)
- supermatrix (1)
- systematic error (1)
The special set of papers entitled “DNA Library of Life” is an outcome of the project “Bibliothèque du vivant” (BdV), which aims to promote the molecular taxonomy of eukaryotes by enabling research teams to produce and manage a molecular library linked to specimens deposited in natural history museums. The project was funded by three French institutions (the CNRS, INRA and MNHN) and gave 105 teams access to the sequencing capacity of the Genoscope between 2011 and 2013. It was subsequently supported by the CNRS through the “Groupement de Recherche Génomique Environnementale”. The scientific objectives of this programme were threefold: 1) species delimitation within species complexes; 2) phylogenetic reconstruction (including phylogenomics); and 3) metabarcoding and the improvement of NGS methods for systematic purposes. Within the present collection, 19 papers contribute to these objectives across a broad taxonomic range and with worldwide geographic coverage. These papers propose taxonomic novelties (22 new species and 3 new genera) in both animal and plant taxa.
In the mid-2000s, molecular phylogenetics turned into phylogenomics, a development that improved the resolution of phylogenetic trees through a dramatic reduction in stochastic error. While some then predicted “the end of incongruence”, it soon appeared that analysing large amounts of sequence data without an adequate model of sequence evolution amplifies systematic error and leads to phylogenetic artefacts. With the increasing flood of (sometimes low-quality) genomic data resulting from the rise of high-throughput sequencing, a new type of error has emerged. Termed here “data errors”, it lumps together several kinds of issues affecting the construction of phylogenomic supermatrices (e.g., sequencing and annotation errors, contaminant sequences). While easy to deal with at a single-gene scale, such errors become very difficult to avoid at the genomic scale, both because hand curating thousands of sequences is prohibitively time-consuming and because the suitable automated bioinformatics tools are still in their infancy. In this paper, we first review the pitfalls affecting the construction of supermatrices and the strategies to limit their adverse effects on phylogenomic inference. Then, after discussing the relative non-issue of missing data in supermatrices, we briefly present the approaches commonly used to reduce systematic error.
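As an illustration of what building a supermatrix involves, the following minimal sketch concatenates per-gene alignments and pads taxa that lack a gene with missing-data symbols, then reports the fraction of missing data per taxon. It is not taken from the paper: the function names `build_supermatrix` and `missing_fraction`, the toy sequences, and the choice of `?` as the missing-data character are all assumptions made for this example.

```python
def build_supermatrix(gene_alignments, missing="?"):
    """Concatenate per-gene alignments into one sequence per taxon.

    gene_alignments maps gene name -> {taxon: aligned sequence}.
    Taxa absent from a gene are padded with the `missing` character
    (hypothetical convention chosen for this sketch).
    """
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    parts = {t: [] for t in taxa}
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        width = lengths.pop()
        for t in taxa:
            # Pad with missing-data symbols when the taxon lacks this gene.
            parts[t].append(aln.get(t, missing * width))
    return {t: "".join(chunks) for t, chunks in parts.items()}


def missing_fraction(supermatrix, missing_chars="?-"):
    """Fraction of positions per taxon that are missing or gapped."""
    return {
        t: sum(seq.count(c) for c in missing_chars) / len(seq)
        for t, seq in supermatrix.items()
    }


# Toy data (invented): sp2 has no sequence for geneB.
genes = {
    "geneA": {"sp1": "ACGT", "sp2": "ACGA", "sp3": "AC-T"},
    "geneB": {"sp1": "GGC", "sp3": "GGT"},
}
matrix = build_supermatrix(genes)
fractions = missing_fraction(matrix)
```

A sketch like this only handles concatenation and bookkeeping. The data errors discussed in the abstract (sequencing and annotation errors, contaminant sequences) would have to be detected upstream, before the alignments reach this step.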