Pitfalls in supermatrix phylogenomics

  • In the mid-2000s, molecular phylogenetics turned into phylogenomics, a development that improved the resolution of phylogenetic trees through a dramatic reduction in stochastic error. While some then predicted “the end of incongruence”, it soon became apparent that analysing large amounts of sequence data without an adequate model of sequence evolution amplifies systematic error and leads to phylogenetic artefacts. With the increasing flood of (sometimes low-quality) genomic data resulting from the rise of high-throughput sequencing, a new type of error has emerged. Termed here “data errors”, this category lumps together several kinds of issues affecting the construction of phylogenomic supermatrices (e.g., sequencing and annotation errors, contaminant sequences). While easy to deal with at a single-gene scale, such errors become very difficult to avoid at the genomic scale, both because hand-curating thousands of sequences is prohibitively time-consuming and because suitable automated bioinformatics tools are still in their infancy. In this paper, we first review the pitfalls affecting the construction of supermatrices and the strategies to limit their adverse effects on phylogenomic inference. Then, after discussing why missing data in supermatrices are a relative non-issue, we briefly present the approaches commonly used to reduce systematic error.
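The abstract centres on building supermatrices from many gene alignments and notes that missing data are comparatively benign. As a minimal illustration only (not the authors' pipeline), the sketch below assumes each gene alignment is held as a Python dict mapping taxon names to aligned sequences. It concatenates the genes, pads any taxon absent from a gene with `?`, and reports the per-taxon fraction of gap or missing positions. The function names `build_supermatrix` and `missing_fraction` are hypothetical.

```python
from typing import Dict, List

# Hypothetical input format: taxon name -> aligned sequence, one dict per gene.
Alignment = Dict[str, str]


def build_supermatrix(genes: List[Alignment]) -> Dict[str, str]:
    """Concatenate gene alignments; taxa absent from a gene are padded with '?'."""
    taxa = sorted({taxon for gene in genes for taxon in gene})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for gene in genes:
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError("each gene alignment must contain sequences of one length")
        length = lengths.pop()
        for taxon in taxa:
            # A taxon missing from this gene contributes a block of missing data.
            parts[taxon].append(gene.get(taxon, "?" * length))
    return {taxon: "".join(blocks) for taxon, blocks in parts.items()}


def missing_fraction(supermatrix: Dict[str, str], missing: str = "-?") -> Dict[str, float]:
    """Fraction of positions per taxon that are gaps ('-') or missing data ('?')."""
    return {
        taxon: sum(char in missing for char in seq) / len(seq)
        for taxon, seq in supermatrix.items()
    }


if __name__ == "__main__":
    # Toy data: taxon B has an internal gap and lacks gene 2; C lacks gene 1.
    genes = [{"A": "ACGT", "B": "AC-T"}, {"A": "GG", "C": "GA"}]
    matrix = build_supermatrix(genes)
    for taxon, seq in matrix.items():
        print(taxon, seq, round(missing_fraction(matrix)[taxon], 2))
```

Padding with `?` rather than dropping incomplete taxa keeps every taxon in the matrix, which mirrors the common supermatrix practice of tolerating missing data instead of discarding genes or species.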

Metadata
Author:Hervé Philippe, Damien M. de Vienne, Vincent Ranwez, Béatrice Roure, Denis Baurain, Frédéric Delsuc
URN:urn:nbn:de:hebis:30:3-461325
DOI:https://doi.org/10.5852/ejt.2017.283
ISSN:2118-9773
Parent Title (English):European journal of taxonomy : 283
Series (Serial Number):European journal of taxonomy : EJT (283)
Publisher:Muséum National d'Histoire Naturelle
Place of publication:Paris
Document Type:Part of Periodical
Language:English
Year of Completion:2017
Year of first Publication:2017
Publishing Institution:Universitätsbibliothek Johann Christian Senckenberg
Release Date:2018/04/18
Tag:Phylogenomics; data quality; incongruence; supermatrix; systematic error
Page Number:25
Dewey Decimal Classification:5 Natural sciences and mathematics / 59 Animals (Zoology) / 590 Animals (Zoology)
Collections:Biology collection / Special Subject Collection full texts
Licence (German):Creative Commons - Attribution 3.0