An ontology-based method for assessing batch effect adjustment approaches in heterogeneous datasets

Motivation: International consortia such as the Genotype-Tissue Expression (GTEx) project, The Cancer Genome Atlas (TCGA) or the International Human Epigenome Consortium (IHEC) have produced a wealth of genomic datasets with the goal of advancing our understanding of cell differentiation and disease mechanisms. However, using all of these data effectively through integrative analysis is hampered by batch effects, large cell type heterogeneity and low replicate numbers. To study whether batch effects across datasets can be observed and adjusted for, we analyze RNA-seq data of 215 samples from ENCODE, Roadmap, BLUEPRINT and DEEP as well as 1336 samples from GTEx and TCGA. While batch effects are a considerable issue, it is non-trivial to determine whether batch adjustment leads to an improvement in data quality, especially in cases of low replicate numbers.

Results: We present a novel method for assessing the performance of batch effect adjustment methods on heterogeneous data. Our method borrows information from the Cell Ontology to establish whether batch adjustment leads to a better agreement between observed pairwise similarity and similarity of cell types inferred from the ontology. A comparison of state-of-the-art batch effect adjustment methods suggests that batch effects in heterogeneous datasets with low replicate numbers cannot be adequately adjusted. Better methods need to be developed, which can be assessed objectively in the framework presented here.
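
The idea of comparing observed sample similarity with ontology-derived similarity can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which similarity measures the paper uses, so the toy ontology, the path-based similarity `1 / (1 + path length)`, the use of Pearson correlation, and all names below are assumptions made for illustration only.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy ontology (child -> parent); NOT the real Cell Ontology.
PARENT = {
    "T cell": "lymphocyte",
    "B cell": "lymphocyte",
    "lymphocyte": "blood cell",
    "monocyte": "blood cell",
    "blood cell": "cell",
    "hepatocyte": "cell",
}


def path_to_root(term):
    """Return the list of terms from `term` up to the root of the toy tree."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def ontology_similarity(a, b):
    """Assumed measure: 1 / (1 + number of edges between a and b)."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    depth_in_b = {term: i for i, term in enumerate(path_b)}
    for i, term in enumerate(path_a):
        if term in depth_in_b:  # first shared ancestor
            return 1.0 / (1 + i + depth_in_b[term])
    raise ValueError("terms share no common ancestor")


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def agreement(expression, cell_type):
    """Correlate observed and ontology-based similarity over all sample pairs."""
    observed, expected = [], []
    for s, t in combinations(sorted(expression), 2):
        observed.append(pearson(expression[s], expression[t]))
        expected.append(ontology_similarity(cell_type[s], cell_type[t]))
    return pearson(observed, expected)


# Made-up expression values for three hypothetical samples.
toy_expression = {
    "s1": [1, 2, 3, 4],
    "s2": [1, 2, 4, 3],
    "s3": [4, 1, 3, 2],
}
toy_cell_type = {"s1": "T cell", "s2": "B cell", "s3": "hepatocyte"}

print(agreement(toy_expression, toy_cell_type))
```

Under these assumptions, one would compute `agreement` once on the unadjusted expression values and once on the batch-adjusted values; a higher score after adjustment would indicate that the adjusted data better reflect the relationships between cell types encoded in the ontology.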

Metadata
Author:Florian Schmidt, Markus List, Engin Cukuroglu, Sebastian Köhler, Jonathan Göke, Marcel Holger Schulz
URN:urn:nbn:de:hebis:30:3-516977
DOI:https://doi.org/10.1093/bioinformatics/bty553
ISSN:1460-2059
ISSN:1367-4803
Pubmed Id:https://pubmed.ncbi.nlm.nih.gov/30423059
Parent Title (English):Bioinformatics
Publisher:Oxford Univ. Press
Place of publication:Oxford
Document Type:Article
Language:English
Year of Completion:2018
Date of first Publication:2018/09/08
Publishing Institution:Universitätsbibliothek Johann Christian Senckenberg
Release Date:2019/11/13
Volume:34
Issue:17
Page Number:9
First Page:i908
Last Page:i916
Note:
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
HeBIS-PPN:456369317
Institutes:Medicine / Medicine
Dewey Decimal Classification:6 Technology, medicine, applied sciences / 61 Medicine and health / 610 Medicine and health
Collections:University publications
Licence:Creative Commons - Attribution 4.0