
Predicting transcription factor binding using ensemble random forest models [version 1; peer review: 2 approved with reservations]

Background: Understanding the location and cell-type specific binding of Transcription Factors (TFs) is important in the study of gene regulation. Computational prediction of TF binding sites is challenging, because TFs often bind only to short DNA motifs and cell-type specific co-factors may work together with the same TF to determine binding. Here, we consider the problem of learning a general model for the prediction of TF binding using DNase1-seq data and TF motif descriptions in the form of position specific energy matrices (PSEMs).

Methods: We use TF ChIP-seq data as a gold standard for model training and evaluation. Our contribution is a novel ensemble learning approach using random forest classifiers. In the context of the ENCODE-DREAM in vivo TF binding site prediction challenge, we consider different learning setups.

Results: Our results indicate that the ensemble learning approach generalizes better across tissues and cell types than individual tissue-specific classifiers or a classifier applied to the data aggregated across tissues. Furthermore, we show that incorporating DNase1-seq peaks is essential to reduce the false positive rate of TF binding predictions compared to considering the raw DNase1 signal.

Conclusions: Analysis of important features reveals that the models preferentially select motifs of other TFs that are close interaction partners in existing protein-protein interaction networks. Code generated in the scope of this project is available on GitHub: https://github.com/SchulzLab/TFAnalysis (DOI: 10.5281/zenodo.1409697)
Metadata
Authors: Fatemeh Behjati Ardakani, Florian Schmidt, Marcel Holger Schulz
URN: urn:nbn:de:hebis:30:3-535576
DOI: https://doi.org/10.12688/f1000research.16200.1
ISSN: 2046-1402
PubMed ID: https://pubmed.ncbi.nlm.nih.gov/31723409
Parent title (English): F1000Research
Publisher: F1000 Research Ltd
Place of publication: London
Document type: Scientific article
Language: English
Year of completion: 2018
Date of first publication: 2018-10-04
Publishing institution: Universitätsbibliothek Johann Christian Senckenberg
Release date: 2020-05-13
Free keyword / tag: Chromatin accessibility; DNase1-seq; ENCODE-DREAM in vivo Transcription Factor binding site prediction challenge; Ensemble learning; Indirect-binding; TF-complexes; Transcription Factors
Volume: 7
Issue: Art. 1603
Edition: version 1
Number of pages: 26
First page: 1
Last page: 26
Copyright: © 2018 Behjati Ardakani F et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
HeBIS-PPN: 465069290
Institutes: Medicine / Medicine
DDC classification: 6 Technology, medicine, applied sciences / 61 Medicine and health / 610 Medicine and health
Collections: University publications
License (German): Creative Commons - Attribution 4.0