Institutes
Refine
Document Type
- Preprint (6)
Language
- English (6) (remove)
Has Fulltext
- yes (6)
Is part of the Bibliography
- no (6)
Institute
- Medizin (6)
- Informatik (1)
Understanding the complexity of transcriptional regulation is a major goal of computational biology. Because experimental linkage of regulatory sites to genes is challenging, computational methods considering epigenomics data have been proposed to create tissue-specific regulatory maps. However, we showed that these approaches are not well suited to account for the variations of the regulatory landscape between cell-types. To overcome these drawbacks, we developed a new method called STITCHIT, that identifies and links putative regulatory sites to genes. Within STITCHIT, we consider the chromatin accessibility signal of all samples jointly to identify regions exhibiting a signal variation related to the expression of a distinct gene. STITCHIT outperforms previous approaches in various validation experiments and was used with a genome-wide CRISPR-Cas9 screen to prioritize novel doxorubicin-resistance genes and their associated non-coding regulatory regions. We believe that our work paves the way for a more refined understanding of transcriptional regulation at the gene-level.
Background: Eukaryotic gene expression is controlled by cis-regulatory elements (CREs), including promoters and enhancers, which are bound by transcription factors (TFs). Differential expression of TFs and their binding affinity at putative CREs determine tissue- and developmental-specific transcriptional activity. Consolidating genomic data sets can offer further insights into the accessibility of CREs, TF activity, and, thus, gene regulation. However, the integration and analysis of multi-modal data sets are hampered by considerable technical challenges. While methods for highlighting differential TF activity from combined chromatin state data (e.g., ChIP-seq, ATAC-seq, or DNase-seq) and RNA-seq data exist, they do not offer convenient usability, have limited support for large-scale data processing, and provide only minimal functionality for visually interpreting results.
Results: We developed TF-Prioritizer, an automated pipeline that prioritizes condition-specific TFs from multi-modal data and generates an interactive web report. We demonstrated its potential by identifying known TFs along with their target genes, as well as previously unreported TFs active in lactating mouse mammary glands. Additionally, we studied a variety of ENCODE data sets for cell lines K562 and MCF-7, including twelve histone modification ChIP-seq as well as ATAC-seq and DNase-seq datasets, where we observe and discuss assay-specific differences.
Conclusion: TF-Prioritizer accepts ATAC-seq, DNase-seq, or ChIP-seq and RNA-seq data as input and identifies TFs with differential activity, thus offering an understanding of genome-wide gene regulation, potential pathogenesis, and therapeutic targets in biomedical research.
Background Eukaryotic gene expression is controlled by cis-regulatory elements (CREs) including promoters and enhancers which are bound by transcription factors (TFs). Differential expression of TFs and their putative binding sites on CREs cause tissue and developmental-specific transcriptional activity. Consolidating genomic data sets can offer further insights into the accessibility of CREs, TF activity, and thus gene regulation. However, the integration and analysis of multi-modal data sets are hampered by considerable technical challenges. While methods for highlighting differential TF activity from combined ChIP-seq and RNA-seq data exist, they do not offer good usability, have limited support for large-scale data processing, and provide only minimal functionality for visual result interpretation.
Results We developed TF-Prioritizer, an automated java pipeline to prioritize condition-specific TFs derived from multi-modal data. TF-Prioritizer creates an interactive, feature-rich, and user-friendly web report of its results. To showcase the potential of TF-Prioritizer, we identified known active TFs (e.g., Stat5, Elf5, Nfib, Esr1), their target genes (e.g., milk proteins and cell-cycle genes), and newly classified lactating mammary gland TFs (e.g., Creb1, Arnt).
Conclusion TF-Prioritizer accepts ChIP-seq and RNA-seq data, as input and suggests TFs with differential activity, thus offering an understanding of genome-wide gene regulation, potential pathogenesis, and therapeutic targets in biomedical research.
Non-coding variations located within regulatory elements may alter gene expression by modifying Transcription Factor (TF) binding sites and thereby lead to functional consequences like various traits or diseases. To understand these molecular mechanisms, different TF models are being used to assess the effect of DNA sequence variations, such as Single Nucleotide Polymorphisms (SNPs). However, few statistical approaches exist to compute statistical significance of results but they often are slow for large sets of SNPs, such as data obtained from a genome-wide association study (GWAS) or allele-specific analysis of chromatin data.
Results We investigate the distribution of maximal differential TF binding scores for general computational models that assess TF binding. We find that a modified Laplace distribution can adequately approximate the empirical distributions. A benchmark on in vitro and in vivo data sets showed that our new approach improves on an existing method in terms of performance and speed. In applications on large sets of eQTL and GWAS SNPs we could illustrate the usefulness of the novel statistic to highlight cell type specific regulators and TF target genes.
Conclusions Our approach allows the evaluation of DNA changes that induce differential TF binding in a fast and accurate manner, permitting computations on large mutation data sets. An implementation of the novel approach is freely available at https://github.com/SchulzLab/SNEEP.
Genome-wide CRISPR screens are becoming more widespread and allow the simultaneous interrogation of thousands of genomic regions. Although recent progress has been made in the analysis of CRISPR screens, it is still an open problem how to interpret CRISPR mutations in non-coding regions of the genome. Most of the tools concentrate on the interpretation of mutations introduced in gene coding regions. We introduce a computational pipeline that uses epigenomic information about regulatory elements for the interpretation of CRISPR mutations in non-coding regions. We illustrate our approach on the analysis of a genome-wide CRISPR screen in hTERT-RPE-1 cells and reveal novel regulatory elements that mediate chemoresistance against doxorubicin in these cells. We infer links to established and to novel chemoresistance genes. Our approach is general and can be applied on any cell type and with different CRISPR enzymes.
Mechanisms by which specific histone modifications regulate distinct gene regulatory networks remain little understood. We investigated how H3K79me2, a modification catalyzed by DOT1L and previously considered a general transcriptional activation mark, regulates gene expression in mammalian cardiogenesis. Early embryonic cardiomyocyte ablation of Dot1l revealed that H3K79me2 does not act as a general transcriptional activator, but rather regulates highly specific gene regulatory networks at two critical cardiogenic junctures: left ventricle patterning and postnatal cardiomyocyte cell cycle withdrawal. Mechanistic analyses revealed that H3K79me2 in two distinct domains, gene bodies and regulatory elements, synergized to promote expression of genes activated by DOT1L. Surprisingly, these analyses also revealed that H3K79me2 in specific regulatory elements contributed to silencing genes usually not expressed in cardiomyocytes. As DOT1L mutants had increased numbers of postnatal mononuclear cardiomyocytes and prolonged cardiomyocyte cell cycle activity, controlled inhibition of DOT1L might be a strategy to promote cardiac regeneration post-injury.