Background: Eukaryotic gene expression is controlled by cis-regulatory elements (CREs), including promoters and enhancers, which are bound by transcription factors (TFs). Differential expression of TFs and their binding affinity at putative CREs determine tissue- and developmental-specific transcriptional activity. Consolidating genomic data sets can offer further insights into the accessibility of CREs, TF activity, and, thus, gene regulation. However, the integration and analysis of multi-modal data sets are hampered by considerable technical challenges. While methods for highlighting differential TF activity from combined chromatin state data (e.g., ChIP-seq, ATAC-seq, or DNase-seq) and RNA-seq data exist, they do not offer convenient usability, have limited support for large-scale data processing, and provide only minimal functionality for visually interpreting results.
Results: We developed TF-Prioritizer, an automated pipeline that prioritizes condition-specific TFs from multi-modal data and generates an interactive web report. We demonstrated its potential by identifying known TFs along with their target genes, as well as previously unreported TFs active in lactating mouse mammary glands. Additionally, we studied a variety of ENCODE data sets for cell lines K562 and MCF-7, including twelve histone modification ChIP-seq as well as ATAC-seq and DNase-seq datasets, where we observe and discuss assay-specific differences.
Conclusion: TF-Prioritizer accepts ATAC-seq, DNase-seq, or ChIP-seq and RNA-seq data as input and identifies TFs with differential activity, thus offering an understanding of genome-wide gene regulation, potential pathogenesis, and therapeutic targets in biomedical research.
Background: Eukaryotic gene expression is controlled by cis-regulatory elements (CREs), including promoters and enhancers, which are bound by transcription factors (TFs). Differential expression of TFs and their putative binding sites on CREs cause tissue- and developmental-specific transcriptional activity. Consolidating genomic data sets can offer further insights into the accessibility of CREs, TF activity, and thus gene regulation. However, the integration and analysis of multi-modal data sets are hampered by considerable technical challenges. While methods for highlighting differential TF activity from combined ChIP-seq and RNA-seq data exist, they do not offer good usability, have limited support for large-scale data processing, and provide only minimal functionality for visual result interpretation.
Results: We developed TF-Prioritizer, an automated Java pipeline to prioritize condition-specific TFs derived from multi-modal data. TF-Prioritizer creates an interactive, feature-rich, and user-friendly web report of its results. To showcase the potential of TF-Prioritizer, we identified known active TFs (e.g., Stat5, Elf5, Nfib, Esr1), their target genes (e.g., milk proteins and cell-cycle genes), and newly classified lactating mammary gland TFs (e.g., Creb1, Arnt).
Conclusion: TF-Prioritizer accepts ChIP-seq and RNA-seq data as input and suggests TFs with differential activity, thus offering an understanding of genome-wide gene regulation, potential pathogenesis, and therapeutic targets in biomedical research.
Genome-wide CRISPR screens are becoming more widespread and allow the simultaneous interrogation of thousands of genomic regions. Although recent progress has been made in the analysis of CRISPR screens, how to interpret CRISPR mutations in non-coding regions of the genome remains an open problem. Most tools concentrate on the interpretation of mutations introduced in gene coding regions. We introduce a computational pipeline that uses epigenomic information about regulatory elements for the interpretation of CRISPR mutations in non-coding regions. We illustrate our approach on the analysis of a genome-wide CRISPR screen in hTERT-RPE-1 cells and reveal novel regulatory elements that mediate chemoresistance against doxorubicin in these cells. We infer links to established and to novel chemoresistance genes. Our approach is general and can be applied to any cell type and with different CRISPR enzymes.
Mechanisms by which specific histone modifications regulate distinct gene regulatory networks remain little understood. We investigated how H3K79me2, a modification catalyzed by DOT1L and previously considered a general transcriptional activation mark, regulates gene expression in mammalian cardiogenesis. Early embryonic cardiomyocyte ablation of Dot1l revealed that H3K79me2 does not act as a general transcriptional activator, but rather regulates highly specific gene regulatory networks at two critical cardiogenic junctures: left ventricle patterning and postnatal cardiomyocyte cell cycle withdrawal. Mechanistic analyses revealed that H3K79me2 in two distinct domains, gene bodies and regulatory elements, synergized to promote expression of genes activated by DOT1L. Surprisingly, these analyses also revealed that H3K79me2 in specific regulatory elements contributed to silencing genes usually not expressed in cardiomyocytes. As DOT1L mutants had increased numbers of postnatal mononuclear cardiomyocytes and prolonged cardiomyocyte cell cycle activity, controlled inhibition of DOT1L might be a strategy to promote cardiac regeneration post-injury.
Understanding how epigenetic variation in non-coding regions is involved in distal gene-expression regulation is an important problem. Regulatory regions can be associated with genes using large-scale datasets of epigenetic and expression data. However, for regions of complex epigenomic signals and enhancers that regulate many genes, it is difficult to understand these associations. We present StitchIt, an approach to dissect epigenetic variation in a gene-specific manner for the detection of regulatory elements (REMs) without relying on peak calls in individual samples. StitchIt segments epigenetic signal tracks over many samples to generate the location and the target genes of a REM simultaneously. We show that this approach leads to a more accurate and refined REM detection compared to standard methods even on heterogeneous datasets, which are challenging to model. Also, StitchIt REMs are highly enriched in experimentally determined chromatin interactions and expression quantitative trait loci. We validated several newly predicted REMs using CRISPR-Cas9 experiments, thereby demonstrating the reliability of StitchIt. StitchIt is able to dissect regulation in superenhancers and predicts thousands of putative REMs that go unnoticed using peak-based approaches, suggesting that a large part of the regulome might be uncharted waters.
Non-coding variations located within regulatory elements may alter gene expression by modifying Transcription Factor (TF) binding sites and thereby lead to functional consequences such as various traits or diseases. To understand these molecular mechanisms, different TF models are being used to assess the effect of DNA sequence variations, such as Single Nucleotide Polymorphisms (SNPs). However, few approaches exist to compute the statistical significance of results, and those that do are often slow for large sets of SNPs, such as data obtained from a genome-wide association study (GWAS) or allele-specific analysis of chromatin data.
Results We investigate the distribution of maximal differential TF binding scores for general computational models that assess TF binding. We find that a modified Laplace distribution can adequately approximate the empirical distributions. A benchmark on in vitro and in vivo data sets showed that our new approach improves on an existing method in terms of performance and speed. In applications on large sets of eQTL and GWAS SNPs we could illustrate the usefulness of the novel statistic to highlight cell type specific regulators and TF target genes.
Conclusions Our approach allows the evaluation of DNA changes that induce differential TF binding in a fast and accurate manner, permitting computations on large mutation data sets. An implementation of the novel approach is freely available at https://github.com/SchulzLab/SNEEP.
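The idea of scoring a differential binding value against a fitted Laplace distribution can be illustrated with a minimal sketch. It fits a plain (unmodified) Laplace distribution, not the modified variant described in the abstract, and it is not the SNEEP implementation. The function names and the background scores are hypothetical, chosen only for illustration.

```python
import math
from statistics import median


def fit_laplace(scores):
    """Maximum-likelihood fit of a plain Laplace distribution.

    The location estimate is the sample median; the scale estimate is the
    mean absolute deviation from that median.
    """
    mu = median(scores)
    b = sum(abs(s - mu) for s in scores) / len(scores)
    return mu, b


def laplace_two_sided_pvalue(x, mu, b):
    """Return P(|X - mu| >= |x - mu|) for X ~ Laplace(mu, b)."""
    return math.exp(-abs(x - mu) / b)


# Hypothetical background of differential binding scores (assumption).
background = [-1.2, -0.4, -0.1, 0.0, 0.2, 0.3, 0.9, 1.5]
mu, b = fit_laplace(background)

# Significance of one observed differential binding score (assumption).
p = laplace_two_sided_pvalue(4.0, mu, b)
```

Because the tail probability has a closed form, each variant costs a single exponential once the distribution is fitted, which is one reason a parametric approximation of this kind scales to large SNP sets.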
Understanding the complexity of transcriptional regulation is a major goal of computational biology. Because experimental linkage of regulatory sites to genes is challenging, computational methods considering epigenomics data have been proposed to create tissue-specific regulatory maps. However, we showed that these approaches are not well suited to account for the variations of the regulatory landscape between cell types. To overcome these drawbacks, we developed a new method called STITCHIT that identifies and links putative regulatory sites to genes. Within STITCHIT, we consider the chromatin accessibility signal of all samples jointly to identify regions exhibiting a signal variation related to the expression of a distinct gene. STITCHIT outperforms previous approaches in various validation experiments and was used with a genome-wide CRISPR-Cas9 screen to prioritize novel doxorubicin-resistance genes and their associated non-coding regulatory regions. We believe that our work paves the way for a more refined understanding of transcriptional regulation at the gene level.
Endothelial cells play a critical role in the adaptation of tissues to injury. Tissue ischemia induced by infarction leads to profound changes in endothelial cell functions and can induce transition to a mesenchymal state. Here we explore the kinetics and individual cellular responses of endothelial cells after myocardial infarction by using single-cell RNA sequencing. This study demonstrates a time-dependent switch in endothelial cell proliferation and inflammation associated with transient changes in metabolic gene signatures. Trajectory analysis reveals that the majority of endothelial cells 3 to 7 days after myocardial infarction acquire a transient state, characterized by mesenchymal gene expression, which returns to baseline 14 days after injury. Lineage tracing, using Cdh5-CreERT2;mT/mG mice followed by single-cell RNA sequencing, confirms the transient mesenchymal transition and reveals additional hypoxic and inflammatory signatures of endothelial cells during early and late states after injury. These data suggest that endothelial cells undergo a transient mesenchymal activation concomitant with a metabolic adaptation within the first days after myocardial infarction but do not acquire a long-term mesenchymal fate. This mesenchymal activation may facilitate endothelial cell migration and clonal expansion to regenerate the vascular network.
A central concern in genetics is to identify mechanisms of transcriptional regulation. The aim is to unravel the mapping between the DNA sequence and gene expression. However, this has turned out to be extremely complex. Gene regulation is highly cell type-specific and even moderate changes in gene expression can have functional consequences.
Important contributors to gene regulation are transcription factors (TFs), which are able to interact directly with the DNA. Often, a first step in understanding the effect of a TF on a gene's regulation is to identify the genomic regions the TF binds to. Therefore, one needs to be aware of the TF's binding preferences, which are commonly summarized in TF binding motifs. Although the binding motif is experimentally validated for many TFs, there is still a large number of TFs for which no binding motif is known. Many tools exist that link TF binding motifs to TFs. We developed the method Massif, which improves the performance of such tools by incorporating a domain score that uses the DNA binding domain of the studied TF as additional information.
TF binding sites are often enriched in regulatory elements (REMs) such as promoters or enhancers, where the latter can be located megabases away from their target gene. However, to understand the regulation of a gene it is crucial to know where the REMs of a gene are located. We introduced the EpiRegio webserver that holds REMs associated with target genes predicted across many cell types and tissues using STITCHIT, a previously established method. Our publicly available webserver supports queries for REMs associated with genes (gene query) and for REMs overlapping genomic regions (region query). We illustrated the usefulness of EpiRegio by pointing to a TF that is enriched in the REMs of differentially expressed genes in circPLOD2-depleted pericytes. Further, we highlighted genes that are affected by CRISPR-Cas-induced mutations in non-coding genomic regions using EpiRegio's region query. Non-coding genetic variants within REMs may alter gene expression by modifying TF binding sites, which can lead to various kinds of traits or diseases. To understand the underlying molecular mechanisms, one aims to evaluate the effect of such genetic variations on TF binding sites. We developed an accurate and fast statistical approach that can assess whether a single nucleotide polymorphism (SNP) is regulatory. Further, we combined this approach with epigenetic data and additional analyses in our Sneep workflow. For instance, it enables the identification of TFs whose binding preferences are affected by the analyzed SNPs, which is illustrated on eQTL datasets for different cell types. Additionally, we used our Sneep workflow to highlight cardiovascular disease genes using regulatory SNPs and REM-gene interactions.
Overall, the described results allow a better understanding of REM-gene interactions and their interplay with TFs on gene regulation.
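The two query types mentioned for EpiRegio (gene query and region query) can be sketched as simple lookups over a list of REM records. This is only an illustration under stated assumptions: the record layout, the function names, the 0-based half-open coordinates, and the example data are all hypothetical and do not reflect EpiRegio's actual API or database.

```python
from typing import List, NamedTuple


class Rem(NamedTuple):
    """Hypothetical REM record with 0-based, half-open coordinates."""
    chrom: str
    start: int  # inclusive
    end: int    # exclusive
    gene: str   # predicted target gene


def gene_query(rems: List[Rem], gene: str) -> List[Rem]:
    """Return all REMs whose predicted target is the given gene."""
    return [r for r in rems if r.gene == gene]


def region_query(rems: List[Rem], chrom: str, start: int, end: int) -> List[Rem]:
    """Return all REMs overlapping the half-open region [start, end)."""
    return [
        r for r in rems
        if r.chrom == chrom and r.start < end and start < r.end
    ]


# Made-up example records (assumption, not real REM predictions).
rems = [
    Rem("chr1", 100, 200, "GENE_A"),
    Rem("chr1", 500, 650, "GENE_B"),
    Rem("chr2", 100, 200, "GENE_A"),
]

rems_for_gene = gene_query(rems, "GENE_A")
rems_in_region = region_query(rems, "chr1", 150, 520)
```

A linear scan is enough for a sketch; for genome-scale REM sets, an interval tree or a sorted index per chromosome would be the more usual choice.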
Highlights
• Single nucleotide variants (SNVs) may affect transcription factor (TF) binding
• Fast statistical approach to assess significance of differential TF binding for SNVs
• Validate new approach on in vitro and in vivo TF binding assays
• Applications on GWAS SNVs and large eQTL studies illustrate utility
Summary
Non-coding variants located within regulatory elements may alter gene expression by modifying transcription factor (TF) binding sites, thereby leading to functional consequences. Different TF models are being used to assess the effect of DNA sequence variants, such as single nucleotide variants (SNVs). Existing methods are often slow and do not assess the statistical significance of results. We investigated the distribution of absolute maximal differential TF binding scores for general computational models that assess TF binding. We find that a modified Laplace distribution can adequately approximate the empirical distributions. A benchmark on in vitro and in vivo datasets showed that our approach improves upon an existing method in terms of performance and speed. Applications on eQTLs and on a genome-wide association study illustrate the usefulness of our statistics by highlighting cell type-specific regulators and target genes. An implementation of our approach is freely available on GitHub and as a Bioconda package.