OPUS 4 | Search

A statistical approach to identify regulatory DNA variations (2023)

Baumgarten, Nina ; Rumpf, Laura ; Keßler, Thorsten ; Schulz, Marcel Holger

Background: Non-coding variations located within regulatory elements may alter gene expression by modifying transcription factor (TF) binding sites and thereby lead to functional consequences such as various traits or diseases. To understand these molecular mechanisms, different TF models are being used to assess the effect of DNA sequence variations, such as single nucleotide polymorphisms (SNPs). However, only a few statistical approaches exist to compute the statistical significance of results, and they are often slow for large sets of SNPs, such as data obtained from a genome-wide association study (GWAS) or allele-specific analysis of chromatin data. Results: We investigate the distribution of maximal differential TF binding scores for general computational models that assess TF binding. We find that a modified Laplace distribution can adequately approximate the empirical distributions. A benchmark on in vitro and in vivo data sets showed that our new approach improves on an existing method in terms of performance and speed. In applications to large sets of eQTL and GWAS SNPs, we illustrate the usefulness of the novel statistic for highlighting cell type-specific regulators and TF target genes. Conclusions: Our approach allows the evaluation of DNA changes that induce differential TF binding in a fast and accurate manner, permitting computations on large mutation data sets. An implementation of the novel approach is freely available at https://github.com/SchulzLab/SNEEP.
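
To make the idea of a Laplace-based significance test concrete, the following is a minimal sketch that assumes a plain zero-centred Laplace distribution as a stand-in. It is not the modified Laplace distribution of the paper and not the SNEEP implementation; the function names and the example scores are hypothetical.

```python
import math


def fit_laplace_scale(scores):
    """Maximum-likelihood scale b of a zero-centred Laplace distribution.

    With the location fixed at 0, the MLE of b is the mean absolute value.
    (Assumption: plain Laplace, not the modified variant from the paper.)
    """
    if not scores:
        raise ValueError("need at least one background score")
    return sum(abs(s) for s in scores) / len(scores)


def laplace_two_sided_pvalue(diff_score, scale):
    """P(|X| >= |diff_score|) for X ~ Laplace(0, scale), i.e. exp(-|d| / b)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return math.exp(-abs(diff_score) / scale)


# Hypothetical background of differential binding scores (made-up numbers).
background = [1.0, -1.0, 2.0, -2.0]
b = fit_laplace_scale(background)
p = laplace_two_sided_pvalue(3.0, b)
```

Because the tail probability has a closed form, each SNP costs a single exponential to evaluate, which is why a parametric approximation of this kind can scale to large SNP sets.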

Computational prediction of CRISPR-impaired non-coding regulatory regions (2020)

Baumgarten, Nina ; Schmidt, Florian ; Wegner, Martin ; Hebel, Marie ; Kaulich, Manuel ; Schulz, Marcel Holger

Genome-wide CRISPR screens are becoming more widespread and allow the simultaneous interrogation of thousands of genomic regions. Although recent progress has been made in the analysis of CRISPR screens, it is still an open problem how to interpret CRISPR mutations in non-coding regions of the genome. Most tools concentrate on the interpretation of mutations introduced in gene coding regions. We introduce a computational pipeline that uses epigenomic information about regulatory elements for the interpretation of CRISPR mutations in non-coding regions. We illustrate our approach on the analysis of a genome-wide CRISPR screen in hTERT-RPE-1 cells and reveal novel regulatory elements that mediate chemoresistance against doxorubicin in these cells. We infer links to established and to novel chemoresistance genes. Our approach is general and can be applied to any cell type and with different CRISPR enzymes.

The endothelial-specific LINC00607 mediates endothelial angiogenic function (2022)

Boos, Frederike ; Oo, James A. ; Warwick, Timothy ; Günther, Stefan ; Ponce, Judit Izquierdo ; Buchmann, Giulia Karolin ; Li, Tianfu ; Seredinski, Sandra ; Haydar, Shaza ; Kashefiolasl, Sepide ; Baker, Andrew H. ; Boon, Reinier ; Schulz, Marcel Holger ; Miller, Francis J. ; Brandes, Ralf ; Leisegang, Matthias

Long non-coding RNAs (lncRNAs) can act as regulatory RNAs which, by altering the expression of target genes, impact the cellular phenotype and cardiovascular disease development. Endothelial lncRNAs and their vascular functions are largely undefined. Deep RNA-Seq and FANTOM5 CAGE analysis revealed the lncRNA LINC00607 to be highly enriched in human endothelial cells. LINC00607 was induced in response to hypoxia, to arteriosclerosis regression in non-human primates, and to propranolol used to induce regression of human arteriovenous malformations. siRNA knockdown or CRISPR/Cas9 knockout of LINC00607 attenuated VEGF-A-induced angiogenic sprouting. LINC00607-knockout endothelial cells also integrated less into newly formed vascular networks in an in vivo assay in SCID mice. Overexpression of LINC00607 in CRISPR knockout cells restored normal endothelial function. RNA- and ATAC-Seq after LINC00607 knockout revealed changes in the transcription of endothelial gene sets linked to the endothelial phenotype and in chromatin accessibility around ERG-binding sites. Mechanistically, LINC00607 interacted with the SWI/SNF chromatin remodeling protein BRG1. CRISPR/Cas9-mediated knockout of BRG1 in HUVEC followed by CUT&RUN revealed that BRG1 is required to secure a stable chromatin state, mainly on ERG-binding sites. In conclusion, LINC00607 is an endothelial-enriched lncRNA that maintains ERG target gene transcription by interacting with the chromatin remodeler BRG1.

The epigenetic modifier DOT1L regulates gene regulatory networks necessary for cardiac patterning and cardiomyocyte cell cycle withdrawal (2022)

Cattaneo, Paola ; Hayes, Michael G. B. ; Baumgarten, Nina ; Hecker, Dennis ; Peruzzo, Sofia ; Kunderfranco, Paolo ; Larcher, Veronica ; Zhang, Lunfeng ; Contu, Riccardo ; Fonseca, Gregory ; Spinozzi, Simone ; Chen, Ju ; Condorelli, Gianluigi ; Schulz, Marcel Holger ; Heinz, Sven ; Guimarães-Camboa, Nuno ; Evans, Sylvia M.

Mechanisms by which specific histone modifications regulate distinct gene regulatory networks remain poorly understood. We investigated how H3K79me2, a modification catalyzed by DOT1L and previously considered a general transcriptional activation mark, regulates gene expression in mammalian cardiogenesis. Early embryonic cardiomyocyte ablation of Dot1l revealed that H3K79me2 does not act as a general transcriptional activator, but rather regulates highly specific gene regulatory networks at two critical cardiogenic junctures: left ventricle patterning and postnatal cardiomyocyte cell cycle withdrawal. Mechanistic analyses revealed that H3K79me2 in two distinct domains, gene bodies and regulatory elements, synergized to promote expression of genes activated by DOT1L. Surprisingly, these analyses also revealed that H3K79me2 in specific regulatory elements contributed to silencing genes usually not expressed in cardiomyocytes. As DOT1L mutants had increased numbers of postnatal mononuclear cardiomyocytes and prolonged cardiomyocyte cell cycle activity, controlled inhibition of DOT1L might be a strategy to promote cardiac regeneration post-injury.

Two Piwis with Ago-like functions silence somatic genes at the chromatin level (2020)

Drews, Franziska ; Karunanithi, Sivarajan ; Götz, Ulrike ; Marker, Simone ; Wijn, Raphael de ; Pirritano, Marcello ; Rodrigues-Viana, Angela M. ; Jung, Martin ; Gasparoni, Gilles ; Schulz, Marcel Holger ; Simon, Martin

Most sRNA biogenesis mechanisms involve either RNase III cleavage or ping-pong amplification by different Piwi proteins harboring slicer activity. Here, we address the question of why the mechanism of transgene-induced silencing in the ciliate Paramecium needs both Dicer activity and two Ptiwi proteins. This pathway involves primary siRNAs produced from non-translatable transgenes and secondary siRNAs from endogenous remote loci. Our data do not show any signatures of ping-pong amplification but rather of Dicer cleavage of long dsRNA. We show that Ptiwi13 and Ptiwi14 have different preferences for primary and secondary siRNAs but do not load them in a mutually exclusive manner. Both Piwis enrich for antisense RNAs, and Ptiwi14-loaded siRNAs show a 5′-U signature. In addition, both Ptiwis show a general preference for uridine-rich sRNAs along the entire sRNA length. Our data indicate that both Ptiwis and 2′-O-methylation contribute to strand selection of Dicer-cleaved siRNAs. This unexpected function of two distinct vegetative Piwis extends the increasing knowledge of the diversity of Piwi functions in diverse silencing pathways. As the two Ptiwis show differential subcellular localisation, Ptiwi13 in the cytoplasm and Ptiwi14 in the vegetative macronucleus, we conclude that cytosolic and nuclear silencing factors are necessary for efficient chromatin silencing.

Broad domains of histone marks in the highly compact Paramecium macronuclear genome (2021)

Drews, Franziska ; Salhab, Abdulrahman ; Karunanithi, Sivarajan ; Cheaib, Miriam ; Jung, Martin ; Schulz, Marcel Holger ; Simon, Martin

The unicellular ciliate Paramecium contains a large vegetative macronucleus with several unusual characteristics, including an extremely high coding density and high polyploidy. As macronuclear chromatin is devoid of heterochromatin, our study characterizes the functional epigenomic organisation necessary for gene regulation and proper Pol II activity. Histone marks (H3K4me3, H3K9ac, H3K27me3) revealed no narrow peaks but broad domains along gene bodies, whereas intergenic regions were devoid of nucleosomes. Our data implicate H3K4me3 levels inside ORFs as the main factor associated with gene expression, and H3K27me3 appears to occur as a bistable domain with H3K4me3 in plastic genes. Surprisingly, silent and lowly expressed genes show low nucleosome occupancy, suggesting that gene inactivation does not involve increased nucleosome occupancy and chromatin condensation. Due to a high occupancy of Pol II along highly expressed ORFs, transcriptional elongation appears to be quite different from that in other species. This is supported by missing heptameric repeats in the C-terminal domain of Pol II and a divergent elongation system. Our data imply that unoccupied DNA is the default state, whereas gene activation requires nucleosome recruitment together with broad domains of H3K4me3. This could represent a buffer for paused Pol II along ORFs in the absence of the elongation factors of higher eukaryotes.

Efficiently quantifying DNA methylation for bulk- and single-cell bisulfite data (2023)

Fischer, Jonas ; Schulz, Marcel Holger

Motivation: DNA CpG methylation (CpGm) has proven to be a crucial epigenetic factor in the gene regulatory system. Assessment of DNA CpG methylation values via whole-genome bisulfite sequencing (WGBS) is, however, computationally extremely demanding. Results: We present FAst MEthylation calling (FAME), the first approach to quantify CpGm values directly from bulk or single-cell WGBS reads without intermediate output files. FAME is very fast but as accurate as standard methods, which first produce BS alignment files before computing CpGm values. We present experiments on bulk and single-cell bisulfite datasets in which we show that data analysis can be significantly sped up without compromising accuracy, helping to address the current WGBS analysis bottleneck for large-scale datasets. Availability: An implementation of FAME is open source and licensed under GPL-3.0 at https://github.com/FischerJo/FAME.
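
As an illustration of what a CpGm value is, the sketch below computes the usual per-site methylation level, the fraction of covering reads that report the cytosine as methylated. It assumes that per-read calls have already been obtained and is not FAME's algorithm; the function name, the input format, and the example positions are hypothetical.

```python
from collections import defaultdict


def cpg_methylation_levels(calls):
    """Return {position: methylated_reads / covering_reads}.

    `calls` is an iterable of (cpg_position, is_methylated) pairs, one per
    read covering a CpG (hypothetical input format for illustration).
    """
    counts = defaultdict(lambda: [0, 0])  # position -> [methylated, unmethylated]
    for position, is_methylated in calls:
        counts[position][0 if is_methylated else 1] += 1
    return {pos: m / (m + u) for pos, (m, u) in counts.items()}


# Made-up calls: three reads cover position 100, one read covers position 250.
example_calls = [(100, True), (100, True), (100, False), (250, False)]
levels = cpg_methylation_levels(example_calls)
```

In a conventional workflow the calls would be derived from bisulfite alignment files; the abstract's point is that FAME avoids writing such intermediate files.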

Widespread effects of DNA methylation and intra-motif dependencies revealed by novel transcription factor binding models (2020)

Grau, Jan ; Schmidt, Florian ; Schulz, Marcel Holger

Several studies suggested that transcription factor (TF) binding to DNA may be impaired or enhanced by DNA methylation. We present MeDeMo, a toolbox for TF motif analysis that combines information about DNA methylation with models capturing intra-motif dependencies. In a large-scale study using ChIP-seq data for 335 TFs, we identify novel TFs that are affected by DNA methylation. Overall, we find that CpG methylation decreases the likelihood of binding for the majority of TFs. For a considerable subset of TFs, we show that intra-motif dependencies are pivotal for accurately modelling the impact of DNA methylation on TF binding.

TF-Prioritizer: a java pipeline to prioritize condition-specific transcription factors (2022)

Hoffmann, Markus ; Trummer, Nico ; Jankowski, Jakub ; Kyung Lee, Hye ; Willruth, Lina-Liv ; Lazareva, Olga ; Yuan, Kevin ; Baumgarten, Nina ; Schmidt, Florian ; Baumbach, Jan ; Schulz, Marcel Holger ; Blumenthal, David B. ; Hennighausen, Lothar ; List, Markus

Background: Eukaryotic gene expression is controlled by cis-regulatory elements (CREs), including promoters and enhancers, which are bound by transcription factors (TFs). Differential expression of TFs and their putative binding sites on CREs cause tissue- and developmental-specific transcriptional activity. Consolidating genomic data sets can offer further insights into the accessibility of CREs, TF activity, and thus gene regulation. However, the integration and analysis of multi-modal data sets are hampered by considerable technical challenges. While methods for highlighting differential TF activity from combined ChIP-seq and RNA-seq data exist, they do not offer good usability, have limited support for large-scale data processing, and provide only minimal functionality for visual result interpretation. Results: We developed TF-Prioritizer, an automated Java pipeline to prioritize condition-specific TFs derived from multi-modal data. TF-Prioritizer creates an interactive, feature-rich, and user-friendly web report of its results. To showcase the potential of TF-Prioritizer, we identified known active TFs (e.g., Stat5, Elf5, Nfib, Esr1), their target genes (e.g., milk proteins and cell-cycle genes), and newly classified lactating mammary gland TFs (e.g., Creb1, Arnt). Conclusion: TF-Prioritizer accepts ChIP-seq and RNA-seq data as input and suggests TFs with differential activity, thus offering an understanding of genome-wide gene regulation, potential pathogenesis, and therapeutic targets in biomedical research.

TF-Prioritizer: a java pipeline to prioritize condition-specific transcription factors (2023)

Hoffmann, Markus ; Trummer, Nico ; Schwartz, Leon ; Jankowski, Jakub ; Kyung Lee, Hye ; Willruth, Lina-Liv ; Lazareva, Olga ; Yuan, Kevin ; Baumgarten, Nina ; Schmidt, Florian ; Baumbach, Jan ; Schulz, Marcel Holger ; Blumenthal, David B. ; Hennighausen, Lothar ; List, Markus

Background: Eukaryotic gene expression is controlled by cis-regulatory elements (CREs), including promoters and enhancers, which are bound by transcription factors (TFs). Differential expression of TFs and their binding affinity at putative CREs determine tissue- and developmental-specific transcriptional activity. Consolidating genomic data sets can offer further insights into the accessibility of CREs, TF activity, and, thus, gene regulation. However, the integration and analysis of multi-modal data sets are hampered by considerable technical challenges. While methods for highlighting differential TF activity from combined chromatin state data (e.g., ChIP-seq, ATAC-seq, or DNase-seq) and RNA-seq data exist, they do not offer convenient usability, have limited support for large-scale data processing, and provide only minimal functionality for visually interpreting results. Results: We developed TF-Prioritizer, an automated pipeline that prioritizes condition-specific TFs from multi-modal data and generates an interactive web report. We demonstrated its potential by identifying known TFs along with their target genes, as well as previously unreported TFs active in lactating mouse mammary glands. Additionally, we studied a variety of ENCODE data sets for cell lines K562 and MCF-7, including twelve histone modification ChIP-seq as well as ATAC-seq and DNase-seq datasets, where we observe and discuss assay-specific differences. Conclusion: TF-Prioritizer accepts ATAC-seq, DNase-seq, or ChIP-seq and RNA-seq data as input and identifies TFs with differential activity, thus offering an understanding of genome-wide gene regulation, potential pathogenesis, and therapeutic targets in biomedical research.


14 results