Endothelial cells play a critical role in the adaptation of tissues to injury. Tissue ischemia induced by infarction leads to profound changes in endothelial cell functions and can induce transition to a mesenchymal state. Here we explore the kinetics and individual cellular responses of endothelial cells after myocardial infarction by using single-cell RNA sequencing. This study demonstrates a time-dependent switch in endothelial cell proliferation and inflammation associated with transient changes in metabolic gene signatures. Trajectory analysis reveals that the majority of endothelial cells 3 to 7 days after myocardial infarction acquire a transient state, characterized by mesenchymal gene expression, which returns to baseline 14 days after injury. Lineage tracing, using the Cdh5-CreERT2;mT/mG mice followed by single-cell RNA sequencing, confirms the transient mesenchymal transition and reveals additional hypoxic and inflammatory signatures of endothelial cells during early and late states after injury. These data suggest that endothelial cells undergo a transient mesenchymal activation concomitant with a metabolic adaptation within the first days after myocardial infarction but do not acquire a long-term mesenchymal fate. This mesenchymal activation may facilitate endothelial cell migration and clonal expansion to regenerate the vascular network.
The aging process is characterized by a chronic, low‐grade inflammatory state, termed “inflammaging.” It has been suggested that macrophage activation plays a key role in the induction and maintenance of this state. In the present study, we aimed to elucidate the mechanisms responsible for aging‐associated changes in the myeloid compartment of mice. The aging phenotype, characterized by elevated cytokine production, was associated with a dysfunction of the hypothalamic–pituitary–adrenal (HPA) axis and diminished serum corticosteroid levels. In particular, the concentration of corticosterone, the major active glucocorticoid in rodents, was decreased. This could be explained by an impaired expression and activity of 11β‐hydroxysteroid dehydrogenase type 1 (11β‐HSD1), an enzyme that determines the extent of cellular glucocorticoid responses by reducing the corticosteroids cortisone/11‐dehydrocorticosterone to their active forms cortisol/corticosterone, in aged macrophages and peripheral leukocytes. These changes were accompanied by a downregulation of the glucocorticoid receptor target gene glucocorticoid‐induced leucine zipper (GILZ) in vitro and in vivo. Since GILZ plays a central role in macrophage activation, we hypothesized that the loss of GILZ contributed to the process of macroph‐aging. The phenotype of macrophages from aged mice was indeed mimicked in young GILZ knockout mice. In summary, the current study provides insight into the role of glucocorticoid metabolism and GILZ regulation during aging.
Background: Understanding the location and cell-type specific binding of Transcription Factors (TFs) is important in the study of gene regulation. Computational prediction of TF binding sites is challenging, because TFs often bind only to short DNA motifs and cell-type specific co-factors may work together with the same TF to determine binding. Here, we consider the problem of learning a general model for the prediction of TF binding using DNase1-seq data and TF motif descriptions in the form of position specific energy matrices (PSEMs).
Methods: We use TF ChIP-seq data as a gold-standard for model training and evaluation. Our contribution is a novel ensemble learning approach using random forest classifiers. In the context of the ENCODE-DREAM in vivo TF binding site prediction challenge we consider different learning setups.
Results: Our results indicate that the ensemble learning approach generalizes better across tissues and cell types than individual tissue-specific classifiers or a classifier built on data aggregated across tissues. Furthermore, we show that incorporating DNase1-seq peaks is essential to reduce the false positive rate of TF binding predictions compared to considering the raw DNase1 signal.
Conclusions: Analysis of important features reveals that the models preferentially select motifs of other TFs that are close interaction partners in existing protein-protein interaction networks. Code generated in the scope of this project is available on GitHub: https://github.com/SchulzLab/TFAnalysis (DOI: 10.5281/zenodo.1409697).
Background: With the rise of single-cell RNA sequencing, new bioinformatic tools have been developed to handle specific demands, such as quantifying unique molecular identifiers and correcting cell barcodes. Here, we benchmarked the most common alignment tools for single-cell RNA sequencing data on several datasets. We evaluated differences in the whitelisting, gene quantification, overall performance, and potential variations in clustering or detection of differentially expressed genes. We compared the tools Cell Ranger version 6, STARsolo, Kallisto, Alevin, and Alevin-fry on 3 published datasets for human and mouse, sequenced with different versions of the 10X sequencing protocol.
Results: Striking differences were observed in the overall runtime of the mappers. In addition, Kallisto and Alevin differed in the number of valid cells and detected genes per cell. Kallisto reported the highest number of cells; however, we observed an overrepresentation of cells with low gene content and unknown cell type. Conversely, Alevin rarely reported such low-content cells. Further variations were detected in the set of expressed genes. While STARsolo, Cell Ranger 6, Alevin-fry, and Alevin produced similar gene sets, Kallisto detected additional genes from the Vmn and Olfr gene families, which are likely mapping artefacts. We also observed differences in the mitochondrial content of the resulting cells when comparing a prefiltered annotation set to the full annotation set that includes pseudogenes and other biotypes.
Conclusion: Overall, this study provides a detailed comparison of common single-cell RNA sequencing mappers and shows their specific properties on 10X Genomics data.
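The cell barcode correction mentioned in the background can be illustrated with a minimal sketch. It assumes a common, simplified rule: a barcode that is not on the whitelist is rescued only if exactly one whitelist entry lies at Hamming distance 1. The function names and the toy whitelist are hypothetical, and the sketch does not reproduce the whitelisting logic of any of the benchmarked tools.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(barcode, whitelist):
    """Return the whitelist barcode for `barcode`, or None if it cannot be rescued.

    Assumption (illustrative only): a barcode is corrected when exactly one
    whitelist entry differs from it by a single base; ambiguous or distant
    barcodes are discarded.
    """
    if barcode in whitelist:
        return barcode
    candidates = [
        w for w in whitelist
        if len(w) == len(barcode) and hamming(w, barcode) == 1
    ]
    return candidates[0] if len(candidates) == 1 else None


# Toy whitelist (hypothetical barcodes, far shorter than real 10X barcodes).
whitelist = {"AAAA", "CCCC", "AACC"}
exact = correct_barcode("CCCC", whitelist)      # already valid
rescued = correct_barcode("AAAT", whitelist)    # one mismatch to "AAAA" only
ambiguous = correct_barcode("AAAC", whitelist)  # one mismatch to two entries
```

Differences in how strictly such a rule is applied are one plausible reason why tools report different numbers of valid cells.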
Background: Enhancers play a fundamental role in orchestrating cell state and development. Although several methods have been developed to identify enhancers, linking them to their target genes is still an open problem. Several theories have been proposed on the functional mechanisms of enhancers, which triggered the development of various methods to infer promoter–enhancer interactions (PEIs). The advancement of high-throughput techniques describing the three-dimensional organization of the chromatin paved the way to pinpoint long-range PEIs. Here we investigated whether including PEIs in computational models for the prediction of gene expression improves performance and interpretability.
Results: We have extended our TEPIC framework to include DNA contacts deduced from chromatin conformation capture experiments and compared various methods to determine PEIs using predictive modelling of gene expression from chromatin accessibility data and predicted transcription factor (TF) motif data. We designed a novel machine learning approach that allows the prioritization of TFs binding to distal loop and promoter regions with respect to their importance for gene expression regulation. Our analysis revealed a set of core TFs that are part of enhancer–promoter loops involving YY1 in different cell lines.
Conclusion: We present a novel approach that can be used to prioritize TFs involved in distal and promoter-proximal regulatory events by integrating chromatin accessibility, conformation, and gene expression data. We show that the integration of chromatin conformation data can improve gene expression prediction and aid model interpretability.
Specialized de novo assemblers for diverse datatypes have been developed and are in widespread use for the analyses of single-cell genomics, metagenomics and RNA-seq data. However, assembly of large sequencing datasets produced by modern technologies is challenging and computationally intensive. In-silico read normalization has been suggested as a computational strategy to reduce redundancy in read datasets, which leads to significant speedups and memory savings of assembly pipelines. Previously, we presented ORNA, an approach based on set multi-cover optimization, where reads are reduced without losing important k-mer connectivity information, as used in assembly graphs. Here we propose extensions to ORNA, named ORNA-Q and ORNA-K, which consider a weighted set multi-cover optimization formulation for the in-silico read normalization problem. These novel formulations make use of the base quality scores obtained from sequencers (ORNA-Q) or k-mer abundances of reads (ORNA-K) to improve normalization further. We devise efficient heuristic algorithms for solving both formulations. In applications to human RNA-seq data, ORNA-Q and ORNA-K are shown to assemble more or equally many full-length transcripts compared to other normalization methods at similar or higher read reduction values. The algorithm is implemented in the latest version of ORNA (v2.0, https://github.com/SchulzLab/ORNA).
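The idea of read normalization as a (weighted) set multi-cover problem can be illustrated with a small greedy sketch. This is not the ORNA implementation: it assumes a fixed coverage threshold per k-mer and, for the weighted variant, simply processes reads in order of decreasing weight (for example a mean base quality). The function names, the threshold, and the toy reads are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_reads(reads, k=4, threshold=2, weights=None):
    """Greedy multi-cover sketch: return the sorted indices of the kept reads.

    A read is kept if at least one of its k-mers is covered fewer than
    `threshold` times by the reads kept so far. If `weights` is given
    (assumption: higher is better), reads are visited from highest to
    lowest weight, so better reads are preferred as cover elements.
    """
    order = list(range(len(reads)))
    if weights is not None:
        order.sort(key=lambda i: -weights[i])

    covered = Counter()  # k-mer -> number of kept reads containing it
    kept = []
    for i in order:
        read_kmers = set(kmers(reads[i], k))
        if any(covered[km] < threshold for km in read_kmers):
            kept.append(i)
            for km in read_kmers:
                covered[km] += 1
    return sorted(kept)


# Five redundant copies of one read plus one unique read.
reads = ["ACGTAC"] * 5 + ["TTTTGG"]
unweighted = normalize_reads(reads, k=4, threshold=2)
weighted = normalize_reads(reads, k=4, threshold=2, weights=[1, 2, 3, 4, 5, 1])
```

In both runs three of the six reads are kept and every k-mer keeps its coverage up to the threshold; the weighted run retains the two highest-weight copies of the redundant read instead of the first two.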
The transcription factor vitamin D receptor (VDR) is the high affinity nuclear target of the biologically active form of vitamin D3 (1,25(OH)2D3). In order to identify pure genomic transcriptional effects of 1,25(OH)2D3, we used VDR cistrome, transcriptome and open chromatin data, obtained from the human monocytic cell line THP-1, for a novel hierarchical analysis applying three bioinformatics approaches. We predicted 75.6% of all early 1,25(OH)2D3-responding (2.5 or 4 h) and 57.4% of the late differentially expressed genes (24 h) to be primary VDR target genes. VDR knockout led to a complete loss of 1,25(OH)2D3–induced genome-wide gene regulation. Thus, there was no indication of any VDR-independent non-genomic actions of 1,25(OH)2D3 modulating its transcriptional response. Among the predicted primary VDR target genes, 47 were coding for transcription factors and thus may mediate secondary 1,25(OH)2D3 responses. CEBPA and ETS1 ChIP-seq data and RNA-seq following CEBPA knockdown were used to validate the predicted regulation of secondary vitamin D target genes by both transcription factors. In conclusion, a directional network containing 47 partly novel primary VDR target transcription factors describes secondary responses in a highly complex vitamin D signaling cascade. The central transcription factor VDR is indispensable for all transcriptome-wide effects of the nuclear hormone.
Endocannabinoids are important lipid-signaling mediators. Both protective and deleterious effects of endocannabinoids in the cardiovascular system have been reported, but the mechanistic basis for these contradicting observations is unclear. We set out to identify anti-inflammatory mechanisms of endocannabinoids in the murine aorta and in human vascular smooth muscle cells (hVSMC). In response to combined stimulation with the cytokines IL-1β and TNFα, the murine aorta released several endocannabinoids, with anandamide (AEA) levels being the most significantly increased. AEA pretreatment had profound effects on cytokine-induced gene expression in hVSMC and murine aorta. As revealed by RNA-Seq analysis, the induction of a subset of 21 inflammatory target genes, including the important cytokine CCL2, was blocked by AEA. This effect was not mediated through AEA-dependent interference with the AP-1 or NF-κB pathways but rather through an epigenetic mechanism. In the presence of AEA, ATAC-Seq analysis and chromatin immunoprecipitations revealed that CCL2 induction was blocked due to increased levels of H3K27me3 and a decrease of H3K27ac, leading to a compacted chromatin structure in the CCL2 promoter. These effects were mediated by recruitment of HDAC4 and the nuclear corepressor NCoR1 to the CCL2 promoter. This study therefore establishes a novel anti-inflammatory mechanism for the endocannabinoid AEA in vascular smooth muscle cells. Furthermore, this work provides a link between endogenous endocannabinoid signaling and epigenetic regulation.
Motivation: DNA CpG methylation (CpGm) has proven to be a crucial epigenetic factor in the gene regulatory system. Assessment of DNA CpG methylation values via whole-genome bisulfite sequencing (WGBS) is, however, computationally extremely demanding.
Results: We present FAst MEthylation calling (FAME), the first approach to quantify CpGm values directly from bulk or single-cell WGBS reads without intermediate output files. FAME is very fast but as accurate as standard methods, which first produce BS alignment files before computing CpGm values. We present experiments on bulk and single-cell bisulfite datasets in which we show that data analysis can be significantly sped up without compromising accuracy, helping to address the current WGBS analysis bottleneck for large-scale datasets.
Availability: An implementation of FAME is open source and licensed under GPL-3.0 at https://github.com/FischerJo/FAME.
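What a CpGm value is can be shown with a minimal sketch, assuming the usual definition for bisulfite data: at a CpG cytosine, a read showing C counts as methylated and a read showing T counts as unmethylated (converted), and the value is the methylated fraction. The sketch assumes forward-strand reads whose start positions are already known; it does not reproduce FAME's index, read mapping, or strand handling, and all names and data are hypothetical.

```python
def cpg_methylation(reference, placed_reads):
    """Methylation fraction per CpG cytosine position (0-based).

    `placed_reads` is a list of (start, sequence) pairs for forward-strand
    reads with known positions (an assumption of this sketch). Counts are
    accumulated while iterating over the reads, without storing alignments.
    """
    cpg_positions = {
        i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"
    }
    counts = {}  # position -> [methylated, unmethylated]
    for start, seq in placed_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos in cpg_positions:
                tally = counts.setdefault(pos, [0, 0])
                if base == "C":      # unconverted cytosine: methylated
                    tally[0] += 1
                elif base == "T":    # converted cytosine: unmethylated
                    tally[1] += 1
    return {
        pos: m / (m + u)
        for pos, (m, u) in sorted(counts.items())
        if m + u > 0
    }


# Toy reference with CpGs at positions 2 and 6.
reference = "AACGTTCGA"
placed_reads = [(0, "AACGTT"), (0, "AATGTT"), (4, "TTTGA")]
rates = cpg_methylation(reference, placed_reads)
```

Here position 2 is covered by one methylated and one unmethylated read, and position 6 by a single unmethylated read.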