Genetic generalised epilepsy (GGE) is the most common form of genetic epilepsy, accounting for 20% of all epilepsies. Genomic copy number variations (CNVs) constitute important genetic risk factors of common GGE syndromes. In our present genome-wide burden analysis, large (≥ 400 kb) and rare (< 1%) autosomal microdeletions with high calling confidence (≥ 200 markers) were assessed by the Affymetrix SNP 6.0 array in European case-control cohorts of 1,366 GGE patients and 5,234 ancestry-matched controls. We aimed to: 1) assess the microdeletion burden in common GGE syndromes, 2) estimate the relative contribution of recurrent microdeletions at genomic rearrangement hotspots and non-recurrent microdeletions, and 3) identify potential candidate genes for GGE. We found a significant excess of microdeletions in 7.3% of GGE patients compared to 4.0% in controls (P = 1.8 × 10⁻⁷; OR = 1.9). Recurrent microdeletions at seven known genomic hotspots accounted for 36.9% of all microdeletions identified in the GGE cohort and showed a 7.5-fold increased burden (P = 2.6 × 10⁻¹⁷) relative to controls. Microdeletions affecting either a gene previously implicated in neurodevelopmental disorders (P = 8.0 × 10⁻¹⁸, OR = 4.6) or an evolutionarily conserved brain-expressed gene related to autism spectrum disorder (P = 1.3 × 10⁻¹², OR = 4.1) were significantly enriched in the GGE patients. Microdeletions found only in GGE patients harboured a high proportion of genes previously associated with epilepsy and neuropsychiatric disorders (NRXN1, RBFOX1, PCDH7, KCNA2, EPM2A, RORB, PLCB1). Our results demonstrate that the significantly increased burden of large and rare microdeletions in GGE patients is largely confined to recurrent hotspot microdeletions and microdeletions affecting neurodevelopmental genes, suggesting a strong impact of fundamental neurodevelopmental processes in the pathogenesis of common GGE syndromes.
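As a rough plausibility check on the reported burden, the odds ratio can be recomputed from the cohort sizes and carrier percentages. This is only a sketch: the abstract does not give carrier counts, so they are back-calculated here from the rounded percentages, and the function name `odds_ratio` is illustrative.

```python
def odds_ratio(case_hits, case_total, control_hits, control_total):
    """Odds ratio from a 2x2 table of carriers vs. non-carriers."""
    case_odds = case_hits / (case_total - case_hits)
    control_odds = control_hits / (control_total - control_hits)
    return case_odds / control_odds


# ASSUMPTION: carrier counts are not stated in the abstract. They are
# estimated from the reported percentages and rounded to whole individuals.
case_carriers = round(0.073 * 1366)
control_carriers = round(0.040 * 5234)

burden_or = odds_ratio(case_carriers, 1366, control_carriers, 5234)
print(f"Estimated odds ratio: {burden_or:.1f}")
```

Under these assumed counts the estimate agrees with the reported OR of 1.9 to one decimal place; the P value would additionally require a test such as Fisher's exact test on the true counts.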
Background: Understanding the location and cell-type-specific binding of transcription factors (TFs) is important in the study of gene regulation. Computational prediction of TF binding sites is challenging because TFs often bind only to short DNA motifs, and cell-type-specific co-factors may work together with the same TF to determine binding. Here, we consider the problem of learning a general model for the prediction of TF binding using DNase1-seq data and TF motif descriptions in the form of position-specific energy matrices (PSEMs).
Methods: We use TF ChIP-seq data as a gold standard for model training and evaluation. Our contribution is a novel ensemble learning approach using random forest classifiers. In the context of the ENCODE-DREAM in vivo TF binding site prediction challenge, we consider different learning setups.
Results: Our results indicate that the ensemble learning approach generalizes better across tissues and cell types than individual tissue-specific classifiers or a classifier built upon data aggregated across tissues. Furthermore, we show that incorporating DNase1-seq peaks is essential to reduce the false positive rate of TF binding predictions compared to considering the raw DNase1 signal.
Conclusions: Analysis of important features reveals that the models preferentially select motifs of other TFs that are close interaction partners in existing protein-protein interaction networks. Code generated in the scope of this project is available on GitHub: https://github.com/SchulzLab/TFAnalysis (DOI: 10.5281/zenodo.1409697).
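To illustrate what a motif description in the form of a position-specific energy matrix can look like, the sketch below scores every window of a sequence by summing per-position mismatch energies and reports the lowest-energy site. The matrix `TOY_PSEM`, its values, and the function names are hypothetical; this is not the scoring used in the TFAnalysis code, and only the forward strand is scanned.

```python
# Hypothetical 3-position energy matrix: 0.0 marks the preferred base,
# larger values mean a larger energy penalty for a mismatch.
TOY_PSEM = [
    {"A": 0.0, "C": 1.5, "G": 2.0, "T": 1.0},
    {"A": 2.0, "C": 0.0, "G": 1.0, "T": 2.5},
    {"A": 1.0, "C": 2.0, "G": 0.0, "T": 1.5},
]


def site_energy(psem, site):
    """Sum the mismatch energies of one candidate site."""
    return sum(column[base] for column, base in zip(psem, site))


def best_site(psem, sequence):
    """Return (energy, start) of the lowest-energy window (forward strand only)."""
    width = len(psem)
    windows = [
        (site_energy(psem, sequence[start:start + width]), start)
        for start in range(len(sequence) - width + 1)
    ]
    return min(windows)


energy, start = best_site(TOY_PSEM, "TTACGTT")
print(energy, start)
```

In a binding-prediction setting, such per-site scores would typically be aggregated over accessible regions and combined with the DNase1-seq signal as classifier features.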
Background: Enhancers play a fundamental role in orchestrating cell state and development. Although several methods have been developed to identify enhancers, linking them to their target genes is still an open problem. Several theories have been proposed on the functional mechanisms of enhancers, which triggered the development of various methods to infer promoter–enhancer interactions (PEIs). The advancement of high-throughput techniques describing the three-dimensional organization of the chromatin paved the way to pinpoint long-range PEIs. Here we investigated whether including PEIs in computational models for the prediction of gene expression improves performance and interpretability.
Results: We have extended our TEPIC framework to include DNA contacts deduced from chromatin conformation capture experiments and compared various methods to determine PEIs using predictive modelling of gene expression from chromatin accessibility data and predicted transcription factor (TF) motif data. We designed a novel machine learning approach that allows the prioritization of TFs binding to distal loop and promoter regions with respect to their importance for gene expression regulation. Our analysis revealed a set of core TFs that are part of enhancer–promoter loops involving YY1 in different cell lines.
Conclusion: We present a novel approach that can be used to prioritize TFs involved in distal and promoter-proximal regulatory events by integrating chromatin accessibility, conformation, and gene expression data. We show that the integration of chromatin conformation data can improve gene expression prediction and aid model interpretability.
Specialized de novo assemblers for diverse datatypes have been developed and are in widespread use for the analyses of single-cell genomics, metagenomics and RNA-seq data. However, assembly of large sequencing datasets produced by modern technologies is challenging and computationally intensive. In-silico read normalization has been suggested as a computational strategy to reduce redundancy in read datasets, which leads to significant speedups and memory savings of assembly pipelines. Previously, we presented a set multi-cover optimization-based approach, ORNA, where reads are reduced without losing important k-mer connectivity information, as used in assembly graphs. Here we propose extensions to ORNA, named ORNA-Q and ORNA-K, which consider a weighted set multi-cover optimization formulation for the in-silico read normalization problem. These novel formulations make use of the base quality scores obtained from sequencers (ORNA-Q) or k-mer abundances of reads (ORNA-K) to improve normalization further. We devise efficient heuristic algorithms for solving both formulations. In applications to human RNA-seq data, ORNA-Q and ORNA-K are shown to assemble more or equally many full-length transcripts compared to other normalization methods at similar or higher read reduction values. The algorithm is implemented in the latest version of ORNA (v2.0, https://github.com/SchulzLab/ORNA).
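The idea of a weighted set multi-cover for read normalization can be sketched with a simple greedy pass. This is an illustrative heuristic under stated assumptions, not the ORNA-Q or ORNA-K implementation: each k-mer must be covered a fixed number of times (`required`, a hypothetical parameter), and each read carries a priority score (`scores`, standing in for, e.g., a quality- or abundance-derived weight).

```python
from collections import Counter


def kmers(read, k):
    """All overlapping k-mers of a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def normalize_reads(reads, scores, k, required):
    """Greedy sketch: visit reads from highest to lowest score and keep a
    read only if at least one of its k-mers is still under-covered."""
    covered = Counter()
    kept = []
    order = sorted(range(len(reads)), key=lambda i: scores[i], reverse=True)
    for i in order:
        read_kmers = set(kmers(reads[i], k))
        if any(covered[km] < required for km in read_kmers):
            kept.append(i)
            covered.update(read_kmers)
    return sorted(kept)


# Toy input: three identical reads with different scores plus one read
# that contributes a new k-mer.
toy_reads = ["ACGTA", "ACGTA", "ACGTA", "CGTAC"]
toy_scores = [30, 40, 20, 10]
kept_indices = normalize_reads(toy_reads, toy_scores, k=3, required=1)
print(kept_indices)
```

On the toy input, only the best-scoring copy of the redundant read and the read carrying a new k-mer are retained, so every k-mer of the input remains covered.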
The transcription factor vitamin D receptor (VDR) is the high affinity nuclear target of the biologically active form of vitamin D3 (1,25(OH)2D3). In order to identify pure genomic transcriptional effects of 1,25(OH)2D3, we used VDR cistrome, transcriptome and open chromatin data, obtained from the human monocytic cell line THP-1, for a novel hierarchical analysis applying three bioinformatics approaches. We predicted 75.6% of all early 1,25(OH)2D3-responding (2.5 or 4 h) and 57.4% of the late differentially expressed genes (24 h) to be primary VDR target genes. VDR knockout led to a complete loss of 1,25(OH)2D3–induced genome-wide gene regulation. Thus, there was no indication of any VDR-independent non-genomic actions of 1,25(OH)2D3 modulating its transcriptional response. Among the predicted primary VDR target genes, 47 were coding for transcription factors and thus may mediate secondary 1,25(OH)2D3 responses. CEBPA and ETS1 ChIP-seq data and RNA-seq following CEBPA knockdown were used to validate the predicted regulation of secondary vitamin D target genes by both transcription factors. In conclusion, a directional network containing 47 partly novel primary VDR target transcription factors describes secondary responses in a highly complex vitamin D signaling cascade. The central transcription factor VDR is indispensable for all transcriptome-wide effects of the nuclear hormone.
Most sRNA biogenesis mechanisms involve either RNase III cleavage or ping-pong amplification by different Piwi proteins harboring slicer activity. Here, we address the question of why the mechanism of transgene-induced silencing in the ciliate Paramecium needs both Dicer activity and two Ptiwi proteins. This pathway involves primary siRNAs produced from non-translatable transgenes and secondary siRNAs from endogenous remote loci. Our data do not indicate any signatures of ping-pong amplification but rather of Dicer cleavage of long dsRNA. We show that Ptiwi13 and Ptiwi14 have different preferences for primary and secondary siRNAs but do not load them in a mutually exclusive manner. Both Ptiwis enrich for antisense RNAs, and Ptiwi14-loaded siRNAs show a 5′-U signature. In addition, both Ptiwis show a general preference for uridine-rich sRNAs along the entire sRNA length. Our data indicate that both Ptiwis and 2′-O-methylation contribute to strand selection of Dicer-cleaved siRNAs. This unexpected function of two distinct vegetative Piwis extends the increasing knowledge of the diversity of Piwi functions in diverse silencing pathways. As the two Ptiwis show differential subcellular localisation, Ptiwi13 in the cytoplasm and Ptiwi14 in the vegetative macronucleus, we conclude that cytosolic and nuclear silencing factors are necessary for efficient chromatin silencing.
Simple Summary: In patients with myeloproliferative neoplasms (MPN) and in patients with kidney dysfunction, a higher rate of thrombosis has been reported compared with the general population. Furthermore, MPN patients are more prone to develop kidney dysfunction. In our study, we assessed the importance of specific risk factors for kidney dysfunction and thrombosis in MPN patients. We found that the rate of thrombosis is correlated with the degree of kidney dysfunction, especially in myelofibrosis. Significant associations for kidney dysfunction included arterial hypertension, MPN treatment, and increased inflammation, and those for thrombosis comprised arterial hypertension, non-excessive platelet counts, and antithrombotic therapy. The identified risk factor associations varied between MPN subtypes. Our data suggest that kidney dysfunction in MPN patients is associated with an increased risk of thrombosis, mandating closer monitoring, and, possibly, early thromboprophylaxis.
Abstract: Inflammation-induced thrombosis represents a severe complication in patients with myeloproliferative neoplasms (MPN) and in those with kidney dysfunction. Overlapping disease-specific attributes suggest common mechanisms involved in MPN pathogenesis, kidney dysfunction, and thrombosis. Data from 1420 patients with essential thrombocythemia (ET, 33.7%), polycythemia vera (PV, 38.5%), and myelofibrosis (MF, 27.9%) were extracted from the bioregistry of the German Study Group for MPN. The total cohort was subdivided according to the calculated estimated glomerular filtration rate (eGFR, mL/min/1.73 m²) into eGFR1 (≥90, 21%), eGFR2 (60–89, 56%), and eGFR3 (<60, 22%). A total of 29% of the patients had a history of thrombosis. A higher rate of thrombosis and a longer MPN duration were observed in eGFR3 than in eGFR2 and eGFR1. Kidney dysfunction occurred earlier in ET than in PV or MF. Multiple logistic regression analysis identified arterial hypertension, MPN treatment, increased uric acid, and lactate dehydrogenase levels as risk factors for kidney dysfunction in MPN patients. Risk factors for thrombosis included arterial hypertension, non-excessive platelet counts, and antithrombotic therapy. The risk factors for kidney dysfunction and thrombosis varied between MPN subtypes. Physicians should be aware of the increased risk for kidney disease in MPN patients, which warrants closer monitoring and, possibly, early thromboprophylaxis.
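The eGFR stratification described above can be written as a small helper. This is a minimal sketch: the function name `egfr_group` is illustrative, and treating non-integer values between 89 and 90 as eGFR2 is an assumption, since the abstract states the middle range only as 60–89.

```python
def egfr_group(egfr):
    """Assign an eGFR value (mL/min/1.73 m^2) to the cohort strata.

    ASSUMPTION: values of at least 60 and below 90 fall into eGFR2; the
    abstract lists this range as 60-89 without specifying rounding.
    """
    if egfr >= 90:
        return "eGFR1"
    if egfr >= 60:
        return "eGFR2"
    return "eGFR3"


for value in (95, 72, 48):
    print(value, egfr_group(value))
```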
Background: With the rise of single-cell RNA sequencing, new bioinformatic tools have been developed to handle specific demands, such as quantifying unique molecular identifiers and correcting cell barcodes. Here, we benchmarked several datasets with the most common alignment tools for single-cell RNA sequencing data. We evaluated differences in the whitelisting, gene quantification, overall performance, and potential variations in clustering or detection of differentially expressed genes. We compared the tools Cell Ranger version 6, STARsolo, Kallisto, Alevin, and Alevin-fry on three published datasets for human and mouse, sequenced with different versions of the 10X sequencing protocol.
Results: Striking differences were observed in the overall runtime of the mappers. In addition, Kallisto and Alevin showed variation in the number of valid cells and detected genes per cell. Kallisto reported the highest number of cells; however, we observed an overrepresentation of cells with low gene content and unknown cell type. Conversely, Alevin rarely reported such low-content cells. Further variations were detected in the set of expressed genes. While STARsolo, Cell Ranger 6, Alevin-fry, and Alevin produced similar gene sets, Kallisto detected additional genes from the Vmn and Olfr gene families, which are likely mapping artefacts. We also observed differences in the mitochondrial content of the resulting cells when comparing a prefiltered annotation set to the full annotation set that includes pseudogenes and other biotypes.
Conclusion: Overall, this study provides a detailed comparison of common single-cell RNA sequencing mappers and shows their specific properties on 10X Genomics data.
The aging process is characterized by a chronic, low‐grade inflammatory state, termed “inflammaging.” It has been suggested that macrophage activation plays a key role in the induction and maintenance of this state. In the present study, we aimed to elucidate the mechanisms responsible for aging‐associated changes in the myeloid compartment of mice. The aging phenotype, characterized by elevated cytokine production, was associated with a dysfunction of the hypothalamic–pituitary–adrenal (HPA) axis and diminished serum corticosteroid levels. In particular, the concentration of corticosterone, the major active glucocorticoid in rodents, was decreased. This could be explained by an impaired expression and activity of 11β‐hydroxysteroid dehydrogenase type 1 (11β‐HSD1), an enzyme that determines the extent of cellular glucocorticoid responses by reducing the corticosteroids cortisone/11‐dehydrocorticosterone to their active forms cortisol/corticosterone, in aged macrophages and peripheral leukocytes. These changes were accompanied by a downregulation of the glucocorticoid receptor target gene glucocorticoid‐induced leucine zipper (GILZ) in vitro and in vivo. Since GILZ plays a central role in macrophage activation, we hypothesized that the loss of GILZ contributed to the process of macroph‐aging. The phenotype of macrophages from aged mice was indeed mimicked in young GILZ knockout mice. In summary, the current study provides insight into the role of glucocorticoid metabolism and GILZ regulation during aging.