In the last two decades, our understanding of human gene regulation has improved tremendously. Many computational methods focus on integrative data analysis of humans and model organisms such as mouse and Drosophila. However, these tools are not directly employable by researchers working on non-model organisms to answer fundamental biological and evolutionary questions. We aimed to develop new tools and adapt existing software for the analysis of transcriptomic and epigenomic data of one such non-model organism, Paramecium tetraurelia, a unicellular eukaryote. Paramecium contains two diploid (2n) germline micronuclei (MIC) and a polyploid (800n) somatic macronucleus (MAC). The transcriptomic and epigenomic regulatory landscape of the MAC genome, which has 80% protein-coding genes and short intergenic regions, is poorly understood.
We developed a generic, automated eukaryotic short interfering RNA (siRNA) analysis tool called RAPID. Our tool captures diverse siRNA characteristics from small RNA sequencing data and provides easily navigable visualisations. We also introduced a normalisation technique to facilitate comparison of multiple siRNA-based gene knockdown studies. Further, we developed a pipeline to characterise novel genome-wide endogenous short interfering RNAs (endo-siRNAs). In contrast to many organisms, we found that the endo-siRNAs do not act in cis to silence their parent mRNA. We also predicted phasing of siRNAs, which are regulated by the RNA interference (RNAi) pathway.
Further, using RAPID, we investigated the aberrations of endo-siRNAs and the respective transcriptomic alterations caused by an RNAi pathway triggered by feeding small RNAs against a target gene. We found that the small RNA transcriptome is altered even if a gene unrelated to the RNAi pathway is targeted. This is important in the context of investigations of genetically modified organisms (GMOs). We suggest that future studies need to distinguish transcriptomic changes caused by RNAi-inducing techniques from actual regulatory changes.
Subsequently, we adapted existing epigenomics analysis tools to conduct the first comprehensive epigenomic characterisation of nucleosome positioning and histone modifications of the Paramecium MAC. We identified well-positioned nucleosomes shifted downstream of the transcription start site. GC content seems to dictate, in cis, the positioning of nucleosomes, histone marks (H3K4me3, H3K9ac, and H3K27me3), and Pol II in the AT-rich Paramecium genome. We employed a chromatin state segmentation approach on nucleosomes and histone marks, which revealed genes with active, repressive, and bivalent chromatin states. Further, we constructed a regulatory association network of all the aforementioned data using the sparse partial correlation network technique. Our analysis revealed subsets of genes whose expression is positively associated with H3K27me3, in contrast to the negative association with gene expression reported in many other organisms.
Further, we developed a Random Forests classifier to predict gene expression using genic (gene length, intron frequency, etc.) and epigenetic features. Our model has a test performance (PR-AUC) of 0.83. Upon evaluating different feature sets, we found that genic features are as predictive of gene expression as the epigenetic features. Shapley local feature explanation values suggest that high H3K4me3, high intron frequency, low gene length, high sRNA, and high GC content are the most important elements for determining gene expression status.
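As an illustration of the PR-AUC metric reported above, the following minimal sketch computes average precision, one common step-wise estimate of the area under the precision-recall curve. It is not the evaluation code of the thesis: the function name `average_precision` and the toy labels and scores are hypothetical, and tied scores are simply kept in input order.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 1 for the positive class (e.g. "expressed"), 0 otherwise.
    scores: classifier scores, higher meaning more likely positive.
    Hypothetical helper for illustration only.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank examples from highest to lowest score (stable for ties).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where this positive is recovered.
            precision_sum += true_positives / rank

    # Average the precision over all positives (equal recall steps).
    return precision_sum / n_pos


# Toy data, not taken from the thesis.
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
toy_ap = average_precision(toy_labels, toy_scores)
print(round(toy_ap, 3))
```

A perfect ranking, in which every positive scores above every negative, yields 1.0. A random ranking yields a value close to the fraction of positives, so a PR-AUC has to be read against the class balance of the test set.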
In this thesis, we developed novel tools and employed several bioinformatics and machine learning methods to characterise the regulatory landscape of the Paramecium (epi)genome.
The molecular basis of vitamin D signaling implies that the metabolite 1α,25-dihydroxyvitamin D3 (1,25(OH)2D3) of the secosteroid vitamin D3 activates the transcription factor vitamin D receptor (VDR), which in turn modulates the expression of hundreds of primary vitamin D target genes. Since the evolutionary role of nuclear receptors, such as VDR, was the regulation of cellular metabolism, the control of calcium metabolism became the primary function of vitamin D and its receptor. Moreover, the nearly ubiquitous expression of VDR enabled vitamin D to acquire additional physiological functions, such as the support of the innate immune system in its defense against microbes. Monocytes and their differentiated phenotypes, macrophages and dendritic cells, are key cell types of the innate immune system. Vitamin D signaling was most comprehensively investigated in THP-1 cells, which are an established model of human monocytes. This includes the 1,25(OH)2D3-modulated cistromes of VDR, the pioneer transcription factors PU.1 and CEBPA and the chromatin modifier CTCF as well as of the histone markers of promoter and enhancer regions, H3K4me3 and H3K27ac, respectively. These epigenome-wide datasets led to the development of our chromatin model of vitamin D signaling. This review discusses the mechanistic basis of 189 primary vitamin D target genes identified by transcriptome-wide analysis of 1,25(OH)2D3-stimulated THP-1 cells and relates the epigenomic basis of four different regulatory scenarios to the physiological functions of the respective genes.