Alternative splicing (AS) is a co- or post-transcriptional process by which one gene gives rise to multiple isoforms. This ‘split and combine’ step multiplies eukaryotic proteome diversity several fold and is implicated in several diseases given its pervasive impact. Control of alternative splicing is brought about by cis-regulatory elements, such as RNA sequence and structure, which recruit trans-acting RNA-binding proteins (RBPs). Although several of these interactions are already described in detail, we lack a comprehensive understanding of the regulatory code that underlies a splicing decision.
Here, we have established a high-throughput screen to comprehensively identify and characterise cis-regulatory elements that control a specific splicing decision. A cancer-relevant splicing event in the proto-oncogene RON was picked as a minigene prototype to establish the screening approach. We then transfected a library of thousands of randomly mutagenised minigene variants as a pool into human cells, and subsequently quantified the spliced isoforms by RNA sequencing. Importantly, we used a barcode sequence to tag the minigene variants and thereby linked mutations to their corresponding spliced products. By using a linear regression-based modelling approach, we were able to determine the effects of single mutations on RON AS. In total, more than 700 mutations were found to significantly affect the splicing regulation of the RON alternative exon. In addition, mutation effects quantified from the screening approach correlate with RON alternative splicing in cancer patients. We discovered numerous previously unknown cis-regulatory elements in both introns and exons, and found that the RBP heterogeneous nuclear ribonucleoprotein H (HNRNPH) extensively regulates RON AS at multiple levels in both cell lines and cancer. Furthermore, the large number of RBPs involved in the process points to a complex splicing regulatory network that controls RON splicing. iCLIP and synergy analysis between mutations and HNRNPH knockdown data pinpointed the most relevant HNRNPH binding sites across RON. Finally, cooperative HNRNPH binding was shown to mediate a splicing switch of the RON alternative exon. In summary, our results provide an unprecedented view of the complexity of splicing regulation of an alternative exon.
The novel screening approach provides a tool to study how RNA sequence variants, together with trans-acting regulators, affect the splicing outcome, offering insights into alternative splicing regulation and the relevance of mutations in human disease.
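The idea of estimating single-mutation effects from pooled variants with a linear regression can be illustrated with a minimal sketch. It is not the model used in the thesis: it assumes that each variant carries a known set of mutations, that the response is some per-variant splicing measure (for example a log-ratio of isoform frequencies), and that mutation effects add up. The function names, mutation labels and numbers are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design: some effects are not identifiable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_mutation_effects(variants, responses, mutations):
    """Ordinary least squares for: response = intercept + sum of effects
    of the mutations present in a variant (assumed additive)."""
    # One row per variant: intercept column plus one 0/1 column per mutation.
    rows = [[1.0] + [1.0 if mut in v else 0.0 for mut in mutations]
            for v in variants]
    p = len(mutations) + 1
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, responses)) for i in range(p)]
    coef = solve_linear_system(xtx, xty)
    return coef[0], dict(zip(mutations, coef[1:]))


# Hypothetical toy data: six barcoded variants, three mutations.
mutations = ["A10G", "C25T", "G40A"]
variants = [set(), {"A10G"}, {"C25T"}, {"A10G", "C25T"}, {"G40A"}, {"A10G", "G40A"}]
responses = [0.2, 1.2, -0.3, 0.7, 0.2, 1.2]
intercept, effects = fit_mutation_effects(variants, responses, mutations)
```

Because most variants in a random mutagenesis library carry several mutations at once, a joint fit of this kind is what allows the contribution of each individual mutation to be separated from that of its co-occurring neighbours.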
In the last two decades, our understanding of human gene regulation has improved tremendously. Plentiful computational methods focus on integrative data analysis in humans and in model organisms such as mouse and Drosophila. However, these tools are not directly employable by researchers working on non-model organisms to answer fundamental biological and evolutionary questions. We aimed to develop new tools, and adapt existing software, for the analysis of transcriptomic and epigenomic data of one such non-model organism, Paramecium tetraurelia, a unicellular eukaryote. Paramecium contains two diploid (2n) germline micronuclei (MIC) and a polyploid (800n) somatic macronucleus (MAC). The transcriptomic and epigenomic regulatory landscape of the MAC genome, which has 80% protein-coding genes and short intergenic regions, is poorly understood.
We developed a generic automated eukaryotic short interfering RNA (siRNA) analysis tool, called RAPID. Our tool captures diverse siRNA characteristics from small RNA sequencing data and provides easily navigable visualisations. We also introduced a normalisation technique to facilitate comparison of multiple siRNA-based gene knockdown studies. Further, we developed a pipeline to characterise novel genome-wide endogenous short interfering RNAs (endo-siRNAs). In contrast to many organisms, we found that the endo-siRNAs do not act in cis to silence their parent mRNA. We also predicted phasing of siRNAs, which are regulated by the RNA interference (RNAi) pathway.
Further, using RAPID, we investigated the aberrations of endo-siRNAs, and their respective transcriptomic alterations, caused by an RNAi pathway triggered by feeding small RNAs against a target gene. We find that the small RNA transcriptome is altered even if a gene unrelated to the RNAi pathway is targeted. This is important in the context of investigations of genetically modified organisms (GMOs). We suggest that future studies need to distinguish transcriptomic changes caused by RNAi-inducing techniques from actual regulatory changes.
Subsequently, we adapted existing epigenomics analysis tools to conduct the first comprehensive epigenomic characterisation of nucleosome positioning and histone modifications of the Paramecium MAC. We identified well-positioned nucleosomes shifted downstream of the transcription start site. GC content seems to dictate, in cis, the positioning of nucleosomes, histone marks (H3K4me3, H3K9ac, and H3K27me3), and Pol II in the AT-rich Paramecium genome. We employed a chromatin state segmentation approach on nucleosomes and histone marks, which revealed genes with active, repressive, and bivalent chromatin states. Further, we constructed a regulatory association network of all the aforementioned data, using the sparse partial correlation network technique. Our analysis revealed subsets of genes whose expression is positively associated with H3K27me3, in contrast to the negative association with gene expression reported in many other organisms.
Further, we developed a Random Forests classifier to predict gene expression using genic (gene length, intron frequency, etc.) and epigenetic features. Our model has a test performance (PR-AUC) of 0.83. Upon evaluating different feature sets, we found that genic features are as predictive of gene expression as the epigenetic features. We used Shapley local feature explanation values to suggest that high H3K4me3, high intron frequency, low gene length, high sRNA, and high GC content are the most important elements for determining gene expression status.
In this thesis, we developed novel tools and employed several bioinformatics and machine learning methods to characterise the regulatory landscape of the Paramecium (epi)genome.
Most cellular processes are regulated by RNA-binding proteins (RBPs). These RBPs usually use defined binding sites to recognize and directly interact with their target RNA molecule. Individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP) experiments are an important tool to describe such interactions in cell culture in vivo. This experimental protocol yields millions of individual sequencing reads from which the binding spectrum of the RBP under study can be deduced. In this PhD thesis I studied how RNA processing is driven by RBP binding by analyzing iCLIP-derived sequencing datasets.
First, I described a complete data analysis pipeline to detect RBP binding sites from iCLIP sequencing reads. This workflow covers all essential processing steps, from the first quality control to the final annotation of binding sites. I described the accurate integration of biological iCLIP replicates to boost the initial peak calling step while ensuring high specificity through replicate reproducibility analysis. Further, I proposed a routine to level binding site width to streamline downstream analysis processes. This was exemplified in the reanalysis of the binding spectrum of the U2 small nuclear RNA auxiliary factor 2 (U2AF2, U2AF65). I recaptured the known dominance of U2AF65 to bind to intronic sequences of protein-coding genes, where it likely recognizes the polypyrimidine tract as part of the core spliceosome machinery.
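What levelling binding site width could look like in practice is sketched below. This is an illustrative greedy procedure under stated assumptions, not the routine from the thesis: crosslink events are given as a position-to-count mapping for one transcript strand, every site gets the same odd width centred on a local summit, and the default `width` and `min_count` values are hypothetical.

```python
def level_binding_sites(crosslinks, width=9, min_count=2):
    """Greedily place non-overlapping, equally sized binding sites.

    crosslinks: dict mapping position -> number of crosslink events.
    Returns a sorted list of (start, end, total_crosslinks), inclusive ends.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so that sites are centred on a summit")
    half = width // 2
    remaining = dict(crosslinks)
    sites = []
    while remaining:
        # Highest remaining count wins; ties go to the smaller position.
        summit = max(remaining, key=lambda pos: (remaining[pos], -pos))
        if remaining[summit] < min_count:
            break
        start, end = summit - half, summit + half
        total = sum(crosslinks.get(pos, 0) for pos in range(start, end + 1))
        sites.append((start, end, total))
        # Drop every candidate summit whose window would overlap this site.
        for pos in [p for p in remaining if abs(p - summit) <= 2 * half]:
            del remaining[pos]
    return sorted(sites)


# Hypothetical crosslink profile on one transcript.
example = {100: 2, 101: 8, 102: 3, 104: 1, 120: 5, 121: 4, 140: 1}
sites = level_binding_sites(example, width=5, min_count=2)
```

Equal-width sites make per-site crosslink totals directly comparable, which is one reason a levelling step simplifies downstream analyses such as motif enrichment or binding strength comparisons.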
In the second part of my thesis, I analyzed the binding spectrum of the serine and arginine rich splicing factor 6 (SRSF6) in the context of diabetes. In pancreatic beta-cells, the expression of SRSF6 is regulated by the transcription factor GLIS3, which is encoded by a diabetes susceptibility gene. It is known that SRSF6 promotes beta-cell death through the splicing dysregulation of genes essential to beta-cell function and survival. However, the exact mechanism of how these RNAs are targeted by SRSF6 remains poorly understood. Here, I applied the defined iCLIP processing pipeline to describe the binding landscape of the splicing factor SRSF6 in the human pancreatic beta-cell line EndoC-βH1. The initial binding site definition revealed a predominant binding to coding sequences (CDS) of protein-coding genes. This was followed up by extensive motif analysis, which revealed a purine-rich binding motif so far unknown in human. SRSF6 seemed to specifically recognize repetitions of the triplet GAA. I also showed that the number of contiguous triplets correlated with increasing binding site strength. I further integrated RNA-sequencing data from the same cell type, under SRSF6 knockdown (KD) and basal conditions, to analyze SRSF6-related splicing changes. I showed that the exact positioning of SRSF6 on alternatively spliced exons regulates the produced transcript isoforms. This mechanism seemed to control exons in several known susceptibility genes for diabetes.
In summary, in my PhD thesis, I presented a comprehensive workflow for the processing of iCLIP-derived sequencing data. I applied this pipeline to a dataset from pancreatic beta-cells to unveil the impact of SRSF6-mediated splicing changes. Thus, my analysis provides novel insights into the regulation of diabetes susceptibility genes.
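Counting contiguous GAA triplets, the quantity related to binding site strength above, can be sketched in a few lines. The function name and the example sequences are hypothetical; how the thesis actually scored motif repeats is not stated here.

```python
import re


def longest_gaa_run(sequence):
    """Return the largest number of directly adjacent GAA triplets."""
    runs = re.findall(r"(?:GAA)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Hypothetical binding site sequences (RNA alphabet).
for site in ["CUGAAGAAGAACC", "GAAUGAAGAA", "CCUUCC"]:
    print(site, longest_gaa_run(site))
```

Grouping binding sites by this count and comparing a per-site strength measure across the groups would be one simple way to examine the reported trend.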
Hypoxia is a condition in which cells are deprived of adequate oxygen supply and represents a main feature of solid tumours. Cells under hypoxic stress activate transcriptional responses driven by hypoxia-inducible factors (HIFs), which affect multiple cellular pathways, including angiogenesis, metabolic adaptation and cell proliferation. While the transcriptional changes induced in hypoxic tumours are well characterised, it is still poorly understood how hypoxia contributes to the aberrant post-transcriptional regulation observed in tumours. In this PhD thesis, I studied the RNA response to hypoxia in cancer, to provide novel insights into its regulation.
Using deep RNA-Sequencing (RNA-Seq), I investigated transcriptome changes of three human cell lines from lung, cervical and breast cancer under hypoxia, advancing our knowledge of post-transcriptional gene regulation in hypoxic cancer. I show that hypoxia induced consistent changes in transcript abundance in the three cancer types. This was coupled to divergent splicing responses, highlighting the cell type specificity of alternative splicing programs. While the mRNA levels of RNA-binding proteins were mainly reduced, hypoxia upregulated muscleblind-like protein 2 (MBNL2) in all three cell lines. Hypoxia control was specific for MBNL2, since it did not affect its paralogs MBNL1 and MBNL3. Via knockdown experiments of MBNL2 in hypoxic cells, I could show that MBNL2 induction promotes adaptation of cancer cells to low oxygen by regulating both transcript abundance and alternative splicing of hypoxia response genes. In addition, depletion of MBNL2 reduced the proliferation and migration of cancer cells, corroborating a function of MBNL2 as a cancer driver.
In the last few years, a novel class of RNAs has gained attention, namely circular RNAs (circRNAs), which are produced by a particular splicing mechanism known as back-splicing. CircRNAs have been reported to change their abundance in cancer, and their high stability makes them promising candidates as diagnostic biomarkers. In this study, I took advantage of deep rRNA-depleted RNA-Seq data to comprehensively investigate the expression of circRNAs in human cancer cells and their changes in response to hypoxia. To reliably identify circRNAs, I established a pipeline that integrates two available tools for circRNA detection with custom approaches for quantification and statistical analysis. Using this pipeline, I identified 12006 circRNAs in the three cancer cell lines. Their molecular features suggest an involvement of complementary RNA sequences as well as trans-acting factors in circRNA biogenesis, including the splicing factor HNRNPC. Remarkably, I detected 210 circRNAs that are more abundant than their linear counterparts. Upon hypoxic stress, 64 circRNAs were differentially expressed in cancer cells, in most cases in a cell type-specific manner. In summary, in this PhD thesis, I present a comparative transcriptome profiling in human cancer cell lines. It reveals MBNL2 as an important player in hypoxic cancer progression and provides novel insights into the biogenesis and regulation of circRNAs under hypoxic stress.
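One common way to integrate two circRNA detection tools is to keep only back-splice junctions that both report. The sketch below shows that idea under explicit assumptions: the abstract does not say how the two tools were combined, so the intersection rule, the `min_reads` threshold and the data layout are illustrative choices rather than the pipeline itself.

```python
def consensus_circrnas(calls_a, calls_b, min_reads=2):
    """Keep back-splice junctions supported by both tools.

    calls_a, calls_b: dicts mapping (chrom, start, end, strand) -> read count.
    A junction is kept if each tool reports at least min_reads; the more
    conservative (smaller) count is stored.
    """
    shared = calls_a.keys() & calls_b.keys()
    consensus = {}
    for junction in shared:
        support = min(calls_a[junction], calls_b[junction])
        if support >= min_reads:
            consensus[junction] = support
    return consensus


# Hypothetical calls from two detection tools.
tool_a = {("chr1", 1000, 2000, "+"): 12, ("chr2", 500, 900, "-"): 1, ("chr3", 10, 80, "+"): 5}
tool_b = {("chr1", 1000, 2000, "+"): 9, ("chr2", 500, 900, "-"): 4, ("chrX", 1, 2, "+"): 7}
kept = consensus_circrnas(tool_a, tool_b)
```

Requiring agreement trades sensitivity for specificity, which suits a setting where false back-splice junctions from alignment artefacts are a concern.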
The precursor form of eukaryotic mRNA (pre-mRNA) undergoes a series of processing steps that ultimately lead to the synthesis of a "mature", export-competent mRNA. Pre-mRNA splicing is an essential step of this maturation, in which intragenic sequences, so-called introns, are removed from the pre-mRNA while exons are ligated. Pre-mRNA splicing is catalysed by the spliceosome. This megadalton complex consists of five subcomplexes, which in turn are composed of catalytically active small nuclear ribonucleic acids (snRNAs) and a large number of protein factors. These subcomplexes, termed snRNPs (small nuclear ribonucleoprotein particles), bind the pre-mRNA at characteristic sequences and, through a series of conformational changes, align the pre-mRNA so that neighbouring exons come into contact and can be joined by a biochemical ligation reaction.
Exon and intron recognition by the snRNPs is regulated by numerous splicing factors. One protein family that is essential for the regulation of splicing is that of the serine/arginine-rich proteins (SR proteins). These bind preferentially to the 3' or 5' end of exons, recruit snRNPs and thereby stimulate exon inclusion. Through this stimulation, splicing events can be regulated and specific exons selectively excluded or included. This process, termed alternative splicing (AS), occurs in 95% of the human transcriptome and expands the diversity of an organism, since different transcripts can be generated from the same gene, which in turn enables the translation of different proteins with distinct functions.
In addition, AS provides the cell with a further layer of post-transcriptional gene regulation, which the cell uses in particular under cellular stress conditions to express alternative protein isoforms. A stress condition of particular medical relevance is hypoxia, which describes an insufficient oxygen supply to cells or tissue regions. Hypoxia, or hypoxic regions, are found in cancer cells and occur in 90% of all solid tumours. As part of the hypoxic stress response, the cell has an adaptation mechanism that is mediated by hypoxia-inducible factors (HIFs). These factors induce the transcription of numerous genes and stimulate the expression of stress factors involved in cellular adaptation to hypoxia. One of these factors is vascular endothelial growth factor A (VEGFA), which is secreted under hypoxic conditions and thereby stimulates the proliferation of endothelial cells, the formation of new blood vessels and thus the vascularisation of the hypoxic region.
Cellular adaptation is, however, not restricted to the transcriptional regulation of the HIF-mediated hypoxia signalling pathway, but is regulated at multiple levels of gene expression. Although thousands of transcripts are known to be alternatively spliced under hypoxic conditions, the factors that regulate the cellular stress response through AS, as well as their molecular mechanisms, remain largely unknown.
This thesis comprises the identification and characterisation of AS events, as well as the influence and regulation of splicing factors on AS under hypoxic conditions. To this end, we performed global gene expression and AS analyses in HeLa carcinoma cells under normoxia (21% O2) and hypoxia (0.2% O2) and show that 7962 genes are differentially expressed after 24 h of hypoxia. AS analyses identified 4434 transcripts that are regulated via AS under hypoxia. Exon skipping was the most frequent type of AS event. PCR-based validation experiments confirmed five regulated transcripts: exons 3 and 4 of BORA, exon 6 of MDM4 and exons 4-5 of CSSP1 showed exon skipping, whereas exon inclusion was validated for exon 28 of CEP192 and in the 3'UTR of EIF4A2.
Furthermore, as part of the AS analysis, the regulation of so-called back-splicing under hypoxia was investigated. In contrast to linear splicing, back-splicing joins the 5' end and the 3' end of exons, which results in the formation of circular RNAs (circRNAs). Although only a few functions of this RNA class are known, regulation of circRNAs has been described during cell differentiation and in various cancer cells. circRNAs can act as microRNA or protein sponges, or serve as protein interaction platforms, and thereby regulate gene expression.
Attention to the protein PURA has increased recently following the discovery of the rare PURA Syndrome. This neurodevelopmental disorder is caused by de novo mutations in the PURA gene. Notably, our collaborators could show that the protein PURA can bind DNA and RNA in vitro. As a result, I was motivated to explore PURA's cellular RNA-binding activity. Furthermore, I investigated the connection between PURA-RNA binding and the cellular effect of a reduction of functional PURA as present in PURA Syndrome patients.
To investigate the binding of PURA and the impact of PURA deficiency on cellular RNA and protein expression, I performed an integrative computational analysis of multimodal data from complementary high-throughput experiments. An essential component was the examination of UV crosslinking and immunoprecipitation (CLIP) experiments, which can query the global RNA-binding behaviour of a given protein in a cellular context. As the processing and analysis of CLIP data are rather complex, I introduce an automated command line tool for the processing of CLIP data named racoon_clip as part of this dissertation. Therefore, this dissertation comprises two major segments. Firstly, I describe the implementation and usage of racoon_clip for CLIP data analysis. Secondly, I discuss my research on the protein PURA, demonstrating its global RNA-binding properties, the effects of PURA depletion and its association with neuronal functions and P-bodies, among others.
racoon_clip is a command line application that I have developed for processing of individual-nucleotide resolution CLIP (iCLIP) and enhanced CLIP (eCLIP) experiments, two of the most commonly used types of CLIP experiments, in a comparable and user-friendly way.
For this, I built racoon_clip as an automated workflow that encompasses all CLIP processing steps from raw data to single-nucleotide resolution crosslink events. racoon_clip is available as a command line tool that users can run with a single command. The workflow is implemented with the Snakemake workflow management system, providing computational advantages including parallelisation, scalability and portability of the workflow. The main task of racoon_clip is to extract single-nucleotide crosslink events from iCLIP, iCLIP2, eCLIP and similar data types. To strike a balance between being highly customisable and easy to use, racoon_clip supplies pre-set options for the most common types of experiments.
Additionally, it is possible for users to create a custom setup of barcode and adapter architectures, which allows them to use the software for other types of CLIP data. While accounting for the different architectures in the reads, the performed central processing steps remain the same. This leads to a high degree of comparability between the different experiment types, which I demonstrate in the exemplary processing of U2AF2 iCLIP and eCLIP data. Taken together, I am confident that racoon_clip will be beneficial to numerous researchers interested in RNA-protein interactions as it offers easily accessible processing for CLIP data and enhances the comparability of multiple CLIP datasets across different experiment types.
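The core step of extracting single-nucleotide crosslink events can be illustrated independently of any particular tool. The sketch below is not racoon_clip code and does not reflect its interface. It relies on the general iCLIP principle that reverse transcription tends to stop at the crosslinked nucleotide, so the crosslink is assigned to the position directly upstream of the 5' end of the aligned read. It assumes 0-based, half-open read coordinates and reads already oriented to the RNA strand.

```python
from collections import Counter


def crosslink_position(start, end, strand):
    """Position one nucleotide upstream of the read's 5' end.

    start, end: 0-based, half-open alignment coordinates [start, end).
    """
    if strand == "+":
        return start - 1  # 5' end is at `start`; upstream is to the left
    if strand == "-":
        return end        # 5' end is at `end - 1`; upstream is to the right
    raise ValueError(f"unknown strand: {strand!r}")


def count_crosslinks(reads):
    """Count crosslink events per (chrom, position, strand).

    reads: iterable of (chrom, start, end, strand), assumed to be
    deduplicated already (for example via unique molecular identifiers).
    """
    return Counter(
        (chrom, crosslink_position(start, end, strand), strand)
        for chrom, start, end, strand in reads
    )


# Hypothetical aligned reads.
reads = [("chr1", 100, 150, "+"), ("chr1", 100, 140, "+"), ("chr1", 200, 250, "-")]
events = count_crosslinks(reads)
```

Applying the same positional rule regardless of how barcodes and adapters were laid out in the raw reads is what makes crosslink profiles from different CLIP variants comparable.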
In the second part of this dissertation, I focus on the cellular function of the RNA-binding protein PURA. Through in-depth computational analysis of one iCLIP data set of endogenous PURA and two iCLIP data sets of overexpressed PURA in HeLa cells, I establish that PURA is a global RNA-binding protein. It preferentially binds RNAs in either the coding sequence (CDS) or the 3' untranslated region (3'UTR) of mature protein-coding transcripts by recognising a purine-rich degenerate sequence motif. Even though overexpression of PURA results in less specific binding behaviour, the same overall binding patterns as from endogenous PURA persist. Overall characteristics of PURA binding remain similar in three distinct PURA iCLIP data sets with and without PURA overexpression.
To learn about the molecular consequences of a depletion of functional PURA in a cellular context, I used a 50% reduction of PURA in HeLa cells as a model for the heterozygous loss of PURA in PURA Syndrome and evaluated its impact on global RNA and protein expression. The results demonstrate that PURA depletion globally affects RNA and protein expression. Additionally, I integrate PURA RNA binding with the changes in expression of RNAs and proteins in the context of PURA depletion. This reveals 234 targets of PURA that are bound by PURA and are impacted at both RNA and protein levels by the PURA protein. RNAs that are bound by PURA or change in abundance upon PURA depletion are enriched in neuronal development factors, RNA lifecycle regulators, and mitochondrial factors, among others. Consistent with a possible role of PURA in neuronal transport, there is considerable overlap between PURA-bound transcripts and transcripts that are transported to the dendritic end of neurons.
Notably, there is a link between PURA and P-bodies, as documented by the enrichment of PURA-bound RNAs in both the P-body and stress granule transcriptome. Further, PURA was found by our collaborators to be localised within P-bodies and P-body numbers were strongly reduced in cells that are depleted of PURA. This absence might be attributed to the downregulation of the proteins encoded by the PURA targets LSM14A and DDX6 as both of them were previously identified as essential for P-body formation.
Overall, the reduction of P-body numbers in PURA depletion, the neuronal function of PURA, and its association with mitochondria and RNA lifecycle regulation may indicate the cellular foundation of both PURA Syndrome and related neuronal diseases.
In summary, I present a versatile and user-friendly computational tool for the analysis of CLIP data. Subsequently, I conduct a thorough computational analysis of CLIP and other high-throughput data in the context of the RNA-binding protein PURA, which offers valuable insights into the cellular functions of PURA. These insights advance our understanding of the impact of PURA loss in PURA Syndrome and other disease contexts.
The central dogma of biology is based on the concatenated transfer of information from DNA, via transcribed mRNA, to the translated protein. In eukaryotes, transcription and translation are separated spatially as well as temporally by cellular compartmentalization. Prior to active, export factor-dependent transport from the nucleus to the cytosol, the newly formed pre-mRNA must mature. This involves 5' capping, splicing, and endonucleolytic cleavage and polyadenylation (CPA).
Transcription of a new pre-mRNA is terminated by hydrolytic cleavage in the 3'UTR, and the newly formed 3' end is protected from premature degradation by synthesis of a poly(A) tail. These processes are catalyzed by four multi-protein complexes (CFIm, CFIIm, CPSF, and CstF) and poly(A) polymerase (PAP). CPA is sequence-specific and dependent on RNA-binding proteins (RBPs). CPA-specific sequences include the poly(A) motif ('AAUAAA' and certain motif variants), the UGUA motif, and U/GU-rich sequences upstream and downstream of the poly(A) signal, respectively. About 70% of mammalian genes have more than one polyadenylation site (PAS) and express transcripts of different lengths by a mechanism called alternative polyadenylation (APA). This can affect the length of the 3'UTR (3'UTR-APA) or the coding sequence of the transcript (CDS-APA) if the alternative PAS is upstream of the STOP codon. The length of the 3'UTR affects the stability, export efficiency, subcellular localization, translation rate, and local translation of the nascent transcript. 3'UTR-APA is regulated by the interplay of cis-elements (poly(A) motif, UGUA and U/GU) and trans-elements (expression of CPA factors). In this context, the functions of the individual cis and trans elements have been extensively studied, yet the regulation of alternative polyadenylation, that is, the decision whether to use the proximal or distal PAS, is less well understood and requires additional study.
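Locating the cis-elements named above in a 3'UTR sequence is straightforward to sketch. The helper below scans for the canonical poly(A) motif AAUAAA and for UGUA. The function name and the example sequence are hypothetical, and motif variants are deliberately not covered.

```python
import re


def find_motif(sequence, motif):
    """Return all 0-based start positions of motif, overlaps included.

    DNA input is accepted and converted to the RNA alphabet first.
    """
    rna = sequence.upper().replace("T", "U")
    # A lookahead matches without consuming, so overlapping hits are found.
    pattern = f"(?={re.escape(motif.upper())})"
    return [match.start() for match in re.finditer(pattern, rna)]


# Hypothetical 3'UTR fragment.
utr = "GGUGUACCAAUAAAGGCUUCAUGUGUU"
polya_signals = find_motif(utr, "AAUAAA")
ugua_sites = find_motif(utr, "UGUA")
```

In a fuller analysis, these positions would be related to the annotated cleavage site, since the elements act at characteristic distances upstream or downstream of it.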
In murine P19 cells, we were able to demonstrate for the first time a direct link between 3'UTR-APA and nuclear export of mature mRNA by the splicing factors SRSF3 and SRSF7, and to decipher the mechanism. At its core is the direct recruitment of the export factor NXF1 by SRSF3 and SRSF7 to transcripts with 3'UTRs of different lengths.
The primary goal of the thesis presented here was to decipher the function of SRSF3 and SRSF7 in the regulation of 3'UTR-APA and to determine the basic mechanism. For this purpose, various genome-wide methods, such as RNA-Seq, MACE-Seq, and iCLIP-Seq, were integrated and the findings were supported by reporter gene and mutation studies.
Initial determination of the poly(A)-tome in P19 cells by MACE-Seq yielded approximately 16,000 PAS and showed that slightly less than 50% of all genes used two or more PAS and expressed alternative 3'UTR isoforms. Further DaPARS analyses after knockdown of Srsf3 or Srsf7 confirmed that SRSF3 affected more transcripts than SRSF7 and primarily promoted the expression of long 3'UTRs, whereas SRSF7 promoted the expression of short 3'UTRs. Integration of SRSF3- and SRSF7-specific iCLIP data suggested a possible competition between SRSF3 and SRSF7 at the proximal PAS (pPAS), which could thus act as a hotspot of 3'UTR regulation.
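A shift between proximal and distal PAS usage is often summarised as the fraction of transcripts that use the distal site, in the spirit of the distal usage index reported by DaPARS. The sketch below shows only that ratio and its change between two conditions. It assumes that expression estimates for the short and long isoform are already available; the inference of those estimates from coverage profiles, which is the hard part, is not shown, and all numbers are hypothetical.

```python
def distal_usage(short_expr, long_expr):
    """Fraction of transcripts ending at the distal PAS (long 3'UTR)."""
    total = short_expr + long_expr
    if total <= 0:
        raise ValueError("gene is not expressed in this sample")
    return long_expr / total


def usage_shift(control, knockdown):
    """Change in distal usage; negative values mean 3'UTR shortening.

    control, knockdown: (short_expr, long_expr) tuples.
    """
    return distal_usage(*knockdown) - distal_usage(*control)


# Hypothetical isoform expression values for one gene.
shift = usage_shift(control=(40.0, 60.0), knockdown=(70.0, 30.0))
```

A negative shift after knockdown of a factor indicates that the factor normally favours the long 3'UTR isoform, and a positive shift indicates the opposite.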
Experiments with intron-free reporter genes revealed that SRSF3- and SRSF7-dependent regulation of 3'UTR-APA is independent of splicing. With respect to SRSF7, a concentration dependence was demonstrated. Mutation experiments involving the SRSF3- and SRSF7-specific binding motifs in the 3'UTR also confirmed the hypothesis of competition between the two SR proteins.
Extensive Co-IP experiments demonstrated that SRSF7, but not SRSF3, can interact with CFIm and FIP1 (a subunit of the CPSF complex) in an RNA-independent manner. In addition, we showed that these interactions exhibited some phosphorylation dependence: the interaction with FIP1 arose primarily in the semi- to hypophosphorylated state of SRSF7, whereas the interaction with CFIm was mainly detected in the hyperphosphorylated state. In subsequent mutation experiments, the differential affinity of SRSF3 and SRSF7 for polyadenylation factors could be attributed to two SRSF7-specific domains: a CCHC-type Zn finger between the RRM and the RS domain, and a hydrophobic, 27-amino-acid region in the middle of the RS domain. Together, this suggested that SRSF3 could block the utilization of the pPAS, whereas SRSF7 could activate it by directly recruiting polyadenylation factors.
Interestingly, we showed that knockdown of Srsf3 also negatively regulates the expression of Cpsf6 (a subunit of CFIm) through alternative splicing, which subsequently leads to decreased expression of CPSF6 and of CFIm. Reduction of CFIm led to increased expression of transcripts with short 3'UTR, analogous to knockdown of Srsf3. This mirrors the results of previous studies. A direct comparison between SRSF3- and CPSF6-specific transcripts revealed that not all targets were congruent. In addition, we found preliminary evidence for CFIm-related masking of essential cis-pPAS elements by bimodal UGUA motifs at the pPAS. In summary, we present a novel mechanism of indirect 3'UTR-APA regulation through SRSF3-conditional expression of the CFIm subunit CPSF6.
...
RNA modification is a dynamic and complex process that involves the addition of various chemical groups to RNA molecules, contributing to their diversity and functional complexity. Among all the RNA modifications, N6-methyladenosine (m6A) is the most common post-transcriptional modification found in mRNA molecules, particularly in eukaryotic mRNA. It involves methylation of the adenosine base at the nitrogen-6 position. This modification plays a crucial role in many aspects of RNA metabolism, including splicing, stability, translation, and the cellular response to stress. With the development of m6A sequencing technologies, our knowledge of m6A has evolved rapidly over the past two decades. However, one of the most widely used m6A profiling techniques termed “m6A individual-nucleotide resolution UV cross-linking and immunoprecipitation (miCLIP)” suffers from a high unspecific background signal due to the limited antibody binding specificity.
To accurately discriminate m6A sites from the background signal in miCLIP data, in Chapter 4, I first developed different strategies to identify the true miCLIP2 signal changes that are corrected for the underlying transcript abundance changes. I performed this analysis on data that were generated with an improved experimental protocol, named miCLIP2. With the best-performing strategy, the bin-based method, I detected more than 10,000 genuine m6A sites. I then used the information embedded in the genuine m6A sites to train a machine learning model, named "m6Aboost", to enable accurate m6A site detection from miCLIP2 data without a control dataset from an m6A-depleted cell line. To allow easy access for future users, I packaged the m6Aboost model into an R package that is available on Bioconductor.
Although previous studies have reported that m6A is involved in three different RNA decay pathways, it remains unclear how a pathway is selected for a specific transcript or m6A site. In Chapter 5, I reveal that m6A sites in the coding sequence (CDS) induce a stronger and faster RNA decay than the m6A sites in the 3’ untranslated region (3’UTR). Through an in-depth investigation, I found that m6A sites in CDS trigger a novel mRNA decay pathway, which I termed CDS-m6A decay (CMD). Importantly, CMD is distinct from the three previously reported m6A-mediated decay pathways. In terms of its mechanism, CMD relies on translation, where m6A sites in the CDS lead to ribosome pausing and subsequent destabilization of the transcript. The transcripts targeted by CMD are identified by the m6A reader protein YTHDF2, preferentially localized to processing bodies (P-bodies), and undergo degradation facilitated by the decapping factor DCP2. CMD provides a flexible way to control the expression of CDS m6A-containing transcripts which include many developmental regulators and retrogenes.
In summary, this PhD thesis introduces a novel workflow for identifying m6A sites in miCLIP data through the implementation of the m6Aboost machine learning model. Using the m6A sites identified by m6Aboost and additional data, a newly uncovered m6A-mediated mRNA decay pathway, CMD, is elucidated, providing valuable insights into m6A-mediated decay processes.