Transposable elements (TEs) are replicating genetic elements that comprise up to 50% of mammalian genomes. Retrotransposons are a specific class of TEs that proliferate by transcription into an RNA intermediate, followed by reintegration into another genomic locus (the so-called “copy & paste” mechanism). Because removal mechanisms are lacking and parallel insertions are very rare, the presence of a TE insertion at orthologous genomic loci in multiple taxa provides a virtually homoplasy-free phylogenetic marker. So far, developing phylogenetically informative markers from TE insertions has been tedious work, testing hundreds of putative candidate loci in a trial-and-error approach with a low success rate. Hence, phylogenetic studies using TE insertions were often limited to a few dozen markers.
Recently, genome sequencing of multiple species combined with reference mapping has allowed the identification of genome-scale datasets of TE insertions and made the ad hoc development of phylogenetically informative markers possible. However, genome-scale TE detection methods have rarely been applied to non-model organisms, for which data availability and quality are comparatively limited. In this thesis, I developed the TeddyPi pipeline (TE detection and discovery for phylogenetic inference), a software tool that makes it possible to obtain reliable genome-scale TE insertion data from low-coverage genomes. This was achieved by integrating the data from multiple TE and structural variation callers and by applying a stringent filtering pipeline to exclude low-quality insertion calls. Whole-genome sequencing datasets of bears (Ursidae) and baleen whales (Mysticeti) were used to apply TE-based phylogenetic inference and to evaluate the method in comparison to sequence-based phylogenomic analyses.
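The idea of combining calls from several callers can be illustrated with a minimal Python sketch. It is not the TeddyPi implementation: the function name `merge_calls`, the 100 bp clustering window and the rule of keeping only insertions reported by at least two callers are assumptions chosen for illustration.

```python
def merge_calls(calls_by_caller, window=100, min_callers=2):
    """Cluster insertion calls from several callers and keep well-supported ones.

    calls_by_caller: dict mapping a caller name to a list of (chrom, pos) calls.
    Calls on the same chromosome that lie within `window` bp of the previous
    call are placed in one cluster; a cluster is kept only if at least
    `min_callers` different callers contributed to it (hypothetical rule).
    """
    # Flatten to (chrom, pos, caller) and sort by chromosome, then position.
    records = sorted(
        (chrom, pos, caller)
        for caller, calls in calls_by_caller.items()
        for chrom, pos in calls
    )
    clusters = []
    for chrom, pos, caller in records:
        last = clusters[-1] if clusters else None
        if last and last["chrom"] == chrom and pos - last["end"] <= window:
            # Extend the current cluster and record the supporting caller.
            last["end"] = pos
            last["callers"].add(caller)
        else:
            clusters.append(
                {"chrom": chrom, "start": pos, "end": pos, "callers": {caller}}
            )
    return [
        (c["chrom"], c["start"], c["end"], sorted(c["callers"]))
        for c in clusters
        if len(c["callers"]) >= min_callers
    ]
```

In practice a filter of this kind would be combined with further criteria, such as read depth or distance to assembly gaps, which are not modelled here.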
In the bear genomes, TeddyPi identified 150,513 high-quality TE insertions, which allowed me to reconstruct the evolutionary history of bears despite extensive phylogenetic conflict (Lammers et al., 2017). The large number of detected TE insertions also made possible detailed network analyses that visualize the phylogenetic conflict. Experimental polymerase chain reaction (PCR) assays validated up to 93% of the computationally identified TE loci and demonstrated the high accuracy of the dataset underlying the phylogenetic analyses.
Second, I present the initial genome sequencing of six baleen whales and a detailed investigation of their evolutionary history using TE insertions and established sequence-based phylogenomic methods. The taxon sampling of baleen whales included iconic species such as the blue whale (Balaenoptera musculus) and the humpback whale (Megaptera novaeangliae) (Árnason et al., 2018). A sequence-based reconstruction of the baleen whale species tree resolved, for the first time with high statistical support, the long-debated phylogenetic position of the gray whale (Eschrichtius robustus) within rorquals (Balaenopteridae). Furthermore, the genome data made it possible to identify a large extent of phylogenetic conflict for divergences during the radiation of rorquals, which occurred 7-10 million years ago (Ma).
The phylogenomic analyses of 91,589 TE insertions in the whale genomes confirmed the sequence-based topology (Lammers et al., 2019). The quantification of phylogenetic signals obtained from the TE insertions revealed a high degree of discordance for the divergence of the gray whale and rorquals. Despite the large genome-scale dataset, statistical tests showed only marginal support for a bifurcating divergence of gray whales and the rorqual species. The limited statistical support for a strictly bifurcating tree obtained from genome-scale datasets of thousands of markers demonstrates the importance of including phylogenetic networks when displaying evolutionary divergences.
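One simple way to quantify conflicting signal from presence/absence markers is to count, for three lineages of interest, how many insertions are shared by each possible pair. The Python sketch below is a hedged illustration of that counting step only; the function name `count_support`, the input layout and the taxon labels in the usage example are assumptions, not the statistical tests used in the thesis.

```python
from itertools import combinations


def count_support(markers, lineages):
    """Count insertions shared by exactly two of three (or more) lineages.

    markers:  iterable of sets, each holding the taxa that carry one insertion.
    lineages: the lineages whose relationships are in question.
    Returns a dict mapping each sorted lineage pair to its number of
    supporting insertions. Insertions present in fewer or more than two
    of the lineages are uninformative for these splits and are ignored.
    """
    focal = set(lineages)
    counts = {pair: 0 for pair in combinations(sorted(focal), 2)}
    for carriers in markers:
        present = tuple(sorted(set(carriers) & focal))
        if len(present) == 2:
            counts[present] += 1
    return counts


# Usage with made-up data (labels are illustrative only):
example = [{"gray", "fin"}, {"gray", "fin"}, {"gray", "blue"}, {"fin", "blue"}]
print(count_support(example, ["gray", "fin", "blue"]))
```

Roughly even counts across the alternative pairs would point to conflict of the kind described above, whereas a single dominant pair would favour a bifurcating split.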
In conclusion, this thesis shows that the identification of TE insertions from whole-genome resequencing provides plentiful and accurate phylogenomic markers. For application in non-model organisms, I provide easy-to-use software that integrates multiple datasets from TE and structural variation callers in order to obtain reliable datasets free of ascertainment bias. Detecting genome-scale datasets of TE insertions in two case studies demonstrates the applicability of this marker system for phylogenetic reconstruction and for inferring phylogenetic conflict.
The observation that tumor cells frequently become dependent on a single driver mutation, even though they carry numerous mutations, forms the basis of the now established targeted tumor therapy (Weinstein, 2002). The identification of the responsible signaling pathways and of the signaling components involved has created starting points for this form of therapy, which have already led to some successes in the treatment of leukemia, breast cancer and lung cancer (Druker et al., 2001; Slamon et al., 2001; Kwak et al., 2010). In many cases, however, patients relapse because resistance develops, or the therapy fails to take effect in the first place (Ramos & Bentires-Alj, 2015).
A wide range of mechanisms may be responsible, but compensatory changes in the signaling pathways that ultimately bypass the inhibition are frequently observed (Holohan et al., 2013). The basis for this is the redundancy and interconnection of the signaling pathways, which allow the cell to adapt flexibly to its environment in the interest of homeostasis (Rosell et al., 2013; Sun & Bernards, 2014). It is therefore of utmost importance to understand the mechanisms of inhibition more precisely with regard to cellular signaling pathways, and to analyze not only the direct but also the indirect effects of the inhibition. This allows conclusions to be drawn about the use of targeted drugs, leading to better therapy combinations and thereby preventing the development of resistance.
Hyperactivation of STAT3 and the gene expression pattern it induces have been identified as a strong oncogenic signal and also play a decisive role in mediating resistance to tumor therapies. Through its role in diverse cellular processes, STAT3 influences the proliferation and survival of tumor cells, their migratory and invasive behavior, and their communication with stromal and immune cells (Bromberg et al., 1999; Wake & Watson, 2015). The aberrant activation of the transcription factor is very rarely due to mutations in STAT3 itself; rather, it is caused by upstream drivers (Johnston & Grandis, 2011; Kucuk et al., 2015).
In the present work, different modes of STAT3 inhibition were compared in different models in order to draw conclusions about criteria for a therapy. In a murine glioma model driven by v-SRC expression (Smilowitz et al., 2007), an indirect, BMX-mediated STAT3 inhibition was compared with a targeted inhibition of STAT3. BMX, a member of the TEC kinase family, has been described as a STAT3-activating kinase. Its influence on tumor development has recently become increasingly clear (Dai et al., 2006; Hart et al., 2011; Holopainen et al., 2012). Among other findings, BMX-mediated STAT3 activation was identified in glioblastoma stem cells as a driver of self-renewal capacity and tumorigenic potential (Guryanova et al., 2011). Using the tyrosine kinase inhibitor canertinib, a BMX-mediated STAT3 activation was detected and inhibited in murine Tu-2449 glioma cells. This is thus the first study in which canertinib was tested as a BMX inhibitor in an endogenous cell system. A single dose of canertinib resulted in cell cycle arrest in G1 phase, and sustained inhibitor action resulted in cell death. In comparison, RNAi-mediated silencing of STAT3 could not induce the death of these cells. The search for further canertinib targets that underlie these different phenotypes identified an additional inhibition of AKT. The AKT inhibition is very likely also mediated by BMX, since no inhibition of the ERBB family could be confirmed. To compare the effects further, canertinib experiments were performed in a human breast cancer model driven by overexpression of EGFR.
In MDA-MB-468 cells, in which BMX is not activated, canertinib treatment resulted in a very prominent inhibition of the ERK signaling pathway and a less pronounced reduction of STAT3 and AKT activation. Canertinib treatment led to cell death in these cells as well. These effects are very likely induced by inhibition of EGFR, since canertinib has been described as a pan-ERBB inhibitor (Slichenmyer et al., 2001; Djerf Severinsson et al., 2011).
Earlier results from our research group demonstrate that downregulation of STAT3 in the breast cancer cell line MDA-MB-468 is sufficient to induce cell death (Groner et al., 2008).
The results of this work show that canertinib treatment induces cell death in both cell lines by inhibiting different signaling pathways. Although both cell lines carry driver-mediated, constitutively active STAT3, its inhibition is a sufficient therapeutic condition only in the breast cancer cells. The differences between the two cell lines are therefore essential for cell survival after STAT3 inhibition. In the future, it will be important to identify these differences in order to define the patient groups in which STAT3 inhibition will be successful.
The analysis of DNA sequences has been a focus of biological research at least since their central role in the inheritance of organismal traits was established. Recently, state-of-the-art methods have made it possible to study complete genomes. This opens up access to genome-wide information, in contrast to marker-based analyses of limited informative value. A genome sequence is the ultimate source of organismal information. However, for technical and biological reasons this information is often complex and usually raises more questions than it answers.
Reconstructing a previously unknown genome sequence from short sequences is a technical challenge that rests on fundamental assumptions that do not necessarily hold in reality. In addition, biological factors such as repeat content or heterozygosity can strongly influence the error rate of an assembly. Assessing the quality of a de novo assembly is challenging but at the same time essential. Subsequently, a structural and functional annotation of genes, coding regions and repeats is needed in order to answer broad biological questions. A high-quality, annotated assembly enables genome-wide analyses of individuals and populations. This thesis comprises the assembly and annotation of the genome of the freshwater snail Radix auricularia and a comparative genomics study of five individuals from different molecular groups (molecular operational taxonomic units, MOTUs).
Second only to the insects, molluscs harbor the greatest species diversity among the animal phyla and colonize a wide variety of habitats, some of them extreme. Despite their great importance for biodiversity research, comparatively few genomic data are publicly available. In addition, species of the genus Radix are established as model organisms in various biological disciplines, partly because of their wide geographic distribution. Beyond the fields already studied, an annotated genome sequence enables research on fundamental biological questions, such as the mechanisms of hybridization and speciation. By assembling and scaffolding six whole-genome shotgun libraries of different insert sizes, followed by transcript-based scaffolding, a comparatively contiguous assembly was obtained despite the high repeat content. The considerable difference between the total length of the assembly and the estimated genome size could largely be attributed to collapsed repeats.
The structural annotation, based on transcriptomes, proteins from a database and gene prediction models trained specifically for the species, resulted in 17,338 protein-coding genes covering about 12.5% of the estimated genome size. The annotation is considered to be of high quality, based among other things on the core orthologs it contains, conserved protein domain arrangements and its agreement with de novo sequenced peptides.
Mapping the sequences of five Radix MOTUs against the R. auricularia assembly showed strongly reduced coverage outside coding regions for the non-reference MOTUs, owing to high nucleotide diversity. For 16,039 genes, topologies could be calculated and a test for positive selection performed. In total, positive selection was detected in 678 different genes across all MOTUs, with each MOTU containing a nearly unique set of positively selected genes. Of all 16,039 genes examined, 56.4% could be functionally annotated. This low rate is presumably caused by the lack of genomic information for molluscs. Subsequent analyses of functional enrichment are therefore only of limited representativeness.
In addition to the biological results, methods and optimizations for genomic analyses of non-model organisms were developed. These include custom scripts, for example to filter transcriptome alignments, to run the training of a gene prediction model in an automated and parallelized manner, and to extract orthogroups of particular species from an orthology prediction. In addition, workflows were developed to integrate as much of the available data as possible into the assembly and annotation. For instance, an additional scaffolding step with separately assembled transcripts of several MOTUs was carried out sequentially, in an order motivated by phylogeny.
Overall, a comprehensive and high-quality genome sequence of a freshwater mollusc is presented, which provides a basis for future research projects, for example in the fields of biodiversity, population genomics and molecular ecology. The results of this work add to the knowledge of mollusc genomics; despite their species diversity, molluscs have so far been clearly underrepresented in terms of assembled and annotated genomes.
Most cellular processes are regulated by RNA-binding proteins (RBPs). These RBPs usually use defined binding sites to recognize and directly interact with their target RNA molecule. Individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP) experiments are an important tool to describe such interactions in vivo in cell culture. This experimental protocol yields millions of individual sequencing reads from which the binding spectrum of the RBP under study can be deduced. In this PhD thesis, I studied how RNA processing is driven by RBP binding by analyzing iCLIP-derived sequencing datasets.
First, I described a complete data analysis pipeline to detect RBP binding sites from iCLIP sequencing reads. This workflow covers all essential processing steps, from the first quality control to the final annotation of binding sites. I described the accurate integration of biological iCLIP replicates to boost the initial peak calling step while ensuring high specificity through replicate reproducibility analysis. Further, I proposed a routine to level binding site width in order to streamline downstream analyses. This was exemplified in the reanalysis of the binding spectrum of the U2 small nuclear RNA auxiliary factor 2 (U2AF2, U2AF65). I recapitulated the known preference of U2AF65 for binding to intronic sequences of protein-coding genes, where it likely recognizes the polypyrimidine tract as part of the core spliceosome machinery.
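Levelling binding site width can be pictured as replacing each called peak with a window of fixed size around its strongest position. The following Python sketch shows that idea only; the function name `level_binding_sites`, the use of a single summit coordinate per peak and the default width of 9 nt are assumptions and do not reproduce the routine proposed in the thesis.

```python
def level_binding_sites(summits, width=9):
    """Return fixed-width windows centred on peak summits.

    summits: list of (chrom, summit) tuples with 0-based summit positions.
    width:   desired binding site width in nucleotides; must be odd so that
             the summit sits exactly in the centre (hypothetical default).
    Windows are returned as half-open intervals (chrom, start, end) and are
    shifted to start at 0 if the summit lies too close to the sequence start.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    half = width // 2
    leveled = []
    for chrom, summit in summits:
        start = max(0, summit - half)
        leveled.append((chrom, start, start + width))
    return leveled
```

With all sites at the same width, per-site signal such as crosslink counts can be compared directly without normalizing for site length.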
In the second part of my thesis, I analyzed the binding spectrum of the serine and arginine rich splicing factor 6 (SRSF6) in the context of diabetes. In pancreatic beta-cells, the expression of SRSF6 is regulated by the transcription factor GLIS3, which is encoded by a diabetes susceptibility gene. It is known that SRSF6 promotes beta-cell death through the splicing dysregulation of genes essential to beta-cell function and survival. However, the exact mechanism by which these RNAs are targeted by SRSF6 remains poorly understood. Here, I applied the defined iCLIP processing pipeline to describe the binding landscape of the splicing factor SRSF6 in the human pancreatic beta-cell line EndoC-βH1. The initial definition of binding sites revealed predominant binding to coding sequences (CDS) of protein-coding genes. This was followed up by extensive motif analysis, which revealed a purine-rich binding motif not previously described in humans. SRSF6 seemed to specifically recognize repetitions of the triplet GAA. I also showed that binding site strength increased with the number of contiguous triplets. I further integrated RNA-sequencing data from the same cell type, under SRSF6 knockdown (KD) and basal conditions, to analyze SRSF6-related splicing changes. I showed that the exact positioning of SRSF6 on alternatively spliced exons regulates the transcript isoforms produced. This mechanism seemed to control exons in several known susceptibility genes for diabetes.
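The number of contiguous GAA triplets in a binding site sequence can be obtained with a short regular expression. The Python sketch below is an illustration only; the function name `longest_gaa_run` and the choice to report the longest uninterrupted run per sequence are assumptions, not the motif analysis performed in the thesis.

```python
import re

# One or more GAA triplets directly following each other.
GAA_RUN = re.compile(r"(?:GAA)+")


def longest_gaa_run(sequence):
    """Return the largest number of contiguous GAA triplets in a sequence.

    The sequence is upper-cased before matching. GAA contains no U or T,
    so RNA and DNA spellings give the same result. Returns 0 if the
    sequence contains no GAA triplet.
    """
    runs = GAA_RUN.findall(sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)
```

A value computed this way for each binding site could then be compared against a measure of binding site strength, for example the crosslink signal per site.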
In summary, in my PhD thesis, I presented a comprehensive workflow for the processing of iCLIP-derived sequencing data. I applied this pipeline to a dataset from pancreatic beta-cells to unveil the impact of SRSF6-mediated splicing changes. Thus, my analysis provides novel insights into the regulation of diabetes susceptibility genes.
Characterizing the hologenome of Lasallia pustulata and tracing genomic footprints of lichenization
(2017)
The lichen symbiosis, consisting of fungal mycobionts and photoautotrophic photobionts (green algae or cyanobacteria), is globally successful. It covers an estimated 6% of the global surface, with habitats ranging from deserts to the Arctic. This success is reflected in the diversity of the mycobionts, with around 21% of all fungal species participating in lichen symbioses that can be facultative or obligate. Lichenization is furthermore evolutionarily old, with fossil evidence for lichens reaching back 415 million years. For an individual fungal lineage, the Lecanoromycetes, lichenization happened around 300 million years ago. This longstanding symbiotic relationship and the diversity of observed symbiotic dependency make lichens promising models for studying the genomic consequences that follow the establishment of symbioses. Despite this, little is known about the genomic effects of lichenization and extreme symbiotic dependency. To fill this gap, we sequenced the hologenome of the lichen Lasallia pustulata, whose mycobiont could so far not be cultivated, suggesting that it might be more dependent on its symbionts.
As the poor culturability of lichen symbionts renders their genomes inaccessible to standard sequencing practices, we evaluated the extent to which different metagenome sequencing and de novo assembly strategies can be used to sequence and reconstruct the genomes of the individual symbionts. We find that the abundances of individual genomes present in the L. pustulata hologenome vary substantially, with the mycobiont being most abundant. Using in silico generated datasets and real Illumina sequencing data for L. pustulata, we observe that the skewed abundances prevent a contiguous assembly of the underrepresented genomes when using only short-read sequencing. We conclude that short-read sequencing can offer first insights into lichen hologenomes. However, the fragmentation of the reconstructions hinders downstream analyses of the genomic consequences of lichenization, as these focus on identifying the gain and loss of genes.
We thus demonstrate a hybrid genome assembly strategy that is based on both short- and long-read sequencing. We show that this strategy is capable of creating highly contiguous genome reconstructions, not only for the L. pustulata mycobiont but also for its photobiont Trebouxia sp., along with substantial amounts of the bacterial microbiome. A subsequent analysis of the microbiome of L. pustulata, performed over nine different samples collected in Germany and Italy, showed a stable taxonomic composition across the geographic range. We find that Acidobacteriaceae, which are known to thrive in nutrient-poor habitats, are the dominant taxon. This would make them well adapted for co-habitation with L. pustulata, which largely grows on rocks. Whether the Acidobacteriaceae are functionally involved in the lichen symbiosis is unclear so far.
As further comparative genomic studies rely on comprehensive genome annotations, we evaluate the completeness and fidelity of the gene annotations for the mycobiont L. pustulata as well as four further Lecanoromycetes. This reveals that unannotated and misannotated genes affect all evaluated genomes, with artificially joined genes and unannotated genes having the largest impact. In addition to these factors, we find that the sequence composition, especially G/C-rich inverted repeats, leads to sequencing errors that interfere with gene prediction. We minimize the effects of these artifacts through rigorous curation.
Given the extremely sparse taxon sampling of available green alga genomes, we focus our search for the genomic footprints of lichenization on the mycobionts. We compare the genomes of the Lecanoromycetes to their closest relatives, the Eurotiomycetes and Dothideomycetes. This reveals that the last common ancestor of the Lecanoromycetes lost around 10% of its genes after it split from the non-lichenized ancestor it shares with the Eurotiomycetes. These losses are furthermore not random: genes involved in the degradation of polysaccharides are over-represented among them. The loss of these genes fits a change from an ancestral saprotrophic lifestyle that depends on degrading complex plant matter to a symbiotic lifestyle that relies on simpler nutrients provided by the photobionts. While the last common ancestor of the Lecanoromycetes additionally gained around 400 genes, these could so far not be further characterized due to a lack of functionally annotated reference data.
As the mycobiont L. pustulata could so far not be grown in axenic culture, we initially expected to find extensive genomic remodeling compared to the other mycobionts that easily grow in culture. We do not find evidence for this. Analyzing both the contraction of gene families and the loss of genes, we observe that L. pustulata and Umbilicaria muehlenbergii, its close relative that is easily grown in culture, share most of these. Furthermore, L. pustulata does not show an excessive loss of evolutionarily old and well-conserved genes. These effects are mirrored on the functional level, as neither gene family contractions nor gene losses show a functional enrichment. This is partially due to the lack of functional reference data, which makes their characterization difficult, as is the case for the genes gained in the Lecanoromycetes. Thus, further studies on the genomic consequences of lichenization and on differences in symbiotic dependence will have to be conducted, including larger taxon sets. This will be even more important for the photobionts, as the Chlorophyta are even more sparsely sampled today, hindering an effective functional and evolutionary study.
Hypoxia is a condition in which cells are deprived of adequate oxygen supply and represents a main feature of solid tumours. Cells under hypoxic stress activate transcriptional responses driven by hypoxia-inducible factors (HIFs), which affect multiple cellular pathways, including angiogenesis, metabolic adaptation and cell proliferation. While the transcriptional changes induced in hypoxic tumours are well characterised, it is still poorly understood how hypoxia contributes to the aberrant post-transcriptional regulation observed in tumours. In this PhD thesis, I studied the RNA response to hypoxia in cancer, to provide novel insights into its regulation.
Using deep RNA-Sequencing (RNA-Seq), I investigated transcriptome changes of three human cell lines from lung, cervical and breast cancer under hypoxia, advancing our knowledge of post-transcriptional gene regulation in hypoxic cancer. I show that hypoxia induced consistent changes in transcript abundance in the three cancer types. This was coupled to divergent splicing responses, highlighting the cell type specificity of alternative splicing programs. While the mRNA levels of RNA-binding proteins were mainly reduced, hypoxia upregulated muscleblind-like protein 2 (MBNL2) in all three cell lines. Regulation by hypoxia was specific to MBNL2, since hypoxia did not affect its paralogs MBNL1 and MBNL3. Via knockdown experiments of MBNL2 in hypoxic cells, I could show that MBNL2 induction promotes adaptation of cancer cells to low oxygen by regulating both transcript abundance and alternative splicing of hypoxia response genes. In addition, depletion of MBNL2 reduced the proliferation and migration of cancer cells, corroborating a function of MBNL2 as a cancer driver.
In the last few years, a novel class of RNAs has gained attention, namely circular RNAs (circRNAs), which are produced by a particular splicing mechanism known as back-splicing. CircRNAs have been reported to change their abundance in cancer, and their high stability makes them promising candidates as diagnostic biomarkers. In this study, I took advantage of deep rRNA-depleted RNA-Seq data to comprehensively investigate the expression of circRNAs in human cancer cells and their changes in response to hypoxia. To reliably identify circRNAs, I established a pipeline that integrates two available tools for circRNA detection with custom approaches for quantification and statistical analysis. Using this pipeline, I identified 12,006 circRNAs in the three cancer cell lines. Their molecular features suggest an involvement of complementary RNA sequences as well as trans-acting factors in circRNA biogenesis, including the splicing factor HNRNPC. Remarkably, I detected 210 circRNAs that are more abundant than their linear counterparts. Upon hypoxic stress, 64 circRNAs were differentially expressed in cancer cells, in most cases in a cell type-specific manner. In summary, in this PhD thesis, I present a comparative transcriptome profiling in human cancer cell lines. It reveals MBNL2 as an important player in hypoxic cancer progression and provides novel insights into the biogenesis and regulation of circRNAs under hypoxic stress.
Baleen whales (Mysticeti) are a clade of highly adapted carnivorous marine mammals that can reach extremely large body sizes and feature characteristic keratinaceous baleen plates used for obligate filter feeding. From a conservation perspective, nearly all baleen whale species were hunted extensively over a period of roughly 100 years, which depleted many of the respective whale stocks, with so far unknown consequences for, e.g., their molecular viability. From an evolutionary perspective, the lack of fossil records together with conflicting molecular patterns has resulted in a still unclear and debated phylogeny of modern baleen whales, particularly in rorquals (Balaenopteridae). In this dissertation, I demonstrate the application of baleen whale genomes to tackle these open questions by using modern approaches of conservation and evolutionary genomics.
Conservation genomic aspects of baleen whales were addressed in two projects, using whole-genome data of either an Icelandic fin whale (Balaenoptera physalus) population or multiple blue whale (Balaenoptera musculus) populations to evaluate the impact of the industrial whaling era on their molecular viability. The results suggest a substantial drop in the effective population size of both species, but this drop has left little trace in the genotypes of the fin whale population compared to the blue whale populations. In particular, the rare and short runs of homozygosity (ROH), which are usually indicative of inbreeding, suggest frequent outcrossing in fin whales, while all analyzed blue whale populations featured long and frequent ROH. In addition to these analyses, genome data of blue whale populations were used to evaluate whether northern-hemisphere blue whales have diverged into different subspecies. Population genetic and gene flow analyses showed clearly separated and well isolated populations, in accordance with their assumed geographical distance. In contrast, the genome-wide divergence between all blue whale populations was low compared to other cetacean populations and to the most closely related species, the sei whale. Because this includes the morphologically different and well recognized pygmy blue whale subspecies, a proposal was made to equally categorize the two northern-hemisphere blue whale populations as subspecies.
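The concept of a run of homozygosity can be made concrete with a deliberately simplified Python sketch: it scans genotyped sites along one chromosome and reports stretches without any heterozygous site that span a minimum length. The function name `find_roh`, the input layout and the 1 Mb default threshold are assumptions; dedicated ROH tools additionally tolerate occasional heterozygous calls and account for marker density, which this sketch does not.

```python
def find_roh(sites, min_length=1_000_000):
    """Find runs of homozygosity along one chromosome.

    sites: list of (position, is_heterozygous) tuples sorted by position.
    A run starts at the first homozygous site after a heterozygous one and
    ends at the last homozygous site before the next heterozygous one.
    Only runs spanning at least `min_length` bp are returned, as
    (start, end) tuples. The threshold is a hypothetical default.
    """
    runs = []
    run_start = None  # position of the first homozygous site in the run
    last_hom = None   # position of the most recent homozygous site
    for pos, is_het in sites:
        if is_het:
            # A heterozygous site closes the current run, if any.
            if run_start is not None and last_hom - run_start >= min_length:
                runs.append((run_start, last_hom))
            run_start = None
        else:
            if run_start is None:
                run_start = pos
            last_hom = pos
    # Close a run that extends to the end of the chromosome.
    if run_start is not None and last_hom - run_start >= min_length:
        runs.append((run_start, last_hom))
    return runs
```

Summaries such as the number of runs and their length distribution per individual are what the comparison between fin and blue whales above refers to.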
Evolutionary aspects were addressed in a third project by constructing the genome of the pygmy right whale (Caperea marginata) and testing its potential in phylogenetics and cancer research. Phylogenomic analyses using fragments of a whole-genome alignment featuring nearly all extant baleen whales allowed the revision of the complex evolutionary relationships of rorquals by quantifying and characterizing the amount of conflict in early diverging branches. These relationships were further used to identify phylogenetically independent pairs of baleen whales with maximally divergent body sizes in order to compare rates of positive selection between their genomes. The results suggest nearly evenly distributed frequencies of alternative topologies, which supports the representation of the early divergence of rorquals as a hard polytomy with high amounts of introgression and incomplete lineage sorting. Within the set of available genomic data, three independent pairs of baleen whales with diverging body sizes were found, and comparisons of positive selection rates resulted in many genes potentially related to body size and cancer. The lack of conserved selection patterns, however, suggests a more convergent evolution of size and cancer resistance, as previously discussed in paleontology.
In conclusion, the application of whole-genome data using methods of conservation genetics allowed for a comprehensive assessment of the molecular viability of blue and fin whales as well as of the taxonomic status of northern-hemisphere blue whale populations. The rather different results for blue and fin whales underline the importance of genomic monitoring of baleen whales, because different species show rather different molecular consequences of their potentially varying depletions. Furthermore, as showcased for the northern-hemisphere blue whale, many important isolated populations of baleen whales may still be unknown to conservation management, and genome-wide comparisons will most likely help to overcome this under-classification problem. The application of whole-genome data in evolutionary research allowed the characterization of the complex patterns of molecular conflict within baleen whales, and especially rorquals, which will contribute to the still rather unclear understanding of their evolution. The molecular support found here for the idea of convergent evolution of gigantism in whales will further guide the search for molecular patterns responsible for Peto’s paradox.
Discrepancies between knockdown and knockout phenotypes in animal models have long been a perplexing phenomenon. Several mechanisms have been proposed to explain such observations, namely the toxicity or off-target effects of the knockdown reagents and, in certain cases, genetic robustness, an organism's ability to maintain its phenotype despite genetic perturbations. In addition to these explanations, transcriptional adaptation (TA), a phenomenon whereby a mutation in one gene leads to the transcriptional upregulation or downregulation of one or more other (adapting) genes, has recently been proposed as an alternative explanation for the paradox of conflicting knockdown and knockout phenotypes.
Since the discovery of TA in 2015, its precise mechanism has remained a subject of ongoing research. The majority of evidence suggests that mutant mRNA degradation plays a central role in TA. Epigenetic remodeling is also thought to play a role, as evidenced by an increase in active histone marks at the transcription start sites of the adapting genes. Whether mRNA degradation is indeed the key player in TA remains debated. Furthermore, it is still unknown how exactly TA develops, which adapting genes it targets, and whether genomic mutations that render mutant mRNA sensitive to degradation are required for TA to occur.
In the experiments described in this dissertation, I designed an inducible TA system in which TA can be triggered on demand and its effects on the cell’s transcriptome followed through time. I demonstrated that degradation-prone transgenes, once induced and expressed, can be efficiently degraded, resulting in the upregulation of adapting genes via TA independently of protein loss. Adapting genes with a higher degree of sequence similarity become upregulated faster than genes with a lower degree of sequence similarity. Further use of this approach to study TA is limited by the leakiness of the inducible gene expression system; however, constitutively expressed degradation-prone transgenes were used to demonstrate TA in human cells.
In addition, I have developed an approach to target wild-type cytoplasmic mRNAs without altering the cell’s genome and reported a TA-like phenomenon, which manifested as adapting gene upregulation not relying on mutations in other genes. Cytoplasmic mRNA cleavage with CRISPR-Cas13d triggered a TA-like response in three different gene models: Actg1 knockdown, Ctnna1 knockdown, and Nckap1 knockdown. After comparing two different modes of triggering TA, CRISPR-Cas9 knockout versus CRISPR-Cas13d knockdown, I reported little overlap between the dysregulated genes and suggested that diverse mRNA degradation modes led to distinct TA responses. In addition, the transcriptional increase of Actg2 caused by CRISPR-Cas13d-mediated Actg1 mRNA cleavage did not require chromatin accessibility changes.
The experiments and genetic tools described in this dissertation were used to investigate how TA develops from its earliest onset and how it affects the global transcriptome of the cell, and they provided compelling evidence for a TA mechanism centered on mRNA degradation. I created tools to study both direct and indirect TA gene targets and gained important insights into the temporal dynamics of TA. Genes with higher sequence similarity were found to be upregulated more rapidly than those with lower similarity. Furthermore, the epigenetic properties of TA responses were found to vary depending on the triggering mechanism. Cas13d-mediated degradation of wild-type mRNAs led to immediate transcriptional enhancement independent of epigenetic changes, in contrast to previously measured alterations in chromatin accessibility in CRISPR-Cas9 mutants. This research has thus advanced our knowledge of TA and provided tools and findings that contribute to the broader understanding of gene expression regulation in response to mRNA degradation.
The formation of blood vessels is essential for vertebrate development and homeostasis, and endothelial cell specification is an important first step in this process. The earliest known event in endothelial cell specification in zebrafish is the expression of the bHLH-PAS transcription factor gene npas4l. I generated a transgenic V5 line to detect tagged Npas4l at the protein level and a Gal4-VP16 reporter line to visualize and track npas4l-expressing cells in vivo. Both lines can be detected from early developmental stages onward and also complement strong npas4l mutant alleles. To track npas4l reporter-expressing cells in npas4l mutants, I subsequently generated a mutated variant of the Gal4 reporter line. This mutant carries an insertion in the region encoding the DNA-binding domain, which disrupts Npas4l function but not reporter expression. This mutated reporter allele fails to complement npas4l mutants and shows a strong phenotype, indicating that it is a functional null allele. Phenotypic analyses showed that npas4l reporter-positive cells in npas4l mutants fail to specify or to migrate to the midline. Instead, they contribute to the pronephric tubules, which derive from the intermediate mesoderm, and to the skeletal muscle, which derives from the paraxial mesoderm. I confirmed these phenotypes by single-cell RNA-seq on npas4l reporter-positive cells from npas4l+/- and npas4l-/- embryos. Together, these two alternative cell fates explain the majority of the observed changes between the genotypes. Npas4l is known to promote the expression of the three transcription factor genes etsrp, tal1, and lmo2. I hypothesized that the absence of each of these transcription factors in npas4l mutants causes different aspects of the npas4l phenotype.
Therefore, I generated mutant lines for all three genes and analyzed them in vascular reporter lines as well as in the npas4l reporter background. The data suggest that different genes regulate distinct processes during early endothelial development. In npas4l-/- and etsrp-/- embryos, npas4l reporter-expressing cells do not differentiate into endothelial cells and instead contribute to the skeletal muscle cell population. In npas4l-/- and tal1-/- embryos, npas4l reporter-expressing cells fail to migrate and instead contribute to the formation of the pronephric tubules. To better understand the relationship between these factors, I tested whether injection of etsrp, tal1, or lmo2 mRNA would rescue different aspects of the npas4l phenotype. npas4l, etsrp, and tal1 mutants all display severe vascular phenotypes. However, some endothelial cells and vascular structures persist in each mutant. The phenotype is strongest in npas4l-/- embryos, but even in these embryos some fli1a-positive endothelial cells can be observed in the tail region. It was unclear whether this population of endothelial cells develops independently of Npas4l, Tal1, and Etsrp function or as a result of residual tal1 or etsrp expression that is independent of Npas4l. To address this question, I generated double mutants and looked for the presence of fli1a-positive endothelial cells in them. While fli1a-positive endothelial cells are clearly present in npas4l-/- and npas4l-/-;tal1-/- embryos, no such cells can be observed in npas4l-/-;etsrp-/- or etsrp-/-;tal1-/- embryos. These data indicate that no endothelial cells can develop in zebrafish when npas4l and etsrp, or etsrp and tal1, are disrupted at the same time.
While the loss of etsrp leads to more severe defects in npas4l mutants, the loss of tal1 causes no additional phenotype, suggesting that expression of etsrp, but not of tal1, can occur independently of Npas4l. This idea is supported by the observation that etsrp expression, but not tal1 expression, is detected in most fli1a-expressing cells in npas4l-/- embryos. Nevertheless, the majority of this expression is regulated by Npas4l. tal1 mRNA injections were sufficient to restore wild-type-like vascular patterning in the abdominal region of npas4l-/- embryos, including the rescue of both cell migration and differentiation. Because Npas4l has several distinct transcriptional effectors, such a strong rescue by only one of these effectors was unexpected. In the rescued mutants, the bilateral population of npas4l reporter-positive pronephric tubule cells was not detected, but the number of ectopic npas4l reporter-expressing muscle cells was unchanged compared with uninjected npas4l mutants.
...
RNA modification is a dynamic and complex process that involves the addition of various chemical groups to RNA molecules, contributing to their diversity and functional complexity. Among all RNA modifications, N6-methyladenosine (m6A) is the most common internal modification found in eukaryotic mRNA. It involves methylation of the adenosine base at the nitrogen-6 position. This modification plays a crucial role in many aspects of RNA metabolism, including splicing, stability, translation, and the cellular response to stress. With the development of m6A sequencing technologies, our knowledge of m6A has evolved rapidly over the past two decades. However, one of the most widely used m6A profiling techniques, termed “m6A individual-nucleotide resolution UV cross-linking and immunoprecipitation (miCLIP)”, suffers from a high nonspecific background signal due to the limited binding specificity of the antibody.
To accurately discriminate m6A sites from the background signal in miCLIP data, in Chapter 4, I first developed different strategies to identify true miCLIP signal changes that are corrected for the underlying changes in transcript abundance. I performed this analysis on data generated with an improved experimental protocol, named miCLIP2. With the best-performing strategy, the bin-based method, I detected more than 10,000 genuine m6A sites. I then used the information embedded in these genuine m6A sites to train a machine learning model, named "m6Aboost", which enables accurate m6A site detection from miCLIP2 data without a control dataset from an m6A-depleted cell line. To allow easy access for future users, I packaged the m6Aboost model into an R package that is available on Bioconductor.
Although previous studies have reported that m6A is involved in three different RNA decay pathways, it remains unclear how a pathway is selected for a specific transcript or m6A site. In Chapter 5, I reveal that m6A sites in the coding sequence (CDS) induce a stronger and faster RNA decay than the m6A sites in the 3’ untranslated region (3’UTR). Through an in-depth investigation, I found that m6A sites in CDS trigger a novel mRNA decay pathway, which I termed CDS-m6A decay (CMD). Importantly, CMD is distinct from the three previously reported m6A-mediated decay pathways. In terms of its mechanism, CMD relies on translation, where m6A sites in the CDS lead to ribosome pausing and subsequent destabilization of the transcript. The transcripts targeted by CMD are identified by the m6A reader protein YTHDF2, preferentially localized to processing bodies (P-bodies), and undergo degradation facilitated by the decapping factor DCP2. CMD provides a flexible way to control the expression of CDS m6A-containing transcripts which include many developmental regulators and retrogenes.
In summary, this PhD thesis introduces a novel workflow for identifying m6A sites in miCLIP data through the implementation of the m6Aboost machine learning model. Using the m6A sites identified by m6Aboost and additional data, a newly uncovered m6A-mediated mRNA decay pathway, CMD, is elucidated, providing valuable insights into m6A-mediated decay processes.