Universitätspublikationen
Functional genomics studies in model organisms and human cell lines have provided important insights into gene functions and their context-dependent role in genetic circuits. However, our functional understanding of many of these genes and of how they combinatorially regulate key biological processes remains limited. To enable the SpCas9-dependent mapping of gene-gene interactions in human cells, we established 3Cs multiplexing for the generation of combinatorial gRNA libraries in a distribution-unbiased manner and demonstrate its robust performance. The optimal number for combinatorial hit calling was 16 gRNA pairs and the skew of a library’s distribution was identified as a critical parameter dictating experimental scale and data quality. Our approach enabled us to investigate 247,032 gRNA pairs targeting 12,736 gene interactions in human autophagy. We identified novel genes essential for autophagy and provide experimental evidence that gene-associated categories of phenotypic strengths exist in autophagy. Furthermore, circuits of autophagy gene interactions reveal redundant nodes driven by paralog genes. Our combinatorial 3Cs approach is broadly suitable to investigate unexpected gene-interaction phenotypes in unperturbed and diseased cell contexts.
Motivation: Expert curation to differentiate between functionally diverged homologs and those that may still share a similar function routinely relies on the visual interpretation of domain architecture changes. However, the size of contemporary data sets integrating homologs from hundreds to thousands of species calls for alternative solutions. Scoring schemes to evaluate domain architecture similarities can, in principle, help to automate this procedure. But existing schemes are often too simplistic in the similarity assessment, many require an a priori resolution of overlapping domain annotations, and those that allow overlaps to extend the set of annotation sources cannot account for redundant annotations. As a consequence, the gap between the automated similarity scoring and the similarity assessment based on visual architecture comparison is still too wide to make the integration of both approaches meaningful.
Results: Here, we present FAS, a scoring system for the comparison of multi-layered feature architectures integrating information from a broad spectrum of annotation sources. Feature architectures are represented as directed acyclic graphs, and redundancies are resolved in the course of comparison using a score maximization algorithm. A benchmark using more than 10,000 human-yeast ortholog pairs reveals that FAS consistently outperforms existing scoring schemes. Using three examples, we show how automated architecture similarity assessments can be routinely applied in the benchmarking of orthology assignment software, in the identification of functionally diverged orthologs, and in the identification of entries in protein collections that most likely stem from a faulty gene prediction.
Molluscs are the second most species-rich phylum in the animal kingdom, yet only 11 genomes of this group have been published so far. Here, we present the draft genome sequence of the pulmonate freshwater snail Radix auricularia. Six whole genome shotgun libraries with different layouts were sequenced. The resulting assembly comprises 4,823 scaffolds with a cumulative length of 910 Mb and an overall read coverage of 72×. The assembly contains 94.6% of a metazoan core gene collection, indicating an almost complete coverage of the coding fraction. The discrepancy of ∼690 Mb compared with the estimated genome size of R. auricularia (1.6 Gb) results from a high repeat content of 70% mainly comprising DNA transposons. The annotation of 17,338 protein coding genes was supported by the use of publicly available transcriptome data. This draft will serve as starting point for further genomic and population genetic research in this scientifically important phylum.
Combinatorial CRISPR-Cas screens have advanced the mapping of genetic interactions, but their experimental scale limits the number of targetable gene combinations. Here, we describe 3Cs multiplexing, a rapid and scalable method to generate highly diverse and uniformly distributed combinatorial CRISPR libraries. We demonstrate that the library distribution skew is the critical determinant of its required screening coverage. By circumventing iterative cloning of PCR-amplified oligonucleotides, 3Cs multiplexing facilitates the generation of combinatorial CRISPR libraries with low distribution skews. We show that combinatorial 3Cs libraries can be screened with minimal coverage, reducing associated efforts and costs at least 10-fold. We apply a 3Cs multiplexing library targeting 12,736 autophagy gene combinations with 247,032 paired gRNAs in viability and reporter-based enrichment screens. In the viability screen, we identify, among others, the synthetic lethal WDR45B-PIK3R4 and the proliferation-enhancing ATG7-KEAP1 genetic interactions. In the reporter-based screen, we identify over 1,570 essential genetic interactions for autophagy flux, including interactions among paralogous genes, namely ATG2A-ATG2B, GABARAP-MAP1LC3B and GABARAP-GABARAPL2. However, we observe only a few genetic interactions within paralogous gene families of more than two members, indicating functional compensation between them. This work establishes 3Cs multiplexing as a platform for genetic interaction screens at scale.
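As a rough illustration of the distribution skew mentioned in the abstract above, the sketch below computes a skew ratio from per-gRNA-pair read counts. It assumes the convention, common in pooled screening, of dividing the 90th percentile of read counts by the 10th percentile; the abstract does not state which definition the authors used, and the function names `percentile` and `skew_ratio` are hypothetical.

```python
def percentile(sorted_counts, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_counts:
        raise ValueError("need at least one read count")
    pos = (len(sorted_counts) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(sorted_counts) - 1)
    frac = pos - lower
    return sorted_counts[lower] * (1 - frac) + sorted_counts[upper] * frac


def skew_ratio(read_counts, high=90, low=10):
    """Ratio of the high to the low percentile of read counts.

    Assumption: 90th/10th percentile ratio as the skew measure; a perfectly
    uniform library gives 1.0, and larger values mean a more uneven library.
    """
    counts = sorted(read_counts)
    low_value = percentile(counts, low)
    if low_value == 0:
        # Too many dropouts to form a finite ratio.
        return float("inf")
    return percentile(counts, high) / low_value


if __name__ == "__main__":
    # Made-up counts for illustration only, not data from the study.
    example_counts = [80, 95, 100, 100, 105, 110, 120, 150, 200, 400]
    print(f"skew ratio: {skew_ratio(example_counts):.2f}")
```

Under this assumed definition, a library with a larger ratio has more underrepresented gRNA pairs, which is consistent with the abstract's point that skew drives the coverage a screen requires.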
Orthologs document the evolution of genes and metabolic capacities encoded in extant and ancient genomes. Orthologous genes that are detected across the full diversity of contemporary life allow reconstructing the gene set of LUCA, the last universal common ancestor. These genes presumably represent the functional repertoire common to – and necessary for – all living organisms. Design of artificial life has the potential to test this. Recently, a minimal gene (MG) set for a self-replicating cell was determined experimentally, and a surprisingly high number of genes have unknown functions and are not represented in LUCA. However, as similarity between orthologs decays with time, it becomes insufficient to infer common ancestry, leaving ancient gene set reconstructions incomplete and distorted to an unknown extent. Here, we introduce evolutionary traceability, together with the software protTrace, which quantifies, for each protein, the evolutionary distance beyond which the sensitivity of the ortholog search becomes limiting. We show that the LUCA set comprises only highly traceable proteins, most of which have catalytic functions. We further show that proteins in the MG set lacking orthologs outside bacteria mostly have low traceability, leaving open whether their eukaryotic orthologs have just been overlooked. Using REC8, a protein essential for chromosome cohesion, as an example, we demonstrate how a traceability-informed adjustment of the search sensitivity identifies hitherto missed orthologs in the fast-evolving microsporidia. Taken together, evolutionary traceability helps to differentiate between true absence and non-detection of orthologs, and thus improves our understanding of the evolutionary conservation of functional protein networks.
Accurate determination of the evolutionary relationships between genes is a foundational challenge in biology. Homology (evolutionary relatedness) is in many cases readily determined based on sequence similarity analysis. By contrast, determining whether two genes descended directly from a common ancestor by a speciation event (orthologs) or a duplication event (paralogs) is more challenging, yet provides critical information on the history of a gene. Since 2009, this task has been the focus of the Quest for Orthologs (QFO) Consortium. The sixth QFO meeting took place in Okazaki, Japan in conjunction with the 67th National Institute for Basic Biology conference. Here, we report recent advances, applications, and upcoming challenges that were discussed during the conference. Steady progress has been made toward standardization and scalability of new and existing tools. A feature of the conference was the presentation of a panel of accessible tools for phylogenetic profiling and several developments to bring orthology beyond the gene unit, from domains to networks. This meeting brought to light several challenges to come: leveraging orthology computations to get the most out of the incoming avalanche of genomic data, integrating orthology from domain to biological network levels, building better gene models, and adapting orthology approaches to the broad evolutionary and genomic diversity recognized in different forms of life and viruses.
Ribosome assembly is an essential and carefully choreographed cellular process. In eukaryotes, several hundred proteins, distributed across the nucleolus, nucleus, and cytoplasm, co-ordinate the step-wise assembly of four ribosomal RNAs (rRNAs) and approximately 80 ribosomal proteins (RPs) into the mature ribosomal subunits. Due to the inherent complexity of the assembly process, functional studies identifying ribosome biogenesis factors and, more importantly, their precise functions and interplay are confined to a few very well-established model organisms. Although the pathway is best characterized in yeast (Saccharomyces cerevisiae), emerging links to disease and the discovery of additional layers of regulation have recently encouraged deeper analysis of it in human cells. In archaea, ribosome biogenesis is less well understood. However, their simpler sub-cellular structure should allow a less elaborate assembly procedure, potentially providing insights into the functional essentials of ribosome biogenesis that evolved long before the diversification of archaea and eukaryotes. Here, we use a comprehensive phylogenetic profiling setup, integrating targeted ortholog searches with automated scoring of protein domain architecture similarities and an assessment of when search sensitivity becomes limiting, to trace 301 curated eukaryotic ribosome biogenesis factors across 982 taxa spanning the tree of life and including 727 archaea. We show that both factor loss and lineage-specific modifications of factor function modulate ribosome biogenesis, and we highlight that limited sensitivity of the ortholog search can confound evolutionary conclusions. Projecting into the archaeal domain, we find that only a few factors are consistently present across the analyzed taxa, and lineage-specific loss is common.
While members of the Asgard group are not special with respect to their inventory of ribosome biogenesis factors (RBFs), they unite the highest number of orthologs to eukaryotic RBFs in one taxon. Using large ribosomal subunit maturation as an example, we demonstrate that archaea pursue a simplified version of the corresponding steps in eukaryotes. Much of the complexity of this process evolved on the eukaryotic lineage by the duplication of ribosomal proteins and their subsequent functional diversification into ribosome biogenesis factors. This highlights that studying ribosome biogenesis in archaea provides fundamental information also for understanding the process in eukaryotes.
What is in Umbilicaria pustulata? A metagenomic approach to reconstruct the holo-genome of a lichen
(2020)
Lichens are valuable models in symbiosis research and promising sources of biosynthetic genes for biotechnological applications. Most lichenized fungi grow slowly, resist aposymbiotic cultivation, and are poor candidates for experimentation. Obtaining contiguous, high-quality genomes for such symbiotic communities is technically challenging. Here, we present the first assembly of a lichen holo-genome from metagenomic whole-genome shotgun data comprising both PacBio long reads and Illumina short reads. The nuclear genomes of the two primary components of the lichen symbiosis—the fungus Umbilicaria pustulata (33 Mb) and the green alga Trebouxia sp. (53 Mb)—were assembled at contiguities comparable to single-species assemblies. The analysis of the read coverage pattern revealed a relative abundance of fungal to algal nuclei of ∼20:1. Gap-free, circular sequences for all organellar genomes were obtained. The bacterial community is dominated by Acidobacteriaceae and encompasses strains closely related to bacteria isolated from other lichens. Gene set analyses showed no evidence of horizontal gene transfer from algae or bacteria into the fungal genome. Our data suggest a lineage-specific loss of a putative gibberellin-20-oxidase in the fungus, a gene fusion in the fungal mitochondrion, and a relocation of an algal chloroplast gene to the algal nucleus. Major technical obstacles during reconstruction of the holo-genome were coverage differences among individual genomes surpassing three orders of magnitude. Moreover, we show that GC-rich inverted repeats paired with nonrandom sequencing error in PacBio data can result in missing gene predictions. This likely poses a general problem for genome assemblies based on long reads.