Functional genomics studies in model organisms and human cell lines have provided important insights into gene functions and their context-dependent roles in genetic circuits. However, our functional understanding of many of these genes, and of how they combinatorially regulate key biological processes, remains limited. To enable the SpCas9-dependent mapping of gene-gene interactions in human cells, we established 3Cs multiplexing for the generation of combinatorial gRNA libraries in a distribution-unbiased manner and demonstrate its robust performance. The optimal number for combinatorial hit calling was 16 gRNA pairs, and the skew of a library’s distribution was identified as a critical parameter dictating experimental scale and data quality. Our approach enabled us to investigate 247,032 gRNA pairs targeting 12,736 gene interactions in human autophagy. We identified novel genes essential for autophagy and provide experimental evidence that gene-associated categories of phenotypic strength exist in autophagy. Furthermore, circuits of autophagy gene interactions reveal redundant nodes driven by paralogous genes. Our combinatorial 3Cs approach is broadly suitable for investigating unexpected gene-interaction phenotypes in unperturbed and diseased cell contexts.
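The abstract does not say how library skew was measured. As a minimal sketch, assuming the common convention of dividing the 90th-percentile read count by the 10th-percentile read count, a skew ratio for a gRNA library could be computed as follows. The function names, the percentile cut-offs, and the example counts are illustrative assumptions, not values taken from the study.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile of a non-empty list; q is an integer in 1..100."""
    ordered = sorted(values)
    # Integer ceiling of q * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (q * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def skew_ratio(read_counts, upper=90, lower=10):
    """Ratio of the upper to the lower percentile of per-gRNA read counts.

    Assumption: the 90/10 ratio is one common convention; the study may
    have used a different definition of skew.
    """
    low = percentile(read_counts, lower)
    if low == 0:
        # At least the lowest-ranked constructs are absent from the library.
        return math.inf
    return percentile(read_counts, upper) / low


# Hypothetical read counts for two small libraries.
uniform_library = [100] * 10
skewed_library = [10, 20, 50, 100, 100, 100, 200, 400, 800, 1000]

print(skew_ratio(uniform_library))
print(skew_ratio(skewed_library))
```

A ratio close to 1 indicates an even library. Larger values mean that under-represented gRNA pairs need more cells and deeper sequencing to be sampled adequately, which is one way to read the abstract's point about experimental scale.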
Motivation: Expert curation to differentiate between functionally diverged homologs and those that may still share a similar function routinely relies on the visual interpretation of domain architecture changes. However, the size of contemporary data sets integrating homologs from hundreds to thousands of species calls for alternative solutions. In principle, scoring schemes that evaluate domain architecture similarities can help to automate this procedure. But existing schemes are often too simplistic in their similarity assessment, many require an a priori resolution of overlapping domain annotations, and those that allow overlaps in order to extend the set of annotation sources cannot account for redundant annotations. As a consequence, the gap between automated similarity scoring and similarity assessment based on visual architecture comparison is still too wide to make the integration of both approaches meaningful.
Results: Here, we present FAS, a scoring system for the comparison of multi-layered feature architectures integrating information from a broad spectrum of annotation sources. Feature architectures are represented as directed acyclic graphs, and redundancies are resolved in the course of comparison using a score maximization algorithm. A benchmark using more than 10,000 human-yeast ortholog pairs reveals that FAS consistently outperforms existing scoring schemes. Using three examples, we show how automated architecture similarity assessments can be routinely applied in the benchmarking of orthology assignment software, in the identification of functionally diverged orthologs, and in the identification of entries in protein collections that most likely stem from a faulty gene prediction.
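To illustrate the general idea of resolving redundant annotations by score maximization over a directed acyclic graph, the following sketch treats each annotated feature as a node, draws an edge from one feature to any feature that starts after it ends, and picks the highest-scoring path. This is not the FAS algorithm or its scoring function: the `Feature` type, the integer weights, and the feature names are hypothetical and chosen only to keep the example small.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str      # hypothetical feature label
    start: int     # first residue covered by the annotation
    end: int       # last residue covered by the annotation
    weight: float  # assumed per-feature score contribution


def best_path(features):
    """Return (score, names) of the highest-scoring chain of non-overlapping features.

    Sorting by start position gives a topological order of the implicit DAG,
    so a single dynamic-programming pass is enough.
    """
    ordered = sorted(features, key=lambda f: (f.start, f.end))
    best = []  # best[i] = (score, path) of the best path ending at ordered[i]
    for i, feat in enumerate(ordered):
        score, path = feat.weight, [feat.name]
        for j in range(i):
            prev = ordered[j]
            # An edge prev -> feat exists only if prev ends before feat starts.
            if prev.end < feat.start and best[j][0] + feat.weight > score:
                score = best[j][0] + feat.weight
                path = best[j][1] + [feat.name]
        best.append((score, path))
    return max(best, key=lambda sp: sp[0], default=(0.0, []))


# Hypothetical architecture with two redundant, overlapping annotations
# (domain_A vs. domain_A_alt) and two overlapping C-terminal features.
architecture = [
    Feature("domain_A", 1, 100, 5),
    Feature("domain_A_alt", 10, 90, 3),
    Feature("tm_helix", 120, 140, 2),
    Feature("domain_B", 130, 200, 4),
]

print(best_path(architecture))
```

In this toy case the overlapping `domain_A_alt` and `tm_helix` annotations are dropped because the path through `domain_A` and `domain_B` scores higher. A real comparison would score paths against a second protein's architecture rather than use fixed weights.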
Orthologs document the evolution of genes and metabolic capacities encoded in extant and ancient genomes. Orthologous genes that are detected across the full diversity of contemporary life allow the reconstruction of the gene set of LUCA, the last universal common ancestor. These genes presumably represent the functional repertoire common to – and necessary for – all living organisms. The design of artificial life has the potential to test this. Recently, a minimal gene (MG) set for a self-replicating cell was determined experimentally, and a surprisingly high number of its genes have unknown functions and are not represented in LUCA. However, as similarity between orthologs decays with time, it becomes insufficient to infer common ancestry, leaving ancient gene set reconstructions incomplete and distorted to an unknown extent. Here we introduce evolutionary traceability, together with the software protTrace, which quantifies, for each protein, the evolutionary distance beyond which the sensitivity of the ortholog search becomes limiting. We show that the LUCA set comprises only highly traceable proteins, most of which have catalytic functions. We further show that proteins in the MG set lacking orthologs outside bacteria mostly have low traceability, leaving open whether their eukaryotic orthologs have simply been overlooked. Using the example of REC8, a protein essential for chromosome cohesion, we demonstrate how a traceability-informed adjustment of the search sensitivity identifies hitherto missed orthologs in the fast-evolving microsporidia. Taken together, evolutionary traceability helps to differentiate between true absence and non-detection of orthologs, and thus improves our understanding of the evolutionary conservation of functional protein networks.