Background: Long sequencing reads allow increasing contiguity and completeness of fragmented, short-read–based genome assemblies by closing assembly gaps, ideally at high accuracy. While several gap-closing methods have been developed, these methods often close an assembly gap with sequence that does not accurately represent the true sequence.
Findings: Here, we present DENTIST, a sensitive, highly accurate, and automated pipeline to close gaps in short-read assemblies with long error-prone reads. DENTIST comprehensively determines repetitive assembly regions to identify reliable and unambiguous alignments of long reads to the correct loci, integrates a consensus sequence computation step to obtain a high base accuracy for the inserted sequence, and validates the accuracy of closed gaps. Unlike previous benchmarks, we generated test assemblies that have gaps at the exact positions where real short-read assemblies have gaps. Generating such realistic benchmarks for Drosophila (134 Mb genome), Arabidopsis (119 Mb), hummingbird (1 Gb), and human (3 Gb) and using simulated or real PacBio continuous long reads, we show that DENTIST consistently achieves substantially higher accuracy than previous methods while maintaining similar sensitivity.
Conclusion: DENTIST provides an accurate approach to improve the contiguity and completeness of fragmented assemblies with long reads. DENTIST's source code including a Snakemake workflow, conda package, and Docker container is available at https://github.com/a-ludi/dentist. All test assemblies as a resource for future benchmarking are at https://bds.mpi-cbg.de/hillerlab/DENTIST/.
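The benchmark idea described in the Findings, placing gaps at predetermined positions in an otherwise complete sequence, can be illustrated with a minimal sketch. This is not DENTIST's code or the authors' benchmark procedure: the function name `mask_gaps`, the half-open coordinate convention, and the toy sequence are assumptions made for illustration only.

```python
def mask_gaps(sequence, gaps, gap_char="N"):
    """Return a copy of `sequence` with each interval in `gaps` replaced by gap characters.

    `gaps` is an iterable of (start, end) pairs, 0-based and half-open.
    This coordinate convention is an assumption of this sketch.
    """
    chars = list(sequence)
    for start, end in gaps:
        if not 0 <= start < end <= len(chars):
            raise ValueError(f"gap ({start}, {end}) lies outside the sequence")
        # Overwrite the interval with the same number of gap characters,
        # so the coordinates of the flanking sequence are unchanged.
        chars[start:end] = gap_char * (end - start)
    return "".join(chars)


# Hypothetical toy input: a 12-base "reference" and two gap intervals.
reference = "ACGTACGTACGT"
test_assembly = mask_gaps(reference, [(2, 5), (8, 10)])
print(test_assembly)  # ACNNNCGTNNGT
```

Because the original bases under each gap are known, a sequence inserted by a gap closer can later be compared against them to judge accuracy.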
Non-standard errors
(2021)
In statistics, samples are drawn from a population in a data-generating process (DGP). Standard errors measure the uncertainty in sample estimates of population parameters. In science, evidence is generated to test hypotheses in an evidence-generating process (EGP). We claim that EGP variation across researchers adds uncertainty: non-standard errors. To study them, we let 164 teams test six hypotheses on the same sample. We find that non-standard errors are sizeable, on par with standard errors. Their size (i) co-varies only weakly with team merits, reproducibility, or peer rating, (ii) declines significantly after peer feedback, and (iii) is underestimated by participants.
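The comparison between the two kinds of uncertainty can be sketched numerically. The abstract does not state which dispersion measure the study uses, so this sketch assumes the sample standard deviation of point estimates across teams; the study itself may use a different measure. The function name and all numbers are hypothetical.

```python
import statistics


def dispersion_summary(estimates, standard_errors):
    """Compare across-team dispersion with the average reported standard error.

    `estimates` holds one point estimate per team for the same hypothesis,
    `standard_errors` the standard error each team reported.
    The sample standard deviation is an assumed dispersion measure.
    """
    across_team_dispersion = statistics.stdev(estimates)
    mean_standard_error = statistics.mean(standard_errors)
    return across_team_dispersion, mean_standard_error


# Hypothetical values for three teams testing one hypothesis on the same sample.
nse, mean_se = dispersion_summary([1.0, 2.0, 3.0], [0.5, 1.0, 1.5])
print(nse, mean_se)
```

If the two numbers are of similar magnitude, the variation introduced by researchers' analytical choices is comparable to the sampling uncertainty each team reports.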
Malignant germ cell tumors (GCT) are the most common malignant tumors in young men aged 18 to 40 years. The correct identification of histological subtypes, supported in difficult cases by immunohistochemistry, is essential for therapeutic management. Furthermore, biomarkers may help to understand pathophysiological processes in these tumor types. Two GCT cell lines, TCam-2 with seminoma-like characteristics and NTERA-2, an embryonal carcinoma-like cell line, were compared by a quantitative proteomic approach using high-resolution mass spectrometry (MS) in combination with stable isotope labelling by amino acids in cell culture (SILAC). We were able to identify 4856 proteins and quantify the expression of 3936. Of these, 347 were significantly differentially expressed between the two cell lines. For further validation, CD81, CBX-3, PHF6, and ENSA were analyzed by western blot, which confirmed the MS results. Immunohistochemical analysis of these proteins on 59 formalin-fixed and paraffin-embedded (FFPE) normal and GCT tissue samples (normal testis, GCNIS, seminomas, and embryonal carcinomas) demonstrated the ability to distinguish different GCT subtypes, especially seminomas and embryonal carcinomas. In addition, siRNA-mediated knockdown of these proteins resulted in an antiproliferative effect in TCam-2, NTERA-2, and an additional embryonal carcinoma-like cell line, NCCIT. In summary, this study represents a proteomic resource for the discrimination of malignant germ cell tumor subtypes, and the observed antiproliferative effect after knockdown of selected proteins paves the way for the identification of new potential drug targets.