004 Data Processing; Computer Science
The expanding field of epitranscriptomics might rival the epigenome in the diversity of biological processes impacted. In recent years, the development of new high-throughput experimental and computational techniques has been a key driving force in discovering the properties of RNA modifications. Machine learning applications, such as for classification, clustering or de novo identification, have been critical in these advances. Nonetheless, various challenges remain before the full potential of machine learning for epitranscriptomics can be leveraged. In this review, we provide a comprehensive survey of machine learning methods to detect RNA modifications using diverse input data sources. We describe strategies to train and test machine learning methods and to encode and interpret features that are relevant for epitranscriptomics. Finally, we identify some of the current challenges and open questions about RNA modification analysis, including the ambiguity in predicting RNA modifications in transcript isoforms or in single nucleotides, or the lack of complete ground truth sets to test RNA modifications. We believe this review will inspire and benefit the rapidly developing field of epitranscriptomics in addressing the current limitations through the effective use of machine learning.
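As a hypothetical illustration of the feature-encoding step discussed in the review, the following sketch one-hot encodes a short RNA sequence window, a common input representation for modification-site classifiers. The window length and motif are illustrative assumptions, not taken from any specific tool.

```python
# Hypothetical illustration of feature encoding for epitranscriptomics ML:
# one-hot encoding of a short RNA sequence window. Window length and motif
# are illustrative assumptions, not from any specific tool in the review.

ALPHABET = "ACGU"

def one_hot(window):
    """Encode an RNA window as a (len(window) x 4) binary matrix."""
    return [[1 if base == b else 0 for b in ALPHABET]
            for base in window.upper()]

# A 5-nt window centred on a putative m6A site.
features = one_hot("GGACU")
```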
We investigate the impact of non-Hermiticity on the thermodynamic properties of interacting fermions by examining bilinear extensions to the 3+1 dimensional SU(2)-symmetric Nambu--Jona-Lasinio (NJL) model of quantum chromodynamics at finite temperature and chemical potential. The system is modified through the anti-PT-symmetric pseudoscalar bilinear ψ¯γ5ψ and the PT-symmetric pseudovector bilinear iBνψ¯γ5γνψ, introduced with a coupling g. Beyond the possibility of dynamical fermion mass generation at finite temperature and chemical potential, our findings establish model-dependent changes in the position of the chiral phase transition and the critical end-point. These are tunable with respect to g in the former case, and both g and |B|/B0 in the latter case, for both lightlike and spacelike fields. Moreover, the behavior of the quark number, entropy, pressure, and energy densities signal a potential fermion or antifermion excess compared to the standard NJL model, due to the pseudoscalar and pseudovector extension respectively. In both cases regions with negative interaction measure I=ϵ−3p are found. Future indications of such behaviors in strongly interacting fermion systems, for example in the context of neutron star physics, may point toward the presence of non-Hermitian contributions. These trends provide a first indication of curious potential mechanisms for producing non-Hermitian baryon asymmetry. In addition, the formalism described in this study is expected to apply more generally to other Hamiltonians with four-fermion interactions and thus the effects of the non-Hermitian bilinears are likely to be generic.
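Schematically, the extensions enter as additional bilinears in the NJL Lagrangian. The following LaTeX sketch uses the standard SU(2) NJL form and is only an illustration consistent with the abstract; signs, normalizations, and conventions may differ from the paper:

```latex
\mathcal{L} = \bar{\psi}\left(i\slashed{\partial} - m_0\right)\psi
  + G\left[(\bar{\psi}\psi)^2 + (\bar{\psi}\, i\gamma_5 \vec{\tau}\, \psi)^2\right]
  + \underbrace{g\,\bar{\psi}\gamma_5\psi}_{\text{anti-}PT\text{-symmetric}}
  \quad\text{or}\quad
  + \underbrace{i g\, B_\nu\, \bar{\psi}\gamma_5\gamma^\nu\psi}_{PT\text{-symmetric}}
```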
Manipulation of neuronal or muscular activity by optogenetics or other stimuli can be directly linked to the analysis of Caenorhabditis elegans (C. elegans) body length. WormRuler was therefore developed as an open-source video analysis toolbox that offers video processing and data analysis in one application. Using this novel tool, the super red-shifted channelrhodopsin variant ChrimsonSA was characterized in C. elegans. Expression and activation of ChrimsonSA in GABAergic motor neurons results in their depolarization and therefore in elongation of the body, the extent of which provides information about the strength of neuronal transmission.
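The body-length readout can be sketched as follows. This is a minimal illustration of baseline normalization, not WormRuler's actual code, and the frame counts and values are assumptions:

```python
# Minimal illustration (not WormRuler's actual code) of the body-length
# readout: each measurement is normalized to the mean length over a
# pre-stimulus baseline window, so elongation after optogenetic activation
# appears as a relative increase. Frame counts and values are assumed.

def relative_length(lengths, baseline_frames):
    """Express body length relative to the pre-stimulus baseline mean."""
    baseline = sum(lengths[:baseline_frames]) / baseline_frames
    return [l / baseline for l in lengths]

# Toy trace (pixels): light on after frame 3 causes ~4 % elongation.
trace = [100.0, 101.0, 99.0, 104.0, 104.0]
norm = relative_length(trace, baseline_frames=3)
```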
Cone photoreceptor cells are wavelength-sensitive neurons in the retinas of vertebrate eyes and are responsible for color vision. The spatial distribution of these nerve cells is commonly referred to as the cone photoreceptor mosaic. By applying the principle of maximum entropy, we demonstrate the universality of retinal cone mosaics in vertebrate eyes by examining various species, namely, rodent, dog, monkey, human, fish, and bird. We introduce a parameter called retinal temperature, which is conserved across the retinas of vertebrates. The virial equation of state for two-dimensional cellular networks, known as Lemaître’s law, is also obtained as a special case of our formalism. We investigate the behavior of several artificially generated networks and the natural one of the retina concerning this universal, topological law.
Highlights:
• Assessment of body composition parameters in a large cohort of patients with HCC undergoing TACE.
• Fully automated artificial intelligence-based quantitative 3D volumetry of abdominal cavity tissue composition.
• Skeletal muscle volume and related parameters were independent prognostic factors in patients with HCC undergoing TACE.
Background & Aims: Body composition assessment (BCA) parameters have recently been identified as relevant prognostic factors for patients with hepatocellular carcinoma (HCC). Herein, we aimed to investigate the role of BCA parameters for prognosis prediction in patients with HCC undergoing transarterial chemoembolization (TACE).
Methods: This retrospective multicenter study included a total of 754 treatment-naïve patients with HCC who underwent TACE at six tertiary care centers between 2010 and 2020. Fully automated artificial intelligence-based quantitative 3D volumetry of abdominal cavity tissue composition was performed to assess skeletal muscle volume (SM), total adipose tissue (TAT), intra- and intermuscular adipose tissue, visceral adipose tissue, and subcutaneous adipose tissue (SAT) on pre-intervention computed tomography scans. BCA parameters were normalized to the slice number of the abdominal cavity. We assessed the influence of BCA parameters on median overall survival and performed multivariate analysis including established estimates of survival.
Results: Univariate survival analysis revealed that impaired median overall survival was predicted by low SM (p <0.001), high TAT volume (p = 0.013), and high SAT volume (p = 0.006). In multivariate survival analysis, SM remained an independent prognostic factor (p = 0.039), while TAT and SAT volumes no longer showed predictive ability. This predictive role of SM was confirmed in a subgroup analysis of patients with BCLC stage B.
Conclusions: SM is an independent prognostic factor for survival prediction. Thus, the integration of SM into novel scoring systems could potentially improve survival prediction and clinical decision-making. Fully automated approaches are needed to foster the implementation of this imaging biomarker into daily routine.
Impact and implications: Body composition assessment parameters, especially skeletal muscle volume, have been identified as relevant prognostic factors for many diseases and treatments. In this study, skeletal muscle volume has been identified as an independent prognostic factor for patients with hepatocellular carcinoma undergoing transarterial chemoembolization. Therefore, skeletal muscle volume as a metaparameter could play a role as an opportunistic biomarker in holistic patient assessment and be integrated into decision support systems. Workflow integration with artificial intelligence is essential for automated, quantitative body composition assessment, enabling broad availability in multidisciplinary case discussions.
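The slice-number normalization described in the Methods can be sketched as follows; the parameter names and values are illustrative, not taken from the study's actual pipeline:

```python
# Hedged sketch of the normalization step described in the Methods:
# dividing each volumetric BCA parameter by the number of axial slices
# covering the abdominal cavity, making patients of different body heights
# comparable. Parameter names and values are illustrative only.

def normalize_bca(volumes_ml, n_slices):
    """Return slice-normalized body-composition volumes (ml per slice)."""
    return {name: vol / n_slices for name, vol in volumes_ml.items()}

patient = {"SM": 2400.0, "TAT": 9000.0, "SAT": 4200.0}  # hypothetical ml
normalized = normalize_bca(patient, n_slices=120)
```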
Graph4Med: a web application and a graph database for visualizing and analyzing medical databases
(2022)
Background: Medical databases normally contain large amounts of data in a variety of forms. Although they grant significant insights into diagnosis and treatment, data exploration is difficult to implement on top of current medical databases, since these are often based on a relational schema from which information for cohort analysis and visualization cannot easily be extracted. As a consequence, valuable information regarding cohort distribution or patient similarity may be missed. With the rapid advancement of biomedical technologies, new forms of data from methods such as Next Generation Sequencing (NGS) or chromosome microarray (array CGH) are constantly being generated; hence the amount and complexity of medical data can be expected to rise, pushing relational database systems to their limits.
Description: We present Graph4Med, a web application that relies on a graph database obtained by transforming a relational database. Graph4Med provides a straightforward visualization and analysis of a selected patient cohort. Our use case is a database of pediatric Acute Lymphoblastic Leukemia (ALL). Alongside routine patient health records, it also contains results from recent technologies such as NGS. We developed a suitable graph data schema to convert the relational data into a graph data structure and store it in Neo4j. We used NeoDash to build a dashboard for querying and displaying patient cohort analyses. This way, our tool (1) quickly displays an overview of cohort information such as the distributions of gender, age, mutations (fusions), and diagnoses; (2) provides mutation (fusion)-based similarity search and display in a maneuverable graph; and (3) generates an interactive graph for any selected patient and facilitates the identification of interesting patterns among patients.
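The relational-to-graph conversion can be illustrated schematically. The schema, labels, and fusion names below are assumptions for illustration, not Graph4Med's actual data model:

```python
# Schematic sketch of a relational-to-graph conversion of the kind behind
# Graph4Med. Schema, labels, and fusion names are illustrative assumptions.
# Each relational row becomes a node and each foreign-key reference a
# relationship, ready for import into Neo4j.

patients = [{"id": 1, "sex": "f", "age": 7},
            {"id": 2, "sex": "m", "age": 5}]
findings = [{"patient_id": 1, "fusion": "ETV6-RUNX1"},
            {"patient_id": 2, "fusion": "ETV6-RUNX1"}]

nodes, rels = [], []
for p in patients:
    nodes.append(("Patient", p))
for f in findings:
    # In Neo4j a MERGE would deduplicate the shared fusion node.
    nodes.append(("Fusion", {"name": f["fusion"]}))
    rels.append((f["patient_id"], "HAS_FUSION", f["fusion"]))

# Patients sharing a fusion node are one hop apart, which is what makes
# mutation (fusion)-based similarity search cheap, e.g. in Cypher:
#   MATCH (a:Patient)-[:HAS_FUSION]->(x)<-[:HAS_FUSION]-(b:Patient)
#   WHERE a <> b RETURN b
```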
Conclusion: We demonstrate the feasibility and advantages of a graph database for storing and querying medical databases. Our dashboard allows a fast and interactive analysis and visualization of complex medical data. It is especially useful for patient similarity search based on mutations (fusions), for which vast amounts of data have been generated by NGS in recent years. It can reveal relationships and patterns in patient cohorts that are normally hard to grasp. Expanding Graph4Med to more medical databases will bring novel insights into diagnostics and research.
Background: Prostate cancer is a major health concern in aging men. Paralleling an aging society, prostate cancer prevalence increases, emphasizing the need for efficient diagnostic algorithms.
Methods: Retrospectively, 106 prostate tissue samples from 48 patients (mean age, 66 ± 6.6 years) were included in the study. Patients suffered from prostate cancer (n = 38) or benign prostatic hyperplasia (n = 10) and were treated with radical prostatectomy or Holmium laser enucleation of the prostate, respectively. We constructed tissue microarrays (TMAs) comprising representative malignant (n = 38) and benign (n = 68) tissue cores. TMAs were processed to histological slides, stained, digitized, and assessed for the applicability of machine learning strategies and open-source tools in the diagnosis of prostate cancer. We applied the software QuPath to extract features for shape, stain intensity, and texture of TMA cores for three stainings: H&E, ERG, and PIN-4. Three machine learning algorithms, neural network (NN), support vector machine (SVM), and random forest (RF), were trained and cross-validated with 100 Monte Carlo random splits into a 70% training set and a 30% test set. We determined AUC values for single color channels, with and without optimization of hyperparameters by exhaustive grid search. We applied recursive feature elimination to feature sets of multiple color transforms.
Results: Mean AUC was above 0.80. PIN-4 stainings yielded a higher AUC than H&E and ERG. For PIN-4 with the color transform saturation, NN, RF, and SVM achieved AUCs of 0.93 ± 0.04, 0.91 ± 0.06, and 0.92 ± 0.05, respectively. Optimization of hyperparameters improved the AUC only slightly, by 0.01. For H&E, feature selection resulted in no increase in AUC, but it led to an increase of 0.02–0.06 for ERG and PIN-4.
Conclusions: Automated pipelines may be able to discriminate with high accuracy between malignant and benign tissue. We found PIN-4 staining best suited for classification. Further bioinformatic analysis of larger data sets would be crucial to evaluate the reliability of automated classification methods for clinical practice and to evaluate their potential to discriminate the aggressiveness of cancer, paving the way to automated precision medicine.
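The validation scheme described in the Methods (100 Monte Carlo random 70/30 splits, scored by AUC) can be sketched as follows. The stand-in "model" simply uses a single feature value as the malignancy score, whereas the study trained NN, SVM, and RF on QuPath-derived features:

```python
import random

# Sketch of the validation scheme described above: 100 Monte Carlo random
# 70/30 splits, scored by a rank-based AUC. The "model" is a stand-in that
# uses a single feature value directly as the malignancy score; the study
# itself trained NN, SVM, and RF on QuPath-derived features.

def auc(scores, labels):
    """Rank-based AUC: probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def monte_carlo_auc(scores, labels, n_splits=100, train_frac=0.7, seed=0):
    """Mean AUC over repeated random 70/30 splits (Monte Carlo CV)."""
    rng = random.Random(seed)
    idx = list(range(len(scores)))
    aucs = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        test = idx[int(train_frac * len(idx)):]
        ys = [labels[i] for i in test]
        if 0 < sum(ys) < len(ys):  # need both classes to compute AUC
            aucs.append(auc([scores[i] for i in test], ys))
    return sum(aucs) / len(aucs)

# Toy, perfectly separable data: benign cores score low, malignant high.
feature = list(range(20))
label = [0] * 10 + [1] * 10
mean_auc = monte_carlo_auc(feature, label)
```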
Unified probabilistic deep continual learning through generative replay and open set recognition
(2022)
Modern deep neural networks are well known to be brittle in the face of unknown data instances, and recognition of the latter remains a challenge. Although it is inevitable for continual-learning systems to encounter such unseen concepts, the corresponding literature appears to nonetheless focus primarily on alleviating catastrophic interference with learned representations. In this work, we introduce a probabilistic approach that connects these perspectives based on variational inference in a single deep autoencoder model. Specifically, we propose to bound the approximate posterior by fitting regions of high density on the basis of correctly classified data points. These bounds are shown to serve a dual purpose: unseen, unknown out-of-distribution data can be distinguished from already trained, known tasks, enabling robust application. Simultaneously, to retain already acquired knowledge, a generative replay process can be narrowed to strictly in-distribution samples, which significantly alleviates catastrophic interference.
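A minimal sketch of the core idea, under strong simplifying assumptions: a one-dimensional latent space and a Euclidean distance bound stand in for the paper's deep variational autoencoder and its approximate posterior:

```python
# Conceptual sketch (not the paper's actual model): bound each class's
# region of latent space using distances of correctly classified training
# points to the class mean, then reject inputs outside the bound as
# out-of-distribution. The 1-D latent space and the 0.95 quantile are
# simplifying assumptions; the paper works with a deep VAE's posterior.

def fit_class_bound(latents, quantile=0.95):
    """Return (class mean, distance threshold covering `quantile` of
    the correctly classified training points)."""
    mean = sum(latents) / len(latents)
    dists = sorted(abs(z - mean) for z in latents)
    return mean, dists[int(quantile * (len(dists) - 1))]

def is_in_distribution(z, mean, bound):
    """Accept z as known/in-distribution if it falls inside the bound;
    otherwise treat it as unseen and exclude it from generative replay."""
    return abs(z - mean) <= bound

mean, bound = fit_class_bound([0.0, 0.1, -0.1, 0.05, -0.05])
```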
Detailed feedback on exercises helps learners become proficient but is time-consuming for educators and, thus, hardly scalable. This manuscript evaluates how well Generative Artificial Intelligence (AI) provides automated feedback on complex multimodal exercises requiring coding, statistics, and economic reasoning. Besides providing this technology through an easily accessible web application, this article evaluates the technology’s performance by comparing the quantitative feedback (i.e., points achieved) from Generative AI models with human expert feedback for 4,349 solutions to marketing analytics exercises. The results show that automated feedback produced by Generative AI (GPT-4) provides almost unbiased evaluations while correlating highly with (r = 0.94) and deviating only 6 % from human evaluations. GPT-4 performs best among seven Generative AI models, albeit at the highest cost. Comparing the models’ performance with costs shows that GPT-4, Mistral Large, Claude 3 Opus, and Gemini 1.0 Pro dominate three other Generative AI models (Claude 3 Sonnet, GPT-3.5, and Gemini 1.5 Pro). Expert assessment of the qualitative feedback (i.e., the AI’s textual response) indicates that it is mostly correct, sufficient, and appropriate for learners. A survey of marketing analytics learners shows that they highly recommend the app and its Generative AI feedback. An advantage of the app is its subject-agnosticism—it does not require any subject- or exercise-specific training. Thus, it is immediately usable for new exercises in marketing analytics and other subjects.
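The agreement metrics reported above (correlation with and deviation from human evaluations) can be sketched on toy data; the scores below are illustrative, not the study's 4,349 graded solutions:

```python
# Sketch of the agreement metrics reported above, on toy data: Pearson
# correlation between AI- and human-assigned points, plus mean relative
# deviation. The scores are illustrative, not the study's real solutions.

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

human = [8.0, 5.0, 10.0, 3.0, 7.0]   # hypothetical human-graded points
ai = [7.5, 5.5, 10.0, 3.5, 6.5]      # hypothetical AI-graded points
r = pearson(human, ai)
mean_dev = sum(abs(a - h) for a, h in zip(ai, human)) / sum(human)
```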