Refine
Year of publication
Document Type
- Article (33)
Language
- English (33)
Has Fulltext
- yes (33)
Is part of the Bibliography
- no (33)
Keywords
- data science (5)
- Data science (4)
- artificial intelligence (4)
- digital medicine (4)
- Machine-learning (3)
- machine-learning (3)
- Biomedical informatics (2)
- Data processing (2)
- Functional clustering (2)
- Olfactory system (2)
Institute
- Medizin (30)
- Pharmazie (3)
- Biochemie und Chemie (1)
- Biochemie, Chemie und Pharmazie (1)
- Biowissenschaften (1)
Biomedical data obtained during cell experiments, laboratory animal research, or human studies often display a complex distribution. Statistical identification of subgroups in research data poses an analytical challenge. Here were introduce an interactive R-based bioinformatics tool, called “AdaptGauss”. It enables a valid identification of a biologically-meaningful multimodal structure in the data by fitting a Gaussian mixture model (GMM) to the data. The interface allows a supervised selection of the number of subgroups. This enables the expectation maximization (EM) algorithm to adapt more complex GMM than usually observed with a noninteractive approach. Interactively fitting a GMM to heat pain threshold data acquired from human volunteers revealed a distribution pattern with four Gaussian modes located at temperatures of 32.3, 37.2, 41.4, and 45.4 °C. Noninteractive fitting was unable to identify a meaningful data structure. Obtained results are compatible with known activity temperatures of different TRP ion channels suggesting the mechanistic contribution of different heat sensors to the perception of thermal pain. Thus, sophisticated analysis of the modal structure of biomedical data provides a basis for the mechanistic interpretation of the observations. As it may reflect the involvement of different TRP thermosensory ion channels, the analysis provides a starting point for hypothesis-driven laboratory experiments.
Background: Prevention of persistent pain following breast cancer surgery, via early identification of patients at high risk, is a clinical need. Supervised machine-learning was used to identify parameters that predict persistence of significant pain.
Methods: Over 500 demographic, clinical and psychological parameters were acquired up to 6 months after surgery from 1,000 women (aged 28–75 years) who were treated for breast cancer. Pain was assessed using an 11-point numerical rating scale before surgery and at months 1, 6, 12, 24, and 36. The ratings at months 12, 24, and 36 were used to allocate patents to either "persisting pain" or "non-persisting pain" groups. Unsupervised machine learning was applied to map the parameters to these diagnoses.
Results: A symbolic rule-based classifier tool was created that comprised 21 single or aggregated parameters, including demographic features, psychological and pain-related parameters, forming a questionnaire with "yes/no" items (decision rules). If at least 10 of the 21 rules applied, persisting pain was predicted at a cross-validated accuracy of 86% and a negative predictive value of approximately 95%.
Conclusions: The present machine-learned analysis showed that, even with a large set of parameters acquired from a large cohort, early identification of these patients is only partly successful. This indicates that more parameters are needed for accurate prediction of persisting pain. However, with the current parameters it is possible, with a certainty of almost 95%, to exclude the possibility of persistent pain developing in a woman being treated for breast cancer.
The use of artificial intelligence (AI) systems in biomedical and clinical settings can disrupt the traditional doctor–patient relationship, which is based on trust and transparency in medical advice and therapeutic decisions. When the diagnosis or selection of a therapy is no longer made solely by the physician, but to a significant extent by a machine using algorithms, decisions become nontransparent. Skill learning is the most common application of machine learning algorithms in clinical decision making. These are a class of very general algorithms (artificial neural networks, classifiers, etc.), which are tuned based on examples to optimize the classification of new, unseen cases. It is pointless to ask for an explanation for a decision. A detailed understanding of the mathematical details of an AI algorithm may be possible for experts in statistics or computer science. However, when it comes to the fate of human beings, this “developer’s explanation” is not sufficient. The concept of explainable AI (XAI) as a solution to this problem is attracting increasing scientific and regulatory interest. This review focuses on the requirement that XAIs must be able to explain in detail the decisions made by the AI to the experts in the field.
Feature selection is a common step in data preprocessing that precedes machine learning to reduce data space and the computational cost of processing or obtaining the data. Filtering out uninformative variables is also important for knowledge discovery. By reducing the data space to only those components that are informative to the class structure, feature selection can simplify models so that they can be more easily interpreted by researchers in the field, reminiscent of explainable artificial intelligence. Knowledge discovery in complex data thus benefits from feature selection that aims to understand feature sets in the thematic context from which the data set originates. However, a single variable selected from a very small number of variables that are technically sufficient for AI training may make little immediate thematic sense, whereas the additional consideration of a variable discarded during feature selection could make scientific discovery very explicit. In this report, we propose an approach to explainable feature selection (XFS) based on a systematic reconsideration of unselected features. The difference between the respective classifications when training the algorithms with the selected features or with the unselected features provides a valid estimate of whether the relevant features in a data set have been selected and uninformative or trivial information was filtered out. It is shown that revisiting originally unselected variables in multivariate data sets allows for the detection of pathologies and errors in the feature selection that occasionally resulted in the failure to identify the most appropriate variables.
In a recent discussion on how to deal with data analysis issues initiated by reviewers of pain-related scientific manuscripts in the European Journal of Pain, a seemingly simple statistical issue was raised: two subsets of data in a paper had the same mean and standard deviation. A reviewer asked for a statistical test for or against the identity of the subset distributions. The authors insisted that if the mean and standard deviation were the same, this was sufficient evidence that the subsets of data were not significantly different.
This prompted a discussion among pain researchers, who are not necessarily primarily from the field of data science, a discussion of the importance of carefully examining the distribution of pain-related data in a journal whose primary audience is pain researchers seems warranted...
Recent advances in mathematical modelling and artificial intelligence have challenged the use of traditional regression analysis in biomedical research. This study examined artificial and cancer research data using binomial and multinomial logistic regression and compared its performance with other machine learning models such as random forests, support vector machines, Bayesian classifiers, k-nearest neighbours and repeated incremental clipping (RIPPER). The alternative models often outperformed regression in accurately classifying new cases. Logistic regression had a structural problem similar to early single-layer neural networks, which limited its ability to identify variables with high statistical significance for reliable class assignment. Therefore, regression is not always the best model for class prediction in biomedical datasets. The study emphasises the importance of validating selected models and suggests that a mixture of experts approach may be a more advanced and effective strategy for analysing biomedical datasets.
Motivation: Gaussian mixture models (GMMs) are probabilistic models commonly used in biomedical research to detect subgroup structures in data sets with one-dimensional information. Reliable model parameterization requires that the number of modes, i.e., states of the generating process, is known. However, this is rarely the case for empirically measured biomedical data. Several implementations are available that estimate GMM parameters differently. This work aims to provide a comparative evaluation of automated GMM fitting methods.
Results and conclusions: The performance of commonly used algorithms for automatic parameterization and mode number determination was compared with respect to reproducing the ground truth of generated data derived from multiple normal distributions. Four main variants of Gaussian mode number detection algorithms and five variants of GMM parameter estimation methods were tested in a combinatory scenario. The combination of best performing mode number determination algorithms and GMM parameter estimation methods was then tested on artificial and real-live data sets known to display a GMM structure. None of the tested methods correctly determined the underlying data structure consistently. The likelihood ratio test had the best performance in identifying the mode number associated with the best GMM fit of the data distribution while the Markov chain Monte Carlo (MCMC) algorithm was best for GMM parameter estimation while. The combination of the two methods of number determination algorithms and GMM parameter estimation was consistently among the best and overall outperformed the available implementations.
Implementation: An automated tool for the detection of GMM based structures in (biomedical) datasets was created based on the present results and made freely available in the R library “opGMMassessment” at https://cran.r-project.org/package=opGMMassessment.
Selecting the k best features is a common task in machine learning. Typically, a few features have high importance, but many have low importance (right-skewed distribution). This report proposes a numerically precise method to address this skewed feature importance distribution in order to reduce a feature set to the informative minimum of items. Computed ABC analysis (cABC) is an item categorization method that aims to identify the most important items by partitioning a set of non-negative numerical items into subsets "A", "B", and "C" such that subset "A" contains the "few important" items based on specific properties of ABC curves defined by their relationship to Lorenz curves. In its recursive form, the cABC analysis can be applied again to subset "A". A generic image dataset and three biomedical datasets (lipidomics and two genomics datasets) with a large number of variables were used to perform the experiments. The experimental results show that the recursive cABC analysis limits the dimensions of the data projection to a minimum where the relevant information is still preserved and directs the feature selection in machine learning to the most important class-relevant information, including filtering feature sets for nonsense variables. Feature sets were reduced to 10% or less of the original variables and still provided accurate classification in data not used for feature selection. cABC analysis, in its recursive variant, provides a computationally precise means of reducing information to a minimum. The minimum is the result of a computation of the number of k most relevant items, rather than a decision to select the k best items from a list. In addition, there are precise criteria for stopping the reduction process. The reduction to the most important features can improve the human understanding of the properties of the data set. The cABC method is implemented in the Python package "cABCanalysis" available at https://pypi.org/project/cABCanalysis/.
Bayesian inference is ubiquitous in science and widely used in biomedical research such as cell sorting or “omics” approaches, as well as in machine learning (ML), artificial neural networks, and “big data” applications. However, the calculation is not robust in regions of low evidence. In cases where one group has a lower mean but a higher variance than another group, new cases with larger values are implausibly assigned to the group with typically smaller values. An approach for a robust extension of Bayesian inference is proposed that proceeds in two main steps starting from the Bayesian posterior probabilities. First, cases with low evidence are labeled as “uncertain” class membership. The boundary for low probabilities of class assignment (threshold 𝜀
) is calculated using a computed ABC analysis as a data-based technique for item categorization. This leaves a number of cases with uncertain classification (p < 𝜀
). Second, cases with uncertain class membership are relabeled based on the distance to neighboring classified cases based on Voronoi cells. The approach is demonstrated on biomedical data typically analyzed with Bayesian statistics, such as flow cytometric data sets or biomarkers used in medical diagnostics, where it increased the class assignment accuracy by 1–10% depending on the data set. The proposed extension of the Bayesian inference of class membership can be used to obtain robust and plausible class assignments even for data at the extremes of the distribution and/or for which evidence is weak.
Aim: Exposure to opioids has been associated with epigenetic effects. Studies in rodents suggested a role of varying degrees of DNA methylation in the differential regulation of μ-opioid receptor expression across the brain.
Methods: In a translational investigation, using tissue acquired postmortem from 21 brain regions of former opiate addicts, representing a human cohort with chronic opioid exposure, μ-opioid receptor expression was analyzed at the level of DNA methylation, mRNA and protein.
Results & conclusion: While high or low μ-opioid receptor expression significantly correlated with local OPRM1 mRNA levels, there was no corresponding association with OPRM1 methylation status. Additional experiments in human cell lines showed that changes in DNA methylation associated with changes in μ-opioid expression were an order of magnitude greater than differences in brain. Hence, different degrees of DNA methylation associated with chronic opioid exposure are unlikely to exert a major role in the region-specificity of μ-opioid receptor expression in the human brain.