Bayesian inference is ubiquitous in science and widely used in biomedical research such as cell sorting or “omics” approaches, as well as in machine learning (ML), artificial neural networks, and “big data” applications. However, the calculation is not robust in regions of low evidence. In cases where one group has a lower mean but a higher variance than another group, new cases with larger values are implausibly assigned to the group with typically smaller values. An approach for a robust extension of Bayesian inference is proposed that proceeds in two main steps starting from the Bayesian posterior probabilities. First, cases with low evidence are labeled as of “uncertain” class membership. The boundary for low probabilities of class assignment (threshold 𝜀) is calculated using a computed ABC analysis as a data-based technique for item categorization. This leaves a number of cases with uncertain classification (p < 𝜀). Second, cases with uncertain class membership are relabeled according to their distance to neighboring classified cases, using Voronoi cells. The approach is demonstrated on biomedical data typically analyzed with Bayesian statistics, such as flow cytometric data sets or biomarkers used in medical diagnostics, where it increased the class assignment accuracy by 1–10%, depending on the data set. The proposed extension of the Bayesian inference of class membership can be used to obtain robust and plausible class assignments even for data at the extremes of the distribution and/or for which evidence is weak.
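The two-step procedure can be sketched in a few lines. The example below is a minimal numeric illustration, not the paper's implementation: the 𝜀 value is a hypothetical stand-in for the threshold that the paper derives from a computed ABC analysis, the marginal likelihood serves as the evidence measure, and the relabeling assigns each uncertain case to the Voronoi cell of its nearest confidently classified case (equivalent to one-nearest-neighbour assignment).

```python
import numpy as np

def gaussian_pdf(x, mu, sd):
    """Normal density, written out to keep the sketch self-contained."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Class A: larger mean, small variance; class B: lower mean, higher variance -
# the constellation in which plain Bayes misassigns extreme values.
mu_a, sd_a, mu_b, sd_b = 10.0, 1.0, 5.0, 4.0
prior_a = prior_b = 0.5

x = np.array([4.0, 9.0, 10.5, 16.0, 20.0])

# Plain Bayesian class assignment from the posterior P(A | x)
like_a = gaussian_pdf(x, mu_a, sd_a) * prior_a
like_b = gaussian_pdf(x, mu_b, sd_b) * prior_b
post_a = like_a / (like_a + like_b)
naive = np.where(post_a >= 0.5, "A", "B")  # assigns 16.0 and 20.0 to B

# Step 1: label low-evidence cases as "uncertain". A fixed threshold stands
# in here for the epsilon derived via computed ABC analysis in the paper.
eps = 2e-3  # hypothetical value
evidence = like_a + like_b
uncertain = evidence < eps

# Step 2: relabel uncertain cases from the nearest confidently classified
# case, i.e. by the Voronoi cell of the certain cases that contains them.
robust = naive.copy()
certain = np.where(~uncertain)[0]
for i in np.where(uncertain)[0]:
    nearest = certain[np.argmin(np.abs(x[certain] - x[i]))]
    robust[i] = naive[nearest]
# robust now assigns the extreme values 16.0 and 20.0 to class A
```

With these toy parameters, plain Bayes implausibly assigns the far-right values 16.0 and 20.0 to the low-mean, high-variance class B, while the relabeling step moves them to the more plausible class A.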
Persistent and, in particular, neuropathic pain is a major healthcare problem with still insufficient pharmacological treatment options. This has triggered research activities aimed at finding analgesics with a novel mechanism of action. Results of these efforts will need to pass through the phases of drug development, in which experimental human pain models are established components, e.g., chemical hyperalgesia induced by capsaicin. We aimed at ranking the various readouts of a human capsaicin‐based pain model with respect to the most relevant information about the effects of a potential reference analgesic. In a placebo‐controlled, randomized cross‐over study, seven different pain‐related readouts were acquired in 16 healthy individuals before and after oral administration of 300 mg pregabalin. The sizes of the effects on pain induced by intradermal injection of capsaicin were quantified by calculating Cohen's d. While pregabalin provided a small effect, judged by values of Cohen's d exceeding 0.2, in four of the seven pain‐related parameters, an item categorization technique implemented as computed ABC analysis identified the pain intensities in the area of secondary hyperalgesia and of allodynia as the most suitable parameters to quantify the analgesic effects of pregabalin. Results of this study provide further support for the ability of the intradermal capsaicin pain model to show analgesic effects of pregabalin. The results can serve as a basis for the design of studies in which the inclusion of this particular pain model and pregabalin is planned.
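Cohen's d for two samples can be computed from the difference of means scaled by the pooled standard deviation. The sketch below uses invented pain ratings purely to illustrate the calculation; it is not the study's data, and a crossover design like the one described would more precisely use a within-subject variant of d.

```python
import math

def cohens_d(a, b):
    """Cohen's d for two samples, using the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical pain ratings (0-10 scale) for illustration only
placebo = [6.1, 5.8, 6.4, 6.0, 5.9, 6.2]
treated = [5.6, 5.2, 5.9, 5.5, 5.3, 5.7]
d = cohens_d(placebo, treated)  # a value above 0.2 counts as at least a small effect
```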
Advances in flow cytometry enable the acquisition of large and high-dimensional data sets per patient. Novel computational techniques allow the visualization of structures in these data and, finally, the identification of relevant subgroups. Correct data visualizations and projections from the high-dimensional space to the visualization plane require the correct representation of the structures in the data. This work shows that frequently used techniques are unreliable in this respect. One of the most important methods for data projection in this area is the t-distributed stochastic neighbor embedding (t-SNE). We analyzed its performance on artificial and real biomedical data sets. t-SNE introduced a cluster structure for homogeneously distributed data that did not contain any subgroup structure. In other data sets, t-SNE occasionally suggested the wrong number of subgroups, or projected data points belonging to different subgroups as if they belonged to the same subgroup. As an alternative approach, emergent self-organizing maps (ESOM) were used in combination with U-matrix methods. This approach allowed the correct identification of homogeneous data, while in data sets containing distance- or density-based subgroup structures, the number of subgroups and the data point assignments were correctly displayed. The results highlight possible pitfalls in the use of a currently widely applied algorithmic technique for the detection of subgroups in high-dimensional cytometric data and suggest a robust alternative.
Feature selection is a common step in data preprocessing that precedes machine learning to reduce data space and the computational cost of processing or obtaining the data. Filtering out uninformative variables is also important for knowledge discovery. By reducing the data space to only those components that are informative to the class structure, feature selection can simplify models so that they can be more easily interpreted by researchers in the field, reminiscent of explainable artificial intelligence. Knowledge discovery in complex data thus benefits from feature selection that aims to understand feature sets in the thematic context from which the data set originates. However, a single variable selected from a very small number of variables that are technically sufficient for AI training may make little immediate thematic sense, whereas the additional consideration of a variable discarded during feature selection could make scientific discovery very explicit. In this report, we propose an approach to explainable feature selection (XFS) based on a systematic reconsideration of unselected features. The difference between the respective classifications when training the algorithms with the selected features or with the unselected features provides a valid estimate of whether the relevant features in a data set have been selected and uninformative or trivial information was filtered out. It is shown that revisiting originally unselected variables in multivariate data sets allows for the detection of pathologies and errors in the feature selection that occasionally resulted in the failure to identify the most appropriate variables.
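The idea of validating a feature selection by also training on the unselected features can be sketched on synthetic data. Everything below is an illustrative assumption rather than the reported method: a simple class-mean-difference filter stands in for the feature selection, and a nearest-centroid rule stands in for the trained algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data set: features 0 and 1 carry the class signal,
# features 2 and 3 are pure noise (an assumption made for this illustration).
n = 200
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, 4))
X[y == 1, :2] += 2.0  # shift the informative features for class 1

def centroid_accuracy(X, y):
    """Resubstitution accuracy of a simple nearest-centroid rule."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = (np.linalg.norm(X - c1, axis=1) < np.linalg.norm(X - c0, axis=1)).astype(int)
    return (pred == y).mean()

# A basic filter selection: rank features by absolute class-mean difference
gap = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
order = np.argsort(gap)
selected, unselected = order[-2:], order[:-2]

acc_selected = centroid_accuracy(X[:, selected], y)
acc_unselected = centroid_accuracy(X[:, unselected], y)
# A large gap between the two accuracies indicates that the informative
# features were captured and mostly trivial information was filtered out.
```

If training on the unselected features classified nearly as well as training on the selected ones, that would flag a pathological selection in the sense described above.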
Recent advances in mathematical modelling and artificial intelligence have challenged the use of traditional regression analysis in biomedical research. This study examined artificial and cancer research data using binomial and multinomial logistic regression and compared its performance with that of other machine learning models such as random forests, support vector machines, Bayesian classifiers, k-nearest neighbours and repeated incremental pruning to produce error reduction (RIPPER). The alternative models often outperformed regression in accurately classifying new cases. Logistic regression had a structural problem similar to that of early single-layer neural networks, which limited its ability to identify variables with high statistical significance for reliable class assignment. Therefore, regression is not always the best model for class prediction in biomedical datasets. The study emphasises the importance of validating selected models and suggests that a mixture-of-experts approach may be a more advanced and effective strategy for analysing biomedical datasets.
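The structural limitation shared with single-layer networks is that logistic regression draws a single linear decision boundary. A classic illustration of this (synthetic, not the study's data) is an XOR-type pattern, which a hand-rolled logistic regression cannot separate while even a plain nearest-neighbour rule can:

```python
import numpy as np

rng = np.random.default_rng(1)

# XOR-type data: four noisy clusters with alternating class labels, so no
# single linear boundary separates the classes (illustrative data only).
corners = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
labels = np.array([0, 1, 1, 0])
n = 50  # points per corner
X = np.vstack([c + 0.1 * rng.normal(size=(n, 2)) for c in corners])
y = np.repeat(labels, n)

# Logistic regression fitted by plain gradient descent (minimal sketch)
w, b = np.zeros(2), 0.0
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * X.T @ (p - y) / len(y)
    b -= 0.5 * (p - y).mean()
p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
acc_logreg = ((p >= 0.5).astype(int) == y).mean()  # stays near chance level

# Leave-one-out 1-nearest-neighbour easily resolves the same structure
d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
np.fill_diagonal(d, np.inf)  # never count a point as its own neighbour
acc_1nn = (y[np.argmin(d, axis=1)] == y).mean()
```

The linear model stays near guessing level on this pattern regardless of training time, while the non-linear classifier assigns almost every point correctly.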
Background: Prevention of persistent pain after breast cancer surgery, via early identification of patients at high risk, is a clinical need. Psychological factors are among the most consistently proposed predictive parameters for the development of persistent pain. However, repeated use of long psychological questionnaires in this context may be exhausting for patients and inconvenient in everyday clinical practice.
Methods: Supervised machine learning was used to create a short form of questionnaires that would provide the same predictive performance of pain persistence as the full questionnaires in a cohort of 1000 women followed up for 3 yr after breast cancer surgery. Machine-learned predictors were first trained with the full-item set of Beck's Depression Inventory (BDI), Spielberger's State–Trait Anxiety Inventory (STAI), and the State–Trait Anger Expression Inventory (STAXI-2). Subsequently, features were selected from the questionnaires to create predictors having a reduced set of items.
Results: A combined set of seven items from the STAI and BDI, 10% of the original psychological questions, provided the same predictive performance parameters as the full questionnaires for the development of persistent postsurgical pain. The seven-item version offers a shorter and at least as accurate identification of women in whom pain persistence is unlikely (almost 95% negative predictive value).
Conclusions: Using a data-driven machine-learning approach, a short list of seven items from BDI and STAI is proposed as a basis for a predictive tool for the persistence of pain after breast cancer surgery.
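The negative predictive value quoted in the results is the fraction of negative predictions that turn out to be correct. A minimal sketch of the calculation, with invented counts rather than the study's figures:

```python
def negative_predictive_value(tn, fn):
    """NPV = TN / (TN + FN): fraction of negative predictions that are correct."""
    return tn / (tn + fn)

# Hypothetical screening counts chosen only to illustrate the calculation
# (not the study's actual figures): of 400 women predicted not to develop
# persistent pain, 380 indeed remained without persistent pain.
npv = negative_predictive_value(tn=380, fn=20)  # 0.95
```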
An important measure in pain research is the intensity of nociceptive stimuli and their cortical representation. However, there is evidence of different cerebral representations of nociceptive stimuli, including the fact that cortical areas recruited during processing of intranasal nociceptive chemical stimuli included those outside the traditional trigeminal areas. Therefore, the aim of this study was to investigate the major cerebral representations of stimulus intensity associated with intranasal chemical trigeminal stimulation. Trigeminal stimulation was achieved with carbon dioxide presented to the nasal mucosa. Using a single‐blinded, randomized crossover design, 24 subjects received nociceptive stimuli with two different stimulation paradigms, depending on the just noticeable differences in the stimulus strengths applied. Stimulus‐related brain activations were recorded using functional magnetic resonance imaging with event‐related design. Brain activations increased significantly with increasing stimulus intensity, with the largest cluster at the right Rolandic operculum and a global maximum in a smaller cluster at the left lower frontal orbital lobe. Region of interest analyses additionally supported an activation pattern correlated with the stimulus intensity at the piriform cortex as an area of special interest with the trigeminal input. The results support the piriform cortex, in addition to the secondary somatosensory cortex, as a major area of interest for stimulus strength‐related brain activation in pain models using trigeminal stimuli. This makes both areas a primary objective to be observed in human experimental pain settings where trigeminal input is used to study effects of analgesics.
Background: Persistent postsurgical neuropathic pain (PPSNP) can occur after intraoperative damage to somatosensory nerves, with a prevalence of 29–57% in breast cancer surgery. Proteomics is an active research field in neuropathic pain, and the first results support its utility for establishing diagnoses or finding therapy strategies. Methods: 57 women (30 non-PPSNP/27 PPSNP) who had experienced a surgeon-verified intercostobrachial nerve injury during breast cancer surgery were examined for patterns in 74 serum proteomic markers that allowed discrimination between subgroups with or without PPSNP. Serum samples were obtained both before and after surgery. Results: Unsupervised data analyses, including principal component analysis and self-organizing maps of artificial neurons, revealed patterns that supported a data structure consistent with pain-related subgroup (non-PPSNP vs. PPSNP) separation. Subsequent supervised machine learning-based analyses revealed 19 proteins (CD244, SIRT2, CCL28, CXCL9, CCL20, CCL3, IL.10RA, MCP.1, TRAIL, CCL25, IL10, uPA, CCL4, DNER, STAMPB, CCL23, CST5, CCL11, FGF.23) that were informative for subgroup separation. In cross-validated training and testing of six different machine-learned algorithms, subgroup assignment was significantly better than chance, whereas this was not possible when training the algorithms with randomly permuted data or with the protein markers not selected. In particular, sirtuin 2 emerged as a key protein, differing between the PPSNP and non-PPSNP subgroups both before and after breast cancer treatments. Conclusions: The identified proteins play important roles in immune processes such as cell migration, chemotaxis, and cytokine signaling. They also have considerable overlap with currently known targets of approved or investigational drugs.
Taken together, several lines of unsupervised and supervised analyses pointed to structures in serum proteomics data, obtained before and after breast cancer surgery, that relate to neuroinflammatory processes associated with the development of neuropathic pain after an intraoperative nerve lesion.
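The permuted-label control used in the analysis can be sketched generically: train and test a classifier once with the true labels and once with randomly permuted labels, and require the former to clearly exceed the latter. The data, the nearest-centroid classifier, and all numbers below are illustrative assumptions, not the study's pipeline:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for the proteomics setting: 60 cases, 19 markers,
# two subgroups whose marker means differ (assumption for illustration).
n, p = 60, 19
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1] += 0.8  # subgroup shift across markers

def loo_nearest_centroid(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        c0 = X[mask & (y == 0)].mean(axis=0)
        c1 = X[mask & (y == 1)].mean(axis=0)
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        hits += pred == y[i]
    return hits / len(y)

acc_true = loo_nearest_centroid(X, y)
acc_perm = loo_nearest_centroid(X, rng.permutation(y))
# acc_true should clearly exceed acc_perm (which sits near chance), mirroring
# the permuted-label control described in the Results.
```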
Knowledge discovery in biomedical data using supervised methods assumes that the data contain structure relevant to the class structure if a classifier can be trained to assign a case to the correct class better than by guessing. In this setting, acceptance or rejection of a scientific hypothesis may depend critically on the ability to classify cases better than randomly, without high classification performance being the primary goal. Random forests are often chosen for knowledge-discovery tasks because they are considered a powerful classifier that does not require sophisticated data transformation or hyperparameter tuning and can be regarded as a reference classifier for tabular numerical data. Here, we report a case where the failure of random forests using the default hyperparameter settings in the standard implementations of R and Python would have led to the rejection of the hypothesis that the data contained structure relevant to the class structure. After tuning the hyperparameters, classification performance increased from 56% to 65% balanced accuracy in R, and from 55% to 67% balanced accuracy in Python. More importantly, the 95% confidence intervals in the tuned versions lay entirely to the right of the 50% value that characterizes guessing-level classification. Thus, tuning provided the desired evidence that the data structure supported the class structure of the data set. In this case, tuning made not only a quantitative difference in the form of slightly better classification accuracy, but also changed the interpretation of the data set. This matters especially when classification performance is low and a small improvement raises the balanced accuracy above the 50% guessing level.
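In Python, the kind of tuning described can be sketched with scikit-learn's GridSearchCV. The data set and the hyperparameter grid below are illustrative assumptions, not the study's actual data or search space; because the default configuration is itself a point in the grid, the tuned cross-validated balanced accuracy can only match or exceed the default's on the same folds.

```python
# Requires scikit-learn; data set and grid are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=300, n_features=20, n_informative=3,
                           flip_y=0.3, random_state=0)  # deliberately noisy

# Cross-validated balanced accuracy with default hyperparameters
default_acc = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                              cv=5, scoring="balanced_accuracy").mean()

# Tuning over a small grid that includes the default configuration
grid = {"n_estimators": [100, 300],
        "max_features": ["sqrt", None],
        "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid,
                      cv=5, scoring="balanced_accuracy")
search.fit(X, y)
tuned_acc = search.best_score_  # never worse than default_acc on these folds
```

For a knowledge-discovery claim in the sense above, the relevant question is whether the tuned confidence interval clears the 50% guessing level, not merely whether the point estimate improves.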
The use of artificial intelligence (AI) systems in biomedical and clinical settings can disrupt the traditional doctor–patient relationship, which is based on trust and transparency in medical advice and therapeutic decisions. When the diagnosis or selection of a therapy is no longer made solely by the physician, but to a significant extent by a machine using algorithms, decisions become nontransparent. Skill learning is the most common application of machine learning algorithms in clinical decision making. These are a class of very general algorithms (artificial neural networks, classifiers, etc.) that are tuned based on examples to optimize the classification of new, unseen cases. For such algorithms, it is pointless to ask for an explanation of an individual decision. A detailed understanding of the mathematical details of an AI algorithm may be possible for experts in statistics or computer science. However, when it comes to the fate of human beings, this “developer’s explanation” is not sufficient. The concept of explainable AI (XAI) as a solution to this problem is attracting increasing scientific and regulatory interest. This review focuses on the requirement that XAIs must be able to explain in detail the decisions made by the AI to the experts in the field.