Refine
Year of publication
Document Type
- Article (76)
Has Fulltext
- yes (76)
Is part of the Bibliography
- no (76)
Keywords
- data science (19)
- machine-learning (7)
- Data science (6)
- pain (6)
- Machine-learning (5)
- artificial intelligence (5)
- digital medicine (5)
- machine learning (5)
- patients (5)
- Pain (3)
Institute
Background: Persistent postsurgical neuropathic pain (PPSNP) can occur after intraoperative damage to somatosensory nerves, with a prevalence of 29–57% in breast cancer surgery. Proteomics is an active research field in neuropathic pain and the first results support its utility for establishing diagnoses or finding therapy strategies. Methods: 57 women (30 non-PPSNP/27 PPSNP) who had experienced a surgeon-verified intercostobrachial nerve injury during breast cancer surgery, were examined for patterns in 74 serum proteomic markers that allowed discrimination between subgroups with or without PPSNP. Serum samples were obtained both before and after surgery. Results: Unsupervised data analyses, including principal component analysis and self-organizing maps of artificial neurons, revealed patterns that supported a data structure consistent with pain-related subgroup (non-PPSPN vs. PPSNP) separation. Subsequent supervised machine learning-based analyses revealed 19 proteins (CD244, SIRT2, CCL28, CXCL9, CCL20, CCL3, IL.10RA, MCP.1, TRAIL, CCL25, IL10, uPA, CCL4, DNER, STAMPB, CCL23, CST5, CCL11, FGF.23) that were informative for subgroup separation. In cross-validated training and testing of six different machine-learned algorithms, subgroup assignment was significantly better than chance, whereas this was not possible when training the algorithms with randomly permuted data or with the protein markers not selected. In particular, sirtuin 2 emerged as a key protein, presenting both before and after breast cancer treatments in the PPSNP compared with the non-PPSNP subgroup. Conclusions: The identified proteins play important roles in immune processes such as cell migration, chemotaxis, and cytokine-signaling. They also have considerable overlap with currently known targets of approved or investigational drugs. Taken together, several lines of unsupervised and supervised analyses pointed to structures in serum proteomics data, obtained before and after breast cancer surgery, that relate to neuroinflammatory processes associated with the development of neuropathic pain after an intraoperative nerve lesion.
Sex differences in pain perception have been extensively studied, but precision medicine applications such as sex-specific pain pharmacology have barely progressed beyond proof-of-concept. A data set of pain thresholds to mechanical (blunt and punctate pressure) and thermal (heat and cold) stimuli applied to non-sensitized and sensitized (capsaicin, menthol) forearm skin of 69 male and 56 female healthy volunteers was analyzed for data structures contingent with the prior sex structure using unsupervised and supervised approaches. A working hypothesis that the relevance of sex differences could be approached via reversibility of the association, i.e., sex should be identifiable from pain thresholds, was verified with trained machine learning algorithms that could infer a person's sex in a 20% validation sample not seen to the algorithms during training, with balanced accuracy of up to 79%. This was only possible with thresholds for mechanical stimuli, but not for thermal stimuli or sensitization responses, which were not sufficient to train an algorithm that could assign sex better than by guessing or when trained with nonsense (permuted) information. This enabled the translation to the molecular level of nociceptive targets that convert mechanical but not thermal information into signals interpreted as pain, which could eventually be used for pharmacological precision medicine approaches to pain. By exploiting a key feature of machine learning, which allows for the recognition of data structures and the reduction of information to the minimum relevant, experimental human pain data could be characterized in a way that incorporates "non" logic that could be translated directly to the molecular pharmacological level, pointing toward sex-specific precision medicine for pain.
Background: Persistent pain in breast cancer survivors is common. Psychological and sleep-related factors modulate perception, interpretation and coping with pain and may contribute to the clinical phenotype. The present analysis pursued the hypothesis that breast cancer survivors form subgroups, based on psychological and sleep-related parameters that are relevant to the impact of pain on the patients’ life.
Methods: We analysed 337 women treated for breast cancer, in whom psychological and sleep-related parameters as well as parameters related to pain intensity and interference had been acquired. Data were analysed by using supervised and unsupervised machine-learning techniques (i) to detect patient subgroups based on the pattern of psychological or sleep-related parameters, (ii) to interpret the detected cluster structure and (iii) to relate this data structure to pain interference and impact on life.
Results: Artificial intelligence-based detection of data structure, implemented as self-organizing neuronal maps, identified two different clusters of patients. A smaller cluster (11.5% of the patients) had comparatively lower resilience, more depressive symptoms and lower extraversion than the other patients. In these patients, life-satisfaction, mood, and life in general were comparatively more impeded by persistent pain.
Conclusions: The results support the initial hypothesis that psychological and sleep-related parameter patterns are meaningful for subgrouping patients with respect to how persistent pain after breast cancer treatments interferes with their life. This indicates that management of pain should address more complex features than just pain intensity. Artificial intelligence is a useful tool in the identification of subgroups of patients based on psychological factors.
Selecting the k best features is a common task in machine learning. Typically, a few features have high importance, but many have low importance (right-skewed distribution). This report proposes a numerically precise method to address this skewed feature importance distribution in order to reduce a feature set to the informative minimum of items. Computed ABC analysis (cABC) is an item categorization method that aims to identify the most important items by partitioning a set of non-negative numerical items into subsets "A", "B", and "C" such that subset "A" contains the "few important" items based on specific properties of ABC curves defined by their relationship to Lorenz curves. In its recursive form, the cABC analysis can be applied again to subset "A". A generic image dataset and three biomedical datasets (lipidomics and two genomics datasets) with a large number of variables were used to perform the experiments. The experimental results show that the recursive cABC analysis limits the dimensions of the data projection to a minimum where the relevant information is still preserved and directs the feature selection in machine learning to the most important class-relevant information, including filtering feature sets for nonsense variables. Feature sets were reduced to 10% or less of the original variables and still provided accurate classification in data not used for feature selection. cABC analysis, in its recursive variant, provides a computationally precise means of reducing information to a minimum. The minimum is the result of a computation of the number of k most relevant items, rather than a decision to select the k best items from a list. In addition, there are precise criteria for stopping the reduction process. The reduction to the most important features can improve the human understanding of the properties of the data set. The cABC method is implemented in the Python package "cABCanalysis" available at https://pypi.org/project/cABCanalysis/.
The evaluation of pharmacological data using machine learning requires high data quality. Therefore, data preprocessing, that is, cleaning analytical laboratory errors, replacing missing values or outliers, and transforming data adequately before actual data analysis, is crucial. Because current tools available for this purpose often require programming skills, preprocessing tools with graphical user interfaces that can be used interactively are needed. In collaboration between data scientists and experts in bioanalytical diagnostics, a graphical software package for data preprocessing called pguIMP is proposed, which contains a fixed sequence of preprocessing steps to enable reproducible interactive data preprocessing. As an R-based package, it also allows direct integration into this data science environment without requiring any programming knowledge. The implementation of contemporary data processing methods, including machine-learning-based imputation techniques, ensures the generation of corrected and cleaned bioanalytical data sets that preserve data structures such as clusters better than is possible with classical methods. This was evaluated on bioanalytical data sets from lipidomics and drug research using k-nearest-neighbors-based imputation followed by k-means clustering and density-based spatial clustering of applications with noise. The R package provides a Shiny-based web interface designed to be easy to use for non–data analysis experts. It is demonstrated that the spectrum of methods provided is suitable as a standard pipeline for preprocessing bioanalytical data in biomedical research domains. The R package pguIMP is freely available at the comprehensive R archive network (https://cran.r-project.org/web/packages/pguIMP/index.html).
Bayesian inference is ubiquitous in science and widely used in biomedical research such as cell sorting or “omics” approaches, as well as in machine learning (ML), artificial neural networks, and “big data” applications. However, the calculation is not robust in regions of low evidence. In cases where one group has a lower mean but a higher variance than another group, new cases with larger values are implausibly assigned to the group with typically smaller values. An approach for a robust extension of Bayesian inference is proposed that proceeds in two main steps starting from the Bayesian posterior probabilities. First, cases with low evidence are labeled as “uncertain” class membership. The boundary for low probabilities of class assignment (threshold 𝜀
) is calculated using a computed ABC analysis as a data-based technique for item categorization. This leaves a number of cases with uncertain classification (p < 𝜀
). Second, cases with uncertain class membership are relabeled based on the distance to neighboring classified cases based on Voronoi cells. The approach is demonstrated on biomedical data typically analyzed with Bayesian statistics, such as flow cytometric data sets or biomarkers used in medical diagnostics, where it increased the class assignment accuracy by 1–10% depending on the data set. The proposed extension of the Bayesian inference of class membership can be used to obtain robust and plausible class assignments even for data at the extremes of the distribution and/or for which evidence is weak.
Background: Transient receptor potential cation channel subfamily V member 1 (TRPV1) are sensitive to heat, capsaicin, pungent chemicals and other noxious stimuli. They play important roles in the pain pathway where in concert with proinflammatory factors such as leukotrienes they mediate sensitization and hyperalgesia. TRPV1 is the target of several novel analgesics drugs under development and therefore, TRPV1 genetic variants might represent promising candidates for pharmacogenetic modulators of drug effects.
Methods: A next-generation sequencing (NGS) panel was created for the human TRPV1 gene and in addition, for the leukotriene receptors BLT1 and BLT2 recently described to modulate TRPV1 mediated sensitisation processes rendering the coding genes LTB4R and LTB4R2 important co-players in pharmacogenetic approaches involving TRPV1. The NGS workflow was based on a custom AmpliSeq™ panel and designed for sequencing of human genes on an Ion PGM™ Sequencer. A cohort of 80 healthy subjects of Western European descent was screened to evaluate and validate the detection of exomic sequences of the coding genes with 25 base pair exon padding.
Results: The amplicons covered approximately 97% of the target sequence. A median of 2.81 x 10 6 reads per run was obtained. This identified approximately 140 chromosome loci where nucleotides deviated from the reference sequence GRCh37 hg19 comprising the three genes TRPV1, LTB4R and LTB4R2. Correspondence between NGS and Sanger derived nucleotide sequences was 100%.
Conclusions: Results suggested that the NGS approach based on AmpliSeq™ libraries and Ion Personal Genome Machine (PGM) sequencing is a highly efficient mutation detection method. It is suitable for large-scale sequencing of TRPV1 and functionally related genes. The method adds a large amount of genetic information as a basis for complete analysis of TRPV1 ion channel genetics and its functional consequences.
Internalin B–mediated activation of the membrane-bound receptor tyrosine kinase MET is accompanied by a change in receptor mobility. Conversely, it should be possible to infer from receptor mobility whether a cell has been treated with internalin B. Here, we propose a method based on hidden Markov modeling and explainable artificial intelligence that machine-learns the key differences in MET mobility between internalin B–treated and –untreated cells from single-particle tracking data. Our method assigns receptor mobility to three diffusion modes (immobile, slow, and fast). It discriminates between internalin B–treated and –untreated cells with a balanced accuracy of >99% and identifies three parameters that are most affected by internalin B treatment: a decrease in the mobility of slow molecules (1) and a depopulation of the fast mode (2) caused by an increased transition of fast molecules to the slow mode (3). Our approach is based entirely on free software and is readily applicable to the analysis of other membrane receptors.
Based on accumulating evidence of a role of lipid signaling in many physiological and pathophysiological processes including psychiatric diseases, the present data driven analysis was designed to gather information needed to develop a prospective biomarker, using a targeted lipidomics approach covering different lipid mediators. Using unsupervised methods of data structure detection, implemented as hierarchal clustering, emergent self-organizing maps of neuronal networks, and principal component analysis, a cluster structure was found in the input data space comprising plasma concentrations of d = 35 different lipid-markers of various classes acquired in n = 94 subjects with the clinical diagnoses depression, bipolar disorder, ADHD, dementia, or in healthy controls. The structure separated patients with dementia from the other clinical groups, indicating that dementia is associated with a distinct lipid mediator plasma concentrations pattern possibly providing a basis for a future biomarker. This hypothesis was subsequently assessed using supervised machine-learning methods, implemented as random forests or principal component analysis followed by computed ABC analysis used for feature selection, and as random forests, k-nearest neighbors, support vector machines, multilayer perceptron, and naïve Bayesian classifiers to estimate whether the selected lipid mediators provide sufficient information that the diagnosis of dementia can be established at a higher accuracy than by guessing. This succeeded using a set of d = 7 markers comprising GluCerC16:0, Cer24:0, Cer20:0, Cer16:0, Cer24:1, C16 sphinganine, and LacCerC16:0, at an accuracy of 77%. By contrast, using random lipid markers reduced the diagnostic accuracy to values of 65% or less, whereas training the algorithms with randomly permuted data was followed by complete failure to diagnose dementia, emphasizing that the selected lipid mediators were display a particular pattern in this disease possibly qualifying as biomarkers.
Euclidean distance-optimized data transformation for cluster analysis in biomedical data (EDOtrans)
(2022)
Background: Data transformations are commonly used in bioinformatics data processing in the context of data projection and clustering. The most used Euclidean metric is not scale invariant and therefore occasionally inappropriate for complex, e.g., multimodal distributed variables and may negatively affect the results of cluster analysis. Specifically, the squaring function in the definition of the Euclidean distance as the square root of the sum of squared differences between data points has the consequence that the value 1 implicitly defines a limit for distances within clusters versus distances between (inter-) clusters.
Methods: The Euclidean distances within a standard normal distribution (N(0,1)) follow a N(0,2–√) distribution. The EDO-transformation of a variable X is proposed as EDO=X/(2–√⋅s) following modeling of the standard deviation s by a mixture of Gaussians and selecting the dominant modes via item categorization. The method was compared in artificial and biomedical datasets with clustering of untransformed data, z-transformed data, and the recently proposed pooled variable scaling.
Results: A simulation study and applications to known real data examples showed that the proposed EDO scaling method is generally useful. The clustering results in terms of cluster accuracy, adjusted Rand index and Dunn’s index outperformed the classical alternatives. Finally, the EDO transformation was applied to cluster a high-dimensional genomic dataset consisting of gene expression data for multiple samples of breast cancer tissues, and the proposed approach gave better results than classical methods and was compared with pooled variable scaling.
Conclusions: For multivariate procedures of data analysis, it is proposed to use the EDO transformation as a better alternative to the established z-standardization, especially for nontrivially distributed data. The “EDOtrans” R package is available at https://cran.r-project.org/package=EDOtrans.