Euclidean distance-optimized data transformation for cluster analysis in biomedical data (EDOtrans)
(2022)
Background: Data transformations are commonly used in bioinformatics data processing in the context of data projection and clustering. The most used Euclidean metric is not scale invariant and therefore occasionally inappropriate for complex, e.g., multimodal distributed variables and may negatively affect the results of cluster analysis. Specifically, the squaring function in the definition of the Euclidean distance as the square root of the sum of squared differences between data points has the consequence that the value 1 implicitly defines a limit for distances within clusters versus distances between (inter-) clusters.
Methods: The Euclidean distances within a standard normal distribution ($N(0,1)$) follow a $N(0,\sqrt{2})$ distribution. The EDO transformation of a variable $X$ is proposed as $\mathrm{EDO} = X/(\sqrt{2} \cdot s)$ following modeling of the standard deviation $s$ by a mixture of Gaussians and selecting the dominant modes via item categorization. The method was compared in artificial and biomedical datasets with clustering of untransformed data, z-transformed data, and the recently proposed pooled variable scaling.
Results: A simulation study and applications to known real data examples showed that the proposed EDO scaling method is generally useful. The clustering results in terms of cluster accuracy, adjusted Rand index and Dunn’s index outperformed the classical alternatives. Finally, the EDO transformation was applied to cluster a high-dimensional genomic dataset consisting of gene expression data for multiple samples of breast cancer tissues, and the proposed approach gave better results than classical methods and was compared with pooled variable scaling.
Conclusions: For multivariate procedures of data analysis, it is proposed to use the EDO transformation as a better alternative to the established z-standardization, especially for nontrivially distributed data. The “EDOtrans” R package is available at https://cran.r-project.org/package=EDOtrans.
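The scaling step described in the Methods can be illustrated with a minimal Python sketch. It is not the EDOtrans implementation: the function names are hypothetical, and the standard deviation s is simply passed in by the caller, whereas the abstract describes estimating it from the dominant modes of a Gaussian mixture. The second function checks empirically that differences between independent N(0,1) draws have a standard deviation close to the square root of 2.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)


def edo_scale(values, s):
    """Scale values as X / (sqrt(2) * s).

    Assumption: s is supplied by the caller. Estimating s from the
    dominant Gaussian modes is not implemented in this sketch.
    """
    if s <= 0:
        raise ValueError("s must be positive")
    return [v / (SQRT2 * s) for v in values]


def sd_of_pairwise_differences(n, seed=0):
    """Empirical SD of differences between independent N(0,1) draws."""
    rng = random.Random(seed)
    diffs = [rng.gauss(0.0, 1.0) - rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.pstdev(diffs)


# Hypothetical example values, chosen only for illustration.
scaled = edo_scale([0.0, 2.0, 4.0], s=2.0)
empirical_sd = sd_of_pairwise_differences(20000)
```

After this scaling, a variable whose dominant modes have standard deviation s would be expected to show within-mode differences with a standard deviation of about 1.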
Next-generation sequencing (NGS) provides unrestricted access to the genome, but it produces ‘big data’ that exceed, in amount and complexity, what classical analytical approaches can handle. We introduce a bioinformatics-based classifying biomarker that uses emergent properties in genetics to separate pain patients requiring extremely high opioid doses from controls. Following precisely calculated selection of the 34 most informative markers in the OPRM1, OPRK1, OPRD1 and SIGMAR1 genes, patterns of genotypes belonging to either patient group could be derived using a k-nearest neighbor (kNN) classifier that provided a diagnostic accuracy of 80.6±4%. This outperformed alternative classifiers such as reportedly functional opioid receptor gene variants or complex biomarkers obtained via multiple regression or decision tree analysis. The accumulation of several genetic variants with only minor functional influences may result in a qualitative consequence affecting complex phenotypes, pointing at emergent properties in genetics.
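How a kNN classifier assigns a genotype pattern to a group can be sketched in a few lines of Python. The abstract does not state the distance measure or the value of k, so the Hamming distance, k = 3, the genotype coding (0/1/2), and the toy data below are all assumptions made for illustration only.

```python
from collections import Counter


def hamming(a, b):
    """Number of marker positions at which two genotype vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to the query."""
    ranked = sorted(range(len(train_x)),
                    key=lambda i: hamming(train_x[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: four markers, genotypes coded as 0, 1 or 2.
train_x = [
    (0, 0, 1, 0), (0, 1, 1, 0), (0, 0, 0, 0),   # controls
    (2, 1, 2, 2), (2, 2, 1, 2), (1, 2, 2, 2),   # patients
]
train_y = ["control"] * 3 + ["patient"] * 3

prediction_a = knn_predict(train_x, train_y, (2, 2, 2, 1), k=3)
prediction_b = knn_predict(train_x, train_y, (0, 0, 1, 1), k=3)
```

No single marker decides the outcome here; the vote depends on the overall similarity of the genotype pattern, which is the sense in which several minor variants can combine into a classifying signal.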
Background: Human genetic research has implicated functional variants of more than one hundred genes in the modulation of persisting pain. Artificial intelligence and machine‐learning techniques may combine this knowledge with results of genetic research gathered in any context, which permits the identification of the key biological processes involved in chronic sensitization to pain.
Methods: Based on published evidence, a set of 110 genes carrying variants reported to be associated with modulation of the clinical phenotype of persisting pain in eight different clinical settings was submitted to unsupervised machine‐learning aimed at functional clustering. Subsequently, a mathematically supported subset of genes, comprising those most consistently involved in persisting pain, was analysed by means of computational functional genomics in the Gene Ontology knowledgebase.
Results: Clustering of genes with evidence for a modulation of persisting pain elucidated a functionally heterogeneous set. The situation cleared when the focus was narrowed to a genetic modulation consistently observed throughout several clinical settings. On this basis, two groups of biological processes, the immune system and nitric oxide signalling, emerged as major players in sensitization to persisting pain, which is biologically highly plausible and in agreement with other lines of pain research.
Conclusions: The present computational functional genomics‐based approach provided a computational systems‐biology perspective on chronic sensitization to pain. Human genetic control of persisting pain points to the immune system as a source of potential future targets for drugs directed against persisting pain. Contemporary machine‐learned methods provide innovative approaches to knowledge discovery from previous evidence.
Significance: We show that knowledge discovery in genetic databases and contemporary machine‐learned techniques can identify relevant biological processes involved in persistent pain.
Background: To prevent persistent post-surgery pain, early identification of patients at high risk is a clinical need. Supervised machine-learning techniques were used to test how accurately the patients’ performance in a preoperatively performed tonic cold pain test could predict persistent post-surgery pain.
Methods: We analysed 763 patients from a cohort of 900 women who were treated for breast cancer, of whom 61 patients had developed signs of persistent pain during three years of follow-up. Preoperatively, all patients underwent a cold pain test (immersion of the hand into a water bath at 2–4 °C). The patients rated the pain intensity using a numerical rating scale (NRS) from 0 to 10. Supervised machine-learning techniques were used to construct a classifier that could predict patients at risk of persistent pain.
Results: Whether or not a patient rated the pain intensity at NRS=10 within less than 45 s during the cold water immersion test provided a negative predictive value of 94.4% to assign a patient to the "persistent pain" group. If NRS=10 was never reached during the cold test, the predictive value for not developing persistent pain was almost 97%. However, a low positive predictive value of 10% implied a high false positive rate.
Conclusions: Results provide a robust exclusion of persistent pain in women with a negative predictive value of 94.4%. Moreover, results provide further support for the hypothesis that the endogenous pain inhibitory system may play an important role in the process of pain becoming persistent.
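A high negative predictive value can coexist with a low positive predictive value when the outcome is rare. The short Python sketch below shows the arithmetic. The confusion-matrix counts are hypothetical and were chosen only to illustrate the effect; they are not the counts of the study.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts.

    PPV = TP / (TP + FP): share of test-positives who develop the outcome.
    NPV = TN / (TN + FN): share of test-negatives who do not.
    """
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Hypothetical counts (not study data): the test flags 100 patients,
# of whom only 10 develop persistent pain; it clears 180 patients,
# of whom 170 remain free of persistent pain.
ppv, npv = predictive_values(tp=10, fp=90, tn=170, fn=10)
```

With these illustrative counts the test would be suited to excluding the outcome (NPV about 94%) but not to confirming it (PPV of 10%).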
The human sense of smell is often analyzed as being composed of three main components comprising olfactory threshold, odor discrimination and the ability to identify odors. A relevant distinction of the three components and their differential changes in distinct disorders remains a research focus. The present data-driven analysis aimed at establishing a cluster structure in the pattern of olfactory subtest results. Therefore, unsupervised machine-learning was applied to olfactory subtest results acquired in 10,714 subjects with nine different olfactory pathologies. Using the U-matrix, Emergent Self-organizing feature maps (ESOM) identified three different clusters characterized by (i) low threshold and good discrimination and identification, (ii) very high threshold associated with absent to poor discrimination and identification ability, or (iii) medium threshold, i.e., in the mid-range of possible thresholds, associated with reduced discrimination and identification ability. Specific etiologies of olfactory (dys)function were unequally represented in the clusters ($p < 2.2 \cdot 10^{-16}$). Patients with congenital anosmia were overrepresented in the second cluster while subjects with postinfectious olfactory dysfunction belonged frequently to the third cluster. However, the clusters provided no clear separation between etiologies. Hence, the present verification of a distinct cluster structure encourages continued scientific efforts at olfactory test pattern recognition.
Biomedical data obtained during cell experiments, laboratory animal research, or human studies often display a complex distribution. Statistical identification of subgroups in research data poses an analytical challenge. Here we introduce an interactive R-based bioinformatics tool, called “AdaptGauss”. It enables a valid identification of a biologically-meaningful multimodal structure in the data by fitting a Gaussian mixture model (GMM) to the data. The interface allows a supervised selection of the number of subgroups. This enables the expectation maximization (EM) algorithm to fit more complex GMMs than are usually obtained with a noninteractive approach. Interactively fitting a GMM to heat pain threshold data acquired from human volunteers revealed a distribution pattern with four Gaussian modes located at temperatures of 32.3, 37.2, 41.4, and 45.4 °C. Noninteractive fitting was unable to identify a meaningful data structure. Obtained results are compatible with known activity temperatures of different TRP ion channels suggesting the mechanistic contribution of different heat sensors to the perception of thermal pain. Thus, sophisticated analysis of the modal structure of biomedical data provides a basis for the mechanistic interpretation of the observations. As it may reflect the involvement of different TRP thermosensory ion channels, the analysis provides a starting point for hypothesis-driven laboratory experiments.
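The idea of fitting a GMM with the EM algorithm from user-chosen starting values can be sketched in plain Python. This is not the AdaptGauss code: the function names, the initialization of the standard deviations, and the synthetic two-mode data are assumptions made for illustration, and the caller-supplied starting means only loosely stand in for the interactive selection the tool offers.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(data, init_means, n_iter=200, min_sd=1e-3):
    """Fit a one-dimensional Gaussian mixture by expectation maximization.

    init_means: starting means chosen by the user; their count sets the
    number of modes.
    """
    k = len(init_means)
    means = list(init_means)
    spread = (max(data) - min(data)) / (2.0 * k)
    sds = [max(spread, min_sd)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each mode for each data point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300  # guard against underflow
            resp.append([d / total for d in dens])
        # M-step: update weights, means and SDs from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty mode unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), min_sd)
    return weights, means, sds


# Synthetic data with two modes (hypothetical values, not study data).
rng = random.Random(1)
data = ([rng.gauss(37.0, 1.0) for _ in range(300)]
        + [rng.gauss(45.0, 1.0) for _ in range(300)])
weights, means, sds = fit_gmm_1d(data, init_means=[36.0, 46.0])
```

EM converges to a local optimum, so the result depends on the starting values; this is one reason a guided choice of the number and location of modes can yield a different fit than an automatic run.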
Background: Prevention of persistent pain following breast cancer surgery, via early identification of patients at high risk, is a clinical need. Supervised machine-learning was used to identify parameters that predict persistence of significant pain.
Methods: Over 500 demographic, clinical and psychological parameters were acquired up to 6 months after surgery from 1,000 women (aged 28–75 years) who were treated for breast cancer. Pain was assessed using an 11-point numerical rating scale before surgery and at months 1, 6, 12, 24, and 36. The ratings at months 12, 24, and 36 were used to allocate patients to either "persisting pain" or "non-persisting pain" groups. Supervised machine learning was applied to map the parameters to these diagnoses.
Results: A symbolic rule-based classifier tool was created that comprised 21 single or aggregated parameters, including demographic features, psychological and pain-related parameters, forming a questionnaire with "yes/no" items (decision rules). If at least 10 of the 21 rules applied, persisting pain was predicted at a cross-validated accuracy of 86% and a negative predictive value of approximately 95%.
Conclusions: The present machine-learned analysis showed that, even with a large set of parameters acquired from a large cohort, early identification of these patients is only partly successful. This indicates that more parameters are needed for accurate prediction of persisting pain. However, with the current parameters it is possible, with a certainty of almost 95%, to exclude the possibility of persistent pain developing in a woman being treated for breast cancer.
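The decision logic of such a rule-counting classifier can be sketched in Python as follows. The 21 rules themselves are not given in the abstract, so the function name and the yes/no answer vectors below are purely hypothetical placeholders.

```python
N_RULES = 21    # number of yes/no items reported in the abstract
THRESHOLD = 10  # minimum number of applicable rules for a positive prediction


def predict_persisting_pain(rule_results, threshold=THRESHOLD):
    """Return True if at least `threshold` of the 21 rules apply.

    rule_results: sequence of 21 booleans, one per questionnaire item.
    """
    if len(rule_results) != N_RULES:
        raise ValueError("expected exactly 21 rule results")
    return sum(bool(r) for r in rule_results) >= threshold


# Hypothetical answer patterns, for illustration only.
ten_rules_apply = [True] * 10 + [False] * 11
nine_rules_apply = [True] * 9 + [False] * 12

predicted_high = predict_persisting_pain(ten_rules_apply)
predicted_low = predict_persisting_pain(nine_rules_apply)
```

A count-based rule of this kind can be applied as a paper questionnaire, without software.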
Aim: Exposure to opioids has been associated with epigenetic effects. Studies in rodents suggested a role of varying degrees of DNA methylation in the differential regulation of μ-opioid receptor expression across the brain.
Methods: In a translational investigation, using tissue acquired postmortem from 21 brain regions of former opiate addicts, representing a human cohort with chronic opioid exposure, μ-opioid receptor expression was analyzed at the level of DNA methylation, mRNA and protein.
Results & conclusion: While high or low μ-opioid receptor expression significantly correlated with local OPRM1 mRNA levels, there was no corresponding association with OPRM1 methylation status. Additional experiments in human cell lines showed that changes in DNA methylation associated with changes in μ-opioid expression were an order of magnitude greater than differences in brain. Hence, different degrees of DNA methylation associated with chronic opioid exposure are unlikely to exert a major role in the region-specificity of μ-opioid receptor expression in the human brain.
The comprehensive assessment of pain-related human phenotypes requires combinations of nociceptive measures that produce complex high-dimensional data, posing challenges to bioinformatic analysis. In this study, we assessed established experimental models of heat hyperalgesia of the skin, consisting of local ultraviolet-B (UV-B) irradiation or capsaicin application, in 82 healthy subjects using a variety of noxious stimuli. We extended the original heat stimulation by applying cold and mechanical stimuli and assessing the hypersensitization effects with a clinically established quantitative sensory testing (QST) battery (German Research Network on Neuropathic Pain). This study provided a 246 × 10-sized data matrix (82 subjects assessed at baseline, following UV-B application, and following capsaicin application) with respect to 10 QST parameters, which we analyzed using machine-learning techniques. We observed statistically significant effects of the hypersensitization treatments in 9 different QST parameters. Supervised machine-learned analysis implemented as random forests followed by ABC analysis pointed to heat pain thresholds as the most relevantly affected QST parameter. However, decision tree analysis indicated that UV-B additionally modulated sensitivity to cold. Unsupervised machine-learning techniques, implemented as emergent self-organizing maps, hinted at subgroups responding to topical application of capsaicin. The distinction among subgroups was based on sensitivity to pressure pain, which could be attributed to sex differences, with women being more sensitive than men. Thus, while UV-B and capsaicin share a major component of heat pain sensitization, they differ in their effects on QST parameter patterns in healthy subjects, suggesting a lack of redundancy between these models.
Background: It is assumed that different pain phenotypes are based on varying molecular pathomechanisms. Distinct ion channels seem to be associated with the perception of cold pain; in particular, TRPM8 and TRPA1 have been highlighted previously. The present study analyzed the distribution of cold pain thresholds with a focus on describing the multimodality, based on the hypothesis that it reflects a contribution of distinct ion channels.
Methods: Cold pain thresholds (CPT) were available from 329 healthy volunteers (aged 18 - 37 years; 159 men) enrolled in previous studies. The distribution of the pooled and log-transformed threshold data was described using a kernel density estimation (Pareto Density Estimation (PDE)) and subsequently, the log data was modeled as a mixture of Gaussian distributions using the expectation maximization (EM) algorithm to optimize the fit.
Results: CPTs were clearly multi-modally distributed. Fitting a Gaussian Mixture Model (GMM) to the log-transformed threshold data revealed that the best fit is obtained when applying a trimodal distribution pattern. The modes of the identified three Gaussian distributions, retransformed from the log domain to the mean stimulation temperatures at which the subjects had indicated pain thresholds, were obtained at 23.7 °C, 13.2 °C and 1.5 °C for Gaussian #1, #2 and #3, respectively.
Conclusions: The localization of the first and second Gaussians was interpreted as reflecting the contribution of two different cold sensors. From the calculated localization of the modes of the first two Gaussians, the hypothesis of an involvement of TRPM8, sensing temperatures from 25 - 24 °C, and TRPA1, sensing cold from 17 °C can be derived. In that case, subjects belonging to either Gaussian would possess a dominance of the one or the other receptor at the skin area where the cold stimuli had been applied. The findings therefore support a suitability of complex analytical approaches to detect mechanistically determined patterns from pain phenotype data.