DNA methylation is a major regulatory process of gene transcription, and aberrant DNA methylation is associated with various diseases including cancer. Many compounds have been reported to modify DNA methylation states. Despite increasing interest in the clinical application of drugs with epigenetic effects, and the use of diagnostic markers for genome-wide hypomethylation in cancer, large-scale screening systems to measure the effects of drugs on DNA methylation are limited. In this study, we improved the previously established fluorescence polarization-based global DNA methylation assay so that it is more suitable for application to human genomic DNA. Our methyl-sensitive fluorescence polarization (MSFP) assay was highly repeatable (inter-assay coefficient of variation = 1.5%) and accurate (r² = 0.99). According to signal linearity, only 50–80 ng human genomic DNA per reaction was necessary for the 384-well format. MSFP is a simple, rapid approach as all biochemical reactions and final detection can be performed in one well in a 384-well plate without purification steps in less than 3.5 hours. Furthermore, we demonstrated a significant correlation between MSFP and the LINE-1 pyrosequencing assay, a widely used global DNA methylation assay. MSFP can be applied for the pre-screening of compounds that influence global DNA methylation states and also for the diagnosis of certain types of cancer.
Motivation: Calculating the magnitude of treatment effects or of differences between two groups is a common task in quantitative science. Standard effect size measures based on differences, such as the commonly used Cohen's d, fail to capture the treatment-related effects on the data if the effects are not reflected by the central tendency. The present work aims at (i) developing a non-parametric alternative to Cohen’s d, which (ii) circumvents some of its numerical limitations and (iii) involves obvious changes in the data that do not affect the group means and are therefore not captured by Cohen’s d.
Results: We propose “Impact” as a novel non-parametric measure of effect size obtained as the sum of two separate components: (i) a difference-based effect size measure implemented as the change in the central tendency of the group-specific data normalized to pooled variability and (ii) a data distribution shape-based effect size measure implemented as the difference in probability density of the group-specific data. Results obtained on artificial and empirical data showed that “Impact” is superior to Cohen's d by its additional second component in detecting clearly visible effects not reflected in central tendencies. The proposed effect size measure is invariant to the scaling of the data, reflects changes in the central tendency in cases where differences in the shape of probability distributions between subgroups are negligible, but captures changes in probability distributions as effects and is numerically stable even if the variances of the data set or its subgroups disappear.
Conclusions: The proposed effect size measure shares the ability to observe such an effect with machine learning algorithms. Therefore, the proposed effect size measure is particularly well suited for data science and artificial intelligence-based knowledge discovery from big and heterogeneous data.
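The limitation of Cohen's d described above can be illustrated with a minimal sketch. It uses the standard pooled-variance definition of Cohen's d and two small hypothetical groups that share the same mean but differ clearly in shape. It does not implement the proposed "Impact" measure, whose exact formula is not given here.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Hypothetical data: both groups have mean 5, but very different shapes.
unimodal = [4, 5, 5, 5, 6]
bimodal = [1, 1, 5, 9, 9]

d = cohens_d(unimodal, bimodal)
print(d)  # the mean difference is zero, so d is zero despite the visible change
```

Note that the division by the pooled standard deviation fails when both group variances are zero, which is one of the numerical limitations the abstract refers to.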
In the context of data science, data projection and clustering are common procedures. The chosen analysis method is crucial to avoid faulty pattern recognition. It is therefore necessary to know the properties and especially the limitations of projection and clustering algorithms. This report describes a collection of datasets that are grouped together in the Fundamental Clustering and Projection Suite (FCPS). The FCPS contains 10 datasets with the names "Atom", "Chainlink", "EngyTime", "Golfball", "Hepta", "Lsun", "Target", "Tetra", "TwoDiamonds", and "WingNut". Common clustering methods occasionally identified non-existent clusters or assigned data points to the wrong clusters in the FCPS suite. Likewise, common data projection methods could only partially reproduce the data structure correctly on a two-dimensional plane. In conclusion, the FCPS dataset collection addresses general challenges for clustering and projection algorithms such as lack of linear separability, different or small inner class spacing, classes defined by data density rather than data spacing, no cluster structure at all, outliers, or classes that are in contact. The FCPS is designed to address specific problems of structure discovery in high-dimensional spaces.
Pain and pain chronification are incompletely understood and unresolved medical problems that continue to have a high prevalence. It has been accepted that pain is a complex phenomenon. Contemporary methods of computational science can use complex clinical and experimental data to better understand the complexity of pain. Among data science techniques, machine learning is referred to as a set of methods that can automatically detect patterns in data and then use the uncovered patterns to predict or classify future data, to observe structures such as subgroups in the data, or to extract information from the data suitable to derive new knowledge. Together with (bio)statistics, artificial intelligence and machine learning aim at learning from data. ...
Finding subgroups in biomedical data is a key task in biomedical research and precision medicine. Even one-dimensional data, such as many different readouts from cell experiments, preclinical or human laboratory experiments or clinical signs, often reveal a more complex distribution than a single mode. Gaussian mixtures play an important role in the multimodal distribution of one-dimensional data. However, although fitting of Gaussian mixture models (GMM) is often aimed at obtaining the separate modes composing the mixture, current technical implementations, often using the Expectation Maximization (EM) algorithm, are not optimized for this task. This occasionally results in poorly separated modes that are unsuitable for determining a distinguishable group structure in the data. Here, we introduce “Distribution Optimization”, an evolutionary algorithm for GMM fitting that uses an adjustable error function that is based on chi-square statistics and the probability density. The algorithm can be directly targeted at the separation of the modes of the mixture by employing an additional criterion for the degree by which single modes overlap. The obtained GMM fits were comparable with those obtained with classical EM-based fits, except for data sets where the EM algorithm produced unsatisfactory results with overlapping Gaussian modes. There, the proposed algorithm successfully separated the modes, providing a basis for meaningful group separation while fitting the data satisfactorily. Through its optimization toward mode separation, the evolutionary algorithm proved a particularly suitable basis for group separation in multimodally distributed data, outperforming alternative EM-based methods.
BACKGROUND: Micro-RNAs (miRNAs) are attributed the systems-biological role of a regulatory mechanism of the expression of protein-coding genes. Research has identified miRNA dysregulations in several distinct pathophysiological processes, which hints at distinct systems-biology functions of miRNAs. The present analysis approached the role of miRNAs from a genomics perspective and assessed the biological roles of 2954 genes and 788 human miRNAs, which can be considered to interact, based on empirical evidence and computational predictions of miRNA versus gene interactions.
RESULTS: From a genomics perspective, the biological processes in which the genes that are influenced by miRNAs are involved comprise six major topics: biological regulation, cellular metabolism, information processing, development, gene expression and tissue homeostasis. The usage of this knowledge as a guidance for further research is sketched for two genetically defined functional areas: cell death and gene expression. Results suggest that the latter points to a fundamental role of miRNAs consisting of hyper-regulation of gene expression, i.e., the control of the expression of genes that themselves specifically control the expression of genes.
CONCLUSIONS: Laboratory research identified contributions of miRNA regulation to several distinct biological processes. The present analysis transferred this knowledge to a systems-biology level. A comprehensible and precise description of the biological processes in which the genes that are influenced by miRNAs are notably involved could be made. This knowledge can be employed to guide future research concerning the biological role of miRNA (dys-) regulations. The analysis also suggests that miRNAs especially control the expression of genes that control the expression of genes.
Computed ABC analysis for rational selection of most informative variables in multivariate data
(2015)
Objective: Multivariate data sets often differ in several factors or derived statistical parameters, which have to be selected for a valid interpretation. Basing this selection on traditional statistical limits leads occasionally to the perception of losing information from a data set. This paper proposes a novel method for calculating precise limits for the selection of parameter sets.
Methods: The algorithm is based on an ABC analysis and calculates these limits on the basis of the mathematical properties of the distribution of the analyzed items. The limits implement the aim of any ABC analysis, i.e., comparing the increase in yield to the required additional effort. In particular, the limit for set A, the "important few", is optimized in such a way that both the effort and the yield for the other sets (B and C) are minimized and the additional gain is optimized.
Results: As a typical example from biomedical research, the feasibility of the ABC analysis as an objective replacement for classical subjective limits to select highly relevant variance components of pain thresholds is presented. The proposed method improved the biological interpretation of the results and increased the fraction of valid information that was obtained from the experimental data.
Conclusions: The method is applicable to many further biomedical problems including the creation of diagnostic complex biomarkers or short screening tests from comprehensive test batteries. Thus, the ABC analysis can be proposed as a mathematically valid replacement for traditional limits to maximize the information obtained from multivariate research data.
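The idea of weighing yield against effort can be sketched as follows. The abstract does not specify how the limits are computed, so the criterion used here, taking set A up to the point of the cumulative yield curve that lies closest to the ideal of zero effort and full yield, is an assumption made for illustration. The function names and the input values are hypothetical.

```python
from math import hypot


def abc_curve(values):
    """Return the items sorted from largest to smallest together with
    (effort, yield) points: fraction of items vs. fraction of the total sum."""
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    points, running = [], 0.0
    for i, v in enumerate(ordered, start=1):
        running += v
        points.append((i / len(ordered), running / total))
    return ordered, points


def set_a(values):
    """Select set A as the items up to the curve point closest to (0, 1).
    The distance criterion is an assumption of this sketch."""
    ordered, points = abc_curve(values)
    best = min(
        range(len(points)),
        key=lambda i: hypot(points[i][0], 1.0 - points[i][1]),
    )
    return ordered[: best + 1]


# Hypothetical non-negative importance values of nine variables.
importances = [50, 30, 8, 4, 3, 2, 1, 1, 1]
selected = set_a(importances)
print(selected)  # the two largest items carry 80% of the total
```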
Process pharmacology : a pharmacological data science approach to drug development and therapy
(2016)
A novel functional-genomics based concept of pharmacology that uses artificial intelligence techniques for mining and knowledge discovery in "big data" providing comprehensive information about the drugs’ targets and their functional genomics is proposed. In “process pharmacology”, drugs are associated with biological processes. This puts the disease, regarded as alterations in the activity in one or several cellular processes, in the focus of drug therapy. In this setting, the molecular drug targets are merely intermediates. The identification of drugs for therapeutic use or repurposing is based on similarities in the high-dimensional space of the biological processes that a drug influences. Applying this principle to data associated with lymphoblastic leukemia identified a short list of candidate drugs, including one that was recently proposed as novel rescue medication for lymphocytic leukemia. The pharmacological data science approach provides successful selections of drug candidates within development and repurposing tasks.
The Gini index is a measure of the inequality of a distribution that can be derived from Lorenz curves. While commonly used in, e.g., economic research, it suffers from ambiguity via lack of Lorenz dominance preservation. Here, investigation of large sets of empirical distributions of incomes of the World’s countries over several years indicated firstly, that the Gini indices are centered on a value of 33.33% corresponding to the Gini index of the uniform distribution and secondly, that the Lorenz curves of these distributions are consistent with Lorenz curves of log-normal distributions. This can be employed to provide a Lorenz dominance preserving equivalent of the Gini index. Therefore, a modified measure based on log-normal approximation and standardization of Lorenz curves is proposed. The so-called UGini index provides a meaningful and intuitive standardization on the uniform distribution as this characterizes societies that provide equal chances. The novel UGini index preserves Lorenz dominance. Analysis of the probability density distributions of the UGini index of the World’s countries’ income data indicated multimodality in two independent data sets. Applying Bayesian statistics provided a data-based classification of the World’s countries’ income distributions. The UGini index can be re-transferred into the classical index to preserve comparability with previous research.
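The reference value of 33.33% can be checked numerically. The sketch below computes the classical Gini index with the usual sorted-rank formula and applies it to a simulated uniform sample on the interval from 0 to 1. It covers only the classical index; the UGini standardization is not reproduced here.

```python
import random


def gini(values):
    """Classical Gini index of non-negative values (sorted-rank formula)."""
    x = sorted(values)
    n = len(x)
    weighted = sum((i + 1) * v for i, v in enumerate(x))
    return 2.0 * weighted / (n * sum(x)) - (n + 1) / n


rng = random.Random(42)  # fixed seed for reproducibility
uniform_sample = [rng.random() for _ in range(100_000)]

g_uniform = gini(uniform_sample)  # close to 1/3 for a uniform sample on [0, 1)
g_equal = gini([1.0] * 10)        # perfect equality gives 0
print(round(g_uniform, 3), g_equal)
```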
Background: The quantification of global DNA methylation has been established in epigenetic screening. As more practicable alternatives to the HPLC-based gold standard, the methylation analysis of CpG islands in repeatable elements (LINE-1) and the luminometric methylation assay (LUMA) of overall 5-methylcytosine content in “CCGG” recognition sites are most widely used. Both methods are applied as virtually equivalent, despite the hints that their results only partly agree. This triggered the present agreement assessments.
Results: Three different human cell types (cultured MCF7 and SHSY5Y cell lines treated with different chemical modulators of DNA methylation and whole blood drawn from pain patients and healthy volunteers) were submitted to the global DNA methylation assays employing LINE-1 or LUMA-based pyrosequencing measurements. The agreement between the two bioassays was assessed using generally accepted approaches to the statistics for laboratory method comparison studies. Although global DNA methylation levels measured by the two methods correlated, five different lines of statistical evidence consistently rejected the assumption of complete agreement. Specifically, a bias was observed between the two methods. In addition, both the magnitude and direction of bias were tissue-dependent. Interassay differences could be grouped based on Bayesian statistics, and these groups allowed in turn to re-identify the originating tissue.
Conclusions: Although providing partly correlated measurements of DNA methylation, interchangeability of the quantitative results obtained with LINE-1 and LUMA was jeopardized by a consistent bias between the results. Moreover, the present analyses strongly indicate a tissue specificity of the differences between the two methods.
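Correlation without agreement can be made concrete with a short sketch. The abstract does not name the individual statistics used, so the Bland-Altman bias with 95% limits of agreement shown here is an assumption, chosen as one widely used approach to method comparison. The paired values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return the mean difference (bias) and the 95% limits of agreement."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical paired measurements (percent methylation) of the same samples.
line1 = [70.0, 72.0, 68.0, 75.0, 71.0]
luma = [65.0, 66.0, 64.0, 69.0, 66.0]

bias, lower, upper = bland_altman(line1, luma)
print(bias, lower, upper)
```

In this made-up example the two series rise and fall together, yet the limits of agreement exclude zero, which is the pattern of correlated but not interchangeable results that the abstract describes.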
Advances in flow cytometry enable the acquisition of large and high-dimensional data sets per patient. Novel computational techniques allow the visualization of structures in these data and, finally, the identification of relevant subgroups. Correct data visualizations and projections from the high-dimensional space to the visualization plane require the correct representation of the structures in the data. This work shows that frequently used techniques are unreliable in this respect. One of the most important methods for data projection in this area is the t-distributed stochastic neighbor embedding (t-SNE). We analyzed its performance on artificial and real biomedical data sets. t-SNE introduced a cluster structure for homogeneously distributed data that did not contain any subgroup structure. In other data sets, t-SNE occasionally suggested the wrong number of subgroups or projected data points belonging to different subgroups as if belonging to the same subgroup. As an alternative approach, emergent self-organizing maps (ESOM) were used in combination with U-matrix methods. This approach allowed the correct identification of homogeneous data, while in sets containing distance- or density-based subgroup structures, the number of subgroups and data point assignments were correctly displayed. The results highlight possible pitfalls in the use of a currently widely applied algorithmic technique for the detection of subgroups in high dimensional cytometric data and suggest a robust alternative.
Computational analyses of functions of gene sets obtained in microarray analyses or by topical database searches are increasingly important in biology. To understand their functions, the sets are usually mapped to Gene Ontology knowledge bases by means of over-representation analysis (ORA). Its result represents the specific knowledge of the functionality of the gene set. However, the specific ontology typically consists of many terms and relationships, hindering the understanding of the ‘main story’. We developed a methodology to identify a comprehensibly small number of GO terms as “headlines” of the specific ontology, allowing all central aspects of the roles of the involved genes to be understood. The Functional Abstraction method finds a set of headlines that is specific enough to cover all details of a specific ontology and is abstract enough for human comprehension. This method exceeds the classical approaches at ORA abstraction and, by focusing on information rather than decorrelation of GO terms, it directly targets human comprehension. Functional abstraction provides, with a maximum of certainty, information value, coverage and conciseness, a representation of the biological functions in which a gene set plays a role. This is the necessary means to interpret complex Gene Ontology results, thus strengthening the role of functional genomics in biomarker and drug discovery.
The measurement of concentrations of drugs and endogenous substances is widely used in basic and clinical pharmacology research and service tasks. Using data science‐derived visualizations of laboratory data, it is demonstrated on a real‐life example that basic statistical exploration of laboratory assay results or advised standard visual methods of data inspection may fall short in detecting systematic laboratory errors. For example, data pathologies such as generating always the same value in all probes of a particular assay run may pass undetected when using standard methods of data quality check. It is shown that the use of different data visualizations that emphasize different views of the data may enhance the detection of systematic laboratory errors. A dotplot of single data in the order of assay is proposed that provides an overview on the data range, outliers and a particular type of systematic errors where similar values are wrongly measured in all probes.
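The data pathology mentioned above, identical values in consecutive probes of an assay run, can also be flagged programmatically. The following is a minimal sketch that complements, but does not replace, the proposed dotplot. The function name, the threshold `min_length`, and the example values are hypothetical.

```python
from itertools import groupby


def constant_runs(values, min_length=3):
    """Return (start_index, length, value) for each run of identical
    consecutive values that is at least `min_length` long."""
    runs, start = [], 0
    for value, group in groupby(values):
        length = len(list(group))
        if length >= min_length:
            runs.append((start, length, value))
        start += length
    return runs


# Hypothetical concentrations listed in the order in which they were assayed.
concentrations = [1.2, 0.8, 1.5, 2.0, 2.0, 2.0, 2.0, 0.9, 1.1]
flagged = constant_runs(concentrations)
print(flagged)  # one run of four identical values starting at index 3
```

The values must be kept in assay order for this check to be meaningful, which is the same requirement as for the dotplot.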
The presence of cerebral lesions in patients with neurosensory alterations provides a unique window into brain function. Using a fuzzy logic based combination of morphological information about 27 olfactory-eloquent brain regions acquired with four different brain imaging techniques, patterns of brain damage were analyzed in 127 patients who displayed anosmia, i.e., complete loss of the sense of smell (n = 81), or other and mechanistically still incompletely understood olfactory dysfunctions including parosmia, i.e., distorted perceptions of olfactory stimuli (n = 50), or phantosmia, i.e., olfactory hallucinations (n = 22). A higher prevalence of parosmia, and as a tendency also phantosmia, was observed in subjects with medium overall brain damage. Further analysis showed a lower frequency of lesions in the right temporal lobe in patients with parosmia than in patients without parosmia. This negative direction of the differences was unique for parosmia. In anosmia, and also in phantosmia, lesions were more frequent in patients displaying the respective symptoms than in those without these dysfunctions. In anosmic patients, lesions in the right olfactory bulb region were much more frequent than in patients with preserved sense of smell, whereas a higher frequency of carriers of lesions in the left frontal lobe was observed for phantosmia. We conclude that anosmia, and phantosmia, are the result of lost function in relevant brain areas whereas parosmia is more complex, requiring damaged and intact brain regions at the same time.
Background: High-dimensional biomedical data are frequently clustered to identify subgroup structures pointing at distinct disease subtypes. It is crucial that the cluster algorithm used works correctly. However, by imposing a predefined shape on the clusters, classical algorithms occasionally suggest a cluster structure in homogeneously distributed data or assign data points to incorrect clusters. We analyzed whether this can be avoided by using emergent self-organizing feature maps (ESOM).
Methods: Data sets with different degrees of complexity were submitted to ESOM analysis with large numbers of neurons, using an interactive R-based bioinformatics tool. On top of the trained ESOM the distance structure in the high dimensional feature space was visualized in the form of a so-called U-matrix. Clustering results were compared with those provided by classical common cluster algorithms including single linkage, Ward and k-means.
Results: Ward clustering imposed cluster structures on cluster-less "golf ball", "cuboid" and "S-shaped" data sets that contained no structure at all (random data). Ward clustering also imposed structures on permuted real world data sets. By contrast, the ESOM/U-matrix approach correctly found that these data contain no cluster structure. At the same time, ESOM/U-matrix was correct in identifying clusters in biomedical data truly containing subgroups. It was always correct in cluster structure identification in further canonical artificial data. Using intentionally simple data sets, it is shown that popular clustering algorithms typically used for biomedical data sets may fail to cluster data correctly, suggesting that they are also likely to perform erroneously on high dimensional biomedical data.
Conclusions: The present analyses emphasized that generally established classical hierarchical clustering algorithms carry a considerable tendency to produce erroneous results. By contrast, unsupervised machine-learned analysis of cluster structures, applied using the ESOM/U-matrix method, is a viable, unbiased method to identify true clusters in the high-dimensional space of complex data.
Graphical abstract: 3-D representation of high dimensional data following ESOM projection and visualization of group (cluster) structures using the U-matrix, which employs a geographical map analogy of valleys where members of the same cluster are located, separated by mountain ranges marking cluster borders.
Biomedinformatics: A New Journal for the New Decade to Publish Biomedical Informatics Research
(2021)
With this volume, the peer-reviewed open access journal Biomedinformatics, published online on the website https://www.mdpi.com/journal/biomedinformatics and bearing the current International Standard Serial Number ISSN 2673-7426, enters the scientific community. At the beginning of the 3rd decade of the 21st century, this new journal is dedicated to research reports in the field of biomedical informatics. Biomedinformatics appears at a time when computational methods have reached clinical practice and the transformation to digital medicine is accelerating. Both digitized healthcare and bioinformatics-based research are producing and benefiting from increasingly complex data. This requires the development of tools and methods to extract information from these data and translate it into new knowledge. While biomedical research continues to require clinical and experimental data collection, digital healthcare research has clearly evolved from a collection of supporting methods to an equivalent scientific approach, enabling a paradigm shift from almost exclusively hypothesis-driven approaches to increasingly data-driven biomedical research. Indeed, computational science is a rapidly growing multidisciplinary field that uses advanced computational capabilities to understand and solve complex problems by applying new methods of computational intelligence, machine learning, and advanced statistics [1].
Optimal distribution-preserving downsampling of large biomedical data sets (opdisDownsampling)
(2021)
Motivation: The size of today’s biomedical data sets pushes computer equipment to its limits, even for seemingly standard analysis tasks such as data projection or clustering. Reducing large biomedical data by downsampling is therefore a common early step in data processing, often performed as random uniform class-proportional downsampling. In this report, we hypothesized that this can be optimized to obtain samples that better reflect the entire data set than those obtained using the current standard method.
Results: By repeating the random sampling and comparing the distribution of the drawn sample with the distribution of the original data, it was possible to establish a method for obtaining subsets of data that better reflect the entire data set than taking only the first randomly selected subsample, as is the current standard. Experiments on artificial and real biomedical data sets showed that the reconstruction of the remaining data from the original data set from the downsampled data improved significantly. This was observed with both principal component analysis and autoencoding neural networks. The fidelity was dependent on both the number of cases drawn from the original and the number of samples drawn.
Conclusions: Optimal distribution-preserving class-proportional downsampling yields data subsets that reflect the structure of the entire data better than those obtained with the standard method. By using distributional similarity as the only selection criterion, the proposed method does not in any way affect the results of a later planned analysis.
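The repeat-and-compare idea described above can be sketched in a few lines. This is a simplified, one-dimensional and single-class illustration, not the opdisDownsampling implementation. The abstract does not say which distance between distributions is used, so the Kolmogorov-Smirnov statistic is an assumption, and all function names and data are hypothetical.

```python
import random
from bisect import bisect_right


def ks_distance(sample, reference):
    """Largest gap between the empirical CDFs of two one-dimensional samples."""
    s, r = sorted(sample), sorted(reference)
    return max(
        abs(bisect_right(s, x) / len(s) - bisect_right(r, x) / len(r))
        for x in s + r
    )


def best_downsample(data, size, trials, seed=0):
    """Draw `trials` random subsamples and keep the one whose distribution
    is closest to that of the full data set."""
    rng = random.Random(seed)
    distances, best = [], None
    for _ in range(trials):
        candidate = rng.sample(data, size)
        distance = ks_distance(candidate, data)
        if best is None or distance < min(distances):
            best = candidate
        distances.append(distance)
    return best, distances


# Hypothetical bimodal data set.
gen = random.Random(1)
data = [gen.gauss(0, 1) for _ in range(2000)] + [gen.gauss(5, 1) for _ in range(1000)]

sample, distances = best_downsample(data, size=100, trials=50)
print(distances[0], min(distances))  # first draw vs. best of all draws
```

The first entry of `distances` corresponds to the standard practice of keeping the first random subsample, so comparing it with the minimum shows how much is gained by repeating the draw. A class-proportional version would apply the same selection within each class.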
Chronic rhinosinusitis (CRS) is often treated by functional endoscopic paranasal sinus surgery, which improves endoscopic parameters and quality of life, while olfactory function was suggested as a further criterion of treatment success. In a prospective cohort study, 37 parameters from four categories were recorded from 60 men and 98 women before and four months after endoscopic sinus surgery, including endoscopic measures of nasal anatomy/pathology, assessments of olfactory function, quality of life, and socio-demographic or concomitant conditions. Parameters containing relevant information about changes associated with surgery were examined using unsupervised and supervised methods, including machine-learning techniques for feature selection. The analyzed cohort included 52 men and 38 women. Changes in the endoscopic Lildholdt score allowed separation of baseline from postoperative data with a cross-validated accuracy of 85%. Further relevant information included primary nasal symptoms from SNOT-20 assessments, and self-assessments of olfactory function. Overall improvement in these relevant parameters was observed in 95% of patients. A ranked list of criteria was developed as a proposal to assess the outcome of functional endoscopic sinus surgery in CRS patients with nasal polyposis. Three different facets were captured, including the Lildholdt score as an endoscopic measure and, in addition, disease-specific quality of life and subjectively perceived olfactory function.
Olfactory self-assessments have been analyzed with often negative but also positive conclusions about their usefulness as a surrogate for sensory olfactory testing. Patients with nasal polyposis have been highlighted as a well-predisposed group for reliable self-assessment. In a prospective cohort of n = 156 nasal polyposis patients, olfactory threshold, odor discrimination, and odor identification were tested using the “Sniffin’ Sticks” test battery, along with self-assessments of olfactory acuity on a numerical rating scale with seven named items or on a 10-point scale with only the extremes named. Apparent highly significant correlations in the complete cohort proved to reflect the group differences in olfactory diagnoses of anosmia (n = 65), hyposmia (n = 74), and normosmia (n = 17), more than the true correlations of self-ratings with olfactory test results, which were mostly very weak. The olfactory self-ratings correlated only weakly with a quality of life score. By contrast, olfactory self-ratings proved informative in assigning the categorical olfactory diagnosis. Using an olfactory diagnostic instrument, which consists of a mapping rule of two numerical rating scales of one’s olfactory function to the olfactory functional diagnosis based on the “Sniffin’ Sticks” clinical test battery, the diagnoses of anosmia, hyposmia, or normosmia could be derived from the self-ratings at a satisfactorily balanced accuracy of about 80%. It remains to be seen whether this approach of translating self-assessments into olfactory diagnoses of anosmia, hyposmia, and normosmia can be generalized to other clinical cohorts in which olfaction plays a role.
Recent scientific evidence suggests that chronic pain phenotypes are reflected in metabolomic changes. However, problems associated with chronic pain, such as sleep disorders or obesity, may complicate the metabolome pattern. Such a complex phenotype was investigated to identify common metabolomics markers at the interface of persistent pain, sleep, and obesity in 71 men and 122 women undergoing tertiary pain care. They were examined for patterns in d = 97 metabolomic markers that segregated patients with a relatively benign pain phenotype (low and little bothersome pain) from those with more severe clinical symptoms (high pain intensity, more bothersome pain, and co-occurring problems such as sleep disturbance). Two independent lines of data analysis were pursued. First, a data-driven supervised machine learning-based approach was used to identify the most informative metabolic markers for complex phenotype assignment. This pointed primarily at adenosine monophosphate (AMP), asparagine, deoxycytidine, glucuronic acid, and propionylcarnitine, and secondarily at cysteine and nicotinamide adenine dinucleotide (NAD) as informative for assigning patients to clinical pain phenotypes. After this, a hypothesis-driven analysis of metabolic pathways was performed, including sleep and obesity. In both the first and second line of analysis, three metabolic markers (NAD, AMP, and cysteine) were found to be relevant, including metabolic pathway analysis in obesity, associated with changes in amino acid metabolism, and sleep problems, associated with downregulated methionine metabolism. Taken together, present findings provide evidence that metabolomic changes associated with co-occurring problems may play a role in the development of severe pain. Co-occurring problems may influence each other at the metabolomic level. 
Because the methionine and glutathione metabolic pathways are physiologically linked, sleep problems appear to be associated with the first metabolic pathway, whereas obesity may be associated with the second.
Motivation: Gaussian mixture models (GMMs) are probabilistic models commonly used in biomedical research to detect subgroup structures in data sets with one-dimensional information. Reliable model parameterization requires that the number of modes, i.e., states of the generating process, is known. However, this is rarely the case for empirically measured biomedical data. Several implementations are available that estimate GMM parameters differently. This work aims to provide a comparative evaluation of automated GMM fitting methods.
Results and conclusions: The performance of commonly used algorithms for automatic parameterization and mode number determination was compared with respect to reproducing the ground truth of generated data derived from multiple normal distributions. Four main variants of Gaussian mode number detection algorithms and five variants of GMM parameter estimation methods were tested in a combinatory scenario. The combination of the best-performing mode number determination algorithms and GMM parameter estimation methods was then tested on artificial and real-life data sets known to display a GMM structure. None of the tested methods consistently determined the underlying data structure correctly. The likelihood ratio test had the best performance in identifying the mode number associated with the best GMM fit of the data distribution, while the Markov chain Monte Carlo (MCMC) algorithm was best for GMM parameter estimation. The combination of these two methods for mode number determination and GMM parameter estimation was consistently among the best and overall outperformed the available implementations.
Implementation: An automated tool for the detection of GMM based structures in (biomedical) datasets was created based on the present results and made freely available in the R library “opGMMassessment” at https://cran.r-project.org/package=opGMMassessment.
Knowledge discovery in biomedical data using supervised methods assumes that the data contain structure relevant to the class structure if a classifier can be trained to assign a case to the correct class better than by guessing. In this setting, acceptance or rejection of a scientific hypothesis may depend critically on the ability to classify cases better than randomly, without high classification performance being the primary goal. Random forests are often chosen for knowledge-discovery tasks because they are considered a powerful classifier that does not require sophisticated data transformation or hyperparameter tuning and can be regarded as a reference classifier for tabular numerical data. Here, we report a case where the failure of random forests using the default hyperparameter settings in the standard implementations of R and Python would have led to the rejection of the hypothesis that the data contained structure relevant to the class structure. After tuning the hyperparameters, classification performance increased from 56% to 65% balanced accuracy in R, and from 55% to 67% balanced accuracy in Python. More importantly, the 95% confidence intervals in the tuned versions were to the right of the value of 50% that characterizes guessing-level classification. Thus, tuning provided the desired evidence that the data structure supported the class structure of the data set. In this case, the tuning made more than a quantitative difference in the form of slightly better classification accuracy; it significantly changed the interpretation of the data set. This is especially true when classification performance is low and a small improvement raises the balanced accuracy above the 50% level that corresponds to guessing.
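The decision criterion described above, a 95% confidence interval of the balanced accuracy that lies entirely above 50%, can be sketched as follows. This is a minimal illustration and not the authors' analysis: the labels and predictions are invented, the percentile bootstrap is one assumed way of obtaining the interval, and the function names are hypothetical.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def bootstrap_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=42):
    """Approximate percentile bootstrap interval (an assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        t = [y_true[i] for i in idx]
        # Skip resamples that lack one class; sensitivity or specificity
        # would be undefined for them.
        if 0 < sum(t) < n:
            p = [y_pred[i] for i in idx]
            stats.append(balanced_accuracy(t, p))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented example: 50 cases per class, 35 and 32 of them classified correctly.
y_true = [1] * 50 + [0] * 50
y_pred = [1] * 35 + [0] * 15 + [0] * 32 + [1] * 18

ba = balanced_accuracy(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred)
better_than_guessing = lower > 0.5  # the criterion used in the text
print(f"balanced accuracy {ba:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

The point of the sketch is that the hypothesis test rests on the lower interval bound rather than on the point estimate, which is why a modest gain from tuning can change the conclusion.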
The evaluation of pharmacological data using machine learning requires high data quality. Therefore, data preprocessing, that is, cleaning analytical laboratory errors, replacing missing values or outliers, and transforming data adequately before actual data analysis, is crucial. Because current tools available for this purpose often require programming skills, preprocessing tools with graphical user interfaces that can be used interactively are needed. In collaboration between data scientists and experts in bioanalytical diagnostics, a graphical software package for data preprocessing called pguIMP is proposed, which contains a fixed sequence of preprocessing steps to enable reproducible interactive data preprocessing. As an R-based package, it also allows direct integration into this data science environment without requiring any programming knowledge. The implementation of contemporary data processing methods, including machine-learning-based imputation techniques, ensures the generation of corrected and cleaned bioanalytical data sets that preserve data structures such as clusters better than is possible with classical methods. This was evaluated on bioanalytical data sets from lipidomics and drug research using k-nearest-neighbors-based imputation followed by k-means clustering and density-based spatial clustering of applications with noise. The R package provides a Shiny-based web interface designed to be easy to use for non–data analysis experts. It is demonstrated that the spectrum of methods provided is suitable as a standard pipeline for preprocessing bioanalytical data in biomedical research domains. The R package pguIMP is freely available at the comprehensive R archive network (https://cran.r-project.org/web/packages/pguIMP/index.html).
Bayesian inference is ubiquitous in science and widely used in biomedical research such as cell sorting or “omics” approaches, as well as in machine learning (ML), artificial neural networks, and “big data” applications. However, the calculation is not robust in regions of low evidence. In cases where one group has a lower mean but a higher variance than another group, new cases with larger values are implausibly assigned to the group with typically smaller values. An approach for a robust extension of Bayesian inference is proposed that proceeds in two main steps starting from the Bayesian posterior probabilities. First, cases with low evidence are labeled as “uncertain” class membership. The boundary for low probabilities of class assignment (threshold ε) is calculated using a computed ABC analysis as a data-based technique for item categorization. This leaves a number of cases with uncertain classification (p < ε). Second, cases with uncertain class membership are relabeled based on the distance to neighboring classified cases based on Voronoi cells. The approach is demonstrated on biomedical data typically analyzed with Bayesian statistics, such as flow cytometric data sets or biomarkers used in medical diagnostics, where it increased the class assignment accuracy by 1–10% depending on the data set. The proposed extension of the Bayesian inference of class membership can be used to obtain robust and plausible class assignments even for data at the extremes of the distribution and/or for which evidence is weak.
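The implausible assignment described above can be reproduced with two Gaussian classes. The sketch below only illustrates the problem, not the proposed extension (the ABC-based threshold ε and the Voronoi relabeling are not implemented). The group parameters and equal priors are assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_a(x, mean_a, sd_a, mean_b, sd_b, prior_a=0.5):
    """Posterior probability of class A for a value x via Bayes' theorem."""
    weighted_a = prior_a * normal_pdf(x, mean_a, sd_a)
    weighted_b = (1.0 - prior_a) * normal_pdf(x, mean_b, sd_b)
    return weighted_a / (weighted_a + weighted_b)


# Hypothetical groups: A has the lower mean but the higher variance.
MEAN_A, SD_A = 0.0, 3.0
MEAN_B, SD_B = 4.0, 1.0

p_mid = posterior_a(5.0, MEAN_A, SD_A, MEAN_B, SD_B)   # close to the mean of B
p_far = posterior_a(10.0, MEAN_A, SD_A, MEAN_B, SD_B)  # far above both means

print(f"P(A | x=5)  = {p_mid:.3f}")
print(f"P(A | x=10) = {p_far:.3f}")
```

At x = 5 the posterior favors B, as expected. At x = 10, however, the heavier tail of A dominates and the case is assigned to the group with the typically smaller values, although both densities are very small there, which is the low-evidence situation the text refers to.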
Background: Transient receptor potential cation channel subfamily V member 1 (TRPV1) channels are sensitive to heat, capsaicin, pungent chemicals and other noxious stimuli. They play important roles in the pain pathway where, in concert with proinflammatory factors such as leukotrienes, they mediate sensitization and hyperalgesia. TRPV1 is the target of several novel analgesic drugs under development and therefore, TRPV1 genetic variants might represent promising candidates for pharmacogenetic modulators of drug effects.
Methods: A next-generation sequencing (NGS) panel was created for the human TRPV1 gene and in addition, for the leukotriene receptors BLT1 and BLT2 recently described to modulate TRPV1 mediated sensitisation processes rendering the coding genes LTB4R and LTB4R2 important co-players in pharmacogenetic approaches involving TRPV1. The NGS workflow was based on a custom AmpliSeq™ panel and designed for sequencing of human genes on an Ion PGM™ Sequencer. A cohort of 80 healthy subjects of Western European descent was screened to evaluate and validate the detection of exomic sequences of the coding genes with 25 base pair exon padding.
Results: The amplicons covered approximately 97% of the target sequence. A median of 2.81 × 10^6 reads per run was obtained. This identified approximately 140 chromosome loci where nucleotides deviated from the reference sequence GRCh37 hg19 comprising the three genes TRPV1, LTB4R and LTB4R2. Correspondence between NGS and Sanger derived nucleotide sequences was 100%.
Conclusions: Results suggested that the NGS approach based on AmpliSeq™ libraries and Ion Personal Genome Machine (PGM) sequencing is a highly efficient mutation detection method. It is suitable for large-scale sequencing of TRPV1 and functionally related genes. The method adds a large amount of genetic information as a basis for complete analysis of TRPV1 ion channel genetics and its functional consequences.
Based on accumulating evidence of a role of lipid signaling in many physiological and pathophysiological processes including psychiatric diseases, the present data driven analysis was designed to gather information needed to develop a prospective biomarker, using a targeted lipidomics approach covering different lipid mediators. Using unsupervised methods of data structure detection, implemented as hierarchal clustering, emergent self-organizing maps of neuronal networks, and principal component analysis, a cluster structure was found in the input data space comprising plasma concentrations of d = 35 different lipid-markers of various classes acquired in n = 94 subjects with the clinical diagnoses depression, bipolar disorder, ADHD, dementia, or in healthy controls. The structure separated patients with dementia from the other clinical groups, indicating that dementia is associated with a distinct lipid mediator plasma concentrations pattern possibly providing a basis for a future biomarker. This hypothesis was subsequently assessed using supervised machine-learning methods, implemented as random forests or principal component analysis followed by computed ABC analysis used for feature selection, and as random forests, k-nearest neighbors, support vector machines, multilayer perceptron, and naïve Bayesian classifiers to estimate whether the selected lipid mediators provide sufficient information that the diagnosis of dementia can be established at a higher accuracy than by guessing. This succeeded using a set of d = 7 markers comprising GluCerC16:0, Cer24:0, Cer20:0, Cer16:0, Cer24:1, C16 sphinganine, and LacCerC16:0, at an accuracy of 77%. 
By contrast, using random lipid markers reduced the diagnostic accuracy to values of 65% or less, whereas training the algorithms with randomly permuted data was followed by complete failure to diagnose dementia, emphasizing that the selected lipid mediators display a particular pattern in this disease, possibly qualifying as biomarkers.
Euclidean distance-optimized data transformation for cluster analysis in biomedical data (EDOtrans)
(2022)
Background: Data transformations are commonly used in bioinformatics data processing in the context of data projection and clustering. The most used Euclidean metric is not scale invariant and therefore occasionally inappropriate for complex, e.g., multimodal distributed variables and may negatively affect the results of cluster analysis. Specifically, the squaring function in the definition of the Euclidean distance as the square root of the sum of squared differences between data points has the consequence that the value 1 implicitly defines a limit for distances within clusters versus distances between (inter-) clusters.
Methods: The Euclidean distances within a standard normal distribution (N(0,1)) follow a N(0, √2) distribution. The EDO-transformation of a variable X is proposed as EDO = X/(√2 · s) following modeling of the standard deviation s by a mixture of Gaussians and selecting the dominant modes via item categorization. The method was compared in artificial and biomedical datasets with clustering of untransformed data, z-transformed data, and the recently proposed pooled variable scaling.
Results: A simulation study and applications to known real data examples showed that the proposed EDO scaling method is generally useful. The clustering results in terms of cluster accuracy, adjusted Rand index and Dunn’s index outperformed the classical alternatives. Finally, the EDO transformation was applied to cluster a high-dimensional genomic dataset consisting of gene expression data for multiple samples of breast cancer tissues, and the proposed approach gave better results than classical methods and was compared with pooled variable scaling.
Conclusions: For multivariate procedures of data analysis, it is proposed to use the EDO transformation as a better alternative to the established z-standardization, especially for nontrivially distributed data. The “EDOtrans” R package is available at https://cran.r-project.org/package=EDOtrans.
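The scaling step EDO = X/(√2 · s) from the Methods can be sketched as follows. This is a simplified illustration, not the EDOtrans package: here s defaults to the plain population standard deviation of the variable, which is an assumption, whereas the abstract derives s from a Gaussian mixture model with selection of the dominant modes. The function name and example data are hypothetical.

```python
import math
import statistics


def edo_transform(values, s=None):
    """Scale a variable as X / (sqrt(2) * s).

    If s is not supplied, the population standard deviation of the values
    is used as a stand-in (simplifying assumption; the published method
    estimates s from the dominant modes of a Gaussian mixture).
    """
    if s is None:
        s = statistics.pstdev(values)
    scale = math.sqrt(2.0) * s
    return [v / scale for v in values]


# Invented example variable with population standard deviation 2.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
edo = edo_transform(data)

print([round(v, 3) for v in edo])
print(round(statistics.pstdev(edo), 4))
```

With this simplification the transformed variable has a standard deviation of 1/√2, so that differences between two independent values drawn from it have unit standard deviation. This matches the rationale given in the Background that the value 1 acts as an implicit limit between within-cluster and between-cluster distances.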
Background: In pain research and clinics, it is common practice to subgroup subjects according to shared pain characteristics. This is often achieved by computer‐aided clustering. In response to a recent EU recommendation that computer‐aided decision making should be transparent, we propose an approach that uses machine learning to provide (1) an understandable interpretation of a cluster structure to (2) enable a transparent decision process about why a person concerned is placed in a particular cluster.
Methods: Comprehensibility was achieved by transforming the interpretation problem into a classification problem: A sub‐symbolic algorithm was used to estimate the importance of each pain measure for cluster assignment, followed by an item categorization technique to select the relevant variables. Subsequently, a symbolic algorithm as explainable artificial intelligence (XAI) provided understandable rules of cluster assignment. The approach was tested using 100‐fold cross‐validation.
Results: The importance of the variables of the data set (6 pain‐related characteristics of 82 healthy subjects) changed with the clustering scenarios. The highest median accuracy was achieved by sub‐symbolic classifiers. A generalized post‐hoc interpretation of clustering strategies of the model led to a loss of median accuracy. XAI models were able to interpret the cluster structure almost as correctly, but with a slight loss of accuracy.
Conclusions: Assessing the variables' importance in clustering is important for understanding any cluster structure. XAI models are able to provide a human‐understandable interpretation of the cluster structure. Model selection must be adapted individually to the clustering problem. The advantage of comprehensibility comes at the expense of accuracy.
Background: Modulation of cortical excitability by transcranial magnetic stimulation (TMS) is used for investigating human brain functions. A common observation is the high variability of long-term depression (LTD)-like changes in human (motor) cortex excitability. This study aimed at analyzing the response subgroup distribution after paired continuous theta burst stimulation (cTBS) as a basis for subject selection.
Methods: The effects of paired cTBS using 80% active motor threshold (AMT) in 31 healthy volunteers were assessed at the primary motor cortex (M1) corresponding to the representation of the first dorsal interosseous (FDI) muscle of the left hand, before and up to 50 min after plasticity induction. The changes in motor evoked potentials (MEPs) were analyzed using machine-learning derived methods implemented as Gaussian mixture modeling (GMM) and computed ABC analysis.
Results: The probability density distribution of the MEP changes from baseline was tri-modal, showing a clear separation at 80.9%. Subjects displaying at least this degree of LTD-like changes were n = 6 responders. By contrast, n = 7 subjects displayed a paradox response with increase in MEP. Reassessment using ABC analysis as alternative approach led to the same n = 6 subjects as a distinct category.
Conclusion: Depressive effects of paired cTBS using 80% AMT endure at least 50 min, however, only in a small subgroup of healthy subjects. Hence, plasticity induction by paired cTBS might not reflect a general mechanism in human motor cortex excitability. A mathematically supported criterion is proposed to select responders for enrolment in assessments of human brain functional networks using virtual brain lesions.
Next-generation sequencing (NGS) provides unrestricted access to the genome, but it produces ‘big data’ exceeding in amount and complexity the classical analytical approaches. We introduce a bioinformatics-based classifying biomarker that uses emergent properties in genetics to separate pain patients requiring extremely high opioid doses from controls. Following precisely calculated selection of the 34 most informative markers in the OPRM1, OPRK1, OPRD1 and SIGMAR1 genes, pattern of genotypes belonging to either patient group could be derived using a k-nearest neighbor (kNN) classifier that provided a diagnostic accuracy of 80.6±4%. This outperformed alternative classifiers such as reportedly functional opioid receptor gene variants or complex biomarkers obtained via multiple regression or decision tree analysis. The accumulation of several genetic variants with only minor functional influences may result in a qualitative consequence affecting complex phenotypes, pointing at emergent properties in genetics.
Background: Many gene variants modulate the individual perception of pain and possibly also its persistence. The limited selection of single functional variants is increasingly being replaced by analyses of the full coding and regulatory sequences of pain-relevant genes accessible by means of next generation sequencing (NGS).
Methods: An NGS panel was created for a set of 77 human genes selected following different lines of evidence supporting their role in persisting pain. To address the role of these candidate genes, we established a sequencing assay based on a custom AmpliSeq™ panel to assess the exomic sequences in 72 subjects of Caucasian ethnicity. To identify the systems biology of the genes, the biological functions associated with these genes were assessed by means of a computational over-representation analysis.
Results: Sequencing generated a median of 2.85 × 10^6 reads per run with a mean depth close to 200 reads, mean read length of 205 called bases and an average chip loading of 71%. A total of 3,185 genetic variants were called. A computational functional genomics analysis indicated that the proposed NGS gene panel covers biological processes identified previously as characterizing the functional genomics of persisting pain.
Conclusion: Results of the NGS assay suggested that the produced nucleotide sequences are comparable to those obtained with the classical Sanger sequencing technique. The assay is applicable for small to large-scale experimental setups that aim at accessing information about any nucleotide within the addressed genes in a study cohort.
Background: Human genetic research has implicated functional variants of more than one hundred genes in the modulation of persisting pain. Artificial intelligence and machine‐learning techniques may combine this knowledge with results of genetic research gathered in any context, which permits the identification of the key biological processes involved in chronic sensitization to pain.
Methods: Based on published evidence, a set of 110 genes carrying variants reported to be associated with modulation of the clinical phenotype of persisting pain in eight different clinical settings was submitted to unsupervised machine‐learning aimed at functional clustering. Subsequently, a mathematically supported subset of genes, comprising those most consistently involved in persisting pain, was analysed by means of computational functional genomics in the Gene Ontology knowledgebase.
Results: Clustering of genes with evidence for a modulation of persisting pain elucidated a functionally heterogeneous set. The situation cleared when the focus was narrowed to a genetic modulation consistently observed throughout several clinical settings. On this basis, two groups of biological processes, the immune system and nitric oxide signalling, emerged as major players in sensitization to persisting pain, which is biologically highly plausible and in agreement with other lines of pain research.
Conclusions: The present computational functional genomics‐based approach provided a computational systems‐biology perspective on chronic sensitization to pain. Human genetic control of persisting pain points to the immune system as a source of potential future targets for drugs directed against persisting pain. Contemporary machine‐learned methods provide innovative approaches to knowledge discovery from previous evidence.
Significance: We show that knowledge discovery in genetic databases and contemporary machine‐learned techniques can identify relevant biological processes involved in persistent pain.
Background: Glial cells in the central nervous system play a key role in neuroinflammation and subsequent central sensitization to pain. They are therefore involved in the development of persistent pain. One of the main sites of interaction of the immune system with persistent pain has been identified as neuro-immune crosstalk at the glial-opioid interface. The present study examined a potential association between the DNA methylation of two key players of the glial-opioid interface and persistent postoperative pain.
Methods: In a cohort of 140 women who had undergone breast cancer surgery, and were assigned based on a 3-year follow-up to either a persistent or non-persistent pain phenotype, the role of epigenetic regulation of key players in the glial-opioid interface was assessed. The methylation of genes coding for the Toll-like receptor 4 (TLR4) as a major mediator of glial contributions to persistent pain or for the μ-opioid receptor (OPRM1) was analyzed and its association with the pain phenotype was compared with that conferred by global genome-wide DNA methylation assessed via quantification of the methylation in the retrotransposon LINE1.
Results: Training of machine learning algorithms indicated that the global DNA methylation provided a similar diagnostic accuracy for persistent pain as previously established non-genetic predictors. However, the diagnosis can be based on a single DNA based marker. By contrast, the methylation of TLR4 or OPRM1 genes could not contribute further to the allocation of the patients to the pain-related phenotype groups.
Conclusions: While clearly supporting a predictive utility of epigenetic testing, the present analysis cannot provide support for specific epigenetic modulation of persistent postoperative pain via methylation of two key genes of the glial-opioid interface.
The genetic background of pain is becoming increasingly well understood, which opens up possibilities for predicting the individual risk of persistent pain and the use of tailored therapies adapted to the variant pattern of the patient’s pain-relevant genes. The individual variant pattern of pain-relevant genes is accessible via next-generation sequencing, although the analysis of all “pain genes” would be expensive. Here, we report on the development of a cost-effective next generation sequencing-based pain-genotyping assay comprising the development of a customized AmpliSeq™ panel and bioinformatics approaches that condense the genetic information of pain by identifying the most representative genes. The panel includes 29 key genes that have been shown to cover 70% of the biological functions exerted by a list of 540 so-called “pain genes” derived from transgenic mice experiments. These were supplemented by 43 additional genes that had been independently proposed as relevant for persistent pain. The functional genomics covered by the resulting 72 genes is particularly represented by mitogen-activated protein kinase of extracellular signal-regulated kinase and cytokine production and secretion. The present genotyping assay was established in 61 subjects of Caucasian ethnicity and investigates the functional role of the selected genes in the context of the known genetic architecture of pain without seeking functional associations for pain. The assay identified a total of 691 genetic variants, many of which have been reported to be clinically relevant for pain or in another context. The assay is applicable for small to large-scale experimental setups at contemporary genotyping costs.
Interactions of drugs with the classical epigenetic mechanism of DNA methylation or histone modification are increasingly being elucidated mechanistically and used to develop novel classes of epigenetic therapeutics. A data science approach is used to synthesize current knowledge on the pharmacological implications of epigenetic regulation of gene expression. Computer-aided knowledge discovery for epigenetic implications of current approved or investigational drugs was performed by querying information from multiple publicly available gold-standard sources to (i) identify enzymes involved in classical epigenetic processes, (ii) screen original biomedical scientific publications including bibliometric analyses, (iii) identify drugs that interact with epigenetic enzymes, including their additional non-epigenetic targets, and (iv) analyze computational functional genomics of drugs with epigenetic interactions. PubMed database search yielded 3051 hits on epigenetics and drugs, starting in 1992 and peaking in 2016. Annual citations increased to a plateau in 2000 and show a downward trend since 2008. Approved and investigational drugs in the DrugBank database included 122 compounds that interacted with 68 unique epigenetic enzymes. Additional molecular functions modulated by these drugs included other enzyme interactions, whereas modulation of ion channels or G-protein-coupled receptors were underrepresented. Epigenetic interactions included (i) drug-induced modulation of DNA methylation, (ii) drug-induced modulation of histone conformations, and (iii) epigenetic modulation of drug effects by interference with pharmacokinetics or pharmacodynamics. Interactions of epigenetic molecular functions and drugs are mutual. 
Recent research activities on the discovery and development of novel epigenetic therapeutics have been successful, whereas epigenetic effects of non-epigenetic drugs or epigenetically induced changes in the targets of common drugs have not yet received the necessary systematic attention in the context of pharmacological plasticity.
Background: To prevent persistent post-surgery pain, early identification of patients at high risk is a clinical need. Supervised machine-learning techniques were used to test how accurately the patients’ performance in a preoperatively performed tonic cold pain test could predict persistent post-surgery pain.
Methods: We analysed 763 patients from a cohort of 900 women who were treated for breast cancer, of whom 61 patients had developed signs of persistent pain during three years of follow-up. Preoperatively, all patients underwent a cold pain test (immersion of the hand into a water bath at 2–4 °C). The patients rated the pain intensity using a numerical rating scale (NRS) from 0 to 10. Supervised machine-learning techniques were used to construct a classifier that could predict patients at risk of persistent pain.
Results: Whether or not a patient rated the pain intensity at NRS=10 within less than 45 s during the cold water immersion test provided a negative predictive value of 94.4% to assign a patient to the "persistent pain" group. If NRS=10 was never reached during the cold test, the predictive value for not developing persistent pain was almost 97%. However, a low positive predictive value of 10% implied a high false positive rate.
Conclusions: Results provide a robust exclusion of persistent pain in women with an accuracy of 94.4%. Moreover, results provide further support for the hypothesis that the endogenous pain inhibitory system may play an important role in the process of pain becoming persistent.
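How a high negative predictive value can coexist with a low positive predictive value when the outcome is rare can be shown with a small calculation. The confusion-matrix counts below are invented for illustration and are not the study data. They were merely chosen so that the resulting percentages are of a similar order as those reported above.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (positive predictive value, negative predictive value)."""
    ppv = tp / (tp + fp)  # share of positive test results that are correct
    npv = tn / (tn + fn)  # share of negative test results that are correct
    return ppv, npv


# Hypothetical counts for a rare outcome (60 affected out of 1000 cases).
ppv, npv = predictive_values(tp=10, fp=90, tn=850, fn=50)

print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

In such a setting a negative test result excludes the outcome fairly reliably, whereas most positive test results are false alarms. This is why the classifier is presented as a tool for exclusion rather than for positive prediction.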
The human sense of smell is often analyzed as being composed of three main components comprising olfactory threshold, odor discrimination and the ability to identify odors. A relevant distinction of the three components and their differential changes in distinct disorders remains a research focus. The present data-driven analysis aimed at establishing a cluster structure in the pattern of olfactory subtest results. Therefore, unsupervised machine-learning was applied onto olfactory subtest results acquired in 10,714 subjects with nine different olfactory pathologies. Using the U-matrix, Emergent Self-organizing feature maps (ESOM) identified three different clusters characterized by (i) low threshold and good discrimination and identification, (ii) very high threshold associated with absent to poor discrimination and identification ability, or (iii) medium threshold, i.e., in the mid-range of possible thresholds, associated with reduced discrimination and identification ability. Specific etiologies of olfactory (dys)function were unequally represented in the clusters (p < 2.2 · 10^−16). Patients with congenital anosmia were overrepresented in the second cluster while subjects with postinfectious olfactory dysfunction belonged frequently to the third cluster. However, the clusters provided no clear separation between etiologies. Hence, the present verification of a distinct cluster structure encourages continued scientific efforts at olfactory test pattern recognition.
A machine-learned analysis suggests non-redundant diagnostic information in olfactory subtests
(2019)
Background: The functional performance of the human sense of smell can be approached via assessment of the olfactory threshold, the ability to discriminate odors or the ability to identify odors. Contemporary clinical test batteries include all or a selection of these components, with some dissent about the required number and choice.
Methods: Olfactory thresholds, odor discrimination and odor identification scores were available from 10,714 subjects (3662 with anosmia, 4299 with hyposmia, and 2752 with normal olfactory function). To assess whether the olfactory subtests confer the same information or each subtest confers at least partly non-redundant information relevant to the olfactory diagnosis, we compared the diagnostic accuracy of supervised machine learning algorithms trained with the complete information from all three subtests with that obtained when performing the training with the information of only two or one subtests.
Results: The training of machine-learned algorithms with the full information about olfactory thresholds, odor discrimination and odor identification from 2/3 of the cases, resulted in a balanced olfactory diagnostic accuracy of 98% or better in the 1/3 remaining cases. The most pronounced decrease in the balanced accuracy, to approximately 85%, was observed when omitting olfactory thresholds from the training, whereas omitting odor discrimination or identification was associated with smaller decreases (balanced accuracies approximately 90%).
Conclusions: Results support partly non-redundant contributions of each olfactory subtest to the clinical olfactory diagnosis. Olfactory thresholds provided the largest amount of non-redundant information to the olfactory diagnosis.
Biomedical data obtained during cell experiments, laboratory animal research, or human studies often display a complex distribution. Statistical identification of subgroups in research data poses an analytical challenge. Here we introduce an interactive R-based bioinformatics tool, called “AdaptGauss”. It enables a valid identification of a biologically-meaningful multimodal structure in the data by fitting a Gaussian mixture model (GMM) to the data. The interface allows a supervised selection of the number of subgroups. This enables the expectation maximization (EM) algorithm to adapt more complex GMM than usually observed with a noninteractive approach. Interactively fitting a GMM to heat pain threshold data acquired from human volunteers revealed a distribution pattern with four Gaussian modes located at temperatures of 32.3, 37.2, 41.4, and 45.4 °C. Noninteractive fitting was unable to identify a meaningful data structure. Obtained results are compatible with known activity temperatures of different TRP ion channels suggesting the mechanistic contribution of different heat sensors to the perception of thermal pain. Thus, sophisticated analysis of the modal structure of biomedical data provides a basis for the mechanistic interpretation of the observations. As it may reflect the involvement of different TRP thermosensory ion channels, the analysis provides a starting point for hypothesis-driven laboratory experiments.
Background: Prevention of persistent pain following breast cancer surgery, via early identification of patients at high risk, is a clinical need. Supervised machine-learning was used to identify parameters that predict persistence of significant pain.
Methods: Over 500 demographic, clinical and psychological parameters were acquired up to 6 months after surgery from 1,000 women (aged 28–75 years) who were treated for breast cancer. Pain was assessed using an 11-point numerical rating scale before surgery and at months 1, 6, 12, 24, and 36. The ratings at months 12, 24, and 36 were used to allocate patients to either "persisting pain" or "non-persisting pain" groups. Unsupervised machine learning was applied to map the parameters to these diagnoses.
Results: A symbolic rule-based classifier tool was created that comprised 21 single or aggregated parameters, including demographic features, psychological and pain-related parameters, forming a questionnaire with "yes/no" items (decision rules). If at least 10 of the 21 rules applied, persisting pain was predicted at a cross-validated accuracy of 86% and a negative predictive value of approximately 95%.
Conclusions: The present machine-learned analysis showed that, even with a large set of parameters acquired from a large cohort, early identification of these patients is only partly successful. This indicates that more parameters are needed for accurate prediction of persisting pain. However, with the current parameters it is possible, with a certainty of almost 95%, to exclude the possibility of persistent pain developing in a woman being treated for breast cancer.
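The decision logic described in this abstract (predict persisting pain when at least 10 of the 21 yes/no rules apply) and the negative predictive value can be written down directly. This is only an illustrative sketch: the function names are hypothetical, the 21 rules themselves are not listed in the abstract, and the counts in the example are invented.

```python
def predict_persisting_pain(rule_answers, threshold=10):
    """Return True if at least `threshold` of the yes/no rules apply."""
    return sum(bool(answer) for answer in rule_answers) >= threshold


def negative_predictive_value(true_negatives, false_negatives):
    """Share of negative predictions that are truly negative."""
    return true_negatives / (true_negatives + false_negatives)


# Hypothetical patient for whom 11 of the 21 rules apply.
answers = [True] * 11 + [False] * 10
print(predict_persisting_pain(answers))  # True

# Hypothetical counts: 95 correct and 5 incorrect "no persisting pain" predictions.
print(negative_predictive_value(95, 5))  # 0.95
```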
Consequences of a human TRPA1 genetic variant on the perception of nociceptive and olfactory stimuli
(2014)
Background: TRPA1 ion channels are involved in nociception and are also excited by pungent odorous substances. Based on reported associations of TRPA1 genetics with increased sensitivity to thermal pain stimuli, we therefore hypothesized that this association also exists for increased olfactory sensitivity.
Methods: Olfactory function and nociception were compared between carriers (n = 38) and non-carriers (n = 43) of TRPA1 variant rs11988795 G>A, a variant known to enhance cold pain perception. Olfactory function was quantified by assessing the odor threshold, odor discrimination and odor identification, and by applying 200-ms intranasal pulses of H2S. Nociception was assessed by measuring pain thresholds to experimental nociceptive stimuli (blunt pressure, electrical stimuli, cold and heat stimuli, and 200-ms intranasal pulses of CO2).
Results: Among the 11 subjects with moderate hyposmia, carriers of the minor A allele (n = 2) were underrepresented (34 carriers among the 70 normosmic subjects; p = 0.049). Moreover, carriers of the A allele discriminated odors significantly better than non-carriers (13.1±1.5 versus 12.3±1.6 correct discriminations) and indicated a higher intensity of the H2S stimuli (29.2±13.2 versus 21±12.8 mm VAS, p = 0.006), although a trigeminal component during this stimulation could not be excluded. Finally, the increased sensitivity to thermal pain could be reproduced.
Conclusions: The findings are in line with a previous association of a human TRPA1 variant with nociceptive parameters and extend the association to the perception of odorants. However, this addresses mainly those stimulants that involve a trigeminal component whereas a pure olfactory effect may remain disputable. Nevertheless, findings suggest that future TRPA1 modulating drugs may modify the perception of odorants.
Background and Aims: Mutations reducing the function of Nav1.7 sodium channels entail diminished pain perception and olfactory acuity, suggesting a link between nociception and olfaction at ion channel level. We hypothesized that if such link exists, it should work in both directions and gain-of-function Nav1.7 mutations known to be associated with increased pain perception should also increase olfactory acuity.
Methods: SCN9A variants known to enhance pain perception and found more frequently in the average population were assessed. Specifically, carriers of SCN9A variants rs41268673C>A (P610T; n = 14) or rs6746030C>T (R1150W; n = 21) were compared with non-carriers (n = 40). Olfactory function was quantified by assessing odor threshold, odor discrimination and odor identification using an established olfactory test. Nociception was assessed by measuring pain thresholds to experimental nociceptive stimuli (punctate and blunt mechanical pressure, heat and electrical stimuli).
Results: The number of carried alleles of the non-mutated SCN9A haplotype rs41268673C/rs6746030C was significantly associated with the comparatively highest olfactory threshold (0 alleles: threshold at phenylethylethanol dilution step 12 of 16 (n = 1), 1 allele: 10.6±2.6 (n = 34), 2 alleles: 9.5±2.1 (n = 40)). The same SCN9A haplotype determined the pain threshold to blunt pressure stimuli (0 alleles: 21.1 N/m2, 1 allele: 29.8±10.4 N/m2, 2 alleles: 33.5±10.2 N/m2).
Conclusions: The findings established a working link between nociception and olfaction via Nav1.7 in the gain-of-function direction. Hence, together with the known reduced olfaction and pain in loss-of-function mutations, a bidirectional genetic functional association between nociception and olfaction exists at Nav1.7 level.
Genetic association studies have shown their usefulness in assessing the role of ion channels in human thermal pain perception. We used machine learning to construct a complex phenotype from pain thresholds to thermal stimuli and associate it with the genetic information derived from the next-generation sequencing (NGS) of 15 ion channel genes which are involved in thermal perception, including ASIC1, ASIC2, ASIC3, ASIC4, TRPA1, TRPC1, TRPM2, TRPM3, TRPM4, TRPM5, TRPM8, TRPV1, TRPV2, TRPV3, and TRPV4. Phenotypic information was complete in 82 subjects and NGS genotypes were available in 67 subjects. A network of artificial neurons, implemented as emergent self-organizing maps, discovered two clusters characterized by high or low pain thresholds for heat and cold pain. A total of 1071 variants were discovered in the 15 ion channel genes. After feature selection, 80 genetic variants were retained for an association analysis based on machine learning. The measured performance of machine learning-mediated phenotype assignment based on this genetic information resulted in an area under the receiver operating characteristic curve of 77.2%, justifying a phenotype classification based on the genetic information. A further item categorization finally resulted in 38 genetic variants that contributed most to the phenotype assignment. Most of them (10) belonged to the TRPV3 gene, followed by TRPM3 (6). Therefore, the analysis successfully identified the particular importance of TRPV3 and TRPM3 for an average pain phenotype defined by the sensitivity to moderate thermal stimuli.
Aim: Exposure to opioids has been associated with epigenetic effects. Studies in rodents suggested a role of varying degrees of DNA methylation in the differential regulation of μ-opioid receptor expression across the brain.
Methods: In a translational investigation, using tissue acquired postmortem from 21 brain regions of former opiate addicts, representing a human cohort with chronic opioid exposure, μ-opioid receptor expression was analyzed at the level of DNA methylation, mRNA and protein.
Results & conclusion: While high or low μ-opioid receptor expression significantly correlated with local OPRM1 mRNA levels, there was no corresponding association with OPRM1 methylation status. Additional experiments in human cell lines showed that changes in DNA methylation associated with changes in μ-opioid expression were an order of magnitude greater than differences in brain. Hence, different degrees of DNA methylation associated with chronic opioid exposure are unlikely to exert a major role in the region-specificity of μ-opioid receptor expression in the human brain.
Inverted perceptual judgment of nociceptive stimuli at threshold level following inconsistent cues
(2015)
Objective: The perception of pain is susceptible to modulation by psychological and contextual factors. It has been shown that subjects judge noxious stimuli as more painful in a respective suggestive context, which disappears when the modifying context is resolved. However, a context in which subjects judge the painfulness of a nociceptive stimulus in exactly the opposite direction to that of the cues has never been shown so far.
Methods: Nociceptive stimuli (300 ms intranasal gaseous CO2) at the individual pain threshold level were applied after a visual cue announcing the stimulus as either "no pain", merely a "stimulus", or "pain". Among the stimuli at threshold level, other CO2 stimuli that were clearly below or above pain threshold were randomly interspersed. These were announced beforehand in 12 subjects randomly with correct or incorrect cues, i.e., clearly painful or clearly non-painful stimuli were announced equally often as not painful or painful. By contrast, in a subsequent group of another 12 subjects, the stimuli were always announced correctly with respect to the evoked pain.
Results: The random and often incorrect announcement of stimuli clearly below or above pain threshold caused the subjects to rate the stimuli at pain-threshold level in the opposite direction of the cue, i.e., stimuli announced as "pain" were significantly more often rated as non-painful, and vice versa (p < 10⁻⁴). By contrast, in the absence of incongruence between announcement and perception of the far-from-threshold stimuli, stimuli at pain threshold were rated in the cued direction.
Conclusions: The present study revealed the induction of associations incongruent with a given message in the perception of pain. We created a context of unreliable cues whereby subjects perceived the stimulus opposite to that suggested by a prior cue, i.e., potentially nociceptive stimuli at pain threshold level that were announced as painful were judged as non-painful and vice versa. These findings are consistent with reported data on the effects of distrust on non-painful cognitive responses.
Background: Prevention of persistent pain after breast cancer surgery, via early identification of patients at high risk, is a clinical need. Psychological factors are among the most consistently proposed predictive parameters for the development of persistent pain. However, repeated use of long psychological questionnaires in this context may be exhaustive for a patient and inconvenient in everyday clinical practice.
Methods: Supervised machine learning was used to create a short form of questionnaires that would provide the same predictive performance of pain persistence as the full questionnaires in a cohort of 1000 women followed up for 3 yr after breast cancer surgery. Machine-learned predictors were first trained with the full-item set of Beck's Depression Inventory (BDI), Spielberger's State–Trait Anxiety Inventory (STAI), and the State–Trait Anger Expression Inventory (STAXI-2). Subsequently, features were selected from the questionnaires to create predictors having a reduced set of items.
Results: A combined seven-item set, comprising 10% of the original psychological questions from STAI and BDI, provided the same predictive performance parameters as the full questionnaires for the development of persistent postsurgical pain. The seven-item version offers a shorter and at least as accurate identification of women in whom pain persistence is unlikely (almost 95% negative predictive value).
Conclusions: Using a data-driven machine-learning approach, a short list of seven items from BDI and STAI is proposed as a basis for a predictive tool for the persistence of pain after breast cancer surgery.
The comprehensive assessment of pain-related human phenotypes requires combinations of nociceptive measures that produce complex high-dimensional data, posing challenges to bioinformatic analysis. In this study, we assessed established experimental models of heat hyperalgesia of the skin, consisting of local ultraviolet-B (UV-B) irradiation or capsaicin application, in 82 healthy subjects using a variety of noxious stimuli. We extended the original heat stimulation by applying cold and mechanical stimuli and assessing the hypersensitization effects with a clinically established quantitative sensory testing (QST) battery (German Research Network on Neuropathic Pain). This study provided a 246 × 10-sized data matrix (82 subjects assessed at baseline, following UV-B application, and following capsaicin application) with respect to 10 QST parameters, which we analyzed using machine-learning techniques. We observed statistically significant effects of the hypersensitization treatments in 9 different QST parameters. Supervised machine-learned analysis implemented as random forests followed by ABC analysis pointed to heat pain thresholds as the most relevantly affected QST parameter. However, decision tree analysis indicated that UV-B additionally modulated sensitivity to cold. Unsupervised machine-learning techniques, implemented as emergent self-organizing maps, hinted at subgroups responding to topical application of capsaicin. The distinction among subgroups was based on sensitivity to pressure pain, which could be attributed to sex differences, with women being more sensitive than men. Thus, while UV-B and capsaicin share a major component of heat pain sensitization, they differ in their effects on QST parameter patterns in healthy subjects, suggesting a lack of redundancy between these models.
Persistent and, in particular, neuropathic pain is a major healthcare problem with still insufficient pharmacological treatment options. This triggered research activities aimed at finding analgesics with a novel mechanism of action. Results of these efforts will need to pass through the phases of drug development, in which experimental human pain models are established components e.g. implemented as chemical hyperalgesia induced by capsaicin. We aimed at ranking the various readouts of a human capsaicin–based pain model with respect to the most relevant information about the effects of a potential reference analgesic. In a placebo‐controlled, randomized cross‐over study, seven different pain‐related readouts were acquired in 16 healthy individuals before and after oral administration of 300 mg pregabalin. The sizes of the effect on pain induced by intradermal injection of capsaicin were quantified by calculating Cohen's d. While in four of the seven pain‐related parameters, pregabalin provided a small effect judged by values of Cohen's d exceeding 0.2, an item categorization technique implemented as computed ABC analysis identified the pain intensities in the area of secondary hyperalgesia and of allodynia as the most suitable parameters to quantify the analgesic effects of pregabalin. Results of this study provide further support for the ability of the intradermal capsaicin pain model to show analgesic effects of pregabalin. Results can serve as a basis for the designs of studies where the inclusion of this particular pain model and pregabalin is planned.
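For orientation, Cohen's d in its classical two-group form is the difference in means divided by the pooled standard deviation. The sketch below assumes that form; the abstract does not state which variant was used for the cross-over data, and paired designs are often analyzed with a different denominator. The ratings are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Classical Cohen's d: mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical pain ratings under placebo and under an analgesic.
placebo = [5, 6, 7, 8, 9]
analgesic = [4, 5, 6, 7, 8]
print(round(cohens_d(placebo, analgesic), 3))  # 0.632
```

By the usual convention, values around 0.2, 0.5 and 0.8 are read as small, medium and large effects.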
Background: It is assumed that different pain phenotypes are based on varying molecular pathomechanisms. Distinct ion channels seem to be associated with the perception of cold pain; in particular, TRPM8 and TRPA1 have been highlighted previously. The present study analyzed the distribution of cold pain thresholds with a focus on describing the multimodality, based on the hypothesis that it reflects a contribution of distinct ion channels.
Methods: Cold pain thresholds (CPT) were available from 329 healthy volunteers (aged 18 - 37 years; 159 men) enrolled in previous studies. The distribution of the pooled and log-transformed threshold data was described using a kernel density estimation (Pareto Density Estimation (PDE)) and subsequently, the log data was modeled as a mixture of Gaussian distributions using the expectation maximization (EM) algorithm to optimize the fit.
Results: CPTs were clearly multi-modally distributed. Fitting a Gaussian Mixture Model (GMM) to the log-transformed threshold data revealed that the best fit is obtained when applying a three-modal distribution pattern. The modes of the identified three Gaussian distributions, retransformed from the log domain to the mean stimulation temperatures at which the subjects had indicated pain thresholds, were obtained at 23.7 °C, 13.2 °C and 1.5 °C for Gaussian #1, #2 and #3, respectively.
Conclusions: The localization of the first and second Gaussians was interpreted as reflecting the contribution of two different cold sensors. From the calculated localization of the modes of the first two Gaussians, the hypothesis of an involvement of TRPM8, sensing temperatures from 25 - 24 °C, and TRPA1, sensing cold from 17 °C can be derived. In that case, subjects belonging to either Gaussian would possess a dominance of the one or the other receptor at the skin area where the cold stimuli had been applied. The findings therefore support a suitability of complex analytical approaches to detect mechanistically determined patterns from pain phenotype data.
Correlations between personality traits and a wide range of sensory thresholds were examined. Participants (N = 124) completed a personality inventory (NEO-FFI) and underwent assessment of olfactory, trigeminal, tactile and gustatory detection thresholds, as well as examination of trigeminal and tactile pain thresholds. Significantly enhanced odor sensitivity in socially agreeable people, significantly enhanced trigeminal sensitivity in neurotic subjects, and a tendency for enhanced pain tolerance in highly conscientious participants were revealed. It is postulated that varied sensory processing may influence an individual's perception of the environment, particularly their perception of socially relevant or potentially dangerous stimuli, and may thus vary with personality.
Background: A delta and C fibers are the major pain-conducting nerve fibers, activate only partly the same brain areas, and are differently involved in pain syndromes. Whether a stimulus excites predominantly A delta or C fibers is a commonly asked question in basic pain research, but a quick test has been lacking so far.
Methodology/Principal Findings: Of 77 verbal descriptors of pain sensations, "pricking", "dull" and "pressing" distinguished best (95% of cases correctly) between A delta fiber mediated (punctate pressure produced by means of von Frey hairs) and C fiber mediated (blunt pressure) pain, applied to healthy volunteers in experiment 1. The sensation was assigned to A delta fibers when "pricking" but neither "dull" nor "pressing" were chosen, and to C fibers when the sum of the selections of "dull" or "pressing" was greater than that of the selection of "pricking". In experiment 2, with an independent cohort, the three-descriptor questionnaire achieved sensitivity and specificity above 0.95 for distinguishing fiber preferential non-mechanical induced pain (laser heat, exciting A delta fibers, and 5-Hz electric stimulation, exciting C fibers).
Conclusion: A three-item verbal rating test using the words "pricking", "dull", and "pressing" may provide sufficient information to characterize a pain sensation evoked by a physical stimulus as transmitted via A delta or via C fibers. It meets the criteria of a screening test by being easy to administer, taking little time, being comfortable in handling, and inexpensive while providing high specificity for relevant information.
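The assignment rule stated in this abstract can be expressed as a short function. The sketch below treats each descriptor as a single yes/no selection and returns "undetermined" when neither condition is met; both choices are assumptions, since the abstract does not say how such cases were handled, and the function name is hypothetical.

```python
def classify_fiber_type(pricking, dull, pressing):
    """Apply the three-descriptor rule to one rated pain sensation."""
    if pricking and not dull and not pressing:
        return "A delta"
    # The sum of "dull" and "pressing" selections must exceed that of "pricking".
    if int(dull) + int(pressing) > int(pricking):
        return "C"
    return "undetermined"  # assumption: cases not covered by the stated rule


print(classify_fiber_type(pricking=True, dull=False, pressing=False))  # A delta
print(classify_fiber_type(pricking=False, dull=True, pressing=False))  # C
print(classify_fiber_type(pricking=True, dull=True, pressing=False))   # undetermined
```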
Effect sizes in experimental pain produced by gender, genetic variants and sensitization procedures
(2011)
Background: Various effects on pain have been reported with respect to their statistical significance, but a standardized measure of effect size has rarely been added. Such a measure would ease comparison of the magnitude of the effects across studies, for example the effect of gender on heat pain with the effect of a genetic variant on pressure pain.
Methodology/Principal Findings: Effect sizes on pain thresholds to stimuli consisting of heat, cold, blunt pressure, punctate pressure and electrical current, administered to 125 subjects, were analyzed for 29 common variants in eight human genes reportedly modulating pain, for gender, and for sensitization procedures using capsaicin or menthol. The genotype explained 0–5.9% of the total interindividual variance in pain thresholds to various stimuli and produced mainly small effects (Cohen's d 0–1.8). The largest effect was produced by the TRPA1 rs13255063T/rs11988795G haplotype, which explained >5% of the variance in electrical pain thresholds and conferred lower pain sensitivity to homozygous carriers. Gender produced larger effect sizes than most variant alleles (1–14.8% explained variance, Cohen's d 0.2–0.8), with higher pain sensitivity in women than in men. Sensitization by capsaicin or menthol explained up to 63% of the total variance (4.7–62.8%) and produced the largest effects according to Cohen's d (0.4–2.6), especially heat sensitization by capsaicin (Cohen's d = 2.6).
Conclusions: Sensitization, gender and genetic variants produce effects on pain in the mentioned order of effect sizes. The present report may provide a basis for comparative discussions of factors influencing pain.
Background and Aims: Chronic infection with the hepatitis B virus (HBV) is a major health issue worldwide. Recently, single nucleotide polymorphisms (SNPs) within the human leukocyte antigen (HLA)-DP locus were identified to be associated with HBV infection in Asian populations. Most significant associations were observed for the A alleles of HLA-DPA1 rs3077 and HLA-DPB1 rs9277535, which conferred a decreased risk for HBV infection. We assessed the implications of these variants for HBV infection in Caucasians.
Methods: Two HLA-DP gene variants (rs3077 and rs9277535) were analyzed for associations with persistent HBV infection and with different clinical outcomes, i.e., inactive HBsAg carrier status versus progressive chronic HBV (CHB) infection in Caucasian patients (n = 201) and HBsAg negative controls (n = 235).
Results: The HLA-DPA1 rs3077 C allele was significantly associated with HBV infection (odds ratio, OR = 5.1, 95% confidence interval, CI: 1.9–13.7; p = 0.00093). However, no significant association was seen for rs3077 with progressive CHB infection versus inactive HBsAg carrier status (OR = 2.7, 95% CI: 0.6–11.1; p = 0.31). In contrast, HLA-DPB1 rs9277535 was not associated with HBV infection in Caucasians (OR = 0.8, 95% CI: 0.4–1.9; p = 1).
Conclusions: A highly significant association of HLA-DPA1 rs3077 with HBV infection was observed in Caucasians. However, as a differentiation between different clinical courses of HBV infection was not possible, knowledge of the HLA-DPA1 genotype cannot be translated into personalized anti-HBV therapy approaches.
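Odds ratios with 95% confidence intervals, as reported above, are often derived from a 2 × 2 table using the log-odds (Woolf) approximation. The sketch below assumes that method, which the abstract does not name, and uses invented counts rather than the study's genotype data.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval from a 2 x 2 table."""
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: allele carriers vs non-carriers among cases and controls.
odds_ratio, lower, upper = odds_ratio_ci(20, 10, 10, 20)
print(odds_ratio, round(lower, 2), round(upper, 2))
```

An interval that excludes 1 corresponds to a significant association at the chosen level; the function fails on zero cell counts, which would need a continuity correction or an exact method.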
Increasing evidence about the central nervous representation of pain in the brain suggests that the operculo-insular cortex is a crucial part of the pain matrix. The pain-specificity of a brain region may be tested by administering nociceptive stimuli while controlling for unspecific activations by administering non-nociceptive stimuli. We applied this paradigm to nasal chemosensation, delivering trigeminal or olfactory stimuli, to verify the pain-specificity of the operculo-insular cortex. In detail, brain activations due to intranasal stimulation induced by non-nociceptive olfactory stimuli of hydrogen sulfide (5 ppm) or vanillin (0.8 ppm) were used to mask brain activations due to somatosensory, clearly nociceptive trigeminal stimulations with gaseous carbon dioxide (75% v/v). Functional magnetic resonance (fMRI) images were recorded from 12 healthy volunteers in a 3T head scanner during stimulus administration using an event-related design. We found that significantly more activations following nociceptive than non-nociceptive stimuli were localized bilaterally in two restricted clusters in the brain containing the primary and secondary somatosensory areas and the insular cortices, consistent with the operculo-insular cortex. However, these activations completely disappeared when eliminating activations associated with the administration of olfactory stimuli, which were small but measurable. While the present experiments verify that the operculo-insular cortex plays a role in the processing of nociceptive input, they also show that it is not a pain-exclusive brain region and allow, in the experimental context, for the interpretation that the operculo-insular cortex plays a major role in the detection of, and response to, salient events, whether or not these events are nociceptive or painful.
The manifestation of chronic back pain depends on structural, psychosocial, occupational and genetic influences. Heritability estimates for back pain range from 30% to 45%. Genetic influences are caused by genes affecting intervertebral disc degeneration or the immune response and genes involved in pain perception, signalling and psychological processing. This inter-individual variability, which is partly due to genetic differences, would require an individualized pain management to prevent the transition from acute to chronic back pain or improve the outcome. The genetic profile may help to define patients at high risk for chronic pain. We summarize genetic factors that (i) impact on intervertebral disc stability, namely Collagen IX, COL9A3, COL11A1, COL11A2, COL1A1, aggrecan (AGAN), cartilage intermediate layer protein, vitamin D receptor, metalloproteinase-3 (MMP3), MMP9, and thrombospondin-2, (ii) modify inflammation, namely interleukin-1 (IL-1) locus genes and IL-6, and (iii) affect pain signalling, namely guanosine triphosphate (GTP) cyclohydrolase 1, catechol-O-methyltransferase, μ opioid receptor (OPRM1), melanocortin 1 receptor (MC1R), transient receptor potential channel A1 and fatty acid amide hydrolase, and analgesic drug metabolism (cytochrome P450 [CYP]2D6, CYP2C9).
High glucosylceramides and low anandamide contribute to sensory loss and pain in Parkinson's disease
(2020)
Background: Parkinson's disease (PD) causes chronic pain in two‐thirds of patients, in part originating from sensory neuropathies. The aim of the present study was to describe the phenotype of PD‐associated sensory neuropathy and to evaluate its associations with lipid allostasis, the latter motivated by recent genetic studies associating mutations of glucocerebrosidase with PD onset and severity. Glucocerebrosidase catalyzes the metabolism of glucosylceramides.
Methods: We used quantitative sensory tests, pain ratings, and questionnaires and analyzed plasma levels of multiple bioactive lipid species using targeted lipidomic analyses. The study comprised 2 sets of patients and healthy controls: the first 128 Israeli PD patients and 224 young German healthy controls for exploration, the second 50/50 German PD patients and matched healthy controls for deeper analyses.
Results: The data showed a 70% prevalence of PD pain and sensory neuropathies with a predominant phenotype of thermal sensory loss plus mechanical hypersensitivity. Multivariate analyses of lipids revealed major differences between PD patients and healthy controls, mainly originating from glucosylceramides and endocannabinoids. Glucosylceramides were increased, whereas anandamide and lysophosphatidic acid 20:4 were reduced, stronger in patients with ongoing pain and with a linear relationship with pain intensity and sensory losses, particularly for glucosylceramide 18:1 and glucosylceramide 24:1.
Conclusions: Our data suggest that PD‐associated sensory neuropathies and PD pain are in part caused by accumulations of glucosylceramides, raising the intriguing possibility of reducing PD pain and sensory loss by glucocerebrosidase substituting or refolding approaches. © 2020 The Authors. Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society.
Dysregulation of lysophosphatidic acids in multiple sclerosis and autoimmune encephalomyelitis
(2017)
Bioactive lipids contribute to the pathophysiology of multiple sclerosis. Here, we show that lysophosphatidic acids (LPAs) are dysregulated in multiple sclerosis (MS) and are functionally relevant in this disease. LPAs and autotaxin, the major enzyme producing extracellular LPAs, were analyzed in serum and cerebrospinal fluid in a cross-sectional population of MS patients and were compared with respective data from mice in the experimental autoimmune encephalomyelitis (EAE) model, spontaneous EAE in TCR1640 mice, and EAE in Lpar2 -/- mice. Serum LPAs were reduced in MS and EAE whereas spinal cord LPAs in TCR1640 mice increased during the ‘symptom-free’ intervals, i.e. on resolution of inflammation during recovery hence possibly pointing to positive effects of brain LPAs during remyelination as suggested in previous studies. Peripheral LPAs mildly re-raised during relapses but further dropped in refractory relapses. The peripheral loss led to a redistribution of immune cells from the spleen to the spinal cord, suggesting defects of lymphocyte homing. In support, LPAR2 positive T-cells were reduced in EAE and the disease was intensified in Lpar2 deficient mice. Further, treatment with an LPAR2 agonist reduced clinical signs of relapsing-remitting EAE suggesting that the LPAR2 agonist partially compensated the endogenous loss of LPAs and implicating LPA signaling as a novel treatment approach.
Based on increasing evidence suggesting that MS pathology involves alterations in bioactive lipid metabolism, the present analysis was aimed at generating a complex serum lipid-biomarker. Using unsupervised machine-learning, implemented as emergent self-organizing maps of neuronal networks, swarm intelligence and Minimum Curvilinear Embedding, a cluster structure was found in the input data space comprising serum concentrations of d = 43 different lipid-markers of various classes. The structure coincided largely with the clinical diagnosis, indicating that the data provide a basis for the creation of a biomarker (classifier). This was subsequently assessed using supervised machine-learning, implemented as random forests and computed ABC analysis-based feature selection. Bayesian statistics-based biomarker creation was used to map the diagnostic classes of either MS patients (n = 102) or healthy subjects (n = 301). Eight lipid-markers passed the feature selection and comprised GluCerC16, LPA20:4, HETE15S, LacCerC24:1, C16Sphinganine, biopterin and the endocannabinoids PEA and OEA. A complex classifier or biomarker was developed that predicted MS at a sensitivity, specificity and accuracy of approximately 95% in training and test data sets, respectively. The present successful application of serum lipid marker concentrations to MS data is encouraging for further efforts to establish an MS biomarker based on serum lipidomics.
Background: Cannabis has proved to be effective in pain relief, but one major side effect is its influence on memory in humans. Therefore, the role of memory in the central processing of nociceptive information was investigated in healthy volunteers.
Methods: In a placebo-controlled cross-over study including 22 healthy subjects, the effect of 20 mg oral Δ9-tetrahydrocannabinol (THC) on memory involving nociceptive sensations was studied, using a delayed stimulus discrimination task (DSDT). To control for nociceptive specificity, a similar DSDT-based study was performed in a subgroup of thirteen subjects, using visual stimuli.
Results: For each nociceptive stimulus pair, the second stimulus was associated with stronger and more extended brain activations than the first stimulus. These differences disappeared after THC administration. The THC effects were mainly located in two clusters comprising the insula and inferior frontal cortex in the right hemisphere, and the caudate nucleus and putamen bilaterally. These cerebral effects were accompanied in the DSDT by a significant reduction of correct ratings from 41.61% to 37.05% after THC administration (rm-ANOVA interaction "drug" by "measurement": F (1,21) = 4.685, p = 0.042). Rating performance was also reduced for the visual DSDT (69.87% to 54.35%; rm-ANOVA interaction of "drug" by "measurement": F (1,12) = 13.478, p = 0.003) and reflected in a reduction of stimulus-related brain deactivations in the bilateral angular gyrus.
Conclusions: Results suggest that part of the effect of THC on pain may be related to memory effects. THC reduced the performance in DSDT of nociceptive and visual stimuli, which was accompanied by significant effects on brain activations. However, a pain specificity of these effects cannot be deduced from the data presented.
An important measure in pain research is the intensity of nociceptive stimuli and their cortical representation. However, there is evidence of different cerebral representations of nociceptive stimuli, including the fact that cortical areas recruited during processing of intranasal nociceptive chemical stimuli included those outside the traditional trigeminal areas. Therefore, the aim of this study was to investigate the major cerebral representations of stimulus intensity associated with intranasal chemical trigeminal stimulation. Trigeminal stimulation was achieved with carbon dioxide presented to the nasal mucosa. Using a single‐blinded, randomized crossover design, 24 subjects received nociceptive stimuli with two different stimulation paradigms, depending on the just noticeable differences in the stimulus strengths applied. Stimulus‐related brain activations were recorded using functional magnetic resonance imaging with event‐related design. Brain activations increased significantly with increasing stimulus intensity, with the largest cluster at the right Rolandic operculum and a global maximum in a smaller cluster at the left lower frontal orbital lobe. Region of interest analyses additionally supported an activation pattern correlated with the stimulus intensity at the piriform cortex as an area of special interest with the trigeminal input. The results support the piriform cortex, in addition to the secondary somatosensory cortex, as a major area of interest for stimulus strength‐related brain activation in pain models using trigeminal stimuli. This makes both areas a primary objective to be observed in human experimental pain settings where trigeminal input is used to study effects of analgesics.
Diminished sense of smell impairs the quality of life but olfactorily disabled people are hardly considered in measures of disability inclusion. We aimed to stratify perceptual characteristics and odors according to the extent to which they are perceived differently with reduced sense of smell, as a possible basis for creating olfactory experiences that are enjoyed in a similar way by subjects with normal or impaired olfactory function. In 146 subjects with normal or reduced olfactory function, perceptual characteristics (edibility, intensity, irritation, temperature, familiarity, hedonics, painfulness) were tested for four sets of 10 different odors each. Data were analyzed with (i) a projection based on principal component analysis and (ii) the training of a machine-learning algorithm in a 1000-fold cross-validated setting to distinguish between olfactory diagnosis based on odor property ratings. Both analytical approaches identified perceived intensity and familiarity with the odor as discriminating characteristics between olfactory diagnoses, while evoked pain sensation and perceived temperature were not discriminating, followed by edibility. Two disjoint sets of odors were identified, i.e., d = 4 “discriminating odors” with respect to olfactory diagnosis, including cis-3-hexenol, methyl salicylate, 1-butanol and cineole, and d = 7 “non-discriminating odors”, including benzyl acetate, heptanal, 4-ethyl-octanoic acid, methional, isobutyric acid, 4-decanolide and p-cresol. Different weightings of the perceptual properties of odors with normal or reduced sense of smell indicate possibilities to create sensory experiences such as food, meals or scents that by emphasizing trigeminal perceptions can be enjoyed by both normosmic and hyposmic individuals.
Purpose: The antifungal drugs ketoconazole and itraconazole reduce serum concentrations of 4β-hydroxycholesterol, which is a validated marker for hepatic cytochrome P450 (CYP) 3A4 activity. We tested the effect of another antifungal triazole agent, fluconazole, on serum concentrations of different sterols and oxysterols within the cholesterol metabolism to see if this inhibitory reaction is a general side effect of azole antifungal agents.
Methods: In a prospective, double-blind, placebo-controlled, two-way crossover design, we studied 17 healthy subjects (nine men, eight women) who received 400 mg fluconazole or placebo daily for 8 days. On day 1 before treatment and on day 8 after the last dose, fasting blood samples were collected. Serum cholesterol precursors and oxysterols were measured by gas chromatography-mass spectrometry-selected ion monitoring and expressed as the ratio to cholesterol (R_sterol).
Results: Under fluconazole treatment, serum R_lanosterol and R_24,25-dihydrolanosterol increased significantly without affecting serum cholesterol or metabolic downstream markers of hepatic cholesterol synthesis. Serum R_4β-, R_24S-, and R_27-hydroxycholesterol increased significantly.
Conclusion: Fluconazole inhibits the 14α-demethylation of lanosterol and 24,25-dihydrolanosterol, regulated by CYP51A1, without reduction of total cholesterol synthesis. The increased serum level of R_4β-hydroxycholesterol under fluconazole treatment is in contrast to the reductions observed under ketoconazole and itraconazole treatments. The question, whether this increase is caused by induction of CYP3A4 or by inhibition of the catabolism of 4β-hydroxycholesterol, must be answered by mechanistic in vitro and in vivo studies comparing effects of various azole antifungal agents on hepatic CYP3A4 activity.
Background: Persistent postsurgical neuropathic pain (PPSNP) can occur after intraoperative damage to somatosensory nerves, with a prevalence of 29–57% in breast cancer surgery. Proteomics is an active research field in neuropathic pain and the first results support its utility for establishing diagnoses or finding therapy strategies. Methods: 57 women (30 non-PPSNP/27 PPSNP) who had experienced a surgeon-verified intercostobrachial nerve injury during breast cancer surgery were examined for patterns in 74 serum proteomic markers that allowed discrimination between subgroups with or without PPSNP. Serum samples were obtained both before and after surgery. Results: Unsupervised data analyses, including principal component analysis and self-organizing maps of artificial neurons, revealed patterns that supported a data structure consistent with pain-related subgroup (non-PPSNP vs. PPSNP) separation. Subsequent supervised machine learning-based analyses revealed 19 proteins (CD244, SIRT2, CCL28, CXCL9, CCL20, CCL3, IL.10RA, MCP.1, TRAIL, CCL25, IL10, uPA, CCL4, DNER, STAMPB, CCL23, CST5, CCL11, FGF.23) that were informative for subgroup separation. In cross-validated training and testing of six different machine-learned algorithms, subgroup assignment was significantly better than chance, whereas this was not possible when training the algorithms with randomly permuted data or with the protein markers not selected. In particular, sirtuin 2 emerged as a key protein, presenting both before and after breast cancer treatments in the PPSNP compared with the non-PPSNP subgroup. Conclusions: The identified proteins play important roles in immune processes such as cell migration, chemotaxis, and cytokine-signaling. They also have considerable overlap with currently known targets of approved or investigational drugs.
Taken together, several lines of unsupervised and supervised analyses pointed to structures in serum proteomics data, obtained before and after breast cancer surgery, that relate to neuroinflammatory processes associated with the development of neuropathic pain after an intraoperative nerve lesion.
Background: Persistent pain in breast cancer survivors is common. Psychological and sleep-related factors modulate perception, interpretation and coping with pain and may contribute to the clinical phenotype. The present analysis pursued the hypothesis that breast cancer survivors form subgroups, based on psychological and sleep-related parameters that are relevant to the impact of pain on the patients’ life.
Methods: We analysed 337 women treated for breast cancer, in whom psychological and sleep-related parameters as well as parameters related to pain intensity and interference had been acquired. Data were analysed by using supervised and unsupervised machine-learning techniques (i) to detect patient subgroups based on the pattern of psychological or sleep-related parameters, (ii) to interpret the detected cluster structure and (iii) to relate this data structure to pain interference and impact on life.
Results: Artificial intelligence-based detection of data structure, implemented as self-organizing neuronal maps, identified two different clusters of patients. A smaller cluster (11.5% of the patients) had comparatively lower resilience, more depressive symptoms and lower extraversion than the other patients. In these patients, life-satisfaction, mood, and life in general were comparatively more impeded by persistent pain.
Conclusions: The results support the initial hypothesis that psychological and sleep-related parameter patterns are meaningful for subgrouping patients with respect to how persistent pain after breast cancer treatments interferes with their life. This indicates that management of pain should address more complex features than just pain intensity. Artificial intelligence is a useful tool in the identification of subgroups of patients based on psychological factors.
Because diabetes is associated with central nervous changes, and olfactory dysfunction has been reported with increased prevalence among persons with diabetes, this study addressed the question of whether the risk of developing diabetes in the next 10 years is reflected in olfactory symptoms. In a cross-sectional study of 164 individuals seeking medical consultation for possible diabetes, olfactory function was evaluated using a standardized clinical test assessing olfactory threshold, odor discrimination, and odor identification. Metabolomics parameters were assessed via blood concentrations. The individual diabetes risk was quantified according to the validated German version of the “FINDRISK” diabetes risk score. Machine learning algorithms trained with metabolomics patterns predicted low or high diabetes risk with a balanced accuracy of 63–75%. Similarly, olfactory subtest results predicted the olfactory dysfunction category with a balanced accuracy of 85–94%, occasionally reaching 100%. However, olfactory subtest results failed to improve the prediction of diabetes risk based on metabolomics data, and metabolomics data did not improve the prediction of the olfactory dysfunction category based on olfactory subtest results. Results of the present study suggest that olfactory function is not a useful predictor of diabetes.
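Balanced accuracy, the performance measure quoted in the abstract above, is the mean of sensitivity and specificity, so it is not inflated when one class dominates. A minimal Python sketch with made-up labels follows; the function name and the data are illustrative assumptions and are not taken from the study.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity and specificity for a two-class problem."""
    tp = fn = tn = fp = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        else:
            if guess == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical, imbalanced example: 2 high-risk and 8 low-risk cases.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10  # a useless classifier that always answers "low risk"

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The always-negative classifier looks acceptable by plain accuracy but sits exactly at chance level by balanced accuracy, which is why the latter is the more informative figure for unequal group sizes.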
The single nucleotide polymorphism 118A>G of the human μ-opioid receptor gene OPRM1, which leads to an exchange of the amino acid asparagine (N) to aspartic acid (D) at position 40 of the extracellular receptor region, alters the in vivo effects of opioids to different degrees in pain-processing brain regions. The most pronounced N40D effects were found in brain regions involved in the sensory processing of pain intensity. Using the μ-opioid receptor-specific agonist DAMGO, we analyzed the μ-opioid receptor signaling, expression, and binding affinity in human brain tissue sampled postmortem from the secondary somatosensory area (SII) and from the ventral posterior part of the lateral thalamus, two regions involved in the sensory processing and transmission of nociceptive information. We show that the main effect of the N40D μ-opioid receptor variant is a reduction of the agonist-induced receptor signaling efficacy. In the SII region of homo- and heterozygous carriers of the variant 118G allele (n=18), DAMGO was only 62% as efficient (p=0.002) as in homozygous carriers of the wild-type 118A allele (n=15). In contrast, the number of [3H]DAMGO binding sites was unaffected. Hence, the μ-opioid receptor G-protein coupling efficacy in SII of carriers of the 118G variant was only 58% as efficient as in homozygous carriers of the 118A allele (p<0.001). The thalamus was unaffected by the OPRM1 118A>G SNP. In conclusion, we provide a molecular basis for the reduced clinical effects of opioid analgesics in carriers of μ-opioid receptor variant N40D.
The use of artificial intelligence (AI) systems in biomedical and clinical settings can disrupt the traditional doctor–patient relationship, which is based on trust and transparency in medical advice and therapeutic decisions. When the diagnosis or selection of a therapy is no longer made solely by the physician, but to a significant extent by a machine using algorithms, decisions become nontransparent. Skill learning is the most common application of machine learning algorithms in clinical decision making. These are a class of very general algorithms (artificial neural networks, classifiers, etc.), which are tuned based on examples to optimize the classification of new, unseen cases. It is pointless to ask for an explanation for a decision. A detailed understanding of the mathematical details of an AI algorithm may be possible for experts in statistics or computer science. However, when it comes to the fate of human beings, this “developer’s explanation” is not sufficient. The concept of explainable AI (XAI) as a solution to this problem is attracting increasing scientific and regulatory interest. This review focuses on the requirement that XAIs must be able to explain in detail the decisions made by the AI to the experts in the field.
Background: The categorization of individuals as normosmic, hyposmic, or anosmic from test results of odor threshold, discrimination, and identification may provide a limited view of the sense of smell. The purpose of this study was to expand the clinical diagnostic repertoire by including additional tests. Methods: A random cohort of n = 135 individuals (83 women and 52 men, aged 21 to 94 years) was tested for odor threshold, discrimination, and identification, plus a distance test, in which the odor of peanut butter is perceived, a sorting task of odor dilutions for phenylethyl alcohol and eugenol, a discrimination test for odorant enantiomers, a lateralization test with eucalyptol, a threshold assessment after 10 min of exposure to phenylethyl alcohol, and a questionnaire on the importance of olfaction. Unsupervised methods were used to detect structure in the olfaction-related data, followed by supervised feature selection methods from statistics and machine learning to identify relevant variables. Results: The structure in the olfaction-related data divided the cohort into two distinct clusters with n = 80 and 55 subjects. Odor threshold, discrimination, and identification did not play a relevant role for cluster assignment, which, on the other hand, depended on performance in the two odor dilution sorting tasks, from which cluster assignment was possible with a median 100-fold cross-validated balanced accuracy of 77–88%. Conclusions: The addition of an odor sorting task with the two proposed odor dilutions to the odor test battery expands the phenotype of olfaction and fits seamlessly into the sensory focus of standard test batteries.
Internalin B–mediated activation of the membrane-bound receptor tyrosine kinase MET is accompanied by a change in receptor mobility. Conversely, it should be possible to infer from receptor mobility whether a cell has been treated with internalin B. Here, we propose a method based on hidden Markov modeling and explainable artificial intelligence that machine-learns the key differences in MET mobility between internalin B–treated and –untreated cells from single-particle tracking data. Our method assigns receptor mobility to three diffusion modes (immobile, slow, and fast). It discriminates between internalin B–treated and –untreated cells with a balanced accuracy of >99% and identifies three parameters that are most affected by internalin B treatment: a decrease in the mobility of slow molecules (1) and a depopulation of the fast mode (2) caused by an increased transition of fast molecules to the slow mode (3). Our approach is based entirely on free software and is readily applicable to the analysis of other membrane receptors.
Selecting the k best features is a common task in machine learning. Typically, a few features have high importance, but many have low importance (right-skewed distribution). This report proposes a numerically precise method to address this skewed feature importance distribution in order to reduce a feature set to the informative minimum of items. Computed ABC analysis (cABC) is an item categorization method that aims to identify the most important items by partitioning a set of non-negative numerical items into subsets "A", "B", and "C" such that subset "A" contains the "few important" items based on specific properties of ABC curves defined by their relationship to Lorenz curves. In its recursive form, the cABC analysis can be applied again to subset "A". A generic image dataset and three biomedical datasets (lipidomics and two genomics datasets) with a large number of variables were used to perform the experiments. The experimental results show that the recursive cABC analysis limits the dimensions of the data projection to a minimum where the relevant information is still preserved and directs the feature selection in machine learning to the most important class-relevant information, including filtering feature sets for nonsense variables. Feature sets were reduced to 10% or less of the original variables and still provided accurate classification in data not used for feature selection. cABC analysis, in its recursive variant, provides a computationally precise means of reducing information to a minimum. The minimum is the result of a computation of the number of k most relevant items, rather than a decision to select the k best items from a list. In addition, there are precise criteria for stopping the reduction process. The reduction to the most important features can improve the human understanding of the properties of the data set. The cABC method is implemented in the Python package "cABCanalysis" available at https://pypi.org/project/cABCanalysis/.
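The idea of letting a computation, rather than a chosen k, decide how many items are kept can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the implementation in the "cABCanalysis" package: it assumes that the boundary of subset "A" can be approximated by the point of the ABC curve (cumulative share of items versus cumulative share of the total) that lies closest to the ideal point (0, 1), and it ignores the curve interpolation and the further limits that separate "B" from "C". All function names and the importance values are hypothetical.

```python
def abc_curve(values):
    """Return (effort, gain) points of the ABC curve for non-negative values.

    effort = share of items considered, gain = share of the total they hold,
    with the items sorted from largest to smallest.
    """
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    points, running = [], 0.0
    for rank, value in enumerate(ordered, start=1):
        running += value
        points.append((rank / len(ordered), running / total))
    return points


def set_a_size(values):
    """Number of top items up to the curve point nearest the ideal (0, 1)."""
    squared_distances = [e ** 2 + (1 - g) ** 2 for e, g in abc_curve(values)]
    return squared_distances.index(min(squared_distances)) + 1


def recursive_set_a(values, rounds=2):
    """Apply the set "A" cut repeatedly to its own result."""
    selected = sorted(values, reverse=True)
    for _ in range(rounds):
        if len(selected) < 2:
            break
        selected = selected[:set_a_size(selected)]
    return selected


# Hypothetical right-skewed feature importances.
importances = [50, 30, 5, 4, 3, 2, 2, 2, 1, 1]
print(set_a_size(importances))          # 2
print(recursive_set_a(importances, 2))  # [50]
```

Here the number of retained items falls out of the shape of the importance distribution: two of ten items after one pass, one item after the recursive second pass.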
Background: The opioid system is involved in the control of pain, reward, addictive behaviors and vegetative effects. Opioids exert their pharmacological actions through the agonistic binding at opioid receptors and variation in the coding genes has been found to modulate opioid receptor expression or signaling. However, a limited selection of functional opioid receptor variants is perceived as insufficient in providing a genetic diagnosis of clinical phenotypes and therefore, unrestricted access to opioid receptor genetics is required.
Methods: Next-generation sequencing (NGS) workflow was based on a custom AmpliSeq™ panel and designed for sequencing of human genes related to the opioid receptor group (OPRM1, OPRD1, OPRK1, SIGMA1, OPRL1) on an Ion PGM™ Sequencer. A cohort of 79 previously studied chronic pain patients was screened to evaluate and validate the detection of exomic sequences of the coding genes with 25 base pair exon padding. In-silico analysis was performed using SNP and Variation Suite® software.
Results: The amplicons covered approximately 90% of the target sequence. A median of 2.54 × 10⁶ reads per run was obtained generating a total of 35,447 nucleotide reads from each DNA sample. This identified approximately 100 chromosome loci where nucleotides deviated from the reference sequence GRCh37 hg19, including functional variants such as the OPRM1 rs1799971 SNP (118 A > G) as the most scientifically regarded variant or rs563649 SNP coding for μ-opioid receptor splice variants. Correspondence between NGS and Sanger derived nucleotide sequences was 100%.
Conclusion: Results suggested that the NGS approach based on AmpliSeq™ libraries and Ion PGM sequencing is a highly efficient mutation detection method. It is suitable for large-scale sequencing of opioid receptor genes. The method includes the variants studied so far for functional associations and adds a large amount of genetic information as a basis for complete analysis of human opioid receptor genetics and its functional consequences.
Recent advances in mathematical modelling and artificial intelligence have challenged the use of traditional regression analysis in biomedical research. This study examined artificial and cancer research data using binomial and multinomial logistic regression and compared its performance with other machine learning models such as random forests, support vector machines, Bayesian classifiers, k-nearest neighbours and repeated incremental pruning to produce error reduction (RIPPER). The alternative models often outperformed regression in accurately classifying new cases. Logistic regression had a structural problem similar to early single-layer neural networks, which limited its ability to identify variables with high statistical significance for reliable class assignment. Therefore, regression is not always the best model for class prediction in biomedical datasets. The study emphasises the importance of validating selected models and suggests that a mixture of experts approach may be a more advanced and effective strategy for analysing biomedical datasets.
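The structural limitation of single-layer networks is commonly illustrated with the XOR pattern, which no single linear decision boundary can separate. The Python sketch below assumes that this is the kind of limitation the abstract refers to; the data, the weight grid and the hand-set interaction weights are illustrative and are not taken from the study.

```python
from itertools import product

# The classic XOR pattern: class 1 when exactly one input is 1.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def linear_accuracy(w1, w2, b, data=XOR):
    """Accuracy of the linear rule 'predict 1 if w1*x1 + w2*x2 + b > 0'."""
    hits = sum((w1 * x1 + w2 * x2 + b > 0) == bool(label)
               for (x1, x2), label in data)
    return hits / len(data)


# Search a grid of weights from -4.0 to 4.0 in steps of 0.5.
grid = [i / 2 for i in range(-8, 9)]
best_linear = max(linear_accuracy(w1, w2, b)
                  for w1, w2, b in product(grid, repeat=3))


def with_interaction(x1, x2):
    """Same kind of rule, but with an added x1*x2 term (hand-set weights)."""
    return x1 + x2 - 2 * x1 * x2 - 0.5 > 0


interaction_accuracy = sum(with_interaction(x1, x2) == bool(label)
                           for (x1, x2), label in XOR) / len(XOR)

print(best_linear)           # 0.75
print(interaction_accuracy)  # 1.0
```

A model that is linear in its inputs, which includes plain logistic regression, never gets more than three of the four cases right, whereas adding a single interaction term or using a non-linear learner resolves the pattern completely.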
Feature selection is a common step in data preprocessing that precedes machine learning to reduce data space and the computational cost of processing or obtaining the data. Filtering out uninformative variables is also important for knowledge discovery. By reducing the data space to only those components that are informative to the class structure, feature selection can simplify models so that they can be more easily interpreted by researchers in the field, reminiscent of explainable artificial intelligence. Knowledge discovery in complex data thus benefits from feature selection that aims to understand feature sets in the thematic context from which the data set originates. However, a single variable selected from a very small number of variables that are technically sufficient for AI training may make little immediate thematic sense, whereas the additional consideration of a variable discarded during feature selection could make scientific discovery very explicit. In this report, we propose an approach to explainable feature selection (XFS) based on a systematic reconsideration of unselected features. The difference between the respective classifications when training the algorithms with the selected features or with the unselected features provides a valid estimate of whether the relevant features in a data set have been selected and uninformative or trivial information was filtered out. It is shown that revisiting originally unselected variables in multivariate data sets allows for the detection of pathologies and errors in the feature selection that occasionally resulted in the failure to identify the most appropriate variables.
In a recent discussion on how to deal with data analysis issues initiated by reviewers of pain-related scientific manuscripts in the European Journal of Pain, a seemingly simple statistical issue was raised: two subsets of data in a paper had the same mean and standard deviation. A reviewer asked for a statistical test for or against the identity of the subset distributions. The authors insisted that if the mean and standard deviation were the same, this was sufficient evidence that the subsets of data were not significantly different.
Because pain researchers are not necessarily primarily from the field of data science, a discussion of the importance of carefully examining the distribution of pain-related data seems warranted in a journal whose primary audience is pain researchers...
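That equal means and standard deviations do not imply equal distributions can be checked with a small Python sketch. The two samples below are made up for illustration, and the two-sample Kolmogorov-Smirnov statistic is used only as one possible measure of distributional difference; the source does not say which test the reviewer had in mind.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def standardize(xs):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the
    empirical cumulative distribution functions of a and b."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in sorted(set(a) | set(b)):
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap


# Hypothetical data: one evenly spread sample and one with two sharp peaks.
flat = standardize([i / 100 for i in range(101)])
bimodal = standardize([-1.0] * 50 + [1.0] * 50)

print(round(mean(flat), 6), round(mean(bimodal), 6))      # both 0.0
print(round(pstdev(flat), 6), round(pstdev(bimodal), 6))  # both 1.0
print(ks_statistic(flat, bimodal) > 0.25)                 # True
```

Both samples share the same mean and standard deviation, yet their cumulative distributions differ by more than a quarter of the probability mass at their widest gap, so summary statistics alone cannot establish that two subsets are equally distributed.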
Sex differences in pain perception have been extensively studied, but precision medicine applications such as sex-specific pain pharmacology have barely progressed beyond proof-of-concept. A data set of pain thresholds to mechanical (blunt and punctate pressure) and thermal (heat and cold) stimuli applied to non-sensitized and sensitized (capsaicin, menthol) forearm skin of 69 male and 56 female healthy volunteers was analyzed for data structures contingent with the prior sex structure using unsupervised and supervised approaches. A working hypothesis that the relevance of sex differences could be approached via reversibility of the association, i.e., sex should be identifiable from pain thresholds, was verified with trained machine learning algorithms that could infer a person's sex in a 20% validation sample not seen by the algorithms during training, with balanced accuracy of up to 79%. This was only possible with thresholds for mechanical stimuli, but not for thermal stimuli or sensitization responses, which were not sufficient to train an algorithm that could assign sex better than by guessing or when trained with nonsense (permuted) information. This enabled the translation to the molecular level of nociceptive targets that convert mechanical but not thermal information into signals interpreted as pain, which could eventually be used for pharmacological precision medicine approaches to pain. By exploiting a key feature of machine learning, which allows for the recognition of data structures and the reduction of information to the minimum relevant, experimental human pain data could be characterized in a way that incorporates "non" logic that could be translated directly to the molecular pharmacological level, pointing toward sex-specific precision medicine for pain.