Refine
Year of publication
Document Type
- Article (75)
Has Fulltext
- yes (75)
Is part of the Bibliography
- no (75)
Keywords
- data science (19)
- machine-learning (7)
- Data science (6)
- pain (6)
- Machine-learning (5)
- artificial intelligence (5)
- digital medicine (5)
- machine learning (5)
- patients (5)
- Pain (3)
Institute
Knowledge discovery in biomedical data using supervised methods assumes that the data contain structure relevant to the class structure if a classifier can be trained to assign a case to the correct class better than by guessing. In this setting, acceptance or rejection of a scientific hypothesis may depend critically on the ability to classify cases better than randomly, without high classification performance being the primary goal. Random forests are often chosen for knowledge-discovery tasks because they are considered a powerful classifier that does not require sophisticated data transformation or hyperparameter tuning and can be regarded as a reference classifier for tabular numerical data. Here, we report a case where the failure of random forests using the default hyperparameter settings in the standard implementations of R and Python would have led to the rejection of the hypothesis that the data contained structure relevant to the class structure. After tuning the hyperparameters, classification performance increased from 56% to 65% balanced accuracy in R, and from 55% to 67% balanced accuracy in Python. More importantly, the 95% confidence intervals in the tuned versions were to the right of the value of 50% that characterizes guessing-level classification. Thus, tuning provided the desired evidence that the data structure supported the class structure of the data set. In this case, the tuning made more than a quantitative difference in the form of slightly better classification accuracy, but significantly changed the interpretation of the data set. This is especially true when classification performance is low and a small improvement increases the balanced accuracy to over 50% when guessing.
Background and Aims: Chronic infection with the hepatitis B virus (HBV) is a major health issue worldwide. Recently, single nucleotide polymorphisms (SNPs) within the human leukocyte antigen (HLA)-DP locus were identified to be associated with HBV infection in Asian populations. Most significant associations were observed for the A alleles of HLA-DPA1 rs3077 and HLA-DPB1 rs9277535, which conferred a decreased risk for HBV infection. We assessed the implications of these variants for HBV infection in Caucasians.
Methods: Two HLA-DP gene variants (rs3077 and rs9277535) were analyzed for associations with persistent HBV infection and with different clinical outcomes, i.e., inactive HBsAg carrier status versus progressive chronic HBV (CHB) infection in Caucasian patients (n = 201) and HBsAg negative controls (n = 235).
Results: The HLA-DPA1 rs3077 C allele was significantly associated with HBV infection (odds ratio, OR = 5.1, 95% confidence interval, CI: 1.9–13.7; p = 0.00093). However, no significant association was seen for rs3077 with progressive CHB infection versus inactive HBsAg carrier status (OR = 2.7, 95% CI: 0.6–11.1; p = 0.31). In contrast, HLA-DPB1 rs9277535 was not associated with HBV infection in Caucasians (OR = 0.8, 95% CI: 0.4–1.9; p = 1).
Conclusions: A highly significant association of HLA-DPA1 rs3077 with HBV infection was observed in Caucasians. However, as a differentiation between different clinical courses of HBV infection was not possible, knowledge of the HLA-DPA1 genotype cannot be translated into personalized anti-HBV therapy approaches.
Persistent and, in particular, neuropathic pain is a major healthcare problem with still insufficient pharmacological treatment options. This triggered research activities aimed at finding analgesics with a novel mechanism of action. Results of these efforts will need to pass through the phases of drug development, in which experimental human pain models are established components e.g. implemented as chemical hyperalgesia induced by capsaicin. We aimed at ranking the various readouts of a human capsaicin–based pain model with respect to the most relevant information about the effects of a potential reference analgesic. In a placebo‐controlled, randomized cross‐over study, seven different pain‐related readouts were acquired in 16 healthy individuals before and after oral administration of 300 mg pregabalin. The sizes of the effect on pain induced by intradermal injection of capsaicin were quantified by calculating Cohen's d. While in four of the seven pain‐related parameters, pregabalin provided a small effect judged by values of Cohen's d exceeding 0.2, an item categorization technique implemented as computed ABC analysis identified the pain intensities in the area of secondary hyperalgesia and of allodynia as the most suitable parameters to quantify the analgesic effects of pregabalin. Results of this study provide further support for the ability of the intradermal capsaicin pain model to show analgesic effects of pregabalin. Results can serve as a basis for the designs of studies where the inclusion of this particular pain model and pregabalin is planned.
The Gini index is a measure of the inequality of a distribution that can be derived from Lorenz curves. While commonly used in, e.g., economic research, it suffers from ambiguity via lack of Lorenz dominance preservation. Here, investigation of large sets of empirical distributions of incomes of the World’s countries over several years indicated firstly, that the Gini indices are centered on a value of 33.33% corresponding to the Gini index of the uniform distribution and secondly, that the Lorenz curves of these distributions are consistent with Lorenz curves of log-normal distributions. This can be employed to provide a Lorenz dominance preserving equivalent of the Gini index. Therefore, a modified measure based on log-normal approximation and standardization of Lorenz curves is proposed. The so-called UGini index provides a meaningful and intuitive standardization on the uniform distribution as this characterizes societies that provide equal chances. The novel UGini index preserves Lorenz dominance. Analysis of the probability density distributions of the UGini index of the World’s counties income data indicated multimodality in two independent data sets. Applying Bayesian statistics provided a data-based classification of the World’s countries’ income distributions. The UGini index can be re-transferred into the classical index to preserve comparability with previous research.
Based on accumulating evidence of a role of lipid signaling in many physiological and pathophysiological processes including psychiatric diseases, the present data driven analysis was designed to gather information needed to develop a prospective biomarker, using a targeted lipidomics approach covering different lipid mediators. Using unsupervised methods of data structure detection, implemented as hierarchal clustering, emergent self-organizing maps of neuronal networks, and principal component analysis, a cluster structure was found in the input data space comprising plasma concentrations of d = 35 different lipid-markers of various classes acquired in n = 94 subjects with the clinical diagnoses depression, bipolar disorder, ADHD, dementia, or in healthy controls. The structure separated patients with dementia from the other clinical groups, indicating that dementia is associated with a distinct lipid mediator plasma concentrations pattern possibly providing a basis for a future biomarker. This hypothesis was subsequently assessed using supervised machine-learning methods, implemented as random forests or principal component analysis followed by computed ABC analysis used for feature selection, and as random forests, k-nearest neighbors, support vector machines, multilayer perceptron, and naïve Bayesian classifiers to estimate whether the selected lipid mediators provide sufficient information that the diagnosis of dementia can be established at a higher accuracy than by guessing. This succeeded using a set of d = 7 markers comprising GluCerC16:0, Cer24:0, Cer20:0, Cer16:0, Cer24:1, C16 sphinganine, and LacCerC16:0, at an accuracy of 77%. By contrast, using random lipid markers reduced the diagnostic accuracy to values of 65% or less, whereas training the algorithms with randomly permuted data was followed by complete failure to diagnose dementia, emphasizing that the selected lipid mediators were display a particular pattern in this disease possibly qualifying as biomarkers.
Background: Modulation of cortical excitability by transcranial magnetic stimulation (TMS) is used for investigating human brain functions. A common observation is the high variability of long-term depression (LTD)-like changes in human (motor) cortex excitability. This study aimed at analyzing the response subgroup distribution after paired continuous theta burst stimulation (cTBS) as a basis for subject selection.
Methods: The effects of paired cTBS using 80% active motor threshold (AMT) in 31 healthy volunteers were assessed at the primary motor cortex (M1) corresponding to the representation of the first dorsal interosseous (FDI) muscle of the left hand, before and up to 50 min after plasticity induction. The changes in motor evoked potentials (MEPs) were analyzed using machine-learning derived methods implemented as Gaussian mixture modeling (GMM) and computed ABC analysis.
Results: The probability density distribution of the MEP changes from baseline was tri-modal, showing a clear separation at 80.9%. Subjects displaying at least this degree of LTD-like changes were n = 6 responders. By contrast, n = 7 subjects displayed a paradox response with increase in MEP. Reassessment using ABC analysis as alternative approach led to the same n = 6 subjects as a distinct category.
Conclusion: Depressive effects of paired cTBS using 80% AMT endure at least 50 min, however, only in a small subgroup of healthy subjects. Hence, plasticity induction by paired cTBS might not reflect a general mechanism in human motor cortex excitability. A mathematically supported criterion is proposed to select responders for enrolment in assessments of human brain functional networks using virtual brain lesions.
A machine-learned analysis suggests non-redundant diagnostic information in olfactory subtests
(2019)
Background: The functional performance of the human sense of smell can be approached via assessment of the olfactory threshold, the ability to discriminate odors or the ability to identify odors. Contemporary clinical test batteries include all or a selection of these components, with some dissent about the required number and choice.
Methods: Olfactory thresholds, odor discrimination and odor identification scores were available from 10,714 subjects (3662 with anomia, 4299 with hyposmia, and 2752 with normal olfactory function). To assess, whether the olfactory subtests confer the same information or each subtest confers at least partly non-redundant information relevant to the olfactory diagnosis, we compared the diagnostic accuracy of supervised machine learning algorithms trained with the complete information from all three subtests with that obtained when performing the training with the information of only two or one subtests.
Results: The training of machine-learned algorithms with the full information about olfactory thresholds, odor discrimination and odor identification from 2/3 of the cases, resulted in a balanced olfactory diagnostic accuracy of 98% or better in the 1/3 remaining cases. The most pronounced decrease in the balanced accuracy, to approximately 85%, was observed when omitting olfactory thresholds from the training, whereas omitting odor discrimination or identification was associated with smaller decreases (balanced accuracies approximately 90%).
Conclusions: Results support partly non-redundant contributions of each olfactory subtest to the clinical olfactory diagnosis. Olfactory thresholds provided the largest amount of non-redundant information to the olfactory diagnosis.
Background: Human genetic research has implicated functional variants of more than one hundred genes in the modulation of persisting pain. Artificial intelligence and machine‐learning techniques may combine this knowledge with results of genetic research gathered in any context, which permits the identification of the key biological processes involved in chronic sensitization to pain.
Methods: Based on published evidence, a set of 110 genes carrying variants reported to be associated with modulation of the clinical phenotype of persisting pain in eight different clinical settings was submitted to unsupervised machine‐learning aimed at functional clustering. Subsequently, a mathematically supported subset of genes, comprising those most consistently involved in persisting pain, was analysed by means of computational functional genomics in the Gene Ontology knowledgebase.
Results: Clustering of genes with evidence for a modulation of persisting pain elucidated a functionally heterogeneous set. The situation cleared when the focus was narrowed to a genetic modulation consistently observed throughout several clinical settings. On this basis, two groups of biological processes, the immune system and nitric oxide signalling, emerged as major players in sensitization to persisting pain, which is biologically highly plausible and in agreement with other lines of pain research.
Conclusions: The present computational functional genomics‐based approach provided a computational systems‐biology perspective on chronic sensitization to pain. Human genetic control of persisting pain points to the immune system as a source of potential future targets for drugs directed against persisting pain. Contemporary machine‐learned methods provide innovative approaches to knowledge discovery from previous evidence.
Significance: We show that knowledge discovery in genetic databases and contemporary machine‐learned techniques can identify relevant biological processes involved in Persitent pain.
Motivation: Calculating the magnitude of treatment effects or of differences between two groups is a common task in quantitative science. Standard effect size measures based on differences, such as the commonly used Cohen's, fail to capture the treatment-related effects on the data if the effects were not reflected by the central tendency. The present work aims at (i) developing a non-parametric alternative to Cohen’s d, which (ii) circumvents some of its numerical limitations and (iii) involves obvious changes in the data that do not affect the group means and are therefore not captured by Cohen’s d.
Results: We propose "Impact” as a novel non-parametric measure of effect size obtained as the sum of two separate components and includes (i) a difference-based effect size measure implemented as the change in the central tendency of the group-specific data normalized to pooled variability and (ii) a data distribution shape-based effect size measure implemented as the difference in probability density of the group-specific data. Results obtained on artificial and empirical data showed that “Impact”is superior to Cohen's d by its additional second component in detecting clearly visible effects not reflected in central tendencies. The proposed effect size measure is invariant to the scaling of the data, reflects changes in the central tendency in cases where differences in the shape of probability distributions between subgroups are negligible, but captures changes in probability distributions as effects and is numerically stable even if the variances of the data set or its subgroups disappear.
Conclusions: The proposed effect size measure shares the ability to observe such an effect with machine learning algorithms. Therefore, the proposed effect size measure is particularly well suited for data science and artificial intelligence-based knowledge discovery from big and heterogeneous data.