Refine
Document Type
- Article (19)
- Working Paper (4)
- Bachelor Thesis (1)
- Conference Proceeding (1)
- Master's Thesis (1)
- Preprint (1)
- Report (1)
Has Fulltext
- yes (28)
Is part of the Bibliography
- no (28)
Keywords
- machine learning (28)
Institute
- Medizin (9)
- Wirtschaftswissenschaften (6)
- Center for Financial Studies (CFS) (4)
- Biochemie, Chemie und Pharmazie (3)
- Buchmann Institut für Molekulare Lebenswissenschaften (BMLS) (2)
- Frankfurt Institute for Advanced Studies (FIAS) (2)
- Psychologie (2)
- Biochemie und Chemie (1)
- Biodiversität und Klima Forschungszentrum (BiK-F) (1)
- Biowissenschaften (1)
We assemble a data set of more than eight million German Twitter posts related to the war in Ukraine. Based on state-of-the-art methods of text analysis, we construct a daily index of uncertainty about the war as perceived by German Twitter. The approach also allows us to separate this index into uncertainty about sanctions against Russia, energy policy and other dimensions. We then estimate a VAR model with daily financial and macroeconomic data and identify an exogenous uncertainty shock. The increase in uncertainty has strong effects on financial markets and causes a significant decline in economic activity as well as an increase in expected inflation. We find the effects of uncertainty to be particularly strong in the first months of the war.
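The index construction described above can be sketched in miniature. The data, the per-post classifier flag, and the mean-100 normalisation below are all illustrative stand-ins; the authors' actual text-analysis pipeline over eight million posts is not reproduced here.

```python
# Hypothetical sketch: aggregate per-post uncertainty flags (output of
# some text classifier) into a daily share, scaled to a mean-100 index.
from collections import defaultdict

posts = [  # (date, flagged as uncertainty-related by a text classifier)
    ("2022-02-24", 1), ("2022-02-24", 0),
    ("2022-02-25", 1), ("2022-02-25", 1),
]

counts = defaultdict(lambda: [0, 0])       # date -> [flagged, total]
for date, flag in posts:
    counts[date][0] += flag
    counts[date][1] += 1

shares = {d: f / n for d, (f, n) in counts.items()}
mean_share = sum(shares.values()) / len(shares)
index = {d: 100 * s / mean_share for d, s in shares.items()}
print({d: round(v, 1) for d, v in index.items()})
```

Splitting the index into sub-dimensions (sanctions, energy policy) would amount to running the same aggregation per topic label.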
Analysis of machine learning prediction quality for automated subgroups within the MIMIC III dataset
(2023)
The motivation for this master’s thesis is to explore the potential of predictive data analytics in the field of medicine. For this, the MIMIC-III dataset offers an extensive foundation for the construction of prediction models, including Random Forest, XGBOOST, and deep learning networks. These models were implemented to forecast the mortality of 2,655 stroke patients.
The first part of the thesis involved conducting a comprehensive data analysis of the filtered MIMIC-III dataset.
Subsequently, the effectiveness and fairness of the predictive models were evaluated. Although the performance levels of the developed models did not match those reported in related research, their potential became evident. The results obtained demonstrated promising capabilities and highlighted the effectiveness of the applied methodologies. Moreover, the feature relevance within the XGBOOST model was examined to increase model explainability.
Finally, relevant subgroups were identified to perform a comparative analysis of the prediction performance across these subgroups. While this approach can be regarded as a valuable methodology, it was not possible to investigate underlying reasons for potential unfairness across clusters. Inside the test data, not enough instances remained per subgroup for further fairness or feature relevance analysis.
In conclusion, the implementation of an alternative use case with a higher patient count is recommended.
The code for this analysis is made available via a GitHub repository and includes a frontend to visualize the results.
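The subgroup comparison in the final step might look roughly like this. The records and the minimum-subgroup-size threshold are invented for illustration; this is not the thesis code from the GitHub repository.

```python
# Illustrative sketch: per-subgroup prediction accuracy, flagging
# subgroups too small for further fairness analysis.
from collections import defaultdict

MIN_N = 3  # illustrative minimum number of test instances per subgroup

records = [  # (subgroup, true label, predicted label)
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 1),
]

by_group = defaultdict(list)
for group, y, y_hat in records:
    by_group[group].append(y == y_hat)

report = {}
for group, hits in by_group.items():
    if len(hits) < MIN_N:
        report[group] = "too few instances"
    else:
        report[group] = sum(hits) / len(hits)

print(report)
```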
Human functional brain connectivity can be temporally decomposed into states of high and low cofluctuation, defined as coactivation of brain regions over time. Rare states of particularly high cofluctuation have been shown to reflect fundamentals of intrinsic functional network architecture and to be highly subject-specific. However, it is unclear whether such network-defining states also contribute to individual variations in cognitive abilities – which strongly rely on the interactions among distributed brain regions. By introducing CMEP, a new eigenvector-based prediction framework, we show that as few as 16 temporally separated time frames (< 1.5% of 10min resting-state fMRI) can significantly predict individual differences in intelligence (N = 263, p < .001). Against previous expectations, individuals’ network-defining time frames of particularly high cofluctuation do not predict intelligence. Multiple functional brain networks contribute to the prediction, and all results replicate in an independent sample (N = 831). Our results suggest that although fundamentals of person-specific functional connectomes can be derived from few time frames of highest connectivity, temporally distributed information is necessary to extract information about cognitive abilities. This information is not restricted to specific connectivity states, like network-defining high-cofluctuation states, but rather reflected across the entire length of the brain connectivity time series.
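A minimal sketch of the cofluctuation quantity the abstract builds on: z-score each region's time series, take element-wise products per region pair (edge time series), and rank time frames by their overall amplitude. The data are random and the region/frame counts are illustrative; CMEP itself is not reproduced here.

```python
# Sketch of frame-wise cofluctuation: edge time series from z-scored
# regional signals, frames ranked by root-sum-of-squares amplitude.
import numpy as np

rng = np.random.default_rng(1)
ts = rng.normal(size=(4, 200))                  # 4 regions, 200 frames
z = (ts - ts.mean(axis=1, keepdims=True)) / ts.std(axis=1, keepdims=True)

i, j = np.triu_indices(4, k=1)                  # the 6 region pairs
edges = z[i] * z[j]                             # edge x time cofluctuation
amplitude = np.sqrt((edges ** 2).sum(axis=0))   # per-frame amplitude

top_frames = np.argsort(amplitude)[-16:]        # 16 highest-cofluctuation frames
print(len(top_frames))
```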
The most basic behavioural states of animals can be described as active or passive. While high-resolution observations of activity patterns can provide insights into the ecology of animal species, few methods are able to measure the activity of individuals of small taxa in their natural environment. We present a novel approach in which a combination of automatic radiotracking and machine learning is used to distinguish between active and passive behaviour in small vertebrates fitted with lightweight transmitters (<0.4 g).
We used a dataset containing >3 million signals from very-high-frequency (VHF) telemetry from two forest-dwelling bat species (Myotis bechsteinii [n = 52] and Nyctalus leisleri [n = 20]) to train and test a random forest model in assigning either active or passive behaviour to VHF-tagged individuals. The generalisability of the model was demonstrated by recording and classifying the behaviour of tagged birds and by simulating the effect of different activity levels with the help of humans carrying transmitters. The model successfully classified the activity states of bats as well as those of birds and humans, although the latter were not included in model training (F1 0.96–0.98).
We provide an ecological case-study demonstrating the potential of this automated monitoring tool. We used the trained models to compare differences in the daily activity patterns of two bat species. The analysis showed a pronounced bimodal activity distribution of N. leisleri over the course of the night while the night-time activity of M. bechsteinii was relatively constant. These results show that subtle differences in the timing of species' activity can be distinguished using our method.
Our approach can classify VHF-signal patterns into fundamental behavioural states with high precision and is applicable to different terrestrial and flying vertebrates. To encourage the broader use of our radiotracking method, we provide the trained random forest models together with an R package that includes all necessary data processing functionalities. In combination with state-of-the-art open-source automated radiotracking, this toolset can be used by the scientific community to investigate the activity patterns of small vertebrates with high temporal resolution, even in dense vegetation.
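As an illustration of the underlying idea (movement modulates received signal strength), a variance-based toy classifier already separates the two states on simulated signals. The threshold and the simulated data are invented; the published approach uses a trained random forest on richer features, provided with the accompanying R package.

```python
# Toy sketch: active behaviour shows up as fluctuation in received
# VHF signal strength; a simple standard-deviation feature separates
# the two states on simulated windows.
import numpy as np

rng = np.random.default_rng(0)
passive = 50 + rng.normal(0, 0.5, 100)   # stable signal: resting animal
active = 50 + rng.normal(0, 5.0, 100)    # fluctuating signal: moving animal

def classify(window, threshold=2.0):
    """Label a signal-strength window via its standard deviation."""
    return "active" if np.std(window) > threshold else "passive"

print(classify(passive), classify(active))
```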
The KMT2A (MLL) gene rearrangements (KMT2A-r) are associated with a diverse spectrum of acute leukemias. Although most KMT2A-r are restricted to nine partner genes, we have recently revealed that KMT2A-USP2 fusions are often missed during FISH screening of these genetic alterations. Therefore, complementary methods are important for appropriate detection of any KMT2A-r. Here we use a machine learning model to unravel the most appropriate markers for prediction of KMT2A-r in various types of acute leukemia. Random Forest and LightGBM classifiers were trained to predict KMT2A-r in patients with acute leukemia. Our results revealed a set of 20 genes capable of accurately estimating KMT2A-r. SKIDA1 (AUC: 0.839; CI: 0.799–0.879) and LAMP5 (AUC: 0.746; CI: 0.685–0.806) overexpression were better markers of KMT2A-r than CSPG4 (also named NG2; AUC: 0.722; CI: 0.659–0.784), regardless of the type of acute leukemia. Of importance, high expression levels of LAMP5 estimated the occurrence of all KMT2A-USP2 fusions. Also, we performed drug sensitivity analysis using IC50 data from 345 drugs available in the GDSC database to identify which ones could be used to treat KMT2A-r leukemia. We observed that KMT2A-r cell lines were more sensitive to 5-Fluorouracil (5FU), Gemcitabine (both antimetabolite chemotherapy drugs), WHI-P97 (JAK-3 inhibitor), Foretinib (MET/VEGFR inhibitor), SNX-2112 (Hsp90 inhibitor), AZD6482 (PI3Kβ inhibitor), KU-60019 (ATM kinase inhibitor), and Pevonedistat (NEDD8-activating enzyme (NAE) inhibitor). Moreover, IC50 data from analyses of ex-vivo drug sensitivity to small-molecule inhibitors revealed that Foretinib is a promising drug option for AML patients carrying FLT3 activating mutations. Thus, we provide novel and accurate options for the diagnostic screening and therapy of KMT2A-r leukemia, regardless of leukemia subtype.
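The per-gene AUC values reported above can in principle be computed from expression values via the Mann–Whitney relation (AUC = U / (n_pos · n_neg)). The expression values below are invented for illustration.

```python
# Sketch: AUC of a single overexpression marker, computed as the
# probability that a positive case scores above a negative one.

def auc(pos, neg):
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

kmt2a_r = [8.1, 7.4, 9.0, 6.8]   # hypothetical LAMP5 expression, KMT2A-r
wild    = [5.2, 6.9, 4.8, 5.5]   # hypothetical expression, non-rearranged

print(round(auc(kmt2a_r, wild), 4))
```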
Biased auctioneers
(2022)
We construct a neural network algorithm that generates price predictions for art at auction, relying on both visual and non-visual object characteristics. We find that higher automated valuations relative to auction house pre-sale estimates are associated with substantially higher price-to-estimate ratios and lower buy-in rates, pointing to estimates’ informational inefficiency. The relative contribution of machine learning is higher for artists with less dispersed and lower average prices. Furthermore, we show that auctioneers’ prediction errors are persistent both at the artist and at the auction house level, and hence directly predictable themselves using information on past errors.
Background: The categorization of individuals as normosmic, hyposmic, or anosmic from test results of odor threshold, discrimination, and identification may provide a limited view of the sense of smell. The purpose of this study was to expand the clinical diagnostic repertoire by including additional tests. Methods: A random cohort of n = 135 individuals (83 women and 52 men, aged 21 to 94 years) was tested for odor threshold, discrimination, and identification, plus a distance test determining from how far away the odor of peanut butter can be perceived, a sorting task of odor dilutions for phenylethyl alcohol and eugenol, a discrimination test for odorant enantiomers, a lateralization test with eucalyptol, a threshold assessment after 10 min of exposure to phenylethyl alcohol, and a questionnaire on the importance of olfaction. Unsupervised methods were used to detect structure in the olfaction-related data, followed by supervised feature selection methods from statistics and machine learning to identify relevant variables. Results: The structure in the olfaction-related data divided the cohort into two distinct clusters with n = 80 and 55 subjects. Odor threshold, discrimination, and identification did not play a relevant role for cluster assignment, which, on the other hand, depended on performance in the two odor dilution sorting tasks, from which cluster assignment was possible with a median 100-fold cross-validated balanced accuracy of 77–88%. Conclusions: The addition of an odor sorting task with the two proposed odor dilutions to the odor test battery expands the phenotype of olfaction and fits seamlessly into the sensory focus of standard test batteries.
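Balanced accuracy, the metric reported above, is simply the mean of per-class recalls, which keeps the score honest under class imbalance (here, clusters of 80 vs. 55). A minimal sketch with invented labels:

```python
# Balanced accuracy = mean of per-class recalls.
def balanced_accuracy(y_true, y_pred):
    classes = set(y_true)
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

y_true = [1, 1, 1, 1, 2, 2]   # imbalanced cluster labels
y_pred = [1, 1, 1, 0, 2, 1]
print(round(balanced_accuracy(y_true, y_pred), 3))
```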
Bacteria that are capable of organizing themselves as biofilms are an important public health issue. Knowledge discovery focusing on the ability to swarm and conquer the surroundings to form persistent colonies is therefore very important for microbiological research communities that focus on a clinical perspective. Here, we demonstrate how a machine learning workflow can be used to create useful models that are capable of discriminating distinct associated growth behaviors along distinct phenotypes. Based on basic gray-scale images, we provide a processing pipeline for binary image generation, making the workflow accessible for imaging data from a wide range of devices and conditions. The workflow includes a locally estimated regression model that easily applies to growth-related data and a shape analysis using identified principal components. Finally, we apply density-based spatial clustering of applications with noise (DBSCAN) to extract and analyze characteristic, general features explained by colony shapes and areas to discriminate distinct Bacillus subtilis phenotypes. Our results suggest that the differences regarding their ability to swarm and subsequently conquer the medium that surrounds them result in characteristic features. Differences in the latency of colony formation give insights into the ability to invade the surroundings and could therefore serve as a useful monitoring tool.
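A minimal, self-contained DBSCAN sketch on invented 2-D colony features illustrates the clustering step. The features, eps, and min_samples are illustrative; production code would normally use a library implementation such as scikit-learn's.

```python
# Tiny DBSCAN on 2-D colony shape features (e.g. area, circularity).
import math

def dbscan(points, eps=1.0, min_samples=3):
    """Returns one cluster label per point; -1 marks noise."""
    labels = [None] * len(points)
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise absorbed as border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_samples:
                queue.extend(more)   # core point: keep expanding
    return labels

points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),   # phenotype cluster 1
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),   # phenotype cluster 2
          (9.0, 1.0)]                            # outlier colony
print(dbscan(points, eps=0.5))
```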
Bayesian inference is ubiquitous in science and widely used in biomedical research such as cell sorting or “omics” approaches, as well as in machine learning (ML), artificial neural networks, and “big data” applications. However, the calculation is not robust in regions of low evidence. In cases where one group has a lower mean but a higher variance than another group, new cases with larger values are implausibly assigned to the group with typically smaller values. An approach for a robust extension of Bayesian inference is proposed that proceeds in two main steps starting from the Bayesian posterior probabilities. First, cases with low evidence are labeled as “uncertain” class membership. The boundary for low probabilities of class assignment (threshold 𝜀) is calculated using a computed ABC analysis as a data-based technique for item categorization. This leaves a number of cases with uncertain classification (p < 𝜀). Second, cases with uncertain class membership are relabeled based on the distance to neighboring classified cases, using Voronoi cells. The approach is demonstrated on biomedical data typically analyzed with Bayesian statistics, such as flow cytometric data sets or biomarkers used in medical diagnostics, where it increased the class assignment accuracy by 1–10% depending on the data set. The proposed extension of the Bayesian inference of class membership can be used to obtain robust and plausible class assignments even for data at the extremes of the distribution and/or for which evidence is weak.
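The two-step procedure can be sketched as follows. The Gaussian group parameters, the fixed threshold 𝜀 = 0.75, and the 1-D nearest-neighbour stand-in for the Voronoi-cell step are all illustrative; the paper derives 𝜀 from the data via computed ABC analysis.

```python
# Sketch: Bayesian class assignment with an uncertainty threshold,
# then relabeling of uncertain cases from the nearest confident case.
import math

def gauss(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

GROUPS = {"A": (0.0, 3.0), "B": (2.0, 0.5)}  # A: lower mean, higher variance
EPS = 0.75  # illustrative; the paper computes this via ABC analysis

def posterior(x):
    dens = {g: gauss(x, mu, sd) for g, (mu, sd) in GROUPS.items()}
    total = sum(dens.values())
    return {g: d / total for g, d in dens.items()}

def robust_assign(xs):
    # Step 1: Bayesian assignment; flag low-probability cases as uncertain.
    labels = []
    for x in xs:
        p = posterior(x)
        best = max(p, key=p.get)
        labels.append(best if p[best] >= EPS else None)
    # Step 2: relabel uncertain cases from the nearest confident case
    # (a 1-D stand-in for the Voronoi-cell neighbourhood of the paper).
    confident = [(c, l) for c, l in zip(xs, labels) if l is not None]
    return [l if l is not None
            else min(confident, key=lambda c: abs(c[0] - x))[1]
            for x, l in zip(xs, labels)]

print(robust_assign([-2.0, 2.1, 1.2]))
```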
Phenotypical screening is a widely used approach in drug discovery for the identification of small molecules with cellular activities. However, functional annotation of identified hits often poses a challenge. The development of small molecules with narrow or exclusive target selectivity such as chemical probes and chemogenomic (CG) libraries, greatly diminishes this challenge, but non-specific effects caused by compound toxicity or interference with basic cellular functions still pose a problem to associate phenotypic readouts with molecular targets. Hence, each compound should ideally be comprehensively characterized regarding its effects on general cell functions. Here, we report an optimized live-cell multiplexed assay that classifies cells based on nuclear morphology, presenting an excellent indicator for cellular responses such as early apoptosis and necrosis. This basic readout in combination with the detection of other general cell damaging activities of small molecules such as changes in cytoskeletal morphology, cell cycle and mitochondrial health provides a comprehensive time-dependent characterization of the effect of small molecules on cellular health in a single experiment. The developed high-content assay offers multi-dimensional comprehensive characterization that can be used to delineate generic effects regarding cell functions and cell viability, allowing an assessment of compound suitability for subsequent detailed phenotypic and mechanistic studies.
Publicly available compound and bioactivity databases provide an essential basis for data-driven applications in life-science research and drug design. By analyzing several bioactivity repositories, we discovered differences in compound and target coverage advocating the combined use of data from multiple sources. Using data from ChEMBL, PubChem, IUPHAR/BPS, BindingDB, and Probes & Drugs, we assembled a consensus dataset focusing on small molecules with bioactivity on human macromolecular targets. This allowed an improved coverage of compound space and targets, and an automated comparison and curation of structural and bioactivity data to reveal potentially erroneous entries and increase confidence. The consensus dataset comprised of more than 1.1 million compounds with over 10.9 million bioactivity data points with annotations on assay type and bioactivity confidence, providing a useful ensemble for computational applications in drug design and chemogenomics.
When we browse via WiFi on our laptop or mobile phone, we receive data over a noisy channel. The received message may differ from the one that was sent originally. Luckily, it is often possible to reconstruct the original message, but it may take a lot of time. That’s because decoding the received message is a complex problem, NP-hard to be exact. As we continue browsing, new information is sent to us at a high frequency. So if lags are to be avoided and as memory is finite, there is not much time left for decoding. Coding theory tackles this problem by creating models of the channels we use to communicate and tailoring codes to the channel properties. A well-known family of codes are Low-Density Parity-Check (LDPC) codes, which are widely used in standards like WiFi and DVB-T2. In practical settings the complexity of decoding a received message can be heavily reduced by using LDPC codes and approximative decoding algorithms. This thesis lays out the basic construction of LDPC codes and their decoding using the sum-product algorithm. On this basis, a neural network to improve decoding is introduced: the sum-product algorithm is transformed into a neural network decoder. This approach was first presented by Nachmani et al. and treated in detail by Navneet Agrawal in 2017. To find out how machine learning can improve the codes, the bit error rates of the trained neural network decoder are compared with the bit error rates of the classic sum-product algorithm approach. Experiments with static and dynamic training datasets of diverse sizes, various signal-to-noise ratios, and a feed-forward as well as a recurrent architecture show how to tune the neural network decoder even further. Results of the experiments are used to verify statements made in Agrawal’s work. In addition, corrections and improvements in the area of metrics are presented. An implementation of the neural network will be made publicly available to facilitate access for others.
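As a toy stand-in for the decoding problem, a hard-decision bit-flipping decoder on the (7,4) Hamming code illustrates the parity-check iteration that the sum-product algorithm refines with soft information. This is deliberately not an LDPC code, just the smallest convenient example of syndrome-driven iterative decoding.

```python
# Toy hard-decision bit-flipping decoder on the (7,4) Hamming code.
import numpy as np

H = np.array([[1, 0, 1, 0, 1, 0, 1],   # parity-check matrix
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def bit_flip_decode(r, max_iter=10):
    r = r.copy()
    for _ in range(max_iter):
        syndrome = H @ r % 2
        if not syndrome.any():
            return r                    # all parity checks satisfied
        fails = syndrome @ H            # failed checks each bit is part of
        r[np.argmax(fails)] ^= 1        # flip the most-suspect bit
    return r

codeword = np.zeros(7, dtype=int)       # the all-zero word is a codeword
received = codeword.copy()
received[4] = 1                         # single bit error
print(bit_flip_decode(received))
```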
Scores to identify patients at high risk of progression of coronavirus disease (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), may become instrumental for clinical decision-making and patient management. We used patient data from the multicentre Lean European Open Survey on SARS-CoV-2-Infected Patients (LEOSS) and applied variable selection to develop a simplified scoring system to identify patients at increased risk of critical illness or death. A total of 1946 patients who tested positive for SARS-CoV-2 were included in the initial analysis and assigned to derivation and validation cohorts (n = 1297 and n = 649, respectively). Stability selection from over 100 baseline predictors for the combined endpoint of progression to the critical phase or COVID-19-related death enabled the development of a simplified score consisting of five predictors: C-reactive protein (CRP), age, clinical disease phase (uncomplicated vs. complicated), serum urea, and D-dimer (abbreviated as CAPS-D score). This score yielded an area under the curve (AUC) of 0.81 (95% confidence interval [CI]: 0.77–0.85) in the validation cohort for predicting the combined endpoint within 7 days of diagnosis and 0.81 (95% CI: 0.77–0.85) during full follow-up. We used an additional prospective cohort of 682 patients, diagnosed largely after the “first wave” of the pandemic to validate the predictive accuracy of the score and observed similar results (AUC for the event within 7 days: 0.83 [95% CI: 0.78–0.87]; for full follow-up: 0.82 [95% CI: 0.78–0.86]). An easily applicable score to calculate the risk of COVID-19 progression to critical illness or death was thus established and validated.
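Purely as an illustration of how five predictors combine into a risk score, here is a generic logistic sketch over the CAPS-D inputs (CRP, age, clinical phase, urea, D-dimer). The weights and intercept are invented and are not the published CAPS-D point values.

```python
# Illustrative logistic risk score over the five CAPS-D predictors.
# Weights w and intercept b are made up for demonstration only.
import math

def risk(crp, age, complicated, urea, ddimer,
         w=(0.010, 0.04, 0.9, 0.02, 0.3), b=-6.0):
    z = b + w[0]*crp + w[1]*age + w[2]*complicated + w[3]*urea + w[4]*ddimer
    return 1 / (1 + math.exp(-z))

low  = risk(crp=5,  age=40, complicated=0, urea=30, ddimer=0.4)
high = risk(crp=90, age=75, complicated=1, urea=80, ddimer=2.5)
print(round(low, 3), round(high, 3))
```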
The use of artificial intelligence (AI) systems in biomedical and clinical settings can disrupt the traditional doctor–patient relationship, which is based on trust and transparency in medical advice and therapeutic decisions. When the diagnosis or selection of a therapy is no longer made solely by the physician, but to a significant extent by a machine using algorithms, decisions become nontransparent. Skill learning is the most common application of machine learning algorithms in clinical decision making. These are a class of very general algorithms (artificial neural networks, classifiers, etc.), which are tuned based on examples to optimize the classification of new, unseen cases. For such algorithms, it is pointless to ask for an explanation of an individual decision. A detailed understanding of the mathematical details of an AI algorithm may be possible for experts in statistics or computer science. However, when it comes to the fate of human beings, this “developer’s explanation” is not sufficient. The concept of explainable AI (XAI) as a solution to this problem is attracting increasing scientific and regulatory interest. This review focuses on the requirement that XAIs must be able to explain in detail the decisions made by the AI to the experts in the field.
When requesting a web-based service, users often fail in setting the website’s privacy settings according to their own privacy preferences. Being overwhelmed by the choice of preferences, a lack of knowledge of related technologies, or unawareness of their own privacy preferences are just some reasons why users tend to struggle. To address all these problems, privacy setting prediction tools are particularly well-suited. Such tools aim to lower the burden of setting privacy preferences in line with owners’ actual preferences. To be in line with the increased demand for explainability and interpretability by regulatory obligations – such as the General Data Protection Regulation (GDPR) in Europe – in this paper an explainable model for default privacy setting prediction is introduced. Compared to previous work, we present an improved feature selection, increased interpretability of each step in model design, and enhanced evaluation metrics to better identify weaknesses in the model’s design before it goes into production. As a result, we aim to provide an explainable and transparent tool for default privacy setting prediction which users easily understand and are therefore more likely to use.
Comprehensive analysis of tumour sub-volumes for radiomic risk modelling in locally advanced HNSCC
(2020)
Simple Summary: Radiomic risk models are usually based on imaging features, which are extracted from the entire gross tumour volume (GTVentire). This approach does not explicitly consider the complex biological structure of the tumours. Therefore, in this retrospective study, we investigated the prognostic value of radiomic analyses based on different tumour sub-volumes using computed tomography imaging of patients with locally advanced head and neck squamous cell carcinoma who were treated with primary radio-chemotherapy. The GTVentire was cropped by different margins to define the rim and corresponding core sub-volumes of the tumour. Furthermore, the best performing tumour rim sub-volume was extended into surrounding tissue with different margins. As a result, the models based on the 5 mm tumour rim and on the 3 mm extended rim sub-volume showed an improved performance compared to models based on the corresponding tumour core. This indicates that the consideration of tumour sub-volumes may help to improve radiomic risk models.
Abstract: Imaging features for radiomic analyses are commonly calculated from the entire gross tumour volume (GTVentire). However, tumours are biologically complex and the consideration of different tumour regions in radiomic models may lead to an improved outcome prediction. Therefore, we investigated the prognostic value of radiomic analyses based on different tumour sub-volumes using computed tomography imaging of patients with locally advanced head and neck squamous cell carcinoma. The GTVentire was cropped by different margins to define the rim and the corresponding core sub-volumes of the tumour. Subsequently, the best performing tumour rim sub-volume was extended into surrounding tissue with different margins. Radiomic risk models were developed and validated using a retrospective cohort consisting of 291 patients treated at one of the six Partner Sites of the German Cancer Consortium Radiation Oncology Group between 2005 and 2013. The validation concordance index (C-index) averaged over all applied learning algorithms and feature selection methods using the GTVentire achieved a moderate prognostic performance for loco-regional tumour control (C-index: 0.61 ± 0.04 (mean ± std)). The models based on the 5 mm tumour rim and on the 3 mm extended rim sub-volume showed higher median performances (C-index: 0.65 ± 0.02 and 0.64 ± 0.05, respectively), while models based on the corresponding tumour core volumes performed worse (C-index: 0.59 ± 0.01). The difference in C-index between the 5 mm tumour rim and the corresponding core volume showed a statistical trend (p = 0.10). After additional prospective validation, the consideration of tumour sub-volumes may be a promising way to improve prognostic radiomic risk models.
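The rim/core split described above can be sketched with a binary mask and morphological erosion: erode the mask by a margin to obtain the core, and take the difference as the rim. A one-pixel 2-D erosion on a toy mask stands in for the millimetre-calibrated 3-D margins of the study.

```python
# Sketch: split a binary tumour mask into core (eroded) and rim.
import numpy as np

def erode(mask, it=1):
    """4-neighbourhood binary erosion."""
    for _ in range(it):
        m = np.pad(mask, 1)
        mask = (m[1:-1, 1:-1] & m[:-2, 1:-1] & m[2:, 1:-1]
                & m[1:-1, :-2] & m[1:-1, 2:])
    return mask

gtv = np.zeros((7, 7), dtype=int)
gtv[1:6, 1:6] = 1            # toy "GTV entire": a 5x5 square
core = erode(gtv, it=1)      # margin of one pixel
rim = gtv & ~core            # rim = mask minus core
print(int(core.sum()), int(rim.sum()))
```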
Introduction: Affective disorders are a major global burden, with approximately 15% of people worldwide suffering from some form of affective disorder. In patients experiencing their first depressive episode, in most cases it cannot be distinguished whether this is due to bipolar disorder (BD) or major depressive disorder (MDD). Valid fluid biomarkers able to discriminate between the two disorders in a clinical setting are not yet available.
Material and Methods: Seventy depressed patients suffering from BD (bipolar I and II subtypes) and 42 patients with MDD were recruited, and blood samples were taken for proteomic analyses after 8 h fasting. Proteomic profiles were analyzed using the Multiplex Immunoassay platform from Myriad Rules Based Medicine (Myriad RBM; Austin, Texas, USA). Human DiscoveryMAPTM was used to measure the concentration of various proteins, peptides, and small molecules. A multivariate predictive model was subsequently constructed to differentiate between BD and MDD.
Results: Based on the various proteomic profiles, the algorithm could discriminate depressed BD patients from MDD patients with an accuracy of 67%.
Discussion: The results of this preliminary study suggest that future discrimination between bipolar and unipolar depression in a single case could be possible, using predictive biomarker models based on blood proteomic profiling.
Genetic association studies have shown their usefulness in assessing the role of ion channels in human thermal pain perception. We used machine learning to construct a complex phenotype from pain thresholds to thermal stimuli and associate it with the genetic information derived from the next-generation sequencing (NGS) of 15 ion channel genes which are involved in thermal perception, including ASIC1, ASIC2, ASIC3, ASIC4, TRPA1, TRPC1, TRPM2, TRPM3, TRPM4, TRPM5, TRPM8, TRPV1, TRPV2, TRPV3, and TRPV4. Phenotypic information was complete in 82 subjects and NGS genotypes were available in 67 subjects. A network of artificial neurons, implemented as emergent self-organizing maps, discovered two clusters characterized by high or low pain thresholds for heat and cold pain. A total of 1071 variants were discovered in the 15 ion channel genes. After feature selection, 80 genetic variants were retained for an association analysis based on machine learning. The measured performance of machine learning-mediated phenotype assignment based on this genetic information resulted in an area under the receiver operating characteristic curve of 77.2%, justifying a phenotype classification based on the genetic information. A further item categorization finally resulted in 38 genetic variants that contributed most to the phenotype assignment. Most of them (10) belonged to the TRPV3 gene, followed by TRPM3 (6). Therefore, the analysis successfully identified the particular importance of TRPV3 and TRPM3 for an average pain phenotype defined by the sensitivity to moderate thermal stimuli.
We study the accuracy and usefulness of automated (i.e., machine-generated) valuations for illiquid and heterogeneous real assets. We assemble a database of 1.1 million paintings auctioned between 2008 and 2015. We use a popular machine-learning technique—neural networks—to develop a pricing algorithm based on both non-visual and visual artwork characteristics. Our out-of-sample valuations predict auction prices dramatically better than valuations based on a standard hedonic pricing model. Moreover, they help explain price levels and sale probabilities even after conditioning on auctioneers’ pre-sale estimates. Machine learning is particularly helpful for assets that are associated with high price uncertainty. It can also correct human experts’ systematic biases in expectations formation—and identify ex ante situations in which such biases are likely to arise.
Based on accumulating evidence of a role of lipid signaling in many physiological and pathophysiological processes including psychiatric diseases, the present data driven analysis was designed to gather information needed to develop a prospective biomarker, using a targeted lipidomics approach covering different lipid mediators. Using unsupervised methods of data structure detection, implemented as hierarchal clustering, emergent self-organizing maps of neuronal networks, and principal component analysis, a cluster structure was found in the input data space comprising plasma concentrations of d = 35 different lipid-markers of various classes acquired in n = 94 subjects with the clinical diagnoses depression, bipolar disorder, ADHD, dementia, or in healthy controls. The structure separated patients with dementia from the other clinical groups, indicating that dementia is associated with a distinct lipid mediator plasma concentrations pattern possibly providing a basis for a future biomarker. This hypothesis was subsequently assessed using supervised machine-learning methods, implemented as random forests or principal component analysis followed by computed ABC analysis used for feature selection, and as random forests, k-nearest neighbors, support vector machines, multilayer perceptron, and naïve Bayesian classifiers to estimate whether the selected lipid mediators provide sufficient information that the diagnosis of dementia can be established at a higher accuracy than by guessing. This succeeded using a set of d = 7 markers comprising GluCerC16:0, Cer24:0, Cer20:0, Cer16:0, Cer24:1, C16 sphinganine, and LacCerC16:0, at an accuracy of 77%. 
By contrast, using random lipid markers reduced the diagnostic accuracy to values of 65% or less, whereas training the algorithms with randomly permuted data was followed by complete failure to diagnose dementia, emphasizing that the selected lipid mediators display a particular pattern in this disease, possibly qualifying as biomarkers.
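The label-permutation check described above, where training on randomly relabeled data should destroy performance, can be sketched with a trivial nearest-centroid stand-in for the random forests used in the study. All data, labels, and the fixed permutation are invented.

```python
# Sketch of a permutation check: classifier accuracy should drop toward
# chance once class labels are shuffled relative to the features.
data = [(0.1, "dementia"), (0.2, "dementia"), (0.3, "dementia"),
        (1.1, "other"), (1.2, "other"), (1.3, "other")]

def accuracy(samples):
    """Fit per-class centroids, then score nearest-centroid predictions."""
    cents = {}
    for lab in {l for _, l in samples}:
        vals = [x for x, l in samples if l == lab]
        cents[lab] = sum(vals) / len(vals)
    hits = sum(min(cents, key=lambda c: abs(cents[c] - x)) == l
               for x, l in samples)
    return hits / len(samples)

real = accuracy(data)
perm = [0, 3, 1, 4, 2, 5]   # fixed permutation mixing the two groups
shuffled = accuracy([(x, data[p][1]) for (x, _), p in zip(data, perm)])
print(real, round(shuffled, 2))
```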
Music listening has become a highly individualized activity with smartphones and music streaming services providing listeners with absolute freedom to listen to any kind of music in any situation. Until now, little has been written about the processes underlying the selection of music in daily life. The present study aimed to disentangle some of the complex processes among the listener, situation, and functions of music listening involved in music selection. Utilizing the experience sampling method, data were collected from 119 participants using a smartphone application. For 10 consecutive days, participants received 14 prompts using stratified-random sampling throughout the day and reported on their music-listening behavior. Statistical learning procedures on multilevel regression models and multilevel structural equation modeling were used to determine the most important predictors and analyze mediation processes between person, situation, functions of listening, and music selection. Results revealed that the features of music selected in daily life were predominantly determined by situational characteristics, whereas consistent individual differences were of minor importance. Functions of music listening were found to act as a mediator between characteristics of the situation and music-selection behavior. We further observed several significant random effects, which indicated that individuals differed in how situational variables affected their music selection behavior. Our findings suggest a need to shift the focus of music-listening research from individual differences to situational influences, including potential person-situation interactions.
The state-of-the-art pattern recognition method in machine learning (a deep convolutional neural network) is used to identify the equation of state (EoS) employed in relativistic hydrodynamic simulations of heavy-ion collisions. High-level correlations of particle spectra in transverse momentum and azimuthal angle learned by the network act as an effective EoS-meter in deciphering the nature of the phase transition in QCD. The EoS-meter is model independent and insensitive to other simulation inputs, including the initial conditions and shear viscosity of the hydrodynamic simulations. Through this study we demonstrate that there is a traceable encoding of the dynamical information from the phase structure that survives the evolution and exists in the final snapshot of heavy-ion collisions, and that one can exclusively and effectively decode this information from the highly complex final output with machine learning when traditional methods fail. Besides the deep neural network, the performance of traditional machine learning classifiers is also reported.
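The core idea can be sketched as a small convolutional classifier that takes the final-state particle spectrum ρ(p_T, φ) as a 2D "image" and outputs an EoS class. This is an illustrative toy architecture in PyTorch, not the network used in the paper; the grid size and layer widths are assumptions.

```python
# Toy "EoS-meter": a small CNN that classifies a 2D spectrum in
# (transverse momentum p_T) x (azimuthal angle phi) into EoS classes
# (e.g. crossover vs first-order phase transition).
import torch
import torch.nn as nn

class EoSMeter(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),            # fixed-size summary of the spectrum
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):
        h = self.features(x)
        return self.classifier(h.flatten(1))

# one batch of 8 synthetic spectra on a 15 x 48 (p_T x phi) grid
spectra = torch.rand(8, 1, 15, 48)
logits = EoSMeter()(spectra)
print(logits.shape)
```

In practice such a network would be trained on labeled hydrodynamic simulation outputs; the sketch only shows the input/output contract of the classifier.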
The discussion about the interplay between digital technologies and the process of globalization is often focused on the following questions: who has access to global information networks, and who benefits from digital communication technologies? These are essential questions, and it can hardly be denied that they confront us with a series of political and ethical issues. However, we also need to recognize the ongoing digitalization of the globe, a process by which more and more people are put on various kinds of maps...
This paper investigates how biases in macroeconomic forecasts are associated with economic surprises and market responses across asset classes around US data announcements. We find that the skewness of the distribution of economic forecasts is a strong predictor of economic surprises, suggesting that forecasters behave strategically (rational bias) and possess private information. Our results also show that consensus forecasts of US macroeconomic releases embed anchoring. Under these conditions, both economic surprises and the returns of assets that are sensitive to macroeconomic conditions are predictable. Our findings indicate that local equity and bond markets are more predictable than foreign markets, currencies, and commodities. Economic surprises are found to link to asset returns very distinctively through the stages of the economic cycle, and these links strongly depend on whether economic releases are inflation- or growth-related. Yet, when forecasters fail to correctly predict the direction of economic surprises, regret becomes a relevant cognitive bias for explaining asset price responses. We find that the behavioral and rational biases encountered in US economic forecasting also exist in Continental Europe, the United Kingdom and Japan, albeit to a lesser extent.
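The headline predictor, the skewness of the cross-section of forecasts as a signal for the direction of the surprise (actual release minus consensus), can be illustrated with a simulation. The data-generating story below (a minority of informed forecasters tilting the distribution toward the eventual release) is an assumed toy mechanism, not the paper's model.

```python
# Toy illustration: when some forecasters hold private information, the
# cross-sectional skewness of forecasts correlates with the subsequent
# surprise (actual minus median forecast). All data are synthetic.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(7)
skews, surprises = [], []
for _ in range(500):
    direction = rng.choice([-1.0, 1.0])          # where private information points
    consensus = rng.normal()
    errors = rng.normal(0, 0.2, 40)              # 40 forecasters
    errors[:8] += direction * rng.exponential(1.0, 8)   # 8 informed forecasters
    forecasts = consensus + errors
    actual = consensus + direction * 0.8         # release surprises in that direction
    skews.append(skew(forecasts))
    surprises.append(actual - np.median(forecasts))

corr = np.corrcoef(skews, surprises)[0, 1]
print(f"corr(forecast skewness, surprise) = {corr:.2f}")
```

The positive correlation shows why skewness can predict surprises even when the median forecast looks unbiased.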
Advanced machine learning has achieved extraordinary success in recent years. Beyond the ex post analysis of measured data, machine learning could support an "active" operational risk management beyond the regime of traditional statistical analysis when it comes to the "known unknown" or even the "unknown unknown." While machine learning has been tested successfully in the regime of the "known," heuristics typically provide better results for active operational risk management (in the sense of forecasting). However, precursors in existing data can give machine learning a chance to provide early warnings even for the regime of the "unknown unknown."
With the ongoing loss of global biodiversity, long-term records of species distribution patterns are becoming increasingly important for investigating the causes and consequences of this change. Therefore, the digitization of scientific literature, both modern and historical, has been attracting growing attention in recent years. To meet this growing demand, the Specialised Information Service for Biodiversity Research (BIOfid) was launched in 2017 with the aim of increasing the availability and accessibility of biodiversity information. Closely tied to the research community, the interdisciplinary BIOfid team digitizes data sources of biodiversity-related research and provides a modern and professional infrastructure for hosting and sharing them. As a pilot project, German publications on the distribution and ecology of vascular plants, birds, moths and butterflies covering the past 250 years are prioritized. Large parts of the text corpus, defined in accordance with the needs of the relevant German research community, have already been transferred to a machine-readable format and will soon be publicly accessible. Software tools for text mining, semantic annotation and analysis, reflecting current trends in machine learning, are being developed to maximize bioscientific data output through user-specific queries that can be created via the BIOfid web portal (https://www.biofid.de/). To boost knowledge discovery, specific ontologies focusing on morphological traits and taxonomy are being prepared and will be continuously extended to keep up with an ever-expanding volume of literature sources.
In this study, a portable electronic nose (E-nose) prototype is developed using metal oxide semiconductor (MOS) sensors to detect odors of different wines. Odor detection facilitates the distinction of wines with different properties, including areas of production, vintage years, fermentation processes, and varietals. Four popular machine learning algorithms—extreme gradient boosting (XGBoost), random forest (RF), support vector machine (SVM), and backpropagation neural network (BPNN)—were used to build identification models for different classification tasks. Experimental results show that BPNN achieved the best performance, with accuracies of 94% and 92.5% in identifying production areas and varietals, respectively; and SVM achieved the best performance in identifying vintages and fermentation processes, with accuracies of 67.3% and 60.5%, respectively. Results demonstrate the effectiveness of the developed E-nose, which could be used to distinguish different wines based on their properties following selection of an optimal algorithm.
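The model-comparison setup described above can be sketched with off-the-shelf classifiers. Synthetic data stands in for the MOS sensor feature vectors; as an assumption, scikit-learn's `MLPClassifier` plays the role of the BPNN and `GradientBoostingClassifier` stands in for XGBoost, which the paper actually used.

```python
# Hedged sketch of the E-nose classification benchmark: four classifiers on
# the same (synthetic) sensor features, compared by test accuracy.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# 4 classes, e.g. four production areas; 10 features, e.g. MOS sensor responses
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "RF": RandomForestClassifier(random_state=0),
    "GB (XGBoost stand-in)": GradientBoostingClassifier(random_state=0),
    "SVM": SVC(),
    "BPNN (MLP)": MLPClassifier(max_iter=1000, random_state=0),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```

As in the study, the best algorithm typically differs by task, which is why selecting an optimal algorithm per classification task matters.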
In Niedersachsen (Lower Saxony), roughly 50% of forest sites have been mapped at a scale of 1:25,000 using a relatively complex procedure. Each mapped unit consists of classes for the terrain water balance (WHZ; 43 classes), nutrient supply (NZ; 16 classes), and substrate and stratification conditions (SLZ; 105 classes). The aim of this work was to predict WHZ and NZ classes of the Lower Saxon forest site mapping for unmapped areas. Two Random Forest models were calibrated on stratified random samples of WHZ and NZ classes drawn from the existing mapping. The model classified about 77% of the test sample correctly for the WHZ, with per-class F1 scores ranging from 50% to 95%. Incorrect predictions accumulated at transitions between neighboring WHZ (e.g., from valleys to slopes) and for WHZ with similar terrain properties but different gradations of water supply. Some model errors, however, apparently also stem from fuzziness within the underlying mapping. In addition, compared with the field mapping, the model predicts much finer-grained patterns that appear plausible given the underlying terrain but are not mapped at this level of detail in the field. About 66% of the NZ test data set was classified correctly; here, incorrect predictions occurred mainly between directly neighboring nutrient-supply classes. The uncertainties point partly to less suitable covariates, but are also to be expected given temporal changes in the soil properties themselves and inaccuracies in the mapping, which specifies few rules for assigning the nutrient class. Overall, we judge the models to be well suited for state-wide application, although local calibration of the models for individual growth regions is expected to increase model quality considerably. Aggregating similar classes into silviculturally relevant super-groups may achieve the same.
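The evaluation described in this abstract, overall accuracy plus per-class F1 scores for a Random Forest classifier, can be sketched as follows. Synthetic data replaces the real terrain covariates and site-mapping classes, so the numbers are purely illustrative.

```python
# Minimal sketch of the reported evaluation: a Random Forest classifier on
# terrain-like covariates, scored with overall accuracy and per-class F1
# (one score per WHZ/NZ-style class). Data are synthetic stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=12, n_informative=8,
                           n_classes=5, random_state=42)
# stratified split mirrors the stratified random sampling of mapped classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_tr, y_tr)
pred = rf.predict(X_te)
acc = accuracy_score(y_te, pred)
per_class_f1 = f1_score(y_te, pred, average=None)
print(f"accuracy: {acc:.2f}, per-class F1: {np.round(per_class_f1, 2)}")
```

Reporting F1 per class, rather than accuracy alone, is what reveals the pattern the abstract describes: errors concentrating in neighboring, hard-to-separate classes.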