004 Datenverarbeitung; Informatik
Refine
Year of publication
- 2022 (50) (remove)
Document Type
- Article (21)
- Doctoral Thesis (11)
- Preprint (6)
- Bachelor Thesis (4)
- Master's Thesis (3)
- Working Paper (2)
- Part of a Book (1)
- Conference Proceeding (1)
- Contribution to a Periodical (1)
Has Fulltext
- yes (50) (remove)
Is part of the Bibliography
- no (50)
Keywords
- data science (5)
- NLP (3)
- artificial intelligence (3)
- digital medicine (3)
- machine learning (3)
- machine-learning (3)
- Biomedical informatics (2)
- Data science (2)
- Natural Language Processing (2)
- patients (2)
Institute
With free delivery of products virtually being a standard in E-commerce, product returns pose a major challenge for online retailers and society. For retailers, product returns involve significant transportation, labor, disposal, and administrative costs. From a societal perspective, product returns contribute to greenhouse gas emissions and packaging disposal and are often a waste of natural resources. Therefore, reducing product returns has become a key challenge. This paper develops and validates a novel smart green nudging approach to tackle the problem of product returns during customers’ online shopping processes. We combine a green nudge with a novel data enrichment strategy and a modern causal machine learning method. We first run a large-scale randomized field experiment in the online shop of a German fashion retailer to test the efficacy of a novel green nudge. Subsequently, we fuse the data from about 50,000 customers with publicly-available aggregate data to create what we call enriched digital footprints and train a causal machine learning system capable of optimizing the administration of the green nudge. We report two main findings: First, our field study shows that the large-scale deployment of a simple, low-cost green nudge can significantly reduce product returns while increasing retailer profits. Second, we show how a causal machine learning system trained on the enriched digital footprint can amplify the effectiveness of the green nudge by “smartly” administering it only to certain types of customers. Overall, this paper demonstrates how combining a low-cost marketing instrument, a privacy-preserving data enrichment strategy, and a causal machine learning method can create a win-win situation from both an environmental and economic perspective by simultaneously reducing product returns and increasing retailers’ profits.
Recent advances in artificial neural networks enabled the quick development of new learning algorithms, which, among other things, pave the way to novel robotic applications. Traditionally, robots are programmed by human experts so as to accomplish pre-defined tasks. Such robots must operate in a controlled environment to guarantee repeatability, are designed to solve one unique task and require costly hours of development. In developmental robotics, researchers try to artificially imitate the way living beings acquire their behavior by learning. Learning algorithms are key to conceive versatile and robust robots that can adapt to their environment and solve multiple tasks efficiently. In particular, Reinforcement Learning (RL) studies the acquisition of skills through teaching via rewards. In this thesis, we will introduce RL and present recent advances in RL applied to robotics. We will review Intrinsically Motivated (IM) learning, a special form of RL, and we will apply in particular the Active Efficient Coding (AEC) principle to the learning of active vision. We also propose an overview of Hierarchical Reinforcement Learning (HRL), an other special form of RL, and apply its principle to a robotic manipulation task.
Orientation hypercolumns in the visual cortex are delimited by the repeating pinwheel patterns of orientation selective neurons. We design a generative model for visual cortex maps that reproduces such orientation hypercolumns as well as ocular dominance maps while preserving retinotopy. The model uses a neural placement method based on t–distributed stochastic neighbour embedding (t–SNE) to create maps that order common features in the connectivity matrix of the circuit. We find that, in our model, hypercolumns generally appear with fixed cell numbers independently of the overall network size. These results would suggest that existing differences in absolute pinwheel densities are a consequence of variations in neuronal density. Indeed, available measurements in the visual cortex indicate that pinwheels consist of a constant number of ∼30, 000 neurons. Our model is able to reproduce a large number of characteristic properties known for visual cortex maps. We provide the corresponding software in our MAPStoolbox for Matlab.
Bayesian inference is ubiquitous in science and widely used in biomedical research such as cell sorting or “omics” approaches, as well as in machine learning (ML), artificial neural networks, and “big data” applications. However, the calculation is not robust in regions of low evidence. In cases where one group has a lower mean but a higher variance than another group, new cases with larger values are implausibly assigned to the group with typically smaller values. An approach for a robust extension of Bayesian inference is proposed that proceeds in two main steps starting from the Bayesian posterior probabilities. First, cases with low evidence are labeled as “uncertain” class membership. The boundary for low probabilities of class assignment (threshold 𝜀
) is calculated using a computed ABC analysis as a data-based technique for item categorization. This leaves a number of cases with uncertain classification (p < 𝜀
). Second, cases with uncertain class membership are relabeled based on the distance to neighboring classified cases based on Voronoi cells. The approach is demonstrated on biomedical data typically analyzed with Bayesian statistics, such as flow cytometric data sets or biomarkers used in medical diagnostics, where it increased the class assignment accuracy by 1–10% depending on the data set. The proposed extension of the Bayesian inference of class membership can be used to obtain robust and plausible class assignments even for data at the extremes of the distribution and/or for which evidence is weak.
Euclidean distance-optimized data transformation for cluster analysis in biomedical data (EDOtrans)
(2022)
Background: Data transformations are commonly used in bioinformatics data processing in the context of data projection and clustering. The most used Euclidean metric is not scale invariant and therefore occasionally inappropriate for complex, e.g., multimodal distributed variables and may negatively affect the results of cluster analysis. Specifically, the squaring function in the definition of the Euclidean distance as the square root of the sum of squared differences between data points has the consequence that the value 1 implicitly defines a limit for distances within clusters versus distances between (inter-) clusters.
Methods: The Euclidean distances within a standard normal distribution (N(0,1)) follow a N(0,2–√) distribution. The EDO-transformation of a variable X is proposed as EDO=X/(2–√⋅s) following modeling of the standard deviation s by a mixture of Gaussians and selecting the dominant modes via item categorization. The method was compared in artificial and biomedical datasets with clustering of untransformed data, z-transformed data, and the recently proposed pooled variable scaling.
Results: A simulation study and applications to known real data examples showed that the proposed EDO scaling method is generally useful. The clustering results in terms of cluster accuracy, adjusted Rand index and Dunn’s index outperformed the classical alternatives. Finally, the EDO transformation was applied to cluster a high-dimensional genomic dataset consisting of gene expression data for multiple samples of breast cancer tissues, and the proposed approach gave better results than classical methods and was compared with pooled variable scaling.
Conclusions: For multivariate procedures of data analysis, it is proposed to use the EDO transformation as a better alternative to the established z-standardization, especially for nontrivially distributed data. The “EDOtrans” R package is available at https://cran.r-project.org/package=EDOtrans.
Enabling cybersecurity and protecting personal data are crucial challenges in the development and provision of digital service chains. Data and information are the key ingredients in the creation process of new digital services and products. While legal and technical problems are frequently discussed in academia, ethical issues of digital service chains and the commercialization of data are seldom investigated. Thus, based on outcomes of the Horizon2020 PANELFIT project, this work discusses current ethical issues related to cybersecurity. Utilizing expert workshops and encounters as well as a scientific literature review, ethical issues are mapped on individual steps of digital service chains. Not surprisingly, the results demonstrate that ethical challenges cannot be resolved in a general way, but need to be discussed individually and with respect to the ethical principles that are violated in the specific step of the service chain. Nevertheless, our results support practitioners by providing and discussing a list of ethical challenges to enable legally compliant as well as ethically acceptable solutions in the future.
Die Emergenz digitaler Netzwerke ist auf die ständige Entwicklung und Transformation neuer Informationstechnologien zurückzuführen.
Dieser Strukturwandel führt zu äußerst komplexen Systemen in vielen verschiedenen Lebensbereichen.
Es besteht daher verstärkt die Notwendigkeit, die zugrunde liegenden wesentlichen Eigenschaften von realen Netzwerken zu untersuchen und zu verstehen.
In diesem Zusammenhang wird die Netzwerkanalyse als Mittel für die Untersuchung von Netzwerken herangezogen und stellt beobachtete Strukturen mithilfe mathematischer Modelle dar.
Hierbei, werden in der Regel parametrisierbare Zufallsgraphen verwendet, um eine systematische experimentelle Evaluation von Algorithmen und Datenstrukturen zu ermöglichen.
Angesichts der zunehmenden Menge an Informationen, sind viele Aspekte der Netzwerkanalyse datengesteuert und zur Interpretation auf effiziente Algorithmen angewiesen.
Algorithmische Lösungen müssen daher sowohl die strukturellen Eigenschaften der Eingabe als auch die Besonderheiten der zugrunde liegenden Maschinen, die sie ausführen, sorgfältig berücksichtigen.
Die Generierung und Analyse massiver Netzwerke ist dementsprechend eine anspruchsvolle Aufgabe für sich.
Die vorliegende Arbeit bietet daher algorithmische Lösungen für die Generierung und Analyse massiver Graphen.
Zu diesem Zweck entwickeln wir Algorithmen für das Generieren von Graphen mit vorgegebenen Knotengraden, die Berechnung von Zusammenhangskomponenten massiver Graphen und zertifizierende Grapherkennung für Instanzen, die die Größe des Hauptspeichers überschreiten.
Unsere Algorithmen und Implementierungen sind praktisch effizient für verschiedene Maschinenmodelle und bieten sequentielle, Shared-Memory parallele und/oder I/O-effiziente Lösungen.
Antimicrobial resistant infections arise as a consequential response to evolutionary mechanisms within microbes which cause them to be protected from the effects of antimicrobials. The frequent occurrence of resistant infections poses a global public health threat as their control has become challenging despite many efforts. The dynamics of such infections are driven by processes at multiple levels. For a long time, mathematical models have proved valuable for unravelling complex mechanisms in the dynamics of infections. In this thesis, we focus on mathematical approaches to modelling the development and spread of resistant infections at between-host (population-wide) and within-host (individual) levels.
Within an individual host, switching between treatments has been identified as one of the methods that can be employed for the gradual eradication of resistant strains on the long term. With this as motivation, we study the problem using dynamical systems and notions from control theory. We present a model based on deterministic logistic differential equations which capture the general dynamics of microbial resistance inside an individual host. Fundamentally, this model describes the spread of resistant infections whilst accounting for evolutionary mutations observed in resistant pathogens and capturing them in mutation matrices. We extend this model to explore the implications of therapy switching from a control theoretic perspective by using switched systems and developing control strategies with the goal of reducing the appearance of drug resistant pathogens within the host.
At the between-host level, we use compartmental models to describe the transmission of infection between multiple individuals in a population. In particular, we make a case study of the evolution and spread of the novel coronavirus (SARS-CoV-2) pandemic. So far, vaccination remains a critical component in the eventual solution to this public health crisis. However, as with many other pathogens, vaccine resistant variants of the virus have been a major concern in control efforts by governments and all stakeholders. Using network theory, we investigate the spread and transmission of the disease on social networks by compartmentalising and studying the progression of the disease in each compartment, considering both the original virus strain and one of its highly transmissible vaccine-resistant mutant strains. We investigate these dynamics in the presence of vaccinations and other interventions. Although vaccinations are of absolute importance during viral outbreaks, resistant variants coupled with population hesitancy towards vaccination can lead to further spread of the virus.
AI-based computer vision systems play a crucial role in the environment perception for autonomous driving. Although the development of self-driving systems has been pursued for multiple decades, it is only recently that breakthroughs in Deep Neural Networks (DNNs) have led to their widespread application in perception pipelines, which are getting more and more sophisticated. However, with this rising trend comes the need for a systematic safety analysis to evaluate the DNN's behavior in difficult scenarios as well as to identify the various factors that cause misbehavior in such systems. This work aims to deliver a crucial contribution to the lacking literature on the systematic analysis of Performance Limiting Factors (PLFs) for DNNs by investigating the task of pedestrian detection in urban traffic from a monocular camera mounted on an autonomous vehicle. To investigate the common factors that lead to DNN misbehavior, six commonly used state-of-the-art object detection architectures and three detection tasks are studied using a new large-scale synthetic dataset and a smaller real-world dataset for pedestrian detection. The systematic analysis includes 17 factors from the literature and four novel factors that are introduced as part of this work. Each of the 21 factors is assessed based on its influence on the detection performance and whether it can be considered a Performance Limiting Factor (PLF). In order to support the evaluation of the detection performance, a novel and task-oriented Pedestrian Detection Safety Metric (PDSM) is introduced, which is specifically designed to aid in the identification of individual factors that contribute to DNN failure. This work further introduces a training approach for F1-Score maximization whose purpose is to ensure that the DNNs are assessed at their highest performance. Moreover, a new occlusion estimation model is introduced to replace the missing pedestrian occlusion annotations in the real-world dataset. Based on a qualitative analysis of the correlation graphs that visualize the correlation between the PLFs and the detection performance, this study identified 16 of the initial 21 factors as being PLFs for DNNs out of which the entropy, the occlusion ratio, the boundary edge strength, and the bounding box aspect ratio turned out to be most severely affecting the detection performance. The findings of this study highlight some of the most serious shortcomings of current DNNs and pave the way for future research to address these issues.
TriMem: A parallelized hybrid Monte Carlo software for efficient simulations of lipid membranes
(2022)
Lipid membranes are integral building blocks of living cells and perform a multitude of biological functions. Currently, molecular simulations of cellular-scale membrane remodeling processes at atomic resolution are extremely difficult, due to their size, complexity, and the large times-scales on which these processes occur. Instead, elastic membrane models are used to simulate membrane shapes and transitions between them and to infer their properties and functions. Unfortunately, an efficiently parallelized open-source simulation code to do so has been lacking. Here, we present TriMem, a parallel hybrid Monte Carlo simulation engine for triangulated lipid membranes. The kernels are efficiently coded in C++ and wrapped with Python for ease-of-use. The parallel implementation of the energy and gradient calculations and of Monte Carlo flip moves of edges in the triangulated membrane enable us to simulate large and highly curved membrane structures. For validation, we reproduce phase diagrams of vesicles with varying surface-to-volume ratios and area difference. We also compute the density of states to verify correct Boltzmann sampling. The software can be used to tackle a range of large-scale membrane remodeling processes as a step toward cell-scale simulations. Additionally, extensive documentation make the software accessible to the broad biophysics and computational cell biology communities.
Gasdermin-D (GSDMD) is the ultimate effector of pyroptosis, a form of programmed cell death associated with pathogen invasion and inflammation. After proteolytic cleavage by caspases activated by the inflammasome, the GSDMD N-terminal domain (GSDMDNT) assembles on the inner leaflet of the plasma membrane and induces the formation of large membrane pores. We use atomistic molecular dynamics simulations to study GSDMDNT monomers, oligomers, and rings in an asymmetric plasma membrane mimetic. We identify distinct interaction motifs of GSDMDNT with phosphatidylinositol-4,5-bisphosphate (PI(4,5)P2) and phosphatidylserine (PS) head-groups and describe differential lipid binding between the pore and prepore conformations. Oligomers are stabilized by shared lipid binding sites between neighboring monomers acting akin to double-sided tape. We show that already small GSDMDNT oligomers form stable, water-filled and ion-conducting membrane pores bounded by curled beta-sheets. In large-scale simulations, we resolve the process of pore formation by lipid detachment from GSDMDNT arcs and lipid efflux from partial rings. We find that that high-order GSDMDNT oligomers can crack under the line tension of 86 pN created by an open membrane edge to form the slit pores or closed GSDMDNT rings seen in experiment. Our simulations provide a detailed view of key steps in GSDMDNT-induced plasma membrane pore formation, including sublytic pores that explain nonselective ion flux during early pyroptosis.
Gasdermin-D (GSDMD) is the ultimate effector of pyroptosis, a form of programmed cell death associated with pathogen invasion and inflammation. After proteolytic cleavage by caspases, the GSDMD N-terminal domain (GSDMDNT) assembles on the inner leaflet of the plasma membrane and induces the formation of membrane pores. We use atomistic molecular dynamics simulations to study GSDMDNT monomers, oligomers, and rings in an asymmetric plasma membrane mimetic. We identify distinct interaction motifs of GSDMDNT with phosphatidylinositol-4,5-bisphosphate (PI(4,5)P2) and phosphatidylserine (PS) headgroups and describe their conformational dependence. Oligomers are stabilized by shared lipid binding sites between neighboring monomers acting akin to double-sided tape. We show that already small GSDMDNT oligomers support stable, water-filled, and ion-conducting membrane pores bounded by curled beta-sheets. In large-scale simulations, we resolve the process of pore formation from GSDMDNT arcs and lipid efflux from partial rings. We find that high-order GSDMDNT oligomers can crack under the line tension of 86 pN created by an open membrane edge to form the slit pores or closed GSDMDNT rings seen in atomic force microscopy experiments. Our simulations provide a detailed view of key steps in GSDMDNT-induced plasma membrane pore formation, including sublytic pores that explain nonselective ion flux during early pyroptosis.
The electrical and computational properties of neurons in our brains are determined by a rich repertoire of membrane-spanning ion channels and elaborate dendritic trees. However, the precise reason for this inherent complexity remains unknown. Here, we generated large stochastic populations of biophysically realistic hippocampal granule cell models comparing those with all 15 ion channels to their reduced but functional counterparts containing only 5 ion channels. Strikingly, valid parameter combinations in the full models were more frequent and more stable in the face of perturbations to channel expression levels. Scaling up the numbers of ion channels artificially in the reduced models recovered these advantages confirming the key contribution of the actual number of ion channel types. We conclude that the diversity of ion channels gives a neuron greater flexibility and robustness to achieve target excitability.
With open banking, consumers take greater control over their own financial data and share it at their discretion. Using a rich set of loan application data from the largest German FinTech lender in consumer credit, this paper studies what characterizes borrowers who share data and assesses its impact on loan application outcomes. I show that riskier borrowers share data more readily, which subsequently leads to an increase in the probability of loan approval and a reduction in interest rates. The effects hold across all credit risk profiles but are the most pronounced for borrowers with lower credit scores (a higher increase in loan approval rate) and higher credit scores (a larger reduction in interest rate). I also find that standard variables used in credit scoring explain substantially less variation in loan application outcomes when customers share data. Overall, these findings suggest that open banking improves financial inclusion, and also provide policy implications for regulators engaged in the adoption or extension of open banking policies.
Recent scientific evidence suggests that chronic pain phenotypes are reflected in metabolomic changes. However, problems associated with chronic pain, such as sleep disorders or obesity, may complicate the metabolome pattern. Such a complex phenotype was investigated to identify common metabolomics markers at the interface of persistent pain, sleep, and obesity in 71 men and 122 women undergoing tertiary pain care. They were examined for patterns in d = 97 metabolomic markers that segregated patients with a relatively benign pain phenotype (low and little bothersome pain) from those with more severe clinical symptoms (high pain intensity, more bothersome pain, and co-occurring problems such as sleep disturbance). Two independent lines of data analysis were pursued. First, a data-driven supervised machine learning-based approach was used to identify the most informative metabolic markers for complex phenotype assignment. This pointed primarily at adenosine monophosphate (AMP), asparagine, deoxycytidine, glucuronic acid, and propionylcarnitine, and secondarily at cysteine and nicotinamide adenine dinucleotide (NAD) as informative for assigning patients to clinical pain phenotypes. After this, a hypothesis-driven analysis of metabolic pathways was performed, including sleep and obesity. In both the first and second line of analysis, three metabolic markers (NAD, AMP, and cysteine) were found to be relevant, including metabolic pathway analysis in obesity, associated with changes in amino acid metabolism, and sleep problems, associated with downregulated methionine metabolism. Taken together, present findings provide evidence that metabolomic changes associated with co-occurring problems may play a role in the development of severe pain. Co-occurring problems may influence each other at the metabolomic level. Because the methionine and glutathione metabolic pathways are physiologically linked, sleep problems appear to be associated with the first metabolic pathway, whereas obesity may be associated with the second.
Gene therapy (GT) is becoming a realistic treatment option for patients with haemophilia. Outside clinical trials, the complexity and potential complications of GT will pose unprecedented challenges to haemophilia care centres.AIM: To explore the potential use of electronic tools to improve the delivery of GT under real-world conditions.METHODS: Considering the hub-and-spoke model, the GTH working group on GT considered the entire patient pathway and reached consensus on requirements for an integrative software tool to secure documenting and sharing information between treaters, pharmacies and patients.RESULTS: Six steps of the gene therapy process were identified, each requiring completion of the previous step as a prerequisite for entry. The responsibilities of GT dosing and follow-up treatment centres, read/write access rules, and the minimum data set were outlined. Data contributed by patients through mobile devices was also considered.CONCLUSION: Important information needs to be shared between patients and treatment centres in a real-world GT hub-and-spoke model. Collecting and sharing this information in well-organised electronic applications will not only improve patient care but also enable national and international data collection in clinical registries...
Bei der Bekleidungsmodellierung geht es um den Entwurf von Bekleidung von Personen, die beispielsweise in Szenen dargestellt werden können. Dabei stützt sich der Entwurf auf Informationen aus einer Datengrundlage. Die Darstellung von Szenen, in denen Personen dargestellt werden, stellt sich grundsätzlich als Zusammenspiel komplexer Teilaspekte dar. Dabei wird die Nachvollziehbarkeit einer modellierten Szene oder modellierter Avatare im Auge des Betrachters ganz wesentlich durch den Faktor passend gewählter Kleidung bestimmt.
In dieser Arbeit werden Ansätze und Verfahren vorgestellt, die zur Bekleidungsmodellierung auf Grundlage von Textdokumenten basieren. Dafür werden Möglichkeiten erörtert, die es erlauben Informationen aus Texten zu extrahieren und für die Modellierung einzusetzen.
Zur Bearbeitung der Aufgabenstellung wird zunächst ein aus dem Machine Learning bekanntes kontextuelles Modell hinsichtlich einer Mehrklassen-Klassifizierung trainiert und angewendet. Daraufhin wird die Erstellung einer eigenen Wissensressource, die sich auf textlicher Ebene mit dem Thema der Bekleidung auseinandersetzt, aufgebaut und mit zahlreichen Informationen aus bereits bestehenden Ressourcen popularisiert. Die neue Ressource wird in Form einer Graphdatenbank entworfen. Dabei werden Relationen zwischen den einzelnen Elementen mithilfe von statischen Modellen sowie einem kontextuellen Modell, dem BERT-Modell, erstellt. Schließlich wird auf Grundlage der entwickelten Graphdatenbank ein in der Programmiersprache Python entwickeltes Programm vorgestellt, dass Eingabetexte unter Hinzunahme der Informationen und Relationen innerhalb der Graphdatenbank verarbeitet und Kleidungsstücke detektiert.
Nach der theoretischen Aufarbeitung der entwickelten Ansätze werden die daraus resultierenden Ergebnisse diskutiert und bestehende Problematiken bei der Bearbeitung der Aufgabenstellung angesprochen. Abschließend wird die Arbeit zusammengefasst und Anregungen für die weitere Bearbeitung dieser Thematik vorgestellt.
Recent experimental findings have reported the presence of unconventional charge orders in the enlarged (2 × 2) unit-cell of kagome metals AV3Sb5 (A = K, Rb, Cs) and hinted towards specific topological signatures. Motivated by these discoveries, we investigate the types of topological phases that can be realized in such kagome superlattices. In this context, we employ a recently introduced statistical method capable of constructing topological models for any generic lattice. By analyzing large data sets generated from symmetry-guided distributions of randomized tight-binding parameters, and labeled with the corresponding topological index, we extract physically meaningful information. We illustrate the possible real-space manifestations of charge and bond modulations and associated flux patterns for different topological classes, and discuss their relation to present theoretical predictions and experimental signatures for the AV3Sb5 family. Simultaneously, we predict higher-order topological phases that may be realized by appropriately manipulating the currently known systems.
Bacteria that are capable of organizing themselves as biofilms are an important public health issue. Knowledge discovery focusing on the ability to swarm and conquer the surroundings to form persistent colonies is therefore very important for microbiological research communities that focus on a clinical perspective. Here, we demonstrate how a machine learning workflow can be used to create useful models that are capable of discriminating distinct associated growth behaviors along distinct phenotypes. Based on basic gray-scale images, we provide a processing pipeline for binary image generation, making the workflow accessible for imaging data from a wide range of devices and conditions. The workflow includes a locally estimated regression model that easily applies to growth-related data and a shape analysis using identified principal components. Finally, we apply a density-based clustering application with noise (DBSCAN) to extract and analyze characteristic, general features explained by colony shapes and areas to discriminate distinct Bacillus subtilis phenotypes. Our results suggest that the differences regarding their ability to swarm and subsequently conquer the medium that surrounds them result in characteristic features. The differences along the time scales of the distinct latency for the colony formation give insights into the ability to invade the surroundings and therefore could serve as a useful monitoring tool.