004 Datenverarbeitung; Informatik
Refine
Year of publication
- 2022 (50) (remove)
Document Type
- Article (21)
- Doctoral Thesis (11)
- Preprint (6)
- Bachelor Thesis (4)
- Master's Thesis (3)
- Working Paper (2)
- Part of a Book (1)
- Conference Proceeding (1)
- Contribution to a Periodical (1)
Has Fulltext
- yes (50)
Is part of the Bibliography
- no (50)
Keywords
- data science (5)
- NLP (3)
- artificial intelligence (3)
- digital medicine (3)
- machine learning (3)
- machine-learning (3)
- Biomedical informatics (2)
- Data science (2)
- Natural Language Processing (2)
- patients (2)
Institute
With free delivery of products virtually being a standard in E-commerce, product returns pose a major challenge for online retailers and society. For retailers, product returns involve significant transportation, labor, disposal, and administrative costs. From a societal perspective, product returns contribute to greenhouse gas emissions and packaging disposal and are often a waste of natural resources. Therefore, reducing product returns has become a key challenge. This paper develops and validates a novel smart green nudging approach to tackle the problem of product returns during customers’ online shopping processes. We combine a green nudge with a novel data enrichment strategy and a modern causal machine learning method. We first run a large-scale randomized field experiment in the online shop of a German fashion retailer to test the efficacy of a novel green nudge. Subsequently, we fuse the data from about 50,000 customers with publicly-available aggregate data to create what we call enriched digital footprints and train a causal machine learning system capable of optimizing the administration of the green nudge. We report two main findings: First, our field study shows that the large-scale deployment of a simple, low-cost green nudge can significantly reduce product returns while increasing retailer profits. Second, we show how a causal machine learning system trained on the enriched digital footprint can amplify the effectiveness of the green nudge by “smartly” administering it only to certain types of customers. Overall, this paper demonstrates how combining a low-cost marketing instrument, a privacy-preserving data enrichment strategy, and a causal machine learning method can create a win-win situation from both an environmental and economic perspective by simultaneously reducing product returns and increasing retailers’ profits.
Recent advances in artificial neural networks enabled the quick development of new learning algorithms, which, among other things, pave the way to novel robotic applications. Traditionally, robots are programmed by human experts so as to accomplish pre-defined tasks. Such robots must operate in a controlled environment to guarantee repeatability, are designed to solve one unique task and require costly hours of development. In developmental robotics, researchers try to artificially imitate the way living beings acquire their behavior by learning. Learning algorithms are key to conceive versatile and robust robots that can adapt to their environment and solve multiple tasks efficiently. In particular, Reinforcement Learning (RL) studies the acquisition of skills through teaching via rewards. In this thesis, we will introduce RL and present recent advances in RL applied to robotics. We will review Intrinsically Motivated (IM) learning, a special form of RL, and we will apply in particular the Active Efficient Coding (AEC) principle to the learning of active vision. We also propose an overview of Hierarchical Reinforcement Learning (HRL), an other special form of RL, and apply its principle to a robotic manipulation task.
Orientation hypercolumns in the visual cortex are delimited by the repeating pinwheel patterns of orientation selective neurons. We design a generative model for visual cortex maps that reproduces such orientation hypercolumns as well as ocular dominance maps while preserving retinotopy. The model uses a neural placement method based on t–distributed stochastic neighbour embedding (t–SNE) to create maps that order common features in the connectivity matrix of the circuit. We find that, in our model, hypercolumns generally appear with fixed cell numbers independently of the overall network size. These results would suggest that existing differences in absolute pinwheel densities are a consequence of variations in neuronal density. Indeed, available measurements in the visual cortex indicate that pinwheels consist of a constant number of ∼30, 000 neurons. Our model is able to reproduce a large number of characteristic properties known for visual cortex maps. We provide the corresponding software in our MAPStoolbox for Matlab.
Euclidean distance-optimized data transformation for cluster analysis in biomedical data (EDOtrans)
(2022)
Background: Data transformations are commonly used in bioinformatics data processing in the context of data projection and clustering. The most used Euclidean metric is not scale invariant and therefore occasionally inappropriate for complex, e.g., multimodal distributed variables and may negatively affect the results of cluster analysis. Specifically, the squaring function in the definition of the Euclidean distance as the square root of the sum of squared differences between data points has the consequence that the value 1 implicitly defines a limit for distances within clusters versus distances between (inter-) clusters.
Methods: The Euclidean distances within a standard normal distribution (N(0,1)) follow a N(0,2–√) distribution. The EDO-transformation of a variable X is proposed as EDO=X/(2–√⋅s) following modeling of the standard deviation s by a mixture of Gaussians and selecting the dominant modes via item categorization. The method was compared in artificial and biomedical datasets with clustering of untransformed data, z-transformed data, and the recently proposed pooled variable scaling.
Results: A simulation study and applications to known real data examples showed that the proposed EDO scaling method is generally useful. The clustering results in terms of cluster accuracy, adjusted Rand index and Dunn’s index outperformed the classical alternatives. Finally, the EDO transformation was applied to cluster a high-dimensional genomic dataset consisting of gene expression data for multiple samples of breast cancer tissues, and the proposed approach gave better results than classical methods and was compared with pooled variable scaling.
Conclusions: For multivariate procedures of data analysis, it is proposed to use the EDO transformation as a better alternative to the established z-standardization, especially for nontrivially distributed data. The “EDOtrans” R package is available at https://cran.r-project.org/package=EDOtrans.
Bayesian inference is ubiquitous in science and widely used in biomedical research such as cell sorting or “omics” approaches, as well as in machine learning (ML), artificial neural networks, and “big data” applications. However, the calculation is not robust in regions of low evidence. In cases where one group has a lower mean but a higher variance than another group, new cases with larger values are implausibly assigned to the group with typically smaller values. An approach for a robust extension of Bayesian inference is proposed that proceeds in two main steps starting from the Bayesian posterior probabilities. First, cases with low evidence are labeled as “uncertain” class membership. The boundary for low probabilities of class assignment (threshold 𝜀
) is calculated using a computed ABC analysis as a data-based technique for item categorization. This leaves a number of cases with uncertain classification (p < 𝜀
). Second, cases with uncertain class membership are relabeled based on the distance to neighboring classified cases based on Voronoi cells. The approach is demonstrated on biomedical data typically analyzed with Bayesian statistics, such as flow cytometric data sets or biomarkers used in medical diagnostics, where it increased the class assignment accuracy by 1–10% depending on the data set. The proposed extension of the Bayesian inference of class membership can be used to obtain robust and plausible class assignments even for data at the extremes of the distribution and/or for which evidence is weak.
Enabling cybersecurity and protecting personal data are crucial challenges in the development and provision of digital service chains. Data and information are the key ingredients in the creation process of new digital services and products. While legal and technical problems are frequently discussed in academia, ethical issues of digital service chains and the commercialization of data are seldom investigated. Thus, based on outcomes of the Horizon2020 PANELFIT project, this work discusses current ethical issues related to cybersecurity. Utilizing expert workshops and encounters as well as a scientific literature review, ethical issues are mapped on individual steps of digital service chains. Not surprisingly, the results demonstrate that ethical challenges cannot be resolved in a general way, but need to be discussed individually and with respect to the ethical principles that are violated in the specific step of the service chain. Nevertheless, our results support practitioners by providing and discussing a list of ethical challenges to enable legally compliant as well as ethically acceptable solutions in the future.
Die Emergenz digitaler Netzwerke ist auf die ständige Entwicklung und Transformation neuer Informationstechnologien zurückzuführen.
Dieser Strukturwandel führt zu äußerst komplexen Systemen in vielen verschiedenen Lebensbereichen.
Es besteht daher verstärkt die Notwendigkeit, die zugrunde liegenden wesentlichen Eigenschaften von realen Netzwerken zu untersuchen und zu verstehen.
In diesem Zusammenhang wird die Netzwerkanalyse als Mittel für die Untersuchung von Netzwerken herangezogen und stellt beobachtete Strukturen mithilfe mathematischer Modelle dar.
Hierbei, werden in der Regel parametrisierbare Zufallsgraphen verwendet, um eine systematische experimentelle Evaluation von Algorithmen und Datenstrukturen zu ermöglichen.
Angesichts der zunehmenden Menge an Informationen, sind viele Aspekte der Netzwerkanalyse datengesteuert und zur Interpretation auf effiziente Algorithmen angewiesen.
Algorithmische Lösungen müssen daher sowohl die strukturellen Eigenschaften der Eingabe als auch die Besonderheiten der zugrunde liegenden Maschinen, die sie ausführen, sorgfältig berücksichtigen.
Die Generierung und Analyse massiver Netzwerke ist dementsprechend eine anspruchsvolle Aufgabe für sich.
Die vorliegende Arbeit bietet daher algorithmische Lösungen für die Generierung und Analyse massiver Graphen.
Zu diesem Zweck entwickeln wir Algorithmen für das Generieren von Graphen mit vorgegebenen Knotengraden, die Berechnung von Zusammenhangskomponenten massiver Graphen und zertifizierende Grapherkennung für Instanzen, die die Größe des Hauptspeichers überschreiten.
Unsere Algorithmen und Implementierungen sind praktisch effizient für verschiedene Maschinenmodelle und bieten sequentielle, Shared-Memory parallele und/oder I/O-effiziente Lösungen.
Antimicrobial resistant infections arise as a consequential response to evolutionary mechanisms within microbes which cause them to be protected from the effects of antimicrobials. The frequent occurrence of resistant infections poses a global public health threat as their control has become challenging despite many efforts. The dynamics of such infections are driven by processes at multiple levels. For a long time, mathematical models have proved valuable for unravelling complex mechanisms in the dynamics of infections. In this thesis, we focus on mathematical approaches to modelling the development and spread of resistant infections at between-host (population-wide) and within-host (individual) levels.
Within an individual host, switching between treatments has been identified as one of the methods that can be employed for the gradual eradication of resistant strains on the long term. With this as motivation, we study the problem using dynamical systems and notions from control theory. We present a model based on deterministic logistic differential equations which capture the general dynamics of microbial resistance inside an individual host. Fundamentally, this model describes the spread of resistant infections whilst accounting for evolutionary mutations observed in resistant pathogens and capturing them in mutation matrices. We extend this model to explore the implications of therapy switching from a control theoretic perspective by using switched systems and developing control strategies with the goal of reducing the appearance of drug resistant pathogens within the host.
At the between-host level, we use compartmental models to describe the transmission of infection between multiple individuals in a population. In particular, we make a case study of the evolution and spread of the novel coronavirus (SARS-CoV-2) pandemic. So far, vaccination remains a critical component in the eventual solution to this public health crisis. However, as with many other pathogens, vaccine resistant variants of the virus have been a major concern in control efforts by governments and all stakeholders. Using network theory, we investigate the spread and transmission of the disease on social networks by compartmentalising and studying the progression of the disease in each compartment, considering both the original virus strain and one of its highly transmissible vaccine-resistant mutant strains. We investigate these dynamics in the presence of vaccinations and other interventions. Although vaccinations are of absolute importance during viral outbreaks, resistant variants coupled with population hesitancy towards vaccination can lead to further spread of the virus.
AI-based computer vision systems play a crucial role in the environment perception for autonomous driving. Although the development of self-driving systems has been pursued for multiple decades, it is only recently that breakthroughs in Deep Neural Networks (DNNs) have led to their widespread application in perception pipelines, which are getting more and more sophisticated. However, with this rising trend comes the need for a systematic safety analysis to evaluate the DNN's behavior in difficult scenarios as well as to identify the various factors that cause misbehavior in such systems. This work aims to deliver a crucial contribution to the lacking literature on the systematic analysis of Performance Limiting Factors (PLFs) for DNNs by investigating the task of pedestrian detection in urban traffic from a monocular camera mounted on an autonomous vehicle. To investigate the common factors that lead to DNN misbehavior, six commonly used state-of-the-art object detection architectures and three detection tasks are studied using a new large-scale synthetic dataset and a smaller real-world dataset for pedestrian detection. The systematic analysis includes 17 factors from the literature and four novel factors that are introduced as part of this work. Each of the 21 factors is assessed based on its influence on the detection performance and whether it can be considered a Performance Limiting Factor (PLF). In order to support the evaluation of the detection performance, a novel and task-oriented Pedestrian Detection Safety Metric (PDSM) is introduced, which is specifically designed to aid in the identification of individual factors that contribute to DNN failure. This work further introduces a training approach for F1-Score maximization whose purpose is to ensure that the DNNs are assessed at their highest performance. Moreover, a new occlusion estimation model is introduced to replace the missing pedestrian occlusion annotations in the real-world dataset. Based on a qualitative analysis of the correlation graphs that visualize the correlation between the PLFs and the detection performance, this study identified 16 of the initial 21 factors as being PLFs for DNNs out of which the entropy, the occlusion ratio, the boundary edge strength, and the bounding box aspect ratio turned out to be most severely affecting the detection performance. The findings of this study highlight some of the most serious shortcomings of current DNNs and pave the way for future research to address these issues.
TriMem: A parallelized hybrid Monte Carlo software for efficient simulations of lipid membranes
(2022)
Lipid membranes are integral building blocks of living cells and perform a multitude of biological functions. Currently, molecular simulations of cellular-scale membrane remodeling processes at atomic resolution are extremely difficult, due to their size, complexity, and the large times-scales on which these processes occur. Instead, elastic membrane models are used to simulate membrane shapes and transitions between them and to infer their properties and functions. Unfortunately, an efficiently parallelized open-source simulation code to do so has been lacking. Here, we present TriMem, a parallel hybrid Monte Carlo simulation engine for triangulated lipid membranes. The kernels are efficiently coded in C++ and wrapped with Python for ease-of-use. The parallel implementation of the energy and gradient calculations and of Monte Carlo flip moves of edges in the triangulated membrane enable us to simulate large and highly curved membrane structures. For validation, we reproduce phase diagrams of vesicles with varying surface-to-volume ratios and area difference. We also compute the density of states to verify correct Boltzmann sampling. The software can be used to tackle a range of large-scale membrane remodeling processes as a step toward cell-scale simulations. Additionally, extensive documentation make the software accessible to the broad biophysics and computational cell biology communities.