Refine
Document Type
- Doctoral Thesis (5)
- Master's Thesis (3)
Language
- English (8)
Has Fulltext
- yes (8)
Is part of the Bibliography
- no (8)
Keywords
- AI Safety (1)
- Autonomous Driving (1)
- Autonomous Learning (1)
- DNN Robustness (1)
- Deep Learning (1)
- Developmental Robotics (1)
- Historical Document Analysis (1)
- Information Retrieval (1)
- Intrinsic Motivations (1)
- Machine Learning (1)
Institute
Machine learning (ML) techniques have evolved rapidly in recent years and have shown impressive capabilities in feature extraction, pattern recognition, and causal inference. There has been an increasing attention to applying ML to medical applications, such as medical diagnosis, drug discovery, personalized medicine, and numerous other medical problems. ML-based methods have the advantage of processing vast amounts of data.
With an ever increasing amount of medical data collection and large, inter-subject variability in the medical data, automated data processing pipelines are very much desirable since it is laborious, expensive, and error-prone to rely solely on human processing. ML methods have the potential to uncover interesting patterns, unravel correlations between complex features, learn patient-specific representations, and make accurate predictions. Motivated by these promising aspects, in this thesis, I present studies where I have implemented deep neural networks for the early diagnosis of epilepsy based on electroencephalography (EEG) data and brain tumor detection based on magnetic resonance spectroscopy (MRS) data.
In the project for early diagnosis of epilepsy, we are dealing with one of the most common neurological disorders, epilepsy, which is characterized by recurrent unprovoked seizures. It can be triggered by a variety of initial brain injuries and manifests itself after a time window which is called the latent period. During this period, a cascade of structural and functional brain alterations takes place leading to an increased seizure susceptibility.
The development and extension of brain tissue capable of generating spontaneous seizures is defined as epileptogenesis (EPG).
Detecting the presence of EPG provides a precious opportunity for targeted early medical interventions and, thus, can slow down or even halt the disease progression. In order to study brain signals in this latent window, animal epilepsy models are used to provide valuable data as it is extremely difficult to obtain this data from human patients. The aim of this study is to discover biomarkers of EPG using animal models and then to find the equivalent and counterparts in human patients' data. However, the EEG features for EPG are not well-understood and there is not a sufficiently large amount of annotated data for ML-based algorithms. To approach this problem, firstly, I utilized the timestamp information of the recorded EEG from an animal epilepsy model where epilepsy is induced by an electrical stimulation. The timestamp serves as a form of weak supervision, i.e., before and after the stimulation. Secondly, I implemented a deep residual neural network and trained it with a binary classification task to distinguish the EEG signals from these two phases. After obtaining a high discriminative ability on the binary classification task, I proposed to divide further the time span after the stimulation for a three-class classification, aiming to detect possible stages of the progression of the latent EPG phase. I have shown that the model can distinguish EEG signals at different stages of EPG with high accuracy and generalization ability. I have also demonstrated that some of the learned features from the network are clinically relevant.
In the task of detecting brain tumors based on MRS data, I first proposed to apply a deep neural network on the MRS data collected from over 400 patients for a binary classification task. To combat the challenge of noisy labeling, I developed a distillation step to filter out relatively ``cleanly'' labeled samples. A mixing-based data augmentation method was also implemented to expand the size of the training set. All the experiments were designed to be conducted with a leave-patient-out scheme to ensure the generalization ability of the model. Averaged across all leave-patient-out cross-validation sets, the proposed method performed on par with human neuroradiologists, while outperforming other baseline methods. I have demonstrated the distillation effect on the MNIST data set with manually-introduced label noise as well as providing visualization of the input influences on the final classification through a class activation map method.
Moreover, I have proposed to aggregate information at the subject level, which could provide more information and insights. This is inspired by the concept of multiple instance learning, where instance-level labels are not required and which is more tolerant to noisy labeling. I have proposed to generate data bags consisting of instances from each patient and also proposed two modules to ensure permutation invariance, i.e., an attention module and a pooling module. I have compared the performance of the network in different cases, i.e., with and without permutation-invariant modules, with and without data augmentation, single-instance-based and multiple-instance-based learning and have shown that neural networks equipped with the proposed attention or pooling modules can outperform human experts.
In the human brain, the incoming light to the retina is transformed into meaningful representations that allow us to interact with the world. In a similar vein, the RGB pixel values are transformed by a deep neural network (DNN) into meaningful representations relevant to solving a computer vision task it was trained for. Therefore, in my research, I aim to reveal insights into the visual representations in the human visual cortex and DNNs solving vision tasks.
In the previous decade, DNNs have emerged as the state-of-the-art models for predicting neural responses in the human and monkey visual cortex. Research has shown that training on a task related to a brain region’s function leads to better predictivity than a randomly initialized network. Based on this observation, we proposed that we can use DNNs trained on different computer vision tasks to identify functional mapping of the human visual cortex.
To validate our proposed idea, we first investigate a brain region occipital place area (OPA) using DNNs trained on scene parsing task and scene classification task. From the previous investigations about OPA’s functions, we knew that it encodes navigational affordances that require spatial information about the scene. Therefore, we hypothesized that OPA’s representation should be closer to a scene parsing model than a scene classification model as the scene parsing task explicitly requires spatial information about the scene. Our results showed that scene parsing models had representation closer to OPA than scene classification models thus validating our approach.
We then selected multiple DNNs performing a wide range of computer vision tasks ranging from low-level tasks such as edge detection, 3D tasks such as surface normals, and semantic tasks such as semantic segmentation. We compared the representations of these DNNs with all the regions in the visual cortex, thus revealing the functional representations of different regions of the visual cortex. Our results highly converged with previous investigations of these brain regions validating the feasibility of the proposed approach in finding functional representations of the human brain. Our results also provided new insights into underinvestigated brain regions that can serve as starting hypotheses and promote further investigation into those brain regions.
We applied the same approach to find representational insights about the DNNs. A DNN usually consists of multiple layers with each layer performing a computation leading to the final layer that performs prediction for a given task. Training on different tasks could lead to very different representations. Therefore, we first investigate at which stage does the representation in DNNs trained on different tasks starts to differ. We further investigate if the DNNs trained on similar tasks lead to similar representations and on dissimilar tasks lead to more dissimilar representations. We selected the same set of DNNs used in the previous work that were trained on the Taskonomy dataset on a diverse range of 2D, 3D and semantic tasks. Then, given a DNN trained on a particular task, we compared the representation of multiple layers to corresponding layers in other DNNs. From this analysis, we aimed to reveal where in the network architecture task-specific representation is prominent. We found that task specificity increases as we go deeper into the DNN architecture and similar tasks start to cluster in groups. We found that the grouping we found using representational similarity was highly correlated with grouping based on transfer learning thus creating an interesting application of the approach to model selection in transfer learning.
During previous works, several new measures were introduced to compare DNN representations. So, we identified the commonalities in different measures and unified different measures into a single framework referred to as duality diagram similarity. This work opens up new possibilities for similarity measures to understand DNN representations. While demonstrating a much higher correlation with transfer learning than previous state-of-the-art measures we extend it to understanding layer-wise representations of models trained on the Imagenet and Places dataset using different tasks and demonstrate its applicability to layer selection for transfer learning.
In all the previous works, we used the task-specific DNN representations to understand the representations in the human visual cortex and other DNNs. We were able to interpret our findings in terms of computer vision tasks such as edge detection, semantic segmentation, depth estimation, etc. however we were not able to map the representations to human interpretable concepts. Therefore in our most recent work, we developed a new method that associates individual artificial neurons with human interpretable concepts.
Overall, the works in this thesis revealed new insights into the representation of the visual cortex and DNNs...
Recent advances in artificial neural networks enabled the quick development of new learning algorithms, which, among other things, pave the way to novel robotic applications. Traditionally, robots are programmed by human experts so as to accomplish pre-defined tasks. Such robots must operate in a controlled environment to guarantee repeatability, are designed to solve one unique task and require costly hours of development. In developmental robotics, researchers try to artificially imitate the way living beings acquire their behavior by learning. Learning algorithms are key to conceive versatile and robust robots that can adapt to their environment and solve multiple tasks efficiently. In particular, Reinforcement Learning (RL) studies the acquisition of skills through teaching via rewards. In this thesis, we will introduce RL and present recent advances in RL applied to robotics. We will review Intrinsically Motivated (IM) learning, a special form of RL, and we will apply in particular the Active Efficient Coding (AEC) principle to the learning of active vision. We also propose an overview of Hierarchical Reinforcement Learning (HRL), an other special form of RL, and apply its principle to a robotic manipulation task.
When performing transfer learning in Computer Vision, normally a pretrained model (source model) that is trained on a specific task and a large dataset like ImageNet is used. The learned representation of that source model is then used to perform a transfer to a target task. Performing transfer learning in this way had a great impact on Computer Vision, because it worked seamlessly, especially on tasks that are related to each other. Current research topics have investigated the relationship between different tasks and their impact on transfer learning by developing similarity methods. These similarity methods have in common, to do transfer learning without actually doing transfer learning in the first place but rather by predicting transfer learning rankings so that the best possible source model can be selected from a range of different source models. However, these methods have focused only on singlesource transfers and have not paid attention to multi-source transfers. Multi-source transfers promise even better results than single-source transfers as they combine information from multiple source tasks, all of which are useful to the target task. We fill this gap and propose a many-to-one task similarity method called MOTS that predicts both, single-source transfers and multi-source transfers to a specific target task. We do that by using linear regression and the source representations of the source models to predict the target representation. We show that we achieve at least results on par with related state-of-the-art methods when only focusing on singlesource transfers using the Pascal VOC and Taskonomy benchmark. We show that we even outperform all of them when using single and multi-source transfers together (0.9 vs. 0.8) on the Taskonomy benchmark. We additionally investigate the performance of MOTS in conjunction with a multi-task learning architecture. The task-decoder heads of a multi-task learning architecture are used in different variations to do multi-source transfers since it promises efficiency over multiple singletask architectures and incurs less computational cost. Results show that our proposed method accurately predicts transfer learning rankings on the NYUD dataset and even shows the best transfer learning results always being achieved when using more than one source task. Additionally, it is further examined that even just using one task-decoder head from the multi-task learning architecture promises better transfer learning results, than using a single-task architecture for the same task, which is due to the shared information from different tasks in the multi-task learning architecture in previous layers. Since the MOTS rankings for selecting the MTI-Net task-decoder head with the highest transfer learning performance were very accurate for the NYUD but not satisfying for the Pascal VOC dataset, further experiments need to varify the generalizability of MOTS rankings for the selection of the optimal task-decoder head from a multi-task architecture.
In the recent past, we are making huge progress in the field of Artificial Intelligence. Since the rise of neural networks, astonishing new frontiers are continuously being discovered. The development is so fast that overall no major technical limits are in sight. Hence, digitization has expanded from the base of academia and industry to such an extent that it is prevalent in the politics, mass media and even popular arts. The DFG-funded project Specialized Information Service for Biodiversity Research and the BMBF-funded project Linked Open Tafsir can be placed exactly in that overall development. Both projects aim to build an intelligent, up-to-date, modern research infrastructure on biodiversity and theological studies for scholars researching in these respective fields of historical science. Starting from digitized German and Arabic historical literature containing so far unavailable valuable knowledge on biodiversity and theological studies, at its core, our dissertation targets to incorporate state-of-the-art Machine Learning methods for analyzing natural language texts of low-resource languages and enabling foundational Natural Language Processing tasks on them, such as Sentence Boundary Detection, Named Entity Recognition, and Topic Modeling. This ultimately leads to paving the way for new scientific discoveries in the historical disciplines of natural science and humanities. By enriching the landscape of historical low-resource languages with valuable annotation data, our work becomes part of the greater movement of digitizing the society, thus allowing people to focus on things which really matter in science and industry.
Statistical shape models learn to capture the most characteristic geometric variations of anatomical structures given samples from their population. Accordingly, shape models have become an essential tool for many medical applications and are used in, for example, shape generation, reconstruction, and classification tasks. However, established statistical shape models require precomputed dense correspondence between shapes, often lack robustness, and ignore the global surface topology. This thesis presents a novel neural flow-based shape model that does not require any precomputed correspondence. The proposed model relies on continuous flows of a neural ordinary differential equation to model shapes as deformations of a template. To increase the expressivity of the neural flow and disentangle global, low-frequency deformations from the generation of local, high- frequency details, we propose to apply a hierarchy of flows. We evaluate the performance of our model on two anatomical structures, liver, and distal femur. Our model outperforms state-of-the-art methods in providing an expressive and robust shape prior, as indicated by its generalization ability and specificity. More so, we demonstrate the effectiveness of our shape model on shape reconstruction tasks and find anatomically plausible solutions. Finally, we assess the quality of the emerging shape representation in an unsupervised setting and discriminate healthy from pathological shapes.
Deep learning with neural networks seems to have largely replaced traditional design of computer vision systems. Automated methods to learn a plethora of parameters are now used in favor of previously practiced selection of explicit mathematical operators for a specific task. The entailed promise is that practitioners no longer need to take care of every individual step, but rather focus on gathering big amounts of data for neural network training. As a consequence, both a shift in mindset towards a focus on big datasets, as well as a wave of conceivable applications based exclusively on deep learning can be observed.
This PhD dissertation aims to uncover some of the only implicitly mentioned or overlooked deep learning aspects, highlight unmentioned assumptions, and finally introduce methods to address respective immediate weaknesses. In the author’s humble opinion, these prevalent shortcomings can be tied to the fact that the involved steps in the machine learning workflow are frequently decoupled. Success is predominantly measured based on accuracy measures designed for evaluation with static benchmark test sets. Individual machine learning workflow components are assessed in isolation with respect to available data, choice of neural network architecture, and a particular learning algorithm, rather than viewing the machine learning system as a whole in context of a particular application. Correspondingly, in this dissertation, three key challenges have been identified: 1. Choice and flexibility of a neural network architecture. 2. Identification and rejection of unseen unknown data to avoid false predictions. 3. Continual learning without forgetting of already learned information. These latter challenges have already been crucial topics in older literature, alas, seem to require a renaissance in modern deep learning literature. Initially, it may appear that they pose independent research questions, however, the thesis posits that the aspects are intertwined and require a joint perspective in machine learning based systems. In summary, the essential question is thus how to pick a suitable neural network architecture for a specific task, how to recognize which data inputs belong to this context, which ones originate from potential other tasks, and ultimately how to continuously include such identified novel data in neural network training over time without overwriting existing knowledge.
Thus, the central emphasis of this dissertation is to build on top of existing deep learning strengths, yet also acknowledge mentioned weaknesses, in an effort to establish a deeper understanding of interdependencies and synergies towards the development of unified solution mechanisms. For this purpose, the main portion of the thesis is in cumulative form. The respective publications can be grouped according to the three challenges outlined above. Correspondingly, chapter 1 is focused on choice and extendability of neural network architectures, analyzed in context of popular image classification tasks. An algorithm to automatically determine neural network layer width is introduced and is first contrasted with static architectures found in the literature. The importance of neural architecture design is then further showcased on a real-world application of defect detection in concrete bridges. Chapter 2 is comprised of the complementary ensuing questions of how to identify unknown concepts and subsequently incorporate them into continual learning. A joint central mechanism to distinguish unseen concepts from what is known in classification tasks, while enabling consecutive training without forgetting or revisiting older classes, is proposed. Once more, the role of the chosen neural network architecture is quantitatively reassessed. Finally, chapter 3 culminates in an overarching view, where developed parts are connected. Here, an extensive survey further serves the purpose to embed the gained insights in the broader literature landscape and emphasizes the importance of a common frame of thought. The ultimately presented approach thus reflects the overall thesis’ contribution to advance neural network based machine learning towards a unified solution that ties together choice of neural architecture with the ability to learn continually and the capability to automatically separate known from unknown data.
AI-based computer vision systems play a crucial role in the environment perception for autonomous driving. Although the development of self-driving systems has been pursued for multiple decades, it is only recently that breakthroughs in Deep Neural Networks (DNNs) have led to their widespread application in perception pipelines, which are getting more and more sophisticated. However, with this rising trend comes the need for a systematic safety analysis to evaluate the DNN's behavior in difficult scenarios as well as to identify the various factors that cause misbehavior in such systems. This work aims to deliver a crucial contribution to the lacking literature on the systematic analysis of Performance Limiting Factors (PLFs) for DNNs by investigating the task of pedestrian detection in urban traffic from a monocular camera mounted on an autonomous vehicle. To investigate the common factors that lead to DNN misbehavior, six commonly used state-of-the-art object detection architectures and three detection tasks are studied using a new large-scale synthetic dataset and a smaller real-world dataset for pedestrian detection. The systematic analysis includes 17 factors from the literature and four novel factors that are introduced as part of this work. Each of the 21 factors is assessed based on its influence on the detection performance and whether it can be considered a Performance Limiting Factor (PLF). In order to support the evaluation of the detection performance, a novel and task-oriented Pedestrian Detection Safety Metric (PDSM) is introduced, which is specifically designed to aid in the identification of individual factors that contribute to DNN failure. This work further introduces a training approach for F1-Score maximization whose purpose is to ensure that the DNNs are assessed at their highest performance. Moreover, a new occlusion estimation model is introduced to replace the missing pedestrian occlusion annotations in the real-world dataset. Based on a qualitative analysis of the correlation graphs that visualize the correlation between the PLFs and the detection performance, this study identified 16 of the initial 21 factors as being PLFs for DNNs out of which the entropy, the occlusion ratio, the boundary edge strength, and the bounding box aspect ratio turned out to be most severely affecting the detection performance. The findings of this study highlight some of the most serious shortcomings of current DNNs and pave the way for future research to address these issues.