Machine learning for healthcare with a focus on the early diagnosis of epilepsy and brain tumor detection

  • Machine learning (ML) techniques have evolved rapidly in recent years and have shown impressive capabilities in feature extraction, pattern recognition, and causal inference. There has been increasing attention to applying ML to medical problems such as diagnosis, drug discovery, personalized medicine, and numerous others. ML-based methods can process vast amounts of data. Given the ever-increasing amount of medical data being collected and the large inter-subject variability in these data, automated processing pipelines are highly desirable, since relying solely on human processing is laborious, expensive, and error-prone. ML methods have the potential to uncover interesting patterns, unravel correlations between complex features, learn patient-specific representations, and make accurate predictions. Motivated by these promising aspects, in this thesis I present studies in which I implemented deep neural networks for two tasks: the early diagnosis of epilepsy based on electroencephalography (EEG) data, and brain tumor detection based on magnetic resonance spectroscopy (MRS) data. The epilepsy project deals with one of the most common neurological disorders, which is characterized by recurrent unprovoked seizures. Epilepsy can be triggered by a variety of initial brain injuries and manifests itself after a time window called the latent period. During this period, a cascade of structural and functional brain alterations takes place, leading to increased seizure susceptibility. The development and extension of brain tissue capable of generating spontaneous seizures is defined as epileptogenesis (EPG). Detecting the presence of EPG provides a valuable opportunity for targeted early medical interventions, which could slow down or even halt disease progression.
In order to study brain signals in this latent window, animal epilepsy models are used, since it is extremely difficult to obtain such data from human patients. The aim of this study is to discover biomarkers of EPG using animal models and then to find their counterparts in human patients' data. However, the EEG features of EPG are not well understood, and there is not a sufficiently large amount of annotated data for ML-based algorithms. To approach this problem, I first utilized the timestamp information of EEG recorded from an animal epilepsy model in which epilepsy is induced by electrical stimulation. The timestamp serves as a form of weak supervision: each EEG segment is labeled according to whether it was recorded before or after the stimulation. Second, I implemented a deep residual neural network and trained it on a binary classification task to distinguish the EEG signals from these two phases. After obtaining high discriminative ability on the binary task, I further divided the time span after the stimulation to set up a three-class classification, aiming to detect possible stages in the progression of the latent EPG phase. I have shown that the model can distinguish EEG signals at different stages of EPG with high accuracy and generalization ability. I have also demonstrated that some of the features learned by the network are clinically relevant. For detecting brain tumors based on MRS data, I first proposed applying a deep neural network to MRS data collected from over 400 patients for a binary classification task. To combat the challenge of noisy labels, I developed a distillation step that selects relatively "cleanly" labeled samples. A mixing-based data augmentation method was also implemented to expand the size of the training set. All experiments were conducted with a leave-patient-out scheme to assess the generalization ability of the model.
Averaged across all leave-patient-out cross-validation sets, the proposed method performed on par with human neuroradiologists while outperforming other baseline methods. I have demonstrated the distillation effect on the MNIST data set with manually introduced label noise. I have also visualized the influence of the inputs on the final classification using a class activation map method. Moreover, I have proposed aggregating information at the subject level, so that a decision can draw on multiple instances from the same patient. This is inspired by multiple instance learning, which does not require instance-level labels and is more tolerant of noisy labels. I have proposed generating data bags consisting of instances from each patient. I have also proposed two modules that ensure permutation invariance, namely an attention module and a pooling module. I have compared the performance of the network under different settings: with and without the permutation-invariant modules, with and without data augmentation, and single-instance versus multiple-instance learning. These comparisons show that neural networks equipped with the proposed attention or pooling modules can outperform human experts.
  • Machine learning (ML) methods have been very successful in recent years and have shown their great potential in many research areas. Examples include learning to play games [2, 3, 4], generating high-quality images [5, 6], style transfer [7], speech recognition and synthesis [8, 9], and natural language processing [10, 11]. Machine learning benefits greatly from ever-increasing computing power, from the availability of large and specialized datasets, and from deeper theoretical insights into many learning algorithms. In recent years, there have been numerous research efforts on applying ML methods in healthcare. There is impressive work on diverse medical problems, for example classification of cardiovascular diseases [12], skin cancer detection [13], lung cancer diagnosis [14], automatic prediction of diseases [15], and COVID-19 diagnosis and treatment [16]. In this dissertation, we address the problem of applying ML methods in the context of epilepsy and brain tumor detection. In the first project, we try to understand disease progression in the latent epileptogenesis phase, that is, after the brain injury but before the first spontaneous epileptic seizure. We also try to predict early whether a given individual has a high risk of developing epilepsy. In the second project, we aim to develop a pre-screening tool that can detect brain tumors based on magnetic resonance spectroscopy (MRS) data. In the following, we summarize our work on these two topics...
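The timestamp-based weak labeling used in the epilepsy project can be sketched as follows. It assumes each EEG segment carries a start time and that a single stimulation time is known per animal. The abstract does not state how the post-stimulation span was split for the three-class task. This sketch takes one plausible reading: a baseline class plus two post-stimulation stages separated by a hypothetical `stage_boundaries` cut-off. The function name and its parameters are illustrative, not the thesis code.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def weak_labels(segment_times: Sequence[float], stim_time: float,
                stage_boundaries: Optional[Sequence[float]] = None) -> List[int]:
    """Derive weak labels for EEG segments from their timestamps alone.

    Label 0 marks segments recorded before the stimulation. Without
    stage_boundaries every later segment gets label 1 (binary task). With
    stage_boundaries (hypothetical cut-off times after stim_time), later
    segments get 1, 2, ... depending on how many boundaries they have passed.
    """
    boundaries = sorted(stage_boundaries or [])
    labels: List[int] = []
    for t in segment_times:
        if t < stim_time:
            labels.append(0)  # baseline, recorded before stimulation
        else:
            # bisect_right counts boundaries <= t, i.e. stages already entered
            labels.append(1 + bisect_right(boundaries, t))
    return labels


# Binary task: before vs. after a stimulation at t = 10 (arbitrary time units).
print(weak_labels([0, 5, 12, 30], stim_time=10))                        # [0, 0, 1, 1]
# Three-class reading: baseline, early post-stimulation, late post-stimulation.
print(weak_labels([0, 5, 12, 30], stim_time=10, stage_boundaries=[20]))  # [0, 0, 1, 2]
```

No manual annotation enters this step. The recording time alone determines the label, which is what makes the supervision "weak."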
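The abstract does not specify the criterion used by the distillation step that keeps relatively cleanly labeled MRS samples. The following is only a sketch of one common confidence-based filter, offered as an assumption rather than the thesis method. A model scores every sample, ideally via out-of-fold predictions. A sample is kept only if the predicted probability of its annotated label reaches a hypothetical `threshold`.

```python
from typing import List, Sequence


def select_clean(probs: Sequence[Sequence[float]], labels: Sequence[int],
                 threshold: float = 0.7) -> List[int]:
    """Return indices of samples whose annotated label the model supports confidently.

    probs[i] holds predicted class probabilities for sample i, ideally from
    out-of-fold predictions so that a sample is never scored by a model
    trained on it. The threshold criterion is a hypothetical choice.
    """
    return [i for i, (p, y) in enumerate(zip(probs, labels)) if p[y] >= threshold]


probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.55, 0.45]]
labels = [0, 0, 1, 1]
print(select_clean(probs, labels))  # [0, 2]
```

Samples 1 and 3 are dropped because the model disagrees with, or is unsure about, their annotated labels. A second model would then be trained on the retained subset.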
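The abstract mentions a "mixing-based" data augmentation but does not give its details. A mixup-style interpolation is one standard instance of this idea, shown here as an assumption. Two spectra and their one-hot labels are blended with a weight drawn from a Beta(alpha, alpha) distribution. The `mixup` function and its `alpha` default are illustrative choices.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mixup(x1: Sequence[float], y1: Sequence[float],
          x2: Sequence[float], y2: Sequence[float],
          alpha: float = 0.2,
          rng: Optional[random.Random] = None) -> Tuple[List[float], List[float], float]:
    """Blend two spectra and their one-hot labels with a Beta(alpha, alpha) weight."""
    if rng is None:
        rng = random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]  # soft label
    return x, y, lam


rng = random.Random(0)
x, y, lam = mixup([0.0, 2.0, 4.0], [1.0, 0.0], [2.0, 2.0, 0.0], [0.0, 1.0], rng=rng)
print(lam, x, y)
```

With a small alpha, Beta(alpha, alpha) is U-shaped. Most mixed samples therefore stay close to one of the two originals, which keeps the augmented spectra plausible.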
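A leave-patient-out split guarantees that no patient contributes samples to both the training and the test set of the same fold. The abstract does not say how many patients were held out per fold. For simplicity, this sketch holds out one patient at a time. The generator name is hypothetical. scikit-learn's `LeaveOneGroupOut` and `GroupKFold` implement the same grouping idea.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def leave_patient_out(patient_ids: Sequence[Hashable]
                      ) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_patient, train_indices, test_indices) for each patient."""
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        groups[pid].append(idx)  # collect every sample of each patient
    for held_out, test_idx in groups.items():
        train_idx = sorted(i for pid, idxs in groups.items() if pid != held_out
                           for i in idxs)
        yield held_out, train_idx, list(test_idx)


for pid, train, test in leave_patient_out(["p1", "p1", "p2", "p3", "p2"]):
    print(pid, train, test)
# p1 [2, 3, 4] [0, 1]
# p2 [0, 1, 3] [2, 4]
# p3 [0, 1, 2, 4] [3]
```

Splitting by sample instead of by patient would let spectra from the same patient leak into both sets. The resulting scores would then overstate generalization to new patients.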
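For the class activation map visualization, this sketch assumes the original CAM formulation, which is not confirmed by the abstract. In that formulation, a 1-D network ends in global average pooling (GAP) followed by a linear layer. The thesis may have used a variant. The map for class c weights each channel k of the last convolutional layer by the classifier weight W[c][k]. The function names and the toy arrays are hypothetical.

```python
from typing import List, Sequence


def class_activation_map(feature_maps: Sequence[Sequence[float]],
                         class_weights: Sequence[Sequence[float]],
                         target_class: int) -> List[float]:
    """CAM_c(t) = sum_k W[c][k] * F[k][t] for a 1-D network ending in GAP + linear layer.

    feature_maps: K x T activations of the last convolutional layer for one input.
    class_weights: C x K weights of the final linear layer (bias omitted).
    """
    w = class_weights[target_class]
    length = len(feature_maps[0])
    return [sum(w[k] * feature_maps[k][t] for k in range(len(w))) for t in range(length)]


def gap_logits(feature_maps: Sequence[Sequence[float]],
               class_weights: Sequence[Sequence[float]]) -> List[float]:
    """Logits of the same network: global average pooling, then the linear layer."""
    gap = [sum(row) / len(row) for row in feature_maps]
    return [sum(wk * gk for wk, gk in zip(wc, gap)) for wc in class_weights]


F = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]  # K = 2 channels, T = 3 positions
W = [[1.0, 0.0], [0.5, -1.0]]           # C = 2 classes
cam = class_activation_map(F, W, target_class=1)
print(cam)  # [0.5, 0.0, 1.5]
```

Averaging the map over positions recovers the class logit without bias: here the mean of `cam` is 2/3, which equals `gap_logits(F, W)[1]`. Each position's value is therefore its contribution to the decision for that class.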
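The attention and pooling modules for multiple instance learning must produce the same bag representation regardless of instance order. The sketch below is not confirmed to be the thesis architecture. Its attention module follows the attention-based MIL pooling of Ilse et al. (2018): a_k = softmax_k(w^T tanh(V h_k)) and z = sum_k a_k h_k. Mean and max pooling are shown as simple pooling alternatives. In a real model, `V` and `w` would be learned; the values here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Bag = Sequence[Sequence[float]]  # one bag = instance embeddings from one patient


def _matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def attention_pool(bag: Bag, V: Sequence[Sequence[float]],
                   w: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Attention MIL pooling: weights a = softmax(w^T tanh(V h_k)), output sum_k a_k h_k."""
    scores = [sum(wi * math.tanh(u) for wi, u in zip(w, _matvec(V, h))) for h in bag]
    m = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(bag[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, bag)) for d in range(dim)]
    return pooled, weights


def mean_pool(bag: Bag) -> List[float]:
    return [sum(h[d] for h in bag) / len(bag) for d in range(len(bag[0]))]


def max_pool(bag: Bag) -> List[float]:
    return [max(h[d] for h in bag) for d in range(len(bag[0]))]


bag = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
V = [[0.3, -0.2], [0.1, 0.4]]  # hypothetical learned parameters
w = [1.0, -0.5]
z, a = attention_pool(bag, V, w)
z_perm, _ = attention_pool(bag[::-1], V, w)
print(all(abs(p - q) < 1e-12 for p, q in zip(z, z_perm)))  # True: order does not matter
```

All three operators reduce a bag of any size to one fixed-length vector through symmetric sums or maxima. That is why a patient-level classifier built on top of them does not depend on how the patient's instances are ordered. The attention weights additionally indicate which instances drove the decision.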

Author: Diyuan Lu
Place of publication: Frankfurt am Main
Referee: Jochen Triesch, Gemma Roig Noguera
Document Type: Doctoral Thesis
Date of Publication (online): 2022/04/28
Year of first Publication: 2022
Publishing Institution: Universitätsbibliothek Johann Christian Senckenberg
Granting Institution: Johann Wolfgang Goethe-Universität
Date of final exam: 2022/03/03
Release Date: 2022/05/09
Page Number: 149
Institutes: Informatik und Mathematik
Dewey Decimal Classification: 0 Computer science, information and general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
6 Technology, medicine, applied sciences / 61 Medicine and health / 610 Medicine and health
Licence (German): German copyright law (Deutsches Urheberrecht)