004 Data processing; Computer science
Neuropsychiatric disorders are complex, highly heritable, and incompletely understood. Their clinical and genetic heterogeneity poses a significant challenge to the identification of disorder-related biomarkers. Despite significant progress in unveiling the genetic basis of these disorders, the underlying causes and biological mechanisms remain obscure. With advances in array, sequencing, and big data technologies, a huge amount of data is generated from individuals across different platforms and in various data structures, but few bioinformatics tools can integrate this plethora of data. There is therefore a need for an integrative bioinformatics data analysis tool that combines biological and clinical data of different types to better understand the underlying genetics.
This thesis presents a bioinformatics pipeline that integrates data from different platforms to provide a thorough understanding of the genetic etiology of a quantitative or qualitative neuropsychiatric trait of interest. The thesis has two parts. The first describes the development and architecture of the bioinformatics pipeline named MApping the Genetics of neuropsychiatric traits to the molecular NETworks of the human brain (MAGNET). The second demonstrates the application and usefulness of MAGNET by analysing large Autism Spectrum Disorder (ASD) cohorts.
MAGNET is a freely available command-line tool hosted on GitHub (https://github.com/SheenYo/MAGNET). It is implemented within one framework and uses data integration approaches based on state-of-the-art algorithms and software to identify the genes and pathways genetically associated with a trait of interest. MAGNET has an edge over existing tools because it performs a comprehensive analysis and takes care of the data handling and parsing steps needed to communicate between the different application programming interfaces (APIs). Researchers therefore do not have to handle the intermediate data themselves to pass the output of one analysis to the next. Moreover, depending on the size of the dataset, users can deduce important information about their trait of interest within a few days. Besides providing insights into genetic associations, one of the central features is the mapping of the associated genes onto the developing human brain using transcriptome data from 16 different brain regions, covering the 5th post-conceptional week to over 40 years of age.
In the second part, as a proof of concept, we applied MAGNET to two ASD cohorts. ASD is a group of psychiatric disorders. Clinically, ASD is characterized by the following psychopathology: A) limitations in social interaction and communication, and B) restricted, repetitive behavior. The etiology of this disorder is extremely complex due to its heterogeneous clinical traits and genetics, and to date no reliable biomarkers have been identified. Here, the aim is to characterize the genetic architecture of ASD taking into account the two aforementioned ASD diagnostic domains, and to investigate whether these domains are genetically linked or independent of each other. Moreover, we addressed the question of whether these traits share genetic risk with the categorical diagnosis of ASD and how much of the phenotypic variance of these traits can be explained by the underlying genetics.
We included affected individuals from two ASD cohorts: the Autism Genome Project (AGP), consisting of 2,735 families, and a German cohort consisting of 705 families. MAGNET was applied to each of the ASD subdomains as a quantitative dependent variable. MAGNET is divided into five main sections: (1) quality check of the genotype data, (2) imputation of missing genotype data, (3) association analysis of genotype and trait data, (4) gene-based analysis, and (5) enrichment analysis using gene expression data from the human brain.
MAGNET was applied to each of the individual traits in each cohort to perform quality control of the genetic data and to impute the missing data in an automated fashion. MAGNET identified 292 known and new ASD risk genes. These genes were subsequently assigned to biological signaling pathways and gene ontologies via MAGNET. The underlying biological mechanisms converged with respect to neuronal transmission and developmental processes. By reconciling these genes with the transcriptome of the developing human brain, MAGNET was able to identify that the significant genes associated with the subdomains are expressed at specific time points in brain areas such as the hippocampus, amygdala, and cortical regions. Further, we found that ASD subdomains related to domain A but not to domain B have a shared genetic etiology.
Poster presentation from the Nineteenth Annual Computational Neuroscience Meeting: CNS*2010, San Antonio, TX, USA, 24-30 July 2010.
The Line-Source method [1] provides a very powerful and accurate approach to modeling extracellular potentials. In this method, transmembrane fluxes are understood as sources of potential distributions that obey the Poisson equation with zero boundary conditions at infinity. Its solutions reveal that the waveforms are proportional to local transmembrane net currents. The extracellular potentials are comparably small in amplitude, and with the aid of their second spatial derivatives it is possible to interpret them as additional fluxes to be included in the cable equation, where they affect the membrane potential of surrounding cells [2]. On this basis, ephaptic interactions have been studied and have been considered to play a minor role in network activity. This modeling study provides a new approach based on the first principle of the conservation of charge, which leads to a generalized form of the cable equation that takes into account the full three-dimensional detail of the cell's geometry and the presence of the extracellular potential. Instead of coupling the compartment model and the model for extracellular potentials by means of the transmembrane currents, a non-linear system of partial differential equations is solved. Because the abstraction of dividing the cell's geometry into compartments is no longer needed, it is possible to examine the contribution of the precise cell geometry to signal processing without neglecting the impact that could result from the extracellular potential. Simulations of propagating action potentials on ramified geometries will be shown, as well as the resulting distributions of extracellular action potentials.
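For orientation, the relationship between transmembrane currents and the extracellular potential mentioned in this abstract can be written out in its simplest form. The block below is a sketch under assumptions that are not stated in the abstract: a homogeneous, isotropic extracellular medium with scalar conductivity \sigma, and point sources I_k at positions \mathbf{r}_k (the Line-Source method integrates this kernel along each neurite segment instead). The symbols \Phi, \sigma, I_k and \mathbf{r}_k are introduced here for illustration only.

```latex
% Poisson equation for the extracellular potential \Phi driven by
% transmembrane currents I_k (assumed point sources, homogeneous \sigma)
\sigma \, \nabla^2 \Phi(\mathbf{r}) = - \sum_k I_k \, \delta(\mathbf{r} - \mathbf{r}_k),
\qquad \Phi(\mathbf{r}) \to 0 \quad \text{as} \quad |\mathbf{r}| \to \infty

% Solution with zero boundary condition at infinity: the potential is
% linear in the transmembrane currents
\Phi(\mathbf{r}) = \frac{1}{4 \pi \sigma} \sum_k \frac{I_k}{|\mathbf{r} - \mathbf{r}_k|}
```

The linearity of the solution in I_k is what makes the extracellular waveforms proportional to the local transmembrane net currents.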
Human lymph nodes play a central role in immune defense against infectious agents and tumor cells. Lymphoid follicles are spherical compartments of the lymph node that are mainly filled with B cells, which are cellular components of the adaptive immune system. In the course of a specific immune response, lymphoid follicles pass through different morphological differentiation stages. The morphology and spatial distribution of lymphoid follicles can sometimes be associated with a particular causative agent and developmental stage of a disease. We report a new approach for the automatic detection of follicular regions in histological whole slide images of tissue sections immuno-stained with actin. The method is divided into two phases: (1) shock filter-based detection of transition points and (2) segmentation of follicular regions. Follicular regions in 10 whole slide images were manually annotated by visual inspection, and sample surveys were conducted by an expert pathologist. The results of our method were validated by comparison with the manual annotation. On average, we achieved a Zijdenbos similarity index of 0.71, with a standard deviation of 0.07.
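The Zijdenbos similarity index used for validation is the overlap measure also known as the Dice coefficient, 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of how such a score could be computed for a manual and an automatic annotation; the function name `dice_index` and the toy masks are hypothetical and not taken from the paper.

```python
def dice_index(mask_a, mask_b):
    """Zijdenbos/Dice similarity of two equally sized binary masks.

    Masks are nested lists of 0/1 values (1 = pixel inside a follicular region).
    """
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same shape")
    overlap = sum(a and b for a, b in zip(flat_a, flat_b))
    total = sum(flat_a) + sum(flat_b)
    # By convention, two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Hypothetical toy annotations: 4 pixels each, 3 of them shared.
manual = [[0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
automatic = [[0, 1, 1, 1],
             [0, 1, 0, 0],
             [0, 0, 0, 0]]
score = dice_index(manual, automatic)
print(score)  # 2 * 3 / (4 + 4) = 0.75
```

A score of 1 means the two annotations coincide exactly, and 0 means they do not overlap at all.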
The adaptive immune system protects humans against pathogens and cancer cells arising both outside and inside the body. This process relies on the interaction and cooperation of many different cell types and is located predominantly within the lymph nodes. If even a single component of this sensitive process is disturbed, the result can be a partial or complete loss of a person's immunological fitness. The aim of this thesis was therefore to comprehensively detect and define such aberrations of human lymph node tissue by means of digital pathology.
To this end, a digital tissue database was first established. It is based on the content management system Digital Tissue Management Suite, which was implemented as part of this work. In addition, the software Feature analysis in tissue histomorphometry was developed, which enables the analysis of two-dimensional whole slide images. It uses methods from computer vision and graph theory to characterize morphological and distributional properties of the cell types of the lymph node. The software also includes plug-ins for visualization and statistical analysis of the data.
Building on this purpose-built digital infrastructure, in combination with the software Imaris, reactive and neoplastic tissue samples scanned in two and three dimensions were digitally phenotyped. This revealed new mechanical barriers that compartmentalize the germinal centers. Furthermore, the preservation of the quantitative ratio of individual cell populations within the germinal centers was described. Starting from the reactive phenotypes of the lymph node, pathophysiological aberrations in various lymphoid neoplasms were investigated. It was shown that structural destruction in particular is frequently accompanied by a morphological change in the fibroblastic reticular cells.
In addition to structural changes, cytological changes of the tumor microenvironment are also observed. So-called tumor-associated macrophages play a special role here. This work showed that macrophages in the tumor microenvironment of diffuse large B-cell lymphoma and chronic lymphocytic leukemia in particular exhibit specific pathophysiological changes. It was also shown that genetic alterations of neoplastic B cells are accompanied by a general reduction in CD20 antigen density.
In summary, the results made it possible to generate a comprehensive digital-pathological profile of classical Hodgkin lymphoma. Morphological changes of neoplastic, CD30-positive Hodgkin-Reed-Sternberg cells were validated and described. Pathological changes of the connectome and the tumor microenvironment of these cells were also parameterized and quantified. Finally, the diagnostic power of digital-pathological profiles was evaluated and validated using a random forest classifier.
Network graphs have become a popular tool to represent complex systems composed of many interacting subunits; especially in neuroscience, network graphs are increasingly used to represent and analyze functional interactions between multiple neural sources. Interactions are often reconstructed using pairwise bivariate analyses, overlooking the multivariate nature of interactions: investigating the effect of one source on a target requires taking all other sources into account as potential nuisance variables, and combinations of sources may also act jointly on a given target. Bivariate analyses produce networks that may contain spurious interactions, which reduce the interpretability of the network and its graph metrics. A truly multivariate reconstruction, however, is computationally intractable because of the combinatorial explosion in the number of potential interactions. Thus, we have to resort to approximative methods to handle the intractability of multivariate interaction reconstruction, and thereby enable the use of networks in neuroscience. Here, we suggest such an approximative approach in the form of an algorithm that extends fast bivariate interaction reconstruction by identifying potentially spurious interactions post hoc: the algorithm uses interaction delays reconstructed for directed bivariate interactions to tag potentially spurious edges on the basis of their timing signatures in the context of the surrounding network. Such tagged interactions may then be pruned, which produces a statistically conservative network approximation that is guaranteed to contain non-spurious interactions only. We describe the algorithm and present a reference implementation in MATLAB to test the algorithm's performance on simulated networks as well as networks derived from magnetoencephalographic data. We discuss the algorithm in relation to other approximative multivariate methods and highlight suitable application scenarios.
Our approach is a tractable and data-efficient way of reconstructing approximative networks of multivariate interactions. It is preferable if available data are limited or if fully multivariate approaches are computationally infeasible.
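To make the idea of tagging edges by their timing signature concrete, here is a deliberately simplified sketch. It is not the reference MATLAB implementation mentioned in the abstract. It assumes one specific signature only, namely that a direct edge is suspicious when a two-step alternative route exists whose summed delays match the direct delay (a cascade effect). The function name `tag_cascade_edges`, the `tolerance` parameter, and the example delays are all hypothetical.

```python
def tag_cascade_edges(delays, tolerance=0):
    """Tag directed edges whose delay is explained by a two-step alternative path.

    delays: dict mapping (source, target) -> reconstructed interaction delay.
    Returns the set of edges (source, target) tagged as potentially spurious.
    """
    nodes = {node for edge in delays for node in edge}
    tagged = set()
    for (src, tgt), direct_delay in delays.items():
        for mid in nodes - {src, tgt}:
            # Alternative route src -> mid -> tgt must exist in the network.
            if (src, mid) in delays and (mid, tgt) in delays:
                summed = delays[(src, mid)] + delays[(mid, tgt)]
                if abs(summed - direct_delay) <= tolerance:
                    tagged.add((src, tgt))
    return tagged


# Hypothetical bivariate network with reconstructed delays (arbitrary units).
delays = {
    ("A", "B"): 3,
    ("B", "C"): 4,
    ("A", "C"): 7,   # matches A -> B -> C (3 + 4), so it gets tagged
    ("A", "D"): 5,
    ("D", "C"): 9,   # A -> D -> C sums to 14, which does not match 7
}
tagged = tag_cascade_edges(delays)
pruned = {edge: d for edge, d in delays.items() if edge not in tagged}
print(sorted(tagged))  # [('A', 'C')]
```

Pruning the tagged edges yields the more conservative network; keeping them tagged but present leaves the decision to the analyst.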
Measuring information processing in neural data: The application of transfer entropy in neuroscience
(2017)
It is a common notion in neuroscience research that the brain and neural systems in general "perform computations" to generate their complex, everyday behavior (Schnitzer, 2002). Understanding these computations is thus an important step in understanding neural systems as a whole (Carandini, 2012; Clark, 2013; Schnitzer, 2002; de-Wit, 2016). It has been proposed that one way to analyze these computations is by quantifying basic information processing operations necessary for computation, namely the transfer, storage, and modification of information (Langton, 1990; Mitchell, 2011; Mitchell, 1993; Wibral, 2015). A framework for the analysis of these operations has been emerging (Lizier, 2010), using measures from information theory (Shannon, 1948) to analyze computation in arbitrary information processing systems (e.g., Lizier, 2012b). Of these measures, transfer entropy (TE) (Schreiber, 2000), a measure of information transfer, is the most widely used in neuroscience today (e.g., Vicente, 2011; Wibral, 2011; Gourevitch, 2007; Vakorin, 2010; Besserve, 2010; Lizier, 2011; Richter, 2016; Huang, 2015; Rivolta, 2015; Roux, 2013). Yet, despite this popularity, open theoretical and practical problems in the application of TE remain (e.g., Vicente, 2011; Wibral, 2014a). The present work addresses some of the most prominent of these methodological problems in three studies.
The first study presents an efficient implementation for the estimation of TE from non-stationary data. The statistical properties of non-stationary data are not invariant over time, so TE cannot easily be estimated from observations collected over time. Instead, the necessary observations can be collected over an ensemble of data, i.e., observations of physical or temporal replications of the same process (Gomez-Herrero, 2010). This ensemble approach is computationally more demanding than estimation from observations over time. The present study demonstrates how to handle this increased computational demand by presenting a highly parallel implementation of the estimator using graphics processing units.
The second study addresses the problem of estimating bivariate TE from multivariate data. Neuroscience research often investigates interactions between more than two (sub-)systems. It is common to analyze these interactions by iteratively estimating TE between pairs of variables, because a fully multivariate approach to TE estimation is computationally intractable (Lizier, 2012a; Das, 2008; Welch, 1982). Yet, the estimation of bivariate TE from multivariate data may yield spurious, false-positive results (Lizier, 2012a; Kaminski, 2001; Blinowska, 2004). The present study proposes that such spurious links can be identified by characteristic coupling motifs and the timings of their information transfer delays in networks of bivariate TE estimates. The study presents a graph algorithm that detects these coupling motifs and marks potentially spurious links. The algorithm thus partially corrects for spurious results due to multivariate effects and yields a more conservative approximation of the true network of multivariate information transfer.
The third study investigates the TE between pre-frontal and primary visual cortical areas of two ferrets under different levels of anesthesia. Additionally, the study investigates local information processing in the source and target of the TE by estimating information storage (Lizier, 2012) and signal entropy. Results of this study indicate an alternative explanation for the commonly observed reduction in TE under anesthesia (Imas, 2005; Ku, 2011; Lee, 2013; Jordan, 2013; Untergehrer, 2014), which is often explained by changes in the underlying coupling between areas. Instead, the present study proposes that reduced TE may be due to a reduction in information generation, measured by signal entropy, in the source of TE. The study thus demonstrates how interpreting changes in TE as evidence for changes in causal coupling may lead to erroneous conclusions. The study further discusses current best practice in the estimation of TE, namely the use of state-of-the-art estimators over approximative methods and the use of optimization procedures for estimation parameters over ad hoc choices. It is demonstrated how not following this best practice may lead to over- or under-estimation of TE or failure to detect TE altogether.
In summary, the present work proposes an implementation for the efficient estimation of TE from non-stationary data, it presents a correction for spurious effects in bivariate TE-estimation from multivariate data, and it presents current best-practice in the estimation and interpretation of TE. Taken together, the work presents solutions to some of the most pressing problems of the estimation of TE in neuroscience, improving the robust estimation of TE as a measure of information transfer in neural systems.
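As a concrete illustration of the quantity discussed in this abstract, the sketch below computes transfer entropy for two discrete time series with a naive plug-in (frequency-counting) estimator and a history length of one sample. This is an assumption made for brevity; it is not the estimator or the GPU implementation developed in the thesis, and it is only suitable for long, stationary, discrete-valued data. The function name `transfer_entropy` and the toy copy process are hypothetical.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in transfer entropy (bits) from source to target, history length 1.

    TE = sum p(y_next, y_past, x_past)
             * log2( p(y_next | y_past, x_past) / p(y_next | y_past) )
    """
    # Joint counts of (next target value, past target value, past source value).
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    n = sum(triples.values())
    next_and_past = Counter()   # counts of (y_next, y_past)
    past_and_src = Counter()    # counts of (y_past, x_past)
    past_only = Counter()       # counts of y_past
    for (y_next, y_past, x_past), count in triples.items():
        next_and_past[(y_next, y_past)] += count
        past_and_src[(y_past, x_past)] += count
        past_only[y_past] += count
    te = 0.0
    for (y_next, y_past, x_past), count in triples.items():
        p_with_source = count / past_and_src[(y_past, x_past)]
        p_without_source = next_and_past[(y_next, y_past)] / past_only[y_past]
        te += (count / n) * math.log2(p_with_source / p_without_source)
    return te


# Toy copy process: y copies x with a delay of one sample.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(5000)]
y = [0] + x[:-1]
te_xy = transfer_entropy(x, y)   # close to 1 bit: x fully determines the next y
te_yx = transfer_entropy(y, x)   # close to 0 bits: y carries nothing new about x
```

In this toy case the asymmetry between the two directions reflects the direction of the copy operation. As the abstract cautions, such asymmetries in real data should not be read directly as evidence of causal coupling.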
Chatbots are a promising technology with the potential to enhance workplaces and everyday life. In terms of scalability and accessibility, they also offer unique possibilities as communication and information tools for digital learning. In this paper, we present a systematic literature review investigating the areas of education where chatbots have already been applied, explore the pedagogical roles of chatbots, the use of chatbots for mentoring purposes, and their potential to personalize education. We conducted a preliminary analysis of 2,678 publications to perform this literature review, which allowed us to identify 74 relevant publications for chatbots’ application in education. Through this, we address five research questions that, together, allow us to explore the current state-of-the-art of this educational technology. We conclude our systematic review by pointing to three main research challenges: 1) Aligning chatbot evaluations with implementation objectives, 2) Exploring the potential of chatbots for mentoring students, and 3) Exploring and leveraging adaptation capabilities of chatbots. For all three challenges, we discuss opportunities for future research.
Investigations of evolutionary algorithms for training neural networks in speech processing
(1997)
This diploma thesis investigated the performance of evolutionary algorithms (EA) for training recurrent neural networks (RNN) and compared them with gradient-based training algorithms. The particular aim was to assess their applicability to speech processing, specifically speech recognition. First, the basic performance of EA was examined on a prediction problem by training a multilayer perceptron (MLP) with different evolutionary algorithms. Several variants of genetic algorithms (GA) and evolution strategies (ES) were tested on this example and compared with one another. The experiments with GA showed that a minimum quantization accuracy is required to solve the problem and that the accuracy of the approximation improves as the quantization error decreases. Treating this problem with coarsely quantized weights is therefore disadvantageous. ES, by contrast, benefits in both approximation accuracy and convergence speed from representing the object variables directly as real numbers. For ES, the accuracy of a solution was also found to depend on the population size, since a larger population samples the parameter space more thoroughly. Compared with ES, GA needed longer convergence times and, because of encoding and decoding, incurred a higher computational cost, so the experiments on RNN were carried out with ES only. First, the latching problem, a classification task with temporal dependencies and narrowly limited complexity, was studied. Very little information was provided in this example, because the error was computed only at the end of a pattern sequence. Even for this very simple task, the gradient-based methods could not find a solution once a maximum pattern sequence length T was exceeded.
In contrast, ES was able to solve the problem for all measured values of the parameter T. Only when additional information was supplied to the gradient method during training through error injection did the backpropagation through time (BPTT) algorithm achieve the same performance. As a further experiment with temporal dependencies, the automaton problem was studied, which was to be solved by an RNN. For this problem, particular emphasis was placed on the convergence behavior under changes to the ES parameters. The experiments showed that the individual parameters interact in complex ways and that only a good mutual tuning of all parameters yields satisfactory convergence speed and classification results. As in the latching problem, the error was computed only at the end of a pattern sequence. As a consequence, the BPTT algorithm is no longer able to represent the temporal dependencies in the gradient at sequence lengths as short as T = 27. With ES, by contrast, RNN could be trained that correctly classify sequence lengths up to T = 41. The experiments confirm that the limiting factor is primarily the training algorithm and not the network paradigm. The simulation experiments with time-normalized speech data show that ES can in principle achieve higher recognition rates than the gradient-based BPTT algorithm. However, even for the classification of the number words "Zwei" and "Drei" (two and three), the classification performance decreases with increasing sequence length. A drastic increase in population size is required to achieve at least equally good results. Additional tests on the automaton problem support this finding.
However, the computational cost rises so sharply with increasing population size that, for speech data without time normalization, ES with an adequate population size could no longer be simulated. In the experiments on the six-word vocabulary, the error was computed for every input feature vector and used in the gradient or, for ES, in the fitness evaluation during training. In these measurements both algorithms produce nearly identical classification results. Overall, the drastically increasing computational cost for speech data prevents ES from processing larger vocabularies and long words. Limiting the population size to the available computing capacity resulted in a non-optimal adaptation of selection pressure, mutation rate, and population distribution in the search space. The global adaptation of the strategy parameters proves particularly problematic for the enlarged populations. Further studies of ES with strategies for self-adaptation of these parameters are therefore a natural direction for future research.
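The following sketch illustrates the general idea of training real-valued weights with an evolution strategy, as opposed to the quantized bit-string encoding of a genetic algorithm. It is not the thesis's implementation: it assumes a (mu + lambda) selection scheme, a fixed global mutation step size `sigma`, and a toy task of fitting a single linear neuron. All names (`evolve`, `mse`, `mu`, `lam`, `sigma`) and all parameter values are hypothetical.

```python
import random


def evolve(loss, dim, mu=5, lam=20, sigma=0.3, generations=100, seed=0):
    """Minimise `loss` over real-valued weight vectors with a (mu + lambda) ES."""
    rng = random.Random(seed)
    # Object variables are stored directly as real numbers, with no encoding step.
    parents = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(mu)]
    history = [min(loss(p) for p in parents)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            parent = rng.choice(parents)
            # Gaussian mutation with a fixed, global step size (a simplification).
            offspring.append([w + rng.gauss(0.0, sigma) for w in parent])
        # Plus selection: the best mu of parents and offspring survive.
        pool = sorted(parents + offspring, key=loss)
        parents = pool[:mu]
        history.append(loss(parents[0]))
    return parents[0], history


# Toy task: fit a single linear neuron y = w0 * x + w1 to samples of y = 2x - 1.
xs = [i / 10.0 for i in range(-10, 11)]
ys = [2.0 * x - 1.0 for x in xs]


def mse(weights):
    return sum((weights[0] * x + weights[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


best, history = evolve(mse, dim=2)
print(best, history[-1])
```

Each generation costs `lam` loss evaluations, so the computational effort grows linearly with the population size. This is the cost that the abstract identifies as the limiting factor for larger vocabularies.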
The dissertation deals with the general problem of how the brain can establish correspondences between neural patterns stored in different cortical areas. Although correspondence finding is an important capability in many cognitive areas, such as language understanding, abstract reasoning, and motor control, this thesis concentrates on invariant object recognition as its application. One part of the work presents a correspondence-based, neurally plausible system for face recognition. Other parts address the question of visual information routing over several stages by proposing optimal architectures for such routing ('switchyards') and deriving ontogenetic mechanisms for the growth of switchyards. Finally, the idea of multi-stage routing is united with the object recognition system introduced before, suggesting how the so far distinct feature-based and correspondence-based approaches to object recognition could be reconciled.