Institutes
Refine
Year of publication
Document Type
- Doctoral Thesis (91)
- Article (59)
- Bachelor Thesis (18)
- Book (16)
- Master's Thesis (10)
- Conference Proceeding (4)
- Contribution to a Periodical (4)
- Habilitation (2)
- Preprint (2)
- Diploma Thesis (1)
Has Fulltext
- yes (207)
Is part of the Bibliography
- no (207)
Keywords
- Machine Learning (5)
- NLP (5)
- ALICE (3)
- Annotation (3)
- Machine learning (3)
- Text2Scene (3)
- TextAnnotator (3)
- Virtual Reality (3)
- mathematics education (3)
- Artificial intelligence (2)
Institute
- Informatik und Mathematik (207) (remove)
Diese Bachelorarbeit befasst sich mit der Themenklassifikation von unstrukturiertem Text. Aufgrund der stetig steigenden Menge von textbasierten Daten werden automatisierte Klassifikationsmethoden in vielen Disziplinen benötigt und erforscht. Aufbauend auf dem text2ddc-Klassifikator, der am Text Technology Lab der Goethe-Universität Frankfurt am Main entwickelt wurde, werden die Auswirkungen der Vergrößerung des Trainingskorpus mittels unterschiedlicher Methoden untersucht. text2ddc nutzt die Dewey Decimal Classification (DDC) als Zielklassifikation und wird trainiert auf Artikeln der Wikipedia. Nach einer Einführung, in der Grundlagen beschrieben werden, wird das Klassifikationsmodell von text2ddc vorgestellt, sowie die Probleme und daraus resultierenden Aufgaben betrachtet. Danach wird die Aktualisierung der bisherigen Daten beschrieben, gefolgt von der Vorstellung der verschiedenen Methoden, das Trainingskorpus zu erweitern. Mit insgesamt elf Sprachen wird experimentiert. Die Evaluation zeigt abschließend die Verbesserungen der Qualität der Klassifikation mit text2ddc auf, diskutiert die problematischen Fälle und gibt Anregungen für weitere zukünftige Arbeiten.
Aufgrund der §§20, 44 Abs. 1 Nr. 1 des Hessischen Hochschulgesetzes in der Fassung vom 14. Dezember 2009 (GVBl. I, S. 666), zuletzt geändert durch Art. 2 des Gesetzes vom 18. Dezember 2017 (GVBl. I, S. 284), hat der Fachbereichsrat des Fachbereichs Informatik und Mathematik der Johann Wolfgang Goethe-Universität Frankfurt am Main am 25. Mai 2020 die folgende Ordnung für den Bachelorstudiengang Mathematik beschlossen. Diese Ordnung hat das Präsidium der Goethe-Universität gemäß §37 Abs. 5 Hessisches Hochschulgesetz am 30. Juni 2020 genehmigt. Sie wird hiermit bekannt gemacht.
Das Ziel dieser Arbeit ist die realitätsgetreue Entwicklung eines interaktiven 3D-Stadtmodells, welches auf den ÖPNV zugeschnitten ist. Dabei soll das Programm anhand von Benutzereingaben und mit Hilfe einer Datenquelle, automatisch eine dreidimensionale Visualisierung der Gebäude erzeugen und den lokalen ÖPNV mitintegrieren. Als Beispiel der Ausarbeitung diente das ÖPNV-Netz der Stadt Frankfurt. Hierbei wurde auf die Problematik der Erhebung von Geoinformationen und der Verarbeitung von solchen komplexen Daten eingegangen. Es wurde ermittelt, welche Nutzergruppen einen Mehrwert durch eine derartige 3D Visualisierung haben und welche neuen Erweiterungs- und Nutzungspotenziale das Modell bietet.
Dem Leser soll insbesondere ein Einblick in die Generierung von interaktiven 3D-Modellen aus reinen Rohdaten verschafft werden. Dazu wurde als Entwicklungsumgebung die Spiele-Engine Unity eingesetzt, welche sich als sehr fähiges und modernes Entwicklungswerkzeug bei der Erstellung von funktionalen 3D-Visualisierungen herausgestellt hat. Als Datenquelle wurde das OpenStreetMap Projekt benutzt und im Rahmen dieser Arbeit behandelt. Anschließend wurde zur Evaluation, das Modell verschiedenen Nutzern bereitgestellt und anhand eines Fragebogens evaluiert.
The World Wide Web is increasing the number of freely accessible textual data, which has led to an increasing interest in research in the field of computational linguistics (CL). This area of research addresses theoretical research to answer the question of how language and knowledge must be represented in order to understand and produce language. For this purpose, mathematical models are being developed to capture the phenomena at various levels in human languages. Another field of research experiencing an increase in interest that is closely related to CL is Natural Language Processing (NLP), which is primarily concerned with developing effective and efficient data structures and algorithms that implement the mathematical models of CL.
With increasing interest in these areas, NLP tools are rapidly and frequently being developed incorporating different CL models to handle different levels of language. The open source trend has benefited all those in the scientific community who develop and use these tools. Due to yet undefined I/O standards for NLP, however, the rapid growth leads to a heterogeneous NLP landscape in which the specializations of the tools cannot benefit from each other because of interface incompatibility. In addition, the constantly growing amount of freely accessible text data requires a high-performance processing solution. This performance can be achieved by horizontal and vertical scaling of hardware and software. For these reasons the first part of this thesis deals with the homogenization of the NLP tool landscape, which is achieved by a standardized framework called TextImager. It is a cloud computing based multi-service, multi-server, multi-pipeline, multi-database, multi-user, multi-representation and multi-visual framework that already provides a variety of tools for various languages to process various levels of linguistic complexity. This makes it possible to answer research questions that require the processing of a large amount of data at several linguistic levels.
The integrated tools and the homogenized I/O data streams of the TextImager make it possible to combine the built-in tools in two dimensions: (1) the horizontal dimension to achieve NLP task-specific improvement (2) the orthogonal dimension to implement CL models that are based on multiple linguistic levels and thus rely on a combination of different NLP tools. The second part of this thesis therefore deals with the creation of models for the horizontal combination of tools in order to show the possibilities for improvement using the example of the NLP task of Named Entity Recognition (NER). The TextImager offers several tools for each NLP task, most of which have been trained on the same training basis, but can produce different results. This means that each of the tools processes a subset of the data correctly and at the same time makes errors in another subset. In order to process as large a subset of the data as possible correctly, a horizontal combination of tools is therefore required. Machine learning-based voting mechanisms called LSTMVoter and CRFVoter were developed for this purpose, which allow a combination of the outputs of individual NLP tools so that better partial data results can be achieved. In this thesis the benefit of Voter is shown using the example of the NER task, whose results flow
back into the TextImager tool landscape.
The third part of this thesis deals with the orthogonal combination of TextImager tools to accomplish the verb sense disambiguation (VSD). The CL question is investigated, how verb senses should be modelled in order to disambiguate them computatively. Verbsenses have a syntagmatic-paradigmatic relationship with surrounding words. Therefore, preprocessing on several linguistic levels and consequently an orthogonal combination of NLP tools is required to disambiguate verbs on a computational level. With TextImager’s integrated NLP landscape, it is now possible to perform these preprocessing steps to induce the information needed for the VSD. The newly developed NLP tool for the VSD has been integrated into the TextImager tool landscape, enabling the analysis of a further linguistic level.
This thesis presents a framework that homogenizes the NLP tool landscape in a cluster-based way. Methods for combining the integrated tools are implemented to improve the analysis of a specific linguistic level or to develop tools that open up new linguistic levels.
Generic tasks for algorithms
(2020)
Due to its links to computer science (CS), teaching computational thinking (CT) often involves the handling of algorithms in activities, such as their implementation or analysis. Although there already exists a wide variety of different tasks for various learning environments in the area of computer science, there is less material available for CT. In this article, we propose so-called Generic Tasks for algorithms inspired by common programming tasks from CS education. Generic Tasks can be seen as a family of tasks with a common underlying structure, format, and aim, and can serve as best-practice examples. They thus bring many advantages, such as facilitating the process of creating new content and supporting asynchronous teaching formats. The Generic Tasks that we propose were evaluated by 14 experts in the field of Science, Technology, Engineering, and Mathematics (STEM) education. Apart from a general estimation in regard to the meaningfulness of the proposed tasks, the experts also rated which and how strongly six core CT skills are addressed by the tasks. We conclude that, even though the experts consider the tasks to be meaningful, not all CT-related skills can be specifically addressed. It is thus important to define additional tasks for CT that are detached from algorithms and programming.
Density visualization pipeline: a tool for cellular and network density visualization and analysis
(2020)
Neuron classification is an important component in analyzing network structure and quantifying the effect of neuron topology on signal processing. Current quantification and classification approaches rely on morphology projection onto lower-dimensional spaces. In this paper a 3D visualization and quantification tool is presented. The Density Visualization Pipeline (DVP) computes, visualizes and quantifies the density distribution, i.e., the “mass” of interneurons. We use the DVP to characterize and classify a set of GABAergic interneurons. Classification of GABAergic interneurons is of crucial importance to understand on the one hand their various functions and on the other hand their ubiquitous appearance in the neocortex. 3D density map visualization and projection to the one-dimensional x, y, z subspaces show a clear distinction between the studied cells, based on these metrics. The DVP can be coupled to computational studies of the behavior of neurons and networks, in which network topology information is derived from DVP information. The DVP reads common neuromorphological file formats, e.g., Neurolucida XML files, NeuroMorpho.org SWC files and plain ASCII files. Full 3D visualization and projections of the density to 1D and 2D manifolds are supported by the DVP. All routines are embedded within the visual programming IDE VRL-Studio for Java which allows the definition and rapid modification of analysis workflows.
Due to the resurrection of data-hungry models (such as deep convolutional neural nets), there is an increasing demand for large-scale labeled datasets and benchmarks in the computer vision fields (CV). However, collecting real data across diverse scene contexts along with high-quality annotations is often expensive and time-consuming, especially for detailed pixel-level label prediction tasks such as semantic segmentation, etc. To address the scarcity of real-world training sets, recent works have proposed the use of computer graphics (CG) generated data to train and/or characterize performance of modern CV systems. CG based virtual worlds provide easy access to ground truth annotations and control over scene states. Most of these works utilized training data simulated from video games and pre-designed virtual environments and demonstrated promising results. However, little effort has been devoted to the systematic generation of massive quantities of sufficiently complex synthetic scenes for training scene understanding algorithms. In this work, we develop a full pipeline for simulating large-scale datasets along with per-pixel ground truth information. Our simulation pipeline constitutes of mainly two components: (a) a stochastic scene generative model that automatically synthesizes traffic scene layouts by using marked point processes coupled with 3D CAD objects and factor potentials, (b) an annotated-image rendering tool that renders the sampled 3D scene as RGB image with a chosen rendering method along with pixel-level annotations such as semantic labels, depth, surface normals etc. This pipeline is capable of automatically generating and rendering a potentially infinite variety of outdoor traffic scenes that can be used to train convolutional neural nets (CNN).
However, several recent works, including our own initial experiments demonstrated that the CV models that are trained naively on simulated data lack generalization capabilities to real-world scenes. This opens up several fundamental questions about what is it lacking in simulated data compared to real data and how to use it effectively. Furthermore, there has been a long debate since 1980’s on the usefulness of CG generated data for tuning CV systems. Primarily, the impact of modeling errors and computational rendering approximations, due to various choices in the rendering pipeline, on trained CV systems generalization performance is still not clear. In this thesis, we take a case study in the context of traffic scenarios to empirically analyze the performance degradations when CV systems trained with virtual data are transferred to real data. We first explore system performance tradeoffs due to the choice of the rendering engine (e.g., Lambertian shader (LS), ray-tracing (RT), and Monte-Carlo path tracing (MCPT)) and their parameters. A CNN architecture, DeepLab, that performs semantic segmentation, is chosen as the CV system being evaluated. In our case study, involving traffic scenes, a CNN trained with CG data samples generated with photorealistic rendering methods (such as RT or MCPT), shows already a reasonably good performance on real-world testing data from CityScapes benchmark. Use of samples from an elementary rendering method, i.e., LS, degraded the performance of CNN by nearly 20%. This result conveys that training data must be photorealistic enough for better generalizability of the trained CNN models. Furthermore, the use of physics-based MCPT rendering improved the performance by 6% but at the cost of more than three times the rendering time. This MCPT generated dataset when augmented with just 10% of real-world training data from CityScapes dataset, the performance levels achieved are comparable to that of training CNN with the complete CityScapes dataset.
The next aspect we study in the thesis involves the impact of choice of parameter settings of scene generation model on the generalization performance of CNN models trained with the generated data. Towards this end, we first propose an algorithm to estimate our scene generation model parameters given an unlabeled real world dataset from the target domain. This unsupervised tuning approach utilizes the concept of generative adversarial training, which aims at adapting the generative model by measuring the discrepancy between generated and real data in terms of their separability in the space of a deep discriminatively-trained classifier. Our method involves an iterative estimation of the posterior density of prior distributions for the generative graphical model used in the simulation. Initially, we assume uniform distributions as priors over parameters of a scene described by our generative graphical model. As iterations proceed the uniform prior distributions are updated sequentially to distributions for the simulation model parameters that leads to simulated data with statistics that are closer to the distributions of the unlabeled target data.
...
Zielsetzung dieser Arbeit ist es Nutzern, ohne Programmierkenntnisse oder Fachwissen im Bereich der Informatik, Zugang zu der automatischen Verarbeitung von Texten zu gewährleisten. Speziell soll es um Geotagging, also das Referenzieren verschiedener Objekte auf einer Karte, gehen. Als Basis soll ein ontologisches Modell dienen, mit Hilfe dessen Struktur die Objekte in Klassen eingeteilt werden. Zur Verarbeitung des Textes werden NaturalLanguage Processing Werkzeuge verwendet. Natural Language Processing beschreibt Methoden zur maschinellen Verarbeitung natürlicher Sprache. Sie ermöglichen es, die in Texten enthaltenen unstrukturierten Informationen in eine strukturierte Form zu bringen. Die so erhaltenen Informationen können für weitere maschinelle Verarbeitungsschritte verwendet oder einem Nutzer direkt bereitgestellt werden. Sollten sie direkt bereitgestellt werden, ist es ausschlaggebend, sie in einer Form zu präsentieren, die auch ohne Fachkenntnisse oder Vorwissen verständlich ist. Im Bereich der Geographie wird oft der Ansatz befolgt, die erhaltenen Informationen auf Basis verschiedener Karten, also visuell zu verarbeiten. Visualisierungen dienen hierbei der Veranschaulichung von Informationen. Durch sie werden die relevanten Aspekte dem Nutzer verdeutlicht und so die Komplexität der Informationen reduziert. Es bietet sich also an, die durch das Natural Language Processing gesammelten Informationen in Form einer Visualisierung für den Nutzer zugänglich zu machen. Im Rahmen dieser Arbeit über Geotagging und Ontologie-basierte Visualisierung für das TextImaging wird ein Tool entwickelt, das diese Brücke schlägt. Die Texte werden auf einer Karte visualisiert und bieten so eine Möglichkeit, beschriebene geographische Zusammenhänge auf einen Blick zu erfassen. Durch die Kombination der Visualisierung auf einer Karte und der Markierung der entsprechenden Entitäten im Text kann eine zuverlässige und nutzerfreundliche Visualisierung erzeugt werden. Bei einer abschließenden Evaluation hat sich gezeigt das mit dem Tool der Zeitaufwand und die Anzahl der fehlerhaften Annotationen reduziert werden konnte.Die von dem Tool gebotenen Funktionen machen dieses auch für weiterführende Arbeiten interessant. Eine Möglichkeit ist die entwickelten Annotatoren zu verwenden um ein ontology matching auf Basis bestimmter Texte auszuführen. Im Bereich der Visualisierung bieten sich Projekte wie die Visualisierung historischer Texte auf Basis automatisch ermittelter, zeitgerechter Karten an.
Neuropsychiatric disorders are complex, highly heritable but incompletely understood disorders. The clinical and genetic heterogeneity of these disorders poses a significant challenge to the identification of disorder related biomarkers. Besides significant progress in unveiling the genetic basis of these disorders, the underlying causes and biological mechanisms remain obscure. With the advancement in the array, sequencing, and big data technologies, a huge amount of data is generated from individuals across different platforms and in various data structures. But there is a paucity of bioinformatics tools that can integrate this plethora of data. Therefore, there is a need to develop an integrative bioinformatics data analysis tool that combines biological and clinical data from different data types to better understand the underlying genetics.
This thesis presents a bioinformatics pipeline implementing data from different platforms to provide a thorough understanding of the genetic etiology of a neuropsychiatric quantitative as well as a qualitative trait of interest. Throughout the thesis, we present two aspects: one is the development and architecture of the bioinformatics pipeline named MApping the Genetics of neuropsychiatric traits to the molecular NETworks of the human brain (MAGNET). The other part demonstrates the implementation and usefulness of MAGNET analysing large Autism Spectrum Disorder (ASD) cohorts.
MAGNET is a freely available command-line tool available on GitHub (https://github.com/SheenYo/MAGNET). It is implemented within one framework using data integration approaches based on state-of-the-art algorithms and software to ultimately identify the genes and pathways genetically associated with a trait of interest. MAGNET provides an edge over the existing tools since it performs a comprehensive analysis taking care of the data handling and parsing steps necessary to communicate between the different APIs (Application Program Interface). Thus, this avoids the in-between data handling steps required by researchers to provide output from one analysis to the next. Moreover, depending on the size of the dataset users can deduce important information regarding their trait of interest within a time frame of a few days. Besides gaining insights into genetic associations, one of the central features is the mapping of the associated genes onto developing human brain implementing transcriptome data of 16 different brain regions starting from the 5th post-conceptional week to over 40 years of age.
In the second part as proof of concept, we implemented MAGNET on two ASD cohorts. ASD is a group of psychiatric disorders. Clinically, ASD is characterized by the following psychopathology: A) limitations in social interaction and communication, and B) restricted, repetitive behavior. The etiology of this disorder is extremely complex due to its heterogeneous clinical traits and genetics. Therefore, to date, no reliable biomarkers are identified. Here, the aim is to characterize the genetic architecture of ASD taking into account the two aforementioned ASD diagnostic domains. As well as to investigate if these domains are genetically linked or independent of each other. Moreover, we addressed the question if these traits share genetic risk with the categorical diagnosis of ASD and how much of the phenotypic variance of these traits can be explained by the underlying genetics.
We included affected individuals from two ASD cohorts, i.e. the Autism Genome Project (AGP) and a German cohort consisting of 2,735 and 705 families respectively. MAGNET was applied to each of the ASD subdomains as a quantitative dependent variable. MAGNET is divided into five main sections i.e. (1) quality check of the genotype data, (2) imputation of missing genotype data, (3) association analysis of genotype and trait data, (4) gene-based analysis, and (5) enrichment analysis using gene expression data from the human brain.
MAGNET was applied to each of the individual traits in each cohort to perform quality control of the genetic data and imputed the missing data in an automated fashion. MAGNET identified 292 known and new ASD risk genes. These genes were subsequently assigned to biological signaling pathways and gene ontologies via MAGNET. The underlying biological mechanisms converged with respect to neuronal transmission and development processes. By reconciling these genes with the transcriptome of the developing human brain, MAGNET was able to identify that the significant genes associated with the subdomains are expressed at specific time points in brain areas such as the hippocampus, amygdala, and cortical regions. Further, we found that ASD subdomains related to domain A but not
to domain B have a shared genetic etiology.
In dieser Arbeit werden drei Themenkomplexe aus dem Bereich der Externspeicheralgorithmen näher beleuchtet: Approximationsalgorithmen, dynamische Algorithmen und Echtzeitanfragen. Das Thema Approximationsalgorithmen wird sowohl im Kapitel 3 als auch im Kapitel 5 behandelt.
In Kapitel 3 wird ein Algorithmus vorgestellt, welcher den Durchmesser eines Graphen heuristisch bestimmt. Im RAM- Modell ist eine modifizierte Breitensuche selbst ein günstiger und äußerst genauer Algorithmus. Dies ändert sich im Externspeicher. Dort ist die Hauptspeicher-Breitensuche durch die O(n + m) unstrukturierten Zugriffe auf den externen Speicher zu teuer. 2008 wurde von Meyer ein Verfahren zu effizienten Approximation des Graphdurchmessers im Externspeicher gezeigt, welches O(k · scan(n + m) + sort(n + m) + √(n·m/k·B)· log(n/k) + MST(n, m)) I/Os bei einem multiplikativen Approximationsfehler von O(√k · log (k)) benötigt. Die Implementierung, welche in dieser Arbeit vorgestellt wird, konnte in vielen praktischen Fällen die Anzahl an I/Os durch Rekursion auf O(k · scan(n + m) + sort(n + m) + MST(n, m)) I/Os reduzieren. Dabei wurden verschiedene Techniken untersucht, um die Auswahl der Startpunkte (Masterknoten) zum rekursiven Schrumpfen des Graphen so wählen zu können, dass der Fehler möglichst klein bleibt. Weiterhin wurde eine adaptive Regel eingeführt, um nur so viele Masterknoten zu wählen, dass der geschrumpfte Graph nach möglichst wenigen Rekursionsaufrufen in den Hauptspeicher passt. Es wirdgezeigt, dass die untere Schranke für den worst case-Fehler dabei auf Ω(k^{4/3−e}) mit hoher Wahrscheinlichkeit steigt. Die experimentelle Auswertung zeigt jedoch, dass in der Praxis häufig deutlich bessere Ergebnisse erzielt werden.
In Kapitel 4 wird ein Algorithmus vorgestellt, welcher, nach dem Einfügen einer neuen Kante in einen Graphen, den zugehörigen Baum der Breitensuche unter Verwendung von O(n · (n/B^{2/3} + sort(n) · log (B))) I/Os mit hoher Wahrscheinlichkeit aktualisiert. Dies ist für hinreichend große B schneller als die statische Neuberechnung. Zur Umsetzung des Algorithmus wurde eine neue deterministische Partitionsmethode entwickelt, bei der die Größe der Cluster balanciert und effizient veränderbar ist. Hierfür wird ein Dendrogramm des Graphen auf einer geeigneten Baumrepräsentation, wie beispielsweise Spannbaum, berechnet. Dadurch hat jeder Knoten ein Label, welches aufgrund seiner Lage innerhalb der Baumrepräsentation berechnet worden ist. Folglich kann mittels schneller Bit-Operationen das Label um niederwertige Stellen gekürzt werden, um Cluster der Größe µ = 2 i zu berechnen, wobei der Clusterdurchmesser auf µ beschränkt ist, was für die I/O-Komplexität gewährleistet sein muss, da der Trade-off aus MM_BFS zwischen Cluster- und Hotpoolgröße genutzt wird. In der experimentellen Auswertung wird gezeigt, dass die Performanz von dynamischer Breitensuche sowohl auf synthetischen als auch auf realen Daten oftmals schneller ist als eine statische Neuberechnung des Baums der Breitensuche. Selbst wenn dies nicht der Falls ist, so sind wir nur um kleine, konstante Faktoren langsamer als die statische Implementierung von MM_BFS.
Schließlich wird in Kapitel 5 ein Approximationsalgorithmus vorgestellt, welcher sowohl dynamische Komponenten beinhaltet als auch die Eigenschaft besitzt, Anfragen in Echtzeit zu beantworten. Um die Echtzeitfähigkeit zu erreichen, darf eine Anfrage nur O(1) I/Os hervorrufen. Im Szenario dieser Arbeit wurden Anfragen zu Distanzen zwischen zwei beliebigen Knoten u und v auf realen Graphdaten mittels eines Distanzorakels beantwortet. Es wird eine Implementierung sowohl für mechanische Festplatten als auch für SSDs vorgestellt, wobei kontinuierliche Anfragen im Onlineszenario von SSDs in Millisekunden gelöst werden können, während ein großer Block von Anfragen auf beiden Architekturen in Mikrosekunden pro Anfrage amortisiert gelöst werden kann.
The main contribution of the thesis is in helping to understand which software system parameters mostly affect the performance of Big Data Platforms under realistic workloads. In detail, the main research contributions of the thesis are:
1. Definition of the new concept of heterogeneity for Big Data Architectures (Chapter 2);
2. Investigation of the performance of Big Data systems (e.g. Hadoop) in virtualized environments (Section 3.1);
3. Investigation of the performance of NoSQL databases versus Hadoop distributions (Section 3.2);
4. Execution and evaluation of the TPCx-HS benchmark (Section 3.3);
5. Evaluation and comparison of Hive and Spark SQL engines using benchmark queries (Section 3.4);
6. Evaluation of the impact of compression techniques on SQL-on-Hadoop engine performance (Section 3.5);
7. Extensions of the standardized Big Data benchmark BigBench (TPCx-BB)(Section 4.1 and 4.3);
8. Definition of a new benchmark, called ABench (Big Data Architecture Stack Benchmark), that takes into account the heterogeneity of Big Data architectures (Section 4.5).
The thesis is an attempt to re-define system benchmarking taking into account the new requirements posed by the Big Data applications. With the explosion of Artificial Intelligence (AI) and new hardware computing power, this is a first step towards a more holistic approach to benchmarking.
Deep learning and isolation based security for intrusion detection and prevention in grid computing
(2018)
The use of distributed computational resources for the solution of scientific problems, which require highly intensive data processing is a fundamental mechanism for modern scientific collaborations. The Worldwide Large Hadron Collider Computing Grid (WLCG) is one of the most important examples of a distributed infrastructure for scientific projects and is one of the pioneering examples of grid computing. The WLCG is the global grid that analyzes data from the Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN), with 170 sites in 40 countries and more than 600,000 processing cores. The grid service providers grant users access to resources that they can utilize on demand for the execution of custom software applications used for the analysis of data. The code that the users can execute is completely flexible, and commonly there are no significant restrictions. This flexibility and the availability of immense computing power increases the security challenges of these environments. Attackers are a concern for grid administrators. These attackers may request the execution of software with a malicious code that gives them the possibility of compromising the underlying institutions’ infrastructure. Grid systems need security countermeasures to keep the user code running, without allowing access to critical components but whilst still retaining flexibility. The administrators of grid systems also need to be continuously monitoring the activities that the applications are carrying out. An analysis of these activities is necessary to detect possible security issues, to identify ongoing incidents and to perform autonomous responses. The size and complexity of grid systems make manual security monitoring and response expensive and complicated for human analysts. Legacy intrusion detection and prevention systems (IDPS) such as Snort and OSSEC are traditionally used for security incident monitoring in the grid, cloud, clusters and standalone systems. However, IDPS are limited due to the use of hardcoded fixed rules that need to be updated continuously to cope with different threats.
This thesis introduces an architecture for improving security in grid computing. The architecture integrates the use of security by isolation, behavior monitoring and deep learning (DL) for the classification of real-time traces of the running user payloads also known as grid jobs. The first component of the proposal, the Linux containers (LCs), are used to provide isolation between grid jobs and to gather specific traceable information about the behavior of individual jobs. LCs offer a safe environment for the execution of arbitrary user scripts or binaries, protecting the sensitive components of the grid member organizations. The containers consist of a software sandboxing technique and form a lightweight alternative to other technologies such as virtual machines (VMs) that usually implement a full machine-level emulation and can, therefore, significantly affect the performance. This performance loss is commonly unacceptable in high-throughput computing scenarios. Containers enable the collection of monitoring information from the processes running inside them. The data collected via the LCs monitoring is employed to feed a DL-based IDPS.
DL methods can acquire knowledge from experience, which eliminates the need for operators to formally specify all the knowledge that a system requires. These methods can improve IDPS by building models that are utilized to detect security incidents automatically, having the ability to generalize to new classes of issues. DL can produce lower false positive rates for intrusion detection, but also provides a measure of false negatives, which can be improved with new training data. Convolutional neural networks (CNNs) are utilized for the distinction between regular and malicious job classes. A set of samples is collected from regular production grid jobs from the grid infrastructure of “A Large Ion Collider Experiment” (ALICE) and malicious Linux binaries from a malware research website. The features extracted from these samples are utilized for the training and validation of the machine learning (ML) models. The utilization of a generative approach to enhance the required training data is also proposed. Recurrent neural networks (RNN) are used as generative models for the simulation of training data that complements and improves the real collected dataset. This data augmentation strategy is useful to supplement the lack of training data in ML processes.
...
Die folgende Arbeit handelt von einem Human Computer Interaction Interface, welches es gestattet, mit Hilfe von Gesten zu schreiben. Das System ermöglicht seinen Nutzern, neue Gesten hinzuzufügen und zu verwenden. Da Gesten besser erkannt werden können, je genauer die Darstellung der Hände ist, wird diese durch Datenhandschuhe an den Computer übertragen. Die Hände werden einerseits in der Virtual Reality (VR) dargestellt, damit sie der Nutzer sieht. Andererseits werden die Daten, die die Gestenerkennung benötigt, an das Interface weitergeleitet. Die Erkennung der Gesten wird mit Hilfe eines Neuronales Netz (NN) implementiert. Dieses ist in der Lage, Gesten zu unterscheiden, sofern es genügend Trainingsdaten erhalten hat. Die genutzten Gesten sind entweder einhändig oder beidhändig auszuführen. Die Aussagen der Gesten beziehen sich in dieser Arbeit vor allem auf relationale Operatoren, die Beziehungen zwischen Objekten ausdrücken, wie beispielsweise „gleich“ oder „größer gleich“. Abschließend wird in dieser Arbeit ein System geschaffen, das es ermöglicht, mit Gesten Sätze auszudrücken. Dies betrifft das sogenannte gestische Schreiben nach Mehler, Lücking und Abrami 2014. Zu diesem Zweck befindet sich der Nutzer in einem virtuellen Raum mit Objekten, die er verknüpfen kann, wobei er Sätze in einem relationalen Kontext manifestiert.
Hierarchical self-organizing systems for task-allocation in large scaled distributed architectures
(2019)
This thesis deals with the subject of autonomous, decentralized task allocation in a large scaled multi-core network. The self-organization of such interconnected systems becomes more and more important for upcoming developments. It is to be expected that the complexity of those systems becomes hardly manageable to human users. Self-organization is part of a research field of the Organic Computing initiative, which aims to find solutions for technical systems by imitating natural systems and their processes. Within this initiative, a system for task allocation in a small scaled multi-core network was already developed, researched and published. The system is called the Artificial Hormone System (AHS), since it is inspired by the endocrine system of mammals. The AHS produces a high amount of communication load in case the multi-core network is of a bigger scale.
The contribution of this thesis is two new approaches, both based on the AHS in order to cope with large scaled architectures. The major idea of those two approaches is to introduce a hierarchy into the AHS in order to reduce the produced communication load. The first and more detailed researched approach is called the Hierarchical Artificial Hormone System (HAHS), which orders the processing elements in clusters and builds an additional communication layer between them. The second approach is the Recursive Artificial Hormone System (RAHS), which also clusters the system’s processing elements and orders the clusters into a topological tree structure for communication.
Both approaches will be explained in this thesis by their principle structure as well as some optional methods. Furthermore, this thesis presents estimations for the worst case timing behavior and the worst-case communication load of the HAHS and RAHS. At last, the evaluation results of both approaches, especially in comparison to the AHS, will be shown and discussed.
Biologische Signalwege bilden komplexe Netzwerke aus, um die Zellantwort sensibel regulieren zu können. Systembiologische Ansätze werden eingesetzt, um biologische Systeme anhand von Computer-gestützten Modellen zu untersuchen. Ein mathematisches Modell erlaubt, neben der logischen Erfassung der Regulation des biologischen Systems, die systemweite Simulation des dynamischen Verhaltens und Analyse der Robustheit und Anfälligkeit.
Der TNFR1-vermittelte Signalweg reguliert essenzielle Zellvorgänge wie Entzündungsantworten,
Proliferation und Zelltod. TNFR1 wird von dem Zytokin TNF-α stimuliert und fördert daraufhin die Bildung verschiedener makromolekularer Komplexe, welche unterschiedliche Zellantworten einleiten, von der Aktivierung des Transkriptionsfaktors NF-κB, welcher die Expression von proliferationsfördernden Genen reguliert, bis zu zwei Formen des Zelltods, der Apoptose und der Nekroptose. Die Regulation der verschiedenen Zellantworten wird auch als molekularer Schalter bezeichnet. Die exakten molekularen Vorgänge, welche die Zellantwort modulieren, sind noch nicht vollständig entschlüsselt. Eine Fehlregulation des Signalwegs kann chronische Entzündungen hervorrufen oder die Entstehung von Tumoren fördern.
In dieser Thesis haben wir die neuesten Erkenntnisse der Forschung des TNFR1-Signalwegs anhand von umfangreichen Interaktionsdaten aus der Literatur erstmals in einem Petrinetz-Modell erfasst und analysiert. Das manuell kuratierte Modell umfasst die sequenziellen Prozesse der NF-κB-Aktivierung, Apoptose und Nekroptose und berücksichtigt den Einfluss posttranslationaler Modifikationen.
Weiterhin wurden Analysemethoden für Signalwegs-Modelle entwickelt, welche die spezifischen Anforderungen dieser biologischen Systeme berücksichtigen und eine biologisch motivierte Netzwerkanalyse ermöglichen. Die Manatee-Invarianten identifizieren Signalflüsse im Gleichgewichtszustand in Modellen, die Zyklen aufweisen, und werden als Linearkombination von Transitions-Invarianten gebildet. Diese Signalflüsse erfassen idealerweise einen Prozess von der Rezeptorstimulation zur Zellantwort in einem Modell eines Signalwegs. Die Bestimmung aller möglichen Signalflüsse in Modellen von Signalwegen ist eine notwendige Voraussetzung für weitere biologisch motivierte Analysen, wie die in silico-Knockout Analyse. Wir haben ebenfalls ein neues Konzept zur Untersuchung von in silico-Knockouts vorgestellt. Die Effekte der in silico-Knockouts auf einzelne Komplexe und Prozesse des Signalwegs werden in der in silico-Knockout-Matrix repräsentiert. Wir haben die Software-Anwendung isiKnock entwickelt, welche beide Konzepte kombiniert und eine systematische Knockout-Analyse von Petrinetz-Modellen unterstützt.
Das Petrinetz-Modell des TNFR1-Signalwegs wurde auf seine elementaren Eigenschaften geprüft und die etablierten Analysen wie Platz-Invarianten und Transitions-Invarianten durchgeführt. Hierbei konnten die Transitions-Invarianten nicht in allen Fällen komplette biologische Signalflüsse beschreiben. Wir haben ebenfalls die neu vorgestellten Methoden auf das Petrinetz-Modell angewandt. Anhand der Manatee-Invarianten konnten wir die zusammenhängenden Signalflüsse identifizieren und nach ihrem biologischen Ausgang klassifizieren sowie die Auswirkungen der Rückkopplungen untersuchen. Wir konnten zeigen, dass die survival-Antwort durch die Aktivierung von NF-κB am häufigsten auftritt, danach die Apoptose, gefolgt von der Nekroptose. Die alternativen Signalflüsse in Form der Manatee-Invarianten spiegeln die Robustheit des biologischen Systems wider. Wir führten eine ausgiebige in silico-Knockout-Analyse basierend auf den Manatee-Invarianten durch, um die Proteine des Signalwegs nach ihrem Einfluss einzustufen und zu gruppieren. Die Proteine des Komplex I wiesen hierbei den größten Einfluss auf, angeführt von der Rezeptorstimulation und RIP1. Wir betrachteten und diskutierten die Regulation des molekularen Schalters anhand der Knockout-Analyse von selektierten Proteinen und deren Auswirkung auf wichtige Komplexe im Modell. Wir identifizierten die Ubiquitinierung in Komplex I sowie die NF-κB-abhängige Genexpression als die wichtigen Kontrollpunkte des TNFR1-Signalwegs. In Komplex II ist die Regulation der Aktivierung der Caspase-Aktivität entscheidend.
Die umfangreiche Netzwerkanalyse basierend auf Manatee-Invarianten und systematischer in silico-Knockout-Analyse verifizierte das Petrinetz-Modell und erlaubte die Untersuchung der Robustheit und Anfälligkeit des Systems. Die neu entwickelten Methoden ermöglichen eine fundierte, biologisch relevante Untersuchung von in silico-Modellen von Signalwegen. Der systembiologische Ansatz unterstützt die Aufklärung der Regulation und Funktion des verflochtenen Netzwerks des TNFR1-Signalwegs.
Die digitale Pathologie ist ein neues, aber stetig wachsendes, Feld in der Medizin. Die kontinuierliche Entwicklung von verbesserten digitalen Scannern erlaubt heute das Abscannen von kompletten Gewebeschnitten und Whole Slide Images gewinnen an Bedeutung. Ziel dieser Arbeit ist die Methodenentwicklung zur Analyse von Whole Slide Images des klassischen Hodgkin Lymphoms. Das Hodgkin-Lymphom, oder Morbus Hodgkin, ist eine Tumorerkrankung des Lymphsystems, bei der die monoklonalen Tumorzellen in der Regel von B-Lymphozyten im Vorläuferstadium abstammen.
Etwas mehr als 9.000 Hodgkin-Lymphom-Fälle werden jährlich in den USA diagnostiziert. Zwar ist die 5-Jahre-Überlebensrate für Hodgkin-Lymphome mit 85,3 % vergleichsweise hoch, dennoch werden etwa 1.100 Todesfälle pro Jahr in den USA registriert. Auf mikroskopischer Ebene sind die Hodgkin-Reed-Sternberg Zellen (HRS-Zellen) typisch für das klassische Hodgkin Lymphom. HRS-Zellen haben einen oder mehrere Zellkerne, die stark vergrößert sind und eine grobe Chromatinstruktur aufweisen. Immunhistologisch gibt es für HRS-Zellen charakterisierende Marker, so sind HRS-Zellen positiv für den Aktivierungsmarker CD30.
Neben der konventionellen Mikroskopie, ermöglichen Scanner das Digitalisieren von ganzen Objektträgern (Whole Slide Image). Whole Slide Images werden bisher wenig in der Routinediagnostik eingesetzt. Ein großer Vorteil von digitalisierten Gewebeschnitten bietet sich bei der computergestützten Analyse. Automatisierte Bildanalyseverfahren wie Zellerkennung können Pathologen bei der Diagnose unterstützen, indem sie umfassende Statistiken zur Anzahl und Verteilung von immungefärbten Zellen bereitstellen.
Die untersuchten immunohistologischen Bilder wurden vom Dr. Senckenbergisches Institut für Pathologie des Universitätsklinikums Frankfurt bereit gestellt. Die betrachteten Gewebeschnitte sind gegen CD30 immungefärbt, einem Membranrezeptor, welcher in HRS-Zellen und aktivierten Lymphozyten exprimiert wird. Die Gewebeschnitte wurden mit einem Aperio ScanScope slide scanner digitalisiert und liegen mit einer hohen Auflösung von 0,25 μm pro Pixel vor. Bei den vorliegenden Gewebeschnittgrößen ergeben sich Bilder mit bis zu 90.000 x 90.000 Pixeln.
Der untersuchte Bilddatensatz umfasst 35 Bilder von Lymphknotengewebeschnitten der drei Krankheitsbilder: Gemischtzelliges klassisches Hodgkinlymphom, noduläres klassisches Hodgkinlymphom und Lymphadenitis. Die Bildverarbeitungspipeline wurden teils neu implementiert, teils von etablierten Bilderkennungssoftware und -bibliotheken wie CellProfiler und Java Advanced Imaging verwendet. CD30-positive Zellobjekte werden in den Gewebeschnitten automatisiert erkannt und neben der globalen Position im Whole Slide Image weitere Morphologiedeskriptoren berechnet, wie Fläche, Feret-Durchmesser, Exzentrität und Solidität. Die Zellerkennung zeigt mit 84 % eine hohe Präzision und mit 95 % eine sehr gute Sensitivität.
Es konnte gezeigt werden, dass in Lymphadenitisfällen im Schnitt deutlich weniger CD30- positive Zellen präsent sind als in klassisches Hodgkinlymphom. Während hier im Schnitt nur rund 3.000 Zellen gefunden wurden, lag der Durchschnitt für das Mischtyp klassisches Hodgkinlymphom bei rund 19.000 CD30 positiven Zellen. Während die CD30-positiven Zellen in Lymphadenitisfällen relativ gleichmäßig verteilt sind, bilden diese in klassischen Hodgkinlymphom-Fällen Zellcluster höherer Dichte.
Die berechneten Morphologiedeskriptoren bieten die Möglichkeit die Gewebeschnitte und den Krankheitsverlauf näher zu beschreiben. Zudem sind bisher Größe und Erscheinungsbild der HRS-Zellen hauptsächlich anhand manuell ausgewählter Zellen bestimmt worden. Ein Maß für die Ausdehnung der Zellen ist der maximale Feret-Durchmesser. Bei CD30-Zellen im klassischen Hodgkinlymphom liegt dieser im Durchschnitt bei 20 μm und ist somit deutlich größer als die durchschnittlich gemessenen 15 μm in Lymphadenitis.
Es wurde ein graphentheoretischer Ansatz gewählt, um die CD30 positiven Zellen und ihre räumliche Nachbarschaft zu modellieren. In CD30-Zellgraphen von klassischen Hodgkinlymphom-Gewebeschnitten ist der durchschnittliche Knotengrad gegenüber den von Lymphadenitis-Bildern stark erhöht. Der Vergleich mit Zufallsgraphen zeigt, dass die beobachteten Knotengradverteilungen nicht für eine zufällige Verteilung der Zellen im Gewebeschnitt sprechen. Eigenschaften und Verteilung von Communities in CD30-Zellgraphen können hinzugenommen werden, um klassisches Hodgkinlymphom Gewebeschnitte näher zu charakterisieren.
Diese Arbeit zeigt, dass die Auswertung von Whole Slide Image unterstützend zur Verbesserung der Diagnose möglich ist. Die mehr als 400.000 automatisch erkannten CD30-positiven Zellobjekte wurden morphologisch beschrieben, und zusammen mit ihrer Position im Gewebeschnitt ist die Betrachtung wichtiger Eigenschaften des klassischen Hodgkinlymphoms realisierbar. Zellgraphen können durch weitere Zelltypen erweitert werden und auf andere Krankheitsbilder angewendet werden.
Precise timing of spikes between different neurons has been found to convey reliable information beyond the spike count. In contrast, the role of small phase delays with high temporal variability, as reported for example in oscillatory activity in the visual cortex, remains largely unclear. This issue becomes particularly important considering the high speed of neuronal information processing, which is assumed to be based on only a few milliseconds, or oscillation cycles within each processing step.
We investigate the role of small and imprecise phase delays with a stochastic spiking model that is strongly motivated by experimental observations. Within individual oscillation cycles the model contains only two signal parameters describing directly the rate and the phase. We specifically investigate two quantities, the probability of correct stimulus detection and the probability of correct change point detection, as a function of these signal parameters and within short periods of time such as individual oscillation cycles.
Optimal combinations of the signal parameters are derived that maximize these probabilities and enable comparison of pure rate, pure phase and combined codes. In particular, the gain in detection probability when adding imprecise phases to pure rate coding increases with the number of stimuli. More interestingly, imprecise phase delays can considerably improve the process of detecting changes in the stimulus, while also decreasing the probability of false alarms and thus, increasing robustness and speed of change point detection.
The results are applied to parameters extracted from empirical spike train recordings of neurons in the visual cortex in response to a number of visual stimuli. The results suggest that near-optimal combinations of rate and phase parameters can be implemented in the brain, and that phase parameters could particularly increase the quality of change point detection in cases of highly similar stimuli.
The thesis is about random Constraint Satisfaction Problems (rCSP). These are random instances of classical problems in NP. In the literature the study of rCSP involve identifying-locating phase transition phenomena as well as investigating algorithmic questions.
Recently, some ingenious however mathematically non-rigorous theories from statistical physics have given the study of rCSP a new perspective; the so-called Cavity Method makes some very impressing predictions about the most fundamental properties of rCSP.
In this thesis, we investigate the soundness of some of the most basic predictions of the Cavity Method, mainly, regarding the structure of the so-called Gibbs distribution on various rCSP models. Furthermore, we study some fundamental algorithmic problem related to rCSP. This includes both analysing well-known dynamical process (dynamics) like Glauber Dynamics, Metropolis Process, as well as proposing new algorithmic approaches to some natural problems related to rCSP.
We live in age of data ubiquity. Even the most conservative estimates predict exponential growth in produced, transmitted and stored data. Big data is used to power business analytics as well as to foster scientific discoveries. In many cases, explosion of produced data exceeds capabilities of digital storage systems. Scientific high-performance computing environments cope with this problem by utilizing large, distributed, storage systems. These complex systems can only provide a high degree of reliability and durability by means of data redundancy. The most straight-forward way of doing that is by replicating the data over different physical devices. However, more elaborate approaches, such as erasure coding, can provide similar data protection while utilizing less storage. Recently, software-defined reliability methods began to replace traditional, hardware- based, solutions. Complicated failure modes of storage system components also warrant checksums to guaranty long-term data integrity. To cope with ever increasing data volumes, flexible and efficient software implementation of error correction codes is of great importance. This thesis introduces a method for realizing a flexible Reed-Solomon erasure code using the “Just-In-Time” compilation technique. By exploiting intrinsic arithmetic redundancy in the algorithm, and by relying on modern optimizing compilers, we obtain a throughput-efficient erasure code implementation. Additionally, exploitation of data parallelism is achieved effortlessly by instructing the compiler to produce SIMD code for desired execution platform. We show results of codes implemented using SSE and AVX2 SIMD instruction sets for x86, and NEON instruction set for ARM platforms. Next, we introduce a framework for efficient vectorized RAID-Z redundancy operations of ZFS file system. Traditional, table-based Galois field multiplication algorithms are replaced with custom SSE and AVX2 parallel methods, providing significantly faster and more efficient parity operations. The implementation of this framework was made publicly available as a part of ZFS on Linux project, since version 0.7. Finally, we propose a new erasure scheme for use with existing, high performance, parallel filesystems. Described reliability middleware (ECCFS) allows definition of flexible, file-based, reliability policies, adapting to customized user needs. By utilizing the block erasure code, the ECCFS achieves optimal storage, computation, and network resource utilization, while providing a high level of reliability. The distributed nature of the middleware allows greater scalability and more efficient utilization of storage and network resources, in order to improve availability of the system.
Antimicrobial resistance became a serious threat to the worldwide public health in this century. A better understanding of the mechanisms, by which bacteria infect host cells and how the host counteracts against the invading pathogens, is an important subject of current research. Intracellular bacteria of the Salmonella genus have been frequently used as a model system for bacterial infections. Salmonella are ingested by contaminated food or water and cause gastroenteritis and typhoid fever in animals and humans. Once inside the gastrointestinal tract, Salmonella can invade intestinal epithelial cells. The host cell can fight against intracellular pathogens by a process called xenophagy. For complex systems, such as processes involved in the bacterial infection of cells, computational systems biology provides approaches to describe mathematically how these intertwined mechanisms in the cell function. Computational systems biology allows the analysis of biological systems at different levels of abstraction. Functional dependencies as well as dynamic behavior can be studied. In this thesis, we used the Petri net formalism to gain a better insight into bacterial infections and host defense mechanisms and to predict cellular behavior that can be tested experimentally. We also focused on the development of new computational methods.
In this work, the first realization of a mathematical model of the xenophagic capturing of Salmonella enterica serovar Typhimurium in epithelial cells was developed. The mathematical model expressed in the Petri net formalism was constructed in an iterative way of modeling and analyses. For the model verification, we analyzed the Petri net, including a computational performance of knockout experiments named in silico knockouts, which was established in this work. The in silico knockouts of the proposed Petri net are consistent with the published experimental perturbation studies and, thus, ensures the biological credibility of the Petri net. In silico knockouts that have not been experimentally investigated yet provide hypotheses for future investigations of the pathway.
To study the dynamic behavior of an epithelial cell infected with Salmonella enterica serovar Typhimurium, a stochastic Petri net was constructed. In experimental research, a decision like "Which incubation time is needed to infect half of the epithelial cells with Salmonella?" is based on experience or practicability. A mathematical model can help to answer these questions and improve experimental design. The stochastic Petri net models the cell at different stages of the Salmonella infection. We parameterized the model by a set of experimental data derived from different literature sources. The kinetic parameters of the stochastic Petri net determine the time evolution of the bacterial infection of a cell. The model captures the stochastic variation and heterogeneity of the intracellular Salmonella population of a single cell over time. The stochastic Petri net is a valuable tool to examine the dynamics of Salmonella infections in epithelial cells and generate valuable information for experimental design.
In the last part of this thesis, a novel theoretical method was introduced to perform knockout experiments in silico. The new concept of in silico knockouts is based on the computation of signal flows at steady state and allows the determination of knockout behavior that is comparable to experimental perturbation behavior. In this context, we established the concept of Manatee invariants and demonstrated the suitability of their application for in silico knockouts by reflecting biological dependencies from the signal initiation to the response. As a proof of principle, we applied the proposed concept of in silico knockouts to the Petri net of the xenophagic recognition of Salmonella. To enable the application of in silico knockouts for the scientific community, we implemented the novel method in the software isiKnock. isiKnock allows the automatized performance and visualization of in silico knockouts in signaling pathways expressed in the Petri net formalism. In conclusion, the knockout analysis provides a valuable method to verify computational models of signaling pathways, to detect inconsistencies in the current knowledge of a pathway, and to predict unknown pathway behavior.
In summary, the main contributions of this thesis are the Petri net of the xenophagic capturing of Salmonella enterica serovar Typhimurium in epithelial cells to study the knockout behavior and the stochastic Petri net of an epithelial cell infected with Salmonella enterica serovar Typhimurium to analyze the infection dynamics. Moreover, we established a new method for in silico knockouts, including the concept of Manatee invariants and the software isiKnock. The results of these studies are useful to a better understanding of bacterial infections and provide valuable model analysis techniques for the field of computational systems biology.
The technology of advanced driver assistance systems (ADAS) has rapidly developed in the last few decades. The current level of assistance provided by the ADAS technology significantly makes driving much safer by using the developed driver protection systems such as automatic obstacle avoidance and automatic emergency braking. With the use of ADAS, driving not only becomes safer but also easier as ADAS can take over some routine tasks from the driver, e.g. by using ADAS features of automatic lane keeping and automatic parking. With the continuous advancement of the ADAS technology, fully autonomous cars are predicted to be a reality in the near future.
One of the most important tasks in autonomous driving is to accurately localize the egocar and continuously track its position. The module which performs this task, namely odometry, can be built using different kinds of sensors: camera, LIDAR, GPS, etc. This dissertation covers the topic of visual odometry using a camera. While stereo visual odometry frameworks are widely used and dominating the KITTI odometry benchmark (Geiger, Lenz and Urtasun 2012), the accuracy and performance of monocular visual odometry is much less explored.
In this dissertation, a new monocular visual odometry framework is proposed, namely Predictive Monocular Odometry (PMO). PMO employs the prediction-and-correction mechanism in different steps of its implementation. PMO falls into the category of sparse methods. It detects and chooses keypoints from images and tracks them on the subsequence frames. The relative pose between two consecutive frames is first pre-estimated using the pitch-yaw-roll estimation based on the far-field view (Barnada, Conrad, Bradler, Ochs and Mester 2015) and the statistical motion prediction based on the vehicle motion model (Bradler, Wiegand and Mester 2015). The correction and optimization of the relative pose estimates are carried out by minimizing the photometric error of the keypoints matches using the joint epipolar tracking method (Bradler, Ochs, Fanani and Mester 2017).
The monocular absolute scale is estimated by employing a new approach to ground plane estimation. The camera height over ground is assumed to be known. The scale is first estimated using the propagation-based scale estimation. Both of the sparse matching and the dense matching of the ground features between two consecutive frames are then employed to refine the scale estimates. Additionally, street masks from a convolutional neural network (CNN) are also utilized to reject non-ground objects in the region of interest.
PMO also has a method to detect independently moving objects (IMO). This is important for visual odometry frameworks because the localization of the ego-car should be estimated only based on static objects. The IMO candidate masks are provided by a CNN. The case of crossing IMOs is handled by checking the epipolar consistency. The parallel-moving IMOs, which are epipolar conformant, are identified by checking the depth consistency against the depth maps from CNN.
In order to evaluate the accuracy of PMO, a full simulation on the KITTI odometry dataset was performed. PMO achieved the best accuracy level among the published monocular frameworks when it was submitted to the KITTI odometry benchmark in July 2017. As of January 2018, it is still one of the leading monocular methods in the KITTI odometry benchmark.
It is important to note that PMO was developed without employing random sampling consensus (RANSAC) which arguably has been long considered as one of the irreplaceable components in a visual odometry framework. In this sense, PMO introduces a new style of visual odometry framework. PMO was also developed without a multi-frame bundle adjustment step. This reflects the high potential of PMO when such multi-frame optimization scheme is also taken into account.
In the first part of the thesis we investigate Lyapunov exponents for general flat vector bundles over Riemann surfaces and we describe properties of Lyapunov exponents on special loci of the moduli space of flat vector bundles. In the second part of the thesis we show how the knowledge of Lyapunov exponent over a sporadic Teichmüller curve can be used to compute the algebraic equation of the associated universal family of curves.
A lot of software systems today need to make real-time decisions to optimize an objective of interest. This could be maximizing the click-through rate of an ad displayed on a web page or profit for an online trading software. The performance of these systems is crucial for the parties involved. Although great progress has been made over the years in understanding such online systems and devising efficient algorithms, a fine-grained analysis and problem specific solutions are often missing. This dissertation focuses on two such specific problems: bandit learning and pricing in gross-substitutes markets.
Bandit learning problems are a prominent class of sequential learning problems with several real-world applications. The classical algorithms proposed for these problems, although optimal in a theoretical sense often tend to overlook model-specific proper- ties. With this as our motivation, we explore several sequential learning models and give efficient algorithms for them. Our approaches, inspired by several classical works, incorporate the model-specific properties to derive better performance bounds.
The second part of the thesis investigates an important class of price update strategies in static markets. Specifically, we investigate the effectiveness of these strategies in terms of the total revenue generated by the sellers and the convergence of the resulting dynamics to market equilibrium. We further extend this study to a class of dynamic markets. Interestingly, in contrast to most prior works on this topic, we demonstrate that these price update dynamics may be interpreted as resulting from revenue optimizing actions of the sellers. No such interpretation was known previously. As a part of this investigation, we also study some specialized forms of no-regret dynamics and prediction techniques for supply estimation. These approaches based on learning algorithms are shown to be particularly effective in dynamic markets.
Die vorliegende Arbeit beschäftigt sich mit dem Thema Stemmatologie, d.h. primär der Rekonstruktion der Kopiergeschichte handschriftlich fixierter Dokumente. Zentrales Objekt der Stemmatologie ist das Stemma, eine visuelle Darstellung der Kopiergeschichte, welche i.d.R. graphtheoretisch als Baum bzw. gerichteter azyklischer Graph vorliegt, wobei die Knoten Textzeugen (d.s. die Textvarianten) darstellen während die Kanten für einzelne Kopierprozesse stehen. Im Mittelpunkt des Wissenschaftszweiges steht die Frage des Autorenoriginals (falls ein einziges solches existiert haben sollte) und die Frage der Rekonstruktion seines Textes. Das Stemma selbst ist ein Mittel zu diesem Hauptzweck (Cameron 1987). Der durch für manuelle Kopierprozesse kennzeichnende Abweichungen zunehmend abgewandelte Originaltext ist meist nicht direkt überliefert. Ziel der Arbeit ist es, die semi-automatische Stemmatologie umfassend zu beschreiben und durch Tools und analytische Verfahren weiterzuentwickeln. Der erste Teil der Arbeit beschreibt die Geschichte der computer-assistierten Stemmatologie inkl. ihrer klassischen Vorläufer und mündet in der Vorstellung eines einfachen Tools zur dynamischen graphischen Darstellung von Stemmata. Ein Exkurs zum philologischen Leitphänomen Lectio difficilior erörtert dessen mögliche psycholinguistische Ursachen im schnelleren lexikalischen Zugriff auf hochfrequente Lexeme. Im zweiten Teil wird daraufhin die existenziellste aller stemmatologischen Debatten, initiiert durch Joseph Bédier, mit mathematischen Argumenten auf Basis eines von Paul Maas 1937 vorgeschlagenen stemmatischen Models beleuchtet. Des Weiteren simuliert der Autor in diesem Kapitel Stemmata, um den potenziellen Einfluss der Distribution an Kopierhäufigkeiten pro Manuskript abzuschätzen.
Im nächsten Teil stellt der Autor ein eigens erstelltes Korpus in persischer Sprache vor, welches ebenso wie 3 der bekannten artifiziellen Korpora (Parzival, Notre Besoin, Heinrichi) qualitativ untersucht wird. Schließlich wird mit der Multi Modal Distance eine Methode zur Stemmagenerierung angewandt, welche auf externen Daten psycholinguistisch determinierter Buchstabenverwechslungswahrscheinlichkeiten beruht. Im letzten Teil arbeitet der Autor mit minimalen Spannbäumen zur Stemmaerzeugung, wobei eine vergleichende Studie zu 4 Methoden der Distanzmatrixgenerierung mit 4 Methoden zur Stemmaerzeugung durchgeführt, evaluiert und diskutiert wird.
The thesis deals with the analysis and modeling of point processes emerging from different experiments in neuroscience. In particular, the description and detection of different types of variability changes in point processes is of interest.
A non-stationary rate or variance of life times is a well-known problem in the description of point processes like neuronal spike trains and can affect the results of further analyses requiring stationarity. Moreover, non-stationary parameters might also contain important information themselves. The goal of the first part of the thesis is the (further) development of a technique to detect both rate and variance changes that may occur in multiple time scales separately or simultaneously. A two-step procedure building on the multiple filter test (Messer et al., 2014) is used that first tests the null hypothesis of rate homogeneity allowing for an inhomogeneous variance and that estimates change points in the rate if the null hypothesis is rejected. In the second step, the null hypothesis of variance homogeneity is tested and variance change points are estimated. Rate change points are used as input. The main idea is the comparison of estimated variances in adjacent windows of different sizes sliding over the process. To determine the rejection threshold functionals of the Brownian motion are identified as limit processes under the null of variance homogeneity. The non-parametric procedure is not restricted to the case of at most one change point. It is shown in simulation studies that the corresponding test keeps the asymptotic significance level for a wide range of parameters and that the test power is remarkable. The practical applicability of the procedure is underlined by the analysis of neuronal spike trains.
Point processes resulting from experiments on bistable perception are analyzed in the second part of the thesis. Visual illusions allowing for than more possible perception lead to unpredictable changes of perception. In the thesis data from (Schmack et al., 2015) are used. A rotating sphere with switching perceived rotation direction was presented to the participants of the study. The stimulus was presented continuously and intermittently, i.e., with short periods of „blank display“ between the presentation periods. There are remarkable differences in the response patterns between the two types of presentation. During continuous presentation the distribution of dominance times, i.e., the intervals of constant perception, is a right-skewed and unimodal distribution with a mean of about five seconds. In contrast, during intermittent presentation one observes very long, stable dominance times of more than one minute interchanging with very short, unstable dominance times of less than five seconds, i.e., an increase of variability.
The main goal of the second part is to develop a model for the response patterns to bistable perception that builds a bridge between empirical data analysis and mechanistic modeling. Thus, the model should be able to describe both the response patterns to continuous presentation and to intermittent presentation. Moreover, the model should be fittable to typically short experimental data, and the model should allow for neuronal correlates. Current approaches often use detailed assumptions and large parameter sets, which complicate parameter estimation.
First, a Hidden Markov Model is applied. Second, to allow for neuronal correlates, a Hierarchical Brownian Model (HBM) is introduced, where perception is modeled by the competition of two neuronal populations. The activity difference between these two populations is described by a Brownian motion with drift fluctuating between two borders, where each first hitting time causes a perceptual change. To model the response patterns to intermittent presentation a second layer with competing neuronal populations (coding a stable and an unstable state) is assumed. Again, the data are described very well, and the hypothesis that the relative time in the stable state is identical in a group of patients with schizophrenia and a control group is rejected. To sum up, the HBM intends to link empirical data analysis and mechanistic modeling and provides interesting new hypotheses on potential neuronal mechanisms of cognitive phenomena.
Diese Arbeit beschäftigt sich mit inversen Problemen für partielle Differentialgleichungen. Moderne Lösungsverfahren solcher inversen Probleme müssen die zugehörige partielle Differentialgleichung (PDGL) oft sehr häufig lösen. Mit Hinblick auf die Rechenzeit solcher Verfahren stellt das häufige Lösen der PDGL den Hauptanteil der benötigten Rechenzeit dar. Daraus resultiert die Grundidee dieser Arbeit: es sollen Lösungsverfahren von inversen Problemen beschleunigt werden, indem die für die Vorwärtslösung benötigte Rechenzeit verringert wird. Genauer gesagt soll anstatt der Vorwärtslösung eine Approximation an diese, welche kostengünstig zu berechnen ist, verwendet werden. Für die Bestimmung einer kostengünstigen Annäherung an die Vorwärtslösung wird die Reduzierte Basis Methode, eine Modellreduktionstechnik, verwendet.
Das Ziel der klassischen Reduzierten Basis Methode ist es einen globalen Reduzierte Basis Raum (RB-Raum) zu konstruieren. Dabei handelt es sich um einen niedrigdimensionalen Teilraum des Lösungsraumes der PDGL, welcher für jeden Parameter aus dem Parameterraum eine gute Näherung der PDGL-Lösung liefert. Eine beispielhafte Methode zur Konstruktion eines solchen Raumes ist es, geschickt Parameter auszuwählen und die dazu gehörigen PDGL-Lösungen als Basisvektoren des RB-Raumes zu verwenden. Die orthogonale Projektion der PDGL auf diesen RB-Raum liefert die entsprechenden Reduzierte Basis Lösungen. Das Besondere in dieser Arbeit ist, dass die betrachteten PDGLn einen sehr hochdimensionalen und unbeschränkten Parameterraum besitzen, und es ist bekannt, dass dies für die Reduzierte Basis Methode eine immense Schwierigkeit darstellt.
In Kapitel 1 wird ein schlechtgestelltes inverses Modellproblem, die Rekonstruktion der Wärmeleitfähigkeit eines Gegenstandes aus der Messung der Temperatur desselben, eingeführt und das nichtlineare Landweber-Verfahren als iteratives Regularisierungsverfahren zur Lösung dieses inversen Problems vorgestellt. Die Grundlagen der Reduzierten Basis Methode werden dargelegt und es wird erläutert, warum die klassische Variante der Methode in diesem Kontext der Bildrekonstruktion versagt. Daraufhin wird der neuartig Ansatz, ein adaptiver Reduzierte Basis Ansatz, entwickelt. Die folgenden Schritte bilden die Grundlage dieses adaptiven Reduzierte Basis Ansatzes:
1. Sei ein RB-Raum gegeben, so projiziere den Lösungsalgorithmus des inversen Problems auf diesen RB-Raum.
2. Generiere mit Hilfe dieses projizierten Verfahrens neue Iterierte bis entweder eine Iterierte das inverse Problem löst oder bis der RB-Raum erweitert werden muss.
3. Im ersten Fall wird das Verfahren beendet, im zweiten Fall wird die zur aktuellen Iterierten gehörige Vorwärtslösung verwendet um den RB-Raum zu verbessern. Danach wird mit dem ersten Schritt fortgefahren.
Es wird also nach und nach ein lokal approximierender RB-Raum konstruiert, indem Parameter für neue Basisvektoren mittels einer projizierten Variante des Lösungsalgorithmus des inversen Problems gefunden werden. Das neuartige Reduzierte Basis Landweber-Verfahren ist das Hauptresultat von Kapitel 1, wobei das Verfahren ausführlich numerisch untersucht und mit dem ursprünglichen Landweber-Verfahren verglichen wird.
In Kapitel 2 dieser Arbeit soll der zuvor entwickelte adaptive Reduzierte Basis Ansatz auf ein komplexes und praxisrelevantes Problem angewandt werden. Insbesondere soll die dadurch entstehende neue Methode mit Hinblick auf Konvergenz theoretisch ausführlich untersucht werden. Daher widmet sich der zweite Teil dieser Arbeit dem Problem der Magnet Resonanz Elektrischen Impedanztomographie (MREIT).
Bei der MREIT handelt es sich um ein Bildgebungsverfahren, welches während der letzten drei Jahrzehnte entwickelt wurde. Dabei wird ein Gegenstand, an welchen Elektroden angeheftet sind, in einen Kernspintomographen gelegt und es ist das Ziel des Verfahrens die elektrische Leitfähigkeit des Gegenstandes zu bestimmen. Die dazu benötigten Daten werden folgendermaßen gewonnen: indem Strom an einer der Elektroden angelegt wird, wird ein Stromfluss erzeugt, welcher wiederum eine Änderung der Magnetflussdichte induziert. Diese kann mit Hilfe des Kernspintomographen gemessen werden, wodurch man einen vollen Satz innerer Daten zur Hand hat, sodass hoch aufgelöste Bilder der elektrischen Leitfähigkeit des Gegenstandes rekonstruiert werden können.
Als Lösungsalgorithmus für dieses praxisrelevante Problem wird der bereits bekannte Harmonische Bz Algorithmus vorgestellt. Das Problem und der Algorithmus werden mit Hinblick auf Konvergenz des Verfahrens untersucht und ein Konvergenzresultat, welches die bestehende Konvergenztheorie hin zu einem approximativen Harmonischen Bz Algorithmus erweitert, wird bewiesen. Dabei hängt das Resultat nicht davon ab welche Art von Approximation an die Vorwärtslösung der entsprechenden PDGL im approximativen Harmonischen Bz Algorithmus verwendet wird solange diese einer Regularitäts- und einer Qualitätsbedingung genügt. Damit folgt das zweite Hauptresultat dieser Arbeit: die numerische Konvergenz des Harmonischen Bz Algorithmus. Es soll dabei hervorgehoben werden, dass Konvergenzresultate im Bereich der inversen Probleme (sofern es sie gibt) meistens die Kenntnis der exakten Vorwärtslösung annehmen, sodass keine numerische Konvergenz des zugehörigen Verfahrens folgt (in einer numerischen Implementation wird stets eine Approximation an die Vorwärtslösung verwendet). Somit ist dieses Konvergenzresultat ein Schritt hin zur numerischen Konvergenz anderer Lösungsverfahren von inversen Problemen.
Da das theoretische Resultat von der Art der Approximation nicht abhängt, erhält man ebenfalls die Konvergenz des neuartigen Reduzierte Basis Harmonischen Bz Algorithmus, welcher die Kombination des in Kapitel 1 entwickelten adaptiven Reduzierte Basis Ansatzes und des Harmonischen Bz Algorithmus ist. In einer kurzen numerischen Untersuchung wird festgestellt, dass dieser Reduzierte Basis Harmonische Bz Algorithmus schneller als der Harmonische Bz Algorithmus ist, wobei die Qualität der Rekonstruktion gleichbleibend ist. Somit funktioniert der entwickelte adaptive Reduzierte Basis Ansatz auch angewandt auf dieses komplexe praxisrelevante inverse Problem der MREIT.
The results of this thesis lie in the area of convex algebraic geometry, which is the intersection of real algebraic geometry, convex geometry, and optimization.
We study sums of nonnegative circuit polynomials (SONC) and their related cone, both geometrically and in application to polynomial optimization. SONC polynomials are certain sparse polynomials having a special structure in terms of their Newton polytopes and supports, and serve as a certificate of nonnegativity for real polynomials, which is independent of sums of squares.
The first part of this thesis is dedicated to the convex geometric study of the SONC cone. As main results we show that the SONC cone is full-dimensional in the cone of nonnegative polynomials, we exactly determine the number of zeros of a nonnegative circuit polynomial, and we give a complete and explicit characterization of the number of zeros of SONC polynomials and forms. Moreover, we provide a first approach to the study of the exposed faces of the SONC cone and their dimensions.
In the second part of the thesis we use SONC polynomials to tackle constrained polynomial optimization problems (CPOPs).
As a first step, we derive a lower bound for the optimal value of CPOP based on SONC polynomials by using a single convex optimization program, which is a geometric program (GP) under certain assumptions. GPs are a special type of convex optimization problems and can be solved in polynomial time. We test the new method experimentally and provide examples comparing our new SONC/GP approach with Lasserre's relaxation, a common approach for tackling CPOPs, which approximates nonnegative polynomials via sums of squares and semidefinite programming (SDP). The new approach comes with the benefit that in practice GPs can be solved significantly faster than SDPs. Furthermore, increasing the degree of a given problem has almost no effect on the runtime of the new program, which is in sharp contrast to SDPs.
As a second step, we establish a hierarchy of efficiently computable lower bounds converging to the optimal value of CPOP based on SONC polynomials. For a given degree each bound is computable by a relative entropy program. This program is also a convex optimization program, which is more general than a geometric program, but still efficiently solvable via interior point methods.
Powerful environment perception systems are a fundamental prerequisite for the successful deployment of intelligent vehicles, from advanced driver assistance systems to self-driving cars. Arguably the most essential task of such systems is the reliable detection and localization of obstacles in order to avoid collisions. Two particularly challenging scenarios in this context are represented by small, unexpected obstacles on the road ahead, and by potentially dynamic objects observed from a large distance. Both scenarios become exceedingly critical when the ego-vehicle is traveling at high speed. As a consequence, two major requirements placed on environment perception systems are the capability of (a) high-sensitivity generic object detection and (b) high-accuracy obstacle distance estimation. The present thesis addresses both requirements by proposing novel approaches based on stereo vision for spatial perception.
First, this work presents a novel method for the detection of small, generic obstacles and objects at long range directly from stereo imagery. The detection is based on sound statistical tests using local geometric criteria which are applicable to both static and moving objects. The approach is not limited to predefined sets of semantic object classes and does not rely on restrictive assumptions on the environment, such as oversimplified global ground surface models. Free-space and obstacle hypotheses are evaluated based on a statistical model of the input image data in order to avoid a loss of sensitivity through intermediate processing steps. In addition to the detection result, the algorithm simultaneously yields refined estimates of object distances, originating from an implicit optimization of the geometric obstacle hypothesis models. The proposed detection system provides multiple flexible output representations, ranging from 3D obstacle point clouds to compact mid-level obstacle segments to bounding box representations of object instances suitable for model-based tracking. The core algorithm concept lends itself to massive parallelization and can be implemented efficiently on dedicated hardware. Real-time execution is demonstrated on a test vehicle in real-world traffic. For a thorough quantitative evaluation of the detection performance, two dedicated datasets are employed, covering small and hard-to-detect obstacles in urban environments as well as distant dynamic objects in highway driving scenarios. The proposed system is shown to significantly outperform current general purpose obstacle detection approaches in both setups, providing a considerable increase in detection range while reducing the false positive rate at the same time.
Second, this work considers the high-accuracy estimation of object distances from stereo vision, particularly at long range. Several new methods for optimizing the stereo-based distance estimates of detected objects are proposed and compared to state-of-the-art concepts. A comprehensive statistical evaluation is performed on an extensive dedicated dataset, establishing reference values for the accuracy limits actually achievable in practice. Notably, the refined distance estimates implicitly provided by the proposed obstacle detection system are shown to yield highly accurate results, on par with the top-performing dedicated stereo matching algorithms considered in the analysis.
In this thesis we introduce the imaginary projection of (multivariate) polynomials as the projection of their variety onto its imaginary part, I(f) = { Im(z_1, ... , z_n) : f(z_1, ... , z_n) = 0 }. This induces a geometric viewpoint to stability, since a polynomial f is stable if and only if its imaginary projection does not intersect the positive orthant. Accordingly, the thesis is mainly motivated by the theory of stable polynomials.
Interested in the number and structure of components of the complement of imaginary projections, we show as a key result that there are only finitely many components which are all convex. This offers a connection to the theory of amoebas and coamoebas as well as to the theory of hyperbolic polynomials.
For hyperbolic polynomials, we show that hyperbolicity cones coincide with components of the complement of imaginary projections, which provides a strong structural relationship between these two sets. Based on this, we prove a tight upper bound for the number of hyperbolicity cones and, respectively, for the number of components of the complement in the case of homogeneous polynomials. Beside this, we investigate various aspects of imaginary projections and compute imaginary projections of several classes explicitly.
Finally, we initiate the study of a conic generalization of stability by considering polynomials whose roots have no imaginary part in the interior of a given real, n-dimensional, proper cone K. This appears to be very natural, since many statements known for univariate and multivariate stable polynomials can be transferred to the conic situation, like the Hermite-Biehler Theorem and the Hermite-Kakeya-Obreschkoff Theorem. When considering K to be the cone of positive semidefinite matrices, we prove a criterion for conic stability of determinantal polynomials.
As an integral part of ALICE, the dedicated heavy ion experiment at CERN’s Large Hadron Collider, the Transition Radiation Detector (TRD) contributes to the experiment’s tracking, triggering and particle identification. Central element in the TRD’s processing chain is its trigger and readout processor, the Global Tracking Unit (GTU). The GTU implements fast triggers on various signatures, which rely on the reconstruction of up to 20 000 particle track segments to global tracks, and performs the buffering and processing of event raw data as part of a complex detector readout tree.
The high data rates the system has to handle and its dual use as trigger and readout processor with shared resources and interwoven processing paths require the GTU to be a unique, high-performance parallel processing system. To achieve high data taking efficiency, all elements of the GTU are optimized for high running stability and low dead time.
The solutions presented in this thesis for the handling of readout data in the GTU, from the initial reception to the final assembly and transmission to the High-Level Trigger computer farm, address all these aspects. The presented concepts employ multi-event buffering, in-stream data processing, extensive embedded diagnostics, and advanced features of modern FPGAs to build a robust high-performance system that can conduct the high- bandwidth readout of the TRD with maximum stability and minimized dead time. The work summarized here not only includes the complete process from the conceptual layout of the multi-event data handling and segment control, but also its implementation, simulation, verification, operation and commissioning. It also covers the system upgrade for the second data taking period and presents an analysis of the actual system performance.
The presented design of the GTU’s input stage, which is comprised of 90 FPGA-based nodes, is built to support multi-event buffering for the data received from the 18 TRD supermodules on 1080 optical links at the full sender aggregate net bandwidth of 2.16 Tbit/s. With careful design of the control logic and the overall data path, the readout on the 18 concentrator nodes of the supermodule stage can utilize an effective aggregate output bandwidth of initially 3.33 GiB/s, and, after the successful readout bandwidth upgrade, 6.50 GiB/s via 18 optical links. The high possible readout link utilization of more than 99 % and the intermediate buffering of events on the GTU helps to keep the dead time associated with the local event building and readout typically below 10%. The GTU has been used for production data taking since start-up of the experiment and ever since performs the event buffering, local event building and readout for the TRD in a correct, efficient and highly dependable fashion.
Eine 1-1-Korrespondenz zwischen einer Klasse von Leftist-Bäumen und erweiterten t-nären Bäumen
(2006)
Leftist-Bäume sind eine Teilmenge der geordneten Bäume mit der Eigenschaft, daß der [kürzeste] Weg von jedem inneren Knoten zu einem Blatt des Teilbaums mit diesem Knoten als Wurzel immer über den am weitesten links stehenden Sohn dieses Knotens verläuft.
In der vorliegenden Arbeit wird eine 1-1-Korrespondenz zwischen erweiterten t-nären Bäumen und der Klasse der Leftist-Bäumen mit erlaubten Knotengraden 0, t, 2t-1, ... 1+t(t-1) präsentiert. Diese 1-1-Korrespondenz verallgemeinert ein Ergebnis von R. Kemp.
Embedding spanning structures into the random graph G(n,p) is a well-studied problem in random graph theory, but when one turns to the random r-uniform hypergraph H(r)(n,p) much less is known. In this thesis we will examine this topic from different perspectives, providing insights into various aspects of the theory of random graphs. Our results cover the determination of existence thresholds in two models, as well as an algorithmic approach. For the embeddings, we work with random and pseudorandom structures.
Together with Person we first notice that a general result of Riordan can be adapted from random graphs to hypergraphs and provide sufficient conditions for when H(r)(n,p) contains a given spanning structure asymptotically almost surely. As applications, we discuss several spanning structures such as cubes, lattices, spheres, and Hamilton cycles in hypergraphs.
Moreover, we study universality, i.e. when does an r-uniform hypergraph contain every hypergraph on n vertices with maximum vertex degree bounded by [delta]? For H(r)(n,p), it is shown with Person that this holds for p = w(ln n/n)1/[delta]) asymptotically almost surely by combining approaches taken by Dellamonica, Kohayakawa, Rödl, and Ruciński, of Ferber, Nenadov, and Peter, and of Kim and Lee.
Any hypergraph that is universal for the family of bounded degree r-uniform hypergraphs has to contain [omega](nr-r/[delta]) edges. With Hetterich and Person we exploit constructions of Alon and Capalbo to obtain universal r-uniform hypergraphs with the optimal number of edges O(nr-r/[delta]) when r is even, r | [delta], or [delta] = 2. Furthermore, we generalise the result of Alon and Asodi about optimal universal graphs for the family of graphs with at most m edges and no isolated vertices to hypergraphs.
In an r-uniform hypergraph on n vertices a tight Hamilton cycle consists of n edges such that there exists a cyclic ordering of the vertices where the edges correspond to consecutive segments of r vertices. In collaboration with Allen, Koch, and Person we provide a first deterministic polynomial time algorithm, which finds asymptotically almost surely tight Hamilton cycles in random r-uniform hypergraphs with edge probability at least C log3 n/n. This result partially answers a question of Nenadov and Skorić and of Dudek and Frieze who proved that tight Hamilton cycles exist already for p = w(1/n) for r = 3 and p [größer/gleich] (e + o(1))/n for r [größer/gleich] 4 using a second moment argument. Moreover our algorithm is superior to previous results of Allen, Böttcher, Kohayakawa, and Person and Nenadov and Skorić.
Lastly, we study the model of randomly perturbed dense graphs introduced by Bohman, Frieze and Martin, that is, the union of any n-vertex graph G[alpha] with minimum degree at least [alpha]n and G(n,p). For any fixed [alpha] > 0, and p = w(n-2/([delta]+1)), we show with Böttcher, Montgomery, and Person that G[alpha] UG(n,p) almost surely contains any single spanning graph with maximum degree [delta], where [delta] [größer/gleich] 5. As in previous results concerning this model, the bound used for p is lower by a log-term in comparison to the conjectured threshold for the general appearance of such subgraphs in G(n,p) alone. The new techniques we introduce also give simpler proofs of related results in the literature on trees and factors.
Measuring information processing in neural data: The application of transfer entropy in neuroscience
(2017)
It is a common notion in neuroscience research that the brain and neural systems in general "perform computations" to generate their complex, everyday behavior (Schnitzer, 2002). Understanding these computations is thus an important step in understanding neural systems as a whole (Carandini, 2012;Clark, 2013; Schnitzer, 2002; de-Wit, 2016). It has been proposed that one way to analyze these computations is by quantifying basic information processing operations necessary for computation, namely the transfer, storage, and modification of information (Langton, 1990; Mitchell, 2011; Mitchell, 1993;Wibral, 2015). A framework for the analysis of these operations has been emerging (Lizier2010thesis), using measures from information theory (Shannon, 1948) to analyze computation in arbitrary information processing systems (e.g., Lizier, 2012b). Of these measures transfer entropy (TE) (Schreiber2000), a measure of information transfer, is the most widely used in neuroscience today (e.g., Vicente, 2011; Wibral, 2011; Gourevitch, 2007; Vakorin, 2010; Besserve, 2010; Lizier, 2011; Richter, 2016; Huang, 2015; Rivolta, 2015; Roux, 2013). Yet, despite this popularity, open theoretical and practical problems in the application of TE remain (e.g., Vicente, 2011; Wibral, 2014a). The present work addresses some of the most prominent of these methodological problems in three studies.
The first study presents an efficient implementation for the estimation of TE from non-stationary data. The statistical properties of non-stationary data are not invariant over time such that TE can not be easily estimated from these observations. Instead, necessary observations can be collected over an ensemble of data, i.e., observations of physical or temporal replications of the same process (Gomez-Herrero, 2010). The latter approach is computationally more demanding than the estimation from observations over time. The present study demonstrates how to handles this increased computational demand by presenting a highly-parallel implementation of the estimator using graphics processing units.
The second study addresses the problem of estimating bivariate TE from multivariate data. Neuroscience research often investigates interactions between more than two (sub-)systems. It is common to analyze these interactions by iteratively estimating TE between pairs of variables, because a fully multivariate approach to TE-estimation is computationally intractable (Lizier, 2012a; Das, 2008; Welch, 1982). Yet, the estimation of bivariate TE from multivariate data may yield spurious, false-positive results (Lizier, 2012a;Kaminski, 2001; Blinowska, 2004). The present study proposes that such spurious links can be identified by characteristic coupling-motifs and the timings of their information transfer delays in networks of bivariate TE-estimates. The study presents a graph-algorithm that detects these coupling motifs and marks potentially spurious links. The algorithm thus partially corrects for spurious results due to multivariate effects and yields a more conservative approximation of the true network of multivariate information transfer.
The third study investigates the TE between pre-frontal and primary visual cortical areas of two ferrets under different levels of anesthesia. Additionally, the study investigates local information processing in source and target of the TE by estimating information storage (Lizier, 2012) and signal entropy. Results of this study indicate an alternative explanation for the commonly observed reduction in TE under anesthesia (Imas, 2005; Ku, 2011; Lee, 2013; Jordan, 2013; Untergehrer, 2014), which is often explained by changes in the underlying coupling between areas. Instead, the present study proposes that reduced TE may be due to a reduction in information generation measured by signal entropy in the source of TE. The study thus demonstrates how interpreting changes in TE as evidence for changes in causal coupling may lead to erroneous conclusions. The study further discusses current bast-practice in the estimation of TE, namely the use of state-of-the-art estimators over approximative methods and the use of optimization procedures for estimation parameters over the use of ad-hoc choices. It is demonstrated how not following this best-practice may lead to over- or under-estimation of TE or failure to detect TE altogether.
In summary, the present work proposes an implementation for the efficient estimation of TE from non-stationary data, it presents a correction for spurious effects in bivariate TE-estimation from multivariate data, and it presents current best-practice in the estimation and interpretation of TE. Taken together, the work presents solutions to some of the most pressing problems of the estimation of TE in neuroscience, improving the robust estimation of TE as a measure of information transfer in neural systems.
The ALICE High-Level-Trigger (HLT) is a large scale computing farm designed and constructed for the purpose of the realtime reconstruction of particle interactions (events) inside the ALICE detector. The reconstruction of such events is based on the raw data produced in collisions inside the ALICE at the Large Hadron Collider. The online reconstruction in the HLT allows the triggering on certain event topologies and a significant data reduction by applying compression algorithms. Moreover, it enables a real-time verification of the quality of the data.
To receive the raw data from the various sub-detectors of ALICE, the HLT is equipped with 226 custom built FPGA-based PCI-X cards, the H-RORCs. The H-RORC interfaces the detector readout electronics to the nodes of the HLT farm. In addition to the transfer of raw data, 108 H-RORCs host 216 Fast-Cluster-Finder (FCF) processors for the Time-Projection-Chamber (TPC). The TPC is the main tracking detector of ALICE and contributes with up to 16 GB/s to over 90% of the overall data volume. The FCF processor implements the first of two steps in the data reconstruction of the TPC. It calculates the space points and their properties from charge clouds (clusters) created by charged particles traversing the TPCs gas volume. Those space points are not only the base for the tracking algorithm, but also allow for a Huffman-based data compression, which reduces the data volume by a factor of 4 to 6.
The FCF processor is designed to cope with any incoming data rate up to the maximum bandwidth of the incoming optical link (160 MB/s) without creating back-pressure to the detectors readout electronics. A performance comparison with the software implementation of the algorithm shows a speedup factor of about 20 compared with one AMD Opteron 6172 Core @ 2.1 GHz, the CPU type used in the HLT during the LHC Run1 campaign. Comparison with an Intel E5-2690 Core @ 3.0 GHz, the CPU type used by the HLT for the LHC Run2 campaign, results in a speedup factor of 8.5. In total numbers, the 216 FCF processors provide the computing performance of 4255 AMD Opteron cores or 2203 Intel cores of the previously mentioned type. The performance of the reconstruction with respect to the physics analysis is equivalent or better than the official ALICE Offline clusterizer. Therefore, ALICE data taking was switched in 2011 to FCF cluster recording and compression only, discarding the raw data from the TPC. Due to the capability to compress the clusters, the recorded data volume could be increased by a factor of 4 to 6.
For the LHC Run3 campaign, starting in 2020, the FCF builds the foundation of the ALICE data taking and processing strategy. The raw data volume (before processing) of the upgraded TPC will exceed 3 TB/s. As a consequence, online processing of the raw data and compression of the results before it enters the online computing farms is an essential and crucial part of the computing model.
Within the scope of this thesis, the H-RORC card and the FCF processor were developed and built from scratch. It covers the conceptual design, the optimisation and implementation, as well as the verification. It is completed by performance benchmarks and experiences from real data taking.
Urn models are simple examples for random growth processes that involve various competing types. In the study of these schemes, one is generally interested in the impact of the specific form of interaction on the allocation of elements to the types. Depending on their reciprocal action, effects of cancellation and self-reinforcement become apparent in the long run of the system. For some urn models, the influencing is of a smoothing nature and the asymptotic allocation to the types is close to being a result of independent and identically distributed growth events. On the contrary, for others, almost sure random tendencies or logarithmically periodic terms emerge in the second growth order. The present thesis is devoted to the derivation of central limit theorems in the latter case. For urns of this kind, we use a "non-classical" normalisation to derive asymptotic joint normality of the types. This normalisation takes random tendencies and phases into account and consequently involves random centering and, also, possibly random scaling.
Biological ageing is a degenerative and irreversible process, ultimately leading to death of the organism. The process is complex and under the control of genetic, environmental and stochastic traits. Although many theories have been established during the last decades, none of these are able to fully describe the complex mechanisms, which lead to ageing. Generally, biological processes and environmental factors lead to molecular damage and an accumulation of impaired cellular components. In contrast, counteracting surveillance systems are effective, including repair, remodelling and degradation of damaged or impaired components, respectively. Nevertheless, at some point these systems are no longer effective, either because the increasing amount of molecular damages can not longer be removed efficiently or because the repairing and removing mechanisms themselves become affected by impairing effects. The organism finally declines and dies. To investigate and to understand these counteracting mechanisms and the complex interplay of decline and maintenance, holistic and systems biological investigations are required. Hence, the processes which lead to ageing in the fungal model organism Podospora anserina, had been analysed using different advanced bioinformatics methods. In contrast to many other ageing models, P. anserina exhibits a short lifespan, a less biochemical complexity and it provides a good accessibility for genetic manipulations.
To achieve a general overview on the different biochemical processes, which are affected during ageing in P. anserina, an initial comprehensive investigation was applied, which aimed to reveal genes significantly regulated and expressed in an age-dependent manner. This investigation was based on an age-dependent transcriptome analysis. Sophisticated and comprehensive analyses revealed different age-related pathways and indicated that especially autophagy may play a crucial role during ageing. For example, it was found that the expression of autophagy-associated genes increases in the course of ageing.
Subsequently, to investigate and to characterise the autophagy pathway, its associated single components and their interactions, Path2PPI, a new bioinformatics approach, was developed. Path2PPI enables the prediction of protein-protein interaction networks of particular pathways by means of a homology comparison approach and was applied to construct the protein-protein interaction network of autophagy in P. anserina.
The predicted network was extended by experimental data, comprising the transcriptome data as well as newly generated protein-protein interaction data achieved from a yeast two-hybrid analysis. Using different mathematical and statistical methods the topological properties of the constructed network had been compared with those of randomly generated networks to approve its biological significance. In addition, based on this topological and functional analysis, the most important proteins were determined and functional modules were identified, which correspond to the different sub-pathways of autophagy. Due to the integrated transcriptome data the autophagy network could be linked to the ageing process. For example, different proteins had been identified, which genes are continuously up- or down-regulated during ageing and it was shown for the first time that autophagy-associated genes are significantly often co-expressed during ageing.
The presented biological network provides a systems biological view on autophagy and enables further studies, which aim to analyse the relationship of autophagy and ageing. Furthermore, it allows the investigation of potential methods for intervention into the ageing process and to extend the healthy lifespan of P. anserina as well as of other eukaryotic organisms, in particular humans.
For the class of balanced, irreducible Pólya urn schemes with two colours, say black and white, limit theorems for the number of black balls after n steps are known. Depending on the ratio of the eigenvalues of the replacement matrix, two regimes of limit laws occur: almost sure convergence to a non-degenerate random variable whose distribution depends on the initial composition of the urn and that is known to be not normally distributed and weak convergence to the normal distribution. In this thesis, upper bounds on the rates of convergence in both the non-normal limit case and the normal limit case are given.
Recently, Aumüller and Dietzfelbinger proposed a version of a dual-pivot Quicksort, called "Count", which is optimal among dual-pivot versions with respect to the average number of key comparisons required. In this master's thesis we provide further probabilistic analysis of "Count". We derive an exact formula for the average number of swaps needed by "Count" as well as an asymptotic formula for the variance of the number of swaps and a limit law. Also for the number of key comparisons the asymptotic variance and a limit law are identified. We also consider both complexity measures jointly and find their asymptotic correlation.
The future heavy-ion experiment CBM (FAIR/GSI, Darmstadt, Germany) will focus on the measurements of very rare probes, which require the experiment to operate under extreme interaction rates of up to 10 MHz. Due to high multiplicity of charged particles in heavy-ion collisions, this will lead to the data rates of up to 1 TB/s. In order to meet the modern achievable archival rate, this data ow has to be reduced online by more than two orders of magnitude.
The rare observables are featured with complicated trigger signatures and require full event topology reconstruction to be performed online. The huge data rates together with the absence of simple hardware triggers make traditional latency limited trigger architectures typical for conventional experiments inapplicable for the case of CBM. Instead, CBM will employ a novel data acquisition concept with autonomous, self-triggered front-end electronics.
While in conventional experiments with event-by-event processing the association of detector hits with corresponding physical event is known a priori, it is not true for the CBM experiment, where the reconstruction algorithms should be modified in order to process non-event-associated data. At the highest interaction rates the time difference between hits belonging to the same collision will be larger than the average time difference between two consecutive collisions. Thus, events will overlap in time. Due to a possible overlap of events one needs to analyze time-slices rather than isolated events.
The time-stamped data will be shipped and collected into a readout buffer in a form of a time-slice of a certain length. The time-slice data will be delivered to a large computer farm, where the archival decision will be obtained after performing online reconstruction. In this case association of hit information with physical events must be performed in software and requires full online event reconstruction not only in space, but also in time, so-called 4-dimensional (4D) track reconstruction.
Within the scope of this work the 4D track finder algorithm for online reconstruction has been developed. The 4D CA track finder is able to reproduce performance and speed of the traditional event-based algorithm. The 4D CA track finder is both vectorized (using SIMD instructions) and parallelized (between CPU cores). The algorithm shows strong scalability on many-core systems. The speed-up factor of 10.1 has been achieved on a CPU with 10 hyper-threaded physical cores.
The 4D CA track finder algorithm is ready for the time-slice-based reconstruction in the CBM experiment.
Modern mobile devices offer a great variety of data that can be recorded. This broad range of information offers the possibility to tailor applications more to the needs of a user. Several context information can be collected, like e.g. information about position or movement. Besides integrated sensors, a broad range of additional sensors are available which can be connected to a mobile device. These additional sensors offer for example the possibility to measure physiological signals of a user.The human body offers a broad range of different signals. These signals have been used in several examples to conclude on the state of a user. The different signals allow to get a deeper insight into emotional or mental state of a user. Electrodermal activity gives feedback about the current arousal level of a user. Heart rate and heart rate variability can give an estimation about valence and mental load of a user. Several models exist to conclude from information like valence and arousal on different emotional states. Russell defined a two dimensional model, using valence and arousal to define affective states. Yerkes and Dodson developed a curve that expresses the relationship between arousal and performance of a user. Different examples exist, that use physiological signals to determine the user state for tailoring and adapting of applications. At the time of this work most of these examples did not address the usage of physiological signals for user state estimation in mobile applications and in mobile scenarios. Mobile scenarios lead to several challenges that need to be addressed. Influencing factors on physiological signals, like e.g. movement have to be controlled. Furthermore a user might be interrupted and influenced by environmental aspects. The combination of physiological data and context information might improve the interpretation of user state in mobile scenarios. In this work, we present a model that addresses the challenges of usage in mobile scenarios to offer an estimation of user state to mobile applications. To address a broad range of mobile applications, affective and cognitive state are provided as output. As input heart rate and electrodermal activity are used, as well as context information about movement and performance. Electrodermal activity is measured by a simple sensor that can be worn as a wristband. Heart rate is measured by a chest strap as used in sports. The input channels are transformed to affective and cognitive state based on a fuzzy rule based approach. With help of fuzzy logic, uncertainty can be expressed and the data continuously being processed. At the start, input channels are fuzzified by defined functions. After a that, a first fuzzy rule set transforms the input signals into values for valence, arousal and mental load. In a second step, these values and context information are transformed with another fuzzy rule set to values for affective and cognitive state. Affective state is based on the model of Russell, where valence and arousal are used to determine different emotional states. The output of the model are eight different affective states (alarmed, excited, happy, relaxed, tired, bored, sad and frustrated), which can have a high, medium, low or very low value as output. Cognitive state is determined based on mental load and context information about performance and movement. The output value can be very high, high, medium or low. The model was implemented as background service for Android devices. Different applications have been used for evaluation of the model. The model has been integrated in a multiplayer space shooter game, called ”Zone of Impulse”, which mainly benefits from the affective state. Cognitive state is more addressed in applications like a simple vocable trainer, which adapts difficulty based on user state. A study to evaluate different aspects of the model has been conducted. The study was designed to investigate the suitability of the model for mobile scenarios. The game ”zone of impulse” and the vocable trainer have been investigated in different configurations. Versions with integrated model have been compared to version of the applications without model, as well as versions of the model without context information. In total 41 participants took part in the study. A part of the participants had to do the tasks of the study in a mobile scenario, walking around several streets. The remaining participants had to do the tasks in a controlled environment in a sitting position. Different aspects were collected with ratings and questionnaires. Overall, participants rated that they did not feel impaired by the sensors they had to wear. The results showed, that the combination of physiological data and context information had an advantage against versions without context information in part of the ratings. A comparison between versions with and without model showed, that the subjective mental load ratings were significantly better for the version with model. Subjective ratings for aspects like fun, overstrain and support were mixed. When comparing the application versions in indoor and outdoor scenarios, no significant difference could be found, which leads to the assumption that there is no loss of interpretation quality in outdoor scenarios. The results also showed that the model seems to be robust enough to compensate the loss of an input channel, as there was no significant difference between application versions with full integrated model and versions with one channel lost. With the model developed in this work, context information and physiological data were combined to improve user state estimation. Furthermore pitfalls of user state estimation in mobile scenarios are overcome with this combination. However, the model has only been evaluated with a limited amount of applications and situations that mobile scenarios offer.
Data-parallel programming is more important than ever since serial performance is stagnating. All mainstream computing architectures have been and are still enhancing their support for general purpose computing with explicitly data-parallel execution. For CPUs, data-parallel execution is implemented via SIMD instructions and registers. GPU hardware works very similar allowing very efficient parallel processing of wide data streams with a common instruction stream.
These advances in parallel hardware have not been accompanied by the necessary advances in established programming languages. Developers have thus not been enabled to explicitly state the data-parallelism inherent in their algorithms. Some approaches of GPU and CPU vendors have introduced new programming languages, language extensions, or dialects enabling explicit data-parallel programming. However, it is arguable whether the programming models introduced by these approaches deliver the best solution. In addition, some of these approaches have shortcomings from a hardware-specific focus of the language design. There are several programming problems for which the aforementioned language approaches are not expressive and flexible enough.
This thesis presents a solution tailored to the C++ programming language. The concepts and interfaces are presented specifically for C++ but as abstract as possible facilitating adoption by other programming languages as well. The approach builds upon the observation that C++ is very expressive in terms of types. Types communicate intention and semantics to developers as well as compilers. It allows developers to clearly state their intentions and allows compilers to optimize via explicitly defined semantics of the type system.
Since data-parallelism affects data structures and algorithms, it is not sufficient to enhance the language's expressivity in only one area. The definition of types whose operators express data-parallel execution automatically enhances the possibilities for building data structures. This thesis therefore defines low-level, but fully portable, arithmetic and mask types required to build a flexible and portable abstraction for data-parallel programming. On top of these, it presents higher-level abstractions such as fixed-width vectors and masks, abstractions for interfacing with containers of scalar types, and an approach for automated vectorization of structured types.
The Vc library is an implementation of these types. I developed the Vc library for researching data-parallel types and as a solution for explicitly data-parallel programming. This thesis discusses a few example applications using the Vc library showing the real-world relevance of the library. The Vc types enable parallelization of search algorithms and data structures in a way unique to this solution. It shows the importance of using the type system for expressing data-parallelism. Vc has also become an important building block in the high energy physics community. Their reliance on Vc shows that the library and its interfaces were developed to production quality.
In the first part of the thesis, we show that the payment flow of a linear tax on trading gains from a security with a semimartingale price process can be constructed for all càglàd and adapted trading strategies. It is characterized as the unique continuous extension of the tax payments for elementary strategies w.r.t. the convergence "uniformly in probability". In this framework, we prove that under quite mild assumptions dividend payoffs have almost surely a negative effect on investor’s after-tax wealth if the riskless interest rate is always positive. In addition, we give an example for tax-efficient strategies for which the tax payment flow can be computed explicitly.
In the second part of the thesis, we investigate the impact of capital gains taxes on optimal investment decisions in a quite simple model. Namely, we consider a risk neutral investor who owns one risky stock from which she assumes that it has a lower expected return than the riskless bank account and determine the optimal stopping time at which she sells the stock to invest the proceeds in the bank account up to the maturity date. In the case of linear taxes and a positive riskless interest rate, the problem is nontrivial because at the selling time the investor has to realize book profits which triggers tax payments. We derive a boundary that is continuous and increasing in time, and decreasing in the volatility of the stock such that the investor sells the stock at the first time its price is smaller or equal to this boundary.
On development, feasibility, and limits of highly efficient CPU and GPU programs in several fields
(2013)
With processor clock speeds having stagnated, parallel computing architectures have achieved a breakthrough in recent years. Emerging many-core processors like graphics cards run hundreds of threads in parallel and vector instructions are experiencing a revival. Parallel processors with many independent but simple arithmetical logical units fail executing serial tasks efficiently. However, their sheer parallel processing power makes them predestined for parallel applications while the simple construction of their cores makes them unbeatably power efficient. Unfortunately, old programs cannot profit by simple recompilation. Adaptation often requires rethinking and modifying algorithms to make use of parallel execution. Many applications have some serial subroutines which are very hard to parallelize, hence contemporary compute clusters are often homogeneous, offering fast processors for serial tasks and parallel processors for parallel tasks. In order not to waste the available compute power, highly efficient programs are mandatory.
This thesis is about the development of fast algorithms and their implementations on modern CPUs and GPUs, about the maximum achievable efficiency with respect to peak performance and to power consumption respectively, and about feasibility and limits of programs for CPUs, GPUs, and heterogeneous systems. Three totally different applications from distinct fields, which were developed in the extent of this thesis, are presented.
The ALICE experiment at the LHC particle collider at CERN studies heavy-ion collisions at high rates of several hundred Hz, while every collision produces thousands of particles, whose trajectories must be reconstructed. For this purpose, ALICE track reconstruction and ALICE track merging have been adapted for GPUs and deployed on 64 GPU-enabled compute-nodes at CERN.
After a testing phase, the tracker ran in nonstop operation during 2012 providing full real-time track reconstruction. The tracker employs a multithreaded pipeline as well as asynchronous data transfer to ensure continuous GPU utilization and outperforms the fastest available CPUs by about a factor three.
The Linpack benchmark is the standard tool for ranking compute clusters. It solves a dense system of linear equations using primarily matrix multiplication facilitated by a routine called DGEMM. A heterogeneous GPU-enabled version of DGEMM and Linpack has been developed, which can utilize the CAL, CUDA, and OpenCL APIs as backend. Employing this implementation, the LOEWE-CSC cluster ranked place 22 in the November 2010 Top500 list of the fastest supercomputers, and the Sanam cluster achieved the second place in the November 2012 Green500 list of the most power efficient supercomputers. An elaborate lookahead algorithm, a pipeline, and asynchronous data transfer hide the serial CPU-bound tasks of Linpack behind DGEMM execution on the GPU reaching the highest efficiency on GPU-accelerated clusters.
Failure erasure codes enable failure tolerant storage of data and real-time failover, ensuring that in case of a hardware defect servers and even complete data centers remain operational. It is an absolute necessity for present-day computer infrastructure. The mathematical theory behind the codes involves matrix-computations in finite fields, which are not natively supported by modern processors and hence computationally very expensive. This thesis presents a novel scheme for fast encoding matrix generation and demonstrates a fast implementation for the encoding itself, which uses exclusively either integer or logical vector instructions. Depending on the scenario, it is always hitting different hard limits of the hardware: either the maximum attainable memory bandwidth, or the peak instruction throughput, or the PCI Express bandwidth limit when GPUs or FPGAs are used.
The thesis demonstrates that in most cases with respect to the available peak performance, GPU implementations can be as efficient as their CPU counterparts.
With respect to costs or power consumption, they are much more efficient. For this purpose, complex tasks must be split in serial as well as parallel parts and the execution must be pipelined such that the CPU bound tasks are hidden behind GPU execution. Few cases are identified where this is not possible due to PCI Express limitations or not reasonable because practical GPU languages are missing.
Die zentralen Objekte der Dissertation sind Translationsflächen. Dabei handelt es sich um Riemann’sche Flächen, die aus in die euklidische Ebene eingebetteten Polygonen durch Verkleben von parallelen gleichlangen Seiten entstehen. Zwei Translationsflächen sind gleich, wenn es möglich ist, die Polygone durch ”Zerschneiden und mittels Translationen neu Zusammenkleben“ ineinander zu überführen. Die Gruppe GL_2(R) operiert auf der Menge der Translationsflächen via der linearen Abbildungen auf den Polygonen. Der Stabilisator einer Translationsfläche X unter dieser Operation wird die Veech-Gruppe von X genannt und mit SL(X) bezeichnet. Die Veech-Gruppe ist eine diskrete Untergruppe von SL_2(R) und damit eine Fuchs’sche Gruppe.
Fuchs’sche Gruppen werden je nach ihrer Limesmenge in elementare und nicht-elementare Gruppen eingeteilt. Letztere wiederum unterteilt man in Gruppen erster oder zweiter Art. Fuchs’sche Gruppen mit endlichem co-Volumen heißen Gitter und sind genau die endlich erzeugten Gruppen erster Art. Translationsflächen, deren Veech-Gruppe ein Gitter ist, heißen Veech-Flächen und sind von besonderem Interesse, da für sie die Veech Alternative gilt.
Ein feineres Maß für die Größe einer Fuchs’schen Gruppe ist der kritische Exponent. Er ist definiert als das Infimum aller reellen Zahlen, für die die Poincaré Reihe konvergiert und liegt für alle unendlichen Fuchs’schen Gruppen zwischen 0 und 1. Hauptziel der Dissertation ist der Beweis von Theorem 1. Es gibt Translationsflächen, für die der kritische Exponent ihrer Veech-Gruppe echt zwischen 1/2 und 1 liegt.
Der kritische Exponent von elementaren Gruppen ist höchstens 1/2, Translationsflächen mit elementaren Veech-Gruppen sind also als Kandidaten für das Theorem ausgeschlossen. Der kritische Exponent von Gittern ist 1. Also scheiden auch Veech-Flächen für das Theorem aus.
Bis zum Jahr 2003 waren Gitter die einzigen bekannten nicht-elementaren Veech-Gruppen. McMullen klassifizierte die Veech-Flächen vom Geschlecht 2 und zeigte, dass jede solche Fläche, die nur eine Singularität besitzt, in der GL_2(R)-Bahn der Fläche L_D liegt, die aus einem L-förmigen Polygon mit geeigneten von D abhängigen Seitenlängen entsteht.
Während auch heute noch keine Translationsfläche mit Veech-Gruppe zweiter Art bekannt ist, fanden McMullen und unabhängig davon Hubert und Schmidt Konstruktionen unendlich erzeugter Veech-Gruppen erster Art. Eine Abschätzung des kritischen Exponenten dieser Gruppen war 10 Jahre lang eine wichtige offene Frage, die nun durch Theorem 1 beantwortet wird.
Zentral in der Konstruktion von Hubert und Schmidt sind spezielle Punkte, nämlich Verbindungspunkte. Hubert und Schmidt konstruieren Translationsflächen, deren Veech-Gruppen kommensurabel zum Stabilisator SL(X;P) von P sind und damit den gleichen kritischen Exponenten haben. Für Verbindungspunkte mit unendlicher SL(X)- Bahn (diese Punkte heißen nicht-periodisch) ist SL(X;P) unendlich erzeugt und von erster Art.
Wir zeigen Theorem 1, indem wir zeigen, dass für jedes D kongruent 0 mod 4, (kein Quadrat), und jeden nicht-periodischen Verbindungspunkt P in L_D der kritische Exponent der Gruppe SL(L_D;P) echt zwischen 1/2 und 1 liegt.
Eine natürliche Frage in diesem Zusammenhang ist die Abhängigkeit von P: Punkte Q in der SL(L_D)-Bahn von P sind auch er nicht-periodische Verbindungspunkte und die zugehö̈rigen Gruppen SL(L_D;P) und SL(L_D;Q) sind konjugiert zueinander. Daher widmen wir uns in Kapitel 4 der Bestimmung der Bahnen nicht-periodischer Verbindungspunkte.
Die Verbindungspunkte haben die Form P=(x_r+x_iw;y_r+y_iw) mit x_r,x_i,y_r,y_i aus Q. Wir zeigen, dass der Hauptnenner N(P) dieser (gekürzten) Brüche eine Invariante der Bahn ist. Daraus folgt:
Theorem 2. Es gibt unendlich viele verschiedene Bahnen von Verbindungspunkten von L_D.
Wir kennen die Operation der horizontalen und der vertikalen Scherungen A und B aus SL(L_D). Im Spezialfall D=8 erzeugen diese beiden Elemente die ganze Gruppe und wir geben je ein Verfahren an, um eine untere und eine obere Schranke an die Anzahl der Bahnen von nicht-periodischen Verbindungspunkten P mit fixiertem Hauptnenner N(P) zu finden. Damit zeigen wir:
Theorem 3. Die Menge der Verbindungspunkte P mit festem Wert N(P) zerfällt in eine endliche Anzahl von SL(L_8)-Bahnen.
Im Beweis von Theorem 1 ist es nötig, die Nicht-Mittelbarkeit eines Graphen zu zeigen. Da wir nur sehr wenige Informationen über dessen Struktur in unserer konkreten Situation haben, entwickeln wir in Kapitel 1 die folgende Methode:
Theorem 4. Sei G ein Graph, den man durch Weglassen von Kanten in einen Wald G′ ohne Blätter überführen kann, bei dem das Supremum der Längen von zusammenhängenden Valenz-2-Teilgraphen von G′ beschränkt ist. Dann ist G nicht mittelbar.
Um diese Methode anzuwenden, ordnen wir jeder Ecke P von G ein Komplexitätsmaß s(P) zu und weisen nach, dass dieser Wert für die Operation von Worten in A- und B-Potenzen mit wachsender Wortlänge ”tendenziell wächst“.
Die vorliegende Dissertation behandelt die Entwicklung eines Verkehrssimulationssystems, welches vollautomatisch aus Landkarten Simulationsgraphen erstellen kann. Der Fokus liegt bei urbanen Simulationsstudien in beliebigen Gemeinden und Städten. Das zweite fundamentale Standbein dieser Arbeit ist daher die Konstruktion von Verkehrsmodellen, die die wichtigsten Verkehrsteilnehmertypen im urbanen Bereich abbilden. Es wurden Modelle für Autos, Fahrräder und Fußgänger entwickelt.
Die Betrachtung des Stands der Forschung in diesem Bereich hat ergeben, dass die Verknüpfung von automatischer Grapherstellung und Modellen, die die Wechselwirkungen der verschiedenen Verkehrsteilnehmertypen abbilden, von keinem vorhandenen System geleistet wird. Es gibt grundlegend zwei Gruppen von Verkehrssimulationssystemen. Zum Einen existieren Systeme, die hohe Genauigkeiten an Simulationsergebnissen erzielen und dafür exakte (teil-)manuelle Modellierung der Gegebenheiten im zu simulierenden Bereich benötigen. Es werden in diesem Bereich meist Verkehrsmodelle simuliert, die die Verhaltensweisen der Verkehrsteilnehmer sehr gut abbilden und hierfür einen hohen Berechnungsaufwand benötigen. Auf der anderen Seiten existieren Simulationssysteme, die Straßengraphen automatisch erstellen können, darauf jedoch sehr vereinfachte Verkehrsmodelle simulieren. Es werden meist nur Autobewegungen simuliert. Der Nutzen dieser Herangehensweise ist die Möglichkeit, sehr große Szenarien simulieren zu können.
Im Rahmen dieser Arbeit wird ein System mit Eigenschaften beider grundlegenden Ansätze entwickelt, um multimodalen innerstädtischen Verkehr auf Basis automatisch erstellter Straßengraphen simulieren zu können. Die Entwicklung eines neuen Verkehrssimulationssystems erschien notwendig, da sich zum Zeitpunkt der Literaturbetrachtung kein anderes vorhandenes System für die Nutzung zur Erfüllung der genannten Zielstellung eignete. Das im Rahmen dieser Arbeit entwickelte System heißt MAINSIM (MultimodAle INnerstädtische VerkehrsSIMulation).
Die Simulationsgraphen werden aus Kartenmaterial von OpenStreetMap extrahiert. Kartenmaterial wird zuerst in verschiedene logische Layer separiert und anschließend zur Bestimmung eines Graphen des Straßennetzes genutzt. Eine Gruppe von Analyseschritten behebt Ungenauigkeiten im Kartenmaterial und ergänzt Informationen, die während der Simulation benötigt werden (z.B. die Verbindungsrichtung zwischen zwei Straßen). Das System verwendet Geoinformationssystemkomponenten zur Verarbeitung der Geodaten. Dies birgt den Vorteil der einfachen Erweiterbarkeit um weitere Datenquellen.
Die Verkehrssimulation verwendet mikroskopische Verhaltensmodelle. Jeder einzelne Verkehrsteilnehmer wird somit simuliert. Das Modell für Autos basiert auf dem in der Verkehrsforschung weit genutzten Nagel-Schreckenberg-Modell. Es verfügt jedoch über zahlreiche Modifikationen und Erweiterungen, um das Modell auch abseits von Autobahnen nutzen zu können und weitere Verhaltensweisen zu modellieren. Das Fahrradmodell entsteht durch geeignete Parametrisierung aus dem Automodell. Zur Entwicklung des Fußgängermodells wurde Literatur über das Verhalten von Fußgängern diskutiert, um daraus geeignete Eigenschaften (z.B. Geschwindigkeiten und Straßenüberquerungsverhaltensmuster) abzuleiten. MAINSIM ermöglicht folglich die Betrachtung des Verkehrsgeschehens auch aus der Sicht der Gruppe der Fußgänger oder Fahrradfahrer und kann deren Auswirkungen auf den Straßenverkehr einer ganzen Stadt bestimmen.
Das Automodell wurde auf Autobahnszenarien und innerstädtischen Straßengraphen evaluiert. Es konnte die gut verstandenen Zusammenhänge zwischen Verkehrsdichte, -fluss und -geschwindigkeit reproduzieren. Zur Evaluierung von Fahrradmodellen liegen nach dem besten Wissen des Autors keine Studien vor. Daher wurden an dieser Stelle der Einfluss der Fahrradfahrer auf den Straßenverkehr und die von Fahrrädern gefahrenen Geschwindigkeiten untersucht. Das Fußgängermodell konnte die aus der Literaturbetrachtung ermittelten Verhaltensweisen abbilden.
Nachdem die wichtigsten Komponenten von MAINSIM untersucht wurden, begannen Fallstudien, die verschiedene Gebiete abdecken. Die wichtigsten Ergebnisse aus diesem Teil der Arbeit sind:
- Es ist möglich, mit Hilfe maschineller Lernverfahren Staus innerhalb Frankfurts vorherzusagen.
- Nonkonformismus bezüglich der Verkehrsregeln kann je nach Verhalten den Verkehrsfluss empfindlich beeinflussen, kann aber auch ohne Effekt bleiben.
- Mit Hilfe von Kommunikationstechniken könnte in der Zukunft die Routenplanung von Autos verbessert werden. Ein Verfahren auf Basis von Pheromonspuren wurde im Rahmen dieser Arbeit untersucht.
- MAINSIM eignet sich zur Simulation großer Szenarien. In der letzten Fallstudie dieser Arbeit wurde der Autoverkehr eines Simulationsgebietes um Frankfurt am Main herum mit ca. 1,6 Mio. Trips pro Tag simuliert. Da MAINSIM über ein Kraftstoffverbrauchs- und CO2-Emissionsmodell verfügt, konnten die CO2-Emissionen innerhalb von Frankfurt ermittelt werden. Eine angekoppelte Simulation des Wetters mit Hilfe einer atmosphärischen Simulation zeigte, wie sich die Gase innerhalb Frankfurts verteilen.
Für den professionellen Einsatz in der Verkehrsforschung muss das entwickelte Simulationssystem um eine Methode zur Kalibrierung auf Sensordaten im Simulationsgebiet erweitert werden. Die vorhandenen Ampelschaltungen bilden nicht reale Ampeln ab. Eine Erweiterung des Systems um die automatische Integrierung maschinell lesbarer Schaltpläne von Ampeln im Bereich des Simulationsgebietes würde die Ergebnisgüte weiter erhöhen.
MAINSIM hat mehrere Anwendungsgebiete. Es können sehr schnell Simulationsgebiete modelliert werden. Daher bietet sich die Nutzung für Vorabstudien an. Wenn große Szenarien simuliert werden müssen, um z.B. die Verteilung der CO2-Emissionen innerhalb einer Stadt zu ermitteln, kann MAINSIM genutzt werden. Es hat sich im Rahmen dieser Arbeit gezeigt, dass Fahrräder und Fußgänger einen Effekt auf die Mengen des Kraftstoffverbrauchs von Autos haben können. Es sollte bei derartigen Szenarien folglich ein Simulationssysytem genutzt werden, welches die relevanten Verkehrsteilnehmertypen abbilden kann. Zur Untersuchung weiterer wissenschaftlicher Fragestellungen kann MAINSIM beliebig erweitert werden.
Zeitreihen von spontan auftretenden Topograpfien elektrischer Felder an der Kopfoberfläche, die durch eine Elektroenzephalografie (EEG) gemessen werden, zeigen Zeiträume („EEG-Microstates“), während denen die Topografie quasi-stabil ist. Diese EEG-Microstates werden üblicherweise dadurch analysiert, dass die zu spezifischen Zeitpunkten beobachteten Ausprägungen des EEGs in eine kleine Anzahl von prototypischen Topografien („Karten“) eingeteilt werden. Dadurch erhält man eine diskrete Kartensequenz.
Um die Struktur der Übergangswahrscheinlichkeiten in experimentellen Kartensequenzen zu beschreiben, werden diese Sequenzen durch eine reduzierte Markov-Kette modelliert mit nur einem Parameter pro Karte. Die Markov-Ketten können mithilfe von zwei bestimmten stochastischen Prozessen konstruiert werden. Durch den einen Prozess werden zufällige Intervalle definiert, die zufällig den verschiedenen Karten zugeordnet werden. Durch den anderen Prozess werden zufällige Abtastungszeitpunkte bestimmt, zu denen die Karte des jeweils aktuellen Intervalls abgelesen wird.
Neben der Motivation und Vorstellung des Markov-Ketten-Modells werden in dieser Arbeit Schätzer für die Modellparameter vorgeschlagen und diskutiert sowie ihre asymptotischen Varianzen hergeleitet. Zudem wird ein Anpassungstest durchgeführt und es werden Abwandlungen des Modells untersucht.
Ultrarelativistic Quantum Molecular Dynamics is a physics model to describe the transport, collision, scattering, and decay of nuclear particles. The UrQMD framework has been in use for nearly 20 years since its first development. In this period computing aspects, the design of code, and the efficiency of computation have been minor points of interest. Nowadays an additional issue arises due to the fact that the run time of the framework does not diminish any more with new hardware generations.
The current development in computing hardware is mainly focused on parallelism. Especially in scientific applications a high order of parallelisation can be achieved due to the superposition principle. In this thesis it is shown how modern design criteria and algorithm redesign are applied to physics frameworks. The redesign with a special emphasise on many-core architectures allows for significant improvements of the execution speed.
The most time consuming part of UrQMD is a newly introduced relativistic hydrodynamic phase. The algorithm used to simulate the hydrodynamic evolution is the SHASTA. As the sequential form of SHASTA is successfully applied in various simulation frameworks for heavy ion collisions its possible parallelisation is analysed. Two different implementations of SHASTA are presented.
The first one is an improved sequential implementation. By applying a more concise design and evading unnecessary memory copies, the execution time could be reduced to the half of the FORTRAN version’s execution time. The usage of memory could be reduced by 80% compared to the memory needed in the original version.
The second implementation concentrates fully on the usage of many-core architectures and deviates significantly from the classical implementation. Contrary to the sequential implementation, it follows the recalculate instead of memory look-up paradigm. By this means the execution speed could be accelerated up to a factor of 460 on GPUs.
Additionally a stability analysis of the UrQMD model is presented. Applying metapro- gramming UrQMD is compiled and executed in a massively parallel setup. The resulting simulation data of all parallel UrQMD instances were hereafter gathered and analysed. Hence UrQMD could be proven of high stability to the uncertainty of experimental data.
As a further application of modern programming paradigms a prototypical implementa- tion of the worldline formalism is presented. This formalism allows for a direct calculation of Feynman integrals and constitutes therefore an interesting enhancement for the UrQMD model. Its massively parallel implementation on GPUs is examined.
Spin(9)-invariant valuations
(2013)
The first aim of this thesis is to give a Hadwiger-type theorem for the exceptional Lie group Spin(9). The space of Spin(9)-invariant k-homogeneous valuations is studied through the construction of an exact sequence involving some spaces of differential forms. We present then a description of the spin representation using the properties of the 8-dimensional division algebra of the octonions. Using this description as well as representation-theoretic formulas, we can compute the dimensions of the spaces of differential forms appearing in the exact sequence. Hence we obtain the dimensions of the spaces of k-homogeneous Spin(9)-invariant valuations for k=0,1,...,16.
In the second part of this work, we construct one new element for a basis of one of these spaces. It is clear, that the k-th intrinsic volume is also Spin(9)-invariant. The last chapter of this work presents the construction of a new 2-homogeneous Spin(9)-invariant valuation. On a Riemannian manifold (M,g), we construct a valuation by integrating the curvature tensor over the disc bundle. We associate to this valuation on M a family of valuations on the tangent spaces. We show that these valuations are even and homogeneous of degree 2. Moreover, since the valuation on M is invariant under the action of the isometry group of M, the induced valuation on the tangent space in a point p in M is invariant under the action of the stabilisator of p for all p in M. In the special case where M is the octonionic projective plane, this construction yields an even, homogeneous of degree 2, Spin(9)-invariant valuation, whose Klain function is not constant, i.e. which is linearly independent of the second intrinsic volume.
This thesis presents various algorithms which have been developed for on-line event reconstruction in the CBM experiment at GSI, Darmstadt and the ALICE experiment at CERN, Geneve. Despite the fact that the experiments are different — CBM is a fixed target experiment with forward geometry, while ALICE has a typical collider geometry — they share common aspects when reconstruction is concerned.
The thesis describes:
— general modifications to the Kalman filter method, which allows one to accelerate, to improve, and to simplify existing fit algorithms;
— developed algorithms for track fit in CBM and ALICE experiment, including a new method for track extrapolation in non-homogeneous magnetic field.
— developed algorithms for primary and secondary vertex fit in the both experiments. In particular, a new method of reconstruction of decayed particles is presented.
— developed parallel algorithm for the on-line tracking in the CBM experiment.
— developed parallel algorithm for the on-line tracking in High Level Trigger of the ALICE experiment.
— the realisation of the track finders on modern hardware, such as SIMD CPU registers and GPU accelerators.
All the presented methods have been developed by or with the direct participation of the author.
In der modernen Hochschullehre haben sich eLearning-Elemente als ein Teil des Lehrrepertoires etabliert. Der Einsatz interaktiver webbasierter Selbstlernmodule (Web Based Trainings (WBT)) ist dabei eine Option. Hochschulen und Unternehmen versprechen sich dadurch neue Möglichkeiten des Lehrens und Lernens, um z. B. einen Ausgleich heterogener Vorerfahrungen sowie eine stärkere aktive Beteiligung der Lernenden zu bewirken. Damit die Erstellung und Strukturierung dieser Inhalte mit möglichst geringem Aufwand erfolgen kann, bieten Autorensysteme Unterstützung.
Zu den Grundfunktionen von Autorensystemen gehören unter anderem, das Einbinden gebräuchlicher Medienformate, die einfache Erstellung von Fragen sowie verschiedene Auswertungs- und Feedbackmöglichkeiten. Obwohl Autorensysteme schon vor vielen Jahren ihre erste praktische Anwendung fanden, gibt es nach wie vor Schwachstellen, die sich auf den gesamten Erstellungsprozess wie auch auf einzelne Funktionen beziehen. Im Detail wird bemängelt, dass die Werkzeuge zu komplex und unflexibel sind. Darüber hinaus fehlt häufig eine zufriedenstellende Verknüpfung der vielen Werkzeuge entlang der Prozesskette zu einer Gesamtlösung.
Des Weiteren wird die Konzentration auf die Produktionsphase kritisiert, wodurch andere wichtige Prozesse in den Hintergrund treten bzw. außer Acht gelassen werden.
Im Rahmen der Zusammenarbeit mit einem Automobilhersteller, für den die erste Version des Autorensystems LernBar weiterentwickelt wurde, spielte der Begriff „Lean Production“ inhaltlich in der Umsetzung der WBTs eine wesentliche Rolle. Die Lean Production, die über viele Jahre für die Automobilindustrie entwickelt, verbessert und angepasst wurde, liefert Optimierungsansätze für den Produktionsbereich. Ein wirtschaftlicher Nutzen des Lean-Ansatzes wird auch in anderen Bereichen gesehen wie z. B. in der Softwareentwicklung („Lean Software Development“) oder im Management („Lean Management“). Dabei bietet die Wertschöpfungsorientierung Lösungen für die widersprüchlichen Ziele mehr Leistungen zu geringeren Kosten, schneller und in höherer Qualität zugleich zu liefern. Aus der Grundidee der Lean Production entwickelte sich vorliegendes Dissertationsthema in Bezug darauf, inwiefern sich diese Prinzipien auf den WBT-Produktionsprozess übertragen lassen und die LernBar (das hierfür weiterentwickelnde Autorensystem) dabei Unterstützung bieten kann.
Zunächst wurde analysiert, welche Werkzeuge und Hilfestellungen benötigt werden, um unter dem Aspekt der Lean Production WBTs im universitären Umfeld erstellen zu können. In diesem Zusammenhang wurden Merkmale einer „Lean Media Production“ definiert sowie konzeptionell und technisch umgesetzt. Zur Verbesserung der Prozesse flossen Ergebnisse aus empirischer und praktischer Forschung ein. Im Vergleich zu anderen Entwicklungen bei denen häufig das Hauptziel eine umfangreiche Funktionalität ist, werden u.a. folgende übertragbare Ziele bei der Umsetzung verfolgt: Verschwendung vermeiden, eine starke Einbeziehung der Kunden, Werkzeuge die nahtlos ineinandergreifen, eine hohe Flexibilität und eine stetige Qualitätsverbesserung.
Zur Erreichung dieser Zielsetzungen wurden alle Prozesse kontinuierlich verbessert, sich auf das Wesentliche und die Wertschöpfung konzentriert sowie überflüssige Schritte eliminiert. Demnach ist unter dem Begriff „Lean Media Production“ ein skalierbarer, effizienter und effektiver Produktionsprozess zu verstehen, in dem alle Werkzeuge ineinandergreifen.
Die Realisierung der „Lean Media Production“ erfolgte anhand des Autorensystems LernBar, wobei die typischen Softwareentwicklungsphasen Entwurf, Implementierung und Evaluierung mehrfach durchlaufen wurden. Ausschlaggebend dabei war, dass der „Lean“-Aspekt berücksichtigt wurde und dies somit eine neue Vorgehensweise bei der Umsetzung eines Autorensystems darstellt. Im Verlauf der Entwicklungen ergaben sich, durch eine formative Evaluation, den Einsatz in Projekten und eine empirische Begleitforschung, neue Anforderungen an das System. Ein Vergleich der zwei Produktionssysteme, Automobil vs. WBT-Produktion, zeigt und bestätigt die Erwartung, dass nicht alle Prinzipien der Lean Production übertragbar sind.
Dennoch war diese Untersuchung notwendig, da sie Denkanstöße zur Entwicklung und Optimierung des Erstellungsprozesses eines WBTs gab. Auch die Ergebnisse der abschließenden Online-Befragung ergaben, dass die Ziele der Arbeit erreicht wurden, dass aber weiterer Optimierungsbedarf besteht. Die LernBar Release 3 bietet für alle Produktionsphasen Werkzeuge an, durch die eine effektive und effiziente Erstellung von WBTs von der Idee bis zur Distribution möglich ist.
Stand noch vor fünf Jahren zu Beginn dieser Arbeit das Endprodukt bei der LernBar Entwicklung im Vordergrund, verlagerte sich durch den Einfluss dieser Dissertation der Schwerpunkt auf den gesamten Produktionsprozess. Unter Berücksichtigung der in diesem Zusammenhang entwickelten Prinzipien einer „Lean Media Production“, nehmen bspw. die Wirtschaftlichkeit und die starke Kundenorientierung während des Produktionsprozesses einen wichtigen Stellenwert ein. Dieser Ansatz ist eine neue Vorgehensweise im Bereich der Entwicklung von Autorensystemen, der seine Anerkennung und Professionalität durch die Ergebnisse des selbstentwickelten Evaluationsbogens sowie dem stetig wachsenden Einsatz in Schulen, Hochschulen und Unternehmen belegen kann.
In weiteren Forschungsarbeiten ist zu untersuchen, welche Lean Production Prinzipien zu verwenden oder anzupassen sind, wenn z. B. in größeren Teams oder mobil produziert wird. Des Weiteren sollte überprüft werden, inwieweit die Lernenden mit dem Endprodukt zufrieden sind und in ihrem Lernprozess unterstützt werden. Durch diese Forschungsarbeit wurde ein Beitrag dazu geleistet, die Lehre und Ausbildung zu optimieren, indem die Autoren/Lehrende in der Erstellung ihrer digitalen Lerninhalte im gesamten Prozess von aufeinander abgestimmten Werkzeugen unterstützt werden.
The Project European Privacy Open Space (PrivacyOS) aims at bringing together industry, SMEs, Government, Academia and Civil Society to foster development and deployment of privacy infrastructures for Europe. The general objectives of PrivacyOS are to create a longterm collaboration in the thematic network and establish collective interfaces with other EU projects. Participants exchange research and best practices, as well as develop strategies and joint projects following four core policy goals: Awareness-rising, enabling privacy on the Web, fostering privacy-friendly Identity Management, and stipulating research.
...
This report focuses on the 3rd PrivacyOS conference, which was held in Vienna, October 26th and 27th 2009, co-located with the Austrian Big Brother Awards. 50 participants attended the conference and devised the agenda with 21 presentations in two parallel tracks. The topics of the presentations discussed included, amongst others: data protection awareness, data protection in healthcare, data protection in the Web 2.0, privacy-related technologies such as EnCoRe, TOR or Microformats as well as regulatory, cultural and sociological implications of data protection. Also at the 3rd PrivacyOS conference the software product “KiwiSecurity” was awarded the EuroPriSe Seal (European Privacy Seal, www.european-privacy-seal.eu). EuroPriSe is an initiative of the data protection authority Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein (ULD), Germany. It has been started as a European Project under the eTEN programme.