Terms are often ambiguous. The German word „Bank“ can denote a financial institution or a bench, and there is more than one city named Frankfurt. Nevertheless, humans can in many cases tell such meanings apart without difficulty. Computers are not yet able to perform this task with comparable accuracy.
The approach presented in this thesis builds on fastSense, which already achieves good results for German, and uses a neural network to disambiguate names and terms in English texts with the help of Wikipedia. An accuracy of up to 89.5% was achieved on test data.
The Python module developed here allows the trained model to be integrated into existing applications. The programs included in the module make it possible to train and test new models.
Due to the resurgence of data-hungry models (such as deep convolutional neural nets), there is an increasing demand for large-scale labeled datasets and benchmarks in the field of computer vision (CV). However, collecting real data across diverse scene contexts along with high-quality annotations is often expensive and time-consuming, especially for detailed pixel-level label prediction tasks such as semantic segmentation. To address the scarcity of real-world training sets, recent works have proposed the use of computer graphics (CG) generated data to train and/or characterize the performance of modern CV systems. CG-based virtual worlds provide easy access to ground truth annotations and control over scene states. Most of these works used training data simulated from video games and pre-designed virtual environments and demonstrated promising results. However, little effort has been devoted to the systematic generation of massive quantities of sufficiently complex synthetic scenes for training scene understanding algorithms. In this work, we develop a full pipeline for simulating large-scale datasets along with per-pixel ground truth information. Our simulation pipeline consists mainly of two components: (a) a stochastic scene generative model that automatically synthesizes traffic scene layouts by using marked point processes coupled with 3D CAD objects and factor potentials, and (b) an annotated-image rendering tool that renders the sampled 3D scene as an RGB image with a chosen rendering method, along with pixel-level annotations such as semantic labels, depth, and surface normals. This pipeline is capable of automatically generating and rendering a potentially infinite variety of outdoor traffic scenes that can be used to train convolutional neural nets (CNN).
However, several recent works, including our own initial experiments, demonstrated that CV models trained naively on simulated data lack the ability to generalize to real-world scenes. This raises several fundamental questions about what simulated data lacks compared to real data and how to use it effectively. Furthermore, there has been a long debate since the 1980s on the usefulness of CG-generated data for tuning CV systems. In particular, the impact of modeling errors and computational rendering approximations, which arise from various choices in the rendering pipeline, on the generalization performance of trained CV systems is still not clear. In this thesis, we take a case study in the context of traffic scenarios to empirically analyze the performance degradation that occurs when CV systems trained with virtual data are transferred to real data. We first explore system performance tradeoffs due to the choice of the rendering engine (e.g., Lambertian shader (LS), ray tracing (RT), and Monte Carlo path tracing (MCPT)) and its parameters. A CNN architecture, DeepLab, that performs semantic segmentation is chosen as the CV system being evaluated. In our case study involving traffic scenes, a CNN trained with CG data samples generated with photorealistic rendering methods (such as RT or MCPT) already shows reasonably good performance on real-world test data from the CityScapes benchmark. Using samples from an elementary rendering method, i.e., LS, degraded the performance of the CNN by nearly 20%. This result indicates that training data must be sufficiently photorealistic for the trained CNN models to generalize well. Furthermore, the use of physics-based MCPT rendering improved the performance by 6%, but at the cost of more than three times the rendering time.
When this MCPT-generated dataset is augmented with just 10% of the real-world training data from the CityScapes dataset, the performance achieved is comparable to that of training the CNN on the complete CityScapes dataset.
The next aspect we study in the thesis is the impact of the parameter settings of the scene generation model on the generalization performance of CNN models trained with the generated data. To this end, we first propose an algorithm to estimate the parameters of our scene generation model given an unlabeled real-world dataset from the target domain. This unsupervised tuning approach uses the concept of generative adversarial training, which aims at adapting the generative model by measuring the discrepancy between generated and real data in terms of their separability in the space of a deep discriminatively trained classifier. Our method involves an iterative estimation of the posterior density of prior distributions for the generative graphical model used in the simulation. Initially, we assume uniform distributions as priors over the parameters of a scene described by our generative graphical model. As iterations proceed, the uniform prior distributions are updated sequentially to distributions for the simulation model parameters that lead to simulated data with statistics that are closer to the distributions of the unlabeled target data.
...
Deep learning and isolation based security for intrusion detection and prevention in grid computing
(2018)
The use of distributed computational resources to solve scientific problems that require highly intensive data processing is a fundamental mechanism for modern scientific collaborations. The Worldwide Large Hadron Collider Computing Grid (WLCG) is one of the most important examples of a distributed infrastructure for scientific projects and one of the pioneering examples of grid computing. The WLCG is the global grid that analyzes data from the Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN), with 170 sites in 40 countries and more than 600,000 processing cores. The grid service providers grant users access to resources that they can use on demand to execute custom software applications for data analysis. The code that users can execute is completely flexible, and commonly there are no significant restrictions. This flexibility and the availability of immense computing power increase the security challenges of these environments. Attackers are a concern for grid administrators. These attackers may request the execution of software containing malicious code that allows them to compromise the underlying institutions’ infrastructure. Grid systems need security countermeasures that keep the user code running without allowing access to critical components, while still retaining flexibility. The administrators of grid systems also need to continuously monitor the activities that the applications are carrying out. An analysis of these activities is necessary to detect possible security issues, to identify ongoing incidents, and to perform autonomous responses. The size and complexity of grid systems make manual security monitoring and response expensive and complicated for human analysts.
Legacy intrusion detection and prevention systems (IDPS) such as Snort and OSSEC are traditionally used for security incident monitoring in the grid, cloud, clusters and standalone systems. However, IDPS are limited due to the use of hardcoded fixed rules that need to be updated continuously to cope with different threats.
This thesis introduces an architecture for improving security in grid computing. The architecture integrates security by isolation, behavior monitoring, and deep learning (DL) for the classification of real-time traces of the running user payloads, also known as grid jobs. The first component of the proposal, Linux containers (LCs), provides isolation between grid jobs and gathers specific traceable information about the behavior of individual jobs. LCs offer a safe environment for the execution of arbitrary user scripts or binaries, protecting the sensitive components of the grid member organizations. Containers are a software sandboxing technique and form a lightweight alternative to other technologies such as virtual machines (VMs), which usually implement full machine-level emulation and can therefore significantly affect performance. This performance loss is commonly unacceptable in high-throughput computing scenarios. Containers enable the collection of monitoring information from the processes running inside them. The data collected via LC monitoring is used to feed a DL-based IDPS.
DL methods can acquire knowledge from experience, which eliminates the need for operators to formally specify all the knowledge that a system requires. These methods can improve IDPS by building models that detect security incidents automatically and can generalize to new classes of issues. DL can produce lower false positive rates for intrusion detection, and it also provides a measure of false negatives, which can be improved with new training data. Convolutional neural networks (CNNs) are used to distinguish between regular and malicious job classes. A set of samples is collected from regular production grid jobs from the grid infrastructure of “A Large Ion Collider Experiment” (ALICE) and from malicious Linux binaries from a malware research website. The features extracted from these samples are used for the training and validation of the machine learning (ML) models. The use of a generative approach to enhance the required training data is also proposed. Recurrent neural networks (RNN) are used as generative models for the simulation of training data that complements and improves the real collected dataset. This data augmentation strategy is useful for compensating for the lack of training data in ML processes.
...
Biological signaling pathways form complex networks so that the cellular response can be regulated sensitively. Systems biology approaches are used to study biological systems by means of computational models. Besides capturing the logic of the regulation of the biological system, a mathematical model allows system-wide simulation of its dynamic behavior and analysis of its robustness and vulnerability.
The TNFR1-mediated signaling pathway regulates essential cellular processes such as inflammatory responses, proliferation, and cell death. TNFR1 is stimulated by the cytokine TNF-α and then promotes the formation of various macromolecular complexes that initiate different cellular responses, ranging from activation of the transcription factor NF-κB, which regulates the expression of proliferation-promoting genes, to two forms of cell death, apoptosis and necroptosis. The regulation of these different cellular responses is also referred to as a molecular switch. The exact molecular processes that modulate the cellular response have not yet been fully deciphered. Dysregulation of the pathway can cause chronic inflammation or promote tumor development.
In this thesis, we captured the latest research findings on the TNFR1 signaling pathway in a Petri net model for the first time, based on extensive interaction data from the literature, and analyzed the model. The manually curated model comprises the sequential processes of NF-κB activation, apoptosis, and necroptosis, and takes the influence of post-translational modifications into account.
Furthermore, analysis methods for signaling pathway models were developed that take the specific requirements of these biological systems into account and enable a biologically motivated network analysis. Manatee invariants identify steady-state signal flows in models that contain cycles and are formed as linear combinations of transition invariants. Ideally, these signal flows capture a process from receptor stimulation to the cellular response in a model of a signaling pathway. Determining all possible signal flows in signaling pathway models is a necessary prerequisite for further biologically motivated analyses, such as in silico knockout analysis. We also introduced a new concept for investigating in silico knockouts. The effects of in silico knockouts on individual complexes and processes of the pathway are represented in the in silico knockout matrix. We developed the software application isiKnock, which combines both concepts and supports systematic knockout analysis of Petri net models.
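As a rough illustration of the transition invariants mentioned above: a transition invariant is a non-negative, non-zero integer vector x with C·x = 0, where C is the place-by-transition incidence matrix of the Petri net. The sketch below checks this condition on a hypothetical four-transition toy net (not the TNFR1 model) and shows that the sum of two invariants is again an invariant, which is the kind of linear combination the abstract attributes to Manatee invariants. The function name and the toy net are assumptions made for illustration only.

```python
from typing import List

Matrix = List[List[int]]


def is_t_invariant(incidence: Matrix, x: List[int]) -> bool:
    """Return True if x is a non-zero, non-negative solution of C * x = 0."""
    if any(v < 0 for v in x) or not any(x):
        return False
    # Every place (row) must gain exactly as many tokens as it loses.
    return all(sum(c * v for c, v in zip(row, x)) == 0 for row in incidence)


# Hypothetical toy net. Rows are places p0..p2, columns are transitions t0..t3.
# t0: p0 -> p1, t1: p1 -> p0, t2: p1 -> p2, t3: p2 -> p1
C = [
    [-1, 1, 0, 0],   # p0
    [1, -1, -1, 1],  # p1
    [0, 0, 1, -1],   # p2
]

cycle_a = [1, 1, 0, 0]  # fire t0 and t1 once each
cycle_b = [0, 0, 1, 1]  # fire t2 and t3 once each
combined = [a + b for a, b in zip(cycle_a, cycle_b)]  # linear combination

print(is_t_invariant(C, cycle_a), is_t_invariant(C, combined))
```

Firing only t0, by contrast, moves a token from p0 to p1 without returning it, so `[1, 0, 0, 0]` fails the check.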
The Petri net model of the TNFR1 signaling pathway was checked for its elementary properties, and established analyses such as place invariants and transition invariants were carried out. The transition invariants could not describe complete biological signal flows in all cases. We also applied the newly introduced methods to the Petri net model. Using the Manatee invariants, we were able to identify the connected signal flows, classify them by their biological outcome, and examine the effects of the feedback loops. We showed that the survival response through activation of NF-κB occurs most frequently, followed by apoptosis and then necroptosis. The alternative signal flows in the form of Manatee invariants reflect the robustness of the biological system. We performed an extensive in silico knockout analysis based on the Manatee invariants to rank and group the proteins of the pathway according to their influence. The proteins of complex I had the greatest influence, led by receptor stimulation and RIP1. We examined and discussed the regulation of the molecular switch on the basis of the knockout analysis of selected proteins and their effect on important complexes in the model. We identified ubiquitination in complex I and NF-κB-dependent gene expression as the important control points of the TNFR1 signaling pathway. In complex II, the regulation of the activation of caspase activity is decisive.
The extensive network analysis based on Manatee invariants and systematic in silico knockout analysis verified the Petri net model and allowed the robustness and vulnerability of the system to be investigated. The newly developed methods enable a well-founded, biologically relevant investigation of in silico models of signaling pathways. The systems biology approach supports the elucidation of the regulation and function of the intertwined network of the TNFR1 signaling pathway.
Digital pathology is a new but steadily growing field in medicine. The continuous development of improved digital scanners now allows entire tissue sections to be scanned, and whole slide images are gaining importance. The aim of this thesis is to develop methods for analyzing whole slide images of classical Hodgkin lymphoma. Hodgkin lymphoma, or Hodgkin's disease, is a tumor disease of the lymphatic system in which the monoclonal tumor cells usually derive from B lymphocytes at a precursor stage.
Slightly more than 9,000 cases of Hodgkin lymphoma are diagnosed in the USA each year. Although the 5-year survival rate for Hodgkin lymphoma is comparatively high at 85.3%, about 1,100 deaths per year are still recorded in the USA. At the microscopic level, Hodgkin-Reed-Sternberg cells (HRS cells) are typical of classical Hodgkin lymphoma. HRS cells have one or more nuclei that are greatly enlarged and show a coarse chromatin structure. Immunohistologically, there are characteristic markers for HRS cells; for example, HRS cells are positive for the activation marker CD30.
In addition to conventional microscopy, scanners make it possible to digitize entire slides (whole slide images). Whole slide images have so far seen little use in routine diagnostics. A major advantage of digitized tissue sections lies in computer-aided analysis. Automated image analysis methods such as cell detection can support pathologists in diagnosis by providing comprehensive statistics on the number and distribution of immunostained cells.
The immunohistological images examined were provided by the Dr. Senckenbergisches Institut für Pathologie of the Universitätsklinikum Frankfurt. The tissue sections considered are immunostained against CD30, a membrane receptor expressed in HRS cells and activated lymphocytes. The tissue sections were digitized with an Aperio ScanScope slide scanner and are available at a high resolution of 0.25 μm per pixel. For the tissue section sizes at hand, this yields images of up to 90,000 × 90,000 pixels.
The image data set examined comprises 35 images of lymph node tissue sections from three conditions: mixed cellularity classical Hodgkin lymphoma, nodular classical Hodgkin lymphoma, and lymphadenitis. The image processing pipeline was partly newly implemented and partly built on established image recognition software and libraries such as CellProfiler and Java Advanced Imaging. CD30-positive cell objects are detected automatically in the tissue sections, and, in addition to the global position in the whole slide image, further morphological descriptors are computed, such as area, Feret diameter, eccentricity, and solidity. Cell detection shows a high precision of 84% and a very good sensitivity of 95%.
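For readers unfamiliar with the two detection metrics quoted above, the following minimal sketch shows how precision and sensitivity are conventionally computed from counts of true positives, false positives, and false negatives. The counts used here are made-up toy numbers, not data from the thesis, and the function names are illustrative.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of detected objects that are real cells: TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real cells that were detected: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


# Hypothetical toy counts from comparing detections with a manual annotation.
tp, fp, fn = 8, 2, 1
print(f"precision   = {precision(tp, fp):.2f}")
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
```

A detector with high sensitivity but lower precision, as reported in the abstract, misses few cells but produces some spurious detections.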
It was shown that, on average, considerably fewer CD30-positive cells are present in lymphadenitis cases than in classical Hodgkin lymphoma. While only about 3,000 cells were found on average in lymphadenitis, the average for mixed cellularity classical Hodgkin lymphoma was about 19,000 CD30-positive cells. While the CD30-positive cells are distributed relatively evenly in lymphadenitis cases, in classical Hodgkin lymphoma cases they form cell clusters of higher density.
The computed morphological descriptors make it possible to describe the tissue sections and the course of the disease in more detail. Moreover, the size and appearance of HRS cells have so far mainly been determined from manually selected cells. One measure of cell extent is the maximum Feret diameter. For CD30 cells in classical Hodgkin lymphoma it averages 20 μm and is thus considerably larger than the average of 15 μm measured in lymphadenitis.
A graph-theoretical approach was chosen to model the CD30-positive cells and their spatial neighborhood. In CD30 cell graphs of classical Hodgkin lymphoma tissue sections, the average node degree is strongly increased compared with those of lymphadenitis images. Comparison with random graphs shows that the observed node degree distributions do not indicate a random distribution of the cells in the tissue section. Properties and distribution of communities in CD30 cell graphs can additionally be used to characterize classical Hodgkin lymphoma tissue sections in more detail.
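The cell-graph idea can be sketched in a few lines. The abstract does not state how edges are defined, so the example below assumes one simple option: two cells are connected if their centroids lie within a fixed distance of each other. The coordinates, the threshold, and the function names are hypothetical.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


def cell_graph(positions: List[Point], max_distance: float) -> Dict[int, Set[int]]:
    """Connect every pair of cells whose centroids are at most max_distance apart."""
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(positions))}
    for i, j in combinations(range(len(positions)), 2):
        if dist(positions[i], positions[j]) <= max_distance:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def mean_degree(graph: Dict[int, Set[int]]) -> float:
    """Average number of neighbours per cell."""
    return sum(len(neighbours) for neighbours in graph.values()) / len(graph)


# Hypothetical centroids: one tight cluster and one scattered arrangement.
clustered = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
scattered = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]

print(mean_degree(cell_graph(clustered, 1.5)))
print(mean_degree(cell_graph(scattered, 1.5)))
```

Under this assumed edge rule, dense clusters of cells yield a high mean degree and evenly spread cells a low one, which matches the qualitative contrast the abstract describes.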
This thesis shows that the evaluation of whole slide images can support and improve diagnosis. The more than 400,000 automatically detected CD30-positive cell objects were described morphologically, and together with their position in the tissue section this allows important properties of classical Hodgkin lymphoma to be examined. Cell graphs can be extended with further cell types and applied to other conditions.
Precise timing of spikes between different neurons has been found to convey reliable information beyond the spike count. In contrast, the role of small phase delays with high temporal variability, as reported for example in oscillatory activity in the visual cortex, remains largely unclear. This issue becomes particularly important considering the high speed of neuronal information processing, which is assumed to rely on only a few milliseconds, or oscillation cycles, within each processing step.
We investigate the role of small and imprecise phase delays with a stochastic spiking model that is strongly motivated by experimental observations. Within individual oscillation cycles the model contains only two signal parameters describing directly the rate and the phase. We specifically investigate two quantities, the probability of correct stimulus detection and the probability of correct change point detection, as a function of these signal parameters and within short periods of time such as individual oscillation cycles.
Optimal combinations of the signal parameters are derived that maximize these probabilities and enable comparison of pure rate, pure phase and combined codes. In particular, the gain in detection probability when adding imprecise phases to pure rate coding increases with the number of stimuli. More interestingly, imprecise phase delays can considerably improve the process of detecting changes in the stimulus, while also decreasing the probability of false alarms and thus, increasing robustness and speed of change point detection.
The results are applied to parameters extracted from empirical spike train recordings of neurons in the visual cortex in response to a number of visual stimuli. The results suggest that near-optimal combinations of rate and phase parameters can be implemented in the brain, and that phase parameters could particularly increase the quality of change point detection in cases of highly similar stimuli.
This thesis is about random Constraint Satisfaction Problems (rCSP). These are random instances of classical problems in NP. In the literature, the study of rCSP involves identifying and locating phase transition phenomena as well as investigating algorithmic questions.
Recently, some ingenious but mathematically non-rigorous theories from statistical physics have given the study of rCSP a new perspective; the so-called Cavity Method makes some very impressive predictions about the most fundamental properties of rCSP.
In this thesis, we investigate the soundness of some of the most basic predictions of the Cavity Method, mainly regarding the structure of the so-called Gibbs distribution on various rCSP models. Furthermore, we study some fundamental algorithmic problems related to rCSP. This includes both analysing well-known dynamical processes (dynamics) such as Glauber Dynamics and the Metropolis Process, and proposing new algorithmic approaches to some natural problems related to rCSP.
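To make the Glauber Dynamics mentioned above concrete, here is a minimal sketch for one standard constraint satisfaction problem, proper graph colouring. In each step a vertex is chosen uniformly at random and its colour is resampled uniformly from the colours not used by its neighbours. The tiny fixed graph, the number of colours, and the function names are illustrative assumptions; the thesis studies random instances, and this sketch says nothing about mixing time.

```python
import random
from typing import Dict, List


def glauber_step(graph: Dict[int, List[int]], coloring: Dict[int, int],
                 num_colors: int, rng: random.Random) -> None:
    """Resample the colour of one random vertex from its currently allowed colours."""
    vertex = rng.choice(sorted(graph))
    forbidden = {coloring[u] for u in graph[vertex]}
    allowed = [c for c in range(num_colors) if c not in forbidden]
    if allowed:  # always non-empty when num_colors exceeds the maximum degree
        coloring[vertex] = rng.choice(allowed)


def is_proper(graph: Dict[int, List[int]], coloring: Dict[int, int]) -> bool:
    """Check that no edge joins two vertices of the same colour."""
    return all(coloring[u] != coloring[v] for u in graph for v in graph[u])


# Hypothetical instance: a 4-cycle with 4 colours, started from a proper colouring.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
coloring = {0: 0, 1: 1, 2: 0, 3: 1}
rng = random.Random(0)
for _ in range(1000):
    glauber_step(cycle, coloring, num_colors=4, rng=rng)

print(coloring, is_proper(cycle, coloring))
```

Because each update only ever picks a colour absent from the neighbourhood, the chain started from a proper colouring stays within the set of proper colourings.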
We live in an age of data ubiquity. Even the most conservative estimates predict exponential growth in produced, transmitted, and stored data. Big data is used to power business analytics as well as to foster scientific discoveries. In many cases, the explosion of produced data exceeds the capabilities of digital storage systems. Scientific high-performance computing environments cope with this problem by using large, distributed storage systems. These complex systems can only provide a high degree of reliability and durability by means of data redundancy. The most straightforward way of doing that is to replicate the data over different physical devices. However, more elaborate approaches, such as erasure coding, can provide similar data protection while using less storage. Recently, software-defined reliability methods have begun to replace traditional, hardware-based solutions. Complicated failure modes of storage system components also warrant checksums to guarantee long-term data integrity. To cope with ever-increasing data volumes, a flexible and efficient software implementation of error correction codes is of great importance. This thesis introduces a method for realizing a flexible Reed-Solomon erasure code using the “Just-In-Time” compilation technique. By exploiting intrinsic arithmetic redundancy in the algorithm, and by relying on modern optimizing compilers, we obtain a throughput-efficient erasure code implementation. Additionally, data parallelism is exploited effortlessly by instructing the compiler to produce SIMD code for the desired execution platform. We show results of codes implemented using the SSE and AVX2 SIMD instruction sets for x86, and the NEON instruction set for ARM platforms. Next, we introduce a framework for efficient vectorized RAID-Z redundancy operations of the ZFS file system.
Traditional, table-based Galois field multiplication algorithms are replaced with custom SSE and AVX2 parallel methods, providing significantly faster and more efficient parity operations. The implementation of this framework has been publicly available as part of the ZFS on Linux project since version 0.7. Finally, we propose a new erasure scheme for use with existing high-performance parallel filesystems. The described reliability middleware (ECCFS) allows the definition of flexible, file-based reliability policies that adapt to customized user needs. By using the block erasure code, ECCFS achieves optimal storage, computation, and network resource utilization while providing a high level of reliability. The distributed nature of the middleware allows greater scalability and more efficient utilization of storage and network resources, improving the availability of the system.
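The Galois field arithmetic behind such parity operations can be illustrated with a scalar sketch. It implements bitwise multiplication in GF(2^8) and a two-parity (P and Q) erasure code in the style of RAID-6, then rebuilds two lost data bytes. This is not the JIT-compiled or SIMD implementation from the thesis; the reduction polynomial 0x11D, the generator 2, and all function names are assumptions chosen because they are common in RAID-6-style codes.

```python
from typing import List, Optional, Tuple


def gf_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two bytes in GF(2^8) by shift-and-add with modular reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly  # reduce back into 8 bits
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse by brute force (fine for a 256-element field)."""
    for candidate in range(1, 256):
        if gf_mul(a, candidate) == 1:
            return candidate
    raise ZeroDivisionError("0 has no inverse in GF(2^8)")


def gf_pow2(exponent: int) -> int:
    """Return 2**exponent in GF(2^8)."""
    value = 1
    for _ in range(exponent):
        value = gf_mul(value, 2)
    return value


def parity(data: List[int]) -> Tuple[int, int]:
    """P is the XOR of all bytes; Q is the sum of 2**i * data[i] in GF(2^8)."""
    p = q = 0
    for i, byte in enumerate(data):
        p ^= byte
        q ^= gf_mul(gf_pow2(i), byte)
    return p, q


def recover_two(blocks: List[Optional[int]], p: int, q: int) -> List[int]:
    """Rebuild exactly two missing data bytes (marked None) from P and Q."""
    i, j = [k for k, byte in enumerate(blocks) if byte is None]
    for k, byte in enumerate(blocks):
        if byte is not None:
            p ^= byte                      # leaves d_i ^ d_j
            q ^= gf_mul(gf_pow2(k), byte)  # leaves g^i*d_i ^ g^j*d_j
    gi, gj = gf_pow2(i), gf_pow2(j)
    dj = gf_mul(q ^ gf_mul(gi, p), gf_inv(gi ^ gj))
    restored = list(blocks)
    restored[i], restored[j] = p ^ dj, dj
    return restored


data = [0x12, 0x34, 0x56, 0x78]
p, q = parity(data)
print(recover_two([0x12, None, 0x56, None], p, q) == data)
```

A table-based variant would precompute logarithm and exponent tables for `gf_mul`; vectorized implementations instead process many bytes per instruction, which is the direction the abstract describes.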
Antimicrobial resistance has become a serious threat to public health worldwide in this century. A better understanding of the mechanisms by which bacteria infect host cells, and of how the host counteracts the invading pathogens, is an important subject of current research. Intracellular bacteria of the Salmonella genus have frequently been used as a model system for bacterial infections. Salmonella are ingested with contaminated food or water and cause gastroenteritis and typhoid fever in animals and humans. Once inside the gastrointestinal tract, Salmonella can invade intestinal epithelial cells. The host cell can fight intracellular pathogens by a process called xenophagy. For complex systems, such as the processes involved in the bacterial infection of cells, computational systems biology provides approaches to describe mathematically how these intertwined mechanisms in the cell function. Computational systems biology allows the analysis of biological systems at different levels of abstraction. Functional dependencies as well as dynamic behavior can be studied. In this thesis, we used the Petri net formalism to gain better insight into bacterial infections and host defense mechanisms and to predict cellular behavior that can be tested experimentally. We also focused on the development of new computational methods.
In this work, the first mathematical model of the xenophagic capturing of Salmonella enterica serovar Typhimurium in epithelial cells was developed. The mathematical model, expressed in the Petri net formalism, was constructed through iterative modeling and analysis. For model verification, we analyzed the Petri net, including computationally performed knockout experiments, termed in silico knockouts, a method established in this work. The in silico knockouts of the proposed Petri net are consistent with the published experimental perturbation studies and thus support the biological credibility of the Petri net. In silico knockouts that have not yet been investigated experimentally provide hypotheses for future investigations of the pathway.
To study the dynamic behavior of an epithelial cell infected with Salmonella enterica serovar Typhimurium, a stochastic Petri net was constructed. In experimental research, a decision like "Which incubation time is needed to infect half of the epithelial cells with Salmonella?" is based on experience or practicability. A mathematical model can help to answer these questions and improve experimental design. The stochastic Petri net models the cell at different stages of the Salmonella infection. We parameterized the model by a set of experimental data derived from different literature sources. The kinetic parameters of the stochastic Petri net determine the time evolution of the bacterial infection of a cell. The model captures the stochastic variation and heterogeneity of the intracellular Salmonella population of a single cell over time. The stochastic Petri net is a valuable tool to examine the dynamics of Salmonella infections in epithelial cells and generate valuable information for experimental design.
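The time evolution of a stochastic Petri net is commonly simulated with a Gillespie-style algorithm: each enabled transition fires after an exponentially distributed waiting time whose rate depends on a kinetic constant and the current marking. The sketch below assumes mass-action rates and transitions that consume one token from a single input place. The one-transition "invasion" net, its rate constant, and the place names are hypothetical and are not taken from the thesis.

```python
import random
from typing import Dict, List, Tuple

# (rate constant, input place, tokens produced per firing)
Transition = Tuple[float, str, Dict[str, int]]


def simulate(marking: Dict[str, int], transitions: List[Transition],
             t_end: float, rng: random.Random) -> Dict[str, int]:
    """Run one stochastic trajectory until t_end or until nothing can fire."""
    marking = dict(marking)
    t = 0.0
    while True:
        # Mass-action propensity: rate constant times tokens on the input place.
        propensities = [rate * marking[src] for rate, src, _ in transitions]
        total = sum(propensities)
        if total == 0:
            break  # no transition is enabled
        t += rng.expovariate(total)  # waiting time to the next firing
        if t > t_end:
            break
        index = rng.choices(range(len(transitions)), weights=propensities)[0]
        _, src, produced = transitions[index]
        marking[src] -= 1
        for place, count in produced.items():
            marking[place] += count
    return marking


# Hypothetical net: bacteria outside the cell invade it one at a time.
transitions = [(0.5, "outside", {"inside": 1})]
start = {"outside": 10, "inside": 0}
final = simulate(start, transitions, t_end=1000.0, rng=random.Random(1))
print(final)
```

Repeating such runs with different seeds and shorter time horizons gives a distribution of intracellular counts over time, which is the kind of output that could inform a question like the incubation-time example in the abstract.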
In the last part of this thesis, a novel theoretical method was introduced to perform knockout experiments in silico. The new concept of in silico knockouts is based on the computation of signal flows at steady state and allows the determination of knockout behavior that is comparable to experimental perturbation behavior. In this context, we established the concept of Manatee invariants and demonstrated their suitability for in silico knockouts by showing that they reflect biological dependencies from signal initiation to the response. As a proof of principle, we applied the proposed concept of in silico knockouts to the Petri net of the xenophagic recognition of Salmonella. To make in silico knockouts available to the scientific community, we implemented the novel method in the software isiKnock. isiKnock allows the automated execution and visualization of in silico knockouts in signaling pathways expressed in the Petri net formalism. In conclusion, the knockout analysis provides a valuable method to verify computational models of signaling pathways, to detect inconsistencies in the current knowledge of a pathway, and to predict unknown pathway behavior.
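One simplified reading of a knockout analysis based on signal flows is sketched below. Each signal flow is represented as the set of net elements it uses; knocking out an element disables every flow that contains it, and a readout counts as still reachable if at least one flow containing it survives. This rule, the element names, and the function name are assumptions for illustration and do not reproduce the isiKnock implementation.

```python
from typing import Dict, FrozenSet, List


def knockout_matrix(flows: List[FrozenSet[str]], knockouts: List[str],
                    readouts: List[str]) -> Dict[str, Dict[str, bool]]:
    """For each knockout, report whether each readout lies on a surviving flow."""
    matrix: Dict[str, Dict[str, bool]] = {}
    for knocked in knockouts:
        surviving = [flow for flow in flows if knocked not in flow]
        matrix[knocked] = {
            readout: any(readout in flow for flow in surviving)
            for readout in readouts
        }
    return matrix


# Hypothetical signal flows from a receptor to two responses.
flows = [
    frozenset({"receptor", "adaptor_a", "response_x"}),
    frozenset({"receptor", "adaptor_b", "response_x"}),
    frozenset({"receptor", "adaptor_b", "response_y"}),
]
matrix = knockout_matrix(
    flows,
    knockouts=["receptor", "adaptor_a", "adaptor_b"],
    readouts=["response_x", "response_y"],
)
for knocked, row in matrix.items():
    print(knocked, row)
```

In this toy example, removing the receptor abolishes both responses, removing `adaptor_a` is compensated by the alternative flow, and removing `adaptor_b` abolishes only `response_y`.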
In summary, the main contributions of this thesis are the Petri net of the xenophagic capturing of Salmonella enterica serovar Typhimurium in epithelial cells, used to study knockout behavior, and the stochastic Petri net of an epithelial cell infected with Salmonella enterica serovar Typhimurium, used to analyze the infection dynamics. Moreover, we established a new method for in silico knockouts, including the concept of Manatee invariants and the software isiKnock. The results of these studies contribute to a better understanding of bacterial infections and provide valuable model analysis techniques for the field of computational systems biology.