004 Datenverarbeitung; Informatik
Refine
Year of publication
Document Type
- Working Paper (119)
- Article (72)
- Doctoral Thesis (71)
- Diploma Thesis (46)
- Bachelor Thesis (34)
- Conference Proceeding (33)
- diplomthesis (28)
- Preprint (19)
- Report (12)
- Part of a Book (11)
Has Fulltext
- yes (470)
Is part of the Bibliography
- no (470)
Keywords
Institute
- Informatik (470) (remove)
Augmented Reality ist eine Technologie, mit der die Wahrnehmung der realen Umgebung durch computergenerierte Sinnesreize verändert bzw. erweitert wird. Zur Erweiterung dieser „angereicherten Realität“ werden virtuelle Informationen wie z.B. 3D-Objekte, Grafiken und Videos in Echtzeit in Abbildern der realen Umgebung dargestellt. Die Erweiterungen helfen dem Anwender Aufgaben in der Realität auszuführen, da sie ihm Informationen bereitstellen, die er – ohne AR – nicht unmittelbar wahrnehmen könnte. Die Zielsetzung ist, dem Benutzer den Eindruck zu vermitteln, dass die reale Umgebung und die virtuellen Objekte koexistent miteinander verschmelzen. Für AR-Anwendungen existieren zahlreiche potenzielle Einsatzgebiete, doch verhindern bisher einige Probleme die Verbreitung dieser Technologie. Einer breiten Nutzung von AR-Anwendungen steht beispielsweise die Problematik gegenüber, dass deren Erstellung hohe programmiertechnische Anforderungen an die Entwickler stellt. Zur Verminderung dieser Probleme ist es wünschenswert Benutzern ohne Programmierkenntnisse (Autoren) die Entwicklung von AR-Anwendungen zu ermöglichen. Zum anderen bestehen technologische Probleme bei den für die Registrierung der virtuellen Objekte essenziellen Trackingverfahren. Weiterhin weisen die bisherigen AR-Anwendungen im Allgemeinen und die mittels autorenorientierter Systeme erstellten AR-Applikationen im Besonderen Defizite bezüglich der Authentizität der Darstellungen auf. Dabei sind hauptsächlich inkorrekte Verdeckungen und unrealistische Schatten bei den virtuellen Objekten verantwortlich für den Verlust des Koexistenzeindrucks. In dieser Arbeit wird unter Berücksichtigung der Trackingprobleme und auf Basis von Analysen, die die wichtigsten Authentizitätskriterien bestimmen, ein Konzept zur authentischen Integration von virtuellen Objekten in AR-Anwendungen erarbeitet und dargelegt. Auf diesem Integrationsprozess basierend werden Konzepte für Werkzeuge mit grafischen Benutzungsschnittstellen abgeleitet, mit denen Autoren die Erstellung von AR-Anwendungen mit hoher Darstellungsauthentizität ermöglicht wird. Einerseits verfügen die mit diesen Werkzeugen erstellten AR-Anwendungen über eine verbesserte Registrierung der virtuellen Objekte. Andererseits stellen die Werkzeuge Lösungen bereit, damit die virtuellen Objekte der AR-Anwendungen korrekte Verdeckungen aufweisen und über Schatten und Schattierungseffekte verfügen, die mit der tatsächlichen Beleuchtungssituation der realen Umgebung übereinstimmen. Sämtliche dieser Autorenwerkzeuge basieren auf einem in dieser Arbeit dargelegten Prinzip, bei dem die authentische Integration mittels leicht verständlicher bzw. wenig komplexer Arbeitsschritte und auf Basis der Verwendung einer Bildsequenz der realen Zielumgebung stattfindet. Die Konzepte dieser Arbeit werden durch die Implementierung der Autorenwerkzeuge validiert. Dabei zeigt sich, dass die Konzepte technisch umsetzbar sind. Die Evaluierung basiert auf der Gegenüberstellung eines in dieser Arbeit entwickelten Anforderungskatalogs und verdeutlicht die Eignung des Integrationsprozesses und der davon abgeleiteten Konzepte der Autorenwerkzeuge. Die Autorenwerkzeuge werden in eine bestehende, frei verfügbare AR-Autorenumgebung integriert.
In der folgenden Anleitung werden diverse Methoden für den Zugriff auf das Ressourcen-Management, entwickelt von der AG Texttechnologie, erläutert. Das Ressourcen-Management ist für alle Anwendungen identisch. Erklärt wird das Auslesen des Ressourcen-Managements der Projects „PHI Picturing Atlas“. Alle Anweisungen erfolgen per RESTful-Aufrufen. Die API-Dokumentation findet sich unter http://phi.resources.hucompute.org.
Monitoring is an indispensable tool for the operation of any large installation of grid or cluster computing, be it high energy physics or elsewhere. Usually, monitoring is configured to collect a small amount of data, just enough to enable detection of abnormal conditions. Once detected, the abnormal condition is handled by gathering all information from the affected components. This data is processed by querying it in a manner similar to a database.
This contribution shows how the metaphor of a debugger (for software applications) can be transferred to a compute cluster. The concepts of variables, assertions and breakpoints that are used in debugging can be applied to monitoring by defining variables as the quantities recorded by monitoring and breakpoints as invariants formulated via these variables. It is found that embedding fragments of a data extracting and reporting tool such as the UNIX tool awk facilitates concise notations for commonly used variables since tools like awk are designed to process large event streams (in textual representations) with bounded memory. A functional notation similar to both the pipe notation used in the UNIX shell and the point-free style used in functional programming simplify the combination of variables that commonly occur when formulating breakpoints.
We introduce tree-width for first order formulae φ, fotw(φ). We show that computing fotw is fixed-parameter tractable with parameter fotw. Moreover, we show that on classes of formulae of bounded fotw, model checking is fixed parameter tractable, with parameter the length of the formula. This is done by translating a formula φ with fotw(φ)<k into a formula of the k-variable fragment Lk of first order logic. For fixed k, the question whether a given first order formula is equivalent to an Lk formula is undecidable. In contrast, the classes of first order formulae with bounded fotw are fragments of first order logic for which the equivalence is decidable. Our notion of tree-width generalises tree-width of conjunctive queries to arbitrary formulae of first order logic by taking into account the quantifier interaction in a formula. Moreover, it is more powerful than the notion of elimination-width of quantified constraint formulae, defined by Chen and Dalmau (CSL 2005): for quantified constraint formulae, both bounded elimination-width and bounded fotw allow for model checking in polynomial time. We prove that fotw of a quantified constraint formula φ is bounded by the elimination-width of φ, and we exhibit a class of quantified constraint formulae with bounded fotw, that has unbounded elimination-width. A similar comparison holds for strict tree-width of non-recursive stratified datalog as defined by Flum, Frick, and Grohe (JACM 49, 2002). Finally, we show that fotw has a characterization in terms of a cops and robbers game without monotonicity cost.
Das größte Problem bei der Erstellung von MR-Anwendungen besteht darin, dass sie meistens durch Programmierung erstellt werden. Daher muss ein Autor spezielles Fachwissen über MR-Technologie und zumindest allgemeine Programmierkenntnisse mitbringen, um eine MR-Anwendung erstellen zu können. Dieser Erstellungsprozess soll mit Hilfe von MR-Autorensystemen, die derzeit auf dem Markt existieren und in der Forschung entwickelt werden, vereinfacht werden. Dies war ein Grund, warum diese Arbeit sich zum Ziel erklärte, zu überprüfen, inwieweit die Erstellung von MRAnwendungen durch Einsatz von MR-Autorensystemen vereinfacht wird. Ein weiteres Hauptziel war die Erstellung einer repräsentativen MR-Anwendung, die in dieser Arbeit als MR-Referenzanwendung bezeichnet wird. Sie sollte vor allem bei weiteren Entwicklungen als Vorlage dienen können und auf Basis von standardisierten Vorgehensmodellen, wie das Wasserfallmodell, erstellt werden. Ganz wichtig war es noch im Rahmen dieser Arbeit zu bestätigen, dass standardisierte Vorgehensmodelle auf MR-Anwendungen übertragbar sind. Um diese Ziele zu erreichen, sind in dieser Arbeit viele Schritte befolgt worden, die jeweils als Teilziele betrachtet werden können. Die MR-Referenzanwendung , die im Rahmen dieser Arbeit erstellt wurde, sollte mit Hilfe eines MR-Autorensystems umgesetzt werden. Um das richtige MRAutorensystem dafür auszusuchen, wurden im Rahmen einer Analyse fakultative und obligatorische Anforderungen an MR-Autorensysteme definiert, worin auch Funktionen identifiziert wurden, die ein solches System bereitstellen sollte. Das Anbieten einer Vorschau ist ein Beispiel für diese Funktionen, die bei der Erstellung von MR-Anwendungen eine essentielle Rolle spielen können. Die obligatorischen Anforderungen sind welche, die jedes Softwaresystem erfüllen soll, während die fakultativen das Ziel der Verbesserung von Autorensystemen verfolgen. Mit Hilfe der Analyse wurde ein Vergleich zwischen bekannten MR-Autorensystemen gezogen, dessen Ergebnis AMIRE als ein für die Ziele dieser Arbeit geeignetes MR-Autorensystem identifizierte. Für die MR-Referenzanwendung , die ähnliche Funktionen aufweisen sollte wie andere typische MR-Anwendungen wurden Funktionen, Anwendungsfälle und Design der Oberfläche spezifiziert. Diese Spezifikation wurde unabhängig von dem ausgesuchten Autorensystem durchgeführt, um darin analog zur Software-Technik das Augenmerk auf fachliche und nicht auf technische Aspekte zu legen. Um ans Ziel zu gelangen, wurde die MR-Referenzanwendung durch AMIRE realisiert, jedoch musste zuvor ihre Spezifikation auf dieses MR-Autorensystem überführt werden. Bei der Überführung wurde die Realisierung aus technischer Sicht betrachtet, das heißt es wurden verschiedene Vorbereitungen, wie die Auswahl der benötigten Komponenten, die Planung der Anwendungslogik und die Aufteilung der Anwendung in verschiedenen Zuständen, durchgeführt. Nach der gelungenen Realisierung und beispielhaften Dokumentation der MRReferenzanwendung konnte die Arbeit bewertet werden, worin die erzielten Resultate den Zielen der Arbeit gegenübergestellt wurden. Die Ergebnisse bestätigen, dass mit AMIRE die Entwicklung einer MR-Anwendung ohne Spezialwissen möglich ist und dass diese Arbeit alle ihrer Ziele innerhalb des festgelegten Zeitrahmens erreicht hat.
One of the main things that we as humans do in our lifetime is the recognition and/or classification of all kind of visual objects. It is known that about fifty percentage of the neocortex is responsible for visual processing. This fact tells us that object recognition (OR) is a complex task in our and in the animal brain, but we do it in a fraction of a second.
The main question is: How does the brain exactly do it? Does the brain use some feature extraction algorithm for OR tasks? The hierarchical structure of the visual cortex and studies on a part of the visual cortex called V1 tell us that our brain uses feature extraction for OR tasks by Gabor filters. We also use our previous knowledge in object recognition to detect and recognize the objects which we never saw before. Also, as we grow up we learn new objects faster than before.
These facts imply that the visual cortex of human and other animals uses some common (universal) features at least in the first stages to distinguish between different objects. In this context, we might ask: Do universal features in images exist, such that by using them we are able to efficiently recognize any unknown object? Is it necessary to extract new special features for any new object? How about using existing features from other tasks for this? Is it possible to efficiently use extracted feature of a specific task for other tasks? Are there some general features in natural and non-natural images which can also be used for specific object recognition? For example, can we use extracted features of natural images also for handwritten digit classification?
In this context, our work proposes a new information-based approach and tries to give some answers to the questions above. As a result, in our case we found that we could indeed extract unique features which are valid in all three different kinds of tasks. They give classification results that are about as good as the results reported by the corresponding literature for the specialized systems, or even better ones.
Another problem of the OR task is the recognition of objects, independently of any perception changes. We as humans or also animals can recognize objects in spite of many deformations (e.g. changes in illumination, rotation in any direction or angles, distortion and scaling up or down) in a fraction of a second. When observing an object which we never saw, we can imagine the rotated or scaled up objectin our mind. Here, also the question arises: How does the brain solve this problem? To do this, does the brain learn some mapping algorithm (transformation), independent of the objects or their features?
There are many approaches to model the mapping task. One of the most versatile ones is the idea of dynamically changing mappings, the dynamic link mapping (DLM). Although the dynamic link mapping systems show interesting results, the DLM system has the problem of a high computational complexity. In addition, because it uses the least mean squared error as risk function, the performance for classification is also not optimal. For random values where outliers are present, this system may not work well because outliers influence the mean squared error classification much more than probability-based systems. Therefore, we would like to complete the DLM system by a modified approach.
In our contribution, we will introduce a new system which employs the information criteria (i.e. probabilities) to overcome the outlier problem of the DLM systems and has a smaller computational complexity. The new information based selforganised system can solve the problem of invariant object recognition, especially in the task of rotation in depth, and does not have the disadvantage of current DLM systems and has a smaller computational complexity.
The encoding of images by semantic entities is still an unresolved task. This paper proposes the encoding of images by only a few important components or image primitives. Classically, this can be done by the Principal Component Analysis (PCA). Recently, the Independent Component Analysis (ICA) has found strong interest in the signal processing and neural network community. Using this as pattern primitives we aim for source patterns with the highest occurrence probability or highest information. For the example of a synthetic image composed by characters this idea selects the salient ones. For natural images it does not lead to an acceptable reproduction error since no a-priori probabilities can be computed. Combining the traditional principal component criteria of PCA with the independence property of ICA we obtain a better encoding. It turns out that the Independent Principal Components (IPC) in contrast to the Principal Independent Components (PIC) implement the classical demand of Shannon’s rate distortion theory.
This paper proposes a new approach for the encoding of images by only a few important components. Classically, this is done by the Principal Component Analysis (PCA). Recently, the Independent Component Analysis (ICA) has found strong interest in the neural network community. Applied to images, we aim for the most important source patterns with the highest occurrence probability or highest information called principal independent components (PIC). For the example of a synthetic image composed by characters this idea selects the salient ones. For natural images it does not lead to an acceptable reproduction error since no a-priori probabilities can be computed. Combining the traditional principal component criteria of PCA with the independence property of ICA we obtain a better encoding. It turns out that this definition of PIC implements the classical demand of Shannon’s rate distortion theory.
Classically, encoding of images by only a few, important components is done by the Principal Component Analysis (PCA). Recently, a data analysis tool called Independent Component Analysis (ICA) for the separation of independent influences in signals has found strong interest in the neural network community. This approach has also been applied to images. Whereas the approach assumes continuous source channels mixed up to the same number of channels by a mixing matrix, we assume that images are composed by only a few image primitives. This means that for images we have less sources than pixels. Additionally, in order to reduce unimportant information, we aim only for the most important source patterns with the highest occurrence probabilities or biggest information called „Principal Independent Components (PIC)“. For the example of a synthetic picture composed by characters this idea gives us the most important ones. Nevertheless, for natural images where no a-priori probabilities can be computed this does not lead to an acceptable reproduction error. Combining the traditional principal component criteria of PCA with the independence property of ICA we obtain a better encoding. It turns out that this definition of PIC implements the classical demand of Shannon’s rate distortion theory.
The efficient management of large multimedia databases requires the development of new techniques to process, characterize, and search for multimedia objects. Especially in the case of image data, the rapidly growing amount of documents prohibits a manual description of the images’ content. Instead, the automated characterization is highly desirable to support annotation and retrieval of digital images. However, this is a very complex and still unsolved task. To contribute to a solution of this problem, we have developed a mechanism for recognizing objects in images based on the query by example paradigm. Therefore, the most salient image features of an example image representing the searched object are extracted to obtain a scale-invariant object model. The use of this model provides an efficient and robust strategy for recognizing objects in images independently of their size. Further applications of the mechanism are classical recognition tasks such as scene decomposition or object tracking in video sequences.
Im Rahmen dieser Bachelorarbeit werden verschiedene Non-Photorealistic Rendering Verfahren zur Darstellung von rekonstruierten Artefakten, im Bereich der Paläontologie, beschrieben und implementiert. Hauptsächlich arbeiten die vorgestellten Verfahren im zweidimensionalen Bildraum, um beispielsweise Kanten in Bildern zu detektieren. Hierbei bedienen wir uns sogenannter Normal- und Depthmaps, welche als Zwischenresultate dienen, um die nötigen Informationen zu sammeln, welche zur Erkennung von Kanten im Bild notwendig sind. Neben der Kantendetektion werden NPR Verfahren genutzt, um skizzenhafte Illustrationen zu erzeugen, welche per Hand gezeichnete wissenschaftliche bzw. technische Illustrationen nachahmen und somit (halb)automatisieren sollen. Mithilfe von (programmierbaren) Shadern werden dann spezielle Texturen auf die Oberflächen der Modelle gelegt, um eine skizzenhafte Darstellung zu erzeugen. Solche Verfahren erleichtern demnach die aufwändige Arbeit der Künstler, welche gewöhnlich viel Zeit für ihre Illustrationen benötigen.
Driving can be dangerous. Humans become inattentive when performing a monotonous task like driving. Also the risk implied while multi-tasking, like using the cellular phone while driving, can break the concentration of the driver and increase the risk of accidents. Others factors like exhaustion, nervousness and excitement affect the performance of the driver and the response time. Consequently, car manufacturers have developed systems in the last decades which assist the driver under various circumstances. These systems are called driver assistance systems. Driver assistance systems are meant to support the task of driving, and the field of action varies from alerting the driver, with acoustical or optical warnings, to taking control of the car, such as keeping the vehicle in the traffic lane until the driver resumes control. For such a purpose, the vehicle is equipped with on-board sensors which allow the perception of the environment and/or the state of the vehicle. Cameras are sensors which extract useful information about the visual appearance of the environment. Additionally, a binocular system allows the extraction of 3D information. One of the main requirements for most camera-based driver assistance systems is the accurate knowledge of the motion of the vehicle. Some sources of information, like velocimeters and GPS, are of common use in vehicles today. Nevertheless, the resolution and accuracy usually achieved with these systems are not enough for many real-time applications. The computation of ego-motion from sequences of stereo images for the implementation of driving intelligent systems, like autonomous navigation or collision avoidance, constitutes the core of this thesis. This dissertation proposes a framework for the simultaneous computation of the 6 degrees of freedom of ego-motion (rotation and translation in 3D Euclidean space), the estimation of the scene structure and the detection and estimation of independently moving objects. The input is exclusively provided by a binocular system and the framework does not call for any data acquisition strategy, i.e. the stereo images are just processed as they are provided. Stereo allows one to establish correspondences between left and right images, estimating 3D points of the environment via triangulation. Likewise, feature tracking establishes correspondences between the images acquired at different time instances. When both are used together for a large number of points, the result is a set of clouds of 3D points with point-to-point correspondences between clouds. The apparent motion of the 3D points between consecutive frames is caused by a variety of reasons. The most dominant motion for most of the points in the clouds is caused by the ego-motion of the vehicle; as the vehicle moves and images are acquired, the relative position of the world points with respect to the vehicle changes. Motion is also caused by objects moving in the environment. They move independently of the vehicle motion, so the observed motion for these points is the sum of the ego-vehicle motion and the independent motion of the object. A third reason, and of paramount importance in vision applications, is caused by correspondence problems, i.e. the incorrect spatial or temporal assignment of the point-to-point correspondence. Furthermore, all the points in the clouds are actually noisy measurements of the real unknown 3D points of the environment. Solving ego-motion and scene structure from the clouds of points requires some previous analysis of the noise involved in the imaging process, and how it propagates as the data is processed. Therefore, this dissertation analyzes the noise properties of the 3D points obtained through stereo triangulation. This leads to the detection of a bias in the estimation of 3D position, which is corrected with a reformulation of the projection equation. Ego-motion is obtained by finding the rotation and translation between the two clouds of points. This problem is known as absolute orientation, and many solutions based on least squares have been proposed in the literature. This thesis reviews the available closed form solutions to the problem. The proposed framework is divided in three main blocks: 1) stereo and feature tracking computation, 2) ego-motion estimation and 3) estimation of 3D point position and 3D velocity. The first block solves the correspondence problem providing the clouds of points as output. No special implementation of this block is required in this thesis. The ego-motion block computes the motion of the cameras by finding the absolute orientation between the clouds of static points in the environment. Since the cloud of points might contain independently moving objects and outliers generated by false correspondences, the direct computation of the least squares might lead to an erroneous solution. The first contribution of this thesis is an effective rejection rule that detects outliers based on the distance between predicted and measured quantities, and reduces the effects of noisy measurement by assigning appropriate weights to the data. This method is called Smoothness Motion Constraint (SMC). The ego-motion of the camera between two frames is obtained finding the absolute orientation between consecutive clouds of weighted 3D points. The complete ego-motion since initialization is achieved concatenating the individual motion estimates. This leads to a super-linear propagation of the error, since noise is integrated. A second contribution of this dissertation is a predictor/corrector iterative method, which integrates the clouds of 3D points of multiple time instances for the computation of ego-motion. The presented method considerably reduces the accumulation of errors in the estimated ego-position of the camera. Another contribution of this dissertation is a method which recursively estimates the 3D world position of a point and its velocity; by fusing stereo, feature tracking and the estimated ego-motion in a Kalman Filter system. An improved estimation of point position is obtained this way, which is used in the subsequent system cycle resulting in an improved computation of ego-motion. The general contribution of this dissertation is a single framework for the real time computation of scene structure, independently moving objects and ego-motion for automotive applications.
Multi-view microscopy techniques are used to increase the resolution along the optical axis for 3D imaging. Without this, the resolution is insufficient to resolve subcellular events. In addition, parts of the images of opaque specimens are often highly degraded or masked. Both problems motivate scientists to record the same specimen from multiple directions. The images, then have to be digitally fused into a single high-quality image. Selective-plane illumination microscopy has proven to be a powerful imaging technique due to its unsurpassed acquisition speed and gentle optical sectioning. However, even in the case of multi view imaging techniques that illuminate and image the sample from multiple directions, light scattering inside tissues often severely impairs image contrast.
Here we show that for c-elegans embryos multi view registration can be achieved based on segmented nuclei. However, segmentation of nuclei in high density distribution like c-elegans embryo is challenging. We propose a method which uses 3D Mexican hat filter for preprocessing and 3D Gaussian curvature for the post-processing step to separate nuclei. We used this method successfully on 3 data sets of c-elegans embryos in 3 different views. The result of segmentation outperforms previous methods. Moreover, we provide a simple GUI for manual correction and adjusting the parameters for different data.
We then proposed a method that combines point and voxel registration for an accurate multi view reg- istration of c-elegans embryo, which does not need any special experimental preparation. We demonstrate the performance of our approach on data acquired from fixed embryos of c-elegans worms. This multi step approach is successfully evaluated by comparison to different methods and also by using synthetic data. The proposed method could overcome the typically low resolution along the optical axis and enable stitching to- gether the different parts of the embryo available through the different views. A tool for running the code and analyzing the results is developed.
Algorithms and data structures constitute the theoretical foundations of computer science and are an integral part of any classical computer science curriculum. Due to their high level of abstraction, the understanding of algorithms is of crucial concern to the vast majority of novice students. To facilitate the understanding and teaching of algorithms, a new research field termed "algorithm visualisation" evolved in the early 1980's. This field is concerned with innovating techniques and concepts for the development of effective algorithm visualisations for teaching, study, and research purposes. Due to the large number of requirements that high-quality algorithm visualisations need to meet, developing and deploying effective algorithm visualisations from scratch is often deemed to be an arduous, time-consuming task, which necessitates high-level skills in didactics, design, programming and evaluation. A substantial part of this thesis is devoted to the problems and solutions related to the automation of three-dimensional visual simulation of algorithms. The scientific contribution of the research presented in this work lies in addressing three concerns: - Identifying and investigating the issues related to the full automation of visual simulations. - Developing an automation-based approach to minimising the effort required for creating effective visual simulations. - Designing and implementing a rich environment for the visualisation of arbitrary algorithms and data structures in 3D. The presented research in this thesis is of considerable interest to (1) researchers anxious to facilitate the development process of algorithm visualisations, (2) educators concerned with adopting algorithm visualisations as a teaching aid and (3) students interested in developing their own algorithm animations.
A central concern in genetics is to identify mechanisms of transcriptional regulation. The aim is to unravel the mapping between the DNA sequence and gene expression. However, it turned out that this is extremely complex. Gene regulation is highly cell type-specific and even moderate changes in gene ex- pression can have functional consequences.
Important contributors to gene regulation are transcription factors (TFs), that are able to directly interact with the DNA. Often, a first step in understanding the effect of a TF on the gene’s regulation is to identify the genomic regions a TF binds to. Therefore, one needs to be aware of the TF’s binding preferences, which are commonly summarized in TF binding motifs. Although for many TFs the binding motif is experimentally validated, there is still a large number of TFs where no binding motif is known. There exist many tools that link TF binding motifs to TFs. We developed the method Massif that improves the performance of such tools by incorporating a domain score that uses the DNA binding domain of the studied TF as additional information.
TF binding sites are often enriched in regulatory elements (REMs) such as promoters or enhancers, where the latter can be located megabases away from its target gene. However, to understand the regulation of a gene it is crucial to know where the REMs of a gene are located. We introduced the EpiRegio webserver that holds REMs associated to target genes predicted across many cell types and tissues using STITCHIT, a previously established method. Our publicly available webserver enables to query for REMs associated to genes (gene query) and REMs overlapping genomic regions (region query). We illus- trated the usefulness of EpiRegio by pointing to a TF that occurs enriched in the REMs of differential expressed genes in circPLOD2 depleted pericytes. Further, we highlighted genes, which are affected by CRISPR-Cas induced mutations in non-coding genomic regions using EpiRegio’s region query. Non-coding genetic variants within REMs may alter gene expression by modifying TF binding sites, which can lead to various kinds of traits or diseases. To understand the underlying molecular mechanisms, one aims to evaluate the effect of such genetic variations on TF binding sites. We developed an accurate and fast statistical approach, that can assess whether a single nucleotide polymorphism (SNP) is regulatory. Further, we combined this approach with epigenetic data and additional analyses in our Sneep workflow. For instance, it enables to identify TFs whose binding preferences are affected by the analyzed SNPs, which is illustrated on eQTL datasets for different cell types. Additionally, we used our Sneep workflow to highlight cardiovascular disease genes using regulatory SNPs and REM-gene interactions.
Overall, the described results allow a better understanding of REM-gene interactions and their interplay with TFs on gene regulation.
Non-coding variations located within regulatory elements may alter gene expression by modifying Transcription Factor (TF) binding sites and thereby lead to functional consequences like various traits or diseases. To understand these molecular mechanisms, different TF models are being used to assess the effect of DNA sequence variations, such as Single Nucleotide Polymorphisms (SNPs). However, few statistical approaches exist to compute statistical significance of results but they often are slow for large sets of SNPs, such as data obtained from a genome-wide association study (GWAS) or allele-specific analysis of chromatin data.
Results We investigate the distribution of maximal differential TF binding scores for general computational models that assess TF binding. We find that a modified Laplace distribution can adequately approximate the empirical distributions. A benchmark on in vitro and in vivo data sets showed that our new approach improves on an existing method in terms of performance and speed. In applications on large sets of eQTL and GWAS SNPs we could illustrate the usefulness of the novel statistic to highlight cell type specific regulators and TF target genes.
Conclusions Our approach allows the evaluation of DNA changes that induce differential TF binding in a fast and accurate manner, permitting computations on large mutation data sets. An implementation of the novel approach is freely available at https://github.com/SchulzLab/SNEEP.
Die Integration von Dienstgüte-Vorkehrungen in objektorientierte Verteilungsinfrastrukturen befähigt Anwendungsentwickler, den Verteilungs-induzierten Problemen verteilter Systeme zu begegnen. Im Rahmen dieser Arbeit wurde die generische Einbettung von Dienstgüte-Vorkehrungen in verteilte Objektsysteme untersucht und ein Lösungsansatz präsentiert. Zunächst wurde eine Analyse der für das Dienstgüte-Management notwendigen Aufgaben vorgestellt. Ausgehend von einem verteilten Objektmodell wurde untersucht, wie Dienstgüte-Vorkehrungen integriert werden können. Dienstgüte-Vorkehrungen stellen bei einem zugrundeliegenden Ob- jektmodell nicht-einkapselbare Verantwortlichkeiten dar. Die enge Bindung der Dienstgüte-Vorkehrungen an einen Dienst führt so zu Vermaschungen in den Strukturen der Implementierung. Damit ist die getrennte Wieder- verwendung beider erschwert. Zusätzlich werden unterschiedliche Abstrak- tionen vermischt. Die aspektorientierte Programmierung (AOP) behandelt solche Vermaschungen. Dienstgüte wurde bei der Integration in ein verteil- tes Objektmodell als ein Aspekt im Sinne der AOP klassifiziert. Ausgehend von den Anforderungen an das Dienstgüte-Management wur- de ein Rahmenwerk auf Basis eines verteilten Objektmodells entworfen. Der in dieser Arbeit dargestellte Schwerpunkt liegt auf der Spezifikation von Dienstgüte-Charakteristiken und deren Umsetzung in die Implementie- rungssprache der Anwendungsobjekte. Für die Unterstützung der Ende-zu- Ende-Dienstgüte-Erbringung ist der Einbezug von Dienstgüte-Vorkehrun- gen des Netzwerks, Betriebssystems oder spezieller Bibliotheken notwendig. Die resultierende Hierarchie von Dienstgüte-Mechanismen wird durch die vorgestellte Integration in eine Verteilungsinfrastruktur unterstützt. Durch die Integration der Dienstgüte-Spezifikation in die Schnittstel- lenbeschreibungssprache erlaubt das Rahmenwerk einen aspektorientierten Ansatz ohne die Einführung weiterer Sprachen zur Spezifikation oder Im- plementierung. Die Spezifikation von Dienstgüte-Charakteristiken in der erweiterten IDL wird in spezielle Entwurfsmuster in der Zielsprache umge- setzt. Diese Entwurfsmuster separieren die Anwendungsobjekte weitgehend von den Dienstgüte-Vorkehrungen. Die auf der Ebene der Anwendungsobjekte generierten Vorlagen für die Dienstgüte-Vorkehrungen können durch einen modifizierten bzw. schon da- für ausgelegten Verteilungsinfrastrukturkern in das System integriert wer- den. Eine einheitliche statische Schnittstelle erlaubt einen einfachen re- effektiven Ansatz. So ist der Zugriff auf Dienstgüte-Vorkehrungen tieferer Schichten wie auch die Integration anwendungsspezifischer Dienstgüte-Vor- kehrungen auf der Netzwerkschicht möglich. Das Rahmenwerk bietet somit eine klare Trennung der Verantwortlich- keiten, die sowohl Anwendungsentwickler wie auch Dienstgüte-Implemen- tierer unterstützt. Die aus der Schnittstellenbeschreibungssprache generier- ten Einheiten stellen für die Anwendungsobjekte eine Abstraktion dar, die sowohl die Verteilungsaspekte wie auch die Dienstgüte-Vorkehrungen ein- fach nutzbar anbietet und von der zugrundeliegenden Plattform isoliert. Eine sich aus dieser Arbeit ergebende Fragestellung besteht in der Er- weiterung und Verallgemeinerung des aspektorientierten Ansatzes. Die im Rahmen der Analyse betrachteten Dienstgüte-Charakteristiken sind aus dem systemnahen Bereich und insbesondere aus der Betrachtung typi- scher Probleme in verteilten Systemen und den daraus erwachsenen Anwen- dungsanforderungen gewonnen. Nicht-funktionale Aspekte der Dienster- bringung lassen sich weiter fassen. So kann ausgehend von den bereitge- stellten Abstraktionen untersucht werden, inwieweit auf Anwendungsebe- ne nicht-funktionale Eigenschaften in ähnlicher Weise einbettbar sind. Im Rahmen dieser Arbeit wurde beispielsweise eine Dienstgüte-Charakteristik zur Parallelisierung von Berechnungen realisiert. Eine anwendungsbezogene Dienstgüte-Charakteristik könnte numerische Optimierungen realisieren, die von den reinen mathematischen Operationen zu trennen ist. Andere Beispiele aus der Multimedia-Kategorie sind durch die Qualität einer Au- dio-Übertragung gegeben. So kann bei einer geringen Bandbreite durch die Kompression der Daten eine bessere Qualität der Audiowiedergabe ereicht werden, als durch Übertragung der Rohdaten. Die Kompressionsrate kann von der Anwendung isoliert und durch entsprechende Dienstgüte-Mecha- nismen realisiert werden. Qualitätsunterschiede ergeben sich durch mögli- che verlustbehaftete Kompression und de notwendigen Anforderungen an Hardware- oder Software-Unterstützung. Andere Kriterien für die Qualität lassen sich weniger leicht vor der Anwendung verbergen. Die Wiedergabe von Stereo- oder Mono-Audiodaten erfordert entsprechende Anwendungen und auch Ausstattungen der Endgeräte. Im Kontext dieser Arbeit wurde ein Objektmodell betrachtet, das eine starke Bindung zwischen Schnittstellen und Objekten besitzt. Insbeson- deren wurde bei der Umsetzung der Schnittstellenbeschreibungssprache in die Zielsprache eine Umsetzung gewählt, die Dienste als Objekte reprä- sentiert. Involviert die Diensterbringung verschiedene Objekte, kann nur ein Objekt als Stellvertreter all dieser Dienste den Service anbieten. Dieses Objekt ist für die Einhaltung von Dienstgüte-Vereinbarungen mit Klien- ten verantwortlich. Innerhalb der Objekte, die den Service realisieren, sind für die Dienstgüte-Erbringung dann ggf. weitere interne Dienstgüte-Vor- kehrungen zu etablieren. Komponentenmodelle versprechen hier einen all- gemeineren Ansatz, der die Integration von Dienstgüte-Vorkehrungen loh- nenswert erscheinen lässt. Zum einen unterstützen Komponentenmodelle definierte Schnittstellen zur Interaktion zwischen den beteiligten Objek- ten einer Komponente, und zum anderen bieten Komponenten eine über die Schnittstellenbeschreibungssprache hinausgehende Beschreibung ihrer Funktionalität in einer Komponentenspezifikation. Diese Komponentenspe- zifikation verspricht einen guten Ansatz, um Dienstgüte-Spezifikationen der Komponenten zu integrieren. Neben den beiden bislang beschriebenen Forschungsrichtungen, die je- weils ein Rahmenwerk für das Dienstgüte-Management voraussetzen und darauf aufbauen, existieren innerhalb des in der Arbeit vorgestellten Rah- menwerkes weitere offene Forschungsfragen. Die Ausgestaltung von Preisen bei der Vergabe von Ressourcen und die damit verbundenen Richtlinien für die Vergabe und auch den Entzug stellen noch kein abgeschlossenes Gebiet dar. Hier ist der Einbezug anderer Disziplinen vielversprechend. Preisrichtlinien für manche Ressourcen, die bei Nicht-Nutzung verfallen wie Netzwerkkapazität sind Gegenstand der Forschung in der Betriebs- wirtschaftslehre. Die Gestaltung von Vergaberichtlinien, insbesondere aber die Festlegung von Vergütungen bei Nichterbringung eines festgesetzten Dienstgüte-Niveaus oder Kompensationen bei dem Entzug von Ressourcen mit einer damit einhergehenden Verletzung der Dienstgüte-Vereinbarung, wirft rechtliche Fragen über die Gültigkeit solcher Richtlinien auf. Weitere, nicht-interdisziplinäre Fragestellungen, ergeben sich aus der Frage der Wiederverwendbarkeit und Dokumentation von Dienstgüte-Vor- kehrungen im Rahmenwerk. Die Erstellung eines Katalogs mit einem ein- heitlichen Aufbau wie es bei Entwurfsmustern üblich ist verspricht eine geeignete Dokumentationsform. Allerdings muss eine solche Dokumentati- on zwei Zielgruppen gerecht werden. Zum einen sind dies Anwendungsent- wickler, die eine gegebene Dienstgüte-Implementierung anwenden wollen und Informationen für die Nutzung und Anpassung der Anwendung benö- tigen und zum anderen Dienstgüte-Entwickler, die auf bereits existierende transportspezifische Dienstgüte-Mechanismen aufbauen. Für die hier skizzierten Forschungsrichtungen ist ein Rahmenwerk für das Dienstgüte-Management unerlässlich. Das in dieser Arbeit vorgestellte Rahmenwerk bietet eine gute Ausgangsbasis.
Configuration, simulation and visualization of simple biochemical reaction-diffusion systems in 3D
(2004)
Background In biological systems, molecules of different species diffuse within the reaction compartments and interact with each other, ultimately giving rise to such complex structures like living cells. In order to investigate the formation of subcellular structures and patterns (e.g. signal transduction) or spatial effects in metabolic processes, it would be helpful to use simulations of such reaction-diffusion systems. Pattern formation has been extensively studied in two dimensions. However, the extension to three-dimensional reaction-diffusion systems poses some challenges to the visualization of the processes being simulated. Scope of the Thesis The aim of this thesis is the specification and development of algorithms and methods for the three-dimensional configuration, simulation and visualization of biochemical reaction-diffusion systems consisting of a small number of molecules and reactions. After an initial review of existing literature about 2D/3D reaction-diffusion systems, a 3D simulation algorithm (PDE solver), based on an existing 2D-simulation algorithm for reaction-diffusion systems written by Prof. Herbert Sauro, has to be developed. In a succeeding step, this algorithm has to be optimized for high performance. A prototypic 3D configuration tool for the initial state of the system has to be developed. This basic tool should enable the user to define and store the location of molecules, membranes and channels within the reaction space of user-defined size. A suitable data structure has to be defined for the representation of the reaction space. The main focus of this thesis is the specification and prototypic implementation of a suitable reaction space visualization component for the display of the simulation results. In particular, the possibility of 3D visualization during course of the simulation has to be investigated. During the development phase, the quality and usability of the visualizations has to be evaluated in user tests. The simulation, configuration and visualization prototypes should be compliant with the Systems Biology Workbench to ensure compatibility with software from other authors. The thesis is carried out in close cooperation with Prof. Herbert Sauro at the Keck Graduate Institute, Claremont, CA, USA. Due to this international cooperation the thesis will be written in English.
"Prognosen sind schwierig, besonders, wenn sie die Zukunft betreffen", sagt ein geflügeltes Wort. Die letzte Finanzkrise ist dafür ein gutes Beispiel, denn die wenigsten Analysten und Wirtschaftsweisen haben sie kommen sehen. Da Finanzkrisen glücklicherweise selten sind, ist es allerdings schwierig, Modelle zu entwickeln, die rechtzeitig vor einem Crash warnen.
Algorithms for the Maximum Cardinality Matching Problem which greedily add edges to the solution enjoy great popularity. We systematically study strengths and limitations of such algorithms, in particular of those which consider node degree information to select the next edge. Concentrating on nodes of small degree is a promising approach: it was shown, experimentally and analytically, that very good approximate solutions are obtained for restricted classes of random graphs. Results achieved under these idealized conditions, however, remained unsupported by statements which depend on less optimistic assumptions.
The KarpSipser algorithm and 1-2-Greedy, which is a simplified variant of the well-known MinGreedy algorithm, proceed as follows. In each step, if a node of degree one (resp. at most two) exists, then an edge incident with a minimum degree node is picked, otherwise an arbitrary edge is added to the solution.
We analyze the approximation ratio of both algorithms on graphs of degree at most D. Families of graphs are known for which the expected approximation ratio converges to 1/2 as D grows to infinity, even if randomization against the worst case is used. If randomization is not allowed, then we show the following convergence to 1/2: the 1-2-Greedy algorithm achieves approximation ratio (D-1)/(2D-3); if the graph is bipartite, then the more restricted KarpSipser algorithm achieves the even stronger factor D/(2D-2). These guarantees set both algorithms apart from other famous matching heuristics like e.g. Greedy or MRG: these algorithms depend on randomization to break the 1/2-barrier even for paths with D=2. Moreover, for any D our guarantees are strictly larger than the best known bounds on the expected performance of the randomized variants of Greedy and MRG.
To investigate whether KarpSipser or 1-2-Greedy can be refined to achieve better performance, or be simplified without loss of approximation quality, we systematically study entire classes of deterministic greedy-like algorithms for matching. Therefore we employ the adaptive priority algorithm framework by Borodin, Nielsen, and Rackoff: in each round, an adaptive priority algorithm requests one or more edges by formulating their properties---like e.g. "is incident with a node of minimum degree"---and adds the received edges to the solution. No constraints on time and space usage are imposed, hence an adaptive priority algorithm is restricted only by its nature of picking edges in a greedy-like fashion. If an adaptive priority algorithm requests edges by processing degree information, then we show that it does not surpass the performance of KarpSipser: our D/(2D-2)-guarantee for bipartite graphs is tight and KarpSipser is optimal among all such "degree-sensitive" algorithms even though it uses degree information merely to detect degree-1 nodes. Moreover, we show that if degrees of both nodes of an edge may be processed, like e.g. the Double-MinGreedy algorithm does, then the performance of KarpSipser can only be increased marginally, if at all. Of special interest is the capability of requesting edges not only by specifying the degree of a node but additionally its set of neighbors. This enables an adaptive priority algorithm to "traverse" the input graph. We show that on general degree-bounded graphs no such algorithm can beat factor (D-1)/(2D-3). Hence our bound for 1-2-Greedy is tight and this algorithm performs optimally even though it ignores neighbor information. Furthermore, we show that an adaptive priority algorithm deteriorates to approximation ratio exactly 1/2 if it does not request small degree nodes. This tremendous decline of approximation quality happens for graphs on which 1-2-Greedy and KarpSipser perform optimally, namely paths with D=2. Consequently, requesting small degree nodes is vital to beat factor 1/2.
Summarizing, our results show that 1-2-Greedy and KarpSipser stand out from known and hypothetical algorithms as an intriguing combination of both approximation quality and conceptual simplicity.
Die vorliegende Arbeit stellt ein organisches Taskverarbeitungssystem vor, das die zuverlässige Verwaltung und Verarbeitung von Tasks auf Multi-Core basierten SoC-Architekturen umsetzt. Aufgrund der zunehmenden Integrationsdichte treten bei der planaren Halbleiter-Fertigung vermehrt Nebeneffekte auf, die im Systembetrieb zu Fehler und Ausfällen von Komponenten führen, was die Zuverlässigkeit der SoCs zunehmend beeinträchtigt. Bereits ab einer Fertigungsgröße von weniger als 100 nm ist eine drastische Zunahme von Elektromigration und der Strahlungssensitivität zu beobachten. Gleichzeitig nimmt die Komplexität (Applikations-Anforderungen) weiter zu, wobei der aktuelle Trend auf eine immer stärkere Vernetzung von Geräten abzielt (Ubiquitäre Systeme). Um diese Herausforderungen autonom bewältigen zu können, wird in dieser Arbeit ein biologisch inspiriertes Systemkonzept vorgestellt. Dieses bedient sich der Eigenschaften und Techniken des menschlichen endokrinen Hormonsystems und setzt ein vollständig dezentrales Funktionsprinzip mit Selbst-X Eigenschaften aus dem Organic Computing Bereich um. Die Durchführung dieses organischen Funktionsprinzips erfolgt in zwei getrennten Regelkreisen, die gemeinsam die dezentrale Verwaltung und Verarbeitung von Tasks übernehmen. Der erste Regelkreis wird durch das künstliche Hormonsystem (KHS) abgebildet und führt die Verteilung aller Tasks auf die verfügbaren Kerne durch. Die Verteilung erfolgt durch das Mitwirken aller Kerne und berücksichtigt deren lokale Eignung und aktueller Zustand. Anschließend erfolgt die Synchronisation mit dem zweiten Regelkreis, der durch die hormongeregelte Taskverarbeitung (HTV) abgebildet wird und einen dynamischen Task-Transfer gemäß der aktuellen Verteilung vollzieht. Dabei werden auch die im Netz verfügbaren Zustände von Tasks berücksichtigt und es entsteht ein vollständiger Verarbeitungspfad, ausgehend von der initialen Taskzuordnung, hinweg über den Transfer der Taskkomponenten, gefolgt von der Erzeugung der lokalen Taskinstanz bis zum Start des zugehörigen Taskprozesses auf dem jeweiligen Kern. Die System-Implementierung setzt sich aus modularen Hardware- und Software-Komponenten zusammen. Dadurch kann das System entweder vollständig in Hardware, Software oder in hybrider Form betrieben und genutzt werden. Mittels eines FPGA-basierten Prototyps konnten die formal bewiesenen Zeitschranken durch Messungen in realer Systemumgebung bestätigt werden. Die Messergebnisse zeigen herausragende Zeitschranken bezüglich der Selbst-X Eigenschaften. Des Weiteren zeigt der quantitative Vergleich gegenüber anderen Systemen, dass der hier gewählte dezentrale Regelungsansatz bezüglich Ausfallsicherheit, Flächen- und Rechenaufwand deutlich überlegen ist.
Diese Arbeit beschäftigt sich mit der konkreten Erzeugung von 2-3D Visualisierung. Im Fokus steht der notwendige Prozess zur Erzeugung von Computergrafik.
Da die Computergrafik heut zu Tage wichtiger Bestandteil vieler Aufgabengebiete ist, sollte deren Nutzung auch allen Menschen zugänglich sein. In den vergangen Jahren blieb dies meist nur Leuten aus den Fachgebieten vorbehalten, aufgrund der Komplexität und des notwendigen „Know-how“ über die Thematik. Mittlerweile gilt diese Tatsache als überholt. Viele Erneuerung im Bereich von Hardware und Software haben es ermöglicht, dass selbst ungeübte Anwender in der Lage sind, ansehnliche 3D Grafiken an ihren PCs bei der Arbeit oder zu Hause zu erzeugen. Dies soll ebenfalls das Ziel dieser Arbeit sein. Dazu wird in eine Applikation erstellt die die Visualisierung von graphischen Primitiven unter der Verwendug von Microsofts DirectX leicht und schnell ermöglichen soll. Als Basis dient ein Rendering-Framework, welches auf einheitliches Schnittstellenkonzept setzt, um die strikte Trennung zwischen Anwender- und Fachwissen zu vollziehen.
Weitere Schwerpunkte dieser Arbeit liegen im Bereich der Modellierung von graphischen Primtiven und der Nutzung von Shadern. Dazu wird in der Modellierung der Import von archivierten Modellen umgesetzt. Die Nutzung von Shadern soll soweit vereinfacht werden, dass Anwender auf Shader beleibig zugreifen können. Dies soll durch eine Verknüpfung zwischen Shadern und Modellen erfolgen, die ebenfalls im Bereich der Modellierung erfolgt.
In dieser Arbeit wurden Web Browser bezüglich ihrer Eignung zum Erstellen interaktiver eLearning Fragen untersucht. Vor dem Hintergrund der speziellen Charakteristika von mobilen Endgeräten wurden insbesondere die Aspekte der Beschränkung auf standardisierte Technologien, sowie die Clientseitigkeit der Applikation hervorgehoben. Es konnte eine Grundlage geschaffen werden, die das Erstellen von interaktiven Fragen nur mit Hilfe von HTML und Javascript ermöglicht und es wurde ein weitgehender Verzicht auf serverseitige Komponenten erreicht.
With the rise of digitalization and ubiquity of media use, both opportunities and challenges emerge for academic learning. One prevalent challenge is media multitasking, which can become distracting and hinder learning success. This thesis investigates two facets of this issue: the enhancement of data tracking, and the exploration of digital interventions that support self-control.
The first paper focuses on digital tracking of media use, as a comprehensive understanding of digital distractions requires careful data collection to avoid misinterpretations. The paper presents a tracking system where media use is linked to learning activities. An annotation dashboard enabled the enrichment of the log data with self-reports. The efficacy of this system was evaluated in a 14-day online course taken by 177 students, with results confirming the initial assumptions about media tracking.
The second paper tackles the recognition of whether a text was thoroughly read, an issue brought on by the tendency of students to skip lengthy and demanding texts. A method utilizing scroll data and time series classification algorithms is presented and tested, showing promising results for early recognition and intervention.
The third paper presents the results of a systematic literature review on the effectiveness of digital self-control tools in academic learning. The paper identifies gaps in existing research and outlines a roadmap for further research on self-control tools.
The fourth paper shares findings from a survey of 273 students, exploring the practical use and perceived helpfulness of DSCTs. The study highlights the challenge of balancing between too restrictive and too lenient DSCTs, particularly for platforms offering both learning content and entertainment. The results also show a special role of media use that is highly habitual.
The fifth paper of this work investigates facets of app-based habit building. In a study over 27 days, 106 school-aged children used the specially developed PROMPT-app. The children carried out one of three digital activities each day, each of which was supposed to promote a deeper or more superficial processing of plans. Significant differences regarding the processing of plans emerged between the three activities, and the results suggest that a child-friendly planning application needs to be personalized to be effective.
Overall, this work offers a comprehensive insight into the complexity and potentials of dealing with distracting media usage and shows ways for future research and interventions in this fascinating and ever more important field.
Research in the field of Digital Humanities, also known as Humanities Computing, has seen a steady increase over the past years. Situated at the intersection of computing science and the humanities, present efforts focus on making resources such as texts, images, musical pieces and other semiotic artifacts digitally available, searchable and analysable. To this end, computational tools enabling textual search, visual analytics, data mining, statistics and natural language processing are harnessed to support the humanities researcher. The processing of large data sets with appropriate software opens up novel and fruitful approaches to questions in the traditional humanities. This report summarizes the Dagstuhl seminar 14301 on “Computational Humanities - bridging the gap between Computer Science and Digital Humanities”.
1998 ACM Subject Classification I.2.7 Natural Language Processing, J.5 Arts and Humanities
In den Anwendungsbereichen der Mixed Reality (MR) werden die reale und die virtuelle Welt kombiniert, so dass ein Eindruck der Koexistenz beider Welten entsteht. Meist wird dabei die reale Umgebung durch virtuelle Objekte angereichert, die dem Anwender zusätzliche Informationen bieten sollen. Um die virtuellen Objekte richtig zu positionieren, muss die reale Umgebung erkannt werden. Diese Erkennung der realen Umgebung wird meist durch Bestimmung und Verfolgung von Orientierung und Positionierung der realen Objekte realisiert, was als Tracking bezeichnet wird und einen der wichtigsten Bestandteile für MR-Anwendung darstellt. Ohne die exakte Ausrichtung von realen und virtuellen Objekten, geht die Illusion verloren, dass die virtuellen Objekte Teil der realen Umgebung sind und mit ihr verschmelzen. Markerkombination Das markerbasierte Tracking ist ein Verfahren, das die Bestimmung der Positionierung von realen Objekten durch zusätzliche Markierungen in der realen Umgebung ermöglicht. Diese Markierungen können besonders gut durch Bildanalyseverfahren extrahiert werden und bieten anhand ihrer speziellen Form Positionierungsinformationen. Der Einsatz dieser Trackingtechnologie ist dabei denkbar einfache und kostengünstig. Ein breiter Anwendungsbereich ist durch den kostengünstigen Einsatz dieser Technologien gegeben, allerdings ist das Erstellen von MR-Anwendungen fast ausschließlich MR-Spezialisten vorbehalten, die über Programmierfertigkeiten und spezielle Kenntnisse aus dem MR-Bereich besitzen. Diese Arbeit beschreibt die Entwicklung und Umsetzung der Konzepte, die einem Personenkreis, der lediglich über geringe Kenntnisse von MR-Technologien und deren Anwendung verfügt, den kostengünstigen und einfachen Einsatz von markerbasierten Trackingtechnologien ermöglicht. Die im Rahmen der Arbeit durchgeführte Analyse verweist auf die problematischen Anwendungsfälle des markerbasierten Trackings, die durch die Verdeckung von Markern zustande kommen, in der Beschränkung der Markeranzahl begründet sind, oder durch die Schwankung der Trackingangaben entstehen. Diese Problembereiche sind bei der Entwicklung berücksichtigt worden und können mit Hilfe der entwickelten Konzepte vom Autor bewältigt werden. Das Konzept der Markerkategorien ermöglicht dabei den Einsatz von angepassten Filterungstechniken. Die redundante Markerkombination behebt das Verdeckungsproblem und eliminiert Schwankungen durch das Kombinieren von mehreren Trackinginformationen. Die Gütefunktion ermöglicht die Bewertung von Trackinginformationen und wird zur Gewichtung der Trackingangaben innerhalb einer Markerkombination genutzt. Das Konzept der Markertupel ermöglicht eine Wiederverwendung von Markern, durch den Ansatz der Bereichsunterteilung. Die Konzepte sind in der AMIRE-Umgebung vollständig implementiert und getestet worden. Zum Abschluss ist rückblickend eine kritische Betrachtung der Arbeit, in punkto Vorgehensweise und erreichter Ergebnisse durchgeführt worden.
Virtual machines are for the most part not used inside of high-energy physics (HEP) environments. Even though they provide a high degree of isolation, the performance overhead they introduce is too great for them to be used. With the rising number of container technologies and their increasing separation capabilities, HEP-environments are evaluating if they could utilize the technology. The container images are small and self-contained which allows them to be easily distributed throughout the global environment. They also offer a near native performance while at the same time aproviding an often acceptable level of isolation. Only the needed services and libraries are packed into an image and executed directly by the host kernel. This work compared the performance impact of the three container technologies Docker, rkt and Singularity. The host kernel was additionally hardened with grsecurity and PaX to strengthen its security and make an exploitation from inside a container harder. The execution time of a physics simulation was used as a benchmark. The results show that the different container technologies have a different impact on the performance. The performance loss on a stock kernel is small; in some cases they were even faster than no container. Docker showed overall the best performance on a stock kernel. The difference on a hardened kernel was bigger than on a stock kernel, but in favor of the container technologies. rkt showed performed in almost all cases better than all the others.
Time-critical applications process a continuous stream of input data and have to meet specific timing constraints. A common approach to ensure that such an application satisfies its constraints is over-provisioning: The application is deployed in a dedicated cluster environment with enough processing power to achieve the target performance for every specified data input rate. This approach comes with a drawback: At times of decreased data input rates, the cluster resources are not fully utilized. A typical use case is the HLT-Chain application that processes physics data at runtime of the ALICE experiment at CERN. From a perspective of cost and efficiency it is desirable to exploit temporarily unused cluster resources. Existing approaches aim for that goal by running additional applications. These approaches, however, a) lack in flexibility to dynamically grant the time-critical application the resources it needs, b) are insufficient for isolating the time-critical application from harmful side-effects introduced by additional applications or c) are not general because application-specific interfaces are used. In this thesis, a software framework is presented that allows to exploit unused resources in a dedicated cluster without harming a time-critical application. Additional applications are hosted in Virtual Machines (VMs) and unused cluster resources are allocated to these VMs at runtime. In order to avoid resource bottlenecks, the resource usage of VMs is dynamically modified according to the needs of the time-critical application. For this purpose, a number of previously not combined methods is used. On a global level, appropriate VM manipulations like hot migration, suspend/resume and start/stop are determined by an informed search heuristic and applied at runtime. Locally on cluster nodes, a feedback-controlled adaption of VM resource usage is carried out in a decentralized manner. The employment of this framework allows to increase a cluster’s usage by running additional applications, while at the same time preventing negative impact towards a time-critical application. This capability of the framework is shown for the HLT-Chain application: In an empirical evaluation the cluster CPU usage is increased from 49% to 79%, additional results are computed and no negative effect towards the HLT-Chain application are observed.
Performance and storage requirements of topology-conserving maps for robot manipulator control
(1989)
A new programming paradigm for the control of a robot manipulator by learning the mapping between the Cartesian space and the joint space (inverse Kinematic) is discussed. It is based on a Neural Network model of optimal mapping between two high-dimensional spaces by Kohonen. This paper describes the approach and presents the optimal mapping, based on the principle of maximal information gain. It is shown that Kohonens mapping in the 2-dimensional case is optimal in this sense. Furthermore, the principal control error made by the learned mapping is evaluated for the example of the commonly used PUMA robot, the trade-off between storage resources and positional error is discussed and an optimal position encoding resolution is proposed.
In bioinformatics, biochemical signal pathways can be modeled by many differential equations. It is still an open problem how to fit the huge amount of parameters of the equations to the available data. Here, the approach of systematically obtaining the most appropriate model and learning its parameters is extremely interesting. One of the most often used approaches for model selection is to choose the least complex model which “fits the needs”. For noisy measurements, the model which has the smallest mean squared error of the observed data results in a model which fits too accurately to the data – it is overfitting. Such a model will perform good on the training data, but worse on unknown data. This paper propose as model selection criterion the least complex description of the observed data by the model, the minimum description length. For the small, but important example of inflammation modeling the performance of the approach is evaluated. Keywords: biochemical pathways, differential equations, septic shock, parameter estimation, overfitting, minimum description length.
In bioinformatics, biochemical pathways can be modeled by many differential equations. It is still an open problem how to fit the huge amount of parameters of the equations to the available data. Here, the approach of systematically learning the parameters is necessary. In this paper, for the small, important example of inflammation modeling a network is constructed and different learning algorithms are proposed. It turned out that due to the nonlinear dynamics evolutionary approaches are necessary to fit the parameters for sparse, given data. Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence - ICTAI 2003
The early prediction of mortality is one of the unresolved tasks in intensive care medicine. This contribution models medical symptoms as observations cased by transitions between hidden markov states. Learning the underlying state transition probabilities results in a prediction probability success of about 91%. The results are discussed and put in relation to the model used. Finally, the rationales for using the model are reflected: Are there states in the septic shock data?
Data driven automatic model selection and parameter adaptation – a case study for septic shock
(2004)
In bioinformatics, biochemical pathways can be modeled by many differential equations. It is still an open problem how to fit the huge amount of parameters of the equations to the available data. Here, the approach of systematically learning the parameters is necessary. This paper propose as model selection criterion the least complex description of the observed data by the model, the minimum description length. For the small, but important example of inflammation modeling the performance of the approach is evaluated.
In its first part, this contribution reviews shortly the application of neural network methods to medical problems and characterizes its advantages and problems in the context of the medical background. Successful application examples show that human diagnostic capabilities are significantly worse than the neural diagnostic systems. Then, paradigm of neural networks is shortly introduced and the main problems of medical data base and the basic approaches for training and testing a network by medical data are described. Additionally, the problem of interfacing the network and its result is given and the neuro-fuzzy approach is presented. Finally, as case study of neural rule based diagnosis septic shock diagnosis is described, on one hand by a growing neural network and on the other hand by a rule based system. Keywords: Statistical Classification, Adaptive Prediction, Neural Networks, Neurofuzzy, Medical Systems
In bioinformatics, biochemical pathways can be modeled by many differential equations. It is still an open problem how to fit the huge amount of parameters of the equations to the available data. Here, the approach of systematically learning the parameters is necessary. In this paper, for the small, important example of inflammation modeling a network is constructed and different learning algorithms are proposed. It turned out that due to the nonlinear dynamics evolutionary approaches are necessary to fit the parameters for sparse, given data. Keywords: model parameter adaption, septic shock. coupled differential equations, genetic algorithm.
It is well known that artificial neural nets can be used as approximators of any continous functions to any desired degree. Nevertheless, for a given application and a given network architecture the non-trivial task rests to determine the necessary number of neurons and the necessary accuracy (number of bits) per weight for a satisfactory operation. In this paper the problem is treated by an information theoretic approach. The values for the weights and thresholds in the approximator network are determined analytically. Furthermore, the accuracy of the weights and the number of neurons are seen as general system parameters which determine the the maximal output information (i.e. the approximation error) by the absolute amount and the relative distribution of information contained in the network. A new principle of optimal information distribution is proposed and the conditions for the optimal system parameters are derived. For the simple, instructive example of a linear approximation of a non-linear, quadratic function, the principle of optimal information distribution gives the the optimal system parameters, i.e. the number of neurons and the different resolutions of the variables.
One of the most interesting domains of feedforward networks is the processing of sensor signals. There do exist some networks which extract most of the information by implementing the maximum entropy principle for Gaussian sources. This is done by transforming input patterns to the base of eigenvectors of the input autocorrelation matrix with the biggest eigenvalues. The basic building block of these networks is the linear neuron, learning with the Oja learning rule. Nevertheless, some researchers in pattern recognition theory claim that for pattern recognition and classification clustering transformations are needed which reduce the intra-class entropy. This leads to stable, reliable features and is implemented for Gaussian sources by a linear transformation using the eigenvectors with the smallest eigenvalues. In another paper (Brause 1992) it is shown that the basic building block for such a transformation can be implemented by a linear neuron using an Anti-Hebb rule and restricted weights. This paper shows the analog VLSI design for such a building block, using standard modules of multiplication and addition. The most tedious problem in this VLSI-application is the design of an analog vector normalization circuitry. It can be shown that the standard approaches of weight summation will not give the convergence to the eigenvectors for a proper feature transformation. To avoid this problem, our design differs significantly from the standard approaches by computing the real Euclidean norm. Keywords: minimum entropy, principal component analysis, VLSI, neural networks, surface approximation, cluster transformation, weight normalization circuit.
For the efficient management of large image databases, the automated characterization of images and the usage of that characterization for searching and ordering tasks is highly desirable. The purpose of the project SEMACODE is to combine the still unsolved problem of content-oriented characterization of images with scale-invariant object recognition and modelbased compression methods. To achieve this goal, existing techniques as well as new concepts related to pattern matching, image encoding, and image compression are examined. The resulting methods are integrated in a common framework with the aid of a content-oriented conception. For the application, an image database at the library of the university of Frankfurt/Main (StUB; about 60000 images), the required operations are developed. The search and query interfaces are defined in close cooperation with the StUB project “Digitized Colonial Picture Library”. This report describes the fundamentals and first results of the image encoding and object recognition algorithms developed within the scope of the project.
In contrast to the symbolic approach, neural networks seldom are designed to explain what they have learned. This is a major obstacle for its use in everyday life. With the appearance of neuro-fuzzy systems which use vague, human-like categories the situation has changed. Based on the well-known mechanisms of learning for RBF networks, a special neuro-fuzzy interface is proposed in this paper. It is especially useful in medical applications, using the notation and habits of physicians and other medically trained people. As an example, a liver disease diagnosis system is presented.
In intensive care units physicians are aware of a high lethality rate of septic shock patients. In this contribution we present typical problems and results of a retrospective, data driven analysis based on two neural network methods applied on the data of two clinical studies. Our approach includes necessary steps of data mining, i.e. building up a data base, cleaning and preprocessing the data and finally choosing an adequate analysis for the medical patient data. We chose two architectures based on supervised neural networks. The patient data is classified into two classes (survived and deceased) by a diagnosis based either on the black-box approach of a growing RBF network and otherwise on a second network which can be used to explain its diagnosis by human-understandable diagnostic rules. The advantages and drawbacks of these classification methods for an early warning system are discussed.
Since the description of sepsis by Schottmüller in 1914, the amount on knowledge available on sepsis and its underlying pathophysiology has substantially increased. Epidemiologic examinations of abdominal septic shock patients show the potential for high risk posed by and the extensive therapy situation in the intensive care unit (ICU) (5). Unfortunately, until now it has not been possible to significantly reduce the mortality rate of septic shock, which is as high as 50-60% worldwide, although PROWESS' results (1) are encouraging. This paper summarizes the main results of the MEDAN project and their medical impacts. Several aspects are already published, see the references. The heterogeneity of patient groups and the variations in therapy strategies is seen as one of the main problems for sepsis trials. In the MEDAN multi-center study of 71 intensive care units in Germany, a group of 382 patients made up exclusively of abdominal septic shock patients who met the consensus criteria for septic shock (3) was analysed. For use within scores or stand-alone experiments variables are often studied as isolated variables, not as a multidimensional whole, e.g. a recent study takes a look at the role thrombocytes play (15). To avoid this limitation, our study compares several established scores (SOFA, APACHE II, SAPS II, MODS) by a multi-dimensional neuronal network analysis. For outcome prediction the data of 382 patients was analysed by using most of the commonly documented vital parameters and doses of medicine (metric variables). Data was collected in German hospitals from 1998 to 2001. The 382 handwritten patient records were transferred to an electronic database giving the amount of 2.5 million data entries. The metric data contained in the database is composed of daily measurements and doses of medicine. We used range and plausibility checks to allow no faulty data in the electronic database. 187 of the 382 patients are deceased (49 %).
The prevention of credit card fraud is an important application for prediction techniques. One major obstacle for using neural network training techniques is the high necessary diagnostic quality: Since only one financial transaction of a thousand is invalid no prediction success less than 99.9% is acceptable. Due to these credit card transaction proportions complete new concepts had to be developed and tested on real credit card data. This paper shows how advanced data mining techniques and neural network algorithm can be combined successfully to obtain a high fraud coverage combined with a low false alarm rate.
The prevention of credit card fraud is an important application for prediction techniques. One major obstacle for using neural network training techniques is the high necessary diagnostic quality: Since only one financial transaction of a thousand is invalid no prediction success less than 99.9% is acceptable. Due to these credit card transaction proportions complete new concepts had to be developed and tested on real credit card data. This paper shows how advanced data mining techniques and neural network algorithm can be combined successfully to obtain a high fraud coverage combined with a low false alarm rate.