Magnetoencephalography (MEG) measures neural activity non-invasively and with excellent temporal resolution. Since its invention (Cohen, 1968, 1972), MEG has proven to be a most valuable tool in neurocognitive (Salmelin et al., 1994) and clinical research (Stufflebeam et al., 2009; Van ’t Ent et al., 2003). MEG can measure rapid changes in electrophysiological neural signals related to sensory and cognitive processes. The magnetic fields measured outside the head by MEG directly reflect the cortical currents generated by the synchronised activity of thousands of neuronal sources. This distinguishes MEG from functional magnetic resonance imaging (fMRI), whose measurements are only indirectly related to electrophysiological activity through neurovascular coupling...
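Since the measured fields depend quasi-statically and linearly on the underlying source currents, this relation is commonly written as a lead-field equation. A minimal numerical sketch, with an entirely synthetic lead field and source amplitudes (assumptions for illustration, not data from this thesis), looks like this:

```python
import numpy as np

# Hedged sketch of the linear MEG forward model b = L @ q + noise, where L is
# the lead-field matrix (sensors x sources) and q the source amplitudes.
# All numbers below are synthetic placeholders, not data from this thesis.
rng = np.random.default_rng(0)

n_sensors, n_sources = 248, 5000                   # assumed sensor/source counts
L = rng.standard_normal((n_sensors, n_sources))    # placeholder lead field
q = np.zeros(n_sources)
q[[120, 1210, 4321]] = 1.0                         # three active sources

b = L @ q + 0.01 * rng.standard_normal(n_sensors)  # simulated sensor readings

# A minimum-norm style estimate of the sources from the sensor data:
q_hat = L.T @ np.linalg.solve(L @ L.T + 1e-3 * np.eye(n_sensors), b)
print(q_hat[[120, 1210, 4321]])
```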
The work presented in this thesis aims to raise the degree of automation in analog circuit design. To this end, a framework was developed that provides the mechanisms needed to carry out fully automated analog circuit synthesis, i.e., the construction of an analog circuit fulfilling all previously defined (electrical) specifications. Today, analog circuit design is a very time-consuming process compared to a digital design flow: due to its discrete nature, the digital design process is highly automated and therefore far more efficient. In modern Very-Large-Scale Integration (VLSI) circuits the analog parts usually occupy only a small portion of the overall chip area, yet this small portion is known to consume a major part of the required workforce. Combined with ever-shorter product cycles, the time needed to develop the analog parts of an integrated circuit (IC) becomes a decisive factor. Apart from this, the ongoing progress in semiconductor processing technologies promises more speed with less power consumption on smaller areas, forcing IC developers to keep pace with the technology nodes in order to remain competitive. Analog circuitry is inherently hard to reuse, as porting from one technology node to another imposes critical changes to the operating conditions (e.g., the supply voltage), usually leading to a full redesign of most analog modules. This productivity gap between digital and analog design represents the primary motivation for this thesis. Because commercial sizing tools are readily available, this work deliberately focuses on the construction of circuit topologies rather than on parameter synthesis, which can be obtained with a dedicated sizing tool. The focus on circuit construction enables a framework that supports full design space exploration. This thesis describes the concepts and methods needed to realize a deterministic, explorative analog synthesis framework. In addition, a reference implementation is presented, which demonstrates its applicability in current analog design flows.
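As a minimal sketch of what deterministic, explorative topology construction can look like, every combination of structural choices can be enumerated and checked against the specifications. The building blocks, the evaluate() stub and the specification values below are illustrative assumptions, not the framework described in this thesis:

```python
from itertools import product

# Hedged sketch of deterministic design-space exploration over circuit
# topologies: all combinations of structural choices are enumerated and
# evaluated against the specifications.
INPUT_STAGES  = ["nmos_diff_pair", "pmos_diff_pair"]
LOADS         = ["current_mirror", "cascode_mirror"]
OUTPUT_STAGES = ["none", "class_a", "class_ab"]

SPECS = {"gain_db": 60.0, "gbw_mhz": 10.0, "power_mw_max": 1.0}

def evaluate(topology):
    """Placeholder for sizing and simulation of one candidate topology.
    A real flow would hand the netlist to a dedicated sizing tool."""
    gain = 40.0 + 15.0 * (topology[1] == "cascode_mirror") \
                + 10.0 * (topology[2] != "none")
    return {"gain_db": gain, "gbw_mhz": 12.0, "power_mw": 0.8}

def meets_specs(perf):
    return (perf["gain_db"] >= SPECS["gain_db"]
            and perf["gbw_mhz"] >= SPECS["gbw_mhz"]
            and perf["power_mw"] <= SPECS["power_mw_max"])

candidates = product(INPUT_STAGES, LOADS, OUTPUT_STAGES)
passing = [t for t in candidates if meets_specs(evaluate(t))]
print(passing)
```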
Efficient algorithms for object recognition are crucial for new robotics and computer vision applications that demand real-time, online methods. Examples include autonomous systems, navigating robots, and autonomous driving. In this work, we focus on efficient semantic segmentation, which is the problem of labeling each pixel of an image with a semantic class.
Our aim is to speed up all parts of the semantic segmentation pipeline. We also aim to deliver a labeling solution within a time budget that can be decided on the fly. For this purpose, we analyze all components of the semantic segmentation pipeline and identify the computational bottleneck of each. The components of the pipeline are over-segmenting the image into local regions, extracting features and classifying the local regions, and the final inference of the image labeling with semantic classes. We address each of these steps.
First, we introduce a new superpixel algorithm to over-segment the image. Our superpixel method runs in real time and can deliver a solution at any time budget. Then, for feature extraction, we focus on the framework that computes descriptors and encodes them, followed by a pooling step. We find that the encoding step is the bottleneck, both in computational efficiency and in performance. We present a novel assignment-based encoding formulation that allows the design of a new, very efficient encoding. Finally, the image labeling is obtained by modeling the dependencies with a Conditional Random Field (CRF). In semantic image segmentation, the computational cost of instantiating the potentials is much higher than that of MAP inference. We introduce Active MAP inference to select on the fly a subset of potentials to be instantiated in the energy function, leave the rest unknown, and estimate the MAP labeling from this incomplete energy function.
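For illustration, a generic assignment-based encoding in the assign-then-pool style of such pipelines can be sketched as a hard assignment of each local descriptor to its nearest codeword, followed by average pooling. This is a minimal sketch of the general structure, not the more efficient formulation proposed in the thesis:

```python
import numpy as np

# Hedged sketch of a simple assignment-based encoding (hard assignment to the
# nearest codeword, i.e. a bag-of-words histogram with average pooling).
def encode(descriptors, codebook):
    """descriptors: (n, d) local features; codebook: (k, d) visual words."""
    # squared Euclidean distance of every descriptor to every codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assignment = d2.argmin(axis=1)                 # hard assignment step
    hist = np.bincount(assignment, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)             # pooling over the region

rng = np.random.default_rng(0)
codebook = rng.standard_normal((64, 128))          # k=64 words, d=128 descriptors
descriptors = rng.standard_normal((500, 128))      # descriptors from one region
print(encode(descriptors, codebook).shape)         # -> (64,)
```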
We perform experiments on all proposed methods for the different parts of the semantic segmentation pipeline. We show that our superpixel extraction achieves higher accuracy than the state of the art on standard superpixel benchmarks while running in real time. We test our feature encoding on standard image classification and segmentation benchmarks and show that our method achieves results competitive with the state of the art while requiring less time and memory. Finally, results on semantic segmentation benchmarks show that Active MAP inference achieves similar levels of accuracy with major efficiency gains.
Acceleration of Biomedical Image Processing and Reconstruction with FPGAs
Increasing chip sizes and better programming tools have made it possible to push the boundaries of application acceleration with reconfigurable computer chips. In this thesis the potential of acceleration with Field Programmable Gate Arrays (FPGAs) is examined for applications that perform biomedical image processing and reconstruction. The dataflow paradigm was used to port the analysis of image data for localization microscopy and for 3D electron tomography from an imperative description to the FPGA for the first time.
After the primitives of image processing on FPGAs are presented, a general workflow is given for analyzing imperative source code and converting it into a hardware pipeline in which every node processes image data in parallel. This theoretical foundation is then used to accelerate both example applications. For localization microscopy, a speed-up of 185x compared to an Intel i5 450 CPU was achieved, and electron tomography was sped up by a factor of 5 over an Nvidia Tesla C1060 graphics card, while maintaining full accuracy in both cases.
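The dataflow idea can be sketched in a few lines: rather than one imperative loop per processing step, every node consumes a stream and emits results immediately, so all nodes work concurrently once the pipeline is filled. The stage functions below are illustrative placeholders, not the kernels used in the thesis:

```python
# Hedged sketch of the dataflow paradigm using generators as pipeline nodes.
def source(image):
    for pixel in image:                 # stream pixels one by one
        yield pixel

def subtract_background(stream, background=10):
    for p in stream:
        yield max(p - background, 0)

def threshold(stream, level=50):
    for p in stream:
        yield 1 if p >= level else 0

image = [12, 80, 45, 200, 7, 90]
pipeline = threshold(subtract_background(source(image)))
print(list(pipeline))                   # results appear as data flows through
```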
Mathematical modeling of Arabidopsis thaliana with focus on network decomposition and reduction
(2014)
Systems biology has become an important research field during the last decade. It focuses on understanding the systems that generate the measured data. An important part of this research field is network analysis, the investigation of biological networks. An essential point in the inspection of these network models is their validation, i.e., the successful comparison of predicted properties with measured data. Here, Petri nets in particular have proven useful as a modeling technique, offering sound analysis methods and an intuitive representation of biological network data.
A very important tool for network validation is the analysis of the transition invariants (TIs), which represent possible steady-state pathways, and the investigation of the liveness property. The computational complexity of determining both the TIs and the liveness property often hampers their investigation.
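For reference, the standard definition of a transition invariant (textbook material, not specific to this thesis) can be stated as follows:

```latex
% For a Petri net with incidence matrix C (places x transitions), a
% transition invariant is a non-trivial, non-negative integer vector x with
\[
  C \cdot x = 0, \qquad x \in \mathbb{N}^{|T|},\; x \neq 0 ,
\]
% i.e. firing every transition t exactly x_t times reproduces the marking,
% which is why T-invariants describe possible steady-state pathways.
```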
To investigate this issue, a metabolic network model is created. It describes the core metabolism of Arabidopsis thaliana and is based solely on data from the literature. The model is too complex to determine the TIs and the liveness property directly.
Several strategies are followed to enable an analysis and validation of the network. A network decomposition is applied in two different ways: manually, motivated by the idea of preserving the integrity of biological pathways, and automatically, motivated by the idea of minimizing the number of crossing edges. As a decomposition may not preserve important properties such as coveredness, a network reduction approach is suggested and mathematically proven to conserve these properties. To deal with the large amount of data produced by the TI analysis, new organizational structures are proposed. The liveness property is investigated by reducing the complexity of the calculation method and adapting it to biological networks.
The results obtained by these approaches suggest a valid network model. In conclusion, the proposed approaches and strategies can be used in combination to allow the validation and analysis of highly complex biological networks.
The number of multilingual texts in the World Wide Web (WWW) is increasing dramatically, and a multilingual economic zone like the European Union (EU) requires the availability of multilingual Natural Language Processing (NLP) tools. Due to the rapid development of NLP tools, many lexical, syntactic, semantic and other linguistic features have been used in different NLP applications. However, there are situations in which these features cannot be used, due to the application type or the unavailability of NLP resources for some of the languages. That is why an application intended to handle multilingual texts must rely on features that do not depend on a particular language or on specific linguistic tools. In this thesis, we focus on two such applications: text readability and source and translation classification.
In this thesis, we provide 18 features that are not only suitable for both applications but are also independent of language and linguistic tools. In order to build a readability classifier, we use texts from three different languages: English, German and Bangla. Our proposed features achieve a classification accuracy comparable to that of a classifier using 40 linguistic features. The readability classifier achieves an F-score of 74.21% on the English Wikipedia corpus, 75.47% on the English textbook corpus, 86.46% on the Bangla textbook corpus and 86.26% on the German GEO/GEOLino corpus.
We used more than two million sentence pairs from 21 European languages in order to build the source and translation classifier. Using the same eighteen features, this classifier achieves a classification accuracy of 86.63%. We also used the same features to build a classifier that classifies translated texts based on their origin; it achieves a classification accuracy of 75% for texts from 10 European languages. In this thesis, we also provide four different corpora: three for text readability analysis and one for corpus-based translation studies.
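To make the idea of language- and tool-independent features concrete, a few surface statistics of this kind can be computed from raw text alone. The example features below are illustrative assumptions and are not the 18 features proposed in the thesis:

```python
# Hedged sketch of surface features that need no language-specific tools.
def surface_features(text):
    sentences = [s for s in text.replace("?", ".").replace("!", ".").split(".") if s.strip()]
    words = text.split()
    types = set(w.lower() for w in words)
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "type_token_ratio": len(types) / max(len(words), 1),
        "long_word_ratio": sum(len(w) > 6 for w in words) / max(len(words), 1),
    }

print(surface_features("The cat sat on the mat. It was warm. The cat slept."))
```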
Quarks and gluons are the building blocks of all hadronic matter, such as protons and neutrons. Their interaction is described by Quantum Chromodynamics (QCD), a theory under test by large-scale experiments like the Large Hadron Collider (LHC) at CERN and, in the future, the Facility for Antiproton and Ion Research (FAIR) at GSI. However, perturbative methods can only be applied to QCD at high energies. Studies from first principles are possible via a discretization onto a Euclidean space-time grid. This discretization of QCD is called Lattice QCD (LQCD) and is the only ab initio option outside of the high-energy regime. LQCD is extremely compute and memory intensive. In particular, it is by definition always bandwidth limited. Thus, despite the complexity of LQCD applications, it led to the development of several specialized compute platforms and influenced the development of others. In recent years, however, General-Purpose computation on Graphics Processing Units (GPGPU) has emerged as a new means for parallel computing. In contrast to machines traditionally used for LQCD, graphics processing units (GPUs) are a mass-market product. This promises advantages in both the pace at which higher-performing hardware becomes available and its price. CL2QCD is an OpenCL-based implementation of LQCD using Wilson fermions that was developed within this thesis. It operates on GPUs by all major vendors as well as on central processing units (CPUs). On the AMD Radeon HD 7970 it provides the fastest double-precision Dslash kernel (the Wilson-Dirac operator) for a single GPU, achieving 120 GFLOPS. Dslash, the most compute-intensive kernel in LQCD simulations, is commonly used to compare LQCD platforms. This performance is enabled by an in-depth analysis of optimization techniques for bandwidth-limited codes on GPUs. Furthermore, analysis of the communication between GPU and CPU, as well as between multiple GPUs, enables high-performance Krylov space solvers and linear scaling to multiple GPUs within a single system. LQCD calculations require a sampling of the phase space, which is performed by the hybrid Monte Carlo (HMC) algorithm. For this task, a single AMD Radeon HD 7970 GPU provides four times the performance of two AMD Opteron 6220 CPUs running an optimized reference code. The same advantage is achieved in terms of energy efficiency. In terms of normalized total cost of acquisition (TCA), GPU-based clusters match conventional large-scale LQCD systems; contrary to those, however, they can be scaled up from a single node. Examples of large GPU-based systems are LOEWE-CSC and SANAM. On both, CL2QCD has already been used in production for LQCD studies.
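A simple roofline-style estimate illustrates why such a kernel is bandwidth bound. The per-site flop and byte counts and the hardware figures below are illustrative assumptions, not measurements from the thesis:

```python
# Hedged roofline-style estimate for a bandwidth-limited stencil kernel such
# as Dslash: attainable performance is capped by bandwidth times arithmetic
# intensity. All figures are illustrative placeholders.
flops_per_site  = 1320.0        # assumed flops of a Wilson Dslash per lattice site
bytes_per_site  = 1440.0        # assumed bytes moved per lattice site

peak_flops      = 947e9         # assumed double-precision peak of the GPU (flop/s)
peak_bandwidth  = 264e9         # assumed memory bandwidth (byte/s)

intensity   = flops_per_site / bytes_per_site            # flop per byte
attainable  = min(peak_flops, peak_bandwidth * intensity)
print(f"arithmetic intensity: {intensity:.2f} flop/byte")
print(f"attainable performance: {attainable / 1e9:.0f} GFLOPS (bandwidth bound)")
```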
Local protein synthesis has redefined our ideas on the basic cellular mechanisms that underlie synaptic plasticity and memory formation. The population of messenger RNAs that are localised to dendrites, however, remains sparsely identified. Furthermore, neuronal morphological complexity and spatial compartmentalisation require efficient mechanisms for messenger RNA localisation and control over translational efficiency and transcript stability. 3’ untranslated regions (3’UTRs), downstream of stop codons, are recognised for providing binding platforms for many regulatory units and thus encode the regulation of these processes. The hippocampus, a part of the brain involved in the formation, organisation and storage of memories, provides a natural platform to investigate patterns of RNA localisation. The hippocampus comprises tissue layers which naturally separate the principal neuronal cell bodies from their processes (axons and dendrites). Identifying the full complement of localised transcripts and their associated 3’UTR isoforms is of great importance for understanding both basic neuronal functions and principles of synaptic plasticity. These findings can be used to study the properties of neuronal networks as well as to understand how these networks malfunction in neuronal diseases.
Here, deep sequencing is used to identify the mRNAs resident in the synaptic neuropil of the hippocampus. Analysis of a neuropil data set yields a list of 8,379 transcripts, of which 2,550 are localised in dendrites and/or axons. Using a fluorescent barcode strategy to label individual mRNAs shows that the relative abundance of different mRNAs in the neuropil varies over five orders of magnitude. High-resolution in situ hybridisation validated the presence of mRNAs in both cultured neurons and hippocampal slices. Among the many mRNAs identified, a large fraction encodes known synaptic proteins, including signalling molecules, scaffolds and receptors. These results reveal a previously unappreciated, enormous potential of the local protein synthesis machinery to supply, maintain and modify the dendritic and synaptic proteome.
Using advances in library preparation for next-generation sequencing experiments, the diversity of 3’UTR isoforms present in localised transcripts from the rat hippocampus is examined. The results indicate an increase in 3’UTR heterogeneity and 3’UTR length in neuronal tissue. The evolutionary importance of this 3’UTR diversity and its correlation with changes in species, tissue and cell complexity is investigated. The analysis reveals the population of 3’UTR isoforms required for transcript localisation in the overall neuronal transcriptome, as well as the regulatory elements and binding sites specific to neuronal compartments. The configuration of poly(A) signals correlates with gene function and can be further exploited to identify similar mechanisms of alternative polyadenylation.
The use of custom-specified methods for next-generation sequencing, as well as of novel approaches for RNA quantification and visualisation, necessitates the development and implementation of new downstream analysis methods. Library methods for data mining of transcript annotation, expression and ontology relations are provided. The use of a specialised search engine targeting key features of previous experiments is proposed. A processing pipeline for the NanoString technology is developed, assessing experimental quality and applying methods for data normalisation. High-resolution in situ images are analysed by a custom application, showing a correlation between RNA quantity and spatial distribution. The wide variety of bioinformatic methods included in this work indicates the importance of downstream analysis for reaching biological conclusions. Maintaining the integrability and modularity of our implementations is a high priority, as the dynamic nature of many experimental techniques requires constant improvement of the computational analysis.
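As an illustration of such a normalisation step, counts can be rescaled per sample against control probes. The probe names, the counts and the geometric-mean scaling factor below are assumptions for illustration, not the pipeline developed in the thesis:

```python
import math

# Hedged sketch of a count normalisation step of the kind used for
# NanoString-style data: each sample is scaled so that the geometric mean of
# its control probes matches the across-sample average.
def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

samples = {
    "sample_1": {"POS_A": 8000, "POS_B": 2000, "GeneX": 150, "GeneY": 40},
    "sample_2": {"POS_A": 4000, "POS_B": 1000, "GeneX": 90,  "GeneY": 25},
}
controls = ["POS_A", "POS_B"]

per_sample_gm = {s: geometric_mean([c[p] for p in controls]) for s, c in samples.items()}
reference = geometric_mean(list(per_sample_gm.values()))

normalised = {
    s: {probe: count * reference / per_sample_gm[s] for probe, count in counts.items()}
    for s, counts in samples.items()
}
print(normalised["sample_2"]["GeneX"])
```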
Driven by rapid technological advancements, the amount of data that is created, captured, communicated, and stored worldwide has grown exponentially over the past decades. Along with this development it has become critical for many disciplines of science and business to be able to gather and analyze large amounts of data. The sheer volume of the data often exceeds the capabilities of classical storage systems, with the result that current large-scale storage systems are highly distributed and composed of a large number of individual storage components. As with any other electronic device, the reliability of storage hardware is governed by certain probability distributions, which in turn are influenced by the physical processes used to store the information. The traditional way to deal with the inherent unreliability of combined storage systems is to replicate the data several times. Another popular approach to achieving failure tolerance is to calculate the block-wise parity in one or more dimensions. With a better understanding of the different failure modes of storage components, it has become evident that sophisticated high-level error detection and correction techniques are indispensable for the ever-growing distributed systems. The use of powerful cyclic error-correcting codes, however, comes with a high computational penalty, since the required operations over finite fields do not map well onto current commodity processors. This thesis introduces a versatile coding scheme with fully adjustable fault tolerance that is tailored specifically to modern processor architectures. To reduce stress on the memory subsystem, the conventional table-based algorithm for multiplication over finite fields has been replaced with a polynomial version. This arithmetically intense algorithm is better suited to the wide SIMD units of currently available general-purpose processors, but also shows significant benefits when used with modern many-core accelerator devices (for instance the popular general-purpose graphics processing units). A CPU implementation using SSE and a GPU version using CUDA are presented. The performance of the multiplication depends on the distribution of the polynomial coefficients in the finite field elements. This property has been used to create suitable matrices that generate a linear systematic erasure-correcting code which shows a significantly increased multiplication performance for the relevant matrix elements. Several approaches to obtaining the optimized generator matrices are elaborated and their implications are discussed. A Monte-Carlo-based construction method makes it possible to influence the specific shape of the generator matrices and thus to adapt them to special storage and archiving workloads. Extensive benchmarks on CPU and GPU demonstrate the superior performance and the future application scenarios of this novel erasure-resilient coding scheme.
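The idea of table-free, polynomial multiplication over a finite field can be sketched in a few lines of scalar code. The field GF(2^8) and the reducing polynomial 0x11B are assumptions chosen for illustration; the SSE and CUDA versions described in the thesis vectorise this kind of bitwise logic rather than using lookup tables:

```python
# Hedged scalar sketch of table-free multiplication in GF(2^8): carry-less
# polynomial multiplication followed by reduction modulo an irreducible
# polynomial (here the common 0x11B, an assumption for illustration).
IRREDUCIBLE = 0x11B  # x^8 + x^4 + x^3 + x + 1

def gf256_mul(a, b):
    product = 0
    while b:
        if b & 1:                 # add (XOR) a shifted copy for each set bit of b
            product ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:             # reduce as soon as the degree reaches 8
            a ^= IRREDUCIBLE
    return product

# Sanity check: x * x^7 = x^8 which reduces to x^4 + x^3 + x + 1 (0x1B)
assert gf256_mul(0x02, 0x80) == 0x1B
print(hex(gf256_mul(0x53, 0xCA)))  # 0x53 and 0xCA are inverses in this field -> 0x1
```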