Virtual machines are largely not used in high-energy physics (HEP) environments. Even though they provide a high degree of isolation, the performance overhead they introduce is too great. With the growing number of container technologies and their increasing isolation capabilities, HEP environments are evaluating whether they could use this technology. Container images are small and self-contained, which allows them to be easily distributed throughout the global environment. Containers also offer near-native performance while providing an often acceptable level of isolation. Only the required services and libraries are packed into an image, which is executed directly by the host kernel. This work compared the performance impact of three container technologies: Docker, rkt, and Singularity. The host kernel was additionally hardened with grsecurity and PaX to strengthen its security and make exploitation from inside a container harder. The execution time of a physics simulation was used as a benchmark. The results show that the container technologies differ in their performance impact. The performance loss on a stock kernel is small; in some cases, the containers were even faster than running without a container. Overall, Docker showed the best performance on a stock kernel. On a hardened kernel, the differences were larger than on a stock kernel, but in favor of the container technologies. rkt performed better than all the others in almost all cases.
Driven by rapid technological advancements, the amount of data that is created, captured, communicated, and stored worldwide has grown exponentially over the past decades. Along with this development, it has become critical for many disciplines of science and business to be able to gather and analyze large amounts of data. The sheer volume of the data often exceeds the capabilities of classical storage systems, so current large-scale storage systems are highly distributed and consist of a large number of individual storage components. As with any other electronic device, the reliability of storage hardware is governed by certain probability distributions, which in turn are influenced by the physical processes used to store the information. The traditional way to deal with the inherent unreliability of combined storage systems is to replicate the data several times. Another popular approach to achieving failure tolerance is to calculate block-wise parity in one or more dimensions. With a better understanding of the different failure modes of storage components, it has become evident that sophisticated high-level error detection and correction techniques are indispensable for ever-growing distributed systems. The use of powerful cyclic error-correcting codes, however, comes with a high computational penalty, since the required operations over finite fields do not map well onto current commodity processors. This thesis introduces a versatile coding scheme with fully adjustable fault tolerance that is tailored specifically to modern processor architectures. To reduce stress on the memory subsystem, the conventional table-based algorithm for multiplication over finite fields has been replaced with a polynomial version.
This arithmetically intensive algorithm is better suited to the wide SIMD units of currently available general-purpose processors, and it also shows significant benefits on modern many-core accelerator devices (for instance, the popular general-purpose graphics processing units). A CPU implementation using SSE and a GPU version using CUDA are presented. The performance of the multiplication depends on the distribution of the polynomial coefficients in the finite field elements. This property has been used to construct generator matrices for a linear systematic erasure-correcting code whose relevant matrix elements can be multiplied significantly faster. Several approaches to obtaining these optimized generator matrices are elaborated and their implications are discussed. A Monte Carlo-based construction method makes it possible to influence the specific shape of the generator matrices and thus to adapt them to particular storage and archiving workloads. Extensive benchmarks on CPU and GPU demonstrate the superior performance and potential future applications of this novel erasure-resilient coding scheme.
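To illustrate the difference between the two multiplication methods, here is a minimal Python sketch over GF(2^8). The field size and the reducing polynomial 0x11D are assumptions (the abstract names neither), and the scalar loop only mirrors what SSE or CUDA code would do across many bytes at once. `gf_mul_table` uses the conventional log/antilog tables; `gf_mul_poly` multiplies polynomials with shifts and XORs and also counts the XOR-accumulate steps, whose number equals the number of set coefficients in `b`.

```python
# Minimal sketch, assuming GF(2^8) with the reducing polynomial
# x^8 + x^4 + x^3 + x^2 + 1 (0x11D); the thesis's actual field is not stated.
POLY = 0x11D


def build_tables():
    """Build log/antilog tables for the conventional table-based method."""
    exp = [0] * 512
    log = [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x <<= 1
        if x & 0x100:
            x ^= POLY
    for i in range(255, 512):  # duplicate so LOG[a] + LOG[b] never wraps
        exp[i] = exp[i - 255]
    return exp, log


EXP, LOG = build_tables()


def gf_mul_table(a, b):
    """Table-based multiplication: memory lookups instead of arithmetic."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_mul_poly(a, b):
    """Polynomial (shift-and-XOR) multiplication with modular reduction.

    Returns (product, number of XOR-accumulate steps); the step count is the
    number of nonzero coefficients (set bits) of b.
    """
    result, xors = 0, 0
    while b:
        if b & 1:
            result ^= a
            xors += 1
        b >>= 1
        a <<= 1
        if a & 0x100:  # reduce modulo the field polynomial
            a ^= POLY
    return result, xors
```

One plausible reading of the coefficient dependence mentioned in the abstract: generator-matrix elements with few nonzero coefficients need fewer accumulate steps in the polynomial loop, so choosing such elements speeds up encoding.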
Acceleration of Biomedical Image Processing and Reconstruction with FPGAs
Increasing chip sizes and better programming tools have made it possible to push the boundaries of application acceleration with reconfigurable computer chips. This thesis examines the potential of Field Programmable Gate Arrays (FPGAs) for accelerating applications that perform biomedical image processing and reconstruction. The dataflow paradigm was used to port the analysis of image data for localization microscopy and for 3D electron tomography from an imperative description to the FPGA for the first time.
After the primitives of image processing on FPGAs are presented, a general workflow is given for analyzing imperative source code and converting it into a hardware pipeline in which every node processes image data in parallel. This theoretical foundation is then used to accelerate both example applications. For localization microscopy, a speedup of 185 over an Intel i5 450 CPU was achieved, and electron tomography was sped up by a factor of 5 over an Nvidia Tesla C1060 graphics card, while full accuracy was maintained in both cases.
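A typical image-processing primitive on FPGAs is the sliding window fed by line buffers: pixels stream in one per clock in raster order, and two row-length FIFOs supply the pixels directly above. The Python model below is a hypothetical illustration of that structure, not the thesis's own primitive set; the 3x3 sum stands in for whatever window operator a real pipeline node would compute, and border pixels are simply skipped.

```python
from collections import deque


def stream_3x3_sum(pixels, width):
    """Yield (row, col, value) for every 3x3 window centred on an interior
    pixel, consuming the image one pixel at a time in raster order."""
    line1 = deque()            # line buffer: the previous image row
    line2 = deque()            # line buffer: the row before that
    columns = deque(maxlen=3)  # shift register holding the last three columns
    for idx, pixel in enumerate(pixels):
        row, col = divmod(idx, width)
        # The oldest entry of a full line buffer is the pixel in the same
        # column one row (line1) or two rows (line2) above.
        middle = line1.popleft() if len(line1) == width else None
        top = line2.popleft() if len(line2) == width else None
        line1.append(pixel)
        if middle is not None:
            line2.append(middle)
        columns.append((top, middle, pixel))
        # Once three columns of the current row (row >= 2) are present,
        # the window is complete and one result is produced per pixel.
        if row >= 2 and col >= 2:
            total = sum(v for column in columns for v in column)
            yield row - 1, col - 1, total
```

In hardware, the two deques would map to block-RAM FIFOs and the sum to a pipelined adder tree, so one window result leaves the node per clock cycle.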
Detectors of modern high-energy physics experiments generate huge data rates during operation. Efficiently reading out this data from the front-end electronics is a sophisticated task, although the main challenges vary from experiment to experiment. The Compressed Baryonic Matter (CBM) experiment, currently under construction at the Facility for Antiproton and Ion Research (FAIR) in Darmstadt, Germany, foresees a novel approach to data acquisition.
Unlike previous comparable experiments that organize data read-out based on global, hierarchical trigger decisions, CBM is based on free-running and self-triggered front-end electronics. Data is pushed to the next stage of the read-out chain rather than pulled from the buffers of the previous stage. This new paradigm requires a completely new development of read-out electronics.
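In a push-based, self-triggered system, every hit carries its own timestamp, and a later stage must regroup the interleaved streams by time rather than by trigger. The Python sketch below is simplified and hypothetical: hits are `(timestamp, channel)` tuples, each input stream is assumed to be sorted by timestamp already, and slices have a fixed length, ignoring refinements such as overlap between neighbouring slices.

```python
import heapq


def build_time_slices(streams, slice_length):
    """Merge timestamp-sorted hit streams from several front-ends and group
    the hits into fixed-length time slices keyed by slice index."""
    slices = {}
    # heapq.merge lazily interleaves already-sorted inputs in global order.
    for timestamp, channel in heapq.merge(*streams):
        slices.setdefault(timestamp // slice_length, []).append((timestamp, channel))
    return slices


# Two hypothetical free-running channels pushing their hits independently.
print(build_time_slices([[(1, "a"), (12, "a")], [(3, "b"), (25, "b")]], 10))
```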
As one part of this thesis, firmware was implemented for a read-out controller that interfaces such a free-running, self-triggered front-end ASIC, the GET4 chip. The firmware was developed to run on a Field Programmable Gate Array (FPGA). An FPGA is an integrated circuit whose behavior can be reconfigured "in the field". This offers a lot of flexibility: bugs can be fixed and completely new features can be added, even after the hardware has been installed. Because of these general advantages, using FPGAs in the final experiment is desirable. However, FPGAs also have a drawback: the only affordable FPGAs today are based on either SRAM or Flash technology, and neither can easily be operated in a radiation environment.
SRAM-based devices suffer severely from Single Event Upsets (SEUs) and Flash-based FPGAs deteriorate too fast from Total Ionizing Dose (TID) effects.
Several radiation mitigation techniques exist for SRAM-based FPGAs, but each use case requires careful evaluation. For CBM, it is not clear whether the higher resource consumption of added redundancy, which translates more or less directly into additional cost, outweighs the advantages of using FPGAs. In addition, it is not even clear whether radiation mitigation techniques (e.g. scrubbing) that have already been used successfully in space applications work as efficiently at the much higher particle rates expected at CBM.
In this thesis, existing radiation mitigation techniques have been analyzed and suitable techniques have been implemented for the above-mentioned read-out controller. To minimize additional cost, redundancy was implemented only for selected parts of the design.
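Triple modular redundancy (TMR) with majority voting is one common form of the redundancy mentioned here; the abstract does not say which techniques were selected, so the following Python model is only an illustration. The function names are hypothetical; in an FPGA the voter is combinational logic and the write-back happens on the next clock edge.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority vote over three redundant copies."""
    return (a & b) | (a & c) | (b & c)


def vote_and_repair(copies):
    """Vote over three register copies, report which copy disagreed, and
    write the voted value back to all copies so that a single upset cannot
    combine with a later upset in another copy."""
    voted = majority(*copies)
    disagreeing = [i for i, value in enumerate(copies) if value != voted]
    return voted, [voted, voted, voted], disagreeing
```

Because each protected register roughly triples in size and gains a voter, applying TMR only to selected parts of a design is one way to keep the extra cost bounded.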
Finally, the radiation-mitigated read-out controller was tested by mounting the device directly in a particle beam at Forschungszentrum Jülich. The tests show that the mitigation provided by the implemented techniques remains effective, even at a very high particle flux and with only part of the design protected by costly redundancy.
The promising results of the in-beam tests support using FPGAs in the read-out chain of the CBM-ToF detector.
Time-critical applications process a continuous stream of input data and have to meet specific timing constraints. A common approach to ensuring that such an application satisfies its constraints is over-provisioning: the application is deployed in a dedicated cluster environment with enough processing power to achieve the target performance for every specified data input rate. This approach has a drawback: at times of decreased data input rates, the cluster resources are not fully utilized. A typical use case is the HLT-Chain application, which processes physics data at runtime of the ALICE experiment at CERN. From the perspective of cost and efficiency, it is desirable to exploit temporarily unused cluster resources. Existing approaches pursue that goal by running additional applications. These approaches, however, a) lack the flexibility to dynamically grant the time-critical application the resources it needs, b) are insufficient for isolating the time-critical application from harmful side effects introduced by additional applications, or c) are not general because application-specific interfaces are used. In this thesis, a software framework is presented that makes it possible to exploit unused resources in a dedicated cluster without harming a time-critical application. Additional applications are hosted in Virtual Machines (VMs), and unused cluster resources are allocated to these VMs at runtime. To avoid resource bottlenecks, the resource usage of the VMs is dynamically adjusted according to the needs of the time-critical application. For this purpose, a number of methods that had not previously been combined are used. On a global level, appropriate VM manipulations such as hot migration, suspend/resume, and start/stop are determined by an informed search heuristic and applied at runtime. Locally on the cluster nodes, a feedback-controlled adaptation of VM resource usage is carried out in a decentralized manner.
Using this framework, a cluster's utilization can be increased by running additional applications while negative impact on a time-critical application is prevented. This capability of the framework is shown for the HLT-Chain application: in an empirical evaluation, the cluster CPU usage was increased from 49% to 79%, additional results were computed, and no negative effects on the HLT-Chain application were observed.
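The abstract does not specify the local controller, so the following Python sketch only illustrates the general idea of feedback-controlled VM resource adaptation. It assumes a simple integral-action loop that uses the time-critical application's processing latency as the measured variable and a per-node VM CPU cap (in percent) as the actuator. All names, the gain, and the linear node model in `simulate` are hypothetical.

```python
def adjust_vm_cap(cap, measured_latency, target_latency, gain,
                  cap_min=0.0, cap_max=100.0):
    """One control period of an integral-action feedback loop.

    `cap` is the CPU share (percent) granted to the VMs on this node. If the
    time-critical application runs faster than its target, the VMs get more
    CPU; if it runs slower, they get less. The result is clamped.
    """
    error = target_latency - measured_latency  # > 0 means spare headroom
    return max(cap_min, min(cap_max, cap + gain * error))


def simulate(periods, cap=0.0, target=10.0, gain=2.0):
    """Toy closed loop with a hypothetical node model in which the
    time-critical application's latency grows linearly with the VM cap."""
    for _ in range(periods):
        latency = 5.0 + 0.1 * cap  # hypothetical plant model (ms)
        cap = adjust_vm_cap(cap, latency, target, gain)
    return cap
```

With this toy model the loop settles at the cap where the latency exactly meets its target; a real node would need a measured model and a gain chosen for stability.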
The anan project is a tool for debugging distributed high-performance computing systems. The novelty of the contribution is that well-known methods, already used successfully for debugging software and hardware, have been transferred to high-performance computing. In the course of this work, a tool named anan was implemented that assists with debugging. It can also be used as a more dynamic form of monitoring. Both use cases have been tested.
The tool consists of two parts:
1. a part named anan, which is operated interactively by the user,
2. and a part named anand, which automatically collects the requested measurements and executes commands where necessary.
The anand part runs sensors (small pattern-driven algorithms) whose results are merged by anan. To a first approximation, anan can be described as a monitoring system that (1) can be reconfigured quickly and (2) can measure more complex values that go beyond correlations of simple time series.
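The abstract does not describe the sensor interface, so the following is a purely hypothetical Python illustration of the division of labour: per-node sensors evaluate a pattern on local data, and an interactive front end merges their results. Interpreting "pattern-driven" as regular-expression matching on log lines is an assumption, and `make_pattern_sensor`, `run_on_nodes`, and `merge_results` are invented names.

```python
import re


def make_pattern_sensor(pattern):
    """Hypothetical sensor: count the lines that match a regular expression."""
    regex = re.compile(pattern)

    def sensor(lines):
        return sum(1 for line in lines if regex.search(line))

    return sensor


def run_on_nodes(sensor, node_logs):
    """Stand-in for the per-node daemons: run the sensor on each node's data."""
    return {node: sensor(lines) for node, lines in node_logs.items()}


def merge_results(results):
    """Stand-in for the interactive side: total count and the worst node."""
    return sum(results.values()), max(results, key=results.get)
```

Reconfiguring the monitoring then amounts to shipping a new pattern to the daemons rather than redeploying a fixed set of metrics.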
Programmable hardware in the form of FPGAs has found its place in various high energy physics experiments over the past few decades. These devices provide highly parallel and fully configurable data transport, data formatting, and data processing capabilities with custom interfaces, even in rigid or constrained environments. Additionally, FPGA functionality and logic resources have grown exponentially in the last few years, making FPGAs more and more suitable for complex data processing tasks. ALICE is one of the four main experiments at the LHC and specializes in the study of heavy-ion collisions. The readout chain of the ALICE detectors uses FPGAs in various places. The Read-Out Receiver Cards (RORCs) are one example of FPGA-based readout hardware, forming the interface between the custom detector electronics and the commercial server nodes in the data processing clusters of the Data Acquisition (DAQ) system and the High Level Trigger (HLT). These boards are implemented as server plug-in cards with serial optical links to the detectors. Experimental data is received via more than 500 optical links, partly pre-processed in the FPGAs, and pushed to the host machines. Computer clusters consisting of a few hundred nodes collect, aggregate, compress, reconstruct, and prepare the experimental data for permanent storage and later analysis. With the end of the first LHC run period in 2012 and the start of Run 2 in 2015, the DAQ and HLT systems were renewed and several detector components were upgraded for higher data and event rates. Increased detector link rates and obsolete host interfaces made it impossible to reuse the previous RORCs in Run 2.
This thesis describes the development, integration, and maintenance of the next generation of RORCs for ALICE in Run 2. A custom hardware platform, initially developed in the course of this work as a joint effort between the ALICE DAQ and HLT groups, found its place in the Run 2 readout systems of the ALICE and ATLAS experiments. The hardware fulfills all experiment requirements, matches its target performance, and has been running stably in the production systems since the start of Run 2. Firmware and software development for the hardware evaluation, the design of the board, the mass-production hardware tests, and the operation of the final board in the HLT were carried out as part of this work. 74 boards were integrated into the HLT hardware and software infrastructure, with various firmware and software developments, to provide the main experimental data input and output interface of the HLT for Run 2. The hardware cluster finder, an FPGA-based data pre-processing core from the previous generation of RORCs, was ported to the new hardware and has been improved and extended to meet the experimental requirements throughout Run 2. The throughput of this firmware component was doubled and the algorithm extended, providing improved noise rejection and a higher overall mean data compression ratio than its previous implementation. The hardware cluster finder is a crucial component of the HLT data reconstruction and compression scheme: the processing performance of one board is equivalent to around ten server nodes performing comparable processing steps in software.
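To make the term "cluster finder" concrete, here is a simplified, hypothetical Python illustration of the kind of computation involved: ADC samples above a noise threshold that touch each other on a pad-by-time grid are grouped into clusters, and each cluster is reduced to its total charge and charge-weighted centre of gravity. This is not the ALICE hardware cluster finder algorithm, which processes data in a streaming fashion inside the FPGA; the grid layout and threshold handling are assumptions.

```python
def find_clusters(adc, threshold):
    """Group neighbouring (4-connected) samples above `threshold` on a
    pad x time grid and return (total charge, mean pad, mean time) for
    each cluster, with charge-weighted means (centre of gravity)."""
    n_pads = len(adc)
    n_time = len(adc[0]) if n_pads else 0
    seen = [[False] * n_time for _ in range(n_pads)]
    clusters = []
    for p in range(n_pads):
        for t in range(n_time):
            if seen[p][t] or adc[p][t] <= threshold:
                continue  # already used, or rejected as noise
            stack = [(p, t)]
            seen[p][t] = True
            q = q_pad = q_time = 0
            while stack:
                i, j = stack.pop()
                charge = adc[i][j]
                q += charge
                q_pad += charge * i
                q_time += charge * j
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < n_pads and 0 <= nj < n_time
                            and not seen[ni][nj] and adc[ni][nj] > threshold):
                        seen[ni][nj] = True
                        stack.append((ni, nj))
            clusters.append((q, q_pad / q, q_time / q))
    return clusters
```

Replacing many raw samples by a few numbers per cluster is what turns such a step into a data compression stage.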
The work on the firmware development, especially on the hardware cluster finder, demonstrated once more that developing and maintaining data processing algorithms with the common low-level hardware description methods is tedious and time-consuming. Therefore, a high-level synthesis (HLS) hardware description method that applies dataflow computing at an algorithmic level to FPGAs was evaluated in this context. The hardware cluster finder served as an example of a typical data processing algorithm in a high energy physics readout application. The existing, highly optimized low-level implementation provided a reference for comparisons in terms of throughput and resource usage. The cluster finder algorithm could be implemented in the dataflow description with comparatively little effort, providing fast development cycles, compact code, and, at the same time, simpler extension and maintenance. The performance in terms of throughput and resource usage is comparable to that of the manual implementation. The dataflow environment proved highly valuable for design space exploration. An integration of the dataflow description into the HLT firmware and software infrastructure was demonstrated as a proof of concept. A high-level hardware description could ease design space exploration, initial development, maintenance, and extension of hardware algorithms for high energy physics readout applications.
Deep learning and isolation based security for intrusion detection and prevention in grid computing
(2018)
The use of distributed computational resources to solve scientific problems that require highly intensive data processing is a fundamental mechanism of modern scientific collaborations. The Worldwide Large Hadron Collider Computing Grid (WLCG) is one of the most important examples of a distributed infrastructure for scientific projects and one of the pioneering examples of grid computing. The WLCG is the global grid that analyzes data from the Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN), with 170 sites in 40 countries and more than 600,000 processing cores. The grid service providers grant users access to resources that they can use on demand to execute custom software applications for data analysis. The code that users can execute is completely flexible, and commonly there are no significant restrictions. This flexibility and the availability of immense computing power increase the security challenges of these environments. Attackers are a concern for grid administrators: they may request the execution of software with malicious code that allows them to compromise the underlying institutions’ infrastructure. Grid systems need security countermeasures that keep user code running without allowing access to critical components, while still retaining flexibility. The administrators of grid systems also need to continuously monitor the activities that the applications carry out. An analysis of these activities is necessary to detect possible security issues, to identify ongoing incidents, and to perform autonomous responses. The size and complexity of grid systems make manual security monitoring and response expensive and complicated for human analysts.
Legacy intrusion detection and prevention systems (IDPS) such as Snort and OSSEC are traditionally used for security incident monitoring in grids, clouds, clusters, and standalone systems. However, such IDPS are limited by their reliance on hard-coded, fixed rules that need to be updated continuously to cope with different threats.
This thesis introduces an architecture for improving security in grid computing. The architecture combines security by isolation, behavior monitoring, and deep learning (DL) for the classification of real-time traces of the running user payloads, also known as grid jobs. The first component of the proposal, Linux containers (LCs), is used to isolate grid jobs from one another and to gather specific traceable information about the behavior of individual jobs. LCs offer a safe environment for executing arbitrary user scripts or binaries, protecting the sensitive components of the grid member organizations. Containers are a software sandboxing technique and form a lightweight alternative to other technologies such as virtual machines (VMs), which usually implement full machine-level emulation and can therefore significantly affect performance. This performance loss is commonly unacceptable in high-throughput computing scenarios. Containers enable the collection of monitoring information from the processes running inside them. The data collected through LC monitoring is used to feed a DL-based IDPS.
DL methods can acquire knowledge from experience, which eliminates the need for operators to formally specify all the knowledge that a system requires. These methods can improve IDPS by building models that detect security incidents automatically and can generalize to new classes of issues. DL can produce lower false positive rates for intrusion detection and also provides a measure of false negatives, which can be reduced with new training data. Convolutional neural networks (CNNs) are used to distinguish between regular and malicious job classes. A set of samples is collected from regular production grid jobs on the grid infrastructure of “A Large Ion Collider Experiment” (ALICE) and from malicious Linux binaries obtained from a malware research website. The features extracted from these samples are used to train and validate the machine learning (ML) models. A generative approach to enlarging the training data is also proposed: recurrent neural networks (RNNs) are used as generative models to simulate training data that complements and improves the real collected dataset. This data augmentation strategy helps compensate for the lack of training data in ML processes.
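As a rough illustration of what a convolutional classifier does with a monitoring trace, the Python sketch below runs one forward pass: a 1D convolution, ReLU, global max pooling, and a logistic output. It assumes the trace is a numeric sequence (for example, hypothetical per-interval counts of monitored events); the kernels and weights are hand-picked rather than trained, and the real models, features, and layer sizes used in the thesis are not given in the abstract.

```python
import math


def conv1d(signal, kernel, bias=0.0):
    """'Valid' 1D convolution as computed by a CNN layer (a cross-correlation:
    the kernel is not flipped)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]


def relu(values):
    return [max(0.0, v) for v in values]


def cnn_features(trace, kernels):
    """One convolutional layer + ReLU + global max pooling: each kernel
    yields one feature, its strongest response anywhere in the trace."""
    return [max(relu(conv1d(trace, kernel))) for kernel in kernels]


def malicious_probability(features, weights, bias):
    """Dense output layer with a logistic (sigmoid) activation."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical trace and untrained, hand-picked parameters.
trace = [0, 1, 0, 7, 8, 0, 1]
kernels = [[-1.0, 1.0], [1.0, -1.0]]  # "rising" and "falling" edge detectors
features = cnn_features(trace, kernels)
print(features, malicious_probability(features, [0.1, 0.1], -1.0))
```

Training would adjust `kernels`, `weights`, and `bias` from labelled samples; max pooling makes the features insensitive to where in the trace a suspicious pattern occurs.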
...
A Large Ion Collider Experiment (ALICE) is one of the four large experiments at the Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN). ALICE focuses on the physics of the strong interaction and in particular on the Quark-Gluon Plasma, a state of matter in which quarks are deconfined and which is believed to have existed in the earliest moments of the evolution of the universe. The ALICE detector studies the products of collisions between heavy nuclei, between protons, and between protons and heavy nuclei. The sub-detector closest to the interaction point is the Inner Tracking System (ITS), which is used to measure the momentum and trajectory of the particles generated by the collisions and allows the reconstruction of primary and secondary interaction vertices. The ITS needs accurate spatial resolution together with a low material budget, which limits multiple scattering of low-energy particles so that their trajectories can be reconstructed precisely. During Long Shutdown 2 (2019-2020) of the LHC, the current ITS will be replaced by a completely redesigned sub-detector, which will improve the readout rate and the particle tracking performance, especially at low momentum.
The ALice PIxel DEtector (ALPIDE) chip was designed to meet the requirements of the upgraded ITS in terms of resolution, material budget, radiation hardness, and readout rate. The ALPIDE chip is a Monolithic Active Pixel Sensor (MAPS) realised in Complementary Metal-Oxide Semiconductor (CMOS) technology. The sensing element, the analogue front-end, and the digital readout are integrated into the same silicon die. The readout architecture of the new ITS foresees that data is transmitted via a high-speed serial link directly from the ALPIDE to the off-detector electronics. The data is transmitted off-chip by a so-called Data Transmission Unit (DTU), which needs to be tolerant to radiation-induced Single-Event Effects in order to guarantee reliable operation. The ALPIDE chip will operate in a radiation field with a peak high-energy hadron flux of 7.7 × 10^5 cm^-2 s^-1.
The ALPIDE sends its data over copper cables to the readout system, which aggregates them and re-transmits them via optical fibres to the counting room. The position of the readout electronics is constrained by the maximum transmission distance reasonably achievable by the ALPIDE Data Transmission Unit and by the mechanical constraints of the ALICE experiment. The radiation field at that location is not negligible in its effects on electronics: the high-energy hadron flux can reach 10^3 cm^-2 s^-1. Static RAM (SRAM)-based Field Programmable Gate Arrays (FPGAs) are favoured over Application Specific Integrated Circuits (ASICs) or Radiation Hard by Design (RHBD) commercial devices because of their cost effectiveness. Moreover, SRAM-based FPGAs are reconfigurable and provide the data throughput required by the ITS. The main issue with SRAM-based FPGAs in the intended application is the susceptibility of their Configuration RAM (CRAM) to Single-Event Upsets, since the CRAM bits far outnumber the logic they configure. The Total Ionizing Dose (TID) at the planned readout position, by contrast, is still acceptable for Components Off The Shelf (COTS), provided that proper verification is carried out.
This dissertation focuses on two parts of the design of the readout system: the Data Transmission Unit of the ALPIDE chip and the design of fundamental modules for the SRAM-based FPGA of the readout electronics. In the first part, a module of the Data Transmission Unit is designed, optimising the trade-off between power consumption, radiation tolerance, and jitter performance. The design was tested and thoroughly characterised, including tests under irradiation with 30 MeV protons. Furthermore, the Data Transmission Unit performance was validated after integration into the first prototypes of ITS modules. In the second part, the problem of developing a radiation-tolerant SRAM-based FPGA design is investigated and a solution is provided. First, a general methodology for designing radiation-tolerant Finite State Machines in SRAM-based FPGAs is analysed, implemented, and verified. Then, the radiation-tolerant FPGA design for the ITS readout is described, together with the radiation effects mitigation techniques that were selectively applied to the different modules. The design was tested in multiple irradiation tests, and the results are reported.
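One widely used way to make an FSM tolerant to single-event upsets is to encode the states as codewords with a pairwise Hamming distance of at least 3 and to map the state register back to the nearest legal code every cycle, which corrects any single bit flip. The abstract does not say which methodology the thesis adopts, so this Python model is only an illustration; the four state names, the 5-bit codes, and the transition table are hypothetical.

```python
# Hypothetical 4-state FSM; the 5-bit codes have pairwise Hamming distance >= 3.
STATE_CODES = {
    "IDLE": 0b00000,
    "READ": 0b01011,
    "SEND": 0b10101,
    "WAIT": 0b11110,
}
TRANSITIONS = {"IDLE": "READ", "READ": "SEND", "SEND": "WAIT", "WAIT": "IDLE"}


def hamming(a, b):
    """Number of differing bits between two register values."""
    return bin(a ^ b).count("1")


def decode_state(register):
    """Map a possibly upset state register to the nearest legal code.
    With minimum distance 3, any single bit flip is corrected."""
    return min(STATE_CODES, key=lambda name: hamming(STATE_CODES[name], register))


def next_state_register(register):
    """Next-state logic that corrects the current state before using it."""
    return STATE_CODES[TRANSITIONS[decode_state(register)]]
```

The cost is a wider state register and extra decoding logic, which is why such protection is typically applied only to control paths where a corrupted state would be most harmful.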
Conceptual design of an ALICE Tier-2 centre integrated into a multi-purpose computing facility
(2012)
This thesis discusses the issues and challenges associated with the design and operation of a data analysis facility for a high-energy physics experiment at a multi-purpose computing centre. The focus is on a Tier-2 centre of the distributed computing model of the ALICE experiment at the Large Hadron Collider at CERN in Geneva, Switzerland. The design steps examined in the thesis include the analysis and optimization of the I/O access patterns of the user workload, the integration of the storage resources, and the development of techniques for effective system administration and operation of the facility in a shared computing environment. A number of I/O access performance issues on multiple levels of the I/O subsystem, introduced by the use of hard disks for data storage, have been addressed by means of exhaustive benchmarking and thorough analysis of the I/O of the user applications in the ALICE software framework. Defining the set of requirements for the storage system, describing the potential performance bottlenecks and single points of failure, and examining possible ways to avoid them make it possible to develop guidelines for choosing how to integrate the storage resources. A solution for preserving an experiment-specific software stack in a shared environment is presented along with its effects on the user workload performance. The thesis concludes with a proposal for a flexible model to deploy and operate the ALICE Tier-2 infrastructure and applications in a virtual environment through the adoption of cloud computing technology and the 'Infrastructure as Code' concept. Scientific software applications can run efficiently in a virtual environment, and there is an urgent need to adapt the infrastructure for effective use of cloud resources.
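A first step in analysing I/O access patterns is often to classify a trace of read requests as sequential or random, since hard disks handle the two very differently. The Python sketch below is a hypothetical illustration of such a metric; the trace format `(file_id, offset, length)` is an assumption, and the abstract does not state which metrics the thesis used.

```python
def sequential_fraction(requests):
    """Fraction of read requests that start exactly where the previous
    request on the same file ended.

    `requests` is a list of (file_id, offset, length) tuples in issue order.
    """
    last_end = {}
    sequential = 0
    for file_id, offset, length in requests:
        if last_end.get(file_id) == offset:
            sequential += 1
        last_end[file_id] = offset + length
    return sequential / len(requests) if requests else 0.0


# Hypothetical trace: two files read in an interleaved fashion.
trace = [("a", 0, 100), ("a", 100, 100), ("b", 0, 50), ("a", 500, 10), ("b", 50, 50)]
print(sequential_fraction(trace))  # 0.4
```

A low fraction combined with small request sizes indicates a seek-bound workload, which is the kind of finding that motivates changes to read-ahead, caching, or the storage layout.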
On development, feasibility, and limits of highly efficient CPU and GPU programs in several fields
(2013)
With processor clock speeds having stagnated, parallel computing architectures have achieved a breakthrough in recent years. Emerging many-core processors such as graphics cards run hundreds of threads in parallel, and vector instructions are experiencing a revival. Parallel processors with many independent but simple arithmetic logic units cannot execute serial tasks efficiently. However, their sheer parallel processing power makes them predestined for parallel applications, while the simple construction of their cores makes them exceptionally power efficient. Unfortunately, old programs cannot benefit from simple recompilation. Adaptation often requires rethinking and modifying algorithms to make use of parallel execution. Many applications have some serial subroutines that are very hard to parallelize; hence, contemporary compute clusters are often heterogeneous, offering fast processors for serial tasks and parallel processors for parallel tasks. To avoid wasting the available compute power, highly efficient programs are mandatory.
This thesis is about the development of fast algorithms and their implementation on modern CPUs and GPUs, about the maximum achievable efficiency with respect to peak performance and power consumption, respectively, and about the feasibility and limits of programs for CPUs, GPUs, and heterogeneous systems. Three totally different applications from distinct fields, developed in the course of this thesis, are presented.
The ALICE experiment at the LHC particle collider at CERN studies heavy-ion collisions at high rates of several hundred Hz, and every collision produces thousands of particles whose trajectories must be reconstructed. For this purpose, ALICE track reconstruction and ALICE track merging have been adapted for GPUs and deployed on 64 GPU-enabled compute nodes at CERN.
After a testing phase, the tracker ran in nonstop operation during 2012, providing full real-time track reconstruction. The tracker employs a multithreaded pipeline as well as asynchronous data transfer to ensure continuous GPU utilization, and it outperforms the fastest available CPUs by about a factor of three.
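The pipelining idea can be sketched independently of any GPU API: each stage runs in its own thread and passes work through bounded queues, so that, for example, the transfer of the next event overlaps with processing of the current one. The Python code below is a hypothetical structural model only; the stage functions are placeholders, and Python threads do not reproduce the actual GPU concurrency or asynchronous DMA transfers.

```python
import queue
import threading


def run_pipeline(items, stages, depth=2):
    """Run each stage in its own thread, linked by bounded queues, so that
    stage k can process item i+1 while stage k+1 is still busy with item i.
    Output order equals input order because every stage is a single FIFO
    worker."""
    done = object()  # end-of-stream marker
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]

    def worker(func, q_in, q_out):
        while True:
            item = q_in.get()
            if item is done:
                q_out.put(done)
                return
            q_out.put(func(item))

    def feeder():
        for item in items:
            queues[0].put(item)
        queues[0].put(done)

    threads = [threading.Thread(target=worker, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(stages)]
    threads.append(threading.Thread(target=feeder))
    for t in threads:
        t.start()

    results = []
    while (out := queues[-1].get()) is not done:
        results.append(out)
    for t in threads:
        t.join()
    return results


# Hypothetical stages standing in for host-to-GPU transfer, GPU tracking,
# and GPU-to-host transfer of one event each.
print(run_pipeline(range(5), [lambda e: e, lambda e: e * 10, lambda e: e + 1]))
# prints [1, 11, 21, 31, 41]
```

The bounded queues keep memory use fixed and apply back-pressure: a fast stage blocks instead of running arbitrarily far ahead of a slow one.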
The Linpack benchmark is the standard tool for ranking compute clusters. It solves a dense system of linear equations, primarily using matrix multiplication performed by a routine called DGEMM. A heterogeneous GPU-enabled version of DGEMM and Linpack has been developed that can use the CAL, CUDA, and OpenCL APIs as backends. With this implementation, the LOEWE-CSC cluster ranked 22nd in the November 2010 Top500 list of the fastest supercomputers, and the Sanam cluster achieved second place in the November 2012 Green500 list of the most power-efficient supercomputers. An elaborate lookahead algorithm, a pipeline, and asynchronous data transfer hide the serial CPU-bound tasks of Linpack behind DGEMM execution on the GPU, reaching the highest efficiency on GPU-accelerated clusters.
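For reference, DGEMM computes C ← alpha·A·B + beta·C in double precision. The plain Python sketch below shows these semantics with a simple tiled loop nest (the tile size is an arbitrary assumption, and transpose options are omitted); it is not the heterogeneous GPU implementation, which would dispatch such tiles to the GPU. In Linpack's blocked LU factorization, the trailing-submatrix update A22 ← A22 - L21·U12 is such a call with alpha = -1 and beta = 1.

```python
def dgemm(alpha, A, B, beta, C, tile=2):
    """Return alpha*A*B + beta*C (BLAS DGEMM semantics, no transposes),
    computed tile by tile. A is m x k, B is k x n, C is m x n (lists of lists)."""
    m, k, n = len(A), len(B), len(B[0])
    out = [[beta * C[i][j] for j in range(n)] for i in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                # Multiply one tile of A by one tile of B and accumulate.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = 0.0
                        for p in range(p0, min(p0 + tile, k)):
                            acc += A[i][p] * B[p][j]
                        out[i][j] += alpha * acc
    return out
```

Tiling keeps each block of A and B in fast memory while it is reused, which is the same reason GPU DGEMM kernels work on tiles held in shared memory or registers.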
Failure erasure codes enable failure-tolerant data storage and real-time failover, ensuring that servers and even complete data centers remain operational in case of a hardware defect. They are an absolute necessity for present-day computer infrastructure. The mathematical theory behind the codes involves matrix computations over finite fields, which are not natively supported by modern processors and are hence computationally very expensive. This thesis presents a novel scheme for fast encoding-matrix generation and demonstrates a fast implementation of the encoding itself, which uses exclusively either integer or logical vector instructions. Depending on the scenario, the implementation reaches a different hard limit of the hardware: the maximum attainable memory bandwidth, the peak instruction throughput, or the PCI Express bandwidth limit when GPUs or FPGAs are used.
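One way finite-field encoding can be reduced to purely logical instructions: multiplication by a fixed field element is linear over GF(2), so it can be written as a binary matrix and applied with AND and XOR alone. The Python sketch below demonstrates this for GF(2^8); the reducing polynomial 0x11D is an assumption, and the abstract does not say the thesis uses exactly this bit-matrix construction.

```python
POLY = 0x11D  # assumed reducing polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Reference GF(2^8) multiplication (shift-and-XOR with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return result


def bit_matrix(c):
    """8x8 matrix over GF(2) for 'multiply by constant c': column j is the
    bit pattern of c * x^j, and M[i][j] is bit i of that column."""
    cols = [gf_mul(c, 1 << j) for j in range(8)]
    return [[(cols[j] >> i) & 1 for j in range(8)] for i in range(8)]


def apply_bit_matrix(M, v):
    """Multiply using only AND and XOR: output bit i is the parity of
    row i ANDed with the input bits."""
    out = 0
    for i in range(8):
        bit = 0
        for j in range(8):
            bit ^= M[i][j] & ((v >> j) & 1)
        out |= bit << i
    return out
```

If the data is bit-sliced (bit j of many bytes stored together in one machine word or SIMD register), each row of the matrix becomes a short sequence of whole-register XORs, which is what makes this approach attractive on wide vector units.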
The thesis demonstrates that, in most cases, GPU implementations can be as efficient as their CPU counterparts with respect to the available peak performance.
With respect to cost or power consumption, they are much more efficient. To achieve this, complex tasks must be split into serial and parallel parts, and the execution must be pipelined such that the CPU-bound tasks are hidden behind GPU execution. A few cases are identified where this is not possible due to PCI Express limitations, or not reasonable because practical GPU languages are missing.
The constantly increasing memory density and performance of recent Field Programmable Gate Arrays (FPGAs) have boosted their use in many technical applications, such as particle accelerators, the automotive industry, and defense and space. Some of these fields are characterized by the presence of ionizing radiation, caused by natural decay or artificial excitation processes. Unfortunately, this type of radiation affects various digital circuits, including the transistors forming the Static Random Access Memory (SRAM) cells on which high-performance FPGAs are based. The consequences range from temporary or permanent digital misbehavior to the physical destruction of transistors. Therefore, mitigating such effects becomes an essential design rule when using SRAM FPGAs in ionizing radiation environments. Tolerance against soft errors can be addressed at various layers of modern FPGA design, starting with the basic silicon manufacturing process, continuing through configuration, firmware, and system design, and ending with application and software engineering. However, only a highly optimized, joint concept of system-wide fault tolerance provides sufficient resilience against ionizing radiation effects without sacrificing too many valuable device resources to the safety approach. This concept is introduced, analyzed, improved, and validated in the present work. It includes, but is not limited to, static configuration scrubbing, various firmware redundancy approaches, dynamic memory conservation, and state machine protection. Guidelines are given to improve manual design practices concerning fault tolerance, and tools are presented that reduce the necessary effort.
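Triple modular redundancy (TMR), one widely used firmware redundancy technique, keeps every value three times and resolves disagreements with a bitwise majority vote; periodically rewriting all copies with the voted value (scrubbing) prevents single upsets from accumulating. The sketch models this in software, assuming a hypothetical `TMRRegister` class and a simulated single-event upset; on an FPGA the voter would be logic in the firmware, and this is not presented as the specific mitigation scheme of the thesis.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority vote over three integer copies."""
    return (a & b) | (a & c) | (b & c)

class TMRRegister:
    """Hypothetical triplicated register with voted reads and scrubbing."""

    def __init__(self, value=0):
        self.copies = [value, value, value]

    def write(self, value):
        self.copies = [value, value, value]

    def read(self):
        # A single corrupted copy is outvoted by the two intact ones.
        return majority(*self.copies)

    def upset(self, copy_index, bit):
        """Simulate a single-event upset: flip one bit in one copy."""
        self.copies[copy_index] ^= 1 << bit

    def scrub(self):
        """Rewrite all copies with the voted value so errors cannot accumulate."""
        self.write(self.read())
```

Without the `scrub()` call, a second upset hitting the same bit in another copy would outvote the correct value, which is why voting is combined with periodic scrubbing.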
Finally, the SysCore development platform has been maintained to support the recommended design methods and to act as the Device Under Test (DUT) in all particle irradiation experiments that demonstrate the effectiveness of the proposed concept of system-wide fault tolerance for SRAM FPGAs in ionizing radiation environments.
The ALICE High-Level-Trigger (HLT) is a large-scale computing farm designed and built for the real-time reconstruction of particle interactions (events) inside the ALICE detector. The reconstruction of such events is based on the raw data produced in collisions inside ALICE at the Large Hadron Collider. Online reconstruction in the HLT allows triggering on certain event topologies and a significant data reduction by applying compression algorithms. Moreover, it enables real-time verification of data quality.
To receive the raw data from the various sub-detectors of ALICE, the HLT is equipped with 226 custom-built FPGA-based PCI-X cards, the H-RORCs. The H-RORC interfaces the detector readout electronics to the nodes of the HLT farm. In addition to transferring raw data, 108 H-RORCs host 216 Fast-Cluster-Finder (FCF) processors for the Time-Projection-Chamber (TPC). The TPC is the main tracking detector of ALICE and, with up to 16 GB/s, contributes over 90% of the overall data volume. The FCF processor implements the first of two steps in the data reconstruction of the TPC. It calculates the space points and their properties from charge clouds (clusters) created by charged particles traversing the TPC's gas volume. These space points are not only the basis for the tracking algorithm but also allow for Huffman-based data compression, which reduces the data volume by a factor of 4 to 6.
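Computing a space point from a charge cloud typically means taking charge-weighted means over the pad and time coordinates of the cluster. The sketch below is a generic center-of-gravity calculation, assuming a cluster given as hypothetical `(pad, time, charge)` samples; it is not the FCF's actual two-step hardware algorithm or its fixed-point arithmetic.

```python
import math

def space_point(samples):
    """Return total charge, charge-weighted mean pad/time, and RMS widths.

    samples: iterable of (pad, time, charge) tuples belonging to one cluster.
    """
    samples = list(samples)  # iterated several times below
    q_tot = sum(q for _, _, q in samples)
    pad_mean = sum(p * q for p, _, q in samples) / q_tot
    time_mean = sum(t * q for _, t, q in samples) / q_tot
    # Second central moments give the cluster widths in pad and time direction.
    pad_var = sum(q * (p - pad_mean) ** 2 for p, _, q in samples) / q_tot
    time_var = sum(q * (t - time_mean) ** 2 for _, t, q in samples) / q_tot
    return {
        "charge": q_tot,
        "pad": pad_mean,
        "time": time_mean,
        "sigma_pad": math.sqrt(pad_var),
        "sigma_time": math.sqrt(time_var),
    }
```

For a cluster that is symmetric around pad 5 and time bin 10, the returned centroid is exactly (5.0, 10.0); the widths describe how far the charge spreads around it.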
The FCF processor is designed to cope with any incoming data rate up to the maximum bandwidth of the incoming optical link (160 MB/s) without creating back-pressure on the detector's readout electronics. A performance comparison with the software implementation of the algorithm shows a speedup factor of about 20 compared with one AMD Opteron 6172 core @ 2.1 GHz, the CPU type used in the HLT during the LHC Run1 campaign. Comparison with an Intel E5-2690 core @ 3.0 GHz, the CPU type used by the HLT for the LHC Run2 campaign, yields a speedup factor of 8.5. In total, the 216 FCF processors provide the computing performance of 4255 AMD Opteron cores or 2203 Intel cores of the previously mentioned types. With respect to physics analysis, the reconstruction performs equivalently to or better than the official ALICE Offline clusterizer. Therefore, in 2011 ALICE data taking was switched to recording and compressing only FCF clusters, discarding the TPC raw data. Owing to the ability to compress the clusters, the recorded data volume could be increased by a factor of 4 to 6.
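Huffman coding assigns shorter bit strings to more frequent symbols, which is what makes the cluster data compressible. The following is a minimal sketch using Python's `heapq`, assuming the input is a sequence of already quantized cluster values; the sample data in the usage note is made up and says nothing about the compression factors quoted above, and the function names are hypothetical.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix-free code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                      # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def huffman_encode(symbols, code):
    return "".join(code[s] for s in symbols)

def huffman_decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:              # prefix-free: first match is correct
            out.append(inverse[current])
            current = ""
    return out
```

For the sample `[3, 3, 3, 3, 3, 3, 1, 1, 2, 0]`, four distinct values would need 2 bits each (20 bits), while the Huffman code needs 16 bits and decodes back to the original list.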
For the LHC Run3 campaign, starting in 2020, the FCF forms the foundation of the ALICE data taking and processing strategy. The raw data volume (before processing) of the upgraded TPC will exceed 3 TB/s. As a consequence, online processing of the raw data and compression of the results before they enter the online computing farms is an essential part of the computing model.
Within the scope of this thesis, the H-RORC card and the FCF processor were developed and built from scratch. The work covers the conceptual design, optimisation, implementation, and verification, and concludes with performance benchmarks and experience from real data taking.
Blockchains in public administration : a RADIUS on blockchain framework for public administration
(2023)
The emergence of blockchain technology has generated a great deal of attention, as reflected in numerous scientific and journalistic articles. However, the adoption of blockchain by public administrations in Germany has suffered a setback owing to unsuccessful initiatives: initial enthusiasm was followed by disillusionment. Nevertheless, the technology continues to evolve. This work examines whether a blockchain can still optimize the processes of public administrations. It analyses not only the failed projects but also more recent applications of the technology and their potential relevance for the administration, especially in the state of Hesse.
To answer whether blockchains are promising for administrations, the Design Science Research (DSR) approach is chosen. DSR is a research approach that aims to create new and innovative solutions to real-world problems through the development and evaluation of artefacts such as models, methods, or prototypes. For this work, implementing a framework that realizes an Authentication, Authorization, and Accounting (AAA) system on the blockchain was identified as worthwhile. The framework aims to perform the AAA tasks using a blockchain, with the Remote Authentication Dial-In User Service (RADIUS) protocol identified as a candidate protocol for the AAA system. The goal is to enable the system to be implemented either entirely on a blockchain or as a hybrid system. Various blockchain technologies are considered. The framework developed for this purpose is named AAA-me.
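One way to picture AAA records "on the blockchain" is an append-only log in which each entry carries the hash of its predecessor, so that altering any past authentication or accounting record breaks the chain. The sketch below shows only this hash-chaining idea, assuming a hypothetical `AAALedger` class and hypothetical record fields; it has no consensus, networking, or RADIUS packet handling and is not the AAA-me framework.

```python
import hashlib
import json

def record_hash(record):
    """SHA-256 over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

class AAALedger:
    """Hypothetical append-only, hash-chained log of AAA events."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first block

    def __init__(self):
        self.blocks = []

    def append(self, event):
        prev = self.blocks[-1]["hash"] if self.blocks else self.GENESIS
        body = {"index": len(self.blocks), "prev": prev, "event": event}
        self.blocks.append({**body, "hash": record_hash(body)})

    def verify(self):
        """Recompute every hash and check each link to the previous block."""
        prev = self.GENESIS
        for i, block in enumerate(self.blocks):
            body = {"index": block["index"], "prev": block["prev"], "event": block["event"]}
            if block["index"] != i or block["prev"] != prev or record_hash(body) != block["hash"]:
                return False
            prev = block["hash"]
        return True
```

Detecting tampering this way only helps if the chain itself is replicated across independent nodes, which is the role of the blockchain's consensus mechanism omitted here.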
The development of AAA-me has shown that the desired framework for implementing RADIUS on the blockchain is feasible at various degrees of implementation. Previous work mostly relied on a complete implementation on the blockchain. Additionally, it has been shown that AAA-me can be used for hybrid integration at different implementation levels, which sets AAA-me apart from the few previous hybrid approaches. Furthermore, AAA-me was investigated in different laboratory environments to determine its expected resilience against a Single Point of Failure (SPOF). The results of the lab investigation indicated that a RADIUS system on top of a blockchain can provide benefits in terms of security and performance. In the lab environment, the time needed to process a series of authorization requests was measured. In addition, it was illustrated how a RADIUS system implemented using blockchain can protect itself against Man-in-the-Middle (MITM) attacks.
Finally, in collaboration with the Hessian Central Office for Data Processing (German: Hessische Zentrale für Datenverarbeitung, HZD), another test lab demonstrated how a RADIUS system on the blockchain can integrate with the existing IT systems of the German state of Hesse. Based on these findings, this work re-evaluated the applicability of blockchain technology for public administration processes.
The work has thus shown that the use of a blockchain can still be worthwhile. However, it has also shown that an implementation can bring many problems with it. The small number of blockchain developers and engineers also makes it difficult to find people to develop and maintain such a system. In addition, one faces the problem of choosing an architecture now that will be applied to many projects in the future, while each of these projects can, in turn, influence the choice of architecture. Once this problem is solved and a blockchain infrastructure is available, systems such as Public Key Infrastructure (PKI) systems can be set up on it quickly and be more resistant to SPOFs.
AAA-me was only applied in lab and test environments. As a result, no real data passed through its infrastructure. This allowed the flexibility needed for development. However, system-related properties that cannot be detected in this way could appear in real-world operation. Furthermore, AAA-me's development is still in its infancy, and many manual adjustments are needed for it to integrate with an existing RADIUS system. Also, no dedicated security hardening of the systems themselves was carried out in the lab environments. Thus, vulnerabilities can quickly open up on web servers due to misconfigurations and missing updates. For the above reasons, productive use is discouraged unless major further development is carried out.