Deep learning and isolation based security for intrusion detection and prevention in grid computing
(2018)
The use of distributed computational resources to solve scientific problems that require highly intensive data processing is a fundamental mechanism for modern scientific collaborations. The Worldwide Large Hadron Collider Computing Grid (WLCG) is one of the most important examples of a distributed infrastructure for scientific projects and one of the pioneering examples of grid computing. The WLCG is the global grid that analyzes data from the Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN), with 170 sites in 40 countries and more than 600,000 processing cores. The grid service providers grant users access to resources that they can utilize on demand to execute custom software applications for data analysis. The code that users can execute is completely flexible, and commonly there are no significant restrictions. This flexibility and the availability of immense computing power increase the security challenges of these environments. Attackers are a concern for grid administrators: they may request the execution of software containing malicious code that allows them to compromise the underlying institutions’ infrastructure. Grid systems need security countermeasures that keep the user code running without allowing access to critical components, while still retaining flexibility. The administrators of grid systems also need to continuously monitor the activities that the applications carry out. An analysis of these activities is necessary to detect possible security issues, to identify ongoing incidents and to perform autonomous responses. The size and complexity of grid systems make manual security monitoring and response expensive and complicated for human analysts.
Legacy intrusion detection and prevention systems (IDPS) such as Snort and OSSEC are traditionally used for security incident monitoring in grid, cloud, cluster and standalone systems. However, these IDPS are limited by their reliance on hardcoded fixed rules that need to be updated continuously to cope with new threats.
This thesis introduces an architecture for improving security in grid computing. The architecture integrates security by isolation, behavior monitoring and deep learning (DL) for the classification of real-time traces of the running user payloads, also known as grid jobs. Linux containers (LCs), the first component of the proposal, are used to provide isolation between grid jobs and to gather specific traceable information about the behavior of individual jobs. LCs offer a safe environment for the execution of arbitrary user scripts or binaries, protecting the sensitive components of the grid member organizations. Containers are a software sandboxing technique and form a lightweight alternative to technologies such as virtual machines (VMs), which usually implement a full machine-level emulation and can therefore significantly affect performance. This performance loss is commonly unacceptable in high-throughput computing scenarios. Containers also enable the collection of monitoring information from the processes running inside them. The data collected via LC monitoring is used to feed a DL-based IDPS.
DL methods can acquire knowledge from experience, which eliminates the need for operators to formally specify all the knowledge that a system requires. These methods can improve IDPS by building models that detect security incidents automatically and can generalize to new classes of issues. DL can produce lower false positive rates for intrusion detection, and it also provides a measure of false negatives, which can be improved with new training data. Convolutional neural networks (CNNs) are used to distinguish between regular and malicious job classes. A set of samples is collected from regular production grid jobs on the grid infrastructure of “A Large Ion Collider Experiment” (ALICE) and from malicious Linux binaries obtained from a malware research website. The features extracted from these samples are used for the training and validation of the machine learning (ML) models. A generative approach to enhance the required training data is also proposed. Recurrent neural networks (RNNs) are used as generative models to simulate training data that complements and improves the real collected dataset. This data augmentation strategy helps to compensate for the lack of training data in ML processes.
...
A Large Ion Collider Experiment (ALICE) is one of the four large experiments at the Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN). ALICE focuses on the physics of the strong interaction and in particular on the Quark-Gluon Plasma, a state of matter in which quarks are de-confined and which is believed to have existed in the earliest moments of the evolution of the universe. The ALICE detector studies the products of the collisions between heavy nuclei, between protons, and between protons and heavy nuclei. The sub-detector closest to the interaction point is the Inner Tracking System (ITS), which is used to measure the momentum and trajectory of the particles generated by the collisions and allows reconstructing primary and secondary interaction vertices. The ITS needs an accurate spatial resolution, together with a low material budget that limits the effect of multiple scattering on low-energy particles, so that their trajectories can be reconstructed precisely. During the Long Shutdown 2 (2019-2020) of the LHC, the current ITS will be replaced by a completely redesigned sub-detector, which will improve readout rate and particle tracking performance, especially at low momentum.
The ALice PIxel DEtector (ALPIDE) chip was designed to meet the requirements of the upgraded ITS in terms of resolution, material budget, radiation hardness, and readout rate. The ALPIDE chip is a Monolithic Active Pixel Sensor (MAPS) realised in Complementary Metal-Oxide Semiconductor (CMOS) technology. The sensing element, the analogue front-end, and the digital readout are integrated into the same silicon die. The readout architecture of the new ITS foresees that data is transmitted via a high-speed serial link directly from the ALPIDE to the off-detector electronics. The data is transmitted off-chip by a so-called Data Transmission Unit (DTU), which needs to be tolerant to Single-Event Effects induced by radiation in order to guarantee reliable operation. The ALPIDE chip will operate in a radiation field with a High-Energy Hadron peak flux of 7.7·10^5 cm^-2s^-1.
The data are sent by the ALPIDE on copper cables to the readout system, which aggregates them and re-transmits them via optical fibres to the counting room. The position where the readout electronics will be placed is constrained by the maximum transmission distance reasonably achievable by the ALPIDE Data Transmission Unit and by mechanical constraints of the ALICE experiment. The radiation field at that location is not negligible for its effects on electronics: the high-energy hadron flux can reach 10^3 cm^-2s^-1. Static RAM (SRAM)-based Field Programmable Gate Arrays (FPGAs) are favoured over Application Specific Integrated Circuits (ASICs) or Radiation Hard by Design (RHBD) commercial devices because of their cost effectiveness. Moreover, SRAM-based FPGAs are re-configurable and provide the data throughput required by the ITS. The main issue with SRAM-based FPGAs, for the intended application, is the susceptibility of their Configuration RAM (CRAM) to Single-Event Upsets: the number of CRAM bits is much higher than the number of logic elements they configure. The Total Ionizing Dose (TID) at the designated readout position is still acceptable for Commercial Off-The-Shelf (COTS) components, provided that proper verification is carried out.
This dissertation focuses on two parts of the design of the readout system: the Data Transmission Unit of the ALPIDE chip and the design of fundamental modules for the SRAM-based FPGA of the readout electronics. In the first part, a module of the Data Transmission Unit is designed, optimising the trade-off between power consumption, radiation tolerance, and jitter performance. The design was tested and thoroughly characterised, including tests under irradiation with 30 MeV protons. Furthermore, the Data Transmission Unit performance was validated after integration into the first prototypes of ITS modules. In the second part, the problem of developing a radiation-tolerant SRAM-based FPGA design is investigated and a solution is provided. First, a general methodology for designing radiation-tolerant Finite State Machines in SRAM-based FPGAs is analysed, implemented, and verified. Then, the radiation-tolerant FPGA design for the ITS readout is described together with the radiation effects mitigation techniques that were selectively applied to the different modules. The design was tested in multiple irradiation campaigns, and the results are reported.
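The abstract does not say which mitigation technique the radiation-tolerant Finite State Machines use. As a purely illustrative sketch, the Python model below shows triple modular redundancy (TMR) with bitwise majority voting, one commonly used way to mask a single-event upset in a state register. It is not the thesis's methodology; the names `TmrFsm`, `next_state`, `inject_upset`, and the 2-bit counter behaviour are all hypothetical.

```python
STATE_BITS = 2                    # hypothetical 4-state machine
MASK = (1 << STATE_BITS) - 1


def majority(a, b, c):
    """Bitwise 2-out-of-3 vote over three redundant copies."""
    return (a & b) | (a & c) | (b & c)


def next_state(state):
    """Hypothetical transition function: cycle 0 -> 1 -> 2 -> 3 -> 0."""
    return (state + 1) & MASK


class TmrFsm:
    """Software model of an FSM whose state register is triplicated."""

    def __init__(self):
        self.copies = [0, 0, 0]

    def inject_upset(self, copy_index, bit):
        """Flip one bit in one copy, emulating a single-event upset."""
        self.copies[copy_index] ^= 1 << bit

    def step(self):
        """Vote on the current state, then reload every copy from the vote."""
        voted = majority(*self.copies)
        nxt = next_state(voted)
        self.copies = [nxt, nxt, nxt]   # the corrupted copy is overwritten
        return voted


fsm = TmrFsm()
fsm.step()                 # all copies advance from state 0 to state 1
fsm.inject_upset(1, 1)     # copy 1 is corrupted from 1 to 3
voted = fsm.step()         # the vote masks the upset and yields state 1
```

Because every copy is reloaded from the voted value, a single corrupted copy is corrected on the next step; two simultaneous upsets in the same bit position would defeat the vote.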
Conceptual design of an ALICE Tier-2 centre integrated into a multi-purpose computing facility
(2012)
This thesis discusses the issues and challenges associated with the design and operation of a data analysis facility for a high-energy physics experiment at a multi-purpose computing centre. In the spotlight is a Tier-2 centre of the distributed computing model of the ALICE experiment at the Large Hadron Collider at CERN in Geneva, Switzerland. The design steps examined in the thesis include analysis and optimization of the I/O access patterns of the user workload, integration of the storage resources, and development of techniques for effective system administration and operation of the facility in a shared computing environment. A number of I/O access performance issues on multiple levels of the I/O subsystem, introduced by the use of hard disks for data storage, have been addressed by means of exhaustive benchmarking and thorough analysis of the I/O of the user applications in the ALICE software framework. Defining the set of requirements for the storage system, describing the potential performance bottlenecks and single points of failure, and examining possible ways to avoid them make it possible to develop guidelines for selecting how to integrate the storage resources. A solution for preserving a specific software stack for the experiment in a shared environment is presented along with its effects on the user workload performance. The thesis concludes with a proposal for a flexible model to deploy and operate the ALICE Tier-2 infrastructure and applications in a virtual environment through adoption of cloud computing technology and the 'Infrastructure as Code' concept. Scientific software applications can be run efficiently in a virtual environment, and there is an urgent need to adapt the infrastructure for effective usage of cloud resources.
Blockchains in public administration : a RADIUS on blockchain framework for public administration
(2023)
The emergence of blockchain technology has generated a great deal of attention, as reflected in numerous scientific and journalistic articles. However, the implementation of blockchain for public administrations in Germany has encountered a setback owing to unsuccessful initiatives. Initial enthusiasm was followed by disillusionment. Nevertheless, the technology continues to evolve. This work examines whether the use of a blockchain can still optimize the processes of public administrations. It analyses not only the failed projects but also more recent applications of the technology and their potential relevance for the administration, especially in the state of Hesse.
To answer whether blockchains are promising for administrations, a Design Science Research (DSR) approach is chosen. DSR is a research method that aims to create new and innovative solutions to real-world problems through the development and evaluation of artefacts such as models, methods, or prototypes. For this work, the implementation of a framework to realize an Authentication, Authorization, and Accounting (AAA) system on the blockchain was identified as promising. The Remote Authentication Dial-In User Service (RADIUS) protocol has been identified as a potential protocol for the AAA system. The goal is to create a way to implement the system either entirely on a blockchain or as a hybrid system. Various blockchain technologies are considered. The resulting framework is named AAA-me.
The development of AAA-me has shown that the desired framework for implementing RADIUS on the blockchain is possible in various degrees of implementation. Previous work mostly relied on full development. Additionally, it has been shown that AAA-me can be used to perform hybrid integration at different implementation levels. This makes AAA-me stand out from the few previous hybrid approaches. Furthermore, AAA-me was investigated in different laboratory environments to determine the expected resilience against a Single Point of Failure (SPOF). The results of the lab investigation indicated that a RADIUS system on top of a blockchain can provide benefits in terms of security and performance. In the lab environment, the time needed to process a series of authorization requests was measured. In addition, it was illustrated how a RADIUS system implemented using blockchain can protect itself against Man-in-the-Middle (MITM) attacks.
Finally, in collaboration with the Hessian Central Office for Data Processing (German: Hessische Zentrale für Datenverarbeitung) (HZD), another test lab demonstrated how a RADIUS system on the blockchain can integrate with the existing IT systems of the German state of Hesse. Based on these findings, this work reevaluated the applicability of blockchain technology for public administration processes.
The work has thus shown that the use of a blockchain can still be purposeful. However, it has also been shown that an implementation can bring many problems with it. The small number of blockchain developers and engineers poses the risk of not finding people to develop and maintain a system. In addition, one faces the problem of determining an architecture now that will be applied to many projects in the future, while each project can, in turn, have an impact on the choice of architecture. Once this problem has been solved and a blockchain infrastructure is available, new systems, for example Public Key Infrastructure (PKI) systems, can be established quickly and be more resistant to SPOF.
AAA-me was only applied in lab and test environments. As a result, no real data ran over its infrastructure. This allowed the necessary flexibility for development. However, system-related properties that cannot be detected in this setting could appear in real situations. Furthermore, the development of AAA-me is still in its infancy. Many manual adjustments need to be made for it to integrate with an existing RADIUS system. Also, no dedicated system hardening was carried out in the lab environments, so vulnerabilities can quickly open up on web servers due to misconfigurations and missing updates. For these reasons, production use is discouraged unless major further development is carried out.
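To make the idea of keeping AAA accounting records on a blockchain more concrete, the following Python sketch shows a minimal hash-linked ledger in which each record commits to the hash of its predecessor, so later tampering becomes detectable. This is an assumption-laden illustration and not the AAA-me implementation: the class `AccountingLedger`, the helper `record_hash`, and the payload fields are hypothetical, and the sketch omits consensus, networking, and the RADIUS protocol itself.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64   # placeholder predecessor hash for the first record


def record_hash(index, prev_hash, payload):
    """SHA-256 over a canonical JSON encoding of one ledger record."""
    body = json.dumps(
        {"index": index, "prev": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class AccountingLedger:
    """Hypothetical append-only ledger for AAA accounting events."""

    def __init__(self):
        self.blocks = []

    def append(self, payload):
        prev_hash = self.blocks[-1]["hash"] if self.blocks else GENESIS_HASH
        index = len(self.blocks)
        block = {
            "index": index,
            "prev": prev_hash,
            "payload": payload,
            "hash": record_hash(index, prev_hash, payload),
        }
        self.blocks.append(block)
        return block

    def verify(self):
        """Recompute every hash and check each link to its predecessor."""
        prev_hash = GENESIS_HASH
        for index, block in enumerate(self.blocks):
            if block["index"] != index or block["prev"] != prev_hash:
                return False
            if block["hash"] != record_hash(index, prev_hash, block["payload"]):
                return False
            prev_hash = block["hash"]
        return True


ledger = AccountingLedger()
ledger.append({"user": "alice", "event": "Access-Accept"})       # illustrative values
ledger.append({"user": "alice", "event": "Accounting-Request"})  # illustrative values
ok_before = ledger.verify()

ledger.blocks[0]["payload"]["user"] = "mallory"   # tamper with a stored record
ok_after = ledger.verify()
```

In a real deployment the ledger would be replicated across several nodes, which is where resilience against a single point of failure would come from; a single-process chain like this one only demonstrates tamper evidence.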