ANaN — ANalyse And Navigate: debugging compute clusters with techniques from functional programming and text stream processing

  • Monitoring is an indispensable tool for operating any large grid or cluster computing installation, in high energy physics or elsewhere. Usually, monitoring is configured to collect a small amount of data, just enough to detect abnormal conditions. Once an abnormal condition is detected, it is handled by gathering all information from the affected components. This data is then processed by querying it in a manner similar to a database. This contribution shows how the metaphor of a debugger (for software applications) can be transferred to a compute cluster. The concepts of variables, assertions and breakpoints used in debugging can be applied to monitoring by defining variables as the quantities recorded by monitoring and breakpoints as invariants formulated via these variables. It is found that embedding fragments of a data extraction and reporting tool such as the UNIX tool awk facilitates concise notations for commonly used variables, since tools like awk are designed to process large event streams (in textual representations) with bounded memory. A functional notation similar to both the pipe notation used in the UNIX shell and the point-free style used in functional programming simplifies the combinations of variables that commonly occur when formulating breakpoints.
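
The ideas in the abstract can be illustrated with a minimal sketch. It is not the notation or implementation from the paper; the helper names (`pipe`, `field`, `to_float`, `running_max`, `first_violation`) and the log format are hypothetical and chosen only for illustration. A "variable" is modelled as a lazy stream derived from text lines, stages are combined left to right as in a shell pipeline, and a "breakpoint" is an invariant whose first violation stops the scan. Python generators keep memory use bounded because only one line is held at a time.

```python
from functools import reduce

def pipe(*stages):
    """Compose stages left to right, like `a | b | c` in a UNIX shell."""
    return lambda stream: reduce(lambda acc, stage: stage(acc), stages, stream)

def field(index):
    """awk-like `$index` (1-based): one whitespace-separated field per line."""
    def stage(lines):
        for line in lines:
            parts = line.split()
            if len(parts) >= index:  # skip lines that are too short
                yield parts[index - 1]
    return stage

def to_float(values):
    """Convert each textual value to a float, lazily."""
    for v in values:
        yield float(v)

def running_max(values):
    """Largest value seen so far; needs only constant memory."""
    best = float("-inf")
    for v in values:
        best = max(best, v)
        yield best

def first_violation(invariant, values):
    """Breakpoint: return the first value that breaks the invariant, else None."""
    for v in values:
        if not invariant(v):
            return v
    return None

# Hypothetical monitoring records: "<node> <metric> <value>"
log = ["node01 load 0.7", "node02 load 3.9", "node03 load 1.2"]

# A "variable": the load value of each record, written without naming the stream.
load = pipe(field(3), to_float)

# A "breakpoint": the invariant load < 2.0 must hold for every record.
hit = first_violation(lambda x: x < 2.0, load(log))
```

Here `load` is defined purely by composing stages, which mirrors the point-free style mentioned above, and `first_violation` stops reading the stream as soon as the invariant fails.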

Metadata
Author:Alexander Adler, Udo Kebschull
URN:urn:nbn:de:hebis:30:3-569102
DOI:https://doi.org/10.1051/epjconf/202024501041
ISSN:2100-014X
Parent Title (English):EPJ Web of Conferences
Publisher:EDP Sciences
Place of publication:Les Ulis
Document Type:Article
Language:English
Date of Publication (online):2020/11/16
Year of first Publication:2020
Publishing Institution:Universitätsbibliothek Johann Christian Senckenberg
Contributing Corporation:24th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2019)
Release Date:2020/11/18
Volume:245
Issue:01041
Page Number:8
HeBIS-PPN:475953886
Institutes:Computer Science and Mathematics / Computer Science
Central Facility / University Computing Centre
Dewey Decimal Classification:0 Computer science, information science, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
Collections:University publications
Licence (German):Creative Commons - Attribution 4.0