TY  - THES
A1  - Ivanov, Todor
T1  - Classifying, evaluating and advancing big data benchmarks
N2  - The main contribution of the thesis is in helping to understand which software system parameters most strongly affect the performance of Big Data platforms under realistic workloads. In detail, the main research contributions of the thesis are: 1. Definition of the new concept of heterogeneity for Big Data architectures (Chapter 2); 2. Investigation of the performance of Big Data systems (e.g. Hadoop) in virtualized environments (Section 3.1); 3. Investigation of the performance of NoSQL databases versus Hadoop distributions (Section 3.2); 4. Execution and evaluation of the TPCx-HS benchmark (Section 3.3); 5. Evaluation and comparison of the Hive and Spark SQL engines using benchmark queries (Section 3.4); 6. Evaluation of the impact of compression techniques on SQL-on-Hadoop engine performance (Section 3.5); 7. Extensions of the standardized Big Data benchmark BigBench (TPCx-BB) (Sections 4.1 and 4.3); 8. Definition of a new benchmark, called ABench (Big Data Architecture Stack Benchmark), that takes into account the heterogeneity of Big Data architectures (Section 4.5). The thesis is an attempt to redefine system benchmarking, taking into account the new requirements posed by Big Data applications. With the explosion of Artificial Intelligence (AI) and new hardware computing power, this is a first step towards a more holistic approach to benchmarking.
KW  - Big Data Benchmarks
KW  - Big Data
Y1  - 2019
UR  - http://publikationen.ub.uni-frankfurt.de/frontdoor/index/index/docId/51157
UR  - https://nbn-resolving.org/urn:nbn:de:hebis:30:3-511574
CY  - Frankfurt am Main
ER  -