Classifying, evaluating and advancing big data benchmarks

The main contribution of the thesis is in helping to understand which software system parameters most affect the performance of Big Data platforms under realistic workloads. In detail, the main research contributions of the thesis are:

1. Definition of the new concept of heterogeneity for Big Data architectures (Chapter 2);
2. Investigation of the performance of Big Data systems (e.g. Hadoop) in virtualized environments (Section 3.1);
3. Investigation of the performance of NoSQL databases versus Hadoop distributions (Section 3.2);
4. Execution and evaluation of the TPCx-HS benchmark (Section 3.3);
5. Evaluation and comparison of the Hive and Spark SQL engines using benchmark queries (Section 3.4);
6. Evaluation of the impact of compression techniques on SQL-on-Hadoop engine performance (Section 3.5);
7. Extensions of the standardized Big Data benchmark BigBench (TPCx-BB) (Sections 4.1 and 4.3);
8. Definition of a new benchmark, called ABench (Big Data Architecture Stack Benchmark), that takes into account the heterogeneity of Big Data architectures (Section 4.5).

The thesis is an attempt to redefine system benchmarking, taking into account the new requirements posed by Big Data applications. With the explosion of Artificial Intelligence (AI) and new hardware computing power, this is a first step towards a more holistic approach to benchmarking.

Author: Todor Ivanov
Place of publication: Frankfurt am Main
Referee: Roberto V. Zicari, Carsten Binnig
Advisor: Roberto V. Zicari
Document Type: Doctoral Thesis
Date of Publication (online): 2019/12/09
Year of first Publication: 2019
Publishing Institution: Universitätsbibliothek Johann Christian Senckenberg
Granting Institution: Johann Wolfgang Goethe-Universität
Date of final exam: 2019/07/23
Release Date: 2019/09/19
Tag: Big Data; Big Data Benchmarks
Page Number: 354
Institutes: Computer Science and Mathematics
Dewey Decimal Classification: 0 Computer science, information and general works / 00 Computer science, knowledge and systems / 004 Data processing; computer science
Licence (German): German copyright law (Deutsches Urheberrecht)