SEDGE : symbolic example data generation for dataflow programs

  • Exhaustive, automatic testing of dataflow (esp. map-reduce) programs has emerged as an important challenge. Past work demonstrated effective ways to generate small example data sets that exercise operators in the Pig platform, which is used to generate Hadoop map-reduce programs. Although such prior techniques attempt to cover all cases of operator use, in practice they often fail. Our SEDGE system addresses these completeness problems: for every dataflow operator, we produce data aiming to cover all cases that arise in the dataflow program (e.g., both passing and failing a filter). SEDGE relies on transforming the program into symbolic constraints and solving the constraints using a symbolic reasoning engine (a powerful SMT solver), while using input data as concrete aids in the solution process. The approach resembles dynamic-symbolic (a.k.a. "concolic") execution in a conventional programming language, adapted to the unique features of the dataflow domain. In third-party benchmarks, SEDGE achieves higher coverage than past techniques for 5 out of 20 PigMix benchmarks and 7 out of 11 SDSS benchmarks (with equal coverage for the rest of the benchmarks). We also show that our targeting of the high-level dataflow language pays off: for complex programs, state-of-the-art dynamic-symbolic execution at the level of the generated map-reduce code (instead of the original dataflow program) requires many more test cases or achieves much lower coverage than our approach.
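
The idea of covering both the passing and the failing case of every filter can be illustrated with a minimal sketch. It is not SEDGE's implementation: the two-filter program, the field names `age` and `score`, and all function names are hypothetical, and a brute-force search over a small integer domain stands in for the SMT solver mentioned in the abstract. Existing concrete rows are tried first, loosely mirroring the use of input data as a concrete aid.

```python
from itertools import product

# Hypothetical dataflow program: LOAD (age, score) -> FILTER age > 18 -> FILTER score >= 50.
# Each filter is modelled as a (label, predicate) pair; these names are illustrative only.
FILTERS = [
    ("age > 18", lambda row: row["age"] > 18),
    ("score >= 50", lambda row: row["score"] >= 50),
]


def path_targets(filters):
    """List coverage targets: pass filters 0..i-1 and fail filter i, plus 'pass all'."""
    targets = [[True] * i + [False] for i in range(len(filters))]
    targets.append([True] * len(filters))
    return targets


def satisfies(row, filters, target):
    """Check the row against the target outcomes; zip stops at the first failing filter."""
    return all(pred(row) == want for (_, pred), want in zip(filters, target))


def find_example(filters, target, concrete_rows, domain):
    """Return a row that realises the target, or None if the search finds nothing."""
    # Reuse an existing input row when one already covers the target.
    for row in concrete_rows:
        if satisfies(row, filters, target):
            return row
    # Assumption: exhaustive search over a small domain replaces a real constraint solver.
    for age, score in product(domain, repeat=2):
        row = {"age": age, "score": score}
        if satisfies(row, filters, target):
            return row
    return None


def generate_examples(filters, concrete_rows, domain=range(0, 101)):
    """Map each coverage target (as a tuple of outcomes) to one example row."""
    return {
        tuple(target): find_example(filters, target, concrete_rows, domain)
        for target in path_targets(filters)
    }


if __name__ == "__main__":
    sample_input = [{"age": 30, "score": 70}]
    for target, row in generate_examples(FILTERS, sample_input).items():
        print(target, row)
```

With a single input row that passes both filters, the sketch keeps that row for the "pass all" target and synthesises one extra row for each failing case. A real constraint solver would be needed for predicates over large or non-integer domains, where enumeration is not feasible.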

Metadata
Author: Kaituo Li, Christoph Reichenbach, Yannis Smaragdakis, Yanlei Diao, Christoph Csallner
URN: urn:nbn:de:hebis:30:3-438959
URL: https://zenodo.org/record/7730/files/sedge-ase13.pdf
ISBN: 978-1-4799-0215-6
ISBN: 978-1-4799-0216-3
Editor: Ewen Denney, Tevfik Bultan, Andreas Zeller
Document Type: Conference Proceeding
Language: English
Date of Publication (online): 2017/10/19
Year of first Publication: 2013
Publishing Institution: Universitätsbibliothek Johann Christian Senckenberg
Contributing Corporation: 2013 28th IEEE/ACM international conference on automated software engineering (ASE), November 11-15, 2013, Palo Alto, USA
Release Date: 2017/10/19
Tag: Benchmark testing; Cognition; Concrete; Data processing; Educational institutions; Extraterrestrial measurements; Programming
Page Number: 11
Note:
Also in: Ewen Denney; Tevfik Bultan; Andreas Zeller (eds.): 2013 28th IEEE/ACM international conference on automated software engineering (ASE): proceedings, Conference Publishing Consulting: Passau, 2013, pp. 235-245, ISBN: 978-1-4799-0215-6, ISBN: 978-1-4799-0216-3, doi:10.1109/ASE.2013.6693083
HeBIS-PPN: 41972284X
Institutes: Computer Science and Mathematics / Computer Science
Dewey Decimal Classification: 0 Computer science, information science, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
Collections: University publications
Licence (German): German copyright law