SEDGE: Symbolic example data generation for dataflow programs
| Published in: | 2013 IEEE/ACM 28th International Conference on Automated Software Engineering (ASE) pp. 235 - 245 |
|---|---|
| Main Authors: | Kaituo Li, Christoph Reichenbach, Yannis Smaragdakis, Yanlei Diao, Christoph Csallner |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.11.2013 |
| Abstract | Exhaustive, automatic testing of dataflow (esp. map-reduce) programs has emerged as an important challenge. Past work demonstrated effective ways to generate small example data sets that exercise operators in the Pig platform, which is used to generate Hadoop map-reduce programs. Although such prior techniques attempt to cover all cases of operator use, in practice they often fail. Our SEDGE system addresses these completeness problems: for every dataflow operator, we produce data aiming to cover all cases that arise in the dataflow program (e.g., both passing and failing a filter). SEDGE relies on transforming the program into symbolic constraints, and solving the constraints using a symbolic reasoning engine (a powerful SMT solver), while using input data as concrete aids in the solution process. The approach resembles dynamic-symbolic (a.k.a. "concolic") execution in a conventional programming language, adapted to the unique features of the dataflow domain. In third-party benchmarks, SEDGE achieves higher coverage than past techniques for 5 out of 20 PigMix benchmarks and 7 out of 11 SDSS benchmarks (with equal coverage for the rest of the benchmarks). We also show that our targeting of the high-level dataflow language pays off: for complex programs, state-of-the-art dynamic-symbolic execution at the level of the generated map-reduce code (instead of the original dataflow program) requires many more test cases or achieves much lower coverage than our approach. |
|---|---|
| Author | Kaituo Li; Reichenbach, Christoph; Diao, Yanlei; Csallner, Christoph; Smaragdakis, Yannis |
| Author_xml | – sequence: 1 surname: Kaituo Li fullname: Kaituo Li organization: Comput. Sci. Dept., Univ. of Massachusetts, Amherst, MA, USA – sequence: 2 givenname: Christoph surname: Reichenbach fullname: Reichenbach, Christoph organization: Inst. of Inf., Goethe Univ. Frankfurt, Frankfurt am Main, Germany – sequence: 3 givenname: Yannis surname: Smaragdakis fullname: Smaragdakis, Yannis organization: Dept. of Inf., Univ. of Athens, Athens, Greece – sequence: 4 givenname: Yanlei surname: Diao fullname: Diao, Yanlei organization: Comput. Sci. Dept., Univ. of Massachusetts, Amherst, MA, USA – sequence: 5 givenname: Christoph surname: Csallner fullname: Csallner, Christoph organization: Comput. Sci. & Eng., Univ. of Texas at Arlington, Arlington, TX, USA |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/ASE.2013.6693083 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Xplore; IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Xplore url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9781479902156 1479902152 |
| EndPage | 245 |
| ExternalDocumentID | 6693083 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IL ACM ALMA_UNASSIGNED_HOLDINGS APO CBEJK GUFHI LHSKQ RIB RIC RIE RIL |
| ID | FETCH-LOGICAL-a249t-56bc2600c85d3b9efc37f3b7bd58d6cb4727d656243894c8906df3b64b31b8173 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 12 |
| IngestDate | Wed Aug 27 03:48:50 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a249t-56bc2600c85d3b9efc37f3b7bd58d6cb4727d656243894c8906df3b64b31b8173 |
| PageCount | 11 |
| ParticipantIDs | ieee_primary_6693083 |
| PublicationCentury | 2000 |
| PublicationDate | 2013-Nov. |
| PublicationDateYYYYMMDD | 2013-11-01 |
| PublicationDate_xml | – month: 11 year: 2013 text: 2013-Nov. |
| PublicationDecade | 2010 |
| PublicationTitle | 2013 IEEE/ACM 28th International Conference on Automated Software Engineering (ASE) |
| PublicationTitleAbbrev | ASE |
| PublicationYear | 2013 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0002181444 |
| Score | 1.6184977 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 235 |
| SubjectTerms | Benchmark testing; Cognition; Concrete; Data processing; Educational institutions; Extraterrestrial measurements; Programming |
| Title | SEDGE: Symbolic example data generation for dataflow programs |
| URI | https://ieeexplore.ieee.org/document/6693083 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
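
The abstract's central idea, producing example data that covers both outcomes of every operator (e.g., both passing and failing a filter) while using concrete input data as an aid, can be illustrated with a toy sketch. The code below is not SEDGE's implementation: the two filters, the record fields `age` and `score`, and all function names are hypothetical, and a brute-force search over a small domain stands in for the SMT solver the abstract describes.

```python
from itertools import product

# Hypothetical two-step dataflow: FILTER age > 18, then FILTER score >= 90.
# The filter names and record fields are illustrative assumptions.
FILTERS = [
    ("adult", lambda r: r["age"] > 18),
    ("top_score", lambda r: r["score"] >= 90),
]


def reaches(record, step):
    """A record reaches filter number `step` only if it passed all earlier filters."""
    return all(pred(record) for _, pred in FILTERS[:step])


def candidates(seed):
    """Yield the concrete seed records first, then a small brute-force domain.

    The brute-force enumeration is a stand-in for a constraint solver.
    """
    yield from seed
    for age, score in product(range(0, 41), range(0, 101, 10)):
        yield {"age": age, "score": score}


def generate_examples(seed):
    """For each filter, pick one record that passes it and one that fails it."""
    examples = {}
    for step, (name, pred) in enumerate(FILTERS):
        for outcome in (True, False):
            for rec in candidates(seed):
                if reaches(rec, step) and pred(rec) == outcome:
                    examples[(name, outcome)] = rec
                    break
    return examples


seed = [{"age": 30, "score": 50}]
examples = generate_examples(seed)
for (name, outcome), rec in examples.items():
    print(name, "passes" if outcome else "fails", rec)
```

The seed record alone covers only two of the four cases (it passes the first filter and fails the second); the search supplies the remaining two, which mirrors in miniature how concrete data and constraint solving can complement each other.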