TerzoN: Human-in-the-Loop Software Testing with a Composite Oracle

Software testing is difficult, tedious, and may consume 28%–50% of software engineering labor. Automatic test generators aim to ease this burden but have important trade-offs. Fuzzers use an implicit oracle that can detect obviously invalid results, but the oracle problem has no general solution, an...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Proceedings of the ACM on software engineering Jg. 2; H. FSE; S. 1983 - 2005
Hauptverfasser: Davis, Matthew C., Wei, Amy, Myers, Brad A., Sunshine, Joshua
Format: Journal Article
Sprache:Englisch
Veröffentlicht: New York, NY, USA ACM 19.06.2025
Schlagworte:
ISSN:2994-970X, 2994-970X
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Software testing is difficult, tedious, and may consume 28%–50% of software engineering labor. Automatic test generators aim to ease this burden but have important trade-offs. Fuzzers use an implicit oracle that can detect obviously invalid results, but the oracle problem has no general solution, and an implicit oracle cannot automatically evaluate correctness. Test suite generators like EvoSuite use the program under test as the oracle and therefore cannot evaluate correctness. Property-based testing tools evaluate correctness, but users have difficulty coming up with properties to test and understanding whether their properties are correct. Consequently, practitioners create many test suites manually and often use an example-based oracle to tediously specify correct input and output examples. To help bridge the gaps among various oracle and tool types, we present the Composite Oracle, which organizes various oracle types into a hierarchy and renders a single test result per example execution. To understand the Composite Oracle’s practical properties, we built TerzoN, a test suite generator that includes a particular instantiation of the Composite Oracle. TerzoN displays all the test results in an integrated view composed from the results of three types of oracles and finds some types of test assertion inconsistencies that might otherwise lead to misleading test results. We evaluated TerzoN in a randomized controlled trial with 14 professional software engineers with a popular industry tool, fast-check, as the control. Participants using TerzoN elicited 72% more bugs (p < 0.01), accurately described more than twice the number of bugs (p < 0.01) and tested 16% more quickly (p < 0.05) relative to fast-check.
ISSN:2994-970X
2994-970X
DOI:10.1145/3729359