Hyppo: using equivalences to optimize pipelines in exploratory machine learning

Bibliographic Details
Title: Hyppo: using equivalences to optimize pipelines in exploratory machine learning
Authors: Kontaxakis, Antonios, Sacharidis, Dimitris, Simitsis, Alkis, Abelló Gamazo, Alberto, Nadal Francesch, Sergi
Contributors: Universitat Politècnica de Catalunya. Doctorat en Computació, Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació, Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Services, Information and Data Engineering
Publisher Information: Institute of Electrical and Electronics Engineers (IEEE)
Publication Year: 2024
Collection: Universitat Politècnica de Catalunya, BarcelonaTech: UPCommons - Global access to UPC knowledge
Subject Terms: UPC subject areas::Computer science::Artificial intelligence::Machine learning, Query optimization, Pipeline optimization, Exploratory machine learning, Materialization, Equivalence, Sharing, Hypergraphs
Description: We present HYPPO, a novel system to optimize pipelines encountered in exploratory machine learning. HYPPO exploits alternative computational paths of artifacts from past executions to derive better execution plans while reusing materialized artifacts. Adding alternative computations introduces new challenges for exploratory machine learning regarding workload representation, system architecture, and optimal execution plan generation. To this end, we present a novel workload representation based on directed hypergraphs, and we formulate the problem of discovering the optimal execution plan as a search problem over directed hypergraphs and that of selecting artifacts to materialize as an optimization problem. A thorough experimental evaluation shows that HYPPO results in plans that are typically one order (up to two orders) of magnitude faster and cheaper than the non-optimized pipeline and considerably (up to one order of magnitude) faster and cheaper than plans generated by the state of the art when materializing artifacts is possible. Lastly, our evaluation reveals that HYPPO reduces the cost by 3–4× even when materialization cannot be exploited. This work has been partially supported by the H2020-MSCA-ITN-2020 DEDS (GA.955895), the EU-HORIZON programmes FAIR-CORE4EOSC (GA.101057264), CREXDATA (GA.101092749), ExtremeXP (GA.101093164), and the Spanish Ministerio de Ciencia e Innovación under project PID2020-117191RB-I00/AEI/10.13039/501100011033 (DOGO4ML). Peer Reviewed. Postprint (author's final draft).
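To make "discovering the optimal execution plan as a search problem over directed hypergraphs" concrete, here is a minimal sketch under stated assumptions; it is not HYPPO's published algorithm. It assumes artifacts are nodes, each operation is a hyperedge from all of its input artifacts to one output artifact with a known non-negative cost, and materialized artifacts can be loaded at a known cost. Two equivalent operations producing the same artifact model the "alternative computational paths". The search is a Dijkstra-style generalization to hypergraphs (in the spirit of Knuth's superior-function method), swapped in here for illustration. `HyperEdge`, `cheapest_plan`, and every artifact name and cost below are hypothetical.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperEdge:
    """Hypothetical operation: consumes every artifact in `inputs`, yields `output`."""
    name: str
    inputs: tuple[str, ...]
    output: str
    cost: float


def cheapest_plan(edges, materialized, target):
    """Return (cost, ordered steps) to obtain `target`, or (inf, []) if unreachable.

    `materialized` maps artifact -> load cost (assumed known). An artifact derived
    through an edge costs the edge cost plus the costs of all its inputs; this counts
    a shared ancestor once per use, so it is exact only when inputs share none.
    """
    dist = {}        # best known cost per artifact
    best = {}        # artifact -> HyperEdge used, or None if loaded
    uses = {}        # artifact -> indices of edges that consume it
    remaining = []   # per edge: number of distinct inputs not yet finalized
    heap = []

    for artifact, load_cost in materialized.items():
        dist[artifact] = load_cost
        best[artifact] = None
        heapq.heappush(heap, (load_cost, artifact))

    def relax(edge):
        cand = edge.cost + sum(dist[a] for a in edge.inputs)
        if cand < dist.get(edge.output, math.inf):
            dist[edge.output] = cand
            best[edge.output] = edge
            heapq.heappush(heap, (cand, edge.output))

    for i, edge in enumerate(edges):
        distinct = set(edge.inputs)
        remaining.append(len(distinct))
        for a in distinct:
            uses.setdefault(a, []).append(i)
        if not distinct:
            relax(edge)  # an operation with no inputs acts as a source

    done = set()
    while heap:
        d, artifact = heapq.heappop(heap)
        if artifact in done or d > dist[artifact]:
            continue  # stale heap entry
        done.add(artifact)
        for i in uses.get(artifact, []):
            remaining[i] -= 1
            if remaining[i] == 0:  # every input is finalized: the edge is usable
                relax(edges[i])

    if target not in dist:
        return math.inf, []

    steps, seen = [], set()

    def emit(artifact):
        # Post-order walk: produce inputs before the operation that consumes them.
        if artifact in seen:
            return
        seen.add(artifact)
        edge = best[artifact]
        if edge is None:
            steps.append(f"load {artifact}")
            return
        for a in edge.inputs:
            emit(a)
        steps.append(edge.name)

    emit(target)
    return dist[target], steps


edges = [
    HyperEdge("clean", ("raw_X",), "X_clean", 5.0),
    HyperEdge("scale_a", ("X_clean",), "X_scaled", 4.0),
    HyperEdge("scale_b", ("X_clean",), "X_scaled", 2.0),  # equivalent, cheaper alternative
    HyperEdge("fit", ("X_scaled", "raw_y"), "model", 10.0),
]
print(cheapest_plan(edges, {"raw_X": 1.0, "raw_y": 1.0}, "model"))
# (19.0, ['load raw_X', 'clean', 'scale_b', 'load raw_y', 'fit'])
print(cheapest_plan(edges, {"raw_X": 1.0, "raw_y": 1.0, "X_scaled": 1.5}, "model"))
# (12.5, ['load X_scaled', 'load raw_y', 'fit'])
```

The first call picks the cheaper of the two equivalent scaling operations; the second shows how a materialized intermediate artifact shortcuts the plan. Processing artifacts in cost order is valid here because, with non-negative costs, an edge's result is never cheaper than any of its inputs. Deciding which artifacts to materialize in the first place is a separate optimization problem that this sketch does not address.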
Document Type: conference object
File Description: 14 p.; application/pdf
Language: English
Relation: https://ieeexplore.ieee.org/document/10598141; info:eu-repo/grantAgreement/EC/H2020/955895/EU/Data Engineering for Data Science/DEDS; info:eu-repo/grantAgreement/EC/HE/101093164/EU/EXPeriment driven and user eXPerience oriented analytics for eXtremely Precise outcomes and decisions/ExtremeXP; info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-117191RB-I00/ES/DESARROLLO, OPERATIVA Y GOBERNANZA DE DATOS PARA SISTEMAS SOFTWARE BASADOS EN APRENDIZAJE AUTOMATICO/; http://hdl.handle.net/2117/418546
DOI: 10.1109/ICDE60146.2024.00024
Availability: http://hdl.handle.net/2117/418546; https://doi.org/10.1109/ICDE60146.2024.00024
Rights: Open Access
Accession Number: edsbas.BAC36273
Database: BASE