Hyppo: using equivalences to optimize pipelines in exploratory machine learning
| Title: | Hyppo: using equivalences to optimize pipelines in exploratory machine learning |
|---|---|
| Authors: | Kontaxakis, Antonios, Sacharidis, Dimitris, Simitsis, Alkis, Abelló Gamazo, Alberto, Nadal Francesch, Sergi |
| Contributors: | Universitat Politècnica de Catalunya. Doctorat en Computació, Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació, Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Services, Information and Data Engineering |
| Publisher Information: | Institute of Electrical and Electronics Engineers (IEEE) |
| Publication Year: | 2024 |
| Collection: | Universitat Politècnica de Catalunya, BarcelonaTech: UPCommons - Global access to UPC knowledge |
| Subject Terms: | UPC subject areas::Computer science::Artificial intelligence::Machine learning, Query optimization, Pipeline optimization, Exploratory machine learning, Materialization, Equivalence, Sharing, Hypergraphs |
| Description: | We present HYPPO, a novel system to optimize pipelines encountered in exploratory machine learning. HYPPO exploits alternative computational paths of artifacts from past executions to derive better execution plans while reusing materialized artifacts. Adding alternative computations introduces new challenges for exploratory machine learning regarding workload representation, system architecture, and optimal execution plan generation. To this end, we present a novel workload representation based on directed hypergraphs, and we formulate the problem of discovering the optimal execution plan as a search problem over directed hypergraphs and that of selecting artifacts to materialize as an optimization problem. A thorough experimental evaluation shows that HYPPO results in plans that are typically one order (up to two orders) of magnitude faster and cheaper than the non-optimized pipeline and considerably (up to one order of magnitude) faster and cheaper than plans generated by the state of the art when materializing artifacts is possible. Lastly, our evaluation reveals that HYPPO reduces the cost by 3–4× even when materialization cannot be exploited. ; This work has been partially supported by the H2020-MSCA-ITN-2020 DEDS (GA.955895), the EU-HORIZON programmes FAIR-CORE4EOSC (GA.101057264), CREXDATA (GA.101092749), ExtremeXP (GA.101093164), and the Spanish Ministerio de Ciencia e Innovación under project PID2020-117191RB-I00/AEI/10.13039/501100011033 (DOGO4ML). ; Peer Reviewed ; Postprint (author's final draft) |
| Document Type: | conference object |
| File Description: | 14 p.; application/pdf |
| Language: | English |
| Relation: | https://ieeexplore.ieee.org/document/10598141; info:eu-repo/grantAgreement/EC/H2020/955895/EU/Data Engineering for Data Science/DEDS; info:eu-repo/grantAgreement/EC/HE/101093164/EU/EXPeriment driven and user eXPerience oriented analytics for eXtremely Precise outcomes and decisions/ExtremeXP; info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-117191RB-I00/ES/DESARROLLO, OPERATIVA Y GOBERNANZA DE DATOS PARA SISTEMAS SOFTWARE BASADOS EN APRENDIZAJE AUTOMATICO/; http://hdl.handle.net/2117/418546 |
| DOI: | 10.1109/ICDE60146.2024.00024 |
| Availability: | http://hdl.handle.net/2117/418546 https://doi.org/10.1109/ICDE60146.2024.00024 |
| Rights: | Open Access |
| Accession Number: | edsbas.BAC36273 |
| Database: | BASE |
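
The description frames execution-plan discovery as a search over a directed hypergraph, where an artifact may be produced by several equivalent operations (hyperedges with one or more inputs) or loaded if it was materialized in a past run. The Python sketch below illustrates that general idea only; it is not HYPPO's algorithm. It assumes an acyclic hypergraph, additive per-operation costs, and a simple memoized recurrence, and every name in it (`HyperEdge`, `best_plan`, the operations, artifacts, and costs) is hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class HyperEdge:
    """Hypothetical operation computing `head` from ALL artifacts in `tail`."""
    op: str
    tail: Tuple[str, ...]
    head: str
    cost: float


def best_plan(target: str, edges: List[HyperEdge],
              loadable: Dict[str, float]) -> Tuple[float, List[str]]:
    """Cheapest way to obtain `target` in an ACYCLIC directed hypergraph.

    `loadable` maps raw inputs and already-materialized artifacts to their
    load cost. Every other artifact must be produced by one of its incoming
    hyperedges, which may be equivalent alternatives. A hyperedge costs its
    own cost plus the cost of obtaining each tail artifact. Shared upstream
    artifacts are charged once per consumer, so the cost is an upper bound
    when alternative paths overlap. Unreachable targets return (inf, []).
    """
    incoming: Dict[str, List[HyperEdge]] = {}
    for e in edges:
        incoming.setdefault(e.head, []).append(e)

    memo: Dict[str, Tuple[float, List[str]]] = {}

    def solve(artifact: str) -> Tuple[float, List[str]]:
        if artifact in memo:
            return memo[artifact]
        best: Tuple[float, List[str]] = (math.inf, [])
        # Option 1: load the artifact if it is a raw input or materialized.
        if artifact in loadable:
            best = (loadable[artifact], [f"load {artifact}"])
        # Option 2: recompute it via the cheapest incoming hyperedge.
        for e in incoming.get(artifact, []):
            cost, steps = e.cost, []
            for t in e.tail:
                t_cost, t_steps = solve(t)
                cost += t_cost
                steps += [s for s in t_steps if s not in steps]
            if cost < best[0]:
                best = (cost, steps + [e.op])
        memo[artifact] = best
        return best

    return solve(target)


# Hypothetical workload: `scale_raw` is an equivalent alternative to
# `clean` followed by `scale`; `fit` is a two-input hyperedge.
EDGES = [
    HyperEdge("clean", ("raw",), "clean_df", 10.0),
    HyperEdge("scale", ("clean_df",), "scaled_df", 5.0),
    HyperEdge("scale_raw", ("raw",), "scaled_df", 12.0),
    HyperEdge("fit", ("scaled_df", "params"), "model", 20.0),
]

# Nothing materialized: the equivalent shortcut `scale_raw` wins.
print(best_plan("model", EDGES, {"raw": 3.0, "params": 0.5}))
# -> (35.5, ['load raw', 'scale_raw', 'load params', 'fit'])

# `clean_df` materialized from a past run: reuse it and apply `scale`.
print(best_plan("model", EDGES, {"raw": 3.0, "params": 0.5, "clean_df": 1.0}))
# -> (26.5, ['load clean_df', 'scale', 'load params', 'fit'])
```

In this toy example, materializing `clean_df` makes the otherwise more expensive path the cheaper one, which is the kind of interplay between equivalent computations and reused artifacts that the description refers to. Deciding which artifacts to materialize in the first place is a separate optimization problem and is not modeled here.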