Two-stage optimization for machine learning workflow

Detailed bibliography
Published in: Information systems (Oxford), Volume 92, Article 101483
Main author: Quemy, Alexandre (IBM Krakow Software Lab, Cracow, Poland; aquemy@pl.ibm.com)
Format: Journal Article
Language: English
Published: Oxford: Elsevier Ltd / Elsevier Science Ltd, 1 September 2020
Subjects: Agnosticism; Algorithms; Automation; AutoML; CASH; Configurations; Data; Data pipelines; Data search; Hyperparameter tuning; Information systems; Machine learning; Optimization; Pipelines; Policies; Workflow
ISSN: 0306-4379; EISSN: 1873-6076
DOI: 10.1016/j.is.2019.101483

Abstract
Machine learning techniques play a preponderant role in dealing with massive amounts of data and are employed in almost every possible domain. Building a high-quality machine learning model for deployment in production is a challenging task, for both subject matter experts and machine learning practitioners. For broader adoption and scalability of machine learning systems, the construction and configuration of machine learning workflows need to become more automated. In the last few years, several techniques have been developed in this direction, collectively known as AutoML. In this paper, we present a two-stage optimization process that builds data pipelines and configures machine learning algorithms. First, we study the impact of data pipelines compared to algorithm configuration, showing the importance of data preprocessing over hyperparameter tuning. Second, we present policies to efficiently allocate search time between data pipeline construction and algorithm configuration; these policies are agnostic to the choice of meta-optimizer. Last, we present a metric to determine whether a data pipeline is specific to or independent of the algorithm, enabling fine-grained pipeline pruning and meta-learning for the cold-start problem.

Highlights
• The importance of optimizing the data pipeline over hyperparameter tuning is studied.
• The results show that data pipelines are often more important than hyperparameter tuning.
• A two-stage optimization process is proposed to search for an ML workflow.
• This process is empirically validated over several time allocation policies.
• Iterative and adaptive policies are more robust than static policies.
• A metric to measure whether a data pipeline is independent of the model is proposed.
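
To make the two-stage process described above concrete, here is a minimal sketch of how such a search could be organized, assuming a scikit-learn setting: the first stage searches over data-pipeline choices with the learning algorithm left at its defaults, and the second stage tunes hyperparameters on the best pipeline found. The helpers (evaluate, sample_pipeline, sample_algo), search spaces, and budgets are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of the two-stage idea, assuming a scikit-learn setting.
# Stage 1 searches over data-pipeline choices with the algorithm left at
# its defaults; stage 2 tunes hyperparameters on the best pipeline found.
# Search spaces, budgets, and helper names are illustrative assumptions.
import random

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

random.seed(0)
X, y = load_breast_cancer(return_X_y=True)

def evaluate(pipe_cfg, algo_cfg):
    """Score one (data pipeline, algorithm configuration) pair by 3-fold CV."""
    scaler = StandardScaler() if pipe_cfg["scaler"] == "standard" else MinMaxScaler()
    model = Pipeline([
        ("scale", scaler),
        ("select", SelectKBest(f_classif, k=pipe_cfg["k"])),
        ("clf", RandomForestClassifier(random_state=0, **algo_cfg)),
    ])
    return cross_val_score(model, X, y, cv=3).mean()

def sample_pipeline():
    return {"scaler": random.choice(["standard", "minmax"]),
            "k": random.choice([5, 10, 20, 30])}

def sample_algo():
    return {"n_estimators": random.choice([50, 100, 200]),
            "max_depth": random.choice([None, 3, 5, 10])}

# Stage 1: search data pipelines, algorithm at its default configuration.
best_pipe = max((sample_pipeline() for _ in range(6)),
                key=lambda cfg: evaluate(cfg, {}))
# Stage 2: tune the algorithm on the pipeline selected in stage 1.
best_algo = max((sample_algo() for _ in range(6)),
                key=lambda cfg: evaluate(best_pipe, cfg))

print("best pipeline:", best_pipe)
print("best algorithm config:", best_algo)
print("final CV accuracy: %.4f" % evaluate(best_pipe, best_algo))
```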
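
The time-allocation policies compared in the paper can be sketched in the same spirit. Below, a static policy commits the first half of the budget to pipeline search and the rest to algorithm configuration, while an adaptive policy re-allocates each round to whichever stage improved most recently; per the highlights, iterative and adaptive policies proved more robust than static ones. The diminishing-returns model in search_step is a toy assumption, not the paper's experimental setup.

```python
# Toy comparison of static vs. adaptive time-allocation policies between
# pipeline search and algorithm configuration. The objective and the
# diminishing-returns model below are synthetic assumptions.
import random

random.seed(1)

def search_step(stage, state):
    """One unit of search in a stage; returns the (toy) score improvement.
    Gains shrink as a stage is searched more, mimicking diminishing returns."""
    decay = 0.7 if stage == "pipeline" else 0.9
    gain = state[stage] * random.uniform(0.5, 1.0)
    state[stage] *= decay
    return gain

def run(policy, budget=20):
    """Spend the whole budget one step at a time, as directed by the policy."""
    state = {"pipeline": 1.0, "config": 0.4}   # remaining headroom per stage
    last = {"pipeline": 1.0, "config": 1.0}    # most recent gain per stage
    score = 0.0
    for t in range(budget):
        stage = policy(t, budget, last)
        last[stage] = search_step(stage, state)
        score += last[stage]
    return score

def static(t, budget, last):
    # Static policy: first half to pipeline search, second half to tuning.
    return "pipeline" if t < budget // 2 else "config"

def adaptive(t, budget, last):
    # Adaptive policy: continue in the stage whose last step gained more.
    return max(last, key=last.get)

print("static   total improvement: %.3f" % run(static))
print("adaptive total improvement: %.3f" % run(adaptive))
```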
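
Finally, the abstract mentions a metric for deciding whether a data pipeline is algorithm-specific or algorithm-independent, but does not define it here. One plausible form, shown purely as an assumption rather than the paper's metric, is the rank correlation of pipeline scores across algorithms: if candidate pipelines rank the same way under every algorithm, their effect is likely algorithm-independent and they can be reused for pruning or cold-start meta-learning.

```python
# One way such an independence metric could look -- an assumption for
# illustration, not the metric defined in the paper: compare how candidate
# pipelines rank under different algorithms via Spearman rank correlation.
from itertools import combinations
from scipy.stats import spearmanr

# Hypothetical cross-validation scores: rows are candidate data pipelines,
# columns are three different learning algorithms.
scores = {
    "pipeline_a": [0.91, 0.89, 0.93],
    "pipeline_b": [0.85, 0.84, 0.88],
    "pipeline_c": [0.78, 0.90, 0.75],  # helps only the second algorithm
}

n_algos = 3
for i, j in combinations(range(n_algos), 2):
    col_i = [row[i] for row in scores.values()]
    col_j = [row[j] for row in scores.values()]
    rho, _ = spearmanr(col_i, col_j)
    # High rho across all algorithm pairs suggests pipeline effects are
    # algorithm-independent; low rho suggests they are algorithm-specific.
    print(f"algorithms {i} vs {j}: Spearman rho = {rho:+.2f}")
```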

Copyright: 2019 Elsevier Ltd; Elsevier Science Ltd, Sep 2020
Keywords: AutoML; CASH; Data pipelines; Hyperparameter tuning