BOOMER — An algorithm for learning gradient boosted multi-label classification rules

Multi-label classification is concerned with the assignment of sets of labels to individual data points. Due to its diverse real-world applications, e.g., the annotation of text documents with topics, it has become a well-established field of machine learning research. Compared to traditional classi...

Bibliographic details
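Published in: Software Impacts, Vol. 10, p. 100137
Main author: Rapp, Michael
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.11.2021
Subjects: Gradient boosting, Machine learning, Multi-label classification, Rule learning
ISSN: 2665-9638
Online access: https://dx.doi.org/10.1016/j.simpa.2021.100137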
Abstract: Multi-label classification is concerned with the assignment of sets of labels to individual data points. Due to its diverse real-world applications, e.g., the annotation of text documents with topics, it has become a well-established field of machine learning research. Compared to traditional classification, where classes are mutually exclusive, multi-label classification comes with interesting challenges, most prominently the requirement to take dependencies between labels into account. In this work, we present a modular and customizable implementation of BOOMER, an algorithm for learning gradient boosted multi-label classification rules, that can flexibly be adjusted to different use cases and requirements.
Highlights:
• BOOMER is an algorithm for learning gradient boosted multi-label classification rules.
• The goal of multi-label classification is the automatic assignment of sets of labels to individual data points.
• BOOMER can optimize both decomposable and non-decomposable loss functions.
• The implementation incorporates several optimizations and approximation techniques so that it can deal with large datasets.
• Gradient-based Label Binning can be used to form groups of similar labels.
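Illustrative note (not part of the original record): the distinction between decomposable and non-decomposable losses mentioned above can be sketched with two standard textbook measures. The Hamming loss is an average over individual label decisions, so it decomposes label by label. The subset 0/1 loss counts an example as wrong unless its whole label set is predicted correctly, so it cannot be split into per-label terms. The function names and the toy data below are assumptions made for illustration. They are not BOOMER's API, and they are not necessarily the loss functions the implementation optimizes.

```python
from typing import List

# One binary indicator per label: 1 = label is relevant, 0 = irrelevant.
LabelVector = List[int]


def hamming_loss(y_true: List[LabelVector], y_pred: List[LabelVector]) -> float:
    """Fraction of individual label decisions that are wrong (decomposable)."""
    total = sum(len(t) for t in y_true)
    wrong = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return wrong / total


def subset_zero_one_loss(y_true: List[LabelVector], y_pred: List[LabelVector]) -> float:
    """Fraction of examples whose full label set is not predicted exactly
    (non-decomposable: one wrong label makes the whole example count as wrong)."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


# Hypothetical toy data: 4 examples, 3 labels each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

if __name__ == "__main__":
    print("Hamming loss:   ", hamming_loss(y_true, y_pred))
    print("Subset 0/1 loss:", subset_zero_one_loss(y_true, y_pred))
```

On this toy data only two of the twelve label decisions are wrong, yet half of the examples have an incorrect label set, which is why the two kinds of loss can favor different predictions.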
ArticleNumber 100137
Author Rapp, Michael
Author_xml – sequence: 1
  givenname: Michael
  orcidid: 0000-0001-8570-8240
  surname: Rapp
  fullname: Rapp, Michael
  email: mrapp@ke.tu-darmstadt.de
  organization: Knowledge Engineering Group, TU Darmstadt, Hochschulstraße 10, 64289 Darmstadt, Germany
CitedBy_id crossref_primary_10_3390_jimaging9020033
crossref_primary_10_1007_s10489_022_04370_x
crossref_primary_10_1016_j_asoc_2025_112740
Cites_doi 10.1109/MCSE.2010.118
10.1007/978-3-030-57977-7_1
10.1145/2939672.2939785
10.1007/978-3-030-67664-3_8
10.1007/s10994-012-5285-8
10.1007/978-3-030-86523-8_28
10.1002/widm.1139
10.1145/567806.567807
10.1007/978-3-642-23808-6_10
ContentType Journal Article
Copyright 2021 The Author(s)
Copyright_xml – notice: 2021 The Author(s)
DOI 10.1016/j.simpa.2021.100137
EISSN 2665-9638
ExternalDocumentID 10_1016_j_simpa_2021_100137
S2665963821000567
ISICitedReferencesCount 7
ISSN 2665-9638
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords Rule learning
Multi-label classification
Machine learning
Gradient boosting
Language English
License This is an open access article under the CC BY license.
ORCID 0000-0001-8570-8240
OpenAccessLink https://dx.doi.org/10.1016/j.simpa.2021.100137
PublicationCentury 2000
PublicationDate November 2021
2021-11-00
PublicationDateYYYYMMDD 2021-11-01
PublicationDate_xml – month: 11
  year: 2021
  text: November 2021
PublicationDecade 2020
PublicationTitle Software impacts
PublicationYear 2021
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
StartPage 100137
SubjectTerms Gradient boosting
Machine learning
Multi-label classification
Rule learning
Title BOOMER — An algorithm for learning gradient boosted multi-label classification rules
URI https://dx.doi.org/10.1016/j.simpa.2021.100137
Volume 10