Fair Algorithms for Hierarchical Agglomerative Clustering

Bibliographic Details
Published in: 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 206-211
Main Authors: Chhabra, Anshuman; Mohapatra, Prasant
Format: Conference Proceeding
Language: English
Published: IEEE, 2022-12-01
Abstract Hierarchical Agglomerative Clustering (HAC) algorithms are widely used in modern data science. They partition a dataset into clusters while building a hierarchical relationship between the data samples. HAC algorithms are employed in many application areas, such as biology, natural language processing, and recommender systems. It is therefore imperative to ensure that these algorithms are fair: even if the dataset contains biases against certain protected groups, the resulting clusters should not discriminate against samples from any of these groups. However, recent work on clustering fairness has mostly focused on center-based clustering algorithms, such as k-median and k-means clustering. In this paper, we propose fair algorithms for performing HAC that 1) enforce fairness constraints irrespective of the distance linkage criterion used, 2) generalize to any natural measure of clustering fairness for HAC, 3) work for multiple protected groups, and 4) have running times competitive with vanilla HAC. Through extensive experiments on multiple real-world UCI datasets, we show that our proposed algorithm finds fairer clusterings than vanilla HAC as well as the only other state-of-the-art fair HAC approach.
Author_xml – sequence: 1
  givenname: Anshuman
  surname: Chhabra
  fullname: Chhabra, Anshuman
  email: chhabra@ucdavis.edu
  organization: University of California, Davis, Department of Computer Science, Davis, California, USA
– sequence: 2
  givenname: Prasant
  surname: Mohapatra
  fullname: Mohapatra, Prasant
  email: pmohapatra@ucdavis.edu
  organization: University of California, Davis, Department of Computer Science, Davis, California, USA
CODEN IEEPAD
DOI 10.1109/ICMLA55696.2022.00036
EISBN 1665462833
9781665462839
SubjectTerms Clustering
Clustering algorithms
Costs
Couplings
Data science
Fairness in Clustering
Hierarchical Agglomerative Clustering
Machine learning
Measurement
Natural language processing
URI https://ieeexplore.ieee.org/document/10069217