Bi-criteria sublinear time algorithms for clustering with outliers in high dimensions
| Published in: | Theoretical Computer Science, Vol. 1057, p. 115538 |
|---|---|
| Main authors: | Huang, Jiawei; Liu, Wenjie; Ding, Hu |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V, 6 December 2025 |
| Keywords: | Outliers; Clustering; Sampling; Approximation algorithm; Sublinear |
| ISSN: | 0304-3975 |
| Online access: | Full text |
| Abstract | Research highlight 1: This paper introduces a novel uniform sampling framework for k-means/median clustering with outliers. The key theoretical innovation is that the sample complexity is independent of both input size and dimensionality, which makes the framework particularly effective for large-scale, high-dimensional datasets. The analysis introduces a “star-shaped transformation” technique that enables rigorous analysis of clustering quality despite the presence of outliers. Research highlight 2: The paper proposes a practical sublinear time algorithm that combines uniform sampling with an “augmented sandwich lemma” technique to boost the success probability. The experimental results show that this approach achieves clustering quality comparable to state-of-the-art methods while running significantly faster, especially on large datasets. The algorithm’s simple implementation and theoretical guarantees make it particularly suitable for real-world applications with limited data access or computational resources.
Real-world datasets often contain outliers, whose presence can make clustering problems much more challenging. Existing algorithms for clustering with outliers often have high computational complexity. In this paper, we propose a simple yet effective sublinear framework for two representative center-based clustering problems with outliers: k-median and k-means clustering with outliers. Our analysis is fundamentally different from previous (uniform and non-uniform) sampling-based ideas. In particular, our sample complexity is independent of the input size and dimensionality, and thus it is suitable for large-scale and high-dimensional datasets. We also conduct a set of experiments to evaluate the effectiveness of the proposed method on both synthetic and real datasets. |
|---|---|
| ArticleNumber | 115538 |
| Author | Liu, Wenjie; Ding, Hu; Huang, Jiawei |
| Author_xml | 1. Huang, Jiawei (ORCID: 0000-0003-4819-2585), University of Science and Technology of China, Hefei, 230000, Anhui, China; 2. Liu, Wenjie, University of Science and Technology of China, Hefei, 230000, Anhui, China; 3. Ding, Hu (ORCID: 0000-0002-1307-6077; email: huding@ustc.edu.cn, huding@buffalo.edu), University of Science and Technology of China, Hefei, 230000, Anhui, China |
| ContentType | Journal Article |
| Copyright | 2025 Elsevier B.V. |
| Copyright_xml | – notice: 2025 Elsevier B.V. |
| DBID | AAYXX CITATION |
| DOI | 10.1016/j.tcs.2025.115538 |
| DatabaseName | CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Mathematics; Computer Science |
| ExternalDocumentID | 10_1016_j_tcs_2025_115538 S0304397525004761 |
| ISICitedReferencesCount | 0 |
| ISSN | 0304-3975 |
| IngestDate | Sat Nov 29 06:53:49 EST 2025 Wed Dec 10 14:38:43 EST 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Outliers; Clustering; Sampling; Approximation algorithm; Sublinear |
| Language | English |
| LinkModel | OpenURL |
| ORCID | 0000-0003-4819-2585 0000-0002-1307-6077 |
| ParticipantIDs | crossref_primary_10_1016_j_tcs_2025_115538 elsevier_sciencedirect_doi_10_1016_j_tcs_2025_115538 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-12-06 |
| PublicationDateYYYYMMDD | 2025-12-06 |
| PublicationDate_xml | – month: 12 year: 2025 text: 2025-12-06 day: 06 |
| PublicationDecade | 2020 |
| PublicationTitle | Theoretical computer science |
| PublicationYear | 2025 |
| Publisher | Elsevier B.V |
| Publisher_xml | – name: Elsevier B.V |
| SSID | ssj0000576 |
| Score | 2.459046 |
| SourceID | crossref elsevier |
| SourceType | Index Database Publisher |
| StartPage | 115538 |
| SubjectTerms | Approximation algorithm; Clustering; Outliers; Sampling; Sublinear |
| Title | Bi-criteria sublinear time algorithms for clustering with outliers in high dimensions |
| URI | https://dx.doi.org/10.1016/j.tcs.2025.115538 |
| Volume | 1057 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVESC databaseName: Elsevier SD Freedom Collection Journals 2021 issn: 0304-3975 databaseCode: AIEXJ dateStart: 20211214 customDbUrl: isFulltext: true dateEnd: 99991231 titleUrlDefault: https://www.sciencedirect.com omitProxy: false ssIdentifier: ssj0000576 providerName: Elsevier |
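
The abstract describes a framework that clusters a small uniform sample instead of the full input. The following is a minimal Python sketch of that general idea only, not the authors' algorithm: this record does not state the sample-size bound, the solver run on the sample, or how the bi-criteria relaxation is defined, so `sample_size`, the 1.5 trimming slack, and the `trimmed_kmeans` heuristic are all illustrative assumptions.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def trimmed_kmeans(points, k, z, iters=20, rng=None):
    """Lloyd-style heuristic that ignores the z farthest points in every round.

    Stand-in solver (hypothetical choice), not the method analysed in the paper.
    """
    rng = rng or random.Random(0)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Rank points by distance to their nearest center and drop the z farthest.
        ranked = sorted(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        inliers = ranked[: len(points) - z]
        clusters = [[] for _ in range(k)]
        for p in inliers:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Move each center to the mean of its cluster; empty clusters stay put.
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers


def sample_and_cluster(data, k, outlier_frac, sample_size, seed=0):
    """Cluster a uniform sample of `data` rather than the whole dataset."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    # Trim somewhat more than the expected outlier share (assumed slack of 1.5),
    # while always leaving at least k points to cluster.
    z = min(len(sample) - k, int(1.5 * outlier_frac * len(sample)) + 1)
    return trimmed_kmeans(sample, k, z, rng=rng)
```

For instance, `sample_and_cluster(data, k=2, outlier_frac=0.02, sample_size=100)` clusters only 100 sampled points regardless of `len(data)`, which is the property the abstract highlights. The quality guarantees claimed in the paper should not be assumed to carry over to this heuristic.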