NDSEARCH: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing
Approximate nearest neighbor search (ANNS) is a key retrieval technique for vector databases and many data center applications, such as person re-identification and recommendation systems. It is also fundamental to retrieval-augmented generation (RAG) for large language models (LLMs). Among all th...
| Published in: | 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 368-381 |
|---|---|
| Main authors: | Wang, Yitu; Li, Shiyu; Zheng, Qilin; Song, Linghao; Li, Zongwang; Chang, Andrew; Li, Hai "Helen"; Chen, Yiran |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 29 June 2024 |
| Subjects: | |
| Online access: | Get full text |
| Abstract | Approximate nearest neighbor search (ANNS) is a key retrieval technique for vector databases and many data center applications, such as person re-identification and recommendation systems. It is also fundamental to retrieval-augmented generation (RAG) for large language models (LLMs). Among all ANNS algorithms, graph-traversal-based ANNS achieves the highest recall rate. However, as the dataset size increases, the graph may require hundreds of gigabytes of memory, exceeding the main memory capacity of a single workstation node. Although the graph can be partitioned and a solid-state drive (SSD) used as backing storage, the limited SSD I/O bandwidth severely degrades system performance. To address this challenge, we present NDSEARCH, a hardware-software co-designed near-data processing (NDP) solution for ANNS processing. NDSEARCH consists of a novel in-storage computing architecture, SEARSSD, that supports the ANNS kernels and leverages logic unit (LUN)-level parallelism inside the NAND flash chips. NDSEARCH also includes a processing model that is customized for NDP and cooperates with SEARSSD. The processing model enables two-level scheduling, which improves data locality and exploits the internal bandwidth in NDSEARCH, and a speculative searching mechanism that further accelerates the ANNS workload. Our results show that NDSEARCH improves throughput by up to 31.7×, 14.6×, 7.4×, and 2.9× over CPU, GPU, a state-of-the-art SmartSSD-only design, and DeepStore, respectively. NDSEARCH also achieves two orders of magnitude higher energy efficiency than CPU and GPU. |
|---|---|
| Author | Wang, Yitu; Li, Shiyu; Song, Linghao; Li, Hai "Helen"; Chen, Yiran; Zheng, Qilin; Chang, Andrew; Li, Zongwang |
| Author_xml | 1. Wang, Yitu (yitu.wang@duke.edu), Duke University, Durham, North Carolina, USA; 2. Li, Shiyu (shiyu.li@duke.edu), Duke University, Durham, North Carolina, USA; 3. Zheng, Qilin (qilin.zheng@duke.edu), Duke University, Durham, North Carolina, USA; 4. Song, Linghao (linghaosong@cs.ucla.edu), University of California, Los Angeles, California, USA; 5. Li, Zongwang (zongwang.li@samsung.com), Samsung Semiconductor, Inc., San Jose, California, USA; 6. Chang, Andrew (andrew.c1@samsung.com), Samsung Semiconductor, Inc., San Jose, California, USA; 7. Li, Hai "Helen" (hai.li@duke.edu), Duke University, Durham, North Carolina, USA; 8. Chen, Yiran (yiran.chen@duke.edu), Duke University, Durham, North Carolina, USA |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/ISCA59077.2024.00035 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9798350326581 |
| EndPage | 381 |
| ExternalDocumentID | 10609615 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IH ACM ALMA_UNASSIGNED_HOLDINGS CBEJK RIE RIO |
| ID | FETCH-LOGICAL-a258t-65ba46cfd3fabfbd3a9c2d2ba0ea8b740a724b98810ef26a3aafc3a8a300e3493 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 3 |
| IngestDate | Wed Aug 27 02:35:02 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a258t-65ba46cfd3fabfbd3a9c2d2ba0ea8b740a724b98810ef26a3aafc3a8a300e3493 |
| PageCount | 14 |
| ParticipantIDs | ieee_primary_10609615 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-June-29 |
| PublicationDateYYYYMMDD | 2024-06-29 |
| PublicationDate_xml | – month: 06 year: 2024 text: 2024-June-29 day: 29 |
| PublicationDecade | 2020 |
| PublicationTitle | 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) |
| PublicationTitleAbbrev | ISCA |
| PublicationYear | 2024 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssib060973785 |
| Score | 2.3245342 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 368 |
| SubjectTerms | Approximate Nearest Neighbor Search; Bandwidth; Computational modeling; Graphics processing units; Hardware/Software Co-Design; Memory management; Near Data Processing; Nearest neighbor methods; Parallel processing; Throughput |
| Title | NDSEARCH: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing |
| URI | https://ieeexplore.ieee.org/document/10609615 |
| hasFullText | 1 |
| inHoldings | 1 |
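
Note on "graph-traversal-based ANNS": the abstract names this algorithm family without describing it. The sketch below is a generic best-first (beam) search over a proximity graph, given only to illustrate the kind of kernel being accelerated. It is not the NDSEARCH processing model and not SEARSSD; the function name `greedy_graph_search`, the `beam_width` parameter, and the toy `graph` and `vectors` data are assumptions made for this illustration.

```python
import heapq
import math

# Toy data (hypothetical): five points on a line, linked as a chain.
vectors = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}


def greedy_graph_search(graph, vectors, query, entry, k=1, beam_width=4):
    """Generic best-first search on a proximity graph (illustrative only)."""

    def dist(i):
        # Euclidean distance between stored vector i and the query.
        return math.dist(vectors[i], query)

    visited = {entry}
    # Min-heap of nodes still to expand, keyed by distance to the query.
    candidates = [(dist(entry), entry)]
    # Max-heap (negated distances) of the best beam_width nodes seen so far.
    results = [(-dist(entry), entry)]

    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the nearest unexpanded node is worse than the worst kept result.
        if len(results) >= beam_width and d > -results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)
            if len(results) < beam_width or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > beam_width:
                    heapq.heappop(results)  # drop the current worst result

    # Convert back to positive distances, sort ascending, keep the k nearest.
    ranked = sorted((-neg_d, n) for neg_d, n in results)
    return [n for _, n in ranked[:k]]
```

Each loop iteration fetches one node's neighbor list and the neighbors' vectors. When the graph and vectors reside on an SSD, every such fetch becomes storage I/O, which is consistent with the bandwidth bottleneck the abstract describes.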