NDSEARCH: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing

Bibliographic Details
Published in: 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 368-381
Main Authors: Wang, Yitu; Li, Shiyu; Zheng, Qilin; Song, Linghao; Li, Zongwang; Chang, Andrew; Li, Hai (Helen); Chen, Yiran
Format: Conference Proceeding
Language: English
Published: IEEE 29.06.2024
Subjects:
Abstract Approximate nearest neighbor search (ANNS) is a key retrieval technique for vector databases and many data center applications, such as person re-identification and recommendation systems. It is also fundamental to retrieval-augmented generation (RAG) for large language models (LLMs). Among all ANNS algorithms, graph-traversal-based ANNS achieves the highest recall rate. However, as the dataset grows, the graph may require hundreds of gigabytes of memory, exceeding the main memory capacity of a single workstation node. Although the graph can be partitioned and backed by a solid-state drive (SSD), the limited SSD I/O bandwidth severely degrades system performance. To address this challenge, we present NDSEARCH, a hardware-software co-designed near-data processing (NDP) solution for ANNS. NDSEARCH consists of a novel in-storage computing architecture, SEARSSD, that supports the ANNS kernels and leverages logic unit (LUN)-level parallelism inside the NAND flash chips. NDSEARCH also includes a processing model that is customized for NDP and cooperates with SEARSSD. The processing model enables two-level scheduling, which improves data locality and exploits the internal bandwidth in NDSEARCH, and a speculative searching mechanism that further accelerates the ANNS workload. Our results show that NDSEARCH improves throughput by up to 31.7×, 14.6×, 7.4×, and 2.9× over CPU, GPU, a state-of-the-art SmartSSD-only design, and DeepStore, respectively. NDSEARCH also achieves two orders of magnitude higher energy efficiency than CPU and GPU.
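The graph-traversal search that the abstract refers to can be illustrated with a minimal sketch. It assumes a generic best-first traversal over a brute-force k-nearest-neighbor graph; the function names, the parameters (`k`, `ef`, `degree`), and the toy data are hypothetical and are not taken from NDSEARCH. The `reads` counter tallies vector fetches, a rough proxy for the storage accesses that become the bottleneck once the graph lives on an SSD.

```python
import heapq

import numpy as np


def build_knn_graph(vectors, degree):
    """Brute-force k-NN graph (hypothetical helper, only for toy data)."""
    d2 = ((vectors[:, None, :] - vectors[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # a vertex is not its own neighbor
    return [[int(j) for j in np.argsort(row)[:degree]] for row in d2]


def greedy_graph_search(vectors, neighbors, query, entry, k=3, ef=8):
    """Best-first graph traversal; returns (top-k ids, vector reads)."""

    def dist(i):
        return float(np.sum((vectors[i] - query) ** 2))

    visited = {entry}
    d0 = dist(entry)
    candidates = [(d0, entry)]  # min-heap: closest unexpanded vertex first
    results = [(-d0, entry)]    # max-heap via negation: best `ef` seen so far
    reads = 1                   # one fetch per vertex whose vector is touched

    while candidates:
        d, v = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break  # the closest candidate cannot improve the result set
        for u in neighbors[v]:
            if u in visited:
                continue
            visited.add(u)
            du = dist(u)
            reads += 1
            if len(results) < ef or du < -results[0][0]:
                heapq.heappush(candidates, (du, u))
                heapq.heappush(results, (-du, u))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result

    top = sorted((-nd, i) for nd, i in results)[:k]
    return [i for _, i in top], reads


# Toy data: 200 random 16-dimensional vectors (an assumption for illustration).
rng = np.random.default_rng(0)
vectors = rng.standard_normal((200, 16)).astype(np.float32)
neighbors = build_knn_graph(vectors, degree=8)
query = rng.standard_normal(16).astype(np.float32)

ids, reads = greedy_graph_search(vectors, neighbors, query, entry=0)
print("approximate neighbors:", ids, "vector reads:", reads)
```

Each hop depends on the distances computed in the previous hop, so the reads are data-dependent and hard to batch from the host, which is consistent with the abstract's motivation for moving the search next to the flash.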
Author Wang, Yitu
Li, Shiyu
Song, Linghao
Li, Hai (Helen)
Chen, Yiran
Zheng, Qilin
Chang, Andrew
Li, Zongwang
Author_xml – sequence: 1
  givenname: Yitu
  surname: Wang
  fullname: Wang, Yitu
  email: yitu.wang@duke.edu
  organization: Duke University,Durham,North Carolina,USA
– sequence: 2
  givenname: Shiyu
  surname: Li
  fullname: Li, Shiyu
  email: shiyu.li@duke.edu
  organization: Duke University,Durham,North Carolina,USA
– sequence: 3
  givenname: Qilin
  surname: Zheng
  fullname: Zheng, Qilin
  email: qilin.zheng@duke.edu
  organization: Duke University,Durham,North Carolina,USA
– sequence: 4
  givenname: Linghao
  surname: Song
  fullname: Song, Linghao
  email: linghaosong@cs.ucla.edu
  organization: University of California,Los Angeles,California,USA
– sequence: 5
  givenname: Zongwang
  surname: Li
  fullname: Li, Zongwang
  email: zongwang.li@samsung.com
  organization: Samsung Semiconductor, Inc.,San Jose,California,USA
– sequence: 6
  givenname: Andrew
  surname: Chang
  fullname: Chang, Andrew
  email: andrew.c1@samsung.com
  organization: Samsung Semiconductor, Inc.,San Jose,California,USA
– sequence: 7
  givenname: Hai (Helen)
  surname: Li
  fullname: Li, Hai (Helen)
  email: hai.li@duke.edu
  organization: Duke University,Durham,North Carolina,USA
– sequence: 8
  givenname: Yiran
  surname: Chen
  fullname: Chen, Yiran
  email: yiran.chen@duke.edu
  organization: Duke University,Durham,North Carolina,USA
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/ISCA59077.2024.00035
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798350326581
EndPage 381
ExternalDocumentID 10609615
Genre orig-research
GroupedDBID 6IE
6IH
ACM
ALMA_UNASSIGNED_HOLDINGS
CBEJK
RIE
RIO
ID FETCH-LOGICAL-a258t-65ba46cfd3fabfbd3a9c2d2ba0ea8b740a724b98810ef26a3aafc3a8a300e3493
IEDL.DBID RIE
ISICitedReferencesCount 3
IngestDate Wed Aug 27 02:35:02 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a258t-65ba46cfd3fabfbd3a9c2d2ba0ea8b740a724b98810ef26a3aafc3a8a300e3493
PageCount 14
ParticipantIDs ieee_primary_10609615
PublicationCentury 2000
PublicationDate 2024-June-29
PublicationDateYYYYMMDD 2024-06-29
PublicationDate_xml – month: 06
  year: 2024
  text: 2024-June-29
  day: 29
PublicationDecade 2020
PublicationTitle 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)
PublicationTitleAbbrev ISCA
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssib060973785
Score 2.3245342
SourceID ieee
SourceType Publisher
StartPage 368
SubjectTerms Approximate Nearest Neighbor Search
Bandwidth
Computational modeling
Graphics processing units
Hardware/Software Co-Design
Memory management
Near Data Processing
Nearest neighbor methods
Parallel processing
Throughput
Title NDSEARCH: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing
URI https://ieeexplore.ieee.org/document/10609615
hasFullText 1
inHoldings 1
isFullTextHit
isPrint