Co-design Hardware and Algorithm for Vector Search
Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query text...
| Published in: | International Conference for High Performance Computing, Networking, Storage and Analysis (Online), pp. 1-16 |
|---|---|
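| Main authors: | Jiang, Wenqi; Li, Shigang; Zhu, Yu; de Fine Licht, Johannes; He, Zhenhao; Shi, Runbin; Renggli, Cedric; Zhang, Shuai; Rekatsinas, Theodoros; Hoefler, Torsten; Alonso, Gustavo |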
| Format: | Conference paper |
| Language: | English |
| Published: | ACM, 11 November 2023 |
| Subjects: | |
| ISSN: | 2167-4337 |
| Online access: | Get full text |
| Abstract | Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents. As performance demands for vector search systems surge, accelerated hardware offers a promising solution in the post-Moore's Law era. We introduce FANNS, an end-to-end and scalable vector search framework on FPGAs. Given a user-provided recall requirement on a dataset and a hardware resource budget, FANNS automatically co-designs hardware and algorithm, subsequently generating the corresponding accelerator. The framework also supports scale-out by incorporating a hardware TCP/IP stack in the accelerator. FANNS attains up to 23.0× and 37.2× speedup compared to FPGA and CPU baselines, respectively, and demonstrates superior scalability to GPUs, achieving 5.5× and 7.6× speedup in median and 95th percentile (P95) latency within an eight-accelerator configuration. The remarkable performance of FANNS lays a robust groundwork for future FPGA integration in data centers and AI supercomputers. |
|---|---|
| Author | Li, Shigang; Alonso, Gustavo; Hoefler, Torsten; Renggli, Cedric; Zhu, Yu; Zhang, Shuai; Rekatsinas, Theodoros; Jiang, Wenqi; de Fine Licht, Johannes; He, Zhenhao; Shi, Runbin |
| Author_xml | – sequence: 1 givenname: Wenqi surname: Jiang fullname: Jiang, Wenqi email: wenqi.jiang@inf.ethz.ch organization: Systems Group, ETH Zurich – sequence: 2 givenname: Shigang surname: Li fullname: Li, Shigang organization: SPCL, ETH Zurich – sequence: 3 givenname: Yu surname: Zhu fullname: Zhu, Yu organization: Systems Group, ETH Zurich – sequence: 4 givenname: Johannes surname: de Fine Licht fullname: de Fine Licht, Johannes organization: SPCL, ETH Zurich – sequence: 5 givenname: Zhenhao surname: He fullname: He, Zhenhao organization: Systems Group, ETH Zurich – sequence: 6 givenname: Runbin surname: Shi fullname: Shi, Runbin organization: Systems Group, ETH Zurich – sequence: 7 givenname: Cedric surname: Renggli fullname: Renggli, Cedric organization: Systems Group, ETH Zurich – sequence: 8 givenname: Shuai surname: Zhang fullname: Zhang, Shuai organization: Systems Group, ETH Zurich – sequence: 9 givenname: Theodoros surname: Rekatsinas fullname: Rekatsinas, Theodoros organization: Systems Group, ETH Zurich – sequence: 10 givenname: Torsten surname: Hoefler fullname: Hoefler, Torsten organization: SPCL, ETH Zurich – sequence: 11 givenname: Gustavo surname: Alonso fullname: Alonso, Gustavo organization: Systems Group, ETH Zurich |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/3581784.3607045 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library Online IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9798400701092 |
| EISSN | 2167-4337 |
| EndPage | 16 |
| ExternalDocumentID | 10485141 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IF 6IH 6IK 6IL 6IN AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI OCL RIE RIL |
| ID | FETCH-LOGICAL-a346t-87e8b6a1a374538dbc93241ea54f3afd47be3c6d3bf1eac723bcecdf32c7fc9a3 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 8 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001461755900074&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:09:33 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a346t-87e8b6a1a374538dbc93241ea54f3afd47be3c6d3bf1eac723bcecdf32c7fc9a3 |
| PageCount | 16 |
| ParticipantIDs | ieee_primary_10485141 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-Nov.-11 |
| PublicationDateYYYYMMDD | 2023-11-11 |
| PublicationDate_xml | – month: 11 year: 2023 text: 2023-Nov.-11 day: 11 |
| PublicationDecade | 2020 |
| PublicationTitle | International Conference for High Performance Computing, Networking, Storage and Analysis (Online) |
| PublicationTitleAbbrev | SC |
| PublicationYear | 2023 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SSID | ssj0003204180 ssib053141430 |
| Score | 1.980188 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Approximate nearest neighbor search; Clustering algorithms; Data centers; FPGA; Hardware; hardware acceleration; Search engines; TCPIP; Training; Vectors |
| Title | Co-design Hardware and Algorithm for Vector Search |
| URI | https://ieeexplore.ieee.org/document/10485141 |
| WOSCitedRecordID | wos001461755900074&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |