Write-Avoiding Algorithms
Communication, i.e., moving data between levels of a memory hierarchy or between processors over a network, is much more expensive (in time or energy) than arithmetic. There has thus been a recent focus on designing algorithms that minimize communication and, when possible, attain lower bounds on th...
| Published in: | Proceedings - IEEE International Parallel and Distributed Processing Symposium, pp. 648-658 |
|---|---|
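| Main authors: | Carson, Erin; Demmel, James; Grigori, Laura; Knight, Nicholas; Koanantakool, Penporn; Schwartz, Oded; Simhadri, Harsha Vardhan |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 1 May 2016 |
| Subjects: | |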
| ISSN: | 1530-2075 |
| Online access: | Get full text |
| Abstract | Communication, i.e., moving data between levels of a memory hierarchy or between processors over a network, is much more expensive (in time or energy) than arithmetic. There has thus been a recent focus on designing algorithms that minimize communication and, when possible, attain lower bounds on the total number of reads and writes. However, most previous work does not distinguish between the costs of reads and writes. Writes can be much more expensive than reads in some current and emerging storage devices such as nonvolatile memories. This motivates us to ask whether there are lower bounds on the number of writes that certain algorithms must perform, and whether these bounds are asymptotically smaller than bounds on the sum of reads and writes together. When these smaller lower bounds exist, we then ask when they are attainable; we call such algorithms "write-avoiding" (WA), to distinguish them from "communication-avoiding" (CA) algorithms, which only minimize the sum of reads and writes. We identify a number of cases in linear algebra and direct N-body methods where known CA algorithms are also WA (some are and some aren't). We also identify classes of algorithms, including Strassen's matrix multiplication, Cooley-Tukey FFT, and cache oblivious algorithms for classical linear algebra, where a WA algorithm cannot exist: the number of writes is unavoidably within a constant factor of the total number of reads and writes. We explore the interaction of WA algorithms with cache replacement policies and argue that the Least Recently Used policy works well with the WA algorithms in this paper. We provide empirical hardware counter measurements from Intel's Nehalem-EX microarchitecture to validate our theory. In the parallel case, for classical linear algebra, we show that it is impossible to attain lower bounds both on interprocessor communication and on writes to local memory, but either one is attainable by itself. Finally, we discuss WA algorithms for sparse iterative linear algebra. |
|---|---|
| Author | Simhadri, Harsha Vardhan; Knight, Nicholas; Carson, Erin; Koanantakool, Penporn; Schwartz, Oded; Demmel, James; Grigori, Laura |
| Author_xml | – sequence: 1 givenname: Erin surname: Carson fullname: Carson, Erin email: erin.carson@nyu.edu organization: Courant Inst. of Math. Sci., New York Univ., New York, NY, USA – sequence: 2 givenname: James surname: Demmel fullname: Demmel, James email: demmel@berkeley.edu organization: Dept. of Math., Univ. of California, Berkeley, Berkeley, CA, USA – sequence: 3 givenname: Laura surname: Grigori fullname: Grigori, Laura email: laura.grigori@inria.fr organization: INRIA Paris-Rocquencourt, UPMC - Univ. Paris 6, Paris, France – sequence: 4 givenname: Nicholas surname: Knight fullname: Knight, Nicholas email: nknight@nyu.edu organization: Courant Inst. of Math. Sci., New York Univ., New York, NY, USA – sequence: 5 givenname: Penporn surname: Koanantakool fullname: Koanantakool, Penporn email: penpornk@eecs.berkeley.edu organization: Comput. Sci. Div., Univ. of California, Berkeley, Berkeley, CA, USA – sequence: 6 givenname: Oded surname: Schwartz fullname: Schwartz, Oded email: odedsc@cs.huji.ac.il organization: Sch. of Eng. & Comput. Sci., Hebrew Univ. of Jerusalem, Jerusalem, Israel – sequence: 7 givenname: Harsha Vardhan surname: Simhadri fullname: Simhadri, Harsha Vardhan email: harshas@lbl.gov organization: Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/IPDPS.2016.114 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9781509021406 150902140X |
| EndPage | 658 |
| ExternalDocumentID | 7516061 |
| Genre | orig-research |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 24 |
| ISSN | 1530-2075 |
| IngestDate | Wed Aug 27 02:11:36 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 11 |
| ParticipantIDs | ieee_primary_7516061 |
| PublicationCentury | 2000 |
| PublicationDate | 20160501 |
| PublicationDateYYYYMMDD | 2016-05-01 |
| PublicationDate_xml | – month: 05 year: 2016 text: 20160501 day: 01 |
| PublicationDecade | 2010 |
| PublicationTitle | Proceedings - IEEE International Parallel and Distributed Processing Symposium |
| PublicationTitleAbbrev | IPDPS |
| PublicationYear | 2016 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0020349 |
| Score | 1.7562889 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 648 |
| SubjectTerms | Algorithm design and analysis; communication avoiding algorithms; Hardware; Krylov subspace methods; Linear algebra; Load modeling; lower bounds; N-body methods; Non-volatile memories; Nonvolatile memory; Program processors; Schedules; write complexity |
| Title | Write-Avoiding Algorithms |
| URI | https://ieeexplore.ieee.org/document/7516061 |
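
The abstract's distinction between total communication and writes alone can be illustrated with a small Python sketch of blocked classical matrix multiplication. This is only an illustration under assumptions made here, not code from the paper: the two-level memory model (fast memory holds one block each of A, B and C), the function name `blocked_matmul`, and the `k_innermost` flag are all hypothetical. The sketch counts words read from and written to slow memory for two block loop orders.

```python
def blocked_matmul(A, B, b, k_innermost=True):
    """Multiply square matrices A and B in b-by-b blocks, counting slow-memory traffic.

    Assumed model (an illustration, not the paper's code): fast memory holds
    one block each of A, B and C. Loading a block costs b*b reads; storing a
    C block back to slow memory costs b*b writes.
    """
    n = len(A)
    assert n % b == 0, "block size must divide the matrix dimension"
    nb = n // b
    C = [[0.0] * n for _ in range(n)]
    reads = writes = 0

    def update(i, j, k):
        # C(i,j) += A(i,k) * B(k,j) on one triple of blocks.
        for r in range(i * b, (i + 1) * b):
            for c in range(j * b, (j + 1) * b):
                s = C[r][c]
                for t in range(k * b, (k + 1) * b):
                    s += A[r][t] * B[t][c]
                C[r][c] = s

    if k_innermost:
        # Each C block stays in fast memory while it is accumulated,
        # so it is written to slow memory exactly once.
        for i in range(nb):
            for j in range(nb):
                for k in range(nb):
                    reads += 2 * b * b  # load A(i,k) and B(k,j)
                    update(i, j, k)
                writes += b * b  # store the finished C(i,j)
    else:
        # With k outermost, every partial C block is stored and later reloaded.
        for k in range(nb):
            for i in range(nb):
                for j in range(nb):
                    reads += 2 * b * b  # load A(i,k) and B(k,j)
                    if k > 0:
                        reads += b * b  # reload the partial C(i,j)
                    update(i, j, k)
                    writes += b * b  # store the partial C(i,j)
    return C, reads, writes
```

Under this model both loop orders read on the order of n^3/b words, but the `k_innermost=True` order writes only n^2 words (each output entry once), while the other order writes n^3/b words. Calling the function with both flag values on the same inputs returns the same product with different write counts.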