Simultaneously solving swarms of small sparse systems on SIMD silicon
| Published in: | 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1128-1137 |
|---|---|
| Main authors: | Lelbach, Bryce Adelstein; Johansen, Hans; Williams, Samuel |
| Format: | Conference proceedings |
| Language: | English |
| Published: | IEEE, 2017-05-01 |
| Abstract | A number of computational science algorithms lead to discretizations that require a large number of independent small matrix solves. Examples include small nonlinear coupled chemistry and flow systems, one-dimensional subsystems in climate and diffusion simulations, and semi-implicit time integrators, among others. We introduce an approach for solving large numbers of independent banded matrix problems on SIMD architectures. Unlike many vectorized or batched approaches that rely on reusing the matrix factorization across multiple solves, our algorithm supports batches of matrices that differ (due to spatial variation or nonlinear solvers, for example). We present an implementation of our approach for diagonally dominant tridiagonal systems that is optimized via compiler directives, tiling, and choice of data layout. Performance is evaluated on three Intel microarchitectures with different cache, vectorization, and threading features: Ivy Bridge, Haswell, and Knights Landing. Finally, we show that our solver improves on existing approaches and achieves approximately 90% of the effective bandwidth measured by STREAM Triad on all three platforms. |
|---|---|
| Author | Lelbach, Bryce Adelstein; Johansen, Hans; Williams, Samuel |
| Affiliation | Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA (all three authors) |
| Email | brycelelbach@gmail.com (Lelbach); hjohansen@lbl.gov (Johansen); swwilliams@lbl.gov (Williams) |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/IPDPSW.2017.114 |
| EISBN | 9781538634080; 1538634082 |
| EndPage | 1137 |
| ExternalDocumentID | 7965166 |
| Genre | orig-research |
| ISICitedReferencesCount | 2 |
| IsPeerReviewed | false |
| IsScholarly | false |
| PageCount | 10 |
| PublicationDate | 2017-May |
| PublicationDateYYYYMMDD | 2017-05-01 |
| PublicationTitleAbbrev | IPDPSW |
| Publisher | IEEE |
| StartPage | 1128 |
| SubjectTerms | AVX512; Banded; Batched; Computational modeling; Computer architecture; Knights Landing; KNL; Layout; Libraries; Linear Algebra; Many-core; Matrix; Parallel processing; SIMD; Sparse; Sparse matrices; TDMA; Thomas Algorithm; Three-dimensional displays; Tiling; Tridiagonal; Tridiagonal Matrix Algorithm; Vector; Xeon Phi |
| URI | https://ieeexplore.ieee.org/document/7965166 |
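
The abstract describes solving many independent, possibly different, diagonally dominant tridiagonal systems with a data layout chosen for SIMD hardware. The C sketch below illustrates that idea under our own assumptions and is not the authors' implementation (which additionally relies on compiler directives and tiling). It uses the classic Thomas algorithm (TDMA, listed among the subject terms) without pivoting, which is only safe because the systems are assumed diagonally dominant. It also uses an interleaved layout, one possible layout choice, in which the same row of consecutive systems is stored contiguously so that the innermost loops run across systems with unit stride. The names `batched_thomas`, `max_residual`, and `ix`, the argument conventions, and the layout itself are hypothetical illustrations.

```c
#include <assert.h>
#include <stddef.h>

/* Interleaved layout (an assumption for this sketch): entry i of system s is
   stored at i * nsys + s, so the same row of consecutive systems is contiguous. */
static size_t ix(size_t i, size_t s, size_t nsys) { return i * nsys + s; }

/* Hypothetical batched solver: nsys independent n-by-n tridiagonal systems,
   each with its own coefficients, solved with the Thomas algorithm (TDMA).
   No pivoting is done, so the systems are assumed to be diagonally dominant.
   a: sub-diagonal (row 0 entry is never read)
   b: diagonal
   c: super-diagonal (row n-1 entry must be initialized, e.g. to 0, but does
      not affect the solution); overwritten with the modified super-diagonal
   d: right-hand side on entry, solution on exit */
void batched_thomas(size_t n, size_t nsys, const double *a, const double *b,
                    double *c, double *d)
{
    assert(n >= 1);
    /* Forward elimination, row 0. */
    for (size_t s = 0; s < nsys; ++s) {
        double inv = 1.0 / b[ix(0, s, nsys)];
        c[ix(0, s, nsys)] *= inv;
        d[ix(0, s, nsys)] *= inv;
    }
    /* Forward elimination, rows 1..n-1. The inner loop walks across systems
       with unit stride, the loop a compiler is most likely to vectorize. */
    for (size_t i = 1; i < n; ++i) {
        for (size_t s = 0; s < nsys; ++s) {
            size_t k = ix(i, s, nsys), p = ix(i - 1, s, nsys);
            double inv = 1.0 / (b[k] - a[k] * c[p]);
            c[k] *= inv;
            d[k] = (d[k] - a[k] * d[p]) * inv;
        }
    }
    /* Back substitution: x_i = d'_i - c'_i * x_{i+1} for i = n-2 down to 0. */
    for (size_t i = n - 1; i-- > 0;) {
        for (size_t s = 0; s < nsys; ++s) {
            d[ix(i, s, nsys)] -= c[ix(i, s, nsys)] * d[ix(i + 1, s, nsys)];
        }
    }
}

/* Largest |(A x)_i - rhs_i| over all rows of all systems, computed from the
   original (unmodified) coefficients; useful for checking a solve. */
double max_residual(size_t n, size_t nsys, const double *a, const double *b,
                    const double *c_orig, const double *x, const double *rhs)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t s = 0; s < nsys; ++s) {
            size_t k = ix(i, s, nsys);
            double r = b[k] * x[k] - rhs[k];
            if (i > 0) r += a[k] * x[ix(i - 1, s, nsys)];
            if (i + 1 < n) r += c_orig[k] * x[ix(i + 1, s, nsys)];
            if (r < 0.0) r = -r;
            if (r > worst) worst = r;
        }
    }
    return worst;
}
```

A caller fills `a`, `b`, `c`, and `d` in the interleaved layout, keeps copies of `c` and of the right-hand side if it wants to check the result with `max_residual`, and calls `batched_thomas(n, nsys, a, b, c, d)`; on return, `d` holds the solutions. Because each SIMD lane in the inner loops works on a different system, nothing in this layout requires the matrices to share a factorization, so they are free to differ from system to system.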