Simultaneously solving swarms of small sparse systems on SIMD silicon

A number of computational science algorithms lead to discretizations that require a large number of independent small matrix solves. Examples include small non-linear coupled chemistry and flow systems, one-dimensional sub-systems in climate and diffusion simulations and semi-implicit time integrato...

Bibliographic details
Published in: 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1128-1137
Main authors: Lelbach, Bryce Adelstein; Johansen, Hans; Williams, Samuel
Format: Conference proceeding
Language: English
Published: IEEE, 1 May 2017
Subjects:
Online access: Full text
Abstract A number of computational science algorithms lead to discretizations that require a large number of independent small matrix solves. Examples include small non-linear coupled chemistry and flow systems, one-dimensional sub-systems in climate and diffusion simulations, and semi-implicit time integrators, among others. We introduce an approach for solving large numbers of independent banded matrix problems on SIMD architectures. Unlike many vectorized or batched approaches that rely on reusing the matrix factorization across multiple solves, our algorithm supports batches of matrices that differ (due to spatial variation or non-linear solvers, for example). We present an implementation of our approach for diagonally dominant tridiagonal systems that is optimized via compiler directives, tiling, and choice of data layout. Performance is evaluated on three Intel microarchitectures with different cache, vectorization, and threading features: Intel Ivy Bridge, Haswell, and Knights Landing. Finally, we show that our solver improves on existing approaches and achieves ~90% of STREAM Triad effective bandwidth on all three platforms.
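The abstract's idea of solving a batch of differing tridiagonal systems can be illustrated with a minimal sketch of the Thomas algorithm (listed among the subject terms) applied across a batch. This is not the authors' implementation: the interleaved storage layout (element i of system s at index i * nbatch + s), the function names, and the argument order are all assumptions made for illustration, and plain Python loops stand in for the inner loop that a compiler would vectorize. No pivoting is performed, which is safe only for well-conditioned cases such as the diagonally dominant systems the abstract mentions.

```python
def solve_tridiagonal_batch(a, b, c, d, n, nbatch):
    """Solve nbatch independent n-by-n tridiagonal systems (Thomas algorithm).

    Hypothetical layout: every array is a flat list in which element i of
    system s lives at index i * nbatch + s, so the inner loop over s walks
    contiguous memory (the loop that would map onto SIMD lanes).
    a: sub-diagonal (row 0 unused), b: main diagonal,
    c: super-diagonal (row n-1 unused), d: right-hand sides.
    Each system has its own coefficients; inputs are not modified.
    """
    cp = [0.0] * (n * nbatch)  # modified super-diagonal
    x = list(d)                # overwritten in place with the solution

    # Forward elimination, first row of every system.
    for s in range(nbatch):
        cp[s] = c[s] / b[s]
        x[s] = x[s] / b[s]

    # Forward elimination, remaining rows.
    for i in range(1, n):
        row, prev = i * nbatch, (i - 1) * nbatch
        for s in range(nbatch):
            denom = b[row + s] - a[row + s] * cp[prev + s]
            cp[row + s] = c[row + s] / denom
            x[row + s] = (x[row + s] - a[row + s] * x[prev + s]) / denom

    # Back substitution.
    for i in range(n - 2, -1, -1):
        row, nxt = i * nbatch, (i + 1) * nbatch
        for s in range(nbatch):
            x[row + s] -= cp[row + s] * x[nxt + s]
    return x


def tridiagonal_matvec(a, b, c, x, n, nbatch):
    """Compute A_s x_s for every system s, using the same layout."""
    y = [0.0] * (n * nbatch)
    for i in range(n):
        for s in range(nbatch):
            k = i * nbatch + s
            y[k] = b[k] * x[k]
            if i > 0:
                y[k] += a[k] * x[k - nbatch]
            if i < n - 1:
                y[k] += c[k] * x[k + nbatch]
    return y


if __name__ == "__main__":
    n, nbatch = 5, 4
    size = n * nbatch
    a = [-1.0] * size
    c = [-1.0] * size
    # A different main diagonal per system, so no factorization is shared.
    b = [4.0 + (k % nbatch) for k in range(size)]
    x_true = [float(k + 1) for k in range(size)]
    d = tridiagonal_matvec(a, b, c, x_true, n, nbatch)
    x = solve_tridiagonal_batch(a, b, c, d, n, nbatch)
    print("max error:", max(abs(u - v) for u, v in zip(x, x_true)))
```

Keeping the batch index innermost means each step of the recurrence reads and writes unit-stride data for all systems at once, which is one plausible reading of why the abstract lists data layout among the optimizations.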
Author Johansen, Hans
Williams, Samuel
Lelbach, Bryce Adelstein
Author_xml – sequence: 1
  givenname: Bryce Adelstein
  surname: Lelbach
  fullname: Lelbach, Bryce Adelstein
  email: brycelelbach@gmail.com
  organization: Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA
– sequence: 2
  givenname: Hans
  surname: Johansen
  fullname: Johansen, Hans
  email: hjohansen@lbl.gov
  organization: Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA
– sequence: 3
  givenname: Samuel
  surname: Williams
  fullname: Williams, Samuel
  email: swwilliams@lbl.gov
  organization: Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/IPDPSW.2017.114
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9781538634080
1538634082
EndPage 1137
ExternalDocumentID 7965166
Genre orig-research
GroupedDBID 6IE
6IL
CBEJK
RIE
RIL
ID FETCH-LOGICAL-i175t-eff481347d1a6c58cdfdd155bcee17e6afb071ed02ff6556c0e0bffe2bcc4eb23
IEDL.DBID RIE
ISICitedReferencesCount 2
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000417418900122&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Thu Jun 29 18:38:09 EDT 2023
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i175t-eff481347d1a6c58cdfdd155bcee17e6afb071ed02ff6556c0e0bffe2bcc4eb23
PageCount 10
ParticipantIDs ieee_primary_7965166
PublicationCentury 2000
PublicationDate 2017-May
PublicationDateYYYYMMDD 2017-05-01
PublicationDate_xml – month: 05
  year: 2017
  text: 2017-May
PublicationDecade 2010
PublicationTitle 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
PublicationTitleAbbrev IPDPSW
PublicationYear 2017
Publisher IEEE
Publisher_xml – name: IEEE
Score 1.6396302
Snippet A number of computational science algorithms lead to discretizations that require a large number of independent small matrix solves. Examples include small...
SourceID ieee
SourceType Publisher
StartPage 1128
SubjectTerms AVX512
Banded
Batched
Computational modeling
Computer architecture
Knights Landing
KNL
Layout
Libraries
Linear Algebra
Many-core
Matrix
Parallel processing
SIMD
Sparse
Sparse matrices
TDMA
Thomas Algorithm
Three-dimensional displays
Tiling
Tridiagonal
Tridiagonal Matrix Algorithm
Vector
Xeon Phi
Title Simultaneously solving swarms of small sparse systems on SIMD silicon
URI https://ieeexplore.ieee.org/document/7965166
WOSCitedRecordID wos000417418900122&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2017+IEEE+International+Parallel+and+Distributed+Processing+Symposium+Workshops+%28IPDPSW%29&rft.atitle=Simultaneously+solving+swarms+of+small+sparse+systems+on+SIMD+silicon&rft.au=Lelbach%2C+Bryce+Adelstein&rft.au=Johansen%2C+Hans&rft.au=Williams%2C+Samuel&rft.date=2017-05-01&rft.pub=IEEE&rft.spage=1128&rft.epage=1137&rft_id=info:doi/10.1109%2FIPDPSW.2017.114&rft.externalDocID=7965166