Multi-threading and one-sided communication in parallel LU factorization

Detailed bibliography
Published in:	Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, pp. 1-10
Main authors:	Husbands, Parry; Yelick, Katherine
Format:	Conference paper
Language:	English
Published:	New York, NY, USA: ACM; IEEE, 10 November 2007
Series:	ACM Conferences
Subjects:	Communication system control; Computing methodologies > Concurrent computing methodologies > Concurrent programming languages; Concurrent computing; Costs; Delay; dense linear algebra; Government; High performance computing; latency tolerance; Multithreading; Parallel processing; Software and its engineering > Software notations and tools > General programming languages > Language types > Concurrent programming languages; Sparse matrices; Yarn
ISBN:	1595937641, 9781595937643

Description
Abstract:	Dense LU factorization has a high ratio of computation to communication and, as evidenced by the High Performance Linpack (HPL) benchmark, this property makes it scale well on most parallel machines. Nevertheless, the standard algorithm for this problem has non-trivial dependence patterns which limit parallelism, and local computations require large matrices in order to achieve good single processor performance. We present an alternative programming model for this type of problem, which combines UPC's global address space with lightweight multithreading. We introduce the concept of memory-constrained lookahead where the amount of concurrency managed by each processor is controlled by the amount of memory available. We implement novel techniques for steering the computation to optimize for high performance and demonstrate the scalability and portability of UPC with Teraflop level performance on some machines, comparing favourably to other state-of-the-art MPI codes.
DOI:	10.1145/1362622.1362664