A high performance implementation of MPI-IO for a Lustre file system environment
| Published in: | Concurrency and Computation: Practice and Experience, Volume 22, Issue 11, pp. 1433-1449 |
|---|---|
| Main Authors: | |
| Format: | Journal Article |
| Language: | English |
| Published: | Chichester, UK: John Wiley & Sons, Ltd, 10.08.2010 |
| Subjects: | |
| ISSN: | 1532-0626, 1532-0634 |
| Online Access: | Get full text |
| Summary: | It is often the case that MPI‐IO performs poorly in a Lustre file system environment, although the reasons for such performance have heretofore not been well understood. We hypothesize that such performance is a direct result of the fundamental assumptions upon which most parallel I/O optimizations are based. In particular, it is almost universally believed that parallel I/O performance is optimized when aggregator processes perform large, contiguous I/O operations in parallel. Our research, however, shows that this approach can actually provide the worst performance in a Lustre environment, and that the best performance may be obtained by performing a large number of small, non‐contiguous I/O operations. In this paper, we provide empirical results demonstrating these non‐intuitive results and explore the reasons for such unexpected performance. We present our solution to the problem, which is embodied in a user‐level library termed Y‐Lib, which redistributes the data in a way that conforms much more closely with the Lustre storage architecture than does the data redistribution pattern employed by MPI‐IO. We provide a large body of experimental results, taken across two large‐scale Lustre installations, demonstrating that Y‐Lib outperforms MPI‐IO by up to 36% on one system and 1000% on the other. We discuss the factors that impact the performance improvement obtained by Y‐Lib, which include the number of aggregator processes and Object Storage Devices, as well as the power of the system's communications infrastructure. We also show that the optimal data redistribution pattern for Y‐Lib is dependent upon these same factors. Copyright © 2009 John Wiley & Sons, Ltd. |
| Bibliography: | National Science Foundation, Grant No. 0702748; ArticleID: CPE1491 |
| DOI: | 10.1002/cpe.1491 |
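
The summary above attributes Y‐Lib's gains to replacing large, contiguous aggregator writes with many small writes that line up with Lustre's stripe layout. Y‐Lib's source is not part of this record, so the following C/MPI fragment is only a minimal sketch of that idea under assumed parameters: a hypothetical file striped across one Object Storage Target per process, a 1 MiB stripe size, and the placeholder filename `striped_output.dat`. Each rank issues stripe-sized `MPI_File_write_at` calls at offsets that fall entirely on its own OST, rather than one large contiguous write that spans OSTs.

```c
/*
 * Illustrative sketch only (not the authors' Y-Lib code): one aggregator
 * per OST, each issuing many stripe-sized writes that all land on its
 * own OST. stripe_size and stripe_count are assumptions standing in for
 * the file's actual Lustre striping parameters.
 */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const MPI_Offset stripe_size      = 1 << 20; /* 1 MiB stripes (assumption) */
    const int        stripe_count     = nprocs;  /* one aggregator per OST (assumption) */
    const int        stripes_per_rank = 64;      /* amount of data each rank writes */

    char *buf = malloc(stripe_size);
    memset(buf, 'a' + (rank % 26), stripe_size);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "striped_output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Rank r owns every stripe_count-th stripe, i.e. the stripes that map
     * to OST r under round-robin striping. Each write is small (one stripe)
     * and non-contiguous in the file, but never crosses an OST boundary. */
    for (int k = 0; k < stripes_per_rank; k++) {
        MPI_Offset offset = ((MPI_Offset)k * stripe_count + rank) * stripe_size;
        MPI_File_write_at(fh, offset, buf, (int)stripe_size, MPI_BYTE,
                          MPI_STATUS_IGNORE);
    }

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```

The design point the sketch tries to capture is that the write pattern, not the write size, is matched to the storage layout: every request targets a single OST, so no request is split across servers, even though each individual operation is small and the file offsets touched by one rank are non-contiguous.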