Parallel computer system, parallel computing method, and program storage medium
| Title: | Parallel computer system, parallel computing method, and program storage medium |
|---|---|
| Patent Number: | 10,013,393 |
| Publication Date: | July 03, 2018 |
| Appl. No: | 15/137238 |
| Application Filed: | April 25, 2016 |
| Abstract: | A parallel computer system including a plurality of processors configured to perform LU factorization in parallel, the system is configured to cause each of the plurality of processors to execute processing including: generating a first panel by integrating a plurality of row panels among panels of a matrix to be subjected to the LU-factorization, the plurality of row panels being processed by the processor, generating a second panel by integrating a plurality of column panels among the panels of the matrix, the plurality of column panels being processed by the processor, and computing a matrix product of the first panel and the second panel. In parallel with the computation of the matrix product, each processor is configured to receive or transmit a column panel to be used for computation of a subsequent matrix product from or to another processor among the plurality of processors. |
| Inventors: | FUJITSU LIMITED (Kawasaki-shi, Kanagawa, JP) |
| Assignees: | FUJITSU LIMITED (Kawasaki, JP) |
| Claim: | 1. A non-transitory computer-readable storage medium that stores a program causing a first processor among a plurality of processors configured to perform LU factorization in parallel, to execute processing comprising: generating a first panel by integrating a plurality of row panels among panels of a local array of a matrix to be subjected to the LU-factorization, the plurality of row panels being processed by the first processor; generating a second panel by integrating a plurality of column panels among the panels of the local array, the plurality of column panels being processed by the first processor; and computing a matrix product of the first panel and the second panel; wherein the matrix is composed of a plurality of blocks which are distributed to the plurality of processors, and the blocks distributed to each of the plurality of processors form the local array. |
| Claim: | 2. The storage medium according to claim 1 , wherein in the computing the matrix product, communication processing is executed, in parallel with the computation of the matrix product, to receive or transmit a column panel to be used for computation of a subsequent matrix product from or to another processor among the plurality of processors. |
| Claim: | 3. The storage medium according to claim 2 , wherein in the computing the matrix product, the computation of the matrix product and the communication processing are performed in batches. |
| Claim: | 4. The storage medium according to claim 1 , wherein the program further causes the first processor to execute processing of computing the matrix product using a head block of a column panel with the smallest column number among the plurality of column panels and a row panel with the smallest row number among the plurality of row panels if lengths in a column direction of the plurality of column panels are different. |
| Claim: | 5. The storage medium according to claim 1 , wherein the program further causes an exchange of rows to be executed for a column panel with the smallest column number among the plurality of column panels. |
| Claim: | 6. A parallel computer system comprising a plurality of processors configured to perform LU factorization in parallel, the parallel computer system being configured to cause each of the plurality of processors to execute processing comprising: generating a first panel by integrating a plurality of row panels among panels of a local array of a matrix to be subjected to the LU-factorization, the plurality of row panels being processed by the processor, generating a second panel by integrating a plurality of column panels among the panels of the local array, the plurality of column panels being processed by the processor, and computing a matrix product of the first panel and the second panel; wherein the matrix is composed of a plurality of blocks which are distributed to the plurality of processors, and the blocks distributed to each of the plurality of processors form the local array. |
| Claim: | 7. A parallel computing method of causing each of a plurality of processors configured to perform LU factorization in parallel, to execute processing comprising: generating a first panel by integrating a plurality of row panels among panels of a local array of a matrix to be subjected to the LU-factorization, the plurality of row panels being processed by the processor, generating a second panel by integrating a plurality of column panels among the panels of the local array, the plurality of column panels being processed by the processor, and computing a matrix product of the first panel and the second panel; wherein the matrix is composed of a plurality of blocks which are distributed to the plurality of processors, and the blocks distributed to each of the plurality of processors form the local array. |
| Patent References Cited: | 2004/0193841, September 2004, Nakanishi; 2006/0064452, March 2006, Nakanishi; 2009/0300091, December 2009, Brokenshire; 2009/0319592, December 2009, Nakanishi; 2000-339295, December 2000; 2006-85619, March 2006; 2008-176738, July 2008; 2008/136045, November 2008 |
| Other References: | “HPL—A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers”, Innovative Computing Laboratory, 1 pp., Sep. 27, 2000, http://www.netlib.org/benchmark/hpl_oldest/. cited by applicant |
| Primary Examiner: | Ngo, Chuong D |
| Attorney, Agent or Firm: | Staas & Halsey LLP |
| Document Code: | edspgr.10013393 |
| Database: | USPTO Patent Grants |
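
The claims describe integrating several row panels and several column panels of a processor's local array and then computing one matrix product of the two integrated panels. A minimal sketch of the underlying identity follows, assuming the usual LU trailing update C - L·U with column panels on the left and row panels on the right. The function names, the list-of-lists layout, and the tiny integer data are illustrative assumptions, not taken from the patent, and the sketch does not model the distribution of blocks across processors or the overlapped communication of claim 2.

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop product of an (m x k) and a (k x n) matrix."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a)
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

def hstack(panels: List[Matrix]) -> Matrix:
    """Join column panels (each m x nb) side by side into one wide panel."""
    return [sum((p[i] for p in panels), []) for i in range(len(panels[0]))]

def vstack(panels: List[Matrix]) -> Matrix:
    """Stack row panels (each nb x n) on top of each other into one tall panel."""
    return [row[:] for p in panels for row in p]

def update_separately(c: Matrix, col_panels: List[Matrix],
                      row_panels: List[Matrix]) -> Matrix:
    """Compute C - sum_i (L_i @ U_i) with one small product per panel pair."""
    out = [row[:] for row in c]
    for l, u in zip(col_panels, row_panels):
        prod = matmul(l, u)
        for i in range(len(out)):
            for j in range(len(out[0])):
                out[i][j] -= prod[i][j]
    return out

def update_integrated(c: Matrix, col_panels: List[Matrix],
                      row_panels: List[Matrix]) -> Matrix:
    """Compute C - hstack(L_i) @ vstack(U_i) with a single larger product."""
    prod = matmul(hstack(col_panels), vstack(row_panels))
    return [[c[i][j] - prod[i][j] for j in range(len(c[0]))]
            for i in range(len(c))]

# Hypothetical data: two 2x1 column panels and two 1x2 row panels.
c = [[10, 20], [30, 40]]
col_panels = [[[1], [2]], [[3], [4]]]
row_panels = [[[5, 6]], [[7, 8]]]
separate = update_separately(c, col_panels, row_panels)
integrated = update_integrated(c, col_panels, row_panels)
print(separate == integrated)
```

Both routes perform the same arithmetic; the integrated form simply issues it as one larger product instead of several small ones, which is the step the claims name.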