Parallel computer system, parallel computing method, and program storage medium

Bibliographic Details
Title: Parallel computer system, parallel computing method, and program storage medium
Patent Number: 10,013,393
Publication Date: July 03, 2018
Appl. No: 15/137238
Application Filed: April 25, 2016
Abstract: A parallel computer system includes a plurality of processors configured to perform LU factorization in parallel. The system is configured to cause each of the plurality of processors to execute processing including: generating a first panel by integrating a plurality of row panels among panels of a matrix to be subjected to the LU factorization, the plurality of row panels being processed by the processor; generating a second panel by integrating a plurality of column panels among the panels of the matrix, the plurality of column panels being processed by the processor; and computing a matrix product of the first panel and the second panel. In parallel with the computation of the matrix product, each processor is configured to receive or transmit a column panel to be used for computation of a subsequent matrix product from or to another processor among the plurality of processors.
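The panel integration described in the abstract can be illustrated with a small sketch. It is one possible reading, not the patented implementation: it assumes that column panels are placed side by side, that row panels are stacked, and that the single product of the integrated panels replaces a sum of per-panel products. The helper names (`matmul`, `add`, `hstack`, `vstack`) and the panel values are hypothetical.

```python
# Illustrative sketch only: panel shapes, orientation, and all names are
# assumptions made for this example, not details taken from the patent.

def matmul(a, b):
    """Plain triple-loop product of two matrices stored as lists of rows."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def hstack(panels):
    """Place column panels side by side (all have the same number of rows)."""
    return [sum((p[i] for p in panels), []) for i in range(len(panels[0]))]

def vstack(panels):
    """Stack row panels on top of each other (all have the same number of columns)."""
    return [row[:] for p in panels for row in p]

# Two column panels (4 x 2 each) and two row panels (2 x 3 each).
col_panels = [
    [[1, 2], [3, 4], [5, 6], [7, 8]],
    [[1, 0], [0, 1], [2, 2], [1, 3]],
]
row_panels = [
    [[1, 0, 2], [0, 1, 1]],
    [[2, 1, 0], [1, 1, 1]],
]

# Separate products, one per panel pair, accumulated afterwards.
separate = add(matmul(col_panels[0], row_panels[0]),
               matmul(col_panels[1], row_panels[1]))

# One larger product of the integrated panels.
integrated = matmul(hstack(col_panels), vstack(row_panels))

assert integrated == separate
```

Under these assumptions the integrated product gives the same update as the separate products, but as one larger matrix multiplication, which typically runs closer to peak performance than several small ones.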
Inventors: FUJITSU LIMITED (Kawasaki-shi, Kanagawa, JP)
Assignees: FUJITSU LIMITED (Kawasaki, JP)
Claim: 1. A non-transitory computer-readable storage medium that stores a program causing a first processor among a plurality of processors configured to perform LU factorization in parallel, to execute processing comprising: generating a first panel by integrating a plurality of row panels among panels of a local array of a matrix to be subjected to the LU-factorization, the plurality of row panels being processed by the first processor; generating a second panel by integrating a plurality of column panels among the panels of the local array, the plurality of column panels being processed by the first processor; and computing a matrix product of the first panel and the second panel; wherein the matrix is composed of a plurality of blocks which are distributed to the plurality of processors, and the blocks distributed to each of the plurality of processors form the local array.
Claim: 2. The storage medium according to claim 1, wherein in the computing the matrix product, communication processing is executed, in parallel with the computation of the matrix product, to receive or transmit a column panel to be used for computation of a subsequent matrix product from or to another processor among the plurality of processors.
Claim: 3. The storage medium according to claim 2, wherein in the computing the matrix product, the computation of the matrix product and the communication processing are performed in batches.
Claim: 4. The storage medium according to claim 1, wherein the program further causes the first processor to execute processing of computing the matrix product using a head block of a column panel with the smallest column number among the plurality of column panels and a row panel with the smallest row number among the plurality of row panels if lengths in a column direction of the plurality of column panels are different.
Claim: 5. The storage medium according to claim 1, wherein the program further causes an exchange of rows to be executed for a column panel with the smallest column number among the plurality of column panels.
Claim: 6. A parallel computer system comprising a plurality of processors configured to perform LU factorization in parallel, the parallel computer system being configured to cause each of the plurality of processors to execute processing comprising: generating a first panel by integrating a plurality of row panels among panels of a local array of a matrix to be subjected to the LU-factorization, the plurality of row panels being processed by the processor, generating a second panel by integrating a plurality of column panels among the panels of the local array, the plurality of column panels being processed by the processor, and computing a matrix product of the first panel and the second panel; wherein the matrix is composed of a plurality of blocks which are distributed to the plurality of processors, and the blocks distributed to each of the plurality of processors form the local array.
Claim: 7. A parallel computing method of causing each of a plurality of processors configured to perform LU factorization in parallel, to execute processing comprising: generating a first panel by integrating a plurality of row panels among panels of a local array of a matrix to be subjected to the LU-factorization, the plurality of row panels being processed by the processor, generating a second panel by integrating a plurality of column panels among the panels of the local array, the plurality of column panels being processed by the processor, and computing a matrix product of the first panel and the second panel; wherein the matrix is composed of a plurality of blocks which are distributed to the plurality of processors, and the blocks distributed to each of the plurality of processors form the local array.
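The overlap of computation and communication recited in claim 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: a background thread stands in for the communication processing, `receive_panel` is a hypothetical placeholder that simulates receiving a column panel from another processor, and the panel contents are invented for the example. A real distributed-memory code would more likely use non-blocking message passing.

```python
# Illustrative sketch only: receive_panel, run, and the panel values are
# hypothetical; a thread plus sleep simulates inter-processor communication.
from concurrent.futures import ThreadPoolExecutor
import time

def matmul(a, b):
    """Plain triple-loop product of two matrices stored as lists of rows."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def receive_panel(step):
    """Hypothetical stand-in for receiving a column panel from another processor."""
    time.sleep(0.01)  # simulated transfer latency
    return [[step + 1, 0], [0, step + 1]]

def run(steps, row_panel):
    """Compute one product per step while the next column panel is in flight."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(receive_panel, 0)  # request the first panel
        for step in range(steps):
            col_panel = pending.result()  # wait for the panel needed now
            if step + 1 < steps:
                # Start receiving the panel for the subsequent product
                # before the current product is computed.
                pending = pool.submit(receive_panel, step + 1)
            results.append(matmul(col_panel, row_panel))
    return results

products = run(3, [[1, 2], [3, 4]])
```

In this sketch the transfer for step `k + 1` is already running while the product for step `k` is computed, so the simulated latency is hidden behind the computation whenever the product takes at least as long as the transfer.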
Patent References Cited: 2004/0193841 September 2004 Nakanishi
2006/0064452 March 2006 Nakanishi
2009/0300091 December 2009 Brokenshire
2009/0319592 December 2009 Nakanishi
2000-339295 December 2000
2006-85619 March 2006
2008-176738 July 2008
2008/136045 November 2008
Other References: “HPL—A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers”, Innovative Computing Laboratory, 1 pp., Sep. 27, 2000, http://www.netlib.org/benchmark/hpl_oldest/. cited by applicant
Primary Examiner: Ngo, Chuong D
Attorney, Agent or Firm: Staas & Halsey LLP
Document Code: edspgr.10013393
Database: USPTO Patent Grants