Parallel computing system and communication control program
| Title: | Parallel computing system and communication control program |
|---|---|
| Patent Number: | 10,430,375 |
| Publication Date: | October 01, 2019 |
| Appl. No: | 12/872338 |
| Application Filed: | August 31, 2010 |
| Abstract: | A parallel computing system includes a plurality of processors multi-dimensionally connected by an interconnection network, wherein each of the processors in the parallel computing system determines, in dimensional order, communication channels to other processors in the interconnection network, each of the processors sets, as relative coordinates of destination processors with respect to the plurality of processors in data communications performed at a same timing, relative coordinates common to all of the processors, and each of the processors performs data communications with destination processors having the set relative coordinates. |
| Inventors: | Ajima, Yuichiro (Kawasaki, JP); Shimizu, Toshiyuki (Kawasaki, JP); Ishihata, Hiroaki (Hachioji, JP) |
| Assignees: | FUJITSU LIMITED (Kawasaki, JP) |
| Claim: | 1. A parallel computing system comprising: a plurality of processors multi-dimensionally coupled via an interconnection network having torus or mesh topology, each of the plurality of processors in the parallel computing system being configured to: determine, in dimensional order, communication channels to other processors in the interconnection network, be provided with a relative coordinate system whose origin is set as the each processor, the relative coordinate system being common to all of the plurality of processors, perform a simultaneous data-transmission including: determining a distinct set of plural destination relative coordinates whose values are symmetric about the origin of the relative coordinate system provided for the each processor and whose distances in dimensional order from the origin of the relative coordinate system provided for the each processor are identical, selecting, from among the plurality of processors, based on the relative coordinate system provided for the each processor, a different group of plural destination processors having the determined distinct set of plural destination relative coordinates, and simultaneously transmitting data from the each processor to the selected different group of plural destination processors through the communication channels determined in dimensional order, wherein when performing all-to-all communication in the parallel computing system, each of the plurality of processors repeats the simultaneous data-transmission by changing the distinct set of plural destination relative coordinates so that each of all the plurality of processors is selected once as a destination processor included in one of the different groups of plural destination processors; and for any pair of first and second processors among the plurality of processors, first different groups of plural destination processors selected by the first processor have first distinct sets of plural destination relative coordinates 
that are respectively identical to second distinct sets of plural destination relative coordinates for second different groups of plural destination processors selected by the second processor, and identical transmission control including the simultaneous data-transmission is performed on both the first and second processors in the parallel computing system, based on the first and second distinct sets of plural destination relative coordinates, so that a number of inter processor communications routed through respective links is equalized among all links in the parallel computing system. |
| Claim: | 2. The parallel computing system according to claim 1 , wherein each of the plurality of processors in the parallel computing system is configured to simultaneously transmit data to a maximum of four processors, wherein, when L represents the maximum value of the lengths in respective dimensions of the interconnection network and Xn and Yn respectively represent the relative coordinate value in a first dimension and the relative coordinate value in a second dimension of the relative coordinates of an n-th destination processor with respect to a source processor, each of the plurality of processors performs: first transmission processing of transmitting data to four destination processors located at respective positions of relative coordinates (X1, Y1) in which the absolute value of X1 is set to be different from the absolute value of Y1, and in which the absolute value of X1 and the absolute value of Y1 are set to be different from half the L value, relative coordinates (X2, Y2) in which X2 is set to the sign-inverted value of X1 and Y2 is set to the sign-inverted value of Y1, relative coordinates (X3, Y3) in which X3 is set to Y1 and Y3 is set to X1, and relative coordinates (X4, Y4) in which X4 is set to the sign-inverted value of Y1 and Y4 is set to the sign-inverted value of X1; second transmission processing of transmitting data to four destination processors located at the respective positions of relative coordinates (X1, Y1) in which the absolute value of X1 is set to be equal to the absolute value of Y1, relative coordinates (X2, Y2) in which X2 is set to the sign-inverted value of X1 and Y2 is set to the sign-inverted value of Y1, relative coordinates (X3, Y3) in which X3 is set to the sign-inverted value of X1 and Y3 is set to Y1, and relative coordinates (X4, Y4) in which X4 is set to X1 and Y4 is set to the sign-inverted value of Y1; and third transmission processing of transmitting data to four destination processors located at the 
respective positions of relative coordinates (X1, 0), relative coordinates (0, Y1) in which the absolute value of Y1 is set to be equal to the absolute value of X1, relative coordinates (X2, 0) in which X2 is set to the sign-inverted value of X1, and relative coordinates (0, Y2) in which Y2 is set to the sign-inverted value of Y1, and wherein the plurality of processors in the parallel computing system perform, at the same timing, data communications using a same transmission processing with destination processors having the same relative coordinates. |
| Claim: | 3. The parallel computing system according to claim 1 , wherein each of the plurality of processors in the parallel computing system is configured to simultaneously transmit data to a maximum of four processors, wherein, when the interconnection network is a two-dimensional torus and is different in length between a first dimension and a second dimension thereof, and when L represents the maximum value of the length in the first dimension and the length in the second dimension of the interconnection network and Xn and Yn respectively represent the relative coordinate value in the first dimension and the relative coordinate value in the second dimension of the relative coordinates of an n-th destination processor with respect to a source processor, each of the plurality of processors perform: first transmission processing of transmitting data to four destination processors located at respective positions of relative coordinates (X1, Y1) in which the absolute value of X1 is set to be different from the absolute value of Y1, and in which the absolute value of X1 and the absolute value of Y1 are set to be different from half the L value, relative coordinates (X2, Y2) in which X2 is set to the sign-inverted value of X1 and Y2 is set to the sign-inverted value of Y1, relative coordinates (X3, Y3) in which X3 is set to Y1 and Y3 is set to X1, and relative coordinates (X4, Y4) in which X4 is set to the sign-inverted value of Y1 and Y4 is set to the sign-inverted value of X1, second transmission processing of transmitting data to four destination processors located at the respective positions of relative coordinates (X1, Y1) in which the absolute value of X1 is set to be equal to the absolute value of Y1, relative coordinates (X2, Y2) in which X2 is set to the sign-inverted value of X1 and Y2 is set to the sign-inverted value of Y1, relative coordinates (X3, Y3) in which X3 is set to the sign-inverted value of X1 and Y3 is set to Y1, and relative coordinates (X4, 
Y4) in which X4 is set to X1 and Y4 is set to the sign-inverted value of Y1, and third transmission processing of transmitting data to four destination processors located at the respective positions of relative coordinates (X1, 0), relative coordinates (0, Y1) in which the absolute value of Y1 is set to be equal to the absolute value of X1, relative coordinates (X2, 0) in which X2 is set to the sign-inverted value of X1, and relative coordinates (0, Y2) in which Y2 is set to the sign-inverted value of Y1, and wherein the plurality of processors in the parallel computing system perform, at the same timing, data communications using a same transmission processing with destination processors having the same relative coordinates. |
| Claim: | 4. The parallel computing system according to claim 1 , wherein each of the plurality of processors in the parallel computing system is configured to simultaneously transmit data to a maximum of four processors, wherein, when the interconnection network is a two-dimensional mesh, and when L represents the maximum value of the length in a first dimension and the length in a second dimension of the interconnection network and Xn and Yn respectively represent the relative coordinate value in the first dimension and the relative coordinate value in the second dimension of the relative coordinates of an n-th destination processor with respect to a source processor, each of the plurality of processors performs: first transmission processing of transmitting data to four destination processors located at respective positions of relative coordinates (X1, Y1) in which the absolute value of X1 is set to be different from the absolute value of Y1, and in which the absolute value of X1 and the absolute value of Y1 are set to be different from half the L value, relative coordinates (X2, Y2) in which X2 is set to the sign-inverted value of X1 and Y2 is set to the sign-inverted value of Y1, relative coordinates (X3, Y3) in which X3 is set to Y1 and Y3 is set to X1, and relative coordinates (X4, Y4) in which X4 is set to the sign-inverted value of Y1 and Y4 is set to the sign-inverted value of X1, second transmission processing of transmitting data to four destination processors located at the respective positions of relative coordinates (X1, Y1) in which the absolute value of X1 is set to be equal to the absolute value of Y1, relative coordinates (X2, Y2) in which X2 is set to the sign-inverted value of X1 and Y2 is set to the sign-inverted value of Y1, relative coordinates (X3, Y3) in which X3 is set to the sign-inverted value of X1 and Y3 is set to Y1, and relative coordinates (X4, Y4) in which X4 is set to X1 and Y4 is set to the sign-inverted value of Y1, and third 
transmission processing of transmitting data to four destination processors located at the respective positions of relative coordinates (X1, 0), relative coordinates (0, Y1) in which the absolute value of Y1 is set to be equal to the absolute value of X1, relative coordinates (X2, 0) in which X2 is set to the sign-inverted value of X1, and relative coordinates (0, Y2) in which Y2 is set to the sign-inverted value of Y1, and wherein the plurality of processors in the parallel computing system performs, at the same timing, data communications using a same transmission processing with destination processors having the same relative coordinates. |
| Claim: | 5. The parallel computing system according to claim 2, wherein, when the L value is an odd number, each of the plurality of processors repeats the simultaneous data-transmission to four destination processors (L−1)²/4 times with the absolute value of X1 and the absolute value of Y1 set to be different from each other and repeats the simultaneous transmission to four destination processors (L−1)/2 times with the absolute value of X1 and the absolute value of Y1 set to be equal to each other, while changing the relative coordinate values in each of the simultaneous data-transmissions, and wherein, when the L value is an even number, each of the plurality of processors repeats the simultaneous transmission to four destination processors (L−1)²/4 times with the absolute value of X1 and the absolute value of Y1 set to be different from each other, repeats the simultaneous data-transmission to four destination processors (L/2−1) times with the absolute value of X1 and the absolute value of Y1 set to be equal to each other, and performs once the simultaneous data-transmission to three destination processors with the absolute value of X1 and the absolute value of Y1 set to be equal to each other and set to half the L value, while changing the relative coordinate values in each of the simultaneous data-transmissions. |
| Claim: | 6. The parallel computing system according to claim 1 , wherein, when the interconnection network is a three-dimensional torus and is equal in length among a first dimension, a second dimension, and a third dimension thereof, and when L represents the length in each of the dimensions of the interconnection network and Xn, Yn, and Zn respectively represent the relative coordinate value in the first dimension, the relative coordinate value in the second dimension, and the relative coordinate value in the third dimension of the relative coordinates of an n-th destination processor with respect to a source processor, each of the plurality of processors performs: first transmission processing of transmitting data to six destination processors located at respective positions of: relative coordinates (X1, Y1, Z1) in which at least one of the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 is different from the other absolute values, and in which the relative coordinates correspond to one of conditions of: 1) each of the absolute values X1, Y1, and Z1 is set to be different from zero and half the L value, 2) X1 is set to zero and each of the absolute values X1 and Z1 is set to be different from zero and half the L value, and 3) X1 and Y1 are set to zero and the absolute value of Z1 is set to be different from zero and half the L value, relative coordinates (X2, Y2, Z2) in which X2 is set to the sign-inverted value of X1, Y2 is set to the sign-inverted value of Y1, and Z2 is set to the sign-inverted value of Z1, relative coordinates (X3, Y3, Z3) in which X3 is set to Z1, Y3 is set to X1, and Z3 is set to Y1, relative coordinates (X4, Y4, Z4) in which X4 is set to the sign-inverted value of Z1, Y4 is set to the sign-inverted value of X1, and Z4 is set to the sign-inverted value of Y1, relative coordinates (X5, Y5, Z5) in which X5 is set to Y1, Y5 is set to Z1, and Z5 is set to X1, and relative coordinates (X6, Y6, Z6) in which X6 is 
set to the sign-inverted value of Y1, Y6 is set to the sign-inverted value of Z1, and Z6 is set to the sign-inverted value of X1; and second transmission processing of transmitting data to four destination processors located at the respective positions of relative coordinates (X1, Y1, Z1) in which the absolute values of X1, Y1, and Z1 are set to the same value different from zero and half the L value, relative coordinates (X2, Y2, Z2) in which X2 is set to the sign-inverted value of X1, Y2 is set to the sign-inverted value of Y1, and Z2 is set to the sign-inverted value of Z1, relative coordinates (X3, Y3, Z3) in which X3 is set to X1, Y3 is set to Y1, and Z3 is set to the sign-inverted value of Z1, and relative coordinates (X4, Y4, Z4) in which X4 is set to the sign-inverted value of X1, Y4 is set to the sign-inverted value of Y1, and Z4 is set to Z1, wherein the plurality of processors in the parallel computing system performs, at the same timing, data communications using a same transmission processing with destination processors having the same relative coordinates. |
| Claim: | 7. The parallel computing system according to claim 1 , wherein, when the interconnection network is a three-dimensional torus with one of a length in a first dimension, a length in a second dimension, and a length in a third dimension of the interconnection network different from the other lengths, and when L represents a maximum value of the length in the first dimension, the length in the second dimension, and the length in the third dimension of the interconnection network and Xn, Yn, and Zn respectively represent the relative coordinate value in the first dimension, the relative coordinate value in the second dimension, and the relative coordinate value in the third dimension of the relative coordinates of an n-th destination processor with respect to a source processor, each of the plurality of processors performs: first transmission processing of transmitting data to six destination processors located at respective positions of: relative coordinates (X1, Y1, Z1) in which at least one of the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 is different from the other absolute values, and in which the relative coordinates correspond to one of conditions of: 1) each of the absolute values X1, Y1, and Z1 is set to be different from zero and half the L value, 2) X1 is set to zero and each of the absolute values X1 and Z1 is set to be different from zero and half the L value, 3) X1 and Y1 are set to zero and the absolute value of Z1 is set to be different from zero and half the L value, relative coordinates (X2, Y2, Z2) in which X2 is set to the sign-inverted value of X1, Y2 is set to the sign-inverted value of Y1, and Z2 is set to the sign-inverted value of Z1, relative coordinates (X3, Y3, Z3) in which X3 is set to Z1, Y3 is set to X1, and Z3 is set to Y1, relative coordinates (X4, Y4, Z4) in which X4 is set to the sign-inverted value of Z1, Y4 is set to the sign-inverted value of X1, and Z4 is set to the sign-inverted 
value of Y1, relative coordinates (X5, Y5, Z5) in which X5 is set to Y1, Y5 is set to Z1, and Z5 is set to X1, and relative coordinates (X6, Y6, Z6) in which X6 is set to the sign-inverted value of Y1, Y6 is set to the sign-inverted value of Z1, and Z6 is set to the sign-inverted value of X1; and second transmission processing of transmitting data to four destination processors located at the respective positions of: relative coordinates (X1, Y1, Z1) in which the absolute values of X1, Y1, and Z1 are set to the same value different from zero and half the L value, relative coordinates (X2, Y2, Z2) in which X2 is set to the sign-inverted value of X1, Y2 is set to the sign-inverted value of Y1, and Z2 is set to the sign-inverted value of Z1, relative coordinates (X3, Y3, Z3) in which X3 is set to X1, Y3 is set to Y1, and Z3 is set to the sign-inverted value of Z1, and relative coordinates (X4, Y4, Z4) in which X4 is set to the sign-inverted value of X1, Y4 is set to the sign-inverted value of Y1, and Z4 is set to Z1, wherein the plurality of processors in the parallel computing system performs, at the same timing, data communications using a same transmission processing with destination processors having the same relative coordinates. |
| Claim: | 8. The parallel computing system according to claim 1 , wherein, when the interconnection network is a three-dimensional mesh, and when L represents a maximum value of a length in a first dimension, a length in a second dimension, and a length in a third dimension of the interconnection network and Xn, Yn, and Zn respectively represent the relative coordinate value in the first dimension, the relative coordinate value in the second dimension, and the relative coordinate value in the third dimension of the relative coordinates of an n-th destination processor with respect to a source processor, each of the plurality of processors performs: first transmission processing of transmitting data to six destination processors located at respective positions of: relative coordinates (X1, Y1, Z1) in which at least one of the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 is different from the other absolute values, and in which the relative coordinates correspond to one of the following conditions of: 1) X1 is set to zero and one each of the absolute values X1, Y1, and Z1 is set to be different from zero and half the L value, 2) X1 is set to zero and each of the absolute values X1 and Z1 is set to be different from zero and half the L value, 3) X1 and Y1 are set to zero and the absolute value of Z1 is set to be different from zero and half the L value, relative coordinates (X2, Y2, Z2) in which X2 is set to the sign-inverted value of X1, Y2 is set to the sign-inverted value of Y1, and Z2 is set to the sign-inverted value of Z1, relative coordinates (X3, Y3, Z3) in which X3 is set to Z1, Y3 is set to X1, and Z3 is set to Y1, relative coordinates (X4, Y4, Z4) in which X4 is set to the sign-inverted value of Z1, Y4 is set to the sign-inverted value of X1, and Z4 is set to the sign-inverted value of Y1, relative coordinates (X5, Y5, Z5) in which X5 is set to Y1, Y5 is set to Z1, and Z5 is set to X1, and relative coordinates (X6, Y6, Z6) 
in which X6 is set to the sign-inverted value of Y1, Y6 is set to the sign-inverted value of Z1, and Z6 is set to the sign-inverted value of X1; and second transmission processing of transmitting data to four destination processors located at the respective positions of relative coordinates (X1, Y1, Z1) in which the absolute values of X1, Y1, and Z1 are set to the same value different from zero and half the L value, relative coordinates (X2, Y2, Z2) in which X2 is set to the sign-inverted value of X1, Y2 is set to the sign-inverted value of Y1, and Z2 is set to the sign-inverted value of Z1, relative coordinates (X3, Y3, Z3) in which X3 is set to X1, Y3 is set to Y1, and Z3 is set to the sign-inverted value of Z1, and relative coordinates (X4, Y4, Z4) in which X4 is set to the sign-inverted value of X1, Y4 is set to the sign-inverted value of Y1, and Z4 is set to Z1, wherein the plurality of processors in the parallel computing system performs, at the same timing, data communications using a same transmission processing with destination processors having the same relative coordinates. |
| Claim: | 9. The parallel computing system according to claim 6, wherein, when the L value is an odd number, each of the plurality of processors repeats the simultaneous data-transmission to six destination processors (L³−4L+3)/4 times, and repeats the simultaneous data-transmission to four destination processors L−1 times, while changing the relative coordinate values in each of the simultaneous data-transmissions, and wherein, when the L value is an even number, each of the plurality of processors repeats the simultaneous data-transmission to six destination processors (L³−4L+1)/4 times, repeats the simultaneous data-transmission to four destination processors L−2 times, and performs once the simultaneous data-transmission to seven destination processors, while changing the relative coordinate values in each of the simultaneous data-transmissions. |
| Claim: | 10. A method performed by each of a plurality of processors in a parallel computing system, the plurality of processors being multi-dimensionally coupled to each other in an interconnection network having multi-dimensional torus or mesh topology, each of the plurality of processors having a relative coordinate system common to all of the plurality of processors so that any one of the plurality of processors in the parallel computing system is uniquely identified by a set of relative coordinates based on the relative coordinate system, the method comprising: determining, in dimensional order, communication channels to other processors in the interconnection network, determining a plurality of distinct combinations each including plural sets of distinct destination relative coordinates so that the plurality of distinct combinations cover all the plurality of processors except the each processor in the parallel computing system, performing a transmission process to transmit data to all the plurality of processors in the parallel computing system, the transmission process including transmitting data, through the communication channels determined in dimensional order, simultaneously from the each processor to the plural sets of distinct destination relative coordinates included in each of the determined plurality of distinct combinations, wherein the plural sets of distinct destination relative coordinates included in each of the plurality of distinct combinations are determined to be symmetric about the origin of the relative coordinate system and to have identical distances in dimensional order from the origin of the relative coordinate system, so that a number of inter processor communications routed through respective links is equalized among all links in the parallel computing system. |
| Claim: | 11. The method of claim 10 , wherein in a case where the interconnection network has a two-dimensional torus or mesh topology with two sides having a same length of 2n+1 where n is a natural number, and the relative coordinate system is set such that a set of relative coordinates of any one of the plurality of processors in the parallel computing system is represented as a set of relative coordinates (x, y) where x is an integer whose sign indicates a direction along an x-axis on a dimensional-order path from the origin (0, 0) and whose absolute value indicates a number of links of processors along the x-axis on the dimensional-order path, and y is an integer whose sign indicates a direction along a y-axis on the dimensional-order path from the origin (0, 0) and whose absolute value indicates a number of links of processors along the y-axis on the dimensional-order path from the origin (0, 0), the transmission process includes: a first transmission that transmits, for each of all pairs of natural numbers i and j satisfying that 0 |
| Claim: | 12. The method of claim 10 , wherein in a case where the interconnection network has a two-dimensional torus or mesh topology with two sides having a same length of 2n+2 where n is a natural number, and the relative coordinate system is set such that a set of relative coordinates of any one of the plurality of processors in the parallel computing system is represented as a set of relative coordinates (x, y) where x is an integer whose sign indicates a direction along an x-axis on a dimensional-order path from the origin (0, 0) and whose absolute value indicates a number of links of processors along the x-axis on the dimensional-order path, and y is an integer whose sign indicates a direction along a y-axis on the dimensional-order path from the origin (0, 0) and whose absolute value indicates a number of links of processors along the y-axis on the dimensional-order path from the origin (0, 0), the transmission process includes: a first transmission that transmits, for each of all pairs of natural numbers i and j satisfying that 0 |
| Claim: | 13. The method of claim 10 , wherein in a case where the interconnection network has a three-dimensional torus or mesh topology with three sides having a same length of 2n+1 where n is a natural number, and the relative coordinate system is set such that a set of relative coordinates of any one of the plurality of processors in the parallel computing system is represented as a set of relative coordinates (x, y, z) where x is an integer whose sign indicates a direction along an x-axis on a dimensional-order path from the origin (0, 0, 0) and whose absolute value indicates a number of links of processors along the x-axis on the dimensional-order path, y is an integer whose sign indicates a direction along a y-axis on the dimensional-order path from the origin (0, 0, 0) and whose absolute value indicates a number of links of processors along the y-axis on the dimensional-order path from the origin (0, 0, 0), and z is an integer whose sign indicates a direction along an z-axis on the dimensional-order path from the origin (0, 0, 0) and whose absolute value indicates a number of links of processors along the z-axis on the dimensional-order path, the transmission process includes: a first transmission that performs, for each of all combinations of three natural numbers i, j, and k satisfying that 0 |
| Claim: | 14. The method of claim 10 , wherein in a case where the interconnection network has a three-dimensional torus or mesh topology with three sides having a same length of 2n+2 where n is a natural number, and the relative coordinate system is set such that a set of relative coordinates of any one of the plurality of processors in the parallel computing system is represented as a set of relative coordinates (x, y, z) where x is an integer whose sign indicates a direction along an x-axis on a dimensional-order path from the origin (0, 0, 0) and whose absolute value indicates a number of links of processors along the x-axis on the dimensional-order path, y is an integer whose sign indicates a direction along a y-axis on the dimensional-order path from the origin (0, 0, 0) and whose absolute value indicates a number of links of processors along the y-axis on the dimensional-order path from the origin (0, 0, 0), and z is an integer whose sign indicates a direction along an z-axis on the dimensional-order path from the origin (0, 0, 0) and whose absolute value indicates a number of links of processors along the z-axis on the dimensional-order path, the transmission process includes: a first transmission that performs, for each of all combinations of three natural numbers i, j, and k satisfying that 0 |
| Patent References Cited: | 4598400, July 1986, Hillis; 5175865, December 1992, Hillis; 5247694, September 1993, Dahl; 5535408, July 1996, Hillis; 2006/0179270, August 2006, Archer et al.; 2009/0292787, November 2009, Hosokawa; 0206580, December 1986; 0544532, June 1993; 03-116357, May 1991; 04-235654, August 1992; 05-151181, June 1993; 2004-536372, December 2004; 2010-211553, September 2010; WO 2002/069168, September 2002 |
| Other References: | Tseng et al. (Efficient Broadcasting in Wormhole-Routed Multicomputers: A Network-Partitioning Approach, Jan. 1999, pp. 44-61). cited by examiner Suh et al. (Efficient All-to-All Personalized Exchange in Multidimensional Torus Networks, Aug. 1998, pp. 468-475). cited by examiner Yang et al. (Near-Optimal All-to-All Broadcast in Multidimensional All-Port Meshes and Tori, Feb. 2002, pp. 128-141). cited by examiner Adiga et al., “Blue Gene/L torus interconnection network”, IBM Journal of Research and Development, vol. 49, No. 2/3, Mar./May 2005, pp. 265-276. cited by applicant Almasi et al., “Optimization of MPI Collective Communication on BlueGene/L Systems,” ICS '05: Proceedings of the 19th annual international conference on Supercomputing. New York, NY, USA: ACM, Jun. 20-22, 2005, pp. 253-262. cited by applicant Bruck et al., “Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems”, IEEE Transactions on Parallel and Distributed Systems, vol. 8, No. 11, Nov. 11, 1997, pp. 1143-1156. cited by applicant Scott, “Efficient All-to-All Communication Patterns in Hypercube and Mesh Topologies”, IEEE, 6th Distributed Memory Computing Conference, 1991, pp. 398-403. cited by applicant Hayashi et al., “Optimal All-to-All Communication Method in Torus Network”, The 44th (First Half of Year Heisei 4) Annual Conference of Information Processing Society of Japan, 1993, English-language Translation Provided, 6D-4, pp. 6-105-6-106. cited by applicant Kumar et al., “Optimization of All-to-All Communication on the Blue Gene/L Supercomputer”, 37th International Conference on Parallel Processing, IEEE, 2008, pp. 320-329. cited by applicant Tipparaju et al., “Optimizing All-to-All Collective Communication by Exploiting Concurrency in Modern Networks”, SC '05: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing. Washington, DC, USA: IEEE Computer Society 2005, 9 pages. 
cited by applicant Tseng et al., “All-to-All Personalized Communication in a Wormhole-Routed Torus”, IEEE Transactions on Parallel and Distributed Systems, vol. 7, No. 5, May 5, 1996, pp. 498-505. cited by applicant Japanese Office Action dated Oct. 15, 2013 for corresponding Japanese Application No. 2009-201560, with Partial English-language Translation. cited by applicant Boku, Taisuke et al., “Performance Evaluation of Hyper Cross Bar Network” Technical Research Report for the Institute of Electronics, Information and Communication Engineers; Corporate Juridical Person The Institute of Electronics, Information and Communication Engineers, Nov. 16, 1993, vol. 93, No. 3230, (CPSY 93-35 to 44), pp. 41-48, with English-language Abstract & Pursuant to MPEP §609, in fulfillment of the requirement under 37 CFR §1.98(a)(3)(i) for a concise explanation of relevance regarding this cited reference, the Office's attention is directed to the Partial English-language translation of the official action mailed by the JPO as listed above in this section of form SB08. cited by applicant Hillis, W. Daniel, “The Connection Machine: A Computer Architecture Based on Cellular Automata”, Physica 10D, North Holland, Amsterdam, NL, vol. 10, No. 1-2, pp. 213-228, Jan. 1, 1984, XP024479362. cited by applicant Extended European Search Report dated Feb. 15, 2011 for corresponding European Patent Application No. 10174920.8, 6 pages. cited by applicant Tucker, Lewis W. et al.,“Architecture and Applications of the Connection Machine”, Computer, IEEE Service Center, Los Alamitos, CA, US, vol. 21, No. 8, pp. 26-38, Aug. 1, 1988, XP000118166. cited by applicant Hillis, W. Daniel et al.,“The CM-5 Connection Machine: A Scalable Supercomputer”, Communications of the ACM, New York, US, vol. 36, No. 11, pp. 31-40, Nov. 1, 1993, XP000415037. 
cited by applicant Leiserson, Charles et al.,“The Network Architecture of the Connection Machine CM-5”, Journal of Parallel and Distributed Computing, Elsevier, Amsterdam, NL, vol. 33, No. 2, pp. 145-158, Mar. 15, 1996, XP004419204. cited by applicant European Summons to attend oral proceedings pursuant to Rule 115(1) EPC dated Jun. 13, 2018 for corresponding European Patent Application No. 10174920.8, 8 pages. **Please note EP-0206580-A2 and EP-0544532-A2 cited herewith, were previously cited in an IDS filed on Feb. 8, 2016. cited by applicant European Office Action dated Nov. 6, 2015 for corresponding European Patent Application No. 10174920.8, 5 pages. **Please note EP-0206580-A2 cited herein, was previously cited in IDS filed on Feb. 8, 2016.***. cited by applicant |
| Primary Examiner: | Giroux, George |
| Attorney, Agent or Firm: | Fujitsu Patent Center |
| Accession Number: | edspgr.10430375 |
| Database: | USPTO Patent Grants |
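
The grouping of destination relative coordinates recited in claims 1, 2, and 5 can be illustrated with a small simulation. The following is a minimal sketch, not the patented implementation: it assumes a two-dimensional torus with equal odd side length L, dimension-order routing that travels along x first and then along y in the direction given by the sign of each coordinate, and an enumeration order of the groups chosen here for convenience. The function names `destination_groups` and `link_loads` are hypothetical and do not appear in the patent.

```python
from collections import Counter
from itertools import product


def destination_groups(L):
    """Return groups of four relative offsets for an odd L x L torus.

    Assumption of this sketch: only odd L >= 3 is handled, and the
    order in which groups are listed is arbitrary.
    """
    if L % 2 == 0 or L < 3:
        raise ValueError("this sketch only handles odd L >= 3")
    m = (L - 1) // 2
    groups = []
    # Off-axis offsets with |x| != |y|: (x, y), (-x, -y), (y, x), (-y, -x).
    for a in range(1, m + 1):
        for b in range(a + 1, m + 1):
            for x, y in ((a, b), (a, -b)):
                groups.append([(x, y), (-x, -y), (y, x), (-y, -x)])
    # Diagonal offsets with |x| == |y|: (x, y), (-x, -y), (-x, y), (x, -y).
    for a in range(1, m + 1):
        groups.append([(a, a), (-a, -a), (-a, a), (a, -a)])
    # Axis offsets: (x, 0), (0, y), (-x, 0), (0, -y) with |x| == |y|.
    for a in range(1, m + 1):
        groups.append([(a, 0), (0, a), (-a, 0), (0, -a)])
    return groups


def link_loads(L, group):
    """Count messages per directed link when every node sends to `group`."""
    loads = Counter()
    for sx, sy in product(range(L), repeat=2):
        for dx, dy in group:
            x, y = sx, sy
            step = 1 if dx > 0 else -1
            for _ in range(abs(dx)):  # route along x first
                nxt = (x + step) % L
                loads[((x, y), (nxt, y))] += 1
                x = nxt
            step = 1 if dy > 0 else -1
            for _ in range(abs(dy)):  # then route along y
                nxt = (y + step) % L
                loads[((x, y), (x, nxt))] += 1
                y = nxt
    return loads


if __name__ == "__main__":
    L = 5
    groups = destination_groups(L)
    offsets = [o for g in groups for o in g]
    # Every processor other than the sender is selected exactly once.
    assert len(offsets) == len(set(offsets)) == L * L - 1
    for g in groups:
        loads = link_loads(L, g)
        # All 4 * L * L directed links carry the same number of messages.
        assert len(loads) == 4 * L * L and len(set(loads.values())) == 1
    print(len(groups), "phases; links equally loaded in each phase")
```

Under these assumptions and for L = 5, the sketch yields four groups with unequal absolute coordinate values, which is (L−1)²/4, and two groups with equal absolute values, which is (L−1)/2, matching the odd-L counts in claim 5. Because every processor applies the same relative offsets in the same phase, each directed link carries the same number of messages within that phase.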