Shuffle embedded distributed storage system supporting virtual merge and method thereof
| Title: | Shuffle embedded distributed storage system supporting virtual merge and method thereof |
|---|---|
| Patent Number: | 10,135,926 |
| Publication Date: | November 20, 2018 |
| Appl. No: | 14/919135 |
| Application Filed: | October 21, 2015 |
| Abstract: | Provided herein is a shuffle embedded distributed storage system and method supporting virtual merge, the system and method including a distributed shared storage configured to store a virtual merged file; a plurality of map servers connected to the distributed shared storage via a network, and configured to perform a map function and record a map result data computed as a result of the map function in the distributed shared storage by means of a map result file; and a plurality of reduce servers connected to the distributed shared storage and the map servers via the network, wherein the virtual merged file includes a list of the map result files recorded by the plurality of map servers, and an identifier of a reduce server to which the virtual merged file is to be transmitted. |
| Inventors: | Electronics and Telecommunications Research Institute (Daejeon, KR) |
| Assignees: | ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE (Daejeon, KR) |
| Claim: | 1. A shuffle embedded distributed storage system supporting virtual merge, the system comprising: a distributed shared storage configured to store a virtual merged file; a plurality of map servers connected to the distributed shared storage via a network, and configured to perform a map function and record a map result data computed as a result of the map function in an aligned format in the distributed shared storage by means of a map result file; and a plurality of reduce servers connected to the distributed shared storage and the map servers via the network for performing a reduce function on the map result files received from the virtual merged file of the distributed shared storage, wherein the virtual merged file comprises a list of the map result files generated by the plurality of map servers, and an identifier of one of the plurality of reduce servers to which the virtual merged file is to be transmitted, wherein the map result files are registered in the virtual merged file, wherein one or more of the plurality of map servers transmits an identifier of the virtual merged file to one or more of the plurality of the reduce servers, and wherein, in response to receiving a request for data reading from a selected one of the plurality of reduce servers, the distributed shared storage searches the virtual merged file having an identifier that is identical to the selected reduce server, and reads and aligns data of the map result files included in the searched virtual merged file consecutively, and transmits the aligned data to the selected reduce server without merging the map result files. |
| Claim: | 2. A shuffle embedded distributed storage method supporting virtual merge, the method comprising: reading, by a plurality of map servers, a map input file from a distributed shared storage, and performing a map function; performing the map function, and recording a computed map result data in an aligned format in the distributed shared storage by means of a map result file; registering information on map result files recorded by the plurality of map servers in a virtual merged file, wherein the virtual merged file comprises a list of the map result files generated by the plurality of map servers, and an identifier of one of the plurality of reduce servers to which the virtual merged file is to be transmitted, transmitting by one or more of the plurality of map servers an identifier of the virtual merged file to one or more of the plurality of the reduce servers, in response to receiving a request for data reading from a selected one of the plurality of reduce servers, searching the virtual merged file having an identifier that is identical to the selected reduce server, and reading and aligning data of the map result files included in the searched virtual merged file consecutively, and transmitting the aligned data to the selected reduce server without merging the map result files. |
| Claim: | 3. The shuffle embedded distributed storage method supporting virtual merge according to claim 2 , further comprising: requesting, by the reduce server, the distributed shared storage to read data; determining whether or not there is data remaining in map result files registered in the virtual merged file; in response to there being data in the map result files, circulating the map result files and reading data of a certain area and accumulating the read data in a memory; aligning the data accumulated in the memory; and transmitting the aligned data to the reduce server. |
| Claim: | 4. The shuffle embedded distributed storage method supporting virtual merge according to claim 3 , wherein, in response to there being no data remaining in the map result files, the reduce function of the reduce server is ended. |
| Claim: | 5. The shuffle embedded distributed storage method supporting virtual merge according to claim 3 , wherein the map servers, distributed shared storage and reduce server are connected to one another via a network. |
| Claim: | 6. The shuffle embedded distributed storage method supporting virtual merge according to claim 3 , wherein, after the information on the map result files is registered in the virtual merged file, the map servers transmit the identifier of the virtual merged file to the reduce server. |
| Patent References Cited: | 7,349,406 (March 2008, Robins); 8,266,192 (September 2012, Nemoto); 8,990,526 (March 2015, Wade); 9,022,869 (May 2015, DeSanti); 9,164,678 (October 2015, Wade); 9,389,994 (July 2016, Hu); 9,389,995 (July 2016, Hu); 9,734,160 (August 2017, Huntwork); 2008/0095079 (April 2008, Barkley); 2011/0313973 (December 2011, Srivas et al.); 2012/0101991 (April 2012, Srivas et al.); 2012/0209943 (August 2012, Jung); 2013/0166503 (June 2013, Chung et al.); 2013/0167151 (June 2013, Verma); 2013/0339966 (December 2013, Meng); 2014/0317056 (October 2014, Kim et al.); 2015/0035858 (February 2015, Yang); 2015/0150017 (May 2015, Hu); 2016/0179568 (June 2016, Bezbaruah); 2012-0092930 (August 2012); 2014-0055093 (May 2014) |
| Other References: | Maltzahn, C. et al., “Ceph as a scalable alternative to the Hadoop Distributed File System,” ;login:, vol. 35, no. 4, pp. 38-49 (Aug. 2010) (cited by applicant) |
| Primary Examiner: | Le, Hung |
| Attorney, Agent or Firm: | Rabin & Berdo, P.C. |
| Document Code: | edspgr.10135926 |
| Database: | USPTO Patent Grants |
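
The abstract and claims 1, 2 and 6 treat the virtual merged file as pure metadata: a list of the map result files plus the identifier of the reduce server that should receive them, so no merged copy of the map output is ever written. The Python sketch below illustrates that bookkeeping under stated assumptions: an in-memory dictionary stands in for the distributed shared storage, records are (key, value) string pairs, "aligned format" is read as "sorted by key", and every name (`VirtualMergedFile`, `SharedStorage`, `write_map_result`, `register`, `find_for_reducer`) is hypothetical rather than taken from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VirtualMergedFile:
    """Hypothetical metadata record: a virtual merge copies no map output bytes."""

    file_id: str
    reducer_id: str  # identifier of the reduce server this file is destined for
    map_result_files: list[str] = field(default_factory=list)


class SharedStorage:
    """Hypothetical in-memory stand-in for the distributed shared storage."""

    def __init__(self) -> None:
        # path -> key-sorted (key, value) records written by one map server
        self.files: dict[str, list[tuple[str, str]]] = {}
        # virtual merged file identifier -> VirtualMergedFile
        self.virtual_files: dict[str, VirtualMergedFile] = {}

    def write_map_result(self, path: str, records: list[tuple[str, str]]) -> None:
        # Assumption: "aligned format" means the records are stored sorted by key.
        self.files[path] = sorted(records)

    def register(self, file_id: str, reducer_id: str, path: str) -> None:
        # Add a map result file to the virtual merged file for one reducer,
        # creating the virtual merged file on first registration.
        vmf = self.virtual_files.setdefault(file_id, VirtualMergedFile(file_id, reducer_id))
        if vmf.reducer_id != reducer_id:
            raise ValueError(f"{file_id} is already bound to {vmf.reducer_id}")
        vmf.map_result_files.append(path)

    def find_for_reducer(self, reducer_id: str) -> VirtualMergedFile | None:
        # Search for the virtual merged file whose reducer identifier matches
        # the reduce server that issued the read request.
        for vmf in self.virtual_files.values():
            if vmf.reducer_id == reducer_id:
                return vmf
        return None
```

For example, after two map servers write `map0/part-r1` and `map1/part-r1` and register both under `vmf-r1` for `reducer-1`, `find_for_reducer("reducer-1")` returns a virtual merged file whose `map_result_files` is `['map0/part-r1', 'map1/part-r1']`. In this sketch the only thing a map server would then need to send the reducer is the identifier `vmf-r1`.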
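
Claims 3 and 4 describe how the storage answers a reduce server's read request: while any registered map result file still has data, it circulates over the files, reads a certain area from each into memory, aligns the accumulated data, and transmits it; when no data remains, the reduce function ends. A minimal Python sketch of that loop, assuming each map result file is an in-memory list of key-sorted (key, value) pairs, that "aligning" means sorting, and that an area is a fixed number of records (`area_size`); the function name `virtual_merge_read` and its parameters are hypothetical.

```python
from typing import Iterator

Record = tuple[str, str]  # hypothetical record type: (key, value)


def virtual_merge_read(
    map_result_files: list[list[Record]], area_size: int
) -> Iterator[list[Record]]:
    """Yield aligned batches read in place from the registered map result files.

    Hypothetical sketch of the claim 3/4 loop: the files are never merged into
    a new file; each batch is built from one area of every file that still has data.
    """
    if area_size <= 0:
        raise ValueError("area_size must be positive")
    offsets = [0] * len(map_result_files)  # next unread record in each file
    while True:
        # Determine whether data remains in any registered map result file.
        remaining = [i for i, f in enumerate(map_result_files) if offsets[i] < len(f)]
        if not remaining:
            return  # nothing left: the reduce server's input is exhausted
        buffer: list[Record] = []
        # Circulate over the files, accumulating one area of each in memory.
        for i in remaining:
            start = offsets[i]
            buffer.extend(map_result_files[i][start:start + area_size])
            offsets[i] = start + area_size
        # Align the accumulated data (sort by key, then value) before transmitting.
        buffer.sort()
        yield buffer
```

With the files `[("a", "1"), ("c", "3"), ("e", "5")]` and `[("b", "2"), ("d", "4")]` and `area_size=2`, the generator yields `[("a", "1"), ("b", "2"), ("c", "3"), ("d", "4")]` and then `[("e", "5")]`. This sketch orders records only within each batch; if a reducer needs a single globally sorted stream, a k-way merge over the already-sorted files (for example Python's `heapq.merge`) would be required, and the claims as quoted here do not specify which guarantee the patented system provides.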