Distribution of data in a distributed shared storage system

Bibliographic Details
Title: Distribution of data in a distributed shared storage system
Patent Number: 7,844,775
Publication Date: November 30, 2010
Appl. No: 11/524666
Application Filed: September 21, 2006
Abstract: Segments or blocks of a file can be distributed among a number n of storage units by using a function of sequentially assigned segment identifiers for each segment, where the function is reduced modulo n, so long as the function is not congruent to the segment identifier, modulo n, for any given segment identifier. An example of such a function, where n is the number of storage units and k is a segment identifier, is f(k)=ak+b, where a is relatively prime to n. Such a function can be computed quickly for any given segment. As the list of storage units changes, data may be redistributed using a new mapping of segments to storage units. Any new mapping can be restricted so that segments only move to a new storage unit or from an old storage unit, and not from one existing storage unit to another. In this way, the amount of data to be moved is limited. A chain of the lists of available storage units, as that list changes over time, is maintained to permit the history of file mappings to be recreated.
Inventors: Snaman, Jr., William E. (Nashua, NH, US); Rabinowitz, Stanley (Chelmsford, MA, US); Aiduk, David M. (West Newbury, MA, US); Kuninsky, Mitchel (Woburn, MA, US)
Assignees: Avid Technology, Inc. (Burlington, MA, US)
Claim: 1. A distributed shared storage system, comprising: a plurality of client systems; a plurality of independent storage units for storing a plurality of files, each file comprising a plurality of segments of data, each of the plurality of segments being identified by a file identifier corresponding to the file from which that segment originates and a segment identifier; wherein a client requests a segment of data by providing the segment identifier corresponding to the requested segment and the file identifier corresponding to the requested segment; wherein for each of the plurality of files, segments of the file are distributed among the plurality of storage units by using a mapping based on a linear function of the segment identifier for each segment, wherein the function is reduced modulo n, wherein n is the number of independent storage units in the distributed shared storage system, and wherein, a segment identified by segment identifier i of the file is mapped onto storage element e in accordance with a formula e =(s+ik)mod n,  wherein s is an offset, and k is a stride, wherein the offset and the stride are generated from a seed value associated with the file, and k is relatively prime to n,
Claim: 2. The distributed shared storage system of claim 1, wherein the formula is used to provide an index into a list of the n storage units.
Claim: 3. The distributed shared storage system of claim 1, wherein addition of a new storage unit to the plurality of storage units or removal of one of the plurality of storage units causes data to be redistributed using a new mapping of segments to storage units.
Claim: 4. The distributed shared storage system of claim 3, wherein, if a new storage unit is added, the new mapping causes at least one of the plurality of segments to move to the new storage unit from one of the plurality of storage units, without causing segments to move from one of the plurality of storage units to another one of the plurality of storage units.
Claim: 5. The distributed shared storage system of claim 2, wherein a chain of lists of available storage units is maintained as storage units are added or removed.
Claim: 6. The distributed storage system of claim 3, wherein if one of the plurality of storage units is removed, the new mapping causes the segments stored on the removed storage unit to be mapped onto the remaining ones of the plurality of storage units without mapping segments from one of the remaining ones of the plurality of storage units to another one of the remaining ones of the plurality of storage units.
Claim: 7. A method of storing a data file on a plurality of computer-readable storage units, the data file comprising a plurality of segments, each segment being identified by a segment identifier, the method comprising: for each of the plurality of segments, mapping that segment to one of the plurality of storage units wherein the mapping is based on a linear function of the corresponding segment identifier, wherein the function is reduced modulo n, wherein n is the number of storage units in the plurality of computer-readable storage units; generating a seed value; generating an offset value s and a stride value k, wherein s and k are derived from the seed value, wherein the offset value is a whole number greater than or equal to 1 and less than or equal to n, and wherein k is a whole number that is relatively prime to n; wherein the mapping is e=(s+ik)mod n, wherein e identifies the storage unit to which a segment having segment identifier i is mapped; and storing the segment on the storage unit to which it is mapped.
Claim: 8. The method of claim 7, further comprising storing a plurality of data files on the plurality of computer-readable storage units, the plurality of data files including the first-mentioned data file, wherein storing each of the plurality of data files involves mapping the segments of that data file to the plurality of storage units based on a corresponding linear function associated with that data file.
Claim: 9. The method of claim 8, further comprising for each of the plurality of data files, generating a corresponding seed value to be associated with that data file and using the seed value to generate an offset value s and a stride value k corresponding to that data file, wherein the offset value is a whole number greater than or equal to 1 and less than or equal to n, and wherein k is a whole number that is relatively prime to n, wherein the mapping for that file is e=(s+ik)mod n, wherein e identifies the storage unit to which a segment having segment identifier i is mapped.
Claim: 10. The method of claim 9, further comprising generating an index into a list of the plurality of storage units using the file mappings.
Claim: 11. The method of claim 7, wherein if a new storage unit is added to the plurality of storage units or one of the plurality of storage units is removed, redistributing the segments among the plurality of storage units using a new mapping.
Claim: 12. The method of claim 11, wherein, if a new storage unit is added, the new mapping maps at least one of the plurality of segments to the new storage unit without mapping segments from one of the plurality of storage units to another one of the plurality of storage units.
Claim: 13. The method of claim 11, wherein if one of the plurality of storage units is removed, the new mapping maps the segments stored on the removed storage unit onto the remaining ones of the plurality of storage units without mapping segments from one of the remaining ones of the plurality of storage units to another one of the remaining ones of the plurality of storage units.
Claim: 14. The method of claim 11, further comprising maintaining a chain of lists of available storage units as storage units are added or removed.
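Illustrative note: the mapping recited in claims 1 and 7, e=(s+ik)mod n, can be sketched in a few lines of Python. The record does not say how the offset s and stride k are derived from the seed, so the derivation below (a seeded pseudo-random choice) and the function names `offset_and_stride` and `storage_unit` are assumptions made for illustration only, not the patented procedure.

```python
import math
import random


def offset_and_stride(seed: int, n: int) -> tuple[int, int]:
    """Derive an offset s and stride k from a per-file seed (assumed scheme).

    s satisfies 1 <= s <= n, and k is relatively prime to n,
    matching the constraints stated in claim 7.
    """
    rng = random.Random(seed)
    s = rng.randrange(1, n + 1)
    # Every k in 1..n with gcd(k, n) == 1 is a valid stride.
    candidates = [k for k in range(1, n + 1) if math.gcd(k, n) == 1]
    k = rng.choice(candidates)
    return s, k


def storage_unit(i: int, s: int, k: int, n: int) -> int:
    """Index e of the storage unit for segment identifier i: e = (s + i*k) mod n."""
    return (s + i * k) % n
```

Because k is relatively prime to n, any n consecutive segment identifiers land on n distinct storage units, so a file's segments are spread evenly across the list.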
Current U.S. Class: 711/112
Patent References Cited: 5754844 May 1998 Fuller
6065010 May 2000 Otsuka et al.
7512673 March 2009 Miloushev et al.
2002/0199060 December 2002 Peters et al.
2004/0003173 January 2004 Yao et al.
2006/0080353 April 2006 Miloushev et al.
2008/0010296 January 2008 Bayliss et al.
Other References: Goel, Ashish, et al., “SCADDAR: An Efficient Randomized Technique To Reorganize Continuous Media Blocks”, In Proceedings of the 18th International Conference on Data Engineering, 2002. (cited by other)
Assistant Examiner: Dudek, Edward J
Primary Examiner: Tsai, Sheng-Jen
Attorney, Agent or Firm: Strimpel, Oliver
Document Code: edspgr.07844775
Database: USPTO Patent Grants
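Illustrative note: the abstract states that a new mapping can be restricted so that segments move only to a new storage unit, never between existing ones, and that a chain of storage-unit lists is kept so earlier mappings can be recreated. The record does not give the actual redistribution rule, so the sketch below swaps in a simple hash-based rule as a stand-in. The names `locate` and `_hash`, the use of SHA-256, and the restriction to additions only (each list in the chain appends one unit to the previous list) are all assumptions, not the patented method.

```python
import hashlib


def _hash(file_id: str, segment_id: int, generation: int) -> int:
    """Deterministic pseudo-random value for one segment at one list generation."""
    data = f"{file_id}:{segment_id}:{generation}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def locate(file_id: str, segment_id: int, base_index: int,
           chain: list[list[str]]) -> str:
    """Replay the chain of storage-unit lists to find where a segment lives.

    chain[0] is the original list; base_index is the segment's index into it
    (for example, the result of e = (s + i*k) mod n). Each later list is
    assumed to be the previous list with one new unit appended.
    """
    unit = chain[0][base_index]
    for generation in range(1, len(chain)):
        units = chain[generation]
        # Move to the newly added unit with probability 1/len(units);
        # otherwise the segment stays where it already is.
        if _hash(file_id, segment_id, generation) % len(units) == 0:
            unit = units[-1]
    return unit
```

Under this rule a segment either stays on its current unit or moves to the unit that was just added, which is the property the abstract describes. Handling removals would need an additional rule that this sketch does not attempt.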