CACHE INDEXING USING DATA ADDRESSES BASED ON DATA FINGERPRINTS

Bibliographic Details
Title: CACHE INDEXING USING DATA ADDRESSES BASED ON DATA FINGERPRINTS
Document Number: 20220269657
Publication Date: August 25, 2022
Appl. No: 17/180903
Application Filed: February 22, 2021
Abstract: A cache storage system indexing method is provided that indexes a data address in a cache storage system based on a data fingerprint of the cached data, wherein the data fingerprint is generated by a deduplication fingerprint function used for referencing deduplication of data in the cache storage system. A computer-implemented method of data operations to a cache storage system is also provided including: obtaining a data fingerprint for the data of the data operation, either by applying a deduplication fingerprinting function to data of a write operation or by accessing deduplication metadata for a read operation to obtain the data fingerprint generated by using a deduplication fingerprinting function used for deduplication of data in the cache storage system; and using an indexing service to the cache storage system having an address schema based on the data fingerprints of the data.
Claim: 1. A computer-implemented method for cache storage system indexing, the computer-implemented method comprising: indexing, by one or more computer processors, a data address in a cache storage system based on a data fingerprint of a cached data, wherein the data fingerprint is generated by a deduplication fingerprint function used for referencing deduplication of data in the cache storage system.
Claim: 2. The computer-implemented method of claim 1, wherein indexing the data address is based on a subset of the data fingerprint as a reference into an array that stores cache addresses in a cache address space of the cache storage system.
Claim: 3. The computer-implemented method of claim 1, wherein the data fingerprint is obtained for read operations from deduplication metadata that maps different address aliases from different software logical layers to the deduplication data fingerprint.
Claim: 4. The computer-implemented method of claim 1, wherein the data fingerprint is obtained for read operations from deduplication metadata that maps different address aliases from different client domains to the deduplication data fingerprint.
Claim: 5. The computer-implemented method of claim 1, wherein the cache storage system is a single uniform cache for read/write operations at multiple layers of a storage subsystem, and wherein the indexing provides a uniform indexing across the multiple layers.
Claim: 6. The computer-implemented method of claim 1, wherein the cache storage system is a sharded cache with sharding based on the data fingerprints generated by the deduplication fingerprint function.
Claim: 7. A computer-implemented method for data operations to a cache storage system, comprising: obtaining, by one or more computer processors, a data fingerprint for a data of a data operation, by applying a deduplication fingerprinting function to a data of a write operation, to obtain the data fingerprint generated by using a deduplication fingerprinting function used for deduplication of data in the cache storage system; and using an indexing service to the cache storage system having an address schema based on the data fingerprint of the data of the write operation.
Claim: 8. The computer-implemented method of claim 7, wherein the indexing service uses a reference based on a subset of a data fingerprint as an address into an array that stores cache addresses in a cache address space of the cache storage system.
Claim: 9. The computer-implemented method of claim 7, wherein the data operation is a write operation and the method further comprises: receiving, by one or more computer processors, a data write operation; calculating, by one or more computer processors, a data fingerprint of a data of the data write operation using a fingerprint function used for deduplication of a data in the cache storage system; creating, by one or more computer processors, a cache entry for the data of the data write operation; and using the data fingerprint to address the cache entry in the cache storage system.
Claim: 10. The method as claimed in claim 7, wherein the data operation is a read operation and the method further comprises: receiving, by one or more computer processors, a data read operation with a logical address for the data of the data read operation; obtaining, by one or more computer processors, a data fingerprint of the data for the data read operation from deduplication metadata by referencing the logical address; and using the data fingerprint to address a data in the cache storage system to attempt to read the data of the data read operation.
Claim: 11. The computer-implemented method of claim 10, wherein obtaining the data fingerprint is carried out after a cache miss using an original addressing scheme of the cache storage system when a read operation comes down to a deduplication layer as it could not be serviced using the logical address.
Claim: 12. The computer-implemented method of claim 10, wherein obtaining the data fingerprint and using the data fingerprint to address data is carried out prior to an original addressing scheme.
Claim: 13. The computer-implemented method of claim 10, further comprising: allowing, by one or more computer processors, false hits of the cache storage system using the data fingerprint to address the data of the data read operation; and verifying, by one or more computer processors, the data fingerprint with the data fingerprint of a retrieved data.
Claim: 14. The computer-implemented method of claim 10, further comprising: using, by one or more computer processors, a subset of the data fingerprint as an address into an array storing cache addresses; and verifying, by one or more computer processors, the data fingerprint with the data fingerprint of retrieved data.
Claim: 15. The computer-implemented method of claim 10, further comprising: discovering, by one or more computer processors, multiple references to a data source by use of the data fingerprint addressing; and unifying, by one or more computer processors, the references to a data source thereby increasing the deduplication.
Claim: 16. The computer-implemented method of claim 10, wherein obtaining the data fingerprint and using the data fingerprint to address the data is carried out in parallel with an original addressing scheme.
Claim: 17. A computer system comprising: one or more computer processors; one or more computer readable storage media; and program instructions, stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, the program instructions comprising: program instructions to create an array that stores cache addresses into a shared cache address space, the array having references based on a data fingerprint of a cached data, wherein the data fingerprint is generated by a deduplication fingerprint function used for deduplication of a data in a cache storage system.
Claim: 18. The computer system of claim 17, further comprising one or more of the following program instructions, stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, to: obtain deduplication data fingerprints when carrying out read and write operations to the cache storage system, including: obtain the data fingerprint for read operations from a deduplication metadata, wherein the deduplication metadata maps address aliases from different software logical layers or different client domains to the deduplication data fingerprint; and apply a deduplication fingerprinting function to data of a write operation.
Claim: 19. The computer system of claim 17, wherein the computer system includes a single uniform cache for read/write operations at multiple layers of a storage subsystem, and wherein the computer system provides a uniform indexing across the multiple layers.
Claim: 20. The computer system of claim 17, further comprising one or more of the following program instructions, stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, to: create a sharded cache with sharding based on the data fingerprints generated by the deduplication fingerprint function.
Claim: 21. A cache storage system indexing system, comprising: a data fingerprint obtaining component for obtaining a data fingerprint for a data of a data operation, by applying a deduplication fingerprinting function to data of a write operation and by accessing deduplication metadata for a read operation to obtain the data fingerprint generated by using a deduplication fingerprinting function used for deduplication of data in the cache storage system; and an array lookup component for obtaining a cache address for the data of the data operation to the cache storage system by using a reference based on the data fingerprint of the data of the data operation.
Claim: 22. The cache storage system indexing system of claim 21, further comprising: a write operation component for: receiving a data write operation; calculating a data fingerprint of a data of the data write operation using a fingerprint function used for deduplication of a data in the cache storage system; creating a cache entry for the data of the data write operation; and using the data fingerprint to address the cache entry in the cache storage system; and a read operation component for: receiving a data read operation with a logical address for a data of the data read operation; obtaining a data fingerprint of the data of the data read operation from deduplication metadata by referencing the logical address; and using the data fingerprint to address the data of the data write operation and the data of the data read operation in the cache storage system.
Claim: 23. The cache storage system indexing system of claim 21, further comprising: a hit verifying component for verifying the data fingerprint with the data fingerprint of retrieved data.
Claim: 24. The cache storage system indexing system of claim 21, further comprising: a reference unifying component for discovering multiple references to a data source by use of the data fingerprint addressing and unifying the references to a data source thereby increasing the deduplication.
Claim: 25. A cache storage system sharded using a data fingerprint address as a shard key, where the data fingerprint address is an address based on a data fingerprint obtained using a deduplication fingerprint function used for deduplication of data in the cache storage system.
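Illustrative sketch (not part of the filing): the write path, read path, array lookup, and hit verification described in claims 2, 9, 10, 13, and 14 might be modeled roughly as below. Everything named here is a hypothetical stand-in: SHA-256 is assumed as the deduplication fingerprint function, `INDEX_BITS` is an arbitrary array size, and `FingerprintCache`, `slot_of`, and `dedup_meta` are invented names. The claims do not specify a hash function, an array size, or an eviction policy.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

INDEX_BITS = 8  # hypothetical: number of fingerprint bits used as the array index


def fingerprint(data: bytes) -> bytes:
    """Stand-in for the deduplication fingerprint function (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def slot_of(fp: bytes) -> int:
    """Use a subset of the fingerprint (its leading bits) as an array index."""
    return int.from_bytes(fp[:4], "big") >> (32 - INDEX_BITS)


class FingerprintCache:
    def __init__(self) -> None:
        # Array indexed by a fingerprint subset; each slot holds a cache address or None.
        self.index: List[Optional[int]] = [None] * (1 << INDEX_BITS)
        # Cache address space: cache address -> (full fingerprint, data).
        self.store: Dict[int, Tuple[bytes, bytes]] = {}
        # Deduplication metadata: logical address alias -> fingerprint.
        self.dedup_meta: Dict[str, bytes] = {}
        self.next_addr = 0

    def write(self, logical_addr: str, data: bytes) -> int:
        """Fingerprint the written data and address its cache entry by that fingerprint."""
        fp = fingerprint(data)
        self.dedup_meta[logical_addr] = fp
        slot = slot_of(fp)
        addr = self.index[slot]
        if addr is not None and self.store[addr][0] == fp:
            return addr  # identical content is already cached; reuse the entry
        if addr is not None:
            self.store.pop(addr)  # assumed policy: a colliding entry is evicted
        addr = self.next_addr
        self.next_addr += 1
        self.store[addr] = (fp, data)
        self.index[slot] = addr
        return addr

    def read(self, logical_addr: str) -> Optional[bytes]:
        """Look up the fingerprint in dedup metadata, then address the cache with it."""
        fp = self.dedup_meta.get(logical_addr)
        if fp is None:
            return None
        addr = self.index[slot_of(fp)]
        if addr is None:
            return None
        stored_fp, data = self.store[addr]
        # A slot match may be a false hit, so verify the full fingerprint.
        return data if stored_fp == fp else None
```

In this sketch, two logical addresses that hold identical data resolve to the same cache entry, which is the aliasing behavior that claims 3 and 4 attribute to the deduplication metadata.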
Current International Class: 06; 06; 06
Accession Number: edspap.20220269657
Database: USPTO Patent Applications
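Illustrative sketch (not part of the filing): claims 6, 20, and 25 describe a cache sharded with the data fingerprint as the shard key. A minimal model of that idea follows, assuming SHA-256 as the fingerprint function and a simple modulo placement rule. Both choices, and the names `ShardedCache` and `shard_for`, are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional


def fingerprint(data: bytes) -> bytes:
    """Stand-in for the deduplication fingerprint function (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def shard_for(fp: bytes, num_shards: int) -> int:
    """Derive the shard from the fingerprint; modulo placement is an assumed rule."""
    return int.from_bytes(fp[:8], "big") % num_shards


class ShardedCache:
    def __init__(self, num_shards: int) -> None:
        # Each shard is modeled as an independent fingerprint -> data map.
        self.shards: List[Dict[bytes, bytes]] = [{} for _ in range(num_shards)]

    def put(self, data: bytes) -> bytes:
        """Store data on the shard selected by its fingerprint; return the fingerprint."""
        fp = fingerprint(data)
        self.shards[shard_for(fp, len(self.shards))][fp] = data
        return fp

    def get(self, fp: bytes) -> Optional[bytes]:
        """Fetch data from the shard that the fingerprint maps to."""
        return self.shards[shard_for(fp, len(self.shards))].get(fp)
```

Because the shard is a function of content rather than of logical address, identical data written by different clients lands on the same shard and occupies a single entry.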