Data-Dependent Hashing Based on p-Stable Distribution

Bibliographic Details
Published in:	IEEE Transactions on Image Processing, Vol. 23, no. 12, pp. 5033-5046
Main Authors:	Bai, Xiao; Yang, Haichuan; Zhou, Jun; Ren, Peng; Cheng, Jian
Format:	Journal Article
Language:	English
Published:	United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2014
Subjects:	Binary codes; Educational institutions; Euclidean distance; Hash based algorithms; Image processing; Mathematical analysis; Methods; Preserves; Projection; Quantization (signal); Semantics; Similarity; Training; Vectors; Vectors (mathematics); hash retrieval; Image retrieval; p-stable distribution
ISSN:	1057-7149, 1941-0042

Description
Summary:	The p-stable distribution is traditionally used for data-independent hashing. In this paper, we describe how to perform data-dependent hashing based on the p-stable distribution. We commence by formulating the Euclidean distance preserving property in terms of variance estimation. Based on this property, we develop a projection method that maps the original data to vectors of arbitrary dimension. Each projection vector is a linear combination of multiple random vectors drawn from a p-stable distribution, and the weights of the linear combination are learned from the training data. An orthogonal matrix is then learned from the data to minimize the thresholding error in quantization. Combining the projection method and the orthogonal matrix, we develop an unsupervised hashing scheme that preserves the Euclidean distance. Compared with data-independent hashing methods, our method takes the data distribution into consideration and gives more accurate hashing results with compact hash codes. Unlike many data-dependent hashing methods, our method accommodates multiple hash tables and is not restricted by the number of hash functions. To extend our method to a supervised scenario, we incorporate a supervised label propagation scheme into the proposed projection method. This results in a supervised hashing scheme that preserves the semantic similarity of the data. Experimental results show that our methods outperform several state-of-the-art hashing approaches in both effectiveness and efficiency.
DOI:	10.1109/TIP.2014.2352458
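
Illustration:	The summary refers to the Euclidean distance preserving property of p-stable projections and to data-independent hashing. The sketch below is not the paper's learned, data-dependent method. It only illustrates the standard data-independent baseline for p = 2, where the 2-stable distribution is the Gaussian: for a vector a with i.i.d. standard normal entries, a·(x - y) has variance ||x - y||^2, so the mean of squared projected differences estimates the squared Euclidean distance. The function names, the bucket width w, and the sample sizes are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_projections(dim, n_proj, seed=0):
    """Draw n_proj random vectors with i.i.d. N(0, 1) entries (2-stable)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_proj, dim))

def estimate_sq_distance(x, y, A):
    """Estimate ||x - y||^2 as the mean squared projection of x - y.

    Each row a of A gives a . (x - y) ~ N(0, ||x - y||^2), so the sample
    second moment of the projections estimates the squared distance.
    """
    diffs = A @ (x - y)
    return float(np.mean(diffs ** 2))

def pstable_hash(x, A, b, w):
    """Data-independent p-stable hash: floor((a . x + b) / w) per projection."""
    return np.floor((A @ x + b) / w).astype(int)

# Toy data (hypothetical): two random 64-dimensional points.
rng = np.random.default_rng(1)
x = rng.standard_normal(64)
y = rng.standard_normal(64)

A = gaussian_projections(dim=64, n_proj=20000)
true_sq = float(np.sum((x - y) ** 2))
est_sq = estimate_sq_distance(x, y, A)

# Hash with a small number of projections; w is an assumed bucket width.
w = 4.0
A_hash = A[:16]
b = rng.uniform(0.0, w, size=16)
code_x = pstable_hash(x, A_hash, b, w)
code_x_near = pstable_hash(x + 1e-9, A_hash, b, w)
```

With many projections, est_sq should be close to true_sq, which is the variance estimation view of distance preservation. In the baseline shown here the projection vectors are used as drawn. The summary states that the paper instead learns weights for linear combinations of such random vectors from training data.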