tags: Deep Learning, Computer Vision, Image Retrieval, Hashing, Survey
Hashing BaseLine: https://github.com/willard-yuan/hashing-baseline-for-image-retrieval
Some representative papers about deep hashing
1. Given the pairwise similarity matrix \(S\) over the training images, the authors use a scalable coordinate descent method to decompose \(S\) into a product \(HH^T\), where \(H\) is a matrix whose rows are the approximate hash codes of the training images.

2. In the second stage, the idea is to simultaneously learn a good feature representation for the input images and a set of hash functions, via a deep convolutional network (AlexNet) tailored to the learned hash codes in \(H\) and, optionally, the discrete class labels of the images.
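
The first stage above can be illustrated with a toy example. The sketch below is not the paper's scalable coordinate descent; it is a naive greedy bit-flip search that only shows what "decompose \(S\) into \(HH^T\)" means. The \(\pm 1\) encoding of \(S\), the \(1/q\) scaling of \(HH^T\), and all function names are assumptions made for illustration.

```python
import numpy as np

def similarity_from_labels(labels):
    """Assumed encoding: S[i, j] = +1 for same label, -1 otherwise."""
    labels = np.asarray(labels)
    return np.where(labels[:, None] == labels[None, :], 1.0, -1.0)

def objective(S, H):
    """Squared Frobenius error between S and the scaled product H H^T / q."""
    q = H.shape[1]
    return float(np.sum((S - H @ H.T / q) ** 2))

def greedy_bit_flips(S, q, sweeps=5, seed=0):
    """Naive coordinate-wise search: flip one bit at a time, keep it if the error drops."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    H = rng.choice([-1.0, 1.0], size=(n, q))  # random initial codes in {-1, +1}
    best = objective(S, H)
    history = [best]
    for _ in range(sweeps):
        for i in range(n):
            for k in range(q):
                H[i, k] = -H[i, k]            # try flipping bit k of code i
                trial = objective(S, H)
                if trial < best:
                    best = trial              # keep the flip
                else:
                    H[i, k] = -H[i, k]        # undo the flip
        history.append(best)
    return H, history

if __name__ == "__main__":
    S = similarity_from_labels([0, 0, 1, 1, 2, 2])
    H, history = greedy_bit_flips(S, q=4)
    print(history)
```

Each row of the returned `H` would then serve as the target code for the corresponding training image in the second stage. The full objective is recomputed for every candidate flip, so this only makes sense for tiny inputs.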

The pipeline of the proposed deep architecture consists of three building blocks: 1) a sub-network with a stack of convolution layers to produce the effective intermediate image features; 2) a divide-and-encode module to divide the intermediate image features into multiple branches, each encoded into one hash bit; and 3) a triplet ranking loss designed to characterize that one image is more similar to the second image than to the third one.
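
As a rough illustration of building blocks 2) and 3), the sketch below splits a feature vector into equal slices, maps each slice to one relaxed bit, and evaluates a hinge-style triplet ranking loss on the resulting codes. The per-slice linear projection followed by a sigmoid, the squared Euclidean distance, the margin of 1, and the function names are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def divide_and_encode(features, slice_weights):
    """Split `features` into q equal slices and map each slice to one value in (0, 1).

    slice_weights has shape (q, width); features must have length q * width.
    """
    q, width = slice_weights.shape
    slices = features.reshape(q, width)                    # one row per branch
    activations = np.sum(slices * slice_weights, axis=1)   # one scalar per branch
    return 1.0 / (1.0 + np.exp(-activations))              # sigmoid: relaxed hash bits

def triplet_ranking_loss(code_query, code_pos, code_neg, margin=1.0):
    """Hinge loss that is zero once the negative is farther than the positive by `margin`."""
    d_pos = np.sum((code_query - code_pos) ** 2)
    d_neg = np.sum((code_query - code_neg) ** 2)
    return max(0.0, margin + d_pos - d_neg)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(3, 2))                      # 3 bits, 2 features per slice
    codes = [divide_and_encode(rng.normal(size=6), weights) for _ in range(3)]
    print(triplet_ranking_loss(*codes))
```

Because each bit only sees its own slice of the features, the bits share no weights in the encoding step, which is the point of dividing before encoding.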

We pose hashing learning as a problem of regularized similarity learning. Specifically, we organize the training images into a batch of triplet samples, each sample containing two images with the same label and one with a different label. With these triplet samples, we maximize the margin between matched pairs and mismatched pairs in the Hamming space. In addition, a regularization term is introduced to enforce the adjacency consistency, i.e., images of similar appearances should have similar codes. The deep convolutional neural network is utilized to train the model in an end-to-end fashion, where discriminative image features and hash functions are simultaneously optimized.

A deep convolutional neural network is incorporated into the hash functions to jointly learn feature representations and the mappings from them to hash codes, which avoids the limited semantic representation power of hand-crafted features. Meanwhile, a ranking list that encodes the multilevel similarity information is employed to guide the learning of such deep hash functions. An effective scheme based on a surrogate loss is used to solve the intractable optimization problem of the non-smooth and multivariate ranking measures involved in the learning procedure, and the derivatives of this surrogate loss are then calculated.

Our model is learned under three constraints at the top layer of the deep network: 1) the loss between the original real-valued feature descriptor and the learned binary vector is minimized, 2) the binary codes distribute evenly on each bit, and 3) different bits are as independent as possible. To further improve the discriminative power of the learned binary codes, we extend DH into supervised DH (SDH) by including one discriminative term into the objective function of DH which simultaneously maximizes the inter-class variations and minimizes the intra-class variations of the learned binary codes.
DH loss function: \(J = J_1 - \lambda_1 J_2 + \lambda_2 J_3 + \lambda_3 J_4\), where \(J_1 = \frac{1}{2} \| B - H^M \|_F^2\) is the quantization loss, \(J_2 = \frac{1}{2N} \operatorname{tr}(H^M (H^M)^T)\) is the balanced-bits constraint, \(J_3 = \frac{1}{2} \sum\limits_{m=1}^{M} \| W^m (W^m)^T - I \|_F^2\) is the independent-bits constraint, and \(J_4 = \frac{1}{2} ( \| W^m \|_F^2 + \| c^m \|_2^2 )\) are regularizers to control the scales of the parameters.
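
To make the four terms of the DH objective concrete, here is a minimal NumPy sketch that evaluates them for a small fully connected network. The tanh activations, the column-per-sample layout of the data, summing the \(J_4\) regularizer over all layers, the default \(\lambda\) values, and the function names are assumptions made for illustration.

```python
import numpy as np

def forward(X, weights, biases):
    """Assumed network: h^m = tanh(W^m h^{m-1} + c^m); X holds one sample per column."""
    H = X
    for W, c in zip(weights, biases):
        H = np.tanh(W @ H + c[:, None])
    return H

def dh_loss(X, weights, biases, lam1=1.0, lam2=1.0, lam3=1e-3):
    """Return J = J1 - lam1*J2 + lam2*J3 + lam3*J4 together with the individual terms."""
    H = forward(X, weights, biases)           # top-layer output H^M, shape (bits, N)
    N = H.shape[1]
    B = np.where(H >= 0, 1.0, -1.0)           # binary codes B = sgn(H^M)
    J1 = 0.5 * np.sum((B - H) ** 2)           # quantization loss
    J2 = np.trace(H @ H.T) / (2 * N)          # balanced-bits term (enters with a minus sign)
    J3 = 0.5 * sum(np.sum((W @ W.T - np.eye(W.shape[0])) ** 2) for W in weights)
    # Assumption: the regularizer is accumulated over every layer.
    J4 = 0.5 * sum(np.sum(W ** 2) + np.sum(c ** 2) for W, c in zip(weights, biases))
    total = J1 - lam1 * J2 + lam2 * J3 + lam3 * J4
    return total, (J1, J2, J3, J4)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 20))              # 8-dimensional features, 20 samples
    weights = [rng.normal(size=(6, 8)), rng.normal(size=(4, 6))]
    biases = [np.zeros(6), np.zeros(4)]
    print(dh_loss(X, weights, biases))
```

Note that \(J_3\) vanishes exactly when every \(W^m\) has orthonormal rows, which is how the term pushes the bits towards independence.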


(DPSH) Feature Learning based Deep Supervised Hashing with Pairwise Labels [paper][code]
Wu-Jun Li, Sheng Wang and Wang-Cheng Kang. [arXiv], 2015
Define the pairwise loss function similar to that of LFH: $$L = -\log p(\mathcal{S} \mid \mathcal{B}) = -\sum\limits_{s_{ij}\in\mathcal{S}} \log p(s_{ij} \mid \mathcal{B}) = -\sum\limits_{s_{ij}\in\mathcal{S}} \left( s_{ij}\theta_{ij} - \log(1 + e^{\theta_{ij}}) \right)$$ where \(\mathcal{B} = \{b_i\}_{i=1}^{n}\) and \(\theta_{ij} = \frac{1}{2} b_i^T b_j\).
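
The pairwise loss above can be evaluated in a few lines of NumPy. The sketch below assumes that \(s_{ij} \in \{0, 1\}\), that \(\mathcal{S}\) contains every ordered pair \((i, j)\) including \(i = j\), and that the codes are stored as rows of a matrix; the function name is hypothetical. `np.logaddexp(0, theta)` is used for \(\log(1 + e^{\theta_{ij}})\) to avoid overflow for large \(\theta_{ij}\).

```python
import numpy as np

def dpsh_pairwise_loss(B, S):
    """Negative log-likelihood of the pairwise labels S given the codes B.

    B: (n, q) matrix whose rows are the codes b_i.
    S: (n, n) matrix with S[i, j] = 1 for similar pairs and 0 otherwise.
    """
    theta = 0.5 * B @ B.T                                   # theta_ij = b_i^T b_j / 2
    return float(np.sum(np.logaddexp(0.0, theta) - S * theta))

if __name__ == "__main__":
    B = np.array([[1.0, 1.0], [1.0, 1.0], [-1.0, -1.0]])
    S_consistent = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]], dtype=float)
    print(dpsh_pairwise_loss(B, S_consistent))              # labels agree with the codes
    print(dpsh_pairwise_loss(B, 1.0 - S_consistent))        # labels contradict the codes
```

The second call prints the larger value: the loss rewards a large inner product for similar pairs and a small one for dissimilar pairs.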
