The source code is here.
Introduction
The two most prevalent types of loss functions in ReID are classification loss (e.g. softmax cross entropy loss) and metric learning based loss (e.g. triplet loss and contrastive loss):
Classification loss has promising convergence but is vulnerable to overfitting. It processes samples individually and only builds connections implicitly through the classifier.
Metric learning based loss explicitly optimizes the distances between samples. However, the similarity structure it builds involves only a pair/triplet of data points and ignores other informative samples. This leads to a large proportion of trivial pairs/triplets that can overwhelm the training process and leave the model suffering from slow convergence.
Most existing methods process data points individually or involve only a fraction of the samples when building a similarity structure, so they more or less ignore the dense informative connections among samples. This lack of holistic observation eventually leads to inferior performance. To alleviate the issue, we propose to formulate the whole data batch as a similarity graph.
Spectral Feature Transformation
The final embeddings of a training batch are regarded as the nodes of a similarity graph.
Affinity Matrix $W$:
$$w_{ij}=\exp\left(\frac{x_i^Tx_j}{\sigma\cdot||x_i||_2\cdot||x_j||_2}\right)$$
where $\sigma$ is a temperature hyper-parameter.
Transition Probability Matrix $T$ (the row-normalized $W$):
$$T_{ij}=\frac{w_{ij}}{\sum_k w_{ik}}$$
i.e., each row of $W$ is normalized to sum to one, which amounts to a row-wise softmax over the scaled cosine similarities.
Spectral Feature Transformation:
$$X'=TX$$
where $X\in\mathbb{R}^{B\times d}$ stacks the $B$ batch embeddings row-wise, so each transformed embedding becomes a similarity-weighted average over the whole batch.
Note:
To fully exploit the power of spectral clustering, the input data must satisfy the underlying cluster-structure assumption, i.e., there must be sufficient images for each identity in the training batch (in practice, the batch sampler should draw several images per identity).
The proposed spectral feature transformation is applied only during training and is discarded at inference.
Core code:
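Below is a minimal PyTorch sketch of the transformation defined above (a hand-written illustration rather than the official implementation; the function name and the default sigma=0.1 are assumptions):

```python
import torch
import torch.nn.functional as F

def spectral_feature_transform(x, sigma=0.1):
    """Spectral feature transformation of a training batch.

    x:     (B, d) batch embeddings
    sigma: temperature hyper-parameter
    Returns X' = T X.
    """
    # Scaled pairwise cosine similarities of the batch.
    x_norm = F.normalize(x, p=2, dim=1)      # (B, d), unit-norm rows
    sim = x_norm @ x_norm.t() / sigma        # (B, B)
    # Row-wise softmax: identical to building W = exp(sim) and
    # normalizing each row to sum to one, i.e. the transition matrix T.
    t = F.softmax(sim, dim=1)                # (B, B)
    return t @ x                             # (B, d) transformed embeddings
```

During training, the transformed embeddings $X'$ would then be fed to the usual classifier/loss in place of $X$; at inference the raw embeddings are used directly.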
Local Blurring Re-ranking
Given a probe image, the gallery images are first ranked according to their cosine similarity with it.
We collect the features of the top-n entries and perform the spectral feature transformation on them.
The top-n rank list is then recomputed based on the similarities derived from the transformed features (see the sketch at the end of this section).
This extension is based on the assumption that a cluster structure underlies the neighborhood of the probe image, which is exactly the case when the feature extractor has been properly trained on the training data. As the mathematical formulation of the spectral feature transformation expresses, the embedding of each data point is blurred by the others according to the similarities between them: each point is moved towards the high-density area (i.e., the cluster center) to which it has more short paths. This process is equivalent to conducting a clustering operation on the local neighbors of the probe image, so it makes the cluster structure more compact and alleviates ambiguity in retrieval.
Compared with k-reciprocal re-ranking, which operates on the whole test set, the proposed re-ranking is much more efficient since it only touches the top-n neighborhood of each probe.
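A minimal sketch of this procedure, reusing spectral_feature_transform from the training code above (the function name and the defaults n=50 and sigma=0.1 are illustrative assumptions, not the paper's settings):

```python
import torch
import torch.nn.functional as F

def local_blurring_rerank(probe, gallery, n=50, sigma=0.1):
    """Re-rank the top-n gallery entries for a single probe image.

    probe:   (d,) probe feature
    gallery: (G, d) gallery features
    Returns the gallery indices of the re-ranked top-n list.
    """
    # Step 1: initial ranking by cosine similarity with the probe.
    p = F.normalize(probe, dim=0)
    g = F.normalize(gallery, dim=1)
    topn = (g @ p).topk(n).indices                # initial top-n rank list

    # Step 2: blur the top-n features with the spectral feature transformation.
    blurred = spectral_feature_transform(gallery[topn], sigma)

    # Step 3: recompute the rank list from the transformed features.
    # (Whether the probe feature should also join the neighborhood before
    # blurring is a design choice this sketch does not pin down.)
    new_sims = F.normalize(blurred, dim=1) @ p    # (n,)
    return topn[new_sims.argsort(descending=True)]
```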