Paper Reading on Image Retrieval


0. Some Basics

0.1 SIFT Features

Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2), 91–110 (2004)

0.2 Fisher Kernel

Perronnin, F., Liu, Y., Sanchez, J., Poirier, H.: Large-scale image retrieval with compressed Fisher vectors. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. pp. 3384–3391. IEEE (2010)

0.3 Product Quantization (PQ)

0.4 VLAD Features

Jégou, H., Douze, M., Schmid, C., & Pérez, P. (2010). Aggregating local descriptors into a compact image representation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 3304–3311. link

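The paper addresses image search at very large scale, where three constraints must be considered jointly: the accuracy of the search, its efficiency, and the memory usage of the representation. It first proposes a simple yet effective way of aggregating local image descriptors into a vector of limited dimension, which can be viewed as a simplification of the Fisher kernel representation. It then shows how to jointly optimize the dimensionality reduction and the indexing algorithm so that the quality of vector comparison is preserved as well as possible. The evaluation shows that the approach significantly outperforms the state of the art: with an image representation of only 20 bytes, search accuracy is comparable to the bag-of-features approach. Searching a dataset of 10 million images takes about 50 ms.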
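The aggregation step summarized above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the centroids have already been learned (for example with k-means), uses hard nearest-centroid assignment, and applies a single global L2 normalization. The function name `vlad` and the toy inputs are hypothetical.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one vector of length k * d (sketch)."""
    k, d = len(centroids), len(centroids[0])
    residual_sums = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest centroid (squared L2 distance).
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        # Accumulate the residual x - c for that centroid.
        for j in range(d):
            residual_sums[nearest][j] += x[j] - centroids[nearest][j]
    # Concatenate the per-centroid sums, then L2-normalize the whole vector.
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy usage: three 2-D descriptors, two assumed centroids.
toy_centroids = [[0.0, 0.0], [10.0, 10.0]]
toy_descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
print(vlad(toy_descriptors, toy_centroids))
```

The output dimension depends only on the number of centroids and the descriptor dimension, not on how many local descriptors the image has, which is what makes the representation compact enough for subsequent dimensionality reduction and indexing.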

0.5 Locality-Sensitive Hashing

1. SIFT Meets CNN: A Decade Survey of Instance Retrieval


Zheng, L., Yang, Y., & Tian, Q. (2018). SIFT Meets CNN: A Decade Survey of Instance Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(5), 1224–1244. link

1.1 Introduction

1.2 SIFT-based

1.2.1 Pipeline

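1.2.2 Local Feature Extraction

1.2.3 Retrieval with Small Codebooks

1.2.4 Retrieval with Large Codebooks

1.2.5 Retrieval with Medium-sized Codebooks

1.2.6 Other Issues

1.3 CNN-based Image Retrieval


1.3.1 Using Pre-trained CNN Models

1.3.2 Fine-tuning CNNs

1.3.3 Hybrid Methods

1.3.4 Discussion

1.4 Experimental Comparison

1.4.1 Image Retrieval Datasets

1.4.2 Evaluation Metrics

1.4.3 Comparison and Analysis

1.5 Future Directions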

2. All about VLAD

Arandjelovic, R., & Zisserman, A. (2013). All about VLAD. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1578–1585. link

2.1 Introduction


2.2 VLAD

2.3 Vocabulary Adaptation

2.4 Intra-normalization

2.5 Multi-VLAD

3. NetVLAD: CNN architecture for weakly supervised place recognition

Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., & Sivic, J. (2018). NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6), 1437–1451. link

3.1 Introduction

3.2 Method

3.3 Architecture

3.4 Weakly Supervised Learning

4. Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization

Liu, L., Li, H., & Dai, Y. (2019). Stochastic attraction-repulsion embedding for large scale image localization. Proceedings of the IEEE International Conference on Computer Vision, 2019-October, 2570–2579. link

4.1 Introduction

4.2 Method

4.3 Comparison of Losses

4.4 Multiple Negatives

5. Self-supervising Fine-grained Region Similarities for Large-scale Image Localization

Ge, Y., Wang, H., Zhu, F., Zhao, R., & Li, H. (2020). Self-supervising Fine-grained Region Similarities for Large-scale Image Localization. 1–18. Retrieved from link

5.1 Introduction

5.2 Method

5.2.1 Retrieval-based IBL

5.2.2 Self-supervised Query-Gallery Similarities

5.2.3 Fine-grained Image-to-Region Similarities

$$
\begin{aligned}
S^r_{\theta_\omega}(q,p_1,\cdots,p_k;\tau_\omega) &= \mathrm{softmax}\Big([\langle f_{\theta_\omega}^q, f_{\theta_\omega}^{p_1}\rangle/\tau_\omega,\langle f_{\theta_\omega}^q, f_{\theta_\omega}^{r_1^1}\rangle/\tau_\omega, \cdots,\langle f_{\theta_\omega}^q, f_{\theta_\omega}^{r_1^8}\rangle/\tau_\omega,\\
& \cdots, \langle f_{\theta_\omega}^q, f_{\theta_\omega}^{p_k}\rangle/\tau_\omega,\langle f_{\theta_\omega}^q, f_{\theta_\omega}^{r_k^1}\rangle/\tau_\omega, \cdots,\langle f_{\theta_\omega}^q, f_{\theta_\omega}^{r_k^8}\rangle/\tau_\omega]^T\Big)
\end{aligned}
$$
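To make the softmax above concrete, here is a minimal sketch of how the image-to-region similarity vector could be computed from precomputed features. It is an illustration under stated assumptions, not the authors' code: features are plain Python lists, the inner product is an unnormalized dot product, and the names `image_to_region_similarities`, `positives`, and `regions` are hypothetical. The formula uses eight regions per gallery image; the sketch accepts any number.

```python
import math


def dot(u, v):
    """Inner product <u, v> of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def image_to_region_similarities(query, positives, regions, temperature):
    """Softmax over query-to-image and query-to-region scores (sketch).

    query:       feature vector of the query image q
    positives:   list of k feature vectors, one per gallery image p_i
    regions:     list of k lists; regions[i] holds the region features of p_i
    temperature: the softmax temperature tau (must be positive)
    """
    logits = []
    for p, regs in zip(positives, regions):
        # Order matches the formula: full image p_i first, then its regions.
        logits.append(dot(query, p) / temperature)
        logits.extend(dot(query, r) / temperature for r in regs)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy usage: k = 2 gallery images with two regions each (assumed values).
q = [1.0, 0.0]
ps = [[1.0, 0.0], [0.0, 1.0]]
rs = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
print(image_to_region_similarities(q, ps, rs, temperature=1.0))
```

A smaller temperature sharpens the distribution toward the best-matching image or region, while a larger one flattens it.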