Abstract
We address the problem of fine-grained image recognition using user click data, wherein each image is represented as a semantic query-click feature vector. The query set obtained from search engines is usually large-scale and redundant, which makes the click feature high-dimensional and sparse. We propose a novel query modeling approach that merges semantically similar queries and constructs a compact click feature from the merged queries. To deal with the sparsity and inconsistency of the click feature, we design a graph-based propagation approach to predict the zero-clicks, ensuring that similar images have similar clicks for each query. Afterwards, using the propagated click feature, we formulate the query merging problem as a sparse coding based recognition task, in which the hot queries are used to construct the dictionary. We evaluate our method for fine-grained image recognition on the public Clickture-Dog dataset. The results show that the propagated click feature performs much better than the original one. In the query merging procedure, sparse coding performs better than the traditional K-means algorithm, and the “hot queries” outperform K-SVD in dictionary learning.
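The two steps named in the abstract, click propagation over an image graph and query merging by sparse coding over a hot-query dictionary, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch swaps in a standard closed-form graph propagation, F = (1 - alpha)(I - alpha S)^-1 Y, and a plain ISTA solver for the L1-regularized coding problem. The function names, the cosine kNN graph, the definition of "hot" as "most clicked", and the parameters `k`, `lam`, `n_hot` and `n_iter` are all assumptions made for illustration.

```python
import numpy as np


def propagate_clicks(clicks, visual_feats, alpha=0.9, k=2):
    """Smooth an (images x queries) click matrix over a kNN image graph.

    Assumed formulation (not taken from the paper): closed-form propagation
    F = (1 - alpha) * (I - alpha * S)^-1 * Y with a normalized affinity S.
    """
    n = clicks.shape[0]
    # Cosine similarity between images, computed from their visual features.
    norms = np.linalg.norm(visual_feats, axis=1, keepdims=True) + 1e-12
    unit = visual_feats / norms
    sim = np.clip(unit @ unit.T, 0.0, None)
    np.fill_diagonal(sim, 0.0)
    # Keep the k strongest neighbours of each image, then symmetrize.
    W = np.zeros_like(sim)
    idx = np.argsort(-sim, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    W[rows, idx.ravel()] = sim[rows, idx.ravel()]
    W = np.maximum(W, W.T)
    # Symmetric normalization: S = D^-1/2 W D^-1/2.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, clicks)


def merge_queries(clicks, n_hot=2, lam=0.1, n_iter=200):
    """Assign each query to one hot query via sparse coding and sum its clicks.

    Assumptions: "hot" means highest total click count; each query is coded
    over the hot-query dictionary with ISTA and joins its strongest atom.
    """
    # Each column describes one query by its clicks over all images.
    Q = clicks / (np.linalg.norm(clicks, axis=0, keepdims=True) + 1e-12)
    hot = np.argsort(-clicks.sum(axis=0))[:n_hot]
    D = Q[:, hot]
    # ISTA for min_Z 0.5 * ||Q - D Z||_F^2 + lam * ||Z||_1.
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)
    Z = np.zeros((n_hot, Q.shape[1]))
    for _ in range(n_iter):
        G = Z - step * (D.T @ (D @ Z - Q))
        Z = np.sign(G) * np.maximum(np.abs(G) - step * lam, 0.0)
    assign = np.argmax(np.abs(Z), axis=0)
    # Compact click feature: one column per hot query.
    merged = np.zeros((clicks.shape[0], n_hot))
    for j, a in enumerate(assign):
        merged[:, a] += clicks[:, j]
    return merged, assign


if __name__ == "__main__":
    feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    clicks = np.array([[10.0, 2.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0, 0.0],   # an image with no recorded clicks
                       [0.0, 0.0, 9.0, 3.0],
                       [0.0, 0.0, 7.0, 2.0]])
    smoothed = propagate_clicks(clicks, feats, alpha=0.9, k=1)
    merged, assign = merge_queries(smoothed, n_hot=2)
    print(assign)
    print(merged.round(2))
```

In this toy run the second image has no recorded clicks but sits next to the first image in the graph, so propagation gives it nonzero values for the same queries; merging then collapses the four queries into two columns, one per hot query. Larger values of `alpha` give more weight to the neighbours' clicks relative to an image's own observed clicks.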




Notes
The optimal α is 0.9 for Prop-E and 0.5 for Prop-W.
We use the 16-layer VGG-net [13], which has 13 convolutional layers and 3 fully connected layers, to learn a CNN model. It is pre-trained on the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) 2012 dataset.
References
Berg T, Liu J, Lee SW, Alexander ML, Jacobs DW, Belhumeur PN (2014) Birdsnap: large-scale fine-grained visual categorization of birds. In: IEEE Conference on computer vision and pattern recognition, pp 2019–2026
Chang YS (2017) Fine-grained attention for image caption generation. Multimed Tool Appl PP(7):1–13
Cilibrasi RL, Vitanyi P (2007) The google similarity distance. IEEE Trans Knowl Data Eng 19(3):370–383
Datta D, Singh SK, Chowdary CR (2017) Bridging the gap: effect of text query reformulation in multimodal retrieval. Multimed Tool Appl 76:1–18
Feng L, Bhanu B (2016) Semantic concept co-occurrence patterns for image annotation and retrieval. IEEE Trans Pattern Anal Mach Intell 38(4):1–1
Hua XS, Yang L, Wang J, Wang J, Ye M, Wang K, Rui Y, Li J (2013) Clickage: towards bridging semantic and intent gaps via mining click logs of search engines. In: ACM International conference on multimedia. ACM, pp 243–252
Khosla A, Jayadevaprakash N, Yao B, Fei-Fei L (2011) Novel dataset for fine-grained image categorization. In: First workshop on fine-grained visual categorization, IEEE conference on computer vision and pattern recognition. Colorado Springs, CO
Li C, Song Q, Wang Y, Song H, Kang Q, Cheng J, Lu H (2016) Learning to recognition from bing clickture data. In: IEEE International conference on multimedia and expo workshops, pp 1–4
Liu T, Tao D (2016) Classification with noisy labels by importance reweighting. IEEE Trans Pattern Anal Mach Intell 38(3):447–461
Nie L, Wang M, Zha Z, Li G, Chua TS (2011) Multimedia answering: enriching text QA with media information. In: ACM SIGIR Conference on research and development in information retrieval, SIGIR '11. ACM, pp 695–704
Nie L, Wang M, Zha ZJ, Chua TS (2012) Oracle in image search: a content-based approach to performance prediction. ACM Trans Inf Syst 30(2):13:1–13:23
Nie L, Yan S, Wang M, Hong R, Chua TS (2012) Harvesting visual concepts for image search with complex queries. In: ACM International conference on multimedia, MM’12. ACM, pp 59–68
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
Tan M, Wang Y, Pan G (2012) Feature reduction for efficient object detection via L1-norm latent SVM. In: Intelligent science and intelligent data engineering
Tan M, Pan G, Wang Y, Zhang Y, Wu Z (2014) L1-norm latent svm for compact features in object detection. Neurocomputing 139(139):56–64
Tan M, Hu Z, Wang B, Zhao J, Wang Y (2016) Robust object recognition via weakly supervised metric and template learning. Neurocomputing 101:96–107
Tan M, Wang B, Wu Z, Wang J, Pan G (2016) Weakly supervised metric learning for traffic sign recognition in a lidar-equipped vehicle. IEEE Trans Intell Transp Syst 17(5):1415–1427. https://doi.org/10.1109/TITS.2015.2506182
Tan M, Yu J, Zheng G, Wu W, Sun K (2016) Deep neural network boosted large scale image recognition using user click data. In: International conference on internet multimedia computing and service, pp 118–121
Lin TY, RoyChowdhury A, Maji S (2015) Bilinear CNN models for fine-grained visual recognition. In: IEEE International conference on computer vision
Wang R, Liu T, Tao D (2017) Multiclass learning with partially corrupted labels. IEEE Trans Neural Netw Learn Syst PP(99):1–13
Yan Y, Nie F, Li W, Gao C, Yang Y, Xu D (2016) Image classification by cross-media active learning with privileged information. IEEE Trans Multimedia 18(12):2494–2502
Yan C, Luo M, Liu W, Zheng Q (2017) Robust dictionary learning with graph regularization for unsupervised person re-identification. Multimed Tool Appl (2):1–25
Yang Y, Nie F, Xu D, Luo J, Zhuang Y, Pan Y (2012) A multimedia retrieval framework based on semi-supervised ranking and relevance feedback. IEEE Trans Pattern Anal Mach Intell 34(4):723–742
Yu J, Wang M, Tao D (2012) Semisupervised multiview distance metric learning for cartoon synthesis. IEEE Trans Image Process 21(11):4636–4648
Yu J, Rui Y, Chen B (2014) Exploiting click constraints and multi-view features for image re-ranking. IEEE Trans Multimedia 16(1):159–168
Yu J, Rui Y, Tao D (2014) Click prediction for web image reranking using multimodal sparse coding. IEEE Trans Image Process 23(5):2019–2032
Yu J, Tao D, Meng W, Yong R (2015) Learning to rank using user clicks and visual features for image retrieval. IEEE Trans Cybern 45(4):767–779
Zhang H, Zha ZJ, Yang Y, Yan S, Chua TS (2014) Robust (semi) nonnegative graph embedding. IEEE Trans Image Process 23(7):2996–3012
Zhang H, Zha ZJ, Yang Y, Yan S, Gao Y, Chua TS (2014) Attribute-augmented semantic hierarchy: towards a unified framework for content-based image retrieval. ACM Trans Multimed Comput Commun Appl 11(1s):1–21
Zhang J, Nie L, Wang X, He X, Huang X, Chua TS (2016) Shorter-is-better: venue category estimation from micro-video. In: ACM On multimedia conference, pp 1415–1424
Zhang Y, Wei XS, Wu J, Cai J (2016) Weakly supervised fine-grained categorization with part-based image representation. IEEE Trans Image Process 25(4):1713–1725
Zhang H, Huang Y, Xu X, Zhu Z, Deng C (2017) Latent semantic factorization for multimedia representation learning. Multimed Tool Appl (1):1–16
Zheng G, Tan M, Yu J, Wu Q, Fan J (2017) Fine-grained image recognition via weakly supervised click data guided bilinear CNN model. In: IEEE International conference on multimedia and expo (accepted). IEEE
Acknowledgments
This work was partly supported by the National Natural Science Foundation of China (No. 61602136, No. 61622205, No. 61472110) and the Zhejiang Provincial Natural Science Foundation of China under Grant LR15F020002.
Cite this article
Tan, M., Yu, J., Huang, Q. et al. Click data guided query modeling with click propagation and sparse coding. Multimed Tools Appl 77, 22145–22158 (2018). https://doi.org/10.1007/s11042-018-5703-4
DOI: https://doi.org/10.1007/s11042-018-5703-4