Abstract
Learning from limited data is challenging because data scarcity leads to poor generalization of the trained model. A classical globally pooled representation is likely to discard useful local information. Many recent few-shot learning methods address this challenge by using deep descriptors and learning a pixel-level metric. However, using deep descriptors as feature representations may discard the contextual information of the image. Moreover, most of these methods deal with each class in the support set independently, which cannot fully exploit discriminative information and task-specific embeddings. In this paper, we propose a novel transformer-based neural network architecture called sparse spatial transformers (SSFormers), which finds task-relevant features and suppresses task-irrelevant ones. Specifically, we first divide each input image into several image patches of different sizes to obtain dense local features. These features retain contextual information while expressing local information. Then, a sparse spatial transformer layer is proposed to find spatial correspondence between the query image and the full support set, selecting task-relevant image patches and suppressing task-irrelevant ones. Finally, we propose an image patch-matching module to calculate the distance between dense local representations, thereby determining which category in the support set the query image belongs to. Extensive experiments on popular few-shot learning benchmarks demonstrate the superiority of our method over state-of-the-art methods.
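To make the three stages concrete, the following is a minimal PyTorch sketch of the pipeline described above: dense local features from multi-size patches, sparse attention of query patches over the whole support set, and patch-level matching. It is an illustration under our own simplifying assumptions, not the authors' released implementation: the function names (extract_patch_features, sparse_spatial_attention, patch_match_score), the keep_ratio top-k selection rule, and grid pooling of the backbone feature map as a stand-in for multi-size patch division are all hypothetical choices.

import torch
import torch.nn.functional as F

def extract_patch_features(images, encoder, patch_sizes=(2, 4)):
    # Encode images, then pool the feature map at several grid sizes as a
    # stand-in for dividing the input into patches of different sizes.
    feats = encoder(images)                       # (B, C, H, W)
    patches = []
    for p in patch_sizes:
        pooled = F.adaptive_avg_pool2d(feats, p)  # (B, C, p, p)
        patches.append(pooled.flatten(2))         # (B, C, p*p)
    return torch.cat(patches, dim=-1)             # (B, C, M) dense local features

def sparse_spatial_attention(query_patches, support_patches, keep_ratio=0.5):
    # Cross-attend each query patch over ALL support patches of the task,
    # then keep only the most task-relevant query patches (hypothetical
    # top-k rule standing in for the paper's sparse selection).
    # query_patches: (C, Mq); support_patches: (C, Ms), pooled over the task.
    scale = query_patches.shape[0] ** 0.5
    attn = torch.softmax(query_patches.t() @ support_patches / scale, dim=-1)
    aligned = attn @ support_patches.t()          # (Mq, C) task-aligned features
    relevance = attn.max(dim=-1).values           # per-patch task relevance
    k = max(1, int(keep_ratio * relevance.numel()))
    idx = relevance.topk(k).indices               # indices of task-relevant patches
    return query_patches.t()[idx], aligned[idx]   # selected query / aligned patches

def patch_match_score(query_sel, class_patches):
    # Patch matching: each selected query patch takes its best cosine match
    # among one class's support patches; scores are averaged per class.
    q = F.normalize(query_sel, dim=-1)            # (K, C)
    s = F.normalize(class_patches.t(), dim=-1)    # (Ms, C)
    sim = q @ s.t()                               # (K, Ms)
    return sim.max(dim=-1).values.mean()          # higher = closer class

In an N-way task, one would compute patch_match_score between the selected query patches and each class's support patches, then assign the query to the class with the highest score.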
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 62176116, 62073160, 62276136) and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 20KJA520006).
Cite this article
Chen, H., Li, H., Li, Y. et al. Sparse spatial transformers for few-shot learning. Sci. China Inf. Sci. 66, 210102 (2023). https://doi.org/10.1007/s11432-022-3700-8