DOI: 10.1145/2964284.2964315

Deep-based Ingredient Recognition for Cooking Recipe Retrieval

Published: 01 October 2016

Abstract

Retrieving recipes that correspond to given dish pictures facilitates the estimation of nutrition facts, which is crucial to various health-related applications. Current approaches mostly focus on recognizing the food category from global dish appearance, without explicit analysis of ingredient composition. Such approaches are incapable of retrieving recipes for unknown food categories, a problem referred to as zero-shot retrieval. On the other hand, content-based retrieval without knowledge of food categories also struggles to attain satisfactory performance, due to large visual variations in food appearance and ingredient composition. As the number of ingredients is far smaller than the number of food categories, understanding the ingredients underlying dishes is in principle more scalable than recognizing every food category, and is thus suitable for zero-shot retrieval. Nevertheless, ingredient recognition is a far harder task than food categorization, which seriously challenges the feasibility of relying on ingredients for retrieval. This paper proposes deep architectures for the simultaneous learning of ingredient recognition and food categorization, exploiting the mutual but also fuzzy relationship between the two tasks. The learnt deep features and semantic ingredient labels are then applied to zero-shot retrieval of recipes. By experimenting on a large Chinese food dataset with images of highly complex dish appearance, this paper demonstrates the feasibility of ingredient recognition and sheds light on the zero-shot problem peculiar to cooking recipe retrieval.
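
To make the multi-task idea concrete, the sketch below shows one way a single-label food-category head and a multi-label ingredient head can share a convolutional trunk and be trained with a joint loss. This is a minimal illustration under assumed settings (a PyTorch-style setup, a VGG-16 trunk, and placeholder label counts NUM_CATEGORIES and NUM_INGREDIENTS); it is not the authors' architecture, which additionally exploits the relationship between categories and ingredients.

    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_CATEGORIES = 172    # placeholder: number of food categories in the training set
    NUM_INGREDIENTS = 353   # placeholder: number of distinct ingredient labels

    class MultiTaskFoodNet(nn.Module):
        """Shared CNN trunk with two heads: softmax food categorization
        and sigmoid-based multi-label ingredient recognition."""
        def __init__(self):
            super().__init__()
            backbone = models.vgg16()                 # any ImageNet-style CNN would do
            self.features = backbone.features         # shared convolutional layers
            self.shared_fc = nn.Sequential(
                nn.Flatten(),
                nn.Linear(512 * 7 * 7, 4096),
                nn.ReLU(inplace=True),
                nn.Dropout(0.5),
            )
            self.category_head = nn.Linear(4096, NUM_CATEGORIES)     # food-category logits
            self.ingredient_head = nn.Linear(4096, NUM_INGREDIENTS)  # ingredient logits

        def forward(self, x):
            shared = self.shared_fc(self.features(x))
            return self.category_head(shared), self.ingredient_head(shared)

    model = MultiTaskFoodNet()
    category_loss = nn.CrossEntropyLoss()      # single-label task
    ingredient_loss = nn.BCEWithLogitsLoss()   # multi-label task

    # Dummy batch: 224x224 RGB images, one category index per image,
    # and a 0/1 ingredient indicator vector per image.
    images = torch.randn(4, 3, 224, 224)
    category_targets = torch.randint(0, NUM_CATEGORIES, (4,))
    ingredient_targets = torch.randint(0, 2, (4, NUM_INGREDIENTS)).float()

    cat_logits, ingr_logits = model(images)
    loss = category_loss(cat_logits, category_targets) + \
           ingredient_loss(ingr_logits, ingredient_targets)
    loss.backward()

At retrieval time, the sigmoid outputs of the ingredient head yield a per-image ingredient score vector that can be matched against the ingredient lists of candidate recipes; because ingredients are shared across dishes, such matching remains possible even when the query image belongs to a food category never seen during training, which is the zero-shot setting described in the abstract.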



Published In

MM '16: Proceedings of the 24th ACM International Conference on Multimedia
October 2016
1542 pages
ISBN:9781450336031
DOI:10.1145/2964284

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

  • Best Student Paper

Author Tags

  1. food categorization
  2. ingredient recognition
  3. multi-task deep learning
  4. zero-shot retrieval

Qualifiers

  • Research-article

Funding Sources

  • National Hi-Tech Research and Development Program (863 Program) of China
  • National Natural Science Foundation of China

Conference

MM '16
Sponsor:
MM '16: ACM Multimedia Conference
October 15 - 19, 2016
Amsterdam, The Netherlands

Acceptance Rates

MM '16 paper acceptance rate: 52 of 237 submissions (22%)
Overall acceptance rate: 2,145 of 8,556 submissions (25%)

Article Metrics

  • Downloads (Last 12 months)271
  • Downloads (Last 6 weeks)32
Reflects downloads up to 23 Feb 2025

Cited By

  • (2025) An Explainable CNN and Vision Transformer-Based Approach for Real-Time Food Recognition. Nutrients, 17(2):362. DOI: 10.3390/nu17020362. Online publication date: 20-Jan-2025
  • (2025) Enhancing Food Image Recognition by Multi-Level Fusion and the Attention Mechanism. Foods, 14(3):461. DOI: 10.3390/foods14030461. Online publication date: 31-Jan-2025
  • (2025) SARI: A Stage-aware Recognition Method for Ingredients Changing Appearance in Cooking Image Sequences. Journal of Information Processing, 33:104-114. DOI: 10.2197/ipsjjip.33.104. Online publication date: 2025
  • (2025) Improving Global Generalization and Local Personalization for Federated Learning. IEEE Transactions on Neural Networks and Learning Systems, 36(1):76-87. DOI: 10.1109/TNNLS.2024.3417452. Online publication date: Jan-2025
  • (2025) Large-scale image classification and nutrient estimation for Chinese dishes. Journal of Agriculture and Food Research, 101733. DOI: 10.1016/j.jafr.2025.101733. Online publication date: Feb-2025
  • (2025) Learning complementary visual information for few-shot food recognition by Regional Erasure and Reactivation. Expert Systems with Applications, 268:126174. DOI: 10.1016/j.eswa.2024.126174. Online publication date: Apr-2025
  • (2024) Exploring Deep Learning–Based Models for Sociocultural African Food Recognition System. Human Behavior and Emerging Technologies, 2024(1). DOI: 10.1155/2024/4443316. Online publication date: 18-Sep-2024
  • (2024) Adaptive Feature Inheritance and Thresholding for Ingredient Recognition in Multimedia Cooking Instructions. Proceedings of the 6th ACM International Conference on Multimedia in Asia, 1-7. DOI: 10.1145/3696409.3700256. Online publication date: 3-Dec-2024
  • (2024) Digital Food Sensing and Ingredient Analysis Techniques to Facilitate Human-Food Interface Designs. ACM Computing Surveys, 57(1):1-39. DOI: 10.1145/3685675. Online publication date: 7-Oct-2024
  • (2024) Lightweight Food Recognition via Aggregation Block and Feature Encoding. ACM Transactions on Multimedia Computing, Communications, and Applications, 20(10):1-25. DOI: 10.1145/3680285. Online publication date: 22-Jul-2024
