DOI: 10.1145/2964284.2964315

Deep-based Ingredient Recognition for Cooking Recipe Retrieval

Published: 01 October 2016

Abstract

Retrieving recipes that correspond to given dish pictures facilitates the estimation of nutrition facts, which is crucial to various health-related applications. Current approaches mostly focus on recognizing the food category from global dish appearance, without explicit analysis of ingredient composition. Such approaches are incapable of retrieving recipes for unknown food categories, a problem referred to as zero-shot retrieval. On the other hand, content-based retrieval without knowledge of food categories also struggles to attain satisfactory performance, due to large visual variations in food appearance and ingredient composition. As the number of ingredients is far smaller than the number of food categories, understanding the ingredients underlying dishes is in principle more scalable than recognizing every food category, and is thus suitable for zero-shot retrieval. Nevertheless, ingredient recognition is a far harder task than food categorization, which seriously challenges the feasibility of relying on ingredients for retrieval. This paper proposes deep architectures for the simultaneous learning of ingredient recognition and food categorization, exploiting the mutual but also fuzzy relationship between the two tasks. The learnt deep features and semantic ingredient labels are then applied to zero-shot retrieval of recipes. By experimenting on a large Chinese food dataset with images of highly complex dish appearance, this paper demonstrates the feasibility of ingredient recognition and sheds light on the zero-shot problem peculiar to cooking recipe retrieval.
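
To make the multi-task idea concrete, the sketch below shows one way a single-label food-category head and a multi-label ingredient head can share a convolutional trunk and be trained with a joint loss. This is a minimal illustration under assumed settings (a PyTorch-style setup, a VGG-16 trunk, and placeholder label counts NUM_CATEGORIES and NUM_INGREDIENTS); it is not the authors' architecture, which additionally exploits the relationship between categories and ingredients.

    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_CATEGORIES = 172    # placeholder: number of food categories in the training set
    NUM_INGREDIENTS = 353   # placeholder: number of distinct ingredient labels

    class MultiTaskFoodNet(nn.Module):
        """Shared CNN trunk with two heads: softmax food categorization
        and sigmoid-based multi-label ingredient recognition."""
        def __init__(self):
            super().__init__()
            backbone = models.vgg16()                 # any ImageNet-style CNN would do
            self.features = backbone.features         # shared convolutional layers
            self.shared_fc = nn.Sequential(
                nn.Flatten(),
                nn.Linear(512 * 7 * 7, 4096),
                nn.ReLU(inplace=True),
                nn.Dropout(0.5),
            )
            self.category_head = nn.Linear(4096, NUM_CATEGORIES)     # food-category logits
            self.ingredient_head = nn.Linear(4096, NUM_INGREDIENTS)  # ingredient logits

        def forward(self, x):
            shared = self.shared_fc(self.features(x))
            return self.category_head(shared), self.ingredient_head(shared)

    model = MultiTaskFoodNet()
    category_loss = nn.CrossEntropyLoss()      # single-label task
    ingredient_loss = nn.BCEWithLogitsLoss()   # multi-label task

    # Dummy batch: 224x224 RGB images, one category index per image,
    # and a 0/1 ingredient indicator vector per image.
    images = torch.randn(4, 3, 224, 224)
    category_targets = torch.randint(0, NUM_CATEGORIES, (4,))
    ingredient_targets = torch.randint(0, 2, (4, NUM_INGREDIENTS)).float()

    cat_logits, ingr_logits = model(images)
    loss = category_loss(cat_logits, category_targets) + \
           ingredient_loss(ingr_logits, ingredient_targets)
    loss.backward()

At retrieval time, the sigmoid outputs of the ingredient head yield a per-image ingredient score vector that can be matched against the ingredient lists of candidate recipes; because ingredients are shared across dishes, such matching remains possible even when the query image belongs to a food category never seen during training, which is the zero-shot setting described in the abstract.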



Published In

MM '16: Proceedings of the 24th ACM International Conference on Multimedia
October 2016
1542 pages
ISBN:9781450336031
DOI:10.1145/2964284

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

  • Best Student Paper

Author Tags

  1. food categorization
  2. ingredient recognition
  3. multi-task deep learning
  4. zero-shot retrieval

Qualifiers

  • Research-article

Funding Sources

  • National Hi-Tech Research and Development Program (863 Program) of China
  • National Natural Science Foundation of China

Conference

MM '16
Sponsor:
MM '16: ACM Multimedia Conference
October 15 - 19, 2016
Amsterdam, The Netherlands

Acceptance Rates

MM '16 paper acceptance rate: 52 of 237 submissions (22%)
Overall acceptance rate: 2,145 of 8,556 submissions (25%)

Article Metrics

  • Downloads (Last 12 months)271
  • Downloads (Last 6 weeks)32
Reflects downloads up to 23 Feb 2025

Cited By

  • (2025) An Explainable CNN and Vision Transformer-Based Approach for Real-Time Food Recognition. Nutrients, 17(2):362. DOI: 10.3390/nu17020362. Online publication date: 20-Jan-2025
  • (2025) Enhancing Food Image Recognition by Multi-Level Fusion and the Attention Mechanism. Foods, 14(3):461. DOI: 10.3390/foods14030461. Online publication date: 31-Jan-2025
  • (2025) SARI: A Stage-aware Recognition Method for Ingredients Changing Appearance in Cooking Image Sequences. Journal of Information Processing, 33:104-114. DOI: 10.2197/ipsjjip.33.104. Online publication date: 2025
  • (2025) Improving Global Generalization and Local Personalization for Federated Learning. IEEE Transactions on Neural Networks and Learning Systems, 36(1):76-87. DOI: 10.1109/TNNLS.2024.3417452. Online publication date: Jan-2025
  • (2025) Large-scale image classification and nutrient estimation for Chinese dishes. Journal of Agriculture and Food Research, 101733. DOI: 10.1016/j.jafr.2025.101733. Online publication date: Feb-2025
  • (2025) Learning complementary visual information for few-shot food recognition by Regional Erasure and Reactivation. Expert Systems with Applications, 268:126174. DOI: 10.1016/j.eswa.2024.126174. Online publication date: Apr-2025
  • (2024) Exploring Deep Learning–Based Models for Sociocultural African Food Recognition System. Human Behavior and Emerging Technologies, 2024(1). DOI: 10.1155/2024/4443316. Online publication date: 18-Sep-2024
  • (2024) Adaptive Feature Inheritance and Thresholding for Ingredient Recognition in Multimedia Cooking Instructions. Proceedings of the 6th ACM International Conference on Multimedia in Asia, 1-7. DOI: 10.1145/3696409.3700256. Online publication date: 3-Dec-2024
  • (2024) Digital Food Sensing and Ingredient Analysis Techniques to Facilitate Human-Food Interface Designs. ACM Computing Surveys, 57(1):1-39. DOI: 10.1145/3685675. Online publication date: 7-Oct-2024
  • (2024) Lightweight Food Recognition via Aggregation Block and Feature Encoding. ACM Transactions on Multimedia Computing, Communications, and Applications, 20(10):1-25. DOI: 10.1145/3680285. Online publication date: 22-Jul-2024
