Rethinking Cooking State Recognition with Vision Transformers

Khan, Akib Mohammed; Ashrafee, Alif; Sayera, Reeshoon; Ivan, Shahriar; Ahmed, Sabbir

doi:10.1109/ICCIT57492.2022.10055869

arXiv:2212.08586 (cs)

[Submitted on 16 Dec 2022 (v1), last revised 24 Dec 2022 (this version, v2)]

Abstract: To ensure proper knowledge representation of the kitchen environment, it is vital for kitchen robots to recognize the states of the food items being cooked. Although object detection and recognition have been studied extensively, the task of object state classification remains relatively unexplored. The high intra-class similarity of ingredients across different cooking states makes the task even more challenging. Researchers have recently proposed deep-learning-based strategies; however, these have yet to achieve high performance. In this study, we utilize the self-attention mechanism of the Vision Transformer (ViT) architecture for the Cooking State Recognition task. The proposed approach captures the globally salient features of images while also exploiting weights learned from a larger dataset. This global attention allows the model to withstand the similarities between samples of different cooking objects, while transfer learning helps to overcome the lack of inductive bias by utilizing pretrained weights. To improve recognition accuracy, several augmentation techniques are employed as well. Evaluated on the 'Cooking State Recognition Challenge Dataset', our proposed framework achieves an accuracy of 94.3%, which significantly outperforms the state-of-the-art.
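As an illustration of the global attention the abstract refers to, the following is a minimal NumPy sketch of single-head self-attention over image patches. It is not the authors' implementation: the image size, patch size, embedding width, and randomly initialized weights are all assumptions chosen for brevity, and it omits the class token, positional embeddings, multi-head projection, and pretrained weights that a real ViT would use.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    # (gh, patch, gw, patch, c) -> (gh, gw, patch, patch, c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def self_attention(tokens, w_q, w_k, w_v):
    """Scaled dot-product self-attention: every token attends to every token."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable row-wise softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Hypothetical sizes: a 32x32 RGB image, 8x8 patches, 16-dim embeddings.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
patches = patchify(image, patch_size=8)

dim = 16
w_embed = rng.normal(scale=0.02, size=(patches.shape[1], dim))
tokens = patches @ w_embed  # linear patch embedding

# Random stand-ins for the learned query/key/value projections.
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)
```

Each row of `weights` is a distribution over all patches, which is why a single attention layer can relate distant regions of the image, in contrast to the local receptive field of a convolution.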

Comments:	Accepted at the 25th ICCIT (6 pages, 5 figures, 5 tables)
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Report number:	10055869
Cite as:	arXiv:2212.08586 [cs.CV]
	(or arXiv:2212.08586v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2212.08586
Journal reference:	2022 25th International Conference on Computer and Information Technology (ICCIT)
Related DOI:	https://doi.org/10.1109/ICCIT57492.2022.10055869

Submission history

From: Sabbir Ahmed
[v1] Fri, 16 Dec 2022 17:06:28 UTC (8,732 KB)
[v2] Sat, 24 Dec 2022 06:32:50 UTC (8,732 KB)
