Abstract
We present a probabilistic generative model for simultaneously recognizing daily actions and predicting gaze locations in videos recorded from an egocentric camera. We focus on activities requiring eye-hand coordination and model the spatio-temporal relationship between the gaze point, the scene objects, and the action label. Our model captures the fact that the distribution of both visual features and object occurrences in the vicinity of the gaze point is correlated with the verb-object pair describing the action. It explicitly incorporates known properties of gaze behavior from the psychology literature, such as the temporal delay between fixation and manipulation events. We present an inference method that can predict the best sequence of gaze locations and the associated action label from an input sequence of images. We demonstrate improvements in action recognition rates and gaze prediction accuracy relative to state-of-the-art methods on two new datasets that contain egocentric videos of daily activities and gaze.
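To make the joint inference idea concrete, the following is a minimal sketch, not the authors' implementation, of one way to couple action recognition with gaze prediction: for each candidate action label, a Viterbi pass recovers the highest-scoring gaze track under per-frame, action-conditioned appearance log-likelihoods and a smooth-motion prior, and the action with the best joint score wins. All names here (viterbi_gaze_track, recognize_action, appearance_loglik, motion_scale) are illustrative assumptions; the simple Gaussian motion penalty stands in for the paper's richer spatio-temporal model and omits, e.g., the fixation-manipulation delay.

```python
import numpy as np

def viterbi_gaze_track(frame_loglik, centers, motion_scale=20.0):
    """Best gaze-cell sequence under per-frame likelihoods plus smooth motion.

    frame_loglik : (T, K) array, log p(features_t | gaze = cell k, action)
    centers      : (K, 2) array, pixel coordinates of K candidate gaze cells
    """
    T, K = frame_loglik.shape
    # Transition log-prior: gaze tends to move smoothly between frames,
    # so penalize large jumps by squared pixel distance.
    d2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    trans = -d2 / (2.0 * motion_scale ** 2)           # (K_prev, K_next)

    score = frame_loglik[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans                 # rows: prev, cols: next
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + frame_loglik[t]

    # Backtrack the MAP gaze sequence and return it with its joint log-score.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(score.max())

def recognize_action(appearance_loglik, centers, action_log_prior):
    """Pick the (action, gaze track) pair maximizing the joint log-score.

    appearance_loglik : dict mapping action label -> (T, K) log-likelihoods
    action_log_prior  : dict mapping action label -> log prior probability
    """
    best = None
    for action, loglik in appearance_loglik.items():
        track, s = viterbi_gaze_track(loglik, centers)
        s += action_log_prior.get(action, 0.0)
        if best is None or s > best[2]:
            best = (action, track, s)
    return best  # (label, gaze track as cell indices, joint log-score)
```

In this simplification the action label enters only through the appearance likelihoods and prior; the paper's model additionally ties the action to object occurrences around the gaze point, which would amount to extra per-frame terms in frame_loglik.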
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fathi, A., Li, Y., Rehg, J.M. (2012). Learning to Recognize Daily Actions Using Gaze. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33718-5_23
DOI: https://doi.org/10.1007/978-3-642-33718-5_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33717-8
Online ISBN: 978-3-642-33718-5