DOI: 10.1145/2733373.2806242
Research article

Image Profiling for History Events on the Fly

Published: 13 October 2015

Abstract

Knowledge about historic events is precious, and imagery is a powerful medium for recording diverse information about an event. In this paper, we propose to automatically construct an image profile from a one-sentence description of a historic event that contains the where, when, who, and what elements. This simple input requirement makes our solution easy to scale and lets it support a wide range of cultural-preservation and curation applications, from Wikipedia enrichment to history education. However, history-related information on the web is available only as "wild and dirty" data, quite different from clean, manually curated, and structured information sources.

Building the proposed image profiles poses two major challenges. 1) Unconstrained image genre diversity: we categorize images into the genres of documents/maps, paintings, and photos, and genre classification involves a full spectrum of features, from low-level color to high-level semantic concepts. 2) Image content diversity: images can depict faces, objects, and scenes, and even within the same event the views and subjects of images vary, corresponding to different facets of the event. To address this, we group images at two levels of granularity: iconic image grouping and facet image grouping. These require different types of features and analysis, ranging from near-exact matching to soft semantic similarity.

We develop a full-range feature analysis module composed of several levels, each suitable for a different type of image analysis task. The features combine classical hand-crafted descriptors with activations from different layers of a convolutional neural network. We compare the performance of the different levels of the full-range features and show their effectiveness on such a wild, unconstrained dataset.
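The two-level grouping idea (strict near-exact matching for iconic groups, looser semantic similarity for facet groups) can be sketched with a toy example. The snippet below is a minimal illustration, not the authors' pipeline: it uses random synthetic vectors in place of real CNN activations or hand-crafted descriptors, and a simple greedy threshold scheme with illustrative threshold values.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two feature vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def threshold_groups(feats, tau):
    """Greedy grouping: each image joins the first existing group whose
    representative is at least tau similar, otherwise it starts a new group."""
    reps, groups = [], []
    for i, f in enumerate(feats):
        for g, r in enumerate(reps):
            if cosine_sim(f, r) >= tau:
                groups[g].append(i)
                break
        else:
            reps.append(f)
            groups.append([i])
    return groups

# Toy "features": two facets, with a near-duplicate pair inside facet 0.
rng = np.random.default_rng(0)
base_a = rng.normal(size=64)
base_b = rng.normal(size=64)
feats = [
    base_a,                               # image 0
    base_a + 0.01 * rng.normal(size=64),  # near-duplicate of image 0
    base_a + 0.6 * rng.normal(size=64),   # same facet, different view
    base_b,                               # a second facet
]

iconic = threshold_groups(feats, tau=0.98)  # strict: near-exact matches only
facet = threshold_groups(feats, tau=0.5)    # loose: soft semantic similarity
print(iconic)  # near-duplicates merge; distinct views stay separate
print(facet)   # all views of the same facet merge
```

In a real system, the strict pass would operate on matching-oriented features (e.g., local descriptors with geometric verification), while the loose pass would use semantic features such as activations from deeper CNN layers; the thresholds here are assumptions for the synthetic data.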


Cited By

  • (2019) Boolean Map Saliency: A Surprisingly Simple Method. In Visual Saliency: From Pixel-Level to Object-Level Analysis, pp. 11–31. DOI: 10.1007/978-3-030-04831-0_2. Online publication date: 22 Jan 2019.
  • (2016) History Rhyme. In Proceedings of the 24th ACM International Conference on Multimedia, pp. 749–751. DOI: 10.1145/2964284.2973832. Online publication date: 1 Oct 2016.
  • (2016) Semantic Image Profiling for Historic Events. In Proceedings of the 24th ACM International Conference on Multimedia, pp. 1028–1037. DOI: 10.1145/2964284.2964306. Online publication date: 1 Oct 2016.


    Published In

    MM '15: Proceedings of the 23rd ACM international conference on Multimedia
    October 2015
    1402 pages
    ISBN:9781450334594
    DOI:10.1145/2733373


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. history event
    2. image profiling


    Funding Sources

    • Beijing Natural Science Foundation
    • National Science Foundation (NSF)
    • Research Funds of Renmin University of China

    Conference

    MM '15: ACM Multimedia Conference
    October 26–30, 2015
    Brisbane, Australia

    Acceptance Rates

    MM '15 paper acceptance rate: 56 of 252 submissions (22%)
    Overall acceptance rate: 2,145 of 8,556 submissions (25%)

