DOI: 10.1145/2733373.2806242
Research article

Image Profiling for History Events on the Fly

Published: 13 October 2015

Abstract

Knowledge about historic events is precious, and imagery is a powerful medium for recording diverse information about an event. In this paper, we propose to automatically construct an image profile from a one-sentence description of a historic event that contains the where, when, who, and what elements. This simple input requirement makes our solution easy to scale and lets it support a wide range of cultural-preservation and curation applications, from Wikipedia enrichment to history education. However, history-related information on the web is available only as "wild and dirty" data, quite different from clean, manually curated, and structured information sources.

Building the proposed image profiles poses two major challenges. 1) Unconstrained image genre diversity: we categorize images into the genres of documents/maps, paintings, and photos, and genre classification involves a full spectrum of features, from low-level color to high-level semantic concepts. 2) Image content diversity: images can depict faces, objects, and scenes, and even within the same event the views and subjects of images vary, corresponding to different facets of the event. To address this, we group images at two levels of granularity: iconic image grouping and facet image grouping. These require different types of features and analysis, ranging from near-exact matching to soft semantic similarity.

We develop a full-range feature analysis module composed of several levels, each suitable for a different type of image analysis task. The features combine classical hand-crafted descriptors with activations from different layers of a convolutional neural network. We compare the performance of the different levels of the full-range features and show their effectiveness on such a wild, unconstrained dataset.
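The two-level grouping idea (strict near-exact matching for iconic groups, looser semantic similarity for facet groups) can be sketched with a toy example. The snippet below is a minimal illustration, not the authors' pipeline: it uses random synthetic vectors in place of real CNN activations or hand-crafted descriptors, and a simple greedy threshold scheme with illustrative threshold values.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two feature vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def threshold_groups(feats, tau):
    """Greedy grouping: each image joins the first existing group whose
    representative is at least tau similar, otherwise it starts a new group."""
    reps, groups = [], []
    for i, f in enumerate(feats):
        for g, r in enumerate(reps):
            if cosine_sim(f, r) >= tau:
                groups[g].append(i)
                break
        else:
            reps.append(f)
            groups.append([i])
    return groups

# Toy "features": two facets, with a near-duplicate pair inside facet 0.
rng = np.random.default_rng(0)
base_a = rng.normal(size=64)
base_b = rng.normal(size=64)
feats = [
    base_a,                               # image 0
    base_a + 0.01 * rng.normal(size=64),  # near-duplicate of image 0
    base_a + 0.6 * rng.normal(size=64),   # same facet, different view
    base_b,                               # a second facet
]

iconic = threshold_groups(feats, tau=0.98)  # strict: near-exact matches only
facet = threshold_groups(feats, tau=0.5)    # loose: soft semantic similarity
print(iconic)  # near-duplicates merge; distinct views stay separate
print(facet)   # all views of the same facet merge
```

In a real system, the strict pass would operate on matching-oriented features (e.g., local descriptors with geometric verification), while the loose pass would use semantic features such as activations from deeper CNN layers; the thresholds here are assumptions for the synthetic data.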


Cited By

  • (2019) Boolean Map Saliency: A Surprisingly Simple Method. In Visual Saliency: From Pixel-Level to Object-Level Analysis, pp. 11–31. DOI: 10.1007/978-3-030-04831-0_2. Online publication date: 22 Jan 2019.
  • (2016) History Rhyme. In Proceedings of the 24th ACM International Conference on Multimedia, pp. 749–751. DOI: 10.1145/2964284.2973832. Online publication date: 1 Oct 2016.
  • (2016) Semantic Image Profiling for Historic Events. In Proceedings of the 24th ACM International Conference on Multimedia, pp. 1028–1037. DOI: 10.1145/2964284.2964306. Online publication date: 1 Oct 2016.


    Published In

    MM '15: Proceedings of the 23rd ACM international conference on Multimedia
    October 2015
    1402 pages
    ISBN:9781450334594
    DOI:10.1145/2733373


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. history event
    2. image profiling


    Funding Sources

    • Beijing Natural Science Foundation
    • National Science Foundation (NSF)
    • Research Funds of Renmin University of China

    Conference

    MM '15: ACM Multimedia Conference
    October 26–30, 2015
    Brisbane, Australia

    Acceptance Rates

    MM '15 paper acceptance rate: 56 of 252 submissions (22%)
    Overall acceptance rate: 2,145 of 8,556 submissions (25%)

