Multivariate mutual information for audio video fusion

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

Information fusion is an essential part of distributed wireless sensor networks as well as perceptual user interfaces. Irrelevant and redundant data severely degrade the performance of the information fusion process. In this paper, a method based on multivariate mutual information is presented to validate the acceptability of data from two sources (visual and auditory). To validate the algorithm, the audiovisual information is fused to observe the ventriloquism effect. Unlike preceding algorithms, this framework does not require any preprocessing such as automatic face recognition. Moreover, neither statistical modeling nor feature extraction and learning algorithms are required to extract the maximum-information regions. Results for various cases, involving a single speaker as well as a group of speakers, are also presented.
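
As a rough illustration of the idea behind the method (not the authors' implementation), the sketch below estimates mutual information between a short-time audio energy track and each pixel's frame-to-frame intensity change, so that image regions whose motion co-varies with the sound stand out. It uses a simple bivariate histogram estimator in place of the paper's multivariate formulation; the inputs `frames` (a T x H x W grayscale sequence) and `audio_energy` (a length-T energy track), the bin count, and the synthetic test at the end are assumptions made for the example.

```python
# Illustrative sketch only: pairwise (bivariate) mutual information between
# audio energy and per-pixel intensity changes, as a stand-in for the paper's
# multivariate measure. Inputs `frames` and `audio_energy` are assumed.
import numpy as np

def mutual_information(x, y, bins=16):
    """Estimate I(X; Y) in bits from two 1-D samples using a joint histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, bins)
    nz = pxy > 0                          # avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def audio_video_mi_map(frames, audio_energy, bins=16):
    """Return an (H, W) map of MI between the audio energy track and each
    pixel's frame-to-frame intensity change; high values mark regions that
    co-vary with the audio."""
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))  # (T-1, H, W)
    a = np.asarray(audio_energy, dtype=np.float64)[1:]          # align lengths
    _, H, W = diffs.shape
    mi_map = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            mi_map[i, j] = mutual_information(diffs[:, i, j], a, bins)
    return mi_map

if __name__ == "__main__":
    # Synthetic example: one pixel's intensity changes track the audio energy.
    rng = np.random.default_rng(0)
    T, H, W = 200, 8, 8
    audio_energy = rng.random(T)
    frames = rng.random((T, H, W))
    frames[:, 3, 4] = np.cumsum(audio_energy)  # its frame difference equals the audio energy
    mi = audio_video_mi_map(frames, audio_energy)
    print("Most audio-correlated pixel:", np.unravel_index(mi.argmax(), mi.shape))
```

Reproducing the paper's approach would require estimating the multivariate mutual information over several audio and video features rather than this pairwise histogram version, which is used here only for brevity.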

Acknowledgments

This research is supported in part by the Higher Education Commission (HEC) of Pakistan under Grant Nos. 1-8/HEC/HRD/2012/2709 and 106-2095-Ps6-127. The authors would like to thank Quaid-i-Azam University, Islamabad, Pakistan, and the University of Michigan, Dearborn, USA, for providing resources to conduct this research.

Author information

Corresponding author

Correspondence to Hammad Dilpazir.

About this article

Cite this article

Dilpazir, H., Muhammad, Z., Minhas, Q. et al. Multivariate mutual information for audio video fusion. SIViP 10, 1265–1272 (2016). https://doi.org/10.1007/s11760-016-0892-7

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11760-016-0892-7

Keywords

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy