Abstract
Information fusion is an essential part of distributed wireless sensor networks as well as perceptual user interfaces. Irrelevant and redundant data severely degrade the performance of the fusion process. In this paper, a method based on multivariate mutual information is presented to validate the acceptability of data from two sources (visual and auditory). The audiovisual information is fused to observe the ventriloquism effect, which serves to validate the algorithm. Unlike preceding algorithms, this framework requires no preprocessing such as automatic face recognition, nor does it rely on statistical modeling, feature extraction, or learning algorithms to locate the regions of maximum information. Results for various cases, covering both a single speaker and a group of speakers, are presented.
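To make the core quantity concrete, the sketch below estimates mutual information between two 1-D signals with a simple 2-D histogram. This is a generic illustrative estimator, not the paper's multivariate algorithm; the "audio energy" and "pixel" tracks are synthetic stand-ins, and the point is only that a synchronized audiovisual region should score higher MI than an unrelated one.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram-based estimate of I(X; Y) in bits for two 1-D signals."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()            # joint distribution p(x, y)
    px = pxy.sum(axis=1, keepdims=True)  # marginal p(x), shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal p(y), shape (1, bins)
    nz = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
audio = rng.normal(size=5000)                 # stand-in for per-frame audio energy
synced = audio + 0.1 * rng.normal(size=5000)  # strongly correlated "pixel" track
unsynced = rng.normal(size=5000)              # independent "pixel" track

# A synchronized region carries far more information about the audio.
assert mutual_information(audio, synced) > mutual_information(audio, unsynced)
```

In an audiovisual-fusion setting, one would compute such a score between the audio track and each candidate image region over time; regions whose intensity changes track the audio (the active speaker) yield higher mutual information than static background.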
Acknowledgments
This research was supported in part by the Higher Education Commission of Pakistan (HEC) under Grant Nos. 1-8/HEC/HRD/2012/2709 and 106-2095-Ps6-127. The authors would like to thank Quaid-i-Azam University, Islamabad, Pakistan, and the University of Michigan, Dearborn, USA, for providing resources to conduct this research.
Cite this article
Dilpazir, H., Muhammad, Z., Minhas, Q. et al. Multivariate mutual information for audio video fusion. SIViP 10, 1265–1272 (2016). https://doi.org/10.1007/s11760-016-0892-7