Multivariate mutual information for audio video fusion

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

Information fusion is an essential part of distributed wireless sensor networks as well as perceptual user interfaces. Irrelevant and redundant data severely degrade the performance of the information fusion process. In this paper, a method based on multivariate mutual information is presented to validate the acceptability of data from two sources (visual and auditory). To validate the algorithm, the audiovisual information is fused to observe the ventriloquism effect. Unlike preceding algorithms, this framework does not require any preprocessing such as automatic face recognition. Moreover, neither statistical modeling nor feature extraction and learning algorithms are required to extract the maximum-information regions. Results for various cases, involving a single speaker as well as a group of speakers, are also presented.
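
As a rough illustration of the idea behind the method (not the authors' implementation), the sketch below estimates mutual information between a short-time audio energy track and each pixel's frame-to-frame intensity change, so that image regions whose motion co-varies with the sound stand out. It uses a simple bivariate histogram estimator in place of the paper's multivariate formulation; the inputs `frames` (a T x H x W grayscale sequence) and `audio_energy` (a length-T energy track), the bin count, and the synthetic test at the end are assumptions made for the example.

```python
# Illustrative sketch only: pairwise (bivariate) mutual information between
# audio energy and per-pixel intensity changes, as a stand-in for the paper's
# multivariate measure. Inputs `frames` and `audio_energy` are assumed.
import numpy as np

def mutual_information(x, y, bins=16):
    """Estimate I(X; Y) in bits from two 1-D samples using a joint histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, bins)
    nz = pxy > 0                          # avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def audio_video_mi_map(frames, audio_energy, bins=16):
    """Return an (H, W) map of MI between the audio energy track and each
    pixel's frame-to-frame intensity change; high values mark regions that
    co-vary with the audio."""
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))  # (T-1, H, W)
    a = np.asarray(audio_energy, dtype=np.float64)[1:]          # align lengths
    _, H, W = diffs.shape
    mi_map = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            mi_map[i, j] = mutual_information(diffs[:, i, j], a, bins)
    return mi_map

if __name__ == "__main__":
    # Synthetic example: one pixel's intensity changes track the audio energy.
    rng = np.random.default_rng(0)
    T, H, W = 200, 8, 8
    audio_energy = rng.random(T)
    frames = rng.random((T, H, W))
    frames[:, 3, 4] = np.cumsum(audio_energy)  # its frame difference equals the audio energy
    mi = audio_video_mi_map(frames, audio_energy)
    print("Most audio-correlated pixel:", np.unravel_index(mi.argmax(), mi.shape))
```

Reproducing the paper's approach would require estimating the multivariate mutual information over several audio and video features rather than this pairwise histogram version, which is used here only for brevity.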

Acknowledgments

This research is supported in part by the Higher Education Commission (HEC) of Pakistan under Grant Nos. 1-8/HEC/HRD/2012/2709 and 106-2095-Ps6-127. The authors would like to thank Quaid-i-Azam University, Islamabad, Pakistan, and the University of Michigan, Dearborn, USA, for providing resources to conduct this research.

Author information

Corresponding author

Correspondence to Hammad Dilpazir.

About this article

Cite this article

Dilpazir, H., Muhammad, Z., Minhas, Q. et al. Multivariate mutual information for audio video fusion. SIViP 10, 1265–1272 (2016). https://doi.org/10.1007/s11760-016-0892-7

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11760-016-0892-7

Keywords

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy