Abstract
We present a comparative evaluation of automatic classification methods on a sound database containing more than six hundred drum sounds (kick, snare, hihat, toms and cymbals). A preliminary set of fifty descriptors was refined using several feature selection techniques, and reduced sets of around twenty features were selected as the most relevant. We then tested different classification techniques (instance-based, statistical, and tree-based) using ten-fold cross-validation. Three levels of taxonomic classification were tested: membranes versus plates (super-category level); kick vs. snare vs. hihat vs. toms vs. cymbals (basic level); and some basic classes (kick and snare) plus some sub-classes, i.e. ride, crash, open hihat, closed hihat, high tom, medium tom, and low tom (sub-category level). Very high hit rates were achieved (99%, 97%, and 90%, respectively) with several of the tested techniques.
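The evaluation protocol named in the abstract, ten-fold cross-validation of an instance-based classifier, can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic Gaussian clusters standing in for drum-sound descriptors, the classifier is a plain k-nearest-neighbour rule chosen as one simple instance-based method, the folds are not stratified, and every function and variable name is hypothetical.

```python
import random
from collections import Counter


def make_synthetic_sounds(n_per_class=40, n_features=5, seed=0):
    """Hypothetical stand-in for a descriptor table: one Gaussian cluster per class."""
    rng = random.Random(seed)
    classes = ["kick", "snare", "hihat", "tom", "cymbal"]
    data = []
    for idx, label in enumerate(classes):
        for _ in range(n_per_class):
            # Assumption: class means are spaced 3 units apart on every feature.
            features = [rng.gauss(3.0 * idx, 1.0) for _ in range(n_features)]
            data.append((features, label))
    return data


def knn_predict(train, query, k=3):
    """Instance-based rule: majority label among the k nearest training items."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def ten_fold_hit_rate(data, k=3, n_folds=10, seed=0):
    """Shuffle once, hold out each fold in turn, and return the overall hit rate."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    # Every n_folds-th item goes to the same fold, so fold sizes differ by at most one.
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    hits = 0
    for i, test_fold in enumerate(folds):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        hits += sum(knn_predict(train, feats, k) == label for feats, label in test_fold)
    return hits / len(data)


sounds = make_synthetic_sounds()
hit_rate = ten_fold_hit_rate(sounds)
print(f"10-fold hit rate on synthetic data: {hit_rate:.2%}")
```

Because the synthetic clusters are well separated, the resulting hit rate says nothing about real drum sounds; the sketch only shows how each item is classified exactly once by a model that never saw it during training.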
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Herrera, P., Yeterian, A., Gouyon, F. (2002). Automatic Classification of Drum Sounds: A Comparison of Feature Selection Methods and Classification Techniques. In: Anagnostopoulou, C., Ferrand, M., Smaill, A. (eds) Music and Artificial Intelligence. ICMAI 2002. Lecture Notes in Computer Science(), vol 2445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45722-4_8
DOI: https://doi.org/10.1007/3-540-45722-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44145-8
Online ISBN: 978-3-540-45722-0
eBook Packages: Springer Book Archive