JAUDIO: A FEATURE EXTRACTION LIBRARY

Daniel McEnnis, Faculty of Music, McGill University, Montreal, Canada, daniel.mcennis@mail.mcgill.ca
Cory McKay, Faculty of Music, McGill University, Montreal, Canada, cory.mckay@mail.mcgill.ca
Ichiro Fujinaga, Faculty of Music, McGill University, Montreal, Canada, ich@music.mcgill.ca
Philippe Depalle, Faculty of Music, McGill University, Montreal, Canada, depalle@music.mcgill.ca

ABSTRACT

jAudio is a new framework for feature extraction designed to eliminate the duplication of effort in calculating features from an audio signal. This system meets the needs of MIR researchers by providing a library of analysis algorithms that are suitable for a wide array of MIR tasks. In order to provide these features with a minimal learning curve, the system implements a GUI that makes the process of selecting desired features straightforward. A command-line interface is also provided to manipulate jAudio via scripting. Furthermore, jAudio provides a unique method of handling multidimensional features and a new mechanism for dependency handling to prevent duplicate calculations.

The system takes a sequence of audio files as input. In the GUI, users select the features that they wish to have extracted, letting jAudio take care of all dependency problems, and either execute directly from the GUI or save the settings for batch processing. The output is either an ACE XML file or an ARFF file, depending on the user's preference.

Keywords: Java Audio Environment, Audio Feature Extraction, Music Information Retrieval.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
© 2005 Queen Mary, University of London

1 INTRODUCTION

jAudio is a feature extraction system designed to meet the needs of MIR researchers by providing a collection of feature extraction algorithms bundled with both an easy-to-use GUI and a command-line interface. The system accepts audio files as input and produces either ACE XML files (McKay et al. 2005) or ARFF files. Furthermore, the system includes multidimensional features and a new way to handle dependencies between features.

Extracting high-quality features is of critical importance for many MIR projects (Fujinaga 1998; Jensen 1999). Since these features are the only information that a classifier or other interpretive construct has about the original data, a failure to capture information and patterns inherent in the signal will result in poor performance, no matter how good the interpretive layer is.

One such problem is the difficulty in extracting perceptual features such as meter or pitch from a signal. These features, though useful, are typically not used because they are generally too complicated to create. This is particularly true if the main focus of the research is in other areas, not the creation of new pitch or meter detection algorithms.

Since features are the sole mechanism by which interpretive layers can gain access to the latent information of the original data source, having many different features is desirable. Especially if feature selection or feature weighting is used, having a multitude of features permits the interpretive layer to have as many perspectives on the incoming data as possible. However, especially if the interpretive layer is the main focus of the researchers' efforts, creating and maintaining a large array of features is a significant effort that may not be feasible.

The current state of feature extraction techniques has some additional difficulties. There is no central repository of algorithms dedicated to extracting features. This means that researchers are dependent on often sparse descriptions from conference proceedings to identify the best features to obtain for a given topic. Unfortunately, due to space constraints, these descriptions tend to be terse to the point of obscurity, greatly increasing the chance that an algorithm may be implemented incorrectly due to a misunderstanding of a source. jAudio alleviates this problem by providing a central repository for placing features.

There is also a problem in how the feature extractors communicate with their interpreters. There has been some progress made in this field as Weka's ARFF format has become a de facto standard (Witten and Frank 1999). Yet, with the exception of Marsyas (Tzanetakis and Cook 2000), no existing feature extraction system provides its output data in a standard output format.

More critically, feature extraction source code is either not made available or is tightly coupled to the classification or analysis code. This prevents the reuse of this code in other contexts, limiting the ability of researchers to exchange feature extraction algorithms.

Furthermore, many algorithms for feature extraction
are implemented in a platform-dependent way that does not necessarily function properly on computers of another architecture, or sometimes on systems of the same architecture but different configuration.

Another concern that needs to be addressed is how easily the feature extraction platform can be extended. A complicated setup means that few features will be added by anyone but the maintainers, drastically limiting the usefulness of the project.

2 RELATED RESEARCH

Efforts to extract a large number of features in a single experiment have been made before. Papers such as Fujinaga (1998), Hong (2000), and Jensen (1999) are known for their large feature collections. However, the creation of libraries to avoid duplication of the effort of writing feature extraction algorithms is a relatively new phenomenon.

2.1 Marsyas

Marsyas by George Tzanetakis is a pioneer in this area. His system is implemented in C++. The system is both efficient and open source. Despite being integrated into a general classification system, Marsyas retains the ability to output feature extraction data to Weka's ARFF format. One drawback is the complicated interface for controlling the features selected for extraction in the extraction subsystem (Tzanetakis and Cook 2000).

2.2 CLAM

CLAM is produced by the Music Technology Group at Pompeu Fabra University (Amatrain et al. 2002). The system is an analysis/synthesis system and is implemented in C++. While a good general system with a good GUI, the system was not intended for extracting features for classification problems.

2.3 M2K

M2K is built upon the D2K data mining software developed at NCSA at the University of Illinois (Downie et al. 2004). The system utilizes the GUI architecture of D2K patches, a very intuitive method for building large hierarchies of feature sets. Unfortunately, the system is currently in an alpha state. Further complicating this difficulty is the commercial license of the underlying D2K system. While M2K is available under a free license, the D2K system is only available for academic use. This makes the system more difficult to obtain by researchers outside the United States and raises the possibility of licensing problems for those researchers whose work with M2K blurs the edge between a research tool and a commercial open-source application.

2.4 Maaate

Maaate is produced by the Commonwealth Scientific and Industrial Research Organization. It was built primarily to extract features from MPEG 1 audio rather than uncompressed audio. While it has a GUI front end, the GUI is geared towards visualization rather than controlling the feature extraction process (Pfeiffer et al. 2005).

3 DESIGN DECISIONS

In order to address the issues introduced in Section 1, it was necessary to make a number of design decisions that shaped jAudio.

3.1 Java based

jAudio was written in Java in order to capitalize on Java's cross-platform portability and design advantages. A custom low-level audio layer was implemented in order to supplement Java's limited core audio support and allow those writing jAudio features to deal directly with arrays of sample values rather than needing to concern themselves directly with low-level issues such as buffering and format conversions.

3.2 XML and Weka output

jAudio supports multiple output formats, including both the native XML format of the ACE (Autonomous Classifier Engine) system, which is a framework for optimizing classifiers (McKay et al. 2005), and the ARFF format used by the popular Weka analysis toolkit (Witten and Frank 1999). This permits both use of the ACE environment for classification problems and output in a more established format. Export to ARFF is accomplished by treating all multidimensional features as collections of individual features.

3.3 Handling dependencies

In order to reduce the complexity of calculations, it is often advantageous to reuse the results of an earlier calculation in other modules. jAudio provides a simple way for a feature class to declare which features it requires in order to be calculated. An example is the magnitude spectrum of a signal. It is used by a number of features, but only needs to be calculated once. Just before execution begins, jAudio reorders the execution of feature calculations such that every feature's calculation is executed only after all of its dependencies have been executed. Furthermore, unlike any other system, the user need not know the dependencies of the features selected. For any feature selected for output, the features it depends on are calculated automatically and silently as needed, without replication.

3.4 Support for multidimensional features

jAudio has the capacity to accept features that provide an arbitrary number of dimensions. This is an extremely useful way to group related features that are calculated at once, such as MFCC. This is in contrast to the ARFF format from Weka, where all features are unidimensional. Furthermore, the dimensionality of each feature is exported. This permits derivatives and other metafeatures to have the same number of dimensions as the feature they are calculated from, even though the same code is used for all features.
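The reordering described in Section 3.3 amounts to a topological sort of the selected features and everything they require. The following is a minimal Python sketch of that idea, not jAudio's actual Java implementation: the function name `resolve_order`, the dictionary-based dependency table, and the feature names in the example are all assumptions made for illustration.

```python
from collections import deque


def resolve_order(requested, dependencies):
    """Return the requested features plus everything they depend on,
    ordered so that each feature comes after all of its dependencies."""
    # Walk the dependency table to collect every feature that must run.
    needed, stack = set(), list(requested)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(dependencies.get(name, ()))

    # Kahn's algorithm: track unmet dependencies and reverse edges.
    unmet = {name: set(dependencies.get(name, ())) for name in needed}
    dependents = {name: [] for name in needed}
    for name, deps in unmet.items():
        for dep in deps:
            dependents[dep].append(name)

    ready = deque(sorted(name for name, deps in unmet.items() if not deps))
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)  # each feature is scheduled exactly once
        for child in sorted(dependents[name]):
            unmet[child].discard(name)
            if not unmet[child]:
                ready.append(child)

    if len(order) != len(needed):
        raise ValueError("circular dependency between features")
    return order


# Hypothetical dependency table; the entries are illustrative only.
DEPENDENCIES = {
    "Magnitude Spectrum": [],
    "Spectral Rolloff": ["Magnitude Spectrum"],
    "Compactness": ["Magnitude Spectrum"],
    "RMS": [],
}

order = resolve_order(["Spectral Rolloff", "Compactness"], DEPENDENCIES)
print(order)
```

In this sketch the user asks only for Spectral Rolloff and Compactness. The magnitude spectrum is pulled in automatically, scheduled first, and appears once, while the unrequested RMS feature is left out.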

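Sections 3.2 and 3.4 state that ARFF export treats each multidimensional feature as a collection of individual features. A minimal Python sketch of such a flattening follows. The function `to_arff`, the numeric-suffix attribute naming, and the sample values are assumptions for illustration and are not taken from jAudio. Only the `@RELATION`, `@ATTRIBUTE ... NUMERIC`, and `@DATA` keywords come from the ARFF format itself.

```python
def to_arff(relation, feature_dims, rows):
    """Write unidimensional ARFF text from multidimensional features.

    feature_dims: list of (feature name, number of dimensions)
    rows: list of dicts mapping feature name -> list of values
    """
    lines = ["@RELATION " + relation, ""]
    for name, dims in feature_dims:
        for i in range(dims):
            # One numeric attribute per dimension. The index suffix is an
            # assumed naming scheme, not necessarily the one jAudio uses.
            attr = name if dims == 1 else f"{name}{i}"
            lines.append(f'@ATTRIBUTE "{attr}" NUMERIC')
    lines += ["", "@DATA"]
    for row in rows:
        values = []
        for name, dims in feature_dims:
            vector = row[name]
            if len(vector) != dims:
                raise ValueError(f"{name}: expected {dims} values")
            values.extend(repr(float(v)) for v in vector)
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"


# Made-up values for one analysis window.
arff_text = to_arff(
    "demo",
    [("RMS", 1), ("MFCC", 3)],
    [{"RMS": [0.5], "MFCC": [1.0, 2.0, 3.0]}],
)
print(arff_text)
```

Here a one-dimensional RMS and a three-dimensional MFCC become four numeric attributes, which is the form Weka expects.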
Figure 1: Screenshot of jAudio GUI.

3.5 Intuitive interface

jAudio permits control of downsampling of the input signal, signal normalization, window size, window overlap, and which features are extracted and saved, all through an easy-to-use GUI (see Figure 1). The GUI also permits users to configure settings and save them for batch processing.

3.6 License

All source code is publicly available on the Internet (http://coltrane.music.mcgill.ca/ACE) under the GNU Lesser General Public License (LGPL).

3.7 Extensibility

Effort was taken to make it as easy as possible to add new features and associated documentation to the system. An abstract class is provided that includes all the functionality needed to implement a feature.

3.8 Metafeatures

Metafeatures are templates that can be applied against any feature to create new features. Examples of metafeatures include Derivative, Mean, and Standard Deviation. Each of these metafeatures is automatically applied to all features without the user needing to explicitly create these derivative features.

4 IMPLEMENTED FEATURES

There are 27 distinct features implemented in jAudio. The following is a non-exhaustive list.

• Zero Crossing
  Zero Crossing is calculated by counting the number of times that the time domain signal crosses zero within a given window. 'Crossing zero' is defined as $(x_{n-1} < 0 \text{ and } x_n > 0)$ or $(x_{n-1} > 0 \text{ and } x_n < 0)$ or $(x_{n-1} \neq 0 \text{ and } x_n = 0)$.

• RMS
  RMS is calculated on a per-window basis. It is defined by the equation:

  $$RMS = \sqrt{\frac{\sum_{n=1}^{N} x_n^2}{N}} \qquad (1)$$

  where N is the total number of samples provided in the time domain. RMS is used to calculate the amplitude of a window.

• Fraction of Low Amplitude Frames
  This feature is defined as the fraction of previous windows whose RMS is less than the mean RMS. This gives an indication of the variability of the amplitude of windows.

• Spectral Flux
  Spectral Flux is defined as the spectral correlation between adjacent windows (McAdams 1999). It is often used as an indication of the degree of change of the spectrum between windows.

• Spectral Rolloff
  Spectral rolloff is defined as the frequency below which 85% of the energy in the spectrum lies.
  It is often used as an indicator of the skew of the frequencies present in a window.

• Compactness
  Compactness is closely related to Spectral Smoothness as defined by McAdams (1999). The difference is that instead of summing over partials, compactness sums over frequency bins of an FFT. This provides an indication of the noisiness of the signal.

• Method of Moments
  This feature consists of the first five statistical moments of the spectrograph. This includes the area (zeroth order), mean (first order), Power Spectrum Density (second order), Spectral Skew (third order), and Spectral Kurtosis (fourth order). These features describe the shape of the spectrograph of a given window (Fujinaga 1997).

• 2D Method of Moments
  This feature treats a series of frames of spectral data as a two-dimensional image, which is then analyzed using the two-dimensional method of moments (Fujinaga 1997). This gives a description of the spectrograph, including its changes, over a relatively short time frame.

• MFCC
  Mel-Frequency Cepstral Coefficients (MFCCs) are calculated according to the formula by Bogert et al. (1963). The calculations are implemented here using code taken from the Orange Cow voice recognition project (Su et al. 2005). This is useful for describing a spectrum window.

• Beat Histogram
  This feature autocorrelates the RMS for each bin in order to construct a histogram representing rhythmic regularities. This is used as a base feature for determining the best tempo match (Scheirer and Slaney 1997).

5 CONCLUSIONS

jAudio provides a comprehensive solution to the problem of the duplication of work in programming feature extraction. This system permits general use of a large number of features in a fashion that is both easy to use and extensible. The GUI permits the system to be configured with minimal effort, and the command-line interface permits easy batch processing. The system also provides a central repository for storing feature algorithms with unambiguous meaning, and its output can be read by either ACE or Weka.

6 FUTURE WORK

The set of features provided by jAudio is by no means comprehensive. Numerous additional features remain to be added. In particular, the system needs an implementation of LPC and the ability to process multiple window sizes concurrently.

Also, if the license issues can be resolved, we would like to merge our development efforts into the M2K project. This would allow the project to take advantage of the extensive GUI support while maintaining the existing benefits of jAudio.

REFERENCES

X. Amatrain, P. Arumi, and M. Ramirez. CLAM: Yet another library for audio and music processing? In Proceedings of the ACM Conference on Object Oriented Programming, Systems, and Applications, 2002. 22–3.
B. Bogert, M. Healy, and J. Tukey. The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe-cracking. In Proceedings of the Symposium on Time Series Analysis, 1963. 209–43.
S. Downie, J. Futrelle, and D. Tcheng. The international music information retrieval systems evaluation laboratory. International Conference on Music Information Retrieval, 2004. 9–14.
I. Fujinaga. Machine recognition of timbre using steady-state tone of acoustic musical instruments. In Proceedings of the International Computer Music Conference, 1998. 207–10.
I. Fujinaga. Adaptive optical music recognition. PhD thesis, McGill University, 1997.
T. Hong. Salient feature extraction of musical instrument signals. Master's thesis, Dartmouth College, 2000.
K. Jensen. Timbre models of musical sounds. PhD thesis, Københavns Universitet, 1999.
S. McAdams. Perspectives on the contribution of timbre to musical structure. Computer Music Journal, 23:85–102, 1999.
C. McKay, D. McEnnis, R. Fiebrink, and I. Fujinaga. ACE: A framework for optimizing music classification. International Conference on Music Information Retrieval, 2005.
S. Pfeiffer, C. Parker, and T. Vincent. Maaate, 2005. URL http://www.cmis.csiro.au/maaate/. [Accessed April 14, 2005].
E. Scheirer and M. Slaney. Construction and evaluation of a robust multi-feature speech/music discriminator. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1997.
C. Su, K. Fung, and A. Leonov. Oc volume, 2005. URL http://ocvolume.sourceforge.net/. [Accessed April 14, 2005].
G. Tzanetakis and P. Cook. Marsyas: A framework for audio analysis. Organised Sound, 10:293–302, 2000.
I. Witten and E. Frank. Data mining: Practical machine learning tools and techniques with Java implementations. San Francisco: Morgan Kaufmann, 1999.
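The time-domain features listed in Section 4 (Zero Crossing, RMS, and Fraction of Low Amplitude Frames) follow directly from their definitions. The Python sketch below is illustrative only and is not the jAudio source: the function names are invented, windows are assumed to be plain lists of sample values, and the mean RMS is assumed to be taken over the same set of windows that is being counted.

```python
import math


def zero_crossings(window):
    """Count zero crossings using the three-part definition in Section 4."""
    count = 0
    for prev, cur in zip(window, window[1:]):
        if (prev < 0 and cur > 0) or (prev > 0 and cur < 0) \
                or (prev != 0 and cur == 0):
            count += 1
    return count


def rms(window):
    """Root mean square of one window, following Equation (1)."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def fraction_low_amplitude(rms_values):
    """Fraction of windows whose RMS is below the mean RMS.

    Assumption: the mean is computed over the windows passed in.
    """
    mean_rms = sum(rms_values) / len(rms_values)
    low = sum(1 for value in rms_values if value < mean_rms)
    return low / len(rms_values)


# Toy windows, chosen only to make the arithmetic easy to follow.
crossings = zero_crossings([1.0, -1.0, 0.0, 2.0, -3.0])
level = rms([1.0, -1.0, 1.0, -1.0])
quiet_fraction = fraction_low_amplitude([0.1, 0.1, 0.1, 0.9])
print(crossings, level, quiet_fraction)
```

In the first toy window the step from 0.0 to 2.0 is not counted, because the definition requires the previous sample to be nonzero when the signal reaches zero and treats leaving zero as no crossing.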

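Spectral Rolloff, as defined in Section 4, can be found by accumulating spectral energy bin by bin until 85% of the total is reached. The Python sketch below makes several assumptions that the paper does not state: the input is a power spectrum whose bins run evenly from 0 Hz to the Nyquist frequency inclusive, the result is reported in Hz, and the function name and parameters are invented for illustration.

```python
def spectral_rolloff(power_spectrum, sample_rate, fraction=0.85):
    """Frequency (Hz) at or below which `fraction` of the energy lies.

    Assumes bins are evenly spaced from 0 Hz to sample_rate / 2 inclusive.
    """
    total = sum(power_spectrum)
    if total == 0 or len(power_spectrum) < 2:
        return 0.0
    threshold = fraction * total
    running = 0.0
    rolloff_bin = len(power_spectrum) - 1
    for k, power in enumerate(power_spectrum):
        running += power
        if running >= threshold:
            rolloff_bin = k
            break
    bin_width = (sample_rate / 2) / (len(power_spectrum) - 1)
    return rolloff_bin * bin_width


# Toy spectrum: all energy sits in the four lowest of nine bins.
toy_spectrum = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
rolloff_hz = spectral_rolloff(toy_spectrum, sample_rate=8000)
print(rolloff_hz)
```

With nine bins and an 8000 Hz sample rate the bins are 500 Hz apart, and the cumulative energy first reaches 85% of the total at bin 3, giving a rolloff of 1500 Hz for this toy spectrum.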
