
Music Genre Classification Slides

The document discusses using machine learning techniques for music genre classification. It explores using convolutional neural networks on MEL spectrograms of audio clips to classify songs into 7 genres. A pre-trained VGG-16 model is used for transfer learning. Additional spectral features are also extracted from the audio. Different classifiers like logistic regression, random forests, and SVMs are trained and evaluated. Results show ensembling classifiers improves performance over a single model. Frequency domain features perform better than time domain features. The confusion matrix reveals some genres like rock are predicted accurately while others like rhythm and blues see more misclassifications.

Uploaded by

Manoj Kumar

Music Genre Classification using Machine Learning Techniques

CS 698 - Computational Audio

Hareesh Bahuleyan
Problem Statement
● Music genres are a way to classify music based on rhythmic structure, harmonic content and instrumentation

● Automatic recognition
○ Organize digital libraries
○ Provide recommendations
Data
Google Audio Set
● 2.1 Million audio samples (of 10 seconds)
● 527 classes of sounds
● Selected 7 labels

● Not the actual audio, just the YouTube IDs, start and end times
● 880 KB per wav file
● Approximately 34 GB of data
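
The quoted file size is consistent with uncompressed PCM audio. As a rough sanity check, the sketch below assumes 44.1 kHz, 16-bit, mono recordings; none of these three values is stated on the slide, only the 10-second clip length and the resulting sizes are.

```python
# Assumed recording format; the slide gives only the clip length and file size.
SAMPLE_RATE_HZ = 44_100   # assumption
BYTES_PER_SAMPLE = 2      # assumption: 16-bit PCM
CHANNELS = 1              # assumption: mono
CLIP_SECONDS = 10         # from the slide


def wav_payload_bytes(seconds, rate=SAMPLE_RATE_HZ,
                      width=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Size of the raw PCM payload, ignoring the small WAV header."""
    return seconds * rate * width * channels


clip_bytes = wav_payload_bytes(CLIP_SECONDS)
approx_clips = 34e9 / clip_bytes   # how many such clips fit in 34 GB

print(clip_bytes / 1000)   # kilobytes per clip
print(round(approx_clips))
```

Under these assumptions a clip comes to about 882 KB, close to the 880 KB quoted, and 34 GB corresponds to a few tens of thousands of clips.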
Convolutional Neural Networks
MEL Spectrograms
● 2D colormap representation of the signal
● STFT: Window size = 2048, Hop size = 512, Hann window function, Number of MEL bins = 96
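
To make the STFT settings concrete, here is a minimal standard-library sketch of the framing, the Hann window and the mel band edges. The window size, hop size and number of MEL bins come from the slide; the sample rate, the no-padding framing rule and the HTK-style mel formula are assumptions, and a real pipeline would normally rely on an audio library instead.

```python
import math

N_FFT = 2048   # window size from the slide
HOP = 512      # hop size from the slide
N_MELS = 96    # number of MEL bins from the slide


def hann(n_fft=N_FFT):
    """Periodic Hann window: w[n] = 0.5 - 0.5 * cos(2*pi*n / N)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft) for n in range(n_fft)]


def num_frames(n_samples, n_fft=N_FFT, hop=HOP):
    """Number of full windows that fit when the signal is not padded."""
    if n_samples < n_fft:
        return 0
    return 1 + (n_samples - n_fft) // hop


def hz_to_mel(f):
    """HTK-style mel scale (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(sample_rate, n_mels=N_MELS):
    """n_mels + 2 edge frequencies, equally spaced in mel from 0 Hz to Nyquist."""
    top = hz_to_mel(sample_rate / 2.0)
    return [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]


SAMPLE_RATE = 22_050   # assumption; not given on the slide
window = hann()
edges = mel_band_edges(SAMPLE_RATE)
print(num_frames(10 * SAMPLE_RATE), len(edges))
```

Each of the 96 mel bins spans three consecutive edges (lower, centre, upper), which is why 98 edge frequencies are generated. The spectrogram image then has one column per frame and one row per mel bin.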
CNN - Image Classification
● Consider spectrogram as an image and train a CNN classifier
● Matrix of pixel values - 3 channel RGB input
Convolution Block
Convolution | Pooling | Non-Linear Activation

Source: http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/
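
The three operations in a convolution block can be sketched on plain nested lists. This is an illustrative toy, not the network used in the slides: the kernel values are made up, ReLU is assumed as the non-linearity, and 2x2 max pooling is assumed as the pooling step. The activation is applied before pooling here; with ReLU and max pooling either order gives the same result.

```python
def conv2d_valid(image, kernel):
    """2D cross-correlation, no padding, stride 1 (what CNN layers call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Element-wise non-linear activation."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; partial windows at the border are dropped."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


def conv_block(image, kernel):
    return max_pool(relu(conv2d_valid(image, kernel)))


# A 5x5 single-channel "image" and a made-up 2x2 kernel.
image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
kernel = [[-1.0, 0.0], [0.0, 1.0]]
print(conv_block(image, kernel))
```

A 5x5 input shrinks to 4x4 after the 2x2 convolution and to 2x2 after pooling, which shows how stacked blocks reduce spatial resolution.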
VGG-16

● Transfer Learning
○ Weights of conv base are fixed
● Fine Tuning
○ Both conv base and feed-forward network are trainable
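
The difference between the two settings comes down to which parameters receive gradient updates. The toy below uses a hypothetical `Layer` class and invented weights and gradients rather than a real deep learning framework, purely to show that distinction.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    weights: list
    trainable: bool = True


def build_model():
    """Stand-in for a pre-trained conv base followed by a new feed-forward head."""
    return [
        Layer("conv_base", [0.5, -0.2]),     # pretend these came from pre-training
        Layer("feed_forward", [0.0, 0.0]),   # freshly initialised classifier head
    ]


def set_mode(model, mode):
    """'transfer' freezes the conv base; 'fine_tune' leaves every layer trainable."""
    for layer in model:
        layer.trainable = (mode == "fine_tune") or layer.name != "conv_base"


def sgd_step(model, grads, lr=0.1):
    """Apply one gradient step, skipping frozen layers."""
    for layer in model:
        if layer.trainable:
            layer.weights = [w - lr * g
                             for w, g in zip(layer.weights, grads[layer.name])]


grads = {"conv_base": [1.0, 1.0], "feed_forward": [1.0, 1.0]}   # made-up gradients

transfer = build_model()
set_mode(transfer, "transfer")
sgd_step(transfer, grads)

fine_tuned = build_model()
set_mode(fine_tuned, "fine_tune")
sgd_step(fine_tuned, grads)

print(transfer[0].weights, fine_tuned[0].weights)
```

After one step the conv base weights are unchanged in the transfer learning model but have moved in the fine-tuned one, while the feed-forward head is updated in both.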
Feature Engineering Approaches
Feature Extraction

Time Domain
1. Mean
2. Variance
3. Skewness
4. Kurtosis
5. Zero Crossing Rate
6. Root Mean Square Energy
7. Tempo

Frequency Domain
1. MEL Frequency Cepstral Coefficients (MFCCs)
2. Chroma Features
3. Spectral Centroid
4. Spectral Band-widths
5. Spectral Contrast
6. Spectral Roll-offs

Classifiers
1. Logistic Regression
2. Random Forest
3. Support Vector Machines
4. Extreme Gradient Boosting

● Total Number of Features = 97
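
Most of the time-domain features listed above can be computed directly from the sample values. The sketch below uses population statistics and non-excess kurtosis, which are assumptions; the slides do not say which definitions were used, nor whether features were computed per frame or over the whole clip. Tempo is left out because it needs a beat tracker.

```python
import math


def central_moments(x):
    """Mean, variance, skewness and kurtosis (population form, non-excess kurtosis)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0:
        return mean, 0.0, 0.0, 0.0
    std = math.sqrt(var)
    skew = sum(((v - mean) / std) ** 3 for v in x) / n
    kurt = sum(((v - mean) / std) ** 4 for v in x) / n
    return mean, var, skew, kurt


def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)


def rms_energy(x):
    """Root mean square of the sample values."""
    return math.sqrt(sum(v * v for v in x) / len(x))


# A tiny alternating signal: it changes sign at every step.
frame = [1.0, -1.0, 1.0, -1.0]
print(central_moments(frame), zero_crossing_rate(frame), rms_energy(frame))
```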


Spectral Features
● Spectral Centroid

● Spectral Band-width

● Spectral Contrast
○ Divide spectrum into frequency bands
○ Difference between the maximum and minimum magnitude in each band
● Spectral Roll-off
○ Frequency below which 85% of the total energy in the spectrum lies
● Chroma Features
○ 12-element feature vector
○ Indicates how much energy falls in each pitch class, {C, C#, D, D#, E, ..., B}
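
The spectral features above can be sketched for a single frame, given bin frequencies and magnitudes. The formulas are common textbook forms and should be read as assumptions: the centroid is a magnitude-weighted mean frequency, the band-width is the second-order spread around it, the roll-off accumulates squared magnitude (some implementations accumulate magnitude instead), and the contrast follows the simplified max minus min description on the slide. The `pitch_class` helper is a hypothetical illustration of how a frequency maps to one of the 12 chroma bins.

```python
import math


def spectral_centroid(freqs, mags):
    """Magnitude-weighted mean frequency."""
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)


def spectral_bandwidth(freqs, mags, p=2):
    """p-th order spread of the spectrum around its centroid."""
    total = sum(mags)
    c = spectral_centroid(freqs, mags)
    return sum(m / total * abs(f - c) ** p
               for f, m in zip(freqs, mags)) ** (1.0 / p)


def spectral_rolloff(freqs, mags, fraction=0.85):
    """Lowest frequency at which cumulative energy reaches `fraction` of the total."""
    energy = [m * m for m in mags]
    threshold = fraction * sum(energy)
    running = 0.0
    for f, e in zip(freqs, energy):
        running += e
        if running >= threshold:
            return f
    return freqs[-1]


def spectral_contrast(band_mags):
    """Maximum minus minimum magnitude inside one frequency band."""
    return max(band_mags) - min(band_mags)


PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz, a4=440.0):
    """Map a frequency to a pitch class via its nearest MIDI note number."""
    midi = round(69 + 12 * math.log2(freq_hz / a4))
    return PITCH_CLASSES[midi % 12]


# A flat four-bin spectrum, chosen only to keep the arithmetic easy to follow.
freqs = [100.0, 200.0, 300.0, 400.0]
mags = [1.0, 1.0, 1.0, 1.0]
print(spectral_centroid(freqs, mags), spectral_rolloff(freqs, mags), pitch_class(440.0))
```

A chroma vector is then obtained by summing the energy of all bins that map to the same pitch class, giving the 12-element feature vector.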
Results
Comparison of Models
● Metrics: Accuracy | F-score | AUC
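
For reference, accuracy and a multi-class F-score can be computed as below. Macro averaging over classes is an assumption, since the slides do not say how the F-score was aggregated, and the labels in the example are invented. AUC is omitted because it needs predicted probabilities rather than hard labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f_score(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented labels for illustration only.
y_true = ["rock", "rock", "pop", "pop"]
y_pred = ["rock", "pop", "pop", "pop"]
print(accuracy(y_true, y_pred), macro_f_score(y_true, y_pred))
```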

Baseline uses a flattened vector of pixels

Ensembling classifiers is beneficial
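
One simple way to ensemble classifiers is to average their predicted class probabilities (soft voting). Whether the slides used this exact scheme is not stated, so treat the sketch below, including the made-up probabilities and model names, as an assumption.

```python
def average_probabilities(*model_probs):
    """Soft voting: average the class-probability vectors of several classifiers."""
    n_models = len(model_probs)
    return [sum(ps) / n_models for ps in zip(*model_probs)]


def predict(probs, labels):
    """Return the label with the highest probability."""
    return labels[max(range(len(probs)), key=probs.__getitem__)]


labels = ["rock", "pop", "rhythm and blues"]
cnn_probs = [0.50, 0.40, 0.10]      # made-up output of a CNN
forest_probs = [0.20, 0.70, 0.10]   # made-up output of a second classifier

ensemble_probs = average_probabilities(cnn_probs, forest_probs)
print(predict(cnn_probs, labels), predict(ensemble_probs, labels))
```

Here the CNN alone picks "rock", but the second model is confident enough in "pop" that the averaged prediction changes, which is how an ensemble can correct individual errors.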
Feature Importance Study
Keep only the top N most important features
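
Selecting the top N features amounts to ranking them by an importance score and dropping the rest. The feature names and scores below are invented for illustration; in practice the scores would come from a trained model such as a tree ensemble.

```python
def top_n_features(importances, n):
    """Names of the n features with the largest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:n]


def keep_columns(row, names):
    """Restrict one feature row (a dict) to the selected feature names."""
    return {name: row[name] for name in names}


# Invented importance scores and feature values.
importances = {"mfcc_1": 0.30, "zcr": 0.05, "spectral_centroid": 0.20, "tempo": 0.10}
row = {"mfcc_1": -3.1, "zcr": 0.12, "spectral_centroid": 1800.0, "tempo": 120.0}

selected = top_n_features(importances, 2)
print(selected, keep_columns(row, selected))
```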

Time domain vs. Frequency domain
Confusion Matrix

Good at predicting some classes, e.g. Rock

Many misclassifications for the Rhythm and Blues and Pop genres

Classes are also unbalanced
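
A confusion matrix and per-class recall make both effects visible. The sketch below uses invented labels in which one class dominates, to show how an unbalanced dataset can yield high overall accuracy while a minority class is never predicted correctly.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_recall(matrix, labels):
    """Fraction of each true class that was predicted correctly."""
    return {
        label: (row[i] / sum(row) if sum(row) else 0.0)
        for i, (label, row) in enumerate(zip(labels, matrix))
    }


# Invented, deliberately unbalanced example.
labels = ["rock", "pop"]
y_true = ["rock", "rock", "rock", "pop"]
y_pred = ["rock", "rock", "rock", "rock"]

cm = confusion_matrix(y_true, y_pred, labels)
print(cm, per_class_recall(cm, labels), Counter(y_true))
```

Three of four predictions are correct here, yet recall for the minority class is zero, so per-class figures are more informative than accuracy alone when classes are unbalanced.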
Thank You
