Dimensionality Reduction
What is Predictive Modeling: Predictive modeling is a probabilistic process that allows
us to forecast outcomes on the basis of some predictors. These predictors are essentially
the features that come into play when deciding the final result, i.e., the outcome of the model.
Dimensionality reduction is the process of reducing the number of features (or
dimensions) in a dataset while retaining as much information as possible. This can be
done for a variety of reasons, such as to reduce the complexity of a model, to improve
the performance of a learning algorithm, or to make it easier to visualize the data. There
are several techniques for dimensionality reduction, including principal component
analysis (PCA), singular value decomposition (SVD), and linear discriminant analysis
(LDA). Each technique uses a different method to project the data onto a lower-
dimensional space while preserving important information.
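To make this concrete, here is a minimal sketch of dimensionality reduction with PCA in scikit-learn. The synthetic 10-feature dataset and the choice of 3 components are illustrative assumptions, not values tied to any particular application.

import numpy as np
from sklearn.decomposition import PCA

# Illustrative synthetic dataset: 100 samples, 10 features.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(100, 10))

# Project the 10-dimensional data onto the 3 directions of
# maximum variance.
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (100, 3)
print(pca.explained_variance_ratio_)  # variance captured per component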
Feature Extraction:
Feature extraction involves creating new features by combining or transforming the
original features. The goal is to create a set of features that captures the essence of the
original data in a lower-dimensional space. There are several methods for feature
extraction, including principal component analysis (PCA), linear discriminant analysis
(LDA), and t-distributed stochastic neighbor embedding (t-SNE). PCA is a popular
technique that projects the original features onto a lower-dimensional space while
preserving as much of the variance as possible.
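As a small sketch of feature extraction for visualization, the snippet below embeds scikit-learn's bundled handwritten-digits dataset into two dimensions with t-SNE; the dataset and the perplexity value are assumptions chosen only for illustration.

from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# 1797 samples, 64 features (8x8 pixel images).
X, y = load_digits(return_X_y=True)

# Embed the 64-dimensional data into 2 dimensions; perplexity
# controls the effective neighborhood size (30 is the default).
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X)

print(X_2d.shape)  # (1797, 2), ready to scatter-plot, colored by y

Unlike PCA, t-SNE is a non-linear technique aimed mainly at visualization; it preserves local neighborhood structure rather than global variance.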
Why is Dimensionality Reduction important in Machine Learning and Predictive
Modeling?
An intuitive example of dimensionality reduction can be discussed through a simple e-
mail classification problem, where we need to classify whether the e-mail is spam or
not. This can involve a large number of features, such as whether or not the e-mail has a
generic title, the content of the e-mail, whether the e-mail uses a template, etc.
However, some of these features may overlap. As another example, a classification
problem that relies on both humidity and rainfall can be collapsed into just one
underlying feature, since the two are highly correlated.
Hence, we can reduce the number of features in such problems. A 3-D classification
problem can be hard to visualize, whereas a 2-D one can be mapped to a simple 2-
dimensional plane, and a 1-D problem to a simple line. Following the same idea, a 3-D
feature space can be split into two 2-D feature spaces, and, if the features in one of
them turn out to be correlated, the number of features can be reduced even further.
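A small sketch of this collapsing step, assuming synthetic humidity and rainfall measurements generated to be highly correlated (the numbers are made up purely for illustration):

import numpy as np
from sklearn.decomposition import PCA

# Synthetic weather data where rainfall tracks humidity closely.
rng = np.random.default_rng(seed=1)
humidity = rng.uniform(40, 100, size=200)
rainfall = 0.8 * humidity + rng.normal(scale=3.0, size=200)
X = np.column_stack([humidity, rainfall])

print(np.corrcoef(humidity, rainfall)[0, 1])  # close to 1.0

# Because the two features are highly correlated, a single
# principal component captures almost all of the variance.
pca = PCA(n_components=1)
X_1d = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # e.g. [0.99...]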
Components of Dimensionality Reduction
There are two components of dimensionality reduction:
Feature selection: In this, we try to find a subset of the original set of variables,
or features, that is sufficient to model the problem (a sketch of one such approach
follows this list). It usually involves one of three approaches:
1. Filter
2. Wrapper
3. Embedded
Feature extraction: This transforms the data in a high-dimensional space into a
space with fewer dimensions.
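As one illustrative sketch of the filter approach, the snippet below scores each feature independently against the target with an ANOVA F-test and keeps the two best; the Iris dataset and k=2 are assumptions chosen only to keep the example small.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Iris: 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Filter method: score every feature against the target on its
# own, then keep the k highest-scoring features.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(selector.scores_)        # per-feature F-scores
print(selector.get_support())  # boolean mask of the kept features
print(X_selected.shape)        # (150, 2)

Wrapper methods would instead evaluate candidate feature subsets by training a model on each (e.g., recursive feature elimination), while embedded methods let the model select features as part of training (e.g., L1-regularized models).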
Methods of Dimensionality Reduction
The various methods used for dimensionality reduction include principal component
analysis (PCA), singular value decomposition (SVD), linear discriminant analysis (LDA),
and t-distributed stochastic neighbor embedding (t-SNE), as introduced above.
Disadvantages of Dimensionality Reduction
It may lead to some amount of data loss.
PCA tends to find linear correlations between variables, which is sometimes
undesirable.
PCA fails in cases where the mean and covariance are not enough to characterize
the dataset.
We may not know how many principal components to keep; in practice, rules of
thumb such as retaining a fixed fraction of the total variance are applied (see the
sketch after this list).
Interpretability: The reduced dimensions may not be easily interpretable, and it
may be difficult to understand the relationship between the original features
and the reduced dimensions.
Overfitting: In some cases, dimensionality reduction may lead to overfitting,
especially when the number of components is chosen based on the training
data.
Sensitivity to outliers: Some dimensionality reduction techniques are sensitive
to outliers, which can result in a biased representation of the data.
Computational complexity: Some dimensionality reduction techniques, such as
manifold learning, can be computationally intensive, especially when dealing
with large datasets.
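On the question of how many components to keep, one widely used rule of thumb is to retain just enough components to explain a chosen fraction of the total variance. The sketch below assumes a 95% threshold and uses scikit-learn's digits dataset; both choices are illustrative.

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

# Passing a float in (0, 1) tells scikit-learn's PCA to keep the
# smallest number of components whose cumulative explained
# variance reaches that fraction.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(pca.n_components_)                    # components actually kept
print(pca.explained_variance_ratio_.sum())  # >= 0.95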