ML Unit 4
UNIT – IV
Dimensionality Reduction – Linear Discriminant Analysis –
Principal Component Analysis – Factor Analysis – Independent
Component Analysis – Locally Linear Embedding – Isomap – Least
Squares Optimization
In machine learning,
high-dimensional data refers to data with a large number of features
or variables. The curse of dimensionality is a common problem in
machine learning, where the performance of the model deteriorates as
the number of features increases. This is because the complexity of
the model increases with the number of features, and it becomes more
difficult to find a good solution.
In addition, high-dimensional data can also lead to overfitting, where
the model fits the training data too closely and does not generalize
well to new data.
Feature Extraction:
1. Exploratory Factor Analysis (EFA)
Exploratory Factor Analysis (EFA) is used when the underlying factor structure of the data is not known in advance. It examines the
interrelationships among items and aims to group items that are part
of unified concepts or constructs.
• Researchers do not make a priori assumptions about the
relationships among factors, allowing the data to reveal the
structure organically.
• Exploratory Factor Analysis (EFA) helps in identifying the
number of factors needed to account for the variance in the
observed variables and understanding the relationships
between variables and factors.
2. Confirmatory Factor Analysis (CFA)
Confirmatory Factor Analysis (CFA) is a more structured approach
that tests specific hypotheses about the relationships between
observed variables and latent factors based on prior theoretical
knowledge or expectations. It uses structural equation modeling
techniques to test a measurement model, wherein the observed
variables are assumed to load onto specific factors.
• Confirmatory Factor Analysis (CFA) assesses the fit of the
hypothesized model to the actual data, examining how well
the observed variables align with the proposed factor
structure.
• This method allows for the evaluation of relationships
between observed variables and unobserved factors, and it
can accommodate measurement error.
• Researchers hypothesize the relationships between variables
and factors before conducting the analysis, and the model is
tested against empirical data to determine its validity.
In summary, while Exploratory Factor Analysis (EFA) is more
exploratory and flexible, allowing the data to dictate the factor
structure, Confirmatory Factor Analysis (CFA) is more
confirmatory, testing specific hypotheses about how the observed
variables are related to latent factors. Both methods are valuable
tools in understanding the underlying structure of data and have their
respective strengths and applications.
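As a concrete illustration of the exploratory side, the sketch below runs an exploratory factor analysis with scikit-learn's FactorAnalysis class; the dataset and the choice of two factors are illustrative assumptions rather than part of the notes.

from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

# Load a small multivariate dataset (four observed variables per sample)
X = load_iris().data

# Extract two latent factors from the observed variables
fa = FactorAnalysis(n_components=2, random_state=0)
scores = fa.fit_transform(X)   # factor scores for each observation

print(fa.components_)   # loadings: how each observed variable relates to the two factors
print(scores[:5])       # factor scores of the first five samples

A confirmatory analysis would instead fix in advance which variables load on which factors and test that structure against the data, which typically calls for a structural equation modeling package rather than scikit-learn.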
Types of Factor Extraction Methods
Some of the factor extraction methods are discussed below:
1. Principal Component Analysis (PCA):
• PCA is a widely used method for factor extraction.
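For instance (a minimal sketch assuming scikit-learn; the data below is synthetic and not from the notes), the components that explain the most variance can be extracted as follows:

import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 200 samples with 5 correlated features
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = base @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))

# Keep the two components that capture the most variance
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(pca.explained_variance_ratio_)   # share of the variance explained by each component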
Independent Component Analysis (ICA)
Assumptions in ICA
1. The first assumption asserts that the source signals (original
signals) are statistically independent of each other.
2. The second assumption is that each source signal exhibits a non-Gaussian distribution.
Mathematical Representation of Independent Component Analysis
The observed random vector is X = (x1, x2, …, xm)ᵀ, representing the observed data with m components. The hidden components are represented by the random vector S = (s1, s2, …, sn)ᵀ, where n is the number of hidden sources.
Linear Static Transformation
The observed data X is transformed into the hidden components S using a linear static transformation represented by the matrix W:
S = W X
Each recovered component is therefore a weighted sum of the observed signals, and W is estimated so that the components of S are as statistically independent as possible.
Consider a party in a room full of people. There are 'n' speakers in the room, and they are speaking simultaneously. In the same room there are also 'n' microphones, placed at different distances from the speakers, each recording a mixture of the 'n' speakers' voice signals. Hence, the number of speakers is equal to the number of microphones in the room, and the goal of ICA is to recover each speaker's individual voice signal from the mixed recordings.
Step 4: Visualize the signals
plt.subplot(3, 1, 3)
plt.title('Estimated Sources (FastICA)')
plt.plot(S_)
plt.tight_layout()
plt.show()
where, X1, X2, …, Xn are the original signals present in the mixed signal
and Y1, Y2, …, Yn are the new features and are independent components
that are independent of each other.
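The snippet above is only the final visualization step. Below is a self-contained sketch of the whole cocktail-party workflow using scikit-learn's FastICA; the two source signals and the mixing matrix are illustrative assumptions, not values taken from the notes.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import FastICA

# Step 1: create two independent, non-Gaussian source signals (assumed shapes)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]   # a sine wave and a square wave

# Step 2: mix them, as microphones placed in the room would record them
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])        # assumed mixing matrix
X = S @ A.T                       # observed (mixed) recordings

# Step 3: recover the independent components with FastICA
ica = FastICA(n_components=2, random_state=0)
S_ = ica.fit_transform(X)         # estimated sources

# Step 4: visualize the signals
plt.figure(figsize=(8, 6))
plt.subplot(3, 1, 1)
plt.title('True Sources')
plt.plot(S)
plt.subplot(3, 1, 2)
plt.title('Mixed Signals')
plt.plot(X)
plt.subplot(3, 1, 3)
plt.title('Estimated Sources (FastICA)')
plt.plot(S_)
plt.tight_layout()
plt.show()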
Minimize: E(W) = ∑i | xi − ∑j wij xj |²
Subject to: ∑j wij = 1 for each data point xi, with wij = 0 whenever xj is not among the k nearest neighbors of xi
Where:
• xi represents the i-th data point.
• wij are the weights that minimize the reconstruction error for
data point xi using its neighbors.
Locally Linear Embedding Algorithm
The LLE algorithm can be broken down into several steps:
• Neighborhood Selection: For each data point in the high-
dimensional space, LLE identifies its k-nearest neighbors. This
step is crucial because LLE assumes that each data point can be
well approximated by a linear combination of its neighbors.
• Weight Matrix Construction: LLE computes a set of weights for
each data point to express it as a linear combination of its
neighbors. These weights are determined in such a way that the
reconstruction error is minimized. Linear regression is often used
to find these weights.
• Global Structure Preservation: After constructing the weight
matrix, LLE aims to find a lower-dimensional representation of
the data that best preserves the local linear relationships. It
does this by seeking a set of coordinates in the lower-
dimensional space for each data point that minimizes a cost
function. This cost function evaluates how well each data point
can be represented by its neighbors.
• Output Embedding: Once the optimization process is complete,
LLE provides the final lower-dimensional representation of the
data. This representation captures the essential structure of the
data while reducing its dimensionality.
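To make the weight-construction step concrete, the sketch below (a hypothetical helper, not part of the notes) solves the constrained least-squares problem for a single point: find the weights that best reconstruct x from its k nearest neighbors, subject to the weights summing to one.

import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """Reconstruction weights of point x from its k nearest neighbors (rows of `neighbors`)."""
    Z = neighbors - x                                 # shift the neighbors so x sits at the origin
    G = Z @ Z.T                                       # local Gram matrix, shape (k, k)
    G += reg * np.trace(G) * np.eye(len(neighbors))   # small regularization for numerical stability
    w = np.linalg.solve(G, np.ones(len(neighbors)))   # solve G w = 1
    return w / w.sum()                                # enforce the sum-to-one constraint

# Small usage example with random data
rng = np.random.default_rng(0)
x = rng.normal(size=3)
nbrs = x + 0.1 * rng.normal(size=(5, 3))              # 5 nearby points in 3-D
w = lle_weights(x, nbrs)
print(w, w.sum())                                     # the weights sum to 1
print(np.linalg.norm(x - w @ nbrs))                   # small reconstruction error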
import matplotlib.pyplot as plt
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Generate a synthetic dataset resembling a Swiss Roll using the
# make_swiss_roll function from scikit-learn.
# n_samples specifies the number of data points to generate (value assumed here).
# n_neighbors defines the number of neighbors used in the LLE algorithm.
n_samples = 1000
n_neighbors = 10
X, _ = make_swiss_roll(n_samples=n_samples)

# Applying Locally Linear Embedding (LLE)
lle = LocallyLinearEmbedding(n_neighbors=n_neighbors, n_components=2)
X_reduced = lle.fit_transform(X)

# Plot the original data (first two features) and the 2-D embedding
plt.figure(figsize=(12, 6))

plt.subplot(121)
plt.title("Original Data")
plt.scatter(X[:, 0], X[:, 1])
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")

plt.subplot(122)
plt.title("LLE Reduced Data")
plt.scatter(X_reduced[:, 0], X_reduced[:, 1])
plt.xlabel("Component 1")
plt.ylabel("Component 2")

plt.tight_layout()
plt.show()
Isomap
An understanding and representation of complicated data structures is crucial in machine learning. To achieve this, manifold learning, a subset of unsupervised learning, has a significant role to play. Among the manifold learning techniques, ISOMAP (Isometric Mapping) stands out for its ability to capture the intrinsic geometry of high-dimensional data. It has proved particularly effective in situations where linear methods fall short.
ISOMAP is a flexible tool that blends manifold learning and dimensionality reduction to obtain a more detailed picture of the underlying structure of data. This section looks at ISOMAP's inner workings and sheds light on its parameters, functions, and implementation with scikit-learn.
Isometric mapping is a non-linear approach to reducing the dimensionality of data in machine learning.
Relation between Geodesic Distances and Euclidean Distances
Understanding the distinction between geodesic and Euclidean distances is of vital importance for ISOMAP. The geodesic distance is the length of the shortest path along the curved surface of the manifold, whereas the Euclidean distance is the straight-line distance measured in the input space. ISOMAP exploits geodesic distances in order to provide a more precise representation of the data's intrinsic structure.
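The difference can be checked numerically with a small sketch (the Swiss Roll dataset and the chosen points are assumptions for illustration): the straight-line Euclidean distance between two points is compared with a geodesic distance approximated by shortest paths on a k-nearest-neighbor graph, which is essentially what ISOMAP does internally.

import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

X, _ = make_swiss_roll(n_samples=500, random_state=0)
i, j = 0, 250                                    # two arbitrary points on the roll

# Euclidean distance: straight line through the ambient input space
euclidean = np.linalg.norm(X[i] - X[j])

# Geodesic distance: shortest path along a k-NN graph that follows the manifold
graph = kneighbors_graph(X, n_neighbors=10, mode='distance')
geodesic = shortest_path(graph, directed=False)[i, j]

print(euclidean, geodesic)   # the graph-based geodesic is never shorter than the straight line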
ISOMAP Parameters
ISOMAP comes with several parameters, each influencing the
dimensionality reduction process:
• n_neighbors: Determines the number of neighbors used to approximate geodesic distances. Higher values capture more of the global structure but require more computing power.
• n_components: Determines the number of dimensions of the low-dimensional representation.
• eigen_solver: Determines the method used for the eigenvalue decomposition. Options include "auto", "arpack" and "dense".
# Apply Isomap
isomap = Isomap(n_neighbors=10, n_components=2)
X_isomap = isomap.fit_transform(X)
plt.show()
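The snippet above assumes that the dataset X, the imports, and the plotting calls were defined in the surrounding steps; a self-contained sketch of the same idea (the Swiss Roll dataset is an assumed example) might look like this:

import matplotlib.pyplot as plt
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Assumed example data: a 3-D Swiss Roll, colored by position along the roll
X, color = make_swiss_roll(n_samples=1000, random_state=0)

# Apply Isomap
isomap = Isomap(n_neighbors=10, n_components=2)
X_isomap = isomap.fit_transform(X)

# Plot the unrolled 2-D embedding
plt.scatter(X_isomap[:, 0], X_isomap[:, 1], c=color, cmap='viridis')
plt.title("Isomap embedding of the Swiss Roll")
plt.xlabel("Component 1")
plt.ylabel("Component 2")
plt.show()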
Disadvantages
• Computational cost: for large datasets, computing geodesic distances with a shortest-path algorithm such as Floyd–Warshall can be computationally expensive and lead to long run times.
• Sensitive to parameter settings: an incorrect choice of parameters (such as n_neighbors) may produce a distorted or misleading embedding.
• Manifolds with holes or topological complexity: Isomap does not perform well on manifolds that contain holes or other topological complexity, which may lead to inaccurate representations.
Applications of Isomap
• Visualization: High-dimensional data like face images can be
visualized in a lower-dimensional space, enabling easier
exploration and understanding.
• Data exploration: Isomap can help identify clusters and patterns
within the data that are not readily apparent in the original high-
dimensional space.
• Anomaly detection: Outliers that deviate significantly from the
underlying manifold can be identified using Isomap.
• Machine learning tasks: Isomap can be used as a pre-
processing step for other machine learning tasks, such as
classification and clustering, by improving the performance and
interpretability of the models.
Least Square Method
The formulas used to calculate the slope (m) and intercept (c) of the line of best fit are derived from the following equations:
1. Slope (m) Formula: m = [n(∑xy) − (∑x)(∑y)] / [n(∑x²) − (∑x)²]
2. Intercept (c) Formula: c = [(∑y) − m(∑x)] / n
Where:
• n is the number of data points,
• ∑xy is the sum of the product of each pair of x and y values,
• ∑x is the sum of all x values,
• ∑y is the sum of all y values,
• ∑x² is the sum of the squares of the x values.
The red points in the above plot represent the data points for the sample
data available. Independent variables are plotted as x-coordinates and
dependent ones are plotted as y-coordinates. The equation of the line of
best fit obtained from the Least Square method is plotted as the red line
in the graph.
The above graph shows how the Least Square method helps us find a line that best fits the given data points, which can then be used to make predictions about the value of the dependent variable where it is not known initially.
Limitations of the Least Square Method
The Least Square method assumes that the data is evenly distributed and
doesn’t contain any outliers for deriving a line of best fit. But, this method
doesn’t provide accurate results for unevenly distributed data or for data
containing outliers.
Least Square Method Solved Examples
Problem 1: Find the line of best fit for the following data points using
the Least Square method: (x,y) = (1,3), (2,4), (4,8), (6,10), (8,15).
Solution:
Here, we have x as the independent variable and y as the dependent
variable. First, we calculate the means of x and y values denoted by X
and Y respectively.
X = (1+2+4+6+8)/5 = 4.2
Y = (3+4+8+10+15)/5 = 8
xi    yi    X − xi    Y − yi    (X − xi)(Y − yi)    (X − xi)²
1     3     3.2       5         16                  10.24
2     4     2.2       4         8.8                 4.84
4     8     0.2       0         0                   0.04
6     10    −1.8      −2        3.6                 3.24
8     15    −3.8      −7        26.6                14.44
Sum                             55                  32.8
The slope of the line of best fit can be calculated from the formula as
follows:
m = (Σ (X – xi)*(Y – yi)) /Σ(X – xi)2
m = 55/32.8 ≈ 1.68 (rounded to 2 decimal places)
Now, the intercept will be calculated from the formula as follows:
c = Y – mX
c = 8 – 1.68*4.2 = 0.94
Thus, the equation of the line of best fit becomes, y = 1.68x + 0.94.
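As a quick cross-check of the worked example (a sketch using NumPy, which the notes do not require), the slope and intercept can also be computed directly from the summation formulas:

import numpy as np

x = np.array([1, 2, 4, 6, 8], dtype=float)
y = np.array([3, 4, 8, 10, 15], dtype=float)
n = len(x)

# Slope and intercept from the summation formulas given earlier
m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
c = (np.sum(y) - m * np.sum(x)) / n
print(m, c)   # ≈ 1.68 and 0.96; the 0.94 above comes from rounding m to 1.68 before computing c

# np.polyfit solves the same least-squares problem directly
print(np.polyfit(x, y, 1))   # [slope, intercept], matching m and c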
Genetic Algorithms
Fitness Score
A Fitness Score is given to each individual which shows the ability of the individual to "compete". Individuals having an optimal (or near-optimal) fitness score are sought.
The GA maintains a population of n individuals (chromosomes/solutions) along with their fitness scores. Individuals with better fitness scores are given a greater chance to reproduce than others: they are selected to mate and produce better offspring by combining the chromosomes of the parents. Since the population size is static, room has to be created for new arrivals, so some individuals die and are replaced by newcomers, eventually creating a new generation once all the mating opportunities of the old population are exhausted. It is hoped that over successive generations better solutions will arrive while the least fit die out.
Each new generation has, on average, more "good genes" than the individuals of the previous generations, and thus better "partial solutions". Once the offspring produced show no significant difference from the offspring produced by previous populations, the population has converged, and the algorithm is said to have converged to a set of solutions for the problem.
Operators of Genetic Algorithms
Once the initial generation is created, the algorithm evolves the generation using the following operators:
1) Selection Operator: The idea is to give preference to the individuals
with good fitness scores and allow them to pass their genes to
successive generations.
2) Crossover Operator: This represents mating between individuals. Two individuals are selected using the selection operator, and crossover sites are chosen randomly. The genes at these crossover sites are then exchanged, creating two new offspring.
3) Mutation Operator: The key idea is to insert random genes into the offspring to maintain diversity in the population and avoid premature convergence.
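Putting these operators together, a minimal sketch of a genetic algorithm is shown below; the bit-string problem (maximize the number of 1s), the population size, and the mutation rate are illustrative assumptions, not part of the notes.

import random

CHROMOSOME_LEN = 20   # length of each bit-string chromosome (assumed)
POP_SIZE = 30         # static population size (assumed)
MUTATION_RATE = 0.01  # probability of flipping each gene (assumed)
GENERATIONS = 50

def fitness(chromosome):
    # Fitness score: here, simply the number of 1s in the chromosome
    return sum(chromosome)

def select(population):
    # Selection operator: tournament of two, the fitter individual wins
    a, b = random.sample(population, 2)
    return a if fitness(a) > fitness(b) else b

def crossover(parent1, parent2):
    # Crossover operator: single-point crossover at a random site
    site = random.randint(1, CHROMOSOME_LEN - 1)
    return parent1[:site] + parent2[site:]

def mutate(chromosome):
    # Mutation operator: flip each gene with a small probability
    return [1 - g if random.random() < MUTATION_RATE else g for g in chromosome]

# Initial random population
population = [[random.randint(0, 1) for _ in range(CHROMOSOME_LEN)]
              for _ in range(POP_SIZE)]

for generation in range(GENERATIONS):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
print("best fitness:", fitness(best), "chromosome:", best)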
Genetic algorithms are applied in a wide range of areas, for example:
• Code breaking
• Filtering and signal processing
• Learning fuzzy rule bases, etc.