CE880_Lecture4_slides
Haider Raza
Tuesday, 7th February 2023
About Myself
Dimensionality Reduction
Principal component analysis (PCA)
PCA is a technique used to emphasize variation and bring out strong patterns in a
dataset. It’s often used to make data easier to explore and visualize.
* It is an orthogonal transformation that converts a set of correlated variables into a
set of linearly uncorrelated variables called principal components (PCs), with the goal
of finding the best summary of the data using a limited number of PCs.
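To make the "orthogonal transformation" concrete, here is a minimal sketch using NumPy. The data are synthetic and the variable names are illustrative assumptions, not taken from the lecture. It centres two correlated variables, eigen-decomposes their covariance matrix, and checks that the projected variables are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two correlated variables (synthetic data, for illustration only)
x = rng.normal(size=500)
y = 0.8 * x + 0.3 * rng.normal(size=500)
X = np.column_stack([x, y])

# Centre the data, then eigen-decompose the covariance matrix
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order

# Sort the components by decreasing variance
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the centred data onto the principal components
Z = Xc @ eigvecs
cov_Z = np.cov(Z, rowvar=False)

print("Covariance of the original variables:\n", np.round(cov, 4))
print("Covariance of the principal components:\n", np.round(cov_Z, 4))
```

The off-diagonal entries of the second matrix should be numerically zero, and its diagonal holds the variance captured by each PC.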
PCA: 2D Example
First, consider a dataset in only two dimensions, such as height and weight. This
dataset can be plotted as points in a plane. But if we want to tease out variation,
PCA finds a new coordinate system in which every point has a new (x, y) value. The
axes don’t actually mean anything physical; they’re combinations of height and weight
called "principal components", chosen so that one axis captures as much of the
variation as possible.
PCA: 2D Example
PCA: 3D Example
With three dimensions, PCA is more useful, because it’s hard to see through a cloud
of data. In the example below, the original data are plotted in 3D, but you can project
the data into 2D through a transformation no different from finding a camera angle:
rotate the axes to find the best angle. The PCA transformation ensures that the
horizontal axis PC1 has the most variation, the vertical axis PC2 the second-most, and
the third axis PC3 the least. PC3, carrying the least variation, is the one we drop.
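The 3D-to-2D projection can be sketched with scikit-learn's `PCA`. The point cloud below is synthetic (an assumption for illustration; it is not the data from the slide): it is wide along two directions, thin along a third, and then randomly rotated so the structure is not aligned with the original axes.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Synthetic 3D cloud: large spread along two latent directions, small along the third
latent = rng.normal(size=(300, 3)) * np.array([5.0, 2.0, 0.5])

# Apply a random rotation so the cloud is not aligned with the x, y, z axes
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
X3d = latent @ Q.T

# Fit PCA with all three components to inspect how variance is distributed
pca = PCA(n_components=3).fit(X3d)
ratios = pca.explained_variance_ratio_
print("Explained variance ratio per PC:", np.round(ratios, 3))

# Keep PC1 and PC2, drop PC3: this is the 2D "camera angle"
X2d = pca.transform(X3d)[:, :2]
print("Projected shape:", X2d.shape)
```

scikit-learn orders the components by decreasing explained variance, so dropping the last column discards the direction with the least variation.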
PCA: Real Example
What if our data have far more than 3 dimensions? Say, 17 dimensions? The table
shows the average consumption of 17 types of food, in grams per person per week, for
every country in the UK.
The table shows some interesting variations across different food types, but the overall
differences aren’t very notable. Let’s see if PCA can eliminate dimensions to emphasize
how the countries differ.
PCA: Real Example in 1D
Here’s the plot of the data along the first principal component. Already we can see
something is different about Northern Ireland.
PCA: Real Example in 2D
Now, plotting the first and second principal components, we see that Northern Ireland
is a major outlier. Going back to the data in the table, this makes sense: the
Northern Irish eat far more grams of fresh potatoes and far fewer of fresh fruits,
cheese, fish and alcoholic drinks.
Using PCA
Applying PCA on half-moon data
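The code from this slide did not survive extraction, so the following is only a sketch of what applying PCA to half-moon data might look like. It assumes scikit-learn's `make_moons` generator; the sample size, noise level and random seed are illustrative choices. It also compares the correlation between the two coordinates before and after the transformation.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.decomposition import PCA

# Two interleaving half-moons (parameters are assumptions for illustration)
X_moons, y_moons = make_moons(n_samples=200, noise=0.05, random_state=0)

# Linear PCA keeping both components: a centring plus a rotation of the axes
X_moons_pca = PCA(n_components=2).fit_transform(X_moons)

# Correlation between the two coordinates before and after PCA
corr_before = np.corrcoef(X_moons, rowvar=False)[0, 1]
corr_after = np.corrcoef(X_moons_pca, rowvar=False)[0, 1]
print(f"Correlation before PCA: {corr_before:.3f}")
print(f"Correlation after PCA:  {corr_after:.3f}")
```

Because linear PCA with both components kept only rotates and centres the data, the two moons keep their shape: the new axes are uncorrelated, but the classes are no easier to separate with a straight line.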
Before and After PCA
Correlation plot
Kernel PCA
Let’s run Kernel PCA on the moon data
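As a rough sketch of this step (the original code is not in the extracted slides), scikit-learn's `KernelPCA` can be applied to the same kind of half-moon data. The RBF kernel and the value `gamma=15` are assumptions chosen for illustration; in practice `gamma` would need tuning.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA

# Same kind of half-moon data as before (parameters are illustrative assumptions)
X_moons, y_moons = make_moons(n_samples=200, noise=0.05, random_state=0)

# Kernel PCA with an RBF kernel; gamma=15 is an assumed value, not from the lecture
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15)
X_kpca = kpca.fit_transform(X_moons)

# Compare the mean position of each moon along the first kernel component
for label in (0, 1):
    mean_pc1 = X_kpca[y_moons == label, 0].mean()
    print(f"Class {label}: mean of first kernel PC = {mean_pc1:.3f}")
```

Unlike linear PCA, the kernel version implicitly maps the points into a higher-dimensional feature space before finding components, so the projected moons can take a different shape from the originals.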
Kernel PCA plot
What is Bias?
Many of you will have watched, or at least heard of, the popular Netflix series
‘The Queen’s Gambit’. The series vividly captures the struggles of women in
society and offers one of the best examples of gender bias.
What is Data Bias?
Data bias in machine learning is a type of error in which certain elements of a dataset
are more heavily weighted and/or represented than others. A biased dataset does not
accurately represent a model’s use case, resulting in skewed outcomes, low accuracy
levels, and analytical errors.
How serious are the implications of neglecting bias in the data?
As data scientists, we know that if our data sample does not represent the whole
population, our results will not generalize to that population. In other words, the
model will give inaccurate results, especially for the groups the sample
under-represents.
How can AI bias occur?
Let’s dive into real-world data
This data set is the result of a chemical analysis of wines grown in the same region in
Italy but derived from three different cultivars. The analysis determined the quantities
of 13 constituents found in each of the three types of wines.
https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data
‘Alcohol’, ‘Malic acid’, ‘Ash’, ‘Alcalinity of ash’, ‘Magnesium’, ‘Total phenols’,
‘Flavanoids’, ‘Nonflavanoid phenols’, ‘Proanthocyanins’, ‘Color intensity’, ‘Hue’,
‘OD280/OD315 of diluted wines’, ‘Proline’
Let’s cluster the wine dataset
Let’s cluster the wine dataset
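The clustering code on these slides was lost in extraction. One possible sketch uses k-means with three clusters, one per cultivar. Two assumptions to note: the data are loaded with scikit-learn's bundled `load_wine` (a copy of the UCI wine data) rather than from the URL, and k-means is an assumed choice of algorithm, since the slides do not say which method was used.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Bundled copy of the UCI wine data: 178 samples, 13 features, 3 cultivars
X_wine, y_wine = load_wine(return_X_y=True)

# Standardize first: the features are on very different scales (e.g. Proline vs Hue)
X_scaled = StandardScaler().fit_transform(X_wine)

# k-means with 3 clusters (assumed algorithm; one cluster per cultivar)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Cluster sizes; cluster ids are arbitrary and need not match the cultivar labels
print("Cluster sizes:", np.bincount(labels))
```

Standardizing matters here because k-means relies on Euclidean distance, which would otherwise be dominated by the features with the largest numeric range.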
Let’s do PCA on wine data to reduce dimensionality
We will plot the principal components on the x-axis and the explained variance ratio of
each component on the y-axis.
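The values behind such a plot can be computed as follows. This is a sketch under the assumption that the features are standardized before PCA and that the data come from scikit-learn's bundled `load_wine`; it prints the ratios rather than plotting them.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_wine, _ = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X_wine)

# Keep all 13 components so the full variance spectrum is visible
pca = PCA().fit(X_scaled)
ratios = pca.explained_variance_ratio_
cumulative = np.cumsum(ratios)

# One row per principal component: its own ratio and the running total
for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{i:>2}: {r:.3f} (cumulative {c:.3f})")
```

Passing `range(1, 14)` and `ratios` to a bar chart, with `cumulative` as a step line, would reproduce the kind of plot described above.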
Let’s do PCA ...
Let’s do PCA ...
Let’s train a Logistic Regression Model to classify the test data
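A minimal sketch of this step, assuming a 70/30 stratified train/test split and two principal components (both are illustrative choices, not confirmed by the slides). Scaling, PCA and the classifier are chained in a pipeline so that the scaler and PCA are fitted on the training data only.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_wine, y_wine = load_wine(return_X_y=True)

# Hold out 30% of the samples for testing (assumed split), stratified by cultivar
X_train, X_test, y_train, y_test = train_test_split(
    X_wine, y_wine, test_size=0.3, stratify=y_wine, random_state=0
)

# Scale -> reduce to 2 principal components -> classify
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy with 2 PCs: {accuracy:.3f}")
```

Fitting the scaler and PCA inside the pipeline avoids leaking information from the test set into the transformation.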
Explanation