
25.12.2024

BIM309 Artificial Intelligence

Week 13 – Machine Learning (ML) – Part III

Outline
§ Supervised Learning (cont.)
§ Perceptron Algorithm
§ Neural Networks
§ Perceptron to MLP
§ The XOR Example
§ Backpropagation Algorithm
§ Backpropagation Rules
§ Unsupervised Learning: Clustering
§ K-Means Clustering
§ Hierarchical Clustering



Biological Neurons

§ What are they?


§ Neurons are the fundamental units of our brain and nervous system. They
process and transmit information through electrical and chemical signals.
§ Structure: Each neuron has
§ Dendrites: Receive signals from other neurons
§ Cell body (soma): Processes the signals
§ Axon: Sends the processed signal to other neurons
§ Synapse: The gap where chemical signals are exchanged between neurons
§ Function: When a neuron receives enough input (a threshold), it "fires" a
signal to the next neuron.


A Simplified Mathematical
Model of a Biological Neuron
§ Perceptron is a simplified model of a biological neuron, designed to mimic how
neurons work in our brain, but mathematically.
§ Structure:
§ Inputs: Correspond to dendrites in biological neurons. These are the incoming signals (numbers) from other sources
§ Weights: Each input is multiplied by a weight that determines its importance
§ Summation: The weighted inputs are added together
§ Activation Function: A threshold is applied to the sum. If it exceeds the
threshold, the perceptron "fires" (outputs a signal like 1); otherwise, it doesn't
(outputs 0)
§ Function: A perceptron is used in AI for decision-making tasks like classification (e.g.,
determining if an email is spam or not).

Perceptron
§ Belongs to the Neural Networks (NN) class of algorithms (which try to mimic how the brain functions)
§ The first algorithm (and the simplest NN) used was the Perceptron (Rosenblatt 1959)
§ The idea is that, given some inputs, the neuron should produce some output
§ Worked extremely well to recognize:
1. Handwritten characters (LeCun et al. 1989)
2. Spoken words (Lang et al. 1990)
3. Faces (Cottrell 1990)
§ NNs were popular in the '90s but then lost some of their popularity
§ Now NNs are back with deep learning, thanks to algorithmic and computational progress


Perfectly Separable Data

The Perceptron Algorithm

§ Linear classification method


§ Simplest classification method
§ Simplest neural network
§ Works only on perfectly
separated data

Perceptron

(Diagram: the inputs and a bias term are combined in a weighted sum, which is passed through an activation function, here a step function, to produce the output.)

Given $n$ examples and $d$ features:

$$f(x_i) = \mathrm{sign}\Big(\sum_{j=0}^{d} w_j\, x_{ij}\Big)$$


Perceptron
§ Works perfectly …
§ if data is linearly separable
§ If not, it will not converge
§ Idea: Start with a random hyperplane and adjust it using your training data
by updating the weights
§ Iterative method
§ The Perceptron Convergence Theorem states that if the data is linearly separable, the algorithm converges to a separating hyperplane in a finite number of iterations


Perceptron

(Slide: the perceptron learning algorithm in pseudocode; the weights are adjusted whenever the output and the prediction do not match.)



Perceptron
Some observations:
§ The weights $w_1, \ldots, w_d$ determine the slope of the decision boundary
§ $w_0$ determines the offset of the decision boundary (sometimes noted $b$)
§ Line 6 (# adjust the weights) corresponds to:
§ Mistake on a positive example: add $x$ to the weight vector
§ Mistake on a negative example: subtract $x$ from the weight vector
§ Some other variants of the algorithm add or subtract 1
§ Convergence happens when the weights do not change anymore (the difference between the last two weight vectors is 0); see the sketch below
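A minimal sketch of this learning loop in Python (the names, the zero initialization, and the assumption that a bias feature x_0 = 1 is prepended to every example are illustrative choices, not the slide's exact pseudocode):

```python
import numpy as np

def perceptron_train(X, y, max_iters=1000):
    """X: (n, d+1) array whose first column is the bias feature x_0 = 1; y: labels in {-1, +1}."""
    w = np.zeros(X.shape[1])               # start from an arbitrary (here: zero) hyperplane
    for _ in range(max_iters):
        w_old = w.copy()
        for xi, yi in zip(X, y):
            if np.sign(w @ xi) != yi:      # prediction and label do not match
                w = w + yi * xi            # mistake on positive: add x; mistake on negative: subtract x
        if np.allclose(w, w_old):          # convergence: the weights did not change in a full pass
            break
    return w
```

On linearly separable data this loop stops after finitely many updates; otherwise it keeps cycling until max_iters.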


Perceptron: Example

(A worked example shown as a sequence of figures: the algorithm repeatedly updates the weights on misclassified points until it finally converges; the last figure checks the learned boundary with some test data.)

Perceptron
§ The weight $w_j$ determines the contribution of $x_j$ to the label
§ $-w_0$ is the quantity that $\sum_{j=1}^{d} w_j x_{ij}$ needs to exceed for the perceptron to output 1
§ Can be used to represent many Boolean functions:
§ AND, OR, NAND, NOR, NOT
§ but not all of them (e.g., XOR)



Boolean Function “OR” Using Perceptron


Choice of the Hyperplane


13
25.12.2024

Summary: Perceptron Expressiveness


§ Problem: The step function is not differentiable, so we cannot find a closed-form solution or apply gradient descent
§ Instead, use the iterative perceptron learning algorithm → start with an arbitrary hyperplane and adjust it using the training data
§ The perceptron produces a linear separator
§ It can only learn linearly separable patterns
§ It can represent Boolean functions like AND, OR, NAND, NOR, NOT, but not the XOR function


Outline
§ Supervised Learning (cont.)
§ Perceptron Algorithm
§ Neural Networks
§ Perceptron to MLP
§ The XOR Example
§ Backpropagation Algorithm
§ Backpropagation Rules
§ Unsupervised Learning: Clustering
§ K-Means Clustering
§ Hierarchical Clustering



From Perceptron to NN
§ The perceptron works perfectly
§ if data is linearly separable
§ if not, it will not converge
§ Neural networks use the ability of perceptrons to represent elementary functions and combine them in a network of layers of such elementary units
§ However, a cascade of linear functions is still linear!
§ And we want networks that represent highly non-linear functions


From Perceptron to NN
§ Also, the perceptron used a threshold function (step function), which is not differentiable and hence not suitable for gradient descent when the data is not linearly separable
§ We want an activation function whose output is a smooth, differentiable function of the inputs
§ One possibility is to use the sigmoid function:

$$g(z) = \frac{e^z}{1 + e^z} = \frac{1}{1 + e^{-z}}$$

$g(z) \to 1$ when $z \to +\infty$, and $g(z) \to 0$ when $z \to -\infty$.


Perceptron with Sigmoid

(Diagram: a weighted sum followed by a sigmoid activation function.)

Given $n$ examples and $d$ features, for an example $x_i$ (the $i$-th row of the matrix of examples):

$$f(x_i) = \frac{1}{1 + e^{-\sum_{j=0}^{d} w_j x_{ij}}}$$
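A small illustrative sketch of such a sigmoid unit (the helper names are assumptions; they are reused in the XOR snippets below):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)); tends to 1 as z -> +inf and to 0 as z -> -inf."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_neuron(x, w):
    """x: features with the bias feature x_0 = 1 prepended; w: weights including w_0."""
    return sigmoid(w @ x)
```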

The XOR Example


Let's try to create a NN for the XOR function using elementary perceptrons (AND, OR, NAND, NOR, NOT).

First observe:

Let’s consider that:



The XOR Example


First, what is the perceptron for OR?

Fix the bias weight $w_0 = -10$, and take $z \ge 10$ as "confidently on" and $z \le -10$ as "confidently off" for the sigmoid. For input $(x_1, x_2) = (1, 0)$ we want $g(w_0 + w_1) \approx 1$, so $w_0 + w_1 = 10$, i.e. $-10 + w_1 = 10$, giving $w_1 = 20$. Similarly, for $(0, 1)$ we want $g(w_0 + w_2) \approx 1$, so $-10 + w_2 = 10$, giving $w_2 = 20$.

x1   x2   x1 OR x2   z = -10 + 20*x1 + 20*x2
0    0       0            -10
0    1       1             10
1    0       1             10
1    1       1             30

$$x_1\ \mathbf{OR}\ x_2 \approx g(-10 + 20x_1 + 20x_2) = \frac{1}{1 + e^{-(-10 + 20x_1 + 20x_2)}}$$
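These weights can be checked numerically; a quick sketch reusing the sigmoid helper above:

```python
def or_gate(x1, x2):
    return sigmoid(-10 + 20 * x1 + 20 * x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(float(or_gate(x1, x2))))   # 0 0 0, 0 1 1, 1 0 1, 1 1 1
```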

The XOR Example


Similarly, we obtain the perceptrons for the AND and NAND.

Note how the weights of the NAND are the negated weights of the AND, since
$(x_1\ \mathbf{NAND}\ x_2) = \mathbf{NOT}\,(x_1\ \mathbf{AND}\ x_2)$



The XOR Example


Let's try to create a NN for the XOR function using elementary perceptrons.
Idea: model non-linearly separable patterns (like XOR) by connecting multiple units in a network.
We can compute $(x_1\ \mathbf{XOR}\ x_2)$ as $((x_1\ \mathbf{OR}\ x_2)\ \mathbf{AND}\ (x_1\ \mathbf{NAND}\ x_2))$:

x1   x2   x1 OR x2   x1 AND x2   x1 NAND x2   (OR) AND (NAND)   x1 XOR x2
0    0       0           0            1              0              0
0    1       1           0            1              1              1
1    0       1           0            1              1              1
1    1       1           1            0              0              0


The XOR Example


Let's put them together: the example's features $x_1, x_2$ form the input layer, a hidden layer computes the OR and NAND units, and the output layer computes the AND of those two, producing the label.

XOR as a combination of 3 basic perceptrons:

$$x_1\ \mathbf{XOR}\ x_2 \approx g\big(-30 + 20\, g(-10 + 20x_1 + 20x_2) + 20\, g(30 - 20x_1 - 20x_2)\big)$$
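In code, the same composition looks like this (a sketch reusing the sigmoid helper; the weights are the ones derived above):

```python
def xor_gate(x1, x2):
    h_or   = sigmoid(-10 + 20 * x1 + 20 * x2)      # hidden unit 1: x1 OR x2
    h_nand = sigmoid( 30 - 20 * x1 - 20 * x2)      # hidden unit 2: x1 NAND x2
    return sigmoid(-30 + 20 * h_or + 20 * h_nand)  # output unit: (OR) AND (NAND) = XOR

# xor_gate(0, 0) ~ 0, xor_gate(0, 1) ~ 1, xor_gate(1, 0) ~ 1, xor_gate(1, 1) ~ 0
```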



Multi-Layer Neural Networks

Basic idea: represent any (non-linear) function as a composition of soft-threshold functions. This is a form of non-linear regression.


Backpropagation Algorithm
§ Note: "Feedforward NNs" (as opposed to "Recurrent Networks") have no connections that loop
§ Backpropagation stands for "backward propagation of errors"
§ Learn the weights for a multilayer network
§ Given a network with a fixed architecture (neurons and inter-connections)
§ Use gradient descent to minimize the squared error between the network output values $o$ and the ground truth $y$
§ We assume multiple outputs, indexed by $k$
§ Challenge: Search in all possible weight values for all neurons in the network
to find the best weighting of this architecture that would lead us to minimize
the squared error of the training data



Feedforward-Backpropagation

(Diagram: a feedforward network in which $w_{ij}$ is the weight on the connection from neuron $i$ to neuron $j$; a training example $e = (x, y)$ consists of a feature vector $x$ and a label $y$, and the network produces the output $o = f(x)$.)


Backpropagation Rules
§ We consider 𝒌 outputs
§ For an example 𝑒 defined by (𝑥, 𝑦), the error on training example 𝑒,
summed over all output neurons in the network is:
$$E_e(w) = \frac{1}{2}\sum_{k} \big(y_k - o_k\big)^2$$

§ Gradient descent iterates through all the training examples one at a time, descending the gradient of the error:

$$\Delta w_{ij} = -\alpha\, \frac{\partial E_e(w)}{\partial w_{ij}}$$



Backpropagation Rules

Applying the chain rule to the gradient-descent update $\Delta w_{ij} = -\alpha\, \partial E_e(w) / \partial w_{ij}$, with $z_j$ the weighted input of neuron $j$:

$$\frac{\partial E_e}{\partial w_{ij}} = \frac{\partial E_e}{\partial z_j}\,\frac{\partial z_j}{\partial w_{ij}} = \frac{\partial E_e}{\partial z_j}\, x_{ij}
\qquad\Rightarrow\qquad
\Delta w_{ij} = -\alpha\, \frac{\partial E_e}{\partial z_j}\, x_{ij}$$

We consider two cases in calculating $\partial E_e / \partial z_j$ (let's drop the index $e$):

§ Case 1: Neuron 𝑗 is an output neuron


§ Case 2: Neuron 𝑗 is a hidden neuron


Backpropagation Rules

Derivative of the Sigmoid Function: $g'(z) = g(z)\,\big(1 - g(z)\big)$



Backpropagation Rules

$$\Delta w_{ij} = -\alpha\, \frac{\partial E_e}{\partial z_j}\, x_{ij}$$


Backpropagation Rules



Backpropagation Algorithm
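The slide shows the full pseudocode as an image; below is a minimal illustrative sketch of stochastic gradient descent with backpropagation for a 2-2-1 network with sigmoid units, trained on the XOR data (the architecture, learning rate, and variable names are assumptions made for this example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)            # XOR targets

W1 = rng.normal(size=(2, 2)); b1 = np.zeros(2)             # input -> hidden weights
W2 = rng.normal(size=(2, 1)); b2 = np.zeros(1)             # hidden -> output weights
alpha = 0.5                                                # learning rate

for epoch in range(20000):
    for x, y in zip(X, Y):
        # forward pass
        h = sigmoid(x @ W1 + b1)                           # hidden activations
        o = sigmoid(h @ W2 + b2)                           # network output
        # backward pass: the two delta cases from the rules above
        delta_o = (o - y) * o * (1 - o)                    # output neuron: dE/dz = (o - y) g'(z)
        delta_h = (delta_o @ W2.T) * h * (1 - h)           # hidden neuron: downstream deltas times g'(z)
        # gradient-descent updates, Delta w = -alpha * (dE/dz) * x
        W2 -= alpha * np.outer(h, delta_o); b2 -= alpha * delta_o
        W1 -= alpha * np.outer(x, delta_h); b1 -= alpha * delta_h

# After training, sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) is usually close to the XOR targets;
# small networks like this can occasionally get stuck in a poor local minimum.
```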


Observations
§ Convergence: stop when the changes in the weights become small
§ There are other activation functions → the hyperbolic tangent function, for example, is often better in practice for NNs because its outputs range from -1 to 1



Multi-Class Case

§ Nowadays, networks with more than two layers, a.k.a. deep networks, have proven to be very effective in many domains
§ Examples of deep networks: restricted Boltzmann machines, convolutional NNs, autoencoders, etc.


MNIST Database
http://yann.lecun.com/exdb/mnist/

§ The MNIST database of handwritten digits


§ Training set of 60,000 examples, test set of 10,000 examples
§ Vectors in ℝ^784 (28×28 images)
§ Labels are the digits they represent
§ Various methods have been tested with this training set and test set
§ Linear models: 7% - 12% error
§ KNN: 0.5%- 5% error
§ Neural networks: 0.35% - 4.7% error
§ Convolutional NN: 0.23% - 1.7% error
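For instance, a small fully connected network can be trained on MNIST with scikit-learn in a few lines (a sketch; the "mnist_784" OpenML copy of the data, the random train/test split, and the hyperparameters are illustrative choices):

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0                                              # scale pixels to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10000, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=20, random_state=0)
clf.fit(X_train, y_train)
print("test error:", 1 - clf.score(X_test, y_test))        # typically a few percent
```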



Tensorflow
http://playground.tensorflow.org/

§ Open source software to play with neural networks in your browser


§ The dots are colored orange or blue for positive and negative examples
§ It's possible to choose the activation function, architecture, learning rate, etc.
§ Very well done!


ANN Demo Samples


§ https://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html
§ https://machinelearningknowledge.ai/animated-explanation-of-feed-forward-neural-network-architecture/
§ http://tamaszilagyi.com/blog/2017/2017-11-11-animated_net/
§ https://towardsdatascience.com/understanding-backpropagation-abcc509ca9d0
§ https://teachablemachine.withgoogle.com/train (transfer learning)



https://www.youtube.com/watch?v=aircAruvnKk

Machine Learning Systems: Principles and Practices of Engineering Artificially Intelligent Systems (https://mlsysbook.ai/)


Outline
§ Supervised Learning (cont.)
§ Perceptron Algorithm
§ Neural Networks
§ Perceptron to MLP
§ The XOR Example
§ Backpropagation Algorithm
§ Backpropagation Rules
§ Unsupervised Learning: Clustering
§ K-Means Clustering
§ Hierarchical Clustering



Review: Supervised Learning: Prediction Task is Classification
§ Given a set of labeled training examples
§ The learning algorithm produces a classifier that can classify new points
§ Learning algorithms only depend on the feature vectors

Labeled data is expensive to obtain


Unsupervised Learning: Clustering
§ Given unlabeled training examples
§ The goal is to assign each point to a cluster, e.g., two clusters: 1 (blue) and 2 (orange)
§ Intuition: we want to assign nearby points to the same cluster

Unlabeled data is very cheap to obtain



Unsupervised Learning
Training data: "examples" $x_1, \ldots, x_n$, with $x_i \in X \subset \mathbb{R}^d$

§ Clustering/segmentation: $f : \mathbb{R}^d \to \{C_1, \ldots, C_k\}$ (a set of clusters)

Example: find clusters in a population, or among fruit species


Unsupervised Learning
(Scatter plot: fruit examples plotted by length and weight.)



Unsupervised Learning

(Same fruit scatter plot, now grouped into clusters.)

Methods: K-means, Gaussian mixtures, hierarchical clustering, spectral clustering, etc.


K-Means: Example

(A sequence of figures stepping through the k-means iterations on a small example data set.)

Clustering: K-Means

§ Goal: assign each example $x_1, \ldots, x_n$ to one of the $k$ clusters $C_1, \ldots, C_k$

§ $\mu_j$ is the mean of all examples in the $j$-th cluster

§ Minimize the sum of squared distances of the examples to their cluster means:

$$\sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2$$



Clustering: K-Means
Algorithm K-Means
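The algorithm itself is shown on the slide as an image; below is a compact sketch of the standard (Lloyd-style) iteration, with illustrative names:

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]     # k random examples as initial means
    for _ in range(max_iters):
        # assignment step: each example goes to its nearest cluster mean
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each mean mu_j becomes the mean of the examples assigned to cluster j
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):                   # stop when the means no longer move
            break
        centers = new_centers
    return labels, centers
```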


K-Means: Pros and Cons


Pros
+ Easy to implement

BUT...

Cons
- Need to know K
- Suffer from the curse of dimensionality
- No theoretical foundation



K-Means: Questions

1. How to set 𝒌 to optimally cluster the data?

2. How to evaluate your model?

3. How to cluster non-circular shapes?


K-Means: Question-1
How to set 𝒌 to optimally cluster the data?
G-means algorithm (Hamerly and Elkan, NIPS 2003)
1. Initialize 𝑘 to be a small number
2. Run k-means with those cluster centers, and store the resulting centers as C
3. Assign each point to its nearest cluster
4. Determine whether the points in each cluster fit a Gaussian distribution (Anderson-Darling test)
5. For each cluster
§ If the points seem to be normally distributed, keep the cluster center
§ Otherwise, replace it with two cluster centers
6. Repeat from step 2 until no more cluster centers are created (a simplified sketch of the Gaussianity check is given below)
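A simplified sketch of the per-cluster Gaussianity check in step 4, using the kmeans helper sketched earlier and SciPy's Anderson-Darling test (the projection onto the direction between the two child centers follows the G-means idea; thresholds and names are illustrative):

```python
import numpy as np
from scipy.stats import anderson

def looks_gaussian(points):
    """Return True if the cluster's 1-D projection passes the Anderson-Darling normality test."""
    if len(points) < 8:
        return True                                   # too few points to split meaningfully
    _, child_centers = kmeans(points, k=2)            # tentative split into two children
    v = child_centers[1] - child_centers[0]           # direction separating the two child centers
    proj = points @ v / (np.linalg.norm(v) ** 2 + 1e-12)
    result = anderson(proj)                           # test statistic plus critical values
    return result.statistic < result.critical_values[-1]   # compare at the strictest (1%) level
```

G-means keeps a center when looks_gaussian(...) is true and otherwise replaces it with the two child centers, then repeats from step 2.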


K-Means: Question-2
How to evaluate your model?
§ Not trivial (as compared to counting the number of errors in classification)
§ Internal evaluation: using same data, high intra-cluster similarity
(documents within a cluster are similar) and low inter-cluster similarity
§ E.g., the Davies-Bouldin index takes into account both the distances within clusters and the distances between clusters; the lower the value of the index, the wider the separation between different clusters and the more tightly the points within each cluster are grouped
§ External evaluation: uses ground truth or external data
§ E.g., mutual information, entropy, adjusted Rand index, etc.
https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html
https://towardsdatascience.com/evaluation-metrics-for-clustering-models-5dde821dd6cd
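scikit-learn exposes several of these measures; an illustrative snippet, where X is a feature matrix and labels_true would only be available for external evaluation:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score, adjusted_rand_score

labels_pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print(davies_bouldin_score(X, labels_pred))              # internal: lower is better
print(silhouette_score(X, labels_pred))                  # internal: closer to 1 is better
print(adjusted_rand_score(labels_true, labels_pred))     # external: needs ground-truth labels
```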


K-Means: Question-3
How to cluster non-circular shapes?
§ There are other methods that handle other shapes
§ Spectral Clustering
§ DBSCAN
§ BIRCH, etc.



K-Means Clustering

(A worked example shown as a sequence of figures: "How do we teach these to the software? Algorithm??" Points are assigned to the nearest centroid, then the centroids of the clusters are re-calculated, and the two steps repeat. "When will the iteration end???" When the assignments, and hence the centroids, stop changing; the final figure shows the resulting optimal solution.)


Hierarchical Clustering

We don't want to specify the number of clusters, but we still want to group the houses…
There is a different way to group them

Hierarchical Clustering

Grouping Rule: If two houses are close to each other, they order from the same pizza
restaurant!!
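In code, this bottom-up merging of nearby points and then of the closest groups can be done with SciPy; a sketch in which house_coords is a placeholder array of house positions and the 0.25 distance cut-off is arbitrary:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

house_coords = np.random.default_rng(0).random((20, 2))    # placeholder (x, y) positions of houses

Z = linkage(house_coords, method="single")                 # repeatedly merge the two closest groups
labels = fcluster(Z, t=0.25, criterion="distance")         # cut the merge tree at a chosen distance
print(labels)                                              # cluster id per house; no k given up front
```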


Hierarchical Clustering

(A sequence of figures: nearby houses are grouped first, and then the closest groups are repeatedly merged, "Merge the groups…", step by step until the grouping is complete.)

Clustering Applications – Example 1


Market Segmentation

§ Goal: subdivide a market into distinct subsets of customers where any subset
may conceivably be selected as a market target to be reached with a distinct
marketing mix.

§ Approach:
§ Collect different attributes of customers based on their geographical and
lifestyle related information
§ Find clusters of similar customers
§ Measure the clustering quality by observing buying patterns of customers in
same cluster vs. those from different clusters



Clustering Applications – Example 2


Document Clustering

§ Goal: To find groups of documents that are similar to each other based on the
important terms appearing in them.

§ Approach: To identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster.

§ Gain: Information Retrieval can utilize the clusters to relate a new document or
search term to clustered documents.
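A common concrete realization of this approach combines TF-IDF term weights with k-means (a sketch; docs is a placeholder list of document strings and the number of clusters is arbitrary):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["the cat sat on the mat", "dogs and cats", "stocks fell sharply", "bonds and shares rallied"]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)    # term-frequency based features
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tfidf)
print(labels)                                                        # cluster id per document
```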


Association Rule Discovery


Given a set of records, each of which contains some number of items from a collection:

✓ Produce dependency rules that will predict the occurrence of an item based on occurrences of other items

Example: supermarket shelf management



Anomaly Detection
§ Detect significant deviations from normal behavior

§ Applications:
§ Credit Card Fraud Detection
§ Network Intrusion Detection


