0% found this document useful (0 votes)

8 views6 pages

MLP RL1

Reinforcement learning

Uploaded by

sathiyavathy ponnusamy

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

8 views6 pages

MLP RL1

Reinforcement learning

Uploaded by

sathiyavathy ponnusamy

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 6

lOMoARcPSD|53150105

Machine Learning a Probabilistic Perspective Notes

Machine Shop Trigonometry (Portland Community College)

Scan to open on Studocu

Studocu is not sponsored or endorsed by any college or university

Downloaded by sathiyavathy ponnusamy (sathiyavathy.ponnusamy@gmail.com)
lOMoARcPSD|53150105

Machine Learning: A Probabilistic Perspective

Everything after Chapter 4 got advanced for me LOL so this is unfinished for the
most part

Chapter 1 - Introduction
● Machine learning is a set of methods that can automatically detect patterns in data and
then use those uncovered patterns to predict future data or perform other kinds of
decision making.
○ Best way to solve such problems is to use probability theory.
● Predictive of supervised learning is the goal of learning a mapping from inputs x to
outputs y given a set of input and output pairs, called the training set.
● Each training input x is a D dimensional vector, where each number in that vector is
called a feature or attribute.
● The outputs y can be a categorial variable from some finite set of classes. This is
classification.
● When the outputs y are real scalar values, this is regression.
● The goal of unsupervised learning is to find interesting patterns in the given inputs.
● Binary classification is when the number of classes to classify from is 2. If C > 2, it is
multi-class classification.
● The probability distribution over all possible labels, given the input vector x and the
label y and the training set D is p(y|x, D). This represents a vector of length C, with all of
the probabilities for each of the classes.
● When choosing between different models, the notation is p(y|x, D, M).
● To find out the predicted class of the model, you simply take the argmax of the
probability distributions.
○ The most probable class label is called the mode of the distribution and is known
as a MAP (maximum a posteriori) estimate.
● Unsupervised learning is where we have the task of density estimation. We want to
build models of the form p(x | θ ).
● Supervised learning is a conditional density estimation, while unsupervised learning is an
unconditional density estimation. SL is conditional because we are given a training set.
● Clustering data is a method in unsupervised learning.
○ FIrst we determine how many clusters to create.

○ Then, we estimate which cluster each point belongs to. The feature of which
cluster the point is a part of is a hidden or latent variable because it is not
observed in the training set, but rather is something that we created. We can pick
the cluster by using.

Downloaded by sathiyavathy ponnusamy (sathiyavathy.ponnusamy@gmail.com)

lOMoARcPSD|53150105

● Dimensionality reduction is used to capture the essence of the data.

○ The motivation is that although data appears high dimensional, there may only be
a small number of degrees of variability, corresponding to latent factors.
● Most common approach to dimensionality reduction is PCA (discuss later).
● Matrix completion (or imputation) is the method of inferring plausible values for missing
entries in a matrix.
● Parametric models are ones that have a fixed number of parameters.
● The number of parameters in nonparametric models grows with the amount of training
data.
○ K nearest neighbors is a nonparametric classifier, which looks at the K points in
the training set that are closest to the input x, and counts how many members of
each class are in that set.
○ Basically, the reason KNN is a nonparametric model is that the predictions
depend on the size of the training set. If you think about neural networks, they
have a fixed number of parameters and the predictions are solely dependent on
those parameters. The parameters capture everything you know about the data.

○ This can also be thought of an instance based learning or memory based

learning.
● Curse of dimensionality comes with high dimensional data.
○ To combat this, we make assumptions about the data distribution in the form of
creating a parametric model where it has a fixed number of parameters that
doesn’t increase with the training set.
● A Gaussian or normal distribution is one that is shaped like a bell curve.
● Linear regression says that the response is a linear function of the inputs.

● Logistic regression computes the linear combination of inputs, but also passes the
output through a sigmoid function, which is necessary for the output to be interpreted as
a probability.
○ If we threshold the probability, we can induce a decision rule.

Downloaded by sathiyavathy ponnusamy (sathiyavathy.ponnusamy@gmail.com)

lOMoARcPSD|53150105

● The data is not linearly separable if there is no straight line we can draw to separate the
1s from the 0s, and thus these models will have a non zero train error.
● The lower the value for K in KNN, the more likely the model is to overfit. A lower K value
signifies a complex model, while a large K underfits and is too simple.
● Generalization error is the error of a function that is tested on data it has never been
trained on.
● A common technique to measure a model’s performance is to split the training set into
two pieces: A training set and a validation set which just acts as a test set.
● Cross validation is a technique where the training data is split into K folds, and for each
fold, we train on all the folds but the k’th fold, and test on that k’th fold. The error will
averaged across all of the folds.
● Leave One Out Validation is when you set K = N
● The no free lunch theorem states that there is no universally best algorithm. We use all
of those previous methods (validation sets, cross validation, minimization of test error) to
empirically choose the best method for our particular problem. The no free lunch
theorem basically says that the performance of two models is the same if it’s averaged
over all possible problems. This sounds terrible, but the caveat is that we specific models
are better for specific problems, and most of the time, we know the problem space we
have and we’re not necessarily averaging across all problems.

Chapter 2 - Probability
● Two interpretations of probability
○ Frequentist - Probabilities represent long run frequencies of events. Ex) If we flip
a coin a bunch of times, it will land heads about half the time.
○ Bayesian - Probability is used to quantify uncertainty about something. Ex) 80%
chance of raining tomorrow. It’s basically where the probability represents how
probable we think this event is.
● p(A) denotes the probability that event A is true.
● Discrete random variables are variables that can take some value from a finite set X.
● Conditional probability of A, given that B is true.

● Probability of an event based on a prior

● X and Y are unconditionally independent if p(X,Y) = p(X)p(Y). Equivalently, you can also
say p(X | Y) = P(X).

Downloaded by sathiyavathy ponnusamy (sathiyavathy.ponnusamy@gmail.com)

lOMoARcPSD|53150105

○ Set of variables is mutually independent if the joint can be written as a product of

marginals.
○ Basically, two events are unconditionally independent if the fact that one event
happened doesn’t impact the probability of another event happening.
● X and Y are conditionally independent given Z if the conditional joint can be written as a
product of conditional marginals. p(X, Y | Z) = p(X|Z) p(Y|Z)
● The probability that an uncertain continuous quantity X lies between a and b can be
computed as the following.

○ f(x) can be defined as the probability density function.

● Each distribution has a
○ mean (expected value)

○ variance (spread of a distribution)

○ standard deviation

● Suppose we toss a coin N times. Let X be the number of heads (anywhere from 0 to N).
○ If the probability of heads is θ , then X has a binomial distribution.

● Suppose we toss the coin once. Let X be 0 or 1 (depending on the coin flip result), with
the probability of heads as θ .
○ We say that X has a Bernoulli distribution.
○ A Bernoulli random variable is one that only has 2 outcomes.
● Encoding the states 1, 2, and 3 as (1,0,0) and (0,1,0) and (0,0,1) is a one-hot encoding.
● X can have a Poisson distribution with a parameter lambda, with the following probability
mass function.

● A couple of the most common univariate continuous distributions are

○ Gaussian (normal) distribution: Bell curve
○ Laplace distribution: Double sided exponential distribution
○ Gamma distribution: Flexible for positive real values

Downloaded by sathiyavathy ponnusamy (sathiyavathy.ponnusamy@gmail.com)

lOMoARcPSD|53150105

○ Exponential distribution: Times between events in a Poisson process

○ Beta distribution: Support over the interval [0,1]
○ Pareto distribution: Model distributions with long tails.
● Joint probability distributions are distributions on multiple related random variables.
○ It models the relationships between the variables.
● Covariance between X and Y measures the degree to which X and Y are linearly related.

● The Pearson correlation coefficient between X and Y can be defined below.

● The multivariate Gaussian is the most widely used joint probability density function for
continuous variables.
● The Kullback-Leibler divergence is a measure of the dissimilarity between two probability
distributions p and q. It’s also known as the relative entropy.
● Laplace’s principle of insufficient reason argues in favor of using uniform distributions
when there are no other reasons to favor one distribution over another.

Chapter 3 - Generative Models for Discrete Data

● Concept learning is a type of binary classification where f(x) = 1 if x is an example of the
concept C and f(x) = 0 otherwise. The goal is to learn the indicator function f, which
defines which elements are in the set C.
○ You can think of an example of concept learning as when a child learns to
understand the meaning of a word.
● Occam’s razor is the theory that the model favors the simplest hypothesis consistent with
the data.
● (Couldn’t understand most of this chapter to be honest)

Chapter 4 - Gaussian Models

● Multivariate Gaussian is the most widely used joint probability density function for
continuous variables.
●

Downloaded by sathiyavathy ponnusamy (sathiyavathy.ponnusamy@gmail.com)

Data Science Cheatsheet
100% (1)
Data Science Cheatsheet
5 pages
Experiment No.2 - Cyber Security
No ratings yet
Experiment No.2 - Cyber Security
16 pages
Types of AI Agents Artificial Intelligence
100% (1)
Types of AI Agents Artificial Intelligence
4 pages
Simple Linear Ordinary Least Squares Regression: JTMS-03 Applied Statistics With R
No ratings yet
Simple Linear Ordinary Least Squares Regression: JTMS-03 Applied Statistics With R
39 pages
CS 188 Fall 2018 Written HW4 Soln
No ratings yet
CS 188 Fall 2018 Written HW4 Soln
6 pages
Brief Intro To ML PDF
No ratings yet
Brief Intro To ML PDF
236 pages
Huawei H12-211 PRACTICE EXAM HCNA-HNTD H
No ratings yet
Huawei H12-211 PRACTICE EXAM HCNA-HNTD H
117 pages
6.867 Lecture Notes: Section 1: Introduction: 1 Intro 2 2 Problem Class 3
No ratings yet
6.867 Lecture Notes: Section 1: Introduction: 1 Intro 2 2 Problem Class 3
10 pages
Machine Learning Handbook - Radivojac and White
No ratings yet
Machine Learning Handbook - Radivojac and White
108 pages
Lecture5 Maximum Likelihood
No ratings yet
Lecture5 Maximum Likelihood
13 pages
EDAN96 2024 Last Lecture-1
No ratings yet
EDAN96 2024 Last Lecture-1
78 pages
Ml2 Script v2
No ratings yet
Ml2 Script v2
123 pages
Machine Learning Notes
100% (3)
Machine Learning Notes
134 pages
Machine Learning HC
No ratings yet
Machine Learning HC
4 pages
Pattern Summary Final
No ratings yet
Pattern Summary Final
28 pages
07 Intro To ML
No ratings yet
07 Intro To ML
38 pages
Cheatsheet Supervised Learning
No ratings yet
Cheatsheet Supervised Learning
4 pages
LR, Decision Tree
No ratings yet
LR, Decision Tree
48 pages
This Story Paraphrased From A Post On 9/4/12
No ratings yet
This Story Paraphrased From A Post On 9/4/12
7 pages
MIT - Machine Learning Notes From Chapter 1 - 14 PDF
No ratings yet
MIT - Machine Learning Notes From Chapter 1 - 14 PDF
101 pages
Supervised Learning
No ratings yet
Supervised Learning
6 pages
Machine Learning and Data Mining
No ratings yet
Machine Learning and Data Mining
88 pages
ML Unit 1
No ratings yet
ML Unit 1
73 pages
AIML
No ratings yet
AIML
30 pages
Bishop Solutions PDF
No ratings yet
Bishop Solutions PDF
87 pages
Unit-1 - Machine Learning
No ratings yet
Unit-1 - Machine Learning
85 pages
Chapter1: Introduction: Notes On MLAPP
No ratings yet
Chapter1: Introduction: Notes On MLAPP
25 pages
Chapter
100% (1)
Chapter
101 pages
Notes6 Classification
No ratings yet
Notes6 Classification
10 pages
ChatGPT - Machine Learning Overview
No ratings yet
ChatGPT - Machine Learning Overview
34 pages
Super Cheatsheet Machine Learning
100% (1)
Super Cheatsheet Machine Learning
15 pages
Pattern Revision
No ratings yet
Pattern Revision
63 pages
Aml Notes
No ratings yet
Aml Notes
21 pages
Q. 1) What Is Class Condition Density? (3 Marks) Ans
No ratings yet
Q. 1) What Is Class Condition Density? (3 Marks) Ans
12 pages
COMP4702 Notes 2019: Week 2 - Supervised Learning
No ratings yet
COMP4702 Notes 2019: Week 2 - Supervised Learning
23 pages
ML Imp QB
No ratings yet
ML Imp QB
34 pages
CH 1
No ratings yet
CH 1
24 pages
AWS Machine Learning Specialty Master Cheat Sheet
No ratings yet
AWS Machine Learning Specialty Master Cheat Sheet
24 pages
Fall 2022 Midterm Notes PDF
No ratings yet
Fall 2022 Midterm Notes PDF
15 pages
07 - Bayesian Learning
No ratings yet
07 - Bayesian Learning
55 pages
Introduction ML
No ratings yet
Introduction ML
65 pages
Cheat Sheet
No ratings yet
Cheat Sheet
163 pages
Chap1 Bishop
No ratings yet
Chap1 Bishop
35 pages
Machine Learning
No ratings yet
Machine Learning
32 pages
Let's Begin With:: Differentiate Between Supervised and Unsupervised Learning
No ratings yet
Let's Begin With:: Differentiate Between Supervised and Unsupervised Learning
26 pages
Cheet Sheet
No ratings yet
Cheet Sheet
47 pages
Unit 3
No ratings yet
Unit 3
16 pages
Aula 4 (L) - Oggi La Tua Lezione È in Presenza
No ratings yet
Aula 4 (L) - Oggi La Tua Lezione È in Presenza
11 pages
Predictive Unit 1
No ratings yet
Predictive Unit 1
22 pages
ECE 449 Notes
No ratings yet
ECE 449 Notes
5 pages
Lecture4 Foundations Supervised Learning
No ratings yet
Lecture4 Foundations Supervised Learning
22 pages
Machine Learning Lecture Notes
No ratings yet
Machine Learning Lecture Notes
119 pages
L09 Learning I Bayesian Learning
No ratings yet
L09 Learning I Bayesian Learning
66 pages
PR & ML: CS5691: Machine Learning
No ratings yet
PR & ML: CS5691: Machine Learning
42 pages
Pattern Unit 3
No ratings yet
Pattern Unit 3
14 pages
Machine Learning Notes 1
No ratings yet
Machine Learning Notes 1
120 pages
Optimization Problems For Machine Learning: A Survey
No ratings yet
Optimization Problems For Machine Learning: A Survey
41 pages
ML Fundamentals by Bitspace
No ratings yet
ML Fundamentals by Bitspace
19 pages
DS - UNIT - III - QB & Ans
No ratings yet
DS - UNIT - III - QB & Ans
25 pages
Deep Learning Answers
No ratings yet
Deep Learning Answers
36 pages
Machine Learning: Lecture 6: Bayesian Learning (Based On Chapter 6 of Mitchell T.., Machine Learning, 1997)
No ratings yet
Machine Learning: Lecture 6: Bayesian Learning (Based On Chapter 6 of Mitchell T.., Machine Learning, 1997)
15 pages
Multi-dimensional Monte Carlo Integrations Utilizing Mathematica
From Everand
Multi-dimensional Monte Carlo Integrations Utilizing Mathematica
SUJAUL CHOWDHURY
No ratings yet
K Nearest Neighbor Algorithm: Fundamentals and Applications
From Everand
K Nearest Neighbor Algorithm: Fundamentals and Applications
Fouad Sabry
No ratings yet
MCS-011: Problem Solving and Programming
From Everand
MCS-011: Problem Solving and Programming
Dr. DK Sukhani
No ratings yet
Emerging Trends in AI
No ratings yet
Emerging Trends in AI
14 pages
AI Fraud Detection
No ratings yet
AI Fraud Detection
19 pages
Algorithm For RL
No ratings yet
Algorithm For RL
99 pages
Toys Problem
No ratings yet
Toys Problem
21 pages
Unit
No ratings yet
Unit
112 pages
SYLLABUS
No ratings yet
SYLLABUS
4 pages
Smooth N-Gram
No ratings yet
Smooth N-Gram
2 pages
Depth Buffer Method
No ratings yet
Depth Buffer Method
12 pages
Paper 4a QP
No ratings yet
Paper 4a QP
6 pages
Heuristic Search
No ratings yet
Heuristic Search
51 pages
CH 1.1 Slides 210 SJ
No ratings yet
CH 1.1 Slides 210 SJ
24 pages
A Demand Forecasting Method For High Value-Added Agri - Food Based On Machine Learning and Time Series Analysis
No ratings yet
A Demand Forecasting Method For High Value-Added Agri - Food Based On Machine Learning and Time Series Analysis
4 pages
Writeup
No ratings yet
Writeup
5 pages
Transportation Model
No ratings yet
Transportation Model
44 pages
تقدير متجه المتوسطات ومصفوفة التباين والتباين المشترك PDF
No ratings yet
تقدير متجه المتوسطات ومصفوفة التباين والتباين المشترك PDF
3 pages
Enr 533 Examples
No ratings yet
Enr 533 Examples
7 pages
Scilab 1
No ratings yet
Scilab 1
14 pages
While, For and Repeat Until
No ratings yet
While, For and Repeat Until
5 pages
Multiple Regression
No ratings yet
Multiple Regression
36 pages
FactorisingExercises132 PDF
No ratings yet
FactorisingExercises132 PDF
4 pages
Week 7 Operators What Are The Fundamental Image Processing Operators
No ratings yet
Week 7 Operators What Are The Fundamental Image Processing Operators
1 page
Process Control Systems
100% (1)
Process Control Systems
3 pages
FT of AI
No ratings yet
FT of AI
109 pages
ISYE 6740 - (SU22) Syllabus
No ratings yet
ISYE 6740 - (SU22) Syllabus
6 pages
Point Slope Form
No ratings yet
Point Slope Form
11 pages
Presented by Prof. Dr. A. M. Siddiqui Penn State University, York, USA
No ratings yet
Presented by Prof. Dr. A. M. Siddiqui Penn State University, York, USA
18 pages
Color Image Compression-Encryption Algorithm Based
No ratings yet
Color Image Compression-Encryption Algorithm Based
14 pages
Multivariable Control Theory - Lecture 3
No ratings yet
Multivariable Control Theory - Lecture 3
30 pages
Distributed Computing QP - Comp
No ratings yet
Distributed Computing QP - Comp
1 page
Reccurences General
No ratings yet
Reccurences General
22 pages
Tries and Huffman Encoding
No ratings yet
Tries and Huffman Encoding
16 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

MLP RL1

Uploaded by

MLP RL1

Uploaded by

lOMoARcPSD|53150105

Machine Learning a Probabilistic Perspective Notes

Machine Shop Trigonometry (Portland Community College)

Scan to open on Studocu

Studocu is not sponsored or endorsed by any college or university

Machine Learning: A Probabilistic Perspective

Downloaded by sathiyavathy ponnusamy (sathiyavathy.ponnusamy@gmail.com)

● Dimensionality reduction is used to capture the essence of the data.

○ This can also be thought of an instance based learning or memory based

Downloaded by sathiyavathy ponnusamy (sathiyavathy.ponnusamy@gmail.com)

● Probability of an event based on a prior

Downloaded by sathiyavathy ponnusamy (sathiyavathy.ponnusamy@gmail.com)

○ Set of variables is mutually independent if the joint can be written as a product of

○ f(x) can be defined as the probability density function.

○ variance (spread of a distribution)

● A couple of the most common univariate continuous distributions are

Downloaded by sathiyavathy ponnusamy (sathiyavathy.ponnusamy@gmail.com)

○ Exponential distribution: Times between events in a Poisson process

● The Pearson correlation coefficient between X and Y can be defined below.

Chapter 3 - Generative Models for Discrete Data

Chapter 4 - Gaussian Models

Downloaded by sathiyavathy ponnusamy (sathiyavathy.ponnusamy@gmail.com)

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

MLP RL1

Uploaded by

MLP RL1

Uploaded by

lOMoARcPSD|53150105

Machine Learning a Probabilistic Perspective Notes

Machine Shop Trigonometry (Portland Community College)

Scan to open on Studocu

Studocu is not sponsored or endorsed by any college or university

Machine Learning: A Probabilistic Perspective

Downloaded by sathiyavathy ponnusamy (sathiyavathy.ponnusamy@gmail.com)

● Dimensionality reduction​ is used to capture the essence of the data.

○ This can also be thought of an instance based learning or memory based

Downloaded by sathiyavathy ponnusamy (sathiyavathy.ponnusamy@gmail.com)

● Probability of an event based on a prior

Downloaded by sathiyavathy ponnusamy (sathiyavathy.ponnusamy@gmail.com)

○ Set of variables is mutually independent if the joint can be written as a product of

○ f(x) can be defined as the probability density function.

○ variance (spread of a distribution)

● A couple of the most common univariate continuous distributions are

Downloaded by sathiyavathy ponnusamy (sathiyavathy.ponnusamy@gmail.com)

○ Exponential distribution: Times between events in a Poisson process

● The Pearson correlation coefficient between X and Y can be defined below.

Chapter 3 - Generative Models for Discrete Data

Chapter 4 - Gaussian Models

Downloaded by sathiyavathy ponnusamy (sathiyavathy.ponnusamy@gmail.com)

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

● Dimensionality reduction is used to capture the essence of the data.