
Discovering Models / Theories

cs365 2015 mukerjee


Domain Theories
 Agent:
given percept history p ∈ P,
select a decision from the set of choices a ∈ A
so as to meet a goal g (performance) –
maximize a utility function U()

 Requires knowledge of how actions under different percepts
affect the goal
 Model or Theory

 Task domains: a) 8-puzzle [deterministic], b) Soccer [stochastic]


8-puzzle

• Percept = state

• Actions = move

• Goal: T/F (goal test: solved or not)

• Utility: number of moves


8-puzzle

• State = [7,2,4,5,B,6,8,3,1]

• Actions = L, R, U, D


State + Action → new State

• Decision: based on Search


• [Informed / Uninformed]
Breadth-first search
• Expand shallowest unexpanded node

• Fringe: FIFO queue; new successors go at the end

O(b^{1+d})



Properties of breadth-first search
• Complete? Yes (if b is finite)

• Time? 1 + b + b^2 + b^3 + … + b^d + b(b^d − 1) = O(b^{d+1})

• Space? O(b^{d+1}) (keeps every node in memory)

• Optimal? Yes (if cost = 1 per step)
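A minimal sketch of breadth-first search in code; the toy graph, node names, and helper signatures below are illustrative assumptions, not from the notes:

```python
from collections import deque

def breadth_first_search(start, is_goal, successors):
    """Expand the shallowest unexpanded node; the fringe is a FIFO queue."""
    fringe = deque([(start, [start])])   # (node, path from start)
    visited = {start}
    while fringe:
        node, path = fringe.popleft()    # shallowest node comes off the front
        if is_goal(node):
            return path
        for nxt in successors(node):     # new successors go at the end
            if nxt not in visited:
                visited.add(nxt)
                fringe.append((nxt, path + [nxt]))
    return None                          # no solution found

# Toy usage on a small explicit graph (hypothetical example):
graph = {'S': ['A', 'B'], 'A': ['G'], 'B': ['A'], 'G': []}
print(breadth_first_search('S', lambda n: n == 'G', lambda n: graph[n]))
# ['S', 'A', 'G']
```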


Iterative-Deepening search



Cost-based search (uniform-cost search)

• edges don't have equal cost

• generalizes breadth-first: expand the
node with the lowest path cost g(n)
from START

• Fringe: priority queue ordered by path cost

O(b^{1 + ⌊C*/ε⌋}), where C* is the cost of the optimal
solution and every step costs at least ε
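A uniform-cost search sketch under the same illustrative assumptions; here `successors` yields (state, step-cost) pairs:

```python
import heapq

def uniform_cost_search(start, is_goal, successors):
    """Expand the node with the lowest path cost g(n); fringe is a priority queue."""
    fringe = [(0, start, [start])]               # (g, node, path)
    best_g = {start: 0}
    while fringe:
        g, node, path = heapq.heappop(fringe)    # cheapest node first
        if is_goal(node):
            return g, path
        for nxt, step_cost in successors(node):
            g2 = g + step_cost
            if g2 < best_g.get(nxt, float('inf')):
                best_g[nxt] = g2
                heapq.heappush(fringe, (g2, nxt, path + [nxt]))
    return None

# Toy weighted graph (hypothetical): the cheapest S->G path is S-B-G, cost 3.
graph = {'S': [('A', 5), ('B', 1)], 'A': [('G', 1)], 'B': [('G', 2)], 'G': []}
print(uniform_cost_search('S', lambda n: n == 'G', lambda n: graph[n]))
# (3, ['S', 'B', 'G'])
```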

Soccer

• Percept = goalie, self, ball
+ wind, opponents, teammates…

• Actions = kick (angle θ, speed, swing)

• Utility: goal probability


Discrete-Deterministic Spaces:

Search
Uninformed search strategies
• Uninformed search strategies use only the
information available in the problem definition
• Breadth-first search
• Uniform-cost search
• Depth-first search
• Depth-limited search
• Iterative deepening search


Representing the state space

1. States: configurations of the 8 tiles + blank

2. Actions: move the blank L, R, U, D

3. Goal test: does the state match the goal configuration?

4. Cost: 1 per move
8-puzzle heuristics
Admissible heuristics:

• h1: number of misplaced tiles
(in the figure's example, h1 = 6)

• h2: sum of the Manhattan distances
of the tiles from their goal positions
(in the example: 0+0+1+1+2+3+1+3 = 11)
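A sketch of both heuristics in code; the goal layout and the 0-for-blank encoding below are assumptions for illustration, so the printed values need not match the figure's example:

```python
def h1(state, goal):
    """h1: number of misplaced tiles (blank, coded 0, not counted)."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2(state, goal):
    """h2: sum of Manhattan distances of tiles from goal positions (3x3 board)."""
    dist = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        j = goal.index(tile)
        dist += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return dist

# Example state from the notes (0 = blank); goal layout assumed, not from the slide.
state = [7, 2, 4, 5, 0, 6, 8, 3, 1]
goal  = [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(h1(state, goal), h2(state, goal))   # 8 18 for this assumed goal
```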
8-puzzle heuristics
Nilsson's Sequence Score
Score(n) = P(n) + 3 S(n)

P(n): sum of the Manhattan distances of each tile from
its proper position

S(n), the sequence score: check around the non-central
squares:
+2 for every tile not followed by its proper successor,
0 for every other tile;
+1 if there is a piece in the centre
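A sketch of Nilsson's score; it assumes the classic goal layout with the blank in the centre (1 2 3 / 8 _ 4 / 7 6 5), which the slide does not state:

```python
# Board is a list of 9 ints in row-major order, 0 = blank.
# With the assumed goal below, the clockwise successor of tile t is t % 8 + 1.
GOAL = [1, 2, 3, 8, 0, 4, 7, 6, 5]
CLOCKWISE = [0, 1, 2, 5, 8, 7, 6, 3]   # non-central squares, in clockwise order

def manhattan(state, goal=GOAL):
    """P(n): sum of Manhattan distances of each tile from its proper position."""
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        j = goal.index(tile)
        total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total

def sequence_score(state):
    """S(n): +2 per non-central tile not followed clockwise by its successor,
    +1 if a tile occupies the centre."""
    s = 1 if state[4] != 0 else 0
    for k, idx in enumerate(CLOCKWISE):
        tile = state[idx]
        if tile == 0:
            continue
        nxt = state[CLOCKWISE[(k + 1) % 8]]
        if nxt != tile % 8 + 1:
            s += 2
    return s

def nilsson(state):
    return manhattan(state) + 3 * sequence_score(state)

print(nilsson(GOAL))   # 0 for the goal state
```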
Stochastic Spaces
Soccer

Soccer: Shooting at goal (kick angle θ)

[acharya mukerjee 01]


Soccer : Shoot, Pass, dribble, or … ?
Handwritten digits - MNIST
Confusion matrix
Discovering theories
Continuous Data
Discrete Attribute data
• Examples described by attribute values (Boolean, discrete, continuous)
• E.g., situations where I will/won't wait at a restaurant:

• Classification of examples is positive (T) or negative (F)


Discrete Features
• Parse the sentence: “Time flies like an arrow”

May have many parses.


How to rank the choices?
Regression
Modelling as Regression
Given a set of decisions y_i based on observations x_i,
- derived from an unknown function y = f(x)
- with noise

Try to find a model or theory:


y = h(x) ≈ f(x)

where h() is drawn from the hypothesis space – e.g. the space of
radial basis functions, or polynomials, etc.
Polynomial Curve Fitting

[Bishop 06] ch.1


Linear Regression
y = f(x) = Σ_i w_i φ_i(x)

φ_i(x): basis function

w_i: weights

Linear: the function is linear in the weights

Quadratic error function → its derivative is linear in w
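A minimal sketch of fitting such a model by least squares with numpy; the polynomial basis and the synthetic data are illustrative assumptions. Because the error is quadratic in w, the fit reduces to solving a linear system:

```python
import numpy as np

def design_matrix(x, degree):
    """Phi[n, i] = phi_i(x_n) = x_n**i  (polynomial basis functions)."""
    return np.vander(x, degree + 1, increasing=True)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)  # noisy targets

Phi = design_matrix(x, degree=3)
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)   # minimizes sum-of-squares error
print(w)                                       # fitted weights
```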
Sum-of-Squares Error Function
0th, 1st, 3rd and 9th order polynomial fits (figure slides)
Over-fitting

Root-Mean-Square (RMS) Error: E_RMS = √(2 E(w*) / N)


Polynomial coefficients for the 9th-order fit;
effect of increasing the data set size on the 9th-order fit (figure slides)
Regularization

Penalize large coefficient values:
Ẽ(w) = ½ Σ_n {y(x_n, w) − t_n}² + (λ/2) ||w||²

Regularization with varying λ, and the resulting polynomial coefficients (figure slides)
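A sketch of the regularized fit (ridge regression); λ and the synthetic data are illustrative assumptions:

```python
import numpy as np

# Adding (lam/2)*||w||^2 to the sum-of-squares error keeps the error quadratic,
# so the minimizer solves (Phi^T Phi + lam*I) w = Phi^T t.
def ridge_fit(Phi, t, lam):
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ t)

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
Phi = np.vander(x, 10, increasing=True)        # 9th-order polynomial basis

for lam in (0.0, 1e-3):
    w = ridge_fit(Phi, t, lam)
    print(lam, np.abs(w).max())   # lam > 0 shrinks the largest coefficient
```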
Probability Theory
Learning = discovering regularities
- Regularity: repeated experiments whose
outcome is not fully predictable

outcome = “possible world”


set of all possible worlds = Ω
Probability Theory
Apples and Oranges
Sample Space
Sample ω = Pick two fruits,
e.g. Apple, then Orange
Sample Space Ω = {(A,A), (A,O),
(O,A),(O,O)}
= all possible worlds

Event e = set of possible worlds, e ⊆ Ω


• e.g. second one picked is an apple
Learning = discovering regularities
- Regularity: repeated experiments whose
outcome is not fully predictable

- Probability p(e): "the fraction of possible worlds in
which e is true", i.e. the outcome is in event e

- Frequentist view: p(e) = lim_{N→∞} (number of trials in which e occurs) / N

- Belief view: in a wager, one accepts equivalent odds
(1−p) : p that the outcome is in e, or vice versa
Axioms of Probability
- non-negative : p(e) ≥ 0

- unit sum p(Ω) = 1


i.e. no outcomes outside sample space

- additive : if e1, e2 are disjoint events (no common


outcome):
p(e1) + p(e2) = p(e1 ∪ e2)
ALT:
p(e1 ∨ e2) = p(e1) + p(e2) - p(e1 ∧ e2)
Why probability theory?
different methodologies attempted for uncertainty:
– Fuzzy logic
– Multi-valued logic
– Non-monotonic reasoning
But unique property of probability theory:
If you gamble using probabilities you have the best
chance in a wager. [de Finetti 1931]
⇒ if the opponent uses some other system, they are
more likely to lose
Ramsey–de Finetti theorem (1931)
If agent X's degrees of belief are rational, then X's
degrees-of-belief function defined by fair betting
rates is (formally) a probability function
Fair betting rates: the opponent decides which side one
bets on
Proof: fair odds result in a function pr() that satisfies
the Kolmogorov axioms:
Normality: pr(S) ≥ 0
Certainty: pr(T) = 1
Additivity: pr(S1 ∨ S2 ∨ …) = Σ_i pr(S_i)
Joint vs. conditional probability

marginal probability p(X); joint probability p(X, Y);
conditional probability p(Y|X)

Rules of Probability

Sum Rule: p(X) = Σ_Y p(X, Y)

Product Rule: p(X, Y) = p(Y|X) p(X)
Example
A disease d occurs in 0.05% of population. A test is
99% effective in detecting the disease, but 5% of
the cases test positive in absence of d.
10000 people are tested. How many are expected to
test positive?
p(d) = 0.0005; p(t|d) = 0.99; p(t|~d) = 0.05
p(t) = p(t,d) + p(t,~d) [Sum Rule]
= p(t|d) p(d) + p(t|~d) p(~d) [Product Rule]
= 0.99 × 0.0005 + 0.05 × 0.9995 ≈ 0.0505 → about 505 test +ve
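The same calculation as a sketch in code (numbers taken from the example above):

```python
p_d = 0.0005            # prior: disease prevalence
p_t_given_d = 0.99      # test sensitivity
p_t_given_nd = 0.05     # false-positive rate

# Sum rule over the two ways to test positive; product rule for each term:
p_t = p_t_given_d * p_d + p_t_given_nd * (1 - p_d)
print(p_t, 10000 * p_t)   # ~0.0505 -> about 505 positives in 10000 tests
```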
Bayes’ Theorem

posterior ∝ likelihood × prior


Bayes’ Theorem
Thomas Bayes (c. 1750):
how can we infer causes from effects?
How can one learn the probability of a future event if one knew
only how many times it had (or had not) occurred in the past?

as new evidence comes in → probability estimates improve.

e.g. throw a die: a guess at its value is poor (1/6).
throw the die again: is it > or < the previous one? The guess improves.
throw the die repeatedly: the guess can improve quite a lot.

Hence: initial estimate (prior belief P(h), not well formulated)
+ new evidence (support): compute the likelihood P(data|h)
→ improved estimate (the posterior P(h|data))
Example
A disease d occurs in 0.05% of population. A test is
99% effective in detecting the disease, but 5% of
the cases test positive in absence of d.
If you are tested +ve, what is the probability you have
the disease?
p(d|t) = p(d) · p(t|d) / p(t); p(t) = 0.0505
p(d|t) = 0.0005 × 0.99 / 0.0505 ≈ 0.0098 (about 1%)
if 10K people take the test, expected cases E(d) = 5
FPs ≈ 0.05 × 9995 ≈ 500
TPs ≈ 0.99 × 5 ≈ 5 → only about 5 of the 505 positives have d
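And the posterior, as a sketch in code:

```python
# Bayes' theorem for the same example.
p_d, p_t_given_d, p_t_given_nd = 0.0005, 0.99, 0.05
p_t = p_t_given_d * p_d + p_t_given_nd * (1 - p_d)

p_d_given_t = p_t_given_d * p_d / p_t
print(round(p_d_given_t, 4))          # ~0.0098: a positive test still means ~1%

# Of 10,000 people tested: ~5 true positives vs ~500 false positives.
print(0.99 * 0.0005 * 10000, 0.05 * 0.9995 * 10000)
```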
Bayesian Inference
Testing for hypothesis H given evidence E
- Evidence: based on a new observation E
- Prior: earlier evaluation of the probability of H
- Likelihood: probability of the evidence given the hypothesis,
P(E|H)

Bayesian inference:
P(H|E) = P(E|H) P(H) / P(E)
posterior = likelihood × prior / normalization (the marginal likelihood P(E))
Bayesian Inference
The fruit picked is an orange (o).
What is the probability that it came
from the blue box (B)?

P(B|o) = P(o|B) p(B) / P(o)

Given: the red box is picked 40% of the time → p(B) = 0.6

P(o) = ¾ × 0.6 + ¼ × 0.4 = 11/20

P(B|o) = ¾ × 0.6 × 20/11 = 9/11

Continuous variables:
Probability Densities

p(x ∈ (a, b)) = ∫_a^b p(x) dx; cumulative: P(z) = ∫_{−∞}^z p(x) dx

Expectations

discrete x: E[f] = Σ_x p(x) f(x);  continuous x: E[f] = ∫ p(x) f(x) dx

Frequentist approximation with an unbiased sample
(both discrete / continuous):
E[f] ≈ (1/N) Σ_{n=1}^N f(x_n)
The Gaussian Distribution
N(x | μ, σ²) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))

Gaussian Mean and Variance
E[x] = μ, var[x] = σ²
Central Limit Theorem
Distribution of sum of N i.i.d. random variables
becomes increasingly Gaussian for larger N.

Example: N uniform [0,1] random variables.


Gaussian Parameter Estimation

Observations assumed to be
independently drawn from the same
distribution (i.i.d.)

Likelihood function:
p(x | μ, σ²) = Π_{n=1}^N N(x_n | μ, σ²)

Maximum (Log) Likelihood:
μ_ML = (1/N) Σ_n x_n,  σ²_ML = (1/N) Σ_n (x_n − μ_ML)²
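A sketch of these estimators on synthetic data (the true parameters below are arbitrary):

```python
import numpy as np

# Maximizing the log likelihood of an i.i.d. Gaussian sample gives the
# sample mean and the (biased, 1/N) sample variance.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=1000)

mu_ml = x.mean()                      # mu_ML = (1/N) sum_n x_n
var_ml = ((x - mu_ml) ** 2).mean()    # sigma^2_ML = (1/N) sum_n (x_n - mu_ML)^2
print(mu_ml, var_ml)                  # approx 2.0 and 0.25
```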
Distributions over
Multi-dimensional spaces
The Multivariate Gaussian

lines of equal
probability densities
Multivariate distribution

joint distribution P(x,y) can vary considerably

even though the marginals P(x), P(y) are identical

estimating the joint distribution requires a
much larger sample: O(n^k) joint cells for k variables
with n values each, vs. n·k for the marginals
(e.g. k = 10 binary variables: 2^10 = 1024 cells vs. 20)
Marginals and Conditionals

marginals P(x), P(y) are Gaussian

conditional P(x|y) is also Gaussian
Non-intuitive in high dimensions

As dimensionality
increases, bulk of
data moves away
from center

Gaussian in polar coordinates;


p(r)δr : prob. mass inside annulus δr at r.
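A simulation sketch of this concentration effect; the dimensions and sample size are arbitrary:

```python
import numpy as np

# For a standard Gaussian in D dimensions, the mass concentrates in a thin
# shell whose radius grows roughly like sqrt(D), away from the centre.
rng = np.random.default_rng(0)
for D in (1, 10, 100):
    r = np.linalg.norm(rng.standard_normal((100_000, D)), axis=1)
    print(D, round(r.mean(), 2), round(D ** 0.5, 2))   # mean radius vs sqrt(D)
```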
Change of variable x = g(y):
p_y(y) = p_x(g(y)) |g′(y)|
Bernoulli Process

Successive Trials – e.g. Toss a coin three times:


HHH, HHT, HTH, HTT, THH, THT, TTH, TTT

Probability of k Heads:

k      0    1    2    3
P(k)  1/8  3/8  3/8  1/8

With probability of success p and failure q = 1 − p:
P(k successes in n trials) = C(n, k) p^k q^(n−k)
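A sketch that reproduces the table above from the binomial formula:

```python
from math import comb

# P(k successes in n Bernoulli trials) = C(n, k) * p**k * q**(n-k)
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Three fair coin tosses reproduce the table: 1/8, 3/8, 3/8, 1/8.
print([binom_pmf(k, 3, 0.5) for k in range(4)])
```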
Model Selection

Cross-Validation

Quantized-Cell Classification

flow data:
red: 'homogeneous',
green: 'annular',
blue: 'laminar'
Curse of Dimensionality

general cubic polynomial in D dimensions: O(D³) parameters


Curse of Dimensionality
The unit hyper cube and unit sphere in high dimensions

At higher dimensions, vol(sphere) / vol(hypercube) → 0
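A numeric sketch of this ratio, using the standard volume formula for the unit D-ball, π^(D/2) / Γ(D/2 + 1), against the circumscribing cube [−1, 1]^D:

```python
from math import pi, gamma

# ball volume = pi**(D/2) / Gamma(D/2 + 1);  cube volume = 2**D
for D in (1, 2, 5, 10, 20):
    ratio = pi ** (D / 2) / gamma(D / 2 + 1) / 2 ** D
    print(D, ratio)
# The ratio falls towards 0: almost all the cube's volume sits in its corners.
```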


Curse of Dimensionality
Polynomial curve fitting, M = 3

Gaussian Densities in
higher dimensions
Regression with Polynomials
Curve Fitting Re-visited
Bayesian Inference
Testing for hypothesis H given evidence E

Bayesian inference:
P(H|E) = P(E|H) P(H) / P(E)
posterior ∝ likelihood × prior
Maximum Likelihood
Evidence = t; Hypothesis = poly(x, w)

Determine w_ML by minimizing the sum-of-squares error:
E(w) = ½ Σ_n {y(x_n, w) − t_n}²
Predictive Distribution
MAP: A Step towards Bayes

Determine w_MAP by minimizing the regularized
sum-of-squares error:
Ẽ(w) = ½ Σ_n {y(x_n, w) − t_n}² + (λ/2) wᵀw

MAP = maximum a posteriori


Bayesian Curve Fitting
Bayesian Predictive Distribution
Information Theory
Twenty Questions
Knower: thinks of an object (a point in a probability space)
Guesser: asks the knower to evaluate random variables

Stupid approach:

Guesser: Is it my left big toe?


Knower: No.

Guesser: Is it Valmiki?
Knower: No.

Guesser: Is it Aunt Lakshmi?


...
Expectations & Surprisal
Turn the key: expectation: the lock will open

Exam paper being shown: could be 100, could be zero.

random variable: a function from the set of marks,
with probabilities in the real interval [0, 1]

Interestingness ∝ unpredictability

surprisal(x) = −log₂ p(x)

= 0 when p(x) = 1
= 1 when p(x) = ½
→ ∞ as p(x) → 0
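A tiny sketch of the surprisal values above:

```python
from math import log2

def surprisal(p):
    """Surprisal of an outcome with probability p, in bits."""
    return log2(1 / p)

print(surprisal(1.0))   # 0.0 -- a certain outcome carries no surprise
print(surprisal(0.5))   # 1.0 -- one bit of surprise
# surprisal(p) diverges to infinity as p -> 0 (log of 0 is undefined).
```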
Expectations in data

A: 00010001000100010001. . . 0001000100010001000100010001

B: 01110100110100100110. . . 1010111010111011000101100010

C: 00011000001010100000. . . 0010001000010000001000110000

Structure in data → easy to remember


Entropy

Used in
• coding theory
• statistical physics
• machine learning
Entropy
H[x] = −Σ_x p(x) log₂ p(x)

In how many ways can N identical objects be
allocated to M bins?

Entropy is maximized when the distribution over
bins is uniform: p_i = 1/M
Entropy in Coding theory
x is discrete with 8 possible states; how many bits to
transmit the state of x?

All states equally likely: log₂ 8 = 3 bits

Coding theory: with a nonuniform distribution, shorter
codes for more probable states bring the average code
length down toward the entropy
Entropy in Twenty Questions
Intuitively: try to ask a question whose answer is 50-50

Is the first letter between A and M?

question entropy = −p(Y) log₂ p(Y) − p(N) log₂ p(N)

For both answers equiprobable:
entropy = −½ log₂(½) − ½ log₂(½) = 1.0

For P(Y) = 1/1024 ≈ 2⁻¹⁰:
entropy ≈ −(1/1024) × (−10) + ε ≈ 0.01
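A sketch verifying these two entropies (assuming 1/1024, i.e. 2⁻¹⁰, was intended):

```python
from math import log2

def binary_entropy(p):
    """Entropy of a yes/no question answered Yes with probability p, in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(binary_entropy(0.5))       # 1.0 bit: the most informative question
print(binary_entropy(1 / 1024))  # ~0.011 bits: a nearly useless question
```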
