
CS 188 Introduction to Artificial Intelligence
Fall 2024, Written HW9

Due: Tuesday 11/12 at 11:59pm.


Policy: Can be solved in groups (acknowledge collaborators) but must be submitted individually.

Make sure to show all your work and justify your answers.
Note: This is a typical exam-level question. On the exam, you would be under time pressure, and have to complete this
question on your own. We strongly encourage you to first try this on your own to help you understand where you currently
stand. Then feel free to have some discussion about the question with other students and/or staff, before independently
writing up your solution.

Note: Leave the self-assessment sections blank for the original submission of your homework. After the homework deadline
passes, we will release the solutions. At that time, you will review the solutions, self-assess your initial response, and
complete the self-assessment sections below. The deadline for the self-assessment is 1 week after the original submission
deadline.
Your submission on Gradescope should be a PDF that matches this template. Each page of the PDF should align with the
corresponding page of the template (page 1 has name/collaborators; the question begins on page 2). Do not reorder, split,
combine, or add extra pages. The intention is that you print out the template, write on the pages in pen/pencil, and then
scan or take pictures of the pages to make your submission. You may also fill out this template digitally (e.g., using a
tablet).

First name

Last name

SID

Collaborators


Q1. [12 pts] Course Evaluations


Every semester we try to make CS 188 a little better. Let 𝑆𝑡 represent the quality of the CS 188 offering at semester 𝑡, where
𝑆𝑡 ∈ {1, 2, 3, 4, 5}. Changes to the course are incremental so between semester 𝑡 and 𝑡 + 1, the value of 𝑆 can only change by at
most 1. Each possible transition occurs with equal probability. As examples: if 𝑆𝑡 = 1, then 𝑆𝑡+1 ∈ {1, 2} each with probability
1∕2. If 𝑆𝑡 = 2, then 𝑆𝑡+1 ∈ {1, 2, 3} each with probability 1∕3.
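For intuition, the transition model just described can be written down directly. The following is a small illustrative helper, not part of the assignment, with the quality values hard-coded to {1, …, 5}:

```python
# Sketch of the transition model P(S_{t+1} | S_t) described above:
# quality changes by at most 1 per semester, and every reachable
# successor value is equally likely.
def transition_probs(s):
    """Return a dict mapping each possible next state to its probability, given S_t = s."""
    successors = [v for v in (s - 1, s, s + 1) if 1 <= v <= 5]
    return {v: 1.0 / len(successors) for v in successors}
```

For example, `transition_probs(1)` gives each of {1, 2} probability 1/2, and `transition_probs(2)` gives each of {1, 2, 3} probability 1/3, matching the examples in the text.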

Let 𝐸𝑡 ∈ {+𝑒, −𝑒} represent the feedback we receive from student evaluations for the semester 𝑡, where +𝑒 is generally positive
feedback and −𝑒 is negative feedback. Student feedback is dependent on whether released assignments were helpful for the
given semester (𝐴𝑡 ∈ {+𝑎, −𝑎}), which in turn depends on the quality of the semester’s course offering (𝑆𝑡 ). Additionally, student
evaluations depend on events external to the class (𝑋𝑡 ∈ {+𝑥, −𝑥}).

The following HMM depicts the described scenario:

[Diagram: a chain 𝑆0 → 𝑆1 → ⋯ → 𝑆𝑡−1 → 𝑆𝑡 ; at each timestep, 𝑆𝑡 is a parent of 𝐴𝑡 , and 𝐴𝑡 and 𝑋𝑡 are both parents of 𝐸𝑡 .]

(a) Consider the above dynamic Bayes net, which ends at some finite timestep 𝑡. In this problem, we are trying to approximate
the most likely value of 𝑆𝑡 given all the evidence variables up to and including 𝑡. For each of the following subparts, first
decide whether the given method can be used to solve this problem. Then, if yes, select all CPTs which must be known
to run the algorithm.
(i) [1 pt] Variable elimination
# No # Yes: □ 𝑃 (𝑆0 ), 𝑃 (𝑆𝑡 |𝑆𝑡−1 ), 𝑡 > 0 □ 𝑃 (𝐸𝑡 |𝑋𝑡 , 𝐴𝑡 )∀𝑡 □ 𝑃 (𝐴𝑡 |𝑆𝑡 )∀𝑡 □ 𝑃 (𝑋𝑡 )∀𝑡
(ii) [1 pt] Value iteration
# No # Yes: □ 𝑃 (𝑆0 ), 𝑃 (𝑆𝑡 |𝑆𝑡−1 ), 𝑡 > 0 □ 𝑃 (𝐸𝑡 |𝑋𝑡 , 𝐴𝑡 )∀𝑡 □ 𝑃 (𝐴𝑡 |𝑆𝑡 )∀𝑡 □ 𝑃 (𝑋𝑡 )∀𝑡
(iii) [1 pt] Gibbs sampling
# No # Yes: □ 𝑃 (𝑆0 ), 𝑃 (𝑆𝑡 |𝑆𝑡−1 ), 𝑡 > 0 □ 𝑃 (𝐸𝑡 |𝑋𝑡 , 𝐴𝑡 )∀𝑡 □ 𝑃 (𝐴𝑡 |𝑆𝑡 )∀𝑡 □ 𝑃 (𝑋𝑡 )∀𝑡
(iv) [1 pt] Prior sampling
# No # Yes: □ 𝑃 (𝑆0 ), 𝑃 (𝑆𝑡 |𝑆𝑡−1 ), 𝑡 > 0 □ 𝑃 (𝐸𝑡 |𝑋𝑡 , 𝐴𝑡 )∀𝑡 □ 𝑃 (𝐴𝑡 |𝑆𝑡 )∀𝑡 □ 𝑃 (𝑋𝑡 )∀𝑡
(v) [1 pt] Particle Filtering
# No # Yes: □ 𝑃 (𝑆0 ), 𝑃 (𝑆𝑡 |𝑆𝑡−1 ), 𝑡 > 0 □ 𝑃 (𝐸𝑡 |𝑋𝑡 , 𝐴𝑡 )∀𝑡 □ 𝑃 (𝐴𝑡 |𝑆𝑡 )∀𝑡 □ 𝑃 (𝑋𝑡 )∀𝑡

(b) For the HMM shown above, determine the correct recursive formula for the belief distribution update from 𝐵(𝑆𝑡−1 ) to
𝐵(𝑆𝑡 ). Recall that the belief distribution 𝐵(𝑆𝑡 ) represents the probability 𝑃 (𝑆𝑡 |𝐸0∶𝑡 ) and involves two steps: (i) Time
elapse and (ii) Observation update.

𝐵(𝑆𝑡 ) ∝ (ii) ⋅ (i)

(i) [1 pt] Time elapse

# ∑_{𝑆𝑡−1} 𝑃 (𝑆𝑡 |𝑆𝑡−1 )𝐵(𝑆𝑡−1 )
# ∑_{𝑆𝑡−1} ∑_{𝐴𝑡−1} 𝑃 (𝑆𝑡 |𝑆𝑡−1 )𝑃 (𝐴𝑡−1 |𝑆𝑡−1 )𝐵(𝑆𝑡−1 )
# ∑_{𝑆𝑡−1} 𝑃 (𝑆𝑡 |𝑆𝑡−1 )𝑃 (𝐴𝑡 |𝑆𝑡 )𝐵(𝑆𝑡−1 )
# ∑_{𝑆𝑡−1} ∑_{𝐴𝑡−1} 𝑃 (𝑆𝑡 |𝑆𝑡−1 )𝑃 (𝐴𝑡−1 |𝑆𝑡−1 )𝑃 (𝐴𝑡 |𝑆𝑡 )𝐵(𝑆𝑡−1 )

(ii) [1 pt] Observation update

# 𝑃 (𝐸𝑡 |𝑋𝑡 , 𝐴𝑡 )
# 𝑃 (𝐸𝑡 |𝑋𝑡 , 𝐴𝑡 )𝑃 (𝑋𝑡 )𝑃 (𝐴𝑡 |𝑆𝑡 )
# ∑_{𝑥∈𝑋𝑡} ∑_{𝑎∈𝐴𝑡} 𝑃 (𝐸𝑡 |𝑥, 𝑎)
# ∑_{𝑥∈𝑋𝑡} ∑_{𝑎∈𝐴𝑡} 𝑃 (𝐸𝑡 |𝑥, 𝑎)𝑃 (𝑥)𝑃 (𝑎|𝑆𝑡 )
# ∏_{𝑥∈𝑋𝑡} ∏_{𝑎∈𝐴𝑡} 𝑃 (𝐸𝑡 |𝑥, 𝑎)
# ∏_{𝑥∈𝑋𝑡} ∏_{𝑎∈𝐴𝑡} 𝑃 (𝐸𝑡 |𝑥, 𝑎)𝑃 (𝑥)𝑃 (𝑎|𝑆𝑡 )

Due to the differences between CS188 offerings in the fall and spring semesters, we realize that only student evaluations from
past fall semesters are accurate enough to be incorporated into our model. Assume that a fall semester occurs during an even
timestep and that 𝑡 is even in the diagram below. The new HMM can be represented as follows:

[Diagram: the same DBN structure extended to 𝑆𝑡+2 , with nodes 𝑋𝑖 , 𝐴𝑖 , and 𝐸𝑖 at every timestep 𝑖 ∈ {0, 1, … , 𝑡, 𝑡 + 1, 𝑡 + 2}.]

(c) [2 pts] In this question, we are trying to derive a recursive formula for the two-step belief distribution update from 𝐵(𝑆𝑡 )
to 𝐵(𝑆𝑡+2 ) for the new problem described above. Which of the following steps represent the correct and most efficient
method of performing HMM updates to get the belief distribution at 𝑆𝑡+2 from the current belief at 𝑆𝑡 ?
For the following notation, let 𝐵(𝑆𝑡 ) = 𝑃 (𝑆𝑡 |𝐸0∶𝑡∶2 ) and 𝐵 ′ (𝑆𝑡 ) = 𝑃 (𝑆𝑡 |𝐸0∶𝑡−1∶2 ) where 𝐸0∶𝑖∶2 represents the set
of all evidence variables at even timesteps up to 𝑖. Further, let 𝑂(𝐸𝑡 ) represent the value of the observation update
expression from the previous part (ii). (Note that 𝑂(𝐸𝑡+1 ) and 𝑂(𝐸𝑡+2 ) would represent the appropriate observation
update expressions for timestep 𝑡 + 1 and 𝑡 + 2 respectively.)
# 𝐵 ′ (𝑆𝑡+1 ) = ∑_{𝑆𝑡} 𝑃 (𝑆𝑡+1 |𝑆𝑡 )𝐵(𝑆𝑡 )
  𝐵 ′ (𝑆𝑡+2 ) = ∑_{𝑆𝑡+1} 𝑃 (𝑆𝑡+2 |𝑆𝑡+1 )𝐵 ′ (𝑆𝑡+1 )
  𝐵(𝑆𝑡+2 ) ∝ 𝑂(𝐸𝑡+2 )𝐵 ′ (𝑆𝑡+2 )

# 𝐵 ′ (𝑆𝑡+2 ) = ∑_{𝑆𝑡+1} ∑_{𝑆𝑡} 𝑃 (𝑆𝑡+2 |𝑆𝑡+1 )𝑃 (𝑆𝑡+1 |𝑆𝑡 )𝐵(𝑆𝑡 )
  𝐵(𝑆𝑡+2 ) ∝ 𝑂(𝐸𝑡+2 )𝐵 ′ (𝑆𝑡+2 )

# 𝐵 ′ (𝑆𝑡+1 ) = ∑_{𝑆𝑡} 𝑃 (𝑆𝑡+1 |𝑆𝑡 )𝐵(𝑆𝑡 )
  𝐵(𝑆𝑡+1 ) ∝ 𝑂(𝐸𝑡+1 )𝐵 ′ (𝑆𝑡+1 )
  𝐵 ′ (𝑆𝑡+2 ) = ∑_{𝑆𝑡+1} 𝑃 (𝑆𝑡+2 |𝑆𝑡+1 )𝐵(𝑆𝑡+1 )
  𝐵(𝑆𝑡+2 ) ∝ 𝑂(𝐸𝑡+2 )𝐵 ′ (𝑆𝑡+2 )

# 𝐵 ′ (𝑆𝑡+1 ) = ∑_{𝑆𝑡} 𝑃 (𝑆𝑡+1 |𝑆𝑡 )𝐵(𝑆𝑡 )
  𝐵(𝑆𝑡+1 ) = ∑_{𝐸𝑡+1} 𝑂(𝐸𝑡+1 )𝐵 ′ (𝑆𝑡+1 )
  𝐵 ′ (𝑆𝑡+2 ) = ∑_{𝑆𝑡+1} 𝑃 (𝑆𝑡+2 |𝑆𝑡+1 )𝐵(𝑆𝑡+1 )
  𝐵(𝑆𝑡+2 ) ∝ 𝑂(𝐸𝑡+2 )𝐵 ′ (𝑆𝑡+2 )

# None of the above.

(d) Now consider a scenario where instead of getting one general student feedback as evidence (𝐸𝑡 ), we instead get individual
student feedback from 100 students. Let the variable 𝐸𝑡,𝑛 represent the evidence from student 𝑛 at timestep 𝑡. Assume
that the new evidence variables (𝐸𝑡,𝑛 ) can each take on the value of +𝑒 or −𝑒 with the same probability distribution as the
single variable case (𝐸𝑡 ).

[Diagram: at timestep 𝑡, 𝑆𝑡 is a parent of 𝐴𝑡 ; 𝐴𝑡 and 𝑋𝑡 are both parents of each of 𝐸𝑡,1 , 𝐸𝑡,2 , … , 𝐸𝑡,99 , 𝐸𝑡,100 .]

(i) [2 pts] Which of the following statements are true regarding this new setup?
□ The evidence variables within the same timestep are independent of each other (𝐸𝑡,𝑗 ⟂⟂ 𝐸𝑡,𝑘 ∀𝑗 ≠ 𝑘).
□ The evidence variables between any two different timesteps are independent of each other (𝐸𝑡1 ,𝑗 ⟂⟂
𝐸𝑡2 ,𝑘 ∀𝑡1 ≠ 𝑡2 ).
□ The expression to calculate the time elapse step from 𝑆𝑡 to 𝑆𝑡+1 for this new setup will be the same as the
time elapse expression for the single-evidence case in part (b)(i).
# None of the above.
(ii) [1 pt] In the observation update at timestep 𝑡, we receive as evidence 60 positive evaluations (+𝑒) and 40 negative
evaluations (−𝑒). Let 𝑥 be the observation update probability at timestep 𝑡 of observing one positive evaluation
(𝑥 = 𝑂(𝐸𝑡 = +𝑒)). Now, let 𝑓 (𝑥) represent the new observation update expression for the case with the observed 100
evidence variables. Which of the following functions 𝑓 gives the correct observation update for the new scenario?
For this part only, regardless of your previous answer, please assume that each student’s feedback is independent
of the others.
# 𝑓 (𝑥) = 𝑥100
# 𝑓 (𝑥) = 60𝑥 ⋅ 40(1 − 𝑥)
# 𝑓 (𝑥) = 𝑥60 ⋅ (1 − 𝑥)40
# None of the above.
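As background for this subpart: when observations are conditionally independent given the state, the joint observation weight is the product of the individual per-observation weights. A small illustrative helper, hypothetical and not part of the assignment:

```python
# With conditionally independent evaluations, the joint observation update
# multiplies one factor per student: x for each positive evaluation (+e) and
# (1 - x) for each negative evaluation (-e), where x is the weight of a
# single positive observation.
def joint_observation_weight(x, num_pos, num_neg):
    return (x ** num_pos) * ((1 - x) ** num_neg)
```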

Q1 Self-Assessment - leave this section blank for your original submission. We will release the solutions to this problem
after the deadline for this assignment has passed. After reviewing the solutions for this problem, assess your initial response
by checking one of the following options:

# I fully solved the problem correctly, including fully correct logic and sufficient work (if applicable).
# I got part or all of the question incorrect.

If you selected the second option, explain the mistake(s) you made and why your initial reasoning was incorrect (do not
reiterate the solution. Instead, reflect on the errors in your original submission). Approximately 2-3 sentences for each
incorrect sub-question.

Q2. [5 pts] Learning to Act
In lecture and discussion, we have mainly used the Naive Bayes algorithm to do binary classification, such as classifying whether
an email is spam. However, we can also use Naive Bayes to learn how to act in an environment. This problem will explore
learning good policies with Naive Bayes and comparing them to policies learned with RL.

We consider the following one-dimensional grid world environment with three squares, named 𝐴, 𝐵, and 𝐶 from left to right.
Pacman has two possible actions at each square: left and right. Taking the left action at square 𝐴 or the right action at square 𝐶
will transition to a terminal state 𝑥 where no further action can be taken. At each timestep, Pacman observes his own position
(𝑠𝑝 ) as well as the ghost’s position (𝑠𝑔 ), and he uses these observations to decide on an action.

[Diagram: the three-square grid world 𝐴, 𝐵, 𝐶; the ghost occupies square 𝑠𝑔 and Pacman occupies square 𝑠𝑝 .]

(a) In this part, Pacman has no idea about the transition probabilities of the ghost or the rewards it gets. However, Pacman
has access to an expert demonstration dataset, which gives reasonably good actions to take in a number of scenarios. The
dataset is divided into training, validation, and test sets. The training set is shown below.

𝑠𝑝    𝑠𝑔    𝑎
𝐵     𝐶     left
𝐶     𝐴     left
𝐴     𝐵     left
𝐶     𝐶     right
𝐵     𝐴     right
𝐶     𝐵     right
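For reference, maximum-likelihood estimates in Naive Bayes are just normalized counts over the training rows. Here is a generic counting sketch; the helper name and the (𝑠𝑝 , 𝑠𝑔 , 𝑎) tuple layout are made up for illustration:

```python
from collections import Counter

# Generic Naive Bayes MLE sketch: estimate the prior P(a) and the feature
# likelihoods P(s_p | a) and P(s_g | a) by counting over (s_p, s_g, a) rows.
# Feature/class pairs that never occur are simply absent (i.e., estimated as 0).
def mle_estimates(rows):
    action_counts = Counter(a for _, _, a in rows)
    sp_counts = Counter((sp, a) for sp, _, a in rows)
    sg_counts = Counter((sg, a) for _, sg, a in rows)
    n = len(rows)
    p_action = {a: c / n for a, c in action_counts.items()}
    p_sp = {(sp, a): c / action_counts[a] for (sp, a), c in sp_counts.items()}
    p_sg = {(sg, a): c / action_counts[a] for (sg, a), c in sg_counts.items()}
    return p_action, p_sp, p_sg
```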

(i) [1 pt] Using the standard Naive Bayes algorithm, what are the maximum likelihood estimates for the following con-
ditional probabilities (encoded in the Bayes Net)?
𝑃 (𝑠𝑝 = 𝐶 | 𝑎 = left) =
𝑃 (𝑠𝑝 = 𝐴 | 𝑎 = right) =
𝑃 (𝑎 = left) =

(ii) [2 pts] Using the standard Naive Bayes algorithm, which action should we choose in the following new scenarios?
𝑠𝑝 = 𝐴, 𝑠𝑔 = 𝐶 # Left # Right
𝑠𝑝 = 𝐶, 𝑠𝑔 = 𝐵 # Left # Right

(iii) [2 pts] Suppose we want to add Laplace smoothing with strength 𝑘 in the Naive Bayes algorithm. (There is no
smoothing when 𝑘 = 0.) Which of the following are true?
□ To find the optimal value of 𝑘, we pick the value of 𝑘 which gives the highest accuracy on the training set.
□ To find the optimal value of 𝑘, we pick the value of 𝑘 which gives the highest accuracy on the validation
set.
□ To find the optimal value of 𝑘, we pick the value of 𝑘 which gives the highest accuracy on the test set.
□ If 𝑘 = 0, we may observe low accuracy on the test set due to overfitting.
□ If 𝑘 is a very large integer, the posterior probability for each action will be close to 0.5.
# None of the above.
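For reference, Laplace smoothing with strength 𝑘 adds 𝑘 pseudo-counts to every (value, class) pair before normalizing. A one-line sketch with a hypothetical helper name:

```python
# Laplace-smoothed conditional probability estimate:
# (count of value within class + k) / (class total + k * number of possible values).
# With k = 0 this is the plain MLE; as k grows, the estimate approaches
# the uniform distribution 1 / num_values.
def smoothed_prob(count, class_count, k, num_values):
    return (count + k) / (class_count + k * num_values)
```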
Q2 Self-Assessment - leave this section blank for your original submission. We will release the solutions to this problem
after the deadline for this assignment has passed. After reviewing the solutions for this problem, assess your initial response
by checking one of the following options:

# I fully solved the problem correctly, including fully correct logic and sufficient work (if applicable).
# I got part or all of the question incorrect.

If you selected the second option, explain the mistake(s) you made and why your initial reasoning was incorrect (do not
reiterate the solution. Instead, reflect on the errors in your original submission). Approximately 2-3 sentences for each
incorrect sub-question.
