Midterm Exam: CS 188 Introduction to Artificial Intelligence, Fall 2008
• Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a
brief explanation. All short answer sections can be successfully answered in a few sentences at most.
• Question 0: Fill out the following grid and write your name, SID, login, and GSI at the top of
each subsequent page. (-1 points if done incorrectly!)
Last Name
First Name
SID
Login
GSI
Pacman and Ms. Pacman are lost in an N × N maze and would like to meet; they don’t care where. In each time
step, both simultaneously move in one of the following directions: {NORTH, SOUTH, EAST, WEST, STOP}.
They do not alternate turns. You must devise a plan which positions them together, somewhere, in as few
time steps as possible. Passing each other does not count as meeting; they must occupy the same square at
the same time.
(a) (4 points) Formally state this problem as a single-agent state-space search problem.
States:
Answer: The set of pairs of positions for Pacman and Ms. Pacman:
{((x1 , y1 ), (x2 , y2 )) | x1 , x2 , y1 , y2 ∈ {1, 2, . . . , N }}
Maximum size of state space:
Answer: N^2 · N^2 = N^4, since each of the two agents can occupy any of the N^2 squares.
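As a concrete sketch of this formulation (the function and variable names are illustrative, not from the exam, and maze walls are ignored for simplicity):

```python
from itertools import product

# The five simultaneous moves available to each agent.
MOVES = {"NORTH": (0, 1), "SOUTH": (0, -1), "EAST": (1, 0),
         "WEST": (-1, 0), "STOP": (0, 0)}

def successors(state, n):
    """A state pairs both positions; both agents move at once, so each
    successor combines one of Pacman's 5 moves with one of Ms. Pacman's 5."""
    p1, p2 = state
    result = []
    for m1, m2 in product(MOVES.values(), repeat=2):
        q1 = (p1[0] + m1[0], p1[1] + m1[1])
        q2 = (p2[0] + m2[0], p2[1] + m2[1])
        if all(1 <= c <= n for c in q1 + q2):  # stay inside the N x N maze
            result.append((q1, q2))
    return result

def is_goal(state):
    # Meeting means occupying the same square at the same time.
    return state[0] == state[1]
```

From a corner of a 2 × 2 maze each agent has 3 legal moves, so a joint state there has 9 successors.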
(c) (3 points) Circle all of the following graph search methods which are guaranteed to output optimal
solutions to this problem:
(i) DFS
(ii) BFS
(iii) UCS
(iv) A* (with a consistent and admissible heuristic)
(v) A* (with heuristic that returns zero for each state)
(vi) Greedy search (with a consistent and admissible heuristic)
Answer: BFS, UCS, A* (with a consistent and admissible heuristic), A* (with heuristic that returns zero for
each state)
(d) (2 points) If h1 and h2 are admissible, which of the following are also guaranteed to be admissible? Circle
all that apply:
(i) h1 + h2
(ii) h1 ∗ h2
(iii) max(h1 , h2 )
(iv) min(h1 , h2 )
(v) (α)h1 + (1 − α)h2 , for α ∈ [0, 1]
Answer: (iii), (iv), and (v). The max, min, and any convex combination of admissible heuristics can never exceed the true cost, while the sum (i) and product (ii) can overestimate.
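A quick numeric check illustrates why the sum and product can break admissibility while max, min, and convex combinations cannot; the state and costs below are hypothetical:

```python
# Hypothetical state: true cost-to-goal is 10, and both heuristics
# return the admissible estimate 6 (neither overestimates on its own).
h1, h2, h_star, alpha = 6, 6, 10, 0.5

combined = {
    "h1 + h2": h1 + h2,                                        # 12 > 10: overestimates
    "h1 * h2": h1 * h2,                                        # 36 > 10: overestimates
    "max(h1, h2)": max(h1, h2),                                # 6 <= 10: admissible
    "min(h1, h2)": min(h1, h2),                                # 6 <= 10: admissible
    "alpha*h1 + (1-alpha)*h2": alpha * h1 + (1 - alpha) * h2,  # 6 <= 10: admissible
}
admissible = {name: value <= h_star for name, value in combined.items()}
```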
You are designing a menu for a special event. There are several choices, each represented as a variable:
(A)ppetizer, (B)everage, main (C)ourse, and (D)essert. The domains of the variables are as follows:
A: (v)eggies, (e)scargot
B: (w)ater, (s)oda, (m)ilk
C: (f)ish, (b)eef, (p)asta
D: (a)pple pie, (i)ce cream, (ch)eese
Because all of your guests get the same menu, it must obey the following dietary constraints:
(i) Vegetarian options: The appetizer must be veggies or the main course must be pasta or fish (or both).
(ii) Total budget: If you serve the escargot, you cannot afford any beverage other than water.
(iii) Calcium requirement: You must serve at least one of milk, ice cream, or cheese.
(a) (3 points) Draw the constraint graph over the variables A, B, C, and D.
A B
C D
Answer: Edges A–C (from constraint i), A–B (from constraint ii), and B–D (from constraint iii).
(b) (2 points) Imagine we first assign A=e. Cross out eliminated values to show the domains of the variables
after forward checking.
A [ e ]
B [ w s m ]
C [ f b p ]
D [ a i ch ]
Answer: The values s, m, and b should be crossed off. “s” and “m” are eliminated due to being incompatible
with “e” based on constraint (ii). “b” is eliminated due to constraint (i).
(c) (3 points) Again imagine we first assign A=e. Cross out eliminated values to show the domains of the
variables after arc consistency has been enforced.
A [ e ]
B [ w s m ]
C [ f b p ]
D [ a i ch ]
Answer: The values s, m, b, and a should be eliminated. The first three are crossed off for the reasons above,
and “a” is eliminated because there is no value for (B) that is compatible with “a” (based on constraint (iii)).
(d) (1 point) Give a solution for this CSP or state that none exists.
Answer: Multiple solutions exist. One is A=e, B=w, C=f, and D=i.
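Any candidate menu can be checked mechanically. The variable and value names below follow the problem; the helper function itself is an illustrative sketch:

```python
def satisfies(menu):
    """Check the three dietary constraints; menu maps A/B/C/D to a value."""
    a, b, c, d = menu["A"], menu["B"], menu["C"], menu["D"]
    vegetarian = a == "v" or c in ("p", "f")   # (i) veggies, pasta, or fish
    budget = a != "e" or b == "w"              # (ii) escargot forces water
    calcium = b == "m" or d in ("i", "ch")     # (iii) milk, ice cream, or cheese
    return vegetarian and budget and calcium
```

The stated solution passes: `satisfies({"A": "e", "B": "w", "C": "f", "D": "i"})` is True, while swapping water for soda violates constraint (ii).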
(e) (2 points) For general CSPs, will enforcing arc consistency after an assignment always prune at least as
many domain values as forward checking? Briefly explain why or why not.
Answer: Two answers are possible:
Yes. The first step of arc consistency is equivalent to forward checking, so arc consistency removes all values
that forward checking does.
No. While forward checking is a subset of arc consistency, after any assignment arc consistency may already have eliminated, in an earlier step, values that forward checking would only eliminate at this step. Thus, enforcing arc consistency never leaves more domain values than forward checking, but on a given step forward checking might prune more values than arc consistency, simply because some of those values were already pruned by arc consistency earlier.
Consider the following zero-sum game, in which the utilities UA (s) are shown for the first player (A). Assume
the second player (B) is a minimizer: B holds the opposite utilities to A, UB (s) = −UA (s). In this case, B’s
maximization of UB is equivalent to minimization of UA (i.e. the computation is standard minimax).
(a) (2 points) In each node, write UA (s), the (minimax) utility of that state for player A, assuming that B is
a minimizer.
Answer: Displayed above.
(b) (3 points) Cross off any nodes which will be skipped by alpha-beta pruning, assuming left-to-right ordering.
Answer: Displayed above.
Assume now that B is not a minimizer, but a balancer. A balancer does not try to minimize A’s score, but
rather wishes the outcome of the game to be as balanced as possible. Formally, assume B’s utility for a state
s is defined as UB (s) = −|UA (s)|. The game tree is shown here, with hexagons indicating player B’s control.
(c) (3 points) In each node, write UA (s), the utility of that state for player A, assuming that B is a balancer.
Answer: Displayed above.
(d) (3 points) Write pseudocode for the functions which compute the UA (s) values of game states in the
general case of multi-turn games where B is a balancer. Assume you have access to the following functions:
successors(s) gives the possible next states, isTerminal(s) checks whether a state is a terminal state, and
terminalValue(s) returns A’s utility for a terminal state. Careful: As in minimax, be sure that both functions
compute and return player A’s utilities for states – B’s utility can always be computed from A’s utility.
Answer: Below. Note that balanceValue(s) must still return the utility from the maximizer’s (A’s) perspective.
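The intended pseudocode can be sketched as runnable Python over an explicit tree; here `succ` and `value` stand in for successors(s) and terminalValue(s), and a node with no successors is terminal:

```python
def max_value(s, succ, value):
    """A's turn: maximize A's utility over the children's U_A values."""
    kids = succ(s)
    if not kids:                      # isTerminal(s)
        return value(s)               # terminalValue(s): A's utility
    return max(balance_value(k, succ, value) for k in kids)

def balance_value(s, succ, value):
    """B's turn: B maximizes U_B = -|U_A|, i.e. picks the child whose U_A
    is closest to zero -- but returns that child's U_A, from A's perspective."""
    kids = succ(s)
    if not kids:
        return value(s)
    return min((max_value(k, succ, value) for k in kids), key=abs)
```

On a toy tree where B’s two options lead to leaves {3, −1} and {5, −4}, B picks −1 and −4 respectively (smallest magnitudes), and A then takes max(−1, −4) = −1.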
(h) (2 points) Consider pruning children of a B node in this scenario. On the tree on the bottom of the
previous page, cross off any nodes which can be pruned, again assuming left-to-right ordering.
(i) (2 points) Again consider pruning children of a B node s. Let α be the best option for an A node higher in
the tree, just as in alpha-beta pruning, and let v be the UA value of the best action B has found so far from s.
Give a general condition under which balanceValue(s) can return without examining any more of its children.
Answer: Whatever B eventually returns from s has magnitude at most |v|, so its value to A is at most |v|. Therefore balanceValue(s) can return as soon as |v| ≤ α.
(a) (2 points) If for all i, ri = 1, pi = 1, and there is a discount γ = 0.5, what is the value V stay (1) of being
in city 1 under the policy that always chooses stay? Your answer should be a real number.
Answer: For all cities (states) i = 1, . . . , N , the value under the stay policy satisfies V stay (i) = ri + γ V stay (i)
(remember, this is the Bellman equation for a fixed policy). Plugging in values, we get V stay (i) = 1 +
0.5 V stay (i). Solving for V stay (i) with algebra gives V stay (i) = 2. In particular, V stay (1) = 2.
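The same fixed point can be found by simply iterating the Bellman backup for the fixed policy (a minimal sketch; the function name is illustrative):

```python
def evaluate_stay(r=1.0, gamma=0.5, iters=60):
    """Iterate V <- r + gamma * V; converges to r / (1 - gamma) for gamma < 1."""
    v = 0.0
    for _ in range(iters):
        v = r + gamma * v
    return v
```

With r = 1 and γ = 0.5 the iteration converges to 1/(1 − 0.5) = 2.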
(b) (2 points) If for all i, ri = 1, pi = 1, and there is a discount γ = 0.5, what is the optimal value V ∗ (1) of
being in city 1?
Intuitive Answer: since all the cities offer the same reward (ri = 1), there is no incentive to move to another
city to do business, so the optimal policy is to always stay, yielding V ∗ (1) = 2.
More Formal Answer:
For all cities (states) i = 1, . . . , N , writing out the Bellman equations (for i = 1, omit the left action; for i = N , omit the right action):
V ∗ (i) = max{ri + γ V ∗ (i), γ V ∗ (i − 1), γ V ∗ (i + 1)} = max{1 + 0.5 V ∗ (i), 0.5 V ∗ (i − 1), 0.5 V ∗ (i + 1)}
From this, we see that V ∗ (i) is the same for all i, so the max is always attained by the stay action.
(c) (2 points) If the ri ’s and pi ’s are known positive numbers and there is almost no discount, i.e. γ ≈ 1,
describe the optimal policy. You may define it formally or in words, e.g. “always go east,” but your answer
should precisely define how an agent should act in any given state. Hint: You should not need to do any
computation to answer this question.
Basically Right Answer: the optimal policy is to always move towards the city with the highest reward. Once
there, stay there and do business forever.
Technical Answer: The only complication is due to possible ties. Let r∗ = max1≤i≤N ri be the maximum reward
out of all the cities. The optimal policy from city i is as follows: if ri = r∗ , stay; otherwise, move towards the
closest city j that has rj = r∗ , where the distance between i and a city j > i is the expected number of moves
to get there, Σ_{k=i}^{j−1} 1/pk .
Suppose we run value iteration. Recall that Vk (s) is the value of state s after k rounds of value iteration and
all the values are initialized to zero.
(d) (2 points) If the optimal value of being in city 1 is positive, i.e. V ∗ (1) > 0, what is the largest k for which
Vk (1) could still be zero? Be careful of off-by-one errors.
Answer: If we assume ri > 0 for all i, then the largest k is 0, because V1 (i) = max{ri + γ · 0, · · · } > 0 for every i.
(Intended) Answer: If we don’t assume ri > 0, then the largest k is N − 1. Proof: since V ∗ (1) > 0, at least one
of the ri ’s, say ri , must be strictly positive. After one iteration, V1 (i) > 0; after two iterations, V2 (i − 1) > 0; finally,
after i iterations, Vi (1) > 0. In the meantime, if rj = 0 for all j < i, then Vk (1) = 0 for all k < i. In the worst
case, i = N , so VN −1 (1) = 0 is possible, but VN (1) > 0.
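This propagation is easy to see numerically. The sketch below assumes a deterministic line MDP (all pi = 1, moves earn no reward) in which only the last city has positive reward; the names are illustrative:

```python
def value_iteration(rewards, k, gamma=0.9):
    """Run k rounds of value iteration on a line of cities: in city i you
    may stay (collect rewards[i]) or move one step left/right (reward 0)."""
    n = len(rewards)
    v = [0.0] * n
    for _ in range(k):
        new_v = []
        for i in range(n):
            options = [rewards[i] + gamma * v[i]]   # stay and do business
            if i > 0:
                options.append(gamma * v[i - 1])    # move west
            if i < n - 1:
                options.append(gamma * v[i + 1])    # move east
            new_v.append(max(options))
        v = new_v
    return v
```

With N = 4 and rewards [0, 0, 0, 1], the value of city 1 is still zero after 3 rounds but positive after 4, matching the largest k = N − 1.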
(e) (2 points) If all of the ri and pi are positive, what is the largest k for which Vk (s) could still be zero for
some state s? Be careful of off-by-one errors.
Answer: Since ri > 0 for all i, the largest k is 0, because V1 (i) = max{ri + γ · 0, · · · } > 0 for every city i.
Suppose we don’t know the ri ’s or the pi ’s, so we decide to do Q-learning.
(f ) (3 points) Suppose we experience the following sequence of states, actions, and rewards: (s=1, a=stay,
r=4), (s=1, a=east, r=0), (s=2, a=stay, r=6), (s=2, a=west, r=0), (s=1, a=stay, r=4, s=1). What are the
resulting Q(s, a) values if the learning rate is 0.5, the discount is 1, and we start with all Q(s, a) = 0? Fill in
the table below; each row should hold the q-values after the transition specified in its first column. You may
leave unchanged values blank.
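Since the table itself is not reproduced here, the updates can be checked with a short script. The next state of each transition is read off the observed sequence itself ((s=1, a=stay) lands back in 1, (s=1, a=east) lands in 2, and so on):

```python
def q_learning(transitions, actions, alpha=0.5, gamma=1.0):
    """Tabular Q-learning: Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma*max_a' Q(s',a'))."""
    q = {}
    for s, a, r, s_next in transitions:
        best_next = max(q.get((s_next, a2), 0.0) for a2 in actions)
        q[(s, a)] = (1 - alpha) * q.get((s, a), 0.0) + alpha * (r + gamma * best_next)
    return q

# The observed episode, with next states read off the sequence.
episode = [(1, "stay", 4, 1), (1, "east", 0, 2), (2, "stay", 6, 2),
           (2, "west", 0, 1), (1, "stay", 4, 1)]
q = q_learning(episode, actions=("stay", "east", "west"))
```

With α = 0.5 and γ = 1, this yields Q(1, stay) = 4, Q(1, east) = 0, Q(2, stay) = 3, and Q(2, west) = 1.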
Before you guess the word, two of the three letters will be revealed to
you. In the first round of the game, you choose one of X, Y or Z to be
revealed. In the second round, you choose one of the remaining two letters to
be revealed. In the third round, you guess the word. If you guess correctly,
you win. The utility of winning is 1, while the utility of losing is 0.
You watch the game a lot and determine that the eight possible words
occur with the probabilities shown on the right. Your goal is to act in such
a way as to maximize your chances of winning (and thereby your expected
utility).
(a) (3 points) What is the distribution P(X, Y)? Your answer should be in the form of a table.
Answer:
P(X=c,Y=a)=0.2
P(X=c,Y=o)=0.4
P(X=l,Y=a)=0.2
P(X=l,Y=o)=0.2
(b) (2 points) Are the first and second letters (X and Y) independent? Show a specific computation that
supports your claim.
Answer: No: P(X=c) = 0.6 and P(Y=a) = 0.4, but P(X=c,Y=a) = 0.2, which is not P(X=c)P(Y=a) = 0.24.
(Other counterexamples exist too.)
(c) (2 points) Are the second and third letters (Y and Z) independent if you know the value of the first letter
(X)? Show a specific computation that supports your claim.
Answer: Yes. P (Y = a, Z = b|X = c) = P (X = c, Y = a, Z = b)/P (X = c) = 0.1/0.6 = 1/6.
P (Y = a|X = c) = (0.1 + 0.1)/0.6 = 1/3; P (Z = b|X = c) = (0.1 + 0.2)/0.6 = 1/2.
Thus, P (Y = a, Z = b|X = c) = 1/6 = P (Y = a|X = c) P (Z = b|X = c). To be certain, you would also have to
check all pairs of values (not required for full credit). Alternatively, you can show that P (Y |X, Z) = P (Y |X).
Suppose that in the first round, you ask about X and are told that X = c. It is the second round and you can
now either ask the host to reveal Y or to reveal Z.
(d) (2 points) If you ask the host to reveal Y, what is the probability that you will win in the third round?
Answer: Since Y and Z are independent conditioned on X, no matter what Y comes out to be, P (Z = b|X =
c, Y ) will be 0.5. Thus, you’ll guess arbitrarily and win with probability 0.5.
(e) (1 point) What letter should you ask the host about in the second round to maximize your chance of
winning, Y or Z?
Answer: Z, since you’ll be able to win 2/3 of the time (see part f)
(f ) (3 points) What is your expected utility if you act optimally from the state where X=c?
Answer: Since Y and Z are conditionally independent given X, knowing Z won’t give you any additional
information about Y . So you’ll guess the most likely value of Y given X = c, which is o, since P (Y = o|X =
c) = 2/3, and win 2/3 of the time.
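The comparison between the two second-round questions can be spelled out numerically. The conditional distributions below follow parts (c)–(f); since the word table is not reproduced here, Z’s second value is written as the placeholder "z2":

```python
def win_prob(reveal, p_y_given_x, p_z_given_x):
    """Because Y and Z are conditionally independent given X = c, revealing
    one letter doesn't change the other's distribution; you then guess the
    most likely value of the still-hidden letter."""
    hidden = p_z_given_x if reveal == "Y" else p_y_given_x
    return max(hidden.values())

p_y = {"a": 1 / 3, "o": 2 / 3}    # P(Y | X=c), from part (c)
p_z = {"b": 1 / 2, "z2": 1 / 2}   # P(Z | X=c); "z2" is a placeholder value
```

Revealing Y leaves a coin flip on Z (win 1/2); revealing Z lets you guess Y = o and win 2/3.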
(g) (2 points) Suppose that the host is allowed to pick any distribution over the three variables but has to
tell you what the distribution is before the game starts. What distribution should the host pick to minimize
your chances of winning? Justify your answer briefly.
Answer: Uniform over the eight words: then each letter value has a 50% chance of occurring and the three
variables are mutually independent, so knowing the revealed letters gives no information about the hidden one,
and you win only half the time.