Deterministic Dynamic Programming
Dynamic Programming

Dynamic programming (DP) is the most important technique for solving many optimization problems. In most applications, dynamic programming obtains solutions by working backwards from the end of a problem to the beginning, in such a way that a large, complicated problem is broken up into a series of smaller, more tractable problems.

1 Simple examples of dynamic programming

1.1 Which is the heaviest coin

We have 21 coins and are told that all the coins are of the same weight except one, which is heavier than any of the others. How many weighings on a balance will it take to find the heaviest coin?

Analysis: The answer is not so clear at first glance. However, let us think backwards. What is the maximum number of coins we can have left in order for the last weighing to successfully identify the heaviest coin? This is an easy question to answer: 3 coins. We only need to pick two of the three coins and weigh them on the balance. If they are of the same weight, then the one coin left out is the heaviest coin; if they are not, the heavier one is the heaviest coin. However, if we have 4 or more coins left, we cannot always tell which one is the heaviest if only one weighing is left.

Now let us go back one step and ask what is the maximum number of coins we can have left in order for the last two weighings to successfully identify the heaviest coin. Suppose we have m ≤ 9 coins left. What we can do is divide the coins into three groups (a, a, b) such that a + a + b = m, with a ≤ 3 and b ≤ 3. Then we weigh the first two groups against each other on the balance. If they have the same weight, then the heaviest coin is in the group of b coins; if they are not, then the heaviest coin is in the heavier group of a coins. Either way, since a ≤ 3 and b ≤ 3, we can always identify the heaviest coin with the last weighing. Now assume that we have m ≥ 10 coins, and we divide them into three groups (a, a, b).
A weighing on the balance will tell which group the heaviest coin belongs to. No matter how we divide, there is at least one group with at least 4 coins. If it turns out that the heaviest coin is in that group, we are in trouble, because the last weighing cannot identify it. Therefore, two weighings can identify the heaviest of at most 9 coins, and cannot if there are at least 10 coins. Going one step further, it is easy to see that if we have at most 3 · 9 = 27 coins, we can identify the heaviest one in three weighings. Therefore, for 21 coins, we need at most 3 weighings.

Not only does the above procedure give the number of weighings needed, it also yields how the weighings should go: we divide the coins into three groups (a, a, b) such that a ≤ 9 and b ≤ 9, say a = b = 7. The first weighing tells us which group the heaviest coin belongs to. Then we divide that group (of 7 coins) into three groups (2, 2, 3) or (3, 3, 1). The second weighing tells which subgroup the heaviest coin belongs to, and the last weighing only needs to pick the heaviest coin from at most 3 coins.

Remark: The above procedure indeed solves a more general problem: if we have n ≥ 2 coins to start with, then the number of weighings needed to identify the heaviest coin is the k with 3^{k-1} < n ≤ 3^k.
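The counting argument above amounts to the smallest k with 3^k ≥ n; a minimal sketch in Python (the function name is ours):

```python
def min_weighings(n: int) -> int:
    """Smallest k with 3**k >= n: each weighing shrinks the set of
    candidate coins by a factor of at least 3 (groups (a, a, b))."""
    k = 0
    while 3 ** k < n:
        k += 1
    return k

print(min_weighings(21))  # -> 3, since 3**2 = 9 < 21 <= 27 = 3**3
print(min_weighings(9))   # -> 2, two weighings suffice for up to 9 coins
```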
g(x_{N+1}) + Σ_{n=1}^{N} c_n(x_n, u_n).

We can view

g(x) = terminal cost if the terminal state x_{N+1} is x;
c_n(x, u) = running cost if at stage n a decision u_n = u is made at state x_n = x.

In the example of shortest path, the running cost c_n(x, u) is nothing but the distance from x to the destination determined by the control u. There is no terminal cost involved.

Remark: For American option pricing, the situation is a bit subtle, since if a decision of "liquidation" is made at any stage, the stock price (the state) after that stage does not matter a bit. Therefore the problem cannot directly fit into the cost structure we mentioned above. However, the idea of solving the problem using the DPE (which we will describe below) remains the same. We will come back to this later in the section on Probabilistic Dynamic Programming.

Remark: Sometimes the above cost criterion is called additive. Of course one can consider a multiplicative (or even mixed) cost criterion. A general multiplicative cost reads

g(x_{N+1}) · Π_{n=1}^{N} c_n(x_n, u_n).

The analysis of this cost is exactly the same as that of the additive cost criterion.

The value function is defined as

V_1(x) = min [ g(x_{N+1}) + Σ_{n=1}^{N} c_n(x_n, u_n) ], given x_1 = x.

To solve for V_1, we will expand the problem and work backwards. For notational convenience, we will denote

V_j(x) = min [ g(x_{N+1}) + Σ_{n=j}^{N} c_n(x_n, u_n) ], given x_j = x,

for j = 1, ..., N+1, with the convention that Σ_{n=N+1}^{N} = 0.

Dynamic Programming Equation (DPE): It is not difficult to see that V_{N+1}(x) = g(x) for any x by definition. Furthermore, it is intuitive that these functions V_j should satisfy the following equation:

(DPE)  V_j(x) = min_u [ c_j(x, u) + V_{j+1}(f_j(x, u)) ], for every x.

This equation is called the dynamic programming equation (DPE). Knowing V_{N+1}, one can recursively solve for all V_j. The interpretation of the DPE is also clear. Suppose at stage j the system is at state x_j = x. Choosing a control u_j = u at stage j, the state will become x_{j+1} = f_j(x_j, u_j) = f_j(x, u).
Therefore the DPE says: the minimum of the cost from stage j to the end of the problem must be attained by choosing at stage j a decision that minimizes the sum of the cost incurred during the current stage plus the minimum cost that can be incurred from stage j+1 to the end of the problem. We give a short (not so rigorous) proof below.

Proof: Given x_j = x, fix an arbitrary control u_j = u. For any control U = {u_{j+1}, ..., u_N} afterwards, by definition,

V_j(x) ≤ c_j(x, u) + Σ_{n=j+1}^{N} c_n(x_n, u_n) + g(x_{N+1}).

Minimizing the right-hand side over U = {u_{j+1}, ..., u_N}, we have

V_j(x) ≤ c_j(x, u) + V_{j+1}(f_j(x, u)).

But remember that u was arbitrary, so

V_j(x) ≤ min_u [ c_j(x, u) + V_{j+1}(f_j(x, u)) ].

On the other hand, if we use an optimal control u_j* = u* at stage j and afterwards an optimal control U* = {u*_{j+1}, ..., u*_N}, then we have

V_j(x) = c_j(x, u*) + Σ_{n=j+1}^{N} c_n(x_n*, u_n*) + g(x*_{N+1}) = c_j(x, u*) + V_{j+1}(f_j(x, u*)) ≥ min_u [ c_j(x, u) + V_{j+1}(f_j(x, u)) ].

These two inequalities yield the DPE. Furthermore, from the proof we verify that the optimal control at stage j is the control u* achieving the minimum in the RHS of the DPE. ∎

Remark: Note that the DPE actually solves V_1(x) for every x. In practice x_1 = x_0 is a specific value, but this is not important, since one only needs to plug this specific initial state into the function V_1 to obtain the value of the optimization problem associated with this specific x_0.

Remark: It is not hard to believe that for the multiplicative cost criterion one can similarly define V_j (with the convention Π_{n=N+1}^{N} = 1), and the DPE will become

(DPE)  V_j(x) = min_u [ c_j(x, u) · V_{j+1}(f_j(x, u)) ], for every x,

with V_{N+1}(x) = g(x). Again, the optimal control at stage j is the minimizing u* in the RHS of the DPE.

Remark: The above procedure remains true for a maximization problem: just replace all the "min" above by "max".

2.1 Examples of DP

Example: The owner of a lake must decide how many bass to catch and sell at the beginning of each year. Assume that the market demand for bass is unlimited.
If x is the number of bass sold in year n, a revenue of r(x) is earned. The cost of catching x bass is c(x, b), where b is the number of bass in the lake at the beginning of the year. The number of bass in the lake at the beginning of a year is 20% greater than the number in the lake at the end of the previous year. The owner is interested in maximizing the overall profit over the next N years.

DP Formulation: In this case, the stage is naturally defined as the time n, which varies from 1 to N. The state variable is the number of bass at the beginning of year n, denoted by s_n. The decision variable at year n (control) is the number of bass caught, denoted by x_n. The dynamics of the system are

s_{n+1} = 1.2 (s_n − x_n).

The objective is to maximize the overall profit:

v(s) = max Σ_{n=1}^{N} [ r(x_n) − c(x_n, s_n) ], given s_1 = s.

The dynamic programming will work backwards: define

v_j(s) = max Σ_{n=j}^{N} [ r(x_n) − c(x_n, s_n) ], given s_j = s,

for j = 1, ..., N, and define v_{N+1}(s) = 0 for any s (note the terminal cost in this optimization problem is 0). Clearly the quantity of interest is v = v_1. The dynamic programming equation (DPE) can be written as

v_j(s) = max_{0 ≤ x ≤ s} [ r(x) − c(x, s) + v_{j+1}(1.2(s − x)) ], j = N, ..., 1.

Since v_{N+1} = 0, one can work backwards to solve for all v_j. At year j, the optimal amount of bass to sell x_j* is the maximizing x in the above DPE, given that s_j = s (i.e. at the beginning of year j the number of bass in the lake is s).

Remark: A weakness of the previous formulation is that the profits received during later years are weighted the same as profits received during earlier years. Now we consider a discount factor 0 < β < 1: $1 received in year j+1 is equivalent to $β received in year j. Then the optimization becomes (abusing the notation a bit)

v(s) = max Σ_{n=1}^{N} β^{n−1} [ r(x_n) − c(x_n, s_n) ], given s_1 = s.

Similarly we define

v_j(s) = max Σ_{n=j}^{N} β^{n−j} [ r(x_n) − c(x_n, s_n) ], given s_j = s;

v_j can be interpreted as the optimal profit from year j to year N (valued in year-j dollars). Clearly v = v_1.
The DPE becomes

v_j(s) = max_{0 ≤ x ≤ s} [ r(x) − c(x, s) + β v_{j+1}(1.2(s − x)) ],

with v_{N+1}(s) = 0.

Example: Farmer Jones now possesses $5000 and 10 tons of wheat. During month j, the price of wheat is p_j (assumed known). During each month, he must decide how much wheat to buy or sell. There are the following restrictions on each month's wheat transaction: (1) during any month, the amount of money spent on wheat cannot exceed the cash on hand at the beginning of the month; (2) during any month, he cannot sell more wheat than he has at the beginning of the month; (3) because of limited warehouse capacity, the ending inventory of wheat for each month cannot exceed 10 tons. Show how dynamic programming can be utilized to maximize the amount of cash farmer Jones has on hand at the end of three months.

Formulation: Again, time is the stage. At the beginning of month n (the present is the beginning of month 1), farmer Jones must decide how much wheat to buy or sell. We will denote by u_n the change (i.e. the control) in Jones' wheat position during month n: u_n ≥ 0 corresponds to a month of wheat purchase, and u_n < 0 to a month of wheat sale. The state at the beginning of month n is the amount of wheat on hand, denoted by w_n, and the cash on hand, denoted by c_n. To ease notation, write s = (w, c). The optimization is to maximize the cash gain during the 3-month period:

v(s) = max Σ_{n=1}^{3} (−p_n u_n), given s_1 = s,

under the constraints (1)-(3). As before, define

v_j(s) = max Σ_{n=j}^{3} (−p_n u_n), given s_j = s,

with v_4(s) = 0. Note that v = v_1. The DPE is then

v_j(s) = max_{−w ≤ u ≤ min{10−w, c/p_j}} [ −p_j u + v_{j+1}(s̃) ], for all s = (w, c),

where s̃ = (w + u, c − p_j u). That the control u is constrained to this interval follows from the restrictions: (1) not spending more money on wheat than the cash on hand gives p_j u ≤ c, i.e. u ≤ c/p_j; not selling more wheat than on hand gives u ≥ −w;
(2) not storing more than 10 tons of wheat gives u + w ≤ 10, i.e. u ≤ 10 − w.

The DPE can recursively determine all v_j. For example,

v_3(s) = max_{−w ≤ u ≤ min{10−w, c/p_3}} [ −p_3 u + v_4(s̃) ] = p_3 w, with maximizing u* = −w.

That is, in month 3 Jones should sell all the wheat. v_2 and v_1 can also be determined in the same manner.

A special case is p_1 ≥ p_2 ≥ p_3, in which case it is obviously optimal to sell all the wheat at the beginning of month 1. This is verified by the DPE. Under this circumstance, we have

v_2(s) = max_{−w ≤ u ≤ min{10−w, c/p_2}} [ −p_2 u + v_3(s̃) ] = max_{−w ≤ u ≤ min{10−w, c/p_2}} [ −p_2 u + p_3(w + u) ] = p_2 w, with maximizing u* = −w,

and

v_1(s) = max_{−w ≤ u ≤ min{10−w, c/p_1}} [ −p_1 u + v_2(s̃) ] = max_{−w ≤ u ≤ min{10−w, c/p_1}} [ −p_1 u + p_2(w + u) ] = p_1 w, with maximizing u* = −w.

For the specific initial state s = s_1 = (10, 5000), the optimal cash gain is v_1(s) = p_1 w = 10 p_1, and Jones should sell all the wheat immediately. His overall cash on hand at the end of month 3 is 5000 + 10 p_1.

Another special case is p_1 ≤ p_2 ≤ p_3. The obviously optimal policy is to buy as much wheat as the cash and warehouse capacity will allow, and sell it all in month 3. This is also confirmed by the DPE. In this case,

v_2(s) = max_{−w ≤ u ≤ min{10−w, c/p_2}} [ −p_2 u + v_3(s̃) ] = max_{−w ≤ u ≤ min{10−w, c/p_2}} [ −p_2 u + p_3(w + u) ] = p_3 w + (p_3 − p_2) min{10 − w, c/p_2}, with maximizing u* = min{10 − w, c/p_2},

and

v_1(s) = max_{−w ≤ u ≤ min{10−w, c/p_1}} [ −p_1 u + p_3(w + u) + (p_3 − p_2) min{10 − w − u, (c − p_1 u)/p_2} ].

However, the term to be maximized equals

F(u) = −p_1 u + p_3(w + u) + (p_3 − p_2) min{10 − w − u, (c − p_1 u)/p_2}
     = p_3 w + min{ (10 − w)(p_3 − p_2) + (p_2 − p_1) u, (p_3 − p_2) c/p_2 + u (p_2 − p_1) p_3/p_2 },

which is an increasing function of u. Therefore,

v_1(s) = p_3 w + min{ (10 − w)(p_3 − p_2) + (p_2 − p_1) u*, (p_3 − p_2) c/p_2 + u* (p_2 − p_1) p_3/p_2 }, with u* = min{10 − w, c/p_1}.

Exercise: Can you formulate an LP to solve this maximization problem?

Example: It is the last weekend of the 1996 campaign, and candidate Blaa Blaa is in NY. Before election day, he must visit Miami, Dallas, and Chicago and then return to his NY headquarters. Blaa Blaa wants to minimize the total distance he must travel. In what order should he visit the cities?
          NY     Miami   Dallas  Chicago
NY         -     1334    1559     809
Miami    1334     -      1343    1397
Dallas   1559    1343     -       921
Chicago   809    1397     921      -

Solution: It is not obvious how to work this problem into the structure of dynamic programming. The stage j is easily defined as the j-th stop of the trip, with j = 0 as the starting point and j = 4 as the ending point (both are NY in this case). The definition of the state, however, is a bit more subtle: the state is (I, S), where I = {last city visited} and S = {cities visited}. Let us define

f_j(I, S) = the minimal distance that must be travelled to complete the tour if Blaa is at the j-th stop, with I being the last city visited and S being the set of all j cities visited.

To ease notation, we will denote by {N, M, D, C} the four cities, and the distance between city I and city J is denoted by d_{IJ}. The DPE for this problem can be written as

f_j(I, S) = min_{J ∉ S} [ d_{IJ} + f_{j+1}(J, S ∪ {J}) ], 1 ≤ j ≤ 2,

with f_3(I, {M, D, C}) = d_{IN} = distance from city I to NY. Recursively, we have

f_2(M, {M, D}) = d_{MC} + d_{CN} = 1397 + 809 = 2206;
f_2(D, {M, D}) = d_{DC} + d_{CN} = 921 + 809 = 1730;
f_2(M, {M, C}) = d_{MD} + d_{DN} = 1343 + 1559 = 2902;
f_2(C, {M, C}) = d_{CD} + d_{DN} = 921 + 1559 = 2480;
f_2(D, {D, C}) = d_{DM} + d_{MN} = 1343 + 1334 = 2677;
f_2(C, {D, C}) = d_{CM} + d_{MN} = 1397 + 1334 = 2731;

and

f_1(I, S) = min_{J ∉ S} [ d_{IJ} + f_2(J, S ∪ {J}) ]:

f_1(M, {M}) = min{ d_{MD} + f_2(D, {M, D}), d_{MC} + f_2(C, {M, C}) } = min{1343 + 1730, 1397 + 2480} = 3073 (with the minimum achieved at J* = D);

f_1(D, {D}) = min{ d_{DM} + f_2(M, {D, M}), d_{DC} + f_2(C, {D, C}) } = min{1343 + 2206, 921 + 2731} = 3549 (with the minimum achieved at J* = M);

f_1(C, {C}) = min{ d_{CM} + f_2(M, {C, M}), d_{CD} + f_2(D, {C, D}) } = min{1397 + 2902, 921 + 2677} = 3598 (with the minimum achieved at J* = D);

and finally

f_0(N, ∅) = min{ d_{NM} + f_1(M, {M}), d_{ND} + f_1(D, {D}), d_{NC} + f_1(C, {C}) } = min{1334 + 3073, 1559 + 3549, 809 + 3598} = 4407 (with the minimum achieved at J* = M or J* = C).

Therefore, the shortest tour is either N→M→D→C→N or N→C→D→M→N. Both have total distance 4407 miles. Note these two tours are reverses of each other.

2.2 Shortest path problems

To consider a shortest path problem in general, we need to introduce some new concepts.

Definition: A graph, or network, is defined by two sets of symbols: nodes and arcs. An arc consists of a pair of nodes, and represents a possible direction of motion between the two nodes. The length of an arc from node i to node j is denoted by d_{ij}. In general, d_{ij} could be negative. Note that the length of an arc does not necessarily mean physical length; it could stand for some very general quantity associated with the arc, for example cost.

Definition: An arc is said to be directed if it only allows one direction of motion. Usually the direction is indicated by an arrowhead at the end of the arc. An arc is said to be undirected if both directions are allowed; in this case, there is no arrowhead on the arc.
An undirected arc can be equivalently represented by two directed arcs.

Definition: A path is a sequence of arcs such that the terminal node of each arc is identical to the initial node of the next arc.

Below we will consider the shortest path problem for a network. Suppose the nodes of a network are denoted by {1, 2, ..., N}, and one wants to find a path from node 1 to node N with the shortest total length.

2.2.1 Shortest path for simple networks

As we have seen in Section 1.2, we can use dynamic programming to solve the shortest path problem if the nodes can be divided into groups, which we call "stages", and one always travels from a node in one stage to a node in the next stage. Assume node 1 is in stage 0 and node N is in stage K. The minimal distance from node 1 to node N can be recursively determined by the DPE: for any node x in stage k,

V_k(x) = min{ d_{xy} + V_{k+1}(y) : y ∈ Stage k+1 }, k = K−1, ..., 0,

with the convention d_{xy} = +∞ if there is no arc from x to y, and V_K(N) = 0.

Exercise: Consider the following network. Find the shortest path from node 1 to node 10. Also find the shortest path from node 3 to node 10.

Solution: There are three shortest paths: 1→3→5→8→10, 1→4→6→9→10, and 1→4→5→8→10. Each of these paths has total length 11. The shortest path from node 3 to node 10 is 3→5→8→10, with total length 7.

2.2.2 Shortest path by Dijkstra's algorithm

An assumption here is that the length of each arc is non-negative. Even though Dijkstra's algorithm is implicitly defined by a DPE, it is not necessary to specify the stages. The idea is as follows. Instead of searching for the shortest path from node 1 to node N, we search for the shortest path from node 1 to node n for every node n ∈ {1, 2, ..., N}. Define

v(j) = the shortest distance from node 1 to the j-th nearest node,

with the convention that the 1st nearest node is node 1 itself. Suppose N is the k-th nearest node to node 1; then the shortest distance from node 1 to node N is v(k).
The key observation is that the shortest path from node 1 to the (j+1)-th nearest node only passes through nodes contained in the j nearest nodes to node 1. In other words, there exists an i ≤ j such that this path consists of a shortest path from node 1 to the i-th nearest node, followed by a single arc from that node to the (j+1)-th nearest node.
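The network data for this section's exercise is not reproduced in this copy, so the following sketch of Dijkstra's algorithm uses a small made-up graph; the node names and arc lengths are illustrative only.

```python
import heapq

def dijkstra(arcs, source):
    """Shortest distances from `source`, assuming non-negative arc lengths.
    `arcs` maps a node to a list of (neighbor, length) pairs."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist.get(x, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for y, dxy in arcs.get(x, []):
            nd = d + dxy
            if nd < dist.get(y, float("inf")):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return dist

# Hypothetical 4-node network (not the one from the exercise above).
arcs = {1: [(2, 2), (3, 4)], 2: [(3, 1), (4, 7)], 3: [(4, 3)]}
print(dijkstra(arcs, 1)[4])  # -> 6, along the path 1 -> 2 -> 3 -> 4
```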
Solution: This optimization can indeed be represented by an LP:

Maximize z = 11 x_1 + 7 x_2 + 12 x_3
such that 4 x_1 + 3 x_2 + 5 x_3 ≤ 10, and x_1, x_2, x_3 are non-negative integers.

This LP is different from those before in that the x_j are required to be integers. There are several ways to solve this problem; in the following, we use DP. Set

g(w) = the maximum benefit that can be gained from a w-lb knapsack.

The recursive equation to solve for g(w) is

g(w) = max_j { b_j + g(w − w_j) };

here j denotes the j-th item type, with w_j its weight and b_j its benefit, and j must satisfy w_j ≤ w.
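A direct transcription of this recursion, using the item data of the example (weights 4, 3, 5 lb; benefits 11, 7, 12):

```python
from functools import lru_cache

# Item data from the example: weights in lb and the corresponding benefits.
WEIGHTS = [4, 3, 5]
BENEFITS = [11, 7, 12]

@lru_cache(maxsize=None)
def g(w: int) -> int:
    """g(w) = maximum benefit from a w-lb knapsack:
    g(w) = max_j { b_j + g(w - w_j) } over item types with w_j <= w."""
    best = 0  # the empty knapsack is always feasible
    for wj, bj in zip(WEIGHTS, BENEFITS):
        if wj <= w:
            best = max(best, bj + g(w - wj))
    return best

print(g(10))  # -> 25 (e.g. one type-1 item plus two type-2 items)
print(g(28))  # the residual 28-lb knapsack problem
```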
Suppose the item types are labeled so that type 1 has the largest benefit per unit of weight, i.e.

b_1/w_1 ≥ b_j/w_j for all j,

and set

w* = b_1 / ( b_1/w_1 − max_{j ≥ 2} b_j/w_j ).

Note this labeling could be different from the original labeling.

Turnpike Theorem: Consider a knapsack problem for which b_1/w_1 > max_{j ≥ 2} b_j/w_j. Suppose the knapsack can hold w pounds, with w ≥ w*. Then the optimal solution to the knapsack problem must use at least one item of type 1.

Proof: Consider the same knapsack problem except that items of type 1 are not allowed to be used. The corresponding LP becomes

Maximize z = b_2 x_2 + b_3 x_3 + ... + b_n x_n
such that w_2 x_2 + w_3 x_3 + ... + w_n x_n ≤ w, and x_2, ..., x_n are non-negative integers.

But clearly, the value of this knapsack problem is at most

( max_{j ≥ 2} b_j/w_j ) · w.

Now consider the following solution to the original knapsack problem: put as many items of type 1 as possible into the knapsack. This way we can put in ⌊w/w_1⌋ items of type 1, where ⌊·⌋ denotes the integer part of a real number. Therefore, the benefit from this solution is at least

b_1 ⌊w/w_1⌋ ≥ b_1 ( w/w_1 − 1 ).

In case w ≥ w*, we have

b_1 ( w/w_1 − 1 ) ≥ ( max_{j ≥ 2} b_j/w_j ) · w,

since this inequality rearranges to w (b_1/w_1 − max_{j ≥ 2} b_j/w_j) ≥ b_1, i.e. w ≥ w*. Therefore, it turns out that a solution with no items of type 1 is never optimal. ∎

Corollary: The optimal solution of the knapsack problem will indeed include at least 1 + ⌊(w − w*)/w_1⌋ items of type 1.

Example: Reconsider the preceding example but with w = 4000 lb.

Solution: We have w* = 220/7. Therefore, the optimal solution will contain at least

1 + ⌊(4000 − 220/7)/4⌋ = 993

items of type 1. Therefore, all we need to solve is a knapsack problem with w = 4000 − 993 · 4 = 28 lb, which greatly reduces the computational effort.

Exercise: Complete the above example by solving the knapsack problem with w = 28 lb.

2.5 General resource allocation problem, formulation, DPE

A general resource allocation problem can be expressed as follows. Suppose we have w units of resource available, and N activities to which the resource can be allocated. If activity n is implemented at level x_n (assumed to be a non-negative integer), then g_n(x_n) units of the resource are used and a benefit r_n(x_n) is obtained.
The problem of maximizing the total benefit subject to the limited resource availability may be written as

Maximize Σ_{n=1}^{N} r_n(x_n)
such that Σ_{n=1}^{N} g_n(x_n) ≤ w, and x_1, ..., x_N are non-negative integers.

In the knapsack problem, the total amount of resource is the weight the knapsack can hold, x_n stands for the number of type-n items put in, and g_n and r_n are the weight and benefit of type-n items, respectively. One approach to formulate this problem as dynamic programming is as follows. Consider the functions

v_j(w) = max Σ_{n=j}^{N} r_n(x_n), such that Σ_{n=j}^{N} g_n(x_n) ≤ w,

with v_{N+1}(w) = 0. In this case v_j(w) can be interpreted as the maximal benefit one can receive if there are w units of resource available to be allocated to activities j, ..., N. The DPE is

v_j(w) = max_x [ r_j(x) + v_{j+1}(w − g_j(x)) ], for all w,

where x must be a non-negative integer satisfying g_j(x) ≤ w.

Exercise: Solve the previous knapsack example using the dynamic programming equation above.

Exercise: The number of crimes in each of a city's three police precincts depends on the number of patrol cars assigned to each precinct. Three patrol cars are available. Use dynamic programming to determine how many patrol cars should be assigned to each precinct so as to minimize the total number of crimes.

No. of patrol cars    0    1    2    3
Precinct 1           14   10    7    4
Precinct 2           25   19   16   14
Precinct 3           20   14   11    8

Solution: Let x_i = number of patrol cars assigned to Precinct i. Then the optimization problem is

Minimize Σ_{i=1}^{3} r_i(x_i)
such that Σ_{i=1}^{3} x_i ≤ 3, and x_1, x_2, x_3 are non-negative integers,

where r_i(x) is the number of crimes in Precinct i when x patrol cars are assigned to it. Define

v_j(w) = min Σ_{i=j}^{3} r_i(x_i), such that Σ_{i=j}^{3} x_i ≤ w and the x_i are non-negative integers.

• v_3(w) = min_x r_3(x) over all non-negative integers x ≤ w. We have

v_3(3) = 8, (x* = 3)
v_3(2) = 11, (x* = 2)
v_3(1) = 14, (x* = 1)
v_3(0) = 20, (x* = 0).
• v_2(w) = min_x [ r_2(x) + v_3(w − x) ] over all non-negative integers x ≤ w. We have

v_2(3) = min[14 + 20, 16 + 14, 19 + 11, 25 + 8] = 30, (x* = 1, 2)
v_2(2) = min[16 + 20, 19 + 14, 25 + 11] = 33, (x* = 1)
v_2(1) = min[19 + 20, 25 + 14] = 39, (x* = 1, 0)
v_2(0) = min[25 + 20] = 45, (x* = 0).

• v_1(3) = min_x [ r_1(x) + v_2(3 − x) ] over all non-negative integers x ≤ 3. We have

v_1(3) = min[4 + 45, 7 + 39, 10 + 33, 14 + 30] = 43, (x* = 1).

Therefore, the optimal solution is to assign one patrol car to each precinct, which will yield a minimal number of crimes of 43. ∎

Exercise: What if there are only two patrol cars available?

3 Preliminary probabilistic dynamic programming

The deterministic DPE (for minimization of an additive cost criterion) can be loosely written as

V_j(current state) = min over all feasible decisions { cost during the current stage + V_{j+1}(new state) }.

The idea for probabilistic dynamic programming is the same; the only difference is that now the new state is a random outcome and the goal is to minimize the average cost. The DPE in this case is very similar, with "V_{j+1}(new state)" replaced by "Average of V_{j+1}(new state)".

Example: A gambler has $2. She is allowed to play a game of chance two times, and her goal is to maximize her probability of ending up with at least $4. If she gambles b dollars on a play of the game, with probability 0.4 she wins the game and increases her capital by b dollars; with probability 0.6 she loses the game and decreases her capital by b dollars. On any play of the game, the gambler cannot bet more than she has available. Design a betting strategy for the gambler.

Solution: For j = 1, 2, 3, define

V_j(x) = maximal probability of having at least $4 at the end of game two, given that at the beginning of the j-th play the gambler has capital x dollars.

Also define V_3(x) = 1 if x ≥ 4 and V_3(x) = 0 if x < 4. We are interested in V_1(2). One can solve the problem recursively.
Indeed, we have the DPE

V_j(x) = max_{b ∈ {0,1,...,x}} Average of V_{j+1}(new state) = max_{b ∈ {0,1,...,x}} [ 0.4 V_{j+1}(x + b) + 0.6 V_{j+1}(x − b) ], j = 1, 2

(why can b be restricted to {0, 1, ..., x}?). From V_3 we can recursively determine all V_j:

V_2(x) = max_{b ∈ {0,1,...,x}} [ 0.4 V_3(x + b) + 0.6 V_3(x − b) ].

Clearly, we have

V_2(x) = 1 for x ≥ 4 (any b* ∈ {0, 1, ..., x − 4}),
V_2(x) = 0.4 for 2 ≤ x < 4 (any b* with 4 − x ≤ b* ≤ x),
V_2(x) = 0 for x < 2,

and

V_1(x) = max_{b ∈ {0,1,...,x}} [ 0.4 V_2(x + b) + 0.6 V_2(x − b) ].

In particular,

V_1(2) = max_{b ∈ {0,1,2}} [ 0.4 V_2(2 + b) + 0.6 V_2(2 − b) ] = max{ 0.4 (b = 0), 0.16 (b = 1), 0.4 (b = 2) } = 0.4.

Therefore, the maximal probability is 0.4. One best policy is to sit out (bet 0) on the first play and bet $2 on the second play. Another best strategy is to bet $2 on the first play; if the gambler wins the first play, she should sit out on the second play, and if she loses, so be it.

Exercise: Suppose the gambler wants to maximize her probability of having at least $6 at the end of the 4th play. How should she play? (Solution: The optimal probability is 0.1984. One optimal betting strategy was given as a decision tree in the original, not reproduced here.)

Example: Tennis player Tom has two types of serves: a hard serve (H) and a soft serve (S). The probability that Tom's hard serve will land in bounds is p_H, and the probability that his soft serve will land in bounds is p_S. If Tom's hard serve lands in bounds, there is a probability w_H that Tom will win the point. If Tom's soft serve lands in bounds, there is a probability w_S that Tom will win the point. We assume p_H < p_S and w_H > w_S. Tom's goal is to maximize the probability of winning a point on which he serves. Remember that if both serves are out of bounds, Tom loses the point.

Solution: Define f_i, i = 1, 2, as the probability that Tom wins a point if he plays optimally and is about to take his i-th serve. To determine the optimal strategy, we will work backward. What is f_2? If Tom serves hard on his second serve, he has probability p_H w_H of winning the point, while he has probability p_S w_S of winning the point if he serves soft.
Therefore, we have

f_2 = max{ p_H w_H, p_S w_S }.

To determine f_1, observe the following equation:

f_1 = max{ p_H w_H + (1 − p_H) f_2, p_S w_S + (1 − p_S) f_2 }.

There are three possibilities.

1. p_H w_H ≥ p_S w_S. In this case, f_2 = p_H w_H and

f_1 = max{ p_H w_H + (1 − p_H) p_H w_H, p_S w_S + (1 − p_S) p_H w_H }.

Since

[ p_H w_H + (1 − p_H) p_H w_H ] − [ p_S w_S + (1 − p_S) p_H w_H ] = p_H w_H (1 + p_S − p_H) − p_S w_S ≥ p_H w_H − p_S w_S ≥ 0,

Tom should serve hard on both serves.

2. p_S w_S (1 + p_H − p_S) ≤ p_H w_H < p_S w_S. In this case, f_2 = p_S w_S, and

f_1 = max{ p_H w_H + (1 − p_H) p_S w_S, p_S w_S + (1 − p_S) p_S w_S } = p_H w_H + (1 − p_H) p_S w_S.

So Tom should serve hard on the first serve, and soft on the second.

3. p_H w_H < p_S w_S (1 + p_H − p_S). In this case, f_2 = p_S w_S, and

f_1 = max{ p_H w_H + (1 − p_H) p_S w_S, p_S w_S + (1 − p_S) p_S w_S } = p_S w_S + (1 − p_S) p_S w_S,

and Tom should serve soft on both serves.

Example: We will return to the bass problem: every year the owner of a lake must decide how many bass to catch and sell. During year n the price per bass is p_n. If the lake contains b bass at the beginning of year n, the cost of catching x bass is c_n(x|b). But now the bass population grows by a random factor D, where Prob(D = d) = q(d). Can you formulate a dynamic programming recursion if the owner wants to maximize the average net profit over the next five years?

Solution: Define, for n = 1, 2, ..., 5,

v_n(b) = the maximal average net profit during the years n, n+1, ..., 5 if the lake contains b bass at the beginning of year n,

and v_6(b) ≡ 0. The recursive DPE is

v_n(b) = max_{0 ≤ x ≤ b} [ p_n x − c_n(x|b) + Σ_d q(d) v_{n+1}(d(b − x)) ].
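The probabilistic DPE of this section can be checked by brute force. A sketch for the gambler example and its $6 exercise, with integer bets as in the text (the function name is ours):

```python
from functools import lru_cache

def best_probability(capital: int, plays: int, target: int, p: float = 0.4) -> float:
    """V_j(x) for the gambler: the maximal probability of finishing with at
    least `target` dollars, with `plays` plays left and capital x, choosing
    an integer bet b in {0, 1, ..., x} at each play."""
    @lru_cache(maxsize=None)
    def V(x: int, n: int) -> float:
        if n == 0:
            return 1.0 if x >= target else 0.0
        # average of V at the two random outcomes, maximized over the bet b
        return max(p * V(x + b, n - 1) + (1 - p) * V(x - b, n - 1)
                   for b in range(x + 1))
    return V(capital, plays)

print(round(best_probability(2, 2, 4), 4))  # -> 0.4, as computed above
print(round(best_probability(2, 4, 6), 4))  # -> 0.1984, the exercise answer
```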