Wolfe Conditions
Recap
In the last class we saw an approximate line search algorithm where you pick an approximate step size rather than solving for the optimum one. We also saw the Wolfe conditions for checking whether a step size is appropriate.
Sufficient decrease: The step size selected ensures a certain amount of reduction in the function value, i.e. the function value at the new location is less than the one at the previous location by at least "some amount". Mathematically,

$$f(x_k + \alpha_k d_k) \le f(x_k) + c_1\,\alpha_k \nabla f(x_k)^T d_k \tag{1}$$

If you compare with the sheet I shared last week, the terms have been rearranged; the inequality remains the same. Here the reduction is at least $|c_1\,\alpha_k \nabla f(x_k)^T d_k|$. $c_1 \in (0, 1)$ and is usually chosen close to zero.
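As a quick check in code, here is a minimal sketch of condition (1); the names (f, x_k, d_k, grad_k) are illustrative, not from the notes:

```python
import numpy as np

def sufficient_decrease(f, x_k, d_k, grad_k, alpha, c1=0.1):
    """First Wolfe condition (1): f(x_k + a*d_k) <= f(x_k) + c1*a*grad_k^T d_k."""
    return f(x_k + alpha * d_k) <= f(x_k) + c1 * alpha * np.dot(grad_k, d_k)
```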
Curvature condition: The main purpose of this is to prevent step sizes which are too small. In other words, suppose at the $k$th iteration the gradient is $\nabla f_k$, the direction is $d_k = -\nabla f_k$, and the gradient at the new point is such that the function can still reduce further in the direction $d_k$.
An example: say your function is $f(x, y) = x^2 + y^2$. Starting from any point, one can move directly to the minimum at $(0, 0)$. But if you choose a step size which stops at a location other than $(0, 0)$, you have stopped too soon. This second condition prevents such a situation.
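Worked out explicitly: for $f(x, y) = x^2 + y^2$ the gradient is $\nabla f = (2x, 2y)$, so moving along $d = -\nabla f$ from a point $x$ gives $x + \alpha d = (1 - 2\alpha)\,x$. The exact line search gives $\alpha = 1/2$, which lands precisely on $(0, 0)$; any $\alpha < 1/2$ stops short, and that is exactly the situation the curvature condition rules out.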
The algorithm ignores points where the function can decrease further and selects only a step size at which the directional derivative in the $d_k$ direction starts becoming less negative or positive (refer back to the previous document). Mathematically,

$$\nabla f(x_k + \alpha_k d_k)^T d_k \ge c_2\,\nabla f(x_k)^T d_k \tag{2}$$

We will use $\phi(\alpha_k) = f(x_k + \alpha_k d_k)$. Note that $\alpha_k$ is held fixed within an iteration, and our objective is to arrive at an acceptable $\alpha_k$.
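Condition (2) in code, again a minimal sketch with illustrative names (grad_f is assumed to return $\nabla f$ at a point):

```python
import numpy as np

def curvature_condition(grad_f, x_k, d_k, alpha, c2=0.4):
    """Second Wolfe condition (2): phi'(alpha) >= c2 * phi'(0)."""
    phi_prime_alpha = np.dot(grad_f(x_k + alpha * d_k), d_k)  # directional derivative at the new point
    phi_prime_zero = np.dot(grad_f(x_k), d_k)                 # phi'(0), negative for a descent direction
    return phi_prime_alpha >= c2 * phi_prime_zero
```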
Algorithm
This algorithm is from Numerical Optimization by Nocedal and Wright. I have added commentary and examples to illustrate the process. The example function is

$$f(x, y) = (1 - x)^2 + (y - x^2)^2.$$

This function goes by the name Rosenbrock function, also known as the banana function, and is used for testing optimization algorithms. It has only one minimum, but convergence to that minimum is difficult.
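In code (note that the textbook Rosenbrock usually carries a factor of 100 on the second term; the form below follows these notes):

```python
import numpy as np

def rosenbrock(p):
    """The variant used in these notes: f(x, y) = (1 - x)^2 + (y - x^2)^2."""
    x, y = p
    return (1 - x)**2 + (y - x**2)**2

def rosenbrock_grad(p):
    """Analytical gradient of the variant above."""
    x, y = p
    return np.array([-2*(1 - x) - 4*x*(y - x**2), 2*(y - x**2)])
```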
Figure 1: Plot of $\phi(\alpha_k)$ over $\alpha_k \in [0, 0.3]$, with three reference lines described below.
Remember that $\phi(\alpha_k) = f(x_k + \alpha_k d_k)$. Also, from the chain rule, $\phi'(\alpha_k) = \nabla f(x_k + \alpha_k d_k)^T d_k$; note that $\phi'(0) = \nabla f(x_k)^T d_k$. The first step in selecting the step size ($\alpha$) is to check whether the sufficient decrease condition is satisfied. The graph in Fig. 1 is different from what I had shown in the last class (that one was a plot of $\phi(\alpha_k) - \phi(0)$ together with the line $c_1\,\alpha_k \nabla f(x_k)^T d_k$); this one is the plot of $\phi(\alpha_k)$ itself. The description of the lines is as follows:
Red line is the RHS of inequality (1). Its slope is $c_1 \phi'(0)$, which is negative since $\phi'(0)$ is negative.
Yellow line is the tangent of $\phi(\alpha_k)$ at $\alpha_k = 0$. It has slope $\phi'(0)$.
Magenta line has slope $c_2 \phi'(0)$, where $c_2 \in (c_1, 1)$. It is used in the second Wolfe condition.
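A small helper capturing this restriction of $f$ to the ray $x_k + \alpha d_k$, assuming the f/grad pair from the previous snippet (names again illustrative):

```python
import numpy as np

def make_phi(f, grad_f, x_k, d_k):
    """Restrict f to the ray through x_k along d_k."""
    def phi(alpha):
        return f(x_k + alpha * d_k)                     # phi(alpha) = f(x_k + alpha*d_k)
    def phi_prime(alpha):
        return np.dot(grad_f(x_k + alpha * d_k), d_k)   # chain rule: grad f . d_k
    return phi, phi_prime
```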
The algorithm starts by fixing $\alpha_0 = 0$ and a user-defined maximum step size $\alpha_{\max}$. The first trial step size is $\alpha_1 \in (0, \alpha_{\max})$. For the curve drawn, $c_1 = 0.1$, $c_2 = 0.4$, and $\phi'(0) = -18.5$. The algorithm proceeds as follows:
1. Check if the sufficient decrease criterion is satisfied. In this case, yes, since at $\alpha_1 = 0.05$ the curve is below the red line.
2. Next we check whether the step size is too short. Visualizing the tangent at $\alpha_1$, you can see that it is steeper (more negative) than $c_2 \phi'(0)$: $\phi'(\alpha_1) = -15.56$, whereas $c_2 \phi'(0) = -7.4$. This violates the curvature condition.
3. This step size is not acceptable.
4. Using an interpolation method (linear, quadratic, or cubic), pick the next step size from the interval $(\alpha_1, \alpha_{\max})$; $\alpha_{\max} = 0.3$ for this example.
5. Using linear interpolation, $\alpha_2 = 0.175$. The new point is shown in Fig. 2.
Figure 2: $\alpha_2$ is still not acceptable because we are looking for a point where the derivative is still negative; a positive derivative only tells us that an acceptable step lies somewhere in the bracketed range. We want a point roughly at the boundary where the slope changes from negative to positive, while still having a negative slope.
6. Note that at $\alpha_2$ the tangent has a positive slope ($\phi'(\alpha_2) = 4.39$).
7. Though this satisfies the inequality in (2), since the directional derivative of $f(x_k + \alpha_k d_k)$ is positive, we will continue the search until we reach a point where the directional derivative, while still being negative, satisfies the curvature condition.
8. We pick $\alpha_3 = (\alpha_1 + \alpha_2)/2$. Note that the range has been reduced from $(0, \alpha_{\max})$ to $(\alpha_1, \alpha_2)$, and from this range we pick $\alpha_3$.
9. Here $\alpha_3 = 0.1125$ and $\phi'(\alpha_3) = -8.1$; this still does not satisfy the curvature condition.
10. Choose $\alpha_4 = (\alpha_2 + \alpha_3)/2 = 0.1437$. $\phi'(\alpha_4) = -2.54$ and we are done! This point satisfies the curvature condition and still has a negative directional derivative. Fig. 3 shows the additional points; a code sketch of this whole loop appears after the figure.
Figure 3: $\alpha_3$ is not acceptable since the curve is still too steep there, whereas $\alpha_4$ is acceptable: the slope is negative and satisfies the second Wolfe condition.
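Putting steps 1-10 into code: a minimal sketch of the loop, with my own naming. Plain bisection stands in for the linear/quadratic/cubic interpolation mentioned in step 4, and the first trial is simply the midpoint rather than a user-picked $\alpha_1$:

```python
def wolfe_line_search(phi, phi_prime, alpha_max, c1=0.1, c2=0.4, tol=1e-8):
    """Narrow (0, alpha_max) until a step satisfies both Wolfe conditions
    with a still-negative slope, as in steps 1-10 above."""
    phi0, dphi0 = phi(0.0), phi_prime(0.0)   # dphi0 < 0 for a descent direction
    lo, hi = 0.0, alpha_max
    alpha = 0.5 * (lo + hi)                  # first trial step
    while hi - lo > tol:
        if phi(alpha) > phi0 + c1 * alpha * dphi0:
            hi = alpha        # sufficient decrease fails: step too long, shrink toward 0
        elif phi_prime(alpha) < c2 * dphi0:
            lo = alpha        # tangent still too steep (curvature fails): step too short
        elif phi_prime(alpha) > 0:
            hi = alpha        # slope already positive: we have overshot, pull back
        else:
            return alpha      # negative slope and both conditions hold: accept
        alpha = 0.5 * (lo + hi)   # bisection stands in for interpolation
    return alpha              # fall-back if the bracket collapses
```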
In case the initial point $\alpha_1$ fails to satisfy the sufficient decrease condition, the next point $\alpha_2$ is picked from the new interval $(0, \alpha_1)$. Once a point satisfying the sufficient decrease condition is found, the algorithm proceeds as given in the steps above. An example of this case is given in Fig. 4. Here $\alpha_{\max} = 0.325$ and $\alpha_1 = 0.28$. I changed $\alpha_{\max}$ just for the heck of it; 0.3 is also perfectly fine.
Figure 4: $\alpha_1$ fails the first Wolfe condition, whereas $\alpha_2 = (\alpha_0 + \alpha_1)/2 = 0.14$ satisfies both conditions; $\phi'(\alpha_2) = -3.27$.
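To see both branches in action (the sufficient-decrease failure of Fig. 4 is the first branch of the sketch above), here is a quick end-to-end run using the earlier snippets. The starting point below is arbitrary; the notes do not state the point behind the figures:

```python
import numpy as np

x_k = np.array([0.0, 0.0])                 # arbitrary starting point, not the figures' one
d_k = -rosenbrock_grad(x_k)                # steepest-descent direction, here (2, 0)
phi, phi_prime = make_phi(rosenbrock, rosenbrock_grad, x_k, d_k)
alpha = wolfe_line_search(phi, phi_prime, alpha_max=1.0, c1=0.1, c2=0.4)
print(alpha, phi_prime(alpha))             # accepts alpha = 0.25; slope -1.0, within c2*phi'(0) = -1.6
```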
The sheets from the book containing the algorithm are appended. You need to implement this and mail it to me or upload it on Moodle by 31 March 2020. The problem statement is:
TODO
Implement the approximate line search algorithm and use it to minimize a function. You should write code that can accept any function as input and return a minimum. Implement the optimization using (1) the optimal step size and (2) the approximate step size. Note down the time to converge in both cases.
Upload the code and a report. The report should contain your observations and results for minimizing the Rosenbrock function given above.
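If it helps to get started, one possible skeleton for the descent loop, reusing the snippets above (only the approximate-step variant; the exact-step version and the timing comparison are left to you):

```python
import numpy as np

def steepest_descent(f, grad_f, x0, alpha_max=1.0, max_iter=5000, gtol=1e-6):
    """Steepest descent driven by the approximate (Wolfe) line search sketched above."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < gtol:       # gradient small enough: converged
            break
        d = -g                             # steepest-descent direction
        phi, phi_prime = make_phi(f, grad_f, x, d)
        x = x + wolfe_line_search(phi, phi_prime, alpha_max) * d
    return x

# e.g. steepest_descent(rosenbrock, rosenbrock_grad, [-1.0, 1.0]) should approach (1, 1)
```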