Additional Material

Lecture Notes


Quadratic penalty method (07.03.)

The following plots demonstrate the application of the quadratic penalty method to the minimisation of Himmelblau's function subject to the additional constraint that \(x^2 + y^2 = 16\). As the penalty parameter \(\mu\) increases, the quadratic penalty function resembles more and more the characteristic function of the circle of radius 4. As a consequence, the resulting unconstrained optimisation problem becomes increasingly ill-conditioned, which indicates that a gradient descent method will perform poorly.
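The sequence of penalty subproblems can be sketched in Python as follows. This is a minimal sketch, not the code behind the plots: SciPy's BFGS solver stands in for the unconstrained minimisation, and the starting point \((3, 2)\) (an unconstrained minimiser of Himmelblau's function) is an arbitrary choice.

```python
import numpy as np
from scipy.optimize import minimize

def himmelblau(p):
    x, y = p
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def penalised(p, mu):
    """Quadratic penalty function for the constraint x^2 + y^2 = 16."""
    x, y = p
    return himmelblau(p) + 0.5 * mu * (x**2 + y**2 - 16)**2

violations = []
x0 = np.array([3.0, 2.0])             # unconstrained minimiser of Himmelblau
for mu in [0.01, 1.0, 100.0, 2000.0]:
    res = minimize(penalised, x0, args=(mu,), method='BFGS')
    x0 = res.x                        # warm start the next subproblem
    violations.append(abs(res.x @ res.x - 16.0))
    print(f"mu = {mu:7g}: x = {res.x}, |x^2+y^2-16| = {violations[-1]:.2e}")
```

The constraint violation shrinks as \(\mu\) grows (roughly like \(1/\mu\) for the quadratic penalty), while the subproblems become harder to solve, consistent with the ill-conditioning noted above.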

(Plots for \(\mu\sim 0.01\), \(\mu\sim 1.0\), \(\mu\sim 100\), and \(\mu\sim 2000\).)

Comparison of convergence speeds (11.02.)

The following plots show the error, on a logarithmic scale, as a function of the iteration number for different optimisation methods. In all cases, the Rosenbrock function has been minimised using a line search method with the strong Wolfe conditions. The plots show the natural logarithm of the distance of the iterates from the unique minimiser \((x,y) = (1,1)\). They exhibit quite clearly the linear (and slow) convergence of the gradient descent method and the superlinear convergence of the BFGS method. Additionally, they demonstrate the somewhat erratic and difficult-to-categorise behaviour of the Fletcher–Reeves method, where the iterates are almost stationary for several iterations but then suddenly improve drastically for a few steps.
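Error curves of this kind can be generated, for instance, with SciPy's BFGS implementation and a callback that records the iterates; a minimal sketch, where the starting point \((-1.2, 1)\) is the customary one for the Rosenbrock function:

```python
import numpy as np
from scipy.optimize import minimize, rosen

iterates = []
res = minimize(rosen, np.array([-1.2, 1.0]), method='BFGS',
               callback=lambda xk: iterates.append(np.copy(xk)))

# natural logarithm of the distance of each iterate to the minimiser (1, 1)
errors = [np.linalg.norm(xk - np.array([1.0, 1.0])) for xk in iterates]
for k, e in enumerate(errors):
    print(k, np.log(max(e, 1e-16)))   # clamp to avoid log(0) at convergence
```

Plotting the printed values against the iteration number reproduces the qualitative picture described above for the BFGS method.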

(Plots: gradient descent, Fletcher–Reeves, BFGS.)

BFGS method (11.02.)

Here Himmelblau's function and the Rosenbrock function are minimised using the BFGS method with a line search using the strong Wolfe conditions. In both cases, the iterations converge very rapidly towards a minimum.
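A bare-bones version of the BFGS iteration with SciPy's strong-Wolfe line search might look as follows; the fallback step size and the iteration limit are ad-hoc safeguards for this sketch, not part of the method proper:

```python
import numpy as np
from scipy.optimize import line_search, rosen, rosen_der

def bfgs(f, grad, x, iters=200, tol=1e-8):
    n = x.size
    H = np.eye(n)                          # inverse Hessian approximation
    g = grad(x)
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g                         # quasi-Newton search direction
        alpha = line_search(f, grad, x, d)[0]   # strong Wolfe conditions
        if alpha is None:                  # line search failed: tiny safe step
            alpha = 1e-4
        s = alpha * d
        x = x + s
        g_new = grad(x)
        y = g_new - g
        ys = y @ s
        if ys > 1e-12:                     # curvature condition keeps H positive definite
            rho = 1.0 / ys
            I = np.eye(n)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        g = g_new
    return x

x_star = bfgs(rosen, rosen_der, np.array([-1.2, 1.0]))
print(x_star)
```

The update of \(H\) is the standard BFGS formula for the inverse Hessian approximation; the same routine can be applied to Himmelblau's function by supplying its gradient.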

Nonlinear CG method (04.02.)

Here Himmelblau's function is numerically minimised using the Fletcher–Reeves method with a line search based on the strong Wolfe conditions (that is, the Armijo condition and the strong curvature condition) with parameters \(c_1 = 0.1\) and \(c_2 = 0.4\). While the first result looks perfectly fine, the second result demonstrates the main problem of the Fletcher–Reeves method: if the search direction happens to be almost orthogonal to the gradient in one step, it tends to remain so for a large number of subsequent steps, so that the method makes hardly any progress.
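A sketch of the Fletcher–Reeves iteration with the parameters above, using SciPy's line search routine for the strong Wolfe conditions; the starting point \((0, 0)\) is an arbitrary choice for illustration:

```python
import numpy as np
from scipy.optimize import line_search

def himmelblau(p):
    x, y = p
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def himmelblau_grad(p):
    x, y = p
    return np.array([4*x*(x**2 + y - 11) + 2*(x + y**2 - 7),
                     2*(x**2 + y - 11) + 4*y*(x + y**2 - 7)])

def fletcher_reeves(f, grad, x, iters=50):
    g = grad(x)
    d = -g                                  # first step: steepest descent
    for _ in range(iters):
        alpha = line_search(f, grad, x, d, c1=0.1, c2=0.4)[0]
        if alpha is None:                   # line search failed; stop
            break
        x = x + alpha * d
        g_new = grad(x)
        beta = (g_new @ g_new) / (g @ g)    # Fletcher-Reeves coefficient
        d = -g_new + beta * d
        g = g_new
    return x

x_star = fletcher_reeves(himmelblau, himmelblau_grad, np.array([0.0, 0.0]))
print(x_star, himmelblau(x_star))
```

Since \(c_2 = 0.4 < 1/2\), each search direction generated this way is guaranteed to be a descent direction.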

Now the Fletcher–Reeves method is applied to the Rosenbrock function (the first image shows the result after 8 steps, the second after 18, the third after 42). The results show a significant improvement over the gradient descent method, for which even after 200 steps the improvement over the starting value is rather marginal. Still, one observes the same problem as with Himmelblau's function above: the iterates tend to spiral around the minimum.

Gradient descent for quadratic problems (01.02.)

Results of the gradient descent method with exact line search applied to quadratic problems. The first image shows the function \(f(x,y) = 5x^2 + 5y^2 -8xy\); the second image the function \(f(x,y) = 5x^2 + 5y^2 - 9.8xy\).

Behaviour of the gradient descent method (28.01.)

Himmelblau's function

Himmelblau's function is defined as

\(f(x,y) = (x^2+y-11)^2 + (x+y^2-7)^2 \)

A contour plot of the function (over the most interesting region) looks like this:

With gradient descent with backtracking line search (parameters \(c_1 = 0.1\) and \(\rho = 0.25\)) we obtain for three different, but fairly close, starting values \(x_0\) the following iterations (15 iterations have been performed in each example):
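A minimal Python sketch of this backtracking iteration with the same parameters; the starting point \((0, 0)\) is an illustrative choice, not one of the three starting values from the plots:

```python
import numpy as np

def himmelblau(p):
    x, y = p
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def himmelblau_grad(p):
    x, y = p
    return np.array([4*x*(x**2 + y - 11) + 2*(x + y**2 - 7),
                     2*(x**2 + y - 11) + 4*y*(x + y**2 - 7)])

def backtracking_gd(f, grad, x, c1=0.1, rho=0.25, iters=15):
    for _ in range(iters):
        g = grad(x)
        alpha = 1.0
        # shrink the step until the Armijo (sufficient decrease) condition holds
        while f(x - alpha * g) > f(x) - c1 * alpha * (g @ g):
            alpha *= rho
        x = x - alpha * g
    return x

x_star = backtracking_gd(himmelblau, himmelblau_grad, np.array([0.0, 0.0]))
print(x_star, himmelblau(x_star))
```

Each outer iteration starts from the trial step \(\alpha = 1\) and multiplies it by \(\rho\) until sufficient decrease is achieved, so the function value decreases monotonically.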

Rosenbrock function

The Rosenbrock function is defined as

\(f(x,y) = (1-x)^2 + 100(y-x^2)^2\)

A contour plot and the result obtained after 200 steps with the gradient descent method with backtracking line search (same parameters as before) look like this:

2016-03-09, Markus Grasmair