Today we will look at a variant of gradient descent called the steepest descent algorithm.

**1. Steepest Descent Algorithm**

Having seen the Gradient Descent Algorithm, we now turn our attention to yet another member of the descent algorithms family: the Steepest Descent Algorithm. In this algorithm, we optimize the descent direction to obtain the maximum decrease of the objective function. First, let us look at the descent direction for steepest descent. Let $f$ be continuous and differentiable. Using the Taylor series, we get

$$f(x + v) \approx f(x) + \nabla f(x)^T v.$$

Our objective is to minimize the second term, i.e., the $\nabla f(x)^T v$ term. The descent direction is defined in two steps. First, we look at the so-called normalized steepest descent direction. This in turn helps us define the steepest descent direction. The *normalized steepest descent direction* is defined as

$$\Delta x_{\text{nsd}} = \arg\min_{v} \left\{ \nabla f(x)^T v \;:\; \|v\| = 1 \right\}.$$

Observe that the descent direction depends not only on the point $x$, but also on the norm $\|\cdot\|$. As is clear from the definition, the name *normalized* is used since we are searching only through those vectors that have unit magnitude. Looking closely, we notice that this definition is similar to that of the *dual norm* of a given norm. Hence, the minimum value is given by $-\|\nabla f(x)\|_*$. The negative sign compensates for the "min" used here as opposed to the "max" used in the definition of a dual norm. Now we can define the descent direction for the Steepest Descent Algorithm as follows:

$$\Delta x_{\text{sd}} = \|\nabla f(x)\|_* \, \Delta x_{\text{nsd}}.$$
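As a concrete illustration (our own toy example, with NumPy; the helper name `nsd_l1` is ours), consider the $\ell_1$ norm: minimizing $\nabla f(x)^T v$ over $\|v\|_1 = 1$ puts all the weight on the coordinate with the largest gradient magnitude, with opposite sign, so $\ell_1$ steepest descent amounts to a form of coordinate descent. Note that the attained minimum value is $-\|\nabla f(x)\|_\infty$, the negative dual norm, as claimed above:

```python
import numpy as np

def nsd_l1(grad):
    """Normalized steepest descent direction for the l1 norm.

    Minimizes grad @ v over ||v||_1 = 1: all weight goes on the
    coordinate with the largest |gradient|, with opposite sign.
    """
    i = np.argmax(np.abs(grad))
    v = np.zeros_like(grad)
    v[i] = -np.sign(grad[i])
    return v

g = np.array([1.0, -3.0, 2.0])
v = nsd_l1(g)
# g @ v equals -max|g_i| = -||g||_inf, the negative dual norm of l1
```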

The following calculation verifies that the descent direction defined above satisfies the sufficiency condition $\nabla f(x)^T \Delta x_{\text{sd}} < 0$, and also justifies the factor $\|\nabla f(x)\|_*$ used in the definition:

$$\nabla f(x)^T \Delta x_{\text{sd}} = \|\nabla f(x)\|_* \, \nabla f(x)^T \Delta x_{\text{nsd}} \overset{(a)}{=} -\|\nabla f(x)\|_*^2 < 0,$$

where $(a)$ follows from the definition of the normalized steepest descent direction. Hence, the update equation is

$$x_{k+1} = x_k + t_k \, \Delta x_{\text{sd}},$$

where $t_k > 0$ is the step size.
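The update can be sketched as a generic loop, parameterized by a function that maps the gradient to the steepest descent direction for the chosen norm. This is a minimal sketch, not a definitive implementation; the function names and the backtracking parameters ($\alpha = 0.25$, $\beta = 0.5$) are our own choices:

```python
import numpy as np

def steepest_descent(f, grad_f, direction, x0,
                     alpha=0.25, beta=0.5, tol=1e-8, max_iter=1000):
    """Steepest descent: x_{k+1} = x_k + t_k * dx_sd.

    `direction(g)` returns the steepest descent direction for the
    chosen norm given the gradient g (e.g. -g for the l2 norm).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        v = direction(g)
        t = 1.0
        # backtracking line search (Armijo sufficient-decrease condition)
        while f(x + t * v) > f(x) + alpha * t * (g @ v):
            t *= beta
        x = x + t * v
    return x

# l2 steepest descent (= gradient descent) on f(x) = ||x||^2
x_star = steepest_descent(lambda x: x @ x, lambda x: 2 * x,
                          lambda g: -g, [3.0, 4.0])
```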

The following lemma casts the steepest descent direction as the solution of a simpler, unconstrained optimization problem.

**Lemma 1** *The steepest descent direction satisfies*

$$\Delta x_{\text{sd}} = \arg\min_{v} \left\{ \nabla f(x)^T v + \frac{1}{2} \|v\|^2 \right\}.$$

*Proof:* The proof follows by representing $v = t u$, where $\|u\| = 1$ and $t \geq 0$, and jointly optimizing over $t$ and $u$. $\blacksquare$
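Spelling the joint optimization out: substituting $v = t u$ into the objective gives

$$\min_{t \geq 0} \; \min_{\|u\| = 1} \left\{ t \, \nabla f(x)^T u + \frac{t^2}{2} \right\} = \min_{t \geq 0} \left\{ -t \, \|\nabla f(x)\|_* + \frac{t^2}{2} \right\},$$

since for fixed $t$ the inner minimum over $u$ is attained at $u = \Delta x_{\text{nsd}}$. The outer problem is a quadratic in $t$, minimized at $t = \|\nabla f(x)\|_*$, so the overall minimizer is $v = \|\nabla f(x)\|_* \, \Delta x_{\text{nsd}} = \Delta x_{\text{sd}}$.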

We will now look at the descent directions generated by a quadratic norm.

**Definition 2** *Let $P \in \mathbf{S}^n_{++}$ be a fixed positive definite matrix, i.e., $P \succ 0$. The quadratic norm of any vector $z \in \mathbf{R}^n$ is then defined as*

$$\|z\|_P = \left( z^T P z \right)^{1/2} = \|P^{1/2} z\|_2.$$

If we define $\bar{z} = P^{1/2} z$, then this essentially corresponds to a change of basis. Since $P \succ 0$, the matrix square root $P^{1/2}$ is well defined. This in turn allows the quadratic norm to be simplified as follows:

$$\|z\|_P^2 = z^T P z = (P^{1/2} z)^T (P^{1/2} z) = \|\bar{z}\|_2^2.$$

Hence, an ellipsoid in the original coordinates becomes a sphere in the new coordinates. In the case of the quadratic norm, using Lemma 1 with $\|v\|_P^2 = v^T P v$, the descent direction is

$$\Delta x_{\text{sd}} = \arg\min_{v} \left\{ \nabla f(x)^T v + \frac{1}{2} v^T P v \right\}.$$

Since the above minimization problem is unconstrained, setting its gradient to zero gives the minimizer. Therefore,

$$\nabla f(x) + P \, \Delta x_{\text{sd}} = 0,$$

and hence

$$\Delta x_{\text{sd}} = -P^{-1} \nabla f(x).$$

This norm is useful when the objective function has a bad condition number. In this case, if one has some idea of the Hessian $\nabla^2 f(x^*)$ at the optimal point $x^*$, the matrix $P$ can be chosen to be equal to $\nabla^2 f(x^*)$. This would improve the condition number and hence the convergence rate of the algorithm.
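As an illustrative sketch (our own toy example, not from the notes): on a quadratic objective $f(x) = \frac{1}{2} x^T H x$ with an ill-conditioned $H$, choosing $P = H$ makes the steepest descent direction $-P^{-1} \nabla f(x) = -x$, and a unit step reaches the minimizer exactly; on a quadratic, this choice of $P$ recovers the Newton step.

```python
import numpy as np

# Ill-conditioned quadratic: f(x) = 0.5 * x^T H x, condition number 100
H = np.diag([1.0, 100.0])
grad = lambda x: H @ x

# Quadratic-norm steepest descent with P = H:
# dx_sd = -P^{-1} grad f(x), taken with step size t = 1
P = H
x = np.array([1.0, 1.0])
x = x - np.linalg.solve(P, grad(x))
# x is now the exact minimizer x* = 0 of this quadratic
```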

**1.1. Convergence Rate for Steepest Descent with Backtracking**

**Theorem 3** *Let $f$ be smooth and strongly convex. Then Steepest Descent with backtracking line search has a linear convergence rate.*

*Proof:* See Boyd and Vandenberghe, *Convex Optimization*, Section 9.4. $\blacksquare$
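The linear (geometric) rate can be observed numerically. The sketch below is our own small experiment, with illustrative parameters: $\ell_2$ steepest descent with backtracking ($\alpha = 0.25$, $\beta = 0.5$) on a strongly convex quadratic, checking that the suboptimality $f(x_k) - p^\star$ shrinks by at least a constant factor per iteration:

```python
import numpy as np

H = np.diag([1.0, 10.0])           # strongly convex: m = 1, L = 10
f = lambda x: 0.5 * x @ H @ x      # minimizer x* = 0, so p* = 0
grad = lambda x: H @ x

x = np.array([1.0, 1.0])
errs = [f(x)]                      # suboptimality f(x_k) - p*
for _ in range(200):
    g = grad(x)
    v = -g                         # l2 steepest descent direction
    t = 1.0
    # backtracking line search (Armijo condition)
    while f(x + t * v) > f(x) + 0.25 * t * (g @ v):
        t *= 0.5
    x = x + t * v
    errs.append(f(x))

# per-iteration contraction factors; each stays bounded below 1,
# i.e. the suboptimality decays geometrically (linear convergence)
ratios = [e1 / e0 for e0, e1 in zip(errs, errs[1:]) if e0 > 0]
```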
