You are currently browsing the category archive for the ‘Capacity’ category.

**Theorem 1** Let $f$ be a convex and $\beta$-smooth function. Then, for any $x, y$,

$$f(x) - f(y) \le \nabla f(x)^T (x - y) - \frac{1}{2\beta}\left\|\nabla f(x) - \nabla f(y)\right\|^2.$$

*Proof:* Let $z = y - \frac{1}{\beta}\left(\nabla f(y) - \nabla f(x)\right)$. Then,

$$f(x) - f(y) = f(x) - f(z) + f(z) - f(y) \overset{(a)}{\le} \nabla f(x)^T (x - z) + \nabla f(y)^T (z - y) + \frac{\beta}{2}\|z - y\|^2 \overset{(b)}{=} \nabla f(x)^T (x - y) - \frac{1}{2\beta}\left\|\nabla f(x) - \nabla f(y)\right\|^2,$$

where $(a)$ follows by using the gradient inequality for the term $f(x) - f(z)$ since $f$ is a convex function and the $\beta$-smoothness of $f$ for the term $f(z) - f(y)$, and $(b)$ follows by substituting $z = y - \frac{1}{\beta}\left(\nabla f(y) - \nabla f(x)\right)$.

The above theorem is from the notes of Dr. Sébastien Bubeck.
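As a quick numerical sanity check, the inequality of Theorem 1 (in its standard form $f(x) - f(y) \le \nabla f(x)^T(x - y) - \frac{1}{2\beta}\|\nabla f(x) - \nabla f(y)\|^2$) can be tested on a small convex quadratic. The function and test points below are illustrative choices, not from the notes:

```python
# Numerical check of the inequality
#   f(x) - f(y) <= <grad f(x), x - y> - (1/(2*beta)) * ||grad f(x) - grad f(y)||^2
# for the convex, beta-smooth function f(v) = v0^2 + 0.5*v1^2 (beta = 2).
# The function f and the test points are illustrative choices.

def f(v):
    return v[0] ** 2 + 0.5 * v[1] ** 2

def grad(v):
    return (2 * v[0], v[1])

beta = 2.0  # largest curvature of f, so f is beta-smooth

def check(x, y):
    gx, gy = grad(x), grad(y)
    lhs = f(x) - f(y)
    inner = gx[0] * (x[0] - y[0]) + gx[1] * (x[1] - y[1])
    sq = (gx[0] - gy[0]) ** 2 + (gx[1] - gy[1]) ** 2
    rhs = inner - sq / (2 * beta)
    return lhs <= rhs + 1e-12  # small slack for floating point

print(all(check((a, b), (c, d))
          for a in (-1.0, 0.5, 2.0)
          for b in (-2.0, 1.0)
          for c in (0.0, 1.5)
          for d in (-1.0, 3.0)))  # True
```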

**1. Unconstrained Minimization**

We now look at the following optimization problem.

$$\min_{x \in \mathbb{R}^n} f(x)$$

Our objective is to come up with a sequence of vectors $\{x_k\}$ such that $f(x_k) \rightarrow p^* = \min_x f(x)$. We refer to such a sequence as a minimizing sequence. We next define a tolerance limit $\epsilon > 0$ for the optimization problem, which limits the allowed gap between the optimal value and the function value at the point returned by the algorithm. By this we mean that the algorithm stops at $x_k$ if

$$f(x_k) - p^* \le \epsilon.$$

**2. Descent Algorithms**

In descent algorithms, we follow the following update rule.

$$x_{k+1} = x_k + t_k d_k, \qquad (4)$$

where $t_k > 0$ is the step size and $d_k$ is the descent direction. Here, $\|d_k\|$ need not be $1$, and hence it is not a "direction" in the strict sense of the word but a misnomer. By descent, we mean the following,

$$f(x_{k+1}) < f(x_k). \qquad (5)$$

If $f$ is a convex function, by the gradient inequality,

$$f(x_{k+1}) \ge f(x_k) + \nabla f(x_k)^T (x_{k+1} - x_k). \qquad (6)$$

From (4),

$$f(x_{k+1}) \ge f(x_k) + t_k \nabla f(x_k)^T d_k. \qquad (7)$$

**From (5) and (7), we have the following as a necessary condition for every gradient descent algorithm.**

$$\nabla f(x_k)^T d_k < 0 \qquad (8)$$

Given below is the basic structure of a descent algorithm.

- Start at an initial point $x_0$.
- Repeat the following till the "Stopping Criterion" is met.
- Descent direction: Choose a descent direction $d_k$.
- Line search: Choose a step size $t_k > 0$.
- Update: $x_{k+1} = x_k + t_k d_k$.

The stopping criterion does not directly follow from the tolerance limit, as we do not know the optimal value $p^*$. As we will see later, the choice of stopping criterion is of prime importance to the performance of a descent algorithm, and it is not easy to come up with an appropriate one. For now, we assume that "Exact Line Search" is used. By this, we mean that the step size is given by,

$$t_k = \arg\min_{t \ge 0} f(x_k + t d_k).$$
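Exact line search can be sketched numerically by minimizing the one-dimensional function $\phi(t) = f(x_k + t d_k)$. The ternary search below is only a stand-in for the exact minimization, valid because $\phi$ is unimodal when $f$ is convex; the bracket $[0, 10]$, the tolerance, and the test function are illustrative assumptions:

```python
# A numerical stand-in for exact line search: minimize phi(t) = f(x + t*d)
# over t >= 0 by ternary search (phi is unimodal when f is convex).
# The bracket [0, t_max] and tolerance are illustrative assumptions.

def exact_line_search(f, x, d, t_max=10.0, tol=1e-8):
    phi = lambda t: f([xi + t * di for xi, di in zip(x, d)])
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if phi(m1) < phi(m2):
            hi = m2          # minimizer lies in [lo, m2]
        else:
            lo = m1          # minimizer lies in [m1, hi]
    return (lo + hi) / 2

# Example: f(v) = v0^2 + 0.5*v1^2 with descent direction d = -grad f(x).
f = lambda v: v[0] ** 2 + 0.5 * v[1] ** 2
x = [1.0, 0.0]
d = [-2.0, 0.0]              # -grad f(x) at x = (1, 0)
t = exact_line_search(f, x, d)
print(round(t, 4))           # 0.5, the exact minimizer of (1 - 2t)^2
```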

We will later see that implementing "Exact Line Search" is not practical, and we will come up with simpler ways to find the step size without any significant loss in performance. We are now left with the choice of the descent direction, which should satisfy (8). All descent algorithms differ in the choice of this descent direction. We now look at a basic descent algorithm,

**2.1. Gradient Descent Algorithm (GDA)**

The descent direction is chosen as

$$d_k = -\nabla f(x_k). \qquad (10)$$

Observe that this descent direction clearly satisfies (8). Next, we state and prove the following important theorem, which shows that GDA works on $\beta$-smooth functions (functions with a Lipschitz gradient).

**Theorem 2** Let $f$ be a $\beta$-smooth function with $p^* = \min_x f(x) > -\infty$. Then the gradient descent algorithm with a constant step size $t < \frac{2}{\beta}$ will converge to a stationary point, i.e., to the set $X_s = \{x : \nabla f(x) = 0\}$.

*Proof:* Recall that for the gradient descent algorithm,

$$x_{k+1} = x_k - t \nabla f(x_k).$$

As $f$ is $\beta$-smooth, we have

$$f(x_{k+1}) \le f(x_k) + \nabla f(x_k)^T (x_{k+1} - x_k) + \frac{\beta}{2}\|x_{k+1} - x_k\|^2 \overset{(a)}{=} f(x_k) - t\left(1 - \frac{\beta t}{2}\right)\|\nabla f(x_k)\|^2,$$

where $(a)$ follows from (10). Thus, we have

$$\|\nabla f(x_k)\|^2 \le \frac{f(x_k) - f(x_{k+1})}{t\left(1 - \frac{\beta t}{2}\right)}, \qquad (11)$$

where the bound is valid since $t\left(1 - \frac{\beta t}{2}\right) > 0$ for $t < \frac{2}{\beta}$. Next, we have

$$\sum_{k=0}^{n} \|\nabla f(x_k)\|^2 \le \frac{f(x_0) - f(x_{n+1})}{t\left(1 - \frac{\beta t}{2}\right)} \le \frac{f(x_0) - p^*}{t\left(1 - \frac{\beta t}{2}\right)},$$

where $p^*$ is the global optimal value. Taking the limit as $n \rightarrow \infty$, we have

$$\sum_{k=0}^{\infty} \|\nabla f(x_k)\|^2 \le \frac{f(x_0) - p^*}{t\left(1 - \frac{\beta t}{2}\right)} < \infty.$$

Hence $\|\nabla f(x_k)\| \rightarrow 0$, and hence $x_k$ converges to the set $X_s$.

Note that $t = \frac{1}{\beta}$ maximizes the term $t\left(1 - \frac{\beta t}{2}\right)$ appearing in the denominator of (11) and hence is the optimal step size to be used. In this case we have

$$f(x_{k+1}) \le f(x_k) - \frac{1}{2\beta}\|\nabla f(x_k)\|^2.$$

The convergence rates can be obtained as follows: Let $g_n^* = \min_{0 \le k \le n} \|\nabla f(x_k)\|$. Then, for $t = \frac{1}{\beta}$,

$$(n+1)\,(g_n^*)^2 \le \sum_{k=0}^{n} \|\nabla f(x_k)\|^2 \le 2\beta\left(f(x_0) - p^*\right),$$

which implies

$$g_n^* \le \sqrt{\frac{2\beta\left(f(x_0) - p^*\right)}{n+1}}.$$

Hence, when $f$ is just a $\beta$-smooth function (it need not be convex), the above theorem shows that gradient descent converges sub-linearly to a stationary point (which need not even be a local minimum). Also, observe that the above bound is on $g_n^*$, i.e., the best gradient norm till iteration $n$; the gradient itself might oscillate even though the function value does not. While the Lipschitz (smoothness) condition is used to prove the convergence rate, the algorithm will converge to a stationary point as long as the level set corresponding to the initial point is bounded.
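As a small illustration of Theorem 2, the sketch below runs gradient descent with the constant step size $t = 1/\beta$ on a $\beta$-smooth quadratic and checks the bound $g_n^* \le \sqrt{2\beta(f(x_0) - p^*)/(n+1)}$ at every iteration; the quadratic, the starting point, and the iteration count are illustrative choices:

```python
# Gradient descent with constant step size t = 1/beta on
# f(v) = v0^2 + 0.5*v1^2 (beta-smooth with beta = 2, minimum value 0).
# Tracks g_n = min_{k <= n} ||grad f(x_k)|| and checks the sublinear bound
# g_n^2 <= 2*beta*(f(x_0) - p_star) / (n + 1) at every iteration.
import math

beta = 2.0
p_star = 0.0                      # optimal value of this quadratic

def f(v):
    return v[0] ** 2 + 0.5 * v[1] ** 2

def grad(v):
    return (2 * v[0], v[1])

x = (3.0, -4.0)                   # illustrative starting point
f0 = f(x)
best_grad = float("inf")
t = 1.0 / beta                    # optimal constant step size from the notes

for n in range(50):
    g = grad(x)
    best_grad = min(best_grad, math.hypot(g[0], g[1]))
    bound = math.sqrt(2 * beta * (f0 - p_star) / (n + 1))
    assert best_grad <= bound + 1e-12   # the sublinear bound holds
    x = (x[0] - t * g[0], x[1] - t * g[1])

print("bound held for all 50 iterations")
```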

Recently this talk was given by Dr. Andrew Thangaraj in our reading seminar, and I was fascinated by the elegance of the solution. I went through the original paper "On the Shannon Capacity of a Graph" by Lovász and decided to write up some notes. The technique described by Andrew follows a modern approach that requires spectral graph theory, and I will describe the spectral approach in a later post. I will begin with a basic description of the problem and Lovász's solution.

**Zero-error capacity of a noisy typewriter channel**

Suppose you have a typewriter that is noisy, i.e., the printed letter might turn out to be the pressed letter or the next letter with equal probability. For example, if the "c" key is pressed then the printed letter might be "c" with probability $1/2$ or a "d" with probability $1/2$. If the letter "z" is pressed, then the printed letter is "z" with probability $1/2$ or an "a" with probability $1/2$. The zero-error capacity of this typewriter channel is defined as the maximum number of symbols (more correctly, bits) that can be conveyed per channel use without any possibility of confusion at the output. It is easy to see that if we only use an alternate letter set, say $\{a, c, e, \ldots, y\}$, then there is no confusion at the output. It can also be shown that nothing better can be done. So the error-free capacity in bits is

$$C_0 = \log_2 13.$$

Alternatively, we can form the following graph: connect two nodes if there is confusion in using the two letters simultaneously. For the $26$-letter typewriter channel it is easy to see that this confusability graph equals the cycle $C_{26}$. Observe that the maximum independent set of $C_{26}$ has $13$ nodes. Hence for $C_{26}$,

$$C_0 = \log_2 \alpha(C_{26}) = \log_2 13,$$

where $\alpha(G)$ is the size of the maximum independent set of a graph $G$. So now let's look at the cycle $C_5$. This would correspond to a typewriter of $5$ letters $\{a, b, c, d, e\}$. It is easy to see that $\alpha(C_5) = 2$. Earlier the code was for one channel use. It turns out that combining letters over two channel uses improves the number of error-free symbols. So for the $C_5$ (and its corresponding typewriter), sending the pairs of letters $(a,a), (b,c), (c,e), (d,b), (e,d)$ leads to an error-free code of $5$ codewords over two channel uses. For $C_5$, the error-free capacity equals $\log_2 \sqrt{5}$ (proved by Lovász). So we should look at graph products and their independence numbers to obtain the error-free capacity.

**Definition 1 (Strong Product)**

*For two graphs $G = (V, E)$ and $H = (W, F)$, their strong product, denoted by $G \boxtimes H$, is defined on the vertex set $V \times W$, in which distinct vertices $(v_1, w_1)$ and $(v_2, w_2)$ are adjacent iff ($v_1 = v_2$ or $v_1 \sim v_2$) in $G$ and ($w_1 = w_2$ or $w_1 \sim w_2$) in $H$. ("$\sim$" denotes adjacency.)*

$G^k$ denotes the strong product of the graph $G$ with itself $k$ times. The following figure illustrates the strong product of a 5-cycle graph with itself.
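The two independence claims above can be verified by brute force. The sketch below labels the letters $a, \ldots, e$ as $0, \ldots, 4$ and checks that $\alpha(C_5) = 2$ and that the five pairs $(i, 2i \bmod 5)$ form an independent set in $C_5 \boxtimes C_5$, so $\alpha(C_5 \boxtimes C_5) \ge 5$:

```python
# Verify alpha(C5) = 2 and that the code {(i, 2i mod 5)} is independent
# in the strong product C5 x C5 (hence alpha(C5 x C5) >= 5).
from itertools import combinations

n = 5

def adjacent(u, v):                    # adjacency in the cycle C5
    return (u - v) % n in (1, n - 1)

def strong_adjacent(p, q):             # adjacency in C5 boxtimes C5
    if p == q:
        return False
    ok1 = p[0] == q[0] or adjacent(p[0], q[0])
    ok2 = p[1] == q[1] or adjacent(p[1], q[1])
    return ok1 and ok2

# alpha(C5) by brute force over all vertex subsets, largest first
alpha = 0
for r in range(n, 0, -1):
    if any(all(not adjacent(u, v) for u, v in combinations(s, 2))
           for s in combinations(range(n), r)):
        alpha = r
        break
print(alpha)                           # 2

# The five codewords (i, 2i mod 5): pairwise non-adjacent in C5 x C5
code = [(i, 2 * i % n) for i in range(n)]
independent = all(not strong_adjacent(p, q) for p, q in combinations(code, 2))
print(independent)                     # True
```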