
We will look at a nice application of Carathéodory's theorem in information theory. Let $e_i \in \mathbb{R}^n$ denote the $i$-th standard basis vector, essentially a vector with $1$ in the $i$-th position and zero elsewhere. Before we go to the main example, we require the following lemma.

**Lemma 1** Let $e_1, \ldots, e_n$ denote the "vertices" (extreme points) of the simplex $\Delta_{n-1}$, i.e., $e_i$ are the standard basis vectors of $\mathbb{R}^n$. For $j = 1, \ldots, k$, let $f_j$ be real-valued functions on the simplex. Also, let $p = (p_1, \ldots, p_n)$ be such that $p_i \ge 0$ and $\sum_{i=1}^{n} p_i = 1$. Then there exist scalars $\alpha_1, \ldots, \alpha_{k+1} \ge 0$ with $\sum_{l=1}^{k+1} \alpha_l = 1$, and vertices $e_{i_1}, \ldots, e_{i_{k+1}}$ such that

$$\sum_{i=1}^{n} p_i f_j(e_i) = \sum_{l=1}^{k+1} \alpha_l f_j(e_{i_l}), \qquad j = 1, \ldots, k.$$

*Proof:* We have

$$\Big( \sum_{i=1}^{n} p_i f_1(e_i), \ldots, \sum_{i=1}^{n} p_i f_k(e_i) \Big) \in \mathrm{conv}\Big\{ \big(f_1(e_i), \ldots, f_k(e_i)\big) : i = 1, \ldots, n \Big\} \subseteq \mathbb{R}^k.$$

Hence by Carathéodory's theorem, there exist scalars $\alpha_1, \ldots, \alpha_{k+1} \ge 0$ with $\sum_{l=1}^{k+1} \alpha_l = 1$, and vertices $e_{i_1}, \ldots, e_{i_{k+1}}$ such that

$$\sum_{i=1}^{n} p_i f_j(e_i) = \sum_{l=1}^{k+1} \alpha_l f_j(e_{i_l}), \qquad j = 1, \ldots, k,$$

proving the lemma. $\Box$

For a more general version of this lemma, see Lemma 3 in "Source coding with side information and a converse for degraded broadcast channels" by Ahlswede and Körner.
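The lemma is constructive in spirit: given a convex combination of points in $\mathbb{R}^k$, Carathéodory's theorem lets us prune the support down to at most $k+1$ points. As a quick illustration (my own sketch, not from the papers above), here is a minimal numpy implementation of one standard pruning procedure: while more than $k+1$ points carry weight, find an affine dependence among them and shift weight along it until some weight hits zero. The function name and tolerance are my own choices.

```python
import numpy as np

def caratheodory_reduce(points, weights, tol=1e-12):
    """Prune a convex combination of points in R^k down to at most
    k+1 of them while preserving the weighted sum (Caratheodory)."""
    points = np.asarray(points, dtype=float)      # shape (n, k)
    w = np.asarray(weights, dtype=float).copy()   # convex weights, length n
    k = points.shape[1]
    while True:
        support = np.flatnonzero(w > tol)
        if len(support) <= k + 1:
            return w
        V = points[support]                       # s > k+1 active points
        # Affine dependence: gamma != 0 with V^T gamma = 0, sum(gamma) = 0,
        # i.e. a null vector of the (k+1) x s matrix [V^T; 1].
        A = np.vstack([V.T, np.ones(len(support))])
        gamma = np.linalg.svd(A)[2][-1]           # last right singular vector
        if not np.any(gamma > tol):
            gamma = -gamma                        # ensure a positive entry
        pos = gamma > tol
        t = np.min(w[support][pos] / gamma[pos])  # step until a weight hits 0
        w[support] -= t * gamma
        w[w < tol] = 0.0
```

Each pass zeroes at least one weight, so the loop ends after at most $n - k - 1$ passes; the weighted sum and the total mass are unchanged because $V^\top \gamma = 0$ and $\sum_i \gamma_i = 0$.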

**Bounds on the cardinality of channel inputs using Carathéodory's theorem**

We will now look at the capacity theorem for Discrete Memoryless Channels (DMC). Please see Wikipedia and related links for more introduction on the topic of channel capacity. Refer to the technical report by M. Salehi for more advanced techniques to bound the cardinality of random variables in information theory.

Consider a channel with input from the alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$. A channel is specified by a conditional probability distribution $p(y|x)$ for each input $x$, i.e., the probability of obtaining $y$ when $x$ is transmitted. For the discussion below, let us assume that the input and the output alphabets are discrete. Shannon proved that the channel capacity of a DMC is

$$C = \max_{p(x)} I(X;Y),$$

where the maximization is over all probability distributions on the input alphabet. For notational simplicity, let the input alphabet cardinality be $|\mathcal{X}| = n$ and the output alphabet cardinality be $|\mathcal{Y}| = m$. Observe that in the above optimization problem, the search domain is the set of all discrete probability distributions with mass at $n$ points, i.e., the search space is the simplex $\Delta_{n-1}$. We will now use Carathéodory's theorem to show that this search space can be reduced drastically if the cardinality of the output alphabet is small.
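For concreteness, the maximization in the capacity formula can be carried out numerically. One standard method (not covered in this post) is the Blahut–Arimoto algorithm; below is a minimal numpy sketch, with capacity in nats. The function name and iteration count are my own choices, and the channel numbers are just an example.

```python
import numpy as np

def blahut_arimoto(P, n_iter=1000):
    """Approximate C = max_{p(x)} I(X;Y) for a DMC with transition
    matrix P[x, y] = p(y|x), via the Blahut-Arimoto iteration (nats)."""
    n, m = P.shape
    p = np.full(n, 1.0 / n)                   # start from the uniform input
    logP = np.log(np.where(P > 0, P, 1.0))    # safe log, 0 log 0 handled below
    for _ in range(n_iter):
        q = p @ P                             # induced output distribution
        # d[x] = D( p(.|x) || q ), the KL divergence for each input letter
        d = np.sum(np.where(P > 0, P * (logP - np.log(q)), 0.0), axis=1)
        p = p * np.exp(d)                     # multiplicative update
        p /= p.sum()
    q = p @ P
    d = np.sum(np.where(P > 0, P * (logP - np.log(q)), 0.0), axis=1)
    return p, float(p @ d)                    # I(X;Y) at the final p

# Binary symmetric channel with crossover 0.1: the optimum input is uniform.
bsc = np.array([[0.9, 0.1],
                [0.1, 0.9]])
p_opt, C = blahut_arimoto(bsc)
```

For the binary symmetric channel the result matches the closed form $C = \log 2 - H_b(0.1)$ in nats.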

Let $X$ and $Y$ be two discrete random variables with PMF $p(x)$ and conditional PMF $p(y|x)$. Once the channel is specified, fixing the input distribution fixes the output distribution as $p(y) = \sum_{x \in \mathcal{X}} p(x)\, p(y|x)$. So the only flexibility we have is the distribution of the input.

The mutual information between the random variables $X$ and $Y$ is denoted by $I(X;Y)$ and equals

$$I(X;Y) = H(Y) - H(Y|X),$$

where $H(Y)$ is the entropy of the random variable $Y$, defined as

$$H(Y) = -\sum_{y \in \mathcal{Y}} p(y) \log p(y),$$

and the conditional entropy $H(Y|X)$ is defined as

$$H(Y|X) = \sum_{x \in \mathcal{X}} p(x)\, H(Y|X=x) = -\sum_{x \in \mathcal{X}} p(x) \sum_{y \in \mathcal{Y}} p(y|x) \log p(y|x).$$
Observe that the entropies and mutual information depend only on the distribution of the random variables. Let $p^*(x)$ be the optimal distribution on the input alphabet that achieves the capacity $C$. As mentioned earlier, this optimal distribution induces an output distribution $p^*(y) = \sum_{x} p^*(x)\, p(y|x)$ and hence the entropies $H(Y)$ and $H(Y|X)$.
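In code, these quantities are direct translations of the formulas above. Here is a small numpy sketch (logarithms in nats; the function names are mine) computing $H(Y)$, $H(Y|X)$, and $I(X;Y)$ from an input PMF and a channel matrix.

```python
import numpy as np

def entropy(q):
    """H(Q) = -sum q log q, with the convention 0 log 0 = 0 (nats)."""
    q = np.asarray(q, dtype=float)
    nz = q > 0
    return float(-np.sum(q[nz] * np.log(q[nz])))

def mutual_information(p, P):
    """I(X;Y) = H(Y) - H(Y|X) for input PMF p and channel P[x, y] = p(y|x)."""
    q = p @ P                                          # p(y) = sum_x p(x) p(y|x)
    h_cond = p @ np.array([entropy(row) for row in P]) # H(Y|X)
    return entropy(q) - h_cond

# Example: uniform input over a binary symmetric channel with crossover 0.1.
bsc = np.array([[0.9, 0.1],
                [0.1, 0.9]])
I = mutual_information(np.array([0.5, 0.5]), bsc)
```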

Now we will show that there exists another probability distribution $\tilde{p}(x)$ on $\mathcal{X}$ that achieves the same mutual information $I(X;Y) = C$. However, the new distribution will have non-zero probability on only $m+1$ terms of $\mathcal{X}$. Observe that if $m + 1 < n$, then the search space will reduce.

Since $p^*$ is an $n$-dimensional ($n$-term) probability distribution, it belongs to the simplex $\Delta_{n-1}$ and hence can be represented as a convex combination of the vertex points, i.e.,

$$p^* = \sum_{i=1}^{n} p_i^* e_i,$$

for some $p_i^* \ge 0$ with $\sum_{i=1}^{n} p_i^* = 1$, where $e_i$ are the standard $n$-dimensional basis vectors defined in the previous lemma. Here we have used the notation

$$p_i^* = p^*(x_i),$$

where $\{x_1, \ldots, x_n\}$ is the alphabet of $X$. We now consider the following functions on the vertices $e_i$ (see the previous lemma for the definition of $e_i$) and use the previous lemma to reduce the cardinality. Let $k = m$ and let $\{y_1, \ldots, y_m\}$ be the alphabet of the output $Y$. Define, for any vertex $e_i$,

$$f_j(e_i) = p(y_j \mid x_i), \qquad j = 1, \ldots, m-1,$$

$$f_m(e_i) = H(Y \mid X = x_i) = -\sum_{y \in \mathcal{Y}} p(y \mid x_i) \log p(y \mid x_i).$$

We make the following observation: for $j = 1, \ldots, m-1$,

$$\sum_{i=1}^{n} p_i^* f_j(e_i) = \sum_{i=1}^{n} p^*(x_i)\, p(y_j \mid x_i) = p^*(y_j),$$

where $p^*(y_j)$ is the probability of $y_j$ in the optimal distribution of the channel output. Similarly it follows that

$$\sum_{i=1}^{n} p_i^* f_m(e_i) = \sum_{i=1}^{n} p^*(x_i)\, H(Y \mid X = x_i) = H(Y \mid X).$$

So we have

$$\Big( \sum_{i=1}^{n} p_i^* f_1(e_i), \ldots, \sum_{i=1}^{n} p_i^* f_m(e_i) \Big) = \big( p^*(y_1), \ldots, p^*(y_{m-1}), H(Y \mid X) \big).$$

So the right-hand side corresponds to the optimal output distribution $p^*(y)$ (the first $m-1$ terms determine it, since the probabilities sum to one) and the optimal conditional entropy $H(Y \mid X)$. But by the previous lemma, there exist constants $\alpha_1, \ldots, \alpha_{m+1} \ge 0$ with $\sum_{l=1}^{m+1} \alpha_l = 1$ and vertices $e_{i_1}, \ldots, e_{i_{m+1}}$ such that

$$\sum_{i=1}^{n} p_i^* f_j(e_i) = \sum_{l=1}^{m+1} \alpha_l f_j(e_{i_l}), \qquad j = 1, \ldots, m.$$

But from the definitions of $f_j$, it follows that

$$p^*(y_j) = \sum_{l=1}^{m+1} \alpha_l\, p(y_j \mid x_{i_l}), \qquad j = 1, \ldots, m-1,$$

$$H(Y \mid X) = \sum_{l=1}^{m+1} \alpha_l\, H(Y \mid X = x_{i_l}).$$

Hence it follows that choosing the distribution $\tilde{p}(x_{i_l}) = \alpha_l$ on $\mathcal{X}$ leads to the same output distribution on $Y$ as $p^*(y)$ (and hence the same $H(Y)$) and the same conditional entropy $H(Y \mid X)$, and therefore the same mutual information $I(X;Y) = H(Y) - H(Y \mid X) = C$. But observe that the new optimal distribution

$$\tilde{p} = \sum_{l=1}^{m+1} \alpha_l e_{i_l}$$

has positive mass on only $m+1$ terms of the PMF. This is because each $e_{i_l}$ has only one non-zero coordinate, the $i_l$-th coordinate. This proves the claim. Hence it is sufficient to search over all probability distributions with positive mass on at most $m+1 = |\mathcal{Y}| + 1$ coordinates.
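To see the whole argument numerically, the sketch below (my own illustration, with made-up channel numbers) builds the vectors $\big(f_1(e_i), \ldots, f_m(e_i)\big)$ for a channel with $n = 8$ inputs and $m = 2$ outputs, prunes the input distribution with a Carathéodory step exactly as in the lemma, and checks that the output distribution and $H(Y|X)$ (hence $I(X;Y)$) are unchanged while the support drops to at most $m + 1 = 3$ letters.

```python
import numpy as np

def entropy(q):
    """-sum q log q in nats, with 0 log 0 = 0."""
    q = np.asarray(q, dtype=float)
    nz = q > 0
    return float(-np.sum(q[nz] * np.log(q[nz])))

def caratheodory_reduce(points, weights, tol=1e-12):
    """Prune a convex combination of points in R^k to <= k+1 points
    with the same weighted sum (plain Caratheodory)."""
    points = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float).copy()
    k = points.shape[1]
    while True:
        support = np.flatnonzero(w > tol)
        if len(support) <= k + 1:
            return w
        V = points[support]
        A = np.vstack([V.T, np.ones(len(support))])  # affine dependence system
        gamma = np.linalg.svd(A)[2][-1]              # null vector of A
        if not np.any(gamma > tol):
            gamma = -gamma
        pos = gamma > tol
        t = np.min(w[support][pos] / gamma[pos])
        w[support] -= t * gamma
        w[w < tol] = 0.0

# Hypothetical channel: n = 8 inputs, m = 2 outputs.
rng = np.random.default_rng(7)
n, m = 8, 2
P = rng.random((n, m))
P /= P.sum(axis=1, keepdims=True)        # rows are p(.|x_i)
p_star = rng.random(n)
p_star /= p_star.sum()                   # stand-in for the optimal p*

# f_j(e_i) = p(y_j|x_i) for j < m, and f_m(e_i) = H(Y|X = x_i).
h_rows = np.array([entropy(row) for row in P])
V = np.column_stack([P[:, :m - 1], h_rows])

p_tilde = caratheodory_reduce(V, p_star)

support = int((p_tilde > 1e-12).sum())   # at most m + 1 = 3
same_output = np.allclose(p_star @ P, p_tilde @ P, atol=1e-8)
same_hcond = abs(p_star @ h_rows - p_tilde @ h_rows) < 1e-8
```

Matching the first $m-1$ output probabilities together with total mass one forces the entire output distribution to match, which is exactly why only $m-1$ output functions are needed in the lemma.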