Before we proceed further, observe that in the above corollary, the linearity of the function was used only in a small part of the proof of Corollary 8 in the last post. The proof (for the maximum) would go through for any function $f$ satisfying the following properties:
 $f(\lambda x) = \lambda f(x)$, $\lambda \geq 0$.
 subadditive property: $f(x + y) \leq f(x) + f(y)$.
Such functions are called subadditive functions. So a subadditive function which is continuous achieves its maximum on a vertex of a compact convex set. An example of a subadditive function is the norm $\|x\|$. Another example is the support function of a set (we will look at this function later).
We will now look at some interesting examples of extreme points and maximization of linear functionals. We will first begin with the following motivational example.
Examples:
Assignment problem (AP): The AP arises in many contexts and is an example of binary optimization. Here we will motivate the problem in the context of wireless and later generalize the notion.
OFDM (orthogonal frequency division multiplexing) is a multicarrier modulation which is being used in 4G networks. In OFDM, the data is modulated over different tones in such a way that orthogonality can be maintained across tones.
In general, a BS tower serves multiple users. When using OFDM, the BS generally has to decide the allocation of different tones to different users (OFDMA). Assume for now there are $n$ frequency tones and there are $n$ users. In general the users are distributed in space. Hence each frequency tone will have a different “channel” with respect to a user. So let $h_{ij}$ denote the channel of tone $i$ from the BS to the user $j$. A good channel in general indicates a higher data rate on that particular tone. A simple map from the channel quality to the data rate obtained is $r_{ij} = \log(1 + |h_{ij}|^2)$. So the BS wants to allocate one frequency tone per user and allocate all the tones, such that the sum rate is maximized. For example if $n = 4$, a sample allocation is given by $\sigma = (3, 2, 4, 1)$, which implies the first tone is allocated to the third user, the second tone to the second user, the third to the fourth user and the fourth to the first user. Observe that any allocation is just a permutation of the integers $\{1, \dots, n\}$. So mathematically the BS should find a permutation $\sigma$ such that the allocated sum rate
$\sum_{i=1}^n r_{i\sigma(i)}$
is maximized. Observe that the BS has to check every permutation of $\{1, \dots, n\}$ before it can find the permutation that maximizes the sum rate. The search space has size $n!$, which by Stirling's approximation grows like $(n/e)^n$ and is huge even for a moderate number of tones.
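To make the combinatorial blow-up concrete, here is a minimal brute-force sketch of the search over all permutations; the rate matrix and its values are made up for illustration:

```python
from itertools import permutations

def brute_force_assignment(rate):
    """Try all n! tone-to-user assignments and return the best sum rate.

    rate[i][j] is the rate obtained when tone i is allocated to user j
    (a stand-in for log(1 + channel quality)).
    """
    n = len(rate)
    best_value, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        value = sum(rate[i][perm[i]] for i in range(n))
        if value > best_value:
            best_value, best_perm = value, perm
    return best_value, best_perm

# A toy 3x3 instance: the optimum assigns each tone to its "own" user.
rates = [[3.0, 1.0, 1.0],
         [1.0, 3.0, 1.0],
         [1.0, 1.0, 3.0]]
best_value, best_perm = brute_force_assignment(rates)
```

The outer loop runs $n!$ times, which is exactly the explosion the linear-programming reformulation below avoids.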
In a little while, using extreme points and the Krein–Milman theorem, we will come up with a smarter way of solving this problem with polynomial complexity.
We will now start looking more closely at the vertices (extreme points) of a polyhedron. Most of the material is taken from AB. We will first begin with the following theorem that characterizes the vertices of a polyhedron in terms of linear dependence of the vectors.
Theorem 1 Let $a_1, \dots, a_m$ be vectors in $\mathbb{R}^n$ and $b_1, \dots, b_m \in \mathbb{R}$. Let $P$ be a polyhedron defined as
$P = \{x \in \mathbb{R}^n : \langle a_i, x \rangle \leq b_i,\ i = 1, \dots, m\}.$
For any point $x \in P$ define
$I(x) = \{i : \langle a_i, x \rangle = b_i\}.$
Then $x$ is a vertex of $P$ if and only if the set of vectors $\{a_i : i \in I(x)\}$ linearly spans $\mathbb{R}^n$.
Proof: Let us first prove that vertex implies linear span. We will prove by contradiction. Let $x$ be a vertex of $P$. Suppose $\{a_i : i \in I(x)\}$ does not span $\mathbb{R}^n$. For notation, let the cardinality of $I(x)$ be $k$. Consider the matrix $A$ with $a_i$, $i \in I(x)$, as the columns, \ie,
$A = [a_{i_1}, \dots, a_{i_k}],$
an $n \times k$ matrix. Since $\{a_i : i \in I(x)\}$ does not span the entire $\mathbb{R}^n$, the kernel (or null space) of $A^T$ is nonempty (the dimension of the kernel is greater than $0$). Hence there exists a nonzero $w$ with $A^T w = 0$. Since $w$ is in the kernel,
$\langle a_i, w \rangle = 0 \text{ for all } i \in I(x).$
We will now prove that $x$ is not a vertex. Let $\epsilon > 0$ be a small number. Define
$x_1 = x + \epsilon w$
and
$x_2 = x - \epsilon w.$
Observe that $x = (x_1 + x_2)/2$ and hence if $x_1$ and $x_2$ lie in the polyhedron $P$, then $x$ is not an extreme point (vertex), which is a contradiction. We will now prove that for very small $\epsilon$, $x_1$ and $x_2$ indeed lie in $P$. To show that $x_1$ (or $x_2$) lies in $P$ we should show that it satisfies $\langle a_i, x_1 \rangle \leq b_i$ for all $i$. Let $i \in I(x)$. Then
$\langle a_i, x_1 \rangle = \langle a_i, x \rangle + \epsilon \langle a_i, w \rangle = b_i$
follows since $\langle a_i, w \rangle = 0$ for $i \in I(x)$ and since $\langle a_i, x \rangle = b_i$. Now if $i \notin I(x)$, then
$\langle a_i, x_1 \rangle = \langle a_i, x \rangle + \epsilon \langle a_i, w \rangle.$
Observe that $b_i - \langle a_i, x \rangle > 0$ for $i \notin I(x)$. Hence if we choose $\epsilon$ very small, then $\epsilon \langle a_i, w \rangle \leq b_i - \langle a_i, x \rangle$. (For example, you can choose $\epsilon < \min_{i \notin I(x)} \frac{b_i - \langle a_i, x \rangle}{1 + |\langle a_i, w \rangle|}$.) Hence $x_1$ satisfies all the inequalities and hence is in the polyhedron. Similarly, you can show that $x_2$ satisfies all the inequalities and hence is in the polyhedron.
We will now prove the converse, \ie, linear span implies vertex. Suppose $x = \frac{1}{2}(y + z)$ where $y, z \in P$. We first have for $i \in I(x)$,
$b_i = \langle a_i, x \rangle = \frac{1}{2}\langle a_i, y \rangle + \frac{1}{2}\langle a_i, z \rangle.$
Since $y, z \in P$, we have $\langle a_i, y \rangle \leq b_i$ and $\langle a_i, z \rangle \leq b_i$, which is possible only when $\langle a_i, y \rangle = b_i$ and $\langle a_i, z \rangle = b_i$. Now consider the set of equations $\langle a_i, u \rangle = b_i$ for $i \in I(x)$.
Since $\{a_i : i \in I(x)\}$ span the entire $\mathbb{R}^n$, it follows that there is a unique solution to the above set of equations. However, we have $\langle a_i, x \rangle = b_i$, $\langle a_i, y \rangle = b_i$ and $\langle a_i, z \rangle = b_i$. This implies $x = y = z$ since the solution should be unique.
Proof: We require at least $n$ vectors to span $\mathbb{R}^n$.
We will now analyze a particular polyhedron that is required to simplify the assignment problem. Before that we require the following definition of a permutation matrix. Suppose $\sigma$ is a permutation of $\{1, \dots, n\}$. Then the permutation matrix $M_\sigma$ is an $n \times n$ matrix with the $(i, j)$ entry given by
$M_\sigma(i, j) = \begin{cases} 1 & \text{if } \sigma(i) = j, \\ 0 & \text{otherwise.} \end{cases}$
So for a given $n$ there are $n!$ permutation matrices.
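The definitions above translate directly into code. A small sketch (the helper names are mine) that builds a permutation matrix and checks the doubly stochastic property that will matter below:

```python
def permutation_matrix(sigma):
    """Permutation matrix M with M[i][j] = 1 if sigma maps i to j, else 0.

    sigma is given 0-indexed, e.g. (2, 1, 3, 0).
    """
    n = len(sigma)
    return [[1 if sigma[i] == j else 0 for j in range(n)] for i in range(n)]

def is_doubly_stochastic(X, tol=1e-12):
    """Check nonnegativity and that every row and column sums to 1."""
    n = len(X)
    nonneg = all(x >= -tol for row in X for x in row)
    rows = all(abs(sum(row) - 1.0) <= tol for row in X)
    cols = all(abs(sum(X[i][j] for i in range(n)) - 1.0) <= tol
               for j in range(n))
    return nonneg and rows and cols
```

Every permutation matrix passes the check, and so does any average of permutation matrices, which previews the Birkhoff polytope discussed next.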
Birkhoff polytope $B_n$ is the set of all $n \times n$ doubly stochastic matrices. Mathematically,
$B_n = \Big\{ X \in \mathbb{R}^{n \times n} : x_{ij} \geq 0,\ \sum_{i=1}^n x_{ij} = 1 \text{ for all } j,\ \sum_{j=1}^n x_{ij} = 1 \text{ for all } i \Big\}.$
First observe that $B_n$ is entirely defined by linear inequalities and equalities and hence is a polyhedron. Also observe that $B_n$ is a convex set.
Proof: Suppose $M_\sigma = \frac{1}{2}(X + Y)$ where $X, Y \in B_n$. Consider the $i$th row of all the three matrices. Suppose $M_\sigma(i, j) = 1$ and all the other entries of the row are zero. So we have
$1 = \frac{1}{2}(x_{ij} + y_{ij}).$
We have $x_{ij} \leq 1$, $y_{ij} \leq 1$, and each term is nonnegative. This is possible only when $x_{ij} = y_{ij} = 1$ and zero for all the other entries of the $i$th row. This shows that $X = Y = M_\sigma$, which proves that $M_\sigma$ is an extreme point of $B_n$.
We will now prove the converse of the above statement.
Proof: We prove the lemma by induction on $n$. For $n = 1$, it is an obvious statement. Let us assume it is true up to dimension $n - 1$ and prove it for dimension $n$.
Consider the following affine subspace $L$ of $\mathbb{R}^{n \times n}$. The space $L$ is the set of all real matrices such that
$\sum_{j=1}^n x_{ij} = 1 \text{ for all } i,$
and
$\sum_{i=1}^n x_{ij} = 1 \text{ for all } j.$
Observe that the dimension of the space $L$ is $(n-1)^2$. (Think why??) Observe that $B_n \subset L$ and $B_n$ is defined as the polyhedron
$B_n = \{X \in L : x_{ij} \geq 0\}$
in $L$.
Let $X$ be an extreme point of $B_n$. So by Theorem 1, an extreme point of $B_n$ should satisfy at least $(n-1)^2$ inequalities with equality. Hence $x_{ij} = 0$ for at least $(n-1)^2$ entries of the matrix. We make the following observation.
 Not every row can have two or more nonzero entries: in that case the maximum number of zero entries would be $n(n-2)$ (each row could have at most $n-2$ zero entries and there are $n$ rows). This cannot be true, since $n(n-2) = n^2 - 2n < (n-1)^2$.
So there is at least one row, say row $i$, with only one nonzero entry, at some location $(i, j)$. Since $X$ is doubly stochastic, we have that $x_{ij} = 1$. Hence it also follows that there can be no more nonzero entries in the column $j$. Let $\hat{X}$ be the new $(n-1) \times (n-1)$ matrix obtained after removing the row $i$ and the column $j$. Observe that
 $\hat{X}$ is a doubly stochastic matrix: This is because the original matrix was doubly stochastic and the contribution from row $i$ and column $j$ is only the single entry $x_{ij} = 1$.
 $\hat{X}$ is an extreme point of $B_{n-1}$: Suppose $\hat{X}$ is not an extreme point of $B_{n-1}$, \ie, $\hat{X}$ can be expressed as $\hat{X} = \frac{1}{2}(\hat{Y} + \hat{Z})$, where $\hat{Y}, \hat{Z} \in B_{n-1}$ and $\hat{Y} \neq \hat{Z}$. Then it is easy to see that adding back one more row (all zeros except at column $j$) and one more column (all zeros except at row $i$), with $1$ at position $(i, j)$, to $\hat{Y}$ and $\hat{Z}$ will result in new matrices $Y$ and $Z$ in $B_n$. Observe that $X = \frac{1}{2}(Y + Z)$, which contradicts the extremality of $X$.
So $\hat{X}$ is an extreme point of $B_{n-1}$. So by the induction hypothesis, we have that $\hat{X}$ is some permutation matrix $M_{\hat{\sigma}}$. Now augment $\hat{X}$ with the row $i$ and column $j$ to obtain $X$, which is therefore itself a permutation matrix.
We will now see how to utilize this theorem to simplify the assignment problem. Recall that the AP was to maximize the sum rate over all the permutations $\sigma$ of $\{1, \dots, n\}$.
Observe that the cost function is equivalent to
$\sum_{i=1}^n \sum_{j=1}^n r_{ij} M_\sigma(i, j),$
where $M_\sigma$ is the permutation matrix of $\sigma$. So the AP equals
$\max_{\sigma} \sum_{i=1}^n \sum_{j=1}^n r_{ij} M_\sigma(i, j).$
Observe that even now, each $M_\sigma(i, j)$ is restricted to be either $0$ or $1$.
Now consider a new problem P2:
$\max \sum_{i=1}^n \sum_{j=1}^n r_{ij} x_{ij}$
over the set $B_n$.
Observe that we are now maximizing over the Birkhoff polytope and the $x_{ij}$ are real numbers. So we are maximizing a linear function over a convex compact set $B_n$. Hence the maximum occurs at a vertex point of $B_n$. But by the Birkhoff–von Neumann theorem, the vertices of $B_n$ are nothing but the permutation matrices. Hence solving P2 is equal to solving AP. Also observe that the new problem P2 is a linear programming problem and is very easy to solve in polynomial complexity.
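A quick numerical sanity check of the vertex argument, with made-up numbers: on a segment between two permutation matrices (a non-vertex point of the Birkhoff polytope), a linear objective never exceeds its value at the best vertex:

```python
from itertools import permutations

def permutation_matrix(sigma):
    n = len(sigma)
    return [[1 if sigma[i] == j else 0 for j in range(n)] for i in range(n)]

def objective(R, X):
    """Linear objective sum_ij R[i][j] * X[i][j]."""
    n = len(R)
    return sum(R[i][j] * X[i][j] for i in range(n) for j in range(n))

# A toy rate matrix.
R = [[1.0, 5.0, 2.0],
     [4.0, 0.0, 3.0],
     [2.0, 2.0, 7.0]]

# Best value over all vertices (permutation matrices) of the polytope.
best = max(objective(R, permutation_matrix(p)) for p in permutations(range(3)))

# A midpoint of two permutation matrices is doubly stochastic but not a vertex.
A = permutation_matrix((1, 0, 2))
B = permutation_matrix((0, 2, 1))
mid = [[(A[i][j] + B[i][j]) / 2 for j in range(3)] for i in range(3)]
```

The objective at `mid` is the average of the two endpoint values, so it can never beat the better endpoint; that is exactly why the LP optimum lands on a permutation matrix.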
1. Separation Theorems
We begin with some definitions of halfspaces:
Definition 1 Let $a \in \mathbb{R}^n$, $a \neq 0$, and $b \in \mathbb{R}$.
 Open half space: $\{x : \langle a, x \rangle > b\}$
 Open half space: $\{x : \langle a, x \rangle < b\}$
 Closed half space: $\{x : \langle a, x \rangle \geq b\}$
 Closed half space: $\{x : \langle a, x \rangle \leq b\}$
Separation theorems deal with sufficient conditions for separation of sets by hyperplanes. There are numerous separation theorems with increasing order of complexity. We will look at two separation theorems. In the following discussion the norm we consider is the Euclidean norm. We begin with the following definition of the distance to a set:
$d(x, A) = \inf_{y \in A} \|x - y\|.$
We require the following lemma that characterize the distance of a point to a closed set. The proof can be found in standard analysis textbooks.
Proof: Claim 1: The function $y \mapsto \|x - y\|$ is a continuous function. (Easy to prove. Think why.)
Since is a compact set and is a continuous function, it attains a minimum on a point on , \ie,
Also by previous lemma there exists a point in such that
Let be some point in and some point in . Then
where follows from the definition of distance to the set and follows from (1). So for any and , we have . Hence
since and .
We will now look into this distance function more closely.
\ie, the point $p$ achieves the distance, $\|x - p\| = d(x, A)$. Moreover, the inner product satisfies $\langle x - p, y - p \rangle \leq 0$ for every $y \in A$.
Proof: Existence of some point $p$ in $A$ follows from Lemma 4 because $A$ is closed. We will have to show that the point is unique and the inner product inequality is satisfied. We will begin with the inequality. Since $A$ is convex and $p, y \in A$, the line segment $p + t(y - p)$, $t \in [0, 1]$, lies in $A$. Since $\|x - p\|$ is the minimum distance,
$\|x - p\| \leq \|x - p - t(y - p)\|.$
So we have
$\|x - p\|^2 \leq \|x - p - t(y - p)\|^2.$
Expanding the right hand side and after some basic simplification (check this),
$2t \langle x - p, y - p \rangle \leq t^2 \|y - p\|^2.$
Observe that this inequality should hold true for all $t \in (0, 1]$, especially very small $t$. Since the right hand side decays as $t^2$ while the left hand side decays only as $t$, the inequality holds for all small $t$ if and only if $\langle x - p, y - p \rangle \leq 0$, which proves the inequality.
Suppose there exists one more point $q \in A$ such that $\|x - q\| = d(x, A)$. Then by the above inequality we have
$\langle x - p, q - p \rangle \leq 0$
and
$\langle x - q, p - q \rangle \leq 0.$
Adding both these inequalities we obtain
$\|p - q\|^2 \leq 0,$
which implies $p = q$.
Geometrically, the inequality means that the angle between the vectors $y - p$ and $x - p$ is obtuse (at least $90$ degrees).
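The obtuse-angle characterization is easy to check numerically when the convex set has a closed-form projection. A minimal sketch for the closed unit ball, where the nearest point is just a radial scaling (the sample points are made up):

```python
import math

def project_unit_ball(x):
    """Nearest point of the closed unit ball to x (radial projection)."""
    norm = math.sqrt(sum(c * c for c in x))
    return list(x) if norm <= 1.0 else [c / norm for c in x]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

x = [3.0, 4.0]
p = project_unit_ball(x)   # the boundary point [0.6, 0.8]

# For every y in the ball, <x - p, y - p> should be <= 0.
for y in ([0.0, 0.0], [1.0, 0.0], [-0.6, -0.8]):
    gap = inner([a - b for a, b in zip(x, p)],
                [a - b for a, b in zip(y, p)])
```

Each `gap` comes out nonpositive, matching the lemma: the ball lies entirely in the halfspace on the far side of the supporting hyperplane at $p$.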
We will now look at the first separation theorem:
Proof: Since $A$ is compact and $B$ is closed, by Lemma 5, there exist two points $a \in A$ and $b \in B$ that achieve the distance between the sets, \ie, $b$ is the closest point in $B$ to the point $a$ and $a$ is the closest point in $A$ to the point $b$.
Let be some point in . Then from Lemma 6, we have
which implies
Observe the strict inequality in the last line. This is because which implies . Hence we have
Now let be some point in . Then from Lemma 6, we have
Using this inequality and similar trick as before, it follows that
It is easy to see from (2) and (3) that the hyperplane strictly separates the sets and .
The next separation theorem looks at separation of a point and an open set. Before that, let's look at hyperplanes again. Earlier we have seen that a hyperplane in $\mathbb{R}^n$ is an affine space of dimension $n - 1$. It also follows that any affine subspace of dimension $n - 1$ in $\mathbb{R}^n$ is a hyperplane.
Proof: To show that $H$ is a hyperplane, we have to find a scalar $b$ and a vector $a$ such that
$H = \{x : \langle a, x \rangle = b\}.$
Let $x_0 \in H$. Since $H$ is an affine subspace, translating the space by $-x_0$, \ie, $H - x_0$, is a subspace of dimension $n - 1$. Since $H - x_0$ is a subspace (which is a vector space) of $\mathbb{R}^n$, there exist vectors $e_1, \dots, e_{n-1}$ that form a basis of $H - x_0$. By Gram–Schmidt orthogonalization we can also assume that these vectors form an orthonormal set. Since these are orthonormal, they are linearly independent in $\mathbb{R}^n$. But since $\mathbb{R}^n$ is of dimension $n$, we can find one more vector $u$ which is orthogonal to every vector in the set $\{e_1, \dots, e_{n-1}\}$ ($e_1, \dots, e_{n-1}$ together with $u$ will form a basis of $\mathbb{R}^n$). Since $u$ is orthogonal to every basis vector of the subspace $H - x_0$, it follows that it is orthogonal to every vector in the subspace. Indeed, it can be shown that any vector orthogonal to $u$ should belong to $H - x_0$ (think projections?). Hence
$H - x_0 = \{x : \langle u, x \rangle = 0\}.$
Since $x \in H$ if and only if $x - x_0 \in H - x_0$, we have
$H = \{x : \langle u, x \rangle = \langle u, x_0 \rangle\},$
which is the equation of a hyperplane with $a = u$ and $b = \langle u, x_0 \rangle$.
We will now get back to the main separation theorem. The following separation theorem is a special case of the geometric Hahn–Banach theorem.
Proof: Without loss of generality we can assume that . Otherwise, we can translate both and so that .
We will prove this in three stages: first we will prove the theorem for $n = 2$, i.e., a plane. Consider the unit circle $S^1 = \{x \in \mathbb{R}^2 : \|x\| = 1\}$. Now we project the convex set $A$ onto the circle in the following way:
$\pi : A \to S^1$ as $\pi(x) = x / \|x\|.$
Observe that the angle of the arc $\pi(A)$ should be less than $\pi$. Otherwise, if it is greater, there is a point on the arc whose diametrically opposite point also lies on the arc. Let these be the images of points $p$ and $q$ of $A$. Since $A$ is convex, the line segment $[p, q]$ should be in the set, and this line segment contains the origin. Hence $0 \in A$, which is a contradiction to our assumption.
Also, since is an open set, observe that the image of , will be an open arc on . Now consider an end point of the arc. This can be done, since the arc is open. Now consider the line joining the end point and the center origin. This line contains the origin and the set is on one side of the line. This line is the required hyperplane for two dimensions.
Now let us consider higher dimensions :
We will now prove that there is always a line $L$ (one dimension) in higher dimensions that passes through the origin and does not intersect $A$. Consider a two dimensional plane through the origin. If it does not have any intersection with $A$, pick some line through the origin in the plane and denote it by $L$. This is the required line. Suppose the intersection is nonempty; it is easy to see that it is a convex and open set that does not contain the origin. Hence by the previous two dimensional result, it follows that there is a line through the origin that does not intersect it. This is the required line $L$.
Let be an subspace such that and . Also let be the biggest such affine space, \ie, any other subspace with this property is contained in . If the dimension of is , then by Lemma 8, it is a hyperplane and we are done (this is because is an affine subspace through the origin).
We will now prove by contradiction that the dimension of such maximal set is . Let the dimension of be . Since is a subspace there are basis vectors that generate this subspace. Call them . Adding more vectors , the set will form a basis of . So any vector can be represented as a linear combination of the basis vector, i.e.,
Hence we can define the following projection function from to as
Consider the projection of the convex set . The set is convex and open (Why?). Now from the previous argument we can find a line that passes through the origin in but does not intersect . Now consider the inverse image of under the map , \ie, . From (4), we observe that this means only the last coordinates will correspond to and the first coordinates to . Hence it follows that
Observe that is a subspace (why ?) that contains and larger than . Also, it has no intersection with . This is a contradiction since we assumed is the largest subspace with these properties.
So and hence by Lemma 8, is a hyperplane.
1. Properties of convex sets

Lemma 1
Let a set be a convex set for every . Then the intersection of all these sets
is convex.
Proof: HW problem
Example 1 (A cool example from SB) The set of PSD matrices can be represented as
$S_+^n = \{X \in S^n : z^T X z \geq 0 \text{ for all } z \in \mathbb{R}^n\} = \bigcap_{z \in \mathbb{R}^n} \{X \in S^n : z^T X z \geq 0\}.$
Observe that for any fixed $z$, the set
$\{X \in S^n : z^T X z \geq 0\}$
is a convex set (in fact a halfspace) in $S^n$ (why?). Hence from the above Lemma $S_+^n$ is convex.
 A scaled and a translated version of a convex set is convex.
 The projection of a convex set onto some of its coordinates is convex. If the set $S \subseteq \mathbb{R}^m \times \mathbb{R}^n$ is convex, then the set
$\{x \in \mathbb{R}^m : (x, y) \in S \text{ for some } y \in \mathbb{R}^n\}$
is convex.
Example 2 The projection of a polyhedron onto some of its coordinates is a polyhedron. Proof: The following example illustrates the procedure of the proof. Consider the following polyhedron:
$P = \{(x, y, z) : x + y + z \leq 1,\ x \geq 0,\ y \geq 0,\ z \geq 0\}.$
Let us consider its projection onto the $(x, y)$ plane. Denote this projection of $P$ onto the $(x, y)$ plane by $P_{xy}$. It is easy to see that
$P_{xy} = \{(x, y) : x + y \leq 1,\ x \geq 0,\ y \geq 0\}.$
Let us see the formal way to prove it. So $(x, y)$ belongs to the projection if and only if there exists a $z$ such that \begin{align*} z&\leq 1-(x+y),\\
z&\geq 0. \end{align*} There exists a $z$ that satisfies these two conditions if and only if
$0 \leq 1 - (x + y),$
which implies $x + y \leq 1$. This combines with the initial inequalities $x \geq 0$, $y \geq 0$ and leads to the polyhedron $P_{xy}$. (This technique of variable elimination is also called Fourier–Motzkin elimination.)
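The elimination step used here can be mechanized. A minimal sketch (the representation of constraints as `(a, b)` pairs meaning $a \cdot x \leq b$ is my own):

```python
def fm_eliminate(constraints, k):
    """Fourier-Motzkin: eliminate variable k from a system a . x <= b.

    Each constraint is (a, b) with a a coefficient list. Every pair of a
    lower bound on x_k (a[k] < 0) and an upper bound (a[k] > 0) is combined
    with positive multipliers chosen so the k-th coefficient cancels.
    """
    lower = [c for c in constraints if c[0][k] < 0]
    upper = [c for c in constraints if c[0][k] > 0]
    out = [c for c in constraints if c[0][k] == 0]
    for al, bl in lower:
        for au, bu in upper:
            a = [au[k] * l - al[k] * u for l, u in zip(al, au)]
            b = au[k] * bl - al[k] * bu
            out.append((a, b))
    return out

# The example above in variables (x, y, z):
#   x + y + z <= 1   and   -z <= 0.
system = [([1, 1, 1], 1), ([0, 0, -1], 0)]
reduced = fm_eliminate(system, 2)   # eliminate z
```

Eliminating $z$ leaves the single constraint $x + y \leq 1$, matching the hand computation.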
 Consider the affine function $f(x) = Ax + b$ with $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Then the image of a convex set under $f$ is convex. Also, if $C$ is convex, then the inverse image $f^{-1}(C)$ of the set $C$ under $f$ is convex. Proof: Let $f^{-1}(C)$ be the inverse image of a convex set $C$. Let $x_1, x_2 \in f^{-1}(C)$ and $\theta \in [0, 1]$. Then $\theta x_1 + (1 - \theta) x_2$ is the inverse image of $\theta f(x_1) + (1 - \theta) f(x_2)$ (by affineness of $f$), which belongs to the set $C$ since $C$ is convex. Let us look at some examples: (These examples are from SB)
Example 3 Consider the matrix inequality
for , . Consider two sets , we say if and only if , \ie, .
This is called a linear matrix inequality. Then the set is convex. Consider the function . Observe that
Example 4 (Ellipsoid) Consider the set $E = \{x : (x - x_c)^T P^{-1} (x - x_c) \leq 1\}$, where $P$ is positive definite. This set is the inverse image of the unit ball $\{u : \|u\| \leq 1\}$ under an affine map, and hence convex. In general the union of convex sets need not be convex. However, the union of an increasing sequence of convex sets is convex.
1.1. Affine sets
Definition 2 A set $M$ is affine if for any two points $x, y \in M$, the line passing through $x$ and $y$ lies in $M$. Mathematically,
$\theta x + (1 - \theta) y \in M \text{ for all } \theta \in \mathbb{R}.$
Definition 3 A linear combination of points
$\theta_1 x_1 + \dots + \theta_k x_k \text{ with } \sum_{i=1}^k \theta_i = 1$
is called an affine combination of the points $x_1, \dots, x_k$.
Some examples:
 Let be the unit square. Then the affine hull of is the entire .
 . Then the affine hull of is .
Let be a vector space and be a subspace ( belongs to the space). Then for any element , the translation is called an affine subspace. The dimension of is the dimension of .
Proof: Let . Then consider the set . We will now prove that is a subspace. belongs to the subspace. Let . Then
Observe that , and of course . So their affine combination belongs to the set . So belongs to the set , which implies that is a subspace. Since
and hence it follows that is an affine subspace.
Example 5 The Hyperplane
is an affine subspace.
A hyperplane in is of dimension . (This is because only coordinates can be freely chosen. The remaining coordinate is automatically fixed by the constraint .)
Definition 6 A set of points $x_1, \dots, x_k$ is affinely independent if
$\sum_{i=1}^k \theta_i x_i = 0 \text{ and } \sum_{i=1}^k \theta_i = 0$
implies $\theta_i = 0$ for $i = 1, \dots, k$.
1.2. Topological properties of convex sets
Proof: Since $x$ is an interior point, there is a ball around $x$ that lies entirely in the set. Now consider the point $y$. We should prove that every point on the line segment joining $x$ and $y$ (except possibly the end point $y$) is an interior point. Consider the “cone” with $y$ as an end point and a section of the ball passing through $x$ as a base.
This cone is entirely in the set because of convexity. Hence for any point on the line between and , we will be able to find a ball that fits in this cone and hence the set . See the figure
Proof: Let be interior points of set . By Lemma 7, it follows that
Since also belongs to the interior, the line segment .
Proof: Consider the point a convex combination with equal weights. We will now show that there exists an such that . This will prove that belongs to the interior of the set. To show that , it suffices to prove that any is a convex combination of .
Consider the following matrix denoted by
Observe that from the independence of points we have
since $\sum_i \theta_i x_i = 0$ and $\sum_i \theta_i = 0$ imply $\theta_i = 0$ for all $i$. Hence the matrix is invertible and its eigenvalues are nonzero (the condition number is finite). Now consider the following function
Observe that since is invertible both and are continuous. We have that
So from the definition of continuity we have, for any , there exists a such that
If is made very small, it follows that all the coefficients of the vector will be positive (in fact close to ). Hence any , can be represented as a convex combination
where are the coordinates of the vector . Also observe that because of the last row of .
In general a set need not have an interior. For example the interval $[0, 1]$, considered as a subset of $\mathbb{R}^2$, has an empty interior. However, it turns out that if a convex set has an empty interior, then we can find an affine space of lower dimension that contains the set, within which the set has a nonempty interior. For example, see the figure and consider the dimension of an affine space that contains the set.
Proof: First claim: The number of affinely independent points in the set cannot be greater than $n$. This is because, if there are $n + 1$ or more affinely independent points, by the previous lemma, the interior of the convex set is nonempty. This contradicts the theorem statement that the interior is empty.
Let the maximum number of affinely independent points in set be and denote them by . Let be some point in . Now consider the following set of equations:
Since the maximum number of independent points in the set is , the above set of equations has a non trivial solution. Also, observe that (think why ?) . Hence we have
Also observe that
Hence any point is an affine combination of the points . So the set . Also
hence proving the theorem.
We will now look at a nice application of Carathéodory's theorem in information theory. Let $e_i$ be essentially a vector with $1$ in the $i$th position and zero elsewhere. Before we go to the main example, we require the following lemma.
Proof: We have
Hence by Caratheodory’s theorem, there exists scalars and vectors such that
proving the theorem.
For a more general version of the Lemma look at Lemma 3 in “Source coding with side information and a converse for degraded broadcast channels”
Bounds on the cardinality of channel inputs using Caratheodory’s theorem
We will now look at the channel capacity theorem for Discrete Memoryless Channels (DMC). Please see Wikipedia and related links for more introduction on the topic of channel capacity. Refer to the technical report by M. Salehi for more advanced techniques to bound the cardinality of random variables in information theory.
Consider a channel with input from the alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$. A channel is specified by a conditional probability distribution $p(y|x)$ for each input, \ie, the probability of obtaining $y$ when $x$ is transmitted. For the discussion below, let us assume that the input and the output alphabets are discrete. Shannon proved that the channel capacity of a DMC is
$C = \max_{p_X} I(X; Y),$
where the maximization is over all the probability distributions on the input alphabet. For notational simplicity, let the input alphabet cardinality be $|\mathcal{X}| = m$. Observe that in the above optimization problem, the search domain is the set of all discrete probability distributions with mass at $m$ points, \ie, the search space is the $(m-1)$-dimensional simplex. We will now use Carathéodory's theorem to show that this search space can be reduced drastically if the cardinality of the output alphabet is small.
Let and be two discrete random variables with PMF and conditional PMF . Once the channel is specified, fixing the input distribution fixes the output distribution as . So the only flexibility we have is the distribution of the input.
The mutual information between the random variables $X$ and $Y$ is denoted by $I(X; Y)$ and equals
$I(X; Y) = H(Y) - H(Y|X),$
where $H(Y)$ is the entropy of the random variable $Y$ defined as
$H(Y) = -\sum_{y} p_Y(y) \log p_Y(y),$
and the conditional entropy is defined as
$H(Y|X) = -\sum_{x} p_X(x) \sum_{y} p_{Y|X}(y|x) \log p_{Y|X}(y|x).$
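These quantities are mechanical to compute for a discrete channel. A small sketch (the function names and the toy channels are mine, with entropies in bits):

```python
import math

def entropy(p):
    """Shannon entropy in bits of a pmf given as a list of probabilities."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def mutual_information(px, pyx):
    """I(X;Y) = H(Y) - H(Y|X) for input pmf px and channel matrix pyx,
    where pyx[x][y] = P(Y = y | X = x)."""
    ny = len(pyx[0])
    py = [sum(px[x] * pyx[x][y] for x in range(len(px))) for y in range(ny)]
    h_y = entropy(py)
    h_y_given_x = sum(px[x] * entropy(pyx[x]) for x in range(len(px)))
    return h_y - h_y_given_x
```

For a uniform binary input over a noiseless channel this gives 1 bit, and over a completely noisy channel (crossover 1/2) it gives 0, as expected.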
Observe that the entropies and mutual information depend only on the distribution of the random variables. Let be the optimal distribution on the input alphabet that achieves the capacity . As mentioned earlier, this optimal distribution induces a distribution and conditional distribution and .
Now we will show that there exists one more probability distribution on such that the same mutual information . However the new distribution will have a nonzero probability only on terms of . Observe that if then the search space will reduce.
Since is a dimensional (term) probability distribution, it belongs to the simplex and hence can be represented as a convex combination of the vertex points, \ie,
for some and and are the standard dimensional basis defined in the previous lemma. Here we have used the notation for the optimal distribution
where $x_1, \dots, x_m$ are the alphabet of $X$. We now consider the following functions on the simplex (see the previous lemma for the definition of $e_i$) and use the previous lemma to reduce the cardinality. Let $\mathcal{Y} = \{y_1, \dots, y_k\}$ be the alphabet of the output. Define, for any vector $v$ in the simplex,
We make the following observation for
where is the probability of in the optimal distribution of the channel output. Similarly it follows that
So we have
So the right hand side corresponds to the optimal distribution and the optimal entropy distribution . But by previous lemma, there exists constants and and such that
But from the definitions of , it follows that
where the weights are those given by the previous lemma. Hence it follows that choosing this distribution on the input would lead to the same optimal distribution on the output and the same conditional entropy. But observe that the new optimal distribution has positive mass only on at most $|\mathcal{Y}| + 1$ terms of the PMF. This is because each $e_i$ has only one nonzero coordinate, the $i$th coordinate. This proves the claim. Hence it is sufficient to search over all the probability distributions with positive mass on at most $|\mathcal{Y}| + 1$ coordinates.
Similar to a closure of a set, the convex hull of a set is the smallest convex set containing . We will now see some examples.
 Some basic geometric examples.
 Birkhoff–von Neumann Theorem: Consider the set of permutation matrices. Then the convex hull of this set is the set of all doubly stochastic matrices.
We will now see an interesting application of convex hull.
Proof: Without loss of generality the polynomial can be represented as
So we have
Let be a root of . Then we have and hence
Multiplying both sides with , we obtain
which implies
If then for some and hence the theorem is proved. Else if
which equals
which implies
Taking the conjugate we get the result where .
We will now look at one more theorem that looks at convex combinations.
Some observations before the proof:
 The points depend on in consideration.
 Three points can be necessary: let $x_1, x_2, x_3$ be three noncollinear points in the plane and let $c$ be the centroid of the triangle formed by these three points. Observe that a convex combination of all the three points is necessary to obtain $c$.
 Observe that the points are from and not .
Proof: Any point can be represented as a convex combination of some points in . So
where $\lambda_i > 0$ and $\sum_{i=1}^m \lambda_i = 1$. If $m \leq n + 1$ we are done. Hence we will only consider the case of $m > n + 1$. Now consider the set of equations $\sum_{i=1}^m \mu_i x_i = 0$ and $\sum_{i=1}^m \mu_i = 0$. This system of equations can be put in matrix form as
$\begin{bmatrix} x_1 & x_2 & \cdots & x_m \\ 1 & 1 & \cdots & 1 \end{bmatrix} \mu = 0.$
Since the matrix is of size $(n + 1) \times m$ and $m > n + 1$, there are nontrivial solutions of this equation (look at the rank–nullity theorem to observe that the null space is nontrivial). From now on let $\mu = (\mu_1, \dots, \mu_m)$ be one such solution. Now let $\alpha \in \mathbb{R}$. Since $\sum_{i=1}^m \mu_i x_i = 0$, the point $x$ can be represented as
$x = \sum_{i=1}^m (\lambda_i - \alpha \mu_i) x_i.$
Also observe that $\sum_{i=1}^m (\lambda_i - \alpha \mu_i) = 1$ since $\sum_{i=1}^m \mu_i = 0$. Now we will choose $\alpha$ such that $\lambda_i - \alpha \mu_i \geq 0$ for all $i$ and equals $0$ for one term, so that $x$ becomes a convex combination of fewer points. Since $\sum_{i=1}^m \mu_i = 0$ and $\mu \neq 0$, some $\mu_i$'s are positive and others are negative. If $\mu_i \leq 0$ and $\alpha > 0$, then $\lambda_i - \alpha \mu_i \geq 0$ automatically. So we need to consider only the terms with $\mu_i > 0$. For all $i$ such that $\mu_i > 0$, we want
$\lambda_i - \alpha \mu_i \geq 0,$
which implies
$\alpha \leq \lambda_i / \mu_i.$
Hence if we choose $\alpha = \min_{i : \mu_i > 0} \lambda_i / \mu_i$, then one term becomes zero and every other term remains nonnegative. So we are able to express
$x = \sum_{i=1}^m (\lambda_i - \alpha \mu_i) x_i,$
where one of the coefficients is zero, \ie, as a convex combination of $m - 1$ points. We can continue this procedure till $m = n + 1$.
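The key reduction step of the proof can be sketched directly. Here `mu` is assumed to already satisfy $\sum_i \mu_i = 0$ and $\sum_i \mu_i x_i = 0$, as in the proof; the concrete numbers are illustrative:

```python
def caratheodory_step(weights, mu):
    """One reduction step: shift the convex weights along the direction mu
    (which satisfies sum(mu) = 0 and sum(mu_i * x_i) = 0) until the first
    weight hits zero. The represented point and the total weight of 1 are
    both preserved."""
    alpha = min(l / m for l, m in zip(weights, mu) if m > 0)
    return [l - alpha * m for l, m in zip(weights, mu)]

# Three points on the real line (n = 1), so two points must suffice.
points = [0.0, 1.0, 2.0]
lam = [0.25, 0.5, 0.25]        # represents x = 0.25*0 + 0.5*1 + 0.25*2 = 1
mu = [1.0, -2.0, 1.0]          # sum(mu) = 0 and sum(mu_i * points_i) = 0
new_lam = caratheodory_step(lam, mu)
```

After one step the first and third weights vanish, leaving the same point $x = 1$ as a combination of a single point, consistent with Carathéodory's bound of $n + 1 = 2$ points on the line.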
We will now see a simple application of Caratheodory’s theorem.
Proof: Consider a sequence . If we show that there is a convergent subsequence which converges to a point in then we are done.
By Caratheodory’s theorem, each can be represented as a convex sum of points in . So
Now consider the following matrix of dimensions
Now consider the first column. Since this sequence lies in a compact set, there is a subsequence that converges to a point of the set. The indices of this subsequence are denoted by $I_1$. Now consider the second column, but consider only the rows corresponding to the index set $I_1$. Since the set is compact, there exists a subsequence that converges to some point in it. Denote the indices corresponding to this subsequence by the set $I_2$. Now consider the third column, but only the rows corresponding to $I_2$. Since the set is compact, we can again find a convergent subsequence. Denote the set of indices corresponding to this subsequence by $I_3$. Continue this till all the columns are exhausted. So we will finally have an index set corresponding to the last column.
Now consider the original sequence . It is easy to see that, the subsequence corresponding to the index set converges to a point
which belongs to the set proving the result.
1. Convex Sets
The value $\theta = 1$ corresponds to the point $x$ and $\theta = 0$ is the point $y$. Also observe that for a line $\theta$ can take any real value. When $\theta$ is restricted to $[0, 1]$, \ie, $0 \leq \theta \leq 1$, a line segment connecting $x$ and $y$ is obtained. We denote the line segment between $x$ and $y$ as $[x, y]$. Let us see some examples:
 The concept of line and line segment is very clear if and belong to .
 Let and . Then it is very clear that for all . Also it is very clear that . Hence the concept of line is well defined on the set .
 Let $A$ and $B$ be positive semidefinite matrices. Then $\theta A + (1 - \theta) B$ is well defined for $\theta \in [0, 1]$. From the properties of PSD matrices, it follows that
$\theta A + (1 - \theta) B \succeq 0.$
So the notion of line segment is well defined on the set of PSD matrices.
A set $C$ is convex if the line segment joining any two points in $C$ lies in $C$. Mathematically, the set $C$ is convex if and only if for any $x, y \in C$ and $\theta \in [0, 1]$, $\theta x + (1 - \theta) y \in C$. First observe that convexity is not a topological property, that is, it does not depend on the norm. Let us see some examples:
 the empty set is always convex.
 Provide other geometrical examples.
 The set
$\Delta = \{x \in \mathbb{R}^n : x_i \geq 0,\ \sum_{i=1}^n x_i = 1\}$
is a convex set. This is also called the $(n-1)$-dimensional simplex. Observe that this set corresponds to all $n$-valued discrete random variables.
 Any ball is convex. Let $x$ and $y$ belong to the ball $B(c, r) = \{z : \|z - c\| \leq r\}$. Then by the properties of the norm, for $\theta \in [0, 1]$,
$\|\theta x + (1 - \theta) y - c\| \leq \theta \|x - c\| + (1 - \theta)\|y - c\| \leq r.$
This implies $\theta x + (1 - \theta) y \in B(c, r)$, which implies the ball is convex. In general, $B(c, r)$ is a convex set for any norm and any $r > 0$ (Home work). Observe that the ball can be represented as $B(c, r) = \{c + r u : \|u\| \leq 1\}$ (prove it).
 Ellipsoid: Prove that the set , where is a convex set.
 Hyperplane and halfspace: Let and , then the set
is called a hyperplane. The set
is the halfspace.
 Systems of linear equations or Polyhedron: Let be vectors in and be some scalars. Then the set
is called a polyhedron. Succinctly, this can be represented as .
 The set of PSD matrices and is convex. Proof: Let and belong to .
 Voronoi tessellation: Let $x_1, \dots, x_k$ denote points in $\mathbb{R}^n$. For any point $x_i$, let $V_i$ denote the Voronoi region of the point, \ie,
$V_i = \{x \in \mathbb{R}^n : \|x - x_i\| \leq \|x - x_j\| \text{ for all } j\}.$
Prove that each Voronoi region $V_i$ is a convex set.
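One route to the exercise is to rewrite each distance comparison as a linear inequality, so the Voronoi region becomes an intersection of halfspaces and is therefore convex. A numerical sketch of that equivalence (the sites are made up):

```python
def in_voronoi_by_distance(x, sites, i):
    """x lies in the Voronoi region of sites[i] iff sites[i] is (weakly) nearest."""
    d2 = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    return all(d2(x, sites[i]) <= d2(x, s) for s in sites)

def in_voronoi_by_halfspaces(x, sites, i):
    """Same region written as halfspaces: expanding |x - xi|^2 <= |x - xj|^2
    gives the linear condition 2<x, xj - xi> <= |xj|^2 - |xi|^2."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    xi = sites[i]
    return all(2 * dot(x, [a - b for a, b in zip(xj, xi)])
               <= dot(xj, xj) - dot(xi, xi)
               for xj in sites)
```

The two membership tests agree at every point, and the second one exhibits $V_i$ as an intersection of halfspaces, which is convex by the intersection lemma above.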
A combination $\sum_{i=1}^k \theta_i x_i$ with $\theta_i \geq 0$ and $\sum_{i=1}^k \theta_i = 1$ is called a convex combination of the points $x_1, \dots, x_k$.
Proof: It is easy to see that if a set contains all convex combinations of its points, then the set is convex.
Now to prove the other way, we prove it by induction on the number of points. By the definition, it is true for two points. Let us assume it is true for any $k$ points, \ie, any convex combination of $k$ points lies in the set. Now consider $k + 1$ points. Consider
such that . Alternatively,
Observe by induction assumption, , since . So,
is a two points convex combination of and the points and hence belongs to the set .
We will now define a process of convexifying any set.