Before we proceed further, in the above corollary, we observe that  the property of a linear function was used only in a small part of the proof of corollary 8 in last post. However, observe that the proof (for maximum) would follow through if the function ${f(x)}$ satisfies the following properties:

1. ${f(a x) = a f(x)}$, ${a>0}$.
2. sub-additive property: ${f(x+y) \leq f(x)+ f(y)}$

Such functions are called as sub-additive functions. So a sub-additive function which is continuous achieves its maximum on a vertex of a compact convex set. An example of sub-additive function is the norm ${\|x\|}$. Another example is the support function of a set (we will look at this function later).

We will now look at some interesting examples of extreme points and maximization of linear functionals. We will first begin with the following motivational example.

Examples:

Assignment problem (AP): The AP arises in many contexts and is an example of binary optimization. Here we will motivate the problem in the context of wireless and later generalize the notion.

OFDM (orthogonal frequency division multiplexing) is a multi-carrier modulation which is being used in 4G networks. In OFDM,the data is modulated over different tones in such a way that orthogonality can be maintained across tones.

In general, a BS tower serves multiple users. When using OFDM, the BS generally has to decide the allocation of different tones to different users (OFDMA). Assume for now there are ${n}$ frequency tones and there are ${n}$ users. In general the users are distributed in space. Hence each frequency tone will have a different “channel” with respect a user. So let ${h_{ij}}$ denote the channel of tone ${i}$ from the BS to the user ${j}$. A good channel in general indicates a higher data rate on that particular tone. A simple map from the channel quality to the data rate obtained is ${R_{ij} = \log_2(1+h_{ij}^2)}$. So the BS wants to allocate one frequency tone per user and allocate all the tones, such that the sum rate is maximized. For example if ${n=4}$, a sample allocation is given by ${\sigma =\{3,2,4,1\}}$, which implies the first tone is allocated to the third user, the second tone to the second user, the third to the fourth user and the fourth to the first user. Observe that any allocation is just a permutation of the integers ${\{1,2,...,n\}}$. So mathematically the BS should find a permutation ${\sigma}$ such that the allocated sum rate

$\displaystyle \sum_{i=1}^n R_{i \sigma(i)},$

is maximized. Observe that if the BS has check for every permutation of ${1,...,n}$ before it can find the permutation that maximizes the sum rate. Suppose there are ${20}$ tones, then the search space is approximately ${20!\approx 2\times 10^{18}}$ which is huge.

In a little while using the extreme points and Krein-Milman theorem, we will come up with a smarter way of solving this problem in polynomial complexity.

We will now start looking more closely at the vertices (extreme points) of a polyhedron. Most of the material is taken from AB. We will first begin with the following theorem that characterizes the vertices of a polyhedron in terms of linear dependence of the vectors.

Theorem 1 Let ${a_i, i=1..,n}$ be ${n}$ vectors in ${{\mathbb R}^d}$ and ${b_i \in {\mathbb R}}$. Let ${P}$ be a polyhedron defined as

$\displaystyle P=\{x: x\in {\mathbb R}^d, a_i^t x \leq b_i, i = 1...n\}.$

For any point ${u \in P}$ define

$\displaystyle I(u)= \{i; a_i^t u = b_i\}.$

Then ${u}$ is a vertex of ${P}$ if and only if the set of vectors ${\{a_i, i \in I(u)\}}$ linearly span ${{\mathbb R}^d}$.

Proof: Let us first prove that vertex implies linear span. We will prove by contradiction.Let ${u}$ be a vertex of ${P}$. Suppose ${\{a_i, i \in I(u)\}}$ does not span ${{\mathbb R}^d}$. For notation let the cardinality of ${I(u)}$ be ${m=|I(u)|}$ Consider the matrix ${A}$ with ${a_i}$ as the columns, \ie,

$\displaystyle A=[a_1,... ,a_m],$

an ${d \times m}$ matrix. Since ${A}$ does not span the entire ${{\mathbb R}^d}$, the kernel space (or null space) of the matrix ${A}$ is non empty (or the dimension of the kernel is greater than ${0}$.) Hence there exists a non zero ${y \in \ker(A)}$. Since ${y \in \ker(A)}$,

$\displaystyle a_i^t y =0,\quad \forall i \in I(u). \ \ \ \ \ (1)$

We will now prove that ${u}$ is not a vertex. Let ${\epsilon >0}$ be a small number. Define

$\displaystyle X= u + \epsilon y,$

and

$\displaystyle Y= u -\epsilon y.$

Observe that ${u= (X+Y)/2}$ and hence if ${X}$ and ${Y}$ lie in the polyhedron ${P}$, then ${u}$ is not an extreme point (vertex) which is a contradiction. We will now prove that for very small ${\epsilon}$, ${X}$ and ${Y}$ indeed lie in ${P}$. To show that ${X}$ (or ${Y}$) lie in ${P}$ we should show that they satisfy ${a_i^t X \leq b_i}$ for all ${i}$. Let ${i \in I(u)}$. Then

$\displaystyle \begin{array}{rcl} a_i^tX &= a_i^t(u+\epsilon y)\\ &= a_i^tu +\epsilon a_i^t y\\ &\stackrel{(a)}{=} b_i +0 \end{array}$

${(a)}$ follows since ${a_i^t u =b_i}$ for ${i\in I(u)}$ and ${a_i^t y =0}$ since ${y \in \ker(A)}$. Now if ${i \notin I(u)}$, then

$\displaystyle \begin{array}{rcl} a_i^tX &= a_i^t(u+\epsilon y)\\ &=a_i^tu +\epsilon a_i^t y \end{array}$

Observe that ${a_i^t u . Hence if we choose ${\epsilon}$ very very small, then ${a_i^tu + \epsilon a_i^t y < b_i}$. (For example, you can choose ${\epsilon = 0.2*\min_{i=1,...,n} \frac{b_i - a_i^t u}{|a_i^t y|}}$.). Hence ${X}$ satisfies all the inequalities and hence is in the polyhedron. Similarly, you can show that ${Y}$ satisfies all the inequalities and hence in the polyhedron.

We will now prove the converse, \ie, linear span implies vertex. Suppose ${ u = (X+Y)/2}$ where ${X, Y \in P}$. We first have for ${i\in I(u)}$,

$\displaystyle a_i^t u =b_i= \frac{a_i^t X + a_i^t Y}{2}.,$

Since ${X, Y \in P}$, we have ${a_i^tX \leq b_i}$ and ${a_i^TY \leq b_i}$, which is possible only when ${a_i^tX = b_i}$ and ${a_i^TY = b_i}$. Now consider the set of equations for ${i \in I(u)}$.

$\displaystyle a_i^t z = b_i.$

Since ${a_i}$ span the entire ${{\mathbb R}^d}$, it follows that there is a unique solution to the above set of equations. However, we have ${a_i^tX \leq b_i}$ and ${a_i^TY \leq b_i}$ and ${a_i^t u =b_i}$. This implies ${u = X = Y}$ since the solution should be unique. $\Box$

Corollary 2 For any vertex ${u}$, the cardinality of the set ${I(u)}$ should be greater than ${d}$, \ie ${|I(u)| \geq d}$.

Proof: We require at least ${d}$ vectors to span ${{\mathbb R}^d}$. $\Box$

We will now analyze a particular polyhedron that is required to simplify the assignment problem. Before that we require the following definition of a permutation matrix. Suppose ${\sigma}$ is a permutation of ${1,...,n}$. Then the permutation matrix ${X_\sigma}$ is an ${n\times n}$ matrix with the ${(i,j)}$ entry given by

$\displaystyle x_{ij}=\left\{\begin{array}{cc} 1& \text{if } \sigma(i) =j\\ 0& \text{otherwise}. \end{array} \right. \ \ \ \ \ (2)$

So for a given ${n}$ there are ${n!}$ permutation matrices.

Birkhoff polytope:

Birkhoff polytope ${B_n}$ is the set of all doubly stochastic ${n\times n}$ matrices. Mathematically,

$\displaystyle B_n=\left\{\{\xi_{ij}\}_{n\times n}: \sum_{i=1}^n \xi_{ij}=1,\sum_{j=1}^n \xi_{ij}=1, \xi_{ij}\geq 0 , \forall i, j \right\}.$

First observe that ${B_n}$ is entirely defined by linear inequalities and hence a polyhedron. Also observe that ${B_n}$ is a convex set.

Hw 1 Show that ${B_n}$ is a compact set.
Lemma 3 An ${n \times n}$ permutation matrix ${X_\sigma}$ is an extreme point of ${B_n}$.

Proof: Suppose ${X_\sigma = (A+ C)/2}$ where ${A, C \in B_n}$. Consider the ${i}$-th row of all the three matrices. Suppose ${x_{im} =1}$ and all the other entries are zero. So we have

$\displaystyle 2[0, 0, ..,1,....0] = [a_{i1}, ..., a_{in}]+[c_{i1}, ..., c_{in}]$

We have ${\sum_{k=1}^n a_{ik} =1}$, ${\sum_{k=1}^n c_{ik} =1}$ and each term is non-negative. This is possible only when ${a_{im}=c_{im} =1}$ and zero for all the other entries of the ${i}$-th row. This shows that ${X_\sigma= A=C}$, which proves that ${X_\sigma}$ is an extreme point of ${B_n}$. $\Box$

We will now prove the converse of the above statement.

Theorem 4 (Birkhoff- von Neumann Theorem.) The extreme points of ${B_n}$ are precisely the set of permutation matrices.

Proof: We prove the lemma by induction. For ${n=1}$, it is an obvious statement. Let us assume it is true till dimension ${(n-1)}$ and prove it for dimension ${n}$.

Consider the following affine subspace of ${{\mathbb R}^{n^2}}$. The space ${L}$ is the set of all ${n^2}$ real numbers ${\xi_{ij} \in {\mathbb R}}$ such that

$\displaystyle \sum_{i=1}^n \xi_{ij} =1, \ \forall j=1,...,n,$

and

$\displaystyle \sum_{j=1}^n \xi_{ij} =1, \ \forall i=1,...,n.$

Observe that the dimension of the space ${L}$ is ${(n-1)^2}$. (Think why??) Observe that ${B_n\subset L}$ and is defined as the polyhedron

$\displaystyle \xi_{ij} \geq 0, \forall\ i,j$

in ${L}$.

Let ${X}$ be an extreme point of ${B_n}$. So by Theorem 1, an extreme point ${X}$ of ${B_n}$ should satisfy at least ${(n-1)^2}$ inequalities with equalities. Hence ${x_{ij} =0}$ for at least ${(n-1)^2}$ entries of the matrix. We make the following observation.

• Every row cannot have more than or equal to ${2}$ non zero entries: In this case the maximum number of zero entries would be ${n(n-2)}$ (each row can have at maximum ${n-2}$ zero entries and there are ${n}$ rows). This cannot be true, since ${n(n-2) < (n-1)^2}$.

So there is at least one row ${i_o}$ with only one non-zero entry at ${j_o}$ location. Since ${X}$ is also doubly stochastic, we have that ${x_{i_oj_o}=1}$. Hence it also follows that there can be no more non zero entries in the column ${j_o}$. Let ${\tilde{X}}$ be the new matrix obtained after removing the row ${i_o}$ and the column ${j_o}$. Observe that

1. ${\tilde{X}}$ is a doubly stochastic matrix:This is because the original matrix ${X}$ was doubly stochastic and the contribution from row ${i_o}$ and column ${j_o}$ is only ${x_{i_oj_o}}$.
2. ${\tilde{X}}$ is an extreme point of ${B_{n-1}}$: Suppose ${\tilde{X}}$ is not an extreme point of ${B_{n-1}}$, \ie, ${\tilde{X}}$ can be expressed as a sum of ${\tilde{X} = (C+D)/2}$, where ${C\in B_{n-1}}$ and ${D \in B_{n-1}}$. Then it is easy to see that by adding one more row (all zeros except at ${j_o}$) and column (all zeros except at ${i_o}$) to ${C}$ and ${D}$ at positions ${i_o}$ and ${j_o}$ will result in new matrices ${\hat C}$ and ${\hat D}$. Observe that ${X = (\hat{C}+\hat{D})/2}$.

So ${\tilde{X}}$ is an extreme point of ${B_{n-1}}$. So by induction hypothesis, we have that ${\tilde{X}}$ is some ${(n-1)\times (n-1)}$ permutation matrix ${X_\sigma}$. Now augment ${X_\sigma}$ with the ${i_o}$ row and ${j_o}$ column to obtain ${X}$. $\Box$

We will now see how to utilize this theorem to simplify the assignment problem. Recall that the AP was to minimize over all the permutations of ${1,...,n}$.

$\displaystyle \min_{\sigma}\sum_{i=1}^n R_{i \sigma(i)}.$

Observe that the cost function ${\sum_{i=1}^n R_{i \sigma(i)}}$ is equivalent to

$\displaystyle \sum_{i,j} R_{ij} x_{ij},$

where ${X_\sigma = \{x_{ij}\}}$ is the permutation matrix. So the AP equals

$\displaystyle \min_{X_\sigma}\sum_{i,j} R_{ij} x_{ij}.$

Observe that even now, ${x_{ij}}$ is restricted to be either ${0}$ or ${1}$.

Now consider a new problem P2:

$\displaystyle \min\sum_{ij} R_{ij} \xi_{ij},$

over the set

$\displaystyle \begin{array}{rcl} &\sum_{i}\xi_{ij} =1, \forall i\\ &\sum_{j}\xi_{ij} =1, \forall j\\ &\xi_{ij} \geq 0. \end{array}$

Observe that we are now maximizing over the Birkhoff polytope ${B_n}$ and ${\xi_{ij}}$ are real numbers. So we are maximizing a linear function ${\sum_{ij} R_{ij} \xi_{ij}}$ over a convex compact set ${B_n}$. Hence the maximum occurs at a vertex point of ${B_n}$. But by Birkhoff- von Neumann theorem, the vertices of ${B_n}$ are nothing but the permutation matrices. Hence solving P2 is equal to solving AP. Also observe that the new problem P2 is a linear programming problem and is very easy to solve in polynomial complexity.