You are currently browsing the tag archive for the ‘Hahn-Banach Theorem’ tag.

1. Separation Theorems

We begin with some definitions of halfspaces:

Definition 1 Let {a \in {\mathbb R}^n} and {b\in {\mathbb R}}.

  1. Open half space : {H_+ = \{x: a^tx > b\}}
  2. Open half space : {H_- = \{x: a^tx < b\}}
  3. Closed half space : {\overline{H_+} = \{x: a^tx \geq b\}}
  4. Closed half space : {\overline{H_-} = \{x: a^tx \leq b\}}
Definition 2 Let {A} and {B} be two set in {{\mathbb R}^n} and {H} a hyperplane. The sets {A} and{ B} are said to be separated by {H} if {A} lies in {\overline{H_+}} and {B\subset \overline{H_-}}. The sets are {A} and{ B} are strictly separated by {H} if {A} lies in { {H_+}} and {B\subset {H_-}}. See Figure 1.

Observe that the two sets A and B in the top figure cannot be separated by any hyperplane. Right figure: $A= {x<0}$ and $B={y>1/x, x>0.}$. Observe that while the sets can be separated, they cannot be strictly separated.

Separation theorems deal with sufficient conditions for separation of sets by hyperplane. There are numerous separation theorems with increasing order of complexity. We will look at two separation theorems. In the following discussion the norm we consider is the Euclidean norm. We begin with the following definition of distance to a set

Definition 3 The distance between a point {x} and a set {A} is defined as

\displaystyle dis(x,A)=\inf\{\|x-y\|; y \in A\}

We require the following lemma that characterize the distance of a point to a closed set. The proof can be found in standard analysis textbooks.

Lemma 4 Let {A} be a non-empty closed set and {x} be a point. Then there exists a point in {A} such that

\displaystyle \|x-u\| =dis(x,A).

Lemma 5 Let {A} and {B} be two non-empty sets in {{\mathbb R}^n}. Further, let {A} be closed and {B} compact. Then there exists two points {a\in A} and {b\in B} such that

\displaystyle \|a-b\| = \inf\{\|x-y\|; x\in A , y \in B \}.

Proof: Claim1: The function: {dis(x,A): {\mathbb R}^n \rightarrow [0,\infty) } is a continuous function. (Easy to prove. Think why).

Since {B} is a compact set and {dis(x,A)} is a continuous function, it attains a minimum on a point {b} on {B}, \ie,

\displaystyle dis(b,A)= \inf\{dis(x,A), x\in B \}. \ \ \ \ \ (1)

Also by previous lemma there exists a point {a } in {A} such that

\displaystyle dis(b,A)=\|b-a\|.

Let {x }be some point in {A} and {y} some point in {B}. Then

\displaystyle \|x-y\| \stackrel{(a)}{\geq} dis(y,A)\stackrel{(b)}{\geq }dis(b,A) = \|b-a\|,

where {(a)} follows from the definition of distance to the set and {(b)} follows from (1). So for any {x \in A} and {y\in B}, we have {\|x-y\| \geq \|b-a\|}. Hence

\displaystyle \inf\{\|x-y\| , x\in A, y \in B\} = \|b-a\|,

since {b \in B} and {a \in A}. \Box

We will now look into this distance functions more closely.

Lemma 6 Let {A} be a closed convex set and {x} be a point in {{\mathbb R}^n}. Then there exists an unique point {u \in A} such that

\displaystyle \|x-u\| = dis(x,A) ,

\ie, the point {u \in A} achieves the distance. Moreover, the inner-product

\displaystyle <x-u, y-u> \leq 0,

for any {y\in A}.

Proof: Existence of some point {u} in {A} follows from Lemma 4 because {A} is closed. We will have to show that the point is unique and the inner product inequality is satisfied. We will begin with the inequality. Since {y \in A} and {u \in A}, the line segment {[y, u] \in A}. Since {\|x-u\|} is the minimum distance,

\displaystyle \|x-u\| \leq \|x - \theta y -(1-\theta)u\|= \|x-u+\theta(u-y) \|.

So we have

\displaystyle \|x-u\| \leq \|x-u+\theta(u-y) \|.

Squaring both sides and after some basic simplification (check this)

\displaystyle 2\theta <x-u,u-y>+\theta^2\|u-y\|^2\geq 0.

Observe that this inequality should hold true for all {\theta}, especially very small {\theta}. Since for very small {\theta}, {\theta^2 << 2\theta}, it follows that the above expression is non-negative for all {\theta} if and only if {<x-u,u-y> \geq 0}, which proves the inequality.

Suppose there exist one more point {u_i \in A} such that {\|x-u\| =\|x-u_1\|}. Then by the above inequality we have

\displaystyle <x-u, u_1-u>\leq 0


\displaystyle <x-u_1,u-u_1>\leq 0.

Adding both these inequalities we obtain

\displaystyle \|u-u_1\|^2 \leq 0,

which implies {u=u_1}. \Box

Geometrically, the inequality means that the angle between the vectors {x-u} and {y-u} is obtuse.

Hw 1 Let {A} be a closed convex set. Then for any {x \in {\mathbb R}^n}, from the previous lemma, we have seen that there exists a unique point {u} such which attains the minimum distance. For any {x \in {\mathbb R}^n}, call the point {u\in A} that attains the distance, the projection of the point {x} on {A}. Then we can define a projection function {f_p(x):{\mathbb R}^n \rightarrow A}. Prove that {f_p(x)} is a Lipschitz function.This function is called the projection operator. (Hint: Use the inner product inequality)

We will now look at the first separation theorem:

Theorem 7 Let {A\subset {\mathbb R}^n} and {B \subset {\mathbb R}^n} be non-empty non-intersecting convex sets. Moreover let {A} be a compact set and {B}be a closed set. Then there exists an hyperplane that strictly separates both the sets.

Proof: Since {A} is compact and {B} is closed, by Lemma 5, there exists two points {a \in A} and {b \in B} such that {a } is the closest point in {A} of the point {b} and {b} the closest point in {B} of the point {a}.

Let {x} be some point in {A}. Then from Lemma 6, we have

\displaystyle \begin{array}{rcl} <b-a,x-a> \leq 0 \end{array}

which implies

\displaystyle \begin{array}{rcl} <b-a,x> &\leq &<b-a,a>\\ &=&<b,a>-<a,a>\\ &=&\frac{1}{2}\left(2<b,a>-2<a,a>\right)\\ &=&\frac{1}{2}\left( <b,b> -<a,a> -<b,b>+2<b,a>-<a,a>\right)\\ &=&\frac{1}{2}\left( \|b\|^2-\|a\|^2 -\|a-b\|^2\right)\\ &<&\frac{1}{2}\left( \|b\|^2-\|a\|^2 \right). \end{array}

Observe the strict inequality in the last line. This is because {A\cap B = \emptyset} which implies {\|a-b\| >0}. Hence we have

\displaystyle (b-a)^t x < \frac{1}{2}\left( \|b\|^2-\|a\|^2 \right). \ \ \ \ \ (2)

Now let {y} be some point in {B}. Then from Lemma 6, we have

\displaystyle <a-b,y-b>\leq 0.

Using this inequality and similar trick as before, it follows that

\displaystyle (b-a)^t y > \frac{1}{2}\left( \|b\|^2-\|a\|^2 \right). \ \ \ \ \ (3)

Let the hyperplane be

\displaystyle H= \{z: (b-a)^Tz =\frac{1}{2}\left( \|b\|^2-\|a\|^2 \right), z\in {\mathbb R}^n \}.

It is easy to see from (2) and (3) that the hyperplane {H} strictly separates the sets {A} and {B}. \Box

The next separation theorem looks at separation of a point and an open set. Before that lets look at hyperplanes again. Earlier we have seen that a Hyperplane in {{\mathbb R}^n} is an affine space and its dimension is of dimension {n-1}. It also follows that any affine subspace of dimension {n-1} in {{\mathbb R}^n} is a hyperplane.

Lemma 8 Let {A \subset {\mathbb R}^n} be an affine subspace of dimension {n-1}. Then {A} is a hyperplane.

Proof: To show that {A} is an hyperplane, we have to find a scalar {b \in {\mathbb R}} and a vector {a \in {\mathbb R}^n} such that

\displaystyle A=\{x: a^tx=b, x\in {\mathbb R}^n\}.

Let {v \in A}. Since {A} is an affine subspace, translating the space {A} by {v}, \ie, {B=A-v} is a subspace again of dimension {n-1}. Since {B} is a subspace (which is a vectorspace) of {{\mathbb R}^n} there exists {n-1} vectors {v_1, ..... v_{n-1}} such that they form the basis of {B}. By Gram-Schmidt orthogonalization we can also assume that these vectors form an orthonormal set. Since these are orthonormal, they are linearly independent in {{\mathbb R}^n}. But since {{\mathbb R}^n} is of dimension {n}, we can find one more vector {v_n} which is orthogonal to every vector in the set {\{v_1, ....,v_{n-1}\}}. ({v_n} together with {v_1,...,v_{n-1}} will form a basis of {{\mathbb R}^n}). Since {v_n} is orthogonal to any basis vector of the subspace {B}, it follows that it is orthogonal to every vector in the subspace {B}. Indeed, it can be shown that any vector orthogonal to {v_n} should belong to {B} (think projections ?). Hence

\displaystyle B=\{x: v_n^tx=0, x\in {\mathbb R}^n\}

Since {B=A-v}, we have

\displaystyle \begin{array}{rcl} A&=&\{x+v: v_n^tx=0, x\in {\mathbb R}^n\}\\ &=&\{y: v_n^t(y-v)=0, y\in {\mathbb R}^n\}\\ &=&\{y: v_n^ty=v_n^tv, y\in {\mathbb R}^n\} \end{array}

which is the equation of a hyperplane with {a=v_n} and {b=v_n^tv}. \Box

We will now get back to the main separation theorem. The following separation theorem is a special case of geometric Hahn-Banach theorem.

Theorem 9 (Geometric Hahn-Banach theorem) Let {A\in {\mathbb R}^d} be an open convex set and {u \notin A} be some point. Then there exists a hyperplane which contains {u} and strictly isolates {A}.

Proof: Without loss of generality we can assume that {u=o}. Otherwise, we can translate both {u} and {A} so that {u=0}.

We will prove this in three stages: First we will prove the theorem for {d=2}, i.e, a plane. Consider the circle {S_1=\{(x,y): x^2+y^2=1 \}}. Now we project the convex set {A \subset {\mathbb R}^2} onto the circle in the following way:

\displaystyle f(x): A \rightarrow S_1 ,


\displaystyle f(x)= \frac{x}{\|x\|},\ x\in A.

Observe that the angle of the arc {f(A)} should be less than {\pi}. Otherwise,if it is greater than {\pi}, consider a point {x} on the arc {f(A)}. Let this point be the image of a point {x_A \in A}. Since the arc {f(A)} is greater than {\pi}, the diametrically opposite point of {x} should also be in the set {f(A)}. Call that point {y} and its inverse image as {y_A \in A}. Since {A} is convex, observe that the line segment {[x_A , y_A]} should be in the set {A} and also the line segment should contain the origin. Hence {o \in A}, which is a contradiction to our assumption.

Also, since {A} is an open set, observe that the image of {A}, {f(A)} will be an open arc on {S_1}. Now consider an end point of the arc. This can be done, since the arc is open. Now consider the line joining the end point and the center origin. This line contains the origin and the set {A} is on one side of the line. This line is the required hyperplane for two dimensions.

Now let us consider higher dimensions {d \geq 2}:

We will first now prove that there is always a line {L} (one dimension) in higher dimension that passes through the origin such that {L\cap A=\emptyset}. Consider a two dimensional plane {P} through the origin. If it does not have any intersection with {A}, pick some line in the plane and denote it by {L}. This is the required line. Suppose {P\cap A} is non-empty, it is easy to see that {P\cap A} is a convex and open set that does not contain the origin. Hence by the previous two dimensional result, it follows that there is a line {L} that passes through the origin such that {A\cap P\cap L =\emptyset}. This is the required line {L}.

Let {M} be an subspace such that {o \in M} and {M\cap A =\empty}. Also let {M} be the biggest such affine space, \ie, any other subspace with this property is contained in {M}. If the dimension of {M} is {d-1}, then by Lemma 8, it is a hyperplane and we are done (this is because {M} is an affine subspace through the origin).

We will now prove by contradiction that the dimension of such maximal set is {d-1}. Let the dimension of {M} be {k <d-1}. Since {M} is a subspace there are {k} basis vectors that generate this subspace. Call them {v_1,....,v_k}. Adding more vectors {v_{k+1}, ...,v_{d}}, the set {v_1,v_2,....,v_d} will form a basis of {{\mathbb R}^d}. So any vector {x \in {\mathbb R}^d} can be represented as a linear combination of the basis vector, i.e.,

\displaystyle x =\sum_{i=1}^k \beta_i v_i +\sum_{i=k+1}^d \beta_i v_i . \ \ \ \ \ (4)

Hence we can define the following projection function from {{\mathbb R}^{d}} to {{\mathbb R}^{d-k}} as

\displaystyle f_p(x)= (\beta_{k+1},\hdots, \beta_d).

Consider the projection of the convex set {f_p(A)}. The set {f_p(A)} is convex and open (Why?). Now from the previous argument we can find a line {L} that passes through the origin in {{\mathbb R}^{d-1}} but does not intersect {f_p(A)}. Now consider the inverse image of {L} under the map {f_p(x)}, \ie, {f_p^{-1}(L)}. From (4), we observe that this means only the last {d-k} coordinates will correspond to {L} and the first {k} coordinates to {M}. Hence it follows that

\displaystyle G=f_p^{-1}(L) = \{M+x: x\in L\} =\cup_{x\in L} (M+x).

Observe that {G} is a subspace (why ?) that contains {M} and larger than {M}. Also, it has no intersection with {A}. This is a contradiction since we assumed {M} is the largest subspace with these properties.

So{ k=d-1} and hence by Lemma 8, {M} is a hyperplane. \Box