You are currently browsing the monthly archive for January 2012.

We will look a nice application of Carathéodory’s theorem in information theory. Let essentially a vector with int he -th position and zero elsewhere. Before we go to the main example, we require the following lemma.

**Lemma 1**Let denote the “vertices” (extreme points) of the simplex, \ie are the standard basis of . For , let be real valued functions on . Also, let be such that . Then there exists , , and such that

*Proof:* We have

Hence by Caratheodory’s theorem, there exists scalars and vectors such that

proving the theorem.

For a more general version of the Lemma look at Lemma 3 in “Source coding with side information and a converse for degraded broadcast channels”

** Bounds on the cardinality of channel inputs using Caratheodory’s theorem **

We will now look at the capacity theorem for channel capacity of Discrete memoryless channels (DMC). Please see Wikipedia and related links for more introduction on the topic of channel capacity. Refer to the the technical report by M SALEHI for more advanced techniques to bound the cardinality of random variables in Information theory.

Consider a channel with input from the alphabet and output alphabet . A channel is specified by a conditional probability distribution for each input, , \ie the probability of obtaining when is transmitted. For the discussion below, let us assume that the input and the output alphabet are discrete. Shannon proved that the channel capacity of a DMC is

where the maximization is over all the probability distributions on the input alphabet. Let for notational simplicity that the input alphabet cardinality and . Observe that in the above optimization problem, the search domain is the set of all discrete probability distrbutions with mass at -points, \ie, the search space is the -simplex. We will now use Caratheodeory theorem to show that this search space can be reduced drastically if the cardinality of the output alphabet is small.

Let and be two discrete random variables with PMF and conditional PMF . Once the channel is specified, fixing the input distribution fixes the output distribution as . So the only flexibility we have is the distribution of the input.

The mutual information between the random variables and is denoted by and equals

where is the entropy of the random variable defined as

and the conditional entropy is defined as

Observe that the entropies and mutual information depend only on the distribution of the random variables. Let be the optimal distribution on the input alphabet that achieves the capacity . As mentioned earlier, this optimal distribution induces a distribution and conditional distribution and .

Now we will show that there exists one more probability distribution on such that the same mutual information . However the new distribution will have a non-zero probability only on terms of . Observe that if then the search space will reduce.

Since is a dimensional (term) probability distribution, it belongs to the simplex and hence can be represented as a convex combination of the vertex points, \ie,

for some and and are the standard -dimensional basis defined in the previous lemma. Here we have used the notation for the optimal distribution

where are the alphabet of . We now consider the following functions on (see the previous lemma for the definition of )and use the previous lemma to reduce the cardinality. Let . Let be the alphabet of the output Define For any vector ,

We make the following observation for

where is the probability of in the optimal distribution of the channel output. Similarly it follows that

So we have

So the right hand side corresponds to the optimal distribution and the optimal entropy distribution . But by previous lemma, there exists constants and and such that

But from the definitions of , it follows that

where Hence it follows that choosing the distribution on would lead to the same optimal distribution on as and the same conditional entropy . But observe that the new optimal distribution.

has positive mass only on terms of the PMF. This is because has one only one coordinate, the -th coordinate. This proves the claim. Hence it is sufficient to search over all the probability distributions with positive mass on coordinates.

**Definition 1 (Convex hull)**

*The set of all convex combinations of points from a set is called the convex hull of . It is denoted by .*

*
*

Similar to a closure of a set, the convex hull of a set is the smallest convex set containing . We will now see some examples.

- Some basic geometric examples.
- Birkhoff – Von Neumann Theorem: Consider the set of permutation matrices . Then the convex hull of this set is the set of all doubly stochastic matrices.

**Hw 1**Prove that the convex hull of a set is convex.

We will now see an interesting application of convex hull.

**Theorem 2 (Gauss-Lucas Theorem)**

*Let be a -th degree polynomial with real coefficients. By fundamental theorem of algebra there are -roots of the polynomial (they might be complex). Call the roots be the set of all the roots. Observe that each root can be represented as a point in the plane . Then the roots of the degree polynomial lies in the convex hull .*

*Proof:* Without loss of generality the polynomial can be represented as

So we have

Let be a root of . Then we have and hence

Multiplying both sides with , we obatin

which implies

If then for some and hence the theorem is proved. Else if

which equals

which implies

Taking the conjugate we get the result where .

We will now look at one more theorem that looks at convex combinations.

**Theorem 3 (Caratheodory’s theorem )**

*Let . Then any point can be represented as a convex combination of points of*

*
*Some observations before the proof:

- The points depend on in consideration.
- points are necessary. Let be three non-collinear points. Let be the the centroid of the triangle formed by these three points. Observe that the convex combination of the three points is necessary to obtain .
- Observe that the points are from and not .

*Proof:* Any point can be represented as a convex combination of some points in . So

where and . If we are done. Hence we will only consider the case of . Let . Now consider the set of equations and . This system of equations can be put in matrix form as

Since the matrix is of size , and , there are non trivial solutions of this equation (look at rank-nullity theorem to observe that ). From now on Let be one such solution. Now let . Since , the point can be represented as

Also observe that since . Now we will choose such that and equals for one term so that becomes a convex combination. Since implies some ‘s are positive and others are negative. If for some , , then . So we need to consider only the terms with . For all such that , we want

which implies

Hence if we choose , then one term and every other term will be positive. So we are able to express

where one of the coefficients is zero, \ie, as a convex combination of points. We can continue this procedure till

We will now see a simple application of Caratheodory’s theorem.

**Lemma 4**If a set is compact, then its convex hull is also compact.

*Proof:* Consider a sequence . If we show that there is a convergent subsequence which converges to a point in then we are done.

By Caratheodory’s theorem, each can be represented as a convex sum of points in . So

Now consider the following matrix of dimensions

Now consider the first column . Since this sequence lies in and is compact, there is a sub-sequence in that that converges to a point . The indices of this subsequence are denoted by . Now consider the second column, but consider only rows corresponding to the index set . Since a compact set, there exists a subsequene of that converges to some point in . Denote the indices corresponding to this subsequence by the set . Now consider the third column corresponding to but only corresponding to the rows . Since and is compact, we can find a subsequence which converges to . Denote the set of the indices corresponding to this sub sequence by . Continue this till all the are exhausted. So we will finally have an index set corresponding to the last column.

Now consider the original sequence . It is easy to see that, the subsequence corresponding to the index set converges to a point

which belongs to the set proving the result.

*
***1. Convex Sets **

**Definition 1 (Line)**

*Let and be two points. Then the line passing through and can be represented as*

The value corresponds to the point and is the point . Also observe that for a line can any real value. When is restricted to , \ie, , a *line segment* connecting and is obtained. We denote the line segment between and as . Let us see some examples:

- The concept of line and line segment is very clear if and belong to .
- Let and . Then it is very clear that for all . Also it is very clear that . Hence the concept of line is well defined on the set .
- Let and . Then it is very clear that for . From the properties of SD matrices, it follows that
So the notion of line segment is well defined on .

A set is convex if the line segment joining any two points in lies in . Mathematically the set is convex if and only if for any implies . First observe that convexity is not a topological property, that is it does not depend on the norm. Let us see some examples:

- the empty set is always convex.
- Provide other geometrical examples.
- The set
is a convex set. This is also called as the -dimensional simplex. Observe that this set corresponds to all valued discrete random variables.

- Any ball is simplex. Let and belong to the ball . Then by the property of norm
This implies which implies is convex. In general, is a convex set for any and (Home work). Observe that the ball can be represented as (prove it)

- Ellipsoid: Prove that the set , where is a convex set.
- Hyperplane and halfspace: Let and , then the set
is called a hyperplane. The set

is the halfspace.

- Systems of linear equations or Polyhedron: Let be vectors in and be some scalars. Then the set
is called a polyhedron. Succinctly, this can be represented as .

- The set of PSD matrices and is convex. Proof: Let and belong to .
- Voronoi tessellation: Let denote points in . For any point let denote the Voronoi region of the point , i.e.m
Prove that is a Voronoi set.

**Definition 2 (Convex combination)**

*Let denote a set of points. Let and . Then*

is called a convex combination of the points .

*
***Theorem 3**A set is convex if and only if it contains all convex combinations of its points.

*Proof:* It is easy to see that if a set contains all convex combinations of its points, then the set is convex.

Now to prove the other way. We prove it by induction on the number of points. By the definition, it is is true for . Let us assume it is true for any points, \ie, any convex combination of -points lies in the set. Now consider points . Consider

such that . Alternatively,

Observe by induction assumption, , since . So,

is a two points convex combination of and the points and hence belongs to the set .

We will now define a process of convexifying any set.

**Definition 4 (Convex hull)**

*The set of all convex combinations of points from a set is called the convex hull of . It is denoted by .*

*
*Similar to a closure of a set, the convex hull of a set is the smallest convex set containing . We will now see some examples.

- Some basic geometric examples.
- Birkhoff – Von Neumann Theorem: Consider the set of permutation matrices . Then the convex hull of this set is the set of all doubly stochastic matrices.

**Hw 1**Prove that the convex hull of a set is convex.

*
***1. Linear Algebra **

Let be a real valued matrix. The matrix can be considered as a linear operator from to . Let us see what a linear operator is: A function is called a linear operator (functional) if for any , and ,

It turns out that on , the only linear functional’s are matrix operators, \ie, the linear functionals are of the form,

For every linear functional, there are a couple of important associated spaces.

- Range space: The range space of the matrix is a subspace of and is defined as
A space is a subspace if for scalars and in the space, belongs to the space.

Let be the columns of , \ie, . Let ,

This means that is a linear combination of the columns of . Alternatively is the space spanned by the columns of . Hence the Range space of is also the column space of .

- Null space or Kernel space: The kernel space of a matrix is a subspace of of is
Essentially, null space is the set of solution to the homogeneous equation . Let denote the row of . Then

So requires that , , , . This implies that , , , . So for any scalars and the linearity of the inner product,

So if and only if is orthogonal to the subspace spanned by the rows of the matrix . From the previous argument observe that the set of such that equals the orthogonal subspace to the row space of or the column space of , \ie, . So

The following is a fundamental theorem in linear algebra

**Theorem 1**Let . Then

This theorem means that any vector of belongs to or or can be expressed as a sum of two vectors one belonging to and the other to .

**Lemma 2**(Fredholm Alternative) has a solution if and only if for any such that , \ie, , implies , \ie, .

**Hw 1**Let . Show that and are subspaces.

**Lemma 3**Consider the solution of . Let and the range space of . Let be a solution to . Then any solution to the equation belongs to the set

*Proof:*We have . So the equation to be solved can be reformulated to be

So any solution of the original equation is of the form , where belongs to the kernel of . Hence .

Some observations:

- So has a unique solution only when the kernel of has only one element, \ie the zero vector. This means that the dimension of is zero.
- Also observe that is one to one only if .

We will now see when the linear map is continuous.

**Theorem 4**The map is a continuous function if any matrix norm of is finite.

*Proof:*Let and a induced matrix norm. So we have to show that in . Let . Then

Since and , which proves the convergence and hence continuity.

Also since all norms are equivalent, it is sufficient that be finite with respect to one norm.

We will now describe the rank of a matrix. The rank of a matrix is the dimension of the range space of . The following is a fundamental theorem of linear algebra

**Theorem 5**The dimension of the range (or the column space) of a matrix equals the dimension of the range of (or the row space) and equals the rank of the matrix. Or equivalently, The dimension of the row space equals that of the column space

*Proof:* Follows from rank decomposition where the matrix is the matrix of independent vectors of column space. ……

**Theorem 6**Rank-nullity theorem: Let , then

*Proof:*Let denote the dimension of the null space or . Hence there exists a basis of . We can add vectors , so that form a basis of . From the definition of the range space,

which by the linearity equals,

Observe that the dimension of the RHS is if all the vectors are linearly independent. Suppose there exists some scalars so that

Which implies

\ie, . Hence there exists such that

which implies

which is a contradiction since we assumed are a basis of unless which proves our theorem.

**Lemma 7**So the linear map is invertible if and only if either of the conditions hold true

- , \ie, is one-one.
- is onto, \ie,

*Proof:* Follows from the rank nullity theorem.

So the homogeneous set of equations has a non trivial solution only when .

** 1.1. Determinant **

The determinant of a matrix is a function from the space of square matrices to real numbers. One definition of determinant is that it is the volume of the parallelepiped formed by the row or column vectors.

(Leibniz formula)

Look at the following Wikipedia page for more details about the permutation group and the sign of permutation . From this definition it follows that the determinant is a continuous function of the matrices. Very formally, the determinant if the unique function on matrices that is -linear alternating in the columns and takes a unit value on the identity matrix. Some properties of the determinants:

- .

** 1.2. Eigenvalues **

A vector is an eigenvector of an matrix if there exists a scalar such that

The scalar is called the eigenvalue of the matrix . The eigenvalues can alternatively be characterized as the roots of the polynomial equation

The set of eigenvectors form a subspace (Why?) and is called the eigenspace. If there are independent eigenvectors then the matrix can be decomposed

where the -eigenvectors form the columns of . Some facts about eigenvalues:

- .

** 1.3. Symmetric matrices **

A matrix is symmetric if . The space of Symmetric matrices is denoted by . We have the following properties of a symmetric matrix

- The dimension of a symmetric matrix is .
- The eigenvalues of a symmetric matrix are real. (The eigenvectors are real and non trivial if restricted to )
- Eigenvectors corresponding to different eigenvalues are orthogonal.
- Indeed can be decomposed as for a real orthogonal matrix .
- If denote the moralized eigenvectors of , then

** 1.4. Rayleigh quotient **

Let be a symmetric real matrix. Then the Rayleigh quotient is defined as

It turns out that Rayleigh quotient is a very important quantity in the analysis of eigenvalues. For example the following lemma characterizes the maximum eigenvalue of a symmetric matrix.

**Lemma 8**Let denote the maximum eigenvalue of the symmetric matrix . Then

*Proof:*Let be the eigen decomposition of . Then

Since , . So

where some other vector. Observe that is invertible and hence onto. So for spanning the whole of , also spans the whole of . Hence

This equals

Similarly we have the following result for the other eigenvalue.

**Hw 2**How do you characterize the other eigenvalues?

\subsubsection{Positive definite matrices} A matrix is positive definite if for any and ,

This is denoted by . The set of positive definite matrices is denoted by .

*Small digression as to why we consider only the subsets of :*Technically, a positive definite matrix need not be symmetric. Every matrix can be represented as

Observe that is skew symmetric, \ie, . Also . Hence

We have

This implies . So, . Hence to understand the positive definite property of a matrix, it suffices to consider only the symmetric part.)

Some important properties:

- A matrix is positive definite if and only if all its eigenvalues are positive.
- If then and hence invertible.
- Let be the matrix obtained by removing the rows ans the columns indexed by . Then . This implies all the diagonal elements should be greater than .
- There exists a unique matrix called the square root such that . It is easy to see that .
- If then
- If , .
- if and only if for any invertible matrix ,

(Positive seimidefinite matrices) A matrix is positive definite if for any and ,

This is denoted by . The set of positive definite matrices is denoted by .

- A matrix is positive semidefinite if and only if all its eigenvalues are non-negative.
- Let be the matrix obtained by removing the rows ans the columns indexed by . Then . This implies all the diagonal elements should be non-negative
- There exists a unique matrix called the square root such that . It is easy to see that .

** 1.5. Schur Complement **

Let . If is partitioned as

Then if then the Schur complement of in is defined as

We have the following lemma

**Lemma 9**For any symmetric matrix , that can be partitioned as

Then,

- The matrix if and only if and .
- If , then if and only if

*Proof:*

Also we have

So

Since for any matrix if and only if for any invertible matrix , This implies

**1. Compact sets **

We will now move to an important class of sets. These sets are desirable (most analysts) since they are very nice and easy to work with. There are many definitions of compact sets. Since we are in , we will use a sequence definition). There are alternative ways to define compact sets, however we will not concentrate on them (See the appendix for an alternative definition).

We first begin with a notion for a subsequence. Given a sequence a subsequence is denoted by such that .

**A set is compact if and only if for any sequence , there is a subsequence that converges to a point in .**

The above definition does not provide us with a clear picture of how compact sets look. The next theorem provides a better characterization of compact sets in .

**Theorem 1 (Bolzano–Weierstrass theorem)**

*A set is compact if and only if it is closed and bounded in*

Before we go into the proof let us see some examples of compact sets.

- The set is compact since it is closed and bounded.
- is not compact since it is not closed.
- is not compact since it is not bounded.
- The ball in is not compact since it is an open set. However the closure of the ball
is compact.

*Proof:* I will prove the theorem in . A similar proof with more notation will hold true for general . We will first prove that a closed and bounded set is compact. So we should prove that given any sequence in , there exists a convergence subsequence with limit in .

Let be a sequence in . Since is bounded, it can be fitted in a big square say , that is . Now divide the square into equal parts. So now each square is of side . Since the sequence has infinite number of points, at least one of the square should contain infinite number of points (If not we are done). Call that square . So is some square of side length . Now divide the square into equal squares of side length . Again, at least one of these squares will contain infinite number of points. Pick one such square and call it . Now again divide the square into equal squares and continue this procedure.

Observe that the side length of is . So will pick the subsequence this way: Pick any point of the original sequence from square and denote it by . Pick the second point from the original sequence with and lying in square . Call it . Continue this way to obtain a subsequence . You should always be able to do this since there are infinite number of points in any square we picked. Also observe the following: The sets are monotocnically decreasing, i.e., and they are closed. Since the arbitrary intersection of closed sets is closed, is closed. It turns out that the intersection of these sets will contain only one point which we denote by , i.e., (*It is interesting to prove this and will leave it to you*).

The claim is that the subsequence converges to the point in . It is sufficient to prove that the subsequence converses. The limit will automatically lie in the set since the set is closed. So let us now prove that the subsequence converges. Fix an . It is easy to observe that for all , will lie in and also lies in . Since the width of is it follows that

We can choose a large such that (for example ) and hence , which implies as . This proves the existence of a subsequence of any sequence converging to a point in .

The converse (showing that compact implies closed and bounded) is left as HW.

In some sense compact sets are the best kind of sets to deal with. Since they are closed all the limit points exist inside the set and the set itself is bounded.

**Hw 1 ** Are the following sets compact: (prove or disprove)

- as a subset of
- , is invertible matrix and
- as a subset of .
- . Try to prove or disprove using the sequential definition of compactness.

**Hw 2**Prove that the union of two compact sets is compact.

**Hw 3**Let be the squares defined in the previous theorem. Prove that . (Hint: Look at the coordinates of the corners of the square. See if they form a monotonic bounded sequence. )

**Hw 4**If is a compact set and is a closed subset of , i.e., then is compact.

If is a compact set, since is bounded, and exist and since it is closed the and belong to the set . We now come to the most important theorem from an optimization perspective.

**Lemma 2**Let be a compact set and be a continuous function. Then the image of the set , \ie is compact.

*Proof:* Let be a sequence in the set . Then the pre image of the sequence is in the compact set . By the definition of compactness, there is some subsequence that converges to a point in . Since the function is continuous, by the definition, converges to . This means where is some point in . Hence for every sequence there exists a subsequence which converges to a point in . Hence the set is compact.

We will see some examples soon, but first an important corollary.

**Lemma 3**If is a continuous function. Then is bounded on the set and attains its maximum or minimum in the set .

*Proof:* Since is a compact set and is continuous the image is a compact set. Since is compact it is closed and bounded. Hence the supremum and the infimum of the set exist and they belong to the set.

**Lemma 4**Any two norms and on the Euclidean space are equivalent.

*Proof:* Let be the unit sphere in with respect to the standard Euclidean norm , i.e., .

First claim: It suffices to prove that and are equivalent on .

Reason: Suppose you are able to prove that and are equivalent on , i.e.,

Let be an arbitrary point in . Observe that , i.e., the point . Hence

By the property of the norms,

which proves that

We will now prove (1). Consider the function on . Since , and and are continuous functions with respect to the standard Euclidean norm, their quotient is also continuous with respect to the Euclidean norm. Also observe that is a closed and bounded set with respect to Euclidean norm and hence compact. By the previous theorem the image is compact. Hence the supremum and infimum of the set are attained. Hence for any

which proves the result.

We will state the following theorem without proof:

**Theorem 5 (Tychonoff’s theorem)**

*If and are compact sets, then their Cartesian product is also compact. (This holds true for product of an arbitrary number of sets)*

**2. Appendix (optional) **

The following characterization of compact sets is fundamental compared to the sequential definition as it depends only on the underlying topology (open sets)

** 2.1. An open cover description of compact sets **

An open cover of a set is a collection of sets such that . In plain English, an open cover of is a collection of open sets that cover the set . Observe that the set need not be finite. Let’s see some examples.

- Let . Then an open cover of is the set of sets . This is because . Observe that the cardinality of the cover is infinite.
- Let . Another open cover of is since . Here the cardinality of the cover is finite.
- Let . Then the set is an open cover since . Here again the cardinality of the over cover is not finite.

A set is compact if any open cover of has finite sub cover. This means that given *any* cover of compact set , you will be able to pick a finite number of subsets of the open cover to cover . The emphasis is on the word *any*. You should be able to pick a finite sub cover for any cover. Lets look at some examples:

- Consider the set . Then one open cover of is the set of sets . Let’s see if we can find a finite subcover. Observe that and belong to the previous cover and . It turns out (we will see later) that is compact.
- Consider the set . Then one open cover of is the set of sets . Let’s see if we can find a finite subcover. Observe that and belong to the previous cover and . Let see if we find an open cover of which has no finite sub cover.

Consider the open cover . This is an open cover of . You can check, how much ever you try you cannot pick a finite number of these sets to cover . is not a compact set. As this example shows, the finite subcover property should hold for any open cover.

The following theorem shows that the earlier definition of compact sets based on sequences and this definition based on open sets are both equivalent on .

**Theorem 6 (Heine–Borel theorem)**

*A set is compact if and only if is bounded and closed.*

**1. Continuous functions **

A function is continuous at if for any sequence , . A function is continuous if it is continuous on every point of its domain. This is a very intuitive definition. Observe that the definition does not restrict the function to be in real spaces. Let us know see some examples. Let us see an example of a discontinuous function for the definition to be clear.

It is obvious that this function is discontinuous. However, we will use the definition to make this point. Consider the sequence . We have that . We have

since . However we have

So,

Since we are able to find a sequence such that which implies the function is not continuous at . One important observation is the following which I state as a lemma.

**Lemma 1**For any sequence and a function

if and only if is continuous.

**So before taking the limit operation inside a function, it is necessary to check if the function is continuous. ** Alternatively, a function is continuous at if for any , there exits a such that whenever . This is just a reformulation of the continuity. Observe that the notion of continuity depends on the notion of Balls and norm. (It turns out that in norm doesn’t matter since all norms are equivalent).

Let be some norm in . Consider the function . We will now show that the function is continuous with respect to the topology generated by the the Euclidean norm .

**Lemma 2**Let denote some norm on . Then the function is a continuous function with respect to the Euclidean norm.

*Proof:* Let be some sequence in and let converge to in the norm. This means that as becomes large. To prove continuity, we will have to show that as . From triangle inequality,

Hence rearranging and taking absolute value

Let be the standard basis of , i.e., a vector with in the -th position. Then we have

where is the -th coordinate of (recall that . Again using triangle inequality (or the property of the norm),

Observe that is a real number. Using the scaling property of a norm,

Now using Cauchy-Schwartz inequality,

Hence

where is some finite constant. So,

as . Hence converges to as . This implies that the function is continuous.

But we have for all , . So

Hence converges to . This implies is continuous.

While the definition is analytically good, its not “easy” to check if a function is continuous at every point. We now provide a stronger version of continuity that is often useful and desired. A function is Lipschitz continuous, if there exists a constant , such that for any in the domain of the function

is called the Lipschitz constant. It is easy to see that every Lipschitz continuous function is continuous. Let . Hence

Since is continuous we can move the limit inside the norm on the RHS.

This implies which implies the continuity of . Indeed we can show that every Lipschitz function is also differentiable and its derivative bounded. Examples of Lipschitz functions:

- is Lipschitz function with constant .
- and are Lipschitz function with constant .
- is Lipschitz function with constant .

We will now look at an alternative definition of continuity that uses open sets.

**Lemma 3**A function is continuous if and only if the inverse image of any open set is an open set. More precisely, is open for any open set .

*Proof:* We will prove that for a continuous function, the inverse image of an open set is open.

- Let . This implies .
- Since is an open set, there exists an such that .
- Since is continuous, there exits a such that if , then . This implies that
- This implies that .
- Hence is an open set.

The converse can be proved similarly (HW).

Observe that the ambient spaces ( and ) are important. A function can be continuous in one space and not in another. Look at on and .

Let us check for the constant function, . Its range is which is a closed set. Now take any open set in . Then which is open. Hence the function is continuous. Let us look at the step function that I have defined earlier. Consider the open set . Its inverse image is and equals . Observe that is not an open set and hence the function is not continuous.

**Hw 1**Use this definition to check the continuity of .

**Hw 2**Is the function continuous on ?

**1. Open sets **

On the standard topology can be described using the balls generated by the Euclidean norm. We begin with the central object of topology, an open set.

**Definition 1**A set is open if for every , there exists an such that .

Simply put, a set is open if for every point in the set we can find a radius such that a ball of radius centered around the point fits entirely in the set. Observe that the can be arbitrary small and also can depend on the position . We now provide a few examples.

- By convention, empty set is an open set.
- Let , \ie, the real line. The interval is an open set. It is obvious that if , we can choose and will fit entirely in . For any , choosing would suffice. Also observe that we do not have to worry about the end points and since they do not belong to .
- Observe that if and are open intervals then is open. This proof is obvious.
- If and are open sets in , show that are open sets.
- is open in .
- is not an open set. Reason out why? (Check what happens when you try to center a ball around ).
- A -dimensional ball is an open set in . (Prove it)

**Hw 1**Check if the following sets are open subsets of

please provide reasons.

**Theorem 2**Let and be two open sets. Then the intersection of the sets is open.

*Proof:* Let . This implies that belongs to both and . Since is an open set, there exists an such that and an such that . It is easy to see that, , which implies is open.

The previous two results can be extended to any finite collection of sets by induction. More precisely, if are open sets. Then is open. Let us see an example. If and , then their intersection is which is an open set. Let us also see an example which highlights the importance of finite number of sets. Let

Observe that is an open set. It is also easy to observe that the intersection of or for that case any finite number of is open. However,

which is closed. Hence one cannot claim that the intersection of an arbitrary number of open sets is open.

It turns out that the union of open sets is always open regardless of the number of sets.

**Theorem 3**Let be any set. Then if , is an open set for every , then their union

is always open.

*Proof:* HW

The set in the above theorem is called an indexing set. For example or or . So the number of sets in consideration depends on the cardinality of the index set. The above theorem indicates that for any cardinality of the index sets , the union is still open.

**Hw 2**This problem emphasizes the fact that the notion of a set being open (or closed) depends on the ambient space in consideration. As a subset of check if is an open set. Is an open set of ? (Use the definition of open set)?

**Hw 3**Let . Is the set an open subset of ? “” denotes set minus.

**Hw 4**Let denote the set of rational numbers in . That is

Is the set an open set of ?

**Hw 5**Let be a invertible matrix and belong to the range space of . Show that the set

is an open subset of . What happens if is not invertible?

**Hw 6**The following exercise shows the properties of sequences in an open set. Let and . Verify the following:

- Does ?
- What is the limit ? Does it belong to the set ?
- Let . Does the limit of the sequence belong to ?

This shows that even though a sequence belongs to an open set, the limit of the sequence (if it exists) may not belong to the set.

**Hw 7**Observe that the definition of an open set depends on the notion of a Ball which in turn depends on the norm used. Suppose is an open set with respect to the norm (the standard Euclidean norm). Is the set an open set in , when we use a different norm, for example the norm? Hint: Use the equivalence of norms in and the definition of open sets.

**2. Closed sets **

A set is a closed set if its complement is open.

So is a closed set in since its complement is an open set. Any set with finite cardinality (for example or ) is a closed set. Also observe that the entire set is both a closed and open set with respect to . So a closed or an open set need not be *bounded*. Also by convention, the empty set is a closed set.

**Hw 8**Check if the following sets are closed subsets of

Explain why.

From the definition of closed sets, we have the following important property which we state as a theorem.

**Theorem 4**Let be an open set in . Then its complement denoted by is closed.

We will now use the definition of closed sets and De-Morgan laws to characterize the union and intersection of closed sets.

**Theorem 5**Let and be two closed sets. Then the union of the sets is closed.

*Proof:* We have Since and are open sets (complements of closed set), it follows from the open sets intersection theorem that is open. Hence is open implying its complement is closed.

Observe that as in the open set case, the above theorem can be extended to any finite collection of closed sets. It turns out that the intersection of closed sets is always closed regardless of the number of sets.

**Theorem 6**Let be any set. Then if , is a closed set for every , then their intersection

is always closed.

*Proof:* HW

If are closed sets. Then is closed and follows from the previous result by induction.

While the above definition characterizes a closed set, we now look at an alternate definition of closed sets that is more intuitive. For that we require the notion of a limit point.

**Definition 7**is a limit point of a set if for every , is non-empty.

So a point is a limit point of a set when every ball centered around the point has a non-empty intersection with the set. For example if then every point of is a limit point of the set. In addition, the points and are limit points of the set (check why?). However is not a limit point of the set (Why?). So one can think of a limit point as a point in the set, or a point which is *arbitrary close* to some point in the set.

**Hw 9**Find the limit points of the following sets.

- .
- the set of rational numbers.

**Theorem 8**A set is closed if and only if it contains all of its limit points.

*Proof:* We we prove by contradiction. Let be a limit point of the closed set and let . We will provide a contradiction.

Since is closed, is open. Also since implies . Since is open, by the definition of an open set, there exists an such that . This implies that for this . Hence is not a limit point of . This contradicts our assumption.

**Hw 10**Prove that the supremum and the inifimum of a closed set belong to the set.

The closure of a set denoted by equals the union of the set and all its limit points, \ie,

So closure of equals .

We will now define the interior and the boundary of a set. Before that we require the notion of an interior point. Let be a point such that there exists a ball of some radius such that the ball fits entirely in . The set of all such points of is called the interior of and is denoted by

The definition of the interior is similar to that of an open set. In fact the interior of a set is the largest open set contained in . The interior of the set is the set itself. The interior of the set is the set . The interior of the set is the set . *Also observe that the set might not have any interior.*

**Hw 12**Find the interior of the sets in HW problem 11

**Hw 13**What is the interior of the set as a subset of (with respect to the metric on ). What is its interior as a subset of (with respect to the metric on ).

The above exercise shows that the ambient space is important to the definition of the interior of a set.

**Hw 14**Prove that a set is open if and only if all of its elements are inter points, \ie . Hint: Look at both the definitions.

The boundary of a set denoted by equals the set difference between the closure of the set and the interior of the set, \ie,

For example the boundary of the set are the points . The boundary of the set is the set .

**Hw 15**What is the boundary of the set as a subset of (with respect to the metric on ). What is its boundary of the above set as a subset of (with respect to the metric on ).

**Hw 16**Find the boundary of the sets in Exercise 11

**Hw 17**Prove that the boundary of a set is a closed set.

A set is bounded if there exists a ball of radius centered around the origin that contains the set. The square is bounded while the set is not bounded.

This lecture notes will cover the basics of Analysis and Linear Algebra that is required for the optimization theory course EE5121. This is by no stretch a comprehensive treatment of either Real Analysis or Linear Algebra. There are many excellent books that deal with both these subjects. Here are a few references:

- Andrei Nikolaevich Kolmogorov, Sergei Vasilevich Fomin, “Introductory real analysis”.
- H. L. Royden, “Real Analysis”.
- Walter Rudin, “Principles of mathematical analysis”.
- Gilbert Strang, “Introduction to linear algebra”.
- Roger A. Horn, Charles R. Johnson, “Matrix analysis”.

**1. Real numbers **

The set of real number is denoted by . We begin with an important property (axiom?) of the real number system. The real number system can be ordered. In plain English this means any two real number can be compared. We take this property for granted. However, it turns out to be a very important property that in general does not hold for lots of other fields.

Since any two numbers can be compared we can provide an upper bound to a set of real numbers. For example, consider the set of numbers . There are lot of real numbers that upper bound this set. Any number in the set , upper bound the numbers in while any number in the set , lower bound the real numbers in . If you consider the interval , then any number in the set upper bounds while lower bounds . It turns out, whenever a set has some upper bound and some lower bound, we can always pick the best upper bound and the best lower bound.

Don’t try to prove this as this is a distinguishing axiom of real numbers: *“Every non empty set of real numbers that has an upper bound has a least upper bound. Similarly, Every non empty set of real numbers that has a lower bound has a greatest lower bound.” * The least upper bound of a set is often denoted by and the greatest lower bound is denoted by . Also observe that the and need not belong to the set .

If we consider the set , we can see that and . For the set , there is no supremum as it is not upper bounded. Its infimum . However, if we extended the real number system by including and infimum and supremum exist for any set (they might just be and ).

**Hw 1**Find the and the of the following sets:

**Hw 2**For any , can ? Can you tell anything about the set ?

**Hw 3**Is it necessary that the supremeum belongs to the set . Similar question for the infimum. Look at the sets , and .

If the supremeum of a set belongs to the set, ie, , then the supremum is usually replaced with the maximum . So makes sense while doesn’t make sense. *So if you are unsure, always use and instead of and .*

**2. Basic Topology **

** 2.1. Norm and Inner Product **

One of the nice properties of real spaces is that we can define a norm and a notion of distance in a rather concrete manner. Most of the course deals with real functions and subsets of -dimensional real space. So unless otherwise stated the discussion will be about real spaces and functions. We first begin with the definition of a norm. A norm on is a function that maps every to a non-negative real number with the following properties:

- Let and , then .
- .
- If then

The most popular norm on is the Euclidean norm. For any point , its Euclidean norm is defined as

Other examples of norms are

**Hw 4**Prove that ,, satisfy the properties of a norm.

Two norms and are *equivalent*, if there exists two positive constants and such that for any

Observe that and do not depend on . Let us now show that and are equivalent. Let be some element in . Then

Where follows since any , . Taking square root on both sides we have, . We also have

Taking square root we obtain . Hence we have

which show that the norms and are equivalent. The following theorem shows that all norms in are equivalent.

*Proof:* The proof of this follows from a simple optimization problem. We will prove this later in the course.

**Hw 5**Show that , and norms are equivalent.

From now on we assume that we are provided with some norm on which we denote by . A norm can be used to define a distance metric as

It is easy to prove that is a metric. The only thing to prove is the triangle inequality which follows in a straight forward manner from the norm definition.

One of the most fundamental geometric object is a ball. A ball centered at and radius is denoted by and is defined as

Observe the strict inequality. This is called as an open ball.

**Hw 6**

- Plot when the norm is given by in .
- Plot when the norm is given by in .
- Plot when the norm is given by in .

Explain what equivalence of these three norms in terms of the plots.

**Hw 7**Let be a real matrix. For any define

Is a norm in a norm for any matrix ?. What are the properties of that will make a norm?

**Hw 8**Let be a real matrix. For any define

Is a norm in a norm for any matrix ?. What are the properties of that will make a norm?

**Hw 9**Let denote a symmetric matrix. A norm on the set of symmetric matrices can be defined as

Show that is a norm on symmetric real matrices.

\subsubsection{Dual norm} For any norm , its dual is defined as

We will first show that is a norm. We will show all the properties

- Homogeneity: First assume ,
Now if , then

- Suppose . This implies . However if we choose , then which is a contradiction. Hence .
- Triangle inequality:

Also observe that from the proof we observe that when . The dual norm of is again the norm, while for is the conjugate norm of , where . We have the following important lemma.

**Lemma 2**

*Proof:* From the definition we have

Since is the supremum of over all norm vectors, picking any particular one will only give a lower bound on . Pick . Observe that . Hence

which proves the result.

** 2.2. Sequences **

A sequence in (or any space), is a map from the set of integers to (or the space in consideration). We will now look at some important properties regarding the convergence of sequences. We say that a sequence converges to if for every , there exists some such that the distance between and for all is less than , \ie,

Let us think about a game: Suppose you want to argue (prove) that a sequence converges to some limit to your friend who is not convinced and challenges you. Your friend gives you an , say . Then you should show that you can find some such that the distance between the terms of the sequence after and , \ie, is smaller than the provided by your friend. Observe that you can choose whatever you want. So is provided by someone (your friend) and you will have to find the required . Now let us see some examples.

- Let for all . It is obvious that the limit of the sequence is . However, let us check it more formally. Suppose your friend gives you . You can observe that you can choose , since . Even if he had given you , would be valid.
- Let . Of course the limit is . Suppose your friend gives . Let us see how to choose the correct . So we want
which is always true. So you can pick . Now suppose your friend challenges you for , then you would want

This is possible if . So you can answer your friend that the corresponding to is .

- Let except that we replace some values as follows: , , and . You claim that the sequence still converges to . Then your friend challenges to . So what do you pick as. As last time picking will not work since . But you can see that choosing will work, since for all . So only the values of for large will matter and fluctuations for a finite set of values wont matter.

**Hw 10**Formally prove the following limits.

Let . Observe that for all . However, the limit does not belong to the set . In the coming sections, we will characterize sets so that when the sequence belong to the set, the limit also belongs to the set.