\[ \newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\K}{\mathbb{K}} \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\defarrow}{\quad \stackrel{\text{def}}{\Longleftrightarrow} \quad} \]

This week's lecture notes are an attempt to cover (in finite time) the remaining topics listed in the course description. Most important are the spectral theorem (including basic properties of adjoints and self-adjoint operators) and the characterization of positive definite matrices. The singular value decomposition can then be understood as a generalization of the spectral theorem. Finally, one should know the Gram–Schmidt orthogonalization and the QR-decomposition.

Adjoints and decompositions

Consider a Hilbert space \(H\) over a field \(\K \in \{\R, \C\}\). Most of the results in this section will be for the cases \(H = \R^n\) and \(H = \C^n\).

Adjoints

Let \(T \in B(H)\) be a bounded linear operator \(H \to H\).

  • The adjoint of \(T\) is the operator \(T^* \in B(H)\) defined by \[ \langle Tx, y \rangle = \langle x, T^* y \rangle \quad\text{ for all }\quad x,y \in H.\]
  • \(T \) is called self-adjoint if \(T = T^*\).

> Properties of the adjoint

The adjoint is well defined: for each \(T \in B(H)\), there exists a unique \(T^* \in B(H)\). The map \(^*\colon B(H) \to B(H)\), \(T \mapsto T^*\) satisfies the following properties:

  • It is anti-linear: \( (\mu S + \lambda T)^* = \overline{\mu} S^* + \overline{\lambda} T^*\), for all \(S,T \in B(H)\) and \(\mu, \lambda \in \K\).
  • It is bounded with unit norm: \(\|T^*\| = \|T\|\).
  • It is invertible with itself as inverse: \((T^*)^* = T\).

N.b. We adopt the convention that \(T^{**} \stackrel{\text{def.}}{=} (T^*)^*\).

Proof


Existence of the adjoint: For each \(y \in H\), the map \(x \mapsto \langle Tx, y \rangle\) is a bounded linear functional, since by the Cauchy–Schwarz inequality and the boundedness of \(T\), \[ |\langle Tx, y \rangle| \leq \|Tx\| \|y\| \leq \|T\| \|x\| \|y\|. \] The Riesz representation theorem thus guarantees that there exists a unique element \(y^* \in H\) with \[ \langle Tx, y \rangle = \langle x, y^* \rangle. \] Define \(T^* y := y^*\).

Linearity and boundedness of the adjoint: Since \(\lambda y_1^* + \mu y_2^*\) is the unique element representing the functional \(x \mapsto \langle Tx, \lambda y_1 + \mu y_2\rangle \), the map \( y \mapsto T^* y\) is linear. It is also bounded: we have \[ |\langle x, T^*y \rangle| = |\langle T x, y \rangle| \leq \|T\| \|x\| \|y\|, \] so by choosing \(x = T^* y\) we obtain \[ \| T^* y \|^2 \leq \| T \| \| T^* y \| \|y \| \quad \Longrightarrow\quad \| T^* y \| \leq \| T \| \|y \|, \] so that \( \|T^*\| \leq \|T\|\). But since \(T^{**} = T\) (proved below), we also obtain \( \|T\| = \|(T^*)^*\| \leq \|T^*\|\), whence \( \|T^*\| = \|T\|\).

Anti-linearity and boundedness of \(^*\): Since \[ \langle x, (\mu S + \lambda T)^* y \rangle = \langle (\mu S + \lambda T)x, y \rangle = \mu \langle Sx, y \rangle + \lambda \langle Tx, y \rangle = \mu \langle x, S^* y \rangle + \lambda \langle x, T^* y \rangle = \langle x, (\overline{\mu} S^* + \overline{\lambda} T^*) y \rangle, \] the map \(^*\) is anti-linear: \[ (\mu S + \lambda T)^* = \overline{\mu} S^* + \overline{\lambda} T^*. \] Since \(\|T\| = \|T^*\|\), it follows that \(^*\) is bounded with norm \(1\).

Invertibility of * : \[ \langle Tx, y \rangle = \langle x, T^* y \rangle = \overline{\langle T^* y, x \rangle} = \overline{\langle y, T^{**} x \rangle} = \langle T^{**} x, y \rangle, \] so that \(\langle (T - T^{**})x, y \rangle = 0 \) for all \(x,y \in H\). Choose \(y = (T - T^{**})x\). Then \[ \| (T - T^{**})x \|^2 = 0 \quad\text{ for all }\quad x \in H, \] meaning that \(T = T^{**}\) in \(B(H)\).


The adjoint of a matrix

For matrices, we extend this definition (in that \(A\) need not map \(H\) to \(H\)):

  • The adjoint of \(A \in M_{m \times n}(\R)\) is its transpose \(A^t \in M_{n \times m}(\R)\).
  • The adjoint of \(A \in M_{m \times n}(\C)\) is its conjugate transpose \(A^* \in M_{n \times m}(\C)\).

N.b. In the case \(m=n\), this agrees with the definition above if one adopts the standard inner product on \(\K^n\).
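
As a quick numerical check (a minimal sketch, assuming NumPy is available; the matrix and vectors below are arbitrary examples), one can verify that the conjugate transpose satisfies the defining identity \(\langle Ax, y \rangle = \langle x, A^* y \rangle\) with respect to the standard inner products on \(\C^n\) and \(\C^m\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary complex example: A maps C^2 -> C^3.
A = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
x = rng.standard_normal(2) + 1j * rng.standard_normal(2)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

def inner(u, v):
    # Standard inner product, linear in the first argument: <u, v> = sum_j u_j conj(v_j).
    return np.vdot(v, u)

A_star = A.conj().T  # conjugate transpose

print(np.isclose(inner(A @ x, y), inner(x, A_star @ y)))  # True (up to rounding)
```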

> Self-adjoint matrices are symmetric or hermitian

  • If \(T\in B(\R^n)\) is realized by a matrix \(A \in M_{n \times n}(\R)\), \(T\) is self-adjoint if and only if \(A\) is symmetric.
  • If \(T\in B(\C^n)\) is realized by a matrix \(A \in M_{n \times n}(\C)\), \(T\) is self-adjoint if and only if \(A\) is hermitian.

Proof


Real case: \[ \langle Tx, y \rangle = \langle x, Ty \rangle \quad\Longleftrightarrow\quad y^t A x = (Ay)^t x = y^t A^t x. \] By considering \(x = e_j\), \(y = e_k\) we get that \(A_{jk} = A^t_{jk}\) for all \(j,k = 1, \ldots, n\).

Complex case: \[ \langle Tx, y \rangle = \langle x, Ty \rangle \quad\Longleftrightarrow\quad y^* A x = (Ay)^* x = y^* A^* x. \] By considering \(x = e_j\), \(y = e_k\) we get that \(A_{jk} = A^*_{jk}\) for all \(j,k = 1, \ldots, n\).


> Self-adjoint operators have real spectrum

The eigenvalues of a self-adjoint operator are real, and eigenspaces corresponding to different eigenvalues are orthogonal.

Proof


If \(T = T^*\), then \[ \langle Tx, x \rangle = \langle x, Tx \rangle = \overline{\langle Tx,x \rangle} \in \R \quad\text{ for all } x \in H. \] Hence, if \(\mu, \lambda \in \C\), \(\mu \neq \lambda\), are eigenvalues of \(T\), with eigenvectors \(x, y\), respectively, then \[ \mu\|x\|^2 = \langle \mu x, x \rangle = \langle Tx, x \rangle \in \R, \] so that \(\mu \in \R\) (and, similarly, \(\lambda \in \R\)). Then \[ (\mu - \lambda)\langle x, y \rangle = \langle \mu x, y \rangle - \langle x, \lambda y \rangle = \langle Tx, y \rangle - \langle x, Ty \rangle = 0, \] since \(T\) is self-adjoint. Thus \(x \perp y\).


Unitary operators and orthogonal matrices

  • An operator \(U \in B(H)\) is called unitary if \(U U^* = U^* U = \mathrm{Id}\). Similarly, a matrix \(A \in M_{n \times n}(\K)\) is called unitary if the corresponding operator is unitary, i.e., if \(A^* A = I\).
  • A unitary real matrix \(Q \in M_{n \times n}(\R)\) is called orthogonal.

> Unitary operators preserve inner products

If \(U \in B(H)\) is unitary, then \[ \langle Ux, Uy \rangle = \langle x, y \rangle, \quad \text{ for all} \quad x,y \in H.\] In particular, \(U\) is an isometry.

Proof


\[ \langle Ux, Uy \rangle = \langle x, U^* U y \rangle = \langle x, y \rangle. \]


> Lemma: there are no nontrivial nilpotent self-adjoint operators

If \(N = N^*\) satisfies \(N^k = 0 \) for some \(k \in \N\), then \(N = 0\).

Proof


If \(N^2 = 0\), consider \[ \| Nx \|^2 = \langle Nx, Nx \rangle = \langle x, N^2 x \rangle = 0, \] to see that \(Nx = 0\) for all \(x \in H\).

Otherwise \(N^2 \neq 0\), so the smallest integer \(k_0\) with \(N^{k_0} = 0\) satisfies \(k_0 \geq 3\) (such a \(k_0\) exists since \(N^k = 0\)). Put \(m := \lceil k_0/2 \rceil\), so that \(2 \leq m < k_0\) and \(2m \geq k_0\). Since \((N^m)^* = N^m\), \[ \| N^{m}x \|^2 = \langle N^{m} x, N^{m} x \rangle = \langle x, N^{2m} x \rangle = 0 \quad\text{ for all }\quad x \in H, \] so \(N^m = 0\), contradicting the minimality of \(k_0\). This contradiction shows that \(N^2 = 0\), and the first case yields \(N = 0\).


> Orthogonal (unitary) matrices describe orthonormal bases

The columns \(\{Q_1, \ldots, Q_n\}\) of an orthogonal (unitary) matrix \(Q\) form an orthonormal basis for \(\R^n\) (\(\C^n\)).

N.b. The same is true for the rows of \(Q\).

Proof


Since the rows of \(Q^*\) are the conjugated and transposed columns of \(Q\), we have \[ Q^* Q = (\overline{Q_i} \cdot Q_j)_{ij} = I\] if and only if \(Q_i \perp Q_j\) for \(i \neq j\), and \(\|Q_i\| = 1\).


The spectral theorem

Let \(A \in M_{n \times n}(\K)\) be symmetric (hermitian). Then there exists an orthonormal basis \(\{Q_j\}_{j=1}^n\) for \(\R^n\) (\(\C^n\)) of eigenvectors of \(A\), such that \[ A = Q D Q^*, \] where \(Q\) is the orthogonal (unitary) matrix with columns \((Q_j)_j\) and \(D\) is a diagonal matrix with the eigenvalues of \(A\) as its diagonal elements.

N.b. It is possible to extend the spectral theorem to all normal matrices, characterized by \( A A^* = A^* A\). 1)

Proof


Each eigenspace is maximal: Let \(\lambda\) be an eigenvalue of \(A\) and pick \(x \in \mathrm{ker}((A-\lambda I)^2)\). Since \(A\) is self-adjoint with real eigenvalues, we have \[ 0 = \langle (A-\lambda I)^2 x, x \rangle = \| (A-\lambda I) x \|^2 \quad\Longrightarrow\quad x \in \ker(A-\lambda I). \] Hence the Riesz index of \(\lambda\) is \(1\), and all eigenvalues are semi-simple, meaning that \(\mathrm{dim}(\ker(A-\lambda I)) = \mathrm{mult}(\lambda)\).

Applying the spectral decomposition: The statement now follows from the spectral (or Jordan) decomposition: the maximal generalized eigenspaces coincide with the eigenspaces, these are mutually orthogonal, and we may pick an orthonormal basis for each of them. Together, these form an orthonormal basis for \(\K^n\), described by \(Q\).

Diagonalization by direct computation: With \(D + N\) representing the diagonal and nilpotent parts of the spectral representation of \(A\), one sees directly that \[ Q (D+N) Q^{-1} = A = A^* = (Q (D+N) Q^{-1})^* = (Q^{-1})^* (D+N)^* Q^* = Q (D+N^*) Q^{-1}, \] since \(D = D^*\) is a real diagonal matrix. By applying \(Q^{-1}=Q^*\) from the left and \(Q = (Q^*)^{-1}\) from the right, we obtain \(N = N^*\), whence \(N = 0\) by the above lemma. Thus \(A\) is diagonalized by the spectral decomposition given by the orthonormal basis \(Q\).
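
As a numerical illustration (a minimal sketch, assuming NumPy; the matrix below is an arbitrary example), numpy.linalg.eigh returns the eigenvalues and an orthonormal basis of eigenvectors of a symmetric (hermitian) matrix, from which the factorization \(A = Q D Q^t\) can be verified:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary symmetric example.
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2

# eigh is intended for symmetric/hermitian input; the eigenvalues come out real (ascending).
eigvals, Q = np.linalg.eigh(A)
D = np.diag(eigvals)

print(np.allclose(Q.T @ Q, np.eye(4)))  # the eigenvectors form an orthonormal basis
print(np.allclose(Q @ D @ Q.T, A))      # A = Q D Q^t
```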


Positive definiteness

Let \(A = A^t \in M_{n\times n}(\R)\) be a symmetric matrix.

  • \(A\) is said to be positive definite if \[ \langle Ax, x \rangle = x^t Ax > 0 \quad\text{ for }\quad x \neq 0.\]
  • \(A\) is said to be positive semi-definite if \(\langle Ax, x \rangle = x^t Ax \geq 0\) for all \(x \in \R^n\).

> Characterization of positive definite matrices

A symmetric matrix \(A = A^t \in M_{n\times n}(\R)\) is positive definite if and only if one (and hence all) of the following conditions holds:

  • \(\langle A \cdot, \cdot \rangle \) defines an inner product.
  • All the eigenvalues of \(A\) are strictly positive.
  • \(A = R^t R\) for some invertible matrix \(R\).

Proof (contains important methods)


Inner-product property: Assume that \(A\) is positive definite. Since \(x \mapsto Ax\), \(\R^n \to \R^n\), is linear and the usual inner product on \(\R^n\) is bilinear, the form \[ (x,y) \mapsto \langle Ax, y \rangle, \qquad \R^n \times \R^n \to \R, \] is also bilinear. Furthermore, \[ \langle Ax, y \rangle = \langle x, Ay \rangle = \langle Ay,x \rangle, \] by the symmetry of \(A\) and of the standard inner product, and \[ \langle Ax, x \rangle > 0 \quad \text{ for }\quad x \neq 0, \] by the definition of positive definiteness. Hence \(\langle A \cdot, \cdot\rangle\) is a symmetric, positive definite bilinear form, i.e., an inner product.

The same arguments show, conversely, that \(\langle A \cdot, \cdot\rangle\) cannot be an inner product unless \(A\) is positive definite.

Eigenvalue property: Since \(A\) is symmetric, there exists an orthonormal basis of eigenvectors \(\{v_1, \ldots, v_n\}\) with \( Av_j = \lambda_j v_j\). Let \(x_j\) be the coordinates of a vector \(x\) in this basis. Then \[ \langle Ax, x \rangle = \Big\langle A \sum_{j=1}^n x_j v_j, \sum_{k=1}^n x_k v_k \Big\rangle = \Big\langle \sum_{j=1}^n x_j (A v_j), \sum_{k=1}^n x_k v_k \Big\rangle = \sum_{j,k=1}^n \lambda_j x_j x_k \langle v_j, v_k \rangle = \sum_{j=1}^n \lambda_j x_j^2 > 0 \] for all \(x \neq 0\) if and only if \(\lambda_j > 0\) for all \(j =1, \ldots, n\).

Factorization property: If \( A = R^t R\) with \(R\) invertible, then \[ \langle Ax, x \rangle = \langle R^t R x, x \rangle = \| Rx\|^2 > 0, \] unless \( Rx = 0\), which happens only if \(x = 0\) (as \(R\) is invertible).

Conversely, if \(A\) is symmetric and positive definite, we can write \[ A = Q D Q^t = Q \sqrt{D} \sqrt{D} Q^t = Q (\sqrt{D})^t \sqrt{D} Q^t = (\sqrt{D} Q^t )^t (\sqrt{D} Q^t), \] where \(Q\) is the orthogonal matrix of eigenvectors from the spectral theorem, \(D = (\lambda_j)_{j}\) is the diagonal matrix of (positive) eigenvalues, and \(\sqrt{D} = (\sqrt{\lambda_j})_{j}\) has \(\sqrt{\lambda_j}\) as its diagonal elements. Thus \(A = R^t R\) with \(R = \sqrt{D}\, Q^t\) invertible.


The Cholesky decomposition

An alternative to the spectral decomposition used in the proof that \(A = R^t R\) for positive definite matrices is the Cholesky decomposition: If one divides out the main pivots (diagonal elements) in the \(LU\)-factorization of \(A\), one obtains an \(LDU\)-decomposition, where \(D\) is a diagonal matrix with the main pivots as diagonal elements, and \(L\) and \(U\) have only unit entries on their main diagonals. This factorization is unique. We thus obtain \[ L D U = A = A^t = (LDU)^t = U^t D^t L^t = U^t D L^t, \] where \(U^t\) is lower triangular and \(L^t\) is upper triangular. By uniqueness of the \(LDU\)-factorization, we must have \(L = U^t\) and \(U = L^t\). Consequently, \[ A = L D L^t = L \sqrt{D} \sqrt{D} L^t = L \sqrt{D} (L \sqrt{D})^t, \] with \(L \sqrt{D}\) invertible (the pivots of a positive definite matrix are positive, so \(\sqrt{D}\) has only positive elements along the diagonal, and \(L\) is lower triangular with units along the diagonal).
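
The following sketch (assuming NumPy; the matrix below is an arbitrary positive definite example) illustrates this numerically: numpy.linalg.cholesky returns a lower-triangular factor \(L\) with \(A = L L^t\), corresponding to \(L\sqrt{D}\) above, so that \(A = R^t R\) with \(R = L^t\):

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary positive definite example: A = B^t B with B invertible (almost surely).
B = rng.standard_normal((4, 4))
A = B.T @ B

L = np.linalg.cholesky(A)          # lower triangular with positive diagonal
print(np.allclose(np.tril(L), L))  # L is lower triangular
print(np.allclose(L @ L.T, A))     # A = L L^t, i.e. A = R^t R with R = L^t
```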

Singular values

> The singular value theorem

Let \(A \in M_{m \times n}(\R)\) be the realization of a linear transformation \(\R^n \to \R^m\), with \(\mathrm{rank}(A) = r\). Then \(A^t A\) is a positive semi-definite matrix of rank \(r\), and there exists an orthonormal basis of eigenvectors of \(A^t A\), \[\{v_1, \ldots, v_n\} \subset \R^n, \quad\text{ with eigenvalues }\quad \sigma_1^2 \geq \ldots \geq \sigma_r^2 > 0, \quad \sigma_{r+1}^2 = \ldots = \sigma_n^2 = 0,\] and a corresponding orthonormal basis \[\{u_1, \ldots, u_r, u_{r+1}, \ldots, u_m\} := \{\frac{1}{\sigma_1}Av_1, \ldots, \frac{1}{\sigma_r}Av_r, u_{r+1}, \ldots, u_m\}\] for \(\R^m\) (with \(u_{r+1}, \ldots, u_m\) chosen arbitrarily so as to complete an orthonormal basis), such that \[ A v_j = \begin{cases} \sigma_j u_j, \qquad & j = 1, \ldots, r,\\ 0, \qquad & j = r+1, \ldots, n. \end{cases} \] The unique scalars \(\sigma_1, \ldots, \sigma_r, 0, \ldots ,0\) (extended to a total of \(\min(m,n)\)) are called the singular values of \(A\). If \[ (\Sigma_{ij})_{ij} := (\delta_{ij} \sigma_j)_{ij} \in M_{m \times n}(\R) \] is the diagonal matrix with \(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0\) on its main diagonal, \(U = (u_j)_j \in M_{m\times m}(\R)\) is the orthogonal matrix with the \(u_j\) as columns, and \(V = (v_j)_j \in M_{n\times n}(\R)\) is the orthogonal matrix with the \(v_j\) as columns, it follows that \[ A = U \Sigma V^{-1} = U \Sigma V^t. \] This is the singular value decomposition of \(A\).

N.b. The singular values of \(A^t\) equal those of \(A\). For \(A^t\), the orthonormal bases \(V\) and \(U\) are simply interchanged (in comparison to \(A\)).
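
A minimal numerical sketch (assuming NumPy; the matrix below is an arbitrary example): numpy.linalg.svd returns \(U\), the singular values and \(V^t\), so that both the decomposition \(A = U \Sigma V^t\) and the fact that the \(\sigma_j^2\) are the eigenvalues of \(A^t A\) can be checked:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))  # arbitrary 5x3 example (rank 3 almost surely)

U, s, Vt = np.linalg.svd(A)      # U is 5x5, s holds the singular values, Vt is V^t (3x3)

Sigma = np.zeros((5, 3))         # rectangular "diagonal" matrix
Sigma[:3, :3] = np.diag(s)

print(np.allclose(U @ Sigma @ Vt, A))                        # A = U Sigma V^t
print(np.allclose(s**2, np.linalg.eigvalsh(A.T @ A)[::-1]))  # sigma_j^2 are the eigenvalues of A^t A
```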

The pseudoinverse

A finite-dimensional linear transformation can always be inverted on its range. Let \(A \in M_{m\times n}(\R)\) be the realization of a linear transformation \(\R^n \to \R^m\), and let \[ A|_{\mathrm{ker}(A)^\perp} \colon \mathrm{ker}(A)^\perp \to \mathrm{ran}(A) \] be the restriction of \(A\) to the orthogonal complement of its null space. Then (according to the rank–nullity theorem) \(A|_{\mathrm{ker}(A)^\perp}\) is bijective. Its inverse, \(A^\dagger\), is called the (Moore–Penrose) pseudoinverse: \[ A^\dagger \colon \mathrm{ran}(A) \to \mathrm{ker}(A)^\perp, \qquad A^\dagger (Ax) = x. \]

The pseudoinverse from the singular value decomposition

If \(A = U \Sigma V^t\) is the singular value decomposition of \(A \in M_{m \times n}(\R)\), using the bases \(\{u_j\}_j\) and \(\{v_j\}_j\), we only have to invert \(Av_j = \sigma_j u_j\) for \(j =1, \ldots, r\).

More precisely, if \[ (\Sigma_{ij}^\dagger)_{ij} = \Big(\delta_{ij} \frac{1}{\sigma_j}\Big)_{ij} \in M_{n \times m}(\R) \] is the diagonal matrix with the reciprocals \(1/\sigma_1, \ldots, 1/\sigma_r\) of the nonzero singular values along its main diagonal (and zeros elsewhere), then \[ A^\dagger = V \Sigma^\dagger U^t \] is the singular value decomposition of the pseudoinverse \(A^\dagger\).
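
The following sketch (assuming NumPy; the matrix below is an arbitrary full-rank example, so that \(r = n\)) builds \(A^\dagger = V \Sigma^\dagger U^t\) from the singular value decomposition and compares it with numpy.linalg.pinv:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))              # arbitrary full-rank example, so r = n = 3

U, s, Vt = np.linalg.svd(A)

Sigma_dagger = np.zeros((3, 5))
Sigma_dagger[:3, :3] = np.diag(1.0 / s)      # reciprocals of the nonzero singular values

A_dagger = Vt.T @ Sigma_dagger @ U.T         # A^dagger = V Sigma^dagger U^t
print(np.allclose(A_dagger, np.linalg.pinv(A)))  # agrees with NumPy's pseudoinverse
print(np.allclose(A_dagger @ A, np.eye(3)))      # A^dagger A = identity on ker(A)^perp = R^3
```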

The Gram–Schmidt orthogonalization and QR-decompositions

Gram–Schmidt

A set \(\{v_1, \ldots, v_n\} \subset H\) of linearly independent vectors can be transformed into an orthonormal system using the Gram–Schmidt orthogonalization algorithm.2) Define \[ e_1 := \frac{v_1}{\|v_1\|}, \] \[ \tilde e_2 := v_2 - \langle v_2, e_1 \rangle e_1, \qquad e_2 := \frac{\tilde e_2}{\| \tilde e_2 \|}, \] and, recursively, \[ \tilde e_{k+1} := v_{k+1} - \sum_{j=1}^{k} \langle v_{k+1}, e_j \rangle e_j, \qquad e_{k+1} := \frac{\tilde e_{k+1}}{\| \tilde e_{k+1} \|}, \qquad k = 1, \ldots, n-1. \] Then \(\{e_1, \ldots, e_n\}\) is an orthonormal system.
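
A direct translation of the algorithm into code might look as follows (a minimal sketch in Python/NumPy for the real case; the function name gram_schmidt and the example vectors are our own, and no safeguards against nearly dependent input are included):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (classical Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = v - sum(np.dot(v, e) * e for e in basis)  # subtract the projections onto e_1, ..., e_k
        basis.append(w / np.linalg.norm(w))           # normalize (fails if the input is dependent)
    return basis

# Arbitrary linearly independent example in R^3.
v = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
e = gram_schmidt(v)
print(np.allclose([[np.dot(a, b) for b in e] for a in e], np.eye(3)))  # orthonormal system
```

In floating-point arithmetic, the modified Gram–Schmidt variant (subtracting each projection immediately) is usually preferred for numerical stability; the version above follows the formulas as written.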

QR-decompositions

If, in the Gram–Schmidt orthogonalization, we express the vectors \(\{v_1, \ldots, v_n\}\) in terms of \(\{e_1, \ldots, e_n\}\), we obtain \[ v_1 = \langle v_1, e_1 \rangle e_1,\qquad v_2 = \langle v_2, e_1 \rangle e_1 + \langle v_2, e_2 \rangle e_2, \qquad v_k = \sum_{j=1}^k \langle v_k, e_j \rangle e_j, \] which can be seen either by direct calculation or from the 'closest point'-corollary (since \(\mathrm{span}\{v_1, \ldots v_k\} = \mathrm{span}\{e_1, \ldots e_k\}\) for \(k =1, \ldots, n\)). This gives us the \(QR\)-decomposition of a full-rank matrix \(A\):

If \(A = [ v_1, \ldots, v_n] \in M_{n \times n}(\R)\) is a matrix of full rank (so that its column vectors span \(\R^n\)), the Gram–Schmidt orthogonalization applied to the column vectors \(v_1, \ldots, v_n\) yields the QR-decomposition of \(A\): \[ A = QR = \begin{bmatrix} [e_1] & [e_2] & \cdots & [e_n] \end{bmatrix} \begin{bmatrix} \langle v_1, e_1 \rangle & \langle v_2, e_1 \rangle & \ldots & \langle v_n, e_1 \rangle \\ 0 & \langle v_2, e_2 \rangle & \ldots & \langle v_n, e_2 \rangle \\ \vdots & & \ddots & \vdots\\ 0 & 0 & \ldots & \langle v_n, e_n \rangle \end{bmatrix}, \] where \(Q\) is orthogonal (\(Q^t = Q^{-1}\)) and \(R\) is upper (right) triangular. Consequently, one can calculate \(R = Q^t A\) by finding the orthonormal basis given by \(Q\).

N.b.

  • It is possible to extend the QR-decomposition to general rectangular matrices.
  • Just like the LU-decomposition, the QR-decomposition can help in solving linear systems, since \[Ax = b \quad\Longleftrightarrow\quad QR x = b \quad\Longleftrightarrow\quad R x = Q^t b,\] where \(R\) is upper triangular (see the sketch below).
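
As a sketch of the last point (assuming NumPy; the matrix and right-hand side below are arbitrary examples), numpy.linalg.qr computes a QR-decomposition, after which \(Ax = b\) reduces to the triangular system \(Rx = Q^t b\). Note that the computed \(R\) may have negative diagonal entries (the factorization is only unique up to such signs), whereas the Gram–Schmidt construction above yields \(\langle v_k, e_k \rangle > 0\):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))  # arbitrary full-rank example
b = rng.standard_normal(4)

Q, R = np.linalg.qr(A)           # A = Q R with Q orthogonal and R upper triangular

print(np.allclose(Q @ R, A))
print(np.allclose(Q.T @ Q, np.eye(4)))

# Solve Ax = b via the triangular system R x = Q^t b
# (in practice one would use back-substitution to exploit the triangular structure).
x = np.linalg.solve(R, Q.T @ b)
print(np.allclose(A @ x, b))
```
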
1)
In fact, the class of normal matrices is the biggest class of matrices for which the spectral theorem holds.
2)
The Gram–Schmidt orthogonalization is equally valid for linearly independent sequences.
2017-03-24, Hallvard Norheim Bø