\[ \newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\K}{\mathbb{K}} \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\defarrow}{\quad \stackrel{\text{def}}{\Longleftrightarrow} \quad} \]
This week's lecture notes are an attempt to cover (in finite time) the remaining topics listed in the course description. Most important is the spectral theorem (including basic properties of adjoints and self-adjoint operators) and the characterization of positive definite matrices. The singular value decomposition can then be understood as a generalization of the spectral theorem. Finally, one should know the Gram–Schmidt orthogonalization and the QR-decomposition.
Adjoints and decompositions
Consider a Hilbert space \(H\) over a field \(\K \in \{\R, \C\}\). Most of the results in this section will be for the cases \(H = \R^n\) and \(H = \C^n\).
Adjoints
Let \(T \in B(H)\) be a bounded linear operator \(H \to H\).
- The adjoint of \(T\) is the operator \(T^* \in B(H)\) defined by \[ \langle Tx, y \rangle = \langle x, T^* y \rangle \quad\text{ for all }\quad x,y \in H.\]
- \(T \) is called self-adjoint if \(T = T^*\).
> Properties of the adjoint
The adjoint is well defined: for each \(T \in B(H)\), there exists a unique \(T^* \in B(H)\). The map \(^*\colon B(H) \to B(H)\), \(T \mapsto T^*\) satisfies the following properties:
- It is anti-linear: \( (\mu S + \lambda T)^* = \overline{\mu} S^* + \overline{\lambda} T^*\), for all \(S,T \in B(H)\) and \(\mu, \lambda \in \K\).
- It is isometric (and hence bounded, with unit norm): \(\|T^*\| = \|T\|\).
- It is invertible with itself as inverse: \((T^*)^* = T\).
N.b. We adopt the convention that \(T^{**} \stackrel{\text{def.}}{=} (T^*)^*\).
The adjoint of a matrix
For matrices, we extend this definition (in that \(A\) need not map \(H\) to \(H\)):
- The adjoint of \(A \in M_{m \times n}(\R)\) is its transpose \(A^t \in M_{n \times m}(\R)\).
- The adjoint of \(A \in M_{m \times n}(\C)\) is its conjugate transpose \(A^* \in M_{n \times m}(\C)\).
N.b. In the case \(m=n\), this agrees with the definition above if one adopts the standard inner product on \(\K^n\).
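As a quick numerical sanity check (not part of the notes themselves), the following NumPy sketch verifies the defining identity \(\langle Ax, y \rangle = \langle x, A^* y \rangle\) for a random complex matrix, using the standard inner product taken linear in the first argument; the test data are illustrative assumptions.

```python
import numpy as np

# Verify <Ax, y> = <x, A*y> for the standard inner product on C^n,
# with the inner product linear in its first argument.
rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(m) + 1j * rng.standard_normal(m)

inner = lambda u, v: np.vdot(v, u)   # <u, v> = sum_i u_i * conj(v_i)
A_star = A.conj().T                  # conjugate transpose = adjoint

print(np.isclose(inner(A @ x, y), inner(x, A_star @ y)))  # True
```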
> Self-adjoint matrices are symmetric or hermitian
- If \(T\in B(\R^n)\) is realized by a matrix \(A \in M_{n \times n}(\R)\), \(T\) is self-adjoint if and only if \(A\) is symmetric.
- If \(T\in B(\C^n)\) is realized by a matrix \(A \in M_{n \times n}(\C)\), \(T\) is self-adjoint if and only if \(A\) is hermitian.
> Self-adjoint operators have real spectrum
The eigenvalues of a self-adjoint operator are real, and eigenspaces corresponding to different eigenvalues are orthogonal.
Unitary operators and orthogonal matrices
- An operator \(U \in B(H)\) is called unitary if \(U U^* = U^* U = \mathrm{Id}\). Similarly, a matrix \(A \in M_{n \times n}(\K)\) is called unitary if the corresponding operator is unitary, i.e., if \(A^* A = I\).
- A unitary real matrix \(Q \in M_{n \times n}(\R)\) is called orthogonal.
> Unitary operators preserve inner products
If \(U \in B(H)\) is unitary, then \[ \langle Ux, Uy \rangle = \langle x, y \rangle, \quad \text{ for all} \quad x,y \in H.\] In particular, \(U\) is an isometry.
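A corresponding numerical check (a minimal sketch; taking the unitary \(U\) as the Q-factor of a random complex matrix is an assumption of the example):

```python
import numpy as np

# Check that a unitary matrix preserves the standard inner product on C^n.
rng = np.random.default_rng(1)
n = 5
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U, _ = np.linalg.qr(Z)               # Q-factor of a complex matrix is unitary

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# <Ux, Uy> = <x, y> (inner product linear in the first argument)
print(np.isclose(np.vdot(U @ y, U @ x), np.vdot(y, x)))  # True
```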
> Lemma: there are no nontrivial nilpotent self-adjoint operators
If \(N = N^*\) satisfies \(N^k = 0 \) for some \(k \in \N\), then \(N = 0\).
> Orthogonal (unitary) matrices describe orthonormal bases
The columns \(\{Q_1, \ldots, Q_n\}\) of an orthogonal (unitary) matrix \(Q\) form an orthonormal basis for \(\R^n\) (\(\C^n\)).
N.b. The same is true for the rows of \(Q\).
The spectral theorem
Let \(A \in M_{n \times n}(\K)\) be symmetric (hermitian). Then there exists an orthonormal basis \(\{Q_j\}_{j=1}^n\) for \(\R^n\) (\(\C^n\)) of eigenvectors of \(A\), such that \[ A = Q D Q^*, \] where \(Q\) is the orthogonal (unitary) matrix with columns \((Q_j)_j\) and \(D\) is a diagonal matrix with the eigenvalues of \(A\) as its diagonal elements.
N.b. It is possible to extend the spectral theorem to all normal matrices, characterized by \( A A^* = A^* A\).
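A minimal NumPy sketch of the theorem for a real symmetric matrix, using numpy.linalg.eigh (which assumes a symmetric/hermitian input); the test matrix is an assumption of the example:

```python
import numpy as np

# Spectral decomposition A = Q D Q^t of a real symmetric matrix.
rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = B + B.T                          # a symmetric test matrix

eigvals, Q = np.linalg.eigh(A)       # real eigenvalues; columns of Q form
D = np.diag(eigvals)                 # an orthonormal basis of eigenvectors

print(np.allclose(A, Q @ D @ Q.T))       # A = Q D Q^t
print(np.allclose(Q.T @ Q, np.eye(4)))   # Q is orthogonal
```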
Positive definiteness
Let \(A = A^t \in M_{n\times n}(\R)\) be a symmetric matrix.
- \(A\) is said to be positive definite if \[ \langle Ax, x \rangle = x^t Ax > 0 \quad\text{ for }\quad x \neq 0.\]
- \(A\) is said to be positive semi-definite if \(\langle Ax, x \rangle = x^t Ax \geq 0\) for all \(x \in \R^n\).
> Characterization of positive definite matrices
A symmetric matrix \(A = A^t \in M_{n\times n}(\R)\) is positive definite exactly if one (and hence all) of the following conditions holds (a numerical check is sketched after the list):
- \(\langle A \cdot, \cdot \rangle \) defines an inner product.
- All the eigenvalues of \(A\) are strictly positive.
- \(A = R^t R\) for some invertible matrix \(R\).
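A sketch of these equivalences for a sample matrix built as \(A = R^t R\); the particular test matrix is an illustrative assumption:

```python
import numpy as np

# Build a positive definite A = R^t R and check the other two conditions.
rng = np.random.default_rng(3)
R = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # generically invertible
A = R.T @ R                                       # positive definite by construction

print(np.all(np.linalg.eigvalsh(A) > 0))          # strictly positive spectrum
x = rng.standard_normal(4)                        # a (generically nonzero) test vector
print(x @ A @ x > 0)                              # quadratic form positive at x
```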
The Cholesky decomposition
An alternative to the spectral decomposition used in the proof that \(A = R^t R\) for positive definite matrices is the Cholesky decomposition: If one divides out the main pivots (diagonal elements) in the \(LU\)-factorization of \(A\), one gets an \(LDU\)-decomposition, \(D\) being a diagonal matrix with the main pivots as diagonal elements, and \(L\) and \(U\) having only unit elements on their main diagonals. This factorization is unique. We thus obtain \[ L D U = A = A^t = (LDU)^t = U^t D^t L^t = U^t D L^t, \] where \(U^t\) is lower triangular, and \(L^t\) is upper triangular. By uniqueness of the \(LDU\)-factorization, we must have \(L = U^t\) and \(U = L^t\). Consequently, \[ A = L D L^t = L \sqrt{D} \sqrt{D} L^t = L \sqrt{D} (L \sqrt{D})^t, \] with \(L \sqrt{D}\) being invertible (\(\sqrt{D}\) has only positive elements along the diagonal and \(L\) is lower triangular with units along the diagonal).
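For comparison, numpy.linalg.cholesky returns a lower triangular factor with positive diagonal that corresponds to \(L\sqrt{D}\) above; a brief sketch on an assumed test matrix:

```python
import numpy as np

# Cholesky factor C of a positive definite matrix: A = C C^t.
rng = np.random.default_rng(4)
R = rng.standard_normal((4, 4)) + 4 * np.eye(4)
A = R.T @ R                                  # positive definite test matrix

C = np.linalg.cholesky(A)                    # lower triangular, positive diagonal
print(np.allclose(A, C @ C.T))               # A = C C^t
print(np.allclose(C, np.tril(C)))            # C is lower triangular
```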
Singular values
> The singular value theorem
Let \(A \in M_{m \times n}(\R)\) be the realization of a linear transformation \(\R^n \to \R^m\), such that \(\mathrm{rank}(A) = r\). Then \(A^t A\) is a positive semi-definite matrix of rank \(r\), and there exists an orthonormal basis of eigenvectors of \(A^t A\), \[\{v_1, \ldots, v_n\} \subset \R^n, \quad\text{ with eigenvalues }\quad \sigma_1^2 \geq \ldots \geq \sigma_r^2 > 0, \quad \sigma_{r+1}^2 = \ldots = \sigma_n^2 = 0,\] and a corresponding orthonormal basis \[\{u_1, \ldots, u_r, u_{r+1}, \ldots, u_m\} := \{\frac{1}{\sigma_1}Av_1, \ldots, \frac{1}{\sigma_r}Av_r, u_{r+1}, \ldots, u_m\}\] for \(\R^m\) (\(u_{r+1}, \ldots, u_m\) arbitrary to fit the orthonormal basis), such that \[ A v_j = \begin{cases} \sigma_j u_j, \qquad & j = 1, \ldots, r,\\ 0, \qquad & j = r+1, \ldots, n. \end{cases} \] The unique scalars \(\sigma_1, \ldots, \sigma_r, 0, \ldots ,0\) (extended to a total of \(\min(m,n)\)) are called singular values of \(A\). If \[ (\Sigma_{ij})_{ij} := (\delta_{ij} \sigma_j)_{ij} \in M_{m \times n}(\R) \] is the diagonal matrix with \(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0\) on its main diagonal, \(U = (u_j)_j \in M_{m\times m}(\R)\) is the orthogonal matrix with \(u_j\) as columns, and \(V = (v_j)_j \in M_{n\times n}(\R)\) is the orthogonal matrix with \(v_j\) as columns, it follows that \[ A = U \Sigma V^{-1} = U \Sigma V^t. \] This is the singular value decomposition of \(A\).
N.b. The singular values of \(A^t\) equal those of \(A\). For \(A^t\), the orthonormal bases \(V\) and \(U\) are simply exchanged (in comparison to \(A\)).
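A short NumPy sketch of the decomposition (random test matrix as an assumption; numpy.linalg.svd returns \(U\), the singular values, and \(V^t\)):

```python
import numpy as np

# Singular value decomposition A = U Sigma V^t of a random real matrix.
rng = np.random.default_rng(5)
m, n = 5, 3
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=True)   # s: sigma_1 >= ... >= sigma_min(m,n)
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))                        # A = U Sigma V^t
print(np.allclose(s**2, np.linalg.eigvalsh(A.T @ A)[::-1]))  # sigma_j^2 are eigenvalues of A^t A
```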
The pseudoinverse
A finite-dimensional linear transformation can always be inverted on its range. Let \(A \in M_{m\times n}(\R)\) be the realization of a linear transformation \(\R^n \to \R^m\), and let \[ A|_{\mathrm{ker}(A)^\perp} \colon \mathrm{ker}(A)^\perp \to \mathrm{ran}(A) \] be the restriction of \(A\) to the orthogonal complement of its null space. Then (according to the rank–nullity theorem) \(A|_{\mathrm{ker}(A)^\perp}\) is bijective. Its inverse, \(A^\dagger\), is called the (Moore–Penrose) pseudoinverse: \[ A^\dagger \colon \mathrm{ran}(A) \to \mathrm{ker}(A)^\perp, \qquad A^\dagger (Ax) = x. \]
The pseudoinverse from the singular value decomposition
If \(A = U \Sigma V^t\) is the singular value decomposition of \(A \in M_{m \times n}(\R)\), using the bases \(\{u_j\}_j\) and \(\{v_j\}_j\), we only have to invert \(Av_j = \sigma_j u_j\) for \(j =1, \ldots, r\).
More precisely, if \[ \Sigma^\dagger \in M_{n \times m}(\R), \qquad \Sigma^\dagger_{ij} := \begin{cases} \frac{1}{\sigma_j}, \qquad & i = j \leq r,\\ 0, \qquad & \text{otherwise}, \end{cases} \] is the diagonal matrix with the reciprocals \(\frac{1}{\sigma_1}, \ldots, \frac{1}{\sigma_r}\) of the nonzero singular values along its main diagonal (and zeros elsewhere), then \[ A^\dagger = V \Sigma^\dagger U^t \] expresses the pseudoinverse \(A^\dagger\) in terms of the singular value decomposition of \(A\) (and is itself a singular value decomposition of \(A^\dagger\), up to reordering of the singular values).
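A sketch building \(A^\dagger\) from the singular value decomposition and comparing it with numpy.linalg.pinv; the test matrix and the tolerance for "nonzero" singular values are assumptions of the example:

```python
import numpy as np

# Pseudoinverse A^dagger = V Sigma^dagger U^t from the SVD.
rng = np.random.default_rng(6)
m, n = 5, 3
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma_dag = np.zeros((n, m))
Sigma_dag[:len(s), :len(s)] = np.diag(np.where(s > 1e-12, 1.0 / s, 0.0))

A_dag = Vt.T @ Sigma_dag @ U.T
print(np.allclose(A_dag, np.linalg.pinv(A)))      # agrees with numpy's pseudoinverse
```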
The Gram–Schmidt orthogonalization and QR-decompositions
Gram–Schmidt
A set \(\{v_1, \ldots, v_n\} \subset H\) of linearly independent vectors can be transformed into an orthonormal system using the Gram–Schmidt orthogonalization algorithm. Define \[ e_1 := \frac{v_1}{\|v_1\|}, \] \[ \tilde e_2 := v_2 - \langle v_2, e_1 \rangle e_1, \qquad e_2 := \frac{\tilde e_2}{\| \tilde e_2 \|}, \] and, recursively, \[ \tilde e_{k+1} := v_{k+1} - \sum_{j=1}^{k} \langle v_{k+1}, e_j \rangle e_j, \qquad e_{k+1} := \frac{\tilde e_{k+1}}{\| \tilde e_{k+1} \|}, \qquad k = 1, \ldots, n-1. \] Then \(\{e_1, \ldots, e_n\}\) is an orthonormal system.
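A short implementation sketch of the recursion for real vectors (the function name gram_schmidt and the test data are illustrative assumptions; the updates subtract one projection at a time, which is mathematically equivalent to the formula above and numerically more stable):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent real vectors."""
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for e in basis:
            w -= np.dot(w, e) * e              # subtract the component <v, e> e
        basis.append(w / np.linalg.norm(w))    # normalize the remainder
    return basis

rng = np.random.default_rng(7)
vs = [rng.standard_normal(4) for _ in range(3)]
es = gram_schmidt(vs)
G = np.array([[np.dot(a, b) for b in es] for a in es])
print(np.allclose(G, np.eye(3)))               # {e_1, ..., e_n} is orthonormal
```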
QR-decompositions
If, in the Gram–Schmidt orthogonalization, we express the vectors \(\{v_1, \ldots, v_n\}\) in terms of \(\{e_1, \ldots, e_n\}\), we obtain \[ v_1 = \langle v_1, e_1 \rangle e_1,\qquad v_2 = \langle v_2, e_1 \rangle e_1 + \langle v_2, e_2 \rangle e_2, \qquad v_k = \sum_{j=1}^k \langle v_k, e_j \rangle e_j, \] which can be seen either by direct calculation or from the 'closest point'-corollary (since \(\mathrm{span}\{v_1, \ldots, v_k\} = \mathrm{span}\{e_1, \ldots, e_k\}\) for \(k =1, \ldots, n\)). This gives us the \(QR\)-decomposition of a full-rank matrix \(A\):
If \(A = [ v_1, \ldots, v_n] \in M_{n \times n}(\R)\) is a matrix of full rank (so that its column vectors span \(\R^n\)), the Gram–Schmidt orthogonalization applied to the column vectors \(v_1, \ldots, v_n\) yields the QR-decomposition of \(A\): \[ A = QR = \begin{bmatrix} [e_1] & [e_2] & \cdots & [e_n] \end{bmatrix} \begin{bmatrix} \langle v_1, e_1 \rangle & \langle v_2, e_1 \rangle & \ldots & \langle v_n, e_1 \rangle \\ 0 & \langle v_2, e_2 \rangle & \ldots & \langle v_n, e_2 \rangle \\ \vdots & & \ddots & \vdots\\ 0 & 0 & \ldots & \langle v_n, e_n \rangle \end{bmatrix}, \] where \(Q\) is orthogonal (\(Q^t = Q^{-1}\)), and \(R\) is upper (right) triangular. Consequently, one can calculate \(R = Q^t A\) by finding the orthonormal basis given by \(Q\).
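A minimal numerical sketch with numpy.linalg.qr (which may flip the signs of some columns of \(Q\), and correspondingly of rows of \(R\), relative to the Gram–Schmidt construction above); the test matrix is an assumption:

```python
import numpy as np

# QR-decomposition of a (generically full-rank) square matrix.
rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4))

Q, R = np.linalg.qr(A)
print(np.allclose(A, Q @ R))                     # A = QR
print(np.allclose(R, np.triu(R)))                # R is upper triangular
print(np.allclose(R, Q.T @ A))                   # R = Q^t A
```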
N.b.
- It is possible to extend the QR-decomposition to general rectangular matrices.
- Just like the LU-decomposition, the QR-decomposition can help in solving linear systems, since \[Ax = b \quad\Longleftrightarrow\quad QR x = b \quad\Longleftrightarrow\quad R x = Q^t b,\] where \(R\) is upper triangular; see the sketch below.
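A brief sketch of this solution procedure (np.linalg.solve is used on the triangular system for simplicity; back substitution, e.g. scipy.linalg.solve_triangular, would exploit the structure of \(R\)); the system is an illustrative assumption:

```python
import numpy as np

# Solve Ax = b via the QR-decomposition: R x = Q^t b.
rng = np.random.default_rng(9)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

Q, R = np.linalg.qr(A)
x = np.linalg.solve(R, Q.T @ b)                  # triangular system R x = Q^t b
print(np.allclose(A @ x, b))                     # x solves the original system
```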