$\newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\K}{\mathbb{K}} \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\defarrow}{\quad \stackrel{\text{def}}{\Longleftrightarrow} \quad}$

# Orthogonality

Consider an inner-product space $(X, \langle \cdot, \cdot \rangle)$ over a field $\K \in \{\R, \C\}$. When $X$ is complete, we shall write $H$ to indicate that it is a Hilbert space.

## The projection and Riesz representation theorems

### Orthogonal vectors and sets

• Two vectors $x,y \in X$ are said to be orthogonal, $x \perp y \defarrow \langle x,y \rangle = 0.$
• A vector $x \in X$ is said to be orthogonal to a set $M \subset X$, $x \perp M \defarrow \langle x, y \rangle = 0 \quad\text{ for all } y \in M.$
• Two sets $M,N \subset X$ are said to be orthogonal, $M \perp N \defarrow \langle x, y \rangle = 0 \quad\text{ for all } \quad x \in M, y \in N.$
• The orthogonal complement of a set $M \subset X$ consists of all vectors orthogonal to $M$: $M^\perp \quad\stackrel{\text{def.}}{=}\quad \{ x \in X \colon x \perp M\}.$

The word perpendicular is sometimes used interchangeably with 'orthogonal', but mostly in $\R^n$.

Ex.
• In $\R^3$, the vector $(1,2,1)$ is orthogonal to the plane $\{x_1 + 2x_2 + x_3 = 0\}$.
• In $L_2((-\pi,\pi),\C)$ the vectors $e^{ikx}$, $k \in \Z$, are all orthogonal to each other: $\langle e^{ik_1 x}, e^{ik_2x} \rangle = \int_{-\pi}^\pi e^{ik_1 x} \overline{e^{ik_2 x}}\,dx = \int_{-\pi}^\pi e^{i(k_1-k_2) x} \,dx = \frac{e^{i(k_1-k_2)x}}{i(k_1-k_2)}\bigg|_{-\pi}^{\pi} = 0 \quad\text{ for }\quad k_1 \neq k_2,$ by periodicity of $e^{ikx} = \cos(kx) + i\sin(kx)$.
• In $l_2(\R)$, with $e_1 = (1,0,0, \ldots)$, $\{e_1\}^\perp = \{ x \in l_2(\R) \colon x = (0,x_2, x_3, \ldots) \}.$
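These checks are easy to carry out numerically. The sketch below (with an arbitrarily chosen pair of spanning vectors for the plane and an arbitrary grid size) verifies the first two examples:

```python
import numpy as np

# Check that (1, 2, 1) is orthogonal to the plane x1 + 2 x2 + x3 = 0,
# using two vectors that span the plane (chosen here for illustration).
n = np.array([1.0, 2.0, 1.0])
u = np.array([2.0, -1.0, 0.0])   # 2 - 2 + 0 = 0, so u lies in the plane
v = np.array([0.0, -1.0, 2.0])   # 0 - 2 + 2 = 0, so v lies in the plane
print(n @ u, n @ v)              # 0.0 0.0

# Check <e^{i k1 x}, e^{i k2 x}> = 0 for k1 != k2 in L2((-pi, pi), C),
# approximating the integral by a midpoint Riemann sum.
m = 200000
x = -np.pi + 2 * np.pi * (np.arange(m) + 0.5) / m
dx = 2 * np.pi / m
k1, k2 = 3, -2
ip = np.sum(np.exp(1j * k1 * x) * np.conj(np.exp(1j * k2 * x))) * dx
print(abs(ip))                   # numerically ~ 0
```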

* In a Hilbert space, $H = \oplus_{j=1}^m H_j$ for subspaces $H_j \subset H$ means that $x = \sum_{j=1}^m x_j, \quad x_j \in H_j, \qquad H_j \perp H_k \text{ for } j \neq k,$ which gives a unique representation of any $x \in H$ in terms of elements in orthogonal subspaces.1)

Ex.
• $\R^3 = \mathrm{span}\{(1,0,0)\} \oplus \mathrm{span}\{(0,1,0),(0,0,1)\},$ but also $\R^3 = \mathrm{span}\{(1,2,1)\} \oplus \{x_1 + 2x_2 + x_3 = 0\}.$
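The second decomposition can be verified with a short computation; the vector being decomposed is an arbitrary choice:

```python
import numpy as np

x = np.array([3.0, 1.0, 2.0])      # an arbitrary vector to decompose
n = np.array([1.0, 2.0, 1.0])      # spans the one-dimensional summand

# Orthogonal projection onto span{n}: (<x, n> / <n, n>) n
x_par = (x @ n) / (n @ n) * n
x_perp = x - x_par                 # remainder; should be orthogonal to n

print(np.allclose(x_par + x_perp, x))   # True: x = x_par + x_perp
print(abs(n @ x_perp) < 1e-12)          # True: x_perp lies in the plane
```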

### > The projection theorem

Let $M \subset H$ be a closed linear subspace of a Hilbert space $H$. Then $H = M \oplus M^\perp$.

Proof

Existence of $y_0 \in M$: Pick $x_0 \in H$. By the minimal distance theorem, there exists a unique point $y_0 \in M$ with $\|x_0 - y_0\| = \inf_{y \in M} \| x_0 -y\|.$

Existence of $x_0 - y_0 \in M^\perp$: Since $M$ is a subspace, $y_0 + \lambda y \in M$ for any $y \in M$, $\lambda \in \K$. Hence $\|x_0 - y_0\|^2 \leq \| x_0 - y_0 - \lambda y\|^2 = \| x_0 - y_0\|^2 - 2 \Re (\lambda \langle y, x_0 - y_0 \rangle) + |\lambda|^2 \|y\|^2,$ so that $-2 \Re (\lambda \langle y, x_0 - y_0 \rangle) + |\lambda|^2 \|y\|^2 \geq 0.$ Taking $\lambda = \varepsilon$ with $0 < \varepsilon \ll 1$ and letting $\varepsilon \to 0$, we see that $\Re \langle y, x_0 - y_0 \rangle \leq 0,$ and, similarly, by taking $\lambda = -i \varepsilon$, that $\Im \langle y, x_0 - y_0 \rangle \leq 0.$ Since $y \in M$ is arbitrary, exchanging $-y$ for $y$ yields the reverse inequalities, whence $\langle y, x_0 - y_0 \rangle = 0 \quad\text{ for any }\quad y \in M.$ Thus we can write $x_0 = y_0 + (x_0 - y_0), \quad\text{ where }\quad y_0 \in M, \quad x_0 - y_0 \in M^\perp.$

Uniqueness: If we have two representations $x_0 = y_0 + z_0$ and $x_0 = \tilde y_0 + \tilde z_0$, then $M \ni y_0 - \tilde y_0 = \tilde z_0 - z_0 \in M^\perp,$ but only the zero vector is orthogonal to itself, implying that $y_0 = \tilde y_0$ and $z_0 = \tilde z_0$.

Ex.
• The null space of a matrix $A \in M_{m\times n}(\R)$ is a closed linear subspace, so that $\R^n = \mathrm{ker}(A) \oplus (\mathrm{ker}(A))^\perp$. The rank–nullity theorem in its geometric form identifies the orthogonal complement with the range of the transpose matrix: $\R^n = \mathrm{ker}(A) \oplus \mathrm{ran}(A^t).$
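A numerical sketch of this decomposition, using the SVD to produce orthonormal bases for $\mathrm{ran}(A^t)$ and $\mathrm{ker}(A)$ (the matrix and the vector are random illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4))          # a random 2x4 real matrix
x = rng.standard_normal(4)

# Row space ran(A^t): orthonormal basis from the SVD of A.
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))               # numerical rank
row_basis = Vt[:r]                       # rows span ran(A^t)
null_basis = Vt[r:]                      # rows span ker(A)

x_row = row_basis.T @ (row_basis @ x)    # projection onto ran(A^t)
x_null = null_basis.T @ (null_basis @ x) # projection onto ker(A)

print(np.allclose(x, x_row + x_null))    # True: R^4 = ker(A) + ran(A^t)
print(np.allclose(A @ x_null, 0))        # True: x_null really lies in ker(A)
```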

### > Corollary: strict subspace characterization

If $M \subsetneq H$ is a closed linear subspace of $H$, there exists a non-zero vector $z_0 \in H$ with $z_0 \perp M$.

Proof

Since $M \neq H$ there exists $x_0 \in H \setminus M$. According to the projection theorem, $x_0 = y_0 + z_0$ with $y_0 \in M$, $z_0 \in M^\perp$. Since $x_0 \notin M$, we have $z_0 = x_0 - y_0 \neq 0$, so $z_0 \perp M$ is the vector we are looking for.

Ex.
• Let $M = \overline{l_0}$ be the closure in $l_2$ of $l_0 = \{ x \in l_2 \colon \{x_j\}_{j \in \N} \text{ has finitely many non-zero entries} \}$. Is $M = l_2$? Suppose $z \in l_2$ satisfies $z \perp M$. Since $\{e_j\}_{j\in \N} \subset M$, we have $\langle z, e_j \rangle = z_j = 0 \quad\text{ for all }\quad j \in \N.$ Thus $z = 0$, and the corollary forces $\overline{l_0} = l_2$.

### > The Riesz representation theorem

A Hilbert space is its own dual: every bounded linear functional $T \in B(H, \K)$ is given by an inner product, $Tx = \langle x, z \rangle,$ for a unique $z \in H$. Moreover, $\|T\|_{B(H,\K)} = \|z\|_H$.

N.b. Conversely, any map $x \mapsto \langle x, y \rangle$ with $y \in H$ defines a bounded linear functional on $H$ (by the Cauchy–Schwarz inequality).

Proof

Existence: Let $N = \mathrm{ker}(T).$ Then $N$ is a closed linear subspace of $H$. If $N = H$, we have $T = 0$ in $B(H,\K)$ and $Tx = \langle x, 0 \rangle$.

Assume now that $N \neq H$. According to the corollary above, there exists $z_0 \in N^\perp$, $z_0 \neq 0$. Since $z_0 \perp \mathrm{ker}(T)$ we have $T z_0 \neq 0$. Consequently, $x - \frac{Tx}{Tz_0} z_0 \in \mathrm{ker}(T) \quad \text{ for all }\quad x \in H,$ implying $\Big\langle x - \frac{Tx}{Tz_0} z_0, z_0 \Big\rangle = 0 \quad \Leftrightarrow\quad Tx \Big\langle \frac{1}{T z_0} z_0, z_0 \Big\rangle = \langle x, z_0 \rangle \quad \Leftrightarrow\quad Tx = \frac{Tz_0}{\| z_0\|^2} \langle x, z_0 \rangle = \Big\langle x, \frac{\overline{Tz_0}}{\| z_0\|^2} z_0 \Big\rangle.$ Thus $Tx = \langle x, z \rangle \quad \text{ for }\quad z := \frac{\overline{Tz_0}}{\|z_0\|^2} z_0.$

Uniqueness: If, in addition, $Tx = \langle x, w \rangle \quad\text{ for all }\quad x \in H,$ then $\langle x, z- w \rangle = Tx - Tx = 0 \quad \text{ for all }\quad x \in H,$ so that $z = w$.

Equality of norms: We have $\|T\| = \sup_{\|x\| = 1}|Tx| = \sup_{\|x\| = 1}|\langle x, z \rangle| \leq \sup_{\|x\| = 1} \|x\| \|z\| = \|z\|,$ by the Cauchy–Schwarz inequality. Conversely, $\|z\|^2 = \langle z, z \rangle = |Tz| \leq \|T\| \|z\| \quad\Longrightarrow\quad \|z\| \leq \|T\|.$ Thus $\|T\| = \|z\|$.

Ex.
• $\C^n$ is its own dual: every bounded linear functional on $\C^n$ is realized by a dot product: $Tx = x \cdot \overline{y}$.
• $L_2(\R,\R)$ is its own dual: every bounded linear functional on $L_2(\R,\R)$ is realized by an inner product: $Tf = \int_\R f(s) \overline{g(s)}\,ds.$ By the Cauchy–Schwarz inequality, $|Tf| = | \langle f,g \rangle | = \Big| \int_\R f(s) \overline{g(s)}\,ds \Big| \leq \Big( \int_\R |f(s)|^2\,ds \Big)^{1/2} \Big( \int_\R |g(s)|^2\,ds \Big)^{1/2} = \|f\| \|g\|,$ with equality for $f = \lambda g$; hence, $\|T\| = \|g\|$.
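For $\C^n$ the representing vector can be read off by applying the functional to the canonical basis, since $Te_j = \overline{z_j}$. A sketch, with an arbitrarily chosen functional:

```python
import numpy as np

# A bounded linear functional on C^3, given as a black box
# (the coefficients are chosen for illustration).
def T(x):
    return (1 + 2j) * x[0] - 1j * x[1] + 3 * x[2]

# Riesz: T x = <x, z>.  Applying T to the canonical basis gives
# T e_j = conj(z_j), so z is recovered by conjugating these values.
n = 3
z = np.conj([T(e) for e in np.eye(n)])

x = np.array([1.0 - 1j, 2.0, 1j])
print(np.isclose(T(x), x @ np.conj(z)))   # True

# ||T|| = ||z||: |T x| <= ||x|| ||z||, with equality at x = z / ||z||.
print(np.isclose(abs(T(z / np.linalg.norm(z))), np.linalg.norm(z)))  # True
```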

## Orthonormal systems, Bessel's inequality and the Fourier series theorem

• A sequence $\{e_j\}_{j \in \N}$ is called orthogonal if $e_j \perp e_k$ for $j \neq k$. If, in addition, $\|e_j\| = 1$ for all $j \in \N$, it is called orthonormal.
• An orthonormal system $K$ is called complete if there are no non-zero vectors orthogonal to it: $K \text{ complete} \defarrow K^\perp = \{0\}.$
• If $\{e_j\}_{j \in \N} \subset H$ is an orthonormal sequence, the projection $\langle x, e_j \rangle$ is called the $j$th Fourier coefficient of $x$. The series $\sum_{j \in \N} \langle x, e_j \rangle e_j$ is called the Fourier series of $x$ with respect to the sequence $\{e_j\}_{j \in \N}$.

N.b. All the above definitions carry over to general (finite or infinite) sets, called orthonormal systems.

Ex.
• The canonical basis $\{e_j\}_j$ is an orthonormal basis in $\R^n$, $\C^n$ and $l_2$ (real or complex).
• The sequence $\{\frac{1}{\sqrt{2\pi}} e^{ikx}\}_{k \in \Z}$ is an orthonormal sequence in $L_2((-\pi,\pi),\C)$, since $\|{\textstyle \frac{1}{\sqrt{2\pi}}} e^{ikx}\| = \Big( \int_{-\pi}^{\pi} {\textstyle \frac{1}{\sqrt{2\pi}}} e^{ikx} \overline{{\textstyle \frac{1}{\sqrt{2\pi}}} e^{ikx}} \,dx \Big)^{1/2} = \Big( \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{ikx} e^{-ikx}\,dx \Big)^{1/2} = 1.$

### > Bessel's inequality

An orthonormal sequence satisfies $\sum_{j \in \N} | \langle x, e_j \rangle |^2 \leq \| x\|^2,$ for all $x \in X$.

Proof

\begin{align*} 0 &\leq \Big\| x - \sum_{j=1}^N \langle x, e_j \rangle e_j \Big\|^2\\ &= \|x\|^2 - 2 \Re \Big \langle x, \sum_{j=1}^N \langle x, e_j\rangle e_j \Big\rangle + \Big\langle \sum_{j=1}^N \langle x, e_j\rangle e_j, \sum_{k=1}^N \langle x, e_k\rangle e_k \Big\rangle\\ &= \|x\|^2 - 2 \Re \sum_{j=1}^N \overline{\langle x, e_j \rangle} \langle x, e_j \rangle + \sum_{j=1}^N \sum_{k = 1}^N \langle x, e_j \rangle \overline{\langle x, e_k \rangle} \langle e_j, e_k\rangle\\ &= \|x\|^2 - \sum_{j=1}^N |\langle x, e_j \rangle|^2. \end{align*} Thus $\sum_{j=1}^N |\langle x, e_j \rangle|^2 \leq \|x\|^2,$ irrespective of $N \in \N$. Bessel's inequality is obtained by letting $N \to \infty$.
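The inequality, and the exact form of the defect appearing in the proof, can be checked numerically (the vector and the orthonormal triple below are random illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)

# An orthonormal sequence e_1, e_2, e_3 in R^5: the columns of Q from a
# QR factorization of a random 5x3 matrix are orthonormal.
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))
coeffs = Q.T @ x                       # Fourier coefficients <x, e_j>

print(np.sum(coeffs**2) <= x @ x)      # True: Bessel's inequality
# The defect is exactly || x - sum <x, e_j> e_j ||^2, as in the proof:
residual = x - Q @ coeffs
print(np.isclose(x @ x - np.sum(coeffs**2), residual @ residual))  # True
```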

### > Fourier coefficients are best possible coefficients

An orthonormal sequence satisfies $\Big\| x - \sum_{j=1}^N \lambda_j e_j \Big\| \geq \Big\| x - \sum_{j=1}^N \langle x, e_j \rangle e_j \Big\|,$ for any $N \in \N$ and any scalars $\lambda_1, \ldots, \lambda_N \in \K$. Equality holds if and only if $\lambda_j = \langle x, e_j \rangle$ for all $j = 1, \ldots, N$.

Proof

\begin{align*} \Big\| x - \sum_{j=1}^N \lambda_j e_j \Big\|^2 &= \|x\|^2 - 2 \Re \Big\langle x, \sum_{j=1}^N \lambda_j e_j \Big\rangle + \Big\| \sum_{j=1}^N \lambda_j e_j \Big\|^2\\ &= \|x\|^2 - 2 \Re \sum_{j=1}^N \overline{\lambda_j} \langle x, e_j \rangle + \sum_{j=1}^N |\lambda_j|^2\\ &= \|x\|^2 + \sum_{j=1}^N |\langle x, e_j \rangle - \lambda_j |^2 - \sum_{j=1}^N |\langle x, e_j \rangle|^2\\ &\geq \|x\|^2 - \sum_{j=1}^N |\langle x, e_j \rangle|^2\\ & = \Big\| x - \sum_{j=1}^N \langle x, e_j \rangle e_j \Big\|^2, \end{align*} where the last equality comes from the proof of Bessel's inequality.

### > Corollary: closest point

If $\{e_1, \ldots, e_n\}$ is an orthonormal system, then $y = \sum_{j=1}^n \langle x,e_j\rangle e_j$ is the closest point to $x$ in $\mathrm{span}\{e_1, \ldots, e_n\}$, with $d = \|x-y\|$ given by $d^2 = \|x\|^2 - \sum_{j=1}^n |\langle x, e_j \rangle|^2.$

N.b. In particular, if $x \in \mathrm{span}\{e_1, \ldots, e_n\}$, then $x = \sum_{j=1}^n \langle x,e_j\rangle e_j$.

Proof

Since Fourier coefficients are best possible, there is no better approximation of $x$ in $\mathrm{span}\{e_1, \ldots, e_n\}$. The distance formula follows from the proof of Bessel's inequality.

Ex.
• In $\R^3$, what is the closest point in the plane spanned by $e_1 := \frac{1}{\sqrt{2}}(1,1,0)$ and $e_2 := (0,0,1)$ to the point $x = (2,1,1)$? We have \begin{align*} \langle x, e_1 \rangle e_1 + \langle x, e_2 \rangle e_2 &= \big( (2,1,1) \cdot {\textstyle \frac{1}{\sqrt{2}}} (1,1,0) \big) {\textstyle \frac{1}{\sqrt{2}}} (1,1,0) + \big( (2,1,1) \cdot (0,0,1) \big) (0,0,1)\\ &= {\textstyle \frac{3}{2}} (1,1,0) + (0,0,1) = {\textstyle (\frac{3}{2}, \frac{3}{2},1)}. \end{align*} The distance is $\big( \|x\|^2 - |\langle x, e_1 \rangle |^2 - |\langle x, e_2 \rangle |^2\big)^{1/2} = \big( 6 - {\textstyle \frac{9}{2}} - 1\big)^{1/2} = \frac{1}{\sqrt{2}},$ which can be checked to fit with $|(2,1,1) - {\textstyle (\frac{3}{2}, \frac{3}{2},1)} |$.
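The worked example above translates directly into code:

```python
import numpy as np

x = np.array([2.0, 1.0, 1.0])
e1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
e2 = np.array([0.0, 0.0, 1.0])

# Closest point in span{e1, e2}: sum of the Fourier coefficients times e_j.
y = (x @ e1) * e1 + (x @ e2) * e2
# Distance via the corollary's formula d^2 = ||x||^2 - sum |<x, e_j>|^2.
d = np.sqrt(x @ x - (x @ e1)**2 - (x @ e2)**2)

print(y)                                       # [1.5 1.5 1. ]
print(np.isclose(d, np.linalg.norm(x - y)))    # True
print(np.isclose(d, 1 / np.sqrt(2)))           # True
```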

### > Convergence as an $l_2$-property (in Hilbert spaces)

Let $\{e_j\}_{j \in \N}$ be an orthonormal sequence in a Hilbert space $H$, and $\{\lambda_j\}_{j \in \N}$ a sequence of scalars. Then $\exists\, \lim_{N \to \infty} \sum_{j = 1}^N \lambda_j e_j \quad \text{ in }\: H \quad \Longleftrightarrow\quad \sum_{j=1}^\infty |\lambda_j|^2 < \infty.$ In that case, $\| \sum_{j \in \N} \lambda_j e_j \|^2 = \sum_{j \in \N} |\lambda_j|^2$.

N.b. A consequence of this is that every infinite-dimensional separable Hilbert space can be identified with $l_2$. If the Hilbert space is finite-dimensional, it can be identified with $\R^n$ or $\C^n$; if it is not separable, it is strictly larger than $l_2$.

Proof

Let $x_n := \sum_{j=1}^n \lambda_j e_j$. For $m > n$, $\| x_m - x_n \|^2 = \Big\| \sum_{j=n+1}^m \lambda_j e_j \Big\|^2 = \sum_{j,k=n+1}^m \lambda_j \overline{\lambda_k} \langle e_j, e_k \rangle = \sum_{j=n+1}^m |\lambda_j|^2,$ meaning that $\{x_n\}_{n \in \N}$ is Cauchy exactly if the series $\sum_{j=1}^\infty |\lambda_j|^2$ converges in $\R$. Since $H$ is complete, this happens exactly if $\{x_n\}_{n \in \N}$ converges in $H$. A similar calculation shows that $\Big\| \sum_{j=1}^m \lambda_j e_j \Big\|^2 = \sum_{j=1}^m |\lambda_j|^2.$ When these sums converge, we may let $m \to \infty$ to obtain the desired equality.
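A numerical illustration of the criterion: $\lambda_j = 1/j$ is square-summable, so $\sum_j \lambda_j e_j$ converges in $l_2$ with squared norm $\sum_j 1/j^2 = \pi^2/6$, whereas $\lambda_j = 1/\sqrt{j}$ is not:

```python
import numpy as np

# lambda_j = 1/j is square-summable: with e_j the canonical basis of l2,
# the limit of the partial sums is the sequence (1, 1/2, 1/3, ...).
N = 10**6
lam = 1.0 / np.arange(1, N + 1)

partial_norm_sq = np.cumsum(lam**2)     # ||x_n||^2 along the partial sums
print(partial_norm_sq[-1])              # approaches pi^2 / 6 = 1.6449...

# lambda_j = 1/sqrt(j) is NOT square-summable: the squared norms of the
# partial sums are the harmonic numbers, which grow without bound.
print(np.sum(1.0 / np.arange(1, N + 1)))
```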

• An orthonormal system $\{e_j\}_j \subset H$ is called an orthonormal basis for $H$ if $x = \sum_{j} \langle x, e_j \rangle e_j \quad\text{ for all } x \in H.$
Ex.
• In $\R^n$, $\C^n$, $l_2(\R)$ and $l_2(\C)$, the canonical basis $\{e_j\}_j$ is also an orthonormal basis.
• The vectors ${\textstyle \frac{1}{\sqrt{2}}} (1,1,0), \quad {\textstyle \frac{1}{\sqrt{2}}}(1,-1,0), \quad (0,0,1)$ form an orthonormal basis for $\R^3$.
• $\{\frac{1}{\sqrt{2}},\cos(x), \sin(x), \cos(2x), \sin(2x), \ldots\}$ is an orthonormal basis for $L_2((-\pi,\pi),\R)$ if we equip it with the inner product $\langle f, g \rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x) g(x)\,dx.$ (One may also use the standard inner product and scale the functions with $1/\sqrt{\pi}$.)
• $\{e^{ikx}\}_{k\in \Z}$ is an orthonormal basis for $L_2((-\pi,\pi),\C)$ if we equip it with the inner product $\langle f, g \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx.$ Equivalently, one may use the standard inner product and scale the functions with $1/\sqrt{2\pi}$.
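Orthonormality of the trigonometric system under the scaled inner product can be checked by approximating the integrals on an equidistant grid (the grid size is an arbitrary choice):

```python
import numpy as np

# Midpoint grid on (-pi, pi) for the inner product <f, g> = (1/pi) int f g.
m = 20000
x = -np.pi + 2 * np.pi * (np.arange(m) + 0.5) / m
dx = 2 * np.pi / m

def ip(f, g):
    return np.sum(f * g) * dx / np.pi   # Riemann-sum approximation

funcs = [np.full_like(x, 1 / np.sqrt(2)),
         np.cos(x), np.sin(x), np.cos(2 * x), np.sin(2 * x)]

# Gram matrix of the first five basis functions: should be the identity.
G = np.array([[ip(f, g) for g in funcs] for f in funcs])
print(np.allclose(G, np.eye(5)))        # True
```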

### > The Fourier series theorem

Let $M = \{e_j\}_{j \in \N}$ be an orthonormal sequence in a Hilbert space $H$. Then the following are equivalent:

• (i) $M$ is complete.
• (ii) $\overline{\mathrm{span}(M)} = H$.
• (iii) $M$ is an orthonormal basis for $H$.
• (iv) For all $x \in H$, $\|x\|^2 = \sum_{j\in \N} | \langle x, e_j \rangle |^2$.

N.b.

• An analogous result holds for orthonormal systems (in particular: for finite sets).
• The last equality is known as Parseval's identity.

Proof

(i) $\Longrightarrow$ (ii): If $M$ is complete, then $M^\perp = \{0\}$, so that $\overline{\mathrm{span}(M)} = H$ (otherwise, by the corollary to the projection theorem, there would exist a non-zero vector orthogonal to $\overline{\mathrm{span}(M)}$, and hence to $M$).

(ii) $\Longrightarrow$ (iii): If $\overline{\mathrm{span}(M)} = H$, then, for any $x \in H$, there exist scalars $\{\lambda_j\}_{j \in \N}$ such that $\lim_{N \to \infty} \sum_{j=1}^N \lambda_j e_j = x.$ Since the Fourier coefficients are the best possible coefficients, $0 \leq \Big\| \sum_{j=1}^N \langle x, e_j \rangle e_j - x \Big\|^2 \leq \Big\| \sum_{j=1}^N \lambda_j e_j - x \Big\|^2 \to 0,$ so that $x = \sum_{j=1}^\infty \langle x, e_j \rangle e_j$.

(iii) $\Longrightarrow$ (iv): If $M$ is an orthonormal basis, it is immediate that $\|x\|^2 = \Big\langle \sum_{j\in \N} \langle x, e_j \rangle e_j, \sum_{j\in \N} \langle x, e_j \rangle e_j \Big\rangle = \sum_{j \in \N} | \langle x, e_j \rangle |^2.$

(iv) $\Longrightarrow$ (i): Finally, if $\|x\|^2 = \sum_{j \in \N} | \langle x, e_j \rangle |^2$ for all $x \in H$, and $x \perp M$, then $\|x\| = 0$. Hence, there is no non-zero vector in $M^\perp$, which is the definition of $M$ being complete.

Ex.
• Consider $L_2((-\pi,\pi),\C)$ with the orthonormal basis $\{e^{ikx}\}_{k\in \Z}$ and the inner product $\langle f, g \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx.$ The Fourier coefficients are given by $\hat f_k := \langle f, e^{ik\cdot} \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx}\,dx,$ and Parseval's identity states that $\|f\|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|^2\,dx = \sum_{k = -\infty}^\infty |\hat f_k|^2 = \sum_{k = -\infty}^\infty |\langle f, e^{ik\cdot} \rangle|^2.$
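With $f(x) = x$, a standard computation gives $\hat f_k = i(-1)^k/k$ for $k \neq 0$ and $\hat f_0 = 0$, while $\|f\|^2 = \frac{1}{2\pi}\int_{-\pi}^\pi x^2\,dx = \pi^2/3$; Parseval's identity then recovers the Basel sum $\sum_{k \neq 0} 1/k^2 = \pi^2/3$. A numerical check:

```python
import numpy as np

# Parseval for f(x) = x on (-pi, pi): |f_hat_k|^2 = 1/k^2 for k != 0,
# and the coefficient k = 0 vanishes, so the identity reads
# pi^2 / 3 = sum over k != 0 of 1/k^2.
K = 10**6
k = np.arange(1, K + 1)
coeff_sum = 2 * np.sum(1.0 / k**2)   # sum over positive and negative k

print(coeff_sum)                     # approaches pi^2 / 3 = 3.2898...
print(np.pi**2 / 3)
```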
1) Note that, in general, not all direct sums describe orthogonal subspaces.