
# Orthogonality

Consider an inner-product space $(X, \langle \cdot, \cdot \rangle)$ over a field $\K \in \{\R, \C\}$. When $X$ is complete, we shall write $H$ to indicate that it is a Hilbert space.

## The projection and Riesz representation theorems

### Orthogonal vectors and sets

• Two vectors $x,y \in X$ are said to be orthogonal, $x \perp y \defarrow \langle x,y \rangle = 0.$
• A vector $x \in X$ is said to be orthogonal to a set $M \subset X$, $x \perp M \defarrow \langle x, y \rangle = 0 \quad\text{ for all } y \in M.$
• Two sets $M,N \subset X$ are said to be orthogonal, $M \perp N \defarrow \langle x, y \rangle = 0 \quad\text{ for all } \quad x \in M, y \in N.$
• The orthogonal complement of a set $M \subset X$ consists of all vectors orthogonal to $M$: $M^\perp \quad\stackrel{\text{def.}}{=}\quad \{ x \in X \colon x \perp M\}.$

The word perpendicular is sometimes used interchangeably with 'orthogonal', but mostly in $\R^n$.

Ex.
• In $\R^3$, the vector $(1,2,1)$ is orthogonal to the plane $\{x_1 + 2x_2 + x_3 = 0\}$.
• In $L_2((-\pi,\pi),\C)$ the vectors $e^{ikx}$, $k \in \Z$, are all orthogonal to each other: $\langle e^{ik_1 x}, e^{ik_2x} \rangle = \int_{-\pi}^\pi e^{ik_1 x} \overline{e^{ik_2 x}}\,dx = \int_{-\pi}^\pi e^{i(k_1-k_2) x} \,dx = \frac{e^{i(k_1-k_2)x}}{i(k_1-k_2)}\bigg|_{-\pi}^{\pi} = 0 \quad\text{ for }\quad k_1 \neq k_2,$ by periodicity of $e^{ikx} = \cos(kx) + i\sin(kx)$.
• In $l_2(\R)$, with $e_1 = (1,0,0, \ldots)$, $\{e_1\}^\perp = \{ x \in l_2(\R) \colon x = (0,x_2, x_3, \ldots) \}.$
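The orthogonality of the exponentials $e^{ikx}$ can also be checked numerically. The following is a minimal sketch, assuming a midpoint-rule approximation of the $L_2((-\pi,\pi),\C)$ inner product; the helper names `inner_l2` and `exp_mode` are hypothetical and chosen only for this illustration.

```python
import cmath
import math

def inner_l2(f, g, n=2000):
    """Midpoint-rule approximation of the L2((-pi, pi), C) inner product <f, g>."""
    h = 2 * math.pi / n
    total = 0j
    for m in range(n):
        x = -math.pi + (m + 0.5) * h
        total += f(x) * g(x).conjugate()
    return total * h

def exp_mode(k):
    """Return the function x -> e^{ikx}."""
    return lambda x: cmath.exp(1j * k * x)

# Gram matrix of the modes k = -2, ..., 2 (illustrative range).
gram = {(k1, k2): inner_l2(exp_mode(k1), exp_mode(k2))
        for k1 in range(-2, 3) for k2 in range(-2, 3)}
```

Up to rounding, the off-diagonal entries of `gram` vanish and the diagonal entries equal $2\pi$, in line with the integral computed above.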

• In a Hilbert space, $H = \oplus_{j=1}^m H_j$ for subspaces $H_j \subset H$ means that every $x \in H$ can be written as $x = \sum_{j=1}^m x_j, \quad x_j \in H_j, \qquad H_j \perp H_k \text{ for } j \neq k,$ which gives a unique representation of any $x \in H$ in terms of elements in orthogonal subspaces.1)

Ex.
• $\R^3 = \mathrm{span}\{(1,0,0)\} \oplus \mathrm{span}\{(0,1,0),(0,0,1)\},$ but also $\R^3 = \mathrm{span}\{(1,2,1)\} \oplus \{x_1 + 2x_2 + x_3 = 0\}.$
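The second decomposition can be made concrete by splitting a vector into its part along $(1,2,1)$ and a remainder in the plane. This is only a sketch: the vector `x` is an arbitrary choice and the helper `split_along` is a hypothetical name.

```python
def dot(u, v):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def split_along(x, n):
    """Split x into a multiple of n and a remainder orthogonal to n."""
    c = dot(x, n) / dot(n, n)
    along = tuple(c * a for a in n)
    rest = tuple(a - b for a, b in zip(x, along))
    return along, rest

n = (1.0, 2.0, 1.0)      # normal vector of the plane x_1 + 2 x_2 + x_3 = 0
x = (3.0, -1.0, 4.0)     # arbitrary vector, chosen for illustration
along, rest = split_along(x, n)
```

Here `along` lies in $\mathrm{span}\{(1,2,1)\}$, `rest` satisfies the plane equation, and the two add up to `x`.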

### > The projection theorem

Let $M \subset H$ be a closed linear subspace of a Hilbert space $H$. Then $H = M \oplus M^\perp$.

Proof

Existence of $y_0 \in M$: Pick $x_0 \in H$. By the minimal distance theorem, there exists a unique point $y_0 \in M$ with $\|x_0 - y_0\| = \inf_{y \in M} \| x_0 -y\|.$

Orthogonality of $x_0 - y_0$ to $M$: Since $M$ is a subspace, $y_0 + \lambda y \in M$ for any $y \in M$, $\lambda \in \K$. Hence $\|x_0 - y_0\|^2 \leq \| x_0 - y_0 - \lambda y\|^2 = \| x_0 - y_0\|^2 - 2 \Re (\lambda \langle y, x_0 - y_0 \rangle) + |\lambda|^2 \|y\|^2,$ and $-2 \Re (\lambda \langle y, x_0 - y_0 \rangle) + |\lambda|^2 \|y\|^2 \geq 0.$ Taking $\lambda = \varepsilon > 0$, dividing by $\varepsilon$ and letting $\varepsilon \to 0$, we see that $\Re \langle y, x_0 - y_0 \rangle \leq 0,$ and, similarly, by taking $\lambda = -i \varepsilon$ (when $\K = \C$), that $\Im \langle y, x_0 - y_0 \rangle \leq 0.$ Since $y \in M$ is arbitrary, exchanging $-y$ for $y$ gives the reverse inequalities, and we obtain $\langle y, x_0 - y_0 \rangle = 0 \quad\text{ for any }\quad y \in M.$ Thus we can write $x_0 = y_0 + (x_0 - y_0), \quad\text{ where }\quad y_0 \in M, \quad x_0 - y_0 \in M^\perp.$

Uniqueness: If we have two representations $x_0 = y_0 + z_0$ and $x_0 = \tilde y_0 + \tilde z_0$, then $M \ni y_0 - \tilde y_0 = \tilde z_0 - z_0 \in M^\perp,$ but only the zero vector is orthogonal to itself, implying that $y_0 = \tilde y_0$ and $z_0 = \tilde z_0$.

Ex.
• The null space of a matrix $A \in M_{m\times n}(\R)$ is a closed linear subspace, so that $\R^n = \mathrm{ker}(A) \oplus (\mathrm{ker}(A))^\perp$. The geometric rank–nullity theorem characterizes the orthogonal complement as the range of the transpose matrix: $\R^n = \mathrm{ker}(A) \oplus \mathrm{ran}(A^t).$
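The decomposition $\R^n = \mathrm{ker}(A) \oplus \mathrm{ran}(A^t)$ can be illustrated for a small matrix. The sketch below assumes a hypothetical $2 \times 3$ matrix `A` and uses Gram-Schmidt orthonormalization of its rows to project onto $\mathrm{ran}(A^t)$; this is one possible way to compute the splitting, not the only one.

```python
import math

def dot(u, v):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: return an orthonormal list spanning the same space."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = dot(w, e)
            w = [a - c * b for a, b in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip (numerically) dependent vectors
            basis.append([a / norm for a in w])
    return basis

def project(x, basis):
    """Orthogonal projection of x onto the span of an orthonormal list."""
    y = [0.0] * len(x)
    for e in basis:
        c = dot(x, e)
        y = [a + c * b for a, b in zip(y, e)]
    return y

A = [[1.0, 2.0, 1.0],
     [0.0, 1.0, 1.0]]                # hypothetical example matrix
x = [2.0, 1.0, 1.0]
row_basis = orthonormalize(A)        # orthonormal basis of ran(A^t)
r = project(x, row_basis)            # component of x in ran(A^t)
k = [a - b for a, b in zip(x, r)]    # remainder, expected to lie in ker(A)
Ak = [dot(row, k) for row in A]      # A applied to the remainder
```

For this matrix, $\mathrm{ker}(A) = \mathrm{span}\{(-1,1,-1)\}$, and the remainder `k` is a multiple of that vector, so `Ak` vanishes up to rounding.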

### > Corollary: strict subspace characterization

If $M \subsetneq H$ is a closed linear subspace of $H$, there exists a non-zero vector $z_0 \in H$ with $z_0 \perp M$.

Proof

Since $M \neq H$ there exists $x_0 \in H \setminus M$. According to the projection theorem, $x_0 = y_0 + z_0$ with $y_0 \in M$, $z_0 \in M^\perp$. Then $z_0 \neq 0$, since otherwise $x_0 = y_0 \in M$, and $z_0 \perp M$ is the vector we are looking for.

Ex.
• Let $M = \overline{l_0}$ be the closure of $l_0 = \{ x \in l_2 \colon \{x_j\}_{j \in \N} \text{ has finitely many non-zero entries} \}$ in $l_2$. Is $M = l_2$? Say there exists $z \in l_2$ such that $z \perp M$. Since $\{e_j\}_{j\in \N} \subset M$, we have $\langle z, e_j \rangle = z_j = 0 \quad\text{ for all }\quad j \in \N.$ Thus $z = 0$, and $\overline{l_0} = l_2$.

### > The Riesz representation theorem

A Hilbert space is its own dual: every bounded linear functional $T \in B(H, \K)$ is given by an inner product, $Tx = \langle x, z \rangle,$ for a unique $z \in H$. Moreover, $\|T\|_{B(H,\K)} = \|z\|_H$.

N.b. Conversely, for any fixed $y \in H$, the function $x \mapsto \langle x, y \rangle$ defines a bounded linear functional on $H$.

Proof

Existence: Let $N = \mathrm{ker}(T).$ Then $N$ is a closed linear subspace of $H$. If $N = H$, we have $T = 0$ in $B(H,\K)$ and $Tx = \langle x, 0 \rangle$.

Assume now that $N \neq H$. According to the corollary above, there exists $z_0 \in N^\perp$, $z_0 \neq 0$. Since $N \cap N^\perp = \{0\}$, we have $z_0 \notin \mathrm{ker}(T)$, that is, $T z_0 \neq 0$. Consequently, $x - \frac{Tx}{Tz_0} z_0 \in \mathrm{ker}(T) \quad \text{ for all }\quad x \in H,$ implying $\Big\langle x - \frac{Tx}{Tz_0} z_0, z_0 \Big\rangle = 0 \quad \Leftrightarrow\quad Tx \Big\langle \frac{1}{T z_0} z_0, z_0 \Big\rangle = \langle x, z_0 \rangle \quad \Leftrightarrow\quad Tx = \frac{Tz_0}{\| z_0\|^2} \langle x, z_0 \rangle = \Big\langle x, \frac{\overline{Tz_0}}{\| z_0\|^2} z_0 \Big\rangle.$ Thus $Tx = \langle x, z \rangle \quad \text{ for }\quad z := \frac{\overline{Tz_0}}{\|z_0\|^2} z_0.$

Uniqueness: If, in addition, $Tx = \langle x, w \rangle \quad\text{ for all }\quad x \in H,$ then $\langle x, z- w \rangle = Tx - Tx = 0 \quad \text{ for all }\quad x \in H,$ so that $z = w$.

Equality of norms: We have $\|T\| = \sup_{\|x\| = 1}|Tx| = \sup_{\|x\| = 1}|\langle x, z \rangle| \leq \sup_{\|x\| = 1} \|x\| \|z\| = \|z\|,$ by the Cauchy–Schwarz inequality. Contrariwise, $\|z\|^2 = \langle z, z \rangle = |Tz| \leq \|T\| \|z\| \quad\Longrightarrow\quad \|z\| \leq \|T\|.$ Thus $\|T\| = \|z\|$.

Ex.
• $\C^n$ is its own dual: every bounded linear functional on $\C^n$ is realized by a dot product: $Tx = x \cdot \overline{y}$.
• $L_2(\R,\R)$ is its own dual: every bounded linear functional on $L_2(\R,\R)$ is realized by an inner product: $Tf = \int_\R f(s) \overline{g(s)}\,ds.$ By the Cauchy–Schwarz inequality, $|Tf| = | \langle f,g \rangle | = \Big| \int_\R f(s) \overline{g(s)}\,ds \Big| \leq \Big( \int_\R |f(s)|^2\,ds \Big)^{1/2} \Big( \int_\R |g(s)|^2\,ds \Big)^{1/2} = \|f\| \|g\|,$ with equality for $f = \lambda g$; hence, $\|T\| = \|g\|$.
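In $\C^n$ the representing vector can be read off from the values of $T$ on the canonical basis, since $\langle e_j, z \rangle = \overline{z_j}$ forces $z_j = \overline{T e_j}$. The sketch below assumes an inner product that is linear in the first argument, as in these notes; the functional `T` and the helper `riesz_vector` are hypothetical and serve only as an illustration.

```python
def inner(x, y):
    """Inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def riesz_vector(T, n):
    """Representing vector z with T(x) = <x, z>, using z_j = conj(T(e_j))."""
    z = []
    for j in range(n):
        e_j = [1.0 + 0j if i == j else 0j for i in range(n)]
        z.append(complex(T(e_j)).conjugate())
    return z

def T(x):
    """A hypothetical linear functional on C^3, chosen for illustration."""
    return (2 - 1j) * x[0] + 3j * x[1] - x[2]

z = riesz_vector(T, 3)
x = [1 + 2j, -1j, 0.5 + 0j]   # arbitrary test vector
```

One can then compare `T(x)` with `inner(x, z)`, and observe that `T(z)` equals $\|z\|^2$, which is the step used in the proof to show $\|z\| \leq \|T\|$.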

## Orthonormal systems, Bessel's inequality and the Fourier series theorem

• A sequence $\{e_j\}_{j \in \N}$ is called orthogonal if $e_j \perp e_k$ for $j \neq k$. If, in addition, $\|e_j\| = 1$ for all $j \in \N$, it is called orthonormal.
• An orthonormal sequence $K = \{e_j\}_{j \in \N}$ is called complete if there are no non-zero vectors orthogonal to it: $K \text{ complete} \defarrow K^\perp = \{0\}.$
• If $\{e_j\}_{j \in \N} \subset H$ is an orthonormal sequence, the projection $\langle x, e_j \rangle$ is called the $j$th Fourier coefficient of $x$. The series $\sum_{j \in \N} \langle x, e_j \rangle e_j$ is called the Fourier series of $x$ with respect to the sequence $\{e_j\}_{j \in \N}$.

N.b. All the above definitions carry over to general (finite or infinite) sets, called orthonormal systems.

Ex.
• The canonical basis $\{e_j\}_j$ is an orthonormal system in $\R^n$, $\C^n$ and $l_2$ (real or complex).
• The sequence $\{\frac{1}{\sqrt{2\pi}} e^{ikx}\}_{k \in \Z}$ is an orthonormal sequence in $L_2((-\pi,\pi),\C)$, since $\|{\textstyle \frac{1}{\sqrt{2\pi}}} e^{ikx}\| = \Big( \int_{-\pi}^{\pi} {\textstyle \frac{1}{\sqrt{2\pi}}} e^{ikx} \overline{{\textstyle \frac{1}{\sqrt{2\pi}}} e^{ikx}} \,dx \Big)^{1/2} = \Big( \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{ikx} e^{-ikx}\,dx \Big)^{1/2} = 1.$

### > Bessel's inequality

An orthonormal sequence $\{e_j\}_{j \in \N}$ in $X$ satisfies $\sum_{j \in \N} | \langle x, e_j \rangle |^2 \leq \| x\|^2,$ for all $x \in X$.

Proof

\begin{align*} 0 &\leq \Big\| x - \sum_{j=1}^N \langle x, e_j \rangle e_j \Big\|^2\\ &= \|x\|^2 - 2 \Re \Big \langle x, \sum_{j=1}^N \langle x, e_j\rangle e_j \Big\rangle + \Big\langle \sum_{j=1}^N \langle x, e_j\rangle e_j, \sum_{k=1}^N \langle x, e_k\rangle e_k \Big\rangle\\ &= \|x\|^2 - 2 \Re \sum_{j=1}^N \overline{\langle x, e_j \rangle} \langle x, e_j \rangle + \sum_{j=1}^N \sum_{k = 1}^N \langle x, e_j \rangle \overline{\langle x, e_k \rangle} \langle e_j, e_k\rangle\\ &= \|x\|^2 - \sum_{j=1}^N |\langle x, e_j \rangle|^2. \end{align*} Thus $\sum_{j=1}^N |\langle x, e_j \rangle|^2 \leq \|x\|^2,$ irrespective of $N \in \N$. Bessel's inequality is obtained by letting $N \to \infty$.
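The identity behind Bessel's inequality can be checked on a small example. The following sketch uses two orthonormal vectors in $\R^3$ that do not span the whole space, so the inequality is strict; the vectors and the point `x` are assumptions made for illustration.

```python
def inner(x, y):
    """Inner product, linear in the first argument (works for real or complex entries)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

s = 2 ** -0.5
e = [[s, s, 0.0], [s, -s, 0.0]]   # orthonormal, but does not span R^3
x = [2.0, 1.0, 1.0]

coeffs = [inner(x, ej) for ej in e]                # Fourier coefficients <x, e_j>
bessel_sum = sum(abs(c) ** 2 for c in coeffs)      # sum of |<x, e_j>|^2
norm_sq = inner(x, x)                              # ||x||^2
# Residual x - sum_j <x, e_j> e_j, computed coordinate by coordinate.
residual = [a - sum(c * ej[i] for c, ej in zip(coeffs, e))
            for i, a in enumerate(x)]
```

Here `bessel_sum` is $5$ and `norm_sq` is $6$ up to rounding, and the squared norm of `residual` equals their difference, as in the last line of the proof.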

### > Fourier coefficients are best possible coefficients

An orthonormal sequence satisfies $\Big\| x - \sum_{j=1}^N \lambda_j e_j \Big\| \geq \Big\| x - \sum_{j=1}^N \langle x, e_j \rangle e_j \Big\|,$ for any $N \in \N$ and any scalars $\lambda_1, \ldots, \lambda_N \in \K$. Equality holds if and only if $\lambda_j = \langle x, e_j \rangle$ for all $j = 1, \ldots, N$.

Proof

\begin{align*} \Big\| x - \sum_{j=1}^N \lambda_j e_j \Big\|^2 &= \|x\|^2 - 2 \Re \Big\langle x, \sum_{j=1}^N \lambda_j e_j \Big\rangle + \Big\| \sum_{j=1}^N \lambda_j e_j \Big\|^2\\ &= \|x\|^2 - 2 \Re \sum_{j=1}^N \overline{\lambda_j} \langle x, e_j \rangle + \sum_{j=1}^N |\lambda_j|^2\\ &= \|x\|^2 + \sum_{j=1}^N |\langle x, e_j \rangle - \lambda_j |^2 - \sum_{j=1}^N |\langle x, e_j \rangle|^2\\ &\geq \|x\|^2 - \sum_{j=1}^N |\langle x, e_j \rangle|^2\\ & = \Big\| x - \sum_{j=1}^N \langle x, e_j \rangle e_j \Big\|^2, \end{align*} where the last equality comes from the proof of Bessel's inequality.
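The third line of this computation says that the squared error exceeds the optimal one by exactly $\sum_j |\langle x, e_j \rangle - \lambda_j|^2$. A minimal numerical sketch, assuming a hypothetical orthonormal pair in $\R^3$ and randomly perturbed coefficients:

```python
import random

def inner(x, y):
    """Inner product, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def error_sq(x, e, lam):
    """Squared distance from x to the combination sum_j lam_j e_j."""
    approx = [sum(l * ej[i] for l, ej in zip(lam, e)) for i in range(len(x))]
    diff = [a - b for a, b in zip(x, approx)]
    return inner(diff, diff)

s = 2 ** -0.5
e = [[s, s, 0.0], [0.0, 0.0, 1.0]]    # hypothetical orthonormal pair
x = [2.0, 1.0, 1.0]
fourier = [inner(x, ej) for ej in e]  # Fourier coefficients
best = error_sq(x, e, fourier)        # error with Fourier coefficients

random.seed(0)                        # fixed seed for reproducibility
trials = []
for _ in range(100):
    lam = [c + random.uniform(-1, 1) for c in fourier]
    gap = sum(abs(c - l) ** 2 for c, l in zip(fourier, lam))
    trials.append((error_sq(x, e, lam), gap))
```

Each pair `(err, gap)` in `trials` satisfies `err = best + gap` up to rounding, so no perturbed choice of coefficients improves on the Fourier coefficients.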

### > Corollary: closest point

If $\{e_1, \ldots, e_n\}$ is an orthonormal system, then $y = \sum_{j=1}^n \langle x,e_j\rangle e_j$ is the closest point to $x$ in $\mathrm{span}\{e_1, \ldots, e_n\}$, with $d = \|x-y\|$ given by $d^2 = \|x\|^2 - \sum_{j=1}^n |\langle x, e_j \rangle|^2.$

N.b. In particular, if $x \in \mathrm{span}\{e_1, \ldots, e_n\}$, then $x = \sum_{j=1}^n \langle x,e_j\rangle e_j$.

Proof

Since Fourier coefficients are best possible, there is no better approximation of $x$ in $\mathrm{span}\{e_1, \ldots, e_n\}$. The distance formula follows from the proof of Bessel's inequality.

Ex.
• In $\R^3$, what is the closest point in the plane spanned by $e_1 := \frac{1}{\sqrt{2}}(1,1,0)$ and $e_2 := (0,0,1)$ to the point $x = (2,1,1)$? We have \begin{align*} \langle x, e_1 \rangle e_1 + \langle x, e_2 \rangle e_2 &= \big( (2,1,1) \cdot {\textstyle \frac{1}{\sqrt{2}}} (1,1,0) \big) {\textstyle \frac{1}{\sqrt{2}}} (1,1,0) + \big( (2,1,1) \cdot (0,0,1) \big) (0,0,1)\\ &= {\textstyle \frac{3}{2}} (1,1,0) + (0,0,1) = {\textstyle (\frac{3}{2}, \frac{3}{2},1)}. \end{align*} The distance is $\big( \|x\|^2 - |\langle x, e_1 \rangle |^2 - |\langle x, e_2 \rangle |^2\big)^{1/2} = \big( 6 - {\textstyle \frac{9}{2}} - 1\big)^{1/2} = \frac{1}{\sqrt{2}},$ which can be checked to fit with $|(2,1,1) - {\textstyle (\frac{3}{2}, \frac{3}{2},1)} |$.

### > Convergence as an l2-property (in Hilbert spaces)

Let $\{e_j\}_{j \in \N}$ be an orthonormal sequence in a Hilbert space $H$, and $\{\lambda_j\}_{j \in \N}$ a sequence of scalars. Then $\exists\, \lim_{N \to \infty} \sum_{j = 1}^N \lambda_j e_j \quad \text{ in }\: H \quad \Longleftrightarrow\quad \sum_{j=1}^\infty |\lambda_j|^2 < \infty.$ In that case, $\| \sum_{j \in \N} \lambda_j e_j \|^2 = \sum_{j \in \N} |\lambda_j|^2$.

N.b. A consequence of this is that every infinite-dimensional separable Hilbert space can be identified with $l_2$. If the Hilbert space is finite-dimensional, it can be identified with $\R^n$ or $\C^n$; if it is not separable, it has no countable orthonormal basis and is in that sense bigger than $l_2$.

Proof

Let $x_n := \sum_{j=1}^n \lambda_j e_j$. For $m > n$, $\| x_m - x_n \|^2 = \Big\| \sum_{j=n+1}^m \lambda_j e_j \Big\|^2 = \sum_{j,k=n+1}^m \lambda_j \overline{\lambda_k} \langle e_j, e_k \rangle = \sum_{j=n+1}^m |\lambda_j|^2,$ meaning that $\{x_n\}_{n \in \N}$ is Cauchy exactly if $\sum_{j=1}^\infty |\lambda_j|^2$ converges in $\R$. Since $H$ is complete, this happens exactly if $\{x_n\}_{n \in \N}$ converges in $H$. A similar calculation shows that $\Big\| \sum_{j=1}^m \lambda_j e_j \Big\|^2 = \sum_{j=1}^m |\lambda_j|^2.$ When (one of) these sums converge we may let $m \to \infty$ to obtain the desired equality.
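As an illustration, take $\lambda_j = 1/j$ and the canonical basis of $l_2$: the series $\sum_j e_j / j$ converges in $l_2$ because $\sum_j 1/j^2 = \pi^2/6 < \infty$, even though $\sum_j 1/j$ diverges. The sketch below works with truncated coordinate vectors; the truncation length `dim` and the helper names are assumptions of this example.

```python
import math

def partial_sum(n, dim):
    """First `dim` coordinates of x_n = sum_{j <= n} e_j / j in l2."""
    return [1.0 / j if j <= n else 0.0 for j in range(1, dim + 1)]

def norm_sq(v):
    """Squared l2-norm of a real coordinate vector."""
    return sum(a * a for a in v)

dim = 2000
x_100 = partial_sum(100, dim)
x_2000 = partial_sum(2000, dim)

# ||x_m - x_n||^2 for m = 2000, n = 100, compared with the tail of sum 1/j^2.
cauchy_gap = norm_sq([a - b for a, b in zip(x_2000, x_100)])
tail = sum(1.0 / j ** 2 for j in range(101, 2001))
limit_norm_sq = math.pi ** 2 / 6      # value of sum_{j >= 1} 1/j^2
```

The quantity `cauchy_gap` agrees with `tail`, and `norm_sq(x_2000)` approaches `limit_norm_sq` from below, matching the norm identity in the statement.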

• An orthonormal system $\{e_j\}_j \subset H$ is called an orthonormal basis for $H$ if $x = \sum_{j} \langle x, e_j \rangle e_j \quad\text{ for all } x \in H.$

Ex.
• In $\R^n$, $\C^n$, $l_2(\R)$ and $l_2(\C)$, the canonical basis $\{e_j\}_j$ is also an orthonormal basis.
• The vectors ${\textstyle \frac{1}{\sqrt{2}}} (1,1,0), \quad {\textstyle \frac{1}{\sqrt{2}}}(1,-1,0), \quad (0,0,1)$ form an orthonormal basis for $\R^3$.
• $\{\frac{1}{\sqrt{2}},\cos(x), \sin(x), \cos(2x), \sin(2x), \ldots\}$ is an orthonormal basis for $L_2((-\pi,\pi),\R)$ if we equip it with the inner product $\langle f, g \rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x) g(x)\,dx.$ (One may also use the standard inner product and scale the functions with $1/\sqrt{\pi}$.)
• $\{e^{ikx}\}_{k\in \Z}$ is an orthonormal basis for $L_2((-\pi,\pi),\C)$ if we equip it with the inner product $\langle f, g \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx.$ Equivalently, one may use the standard inner product and scale the functions with $1/\sqrt{2\pi}$.

### > The Fourier series theorem

Let $M = \{e_j\}_{j \in \N}$ be an orthonormal sequence in a Hilbert space $H$. Then the following are equivalent:

• $M$ is complete.
• $\overline{\mathrm{span}(M)} = H$.
• $M$ is an orthonormal basis for $H$.
• For all $x \in H$, $\|x\|^2 = \sum_{j\in \N} | \langle x, e_j \rangle |^2$.

N.b.

• An analogous result holds for orthonormal systems (in particular: for finite sets).
• The last equality is known as Parseval's identity.

Proof

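(i) $\Longrightarrow$ (ii): If $M$ is complete, then $M^\perp = \{0\}$. Since $\overline{\mathrm{span}(M)}$ is a closed linear subspace, it must equal $H$: otherwise, by the corollary to the projection theorem, there would exist a non-zero vector orthogonal to $\overline{\mathrm{span}(M)}$, and hence to $M$.

(ii) $\Longrightarrow$ (iii): If $\overline{\mathrm{span}(M)} = H$, then, for any $x \in H$ and any $\varepsilon > 0$, there exist $N \in \N$ and scalars $\lambda_1, \ldots, \lambda_N$ such that $\Big\| \sum_{j=1}^N \lambda_j e_j - x \Big\| < \varepsilon.$ Since Fourier coefficients are best possible, $\varepsilon^2 > \Big\| \sum_{j=1}^N \lambda_j e_j - x \Big\|^2 \geq \Big\| \sum_{j=1}^N \langle x, e_j \rangle e_j - x \Big\|^2 = \|x\|^2 - \sum_{j=1}^N |\langle x, e_j \rangle|^2,$ and the right-hand side does not increase with $N$. Hence $\big\| \sum_{j=1}^n \langle x, e_j \rangle e_j - x \big\| < \varepsilon$ for all $n \geq N$, so that $x = \sum_{j=1}^\infty \langle x, e_j \rangle e_j$.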

(iii) $\Longrightarrow$ (iv): If $M$ is an orthonormal basis, it is immediate that $\|x\|^2 = \Big\langle \sum_{j\in \N} \langle x, e_j \rangle e_j, \sum_{j\in \N} \langle x, e_j \rangle e_j \Big\rangle = \sum_{j \in \N} | \langle x, e_j \rangle |^2.$

(iv) $\Longrightarrow$ (i): Finally, if $\|x\|^2 = \sum_{j \in \N} | \langle x, e_j \rangle |^2$ for all $x \in H$, and $x \perp M$, then $\|x\| = 0$. Hence, there is no non-zero vector in $M^\perp$, which is the definition of $M$ being complete.

Ex.
• Consider $L_2((-\pi,\pi),\C)$ with the orthonormal basis $\{e^{ikx}\}_{k\in \Z}$ and the inner product $\langle f, g \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx.$ The Fourier coefficients are given by $\hat f_k := \langle f, e^{ik\cdot} \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx}\,dx,$ and Parseval's identity states that $\|f\|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|^2\,dx = \sum_{k = -\infty}^\infty |\hat f_k|^2 = \sum_{k = -\infty}^\infty |\langle f, e^{ik\cdot} \rangle|^2.$
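For a concrete instance, take $f(x) = x$. Integration by parts gives $\hat f_0 = 0$ and $\hat f_k = i(-1)^k/k$ for $k \neq 0$, while $\|f\|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} x^2\,dx = \pi^2/3$, so Parseval's identity reduces to $2\sum_{k \geq 1} 1/k^2 = \pi^2/3$. The sketch below approximates the coefficients with a midpoint rule; the cutoff `K`, the number of quadrature points and the function names are assumptions of this example.

```python
import cmath
import math

def fourier_coefficient(f, k, n=2000):
    """Midpoint-rule approximation of (1/2pi) * integral of f(x) e^{-ikx} over (-pi, pi)."""
    h = 2 * math.pi / n
    total = 0j
    for m in range(n):
        x = -math.pi + (m + 0.5) * h
        total += f(x) * cmath.exp(-1j * k * x)
    return total * h / (2 * math.pi)

def f(x):
    """The function f(x) = x on (-pi, pi)."""
    return x

K = 50                                    # illustrative cutoff
coeffs = {k: fourier_coefficient(f, k) for k in range(-K, K + 1)}
partial = sum(abs(c) ** 2 for c in coeffs.values())   # truncated Parseval sum
norm_sq = math.pi ** 2 / 3                # (1/2pi) * integral of x^2 over (-pi, pi)
```

The truncated sum `partial` stays below `norm_sq`, as Bessel's inequality requires, and the gap shrinks as `K` grows.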
1) Note that, in general, not all direct sums describe orthogonal subspaces.