\[ \newcommand{\R}{\mathbb{R}} \newcommand{\N}{\mathbb{N}} \newcommand{\defarrow}{\quad \stackrel{\text{def}}{\Longleftrightarrow} \quad} \]

Linear transformations

Let \(X\) and \(Y\) be vector spaces (both real, or both complex), and \(T \colon X \to Y\) a mapping between them.

Linear transformations

We say that \(T\) is a linear transformation (or just linear) if it preserves the linear structure of a vector space: \[ T \text{ linear } \defarrow T(\lambda x+ \mu y) = \lambda Tx + \mu Ty, \qquad x,y \in X, \: \mu, \lambda \in \mathbb R \: (\text{or } \mathbb C). \]

Ex.
  • Any matrix \(A \in M_{m \times n}(\R)\) defines a linear transformation \(\R^n \to \R^m\): \[ \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}}_{x} \mapsto \underbrace{ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn}\\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.}_{Ax}\]
  • The integral operator defined by \( Tf(t) := \int_0^t f(s)\,ds \) is a linear transformation on \(C(I,\R)\): \[ T \colon C(I,\R) \to C(I,\R), \quad Tf = \bigg[ t \mapsto \int_0^t f(s)\,ds\bigg].\]
  • A slight modification, \[ Tf := \int_0^1 f(s)\,ds, \] yields a linear transformation \(C(I,\R) \to \R\) (given that \([0,1] \subset I\)). 1)
  • For any polynomial \(p = \sum_{j=0}^k a_j x^j \in P_k(\R)\), the differential operator \( p(D) := \sum_{j=0}^k a_j D^j\) is a linear transformation: \[ p(D) \colon C^k(I,\R) \to C(I,\R), \qquad p(D)f = \sum_{j=0}^k a_j f^{(j)}. \] Here, \(D = \frac{d}{dx}\) is the standard differentiation operator.
  • The shift operator \( T \colon (x_1, x_2, \ldots) \mapsto (0,x_1,x_2, \ldots),\) is a linear transformation \(l_p \to l_p\), for any \(p\), \(1 \leq p \leq \infty\): \[T(\lambda x + \mu y) = (0, \lambda x_1 + \mu y_1, \ldots) = \lambda (0, x_1, \ldots) + \mu (0,y_1, \ldots) = \lambda Tx + \mu Ty. \] Note that \(\|Tx\|_{l_p} = \|x\|_{l_p}\) guarantees that \(\mathrm{ran}(T) \subset l_p\).
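These defining properties are easy to test numerically. The following sketch (Python/NumPy; the matrix, the test vectors and the truncation length are arbitrary choices, not part of the notes) checks linearity for a matrix map and for a truncated version of the shift operator, together with the norm identity \(\|Tx\|_{l_p} = \|x\|_{l_p}\).

```python
import numpy as np

rng = np.random.default_rng(0)

# A matrix A in M_{3x4}(R), acting as x -> Ax, and the shift operator acting
# on finite truncations (x_1, ..., x_N) -> (0, x_1, ..., x_N).
A = rng.standard_normal((3, 4))

def shift(x):
    """(x_1, ..., x_N) -> (0, x_1, ..., x_N)."""
    return np.concatenate(([0.0], x))

x, y = rng.standard_normal(4), rng.standard_normal(4)
lam, mu = 2.0, -3.5

# Linearity: T(lam*x + mu*y) = lam*Tx + mu*Ty for both examples.
assert np.allclose(A @ (lam * x + mu * y), lam * (A @ x) + mu * (A @ y))
assert np.allclose(shift(lam * x + mu * y), lam * shift(x) + mu * shift(y))

# The shift preserves the l_p norm (checked here for p = 2 and p = infinity).
assert np.isclose(np.linalg.norm(shift(x), 2), np.linalg.norm(x, 2))
assert np.isclose(np.linalg.norm(shift(x), np.inf), np.linalg.norm(x, np.inf))
```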

The set of linear transformations as a vector space

The set of linear transformations \(X \to Y\) is denoted by \(L(X,Y)\): \[ L(X,Y) \stackrel{\text{def.}}{=} \{ T \colon X \to Y \text{ linear}\} \]

If \(X = Y\), we may abbreviate \(L(X,X)\) by \(L(X)\).

› L(X,Y) is a vector space

If, for all \(S,T \in L(X,Y)\), we define \[ (T+S)(x) := Tx + Sx \quad\text{ and }\quad (\lambda T)x := \lambda (Tx),\] for all \(x \in X\) and \(\lambda \in \R\) (or \(\mathbb C\)), it is easily checked that \(L(X,Y)\) becomes a vector space. In particular, \(\mu T + \lambda S \in L(X,Y)\) for any \(S,T \in L(X,Y)\).

Ex.
  • The set of \(m \times n\)-matrices \(M_{m\times n}(\R)\) forms a real vector space. As we shall see, \(M_{m \times n}(\R) \cong L(\R^n,\R^m)\).

› A linear transformation is determined by its action on any basis

Let \(X\) be a finite-dimensional 2) vector space with basis \(\{e_1, \ldots, e_n\}\). For any values \(y_1, \ldots, y_n \in Y\) there exists exactly one linear transformation \(T \in L(X,Y)\) such that \[ Te_j = y_j, \qquad j =1,\ldots,n.\]

Proof


Any \(x \in X\) has a unique representation \(x = \sum_{j=1}^n x_j e_j\). Define \(T\) through \[Tx = \sum_{j=1}^n x_j y_j.\] Then \(Te_j = y_j\), and \(T\) is linear: the coordinates of \(\lambda x + \mu x'\) are \(\lambda x_j + \mu x'_j\), so that \(T(\lambda x + \mu x') = \sum_{j=1}^n (\lambda x_j + \mu x'_j)\, y_j = \lambda Tx + \mu Tx'\) (in effect, \(T\) acts as multiplication with the \(1 \times n\) row \((y_1, \ldots, y_n)\)). Moreover, if \(S \in L(X,Y)\) also satisfies \(Se_j = y_j\), then \[ Sx = S\big( \sum_{j=1}^n x_j e_j\big) = \sum_{j=1}^n x_j S e_j = \sum_{j=1}^n x_j y_j = Tx, \qquad\text{ for all } x \in X,\] so that \(S = T\) in \(L(X,Y)\).


Ex.
  • The columns \(A_j\) of an \(m \times n\)-matrix \(A\) are determined by its action on the standard basis \(\{e_j\}_{j=1}^n\): \[ Ae_j = A_j, \qquad j = 1, \ldots, n. \] Here \(A_j\) plays the role of \(y_j\) in the above theorem.
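A minimal numerical illustration of both directions (Python/NumPy; the matrix and the prescribed values \(y_j\) below are arbitrary choices): \(Ae_j\) returns the \(j\)-th column of \(A\), and stacking prescribed values \(y_j\) as columns builds the unique \(T\) with \(Te_j = y_j\).

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])            # A in M_{2x3}(R)
n = A.shape[1]
E = np.eye(n)                           # columns: standard basis e_1, ..., e_n

# A e_j is the j-th column A_j of A.
for j in range(n):
    assert np.allclose(A @ E[:, j], A[:, j])

# Conversely, prescribing values y_j on the basis and stacking them as columns
# gives the unique linear map T with T e_j = y_j, and Tx = sum_j x_j y_j.
ys = [np.array([1., 0.]), np.array([1., 1.]), np.array([0., -2.])]
T = np.column_stack(ys)
x = np.array([3., -1., 2.])
assert np.allclose(T @ x, sum(xj * yj for xj, yj in zip(x, ys)))
```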

› Linear transformations between finite-dimensional vector spaces correspond to matrices

Let \(X,Y\) be real vector spaces of dimension \(n\) and \(m\), respectively. Then \(L(X,Y) \cong M_{m\times n}(\R)\).

N.b. The corresponding statement holds for complex vector spaces \(X,Y\), with \(M_{m\times n}(\R)\) replaced by the complex matrices \(M_{m\times n}(\mathbb C)\).

Proof


Since \(X \cong \R^n\) and \(Y \cong \R^m\) it suffices to prove the statement for these choices of \(X\) and \(Y\). Let \(\{e_j\}_{j=1}^n\) be the standard basis for \(\R^n\). Then

\[ T \colon \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \mapsto \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn}\\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}\] is a linear transformation \(\R^n \to \R^m\) satisfying \[T e_j = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}. \] According to the above proposition, there is exactly one such \(T \in L(\R^n,\R^m)\). Since we can choose the columns of \(A = (a_{ij})_{ij}\) to be any elements in \(\R^m\), we get all possible \(T \in L(\R^n,\R^m)\) in this way.


Ex.
  • The linear transformation \(T\colon (x_1,x_2) \mapsto (-x_2,x_1)\) on \(\R^2\) is realized by a rotation matrix \(A\): \[ \begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}\begin{bmatrix} x_1 \\ x_2\end{bmatrix} = \begin{bmatrix} -x_2 \\ x_1\end{bmatrix}.\]
  • More generally, \[ \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \] rotates a vector \( \theta\) radians counterclockwise; the preceding example is attained for \(\theta=\pi/2\). Any such matrix also corresponds to a change of basis3): if \(f_1 = (\cos(\theta), \sin(\theta))\) and \(f_2 = (-\sin(\theta),\cos(\theta))\), then \[ \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \] maps the coordinates \((x_1,x_2)_f\) of a vector with respect to \(f\) to the coordinates, in the standard basis \(e\), of the same vector \(x_1 f_1 + x_2 f_2\).
  • The differential operator \(\frac{d}{dx}\) is a linear operator on \(P_2(\R)\). Since \(P_2(\R) \cong \R^3\) via the vector space isomorphism \[ \sum_{j=0}^2 a_j x^j \stackrel{\varphi}{\mapsto} (a_0,a_1,a_2), \] we see that the matrix computation \[\begin{bmatrix} 0 & 1 & 0\\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} a_1 \\ 2a_2 \\ 0 \end{bmatrix}\] expresses the differentiation \[\frac{d}{dx} \big(a_0 + a_1 x + a_2 x^2\big) = a_1 + 2 a_2 x + 0 x^2 \] using a matrix. 4)
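The matrix realisation of \(\frac{d}{dx}\) on \(P_2(\R)\) can be checked against NumPy's polynomial routines; a short sketch (the test polynomial is an arbitrary choice):

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Matrix of d/dx on P_2(R) in the basis {1, x, x^2}, acting on coefficient
# vectors (a_0, a_1, a_2).
D = np.array([[0., 1., 0.],
              [0., 0., 2.],
              [0., 0., 0.]])

a = np.array([5., -3., 2.])                 # p(x) = 5 - 3x + 2x^2
assert np.allclose(D @ a, [-3., 4., 0.])    # p'(x) = -3 + 4x

# numpy.polynomial uses the same low-to-high coefficient ordering:
assert np.allclose(np.pad(P.polyder(a), (0, 1)), D @ a)

# The matrix is nilpotent (cf. the footnote): D^3 = 0, since differentiating
# any polynomial of degree <= 2 three times gives the zero polynomial.
assert np.allclose(np.linalg.matrix_power(D, 3), 0)
```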

Digression: representing linear transformations in different bases

Let \(e = \{e_1, \ldots, e_n\}\) (standard basis) and \(f = \{f_1, \ldots, f_n\}\) (new basis) be two bases for \(\R^n\), and let \([f]\) be the matrix with the vectors \(f_j\), \( j=1,\ldots, n\), as columns (expressed in the standard basis \(e\)). Writing \(x_e\) and \(x_f\) for the coordinate vectors of \(x\) with respect to \(e\) and \(f\), we have \[ x_e = [f] x_f \quad\text{ and }\quad x_f = [f]^{-1}x_e. \] Hence, if \(T \in L(\R^n)\) is realised by \(A_e \in M_{n \times n}(\R)\) in the basis \(e\), what is its realisation \(A_f\) in the basis \(f\)? We have \[ y_e = A_e x_e \quad\Longleftrightarrow\quad y_f = [f]^{-1} y_e = [f]^{-1} A_e x_e = [f]^{-1} A_e [f] x_f. \] Thus \[ A_f = [f]^{-1} A_e [f] \] is the realisation of \(T\) in the basis \(f\).

Ex.
  • How do we express the rotation \[ \begin{bmatrix} x_1 \\ x_2\end{bmatrix}_e \mapsto \begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}\begin{bmatrix} x_1 \\ x_2\end{bmatrix}_e = \begin{bmatrix} -x_2 \\ x_1\end{bmatrix}_e\] in the basis \(f = \{(1,1), (-1,0)\}\)? Since \[ [f] = \begin{bmatrix} 1 & -1 \\ 1 & 0 \end{bmatrix} \quad\text{ and }\quad [f]^{-1} = \begin{bmatrix} 0 & 1 \\ -1 & 1 \end{bmatrix}, \] we have \[A_f = \underbrace{\begin{bmatrix} 0 & 1 \\ -1 & 1 \end{bmatrix}}_{[f]^{-1}} \underbrace{\begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}}_{A_e} \underbrace{\begin{bmatrix} 1 & -1 \\ 1 & 0 \end{bmatrix}}_{[f]} = \begin{bmatrix} 1 & -1 \\ 2 & -1 \end{bmatrix}. \] Check: \( (x_1,x_2)_e \stackrel{[f]^{-1}}{\mapsto} (x_2, x_2-x_1)_f \stackrel{A_f}{\mapsto} (x_1, x_1+x_2)_f \stackrel{[f]}{\mapsto} (-x_2, x_1)_e \) describes the correct transformation.
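The computation in the example is easily reproduced numerically; a short sketch (Python/NumPy, with an arbitrarily chosen test vector):

```python
import numpy as np

A_e = np.array([[0., -1.],
                [1.,  0.]])              # rotation by pi/2 in the standard basis
F = np.array([[1., -1.],
              [1.,  0.]])                # [f]: columns f_1 = (1,1), f_2 = (-1,0)

A_f = np.linalg.inv(F) @ A_e @ F         # realisation in the basis f
assert np.allclose(A_f, [[1., -1.],
                         [2., -1.]])

# The same transformation in either description: [f] A_f [f]^{-1} = A_e.
x_e = np.array([3., 7.])
assert np.allclose(F @ A_f @ np.linalg.inv(F) @ x_e, A_e @ x_e)
```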

Kernels and ranks

Let \(T \in L(X,Y)\). The set of vectors on which \(T\) vanishes is called the kernel of \(T\): \[ \mathrm{ker}(T) \stackrel{\text{def.}}{=} \{ x \in X \colon Tx = {\bf 0} \text{ in } Y\}. \]

› The kernel and range of a linear transformation are vector spaces

Let \(T \in L(X,Y)\). Then \(\mathrm{ker}(T) \subset X\) is a linear subspace of \(X\), and \(\mathrm{ran}(T) \subset Y\) is a linear subspace of \(Y\).

N.b. The dimension of \(\mathrm{ran}(T)\) is called the rank of \(T\), \(\mathrm{rank}(T)\).

Proof


For the kernel of \(T\): If \(x_1,x_2 \in \mathrm{ker}(T)\), then \[T(\lambda x_1 + \mu x_2) = \lambda T x_1 + \mu T x_2 = \lambda \bf{0} + \mu \bf{0} = \bf{0}. \] This shows that \(\mathrm{ker}(T)\) is a subspace of \(X\). (Note, in particular, that the zero element of \(X\) is always in \(\mathrm{ker}(T)\).)

For the range of \(T\): If \(y_1, y_2 \in \mathrm{ran}(T)\), then there exists \(x_1, x_2 \in X\) such that \[ Tx_1 = y_1, \qquad Tx_2 = y_2. \] We want to show that, for any scalars \(\lambda, \mu\), we have \(\lambda y_1 + \mu y_2 \in \mathrm{ran}(T)\). But this follows from the fact that \[ \lambda y_1 + \mu y_2 = \lambda T x_1 + \mu T x_2 = T(\lambda x_1 + \mu x_2) \in \mathrm{ran}(T), \] where we have used that \(\lambda x_1 + \mu x_2 \in X\), by the properties of a vector space.


Ex.
  • The kernel of \(T \in L(\R^2)\colon (x_1,x_2) \mapsto (-x_2,x_1)\) is the trivial subspace \(\{(0,0)\} \subset \R^2\). Since \(\mathrm{ran}(T) = \R^2\), we have \(\mathrm{rank}(T) = 2\).
  • The differential operator \(\frac{d}{dx}\) is a linear operator \(C^1(\R) \to C(\R)\)5). As we know, \[\mathrm{ker}\big(\frac{d}{dx}\big) = \{f \in C^1(\R) \colon f(x) \equiv c \text{ for some } c \in \R\},\] so that \(\mathrm{ker}(\frac{d}{dx}) \cong \R\) is a one-dimensional subspace of \(C^1(\R)\). Since \[ \frac{d}{dx} \int_0^x f(t)\,dt = f(x) \qquad\text{ for any } f \in C(\R), \] we have \(\mathrm{ran} (\frac{d}{dx}) = C(\R)\) and \(\mathrm{rank}(\frac{d}{dx}) = \infty\).
  • The domain of definition matters: considered as an operator on \(P_n(\R)\) the differential operator \(\frac{d}{dx} \colon P_n(\R) \to P_n(\R)\) still has a one-dimensional kernel (the space of constant polynomials, \(P_0(\R)\)), but its range is now finite-dimensional: \[ \mathrm{ran}\big(\frac{d}{dx}\big) = P_{n-1}(\R) \cong \R^n. \] This even works for \(n = 0\), if we define \(P_{-1}(\R) := \R^0 = \{0\}\). 6)

› A linear transformation is injective if and only if its kernel is trivial

Let \(T \in L(X,Y)\). Then \[ T \text{ injective } \quad \Longleftrightarrow\quad \mathrm{ker}(T) = \{\bf{0}\}.\]

Proof


\[ \begin{align*} T \text{ injective} \quad&\Longleftrightarrow\quad \big[ Tx = Ty \Longrightarrow x = y \big] \quad\Longleftrightarrow\quad \big[ T(x-y) = 0 \Longrightarrow x - y = 0 \big]\\ \quad&\Longleftrightarrow\quad \big[ Tz = 0 \Longrightarrow z = 0 \big] \quad\Longleftrightarrow\quad \mathrm{ker}(T) = \{0\}. \end{align*} \]


Ex.
  • A matrix \(A \in M_{m\times n}(\R)\) describes a linear transformation \(\R^n \to \R^m\). This transformation is injective if and only if zero (the zero element in \(\R^n\)) is the only solution of the corresponding homogeneous linear system: \[ \begin{align*} a_{11} x_1 + \ldots + a_{1n} x_n &= 0\\ &\;\;\vdots \\ a_{m1} x_1 + \ldots + a_{mn} x_n &= 0 \end{align*} \quad\Longrightarrow\quad (x_1,\ldots,x_n) = (0, \ldots, 0).\]
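Numerically, injectivity of \(x \mapsto Ax\) can be read off from the rank: the homogeneous system has only the zero solution exactly when \(\mathrm{rank}(A) = n\). A sketch (Python/NumPy; the helper `injective` and the two matrices are ad hoc choices for illustration):

```python
import numpy as np

def injective(A: np.ndarray) -> bool:
    """x -> Ax is injective iff Ax = 0 only for x = 0, i.e. iff A has full column rank."""
    return np.linalg.matrix_rank(A) == A.shape[1]

A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])                 # rank 2 = number of columns
B = np.array([[1., 2.],
              [2., 4.],
              [3., 6.]])                 # second column is twice the first

assert injective(A)
assert not injective(B)
assert np.allclose(B @ np.array([2., -1.]), 0)   # a non-zero solution of Bx = 0
```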

Matrices: null spaces, column spaces and row spaces

Let \(A = (a_{ij})_{ij} \in M_{m\times n}(\R)\) be the matrix realisation of a linear map \(\R^n \to \R^m\).7)

The null space of a matrix

In this case the kernel of \(A\) is also called the null space of \(A\): \[ \begin{align*} x \in \mathrm{ker}(A) \quad&\Longleftrightarrow\quad Ax = 0 \quad\Longleftrightarrow\quad \sum_{j=1}^n a_{ij} x_j = 0 \quad \forall i = 1, \ldots, m\\ \quad&\Longleftrightarrow\quad (x_1, \ldots, x_n) \perp (a_{i1}, \ldots, a_{in}) \quad \text{ for all } i = 1, \ldots, m. \end{align*} \] Thus, the kernel is the space of vectors \(x \in \R^n\) which are orthogonal to the row vectors of \(A\).

The column space of a matrix

The column space of \(A\) is the range of \(A\): since \[ Ax = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = x_1 \begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ \vdots \\ a_{m2} \end{bmatrix} + \ldots + x_n \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix}, \] we have that \[ \mathrm{ran}(A) = \{Ax \colon x \in \R^n\} = \Big\{ \sum_{j=1}^n x_j A_j \colon (x_1, \ldots, x_n) \in \R^n \Big\} = \mathrm{span}\{A_1, \ldots, A_n\} \] is the subspace of \(\R^m\) spanned by the column vectors \(A_j\), \(j = 1, \ldots, n\), of \(A\).

The row space of a matrix

Similarly, define the row space of \(A\) to be the space spanned by the row vectors of \(A\). Then \[ \text{ row space of } A = \text{ column space of } A^t, \] where \(A^t = (a_{ji})\) is the transpose of \(A = (a_{ij})\).
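A numerical sketch of these three spaces (Python/NumPy; the helper `null_space` extracts a kernel basis from the singular value decomposition, and the rank-deficient matrix is an arbitrary example):

```python
import numpy as np

def null_space(A: np.ndarray, tol: float = 1e-12) -> np.ndarray:
    """Orthonormal basis (as columns) for ker(A), extracted from the SVD."""
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])            # rank 2, so the null space is one-dimensional

N = null_space(A)
assert N.shape == (3, 1)
# Ax = 0 says precisely that x is orthogonal to every row of A:
assert np.allclose(A @ N, 0)

# The column space of A is spanned by its columns; its dimension is rank(A).
# The row space of A is the column space of A^t, and the two ranks agree:
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T) == 2
```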

› The kernel of a matrix is perpendicular to the range of its transpose

Let \(A \in M_{m\times n}(\R)\). Then \[ \mathrm{ker}(A) \perp \mathrm{ran}(A^t), \] meaning that if \(x \in \mathrm{ker}(A)\) and \(y \in \mathrm{ran}(A^t)\), then \(x \cdot y = \sum_{j=1}^n x_j y_j = 0.\)

Proof


As shown above, the null space of \(A\) is perpendicular to the row space of \(A\). The row space of \(A\) equals the column space of \(A^t\) (this is the definition of the matrix transpose). The proposition follows.


The rank–nullity theorem and its consequences

› The rank–nullity theorem

Let \(T \in L(\R^n,\R^m)\). Then \[ \mathrm{dim}\, \mathrm{ker}(T) + \mathrm{dim}\, \mathrm{ran}(T) = n. \]

N.b. The name comes from the fact that \(\mathrm{dim}\, \mathrm{ker}(T)\) is the nullity of \(T\). Thus, the sum of the rank and the nullity of \(T\) equals the dimension of its ground space (domain).

Proof


Pick a basis \(e = \{e_1, \ldots, e_k\}\) for \(\mathrm{ker}(T)\). If \(k = n\) and \( \mathrm{ker}(T) = \R^n\) we are done, since then \[ \mathrm{ran}(T) = \{ Tx \colon x \in \R^n\} = \{0\}, \] so that \(\mathrm{dim}\, \mathrm{ker}(T) + \mathrm{dim}\,\mathrm{ran}(T) = n\).

Hence, assume that \(k < n\) and extend \(e\) to a basis \(\{e_1, \ldots, e_k, f_1, \ldots, f_r\}\) for \(\R^n\).

This can be done in the following way: pick \(f_1 \not\in \mathrm{span}\{e_1, \ldots, e_k\}\). Then \(\{e_1, \ldots, e_k, f_1\}\) is linearly independent. If \(\mathrm{span} \{e_1, \ldots, e_k, f_1\} = \R^n\) we stop. Else, pick \(f_2\not\in \mathrm{span}\{e_1, \ldots, e_k,f_1\}\). Since any \(l > n \) vectors in \(\R^n\) are linearly dependent, this process stops when \(k + r = n\) (it cannot stop before, since then \(\R^n\) would be spanned by fewer than \(n\) vectors, which is impossible; see the definition of vector space dimension).

We now prove that \(Tf = \{Tf_1, \ldots, Tf_r\}\) is a basis for \(\mathrm{ran}(T)\).

\(Tf\) is linearly independent: \[ \sum_{j=1}^r a_j Tf_j = 0 \quad\Longleftrightarrow\quad T\big( \sum_{j=1}^r a_j f_j \big) = 0 \quad\Longleftrightarrow\quad \sum_{j=1}^r a_j f_j \in \mathrm{ker}(T) \quad\Longleftrightarrow\quad a_j = 0 \:\forall j = 1, \ldots, r, \] since \(T\) is linear, and since, by the construction of \(f\), no non-zero linear combination of the \(f_j\) lies in \(\mathrm{ker}(T)\).

Furthermore, \(Tf\) spans \(\mathrm{ran}(T)\): \[ \begin{align*} \mathrm{ran}(T) = \{ Tx \colon x \in \R^n\} &= \Big\{ T \big(\sum_{j=1}^k a_j e_j + \sum_{j=1}^r b_j f_j \big) \colon a_j, b_j \in \R \Big\}\\ &= \Big\{ T \big(\sum_{j=1}^k a_j e_j \big) + T\big(\sum_{j=1}^r b_j f_j\big) \colon a_j, b_j \in \R \Big\} = \Big\{ \sum_{j=1}^r b_j T f_j \colon b_j \in \R\Big\}, \end{align*} \] since \(T\) is linear and \(e \subset \mathrm{ker}(T)\). Hence, \(\{Tf_1, \ldots, Tf_r\}\) is a basis for \(\mathrm{ran}(T)\), and \[ \mathrm{dim}\, \mathrm{ker}(T) + \mathrm{dim}\, \mathrm{ran}(T) = k + r = n. \]
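The proof can be traced numerically: take an orthonormal basis of \(\mathrm{ker}(A)\), extend it to a basis of \(\R^n\) (here both pieces are read off from the SVD), and check that the images of the extension vectors form a basis for \(\mathrm{ran}(A)\). A sketch with an arbitrarily chosen \(3 \times 4\) matrix (Python/NumPy):

```python
import numpy as np

A = np.array([[1., 2., 0., 1.],
              [0., 0., 1., 1.],
              [1., 2., 1., 2.]])          # 3x4 example with a 2-dimensional kernel
n = A.shape[1]

_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
kernel = Vt[rank:].T                      # orthonormal basis e_1, ..., e_k of ker(A)
complement = Vt[:rank].T                  # extension f_1, ..., f_r to a basis of R^n
k, r = kernel.shape[1], complement.shape[1]

assert np.allclose(A @ kernel, 0)         # the e_j really lie in ker(A)
assert k + r == n                         # together the vectors form a basis of R^n

# As in the proof: the images A f_1, ..., A f_r form a basis for ran(A), so
# dim ker(A) + dim ran(A) = k + r = n.
assert np.linalg.matrix_rank(A @ complement) == r == np.linalg.matrix_rank(A)
```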


› For finite-dimensional linear transformations, injective means surjective

Let \(T \in L(\R^n)\) be a linear transformation \(\R^n \to \R^n\). Then the following are equivalent:

  • (i) \(T\) is injective \(\quad\) (\(\mathrm{ker}(T) = \{\mathbf{0}\}\))
  • (ii) \(T\) is surjective \(\quad\) (\(\mathrm{ran}(T) = \R^n\))
  • (iii) \(T\colon \R^n \to \R^n \) is invertible
  • (iv) The matrix representation \(A\) of \(T\) (in any given basis) is invertible.
  • (v) For any \(b \in \R^n\) the system \(Ax = b\) has a unique solution \(x\).
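A small numerical illustration of the equivalences (Python/NumPy; the matrices are arbitrary examples): an invertible matrix gives a unique solution \(x = A^{-1}b\) for every \(b\), while a singular matrix is neither injective nor surjective.

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 1.]])                        # invertible: det = 1 != 0
b = np.array([3., 5.])

x = np.linalg.solve(A, b)                       # the unique solution of Ax = b
assert np.allclose(A @ x, b)
assert np.allclose(x, np.linalg.inv(A) @ b)
assert np.linalg.matrix_rank(A) == 2            # injective and surjective

S = np.array([[1., 2.],
              [2., 4.]])                        # singular: rank 1
assert np.linalg.matrix_rank(S) < 2             # ker(S) != {0}, ran(S) != R^2
assert np.allclose(S @ np.array([2., -1.]), 0)  # a non-zero kernel vector
```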

Proof


(i) \(\Longleftrightarrow\) (ii): When \(m = n\), the rank–nullity theorem says that \( \mathrm{ran}(T) = \R^n\) (so that \(T\) is surjective) exactly when \(\mathrm{ker}(T) = \{ \mathbf 0 \}\) (so that \(T\) is injective).

(i,ii) \(\Longleftrightarrow\) (iii): A function is invertible exactly if it is bijective, that is, both injective and surjective.

(iii) \(\Longleftrightarrow\) (iv): Given any basis for \(\R^n\), \(T\) has a unique matrix representation \(A\) (defined by its action on the basis vectors). If the inverse matrix \(A^{-1}\) exists, then there exists a corresponding linear transformation \(S\) such that \(ST = TS = \mathrm{id}\) (since \(A^{-1}A = A A^{-1} = I\), and the identity map \(\mathrm{id}\colon \R^n \to \R^n\) has the identity matrix \(I\) as representation in all bases). Thus \(S = T^{-1}\) is the inverse of \(T\). If, on the other hand, \(T^{-1}\) exists, it must by the same argument have a matrix representation \(B\) such that \(AB = BA = I\). Hence, \(A^{-1} = B\) exists.

(iv) \(\Longleftrightarrow\) (v): If \(A\) is invertible it is immediate that \(x = A^{-1}b\) is the unique solution. If, on the other hand, \(Ax = b\) has a unique solution \(x\) for any \(b\), we construct a matrix \(B\) by taking as its columns \(x_j\) such that \(Ax_j = e_j\), where \(\{e_1, \ldots, e_n\}\) is the standard basis. This guarantees that \(B = A^{-1}\) is the inverse matrix of \(A\). (A less constructive argument would be to note that \(Ax = b\) is uniquely solvable for all \(b \in \R^n\) exactly if \(T\) is invertible.)


Geometric interpretation of the rank–nullity theorem

Define the direct sum \(X \oplus Y\) of two vector spaces (both real, or both complex) as the space of pairs \((x,y)\) with the naturally induced vector addition and scalar multiplication: \[ X \oplus Y \stackrel{\text{def.}}{=} \{(x,y) \in X \times Y\}, \] where \[ (x_1,y_1) + (x_2,y_2) \stackrel{\text{def.}}{=} (x_1+x_2, y_1 + y_2) \quad\text{ and }\quad \lambda(x,y) \stackrel{\text{def.}}{=} (\lambda x, \lambda y). \] If \(X,Y \subset V\) are subspaces of a vector space \(V\), then \[ X \oplus Y = V \quad\Longleftrightarrow\quad X \cap Y = \{0\} \quad\text{ and }\quad X + Y \stackrel{\text{def.}}{=} \{ x + y \colon x \in X, y \in Y\} = V, \] where the equality \(X \oplus Y = V\) should be interpreted in terms of isomorphisms (\(V\) can be represented as \(X \oplus Y\)). Note that \[ \dim(X \oplus Y) = \dim(X) + \dim(Y). \]

With these definitions, the rank–nullity theorem can be expressed as a geometric description of the underlying space (\(\R^n\)) in terms of the matrix \(A\).

› Rank–nullity theorem: geometric version

Let \(A \in M_{m\times n}(\R)\). Then \[ \R^n = \mathrm{ker}(A) \oplus \mathrm{ran}(A^t). \]

N.b. A consequence of this is that \(\mathrm{rank}(A) = \mathrm{rank}(A^t)\); another is that \(\R^m = \mathrm{ker}(A^t) \oplus \mathrm{ran}(A)\).
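A numerical check of the decomposition (Python/NumPy; the matrix is an arbitrary example, and `null_space` is an ad hoc SVD-based helper): \(\mathrm{ker}(A)\) is orthogonal to \(\mathrm{ran}(A^t)\), and the dimensions add up to \(n\).

```python
import numpy as np

def null_space(A, tol=1e-12):
    """Orthonormal basis (columns) for ker(A), via the SVD."""
    _, s, Vt = np.linalg.svd(A)
    return Vt[int(np.sum(s > tol)):].T

A = np.array([[1., 2., 0., 1.],
              [0., 0., 1., 1.]])                 # 2x4, rank 2
n = A.shape[1]

K = null_space(A)                                # basis of ker(A) in R^4
R = A.T                                          # columns span ran(A^t)

# Orthogonality: ker(A) is perpendicular to ran(A^t) ...
assert np.allclose(K.T @ R, 0)
# ... and the dimensions add up to n = 4, so R^4 = ker(A) (+) ran(A^t).
assert K.shape[1] + np.linalg.matrix_rank(A.T) == n
# In particular rank(A) = rank(A^t):
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T)
```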

Proof


We have already shown that \(\mathrm{ker}(A) \perp \mathrm{ran}(A^t)\) in \(\R^n\), so that \[\mathrm{ker}(A) \cap \mathrm{ran}(A^t) = \{0\};\] indeed, any \( x \in \mathrm{ker}(A) \cap \mathrm{ran}(A^t)\) satisfies \(|x|^2 = x \cdot x = 0\), hence \(x = 0\).

It remains to show that \(\mathrm{ker}(A) \oplus \mathrm{ran}(A^t)\) makes up all of \(\R^n\). Since \(\mathrm{ker}(A) \perp \mathrm{ran}(A^t)\) in \(\R^n\), we have \[\mathrm{rank}(A^t) \leq n - \dim(\mathrm{ker}(A)) = \mathrm{rank}(A),\] where the last equality is the rank–nullity theorem. But the same argument applies to \(A^t\) in place of \(A\); hence \[ \mathrm{rank}(A) = \mathrm{rank}((A^t)^t) \leq \mathrm{rank}(A^t), \] and \[\mathrm{rank}(A) = \mathrm{rank}(A^t).\] Consequently, \[\dim(\mathrm{ker}(A)) + \dim(\mathrm{ran}(A^t)) = n,\] which together with \(\mathrm{ker}(A) \cap \mathrm{ran}(A^t) = \{0\}\) shows that \[\mathrm{ker}(A) \oplus \mathrm{ran}(A^t) = \R^n.\]


› Summary on linear equations (the Fredholm alternative)

Let \(A \in M_{m\times n}(\R)\) be the realisation of a linear transformation \(\R^n \to \R^m\), and consider the linear equation \[Ax = b.\]

  • Either \(b \in \mathrm{ran}(A)\), equivalently \(b \perp \mathrm{ker}(A^t)\), and the equation is solvable; or \(b\) has a non-zero component in \(\mathrm{ker}(A^t)\), and there is no solution.
  • In case \(\ker(A) = \{0\}\) any solution is unique, else the solutions can be described as \[ x_p + \ker(A), \] where \(x_p\) is any (particular) solution of \(Ax = b\).
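Both bullet points can be illustrated in code (Python/NumPy; the matrix, the right-hand sides and the helpers `null_space` and `solvable` are ad hoc choices): solvability of \(Ax=b\) is decided by orthogonality of \(b\) to \(\mathrm{ker}(A^t)\), and once a particular solution \(x_p\) exists, every \(x_p + z\) with \(z \in \ker(A)\) is a solution as well.

```python
import numpy as np

def null_space(A, tol=1e-12):
    """Orthonormal basis (columns) for ker(A), via the SVD."""
    _, s, Vt = np.linalg.svd(A)
    return Vt[int(np.sum(s > tol)):].T

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])                    # rank 2; ker(A) = span{(1, 1, -1)}

def solvable(A, b, tol=1e-10):
    """Ax = b is solvable iff b is orthogonal to ker(A^t)."""
    return np.allclose(null_space(A.T).T @ b, 0, atol=tol)

b_good = A @ np.array([1., 2., 0.])             # lies in ran(A) by construction
b_bad = np.array([1., 1., 1.])                  # not orthogonal to (1, 1, -1)
assert solvable(A, b_good) and not solvable(A, b_bad)

# Least squares returns one particular solution x_p of Ax = b_good;
# the full solution set is x_p + ker(A).
x_p = np.linalg.lstsq(A, b_good, rcond=None)[0]
z = null_space(A)[:, 0]
assert np.allclose(A @ x_p, b_good)
assert np.allclose(A @ (x_p + 3.7 * z), b_good)
```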

Proof


The first statement is a reformulation of the geometric version of the rank–nullity theorem; the second follows from the equivalences \[ \begin{cases} Ax = b\\ Ay = b \end{cases} \quad \Longleftrightarrow \quad \begin{cases} Ax = b\\ A(x-y) = 0 \end{cases} \quad \Longleftrightarrow \quad \begin{cases} Ax = b\\ y = x + z,\quad z \in \mathrm{ker}(A). \end{cases} \] If \(\mathrm{ker}(A) = \{0\}\) there is at most one solution, \(x\); otherwise the solution space is an affine space8) of the same dimension as \(\mathrm{ker}(A)\).

1)
Such an operator is called a linear functional.
2)
For infinite-dimensional Banach spaces one needs the additional concept of boundedness (continuity) of a linear transformation to state a similar result, which then says that the transformation is determined by \(Te_j\) (but we cannot choose \(Te_j = y_j\) arbitrarily).
3)
The determinant of the matrix is \(1\), so it is invertible, regardless of the value of \(\theta\).
4)
Note that this matrix is nilpotent, meaning that \(A^n = 0\) for some \(n \in \mathbb N\) (in this case \(n=3\)). This is because differentiating any \(p \in P_2(\R)\) three times produces the zero polynomial.
5)
Here we use the convention that \(C^k(\R) = C^k(\R,\R)\), just as one may write \(L(X) = L(X,X)\) for linear transformations on a space \(X\).
6)
This is the reason why the degree of the zero polynomial is sometimes taken to be \(-1\); if \(\mathrm{deg}(0) = -1\), then \(P_{-1}(\R)\) is naturally defined, and the differential operator maps \(P_n(\R) \to P_{n-1}(\R)\) for all \(n \geq 0\).
7)
This realisation is unique as long as we have agreed upon a choice of bases for \(\R^n\) and \(\R^m\). If nothing else is said, we assume that vectors in \(\R^n\), \(n \in \mathbb N\), are expressed in the standard basis.
8)
An affine space is a 'translation' of a vector space (or a vector space which has lost its origin); affine spaces are not themselves vector spaces, since they do not have any zero element.
2017-03-24, Hallvard Norheim Bø