$\newcommand{R}{\mathbb{R}} \newcommand{N}{\mathbb{N}} \newcommand{defarrow}{\quad \stackrel{\text{def}}{\Longleftrightarrow} \quad}$

# Linear transformations

Let $X$ and $Y$ be vector spaces (both real, or both complex), and $T \colon X \to Y$ a mapping between them.

## Linear transformations

We say that $T$ is a linear transformation (or just linear) if it preserves the linear structure of a vector space: $T \text{ linear } \defarrow T(\lambda x+ \mu y) = \lambda Tx + \mu Ty, \qquad x,y \in X, \: \mu, \lambda \in \mathbb R \: (\text{or } \mathbb C).$

Ex.
• Any matrix $A \in M_{m \times n}(\R)$ defines a linear transformation $\R^n \to \R^m$: $\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}}_{x} \mapsto \underbrace{ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn}\\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.}_{Ax}$
• The integral operator defined by $Tf(t) := \int_0^t f(s)\,ds$ is a linear transformation on $C(I,\R)$: $T \colon C(I,\R) \to C(I,\R), \quad Tf = \bigg[ t \mapsto \int_0^t f(s)\,ds\bigg].$
• A slight modification, $Tf := \int_0^1 f(s)\,ds,$ yields a linear transformation $C(I,\R) \to \R$ (given that $[0,1] \subset I$). 1)
• For any polynomial $p \in P_k(\R)$, the differential operator $p(D) := \sum_{j=0}^k a_j D^j$ is a linear transformation: $p(D) \colon C^k(I,\R) \to C(I,\R), \qquad p(D)f = \sum_{j=0}^k a_j f^{(j)}.$ Here, $D = \frac{d}{dx}$ is the standard differentiation operator.
• The shift operator $T \colon (x_1, x_2, \ldots) \mapsto (0,x_1,x_2, \ldots),$ is a linear transformation $l_p \to l_p$, for any $p$, $1 \leq p \leq \infty$: $T(\lambda x + \mu y) = (0, \lambda x_1 + \mu y_1, \ldots) = \lambda (0, x_1, \ldots) + \mu (0,y_1, \ldots) = \lambda Tx + \mu Ty.$ Note that $\|Tx\|_{l_p} = \|x\|_{l_p}$ guarantees that $\mathrm{ran}(T) \subset l_p$.

## The set of linear transformations as a vector space

The set of linear transformations $X \to Y$ is denoted by $L(X,Y)$: $L(X,Y) \stackrel{\text{def.}}{=} \{ T \colon X \to Y \text{ linear}\}$

IF $X = Y$, we may abbreviate $L(X,X)$ by $L(X)$.

### › L(X,Y) is a vector space

If, for all $S,T \in L(X,Y)$, we define $(T+S)(x) := Tx + Sx \quad\text{ and }\quad (\lambda T)x := \lambda (Tx),$ for all $x \in X$ and $\lambda \in \R$ (or $\mathbb C$), it is easily checked that $L(X,Y)$ becomes a vector space. In particular, $\mu T + \lambda S \in L(X,Y)$ for any $S,T \in L(X,Y)$.

Ex.
• The set of $m \times n$-matrices $M_{m\times n}(\R)$ forms a real vector space. As we shall see, $M_{m \times n}(\R) \cong L(\R^n,\R^m)$.

### › A linear transformation is determined by its action on any basis

Let $X$ be a finite-dimensional 2) vector space with basis $\{e_1, \ldots, e_n\}$. For any values $y_1, \ldots, y_n \in Y$ there exists exactly one linear transformation $T \in L(X,Y)$ such that $Te_j = y_j, \qquad j =1,\ldots,n.$

Proof

Proof

Any $x \in X$ has a unique representation $x = \sum_{j=1}^n x_j e_j$. Define $T$ through $Tx = \sum_{j=1}^n x_j y_j.$ Then $Te_j = y_j$, and $T$ is linear since it acts as multiplication with a $1 \times n$ matrix (a dot product with the vector $(y_1, \ldots, y_n)$). Moreover, if $S \in L(X,Y)$ also satisfies $Se_j = y_j$, then $Sx = S\big( \sum_{j=1}^n x_j e_j\big) = \sum_{j=1}^n x_j S e_j = \sum_{j=1}^n x_j y_j = Tx, \qquad\text{ for all } x \in X,$ so that $S = T$ in $L(X,Y)$.

Ex.
• The columns $A_j$ of an $m \times n$-matrix $A$ are determined by its action on the standard basis $\{e_j\}_{j=1}^n$: $Ae_j = A_j, \qquad j = 1, \ldots, n.$ Here $A_j$ plays the role of $y_j$ in the above theorem.

### › Linear transformations between finite-dimensional vector spaces correspond to matrices

Let $X,Y$ be real vector spaces of dimension $n$ and $m$, respectively. Then $L(X,Y) \cong M_{m\times n}(\R)$.

N.b. The corresponding statement holds for complex vector spaces $X,Y$, with $M_{m\times n}(\mathbb C)$ also complex-valued.

Proof

Proof

Since $X \cong \R^n$ and $Y \cong \R^m$ it suffices to prove the statement for these choices of $X$ and $Y$. Let $\{e_j\}_{j=1}^n$ be the standard basis for $\R^n$. Then

$T \colon \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \mapsto \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn}\\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ is a linear transformation $\R^n \to \R^m$ satisfying $T e_j = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}.$ According to the above proposition, there is exactly one such $T \in L(\R^n,\R^m)$. Since we can choose the columns of $A = (a_{ij})_{ij}$ to be any elements in $\R^m$, we get all possible $T \in L(\R^n,\R^m)$ in this way.

Ex.
• The linear transformation $T\colon (x_1,x_2) \mapsto (-x_2,x_1)$ on $\R^2$ is realized by a rotation matrix $A$: $\begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}\begin{bmatrix} x_1 \\ x_2\end{bmatrix} = \begin{bmatrix} -x_2 \\ x_1\end{bmatrix}.$
• More generally, $\begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}$ rotates a vector $\theta$ radians counterclockwise; the preceeding example is attained for $\theta=\pi/2$. Any such matrix also corresponds to a change of basis3): if $f_1 = (\cos(\theta), \sin(\theta))$ and $f_2 = (-\sin(\theta),\cos(\theta))$, then $\begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ expresses the coordinates $(x_1,x_2)_f$ in the standard basis $e$ as $x_1 f_1 + x_2 f_2$.
• The differential operator $\frac{d}{dx}$ is a linear operator on $P_2(\R)$. Since $P_2(\R) \cong \R^3$ via the vector space isomorphism $\sum_{j=0}^2 a_j x^j \stackrel{\varphi}{\mapsto} (a_0,a_1,a_2),$ we see that $\frac{d}{dx} \colon \begin{bmatrix} 0 & 1 & 0\\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} a_1 \\ 2a_2 \\ 0 \end{bmatrix}$ expresses the derivation $\frac{d}{dx} \big(a_0 + a_1 x + a_2 x^2\big) = a_1 + 2 a_2 x + 0 x^2$ using a matrix. 4)

### Digression: representing linear transformations in different bases

Let $e = \{e_1, \ldots, e_n\}$ (standard basis) and $f = \{f_1, \ldots, f_n\}$ (new basis) be two bases for $\R^n$, and $[f]$ the matrix with $[f_j]$, $j=1,\ldots, n$, as column vectors (expressed in the standard basis $e$). Then $x_e = [f] x_f \quad\text{ and }\quad x_f = [f]^{-1}x_e.$ Hence, if $T \in L(\R^n) \colon\quad T \text{ is realised by } A_e \in M_{n \times n}(\R) \text{ in the basis } e,$ what is its realisation $A_f$ in the basis $f$? We have $y_e = A_e x_e \quad\Longleftrightarrow\quad y_f = [f]^{-1} y_e = [f]^{-1} A_e x_e = [f]^{-1} A_e [f] x_f.$ Thus $A_f = [f]^{-1} A_e [f]$ is the realisation of $T$ in the basis $f$.

Ex.
• How do we express the rotation $\begin{bmatrix} x_1 \\ x_2\end{bmatrix}_e \mapsto \begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}\begin{bmatrix} x_1 \\ x_2\end{bmatrix}_e = \begin{bmatrix} -x_2 \\ x_1\end{bmatrix}_e$ in the basis $f = \{(1,1), (-1,0)\}$? Since $[f] = \begin{bmatrix} 1 & -1 \\ 1 & 0 \end{bmatrix} \quad\text{ and }\quad [f]^{-1} = \begin{bmatrix} 0 & 1 \\ -1 & 1 \end{bmatrix},$ we have $A_f = \underbrace{\begin{bmatrix} 0 & 1 \\ -1 & 1 \end{bmatrix}}_{[f]^{-1}} \underbrace{\begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}}_{A_e} \underbrace{\begin{bmatrix} 1 & -1 \\ 1 & 0 \end{bmatrix}}_{[f]} = \begin{bmatrix} 1 & -1 \\ 2 & -1 \end{bmatrix}.$ Check: $(x_1,x_2)_e \stackrel{[f]^{-1}}{\mapsto} (x_2, x_2-x_1)_f \stackrel{A_f}{\mapsto} (x_1, x_1+x_2)_f \stackrel{[f]}{\mapsto} (-x_2, x_1)_e$ describes the correct transformation.

## Kernels and ranks

Let $T \in L(X,Y)$. The set of vectors for which $T$ vanishes is called the kernel of T. $\mathrm{ker}(T) \stackrel{\text{def.}}{=} \{ x \in X \colon Tx = {\bf 0} \text{ in } Y\}.$

### › The kernel and range of a linear transformation are vector spaces

Let $T \in L(X,Y)$. Then $\mathrm{ker}(T) \subset X$ is a linear subspace of $X$, and $\mathrm{ran}(T) \subset Y$ is a linear subspace of $Y$.

N.b. The dimension of $\mathrm{ran}(T)$ is called the rank of $T$, $\mathrm{rank}(T)$.

Proof

Proof

For the kernel of $T$: If $x_1,x_2 \in \mathrm{ker}(T)$, then $T(\lambda x_1 + \mu x_2) = \lambda T x_1 + \mu T x_2 = \lambda \bf{0} + \mu \bf{0} = \bf{0}.$ This shows that $\mathrm{ker}(T)$ is a subspace of $X$. (Note, in particular, that the zero element of $X$ is always in $\mathrm{ker}(T)$.)

For the range of $T$: If $y_1, y_2 \in \mathrm{ran}(T)$, then there exists $x_1, x_2 \in X$ such that $Tx_1 = y_1, \qquad Tx_2 = y_2.$ We want to show that, for any scalars $\lambda, \mu$, we have $\lambda y_1 + \mu y_2 \in \mathrm{ran}(T)$. But this follows from that $\lambda y_1 + \mu y_2 = \lambda T x_1 + \mu T x_2 = T(\lambda x_1 + \mu x_2) \in \mathrm{ran}(T),$ where we have used that $\mu x_1 + \lambda x_2 \in X$, by the properties of a vector space.

Ex.
• The kernel of $T \in L(\R^2)\colon (x_1,x_2) \mapsto (-x_2,x_1)$ is the trivial subspace $\{(0,0)\} \subset \R^2$. Since $\mathrm{ran}(T) = \R^2$, we have $\mathrm{rank}(T) = 2$.
• The differential operator $\frac{d}{dx}$ is a linear operator $C^1(\R) \to C(\R)$5). As we know, $\mathrm{ker}\big(\frac{d}{dx}\big) = \{f \in C^1(\R) \colon f(x) \equiv c \text{ for some } c \in \R\},$ so that $\mathrm{ker}(\frac{d}{dx}) \cong \R$ is a one-dimensional subspace of $C^1(\R)$. Since $\frac{d}{dx} \int_0^x f(t)\,dt = f(x) \qquad\text{ for any } f \in C(\R),$ we have $\mathrm{ran} (\frac{d}{dx}) = C(\R)$ and $\mathrm{rank}(\frac{d}{dx}) = \infty$.
• The domain of definition matters: considered as an operator on $P_n(\R)$ the differential operator $\frac{d}{dx} \colon P_n(\R) \to P_n(\R)$ still has a one-dimensional kernel (the space of constant polynomials, $P_0(\R)$), but its range is now finite-dimensional: $\mathrm{ran}\big(\frac{d}{dx}\big) = P_{n-1}(\R) \cong \R^n.$ This even works for $n = 0$, if we define $P_{-1}(\R) := \R^0 = \{0\}$. 6)

### › A linear transformation is injective if and only if its kernel is trivial

Let $T \in L(X,Y)$. Then $T \text{ injective } \quad \Longleftrightarrow\quad \mathrm{ker}(T) = \{\bf{0}\}.$

Proof

Proof

\begin{align*} T \text{ injective} \quad&\Longleftrightarrow\quad \big[ Tx = Ty \Longrightarrow x = y \big] \Longleftrightarrow\quad \big[ T(x-y) = 0 \Longrightarrow x - y = 0 \big]\\ \quad&\Longleftrightarrow\quad \big[ Tz = 0 \Longrightarrow z = 0 \big] \quad\Longleftrightarrow\quad \mathrm{ker}(T) = \{0\}. \end{align*}

Ex.
• A matrix $A \in M_{m\times n}(\R)$ describes a linear transformation $\R^n \to \R^m$. This transformation is injective if zero (the zero element in $\R^n$) is the only solution of the corresponding linear homogeneous system: \begin{align*} &a_{11} x_1 &+ \ldots & a_{1n} x_n = 0\\ &\vdots & & \vdots \\ &a_{m1} x_1 &+ \ldots & a_{mn} x_n = 0\end{align*} \quad\Longrightarrow\quad (x_1,\ldots,x_n) = (0, \ldots, 0).

## Matrices: null spaces, column spaces and row spaces

Let $A = (a_{ij})_{ij} \in M_{m\times n}(\R)$ be the matrix realisation of a linear map $\R^n \to \R^m$.7)

### The null space of a matrix

In this case the kernel of $A$ is also called the null space of $A$: \begin{align*} x \in \mathrm{ker}(A) \quad&\Longleftrightarrow\quad Ax = 0 \quad\Longleftrightarrow\quad \sum_{j=1}^n a_{ij} x_j = 0 \quad \forall i = 1, \ldots, m\\ \quad&\Longleftrightarrow\quad (x_1, \ldots, x_n) \perp (a_{i1}, \ldots, a_{in}) \quad \text{ for all } i = 1, \ldots, n. \end{align*} Thus, the kernel is the space of vectors $x \in \R^n$ which are orthogonal to the row vectors of $A$.

### The column space of a matrix

The column space of $A$ is the range of $A$: since $Ax = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = x_1 \begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ \vdots \\ a_{m2} \end{bmatrix} + \ldots + x_n \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix},$ we have that $\mathrm{ran}(A) = \{Ax \colon x \in \R^n\} = \Big\{ \sum_{j=1}^n x_j A_j \colon (x_1, \ldots, x_n) \in \R^n \Big\} = \mathrm{span}\{A_1, \ldots, A_n\}$ is the subspace of $\R^m$ spanned by the column vectors $A_j$, $j = 1, \ldots, n$, of $A$.

### The row space of a matrix

Similarly, define the row space of $A$ to be the space spanned by the row vectors of $A$. Then $\text{ row space of } A = \text{ column space of } A^t,$ where $A^t = (a_{ji})$ is the transpose of $A = (a_{ij})$.

### › The kernel of a matrix is perpendicular to the range of its transpose

Let $A \in M_{m\times n}(\R)$. Then $\mathrm{ker}(A) \perp \mathrm{ran}(A^t),$ meaning that if $x \in \mathrm{ker}(A)$ and $y \in \mathrm{ran}(A^t)$, then $x \cdot y = \sum_{j=1}^n x_j y_j = 0.$

Proof

Proof

As shown above, the null space of $A$ is perpendicular to the row space of $A$. The row space of $A$ equals the column space of $A^t$ (this is the definition of the matrix transpose). The proposition follows.

## The rank–nullity theorem and its consequences

### › The rank–nullity theorem

Let $T \in L(\R^n,\R^m)$. Then $\mathrm{dim}\, \mathrm{ker}(T) + \mathrm{dim}\, \mathrm{ran}(T) = n.$

N.b. The name comes from that $\mathrm{dim}\, \mathrm{ker}(T)$ is the nullity of $T$. Thus, the sum of the rank and the nullity of $T$ equals the dimension of its ground space (domain).

Proof

Proof

Pick a basis $e = \{e_1, \ldots, e_k\}$ for $\mathrm{ker}(T)$. If $k = n$ and $\mathrm{ker}(T) = \R^n$ we are done, since then $\mathrm{ran}(T) = \{ Tx \colon x \in \R^n\} = \{0\},$ so that $\mathrm{dim}\, \mathrm{ker}(T) + \mathrm{dim}\,\mathrm{ran}(T) = n$.

Hence, assume that $k < n$ and extend $e$ to a basis $\{e_1, \ldots, e_k, f_1, \ldots, f_m\}$ for $\R^n$.

This can be done in the following way: pick $f_1 \not\in \mathrm{span}\{e_1, \ldots, e_k\}$. Then $\{e_1, \ldots, e_n, f_1\}$ is linearly independent. If $\mathrm{span} \{e_1, \ldots, e_k, f_1\} = \R^n$ we stop. Else, pick $f_2\not\in \mathrm{span}\{e_1, \ldots, e_k,f_1\}$. Since $l > n$ vectors are always linearly dependent in $\R^n$, this process stops when $k + m = n$ (it cannot stop before, since then $\R^n$ would be spanned by a set of dimension $< n$, which is impossible; see the definition of vector space dimension).

We now prove that $Tf = \{Tf_1, \ldots Tf_m\}$ is a basis for $\mathrm{ran}(T)$.

$Tf$ is linearly independent: $\sum_{j=1}^m a_j Tf_j = 0 \quad\Longleftrightarrow\quad T\big( \sum_{j=1}^m a_j f_j \big) = 0 \quad\Longleftrightarrow\quad \sum_{j=1}^m a_j f_j \in \mathrm{ker}(T) \quad\Longleftrightarrow\quad a_j = 0 \:\forall j = 1, \ldots, m,$ since $T$ is linear, and since, by the construction of $f$, no non-zero linear combination of elements $f_j$ is in $\mathrm{ker}(T)$.

Furthermore, $Tf$ spans $\mathrm{ran}(T)$: \begin{align*} \mathrm{ran(T)} = \{ Tx \colon x \in \R^n\} &= \Big\{ T \big(\sum_{j=1}^k a_j e_j + \sum_{j=1}^m b_j f_j \big) \colon a_j, b_j \in \R \Big\}\\ &= \Big\{ T \big(\sum_{j=1}^k a_j e_j \big) + T\big(\sum_{j=1}^m b_j f_j\big) \colon a_j, b_j \in \R \Big\} = \Big\{ \sum_{j=1}^m b_j T f_j \colon b_j \in \R\Big\}, \end{align*} since $T$ is linear and $e \subset \mathrm{ker}(T)$. Hence, $\{Tf_1, \ldots, Tf_m\}$ is a basis for $\mathrm{ran}(T)$, and $\mathrm{dim}\, \mathrm{ker}(T) + \mathrm{dim}\, \mathrm{ran}(T) = k + m = n.$

### › For finite-dimensional linear transformations, injective means surjective

Let $T \in L(\R^n)$ be a linear transformation $\R^n \to \R^n$. Then the following are equivalent:

• $T$ is injective $\quad$ ($\mathrm{ker}(T) = \{\mathbf{0}\}$)
• $T$ is surjective $\quad$ ($\mathrm{ran}(T) = \R^n$)
• $T\colon \R^n \to \R^n$ is invertible
• The matrix representation $A$ of $T$ (in any given basis) is invertible.
• For any $b \in \R^n$ the system $Ax = b$ has a unique solution $x$.

Proof

Proof

(i) $\Longleftrightarrow$ (ii): When $m = n$, the rank–nullity theorem says that $\mathrm{ran}(T) = \R^n$ (so that $T$ is surjective) exactly when $\mathrm{ker}(T) = \{ \mathbf 0 \}$ (so that $T$ is injective).

(i,ii) $\Longleftrightarrow$ (iii): A function is bijective exactly if it is both invertible and surjective.

(iii) $\Longleftrightarrow$ (iv): Given any basis for $\R^n$, $T$ has a unique matrix representation $A$ (defined by its action on the basis vectors). If the inverse matrix $A^{-1}$ exists, then there exists a corresponding linear transformation $S$ such that $ST = ST = \mathrm{id}$ (since $A^{-1}A = A A^{-1} = I$, and the identity map $\mathrm{id}\colon \R^n \to \R^n$ has the identity matrix $I$ as representation in all bases). Thus $S = T^{-1}$ is the inverse of $T$. If, on the other hand, $T^{-1}$ exists, it must by the same argument have a matrix representation $B$ such that $AB = BA = I$. Hence, $A^{-1} = B$ exists.

(iv) $\Longleftrightarrow$ (v): If $A$ is invertible it is immediate that $x = A^{-1}b$ is the unique solution. If, on the other hand, $Ax = b$, has a unique solution $x$ for any $b$, we construct a matrix $B$ by taking as its columns $x_j$ such that $Ax_j = e_j$, where $\{e_1, \ldots, e_n\}$ is the standard basis. This guarantees that $B = A^{-1}$ is the inverse matrix of $A$. (A less constructive argument would be to note that $Ax = b$ is uniquely solvable for all $b \in \R^n$ exactly if $T$ is invertible.)

### Geometric interpretation of the rank–nullity theorem

Define the direct sum $X \oplus Y$ of two vector spaces (both real, or both complex) as the space of pairs $(x,y)$ with the naturally induced vector addition and scalar multiplication: $X \oplus Y \stackrel{\text{def.}}{=} \{(x,y) \in X \times Y\},$ where $(x_1,y_1) + (x_2,y_2) \stackrel{\text{def.}}{=} (x_1+x_2, y_1 + y_2) \quad\text{ and }\quad \lambda(x,y) \stackrel{\text{def.}}{=} (\lambda x, \lambda y).$ If $X,Y \subset V$ are subspaces of a vector space $V$, then $X \oplus Y = V \quad\Longleftrightarrow\quad X \cap Y = \{0\} \quad\text{ and }\quad X + Y \stackrel{\text{def.}}{=} \{ x + y \colon x \in X, y \in Y\} = V,$ where the equality $X \oplus Y = V$ should be interpreted in terms of isomorphisms ($V$ can be represented as $X \oplus Y$). Note that $\dim(X \oplus Y) = \dim(X) + \dim(Y).$

With these definitions, the rank–nullity theorem can be expressed as a geometric description of the underlying space ($\R^n$) in terms of the matrix $A$.

### › Rank–nullity theorem: geometric version

Let $A \in M_{m\times n}(\R)$. Then $\R^n = \mathrm{ker}(A) \oplus \mathrm{ran}(A^t).$

N.b. A consequence of this is that $\mathrm{rank}(A) = \mathrm{rank}(A^t)$; another is that $\R^m = \mathrm{ker}(A^t) \oplus \mathrm{ran}(A)$.

Proof

Proof

We have already showed that $\mathrm{ker}(A) \perp \mathrm{ran}(A^t)$ in $\R^n$, so that $\mathrm{ker}(A) \cap \mathrm{ran}(A^t) = \{0\};$ this is a consequence of that $|x|^2 = x \cdot x = 0$ for any $x \in \mathrm{ker}(A) \cap \mathrm{ran}(A^t)$.

It remains to show that $\mathrm{ker}(A) \oplus \mathrm{ran}(A^t)$ make up of all $\R^n$. Since $\mathrm{ker}(A) \perp \mathrm{ran}(A^t)$ in $\R^n$, we have $\mathrm{rank}(A^t) \leq n - \dim(\mathrm{ker}(A) = \mathrm{rank}(A),$ where the last equality follows is the rank–nullity theorem. But this argument is not dependent on $A$; hence $\mathrm{rank}(A) = \mathrm{rank}((A^t)^t) \leq \mathrm{rank}(A^t),$ and $\mathrm{rank}(A) = \mathrm{rank}(A^t).$ This shows that $\mathrm{ker}(A) \oplus \mathrm{ran}(A^t) = \R^n$ constitute all of $\R^n$.

### › Summary on linear equations (the Fredholm alternative)

Let $A \in M_{m\times n}(\R)$ be the realisation of a linear transformation $\R^n \to \R^m$, and consider the linear equation $Ax = b.$

• Either $b \in \mathrm{ran}(A)$ and the equation is solvable, or $b \in \mathrm{ker}(A^t)$ and there is no solution.
• In case $\ker(A) = \{0\}$ any solution is unique, else the solutions can be described as $x_p + \ker(A),$ where $x_p$ is any (particular) solution of $Ax = b$.

Proof

Proof

The first statement is a reformulation of the geometric version of the rank–nullity theorem; the second follows from that $\begin{cases} Ax = b\\ Ay = b \end{cases} \quad \Longleftrightarrow \quad \begin{cases} Ax = b\\ A(x-y) = 0 \end{cases} \quad \Longleftrightarrow \quad \begin{cases} Ax = b\\ y = x + z,\quad z \in \mathrm{ker}(A). \end{cases}$ If $\mathrm{ker}(A) = \{0\}$ there is at most one solution, $x$, else the solution space is an affine space8) of the same dimension as $\mathrm{ker}(A)$.

1)
Such an operator is called a linear functional.
2)
For infinite-dimensional Banach spaces one needs the additional concept of boundedness (continuity) of a linear transformation to state a similar result, which then says that the transformation is determined by $Te_j$ (but we cannot choose $Te_j = y_j$ arbitrarily).
3)
The determinant of the matrix is $1$, so it is invertible, regardless of the value of $\theta$.
4)
Note that this matrix is nilpotent, meaning that $A^n = 0$ for some $n \in \mathbb N$ (in this case $n=3$). This is because three derivations on any $p \in P_2(\R)$ produces the zero polynomial.
5)
Here we use the convention that $C^k(\R) = C^k(\R,\R)$, just as one may write $L(X) = L(X,X)$ for linear transformations on a space $X$.
6)
This is the reason why the degree of the zero polynomial is sometimes taken as $-1$; if $\mathrm{deg}(0) = -1$, then $P_{-1}(\R)$ is naturally definied, and the differential operator maps $P_n(\R) \to P_{n-1}(\R)$ for all $n \geq 0$.
7)
This realisation is unique as long we have agreed upon a choice of bases for $\R^n$ and $\R^m$. If nothing else is said, we assume that vectors in $\R^n$, $n \in \mathbb N$, are expressed in the standard basis.
8)
An affine space is a 'translation' of a vector space (or a vector space which has lost its origin); affine spaces are not themselves vector spaces, since they do not have any zero element.