\[ \newcommand{\R}{\mathbb{R}} \newcommand{\N}{\mathbb{N}} \newcommand{\defarrow}{\quad \stackrel{\text{def}}{\Longleftrightarrow} \quad} \]

# Linear transformations

Let \(X\) and \(Y\) be vector spaces (both real, or both complex), and \(T \colon X \to Y\) a mapping between them.

## Linear transformations

We say that \(T\) is a **linear transformation** (or just **linear**) if it preserves the linear structure of a vector space:
\[
T \text{ linear } \defarrow T(\lambda x+ \mu y) = \lambda Tx + \mu Ty, \qquad x,y \in X, \: \mu, \lambda \in \mathbb R \: (\text{or } \mathbb C).
\]

**Ex.**

- Any matrix \(A \in M_{m \times n}(\R)\) defines a linear transformation \(\R^n \to \R^m\): \[ \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}}_{x} \mapsto \underbrace{ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn}\\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.}_{Ax}\]
- The integral operator defined by \( Tf(t) := \int_0^t f(s)\,ds \) is a linear transformation on \(C(I,\R)\): \[ T \colon C(I,\R) \to C(I,\R), \quad Tf = \bigg[ t \mapsto \int_0^t f(s)\,ds\bigg].\]
- A slight modification, \[ Tf := \int_0^1 f(s)\,ds, \] yields a linear transformation \(C(I,\R) \to \R\) (given that \([0,1] \subset I\)).^{1)}

- For any polynomial \(p \in P_k(\R)\), the differential operator \( p(D) := \sum_{j=0}^k a_j D^j\) is a linear transformation: \[ p(D) \colon C^k(I,\R) \to C(I,\R), \qquad p(D)f = \sum_{j=0}^k a_j f^{(j)}. \] Here, \(D = \frac{d}{dx}\) is the standard differentiation operator.

- The shift operator \( T \colon (x_1, x_2, \ldots) \mapsto (0,x_1,x_2, \ldots),\) is a linear transformation \(l_p \to l_p\), for any \(p\), \(1 \leq p \leq \infty\): \[T(\lambda x + \mu y) = (0, \lambda x_1 + \mu y_1, \ldots) = \lambda (0, x_1, \ldots) + \mu (0,y_1, \ldots) = \lambda Tx + \mu Ty. \] Note that \(\|Tx\|_{l_p} = \|x\|_{l_p}\) guarantees that \(\mathrm{ran}(T) \subset l_p\).
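The linearity computation for the shift operator can also be checked numerically on finite initial segments of sequences. A minimal Python sketch (the sequences and scalars below are arbitrary choices, not part of the text):

```python
# The shift operator T(x1, x2, ...) = (0, x1, x2, ...), restricted to a
# finite initial segment of a sequence.
def shift(x):
    return [0.0] + list(x)

x = [1.0, 2.0, 3.0]
y = [5.0, -1.0, 0.5]
a, b = 3.0, -2.0

# T(a*x + b*y) versus a*T(x) + b*T(y):
lhs = shift([a * xi + b * yi for xi, yi in zip(x, y)])
rhs = [a * si + b * ti for si, ti in zip(shift(x), shift(y))]
assert lhs == rhs                       # linearity holds componentwise

# Prepending a zero changes no absolute values, so ||Tx|| = ||x||
# (checked here for p = 2 via the squared norm).
norm_sq = lambda v: sum(vi * vi for vi in v)
assert norm_sq(shift(x)) == norm_sq(x)
```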

## The set of linear transformations as a vector space

**The set of linear transformations** \(X \to Y\) is denoted by \(L(X,Y)\):
\[
L(X,Y) \stackrel{\text{def.}}{=} \{ T \colon X \to Y \text{ linear}\}
\]

If \(X = Y\), we may abbreviate \(L(X,X)\) by \(L(X)\).

### › L(X,Y) is a vector space

If, for all \(S,T \in L(X,Y)\), we define \[ (T+S)(x) := Tx + Sx \quad\text{ and }\quad (\lambda T)x := \lambda (Tx),\] for all \(x \in X\) and \(\lambda \in \R\) (or \(\mathbb C\)), it is easily checked that \(L(X,Y)\) becomes a vector space. In particular, \(\mu T + \lambda S \in L(X,Y)\) for any \(S,T \in L(X,Y)\).
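The pointwise definitions above can be made concrete: combining two linear maps on \(\R^2\) with these operations yields another linear map. A sketch in Python, where the maps `T` (rotation) and `S` (scaling) are hypothetical examples chosen for illustration:

```python
# (P + Q)(x) := Px + Qx  and  (lam * P)(x) := lam * (Px), defined pointwise.
def add(P, Q):
    return lambda v: tuple(p + q for p, q in zip(P(v), Q(v)))

def scale(lam, P):
    return lambda v: tuple(lam * p for p in P(v))

def T(v):  # rotation by 90 degrees
    return (-v[1], v[0])

def S(v):  # scaling by 2
    return (2 * v[0], 2 * v[1])

U = add(scale(3.0, T), S)          # U = 3T + S, an element of L(R^2)

# U is again linear: U(lam*v + mu*w) = lam*U(v) + mu*U(w).
v, w = (1.0, 2.0), (-2.0, 0.5)
lam, mu = 2.0, -1.0
combo = tuple(lam * a + mu * b for a, b in zip(v, w))
lhs = U(combo)
rhs = tuple(lam * a + mu * b for a, b in zip(U(v), U(w)))
assert lhs == rhs
```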

**Ex.**

- The set of \(m \times n\)-matrices \(M_{m\times n}(\R)\) forms a real vector space. As we shall see, \(M_{m \times n}(\R) \cong L(\R^n,\R^m)\).

### › A linear transformation is determined by its action on any basis

Let \(X\) be a finite-dimensional^{2)} vector space with basis \(\{e_1, \ldots, e_n\}\). For any values \(y_1, \ldots, y_n \in Y\) there exists exactly one linear transformation \(T \in L(X,Y)\) such that \[ Te_j = y_j, \qquad j =1,\ldots,n.\]

**Ex.**

- The columns \(A_j\) of an \(m \times n\)-matrix \(A\) are determined by its action on the standard basis \(\{e_j\}_{j=1}^n\): \[ Ae_j = A_j, \qquad j = 1, \ldots, n. \] Here \(A_j\) plays the role of \(y_j\) in the above theorem.
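The construction behind the theorem is explicit: the map is extended from the basis by linearity, \(T\big(\sum_j x_j e_j\big) = \sum_j x_j y_j\). A sketch in Python, with the standard basis of \(\R^3\) and arbitrarily chosen values \(y_j \in \R^2\):

```python
# Prescribed values y_j = T(e_j) on the standard basis of R^3 (arbitrary picks).
ys = [(1.0, 0.0), (2.0, -1.0), (0.0, 4.0)]

def T(x):
    """Extend linearly: T(x) = sum_j x_j * y_j for x = (x_1, x_2, x_3)."""
    return tuple(sum(xj * yj[i] for xj, yj in zip(x, ys)) for i in range(2))

# T agrees with the prescribed values on the basis...
assert T((1, 0, 0)) == (1.0, 0.0)
assert T((0, 1, 0)) == (2.0, -1.0)
# ...and its value anywhere else is forced by linearity:
assert T((1, 1, 1)) == (3.0, 3.0)
```

Note that `T` is exactly multiplication by the matrix whose columns are the \(y_j\), as in the example above.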

### › Linear transformations between finite-dimensional vector spaces correspond to matrices

Let \(X,Y\) be real vector spaces of dimension \(n\) and \(m\), respectively. Then \(L(X,Y) \cong M_{m\times n}(\R)\).

**N.b.** The corresponding statement holds for complex vector spaces \(X,Y\), with matrices in \(M_{m\times n}(\mathbb C)\).

**Ex.**

- The linear transformation \(T\colon (x_1,x_2) \mapsto (-x_2,x_1)\) on \(\R^2\) is realized by a rotation matrix \(A\): \[ \begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}\begin{bmatrix} x_1 \\ x_2\end{bmatrix} = \begin{bmatrix} -x_2 \\ x_1\end{bmatrix}.\]
- More generally, \[ \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \] rotates a vector \( \theta\) radians counterclockwise; the preceding example is attained for \(\theta=\pi/2\). Any such matrix also corresponds to a change of basis: if \(f_1 = (\cos(\theta), \sin(\theta))\) and \(f_2 = (-\sin(\theta),\cos(\theta))\), then \[ \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \] expresses the coordinates \((x_1,x_2)_f\) in the standard basis \(e\) as \(x_1 f_1 + x_2 f_2\).

- The differential operator \(\frac{d}{dx}\) is a linear operator on \(P_2(\R)\). Since \(P_2(\R) \cong \R^3\) via the vector space isomorphism \[ \sum_{j=0}^2 a_j x^j \stackrel{\varphi}{\mapsto} (a_0,a_1,a_2), \] we see that the matrix product \[ \begin{bmatrix} 0 & 1 & 0\\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} a_1 \\ 2a_2 \\ 0 \end{bmatrix}\] expresses the derivation \[\frac{d}{dx} \big(a_0 + a_1 x + a_2 x^2\big) = a_1 + 2 a_2 x + 0 x^2 \] using a matrix.^{4)}
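The matrix realisation of \(\frac{d}{dx}\) on \(P_2(\R)\) can be checked numerically; differentiating three times annihilates every polynomial of degree at most two:

```python
# d/dx on P_2(R) in the coordinates (a0, a1, a2) of a0 + a1*x + a2*x^2.
D = [[0, 1, 0],
     [0, 0, 2],
     [0, 0, 0]]

def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

p = [5, 3, 4]                    # p(x) = 5 + 3x + 4x^2
assert apply(D, p) == [3, 8, 0]  # p'(x) = 3 + 8x
# Applying D three times gives zero: D is nilpotent with D^3 = 0.
assert apply(D, apply(D, apply(D, p))) == [0, 0, 0]
```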

### Digression: representing linear transformations in different bases

Let \(e = \{e_1, \ldots, e_n\}\) (standard basis) and \(f = \{f_1, \ldots, f_n\}\) (new basis) be two bases for \(\R^n\), and \([f]\) the matrix with \([f_j]\), \( j=1,\ldots, n\), as column vectors (expressed in the standard basis \(e\)). Then \[ x_e = [f] x_f \quad\text{ and }\quad x_f = [f]^{-1}x_e. \] Hence, if \[ T \in L(\R^n) \colon\quad T \text{ is realised by } A_e \in M_{n \times n}(\R) \text{ in the basis } e,\] what is its realisation \(A_f\) in the basis \(f\)? We have \[ y_e = A_e x_e \quad\Longleftrightarrow\quad y_f = [f]^{-1} y_e = [f]^{-1} A_e x_e = [f]^{-1} A_e [f] x_f. \] Thus \[ A_f = [f]^{-1} A_e [f] \] is the realisation of \(T\) in the basis \(f\).

**Ex.**

- How do we express the rotation \[ \begin{bmatrix} x_1 \\ x_2\end{bmatrix}_e \mapsto \begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}\begin{bmatrix} x_1 \\ x_2\end{bmatrix}_e = \begin{bmatrix} -x_2 \\ x_1\end{bmatrix}_e\] in the basis \(f = \{(1,1), (-1,0)\}\)? Since \[ [f] = \begin{bmatrix} 1 & -1 \\ 1 & 0 \end{bmatrix} \quad\text{ and }\quad [f]^{-1} = \begin{bmatrix} 0 & 1 \\ -1 & 1 \end{bmatrix}, \] we have \[A_f = \underbrace{\begin{bmatrix} 0 & 1 \\ -1 & 1 \end{bmatrix}}_{[f]^{-1}} \underbrace{\begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}}_{A_e} \underbrace{\begin{bmatrix} 1 & -1 \\ 1 & 0 \end{bmatrix}}_{[f]} = \begin{bmatrix} 1 & -1 \\ 2 & -1 \end{bmatrix}. \] Check: \( (x_1,x_2)_e \stackrel{[f]^{-1}}{\mapsto} (x_2, x_2-x_1)_f \stackrel{A_f}{\mapsto} (x_1, x_1+x_2)_f \stackrel{[f]}{\mapsto} (-x_2, x_1)_e \) describes the correct transformation.
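The computation \(A_f = [f]^{-1} A_e [f]\) from this example can be reproduced with a few lines of Python (plain nested-list matrices, no external libraries):

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A_e   = [[0, -1], [1, 0]]    # rotation by pi/2 in the standard basis
F     = [[1, -1], [1, 0]]    # columns are f_1 = (1,1), f_2 = (-1,0)
F_inv = [[0, 1], [-1, 1]]    # inverse of F (det F = 1)

A_f = matmul(F_inv, matmul(A_e, F))
assert A_f == [[1, -1], [2, -1]]           # matches the hand computation
assert matmul(F, F_inv) == [[1, 0], [0, 1]]  # F_inv really inverts F
```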

## Kernels and ranks

Let \(T \in L(X,Y)\). The set of vectors for which \(T\) vanishes is called the **kernel of** \(T\):
\[
\mathrm{ker}(T) \stackrel{\text{def.}}{=} \{ x \in X \colon Tx = {\bf 0} \text{ in } Y\}.
\]

### › The kernel and range of a linear transformation are vector spaces

Let \(T \in L(X,Y)\). Then \(\mathrm{ker}(T) \subset X\) is a linear subspace of \(X\), and \(\mathrm{ran}(T) \subset Y\) is a linear subspace of \(Y\).

**N.b.** The dimension of \(\mathrm{ran}(T)\) is called the **rank of** \(T\), \(\mathrm{rank}(T)\).

**Ex.**

- The kernel of \(T \in L(\R^2)\colon (x_1,x_2) \mapsto (-x_2,x_1)\) is the trivial subspace \(\{(0,0)\} \subset \R^2\). Since \(\mathrm{ran}(T) = \R^2\), we have \(\mathrm{rank}(T) = 2\).
- The differential operator \(\frac{d}{dx}\) is a linear operator \(C^1(\R) \to C(\R)\). As we know, \[\mathrm{ker}\big(\frac{d}{dx}\big) = \{f \in C^1(\R) \colon f(x) \equiv c \text{ for some } c \in \R\},\] so that \(\mathrm{ker}(\frac{d}{dx}) \cong \R\) is a one-dimensional subspace of \(C^1(\R)\). Since \[ \frac{d}{dx} \int_0^x f(t)\,dt = f(x) \qquad\text{ for any } f \in C(\R), \] we have \(\mathrm{ran} (\frac{d}{dx}) = C(\R)\) and \(\mathrm{rank}(\frac{d}{dx}) = \infty\).

*The domain of definition matters*: considered as an operator on \(P_n(\R)\), the differential operator \(\frac{d}{dx} \colon P_n(\R) \to P_n(\R)\) still has a one-dimensional kernel (the space of constant polynomials, \(P_0(\R)\)), but its range is now finite-dimensional: \[ \mathrm{ran}\big(\frac{d}{dx}\big) = P_{n-1}(\R) \cong \R^n. \] This even works for \(n = 0\), if we define \(P_{-1}(\R) := \R^0 = \{0\}\).

### › A linear transformation is injective if and only if its kernel is trivial

Let \(T \in L(X,Y)\). Then \[ T \text{ injective } \quad \Longleftrightarrow\quad \mathrm{ker}(T) = \{\bf{0}\}.\]

**Ex.**

- A matrix \(A \in M_{m\times n}(\R)\) describes a linear transformation \(\R^n \to \R^m\). This transformation is injective exactly when zero (the zero element in \(\R^n\)) is the only solution of the corresponding homogeneous linear system: \[ \begin{align*} a_{11} x_1 + \ldots + a_{1n} x_n &= 0\\ &\;\;\vdots \\ a_{m1} x_1 + \ldots + a_{mn} x_n &= 0 \end{align*} \quad\Longrightarrow\quad (x_1,\ldots,x_n) = (0, \ldots, 0).\]

## Matrices: null spaces, column spaces and row spaces

Let \(A = (a_{ij})_{ij} \in M_{m\times n}(\R)\) be the matrix realisation of a linear map \(\R^n \to \R^m\).

### The null space of a matrix

In this case the kernel of \(A\) is also called the **null space of** \(A\):
\[
\begin{align*}
x \in \mathrm{ker}(A) \quad&\Longleftrightarrow\quad Ax = 0 \quad\Longleftrightarrow\quad \sum_{j=1}^n a_{ij} x_j = 0 \quad \forall i = 1, \ldots, m\\ \quad&\Longleftrightarrow\quad (x_1, \ldots, x_n) \perp (a_{i1}, \ldots, a_{in}) \quad \text{ for all } i = 1, \ldots, m.
\end{align*}
\]
Thus, *the kernel is the space of vectors* \(x \in \R^n\) *which are orthogonal to the row vectors of* \(A\).
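This orthogonality is easy to verify on a concrete case. A sketch in Python, where the \(2 \times 3\) matrix is a hypothetical example whose kernel was found by back-substitution:

```python
# A hypothetical 2x3 matrix with a one-dimensional null space.
A = [[1, 2, 3],
     [0, 1, 1]]

# Solving Ax = 0 by hand: row 2 gives x2 = -x3, then row 1 gives
# x1 = -2*x2 - 3*x3 = -x3, so ker(A) = span{(-1, -1, 1)}.
x = [-1, -1, 1]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Ax = 0 means precisely that x is orthogonal to every row of A.
assert all(dot(row, x) == 0 for row in A)
```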

### The column space of a matrix

The **column space of** \(A\) is the range of \(A\): since
\[
Ax = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = x_1 \begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ \vdots \\ a_{m2} \end{bmatrix} + \ldots + x_n \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix},
\]
we have that
\[
\mathrm{ran}(A) = \{Ax \colon x \in \R^n\} = \Big\{ \sum_{j=1}^n x_j A_j \colon (x_1, \ldots, x_n) \in \R^n \Big\} = \mathrm{span}\{A_1, \ldots, A_n\}
\]
is the subspace of \(\R^m\) spanned by the column vectors \(A_j\), \(j = 1, \ldots, n\), of \(A\).
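The identity \(Ax = \sum_j x_j A_j\) used above can be checked directly; the \(3 \times 2\) matrix and the vector below are arbitrary illustrations:

```python
# Ax equals the linear combination of the columns A_j with weights x_j.
A = [[1, 0],
     [2, 1],
     [0, 3]]
x = [2.0, -1.0]

# Row-by-row matrix-vector product:
Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(3)]

# The same vector as x_1 * A_1 + x_2 * A_2:
cols = [[A[i][j] for i in range(3)] for j in range(2)]
combo = [x[0] * c1 + x[1] * c2 for c1, c2 in zip(*cols)]
assert Ax == combo
```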

### The row space of a matrix

Similarly, define the **row space of** \(A\) to be the space spanned by the row vectors of \(A\). Then
\[
\text{ row space of } A = \text{ column space of } A^t,
\]
where \(A^t = (a_{ji})\) is the transpose of \(A = (a_{ij})\).

### › The kernel of a matrix is perpendicular to the range of its transpose

Let \(A \in M_{m\times n}(\R)\). Then \[ \mathrm{ker}(A) \perp \mathrm{ran}(A^t), \] meaning that if \(x \in \mathrm{ker}(A)\) and \(y \in \mathrm{ran}(A^t)\), then \(x \cdot y = \sum_{j=1}^n x_j y_j = 0.\)

## The rank–nullity theorem and its consequences

### › The rank–nullity theorem

Let \(T \in L(\R^n,\R^m)\). Then \[ \mathrm{dim}\, \mathrm{ker}(T) + \mathrm{dim}\, \mathrm{ran}(T) = n. \]

**N.b.** The name comes from the fact that \(\mathrm{dim}\, \mathrm{ker}(T)\) is the **nullity of** \(T\). Thus, the sum of the rank and the nullity of \(T\) equals the dimension of its domain.
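A concrete instance of the theorem, sketched in Python for a hypothetical \(2 \times 3\) matrix of rank \(1\):

```python
# A 2x3 matrix whose second row is twice the first, so rank(A) = 1.
A = [[1, 1, 0],
     [2, 2, 0]]

# Kernel: x1 + x2 = 0 with x3 free, so ker(A) = span{(1,-1,0), (0,0,1)}
# and the nullity is 2.
basis = [[1, -1, 0], [0, 0, 1]]
for x in basis:
    assert all(sum(a * b for a, b in zip(row, x)) == 0 for row in A)

# Rank-nullity: rank + nullity = n = 3.
rank, nullity, n = 1, 2, 3
assert rank + nullity == n
```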

### › For finite-dimensional linear transformations, injective means surjective

Let \(T \in L(\R^n)\) be a linear transformation \(\R^n \to \R^n\). Then the following are equivalent:

- \(T\) is injective \(\quad\) (\(\mathrm{ker}(T) = \{\mathbf{0}\}\))
- \(T\) is surjective \(\quad\) (\(\mathrm{ran}(T) = \R^n\))
- \(T\colon \R^n \to \R^n \) is invertible
- The matrix representation \(A\) of \(T\) (in any given basis) is invertible.
- For any \(b \in \R^n\) the system \(Ax = b\) has a unique solution \(x\).

### Geometric interpretation of the rank–nullity theorem

Define the **direct sum** \(X \oplus Y\) of two vector spaces (both real, or both complex) as the space of pairs \((x,y)\) with the naturally induced vector addition and scalar multiplication:
\[
X \oplus Y \stackrel{\text{def.}}{=} \{(x,y) \in X \times Y\},
\]
where
\[
(x_1,y_1) + (x_2,y_2) \stackrel{\text{def.}}{=} (x_1+x_2, y_1 + y_2) \quad\text{ and }\quad \lambda(x,y) \stackrel{\text{def.}}{=} (\lambda x, \lambda y).
\]
If \(X,Y \subset V\) are subspaces of a vector space \(V\), then
\[
X \oplus Y = V \quad\Longleftrightarrow\quad X \cap Y = \{0\} \quad\text{ and }\quad X + Y \stackrel{\text{def.}}{=} \{ x + y \colon x \in X, y \in Y\} = V,
\]
where the equality \(X \oplus Y = V\) should be interpreted in terms of isomorphisms (\(V\) can be represented as \(X \oplus Y\)). Note that
\[
\dim(X \oplus Y) = \dim(X) + \dim(Y).
\]

With these definitions, the rank–nullity theorem can be expressed as a geometric description of the underlying space (\(\R^n\)) in terms of the matrix \(A\).

### › Rank–nullity theorem: geometric version

Let \(A \in M_{m\times n}(\R)\). Then \[ \R^n = \mathrm{ker}(A) \oplus \mathrm{ran}(A^t). \]

**N.b.** A consequence of this is that \(\mathrm{rank}(A) = \mathrm{rank}(A^t)\); another is that \(\R^m = \mathrm{ker}(A^t) \oplus \mathrm{ran}(A)\).

### › Summary on linear equations (the Fredholm alternative)

Let \(A \in M_{m\times n}(\R)\) be the realisation of a linear transformation \(\R^n \to \R^m\), and consider the linear equation \[Ax = b.\]

- Either \(b \in \mathrm{ran}(A)\) (equivalently, \(b \perp \mathrm{ker}(A^t)\)) and the equation is solvable, or \(b\) has a non-zero component in \(\mathrm{ker}(A^t)\) and there is no solution.
- If \(\ker(A) = \{0\}\), a solution, when it exists, is unique; otherwise the set of solutions can be described as \[ x_p + \ker(A), \] where \(x_p\) is any (particular) solution of \(Ax = b\).
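The structure of the solution set is easy to verify numerically. A sketch in Python, where the singular matrix, the right-hand side, and the particular solution are hypothetical choices:

```python
# A singular 2x2 matrix with ker(A) spanned by k = (1, -1).
A = [[1, 1],
     [2, 2]]
b = [3, 6]             # b lies in ran(A): the particular solution below works
x_p = [3, 0]           # one particular solution of Ax = b
k = [1, -1]            # spans ker(A)

def apply(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

assert apply(A, x_p) == b
assert apply(A, k) == [0, 0]

# Every x_p + t*k is again a solution, since A(x_p + t*k) = A x_p + t*A k = b.
for t in (-2, 0, 5):
    x = [xp + t * ki for xp, ki in zip(x_p, k)]
    assert apply(A, x) == b
```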

^{1)}

A linear transformation \(X \to \R\) (or \(\mathbb C\)) is called a **linear functional**.

^{2)}

In the infinite-dimensional case one needs **boundedness** (continuity) of a linear transformation to state a similar result, which then says that the transformation is determined by \(Te_j\) (but we cannot choose \(Te_j = y_j\) arbitrarily).

^{4)}

The matrix is **nilpotent**, meaning that \(A^n = 0\) for some \(n \in \mathbb N\) (in this case \(n=3\)). This is because three derivations on any \(p \in P_2(\R)\) produce the zero polynomial.