\[ \newcommand{\R}{\mathbb{R}} \newcommand{\N}{\mathbb{N}} \newcommand{\defarrow}{\quad \stackrel{\text{def}}{\Longleftrightarrow} \quad} \]
Linear transformations
Let \(X\) and \(Y\) be vector spaces (both real, or both complex), and \(T \colon X \to Y\) a mapping between them.
Linear transformations
We say that \(T\) is a linear transformation (or just linear) if it preserves the linear structure of the vector spaces: \[ T \text{ linear } \defarrow T(\lambda x+ \mu y) = \lambda Tx + \mu Ty \qquad \text{for all } x,y \in X, \: \lambda, \mu \in \mathbb R \: (\text{or } \mathbb C). \]
- Any matrix \(A \in M_{m \times n}(\R)\) defines a linear transformation \(\R^n \to \R^m\) (checked numerically in the sketch after this list): \[ \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}}_{x} \mapsto \underbrace{ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn}\\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.}_{Ax}\]
- The integral operator defined by \( Tf(t) := \int_0^t f(s)\,ds \) is a linear transformation on \(C(I,\R)\): \[ T \colon C(I,\R) \to C(I,\R), \quad Tf = \bigg[ t \mapsto \int_0^t f(s)\,ds\bigg].\]
- A slight modification, \[ Tf := \int_0^1 f(s)\,ds, \] yields a linear transformation \(C(I,\R) \to \R\) (given that \([0,1] \subset I\)). 1)
- For any polynomial \(p = \sum_{j=0}^k a_j x^j \in P_k(\R)\), the differential operator \( p(D) := \sum_{j=0}^k a_j D^j\) is a linear transformation: \[ p(D) \colon C^k(I,\R) \to C(I,\R), \qquad p(D)f = \sum_{j=0}^k a_j f^{(j)}. \] Here, \(D = \frac{d}{dx}\) is the standard differentiation operator.
- The shift operator \( T \colon (x_1, x_2, \ldots) \mapsto (0,x_1,x_2, \ldots)\) is a linear transformation \(l_p \to l_p\) for any \(p\), \(1 \leq p \leq \infty\): \[T(\lambda x + \mu y) = (0, \lambda x_1 + \mu y_1, \ldots) = \lambda (0, x_1, \ldots) + \mu (0,y_1, \ldots) = \lambda Tx + \mu Ty. \] Note that \(\|Tx\|_{l_p} = \|x\|_{l_p}\) guarantees that \(\mathrm{ran}(T) \subset l_p\).
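As a quick numerical illustration of the first example above, here is a minimal numpy sketch (matrix and vectors are arbitrary illustrative choices) checking the defining identity \(T(\lambda x + \mu y) = \lambda Tx + \mu Ty\) for a matrix map \(\R^2 \to \R^3\):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))        # A in M_{3x2}(R): a linear map R^2 -> R^3
x, y = rng.standard_normal(2), rng.standard_normal(2)
lam, mu = 2.0, -0.5

# Linearity of x |-> Ax: T(lam*x + mu*y) == lam*Tx + mu*Ty
assert np.allclose(A @ (lam * x + mu * y), lam * (A @ x) + mu * (A @ y))
```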
The set of linear transformations as a vector space
The set of linear transformations \(X \to Y\) is denoted by \(L(X,Y)\): \[ L(X,Y) \stackrel{\text{def.}}{=} \{ T \colon X \to Y \text{ linear}\} \]
If \(X = Y\), we may abbreviate \(L(X,X)\) by \(L(X)\).
› L(X,Y) is a vector space
If, for all \(S,T \in L(X,Y)\), we define \[ (T+S)(x) := Tx + Sx \quad\text{ and }\quad (\lambda T)x := \lambda (Tx),\] for all \(x \in X\) and \(\lambda \in \R\) (or \(\mathbb C\)), it is easily checked that \(L(X,Y)\) becomes a vector space. In particular, \(\mu T + \lambda S \in L(X,Y)\) for any \(S,T \in L(X,Y)\).
- The set of \(m \times n\)-matrices \(M_{m\times n}(\R)\) forms a real vector space. As we shall see, \(M_{m \times n}(\R) \cong L(\R^n,\R^m)\).
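To see these vector-space operations concretely, here is a small numpy sketch (with arbitrary illustrative matrices) checking that the pointwise definitions of sum and scalar multiple are realised by matrix sum and scalar multiple:

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((3, 2))        # S, T in L(R^2, R^3), realised as matrices
T = rng.standard_normal((3, 2))
x = rng.standard_normal(2)

# (T + S)x = Tx + Sx  and  (lam*T)x = lam*(Tx)
assert np.allclose((T + S) @ x, T @ x + S @ x)
assert np.allclose((2.5 * T) @ x, 2.5 * (T @ x))
```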
› A linear transformation is determined by its action on any basis
Let \(X\) be a finite-dimensional 2) vector space with basis \(\{e_1, \ldots, e_n\}\). For any values \(y_1, \ldots, y_n \in Y\) there exists exactly one linear transformation \(T \in L(X,Y)\) such that \[ Te_j = y_j, \qquad j =1,\ldots,n.\]
- The columns \(A_j\) of an \(m \times n\)-matrix \(A\) are determined by its action on the standard basis \(\{e_j\}_{j=1}^n\): \[ Ae_j = A_j, \qquad j = 1, \ldots, n. \] Here \(A_j\) plays the role of \(y_j\) in the above theorem.
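In coordinates, the construction in the theorem is simply "place the prescribed images as columns". A minimal numpy sketch (the images \(y_j\) are arbitrary illustrative choices):

```python
import numpy as np

# Prescribed images of the standard basis vectors e_1, e_2 of R^2.
y1 = np.array([1.0, 0.0, 2.0])
y2 = np.array([0.0, -1.0, 1.0])

# The unique linear map R^2 -> R^3 with A e_j = y_j has the y_j as its columns.
A = np.column_stack([y1, y2])

e1, e2 = np.eye(2)
assert np.allclose(A @ e1, y1) and np.allclose(A @ e2, y2)
```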
› Linear transformations between finite-dimensional vector spaces correspond to matrices
Let \(X,Y\) be real vector spaces of dimension \(n\) and \(m\), respectively. Then \(L(X,Y) \cong M_{m\times n}(\R)\).
N.b. The corresponding statement holds for complex vector spaces \(X,Y\), with complex matrices \(M_{m\times n}(\mathbb C)\) in place of \(M_{m\times n}(\R)\).
- The linear transformation \(T\colon (x_1,x_2) \mapsto (-x_2,x_1)\) on \(\R^2\) is realised by a rotation matrix \(A\): \[ \begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}\begin{bmatrix} x_1 \\ x_2\end{bmatrix} = \begin{bmatrix} -x_2 \\ x_1\end{bmatrix}.\]
- More generally, \[ \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \] rotates a vector \( \theta\) radians counterclockwise; the preceding example is recovered for \(\theta=\pi/2\). Any such matrix also corresponds to a change of basis 3): if \(f_1 = (\cos(\theta), \sin(\theta))\) and \(f_2 = (-\sin(\theta),\cos(\theta))\), then \[ \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \] expresses the coordinates \((x_1,x_2)_f\) in the standard basis \(e\) as \(x_1 f_1 + x_2 f_2\).
- The differential operator \(\frac{d}{dx}\) is a linear operator on \(P_2(\R)\). Since \(P_2(\R) \cong \R^3\) via the vector space isomorphism \[ \sum_{j=0}^2 a_j x^j \stackrel{\varphi}{\mapsto} (a_0,a_1,a_2), \] we see that \(\frac{d}{dx}\) is realised by a matrix: \[ \begin{bmatrix} 0 & 1 & 0\\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} a_1 \\ 2a_2 \\ 0 \end{bmatrix}\] expresses the derivation \[\frac{d}{dx} \big(a_0 + a_1 x + a_2 x^2\big) = a_1 + 2 a_2 x + 0 x^2 \] using a matrix. 4)
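The same computation in numpy, a minimal check of the matrix realisation of \(\frac{d}{dx}\) on \(P_2(\R)\) (the sample polynomial is an arbitrary choice):

```python
import numpy as np

# Matrix realisation of d/dx on P_2(R) in the coefficient basis (a_0, a_1, a_2).
D = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])

a = np.array([5.0, 3.0, 4.0])               # p(x) = 5 + 3x + 4x^2
assert np.allclose(D @ a, [3.0, 8.0, 0.0])  # p'(x) = 3 + 8x
```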
Digression: representing linear transformations in different bases
Let \(e = \{e_1, \ldots, e_n\}\) (standard basis) and \(f = \{f_1, \ldots, f_n\}\) (new basis) be two bases for \(\R^n\), and \([f]\) the matrix with the basis vectors \(f_j\), \( j=1,\ldots, n\), as columns (expressed in the standard basis \(e\)). Then \[ x_e = [f] x_f \quad\text{ and }\quad x_f = [f]^{-1}x_e. \] Hence, if \[ T \in L(\R^n) \colon\quad T \text{ is realised by } A_e \in M_{n \times n}(\R) \text{ in the basis } e,\] what is its realisation \(A_f\) in the basis \(f\)? We have \[ y_e = A_e x_e \quad\Longleftrightarrow\quad y_f = [f]^{-1} y_e = [f]^{-1} A_e x_e = [f]^{-1} A_e [f] x_f. \] Thus \[ A_f = [f]^{-1} A_e [f] \] is the realisation of \(T\) in the basis \(f\).
- How do we express the rotation \[ \begin{bmatrix} x_1 \\ x_2\end{bmatrix}_e \mapsto \begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}\begin{bmatrix} x_1 \\ x_2\end{bmatrix}_e = \begin{bmatrix} -x_2 \\ x_1\end{bmatrix}_e\] in the basis \(f = \{(1,1), (-1,0)\}\)? Since \[ [f] = \begin{bmatrix} 1 & -1 \\ 1 & 0 \end{bmatrix} \quad\text{ and }\quad [f]^{-1} = \begin{bmatrix} 0 & 1 \\ -1 & 1 \end{bmatrix}, \] we have \[A_f = \underbrace{\begin{bmatrix} 0 & 1 \\ -1 & 1 \end{bmatrix}}_{[f]^{-1}} \underbrace{\begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}}_{A_e} \underbrace{\begin{bmatrix} 1 & -1 \\ 1 & 0 \end{bmatrix}}_{[f]} = \begin{bmatrix} 1 & -1 \\ 2 & -1 \end{bmatrix}. \] Check: \( (x_1,x_2)_e \stackrel{[f]^{-1}}{\mapsto} (x_2, x_2-x_1)_f \stackrel{A_f}{\mapsto} (x_1, x_1+x_2)_f \stackrel{[f]}{\mapsto} (-x_2, x_1)_e \) describes the correct transformation.
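The same change-of-basis computation, verified numerically in a short numpy sketch:

```python
import numpy as np

Ae = np.array([[0.0, -1.0],
               [1.0,  0.0]])             # rotation by pi/2 in the basis e
F = np.array([[1.0, -1.0],
              [1.0,  0.0]])              # [f]: columns f_1 = (1,1), f_2 = (-1,0)

Af = np.linalg.inv(F) @ Ae @ F           # A_f = [f]^{-1} A_e [f]
assert np.allclose(Af, [[1.0, -1.0],
                        [2.0, -1.0]])
```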
Kernels and ranks
Let \(T \in L(X,Y)\). The set of vectors at which \(T\) vanishes is called the kernel of \(T\): \[ \mathrm{ker}(T) \stackrel{\text{def.}}{=} \{ x \in X \colon Tx = {\bf 0} \text{ in } Y\}. \] The range of \(T\) is the set of values it attains: \[ \mathrm{ran}(T) \stackrel{\text{def.}}{=} \{ Tx \colon x \in X \} \subset Y. \]
› The kernel and range of a linear transformation are vector spaces
Let \(T \in L(X,Y)\). Then \(\mathrm{ker}(T) \subset X\) is a linear subspace of \(X\), and \(\mathrm{ran}(T) \subset Y\) is a linear subspace of \(Y\).
N.b. The dimension of \(\mathrm{ran}(T)\) is called the rank of \(T\), \(\mathrm{rank}(T)\).
- The kernel of \(T \in L(\R^2)\colon (x_1,x_2) \mapsto (-x_2,x_1)\) is the trivial subspace \(\{(0,0)\} \subset \R^2\). Since \(\mathrm{ran}(T) = \R^2\), we have \(\mathrm{rank}(T) = 2\).
- The differential operator \(\frac{d}{dx}\) is a linear operator \(C^1(\R) \to C(\R)\)5). As we know, \[\mathrm{ker}\big(\frac{d}{dx}\big) = \{f \in C^1(\R) \colon f(x) \equiv c \text{ for some } c \in \R\},\] so that \(\mathrm{ker}(\frac{d}{dx}) \cong \R\) is a one-dimensional subspace of \(C^1(\R)\). Since \[ \frac{d}{dx} \int_0^x f(t)\,dt = f(x) \qquad\text{ for any } f \in C(\R), \] we have \(\mathrm{ran} (\frac{d}{dx}) = C(\R)\) and \(\mathrm{rank}(\frac{d}{dx}) = \infty\).
- The domain of definition matters: considered as an operator on \(P_n(\R)\) the differential operator \(\frac{d}{dx} \colon P_n(\R) \to P_n(\R)\) still has a one-dimensional kernel (the space of constant polynomials, \(P_0(\R)\)), but its range is now finite-dimensional: \[ \mathrm{ran}\big(\frac{d}{dx}\big) = P_{n-1}(\R) \cong \R^n. \] This even works for \(n = 0\), if we define \(P_{-1}(\R) := \R^0 = \{0\}\). 6)
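Numerically, a kernel basis can be read off from the singular value decomposition: right singular vectors with (numerically) zero singular value span the kernel. A minimal sketch for the derivative matrix on \(P_2(\R)\) from the earlier example:

```python
import numpy as np

# d/dx on P_2(R) in the coefficient basis (a_0, a_1, a_2).
D = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])

# Right singular vectors with zero singular value span ker(D).
_, s, Vt = np.linalg.svd(D)
null_basis = Vt[s < 1e-12]                # one-dimensional: the constants
assert np.allclose(D @ null_basis.T, 0.0)
assert null_basis.shape[0] == 1           # dim ker = 1, hence rank(D) = 2
```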
› A linear transformation is injective if and only if its kernel is trivial
Let \(T \in L(X,Y)\). Then \[ T \text{ injective } \quad \Longleftrightarrow\quad \mathrm{ker}(T) = \{\mathbf{0}\}.\]
- A matrix \(A \in M_{m\times n}(\R)\) describes a linear transformation \(\R^n \to \R^m\). This transformation is injective if and only if zero (the zero element in \(\R^n\)) is the only solution of the corresponding homogeneous linear system: \[ \begin{aligned} &a_{11} x_1 + \ldots + a_{1n} x_n = 0\\ &\qquad\qquad\vdots \\ &a_{m1} x_1 + \ldots + a_{mn} x_n = 0 \end{aligned} \quad\Longrightarrow\quad (x_1,\ldots,x_n) = (0, \ldots, 0).\]
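In numpy terms, injectivity of a matrix map amounts to full column rank; a minimal sketch with an arbitrary illustrative matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 1.0]])               # a map R^2 -> R^3

# Ax = 0 has only the trivial solution iff rank(A) equals the number of
# columns, i.e. iff ker(A) = {0}.
assert np.linalg.matrix_rank(A) == A.shape[1]   # injective
```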
Matrices: null spaces, column spaces and row spaces
Let \(A = (a_{ij})_{ij} \in M_{m\times n}(\R)\) be the matrix realisation of a linear map \(\R^n \to \R^m\).7)
The null space of a matrix
In this case the kernel of \(A\) is also called the null space of \(A\): \[ \begin{align*} x \in \mathrm{ker}(A) \quad&\Longleftrightarrow\quad Ax = 0 \quad\Longleftrightarrow\quad \sum_{j=1}^n a_{ij} x_j = 0 \quad \forall i = 1, \ldots, m\\ \quad&\Longleftrightarrow\quad (x_1, \ldots, x_n) \perp (a_{i1}, \ldots, a_{in}) \quad \text{ for all } i = 1, \ldots, m. \end{align*} \] Thus, the kernel is the space of vectors \(x \in \R^n\) which are orthogonal to the row vectors of \(A\).
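A short numpy check of this orthogonality (the matrix is an arbitrary illustrative choice): compute a null-space basis via the SVD and dot it against the rows of \(A\):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])          # a map R^3 -> R^2

_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
null_basis = Vt[rank:]                   # rows spanning ker(A)

# Every kernel vector is orthogonal to every row of A.
assert np.allclose(null_basis @ A.T, 0.0)
```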
The column space of a matrix
The column space of \(A\) is the range of \(A\): since \[ Ax = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = x_1 \begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ \vdots \\ a_{m2} \end{bmatrix} + \ldots + x_n \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix}, \] we have that \[ \mathrm{ran}(A) = \{Ax \colon x \in \R^n\} = \Big\{ \sum_{j=1}^n x_j A_j \colon (x_1, \ldots, x_n) \in \R^n \Big\} = \mathrm{span}\{A_1, \ldots, A_n\} \] is the subspace of \(\R^m\) spanned by the column vectors \(A_j\), \(j = 1, \ldots, n\), of \(A\).
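Equivalently, in numpy, \(Ax\) is the linear combination of the columns of \(A\) with weights \(x_j\); a one-line check with illustrative data:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])
x = np.array([2.0, -1.0])

# Ax = x_1 A_1 + x_2 A_2: ran(A) is spanned by the columns of A.
assert np.allclose(A @ x, x[0] * A[:, 0] + x[1] * A[:, 1])
```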
The row space of a matrix
Similarly, define the row space of \(A\) to be the space spanned by the row vectors of \(A\). Then \[ \text{ row space of } A = \text{ column space of } A^t, \] where \(A^t = (a_{ji})\) is the transpose of \(A = (a_{ij})\).
› The kernel of a matrix is perpendicular to the range of its transpose
Let \(A \in M_{m\times n}(\R)\). Then \[ \mathrm{ker}(A) \perp \mathrm{ran}(A^t), \] meaning that if \(x \in \mathrm{ker}(A)\) and \(y \in \mathrm{ran}(A^t)\), then \(x \cdot y = \sum_{j=1}^n x_j y_j = 0.\)
The rank–nullity theorem and its consequences
› The rank–nullity theorem
Let \(T \in L(\R^n,\R^m)\). Then \[ \mathrm{dim}\, \mathrm{ker}(T) + \mathrm{dim}\, \mathrm{ran}(T) = n. \]
N.b. The name comes from the fact that \(\mathrm{dim}\, \mathrm{ker}(T)\) is called the nullity of \(T\). Thus, the sum of the rank and the nullity of \(T\) equals the dimension of its domain.
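A numerical spot-check of the theorem (a generic random matrix, so the rank is maximal): count the right singular vectors annihilated by \(A\) and compare with the rank:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))          # a generic map R^6 -> R^4

_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))                       # dim ran(A)
nullity = sum(np.allclose(A @ v, 0.0) for v in Vt)  # dim ker(A)
assert rank + nullity == A.shape[1]                 # rank + nullity = n = 6
```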
› For finite-dimensional linear transformations, injective means surjective
Let \(T \in L(\R^n)\) be a linear transformation \(\R^n \to \R^n\). Then the following are equivalent:
- \(T\) is injective \(\quad\) (\(\mathrm{ker}(T) = \{\mathbf{0}\}\))
- \(T\) is surjective \(\quad\) (\(\mathrm{ran}(T) = \R^n\))
- \(T\colon \R^n \to \R^n \) is invertible
- The matrix representation \(A\) of \(T\) (in any given basis) is invertible.
- For any \(b \in \R^n\) the system \(Ax = b\) has a unique solution \(x\).
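For instance, the rotation from the earlier example has trivial kernel, so \(Ax = b\) is uniquely solvable for every \(b\); a minimal numpy sketch:

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])              # rotation by pi/2: ker(A) = {0}

b = np.array([3.0, 7.0])
x = np.linalg.solve(A, b)                # unique solution, since A is invertible
assert np.allclose(A @ x, b)             # here x = (7, -3)
```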
Geometric interpretation of the rank–nullity theorem
Define the direct sum \(X \oplus Y\) of two vector spaces (both real, or both complex) as the space of pairs \((x,y)\) with the naturally induced vector addition and scalar multiplication: \[ X \oplus Y \stackrel{\text{def.}}{=} \{(x,y) \in X \times Y\}, \] where \[ (x_1,y_1) + (x_2,y_2) \stackrel{\text{def.}}{=} (x_1+x_2, y_1 + y_2) \quad\text{ and }\quad \lambda(x,y) \stackrel{\text{def.}}{=} (\lambda x, \lambda y). \] If \(X,Y \subset V\) are subspaces of a vector space \(V\), then \[ X \oplus Y = V \quad\Longleftrightarrow\quad X \cap Y = \{0\} \quad\text{ and }\quad X + Y \stackrel{\text{def.}}{=} \{ x + y \colon x \in X, y \in Y\} = V, \] where the equality \(X \oplus Y = V\) should be interpreted in terms of isomorphisms (\(V\) can be represented as \(X \oplus Y\)). Note that \[ \dim(X \oplus Y) = \dim(X) + \dim(Y). \]
With these definitions, the rank–nullity theorem can be expressed as a geometric description of the underlying space (\(\R^n\)) in terms of the matrix \(A\).
› Rank–nullity theorem: geometric version
Let \(A \in M_{m\times n}(\R)\). Then \[ \R^n = \mathrm{ker}(A) \oplus \mathrm{ran}(A^t). \]
N.b. A consequence of this is that \(\mathrm{rank}(A) = \mathrm{rank}(A^t)\); another is that \(\R^m = \mathrm{ker}(A^t) \oplus \mathrm{ran}(A)\).
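A numerical illustration (arbitrary matrix): in the SVD of \(A\), the right singular vectors with nonzero singular value span \(\mathrm{ran}(A^t)\) and the remaining ones span \(\mathrm{ker}(A)\), which exhibits the direct sum:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])          # a map R^3 -> R^2

_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
ran_At = Vt[:rank]                       # orthonormal basis of ran(A^t)
ker_A = Vt[rank:]                        # orthonormal basis of ker(A)

# Orthogonal pieces whose dimensions add up to n = 3:
# R^3 = ker(A) (+) ran(A^t); in particular rank(A) = rank(A^t).
assert np.allclose(ker_A @ ran_At.T, 0.0)
assert ker_A.shape[0] + ran_At.shape[0] == A.shape[1]
```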
› Summary on linear equations (the Fredholm alternative)
Let \(A \in M_{m\times n}(\R)\) be the realisation of a linear transformation \(\R^n \to \R^m\), and consider the linear equation \[Ax = b.\]
- Either \(b \in \mathrm{ran}(A)\) and the equation is solvable, or \(b\) has a nonzero component in \(\mathrm{ker}(A^t)\) (recall that \(\R^m = \mathrm{ker}(A^t) \oplus \mathrm{ran}(A)\)) and there is no solution.
- If \(\ker(A) = \{0\}\), a solution (when one exists) is unique; otherwise the set of solutions can be described as \[ x_p + \ker(A), \] where \(x_p\) is any (particular) solution of \(Ax = b\).
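A closing numpy sketch of the alternative (the rank-one matrix and right-hand side are illustrative): a particular solution plus the kernel describes all solutions when \(b \in \mathrm{ran}(A)\):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])          # rank 1, so ker(A) is two-dimensional
b = np.array([1.0, 2.0])                 # b lies in ran(A): solvable

x_p = np.linalg.lstsq(A, b, rcond=None)[0]   # one particular solution
assert np.allclose(A @ x_p, b)

_, s, Vt = np.linalg.svd(A)
ker = Vt[int(np.sum(s > 1e-12)):]        # basis of ker(A)

# Any x_p + (kernel element) solves Ax = b as well.
assert np.allclose(A @ (x_p + 2.0 * ker[0] - ker[1]), b)
```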