\[ \newcommand{R}{\mathbb{R}} \newcommand{N}{\mathbb{N}} \newcommand{defarrow}{\quad \stackrel{\text{def}}{\Longleftrightarrow} \quad} \]

====== Linear transformations ======

Let \(X\) and \(Y\) be vector spaces (both real, or both complex), and \(T \colon X \to Y\) a mapping between them.

===== Linear transformations =====

We say that \(T\) is a **linear transformation** (or just **linear**) if it preserves the linear structure of a vector space: \[ T \text{ linear } \defarrow T(\lambda x+ \mu y) = \lambda Tx + \mu Ty, \qquad x,y \in X, \: \mu, \lambda \in \mathbb R \: (\text{or } \mathbb C). \]

** Ex. **\\
  * Any matrix \(A \in M_{m \times n}(\R)\) defines a linear transformation \(\R^n \to \R^m\): \[ \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}}_{x} \mapsto \underbrace{ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn}\\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.}_{Ax}\]
  * The integral operator defined by \( Tf(t) := \int_0^t f(s)\,ds \) is a linear transformation on \(C(I,\R)\): \[ T \colon C(I,\R) \to C(I,\R), \quad Tf = \bigg[ t \mapsto \int_0^t f(s)\,ds\bigg].\]
  * A slight modification, \[ Tf := \int_0^1 f(s)\,ds, \] yields a linear transformation \(C(I,\R) \to \R\) (given that \([0,1] \subset I\)). ((Such an operator is called a **linear functional**.))
  * For any polynomial \(p(x) = \sum_{j=0}^k a_j x^j\) in \(P_k(\R)\), the differential operator \( p(D) := \sum_{j=0}^k a_j D^j\) is a linear transformation: \[ p(D) \colon C^k(I,\R) \to C(I,\R), \qquad p(D)f = \sum_{j=0}^k a_j f^{(j)}. \] Here, \(D = \frac{d}{dx}\) is the standard differentiation operator.
  * The shift operator \( T \colon (x_1, x_2, \ldots) \mapsto (0,x_1,x_2, \ldots)\) is a linear transformation \(l_p \to l_p\), for any \(p\), \(1 \leq p \leq \infty\): \[T(\lambda x + \mu y) = (0, \lambda x_1 + \mu y_1, \ldots) = \lambda (0, x_1, \ldots) + \mu (0,y_1, \ldots) = \lambda Tx + \mu Ty. \] Note that \(\|Tx\|_{l_p} = \|x\|_{l_p}\) guarantees that \(\mathrm{ran}(T) \subset l_p\).

===== The set of linear transformations as a vector space =====

**The set of linear transformations** \(X \to Y\) is denoted by \(L(X,Y)\): \[ L(X,Y) \stackrel{\text{def.}}{=} \{ T \colon X \to Y \text{ linear}\} \] If \(X = Y\), we may abbreviate \(L(X,X)\) by \(L(X)\).

==== › L(X,Y) is a vector space ====

If, for all \(S,T \in L(X,Y)\), we define \[ (T+S)(x) := Tx + Sx \quad\text{ and }\quad (\lambda T)x := \lambda (Tx),\] for all \(x \in X\) and \(\lambda \in \R\) (or \(\mathbb C\)), it is easily checked that \(L(X,Y)\) becomes a vector space. In particular, \(\mu T + \lambda S \in L(X,Y)\) for any \(S,T \in L(X,Y)\).

** Ex. **\\
  * The set of \(m \times n\)-matrices \(M_{m\times n}(\R)\) forms a real vector space. As we shall see, \(M_{m \times n}(\R) \cong L(\R^n,\R^m)\).
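These operations can also be illustrated numerically. The following is a minimal sketch (in Python with ''numpy''; the matrices, scalars and vector are made-up example data) checking that the pointwise definitions of \(T+S\) and \(\lambda T\) agree with matrix addition and scalar multiplication when the transformations are realised as matrices:

<code python>
import numpy as np

# Two linear transformations R^3 -> R^2, realised as 2x3 matrices (arbitrary example data)
T = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])
S = np.array([[2.0, 0.0, 1.0],
              [1.0, 1.0, -1.0]])

lam, mu = 2.0, -3.0
x = np.array([1.0, -2.0, 0.5])

# (mu*T + lam*S)(x) computed via the matrix mu*T + lam*S ...
lhs = (mu * T + lam * S) @ x
# ... agrees with the pointwise definition mu*(Tx) + lam*(Sx)
rhs = mu * (T @ x) + lam * (S @ x)

print(np.allclose(lhs, rhs))  # True
</code>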
==== › A linear transformation is determined by its action on any basis ====

Let \(X\) be a finite-dimensional ((For infinite-dimensional Banach spaces one needs the additional concept of **boundedness** (continuity) of a linear transformation to state a similar result, which then says that the transformation is determined by \(Te_j\) (but we cannot choose \(Te_j = y_j\) arbitrarily).)) vector space with basis \(\{e_1, \ldots, e_n\}\). For any values \(y_1, \ldots, y_n \in Y\) there exists exactly one linear transformation \(T \in L(X,Y)\) such that \[ Te_j = y_j, \qquad j =1,\ldots,n.\]

Any \(x \in X\) has a unique representation \(x = \sum_{j=1}^n x_j e_j\). Define \(T\) through \[Tx = \sum_{j=1}^n x_j y_j.\] Then \(Te_j = y_j\), and \(T\) is linear, since the coordinates \(x_1, \ldots, x_n\) depend linearly on \(x\) (formally, \(T\) acts as multiplication with the row vector \((y_1, \ldots, y_n)\)). Moreover, if \(S \in L(X,Y)\) also satisfies \(Se_j = y_j\), then \[ Sx = S\big( \sum_{j=1}^n x_j e_j\big) = \sum_{j=1}^n x_j S e_j = \sum_{j=1}^n x_j y_j = Tx, \qquad\text{ for all } x \in X,\] so that \(S = T\) in \(L(X,Y)\). \\

** Ex. **\\
  * The columns \(A_j\) of an \(m \times n\)-matrix \(A\) are determined by its action on the standard basis \(\{e_j\}_{j=1}^n\): \[ Ae_j = A_j, \qquad j = 1, \ldots, n. \] Here \(A_j\) plays the role of \(y_j\) in the above theorem.

==== › Linear transformations between finite-dimensional vector spaces correspond to matrices ====

Let \(X,Y\) be real vector spaces of dimension \(n\) and \(m\), respectively. Then \(L(X,Y) \cong M_{m\times n}(\R)\).

**N.b.** The corresponding statement holds for complex vector spaces \(X,Y\), with the complex matrices \(M_{m\times n}(\mathbb C)\).

Since \(X \cong \R^n\) and \(Y \cong \R^m\), it suffices to prove the statement for these choices of \(X\) and \(Y\). Let \(\{e_j\}_{j=1}^n\) be the standard basis for \(\R^n\). Then \[ T \colon \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \mapsto \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn}\\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}\] is a linear transformation \(\R^n \to \R^m\) satisfying \[T e_j = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}. \] According to the above proposition, there is exactly one such \(T \in L(\R^n,\R^m)\). Since we can choose the columns of \(A = (a_{ij})_{ij}\) to be any elements in \(\R^m\), we get all possible \(T \in L(\R^n,\R^m)\) in this way.\\
\\

** Ex. **\\
  * The linear transformation \(T\colon (x_1,x_2) \mapsto (-x_2,x_1)\) on \(\R^2\) is realised by a rotation matrix \(A\): \[ \begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}\begin{bmatrix} x_1 \\ x_2\end{bmatrix} = \begin{bmatrix} -x_2 \\ x_1\end{bmatrix}.\]
  * More generally, \[ \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \] rotates a vector \( \theta\) radians counterclockwise; the preceding example is attained for \(\theta=\pi/2\). Any such matrix also corresponds to a change of basis((The determinant of the matrix is \(1\), so it is invertible, regardless of the value of \(\theta\).)): if \(f_1 = (\cos(\theta), \sin(\theta))\) and \(f_2 = (-\sin(\theta),\cos(\theta))\), then \[ \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \] expresses the coordinates \((x_1,x_2)_f\) in the standard basis \(e\) as \(x_1 f_1 + x_2 f_2\).
  * The differential operator \(\frac{d}{dx}\) is a linear operator on \(P_2(\R)\). Since \(P_2(\R) \cong \R^3\) via the vector space isomorphism \[ \sum_{j=0}^2 a_j x^j \stackrel{\varphi}{\mapsto} (a_0,a_1,a_2), \] the matrix equation \[\begin{bmatrix} 0 & 1 & 0\\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} a_1 \\ 2a_2 \\ 0 \end{bmatrix}\] expresses the derivation \[\frac{d}{dx} \big(a_0 + a_1 x + a_2 x^2\big) = a_1 + 2 a_2 x + 0 x^2 \] using a matrix. ((Note that this matrix is **nilpotent**, meaning that \(A^n = 0\) for some \(n \in \mathbb N\) (in this case \(n=3\)). This is because differentiating any \(p \in P_2(\R)\) three times produces the zero polynomial.))
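The proposition about the action on a basis gives a simple recipe for computing matrix realisations: apply the transformation to each basis vector and collect the results as columns. A minimal sketch of this recipe (Python with ''numpy''; the helper name ''matrix_of'' is made up for this illustration), applied to the two examples above:

<code python>
import numpy as np

def matrix_of(T, n):
    """Realise a linear map T: R^n -> R^m as the matrix whose j-th column is T(e_j)."""
    eye = np.eye(n)
    return np.column_stack([T(eye[:, j]) for j in range(n)])

# Rotation (x1, x2) |-> (-x2, x1); its columns are T(e_1) = (0, 1) and T(e_2) = (-1, 0)
rot = matrix_of(lambda x: np.array([-x[1], x[0]]), 2)
print(rot)   # [[0, -1], [1, 0]], as in the first example

# d/dx on P_2(R), with p = a0 + a1 x + a2 x^2 identified with (a0, a1, a2)
ddx = matrix_of(lambda a: np.array([a[1], 2 * a[2], 0.0]), 3)
print(ddx)   # [[0, 1, 0], [0, 0, 2], [0, 0, 0]]
print(np.allclose(np.linalg.matrix_power(ddx, 3), 0))   # True: the matrix is nilpotent
</code>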
==== Digression: representing linear transformations in different bases ====

Let \(e = \{e_1, \ldots, e_n\}\) (standard basis) and \(f = \{f_1, \ldots, f_n\}\) (new basis) be two bases for \(\R^n\), and \([f]\) the matrix with \([f_j]\), \( j=1,\ldots, n\), as column vectors (expressed in the standard basis \(e\)). Then \[ x_e = [f] x_f \quad\text{ and }\quad x_f = [f]^{-1}x_e. \] Hence, if \(T \in L(\R^n)\) is realised by \(A_e \in M_{n \times n}(\R)\) in the basis \(e\), what is its realisation \(A_f\) in the basis \(f\)? We have \[ y_e = A_e x_e \quad\Longleftrightarrow\quad y_f = [f]^{-1} y_e = [f]^{-1} A_e x_e = [f]^{-1} A_e [f] x_f. \] Thus \[ A_f = [f]^{-1} A_e [f] \] is the realisation of \(T\) in the basis \(f\).

** Ex. **\\
  * How do we express the rotation \[ \begin{bmatrix} x_1 \\ x_2\end{bmatrix}_e \mapsto \begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}\begin{bmatrix} x_1 \\ x_2\end{bmatrix}_e = \begin{bmatrix} -x_2 \\ x_1\end{bmatrix}_e\] in the basis \(f = \{(1,1), (-1,0)\}\)? Since \[ [f] = \begin{bmatrix} 1 & -1 \\ 1 & 0 \end{bmatrix} \quad\text{ and }\quad [f]^{-1} = \begin{bmatrix} 0 & 1 \\ -1 & 1 \end{bmatrix}, \] we have \[A_f = \underbrace{\begin{bmatrix} 0 & 1 \\ -1 & 1 \end{bmatrix}}_{[f]^{-1}} \underbrace{\begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}}_{A_e} \underbrace{\begin{bmatrix} 1 & -1 \\ 1 & 0 \end{bmatrix}}_{[f]} = \begin{bmatrix} 1 & -1 \\ 2 & -1 \end{bmatrix}. \] Check: \( (x_1,x_2)_e \stackrel{[f]^{-1}}{\mapsto} (x_2, x_2-x_1)_f \stackrel{A_f}{\mapsto} (x_1, x_1+x_2)_f \stackrel{[f]}{\mapsto} (-x_2, x_1)_e \) describes the correct transformation.
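The change of basis formula \(A_f = [f]^{-1} A_e [f]\) is also easy to verify numerically; below is a minimal sketch (Python with ''numpy'') reproducing the example above, where the variable names mirror the notation \([f]\), \(A_e\) and \(A_f\).

<code python>
import numpy as np

A_e = np.array([[0.0, -1.0],
                [1.0,  0.0]])        # rotation in the standard basis e
F = np.array([[1.0, -1.0],
              [1.0,  0.0]])          # [f]: columns are f_1 = (1,1) and f_2 = (-1,0)

A_f = np.linalg.inv(F) @ A_e @ F     # realisation of the same map in the basis f
print(A_f)                           # [[1, -1], [2, -1]], as computed above

# Check on a concrete vector: rotating in e-coordinates agrees with going via f-coordinates
x_e = np.array([3.0, 4.0])
y_direct = A_e @ x_e
y_via_f = F @ (A_f @ (np.linalg.inv(F) @ x_e))
print(np.allclose(y_direct, y_via_f))   # True
</code>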
===== Kernels and ranks =====

Let \(T \in L(X,Y)\). The set of vectors for which \(T\) vanishes is called the **kernel of** \(T\): \[ \mathrm{ker}(T) \stackrel{\text{def.}}{=} \{ x \in X \colon Tx = {\bf 0} \text{ in } Y\}. \]

==== › The kernel and range of a linear transformation are vector spaces ====

Let \(T \in L(X,Y)\). Then \(\mathrm{ker}(T) \subset X\) is a linear subspace of \(X\), and \(\mathrm{ran}(T) \subset Y\) is a linear subspace of \(Y\).

**N.b.** The dimension of \(\mathrm{ran}(T)\) is called the **rank of** \(T\), \(\mathrm{rank}(T)\).

For the kernel of \(T\): If \(x_1,x_2 \in \mathrm{ker}(T)\), then \[T(\lambda x_1 + \mu x_2) = \lambda T x_1 + \mu T x_2 = \lambda \bf{0} + \mu \bf{0} = \bf{0}. \] This shows that \(\mathrm{ker}(T)\) is a subspace of \(X\). (Note, in particular, that the zero element of \(X\) is always in \(\mathrm{ker}(T)\).)\\
\\
For the range of \(T\): If \(y_1, y_2 \in \mathrm{ran}(T)\), then there exist \(x_1, x_2 \in X\) such that \[ Tx_1 = y_1, \qquad Tx_2 = y_2. \] We want to show that, for any scalars \(\lambda, \mu\), we have \(\lambda y_1 + \mu y_2 \in \mathrm{ran}(T)\). But this follows since \[ \lambda y_1 + \mu y_2 = \lambda T x_1 + \mu T x_2 = T(\lambda x_1 + \mu x_2) \in \mathrm{ran}(T), \] where we have used that \(\lambda x_1 + \mu x_2 \in X\), by the properties of a vector space. \\

** Ex. **\\
  * The kernel of \(T \in L(\R^2)\colon (x_1,x_2) \mapsto (-x_2,x_1)\) is the trivial subspace \(\{(0,0)\} \subset \R^2\). Since \(\mathrm{ran}(T) = \R^2\), we have \(\mathrm{rank}(T) = 2\).
  * The differential operator \(\frac{d}{dx}\) is a linear operator \(C^1(\R) \to C(\R)\)((Here we use the convention that \(C^k(\R) = C^k(\R,\R)\), just as one may write \(L(X) = L(X,X)\) for linear transformations on a space \(X\).)). As we know, \[\mathrm{ker}\big(\frac{d}{dx}\big) = \{f \in C^1(\R) \colon f(x) \equiv c \text{ for some } c \in \R\},\] so that \(\mathrm{ker}(\frac{d}{dx}) \cong \R\) is a one-dimensional subspace of \(C^1(\R)\). Since \[ \frac{d}{dx} \int_0^x f(t)\,dt = f(x) \qquad\text{ for any } f \in C(\R), \] we have \(\mathrm{ran} (\frac{d}{dx}) = C(\R)\) and \(\mathrm{rank}(\frac{d}{dx}) = \infty\).
  * //The domain of definition matters//: considered as an operator on \(P_n(\R)\) the differential operator \(\frac{d}{dx} \colon P_n(\R) \to P_n(\R)\) still has a one-dimensional kernel (the space of constant polynomials, \(P_0(\R)\)), but its range is now finite-dimensional: \[ \mathrm{ran}\big(\frac{d}{dx}\big) = P_{n-1}(\R) \cong \R^n. \] This even works for \(n = 0\), if we define \(P_{-1}(\R) := \R^0 = \{0\}\). ((This is the reason why the degree of the zero polynomial is sometimes taken as \(-1\); if \(\mathrm{deg}(0) = -1\), then \(P_{-1}(\R)\) is naturally defined, and the differential operator maps \(P_n(\R) \to P_{n-1}(\R)\) for all \(n \geq 0\).))

==== › A linear transformation is injective if and only if its kernel is trivial ====

Let \(T \in L(X,Y)\). Then \[ T \text{ injective } \quad \Longleftrightarrow\quad \mathrm{ker}(T) = \{\bf{0}\}.\]

\[ \begin{align*} T \text{ injective} \quad&\Longleftrightarrow\quad \big[ Tx = Ty \Longrightarrow x = y \big] \quad\Longleftrightarrow\quad \big[ T(x-y) = 0 \Longrightarrow x - y = 0 \big]\\ \quad&\Longleftrightarrow\quad \big[ Tz = 0 \Longrightarrow z = 0 \big] \quad\Longleftrightarrow\quad \mathrm{ker}(T) = \{0\}. \end{align*} \]
\\

** Ex. **\\
  * A matrix \(A \in M_{m\times n}(\R)\) describes a linear transformation \(\R^n \to \R^m\). This transformation is injective exactly when zero (the zero element in \(\R^n\)) is the only solution of the corresponding linear homogeneous system: \[ \begin{align*} a_{11} x_1 + \ldots + a_{1n} x_n &= 0\\ &\;\;\vdots \\ a_{m1} x_1 + \ldots + a_{mn} x_n &= 0\end{align*} \quad\Longrightarrow\quad (x_1,\ldots,x_n) = (0, \ldots, 0).\]

===== Matrices: null spaces, column spaces and row spaces =====

Let \(A = (a_{ij})_{ij} \in M_{m\times n}(\R)\) be the matrix realisation of a linear map \(\R^n \to \R^m\).((This realisation is unique as long as we have agreed upon a choice of bases for \(\R^n\) and \(\R^m\). If nothing else is said, we assume that vectors in \(\R^n\), \(n \in \mathbb N\), are expressed in the standard basis.))

==== The null space of a matrix ====

In this case the kernel of \(A\) is also called the **null space of** \(A\): \[ \begin{align*} x \in \mathrm{ker}(A) \quad&\Longleftrightarrow\quad Ax = 0 \quad\Longleftrightarrow\quad \sum_{j=1}^n a_{ij} x_j = 0 \quad \forall i = 1, \ldots, m\\ \quad&\Longleftrightarrow\quad (x_1, \ldots, x_n) \perp (a_{i1}, \ldots, a_{in}) \quad \text{ for all } i = 1, \ldots, m. \end{align*} \] Thus, //the kernel is the space of vectors// \(x \in \R^n\) //which are orthogonal to the row vectors of// \(A\).
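Numerically, an (orthonormal) basis for the null space can be read off from the singular value decomposition, and the orthogonality to the rows of \(A\) can then be checked directly. A minimal sketch (Python with ''numpy''; the matrix is made-up example data with a one-dimensional kernel):

<code python>
import numpy as np

# Arbitrary 2x3 example matrix of rank 2, so its kernel is a line in R^3
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])

# In the SVD A = U diag(s) Vt, the last n - rank rows of Vt span ker(A)
_, s, Vt = np.linalg.svd(A)
rank = np.linalg.matrix_rank(A)
kernel_basis = Vt[rank:]               # here: a single row spanning ker(A)

print(np.allclose(A @ kernel_basis.T, 0))    # True: A x = 0 for every kernel vector
print(np.allclose(kernel_basis @ A.T, 0))    # True: kernel vectors are orthogonal to the rows of A
</code>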
==== The column space of a matrix ====

The **column space of** \(A\) is the range of \(A\): since \[ Ax = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = x_1 \begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ \vdots \\ a_{m2} \end{bmatrix} + \ldots + x_n \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix}, \] we have that \[ \mathrm{ran}(A) = \{Ax \colon x \in \R^n\} = \Big\{ \sum_{j=1}^n x_j A_j \colon (x_1, \ldots, x_n) \in \R^n \Big\} = \mathrm{span}\{A_1, \ldots, A_n\} \] is the subspace of \(\R^m\) spanned by the column vectors \(A_j\), \(j = 1, \ldots, n\), of \(A\).

==== The row space of a matrix ====

Similarly, define the **row space of** \(A\) to be the space spanned by the row vectors of \(A\). Then \[ \text{ row space of } A = \text{ column space of } A^t, \] where \(A^t = (a_{ji})\) is the transpose of \(A = (a_{ij})\).

==== › The kernel of a matrix is perpendicular to the range of its transpose ====

Let \(A \in M_{m\times n}(\R)\). Then \[ \mathrm{ker}(A) \perp \mathrm{ran}(A^t), \] meaning that if \(x \in \mathrm{ker}(A)\) and \(y \in \mathrm{ran}(A^t)\), then \(x \cdot y = \sum_{j=1}^n x_j y_j = 0.\)

As shown above, the null space of \(A\) is perpendicular to the row space of \(A\). The row space of \(A\) equals the column space of \(A^t\) (this is the definition of the matrix transpose). The proposition follows. \\

===== The rank–nullity theorem and its consequences =====

==== › The rank–nullity theorem ====

Let \(T \in L(\R^n,\R^m)\). Then \[ \mathrm{dim}\, \mathrm{ker}(T) + \mathrm{dim}\, \mathrm{ran}(T) = n. \]

**N.b.** The name comes from the fact that \(\mathrm{dim}\, \mathrm{ker}(T)\) is the **nullity of** \(T\). Thus, the sum of the rank and the nullity of \(T\) equals the dimension of its ground space (domain).

Pick a basis \(e = \{e_1, \ldots, e_k\}\) for \(\mathrm{ker}(T)\). If \(k = n\), then \( \mathrm{ker}(T) = \R^n\) and we are done, since then \[ \mathrm{ran}(T) = \{ Tx \colon x \in \R^n\} = \{0\}, \] so that \(\mathrm{dim}\, \mathrm{ker}(T) + \mathrm{dim}\,\mathrm{ran}(T) = n\).\\
\\
Hence, assume that \(k < n\) and extend \(e\) to a basis \(\{e_1, \ldots, e_k, f_1, \ldots, f_r\}\) for \(\R^n\).\\
This can be done in the following way: pick \(f_1 \not\in \mathrm{span}\{e_1, \ldots, e_k\}\). Then \(\{e_1, \ldots, e_k, f_1\}\) is linearly independent. If \(\mathrm{span} \{e_1, \ldots, e_k, f_1\} = \R^n\) we stop. Else, pick \(f_2\not\in \mathrm{span}\{e_1, \ldots, e_k,f_1\}\). Since any set of more than \(n\) vectors in \(\R^n\) is linearly dependent, this process stops when \(k + r = n\) (it cannot stop before, since then \(\R^n\) would be spanned by fewer than \(n\) vectors, which is impossible; see the [[users/ehrnstro/teaching/linearmethods/bases#the_dimension_of_any_finite-dimensional_vector_space_is_unique| definition]] of vector space dimension).\\
We now prove that \(Tf = \{Tf_1, \ldots, Tf_r\}\) is a basis for \(\mathrm{ran}(T)\).\\
\(Tf\) is linearly independent: \[ \sum_{j=1}^r a_j Tf_j = 0 \quad\Longleftrightarrow\quad T\big( \sum_{j=1}^r a_j f_j \big) = 0 \quad\Longleftrightarrow\quad \sum_{j=1}^r a_j f_j \in \mathrm{ker}(T) \quad\Longleftrightarrow\quad a_j = 0 \:\forall j = 1, \ldots, r, \] since \(T\) is linear, and since, by the construction of \(f\), no non-zero linear combination of the elements \(f_j\) is in \(\mathrm{ker}(T)\).\\
Furthermore, \(Tf\) spans \(\mathrm{ran}(T)\): \[ \begin{align*} \mathrm{ran}(T) = \{ Tx \colon x \in \R^n\} &= \Big\{ T \big(\sum_{j=1}^k a_j e_j + \sum_{j=1}^r b_j f_j \big) \colon a_j, b_j \in \R \Big\}\\ &= \Big\{ T \big(\sum_{j=1}^k a_j e_j \big) + T\big(\sum_{j=1}^r b_j f_j\big) \colon a_j, b_j \in \R \Big\} = \Big\{ \sum_{j=1}^r b_j T f_j \colon b_j \in \R\Big\}, \end{align*} \] since \(T\) is linear and \(e \subset \mathrm{ker}(T)\). Hence, \(\{Tf_1, \ldots, Tf_r\}\) is a basis for \(\mathrm{ran}(T)\), and \[ \mathrm{dim}\, \mathrm{ker}(T) + \mathrm{dim}\, \mathrm{ran}(T) = k + r = n. \]
\\
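For concrete matrices the rank–nullity theorem is easy to confirm numerically; the sketch below (Python with ''numpy''; the matrix is made-up example data) computes the rank and the nullity and checks that they add up to the dimension \(n\) of the domain.

<code python>
import numpy as np

# 3x4 example matrix whose last two columns are combinations of the first two,
# so rank(A) = 2 and dim ker(A) = 4 - 2 = 2
A = np.array([[1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 1.0, -1.0],
              [1.0, 1.0, 2.0, 1.0]])
n = A.shape[1]

rank = np.linalg.matrix_rank(A)           # dim ran(A)
_, s, _ = np.linalg.svd(A)
nullity = n - np.sum(s > 1e-10)           # dim ker(A): directions annihilated by A

print(rank, nullity)                      # 2 2
print(rank + nullity == n)                # True
</code>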
==== › For finite-dimensional linear transformations, injective means surjective ====

Let \(T \in L(\R^n)\) be a linear transformation \(\R^n \to \R^n\). Then the following are equivalent:
  * (i) \(T\) is injective \(\quad\) (\(\mathrm{ker}(T) = \{\mathbf{0}\}\))
  * (ii) \(T\) is surjective \(\quad\) (\(\mathrm{ran}(T) = \R^n\))
  * (iii) \(T\colon \R^n \to \R^n \) is invertible
  * (iv) The matrix representation \(A\) of \(T\) (in any given basis) is invertible.
  * (v) For any \(b \in \R^n\) the system \(Ax = b\) has a unique solution \(x\).

** (i) ** \(\Longleftrightarrow\) **(ii):** Since both the domain and the target space are \(\R^n\), the rank–nullity theorem says that \( \mathrm{ran}(T) = \R^n\) (so that \(T\) is surjective) exactly when \(\mathrm{ker}(T) = \{ \mathbf 0 \}\) (so that \(T\) is injective).\\
\\
** (i,ii) ** \(\Longleftrightarrow\) **(iii):** A function is invertible exactly if it is bijective, that is, both injective and surjective.\\
** (iii) ** \(\Longleftrightarrow\) **(iv):** Given any basis for \(\R^n\), \(T\) has a unique matrix representation \(A\) (defined by its action on the basis vectors). If the inverse matrix \(A^{-1}\) exists, then there exists a corresponding linear transformation \(S\) such that \(ST = TS = \mathrm{id}\) (since \(A^{-1}A = A A^{-1} = I\), and the identity map \(\mathrm{id}\colon \R^n \to \R^n\) has the identity matrix \(I\) as representation in all bases). Thus \(S = T^{-1}\) is the inverse of \(T\). If, on the other hand, \(T^{-1}\) exists, it must by the same argument have a matrix representation \(B\) such that \(AB = BA = I\). Hence, \(A^{-1} = B\) exists.

** (iv) ** \(\Longleftrightarrow\) **(v):** If \(A\) is invertible it is immediate that \(x = A^{-1}b\) is the unique solution. If, on the other hand, \(Ax = b\) has a unique solution \(x\) for any \(b\), we construct a matrix \(B\) by taking as its columns \(x_j\) such that \(Ax_j = e_j\), where \(\{e_1, \ldots, e_n\}\) is the standard basis. This guarantees that \(AB = I\), and since a right inverse of a square matrix is automatically a two-sided inverse, \(B = A^{-1}\) is the inverse matrix of \(A\). (A less constructive argument would be to note that \(Ax = b\) is uniquely solvable for all \(b \in \R^n\) exactly if \(T\) is invertible.) \\
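For a concrete square matrix all of these properties can be observed at once; the sketch below (Python with ''numpy''; the matrix and right-hand side are arbitrary example data) checks full rank, invertibility and unique solvability for one and the same \(A\).

<code python>
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])        # det(A) = 1, so A is invertible
b = np.array([3.0, 5.0])

print(np.linalg.matrix_rank(A) == A.shape[0])   # True: full rank, i.e. surjective and trivial kernel
x = np.linalg.solve(A, b)                        # the unique solution of Ax = b
print(np.allclose(A @ x, b))                     # True
print(np.allclose(np.linalg.inv(A) @ b, x))      # True: x = A^{-1} b
</code>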
==== Geometric interpretation of the rank–nullity theorem ====

Define the **direct sum** \(X \oplus Y\) of two vector spaces (both real, or both complex) as the space of pairs \((x,y)\) with the naturally induced vector addition and scalar multiplication: \[ X \oplus Y \stackrel{\text{def.}}{=} \{(x,y) \in X \times Y\}, \] where \[ (x_1,y_1) + (x_2,y_2) \stackrel{\text{def.}}{=} (x_1+x_2, y_1 + y_2) \quad\text{ and }\quad \lambda(x,y) \stackrel{\text{def.}}{=} (\lambda x, \lambda y). \] If \(X,Y \subset V\) are subspaces of a vector space \(V\), then \[ X \oplus Y = V \quad\Longleftrightarrow\quad X \cap Y = \{0\} \quad\text{ and }\quad X + Y \stackrel{\text{def.}}{=} \{ x + y \colon x \in X, y \in Y\} = V, \] where the equality \(X \oplus Y = V\) should be interpreted in terms of isomorphisms (\(V\) can be represented as \(X \oplus Y\)). Note that \[ \dim(X \oplus Y) = \dim(X) + \dim(Y). \] With these definitions, the rank–nullity theorem can be expressed as a geometric description of the underlying space (\(\R^n\)) in terms of the matrix \(A\).

==== › Rank–nullity theorem: geometric version ====

Let \(A \in M_{m\times n}(\R)\). Then \[ \R^n = \mathrm{ker}(A) \oplus \mathrm{ran}(A^t). \]

** N.b. ** A consequence of this is that \(\mathrm{rank}(A) = \mathrm{rank}(A^t)\); another is that \(\R^m = \mathrm{ker}(A^t) \oplus \mathrm{ran}(A)\).

We have already shown that \(\mathrm{ker}(A) \perp \mathrm{ran}(A^t)\) in \(\R^n\), so that \[\mathrm{ker}(A) \cap \mathrm{ran}(A^t) = \{0\};\] this is a consequence of the fact that \(|x|^2 = x \cdot x = 0\) for any \( x \in \mathrm{ker}(A) \cap \mathrm{ran}(A^t)\).\\
\\
It remains to show that \(\mathrm{ker}(A) \oplus \mathrm{ran}(A^t)\) makes up all of \(\R^n\). Since \(\mathrm{ker}(A) \perp \mathrm{ran}(A^t)\) in \(\R^n\), we have \[\mathrm{rank}(A^t) \leq n - \dim(\mathrm{ker}(A)) = \mathrm{rank}(A),\] where the last equality is the rank–nullity theorem. The same argument applied to \(A^t\) in place of \(A\) gives \[ \mathrm{rank}(A) = \mathrm{rank}((A^t)^t) \leq \mathrm{rank}(A^t), \] and therefore \[\mathrm{rank}(A) = \mathrm{rank}(A^t).\] Hence \[ \dim\big(\mathrm{ker}(A) \oplus \mathrm{ran}(A^t)\big) = \dim(\mathrm{ker}(A)) + \mathrm{rank}(A^t) = \dim(\mathrm{ker}(A)) + \mathrm{rank}(A) = n, \] so that \[\mathrm{ker}(A) \oplus \mathrm{ran}(A^t) = \R^n.\] \\

==== › Summary on linear equations (the Fredholm alternative) ====

Let \(A \in M_{m\times n}(\R)\) be the realisation of a linear transformation \(\R^n \to \R^m\), and consider the linear equation \[Ax = b.\]
  * Either \(b \in \mathrm{ran}(A)\) and the equation is solvable, or \(b\) has a non-zero component in \(\mathrm{ker}(A^t)\) and there is no solution.
  * In case \(\ker(A) = \{0\}\) any solution is unique, else the solutions can be described as \[ x_p + \ker(A), \] where \(x_p\) is any (particular) solution of \(Ax = b\).

The first statement is a reformulation of the geometric version of the rank–nullity theorem (in the form \(\R^m = \mathrm{ker}(A^t) \oplus \mathrm{ran}(A)\)); the second follows since \[ \begin{cases} Ax = b\\ Ay = b \end{cases} \quad \Longleftrightarrow \quad \begin{cases} Ax = b\\ A(x-y) = 0 \end{cases} \quad \Longleftrightarrow \quad \begin{cases} Ax = b\\ y = x + z,\quad z \in \mathrm{ker}(A). \end{cases} \] If \(\mathrm{ker}(A) = \{0\}\) there is at most one solution, \(x\), else the solution space is an affine space((An affine space is a 'translation' of a vector space (or a vector space which has lost its origin); affine spaces are not themselves vector spaces, since they do not have any zero element.)) of the same dimension as \(\mathrm{ker}(A)\).
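As a final illustration, the sketch below (Python with ''numpy''; the singular matrix and right-hand sides are made up for the example) plays out the alternative for a rank-one system: solvability is decided by the component of \(b\) in \(\ker(A^t)\), and when a particular solution exists, so does the whole affine family \(x_p + \ker(A)\).

<code python>
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 6.0]])          # rank 1: ker(A) = span{(2,-1)}, ker(A^t) = span{(3,-1)}
w = np.array([3.0, -1.0])           # spans ker(A^t), since A^t w = 0

b_good = np.array([1.0, 3.0])       # lies in ran(A) = span{(1,3)}, so w . b_good = 0
b_bad = np.array([1.0, 0.0])        # has a non-zero component in ker(A^t)
print(np.isclose(w @ b_good, 0), np.isclose(w @ b_bad, 0))    # True False

# A particular solution of A x = b_good (least squares finds one, since b_good is in ran(A))
x_p = np.linalg.lstsq(A, b_good, rcond=None)[0]
print(np.allclose(A @ x_p, b_good))                # True: solvable

# The whole solution set is x_p + ker(A)
z = np.array([2.0, -1.0])                          # spans ker(A)
print(np.allclose(A @ (x_p + 5.0 * z), b_good))    # True

# For b_bad the system has no solution: even the least-squares residual is non-zero
x_ls = np.linalg.lstsq(A, b_bad, rcond=None)[0]
print(np.allclose(A @ x_ls, b_bad))                # False
</code>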
{{page>linearmethods:footer&noheader&noindent}}