Linear Algebra
November 19, 2023
Linear algebra #
This is my take on reading a book on linear algebra. (Linear algebra done right, upto ch 7)
What is it about #
Two basic ingredients #
uniqueness (sameness)
- vector: two vectors are different if one can’t be stretched out to match another
- set of vectors: two set of vectors are different if any vector in one set can’t be represented as linear combination of vectors in another set
spanness
It is the opposite of uniqueness in a sense, only when you define what you can cover (span), you can talk about what you can’t cover (uniqueness)
Compared to a number #
- A number’s identity .. spans only itself.
- A number is a single entity, where a vector is a set
Mathematical definition #
We say one thing is unique (independent) when it can’t be represented as linear combinations of another.
Linear combination
: \( a_1 v_1 + a_2 v_2 + \dots + a_n v_n \)
Vector space
a set of vectors which behaves nicely according to the rule we define
- it has 0
- it has inverse (like 3 has -3 in numbers)
- any linear combination of the members (vectors) is also the member of this set
- This defines the scope of vector space, not the other way around
- any member is well defined mathematical object
- it is equal to itself, and the equality is well defined (i.e. you can’t have conflicting statement on equality statement)
Basis
a set of vectors where each vector is unique (independent) and linear combination of them spans the whole vector space
Different type of vectors #
As we have multiple type of numbers such as integer and rational etc, we have different type of vectors.
Linear map #
Suppose we have a map \(T\) from a vector space \( V \) to another vector space \( W \).
\( V \) is comprised of transit routes you can take: e.g. you take subway from home to office
\( W \) is comprised of locations of places (such as address of office)
\( T \) is a linear map which can map a transit route to a location of place
Now imagine, in one day, a public transportation routes are changed, so you have to take different bus or subway to go to office. But rest assured, you can go to office because our new routes cover the same locations as we did.(spanness of the routes becomes the sameness of the transporation system)
Now we established there can be \( S \) which is also a linear map which maps a transit route to a location of place
Since we have a set of entities, we ask if we can apply the Linear combination
on these new entities.
\[ T = a_1 T_1 + a_2 T_2 \]
I.e. we ask if the equality is well defined after this linear combination
.
Then, we say all the possible combinations of this forms the new space. (i.e. we define a new vector space)
Again, if expressed mathematically, we have to define Linear combination
of this new member, to do that we need to define two operations.
\( S+T \) and \(aT\)
So we choose to define it by
\( (S+T)(v) = Sv + Tv \) and \( (aT)(v) = a(Tv) \)
(? Since this definition define the ’equality’ of the members of this new space, we are done.)
What have we accompished?
- We have built a new vector space with new object (transit system \(T\) and \(S\) ) where we can do linear operation on them. Each transit system has its own transit routes
Dual Map #
Can we do the similar thing on linear map as well? I.e. can we build a new vector space where each linear map is a vector in this?
Linear functional
is a special kind of linear map where it takes a vector to a number space \( F \)So in our transit example, given routes, a cost function (linear functional) computes the cost for the route.
Dual basis
We know that any linear cost function can be represented by set of cost functions where each function outputs 1 for the respective basis route and 0 for all the other basis routes.
I.e. if I know the special routes that are the building blocks which can span your whole routes, I can assign 1 or 0 for each route, and with these new (basis) functions, I can build any linear cost function for any route.
We call these basis functions, dual basis of the dual space. I.e dual space of \(V\) is the vector space of all linear functionals on \(V\), and the dual space is denoted as \( V^’ \)
Dual map
As \( V \) has its own dual space, \( W \) has its own dual space.
Suppose each location has a cost associated with it as well, and the cost is linear to the locations.
Now, we can think of a cost function: given a route it would compute the cost associated the destination (location).
Suppose there are multiple cost functions associated with a location. (e.g. a cost function maps to rental price, another cost maps to # of rooms)
Dual map of T acts on these cost functions of \( W \), where each cost function is a linear map from \( W \) to \( F \).
And this dual map is a linear map on these cost functions. i.e. linear combination of cost functions for a location is same whether we calculate the cost separately and add them or we add the cost function and calculate the cost using the new cost function.
Transpose \( A^t \)
Every linear map can be represented by a matrix once the basis for \(V\) and \(W\) are chosen.
It happens that a matrix \( A \) represents a linear map \( T \), then \( A^t \) represents the dual map of \( T \)
So, given a matrix \(A\) which maps a route to a location.
One can calculate costs of a location from the perspective of someone in \(V\) (who is using \(A\) to get to locations in \(W\)) using \( A^t \)
So, if there is another matrix \( B \) which maps \(V\) to \(W\) (maybe he is located in different origin, or he’s using only helicopter as transportation), he would calculate the cost of location which he reaches using \(B \) by the matrix \( B^t \).
Why so abstract? #
As defining numbers abstractically let us to apply number oriented thinking to just about anything, defining a special set (vector space) give us the similar capability.
For example, if we consider vector space of all polynomials, we can apply linear algebra and prove followings.
Division algorithm for polynomials #
\[ p = sq + r \]
There exists unique \( q, r\) given \( p, s\).
A polynomial has at most as many solutions as its degrees #
Fundamental theorem of algebra #
Every nonconstant polynomial with complex coefficients have a solution.
Summary #
We get the above results by recognizing our problem space is a vector space
What are the underlying models #
The base entity #
\( 0 \) and \( 1 \) #
Number system has 0 and 1 as base entity. And numbers \(F\) are used for the coefficent in the model (linear algebra) as well.
Although numbers could be continuous, base entities for linear algebra (at least for finite dimension) are discrete.
I.e. as 1 quantize the number line, all the base entity is quantizing entity.
Linearly independent vector #
A vector space has linearly indepdent vector as base entity.
Where a linearly independent vector ( \( v\) ) can’t be represented using other (linearly independent to \(v\) ) vectors.
A linearly independent vector quantize (separate by discrete parts) a vector space.
Basis (set of vectors) #
A basis is quantizing the vector spaces into sub vector spaces.
Linear map #
Linear map is also a vector space as noted, but it is special in that:
a linear map is a function from a vector space to another vector space, i.e. it connects two vector spaces.
What are the assumptions #
Underlying axioms #
Following definitions can be considered as foundational definitions.
I.e. Followings seems to be the basis
of the linear algebra. (in finite dimensional)
- Linear combination
- Vector space
- Linear map
- Injective and surjective
- Inner product
- Invariant subspace
- set property (see below)
Similar idea on multiple domains #
E.g. partition is expressed as linear independence in one domain, and as direct sum in another domain. And they are often times related.
- Sometimes they are exactly related as in the partition case.
- Sometimes they are subset, superset relation. E.g. normal are necessary condition for independent
- Sometimes they seem to be exact, but in analogy..
Other definitions such as the followings can be constructed from the above #
Spanness #
All linear combinations you can have using the given vectors
Linear independence #
If your span doesn’t include me, i’m independent from you
Basis #
It’s the set which spans your vector space. Or you could think backward, if you have vector space, you can find basis.
Subspace #
A space of the subset of the basis of the bigger space.
Direct sum #
When a set of basis are independent from other all other basis, the subspaces are also the partition of the space.
Dimension #
The number of partitions. Each finite vector space has a dimension.
Matrix #
A linear map from \( V \) to \( W\) is set of linear combination relations from a basis of \( V \) to a basis of \( W\).
If we fix a basis for the two spaces, the linear map can be represented as a matrix. Where i th column of A \( A_i \), represents linear combination of \( W^s \) for \( v_i \), or equivalently \( A M(v) \) represents \( \text{M}(w) \)
Transpose \(A^t\) is a dual map of \( A \), when \(A\) maps vectors, \( A^t\) maps linear functionals from the target space to the source space.
Conjugate Transpose \( A^* = \overline{A^t} \) is a matrix of the adjoint.
Null space, Range, Quotient space #
k- You have two spaces with partition (by basis).
- You have a linear map from one space to another space
The linear map partitions source basis vector into two partitions.
null partition, if map is injective, this partition is empty
range partition, if map is surjective, the images of this partition’s vectors also makes complete partition over the destination space
the number of partition of source space is the same as the number of partition in null part and range part. For two spaces with same dimension, they are just renaming of each other. (isomorphic)
Quotient space \( V/U \): you can partition a space into set of parallel subspaces to a subspace \( U \)
Eigenvector, eigenvalue, operator #
Operator is a linear map from \( V \) to \( V \)
span(eigenvector v) is a 1-dimensional subspace of V invariant under operator T
Norm, orthogonal, projection #
An inner product \( \langle u, v \rangle \) gives us a number in \( F \).
We can define length (norm), and angle (angle) (how similar the two vectors are in direction), and decompose it to two orthorgonal components.
Linear functional #
A linear functional \( \phi(v) \in L(V, F) \) can be viewed as a inner product with unique \(u\) in \( V\). I.e. \( \phi(v) = \langle v, u \rangle = \overline{\langle u, v \rangle} \)
Dual basis, dual map #
For a space of dimension N, a linear functional is a linear map of dimension N. Given a basis, there is a dual basis which can represent any linear functional.
For given two basis of \( V \) and \(W\), matrix \( A \) represents the linear map. \( A^t \) represents the linear map of dual spaces from \(W^’’ \) to \( V^’\).
Where \( A_1\) column selects which component of \(W\) is present, \(A^{t}_1\) column selects all the components which has a \( W_1 \) component.
Adjoint #
For a linear map \( T \), a linear functional of \( \langle Tv, w \rangle \), has the same linear functional of the form \( \langle v, T^* w \rangle \), where \( T^* \) is a linear map.
The similarity after mapping \(v\) onto \(W\) by \( T\) is the same as the similarity mapping \( w\) onto \(V\) by \( T^* \). Where \( M(T^*) = \overline{A^t}\)
QUESTION What is the relation inner product’s conjugate symmetry, dual map’s transpose and adjoint’s conjugate transpose?
Dual map represents their linear combination (of linear functional) from my perspective.
Adjoint represents their linear combination from their perspective to match the conjugate symmetry between us.
QUESTION So why are we imposing conjugate symmetry for inner product? What does it mean?
Maybe we wanna have a notion of direction of the inner product operation.
Similarity from your perspective is conjugate symmetric to similarity from my perspective.
Notice that for real \( T \), there’s no conjugacity, i.e. there’s no direction of inner product.
self adjoint \( T = T^* \)
is much like \( z = \bar{z} \), which indicates z is a real number.
T is self adjoint if and only if \( \langle Tv, v \rangle \in R \).
QUESTION would it make sense to interpret \( \bar{z}, T^* \) the following way?
It looks like, you can interpret \( z \) and \( \bar{z} \), describing a thing from a symmetric perspective.
Then, one can look at \( T \) similarly, where \( T \) describes mapping from my perspective, and \( T^* \) from symmetric perspective.
Indeed, \( \langle Tv, w \rangle = \langle v, T^* w \rangle = \overline{\langle T^* w, v \rangle} \), for \(T \), \(T^*\) seems to be the other side’s mapping representation.
Then what would it mean that we have \( R \) where perspective doesn’t matter, and \( C \) where perspective does matter?
Normal #
We say an operator (a linear map whose target space is the same as source space) is normal when \( \| Tv \| = \| T^* v\| \) or \( T T^* = T^* T\).
QUESTION so, why call this normal for the linear map dimension?
These linear map preserves the angle (similarity) measured by inner product: because their length components remain equal.
By cauche, we have \( \| \langle u, v \rangle \| \leq \|u\| \|v\| \), if we define \( sim(u,v) = \frac{\| \langle u, v \rangle \|}{ \|u\| \|v\|} \) So when our vector norms are same, our similarity measure can only differ by sign.
We are transforming our space by \( T \) ( \(V^’\) ), then our inner product is also transformed by \( T^* \) in \( V \)’s perspective.
So, in a sense, with \( T^* T \), they are calculating inner product from their perspective, and we are calculating our inner product with \( T T^*\).
If they are same, our transformation of each vector has the same similarity against the original vector, from both perspective. This can happen only at right angles…
Positive operator #
Much like, we have square root of positive number, when \( T \) is positive, we have \( T = S^2 = R^* R\), i.e. we have square root of the operator \( T \)
I.e. \( \langle Tv, v \rangle = \langle R^* Rv, v \rangle = \langle Rv, Rv \rangle \geq 0 \)
Conversely, \( T T^* \) is a positive operator for any operator T.
\[ \langle T T^*x, x \rangle = \langle T^*x, T^*x \rangle \]
And eigen values are real and possitive. Theorem1: If A is (conjugate) symmetric positive definite, all its eigenvalues are positive
Isometry #
\( \| Sv \| = \| v\| \), a norm preserving operator is an isometry.
And \( S S^* = I \)
QUESTION Why are we restricting ourself to “linear”? #
- When you compose multiple things, a linear composition (combination) is the most basic one in a sense
- If we master this simple composition, we can study many more complex compositions using this simple one
- In Calculus, a function is also linear combination of other functions (in the limiting small space) Taylor series even says we can represent functions by a linear combination of polynomial functions
- Fourier transfrom says any periodic function can be represented by a linear combination of sine function (which can also be represented by polynomial functions)
- I guess after mathematicians master the linear composition, they might move on to another
Selected theorems #
Finite dimensional subspaces #
Every subspace of a finite-dimensional vector space is finite-dimensional.
Spectral theorem #
When T is normal (\(F = C\)) or self-adjoint (\(F=R\)), there is a basis in V where each of each basis vector is normal to each other and eigenvector of T.
I.e. For normal T, a vector space can be partitioned into perpendicular basis where T acts as a stretching. \[ M(T, V^s) = \lambda \]
Singular value decomposition #
Any operator can be viewed as linear map of stretching (diagnoal matrix) from an eigen basis to another eigen basis.