https://variationalform.github.io/
https://github.com/variationalform
Simon Shaw https://www.brunel.ac.uk/people/simon-shaw.
This work is licensed under CC BY-SA 4.0 (Attribution-ShareAlike 4.0 International). Visit http://creativecommons.org/licenses/by-sa/4.0/ to see the terms.
This document uses python and also makes use of LaTeX in Markdown.
You will be introduced to numpy to work with systems of equations. As usual our emphasis will be on doing rather than proving: just enough to make progress at pace.

For this worksheet you should read Chapter 11 of [VMLS] for background to the linear algebra of vectors, and also Appendix D of [DSML] if you want to read more about python and numpy.
Further accessible material can be found in Chapter 1 of [FCLA] and in Section 2.3 of [MML].
All of the above can be accessed legally and without cost.
There are also these useful references for coding:

python: https://docs.python.org/3/tutorial
numpy: https://numpy.org/doc/stable/user/quickstart.html
matplotlib: https://matplotlib.org

The following type of problem occurs frequently across the whole landscape of applied mathematics - of which data science and analytics is a part. We'll see several examples of it as we go along. The problem is of this nature: find the values of $x$, $y$ and $z$ that simultaneously satisfy these three linear equations:
\begin{alignat*}{6} && 3x &-2y +\ 4z && = & 2 \\ &&-6x &+6y -11z && = & -1 \\ && 6x &+2y +\ 5z && = & 7 \end{alignat*}

We can write this as: find $\boldsymbol{u} = (x, y, z)^T$ such that

$$ \boldsymbol{B}\boldsymbol{u}=\boldsymbol{f} \qquad\text{ for } \boldsymbol{B} = \left(\begin{array}{rrr} 3 & -2 & 4 \\ -6 & 6 & -11 \\ 6 & 2 & 5 \\ \end{array}\right) \text{ and } \boldsymbol{f} = \left(\begin{array}{r} 2 \\ -1 \\ 7 \end{array}\right). $$

If $\boldsymbol{B}$ is invertible then we can obtain $\boldsymbol{u}$, and hence $x$, $y$ and $z$, by solving the system:

$$ \left(\begin{array}{r} x \\ y \\ z \end{array}\right) = \boldsymbol{u} = \boldsymbol{B}^{-1}\boldsymbol{f} = \frac{1}{36}\left(\begin{array}{rrr} 52 & 18 & -2 \\ -36 & -9 & 9 \\ -48 & -18 & 6 \end{array}\right) \left(\begin{array}{r} 2 \\ -1 \\ 7 \end{array}\right) = \left(\begin{array}{r} 2 \\ 0 \\ -1 \end{array}\right). $$

What we see here is that if we have three unknowns, then we can determine them if we have three independent pieces of information. In this case that means the three equations. If those equations are not independent of each other then we won't be able to find all three unknowns.
Let's see this in numpy
import numpy as np
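# set up B and a known solution u, then compute f = B u so that we can recover u below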
B = np.array( [[3, -2, 4],[-6, 6, -11],[ 6, 2, 5 ]])
u = np.array([[2], [0], [-1]])
f = B.dot(u)
print('f = \n', f)
f = 
 [[ 2]
 [-1]
 [ 7]]
# here is how to solve a linear system like this...
u = np.linalg.solve(B, f)
print(u)
[[ 2.]
 [ 0.]
 [-1.]]
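We could also form the inverse explicitly with np.linalg.inv and multiply, although np.linalg.solve is generally preferred in practice. A quick check, reusing the B and f above:

# for comparison: solve by explicitly inverting B (solve() is usually preferred)
Binv = np.linalg.inv(B)
print('B^{-1} f = \n', Binv.dot(f))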
The point just made is quite subtle. Rather than explore the general theory let's look at some examples.
Example 1 The system, find $\boldsymbol{u}$ such that,
$$ \boldsymbol{B}\boldsymbol{u}=\boldsymbol{f} \qquad\text{ for } \boldsymbol{B} = \left(\begin{array}{rrr} 3 & -2 & 4 \\ -6 & 6 & -11 \\ 6 & -4 & 8 \\ \end{array}\right) \text{ and } \boldsymbol{f} = \left(\begin{array}{r} 2 \\ -1 \\ 4 \end{array}\right) $$

seems just like the one above, but there's an important difference. It's hard to spot, but if you look closely you can see that the third equation is exactly the same as the first multiplied by 2. Therefore these equations do not give three independent pieces of information - just two - and so there is not enough information to determine all three unknowns in $\boldsymbol{u}$.
However, we can assert that $x = (10-2z)/6$ and $y = (9+9z)/6$ will solve this system for any choice of $z$ (we'll see why in the lab).
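We can check this claim numerically. Here is a minimal sketch with a few arbitrary choices of $z$ (the names B1 and f1 are ours, for this example):

# Example 1: verify that x = (10-2z)/6, y = (9+9z)/6 solves the system for any z
B1 = np.array([[3, -2, 4], [-6, 6, -11], [6, -4, 8]])
f1 = np.array([[2], [-1], [4]])
for z in (0.0, 1.0, -2.0):
    u1 = np.array([[(10 - 2*z)/6], [(9 + 9*z)/6], [z]])
    print('z =', z, ' max residual =', np.abs(B1.dot(u1) - f1).max())

Each residual should be (numerically) zero.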
Example 2 On the other hand, consider this problem: find $\boldsymbol{u}$ such that,
$$ \boldsymbol{B}\boldsymbol{u}=\boldsymbol{f} \qquad\text{ for } \boldsymbol{B} = \left(\begin{array}{rrr} 3 & -2 & 4 \\ -6 & 6 & -11 \\ 9 & -8 & 15 \\ \end{array}\right) \text{ and } \boldsymbol{f} = \left(\begin{array}{r} 2 \\ -1 \\ 3 \end{array}\right). $$

Here you have to look even more closely: if you subtract the second equation from the first you'll get the third. So, again, there are three unknowns but only two (independent) equations.
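Again numpy can confirm the redundancy for us: np.linalg.matrix_rank reports the number of independent equations, and np.linalg.solve refuses the singular system. A small sketch (B2 and f2 are our names for this example):

# Example 2: the matrix has rank 2, so solve() raises a singular-matrix error
B2 = np.array([[3, -2, 4], [-6, 6, -11], [9, -8, 15]])
f2 = np.array([[2], [-1], [3]])
print('rank of B2 =', np.linalg.matrix_rank(B2))
try:
    np.linalg.solve(B2, f2)
except np.linalg.LinAlgError as err:
    print('solve failed:', err)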
Example 3 What do you make of this? Find $\boldsymbol{u}=(x, y)^T$ such that
$$ \boldsymbol{B}\boldsymbol{u}=\boldsymbol{f} \qquad\text{ for } \boldsymbol{B} = \left(\begin{array}{rr} 3 & -2 \\ -6 & 6 \\ 6 & 2 \\ \end{array}\right) \text{ and } \boldsymbol{f} = \left(\begin{array}{r} 6 \\ -18 \\ 7 \end{array}\right). $$

This is the same as

$$ \left(\begin{array}{rr} 3 & -2 \\ -6 & 6 \\ 6 & 2 \\ \end{array}\right) \left(\begin{array}{r} x \\ y \end{array}\right) = \left(\begin{array}{r} 6 \\ -18 \\ 7 \end{array}\right) $$

or: we want to find $x$ and $y$ such that

\begin{alignat*}{6} && 3x &-2y = & 6 \\ &&-6x &+6y = & -18 \\ && 6x &+2y = & 7 \\ \end{alignat*}

It turns out this is impossible: $x=0$ and $y=-3$ is the unique (why?) solution to the first two equations, yet substituting these values into the third gives $6x+2y=-6\ne 7$.
There is therefore no solution to this problem because now we have three independent pieces of information (the equations), but only two degrees of freedom to fit that information.
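Although no exact solution exists, numpy can still produce a best fit in the least-squares sense with np.linalg.lstsq. Treat this as a preview rather than something we need yet (B3 and f3 are our names for this example):

# Example 3: no exact solution, but lstsq returns the least-squares best fit
B3 = np.array([[3, -2], [-6, 6], [6, 2]])
f3 = np.array([[6], [-18], [7]])
u3, res, rank3, sv = np.linalg.lstsq(B3, f3, rcond=None)
print('least-squares u =\n', u3)
print('rank of B3 =', rank3)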
Examples 1 and 2 are examples of under-determined systems. Here there are fewer independent equations than unknowns, and when we remove the redundant equations the system matrices are short and wide (more columns than rows). We can expect to be able to solve such systems - although not uniquely.
Example 3 is an example of an over-determined system where there are more independent equations than unknowns. The system matrix is in this case tall and narrow (more rows than columns). These systems may not have solutions in the usual sense unless the extra equations are consistent with the rest of the system (although in that case it isn't really over-determined). The full story is complicated - here we're doing just enough so we can make progress at pace.
The rank (for our purposes) of a matrix corresponds to the number of independent equations it is capable of representing. In Examples 1 and 2 above the rank of the matrix, which is $2$, is less than the dimension of the matrix, $3$ in both cases.
When the rank of a square matrix is equal to its dimension, then that matrix has an inverse and we can solve the linear system in order to get a unique solution. This was the case in the original example above where we wrote $\boldsymbol{B}\boldsymbol{u}=\boldsymbol{f}$ and solved for $\boldsymbol{u}=\boldsymbol{B}^{-1}\boldsymbol{f}=(2, 0,-1)^T$.
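numpy can compute the rank for us with np.linalg.matrix_rank. Assuming B, B1 and B2 are still in scope from the snippets above:

# rank equals dimension for the invertible B, but not for Examples 1 and 2
print('rank of B  =', np.linalg.matrix_rank(B))   # 3: full rank, invertible
print('rank of B1 =', np.linalg.matrix_rank(B1))  # 2: rank deficient
print('rank of B2 =', np.linalg.matrix_rank(B2))  # 2: rank deficient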
Consider these two vectors,
$$ \boldsymbol{u} = \frac{1}{5}\left(\begin{array}{r} 3 \\ 4 \end{array}\right) \qquad\text{ and }\qquad \boldsymbol{v} = \frac{1}{10}\left(\begin{array}{r} 8 \\ -6 \end{array}\right). $$

It's easy to check that $\boldsymbol{u}^T\boldsymbol{v}=0$ and so they are orthogonal (that is, $\cos\theta=0$ where $\theta$ is the angle between the vectors). Moreover, the denominators ensure that they are normalized to unit length, so they are in fact orthonormal. Let's stack them together horizontally and create a matrix like this:

$$ \boldsymbol{Q} = \big( \boldsymbol{u}\ \boldsymbol{v}\big) = \left(\begin{array}{rr} 3/5 & 8/10\\ 4/5 & -6/10 \end{array}\right) = \left(\begin{array}{rr} 3/5 & 4/5\\ 4/5 & -3/5 \end{array}\right) = \frac{1}{5}\left(\begin{array}{rr} 3 & 4\\ 4 & -3 \end{array}\right). $$

Now notice that because of the orthonormality,

$$ \boldsymbol{Q}^T\boldsymbol{Q} = \frac{1}{25}\left(\begin{array}{rr} 3 & 4\\ 4 & -3 \end{array}\right) \left(\begin{array}{rr} 3 & 4\\ 4 & -3 \end{array}\right) = \frac{1}{25}\left(\begin{array}{rr} 25 & 0\\ 0 & 25 \end{array}\right) = \left(\begin{array}{rr} 1 & 0\\ 0 & 1 \end{array}\right) = \boldsymbol{I} $$

and so $\boldsymbol{Q}^T=\boldsymbol{Q}^{-1}$. Such a matrix is called an orthogonal matrix. In this example $\boldsymbol{Q}$ is symmetric, but it need not be to have this property.
Indeed, if $\boldsymbol{Q}^T=\boldsymbol{Q}^{-1}$, it will also follow that $\boldsymbol{Q}\boldsymbol{Q}^T=\boldsymbol{I}$, even if $\boldsymbol{Q}$ is not symmetric.
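This is easy to verify in numpy for the $\boldsymbol{Q}$ just defined:

# verify Q^T Q = I and Q Q^T = I for the 2x2 example above
Q = np.array([[3/5, 4/5], [4/5, -3/5]])
print(Q.T.dot(Q))                          # the 2x2 identity (up to rounding)
print(np.allclose(Q.dot(Q.T), np.eye(2)))  # True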
This idea generalises to $n\times n$ matrices. We just have to imagine a matrix made by stacking a set of mutually orthonormal vectors next to each other. If each vector has $n$ components, and there are $n$ such vectors, then the matrix will be square. If the matrix is denoted by $\boldsymbol{Q}$ then it will follow that
$$ \boldsymbol{Q}^T\boldsymbol{Q} = \boldsymbol{I} \qquad\text{ the } n\times n\text{ identity matrix, and }\qquad \boldsymbol{Q}\boldsymbol{Q}^T = \boldsymbol{I}. $$

We will see orthogonal matrices below when we study the eigenvectors of symmetric matrices.
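If you want a larger example, one convenient way (though not the only one) to manufacture an $n\times n$ orthogonal matrix is to take the Q factor from numpy's QR factorization of a random matrix. The QR factorization is beyond our scope here, so treat it as a black box:

# build a 4x4 orthogonal matrix from the QR factorization of a random matrix
n = 4
A = np.random.rand(n, n)   # a random matrix is (almost surely) invertible
Q4, R4 = np.linalg.qr(A)   # its Q factor has orthonormal columns
print(np.allclose(Q4.T.dot(Q4), np.eye(n)))  # True
print(np.allclose(Q4.dot(Q4.T), np.eye(n)))  # True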
We have made a lot of progress: using numpy in python we could set up and solve systems of linear equations. We will be building extensively on these concepts. Matrices and vectors form part of the branch of Mathematics called Linear Algebra.
Linear Algebra is of central importance in Data Science and Machine Learning.