https://variationalform.github.io/
https://github.com/variationalform
Simon Shaw https://www.brunel.ac.uk/people/simon-shaw.
This work is licensed under CC BY-SA 4.0 (Attribution-ShareAlike 4.0 International). Visit http://creativecommons.org/licenses/by-sa/4.0/ to see the terms.
This document uses python and also makes use of LaTeX in Markdown.
You will be introduced to ... numpy to compute these decompositions. As usual, our emphasis will be on doing rather than proving: just enough; progress at pace.
For this worksheet you should read Chapter 7 of [FCLA], and Chapter 4 of [MML].
All of the above can be accessed legally and without cost.
There are also these useful references for coding:

- python: https://docs.python.org/3/tutorial
- numpy: https://numpy.org/doc/stable/user/quickstart.html
- matplotlib: https://matplotlib.org

Consider this matrix and vector,
$$ \boldsymbol{B} = \left(\begin{array}{rrr} 3 & -2 & 4 \\ -6 & 6 & -11 \\ 6 & 2 & 5 \\ \end{array}\right) \qquad\text{ and }\qquad \boldsymbol{x} = \left(\begin{array}{r} -2 \\ 3 \\ 1 \\ \end{array}\right) \qquad\text{ then }\qquad \boldsymbol{B}\boldsymbol{x} = \left(\begin{array}{r} -8 \\ 19 \\ -1 \\ \end{array}\right) $$and notice that $\boldsymbol{B}$ mixes up $\boldsymbol{x}$ to such an extent that $\boldsymbol{B}\boldsymbol{x}$ bears no relationship to the original $\boldsymbol{x}$. This isn't so surprising when you think about it.
We can do this in python using numpy as follows...
import numpy as np
B = np.array([[3, -2, 4], [-6, 6, -11], [6, 2, 5]])
x = np.array([[-2], [3], [1]])
f = B.dot(x)            # the matrix-vector product B x
print('f = \n', f)
f = 
 [[-8]
 [19]
 [-1]]
On the other hand, consider this matrix with a different vector,
$$ \boldsymbol{B} = \left(\begin{array}{rrr} 3 & -2 & 4 \\ -6 & 6 & -11 \\ 6 & 2 & 5 \\ \end{array}\right) \qquad\text{ and }\qquad \boldsymbol{w} = \left(\begin{array}{r} -2 \\ 5 \\ 2 \\ \end{array}\right). $$This time,
$$ \boldsymbol{B}\boldsymbol{w} = \left(\begin{array}{rrr} 3 & -2 & 4 \\ -6 & 6 & -11 \\ 6 & 2 & 5 \\ \end{array}\right) \left(\begin{array}{r} -2 \\ 5 \\ 2 \\ \end{array}\right) = \left(\begin{array}{r} -8 \\ 20 \\ 8 \\ \end{array}\right) = 4\left(\begin{array}{r} -2 \\ 5 \\ 2 \\ \end{array}\right) = 4\boldsymbol{w} $$So $\boldsymbol{B}\boldsymbol{w}=4\boldsymbol{w}$, and all $\boldsymbol{B}$ does is magnify $\boldsymbol{w}$ to be $4$ times longer.
Here it is again,
$$ \boldsymbol{B}\boldsymbol{w} = \left(\begin{array}{rrr} 3 & -2 & 4 \\ -6 & 6 & -11 \\ 6 & 2 & 5 \\ \end{array}\right) \left(\begin{array}{r} -2 \\ 5 \\ 2 \\ \end{array}\right) = \left(\begin{array}{r} -8 \\ 20 \\ 8 \\ \end{array}\right) = 4\left(\begin{array}{r} -2 \\ 5 \\ 2 \\ \end{array}\right) = 4\boldsymbol{w}. $$
Use python to show that $\frac{1}{4}\boldsymbol{B}\boldsymbol{w}-\boldsymbol{w}=\boldsymbol{0}$.
B = np.array([[3, -2, 4], [-6, 6, -11], [6, 2, 5]])
w = np.array([[-2], [5], [2]])
f = 0.25*B.dot(w)       # (1/4) B w
print('f - w = \n', f-w)
# or many other variants, such as
print('result = \n', 0.25*B.dot(w)-w)
f - w = 
 [[0.]
 [0.]
 [0.]]
result = 
 [[0.]
 [0.]
 [0.]]
This seems more surprising. Here $4$ is called an eigenvalue ('own value') of $\boldsymbol{B}$ and $\boldsymbol{w}$ is the corresponding eigenvector.
In general, for any square matrix $\boldsymbol{B}$ the problem of finding scalars $\lambda$ and vectors $\boldsymbol{v}$ such that
$$ \boldsymbol{B}\boldsymbol{v}=\lambda\boldsymbol{v} $$is called an eigenvalue problem. The following facts are known to be true:
The Eigenvalue Theorem. Every square matrix of dimension $n$ has $n$ eigenvalue-eigenvector pairs, $(\lambda_1,\boldsymbol{v}_1), (\lambda_2,\boldsymbol{v}_2),\ldots, (\lambda_n,\boldsymbol{v}_n)$, with the eigenvalues counted according to multiplicity. The eigenvalues need not be distinct, and the eigenvector lengths are arbitrary. A matrix which has one or more zero eigenvalues is not invertible. On the other hand, if the eigenvalues of a matrix are all non-zero then that matrix is invertible, and it has full rank. The determinant of a matrix is the product of its eigenvalues.
NOTE: if $\boldsymbol{B}\boldsymbol{v}=\lambda\boldsymbol{v}$ then it can be shown that we need $\det(\boldsymbol{B}-\lambda\boldsymbol{I})=0$. This is not practically useful, but is of central importance for theory.
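As a quick numerical sanity check of these facts we can reuse B from above (a minimal sketch, anticipating the eigenvalues $9$, $1$ and $4$ quoted just below):

print(np.linalg.det(B))                          # 36, the product of the eigenvalues 9*1*4
for lam in (9, 1, 4):
    print(lam, np.linalg.det(B - lam*np.eye(3))) # det(B - lambda I) = 0, up to rounding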
For $\boldsymbol{B}$ above we have that
$$ \boldsymbol{B}\boldsymbol{v}=\lambda\boldsymbol{v} $$for $(\lambda_1,\boldsymbol{v}_1), (\lambda_2,\boldsymbol{v}_2), (\lambda_3,\boldsymbol{v}_3)$ given by
$$ 9 \text{ with } \left(\begin{array}{r} 17 \\ -45 \\ 3 \\ \end{array}\right), \qquad\qquad 1 \text{ with } \left(\begin{array}{r} -1 \\ 1 \\ 1 \end{array}\right) \qquad\text{ and }\qquad 4 \text{ with } \left(\begin{array}{r} -2 \\ 5 \\ 2 \end{array}\right). $$We have already seen the case $\lambda = 4$ and $\boldsymbol{v}=(-2,5,2)^T$ above.
The eigenvectors are not unique - they can be multiplied by an arbitrary (non-zero) scalar and they remain eigenvectors. For that reason it is usual to normalize an eigenvector by dividing through by its length, as given by the (Euclidean, Pythagorean) $2$-norm, $\Vert\cdot\Vert_2$. For example, for $\boldsymbol{v}=(-2,5,2)^T$ we have
$$ \Vert\boldsymbol{v}\Vert_2 = \sqrt{(-2)^2+5^2+2^2} = \sqrt{33} $$which means that
$$ \boldsymbol{v} = \frac{1}{\sqrt{33}}\left(\begin{array}{r} -2 \\ 5 \\ 2 \end{array}\right) = \left(\begin{array}{r} -0.348155311911396 \\ 0.870388279778489 \\ 0.348155311911395 \end{array}\right) $$is also an eigenvector for the eigenvalue $\lambda = 4$.
THINK ABOUT: An eigenpair satisfies $\boldsymbol{B}\boldsymbol{v}=\lambda\boldsymbol{v}$. Choose a non-zero real number $\alpha$ and write $\boldsymbol{w}=\alpha\boldsymbol{v}$. Is it true that $\boldsymbol{B}\boldsymbol{w}=\lambda\boldsymbol{w}$? Can you see why eigenvectors are not unique in length?
THINK ABOUT: If we choose $\alpha = \Vert\boldsymbol{v}\Vert_2^{-1}$ above what can you say about the value of $\Vert\boldsymbol{w}\Vert_2$?
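A minimal sketch answering both questions, reusing B from above with the eigenpair $\lambda=4$, $\boldsymbol{v}=(-2,5,2)^T$:

v = np.array([[-2], [5], [2]])     # eigenvector for lambda = 4
alpha = -3.7                       # any non-zero scalar
u = alpha*v
print(B.dot(u) - 4*u)              # zero: u = alpha*v is still an eigenvector for 4
vhat = v/np.linalg.norm(v)         # the choice alpha = 1/||v||_2
print(np.linalg.norm(vhat))        # 1.0: the normalized eigenvector has unit length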
If we normalize each of the eigenvectors above with their own length we get something like this (the decimals may go on for ever - why?):
$$ 9 \text{ with } \left(\begin{array}{r} 0.3527\ldots \\ -0.9336\ldots \\ 0.0622\ldots \\ \end{array}\right), \qquad\qquad 1 \text{ with } \left(\begin{array}{r} -0.5773\ldots \\ 0.5773\ldots \\ 0.5773\ldots \end{array}\right) \qquad\text{ and }\qquad 4 \text{ with } \left(\begin{array}{r} -0.3481\ldots \\ 0.8703\ldots \\ 0.3481\ldots \end{array}\right). $$
Let's use numpy to calculate the eigensystem for $\boldsymbol{B}$. It goes like this...
w, V = np.linalg.eig(B)
print(w)
print(V)
[9. 1. 4.]
[[ 0.35271531 -0.57735027 -0.34815531]
 [-0.93365819  0.57735027  0.87038828]
 [ 0.06224388  0.57735027  0.34815531]]
We can see that two quantities are returned, `w` and `V`. The eigenvalues are collected in `w` and the corresponding eigenvectors are the columns of `V`. Note that these columns of `V` agree with our calculations above for the normalized eigenvectors.
As indicated by this computation, we can stack the length-normalized eigenvectors next to each other as the columns of a matrix, with the eigenvalues on the leading diagonal of an otherwise zero matrix. We also make sure that the eigenvectors appear in the same order as the corresponding eigenvalues, and then use the fact that $\boldsymbol{B}\boldsymbol{v}=\lambda \boldsymbol{v}$ for each eigen-pair. The entire eigen-system can then be represented in one equation. For example,
$$ \begin{split} \left(\begin{array}{rrr} 3 & -2 & 4 \\ -6 & 6 & -11 \\ 6 & 2 & 5 \\ \end{array}\right) \left(\begin{array}{rrr} 0.3527\ldots & -0.5773\ldots & -0.3481\ldots \\ -0.9336\ldots & 0.5773\ldots & 0.8703\ldots \\ 0.0622\ldots & 0.5773\ldots & 0.3481\ldots \\ \end{array}\right) \\\ \\ \qquad {}= \left(\begin{array}{rrr} 0.3527\ldots & -0.5773\ldots & -0.3481\ldots \\ -0.9336\ldots & 0.5773\ldots & 0.8703\ldots \\ 0.0622\ldots & 0.5773\ldots & 0.3481\ldots \\ \end{array}\right) \left(\begin{array}{rrr} 9 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 4 \\ \end{array}\right) \end{split} $$We can write this as
$$ \boldsymbol{B}\boldsymbol{V} = \boldsymbol{V}\boldsymbol{D} $$where the columns of $\boldsymbol{V}$ are the eigenvectors of $\boldsymbol{B}$, and the diagonal matrix $\boldsymbol{D}$ has the eigenvalues on the leading diagonal. The left-to-right order of the eigenvalues in $\boldsymbol{D}$ matches the order that the eigenvectors appear in $\boldsymbol{V}$.
All three of these matrices are square and they each have the same dimension as $\boldsymbol{B}$.
Now, suppose that $\det(\boldsymbol{V})\ne 0$, then $\boldsymbol{V}^{-1}$ exists and we can (pre-)multiply both sides of $\boldsymbol{B}\boldsymbol{V} = \boldsymbol{V}\boldsymbol{D}$ by $\boldsymbol{V}^{-1}$ and get,
$$ \boldsymbol{V}^{-1}\boldsymbol{B}\boldsymbol{V} = \boldsymbol{V}^{-1}\boldsymbol{V}\boldsymbol{D} = \boldsymbol{D}. $$We see that this has produced a diagonal matrix $\boldsymbol{V}^{-1}\boldsymbol{B}\boldsymbol{V}$ that is similar to $\boldsymbol{B}$ in the sense that it has the same eigenvalues (why are they the same? What can you say about the eigenvalues of diagonal matrices?). Such an operation is called a similarity transformation.
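As a quick check of the similarity transformation (a sketch reusing `B` and the `w`, `V` returned by np.linalg.eig above):

print(np.linalg.inv(V).dot(B).dot(V))   # approximately diag(9, 1, 4), up to rounding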
On the other hand, we can (post-)multiply both sides of $\boldsymbol{B}\boldsymbol{V} = \boldsymbol{V}\boldsymbol{D}$ by $\boldsymbol{V}^{-1}$ and get,
$$ \boldsymbol{B} = \boldsymbol{B}\boldsymbol{V}\boldsymbol{V}^{-1} = \boldsymbol{V}\boldsymbol{D}\boldsymbol{V}^{-1}. $$One reason why this is useful is that we can now easily raise $\boldsymbol{B}$ to powers. For example,
$$ \boldsymbol{B}^2 = (\boldsymbol{V}\boldsymbol{D}\boldsymbol{V}^{-1}) (\boldsymbol{V}\boldsymbol{D}\boldsymbol{V}^{-1}) =\boldsymbol{V}\boldsymbol{D}^2\boldsymbol{V}^{-1}. $$and so on.
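Here is a minimal check of that power formula, again reusing `B`, `w` and `V` from above, with $\boldsymbol{D}$ built by np.diag:

D = np.diag(w)                        # eigenvalues on the leading diagonal
print(B.dot(B) - V.dot(D).dot(D).dot(np.linalg.inv(V)))   # zero, up to rounding error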
However, to do this we needed to assume that $\det(\boldsymbol{V})\ne 0$, and this need not be the case. Matrices for which this is true are called diagonalizable, and matrices for which it isn't true are called defective.
The eigenvalues and eigenvectors of a general square matrix could be complex numbers (which we aren't going to be too concerned with), and for large matrices the inverse $\boldsymbol{V}^{-1}$ could be hard to find explicitly.
However, in the special case of a symmetric matrix $\boldsymbol{A}$, the eigensystem $\boldsymbol{A}\boldsymbol{v}=\lambda\boldsymbol{v}$ is made up exclusively of real numbers.
Furthermore, the matrix of normalized eigenvectors, $\boldsymbol{V}$, is an orthogonal matrix. This means that $\boldsymbol{V}^{-1}=\boldsymbol{V}^T$ - which is very easy to calculate once $\boldsymbol{V}$ is known.
This is called the Spectral Theorem - see [MML, Theorem 4.15]
Spectral Theorem (for matrices) If $\boldsymbol{A}$ is real and symmetric (hence square) then its eigenvalues are all real and its eigenvector matrix $\boldsymbol{V}$ can be taken as orthogonal so that $\boldsymbol{V}^{-1}=\boldsymbol{V}^T$.
Let's see an example of this.
$$ \text{if } \boldsymbol{A} = \left(\begin{array}{rrr} 3 & -2 & 4 \\ -2 & 6 & 2 \\ 4 & 2 & 5 \end{array}\right) $$then (with some rounding),
$$ \boldsymbol{D} \approx \left( \begin{array}{rrr} -1.217 & 0 & 0 \\ 0 & 8.217 & 0 \\ 0 & 0 & 7 \end{array} \right) \quad\text{ and }\quad \boldsymbol{V} \approx \left( \begin{array}{rrr} .726 & .522 & -.447 \\ .363 & .261 & .894 \\ -.584 & .812 & 0 \\ \end{array} \right) $$We'll take a look at how to do that in the lab.
We will verify that $\boldsymbol{V}^T\boldsymbol{V}=\boldsymbol{I}$ up to rounding error.
We'll also confirm that $\boldsymbol{A}\boldsymbol{V}=\boldsymbol{V}\boldsymbol{D}$.
Next, by post-multiplying by $\boldsymbol{V}^{-1}$ we note that we can also write $\boldsymbol{A}\boldsymbol{V}=\boldsymbol{V}\boldsymbol{D}$ as $\boldsymbol{A}=\boldsymbol{V}\boldsymbol{D}\boldsymbol{V}^T$ which, in expanded form, is,
$$ \boldsymbol{A} = (\boldsymbol{v}_1\ \boldsymbol{v}_2\ \boldsymbol{v}_3\ \ldots) \left(\begin{array}{rrrr} \lambda_1 \\ & \lambda_2 \\ & & \lambda_3 \\ & & & \ \ \ddots \end{array}\right) \left(\begin{array}{rrrr} \boldsymbol{v}_1^T\\ \boldsymbol{v}_2^T\\ \boldsymbol{v}_3^T\\ \vdots \end{array}\right) $$(we haven't shown all the zero elements). Then simplifying this we get
$$ \boldsymbol{A} = (\boldsymbol{v}_1\ \boldsymbol{v}_2\ \boldsymbol{v}_3\ \ldots) \left(\begin{array}{cccc} \lambda_1\boldsymbol{v}_1^T \\ \lambda_2\boldsymbol{v}_2^T \\ \lambda_3\boldsymbol{v}_3^T \\ \vdots \end{array}\right) = \lambda_1\boldsymbol{v}_1\boldsymbol{v}_1^T + \lambda_2\boldsymbol{v}_2\boldsymbol{v}_2^T + \lambda_3\boldsymbol{v}_3\boldsymbol{v}_3^T + \cdots = \sum_{k=1}^n \lambda_k\boldsymbol{v}_k\boldsymbol{v}_k^T. $$Let's see this in python for a specific example. We start by getting the eigen-system for $\boldsymbol{A}$ as above.
A = np.array([[3,-2,4],[-2,6,2],[4,2,5]])
w, V = np.linalg.eig(A)
D=np.diag(w)
print('D = \n', D)
D = 
 [[-1.21699057  0.          0.        ]
 [ 0.          8.21699057  0.        ]
 [ 0.          0.          7.        ]]
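Before breaking `A` into its pieces, here is a minimal check of the two facts promised above, reusing the `A`, `V` and `D` we just computed:

print(V.T.dot(V))            # the identity matrix, up to rounding error
print(A.dot(V) - V.dot(D))   # the zero matrix, so A V = V D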
Let's look at each decomposed term in turn. First, for the $k=1$ term,
print( D[0,0]*V[:,0:1]*V[:,0:1].T )
[[-0.64159712 -0.32079856  0.51600297]
 [-0.32079856 -0.16039928  0.25800148]
 [ 0.51600297  0.25800148 -0.41499417]]
Let's think about what is going on here. We are printing out the quantity `D[0,0]*V[:,0:1]*V[:,0:1].T`. In this, the `D[0,0]` factor is just the eigenvalue from the top left (first row, first column) of the $\boldsymbol{D}$ matrix. Then, next, `V[:,0:1]` is a numpy slice. In this type of expression, `[c,a:b]` means take the elements in row `c` that occupy columns `a` through to `b-1`. The expression `[:,a:b]` means take all of the rows. Remember that column and row numbering starts at zero in numpy.

So, `V[:,0:1]` says take the first column of `V` - a column vector - and `V[:,0:1].T` says take the transpose of the first column of `V`.

It is important to note that we have to write our slicing expressions in the form `V[:,0:1]` rather than `V[:,0]`, otherwise we lose the shape. See e.g. https://stackoverflow.com/questions/29635501/row-vs-column-vector-return-during-numpy-array-slicing
Here is a demo:
print('V = \n', V)
print('V[:,0:1] = \n', V[:,0:1])
print('V[:,0] = \n', V[:,0])
V = 
 [[ 7.26085219e-01  5.22302838e-01 -4.47213595e-01]
 [ 3.63042610e-01  2.61151419e-01  8.94427191e-01]
 [-5.83952325e-01  8.11787954e-01  2.85088132e-16]]
V[:,0:1] = 
 [[ 0.72608522]
 [ 0.36304261]
 [-0.58395233]]
V[:,0] = 
 [ 0.72608522  0.36304261 -0.58395233]
Therefore, for $k=1$ we can write $\lambda_k\boldsymbol{v}_k\boldsymbol{v}_k^T$ in code as `D[0,0]*V[:,0:1]*V[:,0:1].T`.
The full reconstruction of $\boldsymbol{A}$ is then...
print('\n The reconstruction of A ...')
print(D[0,0]*V[:,0:1]*V[:,0:1].T + D[1,1]*V[:,1:2]*V[:,1:2].T + D[2,2]*V[:,2:3]*V[:,2:3].T)
 The reconstruction of A ...
[[ 3. -2.  4.]
 [-2.  6.  2.]
 [ 4.  2.  5.]]
This is quite a big deal: it means, in effect, that we can break a symmetric square matrix into pieces and consider as many or as few of those pieces as we wish.
This point of view really suggests an approximation scheme. If $\boldsymbol{A}$ is $n\times n$ and symmetric then,
$$ \boldsymbol{A} = \sum_{k=1}^n \lambda_k\boldsymbol{v}_k\boldsymbol{v}_k^T \qquad\text{ which suggests that }\qquad \boldsymbol{A} \approx \sum_{k=1}^m \lambda_k\boldsymbol{v}_k\boldsymbol{v}_k^T $$for $m < n$. We would want to sort the eigenvalues so that the most dominant ones come first in this sum, which is the same as saying
$$ \vert\lambda_1\vert \ge \vert\lambda_2\vert \ge \vert\lambda_3\vert \ge \cdots $$We'll look more closely at this in the workshop session, where we will see how much data storage it can save.
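To make this concrete, here is a hedged sketch that keeps only the `m` eigenpairs of largest magnitude, reusing the symmetric `A`, `w` and `V` computed above:

m = 2                                  # keep only the two dominant eigenpairs
idx = np.argsort(-np.abs(w))           # indices sorted by |lambda|, largest first
A_approx = np.zeros_like(A, dtype=float)
for k in idx[:m]:
    A_approx += w[k]*V[:,k:k+1].dot(V[:,k:k+1].T)
print(A_approx)                        # compare with A
print(np.abs(A - A_approx).max())      # the size of the error we accept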
Building on these examples we can infer that if a large matrix can be well approximated by just a few terms in the eigen-expansion then the amount of storage required in computer memory can be vastly reduced.
This is very useful. But it only applies to symmetric square matrices.
We can extend it to non-symmetric matrices by introducing complex numbers but we can't extend this to non-square matrices because they don't have eigenvalues.
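We can even see numpy agreeing with that last point: asking for the eigenvalues of a non-square matrix simply raises an error (a small sketch with a $2\times 3$ matrix):

K = np.array([[1, 2, 5], [5, -6, 1]])   # 2 rows, 3 columns: not square
try:
    np.linalg.eig(K)
except np.linalg.LinAlgError as err:
    print('no eigenvalues here:', err)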
Fortunately, there is another - even more powerful - tool at our disposal.
Only square matrices have eigenvalues, and not all square matrices are diagonalizable via the similarity transform. For general matrices (containing real numbers for us) there is a tool called the SVD - the Singular Value Decomposition.
Let $\boldsymbol{K}$ be an $m$-row by $n$-column matrix of real numbers.
Then $\boldsymbol{K} = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T$ - this is called the Singular Value Decomposition of $\boldsymbol{K}$. In this:

- The entries on the diagonal of $\boldsymbol{\Sigma}$ are called the singular values of $\boldsymbol{K}$, and the number of non-zero singular values gives the rank of $\boldsymbol{K}$.
- The columns of $\boldsymbol{U}$ (resp. $\boldsymbol{V}$) are called the left (resp. right) singular vectors of $\boldsymbol{K}$.
Let's see an example of $\boldsymbol{K} = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T$.
$$ \text{If } \boldsymbol{K}=\left(\begin{array}{rrr} 1 & 2 & 5 \\ 5 & -6 & 1 \\ \end{array}\right) \text{ then } \boldsymbol{U}=\left(\begin{array}{rr} -0.06213\ldots & 0.99806\ldots \\ 0.99806\ldots & 0.06213\ldots \\ \end{array}\right), $$$$ \boldsymbol{\Sigma}=\left(\begin{array}{rrr} 7.88191\ldots & 0 & 0 \\ 0 & 5.46584\ldots & 0 \\ \end{array}\right) \text{ and } \boldsymbol{V}=\left(\begin{array}{rrr} 0.62525\ldots & 0.23944\ldots &-0.74278\ldots \\ -0.77553\ldots & 0.29699\ldots &-0.55708\ldots \\ 0.08720\ldots & 0.92437\ldots & 0.37139\ldots \\ \end{array}\right) $$If we use these we can indeed check that
\begin{align*} & \scriptsize \left(\begin{array}{rr} -0.062\ldots & 0.998\ldots \\ 0.998\ldots & 0.062\ldots \\ \end{array}\right) \left(\begin{array}{rrr} 7.881\ldots & 0 & 0 \\ 0 & 5.465\ldots & 0 \\ \end{array}\right) \left(\begin{array}{rrr} 0.625\ldots & 0.239\ldots &-0.742\ldots \\ -0.775\ldots & 0.296\ldots &-0.557\ldots \\ 0.087\ldots & 0.924\ldots & 0.371\ldots \\ \end{array}\right)^T \\ &\qquad{} = \left(\begin{array}{rrr} 1 & 2 & 5 \\ 5 & -6 & 1 \\ \end{array}\right) \end{align*}as required. We aren't going to do it by hand though, that's what computers are for!
K = np.array([[1,2,5],[5,-6,1]])
U, S, VT = np.linalg.svd(K)
print(U)
print(S)
print(VT)
[[-0.06213744  0.9980676 ]
 [ 0.9980676   0.06213744]]
[7.88191065 5.4658471 ]
[[ 0.62525456 -0.77553283  0.08720987]
 [ 0.23944227  0.29699158  0.9243719 ]
 [-0.74278135 -0.55708601  0.37139068]]
Note two things:

- np.linalg.svd returns $\boldsymbol{V}^T$, not $\boldsymbol{V}$.
- `S` doesn't agree with $\boldsymbol{\Sigma}$: it is returned as a flat array of singular values rather than as a $2\times 3$ matrix.

We'll need to pad `S` - and then we can check the reconstruction $\boldsymbol{K} = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T$. The padding is a bit awkward - here it is...
S = np.hstack(( np.diag(S), np.zeros((2,1)) ))   # pad S out to the 2 x 3 matrix Sigma
print(S)
[[7.88191065 0.         0.        ]
 [0.         5.4658471  0.        ]]
Now we can check the reconstruction $\boldsymbol{K} = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T$. Note that we can also use `@` to perform matrix multiplication.
print(K - U @ S @ VT)
[[-4.44089210e-16 -4.44089210e-16 -1.77635684e-15]
 [-8.88178420e-16  1.77635684e-15  1.11022302e-16]]
This is zero (to machine precision) as expected.
There is a great deal that can be said about the SVD, but we're going to stay narrowly focussed and explore its value in data science and machine learning.
Let's start with the following observation. If we denote the $n$-th column of $\boldsymbol{U}$ by $\boldsymbol{u}_n$, and the $n$-th column of $\boldsymbol{V}$ by $\boldsymbol{v}_n$, then the statement $\boldsymbol{K} = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T$ becomes (we saw something very similar to this above with eigenvalues),
$$ \boldsymbol{K} = (\boldsymbol{u}_1\ \boldsymbol{u}_2\ \cdots\ \boldsymbol{u}_n) \left(\begin{array}{rrrr} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_n \end{array}\right) \left(\begin{array}{r} \boldsymbol{v}_1^T \\ \boldsymbol{v}_2^T \\ \vdots\ \\ \boldsymbol{v}_n^T \\ \end{array}\right) = (\boldsymbol{u}_1\ \boldsymbol{u}_2\ \cdots\ \boldsymbol{u}_n) \left(\begin{array}{r} \sigma_1\boldsymbol{v}_1^T \\ \sigma_2\boldsymbol{v}_2^T \\ \vdots\ \\ \sigma_n\boldsymbol{v}_n^T \\ \end{array}\right) $$(we haven't shown all the zero elements of $\boldsymbol{\Sigma}$). Simplifying this further then gives,
$$ \boldsymbol{K} = \sigma_1\boldsymbol{u}_1\boldsymbol{v}_1^T + \sigma_2\boldsymbol{u}_2\boldsymbol{v}_2^T + \cdots + \sigma_n\boldsymbol{u}_n\boldsymbol{v}_n^T $$THINK ABOUT: $\boldsymbol{u}\boldsymbol{v}^T$ is a rank 1 matrix - why?
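Here is a quick check of this rank-one expansion for the small $\boldsymbol{K}$ above. We recompute the SVD so that `s` is the flat array of singular values (recall that `S` was padded into a matrix earlier):

U, s, VT = np.linalg.svd(K)           # s is the 1-D array of singular values
K1 = s[0]*U[:,0:1] @ VT[0:1,:]        # sigma_1 u_1 v_1^T
K2 = s[1]*U[:,1:2] @ VT[1:2,:]        # sigma_2 u_2 v_2^T
print(K1 + K2 - K)                    # zero, up to rounding error
print(np.linalg.matrix_rank(K1))      # 1: each term u v^T has rank one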
Here are the full details. For $\boldsymbol{B}\in\mathbb{R}^{m\times n}$
$$ \boldsymbol{B} = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T =\sum_{j=1}^{p} \sigma_j \boldsymbol{u}_j\boldsymbol{v}_j^T $$where: $\boldsymbol{U}\in\mathbb{R}^{m\times m}$, $\boldsymbol{\Sigma}\in\mathbb{R}^{m\times n}$, $\boldsymbol{V}\in\mathbb{R}^{n\times n}$ and $p=\min\{m,n\}$.
Note that $\boldsymbol{\Sigma}=\text{diag}(\sigma_1,\ldots,\sigma_p)$ and we can always arrange this so that $\sigma_1\ge\sigma_2\ge\cdots\ge\sigma_p\ge 0$.
Since $\boldsymbol{B}$ is real here (we aren't interested in complex matrices), $\boldsymbol{U}$ and $\boldsymbol{V}$ are real and orthogonal.
If $\sigma_r\ne 0$ and $\sigma_j= 0$ for all $j>r$ then $r$ is the rank of $\boldsymbol{B}$.
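In numpy terms the rank can be read off by counting the non-zero singular values. A minimal sketch, using the $\boldsymbol{K}$ from earlier and treating 'non-zero' as 'above a small tolerance' in floating point:

s = np.linalg.svd(K, compute_uv=False)               # singular values only
print(np.sum(s > 1e-12), np.linalg.matrix_rank(K))   # both give 2 for this K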
Note storage for $\boldsymbol{B}$ is $mn$. That for the SVD is $r(m+n+1)$. The ratio is
$$ \frac{r(m+n+1)}{mn} = \frac{r}{n} + \frac{r}{m} + \frac{r}{mn} $$
We saw above that the zero columns were missing from $\boldsymbol{\Sigma}$ when we used numpy to calculate the SVD. This is called the Thin SVD.
Here we still have $\boldsymbol{B}\in\mathbb{R}^{m\times n}$ but with
$$ \boldsymbol{B} = \boldsymbol{U}_1\boldsymbol{\Sigma}_1\boldsymbol{V}^T =\sum_{j=1}^{n} \sigma_j \boldsymbol{u}_j\boldsymbol{v}_j^T $$where: $\boldsymbol{U}_1\in\mathbb{R}^{m\times n}$, $\boldsymbol{\Sigma}_1\in\mathbb{R}^{n\times n}$.
It is called the thin SVD because we drop the values that make no contribution (i.e. the zeros).
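In numpy the thin form can be requested directly with the `full_matrices=False` keyword. A minimal sketch, again with the small $\boldsymbol{K}$ from above (note that because this $\boldsymbol{K}$ is wide rather than tall, the reduction shows up in $\boldsymbol{V}^T$ rather than in $\boldsymbol{U}$):

U1, s, VT1 = np.linalg.svd(K, full_matrices=False)
print(U1.shape, s.shape, VT1.shape)        # (2, 2), (2,), (2, 3) for this K
print(K - U1 @ np.diag(s) @ VT1)           # zero, up to rounding error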
Don't worry too much about this. We'll let numpy do the hard work for us.
In the lab we are going to see how the SVD can be used to compress data.
We'll use image compression as an example.
Take a good quality jpeg colour photo (e.g. on your phone) of something vivid, detailed and colourful and save it on your account (One Drive, for example) so that your Jupyter notebook in Anaconda can use it.
We are going to use the SVD to compress the image, using numpy in python.