Week 5 - What are eigenvalues and eigenvectors?

What are eigen-things?

What are eigenvalues and eigenvectors?

  • "Eigen" is translated from German as "characteristic."
  • "Eigenproblem" is about finding characteristic properties of something.
  • Geometric interpretation of Eigenvectors and Eigenvalues (00:45-04:22)

    • Though we typically visualize linear transformations based on how they affect a single vector, we can also consider how they affect every vector in the space by drawing a square.

      Visualising all vectors in the space as a square

      We can then see how a transformation distorts the shape of the square.

    • If we apply a scaling of 2 in the vertical direction, $\begin{bmatrix}1 & 0 \\ 0 & 2\end{bmatrix}$, the square becomes a rectangle.

      Vertical scaling

    • A horizontal shear, on the other hand, looks like this: $\begin{bmatrix}1 & s \\ 0 & 1\end{bmatrix}$

      Horizontal shear transform

    • When we perform these operations:

      • Some vectors point in the same direction but change length.
      • Some vectors point in a new direction.
      • Some vectors do not change.

      Highlighted vectors after scaling

    • The vectors that still point in the same direction are referred to as Eigenvectors.

    • The vectors that point in the same direction and whose length does not change are said to have an Eigenvalue of 1.
      • In the above example, the vertical Eigenvector doubles in length, so it has an Eigenvalue of 2.
    • In a pure shear operation, only the horizontal vector is unchanged, so the transformation has only one Eigenvector.
    • In a rotation, there are no Eigenvectors (a small numerical sketch of these cases follows this list).
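
A minimal sketch of these three cases in NumPy (not from the lecture; the shear amount and rotation angle are illustrative choices):

```python
import numpy as np

# Three transformations discussed above: vertical scaling, horizontal shear, rotation.
scaling = np.array([[1.0, 0.0],
                    [0.0, 2.0]])                         # scales the vertical direction by 2
shear = np.array([[1.0, 0.5],
                  [0.0, 1.0]])                           # horizontal shear with s = 0.5
theta = np.pi / 2
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])   # 90 degree rotation

for name, A in [("scaling", scaling), ("shear", shear), ("rotation", rotation)]:
    values, vectors = np.linalg.eig(A)    # columns of `vectors` are the Eigenvectors
    print(name, "eigenvalues:", np.round(values, 3))
    print(name, "eigenvectors (columns):\n", np.round(vectors, 3))
# The scaling has Eigenvalues 1 and 2, the shear has only the horizontal Eigenvector,
# and the rotation has no real Eigenvalues at all.
```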

Getting into the detail of eigenproblems

Special eigen-cases

  • Recap (00:00-00:18):
    • Eigenvectors lie along the same span before and after a linear transform is applied to a space.
    • Eigenvalues are the amount we stretch each of those vectors in the process.
  • 3 special Eigen-cases (00:18-02:15)

    • Uniform scaling
      • Scale by the same amount in each direction.
      • All vectors are Eigenvectors.
    • 180° rotation

      • In a regular rotation, there are no Eigenvectors. However, in a 180° rotation, all vectors become Eigenvectors pointing in the opposite direction.
        • Since they are pointing in the opposite direction, we say they have an Eigenvalue of -1.

          180 degree rotation Eigenvectors

    • Combination of horizontal shear and vertical scaling

      • Has 2 Eigenvectors: the horizontal vector and a 2nd Eigenvector between the orange and pink vectors.

        Eigenvectors after horizontal shear and vertical scaling

  • We can calculate Eigenvalues in much higher dimensions.

    • In the 3D example, finding the Eigenvector also tells you the axis of rotation (see the sketch after this list).

      3d Eigenvector example showing the axis of rotation
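
A minimal NumPy sketch of the 3D case (not from the lecture; the rotation axis and angle are illustrative choices). The Eigenvector whose Eigenvalue is 1 recovers the axis of rotation:

```python
import numpy as np

# A 3D rotation of 30 degrees about the z-axis (illustrative choice).
theta = np.radians(30)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

values, vectors = np.linalg.eig(R)

# The Eigenvector whose Eigenvalue is (approximately) 1 is the axis of rotation.
axis = vectors[:, np.isclose(values, 1.0)].real
print("eigenvalues:", np.round(values, 3))             # one value of 1, two complex values
print("axis of rotation:", np.round(axis.ravel(), 3))  # -> [0. 0. 1.], the z-axis
```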

Calculating eigenvectors

  • Calculating Eigenvectors in the general case (00:24-04:36)
    • Given a transformation $A$, Eigenvectors stay on the same span after the transformation.
    • We can write the expression as $Ax = \lambda x$, where $\lambda$ is a scalar value and $x$ is the Eigenvector.
    • We are trying to find values of $x$ that make the two sides equal.
      • Having $A$ applied to them only scales their length (or does nothing, which is the same as scaling by 1).
    • $A$ is an $n$-dimensional transform.
    • To find the solutions of the expression, we can rewrite it:
      • $(A - \lambda I)x = 0$
        • $I$ is an $n \times n$ Identity Matrix that allows us to subtract a scalar from a matrix, which would otherwise not be defined.
      • For the left-hand side to be 0, either:
        • Contents of the bracket are 0.
        • x is 0
      • We are not interested in the 2nd case, as a zero vector has no length or direction. We call it a "trivial solution."
      • We can test if a matrix operation results in a 0 output by calculating its Determinant: $\det(A - \lambda I) = 0$
      • We can apply it to an arbitrary 2x2 matrix $A = \begin{bmatrix}a & b \\ c & d \end{bmatrix}$ as follows: $\det\left(\begin{bmatrix}a & b \\ c & d \end{bmatrix} - \begin{bmatrix}\lambda & 0 \\ 0 & \lambda \end{bmatrix}\right) = 0$
      • Evaluating that gives us the Characteristic Polynomial: $\lambda^{2} - (a+d)\lambda + ad - bc = 0$
      • Our Eigenvalues are the solutions to this equation. We can then plug the solutions back into the original expression.
  • Applying to a simple vertical scaling transformation (04:36-07:53):
    • Given the vertical scaling matrix: $A = \begin{bmatrix}1 & 0 \\ 0 & 2\end{bmatrix}$
    • We calculate the determinant of $A - \lambda I$: $\det\left( \begin{bmatrix}1 - \lambda & 0 \\ 0 & 2 - \lambda \end{bmatrix} \right) = (1-\lambda)(2-\lambda) = 0$
    • This means our equation has solutions at $\lambda = 1$ and $\lambda = 2$.
    • We can sub these two values back in:
      • 1st case, at $\lambda = 1$: $\begin{bmatrix}1-1 & 0 \\ 0 & 2-1\end{bmatrix} \begin{bmatrix}x_1 \\ x_2\end{bmatrix} = \begin{bmatrix}0 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix}x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix}0 \\ x_2\end{bmatrix} = 0$
      • 2nd case, at $\lambda = 2$: $\begin{bmatrix}1-2 & 0 \\ 0 & 2-2\end{bmatrix} \begin{bmatrix}x_1 \\ x_2\end{bmatrix} = \begin{bmatrix}-1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix}x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix}-x_1 \\ 0\end{bmatrix} = 0$
    • We know that any vector pointing along the horizontal or vertical axis can be an Eigenvector of this system.
      • At $\lambda = 1$: $x=\begin{bmatrix}t \\ 0\end{bmatrix}$
        • When λ=1\lambda=1, the Eigenvector can point anywhere along the horizontal axis.
      • At $\lambda = 2$: $x=\begin{bmatrix}0 \\ t\end{bmatrix}$
        • When λ=2\lambda=2, the Eigenvector can point anywhere along the vertical axis.
  • Applying to a 90° rotation transformation (07:54-)
    • Transformation: $A=\begin{bmatrix}0 & -1 \\ 1 & 0\end{bmatrix}$
    • Applying the formula gives us $\det\left( \begin{bmatrix}0 - \lambda & -1 \\ 1 & 0-\lambda \end{bmatrix} \right) = \lambda^{2} + 1 = 0$
    • This has no real solutions, which means there are no real Eigenvectors.
      • Note that we could still calculate complex Eigenvectors using imaginary numbers.
  • In practice, you never have to calculate Eigenvectors by hand; libraries will do it for you (see the sketch below).
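
A minimal NumPy sketch (not from the lecture) that solves the 2x2 Characteristic Polynomial directly and checks the two worked examples; the helper name `eigenvalues_2x2` is just illustrative:

```python
import numpy as np

def eigenvalues_2x2(A):
    """Roots of the characteristic polynomial lambda^2 - (a+d)*lambda + (ad - bc) = 0."""
    (a, b), (c, d) = A
    # np.roots takes polynomial coefficients from highest to lowest power.
    return np.roots([1.0, -(a + d), a * d - b * c])

scaling = np.array([[1.0, 0.0], [0.0, 2.0]])    # vertical scaling example
rotation = np.array([[0.0, -1.0], [1.0, 0.0]])  # 90 degree rotation example

print(eigenvalues_2x2(scaling))    # [2. 1.]          : real Eigenvalues 1 and 2
print(eigenvalues_2x2(rotation))   # [0.+1.j 0.-1.j]  : no real solutions

# np.linalg.eig returns the Eigenvalues and the Eigenvectors (as columns) directly.
values, vectors = np.linalg.eig(scaling)
print(values)    # [1. 2.]
print(vectors)   # columns [1, 0] and [0, 1]: the horizontal and vertical axes
```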

When changing to the eigenbasis is useful

Changing to the eigenbasis

  • Multiple matrix multiplications

    • Combines the ideas of finding Eigenvectors and changing basis.
    • Motivation:

      • Sometimes, we have to apply the same matrix multiplication many times.
      • For example, if you have a matrix that represents the change to a single particle after a specific timestep, $T = \begin{bmatrix}0.9 & 0.8 \\ -1 & 0.35\end{bmatrix}$, and you apply it to a vector $v_0 = \begin{bmatrix}0.5 \\ 1 \end{bmatrix}$, you end up with the result $v_1 = Tv_0$.
      • You can apply it again to that result: $v_2 = Tv_1$

        Multiple matrix multiplications

      • If you wanted to apply it millions of times, the operation could be expensive.

        • We can instead square $T$ to get the same result, $v_2 = T^2 v_0$, or raise it to the power of any $n$: $v_n = T^n v_0$
        • If $T$ is a Diagonal Matrix, i.e. all terms except those along the leading diagonal are 0, you can simply raise each diagonal value to the power $n$: $T^{n} = \begin{bmatrix}a^n & 0 & 0 \\ 0 & b^n & 0 \\ 0 & 0 & c^n\end{bmatrix}$
        • If the matrix isn't diagonal, you can construct a Diagonal Matrix using Eigenanalysis.
      • Constructing a Diagonal Matrix
        • Plug in the Eigenvectors as columns: $C = \begin{bmatrix}x_1 & x_2 & x_3 \\ \vdots & \vdots & \vdots \end{bmatrix}$
        • Create a diagonal matrix of the corresponding Eigenvalues: $D = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix}$
        • We then want to convert back to the original basis, which we can do using the inverse of $C$.
        • In summary: $T = CDC^{-1}$
        • Then $T^2$ would be: $T^{2} = CDC^{-1}CDC^{-1}$
        • Since $C^{-1}C$ returns the identity, we can rewrite this as $T^{2} = CD^{2}C^{-1}$, and in general $T^{n} = CD^{n}C^{-1}$.
      • The whole process looks like this (a numerical sketch follows the figure):

    Multiple matrix multiplications with diagonalization
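
A minimal NumPy sketch of the $T^n = CD^nC^{-1}$ shortcut, using the timestep matrix from above; `n = 5` is an illustrative choice (for this particular $T$ the Eigenvalues happen to be complex, but the imaginary parts cancel):

```python
import numpy as np

T = np.array([[0.9, 0.8],
              [-1.0, 0.35]])   # the timestep matrix from the motivation above
v0 = np.array([0.5, 1.0])
n = 5                          # number of timesteps to apply

# Eigen-decomposition: the columns of C are the Eigenvectors, D is diagonal.
eigenvalues, C = np.linalg.eig(T)
Dn = np.diag(eigenvalues ** n)       # powering D only powers its diagonal entries
Tn = C @ Dn @ np.linalg.inv(C)       # T^n = C D^n C^{-1}
Tn = np.real_if_close(Tn)            # drop the (cancelling) imaginary parts

# Check against simply multiplying by T five times.
assert np.allclose(Tn, np.linalg.matrix_power(T, n))
print(Tn @ v0)                       # v_n = T^n v_0
```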

Eigenbasis example

  • Given a transformation matrix: $T = \begin{bmatrix}1 & 1 \\ 0 & 2\end{bmatrix}$
  • When we multiply it with the vector $\begin{bmatrix}1 \\ 0\end{bmatrix}$, the $\hat{i}$ vector is unchanged, so it is an Eigenvector with Eigenvalue 1.
  • Multiplying with $\begin{bmatrix}1 \\ 1\end{bmatrix}$ gives us $\begin{bmatrix}2 \\ 2\end{bmatrix}$, so it is an Eigenvector with Eigenvalue 2.
  • So, the Eigenvectors are:
    • At $\lambda = 1$: $n = \begin{bmatrix}1 \\ 0\end{bmatrix}$
    • At $\lambda = 2$: $n = \begin{bmatrix}1 \\ 1\end{bmatrix}$
  • Transformation using squaring:
    • What happens to the vector $\begin{bmatrix}-1 \\ 1\end{bmatrix}$ when you multiply it by $T$ numerous times? $\begin{bmatrix}1 & 1 \\ 0 & 2\end{bmatrix}\begin{bmatrix}-1 \\ 1\end{bmatrix} = \begin{bmatrix}-1 + 1 \\ 0 + 2\end{bmatrix} = \begin{bmatrix}0 \\ 2\end{bmatrix}$, then $\begin{bmatrix}1 & 1 \\ 0 & 2\end{bmatrix}\begin{bmatrix}0 \\ 2\end{bmatrix} = \begin{bmatrix}0 + 2 \\ 0 + 4\end{bmatrix} = \begin{bmatrix}2 \\ 4\end{bmatrix}$
    • If we instead started with $T^2$: $T^2 = \begin{bmatrix}1 & 1 \\ 0 & 2\end{bmatrix} \begin{bmatrix}1 & 1 \\ 0 & 2\end{bmatrix} = \begin{bmatrix}1 & 3 \\ 0 & 4\end{bmatrix}$, we can get straight to the answer: $\begin{bmatrix}1 & 3 \\ 0 & 4\end{bmatrix}\begin{bmatrix}-1 \\ 1\end{bmatrix} = \begin{bmatrix}2 \\ 4\end{bmatrix}$
  • Transformation using Eigenbasis approach:
    • We have our conversion matrix from our Eigenvectors: $C = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}$
    • We know the inverse is: $C^{-1} = \begin{bmatrix}1 & -1 \\ 0 & 1\end{bmatrix}$
    • We can take the Eigenvalues to construct the diagonal matrix: $D = \begin{bmatrix}1 & 0 \\ 0 & 2\end{bmatrix}$
    • The problem is constructed as: $T^{2} = CD^2C^{-1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix} \begin{bmatrix}1 & 0 \\ 0 & 4\end{bmatrix} \begin{bmatrix}1 & -1 \\ 0 & 1\end{bmatrix} = \begin{bmatrix}1 & 3 \\ 0 & 4\end{bmatrix}$
    • If we apply that to our original vector, we get the same result: $\begin{bmatrix}1 & 3 \\ 0 & 4\end{bmatrix}\begin{bmatrix}-1 \\ 1\end{bmatrix} = \begin{bmatrix}2 \\ 4\end{bmatrix}$ (checked numerically in the sketch after this list).
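
A quick numerical check of this worked example, assuming NumPy:

```python
import numpy as np

T = np.array([[1, 1],
              [0, 2]])
C = np.array([[1, 1],          # columns are the Eigenvectors [1, 0] and [1, 1]
              [0, 1]])
D = np.diag([1, 2])            # the corresponding Eigenvalues

T2 = C @ np.linalg.matrix_power(D, 2) @ np.linalg.inv(C)
print(T2)                          # [[1. 3.] [0. 4.]], matching T squared directly
print(T2 @ np.array([-1, 1]))      # [2. 4.], the same result as repeated multiplication
```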

Making the PageRank algorithm

Introduction to PageRank

  • PageRank (00:00-07:20)
    • Ranks websites by importance based on the importance of pages that link to them.
      • Central assumption: "the importance of a website is related to its links to and from other websites."
    • The model represents a mini internet:

      Mini internet graph

      • Page $A$ would have the vector $\begin{bmatrix}0 & 1 & 1 & 1\end{bmatrix}$, where a 1 represents a page that $A$ links to.
      • We then normalise the vector so the total probability sums to 1: $\begin{bmatrix}0 & 1/3 & 1/3 & 1/3 \end{bmatrix}$
      • Page $B$ would have the vector: $\begin{bmatrix}1/2 & 0 & 0 & 1/2\end{bmatrix}$
      • Page $C$: $\begin{bmatrix}0 & 0 & 0 & 1\end{bmatrix}$
      • Page $D$: $\begin{bmatrix}0 & 1/2 & 1/2 & 0\end{bmatrix}$
        • We can then convert each vector into a column of a matrix $L$: $\begin{bmatrix}0 & 1/2 & 0 & 0 \\ 1/3 & 0 & 0 & 1/2 \\ 1/3 & 0 & 0 & 1/2 \\ 1/3 & 1/2 & 1 & 0 \end{bmatrix}$
      • This matrix represents the probability of getting to each page. I.e., the only way to get to $A$ is from $B$.
      • You then need to know the probability of getting to $B$. You would need to be at either $A$ or $D$.
      • The problem is "self-referential": the ranks of all the pages depend on the ranks of others.
      • The columns describe the outgoing links of each page, and the rows describe the inward links, normalised by the number of links on the origin page.
        • An expression can be written using a vector $r$ to store the rank of all web pages.
        • We can then get the rank for webpage $A$ as follows: $r_A = \sum\limits_{j=1}^{n} L_{Aj}\, r_j$
      • That means that the rank of $A$ is the sum of the ranks of all pages that link to it, weighted by the specific link probabilities taken from $L$.
        • We can rewrite that as a simple matrix multiplication for all web pages: $r = Lr$
        • Since we start out not knowing $r$, we assume all ranks are equal and normalise by the total number of web pages: $r=\begin{bmatrix}1/4 \\ 1/4 \\ 1/4 \\ 1/4\end{bmatrix}$
        • Then we keep updating iteratively until $r$ stops changing: $r^{i+1} = L r^{i}$
        • Applying this repeatedly means we are solving the problem iteratively until $r$ converges.
        • This means that $r$ is an Eigenvector of the matrix $L$, with an Eigenvalue of 1.
      • We might assume that we could apply the Diagonalisation method, but we would first need to know all the Eigenvectors, which is what we're trying to find.
        • Though there are many approaches for efficiently calculating Eigenvectors, repeatedly multiplying a randomly selected initial guess vector by the matrix, called the Power Method, is still very effective (see the sketch at the end of these notes).
      • Although the Power Method only gives you one Eigenvector of the n-by-n webpage system, the vector you get will be the one you're looking for, with an Eigenvalue of 1.
      • The link matrix for the whole internet will be a very Sparse Matrix; algorithms exist that allow us to perform such matrix multiplications efficiently.
      • The damping factor $d$ (07:20-08:24)
        • Adds an additional term to the formula: $r^{i+1} = d\, L r^{i} + \frac{1-d}{n}$
        • The added term reflects the probability that a random web surfer will type a URL instead of clicking a link.
        • It finds a compromise between speed of convergence and stability.
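
A minimal sketch of the damped Power Method on the mini-internet link matrix above, assuming NumPy; the damping factor `d = 0.85` and the iteration cap are illustrative choices:

```python
import numpy as np

# Link matrix of the 4-page mini internet (each column lists a page's outgoing links).
L = np.array([[0,   1/2, 0, 0  ],
              [1/3, 0,   0, 1/2],
              [1/3, 0,   0, 1/2],
              [1/3, 1/2, 1, 0  ]])

d = 0.85                     # damping factor (illustrative choice)
n = L.shape[0]
r = np.full(n, 1 / n)        # start with all ranks equal: [1/4, 1/4, 1/4, 1/4]

# Power Method: keep applying the damped update until r stops changing.
for _ in range(100):
    r_next = d * (L @ r) + (1 - d) / n
    if np.allclose(r_next, r, atol=1e-10):
        break
    r = r_next

print(np.round(r, 4))        # approximate PageRank vector for this mini internet
```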