Linear equations and matrix inverses (BV Chapter 11)

In [1]:
using Pkg
Pkg.activate(pwd())
Pkg.instantiate()
  Activating environment at `~/Documents/github.com/ucla-biostat216-2021fall.github.io/slides/06-matinv/Project.toml`
In [2]:
using LinearAlgebra

Left and right inverses

  • A left inverse of $\mathbf{A} \in \mathbb{R}^{m \times n}$ is any matrix $\mathbf{X} \in \mathbb{R}^{n \times m}$ such that $$ \mathbf{X} \mathbf{A} = \mathbf{I}_n. $$ $\mathbf{A}$ is left invertible if it has at least one left inverse.
In [3]:
# a 3x2 matrix
A = [-3 -4; 4 6; 1 1]
Out[3]:
3×2 Matrix{Int64}:
 -3  -4
  4   6
  1   1
In [4]:
# X1 is a left inverse of A
X1 = [-11//9 -10//9 16//9; 7//9 8//9 -11//9]
Out[4]:
2×3 Matrix{Rational{Int64}}:
 -11//9  -10//9   16//9
   7//9    8//9  -11//9
In [5]:
Int.(X1 * A)
Out[5]:
2×2 Matrix{Int64}:
 1  0
 0  1
In [6]:
# X2 is another left inverse of A
X2 = [0 -1//2 3; 0 1//2 -2]
Out[6]:
2×3 Matrix{Rational{Int64}}:
 0//1  -1//2   3//1
 0//1   1//2  -2//1
In [7]:
Int.(X2 * A)
Out[7]:
2×2 Matrix{Int64}:
 1  0
 0  1
  • A right inverse of $\mathbf{A} \in \mathbb{R}^{m \times n}$ is any matrix $\mathbf{X} \in \mathbb{R}^{n \times m}$ such that $$ \mathbf{A} \mathbf{X} = \mathbf{I}_m. $$ $\mathbf{A}$ is right invertible if it has at least one right inverse.
In [8]:
# a 2x3 matrix
A = [1 0 1; 0 1 1]
Out[8]:
2×3 Matrix{Int64}:
 1  0  1
 0  1  1
In [9]:
# X1 is a right inverse of A
X1 = [1//2 -1//2; -1//2 1//2; 1//2 1//2]
Out[9]:
3×2 Matrix{Rational{Int64}}:
  1//2  -1//2
 -1//2   1//2
  1//2   1//2
In [10]:
Int.(A * X1)
Out[10]:
2×2 Matrix{Int64}:
 1  0
 0  1
In [11]:
# X2 is another right inverse of A
X2 = [1 0; 0 1; 0 0]
Out[11]:
3×2 Matrix{Int64}:
 1  0
 0  1
 0  0
In [12]:
Int.(A * X2)
Out[12]:
2×2 Matrix{Int64}:
 1  0
 0  1
In [13]:
# X3 is yet another right inverse of A
X3 = [1 -1; 0 0; 0 1]
Out[13]:
3×2 Matrix{Int64}:
 1  -1
 0   0
 0   1
In [14]:
Int.(A * X3)
Out[14]:
2×2 Matrix{Int64}:
 1  0
 0  1
  • $\mathbf{X}$ is a left inverse of $\mathbf{A}$ if and only if $\mathbf{X}'$ is a right inverse of $\mathbf{A}'$, since $$ \mathbf{A}' \mathbf{X}' = (\mathbf{X} \mathbf{A})' = \mathbf{I}. $$
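A quick numerical check of this equivalence, reusing the wide matrix $\mathbf{A}$ and its right inverse $\mathbf{X}_1$ from above (a minimal sketch, not one of the original notebook cells):

using LinearAlgebra

A  = [1 0 1; 0 1 1]                       # 2×3, as above
X1 = [1//2 -1//2; -1//2 1//2; 1//2 1//2]  # a right inverse of A
@show Int.(A * X1)      # identity, so X1 is a right inverse of A
@show Int.(X1' * A')    # identity, so X1' is a left inverse of A'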

When does a matrix have a left or right inverse?

  • For $\mathbf{A} \in \mathbb{R}^{m \times n}$,

    1. $\mathbf{A}$ is right invertible if and only if $\text{rank}(\mathbf{A}) = m$, or equivalently, $\mathbf{A}$ has full row rank, or equivalently $\mathbf{A}$ has independent rows. (Only a square or wide matrix can have right inverses.)
    2. $\mathbf{A}$ is left invertible if and only if $\text{rank}(\mathbf{A}) = n$, or equivalently, $\mathbf{A}$ has full column rank, or equivalently $\mathbf{A}$ has independent columns. (Only a square or tall matrix can have left inverses.)

      Proof of (1). For the if part, $\text{rank}(\mathbf{A}) = m$ and column rank equals row rank, so the columns of $\mathbf{A}$ span $\mathbb{R}^m$. Then for each column $\mathbf{e}_j$ of $\mathbf{I}_m$, there exists $\mathbf{c}_j$ such that $\mathbf{A} \mathbf{c}_j = \mathbf{e}_j$. Collecting the $\mathbf{c}_j$ as columns of $\mathbf{C}$, we obtain $\mathbf{A} \mathbf{C} = \mathbf{I}_m$.

      For the only if part, since $\mathbf{A} \mathbf{C} = \mathbf{I}_m$, we have $m \ge \text{rank}(\mathbf{A}) \ge \text{rank}(\mathbf{A} \mathbf{C}) = \text{rank}(\mathbf{I}_m) = m$. Thus $\text{rank}(\mathbf{A}) = m$.

      Proof of (2). Apply (1) to $\mathbf{A}'$.

  • If $\mathbf{A} \in \mathbb{R}^{m \times n}$ has a left inverse $\mathbf{X}$ and a right inverse $\mathbf{Y}$, then

    1. $\mathbf{A}$ must be a square, nonsingular matrix, and $\mathbf{X} = \mathbf{Y}$.
    2. $\mathbf{A}$ has a unique left inverse and a unique right inverse, which we call the inverse of $\mathbf{A}$ and denote it by $\mathbf{A}^{-1}$.

      Proof: From the previous theorem, we have $m = \text{rank}(\mathbf{A}) = n$. Thus $\mathbf{A}$ is square, and so are $\mathbf{X}$ and $\mathbf{Y}$. Then $$ \mathbf{X} = \mathbf{X} \mathbf{I}_n = \mathbf{X} (\mathbf{A} \mathbf{Y}) = (\mathbf{X} \mathbf{A}) \mathbf{Y} = \mathbf{Y}. $$ To show the uniqueness of the left inverse, suppose $\mathbf{Z}$ is another left inverse. Then $$ \mathbf{Z} \mathbf{A} = \mathbf{I}_n = \mathbf{X} \mathbf{A}. $$ Multiplying both sides on the right by the right inverse $\mathbf{Y}$, we obtain $\mathbf{Z} = \mathbf{X}$. Thus the left inverse is unique. Uniqueness of the right inverse is shown in a similar fashion.

  • Invertibility conditions. By the preceding results, for a square matrix $\mathbf{A}$, the following are equivalent:

    1. $\mathbf{A}$ is invertible.
    2. The columns of $\mathbf{A}$ are linearly independent.
    3. The rows of $\mathbf{A}$ are linearly independent.
    4. $\mathbf{A}$ has a left inverse.
    5. $\mathbf{A}$ has a right inverse.
  • Let $\mathbf{A} \in \mathbb{R}^{m \times n}$. Then

    1. $\mathbf{A}$ has full column rank if and only if $\mathbf{A}'\mathbf{A}$ is nonsingular. In this case, $(\mathbf{A}'\mathbf{A})^{-1} \mathbf{A}'$ is a left inverse of $\mathbf{A}$.

    2. $\mathbf{A}$ has full row rank if and only if $\mathbf{A}\mathbf{A}'$ is nonsingular. In this case, $\mathbf{A}'(\mathbf{A} \mathbf{A}')^{-1}$ is a right inverse of $\mathbf{A}$.

      Proof: $\mathbf{A}'\mathbf{A}\mathbf{x} = \mathbf{0}$ if and only if $\|\mathbf{A}\mathbf{x}\|^2 = \mathbf{x}'\mathbf{A}'\mathbf{A}\mathbf{x} = 0$, i.e., $\mathbf{A}\mathbf{x} = \mathbf{0}$, so $\mathbf{A}'\mathbf{A}$ and $\mathbf{A}$ have the same null space and hence the same rank; the left/right inverse claims then follow by direct multiplication.
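A minimal sketch checking both formulas numerically; the tall matrix reuses A from above, while the wide matrix B is made up for illustration:

using LinearAlgebra

A = [-3 -4; 4 6; 1 1]      # 3×2, full column rank
Xleft = (A'A) \ A'         # (A'A)⁻¹A'
@show Xleft * A            # ≈ I, so Xleft is a left inverse of A

B = [1 0 1; 0 1 1]         # 2×3, full row rank
Yright = B' / (B * B')     # B'(BB')⁻¹
@show B * Yright           # ≈ I, so Yright is a right inverse of B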

In [15]:
A = [-1 1 -3; 1 -1 1; 2 2 2]
Out[15]:
3×3 Matrix{Int64}:
 -1   1  -3
  1  -1   1
  2   2   2
In [16]:
A⁻¹ = inv(A)
Out[16]:
3×3 Matrix{Float64}:
  0.5   1.0  0.25
 -0.0  -0.5  0.25
 -0.5  -0.5  0.0
In [17]:
A⁻¹ * A
Out[17]:
3×3 Matrix{Float64}:
 1.0  0.0  0.0
 0.0  1.0  0.0
 0.0  0.0  1.0
In [18]:
A * A⁻¹
Out[18]:
3×3 Matrix{Float64}:
 1.0  0.0  0.0
 0.0  1.0  0.0
 0.0  0.0  1.0

Examples of matrix inverse:

  • Identity matrix $\mathbf{I}_n$ is invertible, with inverse $\mathbf{I}_n$.

  • A diagonal matrix is invertible if and only if all diagonal entries are nonzero. $$ \text{diag}(a_{11}, \ldots, a_{nn})^{-1} = \text{diag}(a_{11}^{-1}, \ldots, a_{nn}^{-1}). $$

  • A $2 \times 2$ matrix is invertible if and only if $a_{11} a_{22} \ne a_{12} a_{21}$ $$ \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}^{-1} = \frac{1}{a_{11} a_{22} - a_{12} a_{21}} \begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}. $$

  • Inverse of orthogonal matrix. If $\mathbf{A}$ is square and has orthonormal columns, then $$ \mathbf{A}^{-1} = \mathbf{A}'. $$

  • Inverse of matrix transpose. If $\mathbf{A}$ is invertible, then $$ (\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'. $$

  • Inverse of matrix product. If $\mathbf{A}$ and $\mathbf{B}$ are invertible, then $$ (\mathbf{A} \mathbf{B})^{-1} = \mathbf{B}^{-1} \mathbf{A}^{-1}. $$
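A quick numerical check of the last three identities; the matrices A and B below are arbitrary invertible 2×2 examples made up for illustration (a sketch, not part of the original notebook):

using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
B = [2.0 0.0; 1.0 3.0]

# 2×2 inverse formula
@show inv(A) ≈ [A[2,2] -A[1,2]; -A[2,1] A[1,1]] / (A[1,1]*A[2,2] - A[1,2]*A[2,1])
# inverse of the transpose
@show inv(A') ≈ inv(A)'
# inverse of the product
@show inv(A * B) ≈ inv(B) * inv(A)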

Linear equations and left/right inverses

  • Solving linear system \begin{eqnarray*} a_{11} x_1 + \cdots + a_{1n} x_n &=& b_1 \\ a_{21} x_1 + \cdots + a_{2n} x_n &=& b_2 \\ &\vdots& \\ a_{m1} x_1 + \cdots + a_{mn} x_n &=& b_m \end{eqnarray*} or $$ \mathbf{A} \mathbf{x} = \mathbf{b}, $$ where $\mathbf{A} \in \mathbb{R}^{m \times n}$ (coefficient matrix), $\mathbf{x} \in \mathbb{R}^n$ (solution vector), and $\mathbf{b} \in \mathbb{R}^m$ (right hand side) is a central theme in linear algebra.

  • Left-invertible matrix: If $\mathbf{X}$ is a left inverse of $\mathbf{A}$, then $$ \mathbf{A} \mathbf{x} = \mathbf{b} $$ implies (multiplying both sides by $\mathbf{X}$) $$ \mathbf{x} = \mathbf{X} \mathbf{b}. $$ There is at most one solution. If there is a solution, it must equal $\mathbf{X} \mathbf{b}$. Note there might be no solution! $\mathbf{X} \mathbf{b}$ only gives us a candidate solution to check. Why can there be at most one solution? (Recall there can be infinitely many left inverses. Why does each left inverse produce the same candidate solution?)

  • Right-invertible matrix: If $\mathbf{X}$ is a right inverse of $\mathbf{A}$, then $$ \mathbf{A} (\mathbf{X} \mathbf{b}) = \mathbf{b}. $$ This says there is at least one solution $\mathbf{x} = \mathbf{X} \mathbf{b}$.

  • Invertible matrix: If a square $\mathbf{A}$ is invertible, then $$ \mathbf{A} \mathbf{x} = \mathbf{b} $$ implies (multiplying both sides by $\mathbf{A}^{-1}$) $$ \mathbf{x} = \mathbf{A}^{-1} \mathbf{b}. $$ There is a unique solution.

  • Example. Consider the 3x2 matrix $$ \mathbf{A} = \begin{pmatrix} -3 & -4 \\ 4 & 6 \\ 1 & 1 \end{pmatrix}. $$

In [19]:
# a 3x2 matrix
A = [-3 -4; 4 6; 1 1]
Out[19]:
3×2 Matrix{Int64}:
 -3  -4
  4   6
  1   1

with left inverses

In [20]:
# X1 is a left inverse of A
X1 = [-11//9 -10//9 16//9; 7//9 8//9 -11//9]
Out[20]:
2×3 Matrix{Rational{Int64}}:
 -11//9  -10//9   16//9
   7//9    8//9  -11//9
In [21]:
# X2 is another left inverse of A
X2 = [0 -1//2 3; 0 1//2 -2]
Out[21]:
2×3 Matrix{Rational{Int64}}:
 0//1  -1//2   3//1
 0//1   1//2  -2//1

The over-determined linear system $$ \begin{pmatrix} -3 & -4 \\ 4 & 6 \\ 1 & 1 \end{pmatrix} \mathbf{x} = \begin{pmatrix} 1 \\ -2 \\ 0 \end{pmatrix} $$ has a unique solution $\mathbf{x} = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$, which can be obtained from either of the left inverses.

In [22]:
Int.(X1 * [1, -2 , 0])
Out[22]:
2-element Vector{Int64}:
  1
 -1
In [23]:
Int.(X2 * [1, -2 , 0])
Out[23]:
2-element Vector{Int64}:
  1
 -1

The over-determined linear system $$ \begin{pmatrix} -3 & -4 \\ 4 & 6 \\ 1 & 1 \end{pmatrix} \mathbf{x} = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix} $$ has no solution, since the only candidate $\mathbf{x} = \mathbf{X}_1 \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix} = \begin{pmatrix} -1/9 \\ -1/9 \end{pmatrix}$ does not satisfy $\mathbf{A} \mathbf{x} = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}$.

In [24]:
A * (X1 * [1, -1, 0]) == [1, -1, 0]
Out[24]:
false

The under-determined linear system $$ \mathbf{A}' \mathbf{y} = \begin{pmatrix} -3 & 4 & 1 \\ -4 & 6 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} $$ has more than one solution; here are two:

In [25]:
X1' * [1, 2]
Out[25]:
3-element Vector{Rational{Int64}}:
  1//3
  2//3
 -2//3
In [26]:
# check solution
A' * (X1' * [1, 2]) == [1, 2]
Out[26]:
true
In [27]:
X2' * [1, 2]
Out[27]:
3-element Vector{Rational{Int64}}:
  0//1
  1//2
 -1//1
In [28]:
# check solution
A' * (X2' * [1, 2]) == [1, 2]
Out[28]:
true

Recall that both $\mathbf{X}_1'$ and $\mathbf{X}_2'$ are right inverses of $\mathbf{A}'$.

Solving linear equations - triangular systems

  • To solve $\mathbf{A} \mathbf{x} = \mathbf{b}$, where $\mathbf{A} \in \mathbb{R}^{n \times n}$ is lower triangular $$ \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}. $$

    Forward substitution algorithm: $$ \begin{eqnarray*} x_1 &=& b_1 / a_{11} \\ x_2 &=& (b_2 - a_{21} x_1) / a_{22} \\ x_3 &=& (b_3 - a_{31} x_1 - a_{32} x_2) / a_{33} \\ &\vdots& \\ x_n &=& (b_n - a_{n1} x_1 - a_{n2} x_2 - \cdots - a_{n,n-1} x_{n-1}) / a_{nn}. \end{eqnarray*} $$ $1 + 3 + 5 + \cdots + (2n-1) = n^2$ flops.

  • To solve $\mathbf{A} \mathbf{x} = \mathbf{b}$, where $\mathbf{A} \in \mathbb{R}^{n \times n}$ is upper triangular
    $$ \begin{pmatrix} a_{11} & \cdots & a_{1,n-1} & a_{1n} \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & a_{n-1,n-1} & a_{n-1,n} \\ 0 & 0 & 0 & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_{n-1} \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_{n-1} \\ b_n \end{pmatrix}. $$ Back substitution algorithm: $$ \begin{eqnarray*} x_n &=& b_n / a_{nn} \\ x_{n-1} &=& (b_{n-1} - a_{n-1,n} x_n) / a_{n-1,n-1} \\ x_{n-2} &=& (b_{n-2} - a_{n-2,n-1} x_{n-1} - a_{n-2,n} x_n) / a_{n-2,n-2} \\ &\vdots& \\ x_1 &=& (b_1 - a_{12} x_2 - a_{13} x_3 - \cdots - a_{1,n} x_{n}) / a_{11}. \end{eqnarray*} $$ $n^2$ flops.
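A minimal Julia sketch of the two substitution algorithms above (the function names are ours, not standard library functions):

using LinearAlgebra

function forward_substitution(A, b)
    n = length(b)
    x = zeros(n)
    for i in 1:n
        # subtract the contribution of the already-computed entries, then divide by the diagonal
        x[i] = (b[i] - dot(A[i, 1:i-1], x[1:i-1])) / A[i, i]
    end
    return x
end

function back_substitution(A, b)
    n = length(b)
    x = zeros(n)
    for i in n:-1:1
        x[i] = (b[i] - dot(A[i, i+1:n], x[i+1:n])) / A[i, i]
    end
    return x
end

# quick check on a small lower triangular system
L = [2.0 0 0; 1 3 0; -1 2 4]
b = [2.0, 5.0, 9.0]
x = forward_substitution(L, b)
@show L * x ≈ b    # true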

Solving linear equations by QR

  • For an invertible matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ and $\mathbf{b} \in \mathbb{R}^n$, we can solve the linear equation $\mathbf{A} \mathbf{x} = \mathbf{b}$ by following steps.

    • Step 1: Compute QR factorization: $\mathbf{A} = \mathbf{Q} \mathbf{R}$. $2n^3$ flops.
    • Step 2: Compute $\mathbf{Q}^T \mathbf{b}$. $2n^2$ flops.
    • Step 3: Solve the triangular equation $\mathbf{R} \mathbf{x} = \mathbf{Q}' \mathbf{b}$ by back substitution. $n^2$ flops.

      The total number of flops is $2n^3 + 3n^2$; the dominant term is the $2n^3$ from Step 1.

  • Factor-solve method with multiple right-hand sides. For multiple right hand sides $\mathbf{b}$, we only need to do Step 1 once.

  • Compute the inverse, $\mathbf{B}$, of an invertible matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$.

    • Step 1. QR factorization $\mathbf{A} = \mathbf{Q} \mathbf{R}$.
    • Step 2. For $i=1,\ldots,n$, solve the triangular equation $\mathbf{R} \mathbf{b}_i = \mathbf{s}_i$, where $\mathbf{s}_i$ is the $i$-th column of $\mathbf{Q}'$ (the $i$-th row of $\mathbf{Q}$, written as a column vector); the $\mathbf{b}_i$ are the columns of $\mathbf{B}$.

      Total computational cost is $2n^3 + n^3 = 3n^3$ flops.

  • Question. There are two ways to solve a linear system: qr(A) \ b vs inv(A) * b. Which is better?
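A sketch of the three-step QR solve and of the comparison raised in the question; the matrix reuses A from In [15], while the right-hand side b is made up for illustration. The short answer is that the factor-solve approach (qr(A) \ b, or simply A \ b) is preferred: forming inv(A) costs roughly $3n^3$ flops versus $2n^3$ and is numerically less stable.

using LinearAlgebra

A = [-1.0 1 -3; 1 -1 1; 2 2 2]
b = [1.0, 2.0, 3.0]              # illustrative right-hand side

F = qr(A)                        # Step 1: A = QR
y = F.Q' * b                     # Step 2: Q'b
x = UpperTriangular(F.R) \ y     # Step 3: back substitution for R x = Q'b

@show x ≈ A \ b         # true
@show x ≈ inv(A) * b    # also true, but inv(A) * b does more work for the same answer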

Solving linear equations by Gaussian elimination and LU factorization

In this section, we (1) review Gaussian elimination (GE) for solving linear equations $\mathbf{A} \mathbf{x} = \mathbf{b}$, and then (2) show that GE leads to a useful decomposition $$ \mathbf{A} = \mathbf{L} \mathbf{U}. $$

Gaussian elimination (GE)

Let's review (from any undergraduate linear algebra textbook) how to solve a linear system $$ \begin{pmatrix} 2 & 1 & -1 \\ -3 & -1 & 2 \\ -2 & 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ -11 \\ -3 \end{pmatrix}. $$

Stage 1: Let's eliminate variable $x_1$ from equations (2) and (3).

Multiplying equation (1) by $\ell_{21} = 3/2$ and adding it to equation (2) leads to $$ \begin{pmatrix} 2 & 1 & -1 \\ 0 & 1/2 & 1/2 \\ -2 & 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ 1 \\ -3 \end{pmatrix}. $$

Multiplying equation (1) by $\ell_{31} = 1$ and adding it to equation (3) leads to $$ \begin{pmatrix} 2 & 1 & -1 \\ 0 & 1/2 & 1/2 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ 1 \\ 5 \end{pmatrix}. $$

Stage 2: Let's eliminate variable $x_2$ from equation (3).

Multiplying equation (2) by $\ell_{32} = -4$ and adding it to equation (3) leads to $$ \begin{pmatrix} 2 & 1 & -1 \\ 0 & 1/2 & 1/2 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ 1 \\ 1 \end{pmatrix}. $$

Stage 3: Now we recover the solution by back substitution.

From equation (3), $-x_3 = 1$, so $x_3 = -1$.

Substituting $x_3 = -1$ into equation (2) and solving gives $x_2 = 3$.

Substituting $x_2 = 3$ and $x_3 = -1$ into equation (1) and solving gives $x_1 = 2$.

This is essentially how a computer solves linear equations:

In [29]:
using LinearAlgebra

A = [2.0 1.0 -1.0; -3.0 -1.0 2.0; -2.0 1.0 2.0]
Out[29]:
3×3 Matrix{Float64}:
  2.0   1.0  -1.0
 -3.0  -1.0   2.0
 -2.0   1.0   2.0
In [30]:
b = [8.0, -11.0, -3.0]
Out[30]:
3-element Vector{Float64}:
   8.0
 -11.0
  -3.0
In [31]:
# Julia way to solve linear equation
# equivalent to `solve(A, b)` in R
A \ b
Out[31]:
3-element Vector{Float64}:
  2.0000000000000004
  2.9999999999999996
 -0.9999999999999994

LU decomposition

Let's collect the negated multipliers $-\ell_{ij}$ into a unit lower triangular matrix $\mathbf{L}$

In [32]:
L = [1.0 0.0 0.0; -1.5 1.0 0.0; -1.0 4.0 1.0]
Out[32]:
3×3 Matrix{Float64}:
  1.0  0.0  0.0
 -1.5  1.0  0.0
 -1.0  4.0  1.0

and save the resultant upper triangular matrix after GE into $\mathbf{U}$

In [33]:
U = [2.0 1.0 -1.0; 0.0 0.5 0.5; 0.0 0.0 -1.0]
Out[33]:
3×3 Matrix{Float64}:
 2.0  1.0  -1.0
 0.0  0.5   0.5
 0.0  0.0  -1.0

Surprisingly, we find that $\mathbf{A} = \mathbf{L} \mathbf{U}$!

In [34]:
A ≈ L * U
Out[34]:
true
  • Theorem: If $\mathbf{A} \in \mathbb{R}^{n \times n}$ is nonsingular and all of its leading principal submatrices are nonsingular, then there exist a unique unit lower triangular matrix $\mathbf{L}$ and a unique upper triangular matrix $\mathbf{U}$ such that $$ \mathbf{A} = \mathbf{L} \mathbf{U}. $$ (For a general nonsingular $\mathbf{A}$, a row permutation may be needed; see the pivoting remark below.)

  • Given LU decomposition $\mathbf{A} = \mathbf{L} \mathbf{U}$, for a new right hand side $\mathbf{b}$, the solution to $\mathbf{A} \mathbf{x} = \mathbf{L} \mathbf{U} \mathbf{x} = \mathbf{b}$ is readily given by two triangular solves: \begin{eqnarray*} \mathbf{L} \mathbf{y} &=& \mathbf{b} \\ \mathbf{U} \mathbf{x} &=& \mathbf{y}. \end{eqnarray*}

  • In practice, the computer performs GE/LU on a row-permuted version of $\mathbf{A}$ (partial pivoting) for numerical stability. That is, $$ \mathbf{P} \mathbf{A} = \mathbf{L} \mathbf{U}, $$ where $\mathbf{P}$ is a permutation matrix. With partial pivoting, all multipliers $\ell_{ij}$ in $\mathbf{L}$ have magnitude $\le 1$.

  • Total computational cost of LU decomposition is $(2/3)n^3$ flops.

In [35]:
A
Out[35]:
3×3 Matrix{Float64}:
  2.0   1.0  -1.0
 -3.0  -1.0   2.0
 -2.0   1.0   2.0
In [36]:
Alu = lu(A)
Out[36]:
LU{Float64, Matrix{Float64}}
L factor:
3×3 Matrix{Float64}:
  1.0       0.0  0.0
  0.666667  1.0  0.0
 -0.666667  0.2  1.0
U factor:
3×3 Matrix{Float64}:
 -3.0  -1.0      2.0
  0.0   1.66667  0.666667
  0.0   0.0      0.2
In [37]:
Alu.p
Out[37]:
3-element Vector{Int64}:
 2
 3
 1
In [38]:
A[Alu.p, :] ≈ Alu.L * Alu.U
Out[38]:
true
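With the factorization in hand, solving $\mathbf{A} \mathbf{x} = \mathbf{b}$ for the b defined earlier reduces to the two triangular solves described above, applied to the row-permuted right-hand side (since $\mathbf{P} \mathbf{A} = \mathbf{L} \mathbf{U}$). A minimal sketch:

using LinearAlgebra

# L y = P b (forward substitution), then U x = y (back substitution)
y = LowerTriangular(Alu.L) \ b[Alu.p]
x = UpperTriangular(Alu.U) \ y
@show x ≈ A \ b        # true; x ≈ [2, 3, -1]
# the factorization object can also be reused directly for any new right-hand side
@show Alu \ b ≈ A \ b  # true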

Generalized inverse and Moore-Penrose inverse (optional)

  • Let $\mathbf{A} \in \mathbb{R}^{m \times n}$. A matrix $\mathbf{G} \in \mathbb{R}^{n \times m}$ is called the Moore-Penrose inverse or MP inverse of $\mathbf{A}$ if it satisfies the following four conditions:
    (1) $\mathbf{A} \mathbf{G} \mathbf{A} = \mathbf{A}$. $\quad \quad$ (generalized inverse, $g_1$ inverse, or inner pseudo-inverse)
    (2) $\mathbf{G} \mathbf{A} \mathbf{G} = \mathbf{G}$. $\quad \quad$ (outer pseudo-inverse)
    (3) $(\mathbf{A} \mathbf{G})' = \mathbf{A} \mathbf{G}$.
    (4) $(\mathbf{G} \mathbf{A})' = \mathbf{G} \mathbf{A}$.
    We shall denote the Moore-Penrose inverse of $\mathbf{A}$ by $\mathbf{A}^+$.

  • Any matrix $\mathbf{G} \in \mathbb{R}^{n \times m}$ that satisfies (1) is called a generalized inverse, or $g_1$ inverse, or inner pseudo-inverse. Denoted by $\mathbf{A}^-$ or $\mathbf{A}^g$.

    Generalized inverse is not unique in general.

  • Definition (1) for generalized inverses is motivated by the following result.

    If $\mathbf{A} \mathbf{x} = \mathbf{b}$ has solution(s), then $\mathbf{A}^- \mathbf{b}$ is a solution for any generalized inverse $\mathbf{A}^-$.

    Proof: $\mathbf{A} \mathbf{A}^- \mathbf{b} = \mathbf{A} \mathbf{A}^- \mathbf{A} \mathbf{x} = \mathbf{A} \mathbf{x} = \mathbf{b}$.
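A quick illustration of this result, using the MP inverse (computed by pinv) as one particular generalized inverse; the rank-deficient matrix below is made up for illustration, and b is constructed to lie in $\mathcal{C}(\mathbf{A})$ so that the system is consistent:

using LinearAlgebra

A = [1.0 2.0; 2.0 4.0; 3.0 6.0]   # rank 1
b = A * [1.0, 1.0]                # b ∈ C(A), so A x = b has solutions
x = pinv(A) * b                   # A⁻ b with the particular choice A⁻ = A⁺
@show A * x ≈ b                   # true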

  • Any matrix $\mathbf{G} \in \mathbb{R}^{n \times m}$ that satisfies (1)+(2) is called a reflective generalized inverse, or $g_2$ inverse, or outer pseudo-inverse. Denoted by $\mathbf{A}^*$.

  • The Moore-Penrose inverse of any matrix $\mathbf{A}$ exists and is unique.

    Proof of uniqueness: Let $\mathbf{G}_1, \mathbf{G}_2 \in \mathbb{R}^{n \times m}$ be two matrices satisfying properties (1)-(4). Then
    $$ \mathbf{A} \mathbf{G}_1 = (\mathbf{A} \mathbf{G}_1)' = \mathbf{G}_1' \mathbf{A}' = \mathbf{G}_1' (\mathbf{A} \mathbf{G}_2 \mathbf{A})' = \mathbf{G}_1' \mathbf{A}' \mathbf{G}_2' \mathbf{A}' = (\mathbf{A} \mathbf{G}_1)' (\mathbf{A} \mathbf{G}_2)' = \mathbf{A} \mathbf{G}_1 \mathbf{A} \mathbf{G}_2 = \mathbf{A} \mathbf{G}_2. $$ Similarly, $$ \mathbf{G}_1 \mathbf{A} = (\mathbf{G}_1 \mathbf{A})' = \mathbf{A}' \mathbf{G}_1' = (\mathbf{A} \mathbf{G}_2 \mathbf{A})' \mathbf{G}_1' = \mathbf{A}' \mathbf{G}_2' \mathbf{A}' \mathbf{G}_1' = (\mathbf{G}_2 \mathbf{A})' (\mathbf{G}_1 \mathbf{A})' = \mathbf{G}_2 \mathbf{A} \mathbf{G}_1 \mathbf{A} = \mathbf{G}_2 \mathbf{A}. $$ Hence, $$ \mathbf{G}_1 = \mathbf{G}_1 \mathbf{A} \mathbf{G}_1 = \mathbf{G}_1 \mathbf{A} \mathbf{G}_2 = \mathbf{G}_2 \mathbf{A} \mathbf{G}_2 = \mathbf{G}_2. $$

    Proof of existence: deferred; we will construct a matrix that satisfies (1)-(4) using the singular value decomposition (SVD) of $\mathbf{A}$ (see the sketch at the end of this section).

  • Following are true:

    1. $(\mathbf{A}^-)'$ is a generalized inverse of $\mathbf{A}'$.
    2. $(\mathbf{A}')^+ = (\mathbf{A}^+)'$.
    3. For any nonzero $k$, $(1/k) \mathbf{A}^-$ is a generalized inverse of $k \mathbf{A}$.
    4. $(k \mathbf{A})^+ = (1/k) \mathbf{A}^+$.

      Proof: Check properties (1)-(4).

  • Multiplication by generalized inverse does not change rank.

    1. $\mathcal{C}(\mathbf{A}) = \mathcal{C}(\mathbf{A} \mathbf{A}^-)$ and $\mathcal{C}(\mathbf{A}') = \mathcal{C}((\mathbf{A}^- \mathbf{A})')$.
    2. $\text{rank}(\mathbf{A}) = \text{rank}(\mathbf{A} \mathbf{A}^-) = \text{rank}(\mathbf{A}^- \mathbf{A})$.

      Proof of 1: We already know $\mathcal{C}(\mathbf{A}) \supseteq \mathcal{C}(\mathbf{A} \mathbf{A}^-)$. Now since $\mathbf{A} = \mathbf{A} \mathbf{A}^- \mathbf{A}$, we also have $\mathcal{C}(\mathbf{A}) \subseteq \mathcal{C}(\mathbf{A} \mathbf{A}^-)$.
      Proof of 2: Since $\mathbf{A} = \mathbf{A} \mathbf{A}^- \mathbf{A}$, $\text{rank}(\mathbf{A}) \le \text{rank}(\mathbf{A} \mathbf{A}^-) \le \text{rank}(\mathbf{A})$.

  • Generalized inverse has equal or larger rank. $\text{rank}(\mathbf{A}^-) \ge \text{rank}(\mathbf{A})$.

    Proof: $\text{rank}(\mathbf{A}) = \text{rank}(\mathbf{A}\mathbf{A}^- \mathbf{A}) \le \text{rank}(\mathbf{A}\mathbf{A}^-) \le \text{rank}(\mathbf{A}^-)$.

  • Let $\mathbf{A} \in \mathbb{R}^{m \times n}$.

    1. If $\mathbf{A}$ has full column rank, then the MP inverse is $\mathbf{A}^+ = (\mathbf{A}'\mathbf{A})^{-1} \mathbf{A}'$. Given QR factorization $\mathbf{A} = \mathbf{Q} \mathbf{R}$, $\mathbf{A}^+ = \mathbf{R}^{-1} \mathbf{Q}'$.

    2. If $\mathbf{A}$ has full row rank, then the MP inverse is $\mathbf{A}^+ = \mathbf{A}'(\mathbf{A} \mathbf{A}')^{-1}$. Given QR factorization $\mathbf{A}' = \mathbf{Q} \mathbf{R}$, $\mathbf{A}^+ = \mathbf{Q} (\mathbf{R}')^{-1}$.

In [39]:
# a 3x2 matrix (full column rank)
A = [-3 -4; 4 6; 1 1]
Out[39]:
3×2 Matrix{Int64}:
 -3  -4
  4   6
  1   1
In [40]:
# compute MP inverse using SVD
pinv(A)
Out[40]:
2×3 Matrix{Float64}:
 -1.22222   -1.11111    1.77778
  0.777778   0.888889  -1.22222
In [41]:
# compute MP inverse using pseudo-inverse
(A'A) \ A'
Out[41]:
2×3 Matrix{Float64}:
 -1.22222   -1.11111    1.77778
  0.777778   0.888889  -1.22222
In [42]:
# compute MP inverse by QR
qra = qr(A)
Out[42]:
LinearAlgebra.QRCompactWY{Float64, Matrix{Float64}}
Q factor:
3×3 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}}:
 -0.588348  -0.457604  0.666667
  0.784465  -0.522976  0.333333
  0.196116   0.719092  0.666667
R factor:
2×2 Matrix{Float64}:
 5.09902   7.2563
 0.0      -0.588348
In [43]:
qra.R \ qra.Q[:, 1:2]'
Out[43]:
2×3 Matrix{Float64}:
 -1.22222   -1.11111    1.77778
  0.777778   0.888889  -1.22222
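Finally, a sketch of the SVD construction of the MP inverse promised in the existence proof above: writing the thin SVD $\mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}'$ with all singular values positive (as is the case for this full-column-rank A), the matrix $\mathbf{V} \boldsymbol{\Sigma}^{-1} \mathbf{U}'$ satisfies conditions (1)-(4).

using LinearAlgebra

A = [-3.0 -4; 4 6; 1 1]
F = svd(A)                                 # thin SVD: A ≈ F.U * Diagonal(F.S) * F.V'
Aplus = F.V * Diagonal(1 ./ F.S) * F.U'    # V Σ⁻¹ U' (for rank-deficient A, invert only the nonzero singular values)
@show Aplus ≈ pinv(A)                      # true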