In [1]:
using Pkg
Pkg.activate(pwd())
Pkg.instantiate()
  Activating environment at `~/Documents/github.com/ucla-biostat216-2021fall.github.io/slides/02-vector/Project.toml`
In [2]:
using LinearAlgebra, Random, Statistics
Random.seed!(216)
Out[2]:
MersenneTwister(216)

Vectors (BV Chapters 1, 3, 5)

Notation

  • A column vector $\mathbf{x} \in \mathbb{R}^n$ $$ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}. $$ $n$ is the dimension or length or size of the vector. $x_i$ is the $i$-th entry or element or component of $\mathbf{x}$.

  • A row vector $\mathbf{x}' = (x_1 \ldots x_n)$.

In [3]:
# a (column) vector in Julia
x = [1, -1, 2, 0, 3]
Out[3]:
5-element Vector{Int64}:
  1
 -1
  2
  0
  3
In [4]:
# row vector
x'
Out[4]:
1×5 adjoint(::Vector{Int64}) with eltype Int64:
 1  -1  2  0  3
  • Subvector: $\mathbf{x}_{2:4} = (x_2, x_3, x_4)'$.
In [5]:
x[2:4]
Out[5]:
3-element Vector{Int64}:
 -1
  2
  0
  • Stacked vector: $$ \mathbf{x} = \begin{pmatrix} \mathbf{y} \\ \mathbf{z} \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_m \\ z_1 \\ \vdots \\ z_n \end{pmatrix}. $$
In [6]:
y = [1, 2, 3]
z = [4, 5]
x = [y; z]
Out[6]:
5-element Vector{Int64}:
 1
 2
 3
 4
 5

Some special vectors

  • Zero vector: $$ \mathbf{0}_n = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}. $$ The subscript denotes the length of the vector; it is sometimes omitted when the length is clear from context.
In [7]:
zeros(5)
Out[7]:
5-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0
  • One vector: $$ \mathbf{1}_n = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}. $$
In [8]:
ones(5)
Out[8]:
5-element Vector{Float64}:
 1.0
 1.0
 1.0
 1.0
 1.0
  • Unit vector, $\mathbf{e}_i$. It has all zero entries except the $i$-th entry being 1. Note the length of $\mathbf{e}_i$ is often implied by context. $$ \mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad \mathbf{e}_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}. $$
In [9]:
# define a unit vector by comprehension
e3 = [i == 3 ? 1 : 0 for i in 1:5]
Out[9]:
5-element Vector{Int64}:
 0
 0
 1
 0
 0

Vector operations

  • Vector addition and vector subtraction. For two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ (of same length) $$ \mathbf{x} + \mathbf{y} = \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{pmatrix}, \quad \mathbf{x} - \mathbf{y} = \begin{pmatrix} x_1 - y_1 \\ \vdots \\ x_n - y_n \end{pmatrix}. $$
In [10]:
x = [1, 2, 3, 4, 5]
y = [6, 7, 8, 9, 10]
x + y
Out[10]:
5-element Vector{Int64}:
  7
  9
 11
 13
 15
In [11]:
x - y
Out[11]:
5-element Vector{Int64}:
 -5
 -5
 -5
 -5
 -5
  • Scalar-vector multiplication. For a scalar $\alpha \in \mathbb{R}$ and a vector $\mathbf{x} \in \mathbb{R}^n$, $$ \alpha \mathbf{x} = \begin{pmatrix} \alpha x_1 \\ \alpha x_2 \\ \vdots \\ \alpha x_n \end{pmatrix}. $$
In [12]:
α = 0.5
x = [1, 2, 3, 4, 5]
α * x
Out[12]:
5-element Vector{Float64}:
 0.5
 1.0
 1.5
 2.0
 2.5
  • Elementwise multiplication or Hadamard product. For two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ (of same length) $$ \mathbf{x} \circ \mathbf{y} = \begin{pmatrix} x_1 y_1 \\ \vdots \\ x_n y_n \end{pmatrix}. $$
In [13]:
# in Julia, the dot syntax performs elementwise operations
x = [1, 2, 3, 4, 5]
y = [6, 7, 8, 9, 10]
x .* y
Out[13]:
5-element Vector{Int64}:
  6
 14
 24
 36
 50
  • For scalars $\alpha_1, \ldots, \alpha_k \in \mathbb{R}$ and vectors $\mathbf{x}_1, \ldots, \mathbf{x}_k \in \mathbb{R}^n$, the linear combination $$ \alpha_1 \mathbf{x}_1 + \cdots + \alpha_k \mathbf{x}_k $$ is a sum of scalar-vector products.
In [14]:
x = [1, 2, 3, 4, 5]
y = [6, 7, 8, 9, 10]
1 * x + 0.5 * y
Out[14]:
5-element Vector{Float64}:
  4.0
  5.5
  7.0
  8.5
 10.0
  • The inner product or dot product between two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ is $$ \mathbf{x}' \mathbf{y} = x_1 y_1 + \cdots + x_n y_n. $$ Another common notation for the inner product is $\langle \mathbf{x}, \mathbf{y} \rangle$.
In [15]:
x = [1, 2, 3, 4, 5]
y = [6, 7, 8, 9, 10]
x'y
Out[15]:
130
In [16]:
# the dot function is available from the standard library LinearAlgebra
dot(x, y)
Out[16]:
130

Properties of inner product:

  1. $\mathbf{x}' \mathbf{x} = x_1^2 + \cdots + x_n^2 \ge 0$.

  2. $\mathbf{x}' \mathbf{x} = 0$ if and only if $\mathbf{x} = \mathbf{0}$.

  3. Commutative: $\mathbf{x}' \mathbf{y} = \mathbf{y}' \mathbf{x}$.

  4. Associative with scalar multiplication: $(\alpha \mathbf{x})' \mathbf{y} = \alpha (\mathbf{x}' \mathbf{y}) = \mathbf{x}' (\alpha \mathbf{y})$.

  5. Distributive with vector addition: $(\mathbf{x} + \mathbf{y})' \mathbf{z} = \mathbf{x}' \mathbf{z} + \mathbf{y}' \mathbf{z}$.

Examples of inner product:

  1. Inner product with unit vector: $\mathbf{e}_i' \mathbf{x} = x_i$.

  2. Differencing: $(\mathbf{e}_i - \mathbf{e}_j)' \mathbf{x} = x_i - x_j$.

  3. Sum: $\mathbf{1}' \mathbf{x} = x_1 + \cdots + x_n$.

  4. Average: $(\frac 1n \mathbf{1})' \mathbf{x} = (x_1 + \cdots + x_n) / n$.
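These four identities are easy to verify numerically; below is a minimal check in Julia (the example vector is arbitrary, and mean comes from the Statistics package loaded above):

x = [1, 2, 3, 4, 5]
e2 = [0, 1, 0, 0, 0]             # unit vector e₂
e4 = [0, 0, 0, 1, 0]             # unit vector e₄
@show e2'x                       # picks out x₂
@show (e2 - e4)'x                # differencing: x₂ - x₄
@show ones(5)'x                  # sum of the entries
@show (ones(5) / 5)'x            # average of the entries
@show (ones(5) / 5)'x ≈ mean(x)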

  • Vector transpose: $$ \mathbf{x}' = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}' = (x_1 \, \ldots \, x_n). $$

    Properties:

    1. $(\mathbf{x}')' = \mathbf{x}$.

    2. Transposition is a linear operator: $(\alpha \mathbf{x} + \beta \mathbf{y})' = \alpha \mathbf{x}' + \beta \mathbf{y}'$.

Superscript $^t$ or $^T$ is also commonly used to denote transpose: $\mathbf{x}^t$, $\mathbf{x}^T$.
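A quick numerical check of these two properties (the vectors and scalars below are arbitrary):

x = [1, 2, 3]
y = [4, 5, 6]
α, β = 2.0, -1.0
@show (x')' == x                           # transposing twice gives back x
@show (α * x + β * y)' ≈ α * x' + β * y'   # linearity of transposition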

Computational complexity of vector operations

  • One floating point operation (flop) is one basic arithmetic operation in $\mathbb{R}$ or $\mathbb{C}$: $+, -, *, /, \sqrt{\cdot}, \ldots$

  • The flop count or operation count is the total number of flops in an algorithm.

  • A (very) crude predictor of run time of the algorithm is $$ \text{run time} \approx \frac{\text{flop count}}{\text{computer speed}}. $$

  • Dominant term: the highest-order term in the flop count. $$ \frac 13 n^3 + 100 n^2 + 10n + 5 \approx \frac 13 n^3. $$

  • Order: the power in the dominant term. $$ \frac 13 n^3 + 100 n^2 + 10n + 5 = \text{order } n^3 = O(n^3). $$

  • Assume vectors are all of length $n$.

    1. Vector addition and subtraction: $n$ flops.

    2. Scalar multiplication: $n$ flops.

    3. Elementwise multiplication: $n$ flops.

    4. Inner product: $2n - 1$ flops.

  • These vector operations are all order $n$ algorithms.

In [17]:
# info about my computer 
versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.6.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 4
In [18]:
n = 10^7
x = randn(n)
sum(x) # compile
@time sum(x);
  0.004684 seconds (1 allocation: 16 bytes)
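As a rough illustration of the run-time formula above (a crude estimate; the timing varies from run to run and assumes a single trial is representative), one can time an inner product and back out an effective flop rate:

n = 10^7
x, y = randn(n), randn(n)
dot(x, y)               # first call triggers compilation
t = @elapsed dot(x, y)  # elapsed seconds for one inner product
@show t
@show (2n - 1) / t      # ≈ effective flops per second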

Norm, distance, angle

  • The Euclidean norm or L2 norm of a vector $\mathbf{x} \in \mathbb{R}^n$ is $$ \|\mathbf{x}\| = \|\mathbf{x}\|_2 = (\mathbf{x}'\mathbf{x})^{1/2} = \sqrt{x_1^2 + \cdots + x_n^2}. $$

  • The L1 norm of a vector $\mathbf{x} \in \mathbb{R}^n$ is $$ \|\mathbf{x}\|_1 = |x_1| + \cdots + |x_n|. $$

  • Properties of L2 norm.

    1. Positive definiteness: $\|\mathbf{x}\| \ge 0$ for any vector $\mathbf{x}$. $\|\mathbf{x}\| = 0$ if and only if $\mathbf{x}=\mathbf{0}$.

    2. Homogeneity: $\|\alpha \mathbf{x}\| = |\alpha| \|\mathbf{x}\|$ for any scalar $\alpha$ and vector $\mathbf{x}$.

    3. Triangle inequality: $\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|$ for any $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$. Proof: use the Cauchy-Schwarz inequality.

    4. Cauchy-Schwarz inequality: $|\mathbf{x}' \mathbf{y}| \le \|\mathbf{x}\| \|\mathbf{y}\|$ for any $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$. The equality holds when (1) $\mathbf{x} = \mathbf{0}$ or $\mathbf{y}=\mathbf{0}$ or (2) $\mathbf{x} \ne \mathbf{0}$, $\mathbf{y} \ne \mathbf{0}$, and $\mathbf{x} = \alpha \mathbf{y}$ for some $\alpha \ne 0$.

      Proof: For $\mathbf{y} \ne \mathbf{0}$, the function $f(t) = \|\mathbf{x} - t \mathbf{y}\|_2^2 = \|\mathbf{x}\|_2^2 - 2t (\mathbf{x}' \mathbf{y}) + t^2\|\mathbf{y}\|_2^2$ is minimized at $t^\star =(\mathbf{x}'\mathbf{y}) / \|\mathbf{y}\|^2$ with minimal value $f(t^\star) = \|\mathbf{x}\|^2 - (\mathbf{x}'\mathbf{y})^2 / \|\mathbf{y}\|^2 \ge 0$; rearranging gives $(\mathbf{x}'\mathbf{y})^2 \le \|\mathbf{x}\|^2 \|\mathbf{y}\|^2$. The case $\mathbf{y} = \mathbf{0}$ is immediate.

In [19]:
# check the triangle inequality
@show x = randn(5)
@show y = randn(5)
@show norm(x + y)
@show norm(x) + norm(y)
@show norm(x + y) ≤ norm(x) + norm(y)
x = randn(5) = [0.7738530709037769, 1.344750883612216, -0.01000755421055196, -1.543182965649529, 1.3848416572338424]
y = randn(5) = [0.7823014256853303, -1.4645887807781848, -1.3848949741410768, 0.6386517857683485, -0.25131514633683166]
norm(x + y) = 2.546525085039811
norm(x) + norm(y) = 4.858168254537617
norm(x + y) ≤ norm(x) + norm(y) = true
Out[19]:
true
In [20]:
# check the Cauchy-Schwarz inequality
@show x = randn(5)
@show y = randn(5)
@show abs(x'y)
@show norm(x) * norm(y)
@show abs(x'y) ≤ norm(x) * norm(y)
x = randn(5) = [1.0646056965408952, 0.7848733126530085, 1.1509742958697582, -0.503293236154652, 1.4424561169317982]
y = randn(5) = [-0.6624673653939196, 0.16075565612441853, -0.5029725911423841, 0.24792071486232836, -1.4234612064756527]
abs(x' * y) = 3.336059373941377
norm(x) * norm(y) = 3.895118899741072
abs(x' * y) ≤ norm(x) * norm(y) = true
Out[20]:
true
  • The (Euclidean) distance between vectors $\mathbf{x}$ and $\mathbf{y}$ is defined as $\|\mathbf{x} - \mathbf{y}\|$.

  • Properties of distances.

    1. Nonnegativity: $\|\mathbf{x} - \mathbf{y}\| \ge 0$ for all $\mathbf{x}$ and $\mathbf{y}$, and $\|\mathbf{x} - \mathbf{y}\| = 0$ if and only if $\mathbf{x} = \mathbf{y}$.

    2. Triangle inequality: $\|\mathbf{x} - \mathbf{y}\| \le \|\mathbf{x} - \mathbf{z}\| + \|\mathbf{z} - \mathbf{y}\|$.

  • The average of a vector $\mathbf{x}$ is $$ \operatorname{avg}(\mathbf{x}) = \bar{\mathbf{x}} = \frac{x_1 + \cdots + x_n}{n} = \frac{\mathbf{1}' \mathbf{x}}{n}. $$

  • The root mean square (RMS) of a vector is $$ \operatorname{rms}(\mathbf{x}) = \sqrt{\frac{x_1^2 + \cdots + x_n^2}{n}} = \frac{\|\mathbf{x}\|}{\sqrt n}. $$

  • The standard deviation of a vector $\mathbf{x}$ is $$ \operatorname{std}(\mathbf{x}) = \sqrt{\frac{(x_1 - \bar{\mathbf{x}})^2 + \cdots + (x_n - \bar{\mathbf{x}})^2}{n}} = \operatorname{rms}(\mathbf{x} - \bar{\mathbf{x}} \mathbf{1}) = \frac{\|\mathbf{x} - (\mathbf{1}' \mathbf{x} / n) \mathbf{1}\|}{\sqrt n}. $$

  • Theorem: $\operatorname{avg}(\mathbf{x})^2 + \operatorname{std}(\mathbf{x})^2 = \operatorname{rms}(\mathbf{x})^2$.

    This result underlies the famous bias-variance tradeoff in statistics.

    Proof: TODO in class. $\operatorname{std}(\mathbf{x})^2 = \frac{\|\mathbf{x} - (\mathbf{1}' \mathbf{x} / n) \mathbf{1}\|^2}{n} = ... = \operatorname{rms}(\mathbf{x})^2 - \operatorname{avg}(\mathbf{x})^2$.

In [21]:
x = randn(5)
@show mean(x)^2 + std(x, corrected = false)^2
@show norm(x)^2 / length(x)
# floating point arithmetic is not exact
@show mean(x)^2 + std(x, corrected = false)^2 ≈ norm(x)^2 / length(x)
mean(x) ^ 2 + std(x, corrected = false) ^ 2 = 0.27681813422052637
norm(x) ^ 2 / length(x) = 0.2768181342205265
mean(x) ^ 2 + std(x, corrected = false) ^ 2 ≈ norm(x) ^ 2 / length(x) = true
Out[21]:
true
  • The angle between two nonzero vectors $\mathbf{x}, \mathbf{y}$ is $$ \theta = \operatorname{arccos} \left(\frac{\mathbf{x}'\mathbf{y}}{\|\mathbf{x}\|\|\mathbf{y}\|}\right). $$ This is the unique value of $\theta \in [0, \pi]$ that satisfies $$ \mathbf{x}'\mathbf{y} = \|\mathbf{x}\|\|\mathbf{y}\| \cos \theta. $$ A numerical example follows the terminology table below.

  • The Cauchy-Schwarz inequality guarantees that $$ -1 \le \cos \theta = \frac{\mathbf{x}'\mathbf{y}}{\|\mathbf{x}\|\|\mathbf{y}\|} \le 1. $$

  • Terminology

Angle | Sign of inner product | Terminology
----- | --------------------- | -----------
$\theta = 0$ | $\mathbf{x}'\mathbf{y} = \|\mathbf{x}\| \|\mathbf{y}\|$ | $\mathbf{x}$ and $\mathbf{y}$ are aligned or parallel
$0 \le \theta < \pi/2$ | $\mathbf{x}'\mathbf{y} > 0$ | $\mathbf{x}$ and $\mathbf{y}$ make an acute angle
$\theta = \pi/2$ | $\mathbf{x}'\mathbf{y} = 0$ | $\mathbf{x}$ and $\mathbf{y}$ are orthogonal, $\mathbf{x} \perp \mathbf{y}$
$\pi/2 < \theta \le \pi$ | $\mathbf{x}'\mathbf{y} < 0$ | $\mathbf{x}$ and $\mathbf{y}$ make an obtuse angle
$\theta = \pi$ | $\mathbf{x}'\mathbf{y} = -\|\mathbf{x}\| \|\mathbf{y}\|$ | $\mathbf{x}$ and $\mathbf{y}$ are anti-aligned or opposed

TODO: Picture.
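A quick numerical illustration of the angle formula (the two vectors below are arbitrary):

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
cosθ = x'y / (norm(x) * norm(y))
θ = acos(cosθ)          # angle in radians, in [0, π]
@show cosθ
@show θ
@show x'y > 0           # positive inner product, so an acute angle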

Linear independence

  • A set of vectors $\mathbf{a}_1, \ldots, \mathbf{a}_k \in \mathbb{R}^n$ is linearly dependent if there exist constants $\alpha_1, \ldots, \alpha_k$, not all zero, such that $$ \alpha_1 \mathbf{a}_1 + \cdots + \alpha_k \mathbf{a}_k = \mathbf{0}. $$ The vectors are linearly independent if they are not linearly dependent; that is, $\alpha_1 \mathbf{a}_1 + \cdots + \alpha_k \mathbf{a}_k = \mathbf{0}$ implies $\alpha_1 = \cdots = \alpha_k = 0$.

  • Theorem: Unit vectors $\mathbf{e}_1, \ldots, \mathbf{e}_n \in \mathbb{R}^n$ are linearly independent.

    Proof: TODO in class.

  • Theorem: If $\mathbf{x}$ is a linear combination of linearly independent vectors $\mathbf{a}_1, \ldots, \mathbf{a}_k$, that is, $\mathbf{x} = \alpha_1 \mathbf{a}_1 + \cdots + \alpha_k \mathbf{a}_k$, then the coefficients $\alpha_1, \ldots, \alpha_k$ are unique.

    Proof: TODO in class.

  • Independence-dimension inequality or order-dimension inequality. If the vectors $\mathbf{a}_1, \ldots, \mathbf{a}_k \in \mathbb{R}^n$ are linearly independent, then $k \le n$.

    In words, a linearly independent set of vectors in $\mathbb{R}^n$ can have at most $n$ elements. Or any set of $n+1$ or more vectors in $\mathbb{R}^n$ is linearly dependent.

    Proof (optional): We show this by induction on the dimension $n$. Base case $n = 1$: let $a_1, \ldots, a_k \in \mathbb{R}^1$ be linearly independent. We must have $a_1 \ne 0$, so every element of the collection is a multiple $a_i = (a_i / a_1) a_1$ of the first element. If $k \ge 2$ this contradicts linear independence, so $k = 1 \le n$.
    Induction step: suppose $n \ge 2$ and the independence-dimension inequality holds in $\mathbb{R}^{n-1}$. Let $\mathbf{a}_1, \ldots, \mathbf{a}_k \in \mathbb{R}^n$ be linearly independent; we partition them as $$ \mathbf{a}_i = \begin{pmatrix} \mathbf{b}_i \\ \alpha_i \end{pmatrix}, \quad i = 1,\ldots,k, $$ where $\mathbf{b}_i \in \mathbb{R}^{n-1}$ and $\alpha_i \in \mathbb{R}$.
    First suppose $\alpha_1 = \cdots = \alpha_k = 0$. Then the vectors $\mathbf{b}_1, \ldots, \mathbf{b}_k$ are linearly independent: $\sum_{i=1}^k \beta_i \mathbf{b}_i = \mathbf{0}$ if and only if $\sum_{i=1}^k \beta_i \mathbf{a}_i = \mathbf{0}$, which is only possible for $\beta_1 = \cdots = \beta_k = 0$ because the vectors $\mathbf{a}_i$ are linearly independent. The vectors $\mathbf{b}_i$ therefore form a linearly independent collection of $(n-1)$-vectors. By the induction hypothesis we have $k \le n-1$ so $k \le n$.
    Next we assume the scalars $\alpha_i$ are not all zero, say $\alpha_j \ne 0$. We define a collection of $k-1$ vectors $\mathbf{c}_i$ of length $n-1$ as follows: $$ \mathbf{c}_i = \mathbf{b}_i - \frac{\alpha_i}{\alpha_j} \mathbf{b}_j, \quad i = 1, \ldots, j-1, \qquad \mathbf{c}_i = \mathbf{b}_{i+1} - \frac{\alpha_{i+1}}{\alpha_j} \mathbf{b}_j, \quad i = j, \ldots, k-1. $$ These $k-1$ vectors are linearly independent: if $\sum_{i=1}^{k-1} \beta_i \mathbf{c}_i = \mathbf{0}$, then $$ \sum_{i=1}^{j-1} \beta_i \begin{pmatrix} \mathbf{b}_i \\ \alpha_i \end{pmatrix} + \gamma \begin{pmatrix} \mathbf{b}_j \\ \alpha_j \end{pmatrix} + \sum_{i=j+1}^k \beta_{i-1} \begin{pmatrix} \mathbf{b}_i \\ \alpha_i \end{pmatrix} = \mathbf{0} $$ with $\gamma = - \alpha_j^{-1} \left( \sum_{i=1}^{j-1} \beta_i \alpha_i + \sum_{i=j+1}^k \beta_{i-1} \alpha_i \right)$. Since the vectors $\mathbf{a}_i$ are linearly independent, the coefficients $\beta_i$ and $\gamma$ are all zero, so the vectors $\mathbf{c}_1, \ldots, \mathbf{c}_{k-1}$ are indeed linearly independent. They have length $n-1$, so by the induction hypothesis $k-1 \le n-1$, and therefore $k \le n$.
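A small numerical illustration of the inequality, using the rank function from the LinearAlgebra package (rank counts the number of linearly independent columns of a matrix; matrices are covered in later chapters):

# any 4 vectors in ℝ³ must be linearly dependent
A = randn(3, 4)     # 4 random vectors in ℝ³ stored as columns
@show rank(A)       # at most 3
@show rank(A) < 4   # so the 4 columns are linearly dependent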

Basis

  • A set of $n$ linearly independent vectors $\mathbf{a}_1, \ldots, \mathbf{a}_n \in \mathbb{R}^n$ is called a basis for $\mathbb{R}^n$.

  • Fact: the zero vector $\mathbf{0}_n$ cannot be a basis vector in $\mathbb{R}^n$. Why?

  • Any vector $\mathbf{x} \in \mathbb{R}^n$ can be expressed as a linear combination of basis vectors $\mathbf{x} = \alpha_1 \mathbf{a}_1 + \cdots + \alpha_n \mathbf{a}_n$ for some $\alpha_1, \ldots, \alpha_n$, and these coefficients are unique. This is called expansion of $\mathbf{x}$ in the basis $\mathbf{a}_1, \ldots, \mathbf{a}_n$.

    Proof of existence by contradiction: Suppose $\mathbf{x}$ can NOT be expressed as a linear combination of the basis vectors. Consider an arbitrary linear combination $\alpha_1 \mathbf{a}_1 + \cdots + \alpha_n \mathbf{a}_n + \beta \mathbf{x} = \mathbf{0}$. Then $\beta = 0$, since otherwise $\mathbf{x} = -\beta^{-1}(\alpha_1 \mathbf{a}_1 + \cdots + \alpha_n \mathbf{a}_n)$, contradicting our assumption. Then $\alpha_1 = \cdots = \alpha_n = 0$ by the linear independence of $\mathbf{a}_1, \ldots, \mathbf{a}_n$. Therefore $\alpha_1 = \cdots = \alpha_n = \beta = 0$, so $\mathbf{a}_1, \ldots, \mathbf{a}_n, \mathbf{x}$ are linearly independent, contradicting the independence-dimension inequality ($n+1$ linearly independent vectors in $\mathbb{R}^n$).

    Proof of uniqueness: TODO in class.

  • Example: Unit vectors $\mathbf{e}_1, \ldots, \mathbf{e}_n$ form a basis for $\mathbb{R}^n$. Expansion of a vector $\mathbf{x} \in \mathbb{R}^n$ in this basis is $$ \mathbf{x} = x_1 \mathbf{e}_1 + \cdots + x_n \mathbf{e}_n. $$
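As a sketch of computing an expansion numerically (this peeks ahead to matrices: hcat collects the basis vectors as columns and \ solves the resulting linear system; the basis and vector below are arbitrary):

a1 = [1, 1, 0]
a2 = [1, -1, 0]
a3 = [0, 0, 2]
x  = [3.0, 1.0, 4.0]
α  = hcat(a1, a2, a3) \ x    # the unique coefficients of x in this basis
@show α
@show α[1] * a1 + α[2] * a2 + α[3] * a3 ≈ x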

Orthonormal basis

  • A set of vectors $\mathbf{a}_1, \ldots, \mathbf{a}_k$ are (mutually) orthogonal if $\mathbf{a}_i \perp \mathbf{a}_j$ for any $i \ne j$. They are normalized if $\|\mathbf{a}_i\|=1$ for all $i$. They are orthonormal if they are both orthogonal and normalized.

    Orthonormality is often expressed compactly by $\mathbf{a}_i'\mathbf{a}_j = \delta_{ij}$, where $$ \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases} $$ is the Kronecker delta notation.

  • Theorem: An orthonormal set of vectors is linearly independent.

    Proof: TODO in class. Expand $\|\alpha_1 \mathbf{a}_1 + \cdots + \alpha_k \mathbf{a}_k\|^2 = (\alpha_1 \mathbf{a}_1 + \cdots + \alpha_k \mathbf{a}_k)'(\alpha_1 \mathbf{a}_1 + \cdots + \alpha_k \mathbf{a}_k)$.

  • By the independence-dimension inequality, we must have $k \le n$. When $k = n$, the vectors $\mathbf{a}_1, \ldots, \mathbf{a}_n$ are called an orthonormal basis.

  • Examples of orthonormal basis:

    1. Unit vectors $\mathbf{e}_1, \ldots, \mathbf{e}_n$ in $\mathbb{R}^n$.
    2. The 3 vectors in $\mathbb{R}^3$: $$ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \quad \frac{1}{\sqrt 2} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad \frac{1}{\sqrt 2} \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}. $$

TODO: Picture.

In [22]:
a1 = [0, 0, 1]
a2 = (1 / sqrt(2)) * [1, 1, 0]
a3 = (1 / sqrt(2)) * [1, -1, 0]
@show norm(a1), norm(a2), norm(a3)
@show a1'a2, a1'a3, a2'a3
(norm(a1), norm(a2), norm(a3)) = (1.0, 0.9999999999999999, 0.9999999999999999)
(a1' * a2, a1' * a3, a2' * a3) = (0.0, 0.0, 0.0)
Out[22]:
(0.0, 0.0, 0.0)
  • Orthonormal expansion. If $\mathbf{a}_1, \ldots, \mathbf{a}_n \in \mathbb{R}^n$ is an orthonormal basis, then for any vector $\mathbf{x} \in \mathbb{R}^n$, $$ \mathbf{x} = (\mathbf{a}_1'\mathbf{x}) \mathbf{a}_1 + \ldots + (\mathbf{a}_n'\mathbf{x}) \mathbf{a}_n. $$

    Proof: Since $\mathbf{a}_1, \ldots, \mathbf{a}_n$ form a basis, we can write $\mathbf{x} = \beta_1 \mathbf{a}_1 + \cdots + \beta_n \mathbf{a}_n$ for some coefficients $\beta_1, \ldots, \beta_n$. Taking the inner product with $\mathbf{a}_i$ on both sides and using orthonormality gives $\mathbf{a}_i'\mathbf{x} = \beta_i$.

In [23]:
@show x = randn(3)
@show (a1'x) * a1 + (a2'x) * a2 + (a3'x) * a3
@show x ≈ (a1'x) * a1 + (a2'x) * a2 + (a3'x) * a3
x = randn(3) = [2.0627778719698555, 1.414919144471015, -0.45520068621714055]
(a1' * x) * a1 + (a2' * x) * a2 + (a3' * x) * a3 = [2.0627778719698555, 1.414919144471015, -0.45520068621714055]
x ≈ (a1' * x) * a1 + (a2' * x) * a2 + (a3' * x) * a3 = true
Out[23]:
true

Gram-Schmidt algorithm

  • Given vectors $\mathbf{a}_1, \ldots, \mathbf{a}_k \in \mathbb{R}^n$, the Gram-Schmidt (G-S) algorithm generates an orthonormal set of vectors $\mathbf{q}_1, \mathbf{q}_2, \ldots$.

  • For $i=1,\ldots,k$:

    1. Orthogonalization: $\tilde{\mathbf{q}}_i = \mathbf{a}_i - (\mathbf{q}_1' \mathbf{a}_i) \mathbf{q}_1 - \cdots - (\mathbf{q}_{i-1}' \mathbf{a}_i) \mathbf{q}_{i-1}$.

    2. Test for linear independence: if $\tilde{\mathbf{q}}_i = \mathbf{0}$, quit.

    3. Normalization: $\mathbf{q}_i = \tilde{\mathbf{q}}_i / \|\tilde{\mathbf{q}}_i\|$.

  • If G-S does not stop early (in step 2), $\mathbf{a}_1, \ldots, \mathbf{a}_k$ are linearly independent.

  • If G-S stops early in iteration $i=j$, then $\mathbf{a}_j$ is a linear combination of $\mathbf{a}_1, \ldots, \mathbf{a}_{j-1}$, so $\mathbf{a}_1, \ldots, \mathbf{a}_k$ are linearly dependent (the earlier vectors $\mathbf{a}_1, \ldots, \mathbf{a}_{j-1}$ are themselves linearly independent).
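The cells below carry out G-S by hand for three vectors. As a compact sketch (the function name, tolerance, and early-return convention are my own choices, not from BV), the same steps can be packaged into a Julia function:

# minimal Gram-Schmidt sketch: returns the orthonormal vectors computed
# before the first (numerically) zero q̃ᵢ is encountered
function gram_schmidt(a::Vector{<:Vector}; tol = 1e-10)
    q = Vector{Vector{Float64}}()
    for ai in a
        q̃ = float(ai)
        for qj in q
            q̃ -= (qj'ai) * qj       # orthogonalization against earlier qⱼ
        end
        norm(q̃) < tol && return q   # early termination: aᵢ is dependent
        push!(q, q̃ / norm(q̃))       # normalization
    end
    return q
end

Calling gram_schmidt([a1, a2, a3]) on the vectors generated in the next cells should match the q1, q2, q3 computed step by step there, up to floating point error.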

In [24]:
# n = 5, k = 3
@show a1 = randn(5)
@show a2 = randn(5)
@show a3 = randn(5);
a1 = randn(5) = [0.3794474614022756, -1.8588962079349254, 1.1184926614886757, 0.018548652538052303, -0.16650209737984434]
a2 = randn(5) = [0.386337896419044, -0.6124240184915484, 0.40703118404375854, 0.4987380584188193, 1.4437657000426827]
a3 = randn(5) = [-2.1799967231158695, 1.6267724851759806, 0.5422451069937892, 0.4868174173021037, -2.310038880156914]
In [25]:
# for i = 1
# orthogonalization
@show q̃1 = copy(a1)
# test for linear independence
@show norm(q̃1) ≈ 0
# normalization
@show q1 = q̃1 / norm(q̃1);
q̃1 = copy(a1) = [0.3794474614022756, -1.8588962079349254, 1.1184926614886757, 0.018548652538052303, -0.16650209737984434]
norm(q̃1) ≈ 0 = false
q1 = q̃1 / norm(q̃1) = [0.1717929968742402, -0.8416062378152109, 0.5063921249832146, 0.008397812428932689, -0.07538301663429839]
In [26]:
# for i = 2
# orthogonalization
@show q̃2 = a2 - (q1'a2) * q1
# test for linear independence
@show norm(q̃2) ≈ 0
# normalization
@show q2 = q̃2 / norm(q̃2);
q̃2 = a2 - (q1' * a2) * q1 = [0.2689585617873683, -0.03738788771764001, 0.061033548664228776, 0.49300016657686413, 1.4952719226248516]
norm(q̃2) ≈ 0 = false
q2 = q̃2 / norm(q̃2) = [0.16821910046553434, -0.02338411091423074, 0.03817319882396163, 0.3083450625248457, 0.9352120865896744]
In [27]:
# for i = 3
# orthogonalization
@show q̃3 = a3 - (q1'a3) * q1 - (q2'a3) * q2
# test for linear independence
@show norm(q̃3) ≈ 0
# Normalization
@show q3 = q̃3 / norm(q̃3);
q̃3 = (a3 - (q1' * a3) * q1) - (q2' * a3) * q2 = [-1.5554755602603014, 0.48444174163375003, 1.2872928987631225, 1.2359361903751251, -0.16813936962635756]
norm(q̃3) ≈ 0 = false
q3 = q̃3 / norm(q̃3) = [-0.6421710176235804, 0.19999957193298795, 0.5314530243342332, 0.5102506406196355, -0.06941557479532266]
In [28]:
# test for orthonormality of q1, q2, q3
@show norm(q1), norm(q2), norm(q3)
@show q1'q2, q1'q3, q2'q3;
(norm(q1), norm(q2), norm(q3)) = (1.0, 0.9999999999999999, 0.9999999999999999)
(q1' * q2, q1' * q3, q2' * q3) = (-9.71445146547012e-17, 2.6020852139652106e-18, 4.163336342344337e-17)

Show by induction that $\mathbf{q}_1, \ldots, \mathbf{q}_i$ are orthonormal:

  • Assume it's true for $i-1$.

  • Orthogonalization step ensures that $\tilde{\mathbf{q}}_i \perp \mathbf{q}_1, \ldots, \tilde{\mathbf{q}}_i \perp \mathbf{q}_{i-1}$. To show this, take inner product of both sides with $\mathbf{q}_j$, $j < i$ $$ \mathbf{q}_j' \tilde{\mathbf{q}}_i = \mathbf{q}_j' \mathbf{a}_i - (\mathbf{q}_1' \mathbf{a}_i) (\mathbf{q}_j' \mathbf{q}_1) - \cdots - (\mathbf{q}_{i-1}' \mathbf{a}_i) (\mathbf{q}_j' \mathbf{q}_{i-1}) = \mathbf{q}_j' \mathbf{a}_i - \mathbf{q}_j' \mathbf{a}_i = 0. $$

  • So $\mathbf{q}_1, \ldots, \mathbf{q}_i$ are orthogonal. The normalization step ensures that $\mathbf{q}_i$ has unit norm.

Suppose G-S has not terminated by iteration $i$. Then

  • $\mathbf{a}_i$ is a combination of $\mathbf{q}_1, \ldots, \mathbf{q}_i$, and

  • $\mathbf{q}_i$ is a combination of $\mathbf{a}_1, \ldots, \mathbf{a}_i$.
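Continuing with the vectors from the cells above (reusing a3, q1, q2, q3, and q̃3 already in the workspace), the first claim can be checked directly; the coefficient of q3 is ‖q̃3‖:

@show a3 ≈ (q1'a3) * q1 + (q2'a3) * q2 + norm(q̃3) * q3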

Computational complexity of G-S algorithm:

  • Step 1 of iteration $i$ requires $i-1$ inner products, $\mathbf{q}_1' \mathbf{a}_i, \ldots, \mathbf{q}_{i-1}' \mathbf{a}_i$, which costs $(i-1)(2n-1)$ flops.

  • $2n(i-1)$ flops to compute $\tilde{\mathbf{q}}_i$.

  • $3n$ flops to compute $\|\tilde{\mathbf{q}}_i\|$ and $\mathbf{q}_i$.

  • Assuming no early termination, total computational cost is: $$ \sum_{i=1}^k ((4n-1) (i - 1) + 3n) = (4n - 1) \frac{k(k-1)}{2} + 3nk \approx 2nk^2, $$ using $\sum_{i=1}^k (i - 1) = k(k-1)/2$.
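A quick sanity check of this closed form, for one arbitrary choice of n and k:

n, k = 100, 10
exact  = sum((4n - 1) * (i - 1) + 3n for i in 1:k)
closed = (4n - 1) * k * (k - 1) ÷ 2 + 3n * k
@show exact == closed
@show 2 * n * k^2      # dominant-term approximation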