Orthogonal Projections

Table of contents
  1. Orthogonal Complement
    1. Decomposition of a Vector
  2. Orthogonal Projection
    1. Projection
      1. Projection Matrix
  3. One-Dimensional Projection
    1. Line
    2. Projection onto a Line
    3. Finding Coordinate of Projection
  4. Projection onto a General Subspace
  5. Why Orthogonal Projection?

Orthogonal Complement

Let $V$ be a $d$-dimensional vector space.

Let $U \subseteq V$ be an $m$-dimensional subspace of $V$.

Then the orthogonal complement of $U$ is the subspace $U^\perp \subseteq V$ consisting of all vectors in $V$ that are orthogonal to every vector in $U$.

Since the zero vector is the only vector orthogonal to itself, the two subspaces intersect trivially:

$$ U \cap U^\perp = \{\mathbf{0}\} $$

$U^\perp$ is a $(d-m)$-dimensional subspace.
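
As a concrete sketch (with a made-up basis for illustration), a basis for $U^\perp$ can be computed numerically as the null space of $\mathbf{B}^\top$, where the columns of $\mathbf{B}$ span $U$:

```python
import numpy as np

# Made-up example: U is a 2-dimensional subspace of R^3 (so d = 3, m = 2).
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])  # columns span U

# A vector v is in U^perp iff B^T v = 0, so U^perp is the null space of B^T.
# The right singular vectors of B^T with zero singular value span that null space.
_, s, Vt = np.linalg.svd(B.T)
rank = np.sum(s > 1e-10)
U_perp_basis = Vt[rank:].T  # shape (3, d - m) = (3, 1)

print(U_perp_basis)         # one basis vector for U^perp
print(B.T @ U_perp_basis)   # ~0: orthogonal to every column of B
```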

Decomposition of a Vector

Any $\mathbf{x} \in V$ can be decomposed into the sum of a vector in $U$ and a vector in $U^\perp$.

Let $(\mathbf{b}_1, \dots, \mathbf{b}_m)$ be a basis for $U$.

Let $(\mathbf{b}_1^\perp, \dots, \mathbf{b}_{d-m}^\perp)$ be a basis for $U^\perp$.

Then $\mathbf{x}$ can be decomposed as follows:

\[\mathbf{x} = \sum_{i=1}^m \lambda_i \mathbf{b}_i + \sum_{j=1}^{d-m} \psi_j \mathbf{b}_j^\perp\]
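
A minimal NumPy sketch of this decomposition, reusing the made-up basis from the previous sketch: stack the two bases into a full basis of $V$ and solve a square linear system for the coefficients $\lambda_i$ and $\psi_j$:

```python
import numpy as np

# Made-up bases for U (2-dim) and U^perp (1-dim) in R^3.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
B_perp = np.array([[-1.0],
                   [-1.0],
                   [ 1.0]])  # (non-normalized) basis vector for U^perp

x = np.array([2.0, 3.0, 4.0])

# The columns of [B | B_perp] form a basis of R^3, so the coefficients
# are the unique solution of a square linear system.
full_basis = np.hstack([B, B_perp])
coeffs = np.linalg.solve(full_basis, x)
lam, psi = coeffs[:2], coeffs[2:]

x_U = B @ lam          # component in U
x_perp = B_perp @ psi  # component in U^perp
assert np.allclose(x_U + x_perp, x)
```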

Orthogonal Projection

We can project a vector $\mathbf{x}$ onto a subspace $U$.

This concept is often used in dimensionality reduction, where we project a high-dimensional vector onto a lower-dimensional feature space.

When mapping high-dimensional data onto a lower-dimensional subspace, orthogonal projection retains as much information as possible: among all points in the subspace, the orthogonal projection is the one closest to the original vector.

Projection

Let $V$ be a vector space and $U \subseteq V$ be a subspace.

A linear mapping $\pi: V \to U$ is called a projection if:

$$ \pi^2 = \pi \circ \pi = \pi $$

Projection Matrix

Equivalently, let $P_\pi$ be the transformation matrix of $\pi$.

Then,

$$ P_\pi^2 = P_\pi $$

Orthogonal projection matrices (with respect to the dot product) are also symmetric; general (oblique) projection matrices need not be.
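
As a quick numerical check, here is a minimal NumPy sketch; the vector $\mathbf{b}$ is made up, and the rank-one formula $P = \mathbf{b}\mathbf{b}^\top / (\mathbf{b}^\top\mathbf{b})$ is derived later in this post:

```python
import numpy as np

# Projection matrix onto the line spanned by b (derivation below):
# P = b b^T / (b^T b).
b = np.array([[1.0], [2.0], [2.0]])
P = (b @ b.T) / (b.T @ b)

assert np.allclose(P @ P, P)  # idempotent: P^2 = P
assert np.allclose(P, P.T)    # symmetric (orthogonal projection, dot product)
```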


One-Dimensional Projection

Assume $V = \mathbb{R}^n$.

Line

Let $U \subseteq \mathbb{R}^n$ be a one-dimensional subspace spanned by a single basis vector $\mathbf{b} \in \mathbb{R}^n$.

This subspace $U$ is called a line (through the origin).

Projection onto a Line

We want to project a vector $\mathbf{x} \in \mathbb{R}^n$ onto a line $U$, denoted:

$$ \pi_U(\mathbf{x}) $$

This projection has the following properties:

  • $\pi_U(\mathbf{x})$ is closest to $\mathbf{x}$

    $$ \|\mathbf{x} - \pi_U(\mathbf{x})\| \textrm{ is minimal} $$

  • The displacement vector $\mathbf{x} - \pi_U(\mathbf{x})$ is orthogonal to $U$

    $$ \langle\mathbf{x} - \pi_U(\mathbf{x}), \mathbf{b}\rangle = 0 $$

  • $\pi_U(\mathbf{x}) \in U$

    $$ \exists \lambda \in \mathbb{R},\, \pi_U(\mathbf{x}) = \lambda \mathbf{b} $$

Finding Coordinate of Projection

We mentioned that $\pi_U(\mathbf{x})$ is a scalar multiple of $\mathbf{b}$:

\[\exists \lambda \in \mathbb{R},\, \pi_U(\mathbf{x}) = \lambda \mathbf{b}\]

Then to calculate the projection, it suffices to find $\lambda$, which is the coordinate of $\pi_U(\mathbf{x})$ with respect to $\mathbf{b}$.

$$ \lambda = \frac{\langle\mathbf{b}, \mathbf{x}\rangle} {\langle\mathbf{b}, \mathbf{b}\rangle} $$

Derivation:

\[\begin{align*} &\langle\mathbf{x} - \pi_U(\mathbf{x}), \mathbf{b}\rangle = 0 \tag{orthogonality} \\[1em] \iff &\langle\mathbf{x} - \lambda \mathbf{b}, \mathbf{b}\rangle = 0 \tag{definition} \\[1em] \iff &\langle\mathbf{x}, \mathbf{b}\rangle - \lambda \langle\mathbf{b}, \mathbf{b}\rangle = 0 \tag{bilinearity} \\[1em] \iff &\lambda = \frac{\langle\mathbf{x}, \mathbf{b}\rangle} {\langle\mathbf{b}, \mathbf{b}\rangle} \\[1em] \iff &\lambda = \frac{\langle\mathbf{b}, \mathbf{x}\rangle} {\langle\mathbf{b}, \mathbf{b}\rangle} \tag{symmetry} \end{align*}\]

If the inner product is a dot product, then:

$$ \lambda = \frac{\mathbf{b}^T \mathbf{x}}{\|\mathbf{b}\|^2} $$

Further, if $\mathbf{b}$ is a unit vector (so that $(\mathbf{b})$ is an orthonormal basis of $U$), then:

$$ \lambda = \mathbf{b}^T \mathbf{x} $$

Then the projection is:

$$ \pi_U(\mathbf{x}) = \lambda \mathbf{b} = \frac{\langle\mathbf{b}, \mathbf{x}\rangle} {\langle\mathbf{b}, \mathbf{b}\rangle} \mathbf{b} $$
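
Here is a minimal NumPy sketch of these formulas, assuming the dot product and made-up vectors:

```python
import numpy as np

b = np.array([2.0, 1.0])  # basis vector for the line U
x = np.array([1.0, 3.0])  # vector to project

lam = (b @ x) / (b @ b)   # coordinate of the projection w.r.t. b
proj = lam * b            # pi_U(x)

# The displacement vector is orthogonal to b.
assert np.isclose(b @ (x - proj), 0.0)
print(lam, proj)          # 1.0, [2. 1.]
```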


Projection onto a General Subspace

The overall logic is the same as for projection onto a line; the difference is that a line has a single basis vector, while a general subspace has multiple basis vectors.

Suppose we want to project a vector $\mathbf{x} \in \mathbb{R}^n$ onto a subspace $U \subseteq \mathbb{R}^n$ where $\dim(U) = m$.

Let $B = (\mathbf{b}_1, \dots, \mathbf{b}_m)$ be an ordered basis for $U$.

We know that $\pi_U(\mathbf{x}) \in U$, therefore $\pi_U(\mathbf{x})$ can be written as a linear combination of $B$:

\[\pi_U(\mathbf{x}) = \sum_{i=1}^m \lambda_i \mathbf{b}_i\]

Define:

\[\mathbf{B} = [\mathbf{b_1} \dots \mathbf{b_m}] \quad \boldsymbol{\lambda} = \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_m \end{bmatrix}\]

Then $\pi_U(\mathbf{x}) = \mathbf{B} \boldsymbol{\lambda}$.

$\mathbf{B}$ is an $n \times m$ matrix and $\boldsymbol{\lambda}$ is an $m \times 1$ vector.

The displacement vector $\mathbf{x} - \pi_U(\mathbf{x}) = \mathbf{x} - \mathbf{B} \boldsymbol{\lambda}$ is orthogonal to $U$ (for all $\mathbf{b}_i \in B$):

\[\begin{gather*} \langle\mathbf{b}_i, \mathbf{x} - \mathbf{B} \boldsymbol{\lambda}\rangle = 0\\[1em] \mathbf{b}_i^\top (\mathbf{x} - \mathbf{B} \boldsymbol{\lambda}) = 0\\ \end{gather*}\]

These $m$ equations can be formed into a homogeneous linear system:

\[\begin{align*} \begin{bmatrix} \mathbf{b}_1^\top \\ \vdots \\ \mathbf{b}_m^\top \end{bmatrix} (\mathbf{x} - \mathbf{B} \boldsymbol{\lambda}) = 0 &\iff \mathbf{B}^\top (\mathbf{x} - \mathbf{B} \boldsymbol{\lambda}) = 0\\ &\iff \mathbf{B}^\top \mathbf{x} - \mathbf{B}^\top \mathbf{B} \boldsymbol{\lambda} = 0\\[1em] &\iff \mathbf{B}^\top \mathbf{B} \boldsymbol{\lambda} = \mathbf{B}^\top \mathbf{x}\\[1em] &\iff \boldsymbol{\lambda} = (\mathbf{B}^\top \mathbf{B})^{-1} \mathbf{B}^\top \mathbf{x} \end{align*}\]

$(\mathbf{B}^\top \mathbf{B})^{-1} \mathbf{B}^\top$ is the pseudoinverse of $\mathbf{B}$. The inverse $(\mathbf{B}^\top \mathbf{B})^{-1}$ exists because the columns of $\mathbf{B}$ form a basis and are therefore linearly independent, so $\mathbf{B}$ has full column rank.

The projection is:

$$ \pi_U(\mathbf{x}) = \mathbf{B} \boldsymbol{\lambda} = \mathbf{B} (\mathbf{B}^\top \mathbf{B})^{-1} \mathbf{B}^\top \mathbf{x} $$

It is easy to see that the projection matrix $P_\pi$ is:

$$ P_\pi = \mathbf{B} (\mathbf{B}^\top \mathbf{B})^{-1} \mathbf{B}^\top $$
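
Here is a minimal NumPy sketch of this formula on a made-up two-dimensional subspace of $\mathbb{R}^3$. In practice it is numerically preferable to solve the normal equations rather than form the inverse explicitly:

```python
import numpy as np

B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])  # columns form a basis of U
x = np.array([6.0, 0.0, 0.0])

# Solve the normal equations B^T B lambda = B^T x instead of inverting.
lam = np.linalg.solve(B.T @ B, B.T @ x)
proj = B @ lam

P = B @ np.linalg.inv(B.T @ B) @ B.T   # explicit projection matrix
assert np.allclose(P @ x, proj)
assert np.allclose(B.T @ (x - proj), 0.0)  # displacement orthogonal to U
```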

If $B$ is an orthonormal basis

We saw that

\[\pi_U(\mathbf{x}) = \mathbf{B} (\mathbf{B}^\top \mathbf{B})^{-1} \mathbf{B}^\top \mathbf{x}\]

If the basis is orthonormal, i.e. $\mathbf{B}^\top \mathbf{B} = \mathbf{I}$, the projection simplifies to:

\[\pi_U(\mathbf{x}) = \mathbf{B} \mathbf{B}^\top \mathbf{x}\]
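
A short sketch of this simplification, orthonormalizing the same made-up basis with a QR decomposition:

```python
import numpy as np

B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
x = np.array([6.0, 0.0, 0.0])

Q, _ = np.linalg.qr(B)   # columns of Q: orthonormal basis for U
assert np.allclose(Q.T @ Q, np.eye(2))

proj = Q @ (Q.T @ x)     # B B^T x with an orthonormal basis
# Same result as the general formula with the original basis B.
lam = np.linalg.solve(B.T @ B, B.T @ x)
assert np.allclose(proj, B @ lam)
```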

Why Orthogonal Projection?

Orthogonal projection is used in dimensionality reduction and in approximating solutions to unsolvable linear systems $\mathbf{A} \mathbf{x} = \mathbf{b}$.

Mapping high-dimensional data onto a lower-dimensional subspace is quite intuitive once we have seen the projection formula.

Approximating solutions to unsolvable linear systems comes from the idea of finding the vector in the column space of $\mathbf{A}$ that is closest to $\mathbf{b}$.

This approximation is called the least squares solution when the inner product is a dot product.
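
As an illustration, here is a sketch (with a made-up overdetermined system) showing that the least squares solution returned by np.linalg.lstsq agrees with projecting $\mathbf{b}$ onto the column space of $\mathbf{A}$:

```python
import numpy as np

# Overdetermined system: 3 equations, 2 unknowns, no exact solution.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([6.0, 0.0, 0.0])

# Least squares solution: coordinates of the projection of b onto col(A).
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

# A @ x_ls is exactly the orthogonal projection of b onto the column space.
proj = A @ np.linalg.solve(A.T @ A, A.T @ b)
assert np.allclose(A @ x_ls, proj)
print(x_ls)  # [ 4. -2.]
```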