Orthogonal Projections
Orthogonal Complement
Let $V$ be a $d$-dimensional vector space equipped with an inner product.
Let $U \subseteq V$ be an $m$-dimensional subspace of $V$.
Then the orthogonal complement of $U$ is the subspace $U^\perp \subseteq V$, the set of all vectors in $V$ that are orthogonal to $U$.
Therefore,
$$ U \cap U^\perp = \{\mathbf{0}\} $$
since any vector lying in both subspaces is orthogonal to itself and must be $\mathbf{0}$.
$U^\perp$ is a $(d-m)$-dimensional subspace.
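As a concrete sketch, one way to compute a basis of $U^\perp$ numerically is via the SVD, assuming $V = \mathbb{R}^4$ with the dot product and an illustrative basis (the matrix $\mathbf{B}$ of stacked basis vectors anticipates the notation used later in this post):

```python
import numpy as np

# Basis of a 2-D subspace U of R^4, stacked as columns (illustrative values).
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])
d, m = B.shape

# The right singular vectors of B^T beyond the first m span the null space
# of B^T, which is exactly U^perp.
_, s, Vt = np.linalg.svd(B.T)
U_perp = Vt[m:].T                     # d x (d - m) matrix of basis vectors

assert U_perp.shape == (d, d - m)     # U^perp is (d - m)-dimensional
assert np.allclose(B.T @ U_perp, 0)   # every column is orthogonal to U
```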
Decomposition of a Vector
Any $\mathbf{x} \in V$ can be decomposed into vectors in $U$ and $U^\perp$.
Let $(\mathbf{b}_1, \dots, \mathbf{b}_m)$ be a basis for $U$.
Let $(\mathbf{b}_1^\perp, \dots, \mathbf{b}_{d-m}^\perp)$ be a basis for $U^\perp$.
Then $\mathbf{x}$ can be decomposed as follows:
\[\mathbf{x} = \sum_{i=1}^m \lambda_i \mathbf{b}_i + \sum_{j=1}^{d-m} \psi_j \mathbf{b}_j^\perp\]
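A minimal numerical sketch of this decomposition, assuming $V = \mathbb{R}^3$ with the dot product and a basis chosen purely for illustration:

```python
import numpy as np

b1 = np.array([1.0, 1.0, 0.0])          # basis of U (m = 2)
b2 = np.array([0.0, 1.0, 1.0])
b_perp = np.array([1.0, -1.0, 1.0])     # basis of U^perp (d - m = 1)

x = np.array([3.0, -1.0, 2.0])

# Stack all basis vectors; solving the linear system gives the coefficients
# (lambda_1, lambda_2, psi_1) of the decomposition.
full_basis = np.column_stack([b1, b2, b_perp])
coeffs = np.linalg.solve(full_basis, x)

x_U = coeffs[0] * b1 + coeffs[1] * b2   # component in U
x_U_perp = coeffs[2] * b_perp           # component in U^perp
assert np.allclose(x_U + x_U_perp, x)
```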
Orthogonal Projection
We can project a vector $\mathbf{x}$ onto a subspace $U$.
This concept is often used in dimensionality reduction, where we project a high-dimensional vector onto a lower-dimensional feature space.
When mapping high-dimensional data to a lower-dimensional subspace, orthogonal projection retains as much information as possible, in the sense that it minimizes the distance between each original vector and its projection.
Projection
Let $V$ be a vector space and $U \subseteq V$ be a subspace.
A linear mapping $\pi: V \to U$ is called a projection if:
$$ \pi^2 = \pi \circ \pi = \pi $$
Projection Matrix
Equivalently, let $P_\pi$ be the transformation matrix of $\pi$.
Then,
$$ P_\pi^2 = P_\pi $$
The transformation matrix of an orthogonal projection (with respect to the dot product) is additionally symmetric: $P_\pi^\top = P_\pi$.
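A quick numerical check of both properties, using the rank-one orthogonal projection matrix $\mathbf{b}\mathbf{b}^\top / \lVert\mathbf{b}\rVert^2$ that is derived below (the vector itself is arbitrary):

```python
import numpy as np

# Orthogonal projection onto the line spanned by b (formula derived below).
b = np.array([[2.0], [1.0], [2.0]])    # column vector, arbitrary choice
P = (b @ b.T) / (b.T @ b)              # b b^T / ||b||^2

assert np.allclose(P @ P, P)           # idempotent: P^2 = P
assert np.allclose(P, P.T)             # symmetric (orthogonal projection)
```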
One-Dimensional Projection
Assume $V = \mathbb{R}^n$.
Line
Let $U \subseteq \mathbb{R}^n$ be a one-dimensional subspace spanned by a single basis vector $\mathbf{b} \in \mathbb{R}^n$.
This subspace $U$ is called a line.
Projection onto a Line
We want to project a vector $\mathbf{x} \in \mathbb{R}^n$ onto a line $U$, denoted:
$$ \pi_U(\mathbf{x}) $$
This projection has the following properties:
- $\pi_U(\mathbf{x})$ is the point in $U$ closest to $\mathbf{x}$
$$ \|\mathbf{x} - \pi_U(\mathbf{x})\| \textrm{ is minimal} $$
- The displacement vector $\mathbf{x} - \pi_U(\mathbf{x})$ is orthogonal to $U$
$$ \langle\mathbf{x} - \pi_U(\mathbf{x}), \mathbf{b}\rangle = 0 $$
- $\pi_U(\mathbf{x}) \in U$
$$ \exists \lambda \in \mathbb{R},\, \pi_U(\mathbf{x}) = \lambda \mathbf{b} $$
Finding Coordinate of Projection
We mentioned that $\pi_U(\mathbf{x})$ is a scalar multiple of $\mathbf{b}$:
\[\exists \lambda \in \mathbb{R},\, \pi_U(\mathbf{x}) = \lambda \mathbf{b}\]
Then to calculate the projection, it suffices to find $\lambda$, which is the coordinate of $\pi_U(\mathbf{x})$ with respect to $\mathbf{b}$.
$$ \lambda = \frac{\langle\mathbf{b}, \mathbf{x}\rangle} {\langle\mathbf{b}, \mathbf{b}\rangle} $$
Derivation
\[\begin{align*} &\langle\mathbf{x} - \pi_U(\mathbf{x}), \mathbf{b}\rangle = 0 \tag{orthogonality} \\[1em] \iff &\langle\mathbf{x} - \lambda \mathbf{b}, \mathbf{b}\rangle = 0 \tag{definition} \\[1em] \iff &\langle\mathbf{x}, \mathbf{b}\rangle - \lambda \langle\mathbf{b}, \mathbf{b}\rangle = 0 \tag{bilinearity} \\[1em] \iff &\lambda = \frac{\langle\mathbf{x}, \mathbf{b}\rangle} {\langle\mathbf{b}, \mathbf{b}\rangle} \\[1em] \iff &\lambda = \frac{\langle\mathbf{b}, \mathbf{x}\rangle} {\langle\mathbf{b}, \mathbf{b}\rangle} \tag{symmetry} \end{align*}\]
If the inner product is a dot product, then:
$$ \lambda = \frac{\mathbf{b}^T \mathbf{x}}{\|\mathbf{b}\|^2} $$
Further, if $\mathbf{b}$ is a unit vector (orthonormal basis), then:
$$ \lambda = \mathbf{b}^T \mathbf{x} $$
Then the projection is:
$$ \pi_U(\mathbf{x}) = \lambda \mathbf{b} = \frac{\langle\mathbf{b}, \mathbf{x}\rangle} {\langle\mathbf{b}, \mathbf{b}\rangle} \mathbf{b} $$
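A small sketch of this formula with the dot product as the inner product (the vectors are chosen only for illustration):

```python
import numpy as np

def project_onto_line(x, b):
    """Orthogonally project x onto the line spanned by b (dot product)."""
    lam = (b @ x) / (b @ b)          # lambda = b^T x / ||b||^2
    return lam * b

x = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 1.0, 0.0])

p = project_onto_line(x, b)
# The displacement vector is orthogonal to b, and projecting twice changes nothing.
assert np.isclose((x - p) @ b, 0.0)
assert np.allclose(project_onto_line(p, b), p)
```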
Projection onto a General Subspace
The overall logic is the same as for projection onto a line; the only difference is that a line has a single basis vector, while a general subspace has several.
Suppose we want to project a vector $\mathbf{x} \in \mathbb{R}^n$ onto a subspace $U \subseteq \mathbb{R}^n$ where $\dim(U) = m$.
Let $B = (\mathbf{b}_1, \dots, \mathbf{b}_m)$ be an ordered basis for $U$.
We know that $\pi_U(\mathbf{x}) \in U$, therefore $\pi_U(\mathbf{x})$ can be written as a linear combination of $B$:
\[\pi_U(\mathbf{x}) = \sum_{i=1}^m \lambda_i \mathbf{b}_i\]
Define:
\[\mathbf{B} = [\mathbf{b}_1 \dots \mathbf{b}_m] \quad \boldsymbol{\lambda} = \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_m \end{bmatrix}\]
Then $\pi_U(\mathbf{x}) = \mathbf{B} \boldsymbol{\lambda}$.
$\mathbf{B}$ is an $n \times m$ matrix and $\boldsymbol{\lambda}$ is an $m \times 1$ vector.
The displacement vector $\mathbf{x} - \pi_U(\mathbf{x}) = \mathbf{x} - \mathbf{B} \boldsymbol{\lambda}$ is orthogonal to $U$ (for all $\mathbf{b}_i \in B$):
\[\begin{gather*} \langle\mathbf{b}_i, \mathbf{x} - \mathbf{B} \boldsymbol{\lambda}\rangle = 0\\[1em] \mathbf{b}_i^\top (\mathbf{x} - \mathbf{B} \boldsymbol{\lambda}) = 0 \end{gather*}\]
These $m$ equations can be formed into a homogeneous linear system:
\[\begin{align*} \begin{bmatrix} \mathbf{b}_1^\top \\ \vdots \\ \mathbf{b}_m^\top \end{bmatrix} (\mathbf{x} - \mathbf{B} \boldsymbol{\lambda}) = 0 &\iff \mathbf{B}^\top (\mathbf{x} - \mathbf{B} \boldsymbol{\lambda}) = 0\\ &\iff \mathbf{B}^\top \mathbf{x} - \mathbf{B}^\top \mathbf{B} \boldsymbol{\lambda} = 0\\[1em] &\iff \mathbf{B}^\top \mathbf{B} \boldsymbol{\lambda} = \mathbf{B}^\top \mathbf{x}\\[1em] &\iff \boldsymbol{\lambda} = (\mathbf{B}^\top \mathbf{B})^{-1} \mathbf{B}^\top \mathbf{x} \end{align*}\]
$(\mathbf{B}^\top \mathbf{B})^{-1} \mathbf{B}^\top$ is the pseudoinverse of $\mathbf{B}$. It exists only if $\mathbf{B}$ has full column rank, which is guaranteed here because the columns of $\mathbf{B}$ form a basis of $U$ and are therefore linearly independent.
The projection is:
$$ \pi_U(\mathbf{x}) = \mathbf{B} \boldsymbol{\lambda} = \mathbf{B} (\mathbf{B}^\top \mathbf{B})^{-1} \mathbf{B}^\top \mathbf{x} $$
It is easy to see that the projection matrix $P_\pi$ is:
$$ P_\pi = \mathbf{B} (\mathbf{B}^\top \mathbf{B})^{-1} \mathbf{B}^\top $$
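A sketch of this formula in code, assuming the dot product and a two-dimensional subspace of $\mathbb{R}^3$ whose basis is made up for illustration; in practice one would solve $\mathbf{B}^\top \mathbf{B} \boldsymbol{\lambda} = \mathbf{B}^\top \mathbf{x}$ rather than form the inverse explicitly:

```python
import numpy as np

# Basis of a 2-D subspace U of R^3, stacked as columns of B (n x m).
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
x = np.array([6.0, 0.0, 0.0])

# Projection matrix P = B (B^T B)^{-1} B^T.
P = B @ np.linalg.inv(B.T @ B) @ B.T
p = P @ x

assert np.allclose(P @ P, P)          # P is idempotent
assert np.allclose(B.T @ (x - p), 0)  # displacement is orthogonal to U
```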
If $B$ is an orthonormal basis
We saw that
\[\pi_U(\mathbf{x}) = \mathbf{B} (\mathbf{B}^\top \mathbf{B})^{-1} \mathbf{B}^\top \mathbf{x}\]
If the basis is orthonormal, i.e. $\mathbf{B}^\top \mathbf{B} = \mathbf{I}$, the projection simplifies to:
\[\pi_U(\mathbf{x}) = \mathbf{B} \mathbf{B}^\top \mathbf{x}\]
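A short check of this simplification, using a QR decomposition to obtain an orthonormal basis of the same illustrative subspace as above (dot product assumed):

```python
import numpy as np

B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
x = np.array([6.0, 0.0, 0.0])

# QR gives an orthonormal basis Q of the column space of B.
Q, _ = np.linalg.qr(B)
assert np.allclose(Q.T @ Q, np.eye(2))

# With an orthonormal basis, the pseudoinverse term drops out.
p_general = B @ np.linalg.inv(B.T @ B) @ B.T @ x
p_orthonormal = Q @ Q.T @ x
assert np.allclose(p_general, p_orthonormal)
```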
Why Orthogonal Projection?
Orthogonal projection is used in dimensionality reduction and in approximating solutions to unsolvable linear systems $\mathbf{A} \mathbf{x} = \mathbf{b}$.
Mapping high-dimensional data to a lower-dimensional subspace seems quite intuitive after we’ve seen the projection formula.
Approximating solutions to unsolvable linear systems comes from the idea of finding the vector in the column space of $\mathbf{A}$ that is closest to $\mathbf{b}$.
This approximation is called the least squares solution when the inner product is a dot product.
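A sketch of this connection with a small overdetermined system made up for illustration: projecting $\mathbf{b}$ onto the column space of $\mathbf{A}$ gives the same fitted values as the least squares solution from `np.linalg.lstsq`.

```python
import numpy as np

# Overdetermined system A x = b with no exact solution (illustrative values).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Least squares solution and the projection of b onto the column space of A.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
P = A @ np.linalg.inv(A.T @ A) @ A.T

assert np.allclose(A @ x_ls, P @ b)   # fitted values = orthogonal projection of b
```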
Gram-Schmidt Process
Now that we know about orthogonal projections, the Gram-Schmidt process comes naturally.
Given a set of linearly independent vectors $\{\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n\}$, there exists an orthogonal set $\{\boldsymbol{v}_1, \ldots, \boldsymbol{v}_n\}$ such that
\[\operatorname{span}(\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n) = \operatorname{span}(\boldsymbol{v}_1, \ldots, \boldsymbol{v}_n)\]
and the process to find such $\boldsymbol{v}_i$ is called the Gram-Schmidt process.
By additionally normalizing $\boldsymbol{v}_i$, we can also find an orthonormal set.
This leads to the following:
Every finite-dimensional inner product space has an orthonormal basis, because we can always turn any basis into an orthonormal basis.
The process is as follows (a short code sketch appears after the list):
- Normalize the first vector and take it as the first member of the orthonormal basis:
\[\boldsymbol{v}_1 = \frac{\boldsymbol{x}_1}{\lVert\boldsymbol{x}_1\rVert}\]
- For $\boldsymbol{x}_2$: subtract from $\boldsymbol{x}_2$ its projection onto the space currently spanned by $\boldsymbol{v}_1$
\[\boldsymbol{v}'_2 = \boldsymbol{x}_2 - \pi_{\operatorname{span}(\boldsymbol{v}_1)}(\boldsymbol{x}_2) = \boldsymbol{x}_2 - \frac{\langle \boldsymbol{x}_2, \boldsymbol{v}_1\rangle} {\langle\boldsymbol{v}_1, \boldsymbol{v}_1\rangle} \boldsymbol{v}_1\]
We remove from $\boldsymbol{x}_2$ the component parallel to the current span (the projection), so that what remains is orthogonal to it
- Normalize $\boldsymbol{v}'_2$ to get $\boldsymbol{v}_2$
- For $\boldsymbol{x}_3$: subtract its projections onto the spans of $\boldsymbol{v}_1$ and $\boldsymbol{v}_2$
\[\boldsymbol{v}'_3 = \boldsymbol{x}_3 - \pi_{\operatorname{span}(\boldsymbol{v}_1)}(\boldsymbol{x}_3) - \pi_{\operatorname{span}(\boldsymbol{v}_2)}(\boldsymbol{x}_3)\]
- Normalize to get $\boldsymbol{v}_3$, and so on…
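A minimal sketch of the process with the dot product as the inner product (the function and the test vectors are my own, not from the text):

```python
import numpy as np

def gram_schmidt(X):
    """Orthonormalize the columns of X (assumed linearly independent),
    using the dot product as the inner product."""
    basis = []
    for x in X.T:
        v = x.copy()
        # Subtract the projection of x onto each vector already in the basis.
        for q in basis:
            v -= (q @ x) * q          # q is unit length, so <q, x>/<q, q> = q^T x
        basis.append(v / np.linalg.norm(v))
    return np.column_stack(basis)

X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Q = gram_schmidt(X)
assert np.allclose(Q.T @ Q, np.eye(3))   # orthonormal set
assert np.allclose(Q @ Q.T @ X, X)       # X lies in span(Q): spans coincide
```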