Kernel principal component analysis (kernel PCA) is an extension of principal component analysis (PCA) using techniques of kernel methods. Using a kernel, the originally linear operations of PCA are done in a reproducing kernel Hilbert space with a non-linear mapping.

KPCA is an extension of PCA to non-linear distributions. Instead of directly doing a PCA, the original data points are mapped into a higher-dimensional (possibly infinite-dimensional) feature space defined by a (usually nonlinear) function φ through a mathematical process called the “kernel trick”.

### The Kernel trick

The kernel trick transforms any algorithm that solely depends on the dot product between two vectors. Wherever a dot product is used, it is replaced with the kernel function. Thus, a linear algorithm can easily be transformed into a non-linear algorithm. This non-linear algorithm is equivalent to the linear algorithm operating in the range space of φ. However, because kernels are used, the φ function is never explicitly computed. This is desirable, because the high-dimensional space may be infinite-dimensional (as is the case when the kernel is a Gaussian).

http://www.youtube.com/watch?v=3liCbRZPrZA

Interesting video by Udi Aharoni demonstrating how points that are not linearly separable in a 2D space can almost always be linearly separated in a higher dimensional (3D) space.

### Principal Components in Kernel Space

Like in PCA, the overall idea is to perform a transformation that will maximize the variance of the captured variables while minimizing the overall covariance between those variables. Using the kernel trick, the covariance matrix is substituted by the Kernel matrix and the analysis is carried analogously in feature space. An Eigen value decomposition is performed and the eigenvectors are sorted in ascending order of Eigen values, so those vectors may form a basis in feature space that explain most of the variance in the data on its first dimensions.

However, because the principal components are in feature space, we will not be directly performing dimensionality reduction. Suppose that the number of observations** m** exceeds the input dimensionality **n**. In linear PCA, we can find at most** n** nonzero Eigen values. On the other hand, using Kernel PCA we can find up to **m** nonzero Eigen values because we will be operating on a **m **x** m** kernel matrix.

A more detailed discussion can be seen in the works of *Mika et al.* (Kernel PCA and De-Noising in Feature Spaces) and *Scholkopf et al.* (Nonlinear Component Analysis as a Kernel Eigenvalue Problem). A more accessible explanation of the topic is also available in the work from Ian Fasel.

###

### The pre-image problem

In standard PCA it was possible to revert the transformation, retrieving the original data points and enabling the use of PCA for compression* and denoising. However, as the results provided by kernel PCA live in some high dimensional feature space, they need not have pre-images in input space. In fact, even if it exists, it also need be not unique.

Thus, in KPCA, it is only possible to obtain an approximate pre-image for a given set of patterns projected on the feature space. This problem can (and has) been tackled by many approaches, some of them of iterative nature, but some closed-form solutions also exists. Typically those solutions make use of the fact that distances in feature space are related to distances in input space. Thus, those solutions try to achieve an optimal transformation that can embed those feature points in input space respecting those distances relationships.

One of the first solutions of this form were proposed by *Kwok and Tsang* in their paper “The Pre-Image Problem in Kernel Methods". A better approach is also described by *Bakir* on his thesis “Extensions to Kernel Dependency Estimation”, alongside with a nice comprehensive description on various other methods for handling the pre-image problem.

Note: actually the use of KPCA for compression is a bit limited, since one needs to store the original data set to compute transformations and its approximate reversion.

## Code Overview

Below is the main code behind KPCA. This is a direct, naive implementation of the algorithm that may not scale very well for very large datasets. It is implemented on top of the previous standard PCA code and thus on top of the AForge.NET Framework.

public override void Compute()
{
int rows = Source.GetLength(0);
int cols = Source.GetLength(1);
sourceCentered = adjust(Source);
double[,] K = new double[rows, rows];
for (int i = 0; i < rows; i++)
{
for (int j = i; j < rows; j++)
{
double k = kernel.Kernel(sourceCentered.GetRow(i), sourceCentered.GetRow(j));
K[i, j] = k; K[j, i] = k;
}
}
if (centerFeatureSpace)
K = centerKernel(K);
EigenValueDecomposition evd = new EigenValueDecomposition(K);
double[] evals = evd.RealEigenValues;
double[,] eigs = evd.EigenVectors;
int[] indices = new int[rows];
for (int i = 0; i < rows; i++) indices[i] = i;
Array.Sort(evals, indices, new AbsoluteComparer(true));
eigs = eigs.Submatrix(0, rows - 1, indices);
if (centerFeatureSpace)
{
for (int j = 0; j < rows; j++)
{
double eig = System.Math.Sqrt(System.Math.Abs(evals[j]));
for (int i = 0; i < rows; i++)
eigs[i, j] = eigs[i, j] / eig;
}
}
this.SingularValues = new double[rows];
this.EigenValues = evals;
this.ComponentMatrix = eigs;
double[,] result = new double[rows, rows];
for (int i = 0; i < rows; i++)
for (int j = 0; j < rows; j++)
for (int k = 0; k < rows; k++)
result[i, j] += K[i, k] * eigs[k, j];
this.Result = result;
createComponents();
}
public override double[,] Transform(double[,] data, int components)
{
int rows = data.GetLength(0);
int cols = data.GetLength(1);
int N = sourceCentered.GetLength(0);
data = adjust(data);
double[,] K = new double[rows, N];
for (int i = 0; i < rows; i++)
for (int j = 0; j < N; j++)
K[i, j] = kernel.Kernel(data.GetRow(i), sourceCentered.GetRow(j));
if (centerFeatureSpace)
K = centerKernel(K);
double[,] result = new double[rows, components];
for (int i = 0; i < rows; i++)
for (int j = 0; j < components; j++)
for (int k = 0; k < rows; k++)
result[i, j] += K[i, k] * ComponentMatrix[k, j];
return result;
}

The main code behind the reversion of the projection. It is a direct, naive implementation of the closed-form solution algorithm as proposed by Kwok and Tsang. Currently it is completely **not** optimized, but it gives an better overview of the algorithm on its essential form. The source code available to download may be better optimized in the future.

</span>/// Reverts a set of projected data into it's original form. Complete reverse
/// transformation is not always possible and is not even guaranteed to exist.
/// <span class="code-SummaryComment"></summary>
</span>/// <span class="code-SummaryComment"><remarks>
</span>/// This method works using a closed-form MDS approach as suggested by
/// Kwok and Tsang. It is currently a direct implementation of the algorithm
/// without any kind of optimization.
///
/// Reference:
/// - http://cmp.felk.cvut.cz/cmp/software/stprtool/manual/kernels/preimage/list/rbfpreimg3.html
/// <span class="code-SummaryComment"></remarks>
</span>/// <span class="code-SummaryComment"><param name="data">The kpca-transformed data.</param>
</span>/// <span class="code-SummaryComment"><param name="neighbors">The number of nearest neighbors to use while constructing the pre-image.</param>
</span>public double[,] Revert(double[,] data, int neighbors)
{
int rows = data.GetLength(0);
int cols = data.GetLength(1);
double[,] reversion = new double[rows, sourceCentered.GetLength(1)];
// number of neighbors cannot exceed the number of training vectors.
int nn = System.Math.Min(neighbors, sourceCentered.GetLength(0));
// For each point to be reversed
for (int p = 0; p < rows; p++)
{
// 1. Get the point in feature space
double[] y = data.GetRow(p);
// 2. Select nn nearest neighbors of the feature space
double[,] X = sourceCentered;
double[] d2 = new double[Result.GetLength(0)];
int[] inx = new int[Result.GetLength(0)];
// 2.1 Calculate distances
for (int i = 0; i < X.GetLength(0); i++)
{
inx[i] = i;
d2[i] = kernel.Distance(y, Result.GetRow(i).Submatrix(y.Length));
}
// 2.2 Order them
Array.Sort(d2, inx);
// 2.3 Select nn neighbors
inx = inx.Submatrix(nn);
X = X.Submatrix(inx).Transpose(); // X is in input space
d2 = d2.Submatrix(nn); // distances in input space
// 3. Create centering matrix
// H = eye(nn, nn) - 1 / nn * ones(nn, nn);
double[,] H = Matrix.Identity(nn).Subtract(Matrix.Create(nn, 1.0 / nn));
// 4. Perform SVD
// [U,L,V] = svd(X*H);
SingularValueDecomposition svd = new SingularValueDecomposition(X.Multiply(H));
double[,] U = svd.LeftSingularVectors;
double[,] L = Matrix.Diagonal(nn, svd.Diagonal);
double[,] V = svd.RightSingularVectors;
// 5. Compute projections
// Z = L*V';
double[,] Z = L.Multiply(V.Transpose());
// 6. Calculate distances
// d02 = sum(Z.^2)';
double[] d02 = Matrix.Sum(Matrix.DotPow(Z,2));
// 7. Get the pre-image using z = -0.5*inv(Z')*(d2-d02)
double[,] inv = Matrix.PseudoInverse(Z.Transpose());
double[] z = (-0.5).Multiply(inv).Multiply(d2.Subtract(d02));
// 8. Project the pre-image on the original basis
// using x = U*z + sum(X,2)/nn;
double[] x = (U.Multiply(z)).Add(Matrix.Sum(X.Transpose()).Multiply(1.0 / nn));
// 9. Store the computed pre-image.
for (int i = 0; i < reversion.GetLength(1); i++)
reversion[p, i] = x[i];
}
// if the data has been standardized or centered,
// we need to revert those operations as well
if (this.Method == AnalysisMethod.Correlation)
{
// multiply by standard deviation and add the mean
for (int i = 0; i < reversion.GetLength(0); i++)
for (int j = 0; j < reversion.GetLength(1); j++)
reversion[i, j] = (reversion[i, j] * StandardDeviations[j]) + Means[j];
}
else
{
// only add the mean
for (int i = 0; i < reversion.GetLength(0); i++)
for (int j = 0; j < reversion.GetLength(1); j++)
reversion[i, j] = reversion[i, j] + Means[j];
}
return reversion;
}

## Using the Code

As with PCA, code usage is simple. First, choose a kernel implementing the *IKernel* interface then pass it to the *KernelPrincipalComponentAnalysis* constructor together with the data points.

Gaussian kernel = new Gaussian(10.0);
var kpca = new KernelPrincipalComponentAnalysis(data, kernel);

Then, simply call *Compute()* to perform the analysis. Once done, data about the analysis will be available through the object properties and transformations on additional data can be computed by calling *Transform()*.

kpca.Compute();
double[,] result = kpca.Transform(data, 2);

## Sample Application

To demonstrate the use of KPCA, I created a simple Windows Forms Application which performs simple statistical analysis and KPCA transformations. It can also perform projection reversion by solving the approximate pre-image problem.

On left: Scholkopf Toy Example. Middle: Kernel principal components detected by the KPCA. Right: The two first dimensions obtained by the two first Kernel principal components.

Left: Wikipedia example. Middle: Projection using a **non-centered** Gaussian kernel. Right: Reconstruction of the original data using all Kernel principal components, showing some denoising ability.

####

#### Linear PCA as a special case

Additionally we may check that linear PCA is indeed a special case of kernel PCA by using a linear kernel on the example by *Lindsay Smith* on his A Tutorial On Principal Component Analysis.

## References

- BAKIR, Gokhan.
*Extensions to Kernel Dependency Estimation*. Doctorate Thesis.

Available in: http://opus.kobv.de/tuberlin/volltexte/2006/1256/pdf/bakir_goekhan.pdf

- FASEL, Ian.
*Scholkopf, Smola and Muller: Kernel PCA*.

Available in: http://cseweb.ucsd.edu/classes/fa01/cse291/kernelPCA_article.pdf

- HOFFMANN, Heiko.
*Unsupervised Learning of Visuomotor Associations*. Dissertation.

Available in: http://www.heikohoffmann.de/htmlthesis/hoffmann_diss.html.

- KWOK, James; TSANG, Ivor.
*The Pre-Image Problem in Kernel Methods*. Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), Washington DC, 2003.

Available in: http://www.hpl.hp.com/conferences/icml2003/papers/345.pdf

- KWOK, James; TSANG, Ivor.
*The Pre-Image Problem in Kernel Methods*. Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), Washington DC, 2003.

Available in: http://www.cse.ust.hk/~jamesk/papers/icml03_slides.pdf

- MIKA, Sebastian; SCHOLKOPF, Benhard; et al.
*Kernel PCA and De-Noising in Feature Spaces*.

Available in: http://www.cs.dartmouth.edu/farid/teaching/cs88/nips99.pdf

- WIKIPEDIA;
*Kernel Principal Component Analysis*.

Available in: http://en.wikipedia.org/wiki/Kernel_principal_component_analysis

- SMITH, Lindsay.
*A Tutorial on Principal Component Analysis*. 2002.

Available in: http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf

## See also