| SOCR ≫ | DSPA ≫ | DSPA2 Topics ≫ |
We use a simple 2D rotation matrix example to demonstrate the rationale for representing complex phenomena in terms of corresponding simple lower-dimensional fundamental constituents. Specifically, we will discuss linear and nonlinear data dimensionality reduction techniques, such as principal and independent component analyses, factor analysis, matrix singular value decomposition, t-distributed stochastic neighbor embedding, and uniform manifold approximation and projection. These methods will be applied to hand-written digit recognition imaging data and Parkinson’s disease clinical data.
Now that we have most of the R fundamentals covered and
saw a model-based forecasting (multivariate linear regression), we can
delve into more interesting ML/AI data analytic methods. Dimension
reduction reduces the number of features when dealing with a very
large number of variables. Linear and nonlinear dimension reduction can
help us extract a set of “uncorrelated” (principal) variables, reduce
the complexity of the data, and facilitate supervised data clustering
and unsupervised classification. Dimensionality reduction is related to
feature selection, however, it goes beyond simply picking some of the
original variables. Rather, we are constructing new (uncorrelated)
variables, which sometimes may be expressed as linear functions of the
original features, and sometimes can only be inferred as highly
nonlinear transformations (maps) of the initial state space.
Dimensionality reduction techniques enable exploratory data analytics by reducing the complexity of the dataset, still approximately preserving important process characteristics, such as retaining the distances between cases or subjects. If we are able to reduce the complexity down to a few dimensions and protect some of that information content in the data, then we may visually inspect the data and try other lower-dimensional methods to untangle its intrinsic characteristics.
All dimensionality reduction techniques carry some resemblance to variable selection, which we will see in Chapter 11. The continuous process of dimensionality reduction yields lower variability compared to the discrete feature selection process, e.g., see Wolfgang M. Hartmann’s paper.
Specifically, we will (1) start with a synthetic example demonstrating the reduction of a 2D data into 1D, (2) explain the notion of rotation matrices, (3) show examples of principal component analysis (PCA), singular value decomposition (SVD), independent component analysis (ICA), factor analysis (FA), and t-distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), and (4) present a Parkinson’s disease case-study at the end.
Let’s consider a simple example of twin heights. Suppose we simulate \(1,000\) 2D points that representing normalized individual heights, i.e., number of standard deviations from the mean height. Each 2D point represents a pair of twins. We will simulate this scenario using Bivariate Normal Distribution.
| \(Twin1_{Height}\) | \(Twin2_{Height}\) |
|---|---|
| \(y[1,1]\) | \(y[1,2]\) |
| \(y[2,1]\) | \(y[2,2]\) |
| \(y[3,1]\) | \(y[3,2]\) |
| \(\cdots\) | \(\cdots\) |
| \(y[500,1]\) | \(y[500,2]\) |
\[ y^T_{2\times500} = \begin{bmatrix} y[1, ]=Twin1_{Height} \\ y[2, ]=Twin2_{Height} \end{bmatrix} \sim BVN \left ( \mu= \begin{bmatrix} Twin1_{Height} \\ Twin2_{Height} \end{bmatrix} , \Sigma=\begin{bmatrix} 1 & 0.95 \\ 0.95 & 1 \end{bmatrix} \right ) .\]
# plot(y[1, ], y[2, ], xlab="Twin 1 (standardized height)",
# ylab="Twin 2 (standardized height)", xlim=c(-3, 3), ylim=c(-3, 3))
# points(y[1, 1:2], y[2, 1:2], col=2, pch=16)
library(plotly)
plot_ly() %>%
add_markers(x = ~y[1, ], y = ~y[2, ], name="Data Scatter") %>%
add_markers(x = y[1, 1], y = y[2, 1], marker=list(color = 'red', size = 20,
line = list(color = 'yellow', width = 2)), name="Twin-pair 1") %>%
add_markers(x = y[1, 2], y = y[2, 2], marker=list(color = 'green', size = 20,
line = list(color = 'orange', width = 2)), name="Twin-pair 2")%>%
layout(title='Scatter Plot Simulated Twin Data (Y)',
xaxis = list(title="Twin 1 (standardized height)"), yaxis = list(title="Twin 2 (standardized height)"),
legend = list(orientation = 'h'))These data may represent a fraction of the information included in a high-throughput neuroimaging genetics study of twins. You can see one example of such pediatric study here.
Tracking the distances between any two samples can be accomplished
using the dist function. stats::dist() is a
function computing the distance matrix (according to a user-specified
distance metric, default=Euclidean) representing the distances between
the rows of the data matrix. For example, here is the distance
between the two RED points in the figure above:
## [1] 2.100187
To reduce the 2D data to a simpler 1D plot we can transform the data to a 1D matrix (vector) preserving (approximately) the distances between the 2D points.
The 2D plot shows the Euclidean distance between the pair of RED
points, the length of this line is the distance between the 2 points. In
2D, these lines tend to go along the direction of the diagonal. If we
rotate the plot so that the diagonal aligned with the
x-axis we get the following raw and transformed
plots:
z1 = (y[1, ]+y[2, ])/2 # the sum (actually average)
z2 = (y[1, ]-y[2, ]) # the difference
z = rbind( z1, z2) # matrix z has the same dimension as y
thelim <- c(-3, 3)
# par(mar=c(1, 2))
# par(mfrow=c(2,1))
# plot(y[1, ], y[2, ], xlab="Twin 1 (standardized height)",
# ylab="Twin 2 (standardized height)",
# xlim=thelim, ylim=thelim)
# points(y[1, 1:2], y[2, 1:2], col=2, pch=16)
# plot(z[1, ], z[2, ], xlim=thelim, ylim=thelim, xlab="Average height", ylab="Difference in height")
# points(z[1, 1:2], z[2, 1:2], col=2, pch=16)
# par(mfrow=c(1,1))
plot_ly() %>%
add_markers(x = ~z[1, ], y = ~z[2, ], name="Data Scatter") %>%
add_markers(x = z[1, 1], y = z[2, 1], marker=list(color = 'red', size = 20,
line = list(color = 'yellow', width = 2)), name="Twin-pair 1") %>%
add_markers(x = y[1, 2], y = y[2, 2], marker=list(color = 'green', size = 20,
line = list(color = 'orange', width = 2)), name="Twin-pair 2") %>%
layout(title='Scatter Plot Transformed Twin Data (Z)',
xaxis = list(title="Twin 1 (standardized height)", scaleanchor = "y", scaleratio = 2),
yaxis = list(title="Twin 2 (standardized height)", scaleanchor = "x", scaleratio = 0.5),
legend = list(orientation = 'h'))Raw (above) and transformed (below) Twin heights scatterplots.
Of course, matrix linear algebra notation can be used to represent
this affine transformation of the data. Here we can see that to get the
result z, we multiplied y by the matrix \(A\):
\[ A = \begin{pmatrix} 1/2&1/2\\ 1&-1\\ \end{pmatrix} \implies z = A \times y .\]
We can invert this transform by multiplying the result by the
inverse (rotation matrix?) \(A^{-1}\) as follows:
\[ A^{-1} = \begin{pmatrix} 1&1/2\\ 1&-1/2\\ \end{pmatrix} \implies y = A^{-1} \times z \]
You can try this in R:
## [,1] [,2]
## [1,] 0.5 0.5
## [2,] 1.0 -1.0
## [,1] [,2]
## [1,] 1 0.5
## [2,] 1 -0.5
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
Note that this matrix transformation did not preserve distances, i.e., it’s not a simple rotation in 2D:
## [1] 2.100187
## [1] 1.541323
One important question is how to identify transformations that
preserve distances. In mathematics, transformations between metric
spaces that are distance-preserving are called isometries
(or congruences or congruent transformations).
First, let’s test the MA transformation we used above: \[A=\frac{Y_1+Y_2}{2} \\ M=Y_1 - Y_2.\]
MA <- matrix(c(1/2, 1, 1/2, -1), 2, 2)
MA_z <- MA%*%y
d <- dist(t(y))
d_MA <- dist(t(MA_z))
# plot(as.numeric(d), as.numeric(d_MA))
# abline(0, 1, col=2)
plot_ly() %>%
add_markers(x = ~as.numeric(d)[1:5000], y = ~as.numeric(d_MA)[1:5000], name="Transformed Twin Distances") %>%
add_markers(x = ~as.numeric(d)[1], y = ~as.numeric(d_MA)[1], marker=list(color = 'red', size = 20,
line = list(color = 'yellow', width = 2)), name="Twin-pair 1") %>%
add_markers(x = ~as.numeric(d)[2], y = ~as.numeric(d_MA)[2], marker=list(color = 'green', size = 20,
line = list(color = 'orange', width = 2)), name="Twin-pair 2") %>%
add_trace(x = ~c(0,8), y = ~c(0,8), mode="lines",
line = list(color = "red", width = 4), name="Preserved Distances") %>%
layout(title='Preservation of Distances Between Twins (Transform=MA_Z)',
xaxis = list(title="Original Twin Distances", range = c(0, 8)),
yaxis = list(title="Transformed Twin Distances", range = c(0, 8)),
legend = list(orientation = 'h'))Observe that this MA transformation is not an isometry - the distances are not preserved. Here is one example with \(v1=\begin{bmatrix} v1_x=0 \\ v1_y=1 \end{bmatrix}\), \(v2=\begin{bmatrix} v2_x=1 \\ v2_y=0 \end{bmatrix}\), which are distance \(\sqrt{2}\) apart in their native space, but separated further by the transformation \(MA\), \(d(MA(v1),MA(v2))=2\).
## [,1] [,2]
## [1,] 0.5 0.5
## [2,] 1.0 -1.0
## [,1] [,2]
## [1,] 0.5 1
## [2,] 0.5 -1
## [,1] [,2]
## [1,] 1 0.5
## [2,] 1 -0.5
## [,1] [,2]
## [1,] -0.5 0.5
## [2,] -0.5 -0.5
## [,1] [,2]
## v1 0 1
## v2 1 0
## [1] 1.414214
## [1] 2
More generally, if \[ \begin{pmatrix} Y_1\\ Y_2\\ \end{pmatrix} \sim BVN \left( \begin{pmatrix} \mu_1\\ \mu_2\\ \end{pmatrix}, \begin{pmatrix} \sigma_1^2&\sigma_{12}\\ \sigma_{12}&\sigma_2^2\\ \end{pmatrix} \right).\]
Then, \[ Z = AY + \eta \sim BVN(\eta + A\mu,A\Sigma A^{T}),\]
where BVN denotes bivariate normal distribution, \(A = \begin{pmatrix}a&b\\c&d\\ \end{pmatrix}\), \(Y=(Y_1,Y_2)^T\), \(\mu = (\mu_1,\mu_2)\), \(\Sigma = \begin{pmatrix} \sigma_1^2&\sigma_{12}\\ \sigma_{12}&\sigma_2^2\\ \end{pmatrix}\).
You can verify this by using the change of variable theorem. Thus, affine transformations preserve bivariate normality. However, in general, the linear transformation (\(A\)) is not guaranteed to be an isometry.
The question now is under what additional conditions for the transformation matrix \(A\), can we guarantee an isometry.
Notice that, \[ d^2(P_i,P_j) =\sum_{k=1}^{n} (P_{jk}-P_{ik})^2 = ||P||^2 = P^TP,\]
where \(P = (P_{j,1}-P_{i,1},...,P_{j,n}-P_{i,n})^T\), \(P_i,P_j \in \mathbb{R}^n\) are any two points in \(n\) dimensions.
Thus, the only requirement we need for \(A\) to be an isometry is \((AY)^T(AY)=Y^TY\), i.e., \(A^TA=I\), which implies that \(A\) is an orthogonal (rotational) matrix.
Let’s use a two dimension orthogonal matrix to illustrate this.
Set \(A = \frac{1}{\sqrt{2}} \begin{pmatrix}1&1\\1&-1\\ \end{pmatrix}\). It’s easy to verify that \(A\) is an orthogonal (2D rotation) matrix.
The simplest way to test the isometry is to perform the linear transformation directly as follows.
A <- 1/sqrt(2)*matrix(c(1, 1, 1, -1), 2, 2)
z <- A%*%y
d <- dist(t(y))
d2 <- dist(t(z))
# plot(as.numeric(d), as.numeric(d2))
# abline(0, 1, col=2)
plot_ly() %>%
add_markers(x = ~as.numeric(d)[1:5000], y = ~as.numeric(d2)[1:5000], name="Transformed Twin Distances") %>%
add_trace(x = ~c(0,8), y = ~c(0,8), mode="lines",
line = list(color = "red", width = 4), name="Preserved Distances") %>%
add_markers(x = ~as.numeric(d)[1], y = ~as.numeric(d2)[1], marker=list(color = 'red', size = 20,
line = list(color = 'yellow', width = 2)), name="Twin-pair 1") %>%
add_markers(x = ~as.numeric(d)[2], y = ~as.numeric(d2)[2], marker=list(color = 'green', size = 20,
line = list(color = 'orange', width = 2)), name="Twin-pair 2") %>%
layout(title='Preservation of Distances Between Twins (Transform=A)',
xaxis = list(title="Original Twin Distances", range = c(0, 8)),
yaxis = list(title="Transformed Twin Distances", range = c(0, 8)),
legend = list(orientation = 'h'))