In this case study, we will look at another interesting example of a large structured tabular dataset, the UK Biobank (UKBB) data. The goal of this study is to compare the data representations, as well as the subsequent data analytics, between classical time-series and kime-series.
A previous investigation, Predictive Big Data Analytics using the UK Biobank Data, based on \(7,614\) clinical, phenotypic, and neuroimaging features of \(9,914\) UKBB subjects, reported the twenty most salient derived imaging biomarkers associated with mental health conditions (e.g., depression, anxiety). By jointly representing and modeling the significant clinical and demographic variables along with the derived salient neuroimaging features, the researchers predicted the presence and progression of depression and other mental health disorders in the cohort of UKBB participating volunteers. We will explore the effects of the kime-direction on the findings based on the same data and similar analytical methods. To streamline the demonstration, enable efficient calculations, and facilitate direct interpretation, we will transform the data into a tight computable object of dimensions \(9,914\times 106\) (participants × features) that can be processed, interpreted, and visualized more intuitively. An interactive demonstration of this tensor data showing linear and non-linear dimensionality reduction is available online.
dim(tight106_UKBB_data)
## [1] 9914 106
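As a rough illustration of the linear part of that dimensionality-reduction demo, here is a minimal sketch on a synthetic stand-in matrix of the same dimensions (the data below are simulated, not UKBB; prcomp is base R):
# Minimal sketch: PCA on a synthetic 9914 x 106 stand-in for the UKBB matrix
set.seed(1234)
toy_data <- matrix(rnorm(9914 * 106), nrow = 9914, ncol = 106)
pca <- prcomp(toy_data, center = TRUE, scale. = TRUE)
summary(pca)$importance[, 1:3]   # proportion of variance of the first 3 PCs
plot(pca$x[, 1:2], pch = ".", main = "PC1 vs PC2 (synthetic data)")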
The UKBB archive contains incomplete records. We employed multiple imputation using chained equations (MICE) to obtain complete instances of the data and avoid issues with missing observations.
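As a minimal, runnable illustration of chained-equation imputation, here is a toy sketch on the small nhanes dataset that ships with the mice package, using the same 'cart' method as the full run below (this is a stand-in, not the UKBB data):
library(mice)
# toy MICE run: 2 imputations, 3 iterations, CART imputation models
imp_toy <- mice(nhanes, m = 2, maxit = 3, method = 'cart', seed = 1234, printFlag = FALSE)
head(complete(imp_toy))  # first completed (imputed) dataset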
# This chunk is not set up to run due to long runtime; instead we load the pretrained imputation results.
# MICE imputation (with parallel core computing)
library(mice)
library(parallel)
library(doParallel)

number_cores <- detectCores() - 2
clustUKBB <- makeCluster(number_cores)
clusterSetRNGStream(clustUKBB, 1234)
registerDoParallel(clustUKBB)

imp_tight106_UKBB_data <-
  foreach(no = 1:number_cores,
          .combine = ibind,
          .export = "tight106_UKBB_data",
          .packages = "mice") %dopar% {
    mice(tight106_UKBB_data, m = 2, maxit = 3, printFlag = TRUE,
         seed = 1234, method = 'cart')
  }
comp_imp_tight106_UKBB_data <- as.matrix(complete(imp_tight106_UKBB_data),
                                         dimnames = list(NULL, colnames(tight106_UKBB_data)))
stopCluster(clustUKBB)
save(comp_imp_tight106_UKBB_data, file = "origin_after_imputation.RData")
To assess the quality of kime-series based analytical inference, we will focus on predicting a specific clinical phenotype, Ever depressed for a whole week, i.e., column “X4598.2.0”. A number of model-based and model-free methods can be used for supervised and unsupervised, retrodictive and predictive strategies for binary, categorical, or continuous regression, clustering, and classification. We will demonstrate a Random Forest classification approach for forecasting this binary clinical depression phenotype.
Other preprocessing steps were necessary prior to the main data analytics. To construct a kime-series, we randomly split the UKBB archive into 11 time epochs. We used all 11 epochs for phase estimation, and we iteratively used 10 epochs for model training and 1 epoch for model testing. Many alternative kime-direction aggregator functions can be employed; for this demonstration, we separately used nil-phase, averaging, and smoothing splines with different degrees of freedom to estimate the unknown phase-angles.
Note that in later analysis, we found that the first 50 columns of neuroimaging data do not contribute much to this specific classification task. Therefore, in the following analysis, the building blocks of our model contain only elements from the last 56 clinical features (54 of them as predictors, 1 as the response variable, and 1 discarded due to its close relationship with the response variable); see the column-role sketch below.
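For reference, the column roles used throughout the rest of the analysis are sketched here (the indices follow the code comments in the FFT chunk below; interpreting X4598.0.0 as an earlier instance of the same field is our reading of the UKBB field.instance.array naming convention):
# Column roles in the 9,914 x 106 matrix:
#   1:50              derived neuroimaging biomarkers (dropped for this task)
#   c(51:94, 97:106)  54 clinical predictors
#   95                X4598.0.0, discarded (closely related to the response)
#   96                X4598.2.0, response: Ever depressed for a whole week
predictor_cols <- c(51:94, 97:106)
response_col <- 96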
load("origin_after_imputation.RData")
# randomly splitted the UKBB archive into 11 time epochs
set.seed(1234)
<- sample(1:nrow(comp_imp_tight106_UKBB_data))
suffle_rid <- comp_imp_tight106_UKBB_data[suffle_rid[1:9900],]
comp_imp_tight106_UKBB_data <- colnames(comp_imp_tight106_UKBB_data)
col_name dim(comp_imp_tight106_UKBB_data) <- c(11, 900, dim(comp_imp_tight106_UKBB_data)[2])
dim(comp_imp_tight106_UKBB_data)
## [1] 11 900 106
We applied the Fourier transform to shift the 54 predictors from the time domain to the frequency domain.
# FFT of all 54 predictors
set.seed(1234)
FT_epochs_tight106_UKBB <- array(complex(), c(11, 900, 54))
mag_FT_epochs_tight106_UKBB <- array(NA, c(11, 900, 54))
phase_FT_epochs_tight106_UKBB <- array(NA, c(11, 900, 54))

for (i in 1:11) {
  # Fourier transform
  # col 96 <X4598.2.0>: response variable; col 95 <X4598.0.0>: highly related to the response variable
  FT_epochs_tight106_UKBB[i, , ] <- fft(comp_imp_tight106_UKBB_data[i, , c(51:94, 97:106)])
  X2 <- FT_epochs_tight106_UKBB[i, , ]
  mag_FT_epochs_tight106_UKBB[i, , ] <- sqrt(Re(X2)^2 + Im(X2)^2)
  phase_FT_epochs_tight106_UKBB[i, , ] <- atan2(Im(X2), Re(X2))
}
dim(phase_FT_epochs_tight106_UKBB)
## [1] 11 900 54
We applied Nil-Phase estimation as a trivial baseline, which sets the phase to 0 at each position of the FFT-transformed data. We will show its model performance compared to the true-phase and other phase estimation methods in the next section.
# Nil-Phase reconstruction
set.seed(1234)
ift_NilPhase <- array(NA, c(11, 900, 54))
for (i in 1:11) {
  Real_Nil <- mag_FT_epochs_tight106_UKBB[i, , ] * cos(0)
  Imaginary_Nil <- mag_FT_epochs_tight106_UKBB[i, , ] * sin(0)
  ift_NilPhase[i, , ] <- Re(fft(Real_Nil + 1i*Imaginary_Nil, inverse = TRUE) /
                              length(mag_FT_epochs_tight106_UKBB[i, , ]))
}
Next, we applied Average-Phase estimation. By this method, we average the 11 epoch-specific phase estimates at each position of the \(900\times 54\) matrix.
# Average-Phase reconstruction
avgPhase_FT_epochs_tight106_UKBB <- apply(phase_FT_epochs_tight106_UKBB, c(2, 3), mean)
set.seed(1234)
ift_AvgPhase <- array(NA, c(11, 900, 54))
for (i in 1:11) {
  Real_Avg <- mag_FT_epochs_tight106_UKBB[i, , ] * cos(avgPhase_FT_epochs_tight106_UKBB)
  Imaginary_Avg <- mag_FT_epochs_tight106_UKBB[i, , ] * sin(avgPhase_FT_epochs_tight106_UKBB)
  ift_AvgPhase[i, , ] <- Re(fft(Real_Avg + 1i*Imaginary_Avg, inverse = TRUE) /
                              length(mag_FT_epochs_tight106_UKBB[i, , ]))
}
We also fit a smoothing spline to the 11 estimated phase values at each position of the \(900\times 54\) matrix. We will see later that different degrees of freedom of the smoothing spline result in different prediction accuracies. The average-phase and nil-phase estimations can be viewed as special cases of the smoothing spline estimation with degrees of freedom df = 1 and df = 0, respectively.
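For reference, at each position the fitted smoothing spline \(\hat{f}\) minimizes the penalized least-squares criterion
\[ \sum_{i=1}^{11} \left(\phi_i - f(t_i)\right)^2 + \lambda \int f''(t)^2 \, dt, \]
where \(\phi_i\) is the estimated phase at epoch \(t_i = i\) and the penalty weight \(\lambda\) is chosen to match the specified degrees of freedom: smaller df values flatten the fit toward an aggregate trend, while larger df values track the 11 epoch-specific phases more closely.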
We visualize this estimation using one biomarker column “X31.0.0” as an example:
plot(1:11, phase_FT_epochs_tight106_UKBB[, 10, 1], type = 'b',
     main = "phase_FT_epochs_tight106_UKBB[,10,1]",
     ylab = "fft phase", xlab = "epochs 1-11", lty = 1, lwd = 1)
lines(1:11, smooth.spline(1:11, phase_FT_epochs_tight106_UKBB[, 10, 1], df = 10)$y, col = 'red')
lines(1:11, smooth.spline(1:11, phase_FT_epochs_tight106_UKBB[, 10, 1], df = 5)$y, col = 'blue')
legend("topleft", legend = c("df = 10", "df = 5"), lty = 1, col = c("red", "blue"), cex = 0.8)
After obtaining the phase estimates, we need to invert the FFT back into the time domain to conduct the subsequent classification tasks. How well the reconstruction reflects the structure of the original dataset after the inverse FFT indicates the adequacy of the phase estimation process.
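The underlying round-trip logic can be checked on synthetic data (a minimal sketch, not UKBB data): recombining the FFT magnitudes with the true phases and inverting recovers the signal to machine precision, whereas a nil phase does not.
# synthetic round-trip check: magnitude + true phase vs. magnitude + nil phase
set.seed(1)
x <- rnorm(16)
X <- fft(x)
mag <- Mod(X); ph <- Arg(X)
x_true <- Re(fft(mag * exp(1i * ph), inverse = TRUE)) / length(x)
x_nil <- Re(fft(mag + 0i, inverse = TRUE)) / length(x)
max(abs(x_true - x))  # ~1e-15: exact recovery with the true phases
max(abs(x_nil - x))   # large: discarding the phases loses the signal structure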
We show the results of smoothing spline phase estimation with df = 10 as an example:
# inverse FFT example - smoothing spline df = 10
set.seed(1234)
ift_splinePhase <- array(NA, c(11, 900, 54))
smooth_spline_value <- array(NA, c(11, 900, 54))
df <- 10

for (i in 1:900) {
  for (j in 1:54) {
    smooth_spline_value[, i, j] <- smooth.spline(1:11, phase_FT_epochs_tight106_UKBB[, i, j], df = df)$y
  }
}
for (i in 1:11) {
  Real_Avg <- mag_FT_epochs_tight106_UKBB[i, , ] * cos(smooth_spline_value[i, , ])
  Imaginary_Avg <- mag_FT_epochs_tight106_UKBB[i, , ] * sin(smooth_spline_value[i, , ])
  ift_splinePhase[i, , ] <- Re(fft(Real_Avg + 1i*Imaginary_Avg, inverse = TRUE) /
                                 length(mag_FT_epochs_tight106_UKBB[i, , ]))
}
Original Data
head(comp_imp_tight106_UKBB_data[1, ,52],100)
## [1] 1 3 3 4 3 3 3 2 4 4 2 2 2 3 3 3 4 4 3 3 2 3 4 4 3 3 3 3 3 3 4 3 4 4 3 2 3
## [38] 3 4 3 3 3 3 4 2 4 3 4 2 3 3 4 3 1 3 4 3 4 3 3 3 3 4 3 3 4 3 3 2 2 4 3 3 2
## [75] 3 2 1 2 4 4 2 2 3 3 3 4 3 3 4 2 3 3 2 4 4 3 3 1 4 4
Raw Inverse FFT
head(ift_splinePhase[1,,2],100)
## [1] 1.3247109 3.0109421 3.0380485 3.9536578 3.0965326 2.9792134 3.0052876
## [8] 2.0398020 4.0458510 4.0057234 1.9994767 1.9411536 2.0524283 3.0157179
## [15] 3.0360039 3.0777665 3.9860074 3.9957264 2.9719703 3.0162383 1.9161712
## [22] 2.9620848 3.9687503 3.9462660 2.9716492 3.0064971 3.0089882 3.1169813
## [29] 3.0573894 3.0341785 3.9858190 2.9811874 3.9915375 3.9722213 3.0124213
## [36] 2.0457208 2.9877710 2.9418941 3.9399941 3.0442297 2.9720348 2.9641899
## [43] 3.1037576 4.0743839 2.0338855 3.9690007 2.9351730 3.9271476 1.9860520
## [50] 2.9196992 3.0628154 3.9515632 3.0055973 0.9939673 2.9840694 3.9977194
## [57] 2.9547721 3.9556126 3.0118504 3.0141401 2.9367009 3.0495779 3.9797034
## [64] 3.0051608 2.9035839 3.9462564 2.9772724 3.0457601 1.9507338 2.0121289
## [71] 4.0372838 3.0624453 2.9687833 2.0286521 2.9988872 1.9180096 1.0311940
## [78] 1.9083107 4.0499099 4.0093490 1.9983529 2.0102444 3.0531050 3.0401639
## [85] 3.0352656 3.9668564 2.9811530 3.0232930 3.9619172 2.0037215 2.9440798
## [92] 2.9938688 2.0727398 4.0473087 3.9915757 3.0407106 2.9565383 0.9115500
## [99] 3.9625790 4.0143996
Round Inverse FFT
head(round(ift_splinePhase[1,,2]),100)
## [1] 1 3 3 4 3 3 3 2 4 4 2 2 2 3 3 3 4 4 3 3 2 3 4 4 3 3 3 3 3 3 4 3 4 4 3 2 3
## [38] 3 4 3 3 3 3 4 2 4 3 4 2 3 3 4 3 1 3 4 3 4 3 3 3 3 4 3 3 4 3 3 2 2 4 3 3 2
## [75] 3 2 1 2 4 4 2 2 3 3 3 4 3 3 4 2 3 3 2 4 4 3 3 1 4 4
Then, we compare the model performance on a classification task based on the original time-series data versus the transformed data from each phase estimation approach.
We first build the model on the original data. For comparison across the different phase estimation methods, we use the 4th epoch as the test set and the other 10 epochs as the training set for all methods. We also implemented cross-validation with each of the 11 epochs serving as the testing set and found that the model performance remains similar. So, for simplicity, we will only show the results for one train-test epoch setting.
library(randomForest)

rf_train_data_ift <- c()
label_train <- c()
test_epoch_idx <- 4
for (i in setdiff(1:11, test_epoch_idx)) {
  rf_train_data_ift <- rbind(rf_train_data_ift, comp_imp_tight106_UKBB_data[i, , c(51:94, 97:106)])
  label_train <- c(label_train, comp_imp_tight106_UKBB_data[i, , 96])
}
rf_train_data_ift <- as.data.frame(rf_train_data_ift)
rf_train_label_ift <- as.factor(label_train)

set.seed(1234)
rf_Phase_ift_original_data <- randomForest(rf_train_label_ift ~ ., data = rf_train_data_ift)
rf_Phase_ift_original_data
##
## Call:
## randomForest(formula = rf_train_label_ift ~ ., data = rf_train_data_ift)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 7
##
## OOB estimate of error rate: 19.18%
## Confusion matrix:
## 0 1 class.error
## 0 3886 577 0.1292852
## 1 1149 3388 0.2532510
library(caret)
pred_train <- predict(rf_Phase_ift_original_data, type = "class")
confusionMatrix(pred_train, rf_train_label_ift)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 3886 1149
## 1 577 3388
##
## Accuracy : 0.8082
## 95% CI : (0.7999, 0.8163)
## No Information Rate : 0.5041
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6168
##
## Mcnemar's Test P-Value : < 2.2e-16
##
## Sensitivity : 0.8707
## Specificity : 0.7467
## Pos Pred Value : 0.7718
## Neg Pred Value : 0.8545
## Prevalence : 0.4959
## Detection Rate : 0.4318
## Detection Prevalence : 0.5594
## Balanced Accuracy : 0.8087
##
## 'Positive' Class : 0
##
Next, we show the results for the Random Forest based on the transformed data, using smoothing spline phase estimation with df = 10 as an example. Later, we will show how the training and testing accuracy trend as the degrees of freedom increase from 2 to 11. We can see that smoothing spline phase estimation with df = 10 gives a competitive classification accuracy compared to the true-phase Random Forest.
rf_train_data_ift <- c()
label_train <- c()
test_epoch_idx <- 4
for (i in setdiff(1:11, test_epoch_idx)) {
  rf_train_data_ift <- rbind(rf_train_data_ift, ift_splinePhase[i, , ])
  label_train <- c(label_train, comp_imp_tight106_UKBB_data[i, , 96])
}
rf_train_data_ift <- as.data.frame(rf_train_data_ift)
rf_train_label_ift <- as.factor(label_train)

set.seed(1234)
rf_Phase_ift_spline <- randomForest(rf_train_label_ift ~ ., data = rf_train_data_ift)
rf_Phase_ift_spline
##
## Call:
## randomForest(formula = rf_train_label_ift ~ ., data = rf_train_data_ift)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 7
##
## OOB estimate of error rate: 19.56%
## Confusion matrix:
## 0 1 class.error
## 0 3855 608 0.1362312
## 1 1152 3385 0.2539123
pred_train <- predict(rf_Phase_ift_spline, type = "class")
confusionMatrix(pred_train, rf_train_label_ift)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 3855 1152
## 1 608 3385
##
## Accuracy : 0.8044
## 95% CI : (0.7961, 0.8126)
## No Information Rate : 0.5041
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6093
##
## Mcnemar's Test P-Value : < 2.2e-16
##
## Sensitivity : 0.8638
## Specificity : 0.7461
## Pos Pred Value : 0.7699
## Neg Pred Value : 0.8477
## Prevalence : 0.4959
## Detection Rate : 0.4283
## Detection Prevalence : 0.5563
## Balanced Accuracy : 0.8049
##
## 'Positive' Class : 0
##
In contrast, with Average-Phase estimation, the model has poor classification performance. This also indicates the importance of the phase information.
rf_train_data_ift <- c()
label_train <- c()
test_epoch_idx <- 4
for (i in setdiff(1:11, test_epoch_idx)) {
  rf_train_data_ift <- rbind(rf_train_data_ift, ift_AvgPhase[i, , ])
  label_train <- c(label_train, comp_imp_tight106_UKBB_data[i, , 96])
}
rf_train_data_ift <- as.data.frame(rf_train_data_ift)
rf_train_label_ift <- as.factor(label_train)

set.seed(1234)
rf_Phase_ift_avg <- randomForest(rf_train_label_ift ~ ., data = rf_train_data_ift)
rf_Phase_ift_avg
##
## Call:
## randomForest(formula = rf_train_label_ift ~ ., data = rf_train_data_ift)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 7
##
## OOB estimate of error rate: 46.69%
## Confusion matrix:
## 0 1 class.error
## 0 2325 2138 0.4790500
## 1 2064 2473 0.4549262
pred_train <- predict(rf_Phase_ift_avg, type = "class")
confusionMatrix(pred_train, rf_train_label_ift)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 2325 2064
## 1 2138 2473
##
## Accuracy : 0.5331
## 95% CI : (0.5227, 0.5435)
## No Information Rate : 0.5041
## P-Value [Acc > NIR] : 1.961e-08
##
## Kappa : 0.066
##
## Mcnemar's Test P-Value : 0.2601
##
## Sensitivity : 0.5210
## Specificity : 0.5451
## Pos Pred Value : 0.5297
## Neg Pred Value : 0.5363
## Prevalence : 0.4959
## Detection Rate : 0.2583
## Detection Prevalence : 0.4877
## Balanced Accuracy : 0.5330
##
## 'Positive' Class : 0
##
Finally, we visualize the training and testing accuracy of the inverse FFT-based Random Forest for different degrees of freedom in the smoothing spline phase estimation process. Note that we include the Nil-Phase and Average-Phase estimations as special cases with df = 0 and df = 1, respectively. The dashed horizontal line in the plot represents the testing accuracy of the Random Forest trained on the original data. This plot provides strong evidence of the importance of correctly estimating the kime-phases.
# helper function 1: inverse FFT using smoothing spline phase estimates with a given df
ift_df <- function(df) {
  set.seed(1234)
  ift_splinePhase <- array(NA, c(11, 900, 54))
  smooth_spline_value <- array(NA, c(11, 900, 54))
  for (i in 1:900) {
    for (j in 1:54) {
      smooth_spline_value[, i, j] <- smooth.spline(1:11, phase_FT_epochs_tight106_UKBB[, i, j], df = df)$y
    }
  }
  for (i in 1:11) {
    Real_Avg <- mag_FT_epochs_tight106_UKBB[i, , ] * cos(smooth_spline_value[i, , ])
    Imaginary_Avg <- mag_FT_epochs_tight106_UKBB[i, , ] * sin(smooth_spline_value[i, , ])
    ift_splinePhase[i, , ] <- Re(fft(Real_Avg + 1i*Imaginary_Avg, inverse = TRUE) /
                                   length(mag_FT_epochs_tight106_UKBB[i, , ]))
  }
  return(ift_splinePhase)
}
# helper function 2: compute model accuracy for a given test epoch
# (mainly developed for the cross-validation part not shown)
rf_model_ift <- function(ift_splinePhase, test_epoch_idx) {
  # dim(ift_splinePhase): 11, 900, 54
  rf_train_data_ift <- c()
  label_train <- c()
  for (i in setdiff(1:11, test_epoch_idx)) {
    rf_train_data_ift <- rbind(rf_train_data_ift, ift_splinePhase[i, , ])
    label_train <- c(label_train, comp_imp_tight106_UKBB_data[i, , 96])
  }
  rf_train_data_ift <- as.data.frame(rf_train_data_ift)
  rf_train_label_ift <- as.factor(label_train)
  # dim(rf_train_data_ift): 9000, 54

  set.seed(1234)
  rf_Phase_ift <- randomForest(rf_train_label_ift ~ ., data = rf_train_data_ift)
  # OOB_accuracy <- 1 - rf_Phase_ift$err.rate[, 1][500]  # 500 = number of trees

  pred_train <- predict(rf_Phase_ift, type = "class")
  train_accuracy <- sum(pred_train == rf_train_label_ift) / length(rf_train_label_ift)

  # test
  rf_test_data_ift <- ift_splinePhase[test_epoch_idx, , ]
  rf_test_label_ift <- as.factor(comp_imp_tight106_UKBB_data[test_epoch_idx, , 96])

  pred_test <- predict(rf_Phase_ift, rf_test_data_ift, type = "class")
  test_accuracy <- sum(pred_test == rf_test_label_ift) / length(rf_test_label_ift)

  return(c(train_accuracy, test_accuracy))
}
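As the comment in helper function 2 indicates, the same function supports the cross-validation mentioned earlier; a minimal sketch of that loop (not run here due to runtime, assuming ift_splinePhase from the df = 10 chunk is in scope):
# hypothetical cross-validation sketch: each of the 11 epochs serves once as the test set
cv_accuracy <- t(sapply(1:11, function(ep) rf_model_ift(ift_splinePhase, ep)))
colnames(cv_accuracy) <- c("train_accuracy", "test_accuracy")
summary(cv_accuracy[, "test_accuracy"])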
# This chunk is not set up to run due to long runtime; instead we load the pretrained results.
train_test_accuracy <- c()
for (df in 2:11) {
  train_test_accuracy <- rbind(train_test_accuracy, c(rf_model_ift(ift_df(df), 4), df = df))
}
train_test_accuracy <- as.data.frame(train_test_accuracy)
names(train_test_accuracy) <- c("train_accuracy", "test_accuracy", "df")
# add results from nil-phase and average-phase
train_test_accuracy <- rbind(train_test_accuracy, c(rf_model_ift(ift_AvgPhase, 4), 1))  # avg-phase
train_test_accuracy <- rbind(train_test_accuracy, c(rf_model_ift(ift_NilPhase, 4), 0))  # nil-phase
origin_acc <- rf_model_ift(comp_imp_tight106_UKBB_data[, , c(51:94, 97:106)], 4)  # original data
save(train_test_accuracy, origin_acc, file = "train_test_accuracy_testepoch4.RData")
library(ggplot2)
load("train_test_accuracy_testepoch4.RData")
accuracy_data <- reshape(train_test_accuracy, idvar = "df", varying = list(1:2),
                         v.names = "accuracy", direction = "long")
names(accuracy_data) <- c("df", "group", "accuracy")
accuracy_data$group <- factor(accuracy_data$group, levels = c(1, 2), labels = c("train", "test"))

ggplot(data = accuracy_data, aes(x = df, y = accuracy, group = group)) +
  geom_line(aes(color = group)) +
  geom_point(aes(shape = group, color = group)) +
  scale_x_continuous(breaks = accuracy_data$df) +
  geom_text(aes(label = round(accuracy, 2)), hjust = 0, vjust = 0.2) +
  ggtitle("Train & Test Accuracy of Inverse FFT-based RF with Different Degrees of Freedom") +
  geom_hline(yintercept = origin_acc[2], linetype = "dashed")
We further compare, in kime space, the characteristics of the two groups of participants, with and without depression. For better visualization, we show the kime-surface of one feature (“X4631.2.0: Ever unenthusiastic/disinterested for a whole week”), the feature most correlated with depression, as an example.
# data preparation
library(dplyr)

load("origin_after_imputation.RData")
set.seed(1234)
suffle_rid <- sample(1:nrow(comp_imp_tight106_UKBB_data))
comp_imp_tight106_UKBB_data <- as.data.frame(comp_imp_tight106_UKBB_data[suffle_rid[1:9900], ])

depressed_data <- filter(comp_imp_tight106_UKBB_data, X4598.2.0 == 1)       # 5007, 106
not_depressed_data <- filter(comp_imp_tight106_UKBB_data, X4598.2.0 == 0)

depressed_idx <- which(comp_imp_tight106_UKBB_data$X4598.2.0 == 1)
not_depressed_idx <- which(comp_imp_tight106_UKBB_data$X4598.2.0 == 0)

load("ift_splinePhase.RData")
dim(ift_splinePhase) <- c(9900, 54)
depressed_data_ift <- ift_splinePhase[depressed_idx, ]          # dim: 5007, 54
not_depressed_data_ift <- ift_splinePhase[not_depressed_idx, ]  # dim: 4893, 54

# divide into 110 epochs
comp_imp_tight106_UKBB_data_3D <- as.matrix(comp_imp_tight106_UKBB_data)
dim(comp_imp_tight106_UKBB_data_3D) <- c(110, 90, 106)
ift_splinePhase_3D <- ift_splinePhase
dim(ift_splinePhase_3D) <- c(110, 90, 54)

# col 45: (most salient predictor) X4631.2.0 (Ever unenthusiastic/disinterested for a whole week)
depressed_data_ift_avg_col45 <- c()
not_depressed_data_ift_avg_col45 <- c()
for (i in 1:110) {
  depressed_idx_epoch <- which(comp_imp_tight106_UKBB_data_3D[i, , 96] == 1)
  not_depressed_idx_epoch <- which(comp_imp_tight106_UKBB_data_3D[i, , 96] == 0)
  depressed_data_ift_avg_col45[i] <- mean(ift_splinePhase_3D[i, depressed_idx_epoch, 45])
  not_depressed_data_ift_avg_col45[i] <- mean(ift_splinePhase_3D[i, not_depressed_idx_epoch, 45])
}
We apply the NuLT function, which implements a numerical Laplace transform, to each of the two time-series, and we visualize the two resulting kime-surfaces.
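For orientation, NuLT numerically approximates the Laplace transform
\[ \mathcal{L}\{f\}(z) = \int_{0}^{\infty} f(t)\, e^{-z t}\, dt, \qquad z \in \mathbb{C}, \]
evaluated here on a \(50 \times 50\) grid of complex points \(z = x + iy\) constructed below.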
source('LT_ILT.R')
library(plotly)

x2 <- seq(from = 1, to = 11, length.out = 50)
y2 <- seq(from = -5, to = 5, length.out = 50)
XY <- expand.grid(X = x2, Y = y2)
complex_xy <- mapply(complex, real = XY$X, imaginary = XY$Y)
time_points <- seq(0 + 0.001, 2*pi, length.out = 110)

Y1 <- depressed_data_ift_avg_col45
kime_depressed <- NuLT(time_points, Y1, complex_xy, mirror = FALSE, range = 2*pi, k = 3)
dim(kime_depressed) <- c(length(x2), length(y2))
Y2 <- not_depressed_data_ift_avg_col45
kime_not_depressed <- NuLT(time_points, Y2, complex_xy, mirror = FALSE, range = 2*pi, k = 3)
dim(kime_not_depressed) <- c(length(x2), length(y2))

kime_grid1 <- kime_depressed
kime_grid2 <- kime_not_depressed
phase_depressed <- atan2(Im(kime_grid1), Re(kime_grid1))
magnitude_depressed <- sqrt(Re(kime_grid1)^2 + Im(kime_grid1)^2)
phase_not_depressed <- atan2(Im(kime_grid2), Re(kime_grid2))
magnitude_not_depressed <- sqrt(Re(kime_grid2)^2 + Im(kime_grid2)^2)

plot_ly(hoverinfo = "none", showscale = FALSE) %>%
  add_trace(z = magnitude_depressed, surfacecolor = phase_depressed, type = "surface",
            showlegend = TRUE, name = 'Depressed') %>%
  add_trace(z = magnitude_not_depressed, surfacecolor = phase_not_depressed, type = "surface",
            opacity = 0.7, showlegend = TRUE, name = 'Not Depressed') %>%
  # X4631.2.0 (Ever unenthusiastic/disinterested for a whole week)
  layout(title = "X4631.2.0 Kimesurface Difference: Depressed vs Not Depressed",
         showlegend = TRUE,
         scene = list(aspectmode = "manual",
                      aspectratio = list(x = 1, y = 1, z = 1.0)))
We further recover the time-series from the kime-surface by applying the continuous-time inverse Laplace transform (ctILT) to the NuLT function output.
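The inverse transform that ctILT approximates numerically is the Bromwich contour integral
\[ f(t) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} F(z)\, e^{z t}\, dz, \]
for a real constant \(c\) chosen to the right of all singularities of \(F\).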
z2_funct_depressed <- function(z) NuLT(time_points, depressed_data_ift_avg_col45, z, k = 3,
                                       mirror = FALSE, fitwarning = FALSE, range = 2*pi)
inv_result_depressed <- ctILT(z2_funct_depressed, nnt = 110, tend = 2*pi)
## [1] -2.980958e+03 2.980958e+03 -2.980956e+03 2.980935e+03 -2.980803e+03
## [6] 2.980144e+03 -2.977506e+03 2.968840e+03 -2.945008e+03 2.889400e+03
## [11] -2.778184e+03 2.586084e+03 -2.297934e+03 1.921121e+03 -1.490479e+03
## [16] 1.059837e+03 -6.830244e+02 3.948739e+02 -2.027736e+02 9.155766e+01
## [21] -3.594967e+01 1.211768e+01 -3.451502e+00 8.139694e-01 -1.545862e-01
## [26] 2.270959e-02 -2.420876e-03 1.665740e-04 -5.552467e-06
# plot to validate LT and ILT
plot(time_points, inv_result_depressed, col="blue",
xlim=c(0,2*pi))
lines(time_points,depressed_data_ift_avg_col45, col="red")
legend("top",c("recovered","data"), fill=c("blue","red"))
grid(nx = NULL, ny = NULL, col = "lightgray", lty = "dotted")
We plot the difference between the recovered and the original time-series to validate the LT and ILT process.
# difference between recovered and original time-series
plot(time_points, inv_result_depressed - depressed_data_ift_avg_col45,
     type = 'l', lty = 1, lwd = 1, col = 'blue', ylab = "diff",
     main = 'Difference of ILT recovery and original time-series')