SOCR ≫ TCIU Website ≫ TCIU GitHub ≫

1 IID Spacetime Sampling vs. Spacekime Sampling

For all natural spacetime processes, various population characteristics like the mean, variance, range, and quantiles can be estimated by collecting independent and identically distributed (IID) samples. These samples represent observed data that is traditionally used to obtain sample-driven estimates of the specific population characteristics via standard formulas like the sample arithmetic average, variance, range, quantiles, etc. The latter approximate their population counterparts and form the basis for classical parametric and non-parametric statistical inference.

Typically, reliable spacetime statistical inference is conditional on the distribution of the native process as well as a sample-size reflecting the characteristics of the phenomenon. We will demonstrate that spacekime analytics can be equally effective with measuring a single spacetime observation and having a reasonable estimates of the unobserved process kime-phases.

Without loss of generality, suppose we have a pair of cohorts \(A\) and \(B\) and we obtain a series of measurements \(\{X_{A,i}\}_{i=1}^{n_A}\) and \(\{X_{B,i}\}_{i=1}^{n_B}\), respectively. Obviously the relations between the cohorts could widely vary, from being samples of the same process, to being related or completely independent.

To allow us to examine the extreme cases of pairing (1) IID cohorts (\(A\) and \(B\)), and (2) independent but differently distributed cohorts (\(A\) and \(C\)). The latter case may be thought of as a split of the first cohort (\(A\)) into training (\(C\)) and testing (\(D\)) groups. This design allows us to compare the classical spacetime-derived population characteristics of cohort \(A\) to their spacekime-reconstructed counterparts obtained using a single random kime-magnitude observation from \(A\) and kime-phases estimates derived from cohorts \(B\), \(C\) or \(D\).

1.1 fMRI timeseries

The demonstration below is based on a functional magnetic resonance imaging (fMRI) data, which is a 4D hypervolume with intensities representing the blood oxygenation level dependence at a specific spacetime location \((x,y,z,t)\). For simplicity, we will only focus on two fixed spatial locations with varying intensity distributions.

library(EBImage)
require(brainR)
library(spatstat) 
library(ggplot2)
library(kSamples)
library(reshape2)
library(beanplot)
library(rstanarm)

fMRIURL <- "http://socr.umich.edu/HTML5/BrainViewer/data/fMRI_FilteredData_4D.nii.gz"
fMRIFile <- file.path(tempdir(), "fMRI_FilteredData_4D.nii.gz")
download.file(fMRIURL, dest=fMRIFile, quiet=TRUE)
fMRIVolume <- readNIfTI(fMRIFile, reorient=FALSE)
# dimensions: 64 x 64 x 21 x 180 ; 4mm x 4mm x 6mm x 3 sec 

fMRIVolDims <- dim(fMRIVolume); # fMRIVolDims
# time_dim <- fMRIVolDims[4]; time_dim ## 180

# 2. extract the time-corse of 1D mid-axial slice (3D) hypervolume
xA_fMRI_1D_x20_y20_z11 <- fMRIVolume[20, 20, 11, ]; # length(xA_fMRI_1D_x20_y20_z11)   #  180
# hist(xA_fMRI_1D_x20_y20_z11)
library(plotly)
plot_ly(x = ~xA_fMRI_1D_x20_y20_z11, type = "histogram") %>%
  layout(bargap=0.1)
xB_fMRI_1D_x30_y30_z13 <- fMRIVolume[30, 30, 13, ]; # length(xB_fMRI_1D_x30_y30_z13)   #  180
# hist(xB_fMRI_1D_x30_y30_z13)

# Now, combine your two 1D timeseries into one dataframe for joint hist plotting as densities.  
# First make a new column in each that will be 
# a variable to identify where they came from later.
xA_df <- as.data.frame(xA_fMRI_1D_x20_y20_z11)
colnames(xA_df) <- "value"; xA_df$cohort <- "xA"
xB_df <- as.data.frame(xB_fMRI_1D_x30_y30_z13)
colnames(xB_df) <- "value"; xB_df$cohort <- "xB"

# and combine into your new data frame vegLengths
xA_xB_df <- rbind(xA_df, xB_df)

# ggplot(xA_xB_df, aes(value, fill = cohort)) + 
#   geom_density(alpha = 0.5, size=1.2) +
#   theme(text = element_text(size=20)) +
#   xlim(c(10200, 12000))

density_xA <- density(xA_xB_df[ which(xA_xB_df$cohort=="xA"), ]$value)
density_xB <- density(xA_xB_df[ which(xA_xB_df$cohort=="xB"), ]$value)
df_xA <- as.data.frame(cbind(x=density_xA$x, y=density_xA$y))
df_xB <- as.data.frame(cbind(x=density_xB$x, y=density_xB$y))

plot_ly(df_xA, x = ~x, y = ~y, type = 'scatter', 
        mode = 'lines', name = "xA", fill = 'tozeroy') %>% 
  add_trace(x = ~df_xB$x, y = ~df_xB$y, type = 'scatter', 
        mode = 'lines', name = "xB", fill = 'tozeroy') %>% 
  layout(title="Cohort A and B Distributions",
         xaxis = list(title = 'value'), yaxis = list(title = 'Density'))

Clearly the intensities of cohorts \(A\) and \(B\) are independent and follow different distribution. We’ll split the first cohort (\(A\)) into training (\(C\)) and testing (\(D\)) subgroups. Then we will:

  • transform all four cohorts into Fourier k-space,
  • iteratively randomly sample single observations from cohort \(C\),
  • reconstruct the data into spacetime using alternative kime-phase estimates derived from cohorts \(B\) and \(D\), and
  • compute and compare the classical spacetime-derived population characteristics of cohort \(A\) to their counterparts obtained using a single \(C\) kime-radial measurements paired with \(B\) and \(D\) kime-phases.
# Generic function to Transform Data to/from k-space (Space/Fourier domain)
kSpaceTransform <- function(data, inverse = FALSE, reconPhases = NULL) {
  # ForwardFT (rawData, FALSE, NULL)
  # InverseFT(magnitudes, TRUE, reconPhasesToUse) or InverseFT(FT_data, TRUE, NULL)
  FT_data <- array(complex(), length(data))
  mag_FT_data <- array(complex(), length(data))
  phase_FT_data <- array(complex(), length(data))
  IFT_reconPhases_data <- array(complex(), length(data))

  if (inverse == FALSE | is.null(reconPhases)) {
      FT_data <- fft(data, inverse)
      X2 <- FT_data
      mag_FT_data <- sqrt(Re(X2)^2+Im(X2)^2) 
      phase_FT_data <- atan2(Im(X2), Re(X2)) 
  }
  else {  # for IFT synthesis using user-provided Phases, typically from kime-phase aggregators
      Real <- data * cos(reconPhases)  
      Imaginary <- data * sin(reconPhases) 
      IFT_reconPhases_data <- 
          Re(fft(Real+1i*Imaginary, inverse = TRUE)/length(data))
  }

    ######### Test the FT-IFT analysis-synthesis back-and-forth transform process 
    #         to confirm calculations
    # X2 <- FT_data[ , 1]; mag_FT_data[ , 1] <- sqrt(Re(X2)^2+Im(X2)^2); 
    # phase_FT_data[ , 1] <- atan2(Im(X2), Re(X2)); 
    # Real2 = mag_FT_data[ , 1] * cos(phase_FT_data[ , 1])
    # Imaginary2 = mag_FT_data[ , 1] * sin(phase_FT_data[ , 1])
    # man_hat_X2 = Re(fft(Real2 + 1i*Imaginary2, inverse = T)/length(X2))
    # ifelse(abs(man_hat_X2[5] - data[5, 1]) < 0.001, "Perfect Syntesis", "Problems!!!")
    #########
  
    if (inverse == FALSE | is.null(reconPhases)) {
      return(list("magnitudes"=mag_FT_data, "phases"=phase_FT_data))
      # Use kSpaceTransform$magnitudes & kSpaceTransform$phases to retrieve teh Mags and Phases
    }
    else {
      return(IFT_reconPhases_data)
      # Use Re(kSpaceTransform) to extract spacetime Real-valued reconstructed data
    }
}

# 1. Split the first cohort ($A$) into *training* ($C$) and *testing* ($D$) subgroups.
subset_int <- sample(length(xA_df$value),floor(length(xA_df$value)*0.8))  
# 80% training + 20% testing
xC_fMRI_train <- xA_df$value [subset_int]; # length(xC_fMRI_train) # 144
xD_test <- xA_df$value [-subset_int]; # length(xD_test)  # 36

# 2. Transform all four cohorts into Fourier k-space
# xA, xB, xC_fMRI_train; xD_test
xA <- xA_fMRI_1D_x20_y20_z11; # length(xA)   #  180
xB <- xB_fMRI_1D_x30_y30_z13; # length(xB)   #  180

ft_xA <- fft(xA); ft_xB <- fft(xB)
ft_xC_fMRI_train <- fft(xC_fMRI_train); ft_xD_test <- fft(xD_test); 

# Magnitudes and Phases: Phase  <- atan(Im(img_ff)/Re(img_ff))
mag_ft_xA <- sqrt(Re(ft_xA)^2+Im(ft_xA)^2)
mag_ft_xB <- sqrt(Re(ft_xB)^2+Im(ft_xB)^2)
mag_ft_xC_fMRI_train <- sqrt(Re(ft_xC_fMRI_train)^2+Im(ft_xC_fMRI_train)^2)
mag_ft_xD_test <- sqrt(Re(ft_xD_test)^2+Im(ft_xD_test)^2)

phase_ft_xA <- atan2(Im(ft_xA), Re(ft_xA))
phase_ft_xB <- atan2(Im(ft_xB), Re(ft_xB))
phase_ft_xC_fMRI_train <- atan2(Im(ft_xC_fMRI_train), Re(ft_xC_fMRI_train))
phase_ft_xD_test <- atan2(Im(ft_xD_test), Re(ft_xD_test))

# Double-Check FT-IFT==I   ImplicitlyInvert the FT (IFT)
fftinv <- function( x ) { fft( x, inverse=TRUE ) / length( x ) }
# head(Re(fftinv(ft_xA))); head(xA)

# 3. Iteratively randomly sample single observations from cohort $C$,
N <- 30 # to 30 simulations
# take a random sample of size N (without replacement) from $C$
N_sampleIndx <- sample(1:length(xC_fMRI_train), N, replace=FALSE)
xC_fMRI_sampleN <- xC_fMRI_train[N_sampleIndx]
ft_xC_fMRI_sampleN_mag <- mag_ft_xC_fMRI_train[N_sampleIndx]

# 4. reconstruct the $C$ data into spacetime using a single ft_xC_fMRI_sampleN_mag value and alternative kime-phase estimates derived from cohorts $B$ and $D$ 
# for each ft_xC_fMRI_sampleN_mag[i] value, use $B$ and $D$ phases to reconstruct ift_ft_xC_fMRI_sampleN_PhaseB ift_ft_xC_fMRI_sampleN_PhaseD
ift_ft_xC_fMRI_1sampleN_PhaseB <- 
      array(dim=c(length(xC_fMRI_train), length(xC_fMRI_sampleN)))
ift_ft_xC_fMRI_1sampleN_PhaseD <-
      array(dim=c(length(xC_fMRI_train),length(xC_fMRI_sampleN)))
ift_ft_xC_fMRI_1sampleN_PhaseC <-
      array(dim=c(length(xC_fMRI_train),length(xC_fMRI_sampleN)))
ift_ft_xC_fMRI <- array(dim=length(xC_fMRI_train))
# dim(ift_ft_xC_fMRI_1sampleN_PhaseB) # [1] Time=144 Samples_N=30

for (i in 1:N) {
  ift_ft_xC_fMRI_1sampleN_PhaseB[ , i] <- 
    Re(kSpaceTransform(rep(ft_xC_fMRI_sampleN_mag[i], length(xC_fMRI_train)),
                       TRUE, phase_ft_xB[1:length(xC_fMRI_train)]))
  ift_ft_xC_fMRI_1sampleN_PhaseD[ , i] <- 
    Re(kSpaceTransform(rep(ft_xC_fMRI_sampleN_mag[i],
                           length(phase_ft_xD_test)), TRUE,
                       phase_ft_xD_test[1:length(phase_ft_xD_test)]))
  ift_ft_xC_fMRI_1sampleN_PhaseC[ , i] <- 
    Re(kSpaceTransform(rep(ft_xC_fMRI_sampleN_mag[i], length(xC_fMRI_train)),
                       TRUE, phase_ft_xC_fMRI_train[1:length(xC_fMRI_train)]))
}
ift_ft_xC_fMRI <- Re(kSpaceTransform(mag_ft_xC_fMRI_train, TRUE,
                       phase_ft_xC_fMRI_train[1:length(xC_fMRI_train)]))
# head(xC_fMRI_train) == head(ift_ft_xC_fMRI)

# 5.  compute and compare the *classical spacetime-derived* population characteristics of cohort $A$ to their counterparts obtained using a single $C$ kime-radial measurements paired with $B$ and $D$ kime-phases.
# Data = xC_fMRI_train, ift_ft_xC_fMRI_1sampleN_PhaseB, ift_ft_xC_fMRI_1sampleN_PhaseD
# length(xC_fMRI_train) == length(ift_ft_xC_fMRI_1sampleN_PhaseB[ , 1])

summary(scale(xC_fMRI_train))
##        V1          
##  Min.   :-3.23209  
##  1st Qu.:-0.67117  
##  Median : 0.01454  
##  Mean   : 0.00000  
##  3rd Qu.: 0.63308  
##  Max.   : 3.48507
summary(scale(ift_ft_xC_fMRI_1sampleN_PhaseC[ , 15]))
##        V1          
##  Min.   :-2.85429  
##  1st Qu.:-0.70335  
##  Median :-0.02874  
##  Mean   : 0.00000  
##  3rd Qu.: 0.66837  
##  Max.   : 3.07460
summary(scale(ift_ft_xC_fMRI_1sampleN_PhaseB[ , 15]))
##        V1          
##  Min.   :-3.32577  
##  1st Qu.:-0.56299  
##  Median :-0.03926  
##  Mean   : 0.00000  
##  3rd Qu.: 0.60848  
##  Max.   : 2.56786
summary(scale(ift_ft_xC_fMRI_1sampleN_PhaseD[ , 15]))
##        V1          
##  Min.   :-2.09096  
##  1st Qu.:-0.62007  
##  Median :-0.04813  
##  Mean   : 0.00000  
##  3rd Qu.: 0.59282  
##  Max.   : 2.02428

1.1.1 Data Visualization

# Plot all histograms as densities
ift_ft_xC_fMRI_1sampleN_PhaseC_df <- 
  as.data.frame(scale(ift_ft_xC_fMRI_1sampleN_PhaseC))
# colnames(xA_df) <- "value"; xA_df$cohort <- "xA"
#  xA_scale_df <- as.data.frame(scale(xA_df$value))
#  colnames(xA_scale_df) <- "value"; xA_scale_df$cohort <- "xA"
xB_df <- as.data.frame(scale(xB_fMRI_1D_x30_y30_z13))
colnames(xB_df) <- "value"; xB_df$cohort <- "xB"

# 
# and combine into your new data frame Lengths
xA_xB_df <- rbind(xA_df, xB_df)

# ggplot(xA_xB_df, aes(value, fill = cohort)) + 
#   geom_density(alpha = 0.5, size=1.2) +
#   theme(text = element_text(size=20)) +
#   xlim(c(10200, 12000))

density_xA <- density(xA_df$value)
df_xA <- as.data.frame(cbind(x=density_xA$x, y=density_xA$y))

plot_ly(df_xA, x = ~x, y = ~y, type = 'scatter', 
        mode = 'lines', name = "xA", fill = 'tozeroy') %>% 
  layout(title="Cohort A Distribution",
         xaxis = list(title = 'value'), yaxis = list(title = 'Density'))
# length(xC_fMRI_train); dim(ift_ft_xC_fMRI_1sampleN_PhaseB)
# dim(ift_ft_xC_fMRI_1sampleN_PhaseC); dim(ift_ft_xC_fMRI_1sampleN_PhaseD)

# Compute the averages accross all N=30 experiments for the B, C & D reconstructions
ift_ft_xC_fMRI_1sampleN_PhaseB_avg <- apply(ift_ft_xC_fMRI_1sampleN_PhaseB, 1, mean)
ift_ft_xC_fMRI_1sampleN_PhaseC_avg <- apply(ift_ft_xC_fMRI_1sampleN_PhaseC, 1, mean)
ift_ft_xC_fMRI_1sampleN_PhaseD_avg <- apply(ift_ft_xC_fMRI_1sampleN_PhaseD, 1, mean)

# Plot 4 density curves (orig=xC_fMRI and 3 reconstructions from B, C and D)
xC_fMRI_train_scale_df <- as.data.frame(scale(xC_fMRI_train))
colnames(xC_fMRI_train_scale_df) <- "value"; xC_fMRI_train_scale_df$series <- "xC_fMRI_original"

ift_ft_xC_fMRI_1sampleN_PhaseB_avg_scale_df <- as.data.frame(scale(ift_ft_xC_fMRI_1sampleN_PhaseB_avg))
colnames(ift_ft_xC_fMRI_1sampleN_PhaseB_avg_scale_df) <- "value"
ift_ft_xC_fMRI_1sampleN_PhaseB_avg_scale_df$series <- "SK_PhaseB"

ift_ft_xC_fMRI_1sampleN_PhaseC_avg_scale_df <- as.data.frame(scale(ift_ft_xC_fMRI_1sampleN_PhaseC_avg))
colnames(ift_ft_xC_fMRI_1sampleN_PhaseC_avg_scale_df) <- "value"
ift_ft_xC_fMRI_1sampleN_PhaseC_avg_scale_df$series <- "SK_PhaseC"

ift_ft_xC_fMRI_1sampleN_PhaseD_avg_scale_df <- as.data.frame(scale(ift_ft_xC_fMRI_1sampleN_PhaseD_avg))
colnames(ift_ft_xC_fMRI_1sampleN_PhaseD_avg_scale_df) <- "value"
ift_ft_xC_fMRI_1sampleN_PhaseD_avg_scale_df$series <- "SK_PhaseD"

# and combine into your new data frame vegLengths
xC_fMRI_SK_Phases_B_C_D_df <- rbind(xC_fMRI_train_scale_df, ift_ft_xC_fMRI_1sampleN_PhaseB_avg_scale_df, 
             ift_ft_xC_fMRI_1sampleN_PhaseC_avg_scale_df, ift_ft_xC_fMRI_1sampleN_PhaseD_avg_scale_df)

# library(ggplot2)
# ggplot(xC_fMRI_SK_Phases_B_C_D_df, aes(value, fill = series)) + 
#   geom_density(aes(color=series, linetype = series), alpha=0.4, size=1.2) +  # position = "stack"
#   theme(text = element_text(size=20)) +
#   scale_fill_manual( values = c("yellow", "red", "blue", "green")) +
#   geom_line(data=xC_fMRI_train_scale_df, stat = "density", color="purple", lty=4, lwd=2) +
#   ## guides(color = guide_legend(order=1)) +
#   theme(axis.title.x=element_blank(),axis.text.x=element_blank(), axis.ticks.x=element_blank())
#   # theme(legend.position="bottom")

density_xC_fMRI_orig <- density(xC_fMRI_SK_Phases_B_C_D_df[ 
  which(xC_fMRI_SK_Phases_B_C_D_df$series=="xC_fMRI_original"), ]$value)
density_SK_PhaseB <- density(xC_fMRI_SK_Phases_B_C_D_df[ 
  which(xC_fMRI_SK_Phases_B_C_D_df$series=="SK_PhaseB"), ]$value)
density_SK_PhaseC <- density(xC_fMRI_SK_Phases_B_C_D_df[ 
  which(xC_fMRI_SK_Phases_B_C_D_df$series=="SK_PhaseC"), ]$value)
density_SK_PhaseD <- density(xC_fMRI_SK_Phases_B_C_D_df[ 
  which(xC_fMRI_SK_Phases_B_C_D_df$series=="SK_PhaseD"), ]$value)

df_xC_fMRI_orig <- as.data.frame(cbind(
  x=density_xC_fMRI_orig$x, y=density_xC_fMRI_orig$y))
df_SK_PhaseB <- as.data.frame(cbind(
  x=density_SK_PhaseB$x, y=density_SK_PhaseB$y))
df_SK_PhaseC <- as.data.frame(cbind(
  x=density_SK_PhaseC$x, y=density_SK_PhaseC$y))
df_SK_PhaseD <- as.data.frame(cbind(
  x=density_SK_PhaseD$x, y=density_SK_PhaseD$y))

plot_ly(df_xC_fMRI_orig, x = ~x, y = ~y, type = 'scatter', 
        mode = 'lines', name = "xC_fMRI_orig", fill = 'tozeroy') %>% 
  add_trace(x = ~df_SK_PhaseB$x, y = ~df_SK_PhaseB$y, type = 'scatter', 
        mode = 'lines', name = "SK_PhaseB", fill = 'tozeroy') %>% 
  add_trace(x = ~df_SK_PhaseC$x, y = ~df_SK_PhaseC$y, type = 'scatter', 
        mode = 'lines', name = "SK_PhaseC", fill = 'tozeroy') %>% 
  add_trace(x = ~df_SK_PhaseD$x, y = ~df_SK_PhaseD$y, type = 'scatter', 
        mode = 'lines', name = "SK_PhaseD", fill = 'tozeroy') %>% 
  layout(title="Cohort Distribuitions - xC_fMRI_orig, SK_PhaseB, SK_PhaseC and SK_PhaseD", 
         legend = list(orientation = 'h'),
         xaxis = list(title = 'value'), yaxis = list(title = 'Density'))

1.1.1.1 Violin Plot

# ggplot(xC_fMRI_SK_Phases_B_C_D_df,aes(x=series, y=value, fill=series)) + 
#   geom_violin(trim=FALSE) +
#   geom_boxplot(width=0.1) + 
#   theme_bw()

xC_fMRI_SK_Phases_B_C_D_df %>%
  plot_ly(x = ~series, y = ~value , split = ~series, type = 'violin',
    box = list(visible = T), meanline = list(visible = T)) %>%
  layout(xaxis = list(title = "series"), 
         yaxis = list(title = "density", zeroline = F))

1.1.1.2 Empirical CDF

# ggplot(xC_fMRI_SK_Phases_B_C_D_df,aes(x=value, color=series)) +
#   stat_ecdf(size = 0.5)

df <- dplyr::arrange(xC_fMRI_SK_Phases_B_C_D_df, value)

pl <- ggplot(df, aes(x=value, color=series)) +
  stat_ecdf(size = 0.5)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
ggplotly(pl) 

1.1.1.3 Beanplot

boxplot(xC_fMRI_SK_Phases_B_C_D_df$value ~ xC_fMRI_SK_Phases_B_C_D_df$series)
beanplot(xC_fMRI_SK_Phases_B_C_D_df$value ~ xC_fMRI_SK_Phases_B_C_D_df$series,
         border = "#CAB2D6",
         col = c("#CAB2D6", "#33A02C", "#B2DF8A"),
         side="second",add = T)

1.2 IID Simulation

The following simulation example generates two mixture distribution random samples each of \(n=10,000\) observations, \(\{X_{A,i}\}_{i=1}^{n_A}\), where \(X_{A,i} = 0.3U_i + 0.7V_i\), \(U_i \sim N(0,1)\) and \(V_i \sim N(5,3)\), and \(\{X_{B,i}\}_{i=1}^{n_B}\), where \(X_{B,i} = 0.4P_i + 0.6Q_i\), \(P_i \sim N(20,20)\) and \(Q_i \sim N(100,30)\).

n=10000
mu1 <- 0; mu2 <- 5
sig1 <- 1; sig2 <- 3
weight <- 0.7   

mixedDistFunc <- function (n, weight, mu1, mu2, sig1, sig2) {
    set.seed(1234); U <- rnorm(n, mean=mu1, sd = sig1)
    set.seed(1234); V <- rnorm(n,mean=mu2, sd = sig2)
    # randomly choose U or V
    set.seed(1234); wght <- rbinom(n, size=1, prob=weight) 
    X <- U*(1 - wght) + V*wght 
}

xA <- mixedDistFunc(n=n, weight, mu1, mu2, sig1, sig2)
hist(xA, freq = F)

# length(xA)   #  n=10,000

xB <- mixedDistFunc(n=n, 0.6, 20, 100, 20, 30)
hist(xB, freq = F)

# length(xB) 

# Now, combine your two univariate sets into one dataframe for joint hist plotting as densities.  
# First make a new column in each that will be 
# a variable to identify where they came from later.
xA_df <- as.data.frame(xA); colnames(xA_df)<-"value"; xA_df$cohort<-"xA"
xB_df <- as.data.frame(xB); colnames(xB_df)<-"value"; xB_df$cohort<-"xB"

# and combine into your new data frame vegLengths
xA_xB_df <- rbind(xA_df, xB_df)

Figure 5.2

ggplot(xA_xB_df, aes(value, fill = cohort)) + 
  geom_density(alpha = 0.5, size=1.2) +
  theme(text = element_text(size=20))

Clearly the intensities of cohorts \(A\) and \(B\) are independent and follow different distributions. We’ll split the first cohort (\(A\)) into training (\(C\)) and testing (\(D\)) subgroups, and then:

  • transform all four cohorts into Fourier k-space,
  • iteratively randomly sample single observations from cohort \(C\),
  • reconstruct the data into spacetime using a single kime-magnitude value and alternative kime-phase estimates derived from cohorts \(B\), \(C\), and \(D\), and
  • compute and compare the classical spacetime-derived population characteristics of cohort \(A\) to their spacekime counterparts obtained using a single \(C\) kime-magnitude paired with \(B\), \(C\), or \(D\) kime-phases.
# Generic function to Transform 1D Data to/from k-space (Space/Fourier domain)
kSpaceTransform <- function(data, inverse = FALSE, reconPhases = NULL) {
  # ForwardFT (rawData, FALSE, NULL)
  # InverseFT(magnitudes, TRUE, reconPhasesToUse) or InverseFT(FT_data, TRUE, NULL)
  FT_data <- array(complex(), length(data))
  mag_FT_data <- array(complex(), length(data))
  phase_FT_data <- array(complex(), length(data))
  IFT_reconPhases_data <- array(complex(), length(data))

  if (inverse == FALSE | is.null(reconPhases)) {
      FT_data <- fft(data, inverse)
      X2 <- FT_data
      mag_FT_data <- sqrt(Re(X2)^2+Im(X2)^2) 
      phase_FT_data <- atan2(Im(X2), Re(X2)) 
  }
  else {  # for IFT synthesis using user-provided Phases, typically from kime-phase aggregators
      Real <- data * cos(reconPhases)  
      Imaginary <- data * sin(reconPhases) 
      IFT_reconPhases_data <- 
          Re(fft(Real+1i*Imaginary, inverse = TRUE)/length(data))
  }

    ######### Test the FT-IFT analysis-synthesis back-and-forth transform process 
    #         to confirm calculations
    # X2 <- FT_data[ , 1]; mag_FT_data[ , 1] <- sqrt(Re(X2)^2+Im(X2)^2); 
    # phase_FT_data[ , 1] <- atan2(Im(X2), Re(X2)); 
    # Real2 = mag_FT_data[ , 1] * cos(phase_FT_data[ , 1])
    # Imaginary2 = mag_FT_data[ , 1] * sin(phase_FT_data[ , 1])
    # man_hat_X2 = Re(fft(Real2 + 1i*Imaginary2, inverse = T)/length(X2))
    # ifelse(abs(man_hat_X2[5] - data[5, 1]) < 0.001, "Perfect Syntesis", "Problems!!!")
    #########
  
    if (inverse == FALSE | is.null(reconPhases)) {
      return(list("magnitudes"=mag_FT_data, "phases"=phase_FT_data))
      # Use kSpaceTransform$magnitudes & kSpaceTransform$phases to retrieve teh Mags and Phases
    }
    else {
      return(IFT_reconPhases_data)
      # Use Re(kSpaceTransform) to extract spacetime Real-valued reconstructed data
    }
}

# 1. Split the first cohort ($A$) into *training* ($C$) and *testing* ($D$) subgroups.
subset_int <- sample(length(xA_df$value),floor(length(xA_df$value)*0.8))  
# 80% training + 20% testing
xC <- xA_df$value [subset_int]; # length(xC) # 8000
xD <- xA_df$value [-subset_int]; # length(xD)  # 2000

# 2. Transform all four cohorts into Fourier k-space
# xA, xB, xC (train), xD (test)
ft_xA <- fft(xA); ft_xB <- fft(xB)
ft_xC <- fft(xC); ft_xD <- fft(xD)

# Magnitudes and Phases: Phase  <- atan(Im(img_ff)/Re(img_ff))
mag_ft_xA <- sqrt(Re(ft_xA)^2+Im(ft_xA)^2)
mag_ft_xB <- sqrt(Re(ft_xB)^2+Im(ft_xB)^2)
mag_ft_xC <- sqrt(Re(ft_xC)^2+Im(ft_xC)^2)
mag_ft_xD <- sqrt(Re(ft_xD)^2+Im(ft_xD)^2)

phase_ft_xA <- atan2(Im(ft_xA), Re(ft_xA))
phase_ft_xB <- atan2(Im(ft_xB), Re(ft_xB))
phase_ft_xC <- atan2(Im(ft_xC), Re(ft_xC))
phase_ft_xD <- atan2(Im(ft_xD), Re(ft_xD))

# Double-Check FT-IFT==I   ImplicitlyInvert the FT (IFT)
fftinv <- function( x ) { fft( x, inverse=TRUE ) / length( x ) }
# head(Re(fftinv(ft_xA))); head(xA)

# 3. Iteratively randomly sample single observations from cohort $C$,
N <- 30 #  30 simulations
# take a random sample of size N (without replacement) from $C$
set.seed(1234); N_sampleIndx <- sample(1:length(xC), N, replace=FALSE)
xC_sampleN <- xC[N_sampleIndx]
ft_xC_mag <- mag_ft_xC[N_sampleIndx]

# 4. reconstruct the $C$ data into spacetime using a single ft_xC_sampleN_mag value and alternative kime-phase estimates derived from cohorts $B$ and $D$ 
# for each ft_xC_sampleN_mag[i] value, use $B$ and $D$ phases to reconstruct ift_ft_xC_sampleN_PhaseB ift_ft_xC_sampleN_PhaseD
ift_ft_xC_1sampleN_PhaseB <- 
      array(dim=c(length(xC), length(xC_sampleN)))
ift_ft_xC_1sampleN_PhaseD <-
      array(dim=c(length(xC), length(xC_sampleN)))
ift_ft_xC_1sampleN_PhaseC <-
      array(dim=c(length(xC), length(xC_sampleN)))
ift_ft_xC <- array(dim=length(xC))
# dim(ift_ft_xC_1sampleN_PhaseB) # [1] Size=8000 Samples_N=30

for (i in 1:N) {
  ift_ft_xC_1sampleN_PhaseB[ , i] <- 
    Re(kSpaceTransform(rep(ft_xC_mag[i], length(xC)),
                       TRUE, phase_ft_xB[1:length(xC)]))
  ift_ft_xC_1sampleN_PhaseD[ , i] <- 
    Re(kSpaceTransform(rep(ft_xC_mag[i], length(phase_ft_xD)), 
                       TRUE, phase_ft_xD[1:length(phase_ft_xD)]))
  ift_ft_xC_1sampleN_PhaseC[ , i] <- 
    Re(kSpaceTransform(rep(ft_xC_mag[i], length(xC)),
                       TRUE, phase_ft_xC[1:length(xC)]))
}
ift_ft_xC <- Re(kSpaceTransform(mag_ft_xC, TRUE,phase_ft_xC[1:length(xC)]))
# head(xC) - head(ift_ft_xC) # roundoff should be ~ 0

# 5.  compute and compare the *classical spacetime-derived* population characteristics of cohort $A$ to their counterparts obtained using a single $C$ kime-radial measurements paired with $B$ and $D$ kime-phases.
# Data = xC_train, ift_ft_xC_1sampleN_PhaseB, ift_ft_xC_1sampleN_PhaseD
# length(xC) == length(ift_ft_xC_1sampleN_PhaseB[ , 1])

summary(scale(xC))
##        V1          
##  Min.   :-2.38784  
##  1st Qu.:-0.88609  
##  Median :-0.03893  
##  Mean   : 0.00000  
##  3rd Qu.: 0.75821  
##  Max.   : 3.59925
summary(scale(ift_ft_xC_1sampleN_PhaseC[ , 15]))
##        V1          
##  Min.   :-2.52901  
##  1st Qu.:-0.76221  
##  Median :-0.05584  
##  Mean   : 0.00000  
##  3rd Qu.: 0.72999  
##  Max.   : 3.73114
summary(scale(ift_ft_xC_1sampleN_PhaseB[ , 15]))
##        V1           
##  Min.   :-3.798440  
##  1st Qu.:-0.636799  
##  Median : 0.009279  
##  Mean   : 0.000000  
##  3rd Qu.: 0.645119  
##  Max.   : 3.986702
summary(scale(ift_ft_xC_1sampleN_PhaseD[ , 15]))
##        V1          
##  Min.   :-2.66007  
##  1st Qu.:-0.79651  
##  Median :-0.08165  
##  Mean   : 0.00000  
##  3rd Qu.: 0.73477  
##  Max.   : 3.39448

1.2.1 Data Visualization

# Plot all histograms as densities
ift_ft_xC_1sampleN_PhaseC_df <- 
  as.data.frame(scale(ift_ft_xC_1sampleN_PhaseC))
# colnames(xA_df) <- "value"; xA_df$cohort <- "xA"
#  xA_scale_df <- as.data.frame(scale(xA_df$value))
#  colnames(xA_scale_df) <- "value"; xA_scale_df$cohort <- "xA"
xB_df <- as.data.frame(scale(xB))
colnames(xB_df) <- "value"; xB_df$cohort <- "xB"

# 
# and combine into your new data frame Lengths
xA_xB_df <- rbind(xA_df, xB_df)


ggplot(xA_xB_df, aes(value, fill = cohort)) + 
  geom_density(alpha = 0.5, size=1.2) +
  theme(text = element_text(size=20))

# length(xC); dim(ift_ft_xC_1sampleN_PhaseB)
# dim(ift_ft_xC_1sampleN_PhaseC); dim(ift_ft_xC_1sampleN_PhaseD)

# Compute the averages accross all N=30 experiments for the B, C & D spacekime (IFT) reconstructions
ift_ft_xC_1sampleN_PhaseB_avg <- apply(ift_ft_xC_1sampleN_PhaseB, 1, mean)
ift_ft_xC_1sampleN_PhaseC_avg <- apply(ift_ft_xC_1sampleN_PhaseC, 1, mean)
ift_ft_xC_1sampleN_PhaseD_avg <- apply(ift_ft_xC_1sampleN_PhaseD, 1, mean)

# Plot 4 density curves (orig=xC and 3 reconstructions from B, C and D)
xC_scale_df <- as.data.frame(xC)
colnames(xC_scale_df) <- "value"; xC_scale_df$series <- "xC_original"

ift_ft_xC_1sampleN_PhaseB_avg_scale_df <- 
    as.data.frame(ift_ft_xC_1sampleN_PhaseB_avg)
colnames(ift_ft_xC_1sampleN_PhaseB_avg_scale_df) <- "value"
ift_ft_xC_1sampleN_PhaseB_avg_scale_df$series <- "SK_PhaseB"

ift_ft_xC_1sampleN_PhaseC_avg_scale_df <- 
    as.data.frame(ift_ft_xC_1sampleN_PhaseC_avg)
colnames(ift_ft_xC_1sampleN_PhaseC_avg_scale_df) <- "value"
ift_ft_xC_1sampleN_PhaseC_avg_scale_df$series <- "SK_PhaseC"

ift_ft_xC_1sampleN_PhaseD_avg_scale_df <- 
      as.data.frame(ift_ft_xC_1sampleN_PhaseD_avg)
colnames(ift_ft_xC_1sampleN_PhaseD_avg_scale_df) <- "value"
ift_ft_xC_1sampleN_PhaseD_avg_scale_df$series <- "SK_PhaseD"

# Combine into a new data frame xC_SK_Phases_B_C_D_df
xC_SK_Phases_B_C_D_df <- rbind(xC_scale_df, ift_ft_xC_1sampleN_PhaseB_avg_scale_df, 
             ift_ft_xC_1sampleN_PhaseC_avg_scale_df, ift_ft_xC_1sampleN_PhaseD_avg_scale_df)

# library(ggplot2)
ggplot(xC_SK_Phases_B_C_D_df, aes(value, fill = series)) + 
  geom_density(aes(color=series, linetype = series), alpha=0.4, size=1.2) +  # position = "stack"
  theme(text = element_text(size=20)) +
  scale_fill_manual( values = c("yellow", "red", "blue", "green")) +
  geom_line(data=xC_scale_df, stat = "density", color="purple", lty=4, lwd=2) +
  ## guides(color = guide_legend(order=1)) +
  theme(axis.title.x=element_blank(),axis.text.x=element_blank(), axis.ticks.x=element_blank()) # +

  # xlim(c(-1, 1))
  # theme(legend.position="bottom")

#ggplot(xC_SK_Phases_B_C_D_df, aes(value, fill = series)) + 
#  theme(text = element_text(size=20)) +
#  scale_fill_manual( values = c("yellow", "red", "blue", "green")) +
#  geom_line(stat = "density", lty=4, lwd=2) +
#  ## guides(color = guide_legend(order=1)) +
#  theme(axis.title.x=element_blank(),axis.text.x=element_blank(), #axis.ticks.x=element_blank())

ggplot(xC_SK_Phases_B_C_D_df, aes(value, fill = series)) + 
     geom_line(aes(colour=series, lty=series), stat = "density", lwd=1.5) +
  theme(axis.title.x=element_blank(),axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

# Are the xC (original training) and ift_ft_xC_1sampleN_PhaseC_avg (reconstructed) CORRELATED?
# cor(xC, ift_ft_xC_1sampleN_PhaseC_avg)  # [1] 0.8872053
# Are the xC (original training) and ift_ft_xC_1sampleN_PhaseD_avg (reconstructed) CORRELATED?
# cor(xC, ift_ft_xC_1sampleN_PhaseD_avg)  # [1] 0.005248561
# Are the xC (original training) and ift_ft_xC_1sampleN_PhaseB_avg (reconstructed) CORRELATED?
# cor(xC, ift_ft_xC_1sampleN_PhaseB_avg)  # [1] -0.001070121

Figure 5.3

# Plot xC vs. ift_ft_xC_1sampleN_PhaseC_avg
plot(xC, ift_ft_xC_1sampleN_PhaseC_avg, xlab = "Original", ylab = "Reconstructed",
     main = "Spacekime signal reconstruction using \n a single spacetime observation and perfect kime-phases",
     cex=1.4)
abline(lm(ift_ft_xC_1sampleN_PhaseC_avg ~ xC), col="red", lwd=2)
text(0, 9, sprintf("Corr(Orig, Rec)=%s", 
                    format(cor(scale(ift_ft_xC_1sampleN_PhaseC_avg), scale(xC)), digits=2)))

#Additional quantitative measures quantifying the differences between distribution (original signal and spacekime reconstructions) include two-sample Kolmogorov-Smirnov (KS) test and the correlation coefficient. The KS test statistic (D) is the maximum distance between the estimated cumulative distribution functions and the corresponding p-value is the probability of seeing a test statistic as high or higher than the one observed given that the two samples were drawn from the same distribution.  In our case, comparing the distributions of the original data and its reconstruction using a single kime magnitude and the correct kime-phases yields a KS statistics D = 0.053875, and p_value= 1.647?e^(-10). This suggests very strong statistical evidence (due to the large sample size) and marginal practical difference between the real and reconstructed signals. Comparing a pair of reconstructions using a single kime-magnitude value and two independent kime-phase estimates (cohorts C and D) yields D = 0.017375, and p_value= 0.1786.

ks.test(scale(xC), scale(ift_ft_xC_1sampleN_PhaseC_avg))
## 
##  Asymptotic two-sample Kolmogorov-Smirnov test
## 
## data:  scale(xC) and scale(ift_ft_xC_1sampleN_PhaseC_avg)
## D = 0.05775, p-value = 5.174e-12
## alternative hypothesis: two-sided
#      D = 0.053875, p-value = 1.647e-10, alternative hypothesis: two-sided
ks.test(scale(ift_ft_xC_1sampleN_PhaseC_avg), scale(ift_ft_xC_1sampleN_PhaseD_avg))
## 
##  Asymptotic two-sample Kolmogorov-Smirnov test
## 
## data:  scale(ift_ft_xC_1sampleN_PhaseC_avg) and scale(ift_ft_xC_1sampleN_PhaseD_avg)
## D = 0.014625, p-value = 0.3592
## alternative hypothesis: two-sided
#      D = 0.017375, p-value = 0.1786, alternative hypothesis: two-sided
# wilcox.test(scale(xC), scale(ift_ft_xC_1sampleN_PhaseC_avg)) # insignificant because of scaling
#null_distribution <- function (x) {
#  if (!is.na(approxfun(density(scale(xC)))(x))) {
#    return (approxfun(density(scale(xC)))(x))
#  } else { return (0) }
#}
# Cramer-Von Mises Test of Goodness-of-Fit
#cvm.test(scale(ift_ft_xC_1sampleN_PhaseC_avg), null_distribution) # omega2 = 1854.8, p-value = 0.2955
# Anderson-Darling Test of Goodness-of-Fit
#ad.test(scale(ift_ft_xC_1sampleN_PhaseC_avg), null_distribution) # 

# Anderson-Darling Test of Goodness-of-Fit

ad.test(scale(ift_ft_xC_1sampleN_PhaseC_avg), scale(xC)) # T.AD=19.18; asympt. P-value=8.578e-09
## 
## 
##  Anderson-Darling k-sample test.
## 
## Number of samples:  2
## Sample sizes:  8000, 8000
## Number of ties: 0
## 
## Mean of  Anderson-Darling  Criterion: 1
## Standard deviation of  Anderson-Darling  Criterion: 0.76131
## 
## T.AD = ( Anderson-Darling  Criterion - mean)/sigma
## 
## Null Hypothesis: All samples come from a common population.
## 
##               AD  T.AD  asympt. P-value
## version 1: 16.18 19.93        4.168e-09
## version 2: 16.20 19.94        4.291e-09

Figure 5.4

plot(density(scale(xC)), col="black", lty=1, lwd=3, xaxt="n", xlab = "Range",
     main="Spacetime Original vs. Spacekime (SK) Reconstructed Distributions")
lines(density(scale(ift_ft_xC_1sampleN_PhaseC_avg)), col="green", lwd=2, lty=2)
lines(density(scale(ift_ft_xC_1sampleN_PhaseD_avg)), col="blue", lwd=2, lty=10)
lines(density(scale(ift_ft_xC_1sampleN_PhaseB_avg)), col="red", lwd=2, lty=20)
legend("topright", legend=c("Original (Mixture of N(0,1) & N(5,3))", "SK Synthesis (1 Mag, Phase=True)",
      "SK Synthesis (1 Mag, Phase=Indep)", "SK Synthesis (1 Mag, Phase=Diff.Proc.)",
      paste0(sprintf("\nKS.test(Original, SK(1Mag,Phase=True)), D=%s, p=%s\n",
                    format(ks.test(scale(xC), scale(ift_ft_xC_1sampleN_PhaseC_avg))$statistic[[1]], digits=2),
                    format(ks.test(scale(xC), scale(ift_ft_xC_1sampleN_PhaseC_avg))$p.value, digits=2)),
          sprintf("KS.test(SK(1Mag,Phase=Indep), SK(1Mag,Phase=True)), p=%s\n",
                    format(ks.test(scale(ift_ft_xC_1sampleN_PhaseD_avg),
                                   scale(ift_ft_xC_1sampleN_PhaseC_avg))$p.value, digits=2)),
          sprintf("Corr(Original, SK(1Mag,Phase=True))=%s", 
                    format(cor(scale(ift_ft_xC_1sampleN_PhaseC_avg), scale(xC)), digits=2)))),
       col=c("black", "green", "blue", "red", "black"), lty=c(1, 2, 10, 20, 0), lwd=2, bty = "n", cex=0.75,
       y.intersp=0.0, title="Scaled Densities")

# Skewness
# e1071::skewness(xC); e1071::skewness(ift_ft_xC_1sampleN_PhaseC_avg); e1071::skewness(ift_ft_xC_1sampleN_PhaseB_avg); e1071::skewness(ift_ft_xC_1sampleN_PhaseD_avg)

# Kurtosis
# e1071::kurtosis(xC); e1071::kurtosis(ift_ft_xC_1sampleN_PhaseC_avg); e1071::kurtosis(ift_ft_xC_1sampleN_PhaseB_avg); e1071::kurtosis(ift_ft_xC_1sampleN_PhaseD_avg)
raw_data <- 
  data.frame(Original=scale(xC),
             Phase_True=scale(ift_ft_xC_1sampleN_PhaseB_avg),
             Phase_Diff.Proc=scale(ift_ft_xC_1sampleN_PhaseC_avg),
             Phase_Indep=scale(ift_ft_xC_1sampleN_PhaseD_avg))

1.2.1.1 Violin Plot

melted_data = melt(raw_data)
## No id variables; using all as measure variables
ggplot(melted_data,aes(x=variable, y=value, fill=variable)) + 
  geom_violin(trim=FALSE) +
  geom_boxplot(width=0.1) + 
  theme_bw()

1.2.1.2 Empirical CDF

ggplot(melted_data,aes(x=value, color=variable)) +
  stat_ecdf(size = 0.5)

1.2.1.3 Beanplot

boxplot(melted_data$value ~ melted_data$variable)
beanplot(melted_data$value ~ melted_data$variable,
         border = "#CAB2D6",
         col = c("#CAB2D6", "#33A02C", "#B2DF8A"),
         side="second", add = T)

1.3 Bayesian Inference Simulation

Here we relate to Bayesian inference to Spacekime analytics based on a single (cohort (\(A\)) spacetime observation \(x_{i_o}\) and some prior kime-phases (obtained from cohorts \(A\), \(B\), or \(C\)). This is accomplished by computing the prior or posterior predictive distribution.

1.3.1 Canned academic example (using the cars dataset)

#install.packages("rstanarm")
#library("rstanarm")

# Docs: https://rdrr.io/cran/rstanarm/man/posterior_predict.stanreg.html

# 1. Canned example
# if (!exists("example_model")) example(example_model)
# yrep <- posterior_predict(example_model)
# table(yrep)

# 2. Example using sample data (n=10): counts ~ outcome + treatment
# counts <- c(18,17,15,20,10,20,25,13,12,15)
# outcome <- gl(5,2,10)
# treatment <- gl(2,5)
# model_fit <- stan_glm(counts ~ outcome + treatment, 
#                  family = poisson(link="log"),
#                  prior = normal(0, 1), prior_intercept = normal(0, 5))
# new_data <- data.frame(treatment = factor(rep(1,5)), outcome = factor(1:5))
# Draw from the posterior predictive distribution of the outcome.
# ytilde <- posterior_predict(model_fit, new_data, draws = 500)
# print(dim(ytilde))  # 500 by 5 matrix (draws by nrow(new_data))
# ytilde <- data.frame(count = c(ytilde),
#                      outcome = rep(new_data$outcome, each = 500))
# ggplot2::ggplot(ytilde, ggplot2::aes(x=outcome, y=count)) +
#   ggplot2::geom_boxplot() +
#   ggplot2::ylab("predicted count")
# ytilde <- posterior_predict(model_fit, draws = 500)
# bayesplot::color_scheme_set("brightblue")
# bayesplot::ppc_dens_overlay(counts, ytilde[1:100, ])

# Using the CARS data (mpg ~ wt)
mtcars2 <- mtcars   # dim(mtcars2) # [1] 32(Automakers) 11(Features)
# mtcars2$log_mpg <- log(mtcars2$mpg)   # Define outcome of interest
model_fit <- stan_glm(mpg ~ wt, data = mtcars2)
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 3.2e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.32 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.032 seconds (Warm-up)
## Chain 1:                0.028 seconds (Sampling)
## Chain 1:                0.06 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 8e-06 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.032 seconds (Warm-up)
## Chain 2:                0.028 seconds (Sampling)
## Chain 2:                0.06 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 8e-06 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.032 seconds (Warm-up)
## Chain 3:                0.027 seconds (Sampling)
## Chain 3:                0.059 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 8e-06 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 0.03 seconds (Warm-up)
## Chain 4:                0.032 seconds (Sampling)
## Chain 4:                0.062 seconds (Total)
## Chain 4:
# Get the posterior predictive distribution as a matrix:
# D(number of draws posterior predictive distribution)=500 by 
# N(number of data points being predicted per draw)
ytilde <- posterior_predict(model_fit, draws = 500)
# dim(ytilde) # Outcome=mpg, [1] 500(MCMC draws)  32(automakers)
head(ytilde)
##      Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout
## [1,]  26.57814      19.28906   21.56788       16.66677          20.96405
## [2,]  26.17666      18.55600   25.05435       21.89238          24.38512
## [3,]  23.12718      19.10766   21.75938       19.86292          24.29539
## [4,]  27.77261      27.98705   25.03451       19.56903          16.11836
## [5,]  20.81131      22.28745   19.05529       18.59521          21.42627
## [6,]  21.35933      27.77400   29.76848       22.31773          20.36743
##       Valiant Duster 360 Merc 240D Merc 230 Merc 280 Merc 280C Merc 450SE
## [1,] 17.20666   13.90925  13.59989 21.81288 19.19456  17.79067   19.91971
## [2,] 18.69859   21.48694  18.76814 18.35035 20.21671  19.84562   15.66679
## [3,] 11.70067   16.46605  19.62225 19.35699 15.33518  13.49081   19.11797
## [4,] 19.91092   11.96854  18.10896 20.19260 21.07821  17.28403   13.42928
## [5,] 20.15630   20.46428  11.16570 23.09031 15.24620  21.55653   12.50579
## [6,] 19.58688   16.88209  27.68756 20.70701 16.31986  16.19064   12.91713
##      Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental
## [1,]   16.23999    13.83903           8.388953           10.173952
## [2,]   19.03287    18.40036          11.483816           14.499870
## [3,]   14.25349    11.85542          14.379122            7.610252
## [4,]   14.47135    21.35191          13.291918            7.569892
## [5,]   13.40928    12.24925          12.579192            9.875078
## [6,]   19.89516    17.88470          10.521398           14.824445
##      Chrysler Imperial Fiat 128 Honda Civic Toyota Corolla Toyota Corona
## [1,]          5.929918 25.80967    31.52660       26.38044      23.61710
## [2,]         12.073042 25.26523    33.23144       32.99065      20.16215
## [3,]         12.195849 26.25890    31.13767       29.07340      18.72665
## [4,]          6.852154 24.73502    29.75136       27.30006      24.26506
## [5,]         10.391211 32.13723    26.77398       30.29637      20.54983
## [6,]         12.554134 26.90141    21.39055       26.91797      21.98639
##      Dodge Challenger AMC Javelin Camaro Z28 Pontiac Firebird Fiat X1-9
## [1,]         17.66693    20.62394   15.21254         19.78452  22.64817
## [2,]         21.82785    21.39405   17.10364         15.82495  30.52120
## [3,]         18.19218    18.93431   16.63858         17.22821  22.30362
## [4,]         21.86567    17.45261   19.18302         19.99441  32.16494
## [5,]         19.40845    12.78614   13.94972         14.58452  28.71103
## [6,]         18.53164    16.98536   18.09791         16.23752  23.98330
##      Porsche 914-2 Lotus Europa Ford Pantera L Ferrari Dino Maserati Bora
## [1,]      25.84653     29.29596       21.22268     20.17266      16.24216
## [2,]      27.11637     32.79159       18.64201     23.84569      16.08446
## [3,]      22.44502     29.41379       22.56831     21.01964      14.86345
## [4,]      30.88321     26.51899       20.68652     28.58005      24.75199
## [5,]      23.79990     28.54900       19.87864     23.63179      17.10681
## [6,]      24.51688     30.42253       10.21943     20.00487      14.55974
##      Volvo 142E
## [1,]   20.61542
## [2,]   23.08619
## [3,]   24.25682
## [4,]   20.72706
## [5,]   18.36442
## [6,]   17.87471
bayesplot::color_scheme_set("brightblue")
bayesplot::ppc_dens_overlay(mtcars2$mpg, ytilde[1:100, ])

bayesplot::ppc_hist(mtcars2$mpg, ytilde[1:100, ])
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Plot a histogram of the distribution of various test statistics (e.g., mean, sd) across MCMC draws.
# The distributions are computed by applying the statistics to each dataset (draw=row) in *ytilde*.
# The blue vertical line overlays the value of the same statistic in the observed data, stat(y=mtcars2$mpg)
bayesplot::ppc_stat(mtcars2$mpg, ytilde[1:100, ], stat = "mean")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

bayesplot::ppc_stat(mtcars2$mpg, ytilde[1:100, ], stat = "sd")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#bayesplot::ppc_stat(mtcars2$mpg, ytilde[1:100, ], stat = "range")
bayesplot::ppc_stat(mtcars2$mpg, ytilde[1:100, ], stat = "min")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

bayesplot::ppc_stat(mtcars2$mpg, ytilde[1:100, ], stat = "max")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# 2D (mean, sd) plot
bayesplot::ppc_stat_2d(mtcars2$mpg, ytilde[1:100, ], stat=c("mean", "sd"), size=2.5, alpha=0.7)

1.3.2 Example of spacekime-analytics as a Bayesian Inference problem (using the fMRI dataset)

Recall that we have:

  • Original data ift_ft_xC_fMRI
  • IFT of the data using C-Phases ift_ft_xC_fMRI_1sampleN_PhaseC for all samples of \(N=30\)
  • IFT of the data using B-Phases ift_ft_xC_fMRI_1sampleN_PhaseB for all samples of \(N=30\), and
  • IFT of the data using D-Phases ift_ft_xC_fMRI_1sampleN_PhaseD for all samples of \(N=30\)
#install.packages("rstanarm")
#library("rstanarm")

# xC_fMRI_scaled <- scale(xC_fMRI)[ , 1]
# length(ift_ft_xC_fMRI)    # [1] 8000



# dim(ift_ft_xC_fMRI_1sampleN_PhaseC)  # [1] 8000   30
# dim(ift_ft_xC_fMRI_1sampleN_PhaseB) 
# dim(ift_ft_xC_fMRI_1sampleN_PhaseD) 

# Get the posterior predictive distribution as a matrix:
# D(number of draws posterior predictive distribution)=30 by 
# N(number of data time points being predicted per draw)=8000
# Inspect the first few spacekime reconstructed draws: Mind the transposition of the tensor!
# t(ift_ft_xC_fMRI_1sampleN_PhaseC)[ , 1:7]  # 30(draws)  7(first timepoints ony) 
# the spacekime estimates represent the posterior predictive distribution
# ytilde <- t(ift_ft_xC_fMRI_1sampleN_PhaseC + mean(xC_fMRI))  # all 8000 timepoints (columns)
# Center/Offset the spacekime estimates to the center of the original data (xC_fMRI)
ytilde <- rbind(t(ift_ft_xC_fMRI_1sampleN_PhaseC+mean(ift_ft_xC_fMRI)-apply(ift_ft_xC_fMRI_1sampleN_PhaseC,2,mean)),
                t(ift_ft_xC_fMRI_1sampleN_PhaseB+mean(ift_ft_xC_fMRI)-apply(ift_ft_xC_fMRI_1sampleN_PhaseB,2,mean)),
                t(ift_ft_xC_fMRI_1sampleN_PhaseD+mean(ift_ft_xC_fMRI)-apply(ift_ft_xC_fMRI_1sampleN_PhaseD,2,mean))) 
# dim(ytilde) # Outcome=bimodal process (xC_fMRI), [1] 30(MCMC draws)  8000(timepoints)

bayesplot::color_scheme_set("brightblue")
bayesplot::ppc_dens_overlay(ift_ft_xC_fMRI, ytilde) + xlim(10300,10800) + ylim(0,0.01)
## Warning: Removed 843 rows containing non-finite values (`stat_density()`).
## Warning: Removed 2 rows containing non-finite values (`stat_density()`).

bayesplot::ppc_hist(ift_ft_xC_fMRI, ytilde[1:11, ])
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Plot a histogram of the distribution of various test statistics (e.g., mean, sd) across MCMC draws.
# The distributions are computed by applying the statistics to each dataset (draw=row) in *ytilde*.
# The blue vertical line overlays the value of the same statistic in the observed data, stat(y=mtcars2$mpg)
bayesplot::ppc_stat(ift_ft_xC_fMRI, ytilde, stat = "mean")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

bayesplot::ppc_stat(ift_ft_xC_fMRI, ytilde, stat = "sd")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

bayesplot::ppc_stat(ift_ft_xC_fMRI, ytilde, stat = IQR)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

bayesplot::ppc_stat(ift_ft_xC_fMRI, ytilde, stat = "min")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

bayesplot::ppc_stat(ift_ft_xC_fMRI, ytilde, stat = "max")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# bayesplot::ppc_stat(xC_fMRI_scaled, ytilde, stat = e1071::skewness)
# bayesplot::ppc_stat(xC_fMRI_scaled, ytilde, stat = e1071::kurtosis)
# 2D (mean, sd) plot
bayesplot::ppc_stat_2d(ift_ft_xC_fMRI, ytilde, stat=c("mean", "sd"), size=2.5, alpha=0.7)

1.3.3 Example of spacekime-analytics as a Bayesian Inference problem (using the bimodal simulated dataset)

Recall that we have:

  • Original data xC
  • IFT of the data using C-Phases ift_ft_xC_1sampleN_PhaseC for all samples of \(N=30\)
  • IFT of the data using B-Phases ift_ft_xC_1sampleN_PhaseB for all samples of \(N=30\), and
  • IFT of the data using D-Phases ift_ft_xC_1sampleN_PhaseD for all samples of \(N=30\)
# xC_scaled <- scale(xC)[ , 1]
# length(xC)                      # [1] 8000
# dim(ift_ft_xC_1sampleN_PhaseC)  # [1] 8000   30
# dim(ift_ft_xC_1sampleN_PhaseB) 
# dim(ift_ft_xC_1sampleN_PhaseD) 

# Get the posterior predictive distribution as a matrix:
# D(number of draws posterior predictive distribution)=30 by 
# N(number of data time points being predicted per draw)=8000
# Inspect the first few spacekime reconstructed draws: Mind the transposition of the tensor!
# t(ift_ft_xC_1sampleN_PhaseC)[ , 1:7]  # 30(draws)  7(first timepoints ony) 
# the spacekime estimates represent the posterior predictive distribution
# ytilde <- t(ift_ft_xC_1sampleN_PhaseC + mean(xC))  # all 8000 timepoints (columns)
# Center/Offset the spacekime estimates to the center of the original data (xC)
ytilde <- rbind(t(ift_ft_xC_1sampleN_PhaseC+mean(xC)-apply(ift_ft_xC_1sampleN_PhaseC,2,mean)),
                t(ift_ft_xC_1sampleN_PhaseB+mean(xC)-apply(ift_ft_xC_1sampleN_PhaseB,2,mean)),
                t(ift_ft_xC_1sampleN_PhaseD+mean(xC)-apply(ift_ft_xC_1sampleN_PhaseD,2,mean))) 
# dim(ytilde) # Outcome=bimodal process (xC), [1] 30(MCMC draws)  8000(timepoints)

bayesplot::color_scheme_set("brightblue")

Figure 5.5

bayesplot::ppc_dens_overlay(xC, ytilde) + xlim(-22,25) + ylim(0,0.3)

bayesplot::ppc_hist(xC, ytilde[1:11, ])
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Plot a histogram of the distribution of various test statistics (e.g., mean, sd) across MCMC draws.
# The distributions are computed by applying the statistics to each dataset (draw=row) in *ytilde*.
# The blue vertical line overlays the value of the same statistic in the observed data, stat(y=mtcars2$mpg)
bayesplot::ppc_stat(xC, ytilde, stat = "mean")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

bayesplot::ppc_stat(xC, ytilde, stat = "sd")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

bayesplot::ppc_stat(xC, ytilde, stat = IQR)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

bayesplot::ppc_stat(xC, ytilde, stat = "min")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

bayesplot::ppc_stat(xC, ytilde, stat = "max")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# bayesplot::ppc_stat(xC_scaled, ytilde, stat = e1071::skewness)
# bayesplot::ppc_stat(xC_scaled, ytilde, stat = e1071::kurtosis)
# 2D (mean, sd) plot
bayesplot::ppc_stat_2d(xC, ytilde, stat=c("mean", "sd"), size=2.5, alpha=0.7)

For the MCSI data analytics, see the separate script: UM_Michigan_Consumer_Sentiment_Index.R.

SOCR Resource Visitor number Web Analytics SOCR Email