This SOCR TCIU/Spacekime Analytics Rmd notebook
implements V.5 of the kime-phase tomography (KPT)
algorithm.
V.5 improves the previous versions by utilizing
The unified V.5 presents the mathematical
theory, includes core helper functions, e.g., von Mises
sampling, KDE, PCA initialization. The simulation framework uses
time-varying harmonics for identifiability, where both algorithms
(KPT-GEM and KPT-FFT) have consistent interfaces, prediction functions,
comprehensive validation protocols, and respond to real Kepler data and
simulated data using Uniform (non-informative) phase distributions and
von Mises (concentrated) phase distributions.
Kime-Phase Tomography (KPT) is a technique for recovering latent phase structure from repeated measurements of longitudinal processes. The core idea is that each independent realization (e.g., a star, a subject) has an unobserved phase \(\theta_i \in [0, 2\pi)\) that governs its observed trajectory.
The observation model is \[Y_{i,k} = S(t_k, \theta_i) + \varepsilon_{i,k}, \quad \varepsilon_{i,k} \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)\]
where \(Y_{i,k}\) is the observation for realization \(i\) at time window \(k\), \(\theta_i \sim \varphi(\theta) \, d\theta / (2\pi)\) is the latent phase for realization \(i\), \(S(t_k, \theta)\) is the kime-surface describing phase-dependent patterns over time, and \(\varphi(\theta)\) is the phase distribution (phase law) to be recovered.
The KPT model is invariant under the joint rotation \(S(t, \theta) \mapsto S(t, \theta + \alpha)\) and \(\varphi(\theta) \mapsto \varphi(\theta - \alpha)\). To enforce identifiability
We aim to test whether KPT can discover latent phase structure in repeated measurements of stellar brightness variability.
Kepler provides an ideal testbed: each star is an independent realization of the “stellar variability process,” observed over the same time period with identical instrumentation.
This validation with Kepler stellar light curves tests whether Kime-Phase Tomography (KPT) can discover latent phase structure in repeated measurements of a stochastic astrophysical process—stellar brightness variability. KPT requires data where multiple independent realizations of a longitudinal process are observed. Kepler provides an ideal testbed: each star is an independent realization of the “stellar variability process,” observed over the same time period with identical instrumentation.
Data Source
Mission: NASA Kepler Space Telescope (2009–2013)
Archive: Mikulski Archive for Space Telescopes (MAST)
Product: Pre-search Data Conditioning Simple Aperture Photometry (PDCSAP) flux.
Sample Selection:
Data Dictionary
| Variable | Dimension | Description |
|---|---|---|
Y |
\(I \times K\) | Standardized variability matrix; rows = stars, columns = time windows |
Y_raw |
\(I \times K\) | Raw variance of flux within each window (\(ppm^2\)) |
t_center |
\(K\) | Window center times (days from start) |
n_stars |
\(1\) | Number of stellar realizations (\(I = 75\)) |
n_windows |
\(1\) | Number of temporal windows (\(K = 28\)) |
metadata |
\(I \times 10\) | Stellar properties from Kepler Input Catalog |
Metadata fields: kepid (Kepler ID),
ra, dec (J2000 coordinates), teff
(effective temperature, K), logg (surface gravity, dex),
feh (metallicity, dex), kepmag (Kepler
magnitude), radius (R☉), mass (M☉),
star_id (index).
The KPT data structure mapping is shown below.
| KPT Concept | Kepler Implementation |
|---|---|
| Realization index \(i\) | Star \(i\) (independent stellar system) |
| Time window \(k\) | \(5\)-day binned interval |
| Observation \(Y_{i\ k}\) | Standardized flux variance of star \(i\) in window \(k\) |
| Kime-phase \(\theta_i\) | Latent stellar “activity phase” or rotation phase |
| Phase distribution \(\varphi(\theta)\) | Distribution of stellar phases across the sample |
| Kime-surface \(\mathbb{S}(\theta, t)\) | Phase-dependent variability pattern |
Null Hypothesis (\(H_o\)): Stellar variability is phase-independent. In other words, \(\varphi(\theta)=\) uniform (all phases equally likely), \(\mathbb{S}(\theta, t)=\) constant (no \(\theta\)-dependence), and the optimal model is \(Y_{i k} \sim N(\mu, \sigma^2)\).
Alternative Hypothesis (\(H_1\)): Stellar variability depends on a latent phase, i.e., \(\varphi(\theta)\not=\) uniform (certain phases more prevalent), \(\mathbb{S}(\theta, t)\) varies with \(\theta\) (phase-dependent patterns exist), and KPT provides improved predictions beyond the additive noise, simple Gaussian model.
Null Hypothesis (\(H_o\)): Stellar variability is phase-independent
Alternative Hypothesis (\(H_1\)): Stellar variability depends on a latent phase:
| Variable | Dimension | Description |
|---|---|---|
Y |
\(I \times K\) | Standardized variability matrix; rows = stars, columns = time windows |
Y_raw |
\(I \times K\) | Raw variance of flux within each window (\(ppm^2\)) |
t_center |
\(K\) | Window center times (days from start) |
n_stars |
\(1\) | Number of stellar realizations (\(I\)) |
n_windows |
\(1\) | Number of temporal windows (\(K\)) |
metadata |
\(I \times p\) | Stellar properties from Kepler Input Catalog |
| Metric | \(H_0\) (No Structure) | \(H_1\) (Structure Detected) |
|---|---|---|
| \(\varphi\) deviation from uniform | \(< 0.3\) | \(> 0.5\) |
| \(S\) average \(\theta\)-variation | \(< 0.1\) | \(> 0.2\) |
| KPT vs Gaussian RMSE | Equal | KPT lower |
| 95% PI coverage | \(\sim 95\%\) | \(\sim 95\%\) |
# Core packages
library(tidyverse)
library(httr)
library(jsonlite)
# Visualization
library(viridis)
# For von Mises sampling (optional, with fallback)
vonmises_available <- requireNamespace("circular", quietly = TRUE) ||
requireNamespace("CircStats", quietly = TRUE)
cat("═══════════════════════════════════════════════════════════════\n")## ═══════════════════════════════════════════════════════════════
## KPT VALIDATION ENVIRONMENT
## ═══════════════════════════════════════════════════════════════
## R version: R version 4.3.3 (2024-02-29 ucrt)
## Circular statistics available: TRUE
## ═══════════════════════════════════════════════════════════════
These helper functions are shared by both KPT-GEM and KPT-FFT algorithms.
# ==============================================================================
# HELPER FUNCTIONS FOR KPT ALGORITHMS
# ==============================================================================
#' Generate uniform theta grid on [0, 2π)
#' @param L Number of grid points
theta_grid <- function(L) {
seq(0, 2*pi, length.out = L + 1)[-(L + 1)]
}
#' Sample from von Mises distribution (with fallback)
#' @param n Number of samples
#' @param mu Mean direction (radians)
#' @param kappa Concentration parameter
rvonmises <- function(n, mu = 0, kappa = 1) {
if (kappa < 1e-6) return(runif(n, 0, 2*pi))
if (requireNamespace("circular", quietly = TRUE)) {
as.numeric(circular::rvonmises(n, mu = circular::circular(mu), kappa = kappa))
} else if (requireNamespace("CircStats", quietly = TRUE)) {
CircStats::rvm(n, mean = mu, k = kappa)
} else {
# Rejection sampling fallback
samples <- numeric(n)
for (i in 1:n) {
while (TRUE) {
u1 <- runif(1)
u2 <- runif(1)
theta <- 2 * pi * u1
if (u2 < exp(kappa * (cos(theta - mu) - 1))) {
samples[i] <- theta %% (2*pi)
break
}
}
}
samples
}
}
#' von Mises density (mean-1 normalized)
#' @param theta Evaluation points
#' @param mu Mean direction
#' @param kappa Concentration
dvonmises_normalized <- function(theta, mu, kappa) {
p <- exp(kappa * cos(theta - mu)) / (2*pi * besselI(kappa, 0))
p / mean(p) # Normalize to mean-1
}
#' Project vector onto simplex with given sum
#' @param v Input vector
#' @param target_sum Desired sum
project_simplex_sum <- function(v, target_sum) {
u <- sort(v, decreasing = TRUE)
cssv <- cumsum(u)
rho <- max(which(u + (target_sum - cssv) / seq_along(u) > 0))
tau <- (cssv[rho] - target_sum) / rho
pmax(v - tau, 0)
}
#' Project to mean-1 density
#' @param phi Input density values
project_density_mean1 <- function(phi) {
L <- length(phi)
out <- project_simplex_sum(phi, target_sum = L)
out / mean(out)
}
#' Circular kernel density estimation with von Mises kernel
#' @param theta_eval Evaluation points
#' @param theta_samp Sample points
#' @param kappa Concentration (higher = less smooth)
circ_kde_vonmises <- function(theta_eval, theta_samp, kappa = 4) {
K <- exp(kappa * outer(theta_eval, theta_samp, function(a, b) cos(a - b)))
phi <- rowMeans(K)
phi / mean(phi)
}
#' PCA-based phase initialization
#' @param Y Data matrix (I x K)
pca_phase_init <- function(Y) {
# Handle missing values
Yimp <- Y
col_mu <- colMeans(Yimp, na.rm = TRUE)
for (k in seq_len(ncol(Yimp))) {
bad <- !is.finite(Yimp[, k])
if (any(bad)) Yimp[bad, k] <- col_mu[k]
}
# Center and compute PCA
Yc <- scale(Yimp, center = TRUE, scale = FALSE)
pc <- prcomp(Yc, center = FALSE, scale. = FALSE)
# Map first two PCs to phase
z1 <- pc$x[, 1]
z2 <- if (ncol(pc$x) >= 2) pc$x[, 2] else rep(0, nrow(pc$x))
theta_init <- (atan2(z2, z1) %% (2*pi))
list(theta_init = theta_init, pc = pc)
}
#' Enforce Hermitian symmetry on Fourier coefficients
#' @param cvec Complex coefficient vector
#' @param nvec Harmonic indices
enforce_hermitian <- function(cvec, nvec) {
out <- cvec
for (j in seq_along(nvec)) {
n <- nvec[j]
if (n < 0) {
jp <- which(nvec == -n)
if (length(jp) == 1) out[j] <- Conj(out[jp])
}
}
j0 <- which(nvec == 0)
if (length(j0) == 1) out[j0] <- Re(out[j0])
out
}We use a rich simulation framework with Time-Varying Harmonics to generates data with identifiable phase structure by using multiple harmonics and time-varying phase shifts.
# ==============================================================================
# SIMULATION FRAMEWORK FOR KPT VALIDATION
# ==============================================================================
#' Simulate KPT data with rich, identifiable phase structure
#'
#' @param n_stars Number of independent realizations (stars)
#' @param n_windows Number of time windows
#' @param phase_law "uniform" or "vonMises"
#' @param mu_true Mean direction for von Mises (radians)
#' @param kappa_true Concentration for von Mises
#' @param effect_size Amplitude of phase effect (0 = none, 1 = strong)
#' @param noise_sd Observation noise standard deviation
#' @param seed Random seed for reproducibility
#' @return List containing Y matrix, true phases, and parameters
simulate_kpt_data <- function(n_stars = 200,
n_windows = 20,
phase_law = c("uniform", "vonMises"),
mu_true = pi,
kappa_true = 2.0,
effect_size = 0.6,
noise_sd = 0.5,
seed = NULL) {
phase_law <- match.arg(phase_law)
if (!is.null(seed)) set.seed(seed)
cat("═══════════════════════════════════════════════════════════════\n")
cat(" SIMULATING KPT TEST DATA \n")
cat("═══════════════════════════════════════════════════════════════\n")
cat(sprintf("Stars (I): %d\n", n_stars))
cat(sprintf("Windows (K): %d\n", n_windows))
cat(sprintf("Phase law: %s\n", phase_law))
if (phase_law == "vonMises") {
cat(sprintf(" μ_true = %.2f (%.0f°)\n", mu_true, mu_true * 180/pi))
cat(sprintf(" κ_true = %.2f\n", kappa_true))
}
cat(sprintf("Effect size: %.2f\n", effect_size))
cat(sprintf("Noise SD: %.2f\n", noise_sd))
cat("\n")
# -------------------------------------------------------------------------
# Step 1: Sample phase θ_i for each star
# -------------------------------------------------------------------------
if (phase_law == "uniform") {
theta_i <- runif(n_stars, 0, 2*pi)
} else {
theta_i <- rvonmises(n_stars, mu = mu_true, kappa = kappa_true) %% (2*pi)
}
# True phase distribution on grid
theta_grid_eval <- theta_grid(64)
if (phase_law == "uniform") {
phi_true <- rep(1, 64)
} else {
phi_true <- dvonmises_normalized(theta_grid_eval, mu_true, kappa_true)
}
# -------------------------------------------------------------------------
# Step 2: Generate time-varying surface parameters (KEY for identifiability)
# -------------------------------------------------------------------------
k_idx <- 1:n_windows
# Time-varying window means
mu_k <- rnorm(n_windows, 0, 0.25)
# Time-varying harmonic amplitudes
A1 <- effect_size * (1 + 0.3 * sin(2*pi * k_idx / n_windows))
A2 <- 0.5 * effect_size * (1 + 0.2 * cos(2*pi * k_idx / n_windows))
# Time-varying phase shifts (CRITICAL for breaking symmetries)
delta1 <- 0.6 * cos(2*pi * k_idx / n_windows)
delta2 <- -0.4 * sin(2*pi * k_idx / n_windows)
# -------------------------------------------------------------------------
# Step 3: Generate observations Y_{i,k} = S_k(θ_i) + ε_{i,k}
# -------------------------------------------------------------------------
Y <- matrix(NA_real_, nrow = n_stars, ncol = n_windows)
for (i in 1:n_stars) {
for (k in 1:n_windows) {
# Multi-harmonic surface with time-varying structure
s_ik <- mu_k[k] +
A1[k] * cos(theta_i[i] - delta1[k]) +
A2[k] * cos(2 * (theta_i[i] - delta2[k]))
Y[i, k] <- s_ik + rnorm(1, 0, noise_sd)
}
}
# -------------------------------------------------------------------------
# Step 4: Compute true surface S(θ) for comparison
# -------------------------------------------------------------------------
S_true <- matrix(0, n_windows, 64)
for (k in 1:n_windows) {
S_true[k, ] <- mu_k[k] +
A1[k] * cos(theta_grid_eval - delta1[k]) +
A2[k] * cos(2 * (theta_grid_eval - delta2[k]))
}
# -------------------------------------------------------------------------
# Summary statistics
# -------------------------------------------------------------------------
cat("Data summary:\n")
cat(sprintf(" Y dimensions: %d × %d\n", nrow(Y), ncol(Y)))
cat(sprintf(" Y mean: %.4f\n", mean(Y)))
cat(sprintf(" Y SD: %.4f\n", sd(Y)))
cat("═══════════════════════════════════════════════════════════════\n")
list(
Y = Y,
theta_i_true = theta_i,
theta_grid = theta_grid_eval,
phi_true = phi_true,
S_true = S_true,
n_stars = n_stars,
n_windows = n_windows,
phase_law = phase_law,
parameters = list(
mu_true = mu_true,
kappa_true = kappa_true,
effect_size = effect_size,
noise_sd = noise_sd,
mu_k = mu_k,
A1 = A1, A2 = A2,
delta1 = delta1, delta2 = delta2
)
)
}The KPT-GEM algorithm uses the general Expectation-Maximization (GEM) framework with:
E-Step: Compute curve-level posterior weights \[w_{i\ell} = P(\theta_i = \theta_\ell | \mathbf{Y}_i) = \frac{\varphi_\ell \exp\left(-\frac{1}{2\sigma^2}\sum_{k} (Y_{i,k} - S_{k,\ell})^2\right)}{\sum_{\ell'} \varphi_{\ell'} \exp\left(-\frac{1}{2\sigma^2}\sum_{k} (Y_{i,k} - S_{k,\ell'})^2\right)}.\]
M-Step: Update parameters \[\hat{\varphi}_\ell \propto \sum_i w_{i\ell}, \quad \hat{S}_{k,\ell} = \frac{\sum_i w_{i\ell} Y_{i,k}}{\sum_i w_{i\ell}}.\]
# ==============================================================================
# KPT-GEM: OPTIMAL IMPLEMENTATION
# ==============================================================================
#
# Key features:
# - Curve-level (subject-level) E-step for correct posterior computation
# - PCA-based initialization to preserve phase structure
# - Damping for stable convergence
# - Fourier regularization for smooth surfaces
#
# ==============================================================================
#' KPT-GEM Algorithm for Phase Recovery
#'
#' @param Y Data matrix (n_stars × n_windows)
#' @param L_theta Number of theta grid points (default: 64)
#' @param J Number of Fourier harmonics for surface representation (default: 3)
#' @param smooth_kappa Smoothing parameter for φ (default: 4)
#' @param shrink Shrinkage factor for Fourier coefficients (default: 1.0)
#' @param damp Damping factor for updates (default: 0.5)
#' @param min_iter Minimum iterations before convergence check (default: 20)
#' @param max_iter Maximum iterations (default: 300)
#' @param tol Convergence tolerance (default: 1e-5)
#' @param verbose Print progress (default: TRUE)
#' @return List with estimated parameters and diagnostics
kpt_gem <- function(Y,
L_theta = 64,
J = 3,
smooth_kappa = 4,
shrink = 1.0,
damp = 0.5,
min_iter = 20,
max_iter = 300,
tol = 1e-5,
verbose = TRUE) {
I <- nrow(Y) # Number of realizations
K <- ncol(Y) # Number of time windows
theta <- theta_grid(L_theta)
# =========================================================================
# INITIALIZATION (PCA-based warm start)
# =========================================================================
if (verbose) {
cat("\n")
cat("════════════════════════════════════════════════════════════════\n")
cat(" KPT-GEM ALGORITHM \n")
cat("════════════════════════════════════════════════════════════════\n")
cat(sprintf("Data: %d realizations × %d windows\n", I, K))
cat(sprintf("Theta grid: %d points, %d harmonics\n", L_theta, J))
cat("\n")
}
# PCA initialization
init <- pca_phase_init(Y)
theta_init <- init$theta_init
# Initialize φ from PCA phases using circular KDE
phi <- circ_kde_vonmises(theta, theta_init, kappa = 6)
phi <- project_density_mean1(phi)
# Handle missing values for initialization
Yimp <- Y
col_mu <- colMeans(Yimp, na.rm = TRUE)
for (k in 1:K) {
bad <- !is.finite(Yimp[, k])
if (any(bad)) Yimp[bad, k] <- col_mu[k]
}
# Initialize S(θ) using harmonic regression on PCA phases
# Build design matrix for subjects
Xsub <- matrix(1, nrow = I, ncol = 1 + 2*J)
for (j in 1:J) {
Xsub[, 2*j] <- cos(j * theta_init)
Xsub[, 2*j+1] <- sin(j * theta_init)
}
S <- matrix(0, nrow = K, ncol = L_theta)
for (k in 1:K) {
beta <- lm.fit(Xsub, Yimp[, k])$coefficients
beta[is.na(beta)] <- 0
Sk <- rep(beta[1], L_theta)
for (j in 1:J) {
Sk <- Sk + beta[2*j] * cos(j * theta) + beta[2*j+1] * sin(j * theta)
}
S[k, ] <- Sk
}
# Initial noise variance
sigma2 <- var(as.vector(Yimp - rowMeans(Yimp, na.rm = TRUE)))
sigma2 <- max(sigma2, 0.01)
# Precompute trig bases for M-step
cos_basis <- sapply(1:J, function(j) cos(j * theta))
sin_basis <- sapply(1:J, function(j) sin(j * theta))
# Smoothing kernel for φ (von Mises)
kernel <- exp(smooth_kappa * cos(theta - theta[1]))
kernel <- kernel / sum(kernel)
if (verbose) {
cat(sprintf("Initial φ deviation: %.4f\n", max(abs(phi - 1))))
cat(sprintf("Initial S variation: %.4f\n", mean(apply(S, 1, sd))))
cat(sprintf("Initial σ: %.4f\n", sqrt(sigma2)))
cat("\n")
}
# =========================================================================
# TRACKING
# =========================================================================
history <- data.frame(
iter = integer(),
ll = numeric(),
dphi = numeric(),
dS = numeric(),
phi_dev = numeric(),
sigma = numeric()
)
# =========================================================================
# EM ITERATIONS
# =========================================================================
for (iter in 1:max_iter) {
phi_old <- phi
S_old <- S
sigma2_old <- sigma2
log_prior <- log(pmax(phi, 1e-12))
# =====================================================================
# E-STEP: Curve-level posterior weights w_{i,l}
# =====================================================================
w <- matrix(0, nrow = I, ncol = L_theta)
ll_total <- 0
for (i in 1:I) {
valid_k <- which(is.finite(Y[i, ]))
if (length(valid_k) == 0) next
y_i <- Y[i, valid_k]
S_i <- S[valid_k, , drop = FALSE] # (|valid_k| × L_theta)
# Sum of squared residuals over all valid time points
# resid2[l] = sum_k (y_{ik} - S_{k,l})^2
resid2 <- colSums((t(S_i) - y_i)^2)
# Log posterior (unnormalized)
log_w <- -resid2 / (2 * sigma2) + log_prior
# Normalize (log-sum-exp for stability)
m <- max(log_w)
ww <- exp(log_w - m)
ww <- ww / sum(ww)
w[i, ] <- ww
# Marginal log-likelihood contribution
ll_total <- ll_total + (m + log(sum(exp(log_w - m))))
}
# =====================================================================
# M-STEP: Update φ
# =====================================================================
phi_raw <- colSums(w) + 1e-2 # Small regularization
# Circular smoothing via FFT convolution
phi_smooth <- Re(fft(fft(phi_raw) * fft(kernel), inverse = TRUE)) / L_theta
phi_new <- project_density_mean1(phi_smooth)
# Damped update
phi <- project_density_mean1((1 - damp) * phi + damp * phi_new)
# =====================================================================
# M-STEP: Update S(θ) via weighted bin means + Fourier projection
# =====================================================================
S_new <- matrix(0, nrow = K, ncol = L_theta)
for (k in 1:K) {
valid_i <- which(is.finite(Y[, k]))
if (length(valid_i) == 0) next
wk <- w[valid_i, , drop = FALSE]
yk <- Y[valid_i, k]
# Weighted mean at each θ
num <- colSums(wk * yk)
den <- colSums(wk)
y_l <- num / pmax(den, 1e-12)
# DC component under current φ
mu_k <- sum(y_l * phi) / sum(phi)
y_ctr <- y_l - mu_k
# Fourier coefficients (φ-weighted)
a <- b <- numeric(J)
for (j in 1:J) {
a[j] <- 2 * sum(y_ctr * cos_basis[, j] * phi) / sum(phi)
b[j] <- 2 * sum(y_ctr * sin_basis[, j] * phi) / sum(phi)
}
# Apply shrinkage
a <- a * shrink
b <- b * shrink
# Reconstruct S_k(θ)
Sk <- rep(mu_k, L_theta)
for (j in 1:J) {
Sk <- Sk + a[j] * cos_basis[, j] + b[j] * sin_basis[, j]
}
S_new[k, ] <- Sk
}
# Damped update
S <- (1 - damp) * S + damp * S_new
# =====================================================================
# M-STEP: Update σ²
# =====================================================================
ss_total <- 0
n_obs <- 0
for (k in 1:K) {
valid_i <- which(is.finite(Y[, k]))
if (length(valid_i) == 0) next
wk <- w[valid_i, , drop = FALSE]
yk <- Y[valid_i, k]
# Expected squared residual
resid2_mat <- (outer(yk, S[k, ], "-"))^2
ss_total <- ss_total + sum(wk * resid2_mat)
n_obs <- n_obs + length(valid_i)
}
sigma2_new <- ss_total / max(n_obs, 1)
sigma2 <- (1 - damp) * sigma2 + damp * sigma2_new
# =====================================================================
# CONVERGENCE DIAGNOSTICS
# =====================================================================
phi_dev <- max(abs(phi - 1))
dphi <- sqrt(mean((phi - phi_old)^2))
dS <- sqrt(mean((S - S_old)^2))
history <- rbind(history, data.frame(
iter = iter,
ll = ll_total,
dphi = dphi,
dS = dS,
phi_dev = phi_dev,
sigma = sqrt(sigma2)
))
if (verbose && (iter %% 20 == 0 || iter <= 5)) {
cat(sprintf("Iter %3d: LL=%9.1f σ=%.3f Δφ=%.2e ΔS=%.2e φ_dev=%.3f\n",
iter, ll_total, sqrt(sigma2), dphi, dS, phi_dev))
}
# Check convergence
if (iter >= min_iter && max(dphi, dS, abs(sigma2 - sigma2_old)) < tol) {
if (verbose) cat("Converged!\n")
break
}
}
# =========================================================================
# FINAL RESULTS
# =========================================================================
# Posterior mean phases for each realization
mu_i <- sapply(1:I, function(i) {
sum(theta * w[i, ]) / sum(w[i, ])
})
# E[S(θ)] under φ
E_S <- sapply(1:K, function(k) sum(S[k, ] * phi) / sum(phi))
# KPT estimate
kpt_estimate <- mean(E_S)
# S variation metric
S_variation <- mean(apply(S, 1, sd))
if (verbose) {
cat("\n")
cat("════════════════════════════════════════════════════════════════\n")
cat(" KPT-GEM RESULTS \n")
cat("════════════════════════════════════════════════════════════════\n")
cat(sprintf("Final σ: %.4f\n", sqrt(sigma2)))
cat(sprintf("φ deviation from uniform: %.4f\n", phi_dev))
cat(sprintf("S average θ-variation: %.4f\n", S_variation))
cat(sprintf("Iterations: %d\n", iter))
cat(sprintf("Final log-likelihood: %.1f\n", ll_total))
if (phi_dev < 0.3 && S_variation < 0.2) {
cat("\n→ Result: Minimal phase structure detected (consistent with H₀)\n")
} else if (phi_dev > 0.5) {
cat("\n→ Result: Significant phase structure detected (consistent with H₁)\n")
}
cat("════════════════════════════════════════════════════════════════\n")
}
list(
# Grid
theta = theta,
L_theta = L_theta,
# Estimated parameters
phi = phi,
S = S,
sigma = sqrt(sigma2),
# Posterior quantities
w = w,
mu_i = mu_i,
# Derived quantities
E_S = E_S,
kpt_estimate = kpt_estimate,
# Diagnostics
history = history,
phi_uniform_dist = phi_dev,
S_variation = S_variation,
log_lik = ll_total,
# Initialization
theta_init = theta_init,
# Convergence
converged = (iter < max_iter),
n_iter = iter
)
}The KPT-FFT algorithm uses spectral methods with Wiener-Sobolev deconvolution
Mixed Moments: \(\hat{m}_n(t_k) = \frac{1}{I} \sum_{i} Y_{i,k} \xi_i(n)\) where \(\xi_i(n) = \mathbb{E}[e^{-in\theta_i} | \mathbf{Y}_i]\)
\(\Phi\)-Step (Pooled Wiener Filter): \[\hat{\Phi}(\omega) = \frac{\sum_k \overline{F_k(\omega)} M_k(\omega)}{\sum_k |F_k(\omega)|^2 + \lambda_\varphi \Lambda(\omega)}.\]
F-Step (Per-Window Wiener Filter): \[\hat{F}_k(\omega) = \frac{\overline{\Phi(\omega)} M_k(\omega)}{|\Phi(\omega)|^2 + \lambda_S \Lambda(\omega)},\]
where \(\Lambda(\omega) = (2\sin(\omega/2))^{2s}\) is the Sobolev penalty.
# ==============================================================================
# KPT-FFT: OPTIMAL IMPLEMENTATION
# ==============================================================================
#
# Key features:
# - Pooled multi-window Wiener deconvolution for stable φ estimation
# - Hermitian symmetry enforcement for real-valued outputs
# - PCA-based initialization
# - Damping for stable convergence
# - Optional phase anchoring for identifiability
#
# ==============================================================================
#' KPT-FFT Algorithm for Phase Recovery
#'
#' @param Y Data matrix (n_stars × n_windows)
#' @param L_theta Number of theta grid points (default: 64)
#' @param J Number of Fourier harmonics (default: 6)
#' @param lambda_phi Regularization strength for φ (default: 0.05)
#' @param lambda_S Regularization strength for S (default: 0.10)
#' @param s_sobolev Sobolev smoothness order (default: 2)
#' @param damp Damping factor for updates (default: 0.5)
#' @param min_iter Minimum iterations (default: 20)
#' @param max_iter Maximum iterations (default: 300)
#' @param tol Convergence tolerance (default: 1e-5)
#' @param verbose Print progress (default: TRUE)
#' @return List with estimated parameters and diagnostics
kpt_fft <- function(Y,
L_theta = 64,
J = 6,
lambda_phi = 0.05,
lambda_S = 0.10,
s_sobolev = 2,
damp = 0.5,
min_iter = 20,
max_iter = 300,
tol = 1e-5,
verbose = TRUE) {
I <- nrow(Y) # Number of realizations
K <- ncol(Y) # Number of time windows
theta <- theta_grid(L_theta)
nvec <- (-J):J
Nn <- length(nvec)
# =========================================================================
# SETUP: Basis matrices and penalty
# =========================================================================
# Basis for Fourier synthesis/analysis
Eneg <- exp(-1i * outer(theta, nvec)) # (L_theta × Nn)
Epos <- exp(1i * outer(nvec, theta)) # (Nn × L_theta)
# Padded length for convolution (reduces aliasing)
M <- 4*J + 1
omega <- 2*pi * (0:(M-1)) / M
# Sobolev penalty weights
Lambda <- 1 + (2 * sin(omega/2))^(2 * s_sobolev)
ridge <- 1e-8
# FFT helper functions
coeffs_to_dft_pad <- function(c, nvec, M) {
z <- rep(0+0i, M)
for (j in seq_along(nvec)) {
idx <- (nvec[j] %% M) + 1
z[idx] <- c[j]
}
z
}
coeffs_from_dft_pad <- function(z, nvec, M) {
out <- rep(0+0i, length(nvec))
for (j in seq_along(nvec)) {
idx <- (nvec[j] %% M) + 1
out[j] <- z[idx]
}
out
}
Xomega_pad <- function(c, nvec, M) fft(coeffs_to_dft_pad(c, nvec, M))
invXomega_pad <- function(Xw, nvec, M) {
z <- fft(Xw, inverse = TRUE) / M
coeffs_from_dft_pad(z, nvec, M)
}
# =========================================================================
# INITIALIZATION (PCA-based)
# =========================================================================
if (verbose) {
cat("\n")
cat("════════════════════════════════════════════════════════════════\n")
cat(" KPT-FFT ALGORITHM (Spectral Wiener-Sobolev) \n")
cat("════════════════════════════════════════════════════════════════\n")
cat(sprintf("Data: %d realizations × %d windows\n", I, K))
cat(sprintf("Fourier modes: J = %d (indices -%d to %d)\n", J, J, J))
cat(sprintf("Regularization: λ_φ = %.3f, λ_S = %.3f\n", lambda_phi, lambda_S))
cat("\n")
}
# PCA initialization
init <- pca_phase_init(Y)
theta_init <- init$theta_init
# Initialize φ from PCA phases
phi <- circ_kde_vonmises(theta, theta_init, kappa = 6)
phi <- project_density_mean1(phi)
# Handle missing values
Yimp <- Y
col_mu <- colMeans(Yimp, na.rm = TRUE)
for (k in 1:K) {
bad <- !is.finite(Yimp[, k])
if (any(bad)) Yimp[bad, k] <- col_mu[k]
}
# Initialize S via harmonic regression
Xsub <- matrix(1, nrow = I, ncol = 1 + 2*J)
for (j in 1:J) {
Xsub[, 2*j] <- cos(j * theta_init)
Xsub[, 2*j+1] <- sin(j * theta_init)
}
S <- matrix(0, nrow = K, ncol = L_theta)
for (k in 1:K) {
beta <- lm.fit(Xsub, Yimp[, k])$coefficients
beta[is.na(beta)] <- 0
Sk <- rep(beta[1], L_theta)
for (j in 1:J) {
Sk <- Sk + beta[2*j] * cos(j * theta) + beta[2*j+1] * sin(j * theta)
}
S[k, ] <- Sk
}
# Initialize Fourier coefficients from S
f_n <- matrix(0+0i, nrow = K, ncol = Nn)
for (k in 1:K) {
f_n[k, ] <- (S[k, ] %*% Eneg) / L_theta
f_n[k, ] <- enforce_hermitian(f_n[k, ], nvec)
}
# Initial noise variance
sigma2 <- var(as.vector(Yimp - rowMeans(Yimp, na.rm = TRUE)))
sigma2 <- max(sigma2, 0.01)
if (verbose) {
cat(sprintf("Initial φ deviation: %.4f\n", max(abs(phi - 1))))
cat(sprintf("Initial S variation: %.4f\n", mean(apply(S, 1, sd))))
cat(sprintf("Initial σ: %.4f\n", sqrt(sigma2)))
cat("\n")
}
# =========================================================================
# TRACKING
# =========================================================================
history <- data.frame(
iter = integer(),
ll = numeric(),
dphi = numeric(),
df = numeric(),
phi_dev = numeric(),
sigma = numeric()
)
# =========================================================================
# ALTERNATING OPTIMIZATION
# =========================================================================
for (iter in 1:max_iter) {
phi_old <- phi
f_old <- f_n
sigma2_old <- sigma2
# Recompute S from f_n
for (k in 1:K) S[k, ] <- Re(f_n[k, ] %*% Epos)
# =====================================================================
# E-STEP: Curve-level posterior weights and Fourier factors
# =====================================================================
log_prior <- log(pmax(phi, 1e-12))
w <- matrix(0, nrow = I, ncol = L_theta)
ll_total <- 0
for (i in 1:I) {
valid_k <- which(is.finite(Y[i, ]))
if (length(valid_k) == 0) next
y_i <- Y[i, valid_k]
S_i <- S[valid_k, , drop = FALSE]
resid2 <- colSums((t(S_i) - y_i)^2)
log_w <- -resid2 / (2 * sigma2) + log_prior
m <- max(log_w)
ww <- exp(log_w - m)
ww <- ww / sum(ww)
w[i, ] <- ww
ll_total <- ll_total + (m + log(sum(exp(log_w - m))))
}
# Conditional Fourier factors: ξ_i(n) = Σ_l w_{i,l} e^{-in θ_l}
xi <- w %*% Eneg # (I × Nn)
# Mixed moments: m_hat[k,n] = mean_i Y_{ik} ξ_i(n)
m_hat <- matrix(0+0i, nrow = K, ncol = Nn)
for (k in 1:K) {
valid_i <- which(is.finite(Y[, k]))
if (length(valid_i) == 0) next
yk <- Y[valid_i, k]
m_hat[k, ] <- colMeans(xi[valid_i, , drop = FALSE] * yk)
}
# =====================================================================
# Frequency-domain objects (padded)
# =====================================================================
Fw <- matrix(0+0i, nrow = K, ncol = M)
Mw <- matrix(0+0i, nrow = K, ncol = M)
for (k in 1:K) {
Fw[k, ] <- Xomega_pad(f_n[k, ], nvec, M)
Mw[k, ] <- Xomega_pad(m_hat[k, ], nvec, M)
}
# =====================================================================
# Φ-STEP: Pooled multi-window Wiener deconvolution
# =====================================================================
num <- colSums(Conj(Fw) * Mw)
den <- colSums(Mod(Fw)^2) + lambda_phi * Lambda + ridge
Phiw <- num / den
phi_hat_n <- invXomega_pad(Phiw, nvec, M)
phi_hat_n <- enforce_hermitian(phi_hat_n, nvec)
# Optional anchoring: fix arg(φ_1) = 0 for identifiability
j1 <- which(nvec == 1)
if (length(j1) == 1 && Mod(phi_hat_n[j1]) > 1e-6) {
alpha <- Arg(phi_hat_n[j1])
phi_hat_n <- phi_hat_n * exp(-1i * nvec * alpha)
f_n <- f_n * matrix(exp(1i * nvec * alpha), nrow = K, ncol = Nn, byrow = TRUE)
}
phi_raw <- Re(phi_hat_n %*% Epos)
phi_new <- project_density_mean1(as.numeric(phi_raw))
phi <- project_density_mean1((1 - damp) * phi + damp * phi_new)
# =====================================================================
# F-STEP: Per-window Wiener update with shared Φ
# =====================================================================
Phiw_new <- Xomega_pad(phi_hat_n, nvec, M)
denF <- Mod(Phiw_new)^2 + lambda_S * Lambda + ridge
for (k in 1:K) {
Fw_new <- Conj(Phiw_new) * Mw[k, ] / denF
f_k <- invXomega_pad(Fw_new, nvec, M)
f_k <- enforce_hermitian(f_k, nvec)
f_n[k, ] <- (1 - damp) * f_n[k, ] + damp * f_k
}
# Recompute S with new f_n
for (k in 1:K) S[k, ] <- Re(f_n[k, ] %*% Epos)
# =====================================================================
# Update σ²
# =====================================================================
ss_total <- 0
n_obs <- 0
for (k in 1:K) {
valid_i <- which(is.finite(Y[, k]))
if (length(valid_i) == 0) next
wk <- w[valid_i, , drop = FALSE]
yk <- Y[valid_i, k]
resid2_mat <- (outer(yk, S[k, ], "-"))^2
ss_total <- ss_total + sum(wk * resid2_mat)
n_obs <- n_obs + length(valid_i)
}
sigma2_new <- ss_total / max(n_obs, 1)
sigma2 <- (1 - damp) * sigma2 + damp * sigma2_new
# =====================================================================
# CONVERGENCE DIAGNOSTICS
# =====================================================================
phi_dev <- max(abs(phi - 1))
dphi <- sqrt(mean((phi - phi_old)^2))
df <- sqrt(mean(Mod(f_n - f_old)^2))
history <- rbind(history, data.frame(
iter = iter,
ll = ll_total,
dphi = dphi,
df = df,
phi_dev = phi_dev,
sigma = sqrt(sigma2)
))
if (verbose && (iter %% 20 == 0 || iter <= 5)) {
cat(sprintf("Iter %3d: LL=%9.1f σ=%.3f Δφ=%.2e Δf=%.2e φ_dev=%.3f\n",
iter, ll_total, sqrt(sigma2), dphi, df, phi_dev))
}
# Check convergence
if (iter >= min_iter && max(dphi, df, abs(sigma2 - sigma2_old)) < tol) {
if (verbose) cat("Converged!\n")
break
}
}
# =========================================================================
# FINAL RESULTS
# =========================================================================
# Posterior mean phases
mu_i <- sapply(1:I, function(i) sum(theta * w[i, ]) / sum(w[i, ]))
# E[S(θ)] under φ
E_S <- sapply(1:K, function(k) sum(S[k, ] * phi) / sum(phi))
# KPT estimate
kpt_estimate <- mean(E_S)
# S variation
S_variation <- mean(apply(S, 1, sd))
if (verbose) {
cat("\n")
cat("════════════════════════════════════════════════════════════════\n")
cat(" KPT-FFT RESULTS \n")
cat("════════════════════════════════════════════════════════════════\n")
cat(sprintf("Final σ: %.4f\n", sqrt(sigma2)))
cat(sprintf("φ deviation from uniform: %.4f\n", phi_dev))
cat(sprintf("S average θ-variation: %.4f\n", S_variation))
cat(sprintf("Iterations: %d\n", iter))
cat(sprintf("Final log-likelihood: %.1f\n", ll_total))
if (phi_dev < 0.3 && S_variation < 0.2) {
cat("\n→ Result: Minimal phase structure detected (consistent with H₀)\n")
} else if (phi_dev > 0.5) {
cat("\n→ Result: Significant phase structure detected (consistent with H₁)\n")
}
cat("════════════════════════════════════════════════════════════════\n")
}
list(
# Grid
theta = theta,
L_theta = L_theta,
J = J,
nvec = nvec,
# Estimated parameters
phi = phi,
S = S,
f_n = f_n,
sigma = sqrt(sigma2),
# Posterior quantities
w = w,
mu_i = mu_i,
# Derived quantities
E_S = E_S,
kpt_estimate = kpt_estimate,
# Diagnostics
history = history,
phi_uniform_dist = phi_dev,
S_variation = S_variation,
log_lik = ll_total,
# Initialization
theta_init = theta_init,
# Convergence
converged = (iter < max_iter),
n_iter = iter
)
}| Parameter | Current Default | Recommended Adjustment | Why |
|---|---|---|---|
damp |
0.5 | \(\downarrow\) Try \(0.2–0.3\) | Most critical for stability. Lower damping prevents oscillations in alternating updates (\(\varphi \leftrightarrow f_n\)). If \(\frac{\Delta\varphi}{\Delta f}\) oscillating in history, reduce this first |
lambda_phi |
0.05 | \(\uparrow\) Try 0.1–0.3 | Weak φ-regularization causes unstable density estimates (\(\varphi\to 0\) or \(\varphi\to \infty\)). Increase if phi_dev explodes or log-likelihood jumps erratically. |
lambda_S |
0.10 | \(\uparrow\) Try 0.2–0.5 | Stabilizes per-window signal estimates. Increase if df (\(\Delta f\)) remains large while \(\Delta\varphi\) converges. |
J |
6 | \(\downarrow\) Try 4–5 | Too many harmonics relative to data → overfitting/instability.
Reduce if n_obs < \(10·J·K\). |
s_sobolev |
2 | \(\uparrow\) Try 3 | Stronger smoothing in frequency domain improves numerical stability (especially with noisy data). |
tol |
1e-5 | \(\uparrow\) Try 1e-4 | Overly strict tolerance may prevent convergence if parameters jitter near optimum. |
# ==============================================================================
# PREDICTION FUNCTIONS FOR KPT MODELS
# ==============================================================================
#' Generate predictions from KPT fit (works for both GEM and FFT)
#'
#' @param fit Output from kpt_gem() or kpt_fft()
#' @param n_new Number of new time windows to predict
#' @param n_draw Number of Monte Carlo draws (default: 5000)
#' @return List with point estimates, intervals, and draws
kpt_predict <- function(fit, n_new, n_draw = 5000) {
phi <- fit$phi
S_avg <- colMeans(fit$S)
L <- length(fit$theta)
sigma <- fit$sigma
# Convert φ to probability
p_theta <- pmax(phi, 1e-10)
p_theta <- p_theta / sum(p_theta)
# Monte Carlo draws
draws <- matrix(NA, n_draw, n_new)
for (k in 1:n_new) {
theta_idx <- sample(L, n_draw, replace = TRUE, prob = p_theta)
draws[, k] <- S_avg[theta_idx] + rnorm(n_draw, 0, sigma)
}
# Summary statistics
list(
point = rep(mean(S_avg), n_new),
mean = colMeans(draws),
sd = apply(draws, 2, sd),
draws = draws,
pi_low = apply(draws, 2, quantile, 0.025),
pi_high = apply(draws, 2, quantile, 0.975),
pi_50_low = apply(draws, 2, quantile, 0.25),
pi_50_high = apply(draws, 2, quantile, 0.75)
)
}Next we execute the validation protocol.
# ==============================================================================
# COMPREHENSIVE VALIDATION PROTOCOL
# ==============================================================================
#' Run comprehensive validation on simulated data
#'
#' @param n_simulations Number of simulation replicates
#' @param seed Random seed
validate_kpt <- function(n_simulations = 5, seed = 123) {
set.seed(seed)
results <- list()
for (sim in 1:n_simulations) {
cat(sprintf("\n=== Simulation %d/%d ===\n", sim, n_simulations))
# Test 1: Uniform phase (should recover φ ≈ uniform)
sim_uniform <- simulate_kpt_data(
n_stars = 150,
n_windows = 20,
phase_law = "uniform",
effect_size = 0.6,
noise_sd = 0.5,
seed = seed + sim
)
# Test 2: von Mises phase (should recover non-uniform φ)
sim_vonmises <- simulate_kpt_data(
n_stars = 150,
n_windows = 20,
phase_law = "vonMises",
mu_true = pi,
kappa_true = 2.0,
effect_size = 0.6,
noise_sd = 0.5,
seed = seed + sim + 1000
)
# Fit both algorithms on both datasets
fit_gem_unif <- kpt_gem(sim_uniform$Y, verbose = FALSE)
fit_gem_vm <- kpt_gem(sim_vonmises$Y, verbose = FALSE)
fit_fft_unif <- kpt_fft(sim_uniform$Y, verbose = FALSE)
fit_fft_vm <- kpt_fft(sim_vonmises$Y, verbose = FALSE)
results[[sim]] <- data.frame(
simulation = sim,
algorithm = c("KPT-GEM", "KPT-GEM", "KPT-FFT", "KPT-FFT"),
phase_law = c("Uniform", "vonMises", "Uniform", "vonMises"),
phi_deviation = c(fit_gem_unif$phi_uniform_dist,
fit_gem_vm$phi_uniform_dist,
fit_fft_unif$phi_uniform_dist,
fit_fft_vm$phi_uniform_dist),
S_variation = c(fit_gem_unif$S_variation,
fit_gem_vm$S_variation,
fit_fft_unif$S_variation,
fit_fft_vm$S_variation),
sigma_est = c(fit_gem_unif$sigma,
fit_gem_vm$sigma,
fit_fft_unif$sigma,
fit_fft_vm$sigma),
converged = c(fit_gem_unif$converged,
fit_gem_vm$converged,
fit_fft_unif$converged,
fit_fft_vm$converged)
)
}
# Combine results
all_results <- do.call(rbind, results)
# Summary statistics
summary_stats <- all_results %>%
group_by(algorithm, phase_law) %>%
summarise(
phi_dev_mean = mean(phi_deviation),
phi_dev_sd = sd(phi_deviation),
S_var_mean = mean(S_variation),
sigma_mean = mean(sigma_est),
converge_rate = mean(converged),
.groups = "drop"
)
list(
all_results = all_results,
summary = summary_stats
)
}First we generate the test (Kepler) dataset.
# ==============================================================================
# VALIDATION WITH SIMULATED DATA
# ==============================================================================
cat("\n")## ╔══════════════════════════════════════════════════════════════════╗
## ║ KPT VALIDATION WITH SIMULATED DATA ║
## ╚══════════════════════════════════════════════════════════════════╝
# Generate test datasets
set.seed(42)
# Uniform phase (null case - should detect NO structure)
sim_uniform <- simulate_kpt_data(
n_stars = 1000,
n_windows = 50,
phase_law = "uniform",
effect_size = 0.6,
noise_sd = 0.5,
seed = 123
)## ═══════════════════════════════════════════════════════════════
## SIMULATING KPT TEST DATA
## ═══════════════════════════════════════════════════════════════
## Stars (I): 1000
## Windows (K): 50
## Phase law: uniform
## Effect size: 0.60
## Noise SD: 0.50
##
## Data summary:
## Y dimensions: 1000 × 50
## Y mean: 0.0012
## Y SD: 0.7287
## ═══════════════════════════════════════════════════════════════
# von Mises phase (alternative case - should detect structure)
sim_vonmises <- simulate_kpt_data(
n_stars = 1000,
n_windows = 50,
phase_law = "vonMises",
mu_true = pi,
kappa_true = 2.0,
effect_size = 0.6,
noise_sd = 0.5,
seed = 456
)## ═══════════════════════════════════════════════════════════════
## SIMULATING KPT TEST DATA
## ═══════════════════════════════════════════════════════════════
## Stars (I): 1000
## Windows (K): 50
## Phase law: vonMises
## μ_true = 3.14 (180°)
## κ_true = 2.00
## Effect size: 0.60
## Noise SD: 0.50
##
## Data summary:
## Y dimensions: 1000 × 50
## Y mean: -0.2563
## Y SD: 0.6252
## ═══════════════════════════════════════════════════════════════
# ==============================================================================
# VISUALIZE SIMULATED DATA
# ==============================================================================
par(mfrow = c(2, 3), mar = c(4, 4, 3, 1))
# --- UNIFORM PHASE ---
hist(sim_uniform$theta_i_true, breaks = 20, freq = FALSE,
main = "Uniform: True Phase Distribution",
xlab = "θ (radians)", col = "lightblue", border = "steelblue")
abline(h = 1/(2*pi), col = "red", lwd = 2, lty = 2)
legend("topright", "True uniform", col = "red", lwd = 2, lty = 2, bty = "n")
image(1:sim_uniform$n_windows, sim_uniform$theta_grid, sim_uniform$S_true,
main = "Uniform: True Surface S(θ, t)",
xlab = "Time Window", ylab = "θ (radians)",
col = viridis(50))
plot(sim_uniform$theta_i_true, rowMeans(sim_uniform$Y),
pch = 19, cex = 0.6, col = alpha("steelblue", 0.5),
main = "Uniform: Y vs θ",
xlab = "True θ", ylab = "Mean Y")
# --- VON MISES PHASE ---
hist(sim_vonmises$theta_i_true, breaks = 20, freq = FALSE,
main = "von Mises: True Phase Distribution",
xlab = "θ (radians)", col = "lightgreen", border = "forestgreen")
lines(sim_vonmises$theta_grid,
sim_vonmises$phi_true / (2*pi),
col = "red", lwd = 2)
legend("topright", "True φ(θ)", col = "red", lwd = 2, bty = "n")
image(1:sim_vonmises$n_windows, sim_vonmises$theta_grid, sim_vonmises$S_true,
main = "von Mises: True Surface S(θ, t)",
xlab = "Time Window", ylab = "θ (radians)",
col = viridis(50))
plot(sim_vonmises$theta_i_true, rowMeans(sim_vonmises$Y),
pch = 19, cex = 0.6, col = alpha("forestgreen", 0.5),
main = "von Mises: Y vs θ",
xlab = "True θ", ylab = "Mean Y")# ==============================================================================
# RUN KPT ALGORITHMS ON SIMULATED DATA
# ==============================================================================
cat("\n### KPT-GEM on Uniform Data ###\n")##
## ### KPT-GEM on Uniform Data ###
##
## ════════════════════════════════════════════════════════════════
## KPT-GEM ALGORITHM
## ════════════════════════════════════════════════════════════════
## Data: 1000 realizations × 50 windows
## Theta grid: 64 points, 3 harmonics
##
## Initial φ deviation: 0.9719
## Initial S variation: 0.4259
## Initial σ: 0.5874
## Iter 1: LL= -32807.7 σ=0.685 Δφ=1.28e+00 ΔS=1.98e-01 φ_dev=4.618
## Iter 2: LL= -21594.8 σ=0.699 Δφ=7.06e-01 ΔS=1.33e-01 φ_dev=6.890
## Iter 3: LL= -20970.2 σ=0.703 Δφ=6.05e-01 ΔS=9.38e-02 φ_dev=8.776
## Iter 4: LL= -21286.7 σ=0.696 Δφ=5.15e-01 ΔS=5.32e-02 φ_dev=10.335
## Iter 5: LL= -22014.8 σ=0.684 Δφ=3.72e-01 ΔS=3.03e-02 φ_dev=11.272
## Iter 20: LL= -23741.0 σ=0.662 Δφ=7.68e-03 ΔS=4.94e-03 φ_dev=12.797
## Iter 40: LL= -23317.4 σ=0.666 Δφ=1.95e-04 ΔS=9.86e-05 φ_dev=12.754
## Converged!
##
## ════════════════════════════════════════════════════════════════
## KPT-GEM RESULTS
## ════════════════════════════════════════════════════════════════
## Final σ: 0.6661
## φ deviation from uniform: 12.7527
## S average θ-variation: 0.1273
## Iterations: 55
## Final log-likelihood: -23310.0
##
## → Result: Significant phase structure detected (consistent with H₁)
## ════════════════════════════════════════════════════════════════
##
## ### KPT-GEM on von Mises Data ###
##
## ════════════════════════════════════════════════════════════════
## KPT-GEM ALGORITHM
## ════════════════════════════════════════════════════════════════
## Data: 1000 realizations × 50 windows
## Theta grid: 64 points, 3 harmonics
##
## Initial φ deviation: 0.7879
## Initial S variation: 0.2312
## Initial σ: 0.5966
## Iter 1: LL= -27194.4 σ=0.589 Δφ=6.46e-01 ΔS=1.54e-01 φ_dev=1.713
## Iter 2: LL= -28402.9 σ=0.580 Δφ=9.05e-01 ΔS=7.20e-02 φ_dev=4.030
## Iter 3: LL= -30088.3 σ=0.576 Δφ=6.88e-01 ΔS=4.93e-02 φ_dev=6.411
## Iter 4: LL= -30564.2 σ=0.577 Δφ=5.61e-01 ΔS=4.09e-02 φ_dev=8.453
## Iter 5: LL= -30576.7 σ=0.577 Δφ=4.61e-01 ΔS=3.16e-02 φ_dev=10.082
## Iter 20: LL= -31557.0 σ=0.575 Δφ=9.57e-03 ΔS=6.32e-04 φ_dev=12.912
## Iter 40: LL= -32035.9 σ=0.575 Δφ=9.65e-04 ΔS=3.38e-04 φ_dev=12.955
## Iter 60: LL= -32082.3 σ=0.574 Δφ=3.39e-05 ΔS=2.39e-06 φ_dev=12.956
## Converged!
##
## ════════════════════════════════════════════════════════════════
## KPT-GEM RESULTS
## ════════════════════════════════════════════════════════════════
## Final σ: 0.5744
## φ deviation from uniform: 12.9559
## S average θ-variation: 0.0176
## Iterations: 66
## Final log-likelihood: -32082.3
##
## → Result: Significant phase structure detected (consistent with H₁)
## ════════════════════════════════════════════════════════════════
##
## ### KPT-FFT on Uniform Data ###
##
## ════════════════════════════════════════════════════════════════
## KPT-FFT ALGORITHM (Spectral Wiener-Sobolev)
## ════════════════════════════════════════════════════════════════
## Data: 1000 realizations × 50 windows
## Fourier modes: J = 6 (indices -6 to 6)
## Regularization: λ_φ = 0.050, λ_S = 0.100
##
## Initial φ deviation: 0.9719
## Initial S variation: 0.4098
## Initial σ: 0.5874
## Iter 1: LL= -32478.2 σ=0.666 Δφ=3.00e-01 Δf=1.68e-01 φ_dev=1.377
## Iter 2: LL= -26188.2 σ=0.691 Δφ=2.69e-01 Δf=9.02e-02 φ_dev=1.760
## Iter 3: LL= -26747.0 σ=0.678 Δφ=3.31e-01 Δf=3.18e-02 φ_dev=1.582
## Iter 4: LL= -26885.3 σ=0.680 Δφ=6.28e-01 Δf=3.52e-02 φ_dev=1.241
## Iter 5: LL= -27198.8 σ=0.745 Δφ=5.42e-01 Δf=5.05e-02 φ_dev=2.140
## Iter 20: LL= -29665.3 σ=0.672 Δφ=1.47e-01 Δf=4.67e-03 φ_dev=2.618
## Iter 40: LL= -28253.6 σ=0.695 Δφ=2.94e-01 Δf=1.44e-02 φ_dev=2.168
## Iter 60: LL= -29847.6 σ=0.680 Δφ=2.04e-01 Δf=4.24e-03 φ_dev=3.918
## Iter 80: LL= -26839.1 σ=0.699 Δφ=1.56e-01 Δf=1.19e-02 φ_dev=3.464
## Iter 100: LL= -25976.2 σ=0.711 Δφ=2.27e-01 Δf=7.30e-03 φ_dev=4.384
## Iter 120: LL= -30896.5 σ=0.666 Δφ=5.50e-02 Δf=1.27e-03 φ_dev=2.976
## Iter 140: LL= -30045.4 σ=0.671 Δφ=1.39e-01 Δf=1.22e-02 φ_dev=3.615
## Iter 160: LL= -29861.9 σ=0.676 Δφ=1.73e-01 Δf=3.49e-03 φ_dev=4.084
## Iter 180: LL= -30791.7 σ=0.665 Δφ=5.65e-02 Δf=2.09e-03 φ_dev=3.378
## Iter 200: LL= -32631.1 σ=0.655 Δφ=2.99e-01 Δf=8.71e-03 φ_dev=5.436
## Iter 220: LL= -32394.1 σ=0.640 Δφ=5.41e-01 Δf=2.66e-02 φ_dev=1.984
## Iter 240: LL= -27687.0 σ=0.698 Δφ=6.24e-02 Δf=2.79e-03 φ_dev=9.070
## Iter 260: LL= -30084.3 σ=0.668 Δφ=5.26e-01 Δf=2.68e-02 φ_dev=7.539
## Iter 280: LL= -28328.3 σ=0.698 Δφ=1.43e-01 Δf=1.77e-03 φ_dev=6.735
## Iter 300: LL= -32224.4 σ=0.658 Δφ=5.24e-02 Δf=1.38e-03 φ_dev=3.320
##
## ════════════════════════════════════════════════════════════════
## KPT-FFT RESULTS
## ════════════════════════════════════════════════════════════════
## Final σ: 0.6576
## φ deviation from uniform: 3.3201
## S average θ-variation: 0.1312
## Iterations: 300
## Final log-likelihood: -32224.4
##
## → Result: Significant phase structure detected (consistent with H₁)
## ════════════════════════════════════════════════════════════════
##
## ### KPT-FFT on von Mises Data ###
##
## ════════════════════════════════════════════════════════════════
## KPT-FFT ALGORITHM (Spectral Wiener-Sobolev)
## ════════════════════════════════════════════════════════════════
## Data: 1000 realizations × 50 windows
## Fourier modes: J = 6 (indices -6 to 6)
## Regularization: λ_φ = 0.050, λ_S = 0.100
##
## Initial φ deviation: 0.7879
## Initial S variation: 0.2308
## Initial σ: 0.5966
## Iter 1: LL= -27355.3 σ=0.591 Δφ=3.77e-01 Δf=5.42e-02 φ_dev=0.978
## Iter 2: LL= -28927.5 σ=0.585 Δφ=2.63e-01 Δf=2.77e-02 φ_dev=0.961
## Iter 3: LL= -29972.2 σ=0.583 Δφ=4.94e-01 Δf=3.55e-02 φ_dev=1.312
## Iter 4: LL= -30013.1 σ=0.583 Δφ=6.03e-01 Δf=1.60e-02 φ_dev=2.664
## Iter 5: LL= -30168.2 σ=0.583 Δφ=6.19e-01 Δf=1.53e-02 φ_dev=4.316
## Iter 20: LL= -33037.6 σ=0.566 Δφ=6.79e-02 Δf=2.83e-03 φ_dev=10.900
## Iter 40: LL= -33260.4 σ=0.574 Δφ=6.53e-02 Δf=5.33e-03 φ_dev=9.323
## Iter 60: LL= -35262.3 σ=0.572 Δφ=1.11e-01 Δf=4.12e-03 φ_dev=8.017
## Iter 80: LL= -34423.3 σ=0.587 Δφ=3.32e-01 Δf=2.03e-02 φ_dev=10.993
## Iter 100: LL= -25279.8 σ=0.650 Δφ=8.67e-01 Δf=2.06e-02 φ_dev=5.832
## Iter 120: LL= -34499.2 σ=0.574 Δφ=4.90e-01 Δf=1.99e-02 φ_dev=11.672
## Iter 140: LL= -35612.7 σ=0.580 Δφ=1.58e-01 Δf=1.29e-03 φ_dev=3.967
## Iter 160: LL= -22830.3 σ=0.676 Δφ=7.76e-01 Δf=2.47e-02 φ_dev=5.899
## Iter 180: LL= -33253.3 σ=0.566 Δφ=4.16e-02 Δf=5.82e-04 φ_dev=11.099
## Iter 200: LL= -33227.7 σ=0.570 Δφ=6.55e-02 Δf=5.32e-03 φ_dev=10.408
## Iter 220: LL= -27572.5 σ=0.638 Δφ=6.47e-01 Δf=1.97e-02 φ_dev=5.476
## Iter 240: LL= -33047.3 σ=0.569 Δφ=9.36e-02 Δf=3.50e-03 φ_dev=7.136
## Iter 260: LL= -33380.2 σ=0.575 Δφ=4.57e-02 Δf=1.19e-03 φ_dev=8.919
## Iter 280: LL= -33960.9 σ=0.578 Δφ=2.84e-01 Δf=5.69e-03 φ_dev=10.149
## Iter 300: LL= -33344.8 σ=0.571 Δφ=2.43e-01 Δf=5.92e-03 φ_dev=10.480
##
## ════════════════════════════════════════════════════════════════
## KPT-FFT RESULTS
## ════════════════════════════════════════════════════════════════
## Final σ: 0.5710
## φ deviation from uniform: 10.4802
## S average θ-variation: 0.0845
## Iterations: 300
## Final log-likelihood: -33344.8
##
## → Result: Significant phase structure detected (consistent with H₁)
## ════════════════════════════════════════════════════════════════
# ==============================================================================
# VISUALIZE VALIDATION RESULTS
# ==============================================================================
par(mfrow = c(2, 4), mar = c(4, 4, 3, 1))
# --- ROW 1: UNIFORM DATA ---
# True vs Recovered φ (GEM)
plot(fit_gem_uniform$theta, fit_gem_uniform$phi, type = "l",
lwd = 2, col = "blue",
main = "Uniform: GEM φ(θ)",
xlab = "θ", ylab = "φ(θ)", ylim = c(0, 2.5))
abline(h = 1, col = "red", lwd = 2, lty = 2)
legend("topright", c("Estimated", "True (uniform)"),
col = c("blue", "red"), lwd = 2, lty = c(1, 2), bty = "n", cex = 0.8)
mtext(sprintf("Dev = %.3f", fit_gem_uniform$phi_uniform_dist),
side = 3, line = 0.2, cex = 0.7)
# True vs Recovered φ (FFT)
plot(fit_fft_uniform$theta, fit_fft_uniform$phi, type = "l",
lwd = 2, col = "purple",
main = "Uniform: FFT φ(θ)",
xlab = "θ", ylab = "φ(θ)", ylim = c(0, 2.5))
abline(h = 1, col = "red", lwd = 2, lty = 2)
legend("topright", c("Estimated", "True (uniform)"),
col = c("purple", "red"), lwd = 2, lty = c(1, 2), bty = "n", cex = 0.8)
mtext(sprintf("Dev = %.3f", fit_fft_uniform$phi_uniform_dist),
side = 3, line = 0.2, cex = 0.7)
# Surface heatmap (GEM)
image(fit_gem_uniform$theta, 1:nrow(fit_gem_uniform$S), t(fit_gem_uniform$S),
main = "Uniform: GEM S(θ, t)",
xlab = "θ", ylab = "Time Window", col = viridis(50))
# Convergence (GEM vs FFT)
plot(fit_gem_uniform$history$iter, fit_gem_uniform$history$phi_dev,
type = "l", col = "blue", lwd = 2,
main = "Uniform: Convergence",
xlab = "Iteration", ylab = "φ deviation",
ylim = c(0, max(c(fit_gem_uniform$history$phi_dev,
fit_fft_uniform$history$phi_dev)) * 1.1))
lines(fit_fft_uniform$history$iter, fit_fft_uniform$history$phi_dev,
col = "purple", lwd = 2)
legend("topright", c("GEM", "FFT"), col = c("blue", "purple"),
lwd = 2, bty = "n", cex = 0.8)
# --- ROW 2: VON MISES DATA ---
# True vs Recovered φ (GEM)
y_max <- max(c(fit_gem_vonmises$phi, sim_vonmises$phi_true)) * 1.2
plot(fit_gem_vonmises$theta, fit_gem_vonmises$phi, type = "l",
lwd = 2, col = "blue",
main = "von Mises: GEM φ(θ)",
xlab = "θ", ylab = "φ(θ)", ylim = c(0, y_max))
lines(sim_vonmises$theta_grid, sim_vonmises$phi_true,
col = "red", lwd = 2, lty = 2)
legend("topright", c("Estimated", "True"),
col = c("blue", "red"), lwd = 2, lty = c(1, 2), bty = "n", cex = 0.8)
mtext(sprintf("Dev = %.3f", fit_gem_vonmises$phi_uniform_dist),
side = 3, line = 0.2, cex = 0.7)
# True vs Recovered φ (FFT)
plot(fit_fft_vonmises$theta, fit_fft_vonmises$phi, type = "l",
lwd = 2, col = "purple",
main = "von Mises: FFT φ(θ)",
xlab = "θ", ylab = "φ(θ)", ylim = c(0, y_max))
lines(sim_vonmises$theta_grid, sim_vonmises$phi_true,
col = "red", lwd = 2, lty = 2)
legend("topright", c("Estimated", "True"),
col = c("purple", "red"), lwd = 2, lty = c(1, 2), bty = "n", cex = 0.8)
mtext(sprintf("Dev = %.3f", fit_fft_vonmises$phi_uniform_dist),
side = 3, line = 0.2, cex = 0.7)
# Surface heatmap (FFT)
image(fit_fft_vonmises$theta, 1:nrow(fit_fft_vonmises$S), t(fit_fft_vonmises$S),
main = "von Mises: FFT S(θ, t)",
xlab = "θ", ylab = "Time Window", col = viridis(50))
# Convergence (GEM vs FFT)
plot(fit_gem_vonmises$history$iter, fit_gem_vonmises$history$phi_dev,
type = "l", col = "blue", lwd = 2,
main = "von Mises: Convergence",
xlab = "Iteration", ylab = "φ deviation",
ylim = c(0, max(c(fit_gem_vonmises$history$phi_dev,
fit_fft_vonmises$history$phi_dev)) * 1.1))
lines(fit_fft_vonmises$history$iter, fit_fft_vonmises$history$phi_dev,
col = "purple", lwd = 2)
legend("topright", c("GEM", "FFT"), col = c("blue", "purple"),
lwd = 2, bty = "n", cex = 0.8)# ==============================================================================
# VALIDATION SUMMARY
# ==============================================================================
results_df <- data.frame(
Algorithm = c("KPT-GEM", "KPT-GEM", "KPT-FFT", "KPT-FFT"),
`Phase Law` = c("Uniform", "von Mises", "Uniform", "von Mises"),
`φ Deviation` = round(c(fit_gem_uniform$phi_uniform_dist,
fit_gem_vonmises$phi_uniform_dist,
fit_fft_uniform$phi_uniform_dist,
fit_fft_vonmises$phi_uniform_dist), 4),
`S Variation` = round(c(fit_gem_uniform$S_variation,
fit_gem_vonmises$S_variation,
fit_fft_uniform$S_variation,
fit_fft_vonmises$S_variation), 4),
`σ Estimate` = round(c(fit_gem_uniform$sigma,
fit_gem_vonmises$sigma,
fit_fft_uniform$sigma,
fit_fft_vonmises$sigma), 4),
`True σ` = c(0.5, 0.5, 0.5, 0.5),
Converged = c(fit_gem_uniform$converged,
fit_gem_vonmises$converged,
fit_fft_uniform$converged,
fit_fft_vonmises$converged),
check.names = FALSE
)
cat("\n")## ═══════════════════════════════════════════════════════════════════════════
## KPT VALIDATION SUMMARY
## ═══════════════════════════════════════════════════════════════════════════
## Algorithm Phase Law φ Deviation S Variation σ Estimate True σ Converged
## KPT-GEM Uniform 12.7527 0.1273 0.6661 0.5 TRUE
## KPT-GEM von Mises 12.9559 0.0176 0.5744 0.5 TRUE
## KPT-FFT Uniform 3.3201 0.1312 0.6576 0.5 FALSE
## KPT-FFT von Mises 10.4802 0.0845 0.5710 0.5 FALSE
## EXPECTED BEHAVIOR:
## • Uniform data: φ deviation should be SMALL (< 0.3)
## • von Mises data: φ deviation should be LARGE (> 0.5)
## • σ estimates should be close to true value (0.5)
## • Both algorithms should converge
## ═══════════════════════════════════════════════════════════════════════════
##
## INTERPRETATION:
if (fit_gem_uniform$phi_uniform_dist < 0.3 && fit_gem_vonmises$phi_uniform_dist > 0.5) {
cat("✓ KPT-GEM correctly distinguishes uniform from von Mises phases\n")
} else {
cat("⚠ KPT-GEM may need parameter tuning\n")
}## ⚠ KPT-GEM may need parameter tuning
if (fit_fft_uniform$phi_uniform_dist < 0.3 && fit_fft_vonmises$phi_uniform_dist > 0.5) {
cat("✓ KPT-FFT correctly distinguishes uniform from von Mises phases\n")
} else {
cat("⚠ KPT-FFT may need parameter tuning\n")
}## ⚠ KPT-FFT may need parameter tuning
# ==============================================================================
# KEPLER DATA DOWNLOAD AND PREPARATION
# ==============================================================================
#' Query Kepler Input Catalog for Sun-like targets
#'
#' @param n_targets Maximum number of targets to query
#' @param teff_range Temperature range (K)
#' @param logg_range Surface gravity range (dex)
#' @param mag_range Kepler magnitude range
query_kepler_targets <- function(n_targets = 500,
teff_range = c(5500, 6000),
logg_range = c(4.0, 4.6),
mag_range = c(10, 13)) {
tap_url <- "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"
query <- sprintf(
"SELECT TOP %d kepid, ra, dec, teff, logg, feh, kepmag, radius, mass
FROM keplerstellar
WHERE teff BETWEEN %d AND %d
AND logg BETWEEN %.2f AND %.2f
AND kepmag BETWEEN %.1f AND %.1f
AND radius IS NOT NULL AND mass IS NOT NULL
ORDER BY kepmag ASC",
n_targets,
teff_range[1], teff_range[2],
logg_range[1], logg_range[2],
mag_range[1], mag_range[2]
)
response <- tryCatch({
GET(tap_url, query = list(query = query, format = "json"), timeout(120))
}, error = function(e) NULL)
if (is.null(response) || status_code(response) != 200) {
warning("Failed to query Kepler catalog")
return(NULL)
}
as_tibble(fromJSON(rawToChar(response$content)))
}
#' Simulate Kepler-like stellar light curves
#'
#' @param n_stars Number of stars
#' @param n_time Number of time points
#' @param cadence Cadence in minutes
#' @param variability_ppm RMS variability (ppm)
#' @param phase_law Phase distribution ("uniform" or "vonMises")
simulate_kepler_lightcurves <- function(n_stars = 100,
n_time = 2000,
cadence = 30,
variability_ppm = 100,
phase_law = "uniform") {
# Time in days
dt <- cadence / (60 * 24)
t <- seq(0, (n_time - 1) * dt, by = dt)
# Initialize
Y <- matrix(NA_real_, nrow = n_stars, ncol = n_time)
metadata <- tibble(
star_id = 1:n_stars,
rotation_period = numeric(n_stars),
granulation_tau = numeric(n_stars),
activity_level = numeric(n_stars)
)
for (i in 1:n_stars) {
# Stellar parameters
P_rot <- runif(1, 10, 30)
tau_gran <- runif(1, 0.1, 0.5)
activity <- runif(1, 0.3, 1.5)
metadata$rotation_period[i] <- P_rot
metadata$granulation_tau[i] <- tau_gran
metadata$activity_level[i] <- activity
# Granulation (red noise via AR process)
ar_coef <- exp(-dt / tau_gran)
granulation <- numeric(n_time)
granulation[1] <- rnorm(1, 0, variability_ppm * 0.5 * activity)
for (j in 2:n_time) {
granulation[j] <- ar_coef * granulation[j-1] +
sqrt(1 - ar_coef^2) * rnorm(1, 0, variability_ppm * 0.5 * activity)
}
# Rotation + spots
if (phase_law == "uniform") {
phase <- runif(1, 0, 2*pi)
} else {
phase <- rvonmises(1, mu = pi, kappa = 2)
}
spot_amp <- variability_ppm * 0.3 * activity
rotation_signal <- spot_amp * sin(2*pi*t/P_rot + phase) +
spot_amp * 0.5 * sin(4*pi*t/P_rot + phase + runif(1, 0, pi))
# Photon noise
photon_noise <- rnorm(n_time, 0, variability_ppm * 0.2)
# Combine and normalize
flux <- 1e6 + granulation + rotation_signal + photon_noise
Y[i, ] <- (flux - mean(flux)) / sd(flux)
}
list(
Y = Y,
t = t,
metadata = metadata,
n_stars = n_stars,
n_time = n_time,
cadence_min = cadence
)
}
#' Create KPT matrix from light curves
#'
#' @param sim Output from simulate_kepler_lightcurves()
#' @param window_days Window size in days
#' @param step_days Step between windows
create_kpt_matrix <- function(sim, window_days = 5, step_days = 2) {
Y_full <- sim$Y
t <- sim$t
dt <- t[2] - t[1]
n_stars <- nrow(Y_full)
# Window parameters
window_pts <- round(window_days / dt)
step_pts <- round(step_days / dt)
# Window centers
center_idx <- seq(window_pts/2 + 1, ncol(Y_full) - window_pts/2, by = step_pts)
n_windows <- length(center_idx)
cat(sprintf("Creating KPT matrix:\n"))
cat(sprintf(" Window size: %.1f days (%d points)\n", window_days, window_pts))
cat(sprintf(" Step size: %.1f days (%d points)\n", step_days, step_pts))
cat(sprintf(" Number of windows: %d\n", n_windows))
# Compute variance within each window
Y <- matrix(NA_real_, nrow = n_stars, ncol = n_windows)
t_center <- numeric(n_windows)
for (k in seq_along(center_idx)) {
idx <- (center_idx[k] - window_pts/2 + 1):(center_idx[k] + window_pts/2)
idx <- idx[idx > 0 & idx <= ncol(Y_full)]
Y[, k] <- apply(Y_full[, idx, drop = FALSE], 1, var)
t_center[k] <- mean(t[idx])
}
# Log-transform and standardize
Y_log <- log(Y)
Y_std <- scale(Y_log)
list(
Y = Y_std,
Y_raw = Y,
t_center = t_center,
n_stars = n_stars,
n_windows = n_windows,
metadata = sim$metadata
)
}# ==============================================================================
# EXAMPLE: SIMULATED KEPLER DATA
# ==============================================================================
cat("\n")## ╔══════════════════════════════════════════════════════════════════╗
## ║ KPT APPLICATION TO KEPLER-LIKE DATA ║
## ╚══════════════════════════════════════════════════════════════════╝
# Simulate Kepler light curves
set.seed(42)
kepler_sim <- simulate_kepler_lightcurves(
n_stars = 100,
n_time = 2000,
cadence = 30,
variability_ppm = 100,
phase_law = "uniform" # No phase structure expected
)
cat(sprintf("Simulated %d stellar light curves\n", kepler_sim$n_stars))## Simulated 100 stellar light curves
## Time span: 41.6 days
## Creating KPT matrix:
## Window size: 5.0 days (240 points)
## Step size: 2.0 days (96 points)
## Number of windows: 19
##
## KPT Matrix: 100 stars × 19 windows
# ==============================================================================
# FIT KPT TO KEPLER DATA
# ==============================================================================
cat("\n### KPT-GEM on Kepler Data ###\n")##
## ### KPT-GEM on Kepler Data ###
##
## ════════════════════════════════════════════════════════════════
## KPT-GEM ALGORITHM
## ════════════════════════════════════════════════════════════════
## Data: 100 realizations × 19 windows
## Theta grid: 64 points, 3 harmonics
##
## Initial φ deviation: 0.4175
## Initial S variation: 0.5358
## Initial σ: 0.9314
## Iter 1: LL= -3329.0 σ=0.992 Δφ=2.38e-01 ΔS=3.54e-01 φ_dev=0.665
## Iter 2: LL= -2761.4 σ=1.006 Δφ=7.31e-02 ΔS=1.54e-01 φ_dev=0.761
## Iter 3: LL= -2630.4 σ=1.008 Δφ=6.50e-02 ΔS=8.82e-02 φ_dev=0.841
## Iter 4: LL= -2600.3 σ=1.003 Δφ=6.91e-02 ΔS=6.09e-02 φ_dev=0.914
## Iter 5: LL= -2618.9 σ=0.996 Δφ=7.39e-02 ΔS=6.19e-02 φ_dev=0.984
## Iter 20: LL= -2679.1 σ=0.995 Δφ=5.91e-02 ΔS=1.35e-02 φ_dev=2.430
## Iter 40: LL= -2681.4 σ=0.995 Δφ=1.58e-02 ΔS=1.28e-03 φ_dev=2.973
## Iter 60: LL= -2683.5 σ=0.995 Δφ=6.92e-03 ΔS=5.75e-04 φ_dev=2.985
## Iter 80: LL= -2684.7 σ=0.995 Δφ=2.76e-03 ΔS=1.79e-04 φ_dev=2.970
## Iter 100: LL= -2685.5 σ=0.995 Δφ=1.03e-03 ΔS=1.87e-04 φ_dev=2.968
## Iter 120: LL= -2686.0 σ=0.995 Δφ=4.50e-04 ΔS=1.43e-04 φ_dev=2.966
## Iter 140: LL= -2686.3 σ=0.995 Δφ=2.22e-04 ΔS=1.12e-04 φ_dev=2.964
## Iter 160: LL= -2686.6 σ=0.995 Δφ=1.43e-04 ΔS=8.77e-05 φ_dev=2.962
## Iter 180: LL= -2686.9 σ=0.995 Δφ=1.20e-04 ΔS=6.75e-05 φ_dev=2.961
## Iter 200: LL= -2687.1 σ=0.995 Δφ=1.09e-04 ΔS=5.05e-05 φ_dev=2.961
## Iter 220: LL= -2687.2 σ=0.995 Δφ=9.80e-05 ΔS=3.69e-05 φ_dev=2.960
## Iter 240: LL= -2687.3 σ=0.995 Δφ=8.39e-05 ΔS=2.66e-05 φ_dev=2.959
## Iter 260: LL= -2687.4 σ=0.995 Δφ=6.85e-05 ΔS=1.90e-05 φ_dev=2.959
## Iter 280: LL= -2687.4 σ=0.995 Δφ=5.38e-05 ΔS=1.35e-05 φ_dev=2.959
## Iter 300: LL= -2687.5 σ=0.995 Δφ=4.11e-05 ΔS=9.55e-06 φ_dev=2.958
##
## ════════════════════════════════════════════════════════════════
## KPT-GEM RESULTS
## ════════════════════════════════════════════════════════════════
## Final σ: 0.9950
## φ deviation from uniform: 2.9584
## S average θ-variation: 0.0500
## Iterations: 300
## Final log-likelihood: -2687.5
##
## → Result: Significant phase structure detected (consistent with H₁)
## ════════════════════════════════════════════════════════════════
##
## ### KPT-FFT on Kepler Data ###
##
## ════════════════════════════════════════════════════════════════
## KPT-FFT ALGORITHM (Spectral Wiener-Sobolev)
## ════════════════════════════════════════════════════════════════
## Data: 100 realizations × 19 windows
## Fourier modes: J = 6 (indices -6 to 6)
## Regularization: λ_φ = 0.050, λ_S = 0.100
##
## Initial φ deviation: 0.4175
## Initial S variation: 0.6121
## Initial σ: 0.9314
## Iter 1: LL= -3675.9 σ=0.987 Δφ=1.72e-01 Δf=2.34e-01 φ_dev=0.352
## Iter 2: LL= -2891.2 σ=1.002 Δφ=1.22e-01 Δf=1.18e-01 φ_dev=0.277
## Iter 3: LL= -2675.1 σ=1.001 Δφ=1.55e-01 Δf=6.14e-02 φ_dev=0.439
## Iter 4: LL= -2657.7 σ=0.999 Δφ=1.37e-01 Δf=3.00e-02 φ_dev=0.539
## Iter 5: LL= -2679.0 σ=0.997 Δφ=1.33e-01 Δf=1.68e-02 φ_dev=0.274
## Iter 20: LL= -2695.0 σ=0.995 Δφ=5.91e-03 Δf=1.07e-03 φ_dev=0.305
## Iter 40: LL= -2695.9 σ=0.995 Δφ=1.26e-03 Δf=1.97e-04 φ_dev=0.317
## Iter 60: LL= -2695.8 σ=0.995 Δφ=4.02e-04 Δf=2.80e-05 φ_dev=0.316
## Iter 80: LL= -2695.8 σ=0.995 Δφ=1.98e-05 Δf=4.56e-06 φ_dev=0.316
## Converged!
##
## ════════════════════════════════════════════════════════════════
## KPT-FFT RESULTS
## ════════════════════════════════════════════════════════════════
## Final σ: 0.9947
## φ deviation from uniform: 0.3164
## S average θ-variation: 0.0353
## Iterations: 88
## Final log-likelihood: -2695.8
## ════════════════════════════════════════════════════════════════
# ==============================================================================
# VISUALIZE KEPLER RESULTS
# ==============================================================================
par(mfrow = c(2, 3), mar = c(1, 1, 1, 1))
# Sample light curves
for (i in 1:3) {
plot(kepler_sim$t, kepler_sim$Y[i, ], type = "l", col = "steelblue",
main = sprintf("Star %d Light Curve", i),
xlab = "Time (days)", ylab = "Normalized Flux")
grid()
}
# Convergence comparison
plot(fit_kepler_gem$history$iter, fit_kepler_gem$history$ll,
type = "l", col = "blue", lwd = 2,
main = "Log-Likelihood Convergence",
xlab = "Iteration", ylab = "Log-Likelihood")
lines(fit_kepler_fft$history$iter, fit_kepler_fft$history$ll,
col = "purple", lwd = 2)
legend("bottomright", c("GEM", "FFT"), col = c("blue", "purple"), lwd = 2)
grid()
# Recovered φ(θ)
plot(fit_kepler_gem$theta, fit_kepler_gem$phi, type = "l",
col = "blue", lwd = 2,
main = "Recovered Phase Distribution φ(θ)",
xlab = "θ (radians)", ylab = "φ(θ)", ylim = c(0, 2))
lines(fit_kepler_fft$theta, fit_kepler_fft$phi, col = "purple", lwd = 2)
abline(h = 1, col = "red", lwd = 2, lty = 2)
legend("topright", c("GEM", "FFT", "Uniform"),
col = c("blue", "purple", "red"), lwd = 2, lty = c(1, 1, 2))
grid()
# Kime surface (GEM)
filled.contour(
x = fit_kepler_gem$theta,
y = 1:nrow(fit_kepler_gem$S),
z = t(fit_kepler_gem$S),
color.palette = viridis,
main = "KPT-GEM: Kime Surface S(θ, t)",
xlab = "θ (radians)",
ylab = "Time Window"
)# ==============================================================================
# KEPLER RESULTS SUMMARY
# ==============================================================================
cat("\n")## ═══════════════════════════════════════════════════════════════════════════
## KEPLER DATA ANALYSIS SUMMARY
## ═══════════════════════════════════════════════════════════════════════════
## KPT-GEM Results:
## φ deviation from uniform: 2.9584
## S average variation: 0.0500
## Estimated σ: 0.9950
## Converged: FALSE (iter: 300)
## KPT-FFT Results:
## φ deviation from uniform: 0.3164
## S average variation: 0.0353
## Estimated σ: 0.9947
## Converged: TRUE (iter: 88)
## INTERPRETATION:
if (fit_kepler_gem$phi_uniform_dist < 0.3) {
cat("→ GEM: φ(θ) ≈ uniform — No exploitable phase structure detected\n")
cat(" Stellar variability appears phase-independent (H₀ not rejected)\n")
} else {
cat("→ GEM: φ(θ) shows non-uniform structure — Phase dependence detected (H₁)\n")
}## → GEM: φ(θ) shows non-uniform structure — Phase dependence detected (H₁)
if (fit_kepler_fft$phi_uniform_dist < 0.3) {
cat("→ FFT: φ(θ) ≈ uniform — No exploitable phase structure detected\n")
cat(" Consistent with GEM findings\n")
} else {
cat("→ FFT: φ(θ) shows non-uniform structure\n")
}## → FFT: φ(θ) shows non-uniform structure
## ═══════════════════════════════════════════════════════════════════════════
This notebook provides optimal implementations of two Kime-Phase Tomography algorithms:
The Kepler validation experiments demonstrate that
For the Kepler stellar data:
For new data applications, use
simulate_kpt_data() to generate test data with known
structure. Then validate that the parameter choices recover the known
structure and apply to real data with the validated parameters. Finally,
interpret \(\varphi\) deviation: \(< 0.3\) suggests \(H_o\) can’t be rejected, while \(> 0.5\) rejects the nul and suggests
\(H_1\).
Recommended parameters include:
| Parameter | KPT-GEM | KPT-FFT |
|---|---|---|
L_theta |
64 | 64 |
J (harmonics) |
3 | 6 |
damp |
0.5 | 0.5 |
max_iter |
300 | 300 |
tol |
1e-5 | 1e-5 |
Mission: NASA Kepler Space Telescope (2009–2013)
Archive: Mikulski Archive for Space Telescopes (MAST)
Product: Pre-search Data Conditioning Simple Aperture Photometry (PDCSAP) flux.
Sample Selection:
Data Dictionary
| Variable | Dimension | Description |
|---|---|---|
Y |
\(I \times K\) | Standardized variability matrix; rows = stars, columns = time windows |
Y_raw |
\(I \times K\) | Raw variance of flux within each window (\(ppm^2\)) |
t_center |
\(K\) | Window center times (days from start) |
n_stars |
\(1\) | Number of stellar realizations (\(I = 75\)) |
n_windows |
\(1\) | Number of temporal windows (\(K = 28\)) |
metadata |
\(I \times 10\) | Stellar properties from Kepler Input Catalog |
Metadata fields: kepid (Kepler ID),
ra, dec (J2000 coordinates), teff
(effective temperature, K), logg (surface gravity, dex),
feh (metallicity, dex), kepmag (Kepler
magnitude), radius (R☉), mass (M☉),
star_id (index).
The KPT data structure mapping is shown below.
| KPT Concept | Kepler Implementation |
|---|---|
| Realization index \(i\) | Star \(i\) (independent stellar system) |
| Time window \(k\) | \(5\)-day binned interval |
| Observation \(Y_{i\ k}\) | Standardized flux variance of star \(i\) in window \(k\) |
| Kime-phase \(\theta_i\) | Latent stellar “activity phase” or rotation phase |
| Phase distribution \(\varphi(\theta)\) | Distribution of stellar phases across the sample |
| Kime-surface \(\mathbb{S}(\theta, t)\) | Phase-dependent variability pattern |
If this experiment using Kepler stellar data indicates that the Observed phase \(\varphi(\theta)\approx\) is uniform and that \(\mathbb{S}(\theta, t)\approx\) constant, then KPT suggests that stellar variability in this sample has no exploitable kime-phase structure. The stochastic process (granulation, magnetic activity) is statistically homogeneous across stars. Thus, there would be no evidence of hidden phase variable systematically affects variability levels.
This KPT experiment may not falsify KPT, but it may demonstrate that KPT correctly recovers the null structure when no phase-dependence exists. This may validate the algorithm’s behavior and confirm that Sun-like stellar variability is well-modeled as a phase-independent Gaussian process.
We select Sun-like stars with similar properties to ensure the “process” being measured is comparable across realizations.
#' Query NASA Exoplanet Archive for Kepler stellar properties
#'
#' @param n_targets Maximum number of targets
#' @param teff_range Temperature range (K) for Sun-like stars
#' @param logg_range Surface gravity range for main-sequence
#' @param mag_max Maximum Kepler magnitude (brightness limit)
query_sunlike_targets <- function(n_targets = 500,
teff_range = c(5500, 6000),
logg_range = c(4.0, 4.6),
mag_max = 13) {
base_url <- "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"
query <- sprintf(
"SELECT TOP %d kepid, ra, dec, teff, logg, feh, kepmag, radius, mass
FROM keplerstellar
WHERE teff BETWEEN %d AND %d
AND logg BETWEEN %.1f AND %.1f
AND kepmag < %.1f
AND radius IS NOT NULL
ORDER BY kepmag ASC",
n_targets,
teff_range[1], teff_range[2],
logg_range[1], logg_range[2],
mag_max
)
response <- GET(base_url, query = list(query = query, format = "json"))
if (status_code(response) != 200) {
stop("Query failed with status: ", status_code(response))
}
data <- fromJSON(rawToChar(response$content))
as_tibble(data)
}
# Query Sun-like targets
cat("Querying Kepler archive for Sun-like stars...\n")## Querying Kepler archive for Sun-like stars...
targets <- tryCatch({
query_sunlike_targets(n_targets = 200)
}, error = function(e) {
cat("Error:", e$message, "\n")
cat("Using simulated data for demonstration\n")
NULL
})
if (!is.null(targets)) {
cat(sprintf("\nFound %d Sun-like targets\n", nrow(targets)))
cat("\nStellar parameter ranges:\n")
cat(sprintf(" Teff: %d - %d K\n", min(targets$teff), max(targets$teff)))
cat(sprintf(" log g: %.2f - %.2f\n", min(targets$logg), max(targets$logg)))
cat(sprintf(" Kepmag: %.1f - %.1f\n", min(targets$kepmag), max(targets$kepmag)))
print(head(targets, 10))
}##
## Found 200 Sun-like targets
##
## Stellar parameter ranges:
## Teff: 5545 - 5997 K
## log g: 4.09 - 4.58
## Kepmag: 6.0 - 13.0
## # A tibble: 10 × 9
## kepid ra dec teff logg feh kepmag radius mass
## <int> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <lgl>
## 1 10005473 289. 47.0 5780 4.44 0 6.00 1 NA
## 2 10002106 287. 47.0 5780 4.44 0 8.98 1 NA
## 3 10064712 287. 47.1 5780 4.44 0 9.36 1 NA
## 4 10068519 289. 47.1 5780 4.44 0 9.54 1 NA
## 5 10031808 299. 46.9 5780 4.44 0 9.56 1 NA
## 6 10200031 288. 47.2 5780 4.44 0 9.84 1 NA
## 7 10202015 289. 47.2 5780 4.44 0 9.86 1 NA
## 8 10139791 291. 47.2 5780 4.44 0 9.90 1 NA
## 9 10082904 295. 47.1 5780 4.44 0 9.93 1 NA
## 10 10068482 289. 47.1 5780 4.44 0 9.99 1 NA
In practice, the real Kepler data download may take \(5-20\) minutes depending on \(n\_stars\) value and network speed.
# ==============================================================================
# DOWNLOAD REAL KEPLER STELLAR LIGHT CURVES - UNIFIED VERSION
# ==============================================================================
#
# This chunk downloads actual observed Kepler light curves to use as the
# ultimate validation dataset for Kime-Phase Tomography.
#
# Two methods available:
# 1. lightkurve (Python) via reticulate - Most reliable, more features
# 2. Pure R via direct MAST API - No Python dependency
#
# Output structure matches simulate_kepler_lightcurves():
# - Y: matrix (n_stars × n_time) of standardized flux
# - t: time vector in days (starting at 0)
# - metadata: tibble with stellar properties
# - n_stars, n_time, cadence_min: scalar metadata
#
# ==============================================================================
library(tidyverse)
library(httr)
library(jsonlite)
# ==============================================================================
# HELPER: Query Kepler Input Catalog
# ==============================================================================
query_kepler_targets <- function(n_targets = 500,
teff_range = c(5500, 6000),
logg_range = c(4.0, 4.6),
mag_range = c(10, 13)) {
tap_url <- "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"
query <- sprintf(
"SELECT TOP %d kepid, ra, dec, teff, logg, feh, kepmag, radius, mass
FROM keplerstellar
WHERE teff BETWEEN %d AND %d
AND logg BETWEEN %.2f AND %.2f
AND kepmag BETWEEN %.1f AND %.1f
AND radius IS NOT NULL AND mass IS NOT NULL
ORDER BY kepmag ASC",
n_targets,
teff_range[1], teff_range[2],
logg_range[1], logg_range[2],
mag_range[1], mag_range[2]
)
response <- GET(tap_url, query = list(query = query, format = "json"), timeout(120))
if (status_code(response) != 200) {
stop("Failed to query Kepler stellar catalog")
}
as_tibble(fromJSON(rawToChar(response$content)))
}
# ==============================================================================
# METHOD 1: Using lightkurve (Python) - RECOMMENDED
# ==============================================================================
download_kepler_via_lightkurve <- function(targets,
n_stars = 100,
quarter = 5,
cadence = "long",
max_time_days = 60,
verbose = TRUE) {
# Check Python availability
if (!requireNamespace("reticulate", quietly = TRUE)) {
stop("reticulate package required. Install with: install.packages('reticulate')")
}
library(reticulate)
# Check/install lightkurve
if (!py_module_available("lightkurve")) {
if (verbose) cat("Installing lightkurve Python package...\n")
py_install("lightkurve", pip = TRUE)
}
lk <- import("lightkurve")
if (verbose) cat("Downloading via lightkurve...\n")
lightcurves <- list()
metadata_list <- list()
successful <- 0
for (i in seq_len(min(nrow(targets), n_stars * 3))) {
if (successful >= n_stars) break
kepid <- targets$kepid[i]
if (verbose && (i %% 20 == 0 || i == 1)) {
cat(sprintf(" Target %d: KIC %d (success: %d/%d)\n", i, kepid, successful, n_stars))
}
lc_result <- tryCatch({
search_str <- sprintf("KIC %d", kepid)
search_result <- lk$search_lightcurve(
search_str,
mission = "Kepler",
quarter = as.integer(quarter),
cadence = cadence
)
if (length(search_result) == 0) return(NULL)
lc <- search_result$download(quality_bitmask = "hard")
time_raw <- as.numeric(lc$time$value)
flux_raw <- as.numeric(lc$flux$value)
valid <- is.finite(time_raw) & is.finite(flux_raw)
if (sum(valid) < 100) return(NULL)
list(
time = time_raw[valid] - min(time_raw[valid]),
flux = (flux_raw[valid] / median(flux_raw[valid]) - 1) * 1e6,
kepid = kepid
)
}, error = function(e) NULL)
if (!is.null(lc_result)) {
successful <- successful + 1
lightcurves[[successful]] <- lc_result
metadata_list[[successful]] <- targets[i, ] %>% mutate(star_id = successful)
}
Sys.sleep(0.3)
}
list(lightcurves = lightcurves, metadata = bind_rows(metadata_list), n = successful)
}
# ==============================================================================
# METHOD 2: Pure R via MAST API
# ==============================================================================
download_kepler_via_mast <- function(targets,
n_stars = 50,
max_time_days = 60,
verbose = TRUE) {
if (!requireNamespace("FITSio", quietly = TRUE)) {
install.packages("FITSio")
}
library(FITSio)
if (verbose) cat("Downloading via direct MAST API (pure R)...\n")
lightcurves <- list()
metadata_list <- list()
successful <- 0
# Quarter timestamps for filename construction
q_timestamps <- c("2010078095331", "2010174085026", "2009350155506", "2010265121752")
for (i in seq_len(min(nrow(targets), n_stars * 5))) {
if (successful >= n_stars) break
kepid <- targets$kepid[i]
kepid_str <- sprintf("%09d", kepid)
prefix <- substr(kepid_str, 1, 4)
if (verbose && (i %% 10 == 0 || i == 1)) {
cat(sprintf(" Target %d: KIC %d (success: %d/%d)\n", i, kepid, successful, n_stars))
}
lc_result <- NULL
for (ts in q_timestamps) {
fits_url <- sprintf(
"https://archive.stsci.edu/missions/kepler/lightcurves/%s/%s/kplr%s-%s_llc.fits",
prefix, kepid_str, kepid_str, ts
)
lc_result <- tryCatch({
temp_file <- tempfile(fileext = ".fits")
on.exit(unlink(temp_file), add = TRUE)
download.file(fits_url, temp_file, mode = "wb", quiet = TRUE)
fits_data <- readFITS(temp_file)
col_names <- fits_data$colNames
time_idx <- which(col_names == "TIME")
flux_idx <- which(col_names == "PDCSAP_FLUX")
qual_idx <- which(col_names == "SAP_QUALITY")
if (length(time_idx) == 0 || length(flux_idx) == 0) return(NULL)
time <- as.numeric(fits_data$col[[time_idx]])
flux <- as.numeric(fits_data$col[[flux_idx]])
quality <- if (length(qual_idx) > 0) as.integer(fits_data$col[[qual_idx]]) else rep(0L, length(time))
valid <- (quality == 0) & is.finite(time) & is.finite(flux) & (flux > 0)
if (sum(valid) < 100) return(NULL)
list(
time = time[valid] - min(time[valid]),
flux = (flux[valid] / median(flux[valid]) - 1) * 1e6,
kepid = kepid
)
}, error = function(e) NULL)
if (!is.null(lc_result)) break
}
if (!is.null(lc_result)) {
successful <- successful + 1
lightcurves[[successful]] <- lc_result
metadata_list[[successful]] <- targets[i, ] %>% mutate(star_id = successful)
}
Sys.sleep(0.2)
}
list(lightcurves = lightcurves, metadata = bind_rows(metadata_list), n = successful)
}
# ==============================================================================
# MAIN FUNCTION: Align and create matrix
# ==============================================================================
create_kepler_matrix <- function(download_result, max_time_days = 60) {
lightcurves <- download_result$lightcurves
metadata <- download_result$metadata
n <- download_result$n
if (n < 5) stop("Too few successful downloads")
# Find common time range
t_min <- max(sapply(lightcurves, function(lc) min(lc$time)))
t_max <- min(sapply(lightcurves, function(lc) max(lc$time)))
if (!is.null(max_time_days)) {
t_max <- min(t_max, t_min + max_time_days)
}
# Create time grid (30-min cadence)
dt <- 0.0208
t_grid <- seq(t_min, t_max, by = dt)
n_time <- length(t_grid)
# Build matrix
Y <- matrix(NA_real_, nrow = n, ncol = n_time)
for (i in seq_len(n)) {
lc <- lightcurves[[i]]
flux_interp <- approx(lc$time, lc$flux, xout = t_grid, rule = 1)$y
if (sum(!is.na(flux_interp)) > 50) {
Y[i, ] <- (flux_interp - mean(flux_interp, na.rm = TRUE)) /
sd(flux_interp, na.rm = TRUE)
}
}
list(
Y = Y,
t = t_grid,
metadata = metadata,
n_stars = n,
n_time = n_time,
cadence_min = dt * 24 * 60,
source = "Kepler/MAST"
)
}
# ==============================================================================
# UNIFIED DOWNLOAD FUNCTION
# ==============================================================================
#' Download real Kepler light curves
#'
#' @param n_stars Number of stars to download
#' @param teff_range Temperature range for Sun-like stars (K)
#' @param logg_range Surface gravity range (dex)
#' @param mag_range Kepler magnitude range
#' @param quarter Kepler quarter (1-17)
#' @param max_time_days Maximum time span to include
#' @param method "auto" (try lightkurve, fall back to mast), "lightkurve", or "mast"
#' @param verbose Print progress
#' @return List with Y matrix, t vector, metadata (same as simulate_kepler_lightcurves)
download_kepler_lightcurves <- function(n_stars = 100,
teff_range = c(5500, 6000),
logg_range = c(4.0, 4.6),
mag_range = c(10, 13),
quarter = 5,
max_time_days = 60,
method = "auto",
verbose = TRUE) {
if (verbose) {
cat("\n")
cat("═══════════════════════════════════════════════════════════════════\n")
cat(" DOWNLOADING REAL KEPLER STELLAR LIGHT CURVES \n")
cat("═══════════════════════════════════════════════════════════════════\n")
}
# Step 1: Query targets
if (verbose) cat("\nStep 1: Querying Kepler stellar catalog...\n")
targets <- query_kepler_targets(
n_targets = n_stars * 5,
teff_range = teff_range,
logg_range = logg_range,
mag_range = mag_range
)
if (verbose) {
cat(sprintf(" Found %d candidate Sun-like stars\n", nrow(targets)))
}
# Step 2: Download light curves
if (verbose) cat("\nStep 2: Downloading light curves...\n")
download_result <- NULL
if (method %in% c("auto", "lightkurve")) {
download_result <- tryCatch({
download_kepler_via_lightkurve(
targets, n_stars = n_stars, quarter = quarter,
max_time_days = max_time_days, verbose = verbose
)
}, error = function(e) {
if (verbose) cat(" lightkurve method failed:", e$message, "\n")
NULL
})
}
if (is.null(download_result) && method %in% c("auto", "mast")) {
if (verbose && method == "auto") cat(" Falling back to direct MAST API...\n")
download_result <- download_kepler_via_mast(
targets, n_stars = min(n_stars, 100), # MAST is slower
max_time_days = max_time_days, verbose = verbose
)
}
if (is.null(download_result) || download_result$n < 5) {
stop("Failed to download sufficient light curves")
}
if (verbose) cat(sprintf(" Downloaded %d light curves\n", download_result$n))
# Step 3: Create aligned matrix
if (verbose) cat("\nStep 3: Creating aligned data matrix...\n")
result <- create_kepler_matrix(download_result, max_time_days = max_time_days)
# Summary
if (verbose) {
cat("\n")
cat("═══════════════════════════════════════════════════════════════════\n")
cat(" DOWNLOAD COMPLETE \n")
cat("═══════════════════════════════════════════════════════════════════\n")
cat(sprintf("Stars (realizations): %d\n", result$n_stars))
cat(sprintf("Time points: %d\n", result$n_time))
cat(sprintf("Time span: %.1f days\n", max(result$t)))
cat(sprintf("Cadence: %.1f minutes\n", result$cadence_min))
cat(sprintf("Matrix dimensions: %d × %d\n", nrow(result$Y), ncol(result$Y)))
cat(sprintf("Missing values: %.1f%%\n", 100 * mean(is.na(result$Y))))
cat("\nStellar properties:\n")
cat(sprintf(" Teff: %d - %d K\n",
min(result$metadata$teff), max(result$metadata$teff)))
cat(sprintf(" Kepmag: %.1f - %.1f\n",
min(result$metadata$kepmag), max(result$metadata$kepmag)))
cat("═══════════════════════════════════════════════════════════════════\n")
}
result
}
# ==============================================================================
# EXECUTE: DOWNLOAD REAL DATA
# ==============================================================================
cat("Starting Kepler light curve download...\n")## Starting Kepler light curve download...
## This may take 5-20 minutes depending on n_stars and method...
# Download real Kepler data
# This replaces simulate_kepler_lightcurves() with real observed data
# # OLD
# kepler_real <- download_kepler_lightcurves(
# n_stars = 100, # Number of stars (realizations)
# teff_range = c(5500, 6000), # Sun-like temperature
# logg_range = c(4.0, 4.6), # Main sequence
# mag_range = c(10, 13), # Bright enough for good data
# quarter = 5, # Kepler Q5 (good quality)
# max_time_days = 60, # ~2 months of data
# method = "auto", # Try lightkurve, fall back to MAST
# verbose = TRUE
# )
n_stars = 100
cache_file <- sprintf("kepler_lightcurves_%s.rds",
gsub("-", "", n_stars))
if (file.exists(cache_file)) {
message("Loading cached Kepler data: ", cache_file)
kepler_real <- readRDS(cache_file)
} else {
# Retrieve data from Horizons
kepler_real <- tryCatch({
kepler_real <- download_kepler_lightcurves(
n_stars = 100, # Number of stars (realizations)
teff_range = c(5500, 6000), # Sun-like temperature
logg_range = c(4.0, 4.6), # Main sequence
mag_range = c(10, 13), # Bright enough for good data
quarter = 5, # Kepler Q5 (good quality)
max_time_days = 60, # ~2 months of data
method = "auto", # Try lightkurve, fall back to MAST
verbose = TRUE
)
}, error = function(e) {
cat("Primary request failed, trying backup range...\n")
# fetch_horizons_data(
# start_date = "2000-01-01",
# end_date = "2024-06-01",
# step_size = "1 d"
# )
})
saveRDS(kepler_real, cache_file)
message("Saved cached Kepler Lightcurves Data: ", cache_file)
}
# Quick validation
cat("\n=== DATA READY FOR KPT ===\n")##
## === DATA READY FOR KPT ===
## Use kepler_real$Y as input to create_kpt_matrix_kepler()
## Dimensions match simulate_kepler_lightcurves() output
##
## First 5 stars:
## # A tibble: 5 × 10
## kepid ra dec teff logg feh kepmag radius mass star_id
## <int> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 10677047 296. 47.9 5663 4.54 -0.14 10.1 0.849 0.916 1
## 2 11081642 291. 48.6 5993 4.25 -0.02 10.1 1.27 1.05 2
## 3 9760362 287. 46.6 5860 4.21 -0.2 10.1 1.25 0.929 3
## 4 10934241 298. 48.4 5912 4.49 -0.16 10.3 0.934 0.974 4
## 5 11200185 296. 48.8 5580 4.48 -0.6 10.3 0.814 0.724 5
##
## Flux statistics per star (first 5):
for (i in 1:min(5, nrow(kepler_real$Y))) {
flux <- kepler_real$Y[i, ]
cat(sprintf(" Star %d (KIC %d): mean=%.3f, sd=%.3f, range=[%.2f, %.2f]\n",
i, kepler_real$metadata$kepid[i],
mean(flux, na.rm=TRUE), sd(flux, na.rm=TRUE),
min(flux, na.rm=TRUE), max(flux, na.rm=TRUE)))
}## Star 1 (KIC 10677047): mean=0.000, sd=1.000, range=[-2.68, 2.44]
## Star 2 (KIC 11081642): mean=-0.000, sd=1.000, range=[-2.78, 2.34]
## Star 3 (KIC 9760362): mean=-0.000, sd=1.000, range=[-3.30, 1.82]
## Star 4 (KIC 10934241): mean=0.000, sd=1.000, range=[-2.56, 1.85]
## Star 5 (KIC 11200185): mean=-0.000, sd=1.000, range=[-2.96, 2.25]
# str(kepler_real)
# List of 7
# $ Y : num [1:75, 1:2885] -2.159 -0.223 0.366 1.02 1.722 ...
# $ t : num [1:2885] 0 0.0208 0.0416 0.0624 0.0832 ...
# $ metadata : tibble [75 × 10] (S3: tbl_df/tbl/data.frame)
# ..$ kepid : int [1:75] 10677047 11081642 9760362 10934241 11200185 10850420 10019747 10482041 10866844 10477502 ...
# ..$ ra : num [1:75] 296 291 287 298 296 ...
# ..$ dec : num [1:75] 47.9 48.6 46.6 48.4 48.8 ...
# ..$ teff : int [1:75] 5663 5993 5860 5912 5580 5876 5769 5665 5557 5568 ...
# ..$ logg : num [1:75] 4.54 4.25 4.21 4.49 4.48 ...
# ..$ feh : num [1:75] -0.14 -0.02 -0.2 -0.16 -0.6 -0.66 -0.2 -0.12 -0.5 -0.46 ...
# ..$ kepmag : num [1:75] 10.1 10.1 10.1 10.3 10.3 ...
# ..$ radius : num [1:75] 0.849 1.269 1.25 0.934 0.814 ...
# ..$ mass : num [1:75] 0.916 1.048 0.929 0.974 0.724 ...
# ..$ star_id: num [1:75] 1 2 3 4 5 6 7 8 9 10 ...
# $ n_stars : num 75
# $ n_time : int 2885
# $ cadence_min: num 30
# $ source : chr "Kepler/MAST"Choose real or simulated Kepler star data…
kepler_data <- kepler_real # Real Kepler Data
kepler_data <- kepler_sim # Simulated Kepler Data
# Plot sample light curves
par(mfrow = c(3, 1), mar = c(4, 4, 2, 1))
kepler_data <- kepler_real # Real Kepler Data
# kepler_data <- kepler_sim # Simulated Kepler Data
for (i in 1:3) {
plot(kepler_data$t, kepler_data$Y[i, ], type = "l", col = "steelblue",
xlab = "Time (days)", ylab = "Normalized Flux",
main = sprintf("Star %d (feh = %.1f d, mass = %.2f d)",
i, kepler_data$metadata$feh[i],
kepler_data$metadata$mass[i]))
# main = sprintf("Star %d (P_rot = %.1f d, τ_gran = %.2f d)",
# i, kepler_data$metadata$rotation_period[i],
# kepler_data$metadata$granulation_tau[i]))
grid()
}## Creating KPT matrix:
## Window size: 1.0 days (48 points)
## Step size: 1.0 days (48 points)
## Number of windows: 60
##
## KPT Matrix: 75 stars × 60 windows
## List of 7
## $ Y : num [1:75, 1:2885] -2.159 -0.223 0.366 1.02 1.722 ...
## $ t : num [1:2885] 0 0.0208 0.0416 0.0624 0.0832 ...
## $ metadata : tibble [75 × 10] (S3: tbl_df/tbl/data.frame)
## ..$ kepid : int [1:75] 10677047 11081642 9760362 10934241 11200185 10850420 10019747 10482041 10866844 10477502 ...
## ..$ ra : num [1:75] 296 291 287 298 296 ...
## ..$ dec : num [1:75] 47.9 48.6 46.6 48.4 48.8 ...
## ..$ teff : int [1:75] 5663 5993 5860 5912 5580 5876 5769 5665 5557 5568 ...
## ..$ logg : num [1:75] 4.54 4.25 4.21 4.49 4.48 ...
## ..$ feh : num [1:75] -0.14 -0.02 -0.2 -0.16 -0.6 -0.66 -0.2 -0.12 -0.5 -0.46 ...
## ..$ kepmag : num [1:75] 10.1 10.1 10.1 10.3 10.3 ...
## ..$ radius : num [1:75] 0.849 1.269 1.25 0.934 0.814 ...
## ..$ mass : num [1:75] 0.916 1.048 0.929 0.974 0.724 ...
## ..$ star_id: num [1:75] 1 2 3 4 5 6 7 8 9 10 ...
## $ n_stars : num 75
## $ n_time : int 2885
## $ cadence_min: num 30
## $ source : chr "Kepler/MAST"
## List of 6
## $ Y : num [1:75, 1:60] -0.709 -0.907 -1.097 -0.695 -0.887 ...
## ..- attr(*, "scaled:center")= num [1:60] -2.09 -2.08 -2.03 -2.02 -1.98 ...
## ..- attr(*, "scaled:scale")= num [1:60] 1.83 1.85 1.72 1.63 1.64 ...
## $ Y_raw : num [1:75, 1:60] 0.0338 0.0235 0.0166 0.0346 0.0244 ...
## $ t_center : num [1:60] 0.51 1.51 2.51 3.5 4.5 ...
## $ n_stars : int 75
## $ n_windows: int 60
## $ metadata : tibble [75 × 10] (S3: tbl_df/tbl/data.frame)
## ..$ kepid : int [1:75] 10677047 11081642 9760362 10934241 11200185 10850420 10019747 10482041 10866844 10477502 ...
## ..$ ra : num [1:75] 296 291 287 298 296 ...
## ..$ dec : num [1:75] 47.9 48.6 46.6 48.4 48.8 ...
## ..$ teff : int [1:75] 5663 5993 5860 5912 5580 5876 5769 5665 5557 5568 ...
## ..$ logg : num [1:75] 4.54 4.25 4.21 4.49 4.48 ...
## ..$ feh : num [1:75] -0.14 -0.02 -0.2 -0.16 -0.6 -0.66 -0.2 -0.12 -0.5 -0.46 ...
## ..$ kepmag : num [1:75] 10.1 10.1 10.1 10.3 10.3 ...
## ..$ radius : num [1:75] 0.849 1.269 1.25 0.934 0.814 ...
## ..$ mass : num [1:75] 0.916 1.048 0.929 0.974 0.724 ...
## ..$ star_id: num [1:75] 1 2 3 4 5 6 7 8 9 10 ...
cat(sprintf("\nKPT Matrix dimensions: %d stars × %d windows\n",
kpt_data$n_stars, kpt_data$n_windows))##
## KPT Matrix dimensions: 75 stars × 60 windows
## Overall mean: -0.0000
## Overall SD: 0.9934
# ==============================================================================
# FIT KPT TO REAL KEPLER DATA
# ==============================================================================
cat("\n### KPT-GEM on Kepler Data ###\n")##
## ### KPT-GEM on Kepler Data ###
##
## ════════════════════════════════════════════════════════════════
## KPT-GEM ALGORITHM
## ════════════════════════════════════════════════════════════════
## Data: 75 realizations × 60 windows
## Theta grid: 64 points, 3 harmonics
##
## Initial φ deviation: 1.4507
## Initial S variation: 0.6266
## Initial σ: 0.5411
## Iter 1: LL= -7643.4 σ=0.895 Δφ=6.02e-01 ΔS=3.43e-01 φ_dev=1.895
## Iter 2: LL= -2458.0 σ=0.908 Δφ=3.41e-01 ΔS=1.58e-01 φ_dev=1.419
## Iter 3: LL= -2410.5 σ=0.923 Δφ=2.41e-01 ΔS=1.01e-01 φ_dev=1.026
## Iter 4: LL= -2391.1 σ=0.942 Δφ=9.24e-02 ΔS=7.23e-02 φ_dev=0.936
## Iter 5: LL= -2352.6 σ=0.963 Δφ=7.72e-02 ΔS=5.43e-02 φ_dev=0.892
## Iter 20: LL= -2366.4 σ=0.918 Δφ=2.66e-03 ΔS=1.42e-02 φ_dev=3.471
## Iter 40: LL= -2586.7 σ=0.898 Δφ=5.05e-03 ΔS=2.46e-03 φ_dev=3.467
## Iter 60: LL= -2661.6 σ=0.892 Δφ=1.32e-03 ΔS=9.78e-04 φ_dev=3.481
## Iter 80: LL= -2687.2 σ=0.891 Δφ=1.78e-04 ΔS=2.08e-04 φ_dev=3.482
## Iter 100: LL= -2691.2 σ=0.891 Δφ=1.88e-05 ΔS=2.23e-05 φ_dev=3.482
## Converged!
##
## ════════════════════════════════════════════════════════════════
## KPT-GEM RESULTS
## ════════════════════════════════════════════════════════════════
## Final σ: 0.8912
## φ deviation from uniform: 3.4820
## S average θ-variation: 0.3246
## Iterations: 107
## Final log-likelihood: -2691.5
##
## → Result: Significant phase structure detected (consistent with H₁)
## ════════════════════════════════════════════════════════════════
##
## ### KPT-FFT on Kepler Data ###
fit_kepler_fft <- kpt_fft(kpt_data$Y, max_iter = 1000, damp=0.2, lambda_phi = 0.05, s_sobolev = 2, J=4, lambda_S = 0.20)##
## ════════════════════════════════════════════════════════════════
## KPT-FFT ALGORITHM (Spectral Wiener-Sobolev)
## ════════════════════════════════════════════════════════════════
## Data: 75 realizations × 60 windows
## Fourier modes: J = 4 (indices -4 to 4)
## Regularization: λ_φ = 0.050, λ_S = 0.200
##
## Initial φ deviation: 1.4507
## Initial S variation: 0.6755
## Initial σ: 0.5411
## Iter 1: LL= -8175.1 σ=0.733 Δφ=2.04e-01 Δf=1.94e-01 φ_dev=1.145
## Iter 2: LL= -4232.5 σ=0.840 Δφ=1.43e-01 Δf=2.83e-01 φ_dev=0.906
## Iter 3: LL= -3068.2 σ=0.874 Δφ=1.08e-01 Δf=1.42e-01 φ_dev=0.835
## Iter 4: LL= -2770.8 σ=0.916 Δφ=1.33e-01 Δf=1.55e-01 φ_dev=0.723
## Iter 5: LL= -2461.3 σ=0.948 Δφ=1.41e-01 Δf=2.49e-02 φ_dev=0.605
## Iter 20: LL= -2062.2 σ=0.992 Δφ=7.95e-02 Δf=6.94e-03 φ_dev=0.729
## Iter 40: LL= -2063.0 σ=0.990 Δφ=3.42e-02 Δf=8.23e-04 φ_dev=1.019
## Iter 60: LL= -2069.9 σ=0.989 Δφ=1.06e-02 Δf=3.07e-04 φ_dev=0.994
## Iter 80: LL= -2064.6 σ=0.990 Δφ=8.05e-03 Δf=3.02e-04 φ_dev=0.880
## Iter 100: LL= -2068.6 σ=0.989 Δφ=4.04e-03 Δf=1.50e-04 φ_dev=0.983
## Iter 120: LL= -2066.7 σ=0.990 Δφ=2.74e-03 Δf=1.22e-04 φ_dev=0.924
## Iter 140: LL= -2068.0 σ=0.989 Δφ=1.78e-03 Δf=7.79e-05 φ_dev=0.960
## Iter 160: LL= -2067.2 σ=0.989 Δφ=1.23e-03 Δf=5.72e-05 φ_dev=0.937
## Iter 180: LL= -2067.7 σ=0.989 Δφ=8.28e-04 Δf=3.86e-05 φ_dev=0.952
## Iter 200: LL= -2067.4 σ=0.989 Δφ=5.72e-04 Δf=2.69e-05 φ_dev=0.942
## Iter 220: LL= -2067.6 σ=0.989 Δφ=3.92e-04 Δf=1.83e-05 φ_dev=0.948
## Iter 240: LL= -2067.5 σ=0.989 Δφ=2.72e-04 Δf=1.24e-05 φ_dev=0.944
## Iter 260: LL= -2067.5 σ=0.989 Δφ=1.88e-04 Δf=8.35e-06 φ_dev=0.947
## Iter 280: LL= -2067.5 σ=0.989 Δφ=1.30e-04 Δf=5.55e-06 φ_dev=0.945
## Iter 300: LL= -2067.5 σ=0.989 Δφ=9.01e-05 Δf=3.69e-06 φ_dev=0.946
## Iter 320: LL= -2067.5 σ=0.989 Δφ=6.24e-05 Δf=2.41e-06 φ_dev=0.946
## Iter 340: LL= -2067.5 σ=0.989 Δφ=4.31e-05 Δf=1.58e-06 φ_dev=0.946
## Iter 360: LL= -2067.5 σ=0.989 Δφ=2.98e-05 Δf=1.02e-06 φ_dev=0.946
## Iter 380: LL= -2067.5 σ=0.989 Δφ=2.05e-05 Δf=6.56e-07 φ_dev=0.946
## Iter 400: LL= -2067.5 σ=0.989 Δφ=1.41e-05 Δf=4.19e-07 φ_dev=0.946
## Converged!
##
## ════════════════════════════════════════════════════════════════
## KPT-FFT RESULTS
## ════════════════════════════════════════════════════════════════
## Final σ: 0.9894
## φ deviation from uniform: 0.9457
## S average θ-variation: 0.0614
## Iterations: 410
## Final log-likelihood: -2067.5
##
## → Result: Significant phase structure detected (consistent with H₁)
## ════════════════════════════════════════════════════════════════
# kpt_fft <- function(Y,
# L_theta = 64,
# J = 6,
# lambda_phi = 0.05,
# lambda_S = 0.10,
# s_sobolev = 2,
# damp = 0.5,
# min_iter = 20,
# max_iter = 300,
# tol = 1e-5,
# verbose = TRUE)
# 1. Check for oscillations
plot(fit_kepler_gem$history$iter, fit_kepler_gem$history$dphi, type = 'l', log = 'y',
ylab = 'Δφ (log)', xlab = 'Iteration', main = 'KPT-GEM Convergence Diagnostics')
lines(fit_kepler_gem$history$iter, fit_kepler_gem$history$df, col = 'red')
legend('topright', legend = c('Δφ', 'Δf'), col = c('black', 'red'), lty = 1)plot(fit_kepler_fft$history$iter, fit_kepler_fft$history$dphi, type = 'l', log = 'y',
ylab = 'Δφ (log)', xlab = 'Iteration', main = 'KPT-FFT Convergence Diagnostics')
lines(fit_kepler_fft$history$iter, fit_kepler_fft$history$df, col = 'red')
legend('topright', legend = c('Δφ', 'Δf'), col = c('black', 'red'), lty = 1)# 2. Check KPT-FFT log-likelihood monotonicity (should increase)
plot(fit_kepler_fft$history$iter, fit_kepler_fft$history$ll, type = 'l',
main = 'KPT-FFT Log-Likelihood monotonicity (should increase)')## KPT-FFT Final φ range: 0.0543199 1.781815
## KPT-FFT φ negative values? FALSE
Types of failure of KPT-FFT algorithm convergence
lambda_phi and s_sobolev parameterslambda_phi
significantly (\(\ge 0.2\))lambda_S or reduce J parameters.# ==============================================================================
# VISUALIZE KEPLER RESULTS
# ==============================================================================
par(mfrow = c(2, 3), mar = c(1, 1, 1, 1))
# Sample light curves
for (i in 1:3) {
plot(kepler_data$t, kepler_data$Y[i, ], type = "l", col = "steelblue",
main = sprintf("Star %d Light Curve", i),
xlab = "Time (days)", ylab = "Normalized Flux")
grid()
}
# Convergence comparison
plot(fit_kepler_gem$history$iter, fit_kepler_gem$history$ll,
type = "l", col = "blue", lwd = 2,
main = "Log-Likelihood Convergence",
xlab = "Iteration", ylab = "Log-Likelihood")
lines(fit_kepler_fft$history$iter, fit_kepler_fft$history$ll,
col = "purple", lwd = 2)
legend("bottomright", c("GEM", "FFT"), col = c("blue", "purple"), lwd = 2)
grid()
# Recovered φ(θ)
plot(fit_kepler_gem$theta, fit_kepler_gem$phi, type = "l",
col = "blue", lwd = 2,
main = "Recovered Phase Distribution φ(θ)",
xlab = "θ (radians)", ylab = "φ(θ)", ylim = c(0, 2))
lines(fit_kepler_fft$theta, fit_kepler_fft$phi, col = "purple", lwd = 2)
abline(h = 1, col = "red", lwd = 2, lty = 2)
legend("topright", c("GEM", "FFT", "Uniform"),
col = c("blue", "purple", "red"), lwd = 2, lty = c(1, 1, 2))
grid()
# Kime surface (GEM)
filled.contour(
x = fit_kepler_gem$theta,
y = 1:nrow(fit_kepler_gem$S),
z = t(fit_kepler_gem$S),
color.palette = viridis,
main = "KPT-GEM: Kime Surface S(θ, t)",
xlab = "θ (radians)",
ylab = "Time Window"
)# ==============================================================================
# KEPLER RESULTS SUMMARY
# ==============================================================================
cat("\n")## ═══════════════════════════════════════════════════════════════════════════
## KEPLER DATA ANALYSIS SUMMARY
## ═══════════════════════════════════════════════════════════════════════════
## KPT-GEM Results:
## φ deviation from uniform: 3.4820
## S average variation: 0.3246
## Estimated σ: 0.8912
## Converged: TRUE (iter: 107)
## KPT-FFT Results:
## φ deviation from uniform: 0.9457
## S average variation: 0.0614
## Estimated σ: 0.9894
## Converged: TRUE (iter: 410)
## INTERPRETATION:
if (fit_kepler_gem$phi_uniform_dist < 0.3) {
cat("→ GEM: φ(θ) ≈ uniform — No exploitable phase structure detected\n")
cat(" Stellar variability appears phase-independent (H₀ not rejected)\n")
} else {
cat("→ GEM: φ(θ) shows non-uniform structure — Phase dependence detected (H₁)\n")
}## → GEM: φ(θ) shows non-uniform structure — Phase dependence detected (H₁)
if (fit_kepler_fft$phi_uniform_dist < 0.3) {
cat("→ FFT: φ(θ) ≈ uniform — No exploitable phase structure detected\n")
cat(" Consistent with GEM findings\n")
} else {
cat("→ FFT: φ(θ) shows non-uniform structure\n")
}## → FFT: φ(θ) shows non-uniform structure
## ═══════════════════════════════════════════════════════════════════════════
Simulated Data)Let’s test the kpt_predict() function for forward
prediction of multiple individual time-courses with uncertainty
quantification (UQ).
The test includes synthetic data generation with known ground truth, model fitting, prediction, and rich visualizations showing individual trajectories:
# ==============================================================================
# COMPREHENSIVE TEST: KPT-FFT PREDICTION WITH MULTIPLE TIME-COURSES
# ==============================================================================
# Load required packages
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
library(ggplot2)
library(dplyr)
# ------------------------------------------------------------------------------
# 1. SIMULATE GROUND TRUTH DATA WITH KNOWN PHASE DISTRIBUTION
# ------------------------------------------------------------------------------
simulate_kpt_data <- function(
n_stars = 200,
n_windows = 15,
L_theta = 64,
J = 5,
sigma = 0.3,
phase_type = c("bimodal", "unimodal", "uniform")
) {
phase_type <- match.arg(phase_type)
theta_grid <- seq(0, 2*pi, length.out = L_theta + 1)[-1]
nvec <- (-J):J
# True phase distribution φ(θ)
phi_true <- switch(phase_type,
"bimodal" = 0.6 * dnorm(theta_grid, pi/2, 0.6) + 0.4 * dnorm(theta_grid, 3*pi/2, 0.8),
"unimodal" = dnorm(theta_grid, pi, 0.7),
"uniform" = rep(1, L_theta)
)
phi_true <- phi_true / mean(phi_true) # Enforce mean = 1
# Sample true phases for each star
p_theta <- phi_true / sum(phi_true)
theta_true <- sample(theta_grid, n_stars, replace = TRUE, prob = p_theta)
# True signal functions S_k(θ) - smooth periodic functions
S_true <- matrix(0, nrow = n_windows, ncol = L_theta)
for (k in 1:n_windows) {
# Time-varying amplitude and frequency
amp <- 1.5 + 0.5 * sin(2*pi * k / n_windows)
freq <- 1 + floor(k / 5)
phase_shift <- 0.5 * sin(2*pi * k / (n_windows/2))
S_true[k, ] <- amp * sin(freq * theta_grid + phase_shift) +
0.3 * cos(2*freq * theta_grid)
}
# Generate observations Y_ik = S_k(θ_i) + noise
Y <- matrix(NA, nrow = n_stars, ncol = n_windows)
for (i in 1:n_stars) {
for (k in 1:n_windows) {
# Find closest grid point to θ_i
idx <- which.min(abs(theta_grid - theta_true[i]))
Y[i, k] <- S_true[k, idx] + rnorm(1, 0, sigma)
}
}
# Add 5% missing values for realism
na_idx <- sample(length(Y), size = round(0.05 * length(Y)))
Y[na_idx] <- NA
list(
Y = Y,
theta_grid = theta_grid,
phi_true = phi_true,
S_true = S_true,
theta_true = theta_true,
sigma_true = sigma
)
}
# ------------------------------------------------------------------------------
# 2. RUN FULL PIPELINE: FIT + PREDICT
# ------------------------------------------------------------------------------
set.seed(42)
sim <- simulate_kpt_data(
n_stars = 250,
n_windows = 20,
phase_type = "bimodal",
sigma = 0.25
)
# Fit KPT-FFT model (using convergence-friendly parameters)
fit <- kpt_fft(
Y = sim$Y,
L_theta = 64,
J = 5,
lambda_phi = 0.15,
lambda_S = 0.25,
damp = 0.25,
s_sobolev = 3,
max_iter = 400,
tol = 1e-4,
verbose = TRUE
)##
## ════════════════════════════════════════════════════════════════
## KPT-FFT ALGORITHM (Spectral Wiener-Sobolev)
## ════════════════════════════════════════════════════════════════
## Data: 250 realizations × 20 windows
## Fourier modes: J = 5 (indices -5 to 5)
## Regularization: λ_φ = 0.150, λ_S = 0.250
##
## Initial φ deviation: 0.6464
## Initial S variation: 0.9172
## Initial σ: 1.0954
## Iter 1: LL= -8850.0 σ=1.164 Δφ=8.86e-02 Δf=3.26e-01 φ_dev=0.552
## Iter 2: LL= -7128.7 σ=1.196 Δφ=9.30e-02 Δf=2.85e-01 φ_dev=0.374
## Iter 3: LL= -5996.6 σ=1.211 Δφ=6.38e-02 Δf=2.56e-01 φ_dev=0.300
## Iter 4: LL= -6293.7 σ=1.209 Δφ=7.14e-02 Δf=1.50e-01 φ_dev=0.285
## Iter 5: LL= -6258.9 σ=1.208 Δφ=1.11e-01 Δf=1.28e-01 φ_dev=0.464
## Iter 20: LL= -6757.8 σ=1.178 Δφ=3.18e-02 Δf=1.45e-03 φ_dev=1.671
## Iter 40: LL= -6783.2 σ=1.176 Δφ=3.99e-03 Δf=1.55e-04 φ_dev=2.038
## Iter 60: LL= -6776.1 σ=1.176 Δφ=9.25e-04 Δf=3.51e-05 φ_dev=1.968
## Iter 80: LL= -6774.5 σ=1.176 Δφ=2.03e-04 Δf=7.94e-06 φ_dev=1.951
## Converged!
##
## ════════════════════════════════════════════════════════════════
## KPT-FFT RESULTS
## ════════════════════════════════════════════════════════════════
## Final σ: 1.1760
## φ deviation from uniform: 1.9481
## S average θ-variation: 0.0905
## Iterations: 90
## Final log-likelihood: -6774.2
##
## → Result: Significant phase structure detected (consistent with H₁)
## ════════════════════════════════════════════════════════════════
# Generate predictions for 30 future time windows
n_new <- 30
pred <- kpt_predict(fit, n_new = n_new, n_draw = 5000)
# ------------------------------------------------------------------------------
# 3. VISUALIZE MULTIPLE INDIVIDUAL TIME-COURSES
# ------------------------------------------------------------------------------
# Extract 100 individual predicted trajectories
n_show <- 100
individual_draws <- as.data.frame(pred$draws[1:n_show, ])
individual_draws$draw_id <- 1:n_show
individual_draws <- tidyr::pivot_longer(
individual_draws,
cols = -draw_id,
names_to = "time",
values_to = "value"
)
individual_draws$time <- as.numeric(gsub("V", "", individual_draws$time))
# Create plot with multiple trajectories + intervals
p1 <- ggplot() +
# Individual trajectories (faint)
geom_line(
data = individual_draws,
aes(x = time, y = value, group = draw_id),
alpha = 0.15,
size = 0.6,
color = "#2E86AB"
) +
# 50% prediction interval
geom_ribbon(
aes(x = 1:n_new, ymin = pred$pi_50_low, ymax = pred$pi_50_high),
fill = "#A23B72",
alpha = 0.35
) +
# 95% prediction interval
geom_ribbon(
aes(x = 1:n_new, ymin = pred$pi_low, ymax = pred$pi_high),
fill = "#A23B72",
alpha = 0.20
) +
# Point prediction (posterior mean)
geom_line(
aes(x = 1:n_new, y = pred$mean),
color = "#F18F01",
size = 1.5,
linetype = "solid"
) +
# Zero line for reference
geom_hline(yintercept = 0, linetype = "dashed", color = "gray40", alpha = 0.5) +
labs(
title = "KPT-FFT Forward Predictions: 100 Individual Time-Courses",
subtitle = sprintf(
"n_stars = %d, n_windows = %d (observed) → %d (predicted) | σ̂ = %.3f",
nrow(sim$Y), ncol(sim$Y), n_new, fit$sigma
),
x = "Time Window (Future)",
y = "Predicted Signal Value",
caption = "Faint blue lines: individual Monte Carlo trajectories\nShaded bands: 50% (dark) and 95% (light) prediction intervals\nOrange line: posterior mean prediction"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5, size = 11, color = "gray40"),
legend.position = "none"
)
print(p1)# ------------------------------------------------------------------------------
# 4. DIAGNOSTIC: PHASE DISTRIBUTION + PREDICTION CALIBRATION
# ------------------------------------------------------------------------------
# Plot 1: True vs Estimated Phase Distribution
p2_left <- ggplot(data.frame(
theta = fit$theta,
phi_true = approx(sim$theta_grid, sim$phi_true, fit$theta)$y,
phi_est = fit$phi
), aes(x = theta)) +
geom_line(aes(y = phi_true, color = "True φ(θ)"), size = 1.2) +
geom_line(aes(y = phi_est, color = "Estimated φ(θ)"), size = 1.2, linetype = "dashed") +
labs(
title = "Phase Distribution Recovery",
x = expression(theta),
y = expression(phi(theta)),
color = ""
) +
scale_color_manual(values = c("True φ(θ)" = "#06A77D", "Estimated φ(θ)" = "#D62828")) +
theme_minimal() +
theme(legend.position = c(0.85, 0.85))
# Plot 2: Prediction interval coverage (simulation only)
# Generate pseudo-observations from true model for coverage check
set.seed(123)
theta_new <- sample(sim$theta_grid, n_new, replace = TRUE, prob = sim$phi_true / sum(sim$phi_true))
y_new_true <- numeric(n_new)
for (k in 1:n_new) {
idx <- which.min(abs(sim$theta_grid - theta_new[k]))
# Use last observed window's signal shape scaled appropriately
S_idx <- min(k, nrow(sim$S_true))
y_new_true[k] <- sim$S_true[S_idx, idx] + rnorm(1, 0, sim$sigma_true)
}
coverage <- mean(y_new_true >= pred$pi_low & y_new_true <= pred$pi_high)
p2_right <- ggplot(data.frame(
time = 1:n_new,
y_true = y_new_true,
pi_low = pred$pi_low,
pi_high = pred$pi_high,
mean_pred = pred$mean
), aes(x = time)) +
geom_ribbon(aes(ymin = pi_low, ymax = pi_high), fill = "#A23B72", alpha = 0.25) +
geom_line(aes(y = mean_pred), color = "#F18F01", size = 1) +
geom_point(aes(y = y_true), color = "#06A77D", size = 2.5, stroke = 0.5) +
geom_hline(yintercept = 0, linetype = "dashed", color = "gray50", alpha = 0.4) +
labs(
title = sprintf("Prediction Coverage: %.1f%% (target 95%%)", coverage * 100),
x = "Time Window",
y = "Signal Value",
caption = "Green points: simulated future observations\nShaded band: 95% prediction interval"
) +
theme_minimal()
# Combine plots
library(gridExtra)
grid.arrange(p2_left, p2_right, ncol = 2)# ------------------------------------------------------------------------------
# 5. ADVANCED: ANIMATED TRAJECTORY SAMPLING (OPTIONAL)
# ------------------------------------------------------------------------------
# Show how individual trajectories reflect phase structure
sample_phases <- function(fit, n_samples = 5) {
p_theta <- fit$phi / sum(fit$phi)
theta_samples <- sample(fit$theta, n_samples, replace = TRUE, prob = p_theta)
df <- data.frame(
time = rep(1:ncol(fit$S), n_samples),
value = as.vector(t(fit$S)[, rep(1:n_samples, each = ncol(fit$S))]),
trajectory = rep(1:n_samples, each = ncol(fit$S)),
theta_phase = rep(round(theta_samples, 2), each = ncol(fit$S))
)
ggplot(df, aes(x = time, y = value, group = trajectory, color = factor(trajectory))) +
geom_line(size = 1.2, alpha = 0.8) +
geom_point(aes(shape = factor(trajectory)), size = 3) +
labs(
title = "Sampled Trajectories Reflect Phase Distribution",
subtitle = sprintf("Each line = S_k(θ) for a random θ ~ φ(θ); θ values: %s",
paste(round(theta_samples, 2), collapse = ", ")),
x = "Time Window (Observed)",
y = "Signal Value",
color = "Trajectory",
shape = "Trajectory"
) +
scale_color_viridis_d() +
theme_minimal() +
theme(legend.position = "none")
}
# Run this to see phase-driven trajectory diversity
print(sample_phases(fit, n_samples = 6))# ------------------------------------------------------------------------------
# 6. QUANTITATIVE VALIDATION METRICS
# ------------------------------------------------------------------------------
cat("\n════════════════════════════════════════════════════════════════\n")##
## ════════════════════════════════════════════════════════════════
## PREDICTION VALIDATION METRICS
## ════════════════════════════════════════════════════════════════
## Prediction horizon: 30 time windows
## Monte Carlo draws: 5000
## Predictive SD (avg): 1.1763
## 95% PI width (avg): 4.6044
## 50% PI width (avg): 1.5826
## Point prediction stability: 0.0169 (SD of means)
# Check for pathological predictions
if (any(pred$pi_low > pred$pi_high)) {
cat("⚠️ WARNING: Prediction intervals inverted in some windows!\n")
}
if (max(pred$sd) > 3 * fit$sigma) {
cat("⚠️ WARNING: Excessive predictive uncertainty (SD > 3σ)\n")
} else {
cat("✓ Predictive uncertainty calibrated to observation noise\n")
}## ✓ Predictive uncertainty calibrated to observation noise
## ════════════════════════════════════════════════════════════════
Summary of the above KPT-Prediction experimental results.
| Feature | Implementation |
|---|---|
| Multiple time-courses | 100 individual Monte Carlo trajectories shown as faint blue lines |
| Uncertainty quantification | Nested \(50\%\) (dark) and \(95\%\) (light) prediction intervals |
| Ground truth validation | Simulated data with known \(\varphi(\theta)\) and \(S(\theta)\) for coverage checks |
| Phase-driven diversity | Trajectories reflect sampling from \(\varphi(\theta)\) (see animated sampling plot) |
| Calibration diagnostics | Coverage percentage + interval width metrics |
| Realism | \(5\%\) missing data, time-varying signal shapes, bimodal phase distribution |
The expected output of this experiment contains:
Tuning Prediction Behavior
| Desired behavior | Parameter to adjust |
|---|---|
| Wider prediction intervals | Increase sigma in simulation or check if
fit$sigma is underestimated |
| More trajectory diversity | Ensure \(\varphi(\theta)\) is not
overly peaked (check fit$phi_uniform_dist) |
| Smoother trajectories | Increase J during fitting for richer harmonic
representation |
| Tighter intervals | Reduce observation noise (sigma) or increase
n_stars |
This test shows that kpt_predict() correctly propagates
uncertainty from both phase distribution (\(\varphi\)) and observation noise (\(\sigma\)) into realistic, diverse future
time-courses, which demonstrated KPT-prediction for forecasting
applications.
Real Kepler Star-Curves Data)# ==============================================================================
# COMPREHENSIVE TEST: KPT-FFT PREDICTION WITH MULTIPLE TIME-COURSES
# ==============================================================================
# Fit KPT-FFT model (using convergence-friendly parameters)
# fit_kepler_fft <- kpt_fft(kpt_data$Y, max_iter = 1000, damp=0.2, lambda_phi = 0.05, s_sobolev = 2, J=4, lambda_S = 0.20)
# Generate predictions for 30 future time windows
n_new <- 30
pred <- kpt_predict(fit_kepler_fft, n_new = n_new, n_draw = 5000)
# ------------------------------------------------------------------------------
# 3. VISUALIZE MULTIPLE INDIVIDUAL TIME-COURSES
# ------------------------------------------------------------------------------
# Extract 100 individual predicted trajectories
n_show <- 100
individual_draws <- as.data.frame(pred$draws[1:n_show, ])
individual_draws$draw_id <- 1:n_show
individual_draws <- tidyr::pivot_longer(
individual_draws,
cols = -draw_id,
names_to = "time",
values_to = "value"
)
individual_draws$time <- as.numeric(gsub("V", "", individual_draws$time))
# Create plot with multiple trajectories + intervals
p1 <- ggplot() +
# Individual trajectories (faint)
geom_line(
data = individual_draws,
aes(x = time, y = value, group = draw_id),
alpha = 0.15,
size = 0.6,
color = "#2E86AB"
) +
# 50% prediction interval
geom_ribbon(
aes(x = 1:n_new, ymin = pred$pi_50_low, ymax = pred$pi_50_high),
fill = "#A23B72",
alpha = 0.35
) +
# 95% prediction interval
geom_ribbon(
aes(x = 1:n_new, ymin = pred$pi_low, ymax = pred$pi_high),
fill = "#A23B72",
alpha = 0.20
) +
# Point prediction (posterior mean)
geom_line(
aes(x = 1:n_new, y = pred$mean),
color = "#F18F01",
size = 1.5,
linetype = "solid"
) +
# Zero line for reference
geom_hline(yintercept = 0, linetype = "dashed", color = "gray40", alpha = 0.5) +
labs(
title = "KPT-FFT Forward Predictions: 100 Individual Time-Courses",
subtitle = sprintf(
"n_stars = %d, n_windows = %d (observed) → %d (predicted) | σ̂ = %.3f",
nrow(kpt_data$Y), ncol(kpt_data$Y), n_new, fit_kepler_fft$sigma
),
x = "Time Window (Future)",
y = "Predicted Signal Value",
caption = "Faint blue lines: individual Monte Carlo trajectories\nShaded bands: 50% (dark) and 95% (light) prediction intervals\nOrange line: posterior mean prediction"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5, size = 11, color = "gray40"),
legend.position = "none"
)
p1# ------------------------------------------------------------------------------
# 4. DIAGNOSTIC: PHASE DISTRIBUTION + PREDICTION CALIBRATION
# ------------------------------------------------------------------------------
# Plot 1: True vs Estimated Phase Distribution - TRUE is UNKNOWN!!!
# p2_left <- ggplot(data.frame(
# theta = fit_kepler_fft$theta,
# phi_true = approx(kpt_data$theta_grid, kpt_data$phi_true, fit_kepler_fft$theta)$y,
# phi_est = fit_kepler_fft$phi
# ), aes(x = theta)) +
# geom_line(aes(y = phi_true, color = "True φ(θ)"), size = 1.2) +
# geom_line(aes(y = phi_est, color = "Estimated φ(θ)"), size = 1.2, linetype = "dashed") +
# labs(
# title = "Phase Distribution Recovery",
# x = expression(theta),
# y = expression(phi(theta)),
# color = ""
# ) +
# scale_color_manual(values = c("True φ(θ)" = "#06A77D", "Estimated φ(θ)" = "#D62828")) +
# theme_minimal() +
# theme(legend.position = c(0.85, 0.85))
# # Plot 2: Prediction interval coverage (simulation only)
# # Generate pseudo-observations from true model for coverage check
# set.seed(123)
# theta_new <- sample(kpt_data$theta_grid, n_new, replace = TRUE, prob = kpt_data$phi_true / sum(kpt_data$phi_true))
# y_new_true <- numeric(n_new)
# for (k in 1:n_new) {
# idx <- which.min(abs(kpt_data$theta_grid - theta_new[k]))
# # Use last observed window's signal shape scaled appropriately
# S_idx <- min(k, nrow(kpt_data$S_true))
# y_new_true[k] <- kpt_data$S_true[S_idx, idx] + rnorm(1, 0, kpt_data$sigma_true)
# }
#
# coverage <- mean(y_new_true >= pred$pi_low & y_new_true <= pred$pi_high)
# p2_right <- ggplot(data.frame(
# time = 1:n_new,
# y_true = y_new_true,
# pi_low = pred$pi_low,
# pi_high = pred$pi_high,
# mean_pred = pred$mean
# ), aes(x = time)) +
# geom_ribbon(aes(ymin = pi_low, ymax = pi_high), fill = "#A23B72", alpha = 0.25) +
# geom_line(aes(y = mean_pred), color = "#F18F01", size = 1) +
# geom_point(aes(y = y_true), color = "#06A77D", size = 2.5, stroke = 0.5) +
# geom_hline(yintercept = 0, linetype = "dashed", color = "gray50", alpha = 0.4) +
# labs(
# title = sprintf("Prediction Coverage: %.1f%% (target 95%%)", coverage * 100),
# x = "Time Window",
# y = "Signal Value",
# caption = "Green points: simulated future observations\nShaded band: 95% prediction interval"
# ) +
# theme_minimal()
# # Combine plots
# library(gridExtra)
# grid.arrange(p2_left, p2_right, ncol = 2)
# ------------------------------------------------------------------------------
# 5. ADVANCED: ANIMATED TRAJECTORY SAMPLING (OPTIONAL)
# ------------------------------------------------------------------------------
# Show how individual trajectories reflect phase structure
sample_phases <- function(fit_kepler_fft, n_samples = 5) {
p_theta <- fit_kepler_fft$phi / sum(fit_kepler_fft$phi)
theta_samples <- sample(fit_kepler_fft$theta, n_samples, replace = TRUE, prob = p_theta)
df <- data.frame(
time = rep(1:ncol(fit_kepler_fft$S), n_samples),
value = as.vector(t(fit_kepler_fft$S)[, rep(1:n_samples, each = ncol(fit_kepler_fft$S))]),
trajectory = rep(1:n_samples, each = ncol(fit_kepler_fft$S)),
theta_phase = rep(round(theta_samples, 2), each = ncol(fit_kepler_fft$S))
)
ggplot(df, aes(x = time, y = value, group = trajectory, color = factor(trajectory))) +
geom_line(size = 1.2, alpha = 0.8) +
geom_point(aes(shape = factor(trajectory)), size = 3) +
labs(
title = "Sampled Trajectories Reflect Phase Distribution",
subtitle = sprintf("Each line = S_k(θ) for a random θ ~ φ(θ); θ values: %s",
paste(round(theta_samples, 2), collapse = ", ")),
x = "Time Window (Observed)",
y = "Signal Value",
color = "Trajectory",
shape = "Trajectory"
) +
scale_color_viridis_d() +
theme_minimal() +
theme(legend.position = "none")
}
# Run this to see phase-driven trajectory diversity
sample_phases(fit_kepler_fft, n_samples = 6)# ------------------------------------------------------------------------------
# 6. QUANTITATIVE VALIDATION METRICS
# ------------------------------------------------------------------------------
cat("\n════════════════════════════════════════════════════════════════\n")##
## ════════════════════════════════════════════════════════════════
## PREDICTION VALIDATION METRICS
## ════════════════════════════════════════════════════════════════
## Prediction horizon: 30 time windows
## Monte Carlo draws: 5000
## Predictive SD (avg): 0.9936
## 95% PI width (avg): 3.8894
## 50% PI width (avg): 1.3326
## Point prediction stability: 0.0130 (SD of means)
# Check for pathological predictions
if (any(pred$pi_low > pred$pi_high)) {
cat("⚠️ WARNING: Prediction intervals inverted in some windows!\n")
}
if (max(pred$sd) > 3 * fit_kepler_fft$sigma) {
cat("⚠️ WARNING: Excessive predictive uncertainty (SD > 3σ)\n")
} else {
cat("✓ Predictive uncertainty calibrated to observation noise\n")
}## ✓ Predictive uncertainty calibrated to observation noise
## ════════════════════════════════════════════════════════════════