SOCR ≫ DSPA ≫ DSPA2 Topics ≫

1 Overview

This notebook contains a self-contained simulation pipeline for validating the regularization commutator framework in linear and logistic regression. This V6 Rmd version addresses several issues with previous versions, expands DSPA_Appendix_25_StatisticalCommutator_v1, and reflects R1 commutator paper expansion.

The main V6 improvements include:

  1. The naive influence commutator is treated as identically degenerate because \(H_\lambda^{-1}A + H_\lambda^{-1}(\lambda P)=I\), which is never used as a tuning objective.

  2. The implemented Type I and Type II commutators are

\[C_I(\lambda) = [A, H_\lambda^{-1}P], \qquad C_{II}(\lambda) = [P, H_\lambda^{-1}A].\]

  1. The balanced score is maximized directly (V6 no longer minimizes the reciprocal of the score).

  2. The effective degrees-of-freedom balance is adjusted for rank-deficient penalties. If \(r=\dim\ker(P)\), then

\[B_r(\lambda)=\{\operatorname{df}(\lambda)-r\}\{p-\operatorname{df}(\lambda)\}, \qquad \operatorname{df}(\lambda)=\operatorname{tr}(H_\lambda^{-1}A).\]

  1. Oracle, cross-validation, and test evaluation are all computed on a grid that includes \(\lambda=0\), and failed or flat objectives fall back to \(\lambda=0\), not to the middle of the grid.

  2. Logistic regression uses a stable penalized IRLS algorithm with a proper penalized-objective line search.

  3. Summary tables use a corrected median/MAD aggregation routine, avoiding the previous interleaving bug that mislabeled median and MAD columns.

  4. The sparse ridge case \(P=I\) is retained as a negative control. Since \(P\) commutes with every \(A\), both commutator scores should be flat or numerically zero.

2 Core numerical utilities

# -----------------------------------------------------------------------------
# Helpers
# -----------------------------------------------------------------------------
`%||%` <- function(x, y) if (is.null(x)) y else x

as_numeric_matrix <- function(x) {
  if (inherits(x, "Matrix")) as.matrix(x) else x
}

symmetrize <- function(M) {
  Matrix::forceSymmetric((M + t(M)) / 2)
}

# -----------------------------------------------------------------------------
# Numerically stable solve for symmetric positive definite / semidefinite systems.
# Escalates diagonal jitter before falling back to a Moore-Penrose inverse.
# -----------------------------------------------------------------------------
stable_solve_spd <- function(M, b = NULL, jitter = 1e-8, max_tries = 6) {
  M <- Matrix::forceSymmetric(M)
  p <- nrow(M)
  diag_scale <- mean(abs(Matrix::diag(M)))
  if (!is.finite(diag_scale) || diag_scale <= 0) diag_scale <- 1

  for (k in 0:max_tries) {
    jit <- jitter * (10^k) * diag_scale
    Mj <- M + jit * Matrix::Diagonal(n = p)
    cholM <- tryCatch(
      Matrix::Cholesky(Mj, LDL = FALSE, perm = TRUE),
      error = function(e) NULL
    )
    if (!is.null(cholM)) {
      if (is.null(b)) return(Matrix::solve(cholM, Matrix::Diagonal(n = p)))
      return(Matrix::solve(cholM, b))
    }
  }

  Minv <- MASS::ginv(as.matrix(M))
  if (is.null(b)) return(Minv)
  Minv %*% as.matrix(b)
}

# -----------------------------------------------------------------------------
# Penalty nullity and adjusted balance factor.
# -----------------------------------------------------------------------------
penalty_nullity <- function(P, tol = 1e-8) {
  ev <- eigen(as.matrix(Matrix::forceSymmetric(P)), symmetric = TRUE,
              only.values = TRUE)$values
  scale <- max(1, max(abs(ev), na.rm = TRUE))
  sum(ev <= tol * scale)
}

balance_factor <- function(eff_df, p, nullity = 0) {
  if (!is.finite(eff_df)) return(NA_real_)
  nullity <- max(0, min(p - 1, as.integer(round(nullity))))
  df_clip <- min(p, max(nullity, Re(eff_df)))
  (df_clip - nullity) * (p - df_clip)
}

scale_penalty_to_A <- function(P, A, eps = 1e-12) {
  P <- Matrix::forceSymmetric(P)
  scale <- as.numeric(Matrix::norm(A, "F")) / max(eps, as.numeric(Matrix::norm(P, "F")))
  P * scale
}

# -----------------------------------------------------------------------------
# Safe grid selection.
# Returns a status flag instead of silently choosing a meaningless grid point.
# -----------------------------------------------------------------------------
select_lambda_grid <- function(vals, grid_vals, maximize = FALSE,
                               fallback = 0, tie_tol = 1e-10,
                               flat_tol = 1e-12) {
  stopifnot(length(vals) == length(grid_vals))
  ok <- is.finite(vals) & is.finite(grid_vals)
  if (!any(ok)) {
    return(list(lambda = fallback, value = NA_real_, index = NA_integer_,
                status = "all_nonfinite"))
  }

  v <- vals[ok]
  g <- grid_vals[ok]
  original_idx <- which(ok)

  spread <- max(v) - min(v)
  scale <- max(1, max(abs(v)))
  if (!is.finite(spread) || spread <= flat_tol * scale) {
    return(list(lambda = fallback, value = v[1], index = original_idx[1],
                status = "flat_objective"))
  }

  best <- if (maximize) max(v) else min(v)
  tol_abs <- tie_tol * max(1, abs(best))
  tie_idx <- if (maximize) which(v >= best - tol_abs) else which(v <= best + tol_abs)
  j <- tie_idx[1]
  list(lambda = g[j], value = v[j], index = original_idx[j], status = "ok")
}

lambda_index <- function(lambda, grid_vals, tol = 1e-12) {
  d <- abs(grid_vals - lambda)
  j <- which.min(d)
  if (!length(j) || !is.finite(d[j]) || d[j] > tol * max(1, abs(lambda))) {
    return(NA_integer_)
  }
  j
}

metric_at_lambda <- function(metric, lambda, grid_vals) {
  j <- lambda_index(lambda, grid_vals)
  if (is.na(j)) return(NA_real_)
  metric[j]
}

3 Penalty construction and data generation

# -----------------------------------------------------------------------------
# Penalty matrices
# -----------------------------------------------------------------------------
make_diff_penalty <- function(p, order = 1) {
  if (p < 2) stop("p must be at least 2")
  D <- Matrix::Matrix(0, nrow = p - 1, ncol = p, sparse = TRUE)
  for (i in seq_len(p - 1)) {
    D[i, i] <- -1
    D[i, i + 1] <- 1
  }
  P <- Matrix::crossprod(D)
  if (order <= 1) return(Matrix::forceSymmetric(P))
  for (k in 2:order) P <- Matrix::crossprod(D %*% P)
  Matrix::forceSymmetric(P)
}

make_complete_graph_laplacian <- function(groups, p) {
  P <- Matrix::Matrix(0, nrow = p, ncol = p, sparse = FALSE)
  for (idx in groups) {
    m <- length(idx)
    if (m <= 1) next
    L <- diag(m) * m - matrix(1, m, m)
    P[idx, idx] <- P[idx, idx] + L
  }
  Matrix::forceSymmetric(P)
}

make_groups <- function(p, n_groups) {
  split(seq_len(p), rep(seq_len(n_groups), length.out = p))
}

# -----------------------------------------------------------------------------
# Simulation scenarios.
# Sparse with P=I is intentionally a negative control for the commutator score.
# -----------------------------------------------------------------------------
generate_dataset <- function(scenario = "toeplitz", n = 200, p = 50,
                             type = c("linear", "logistic"),
                             signal_scale = 1, seed = NULL,
                             linear_noise_sd = 1,
                             logistic_max_eta_sd = 2.5,
                             block_penalty = c("laplacian", "covariance")) {
  type <- match.arg(type)
  block_penalty <- match.arg(block_penalty)
  if (!is.null(seed)) set.seed(seed)

  scenario <- tolower(scenario)

  if (scenario == "toeplitz") {
    rho <- 0.5
    Sigma <- toeplitz(rho^(0:(p - 1)))
    X <- MASS::mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
    beta_true <- numeric(p)
    beta_true[5:min(10, p)] <- 2
    if (p >= 25) beta_true[20:25] <- -1.5
    if (p >= 40) beta_true[35:40] <- 1
    P <- make_diff_penalty(p, order = 1)

  } else if (scenario == "block") {
    n_blocks <- 5
    groups <- make_groups(p, n_blocks)
    Sigma <- matrix(0.2, p, p)
    for (idx in groups) Sigma[idx, idx] <- 0.7
    diag(Sigma) <- 1
    X <- MASS::mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
    beta_true <- numeric(p)
    active <- sample(seq_len(p), min(10, p))
    beta_true[active] <- rnorm(length(active), mean = 2, sd = 0.5)
    P <- if (block_penalty == "laplacian") {
      make_complete_graph_laplacian(groups, p)
    } else {
      Matrix::forceSymmetric(Matrix::Matrix(Sigma, sparse = FALSE))
    }

  } else if (scenario == "sparse") {
    X <- matrix(rnorm(n * p), n, p)
    beta_true <- numeric(p)
    active <- sample(seq_len(p), min(5, p))
    beta_true[active] <- runif(length(active), 2, 4) *
      sample(c(-1, 1), length(active), replace = TRUE)
    P <- Matrix::Diagonal(p)

  } else if (scenario == "grouped") {
    n_groups <- 10
    groups <- make_groups(p, n_groups)
    X <- matrix(0, n, p)
    for (idx in groups) {
      factor <- rnorm(n)
      X[, idx] <- matrix(factor, n, length(idx)) +
        matrix(rnorm(n * length(idx), sd = 0.5), n, length(idx))
    }
    beta_true <- numeric(p)
    active_groups <- sample(seq_along(groups), min(3, length(groups)))
    for (g in active_groups) {
      idx <- groups[[g]]
      beta_true[idx] <- rnorm(length(idx), mean = 2, sd = 0.5)
    }
    P <- make_complete_graph_laplacian(groups, p)

  } else {
    stop("Unknown scenario: ", scenario)
  }

  beta_true <- beta_true * signal_scale

  if (type == "logistic" && is.finite(logistic_max_eta_sd)) {
    eta0 <- as.vector(X %*% beta_true)
    s_eta <- stats::sd(eta0)
    if (is.finite(s_eta) && s_eta > logistic_max_eta_sd && s_eta > 0) {
      beta_true <- beta_true * (logistic_max_eta_sd / s_eta)
    }
  }

  if (type == "linear") {
    y <- as.vector(X %*% beta_true + rnorm(n, sd = linear_noise_sd))
  } else {
    prob <- stats::plogis(as.vector(X %*% beta_true))
    y <- stats::rbinom(n, size = 1, prob = prob)
  }

  list(X = X, y = y, beta_true = beta_true, P = P,
       scenario = scenario, type = type)
}

preprocess_train_test <- function(X_train, X_test, y_train, y_test, type) {
  x_center <- colMeans(X_train)
  x_scale <- apply(X_train, 2, stats::sd)
  x_scale[!is.finite(x_scale) | x_scale < 1e-8] <- 1
  X_train_s <- sweep(sweep(X_train, 2, x_center, "-"), 2, x_scale, "/")
  X_test_s <- sweep(sweep(X_test, 2, x_center, "-"), 2, x_scale, "/")

  if (type == "linear") {
    y_center <- mean(y_train)
    y_train_s <- y_train - y_center
    y_test_s <- y_test - y_center
  } else {
    y_center <- 0
    y_train_s <- as.integer(y_train)
    y_test_s <- as.integer(y_test)
  }

  list(X_train = X_train_s, X_test = X_test_s,
       y_train = y_train_s, y_test = y_test_s,
       x_center = x_center, x_scale = x_scale, y_center = y_center)
}

4 Model fitting and metrics

# -----------------------------------------------------------------------------
# Linear model fitting
# -----------------------------------------------------------------------------
fit_linear_penalized <- function(X, y, P, lambda, jitter = 1e-8) {
  n <- nrow(X)
  A <- Matrix::crossprod(X) / n
  b <- Matrix::crossprod(X, y) / n
  beta <- stable_solve_spd(A + lambda * P, b, jitter = jitter)
  as.vector(beta)
}

# -----------------------------------------------------------------------------
# AUC and log-loss
# -----------------------------------------------------------------------------
auc_fast <- function(y_true, p_hat) {
  y_true <- as.integer(y_true)
  p_hat <- as.vector(p_hat)
  n_pos <- sum(y_true == 1L)
  n_neg <- sum(y_true == 0L)
  if (n_pos == 0L || n_neg == 0L) return(NA_real_)
  r <- rank(p_hat, ties.method = "average")
  (sum(r[y_true == 1L]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

binomial_logloss <- function(y, prob, eps = 1e-12) {
  prob <- pmin(pmax(as.vector(prob), eps), 1 - eps)
  -mean(y * log(prob) + (1 - y) * log(1 - prob))
}

# -----------------------------------------------------------------------------
# Stable penalized logistic IRLS.
# Objective: mean negative log-likelihood + lambda/2 * beta'P beta.
# -----------------------------------------------------------------------------
logistic_penalized_objective <- function(beta, X, y, P, lambda) {
  eta <- as.vector(X %*% beta)
  nll <- mean(pmax(eta, 0) - y * eta + log1p(exp(-abs(eta))))
  pen <- 0.5 * lambda * as.numeric(Matrix::crossprod(beta, P %*% beta))
  nll + pen
}

irls_penalized_logistic <- function(X, y, P, lambda, beta_init = NULL,
                                    max_iters = 100, tol = 1e-7,
                                    clip_eta = 30, min_w = 1e-6,
                                    jitter = 1e-8, max_half_steps = 25) {
  n <- nrow(X)
  p <- ncol(X)
  beta <- if (is.null(beta_init)) rep(0, p) else as.vector(beta_init)
  obj <- logistic_penalized_objective(beta, X, y, P, lambda)
  converged <- FALSE
  last_step_factor <- NA_real_
  it_used <- 0L

  for (it in seq_len(max_iters)) {
    it_used <- it
    eta <- as.vector(X %*% beta)
    eta_clip <- pmax(pmin(eta, clip_eta), -clip_eta)
    mu <- stats::plogis(eta_clip)
    Wv <- pmax(mu * (1 - mu), min_w)
    z <- eta + (y - mu) / Wv

    W <- Matrix::Diagonal(n = n, x = Wv)
    A <- Matrix::crossprod(X, W %*% X) / n
    b <- Matrix::crossprod(X, W %*% z) / n
    H <- A + lambda * P

    beta_full <- tryCatch(
      as.vector(stable_solve_spd(H, b, jitter = jitter)),
      error = function(e) rep(NA_real_, p)
    )
    if (!all(is.finite(beta_full))) break

    direction <- beta_full - beta
    if (max(abs(direction)) < tol * (1 + max(abs(beta)))) {
      beta <- beta_full
      converged <- TRUE
      break
    }

    accepted <- FALSE
    step_factor <- 1
    beta_try <- beta
    obj_try <- obj
    for (hs in 0:max_half_steps) {
      beta_try <- beta + step_factor * direction
      obj_try <- logistic_penalized_objective(beta_try, X, y, P, lambda)
      if (is.finite(obj_try) && obj_try <= obj + 1e-10) {
        accepted <- TRUE
        break
      }
      step_factor <- step_factor / 2
    }
    last_step_factor <- step_factor
    if (!accepted) break

    if (max(abs(beta_try - beta)) < tol * (1 + max(abs(beta)))) {
      beta <- beta_try
      obj <- obj_try
      converged <- TRUE
      break
    }

    beta <- beta_try
    obj <- obj_try
  }

  list(beta = beta, converged = converged, iters = it_used,
       objective = obj, last_step_factor = last_step_factor)
}

5 Commutator scores

# -----------------------------------------------------------------------------
# Compute Type I and Type II scores from A and P.
# -----------------------------------------------------------------------------
commutator_from_curvature <- function(A, P, lambda, nullity = NULL,
                                      jitter = 1e-8) {
  p <- ncol(A)
  if (is.null(nullity)) nullity <- penalty_nullity(P)
  H <- A + lambda * P
  HinvA <- stable_solve_spd(H, A, jitter = jitter)
  HinvP <- stable_solve_spd(H, P, jitter = jitter)
  eff_df <- Re(sum(Matrix::diag(HinvA)))
  balance <- balance_factor(eff_df, p, nullity = nullity)

  C_I <- A %*% HinvP - HinvP %*% A
  C_II <- P %*% HinvA - HinvA %*% P
  mis_I <- as.numeric(Matrix::norm(C_I, "F")^2)
  mis_II <- as.numeric(Matrix::norm(C_II, "F")^2)

  list(score_I = mis_I * balance,
       score_II = mis_II * balance,
       mis_I = mis_I,
       mis_II = mis_II,
       eff_df = eff_df,
       balance = balance,
       nullity = nullity)
}

compute_commutator_scores_linear <- function(lambda_grid_eval, X, P,
                                             nullity = NULL, jitter = 1e-8) {
  A <- Matrix::crossprod(X) / nrow(X)
  if (is.null(nullity)) nullity <- penalty_nullity(P)
  out <- data.frame(
    lambda = lambda_grid_eval,
    score_I = NA_real_, score_II = NA_real_,
    mis_I = NA_real_, mis_II = NA_real_,
    eff_df = NA_real_, balance = NA_real_,
    converged = TRUE,
    stringsAsFactors = FALSE
  )
  for (j in seq_along(lambda_grid_eval)) {
    res <- tryCatch(
      commutator_from_curvature(A, P, lambda_grid_eval[j], nullity = nullity,
                                jitter = jitter),
      error = function(e) NULL
    )
    if (!is.null(res)) {
      out$score_I[j] <- res$score_I
      out$score_II[j] <- res$score_II
      out$mis_I[j] <- res$mis_I
      out$mis_II[j] <- res$mis_II
      out$eff_df[j] <- res$eff_df
      out$balance[j] <- res$balance
    }
  }
  out
}

compute_commutator_scores_logistic <- function(lambda_grid_eval, X, y, P,
                                               nullity = NULL, jitter = 1e-8) {
  if (is.null(nullity)) nullity <- penalty_nullity(P)
  out <- data.frame(
    lambda = lambda_grid_eval,
    score_I = NA_real_, score_II = NA_real_,
    mis_I = NA_real_, mis_II = NA_real_,
    eff_df = NA_real_, balance = NA_real_,
    converged = FALSE,
    iters = NA_integer_,
    stringsAsFactors = FALSE
  )

  ord <- order(lambda_grid_eval, decreasing = TRUE)
  beta_ws <- NULL
  for (j in ord) {
    lam <- lambda_grid_eval[j]
    fit <- tryCatch(
      irls_penalized_logistic(X, y, P, lam, beta_init = beta_ws, jitter = jitter),
      error = function(e) list(beta = rep(NA_real_, ncol(X)), converged = FALSE,
                               iters = NA_integer_)
    )
    out$converged[j] <- isTRUE(fit$converged)
    out$iters[j] <- fit$iters %||% NA_integer_
    if (!all(is.finite(fit$beta))) next
    beta_ws <- fit$beta

    eta <- as.vector(X %*% fit$beta)
    mu <- stats::plogis(pmax(pmin(eta, 30), -30))
    Wv <- pmax(mu * (1 - mu), 1e-6)
    A <- Matrix::crossprod(X, Matrix::Diagonal(n = nrow(X), x = Wv) %*% X) / nrow(X)
    res <- tryCatch(
      commutator_from_curvature(A, P, lam, nullity = nullity, jitter = jitter),
      error = function(e) NULL
    )
    if (!is.null(res)) {
      out$score_I[j] <- res$score_I
      out$score_II[j] <- res$score_II
      out$mis_I[j] <- res$mis_I
      out$mis_II[j] <- res$mis_II
      out$eff_df[j] <- res$eff_df
      out$balance[j] <- res$balance
    }
  }
  out
}

6 Oracle and cross-validation scans

# -----------------------------------------------------------------------------
# Test-set metric scans. These are used both to define the oracle and to evaluate
# the selected lambda values, ensuring oracle dominance by construction.
# -----------------------------------------------------------------------------
scan_test_metrics_linear <- function(lambda_grid_eval, X_train, y_train,
                                     X_test, y_test, P, jitter = 1e-8) {
  n <- nrow(X_train)
  A <- Matrix::crossprod(X_train) / n
  b <- Matrix::crossprod(X_train, y_train) / n
  mse <- rep(NA_real_, length(lambda_grid_eval))
  for (j in seq_along(lambda_grid_eval)) {
    lam <- lambda_grid_eval[j]
    beta <- tryCatch(
      as.vector(stable_solve_spd(A + lam * P, b, jitter = jitter)),
      error = function(e) rep(NA_real_, ncol(X_train))
    )
    if (all(is.finite(beta))) {
      mse[j] <- mean((y_test - as.vector(X_test %*% beta))^2)
    }
  }
  data.frame(lambda = lambda_grid_eval, MSE = mse, stringsAsFactors = FALSE)
}

scan_test_metrics_logistic <- function(lambda_grid_eval, X_train, y_train,
                                       X_test, y_test, P, jitter = 1e-8) {
  auc <- rep(NA_real_, length(lambda_grid_eval))
  logloss <- rep(NA_real_, length(lambda_grid_eval))
  converged <- rep(FALSE, length(lambda_grid_eval))
  iters <- rep(NA_integer_, length(lambda_grid_eval))

  ord <- order(lambda_grid_eval, decreasing = TRUE)
  beta_ws <- NULL
  for (j in ord) {
    lam <- lambda_grid_eval[j]
    fit <- tryCatch(
      irls_penalized_logistic(X_train, y_train, P, lam,
                              beta_init = beta_ws, jitter = jitter),
      error = function(e) list(beta = rep(NA_real_, ncol(X_train)),
                               converged = FALSE, iters = NA_integer_)
    )
    converged[j] <- isTRUE(fit$converged)
    iters[j] <- fit$iters %||% NA_integer_
    if (!all(is.finite(fit$beta))) next
    beta_ws <- fit$beta
    prob <- stats::plogis(pmax(pmin(as.vector(X_test %*% fit$beta), 30), -30))
    auc[j] <- auc_fast(y_test, prob)
    logloss[j] <- binomial_logloss(y_test, prob)
  }

  data.frame(lambda = lambda_grid_eval, AUC = auc, LogLoss = logloss,
             converged = converged, iters = iters, stringsAsFactors = FALSE)
}

# -----------------------------------------------------------------------------
# Cross-validation objectives.
# Linear CV minimizes validation MSE.
# Logistic CV minimizes validation log-loss.
# -----------------------------------------------------------------------------
make_folds <- function(n, n_folds = 5) {
  sample(rep(seq_len(n_folds), length.out = n))
}

cv_linear <- function(lambda_grid_eval, X, y, P, n_folds = 5, jitter = 1e-8) {
  folds <- make_folds(nrow(X), n_folds)
  scores <- rep(NA_real_, length(lambda_grid_eval))

  for (j in seq_along(lambda_grid_eval)) {
    lam <- lambda_grid_eval[j]
    fold_err <- rep(NA_real_, n_folds)
    for (f in seq_len(n_folds)) {
      idx_tr <- which(folds != f)
      idx_val <- which(folds == f)
      Xtr <- X[idx_tr, , drop = FALSE]
      ytr <- y[idx_tr]
      Xval <- X[idx_val, , drop = FALSE]
      yval <- y[idx_val]
      beta <- tryCatch(
        fit_linear_penalized(Xtr, ytr, P, lam, jitter = jitter),
        error = function(e) rep(NA_real_, ncol(X))
      )
      if (all(is.finite(beta))) {
        fold_err[f] <- mean((yval - as.vector(Xval %*% beta))^2)
      }
    }
    scores[j] <- mean(fold_err, na.rm = TRUE)
  }
  scores
}

cv_logistic <- function(lambda_grid_eval, X, y, P, n_folds = 5, jitter = 1e-8) {
  folds <- make_folds(nrow(X), n_folds)
  fold_scores <- matrix(NA_real_, nrow = length(lambda_grid_eval), ncol = n_folds)

  for (f in seq_len(n_folds)) {
    idx_tr <- which(folds != f)
    idx_val <- which(folds == f)
    Xtr <- X[idx_tr, , drop = FALSE]
    ytr <- y[idx_tr]
    Xval <- X[idx_val, , drop = FALSE]
    yval <- y[idx_val]

    ord <- order(lambda_grid_eval, decreasing = TRUE)
    beta_ws <- NULL
    for (j in ord) {
      lam <- lambda_grid_eval[j]
      fit <- tryCatch(
        irls_penalized_logistic(Xtr, ytr, P, lam,
                                beta_init = beta_ws, jitter = jitter),
        error = function(e) list(beta = rep(NA_real_, ncol(X)), converged = FALSE)
      )
      if (!all(is.finite(fit$beta))) next
      beta_ws <- fit$beta
      prob <- stats::plogis(pmax(pmin(as.vector(Xval %*% fit$beta), 30), -30))
      fold_scores[j, f] <- binomial_logloss(yval, prob)
    }
  }

  rowMeans(fold_scores, na.rm = TRUE)
}

7 Single Run Experimental Driver

run_one_replication <- function(scenario, rep_id, type = c("linear", "logistic"),
                                n = 200, p = 50, lambda_grid_eval,
                                n_folds = 5, seed_base = 1,
                                linear_signal_scale = 1,
                                logistic_signal_scale = 0.5,
                                logistic_max_eta_sd = 2.5,
                                linear_noise_sd = 1,
                                block_penalty = "laplacian") {
  type <- match.arg(type)
  signal_scale <- if (type == "linear") linear_signal_scale else logistic_signal_scale
  data_seed <- seed_base + rep_id * 1000 + match(scenario, scenarios) * 100000 +
    ifelse(type == "linear", 0, 5000000)

  dat <- generate_dataset(
    scenario = scenario, n = n, p = p, type = type,
    signal_scale = signal_scale, seed = data_seed,
    linear_noise_sd = linear_noise_sd,
    logistic_max_eta_sd = logistic_max_eta_sd,
    block_penalty = block_penalty
  )

  set.seed(data_seed + 17)
  n_test <- round(0.30 * n)
  idx_test <- sample(seq_len(n), n_test)
  idx_train <- setdiff(seq_len(n), idx_test)

  prep <- preprocess_train_test(
    dat$X[idx_train, , drop = FALSE], dat$X[idx_test, , drop = FALSE],
    dat$y[idx_train], dat$y[idx_test], type = type
  )
  X_train <- prep$X_train
  X_test <- prep$X_test
  y_train <- prep$y_train
  y_test <- prep$y_test

  A_train <- Matrix::crossprod(X_train) / nrow(X_train)
  P <- scale_penalty_to_A(dat$P, A_train)
  nullity_P <- penalty_nullity(P)

  # Test scan: used for oracle and all selected-method evaluations.
  if (type == "linear") {
    test_scan <- scan_test_metrics_linear(lambda_grid_eval, X_train, y_train,
                                          X_test, y_test, P)
    oracle_sel <- select_lambda_grid(test_scan$MSE, lambda_grid_eval,
                                     maximize = FALSE, fallback = 0)
    cv_scores <- cv_linear(lambda_grid_eval, X_train, y_train, P, n_folds = n_folds)
    cv_sel <- select_lambda_grid(cv_scores, lambda_grid_eval,
                                 maximize = FALSE, fallback = 0)
    comm_scan <- compute_commutator_scores_linear(lambda_grid_eval, X_train, P,
                                                  nullity = nullity_P)
    metric_vec <- test_scan$MSE
    metric_name <- "MSE"
  } else {
    test_scan <- scan_test_metrics_logistic(lambda_grid_eval, X_train, y_train,
                                            X_test, y_test, P)
    oracle_sel <- select_lambda_grid(test_scan$AUC, lambda_grid_eval,
                                     maximize = TRUE, fallback = 0)
    cv_scores <- cv_logistic(lambda_grid_eval, X_train, y_train, P, n_folds = n_folds)
    cv_sel <- select_lambda_grid(cv_scores, lambda_grid_eval,
                                 maximize = FALSE, fallback = 0)
    comm_scan <- compute_commutator_scores_logistic(lambda_grid_eval, X_train, y_train, P,
                                                    nullity = nullity_P)
    metric_vec <- test_scan$AUC
    metric_name <- "AUC"
  }

  comm_I_sel <- select_lambda_grid(comm_scan$score_I, lambda_grid_eval,
                                   maximize = TRUE, fallback = 0)
  comm_II_sel <- select_lambda_grid(comm_scan$score_II, lambda_grid_eval,
                                    maximize = TRUE, fallback = 0)

  lambda_unreg <- 0
  lambda_oracle <- oracle_sel$lambda
  lambda_cv <- cv_sel$lambda
  lambda_comm_I <- comm_I_sel$lambda
  lambda_comm_II <- comm_II_sel$lambda

  metric_unreg <- metric_at_lambda(metric_vec, lambda_unreg, lambda_grid_eval)
  metric_oracle <- metric_at_lambda(metric_vec, lambda_oracle, lambda_grid_eval)
  metric_cv <- metric_at_lambda(metric_vec, lambda_cv, lambda_grid_eval)
  metric_comm_I <- metric_at_lambda(metric_vec, lambda_comm_I, lambda_grid_eval)
  metric_comm_II <- metric_at_lambda(metric_vec, lambda_comm_II, lambda_grid_eval)

  method_metrics <- c(metric_unreg, metric_cv, metric_comm_I, metric_comm_II)
  oracle_ok <- if (type == "linear") {
    is.finite(metric_oracle) && metric_oracle <= min(method_metrics, na.rm = TRUE) + 1e-8
  } else {
    is.finite(metric_oracle) && metric_oracle >= max(method_metrics, na.rm = TRUE) - 1e-8
  }

  data.frame(
    Model = type,
    Scenario = scenario,
    Rep = rep_id,
    NTrain = nrow(X_train),
    NTest = nrow(X_test),
    PDim = ncol(X_train),
    PenaltyNullity = nullity_P,
    Lambda_Unreg = lambda_unreg,
    Lambda_Oracle = lambda_oracle,
    Lambda_CV = lambda_cv,
    Lambda_CommI = lambda_comm_I,
    Lambda_CommII = lambda_comm_II,
    Status_Oracle = oracle_sel$status,
    Status_CV = cv_sel$status,
    Status_CommI = comm_I_sel$status,
    Status_CommII = comm_II_sel$status,
    Objective_Oracle = oracle_sel$value,
    Objective_CV = cv_sel$value,
    Score_CommI = comm_I_sel$value,
    Score_CommII = comm_II_sel$value,
    MetricName = metric_name,
    Metric_Unreg = metric_unreg,
    Metric_Oracle = metric_oracle,
    Metric_CV = metric_cv,
    Metric_CommI = metric_comm_I,
    Metric_CommII = metric_comm_II,
    OracleDominanceOK = oracle_ok,
    stringsAsFactors = FALSE
  )
}

run_experiments <- function(type = c("linear", "logistic"), scenarios,
                            n_reps, n, p, lambda_grid_eval, n_folds,
                            seed_base, use_parallel = TRUE) {
  type <- match.arg(type)
  task_grid <- expand.grid(Scenario = scenarios, Rep = seq_len(n_reps),
                           stringsAsFactors = FALSE)

  runner <- function(i) {
    run_one_replication(
      scenario = task_grid$Scenario[i], rep_id = task_grid$Rep[i], type = type,
      n = n, p = p, lambda_grid_eval = lambda_grid_eval, n_folds = n_folds,
      seed_base = seed_base,
      linear_signal_scale = params$linear_signal_scale,
      logistic_signal_scale = params$logistic_signal_scale,
      logistic_max_eta_sd = params$logistic_max_eta_sd,
      linear_noise_sd = params$linear_noise_sd,
      block_penalty = params$block_penalty
    )
  }

  message("Running ", nrow(task_grid), " ", type, " tasks.")
  if (isTRUE(use_parallel) && has_future) {
    res <- future.apply::future_lapply(
      seq_len(nrow(task_grid)), runner,
      future.seed = TRUE,
      future.packages = c("Matrix", "MASS", "stats")
    )
  } else {
    res <- lapply(seq_len(nrow(task_grid)), runner)
  }
  do.call(rbind, res)
}

8 End-to-end Experimental Runs

The default settings run \(100\) replications per scenario for both models. For a quick smoke test, knit with a smaller params$n_reps, for example \(5\) or \(10\).

8.1 Linear Model

linear_df <- run_experiments(
  type = "linear",
  scenarios = scenarios,
  n_reps = params$n_reps,
  n = params$n,
  p = params$p,
  lambda_grid_eval = lambda_grid_eval,
  n_folds = params$n_folds,
  seed_base = params$seed,
  use_parallel = params$use_parallel
)

8.2 Logistic Model

logistic_df <- run_experiments(
  type = "logistic",
  scenarios = scenarios,
  n_reps = params$n_reps,
  n = params$n,
  p = params$p,
  lambda_grid_eval = lambda_grid_eval,
  n_folds = params$n_folds,
  seed_base = params$seed,
  use_parallel = params$use_parallel
)

9 Results analysis

9.1 Basic result-reporting Help Functions

robust_by_scenario <- function(df, raw_cols) {
  pieces <- lapply(split(df, df$Scenario), function(d) {
    out <- list(Scenario = d$Scenario[1], N = nrow(d))
    for (cl in raw_cols) {
      out[[paste0(cl, ".median")]] <- median(d[[cl]], na.rm = TRUE)
      out[[paste0(cl, ".mad")]] <- stats::mad(d[[cl]], constant = 1, na.rm = TRUE)
    }
    as.data.frame(out, check.names = FALSE)
  })
  res <- do.call(rbind, pieces)
  rownames(res) <- NULL
  res
}

check_oracle_dominance <- function(df, type = c("linear", "logistic"), tol = 1e-8) {
  type <- match.arg(type)
  method_cols <- intersect(c("Metric_Unreg", "Metric_CV", "Metric_CommI", "Metric_CommII"), names(df))
  if (!length(method_cols) || !"Metric_Oracle" %in% names(df)) return(integer(0))
  M <- as.matrix(df[, method_cols, drop = FALSE])
  if (type == "linear") {
    best_method <- apply(M, 1, min, na.rm = TRUE)
    which(df$Metric_Oracle > best_method + tol)
  } else {
    best_method <- apply(M, 1, max, na.rm = TRUE)
    which(df$Metric_Oracle < best_method - tol)
  }
}

selection_status_summary <- function(df) {
  status_cols <- intersect(c("Status_Oracle", "Status_CV", "Status_CommI", "Status_CommII"), names(df))
  pieces <- list()
  for (sc in unique(df$Scenario)) {
    d <- df[df$Scenario == sc, , drop = FALSE]
    for (cl in status_cols) {
      method <- sub("^Status_", "", cl)
      tab <- table(factor(d[[cl]], levels = c("ok", "flat_objective", "all_nonfinite")))
      pieces[[paste(sc, method, sep = "_")]] <- data.frame(
        Scenario = sc,
        Method = method,
        OK = unname(tab["ok"]),
        Flat = unname(tab["flat_objective"]),
        AllNonfinite = unname(tab["all_nonfinite"]),
        stringsAsFactors = FALSE
      )
    }
  }
  res <- do.call(rbind, pieces)
  rownames(res) <- NULL
  res
}

analyze_results <- function(df, type = c("linear", "logistic"),
                            extreme_mse_factor = 10) {
  type <- match.arg(type)
  n_before <- nrow(df)
  df <- df[is.finite(df$Metric_Oracle), , drop = FALSE]
  n_after <- nrow(df)
  cat(sprintf("Rows with finite oracle: %d / %d (%.1f%%)\n",
              n_after, n_before, 100 * n_after / max(1, n_before)))
  if (n_after == 0) stop("No finite oracle results.")

  bad_oracle <- check_oracle_dominance(df, type = type)
  cat(sprintf("Oracle-dominance violations: %d\n", length(bad_oracle)))

  if (type == "linear") {
    med_oracle <- median(df$Metric_Oracle, na.rm = TRUE)
    df$Extreme <- df$Metric_Oracle > extreme_mse_factor * med_oracle
  } else {
    df$Extreme <- FALSE
  }
  cat(sprintf("Extreme rows flagged: %d\n", sum(df$Extreme, na.rm = TRUE)))
  df_clean <- df[!df$Extreme, , drop = FALSE]

  metric_cols <- c("Metric_Unreg", "Metric_Oracle", "Metric_CV", "Metric_CommI", "Metric_CommII")
  robust <- robust_by_scenario(df_clean, metric_cols)

  if (type == "linear") {
    df_clean$Rel_Unreg <- df_clean$Metric_Unreg / df_clean$Metric_Oracle
    df_clean$Rel_CV <- df_clean$Metric_CV / df_clean$Metric_Oracle
    df_clean$Rel_CommI <- df_clean$Metric_CommI / df_clean$Metric_Oracle
    df_clean$Rel_CommII <- df_clean$Metric_CommII / df_clean$Metric_Oracle
    relative <- robust_by_scenario(df_clean,
                                   c("Rel_Unreg", "Rel_CV", "Rel_CommI", "Rel_CommII"))
  } else {
    df_clean$Rel_Unreg <- df_clean$Metric_Unreg / df_clean$Metric_Oracle
    df_clean$Rel_CV <- df_clean$Metric_CV / df_clean$Metric_Oracle
    df_clean$Rel_CommI <- df_clean$Metric_CommI / df_clean$Metric_Oracle
    df_clean$Rel_CommII <- df_clean$Metric_CommII / df_clean$Metric_Oracle
    df_clean$Gap_Unreg <- df_clean$Metric_Oracle - df_clean$Metric_Unreg
    df_clean$Gap_CV <- df_clean$Metric_Oracle - df_clean$Metric_CV
    df_clean$Gap_CommI <- df_clean$Metric_Oracle - df_clean$Metric_CommI
    df_clean$Gap_CommII <- df_clean$Metric_Oracle - df_clean$Metric_CommII
    relative <- robust_by_scenario(
      df_clean,
      c("Rel_Unreg", "Rel_CV", "Rel_CommI", "Rel_CommII",
        "Gap_Unreg", "Gap_CV", "Gap_CommI", "Gap_CommII")
    )
  }

  lambda_cols <- c("Lambda_Unreg", "Lambda_Oracle", "Lambda_CV", "Lambda_CommI", "Lambda_CommII")
  lambda_summary <- robust_by_scenario(df_clean, lambda_cols)

  tests <- list()
  for (sc in unique(df_clean$Scenario)) {
    idx <- which(df_clean$Scenario == sc)
    for (comm in c("CommI", "CommII")) {
      mcol <- paste0("Metric_", comm)
      diff <- if (type == "linear") {
        log(pmax(df_clean[[mcol]][idx], 1e-12)) -
          log(pmax(df_clean$Metric_CV[idx], 1e-12))
      } else {
        df_clean[[mcol]][idx] - df_clean$Metric_CV[idx]
      }
      ok <- is.finite(diff)
      if (sum(ok) >= 2 && stats::sd(diff[ok]) > 0) {
        tt <- stats::t.test(diff[ok])
        tests[[paste(sc, comm, sep = "_")]] <- data.frame(
          Scenario = sc, Method = comm,
          t = unname(tt$statistic), p = tt$p.value, n = sum(ok),
          stringsAsFactors = FALSE
        )
      } else {
        tests[[paste(sc, comm, sep = "_")]] <- data.frame(
          Scenario = sc, Method = comm,
          t = NA_real_, p = NA_real_, n = sum(ok),
          stringsAsFactors = FALSE
        )
      }
    }
  }
  test_df <- do.call(rbind, tests)
  rownames(test_df) <- NULL

  list(
    robust = robust,
    relative = relative,
    lambda = lambda_summary,
    tests = test_df,
    status = selection_status_summary(df_clean),
    df_full = df,
    df_clean = df_clean,
    n_extreme = sum(df$Extreme, na.rm = TRUE),
    oracle_violations = bad_oracle
  )
}

fmt_med_mad <- function(med, mad, digits = 3) {
  sprintf(paste0("%.", digits, "f (%.", digits, "f)"), med, mad)
}

metric_table <- function(analysis, digits = 3) {
  rob <- analysis$robust
  data.frame(
    Scenario = rob$Scenario,
    N = rob$N,
    Unregularized = fmt_med_mad(rob$Metric_Unreg.median, rob$Metric_Unreg.mad, digits),
    Oracle = fmt_med_mad(rob$Metric_Oracle.median, rob$Metric_Oracle.mad, digits),
    CV = fmt_med_mad(rob$Metric_CV.median, rob$Metric_CV.mad, digits),
    CommI = fmt_med_mad(rob$Metric_CommI.median, rob$Metric_CommI.mad, digits),
    CommII = fmt_med_mad(rob$Metric_CommII.median, rob$Metric_CommII.mad, digits),
    check.names = FALSE
  )
}

relative_table <- function(analysis, type = c("linear", "logistic"), digits = 3) {
  type <- match.arg(type)
  rel <- analysis$relative
  if (type == "linear") {
    data.frame(
      Scenario = rel$Scenario,
      N = rel$N,
      Unregularized = fmt_med_mad(rel$Rel_Unreg.median, rel$Rel_Unreg.mad, digits),
      CV = fmt_med_mad(rel$Rel_CV.median, rel$Rel_CV.mad, digits),
      CommI = fmt_med_mad(rel$Rel_CommI.median, rel$Rel_CommI.mad, digits),
      CommII = fmt_med_mad(rel$Rel_CommII.median, rel$Rel_CommII.mad, digits),
      check.names = FALSE
    )
  } else {
    data.frame(
      Scenario = rel$Scenario,
      N = rel$N,
      `AUC ratio: Unreg` = fmt_med_mad(rel$Rel_Unreg.median, rel$Rel_Unreg.mad, digits),
      `AUC ratio: CV` = fmt_med_mad(rel$Rel_CV.median, rel$Rel_CV.mad, digits),
      `AUC ratio: CommI` = fmt_med_mad(rel$Rel_CommI.median, rel$Rel_CommI.mad, digits),
      `AUC ratio: CommII` = fmt_med_mad(rel$Rel_CommII.median, rel$Rel_CommII.mad, digits),
      `AUC gap: Unreg` = fmt_med_mad(rel$Gap_Unreg.median, rel$Gap_Unreg.mad, digits),
      `AUC gap: CV` = fmt_med_mad(rel$Gap_CV.median, rel$Gap_CV.mad, digits),
      `AUC gap: CommI` = fmt_med_mad(rel$Gap_CommI.median, rel$Gap_CommI.mad, digits),
      `AUC gap: CommII` = fmt_med_mad(rel$Gap_CommII.median, rel$Gap_CommII.mad, digits),
      check.names = FALSE
    )
  }
}

latex_ready_table <- function(x, caption, label) {
  knitr::kable(x, format = "latex", booktabs = TRUE, caption = caption,
               label = label, escape = FALSE)
}

9.2 Linear regression results

linear_analysis <- analyze_results(linear_df, type = "linear")

Rows with finite oracle: 400 / 400 (100.0%) Oracle-dominance violations: 0 Extreme rows flagged: 0

cat("\n\n### Linear model: test MSE, median (MAD)\n\n")

9.2.1 Linear model: test MSE, median (MAD)

print(knitr::kable(metric_table(linear_analysis), align = "c"))
Scenario N Unregularized Oracle CV CommI CommII
block 100 1.521 (0.190) 1.431 (0.173) 1.470 (0.176) 1.745 (0.250) 6.593 (0.771)
grouped 100 1.568 (0.175) 1.389 (0.153) 1.421 (0.159) 1.425 (0.158) 1.584 (0.223)
sparse 100 1.545 (0.214) 1.513 (0.226) 1.532 (0.222) 1.545 (0.214) 1.545 (0.214)
toeplitz 100 1.500 (0.206) 1.394 (0.156) 1.429 (0.192) 1.419 (0.173) 2.355 (0.357)
cat("\n\n### Linear model: MSE ratio to oracle, median (MAD)\n\n")

9.2.2 Linear model: MSE ratio to oracle, median (MAD)

cat("Values near 1 are oracle-level; values larger than 1 are worse than oracle.\n\n")

Values near 1 are oracle-level; values larger than 1 are worse than oracle.

print(knitr::kable(relative_table(linear_analysis, type = "linear"), align = "c"))
Scenario N Unregularized CV CommI CommII
block 100 1.045 (0.038) 1.008 (0.008) 1.158 (0.094) 4.492 (0.851)
grouped 100 1.151 (0.091) 1.015 (0.015) 1.024 (0.023) 1.140 (0.090)
sparse 100 1.008 (0.008) 1.017 (0.016) 1.008 (0.008) 1.008 (0.008)
toeplitz 100 1.063 (0.053) 1.018 (0.018) 1.012 (0.011) 1.650 (0.241)
cat("\n\n### Linear model: selection status counts\n\n")

9.2.3 Linear model: selection status counts

print(knitr::kable(linear_analysis$status, align = "c"))
Scenario Method OK Flat AllNonfinite
toeplitz Oracle 100 0 0
toeplitz CV 100 0 0
toeplitz CommI 100 0 0
toeplitz CommII 100 0 0
block Oracle 100 0 0
block CV 100 0 0
block CommI 100 0 0
block CommII 100 0 0
sparse Oracle 100 0 0
sparse CV 100 0 0
sparse CommI 0 100 0
sparse CommII 0 100 0
grouped Oracle 100 0 0
grouped CV 100 0 0
grouped CommI 100 0 0
grouped CommII 100 0 0
cat("\n\n### Linear model: paired tests, commutator vs CV\n\n")

9.2.4 Linear model: paired tests, commutator vs CV

cat("Positive t-statistics mean the commutator has larger log-MSE than CV; negative values favor the commutator.\n\n")

Positive t-statistics mean the commutator has larger log-MSE than CV; negative values favor the commutator.

print(knitr::kable(linear_analysis$tests, digits = 4, align = "c"))
Scenario Method t p n
toeplitz CommI -4.7863 0.0000 100
toeplitz CommII 29.7105 0.0000 100
block CommI 10.8873 0.0000 100
block CommII 56.5521 0.0000 100
sparse CommI 0.4357 0.6640 100
sparse CommII 0.4357 0.6640 100
grouped CommI 1.1651 0.2468 100
grouped CommII 10.5282 0.0000 100

9.3 Logistic regression results

logistic_analysis <- analyze_results(logistic_df, type = "logistic")

Rows with finite oracle: 400 / 400 (100.0%) Oracle-dominance violations: 0 Extreme rows flagged: 0

cat("\n\n### Logistic model: test AUC, median (MAD)\n\n")

9.3.1 Logistic model: test AUC, median (MAD)

print(knitr::kable(metric_table(logistic_analysis), align = "c"))
Scenario N Unregularized Oracle CV CommI CommII
block 100 0.720 (0.039) 0.888 (0.029) 0.878 (0.030) 0.834 (0.033) 0.875 (0.032)
grouped 100 0.710 (0.043) 0.880 (0.030) 0.873 (0.034) 0.834 (0.036) 0.869 (0.032)
sparse 100 0.718 (0.055) 0.815 (0.042) 0.806 (0.041) 0.718 (0.055) 0.718 (0.055)
toeplitz 100 0.735 (0.045) 0.881 (0.026) 0.870 (0.027) 0.812 (0.040) 0.856 (0.031)
cat("\n\n### Logistic model: AUC ratio and AUC gap to oracle, median (MAD)\n\n")

9.3.2 Logistic model: AUC ratio and AUC gap to oracle, median (MAD)

cat("AUC ratios near 1 and gaps near 0 are oracle-level. AUC ratios below 1 are worse than oracle.\n\n")

AUC ratios near 1 and gaps near 0 are oracle-level. AUC ratios below 1 are worse than oracle.

print(knitr::kable(relative_table(logistic_analysis, type = "logistic"), align = "c"))
Scenario N AUC ratio: Unreg AUC ratio: CV AUC ratio: CommI AUC ratio: CommII AUC gap: Unreg AUC gap: CV AUC gap: CommI AUC gap: CommII
block 100 0.822 (0.048) 0.993 (0.005) 0.955 (0.028) 0.992 (0.006) 0.156 (0.044) 0.006 (0.004) 0.041 (0.025) 0.007 (0.005)
grouped 100 0.812 (0.048) 0.996 (0.004) 0.952 (0.022) 0.988 (0.007) 0.169 (0.042) 0.003 (0.003) 0.044 (0.018) 0.010 (0.006)
sparse 100 0.899 (0.043) 0.987 (0.009) 0.899 (0.043) 0.899 (0.043) 0.083 (0.035) 0.011 (0.007) 0.083 (0.035) 0.083 (0.035)
toeplitz 100 0.833 (0.053) 0.990 (0.006) 0.934 (0.034) 0.974 (0.016) 0.145 (0.045) 0.009 (0.006) 0.057 (0.027) 0.022 (0.014)
cat("\n\n### Logistic model: selection status counts\n\n")

9.3.3 Logistic model: selection status counts

print(knitr::kable(logistic_analysis$status, align = "c"))
Scenario Method OK Flat AllNonfinite
toeplitz Oracle 100 0 0
toeplitz CV 100 0 0
toeplitz CommI 100 0 0
toeplitz CommII 100 0 0
block Oracle 100 0 0
block CV 100 0 0
block CommI 100 0 0
block CommII 100 0 0
sparse Oracle 100 0 0
sparse CV 100 0 0
sparse CommI 0 100 0
sparse CommII 0 100 0
grouped Oracle 100 0 0
grouped CV 100 0 0
grouped CommI 100 0 0
grouped CommII 100 0 0
cat("\n\n### Logistic model: paired tests, commutator vs CV\n\n")

9.3.4 Logistic model: paired tests, commutator vs CV

cat("Positive t-statistics mean the commutator has higher AUC than CV; negative values favor CV.\n\n")

Positive t-statistics mean the commutator has higher AUC than CV; negative values favor CV.

print(knitr::kable(logistic_analysis$tests, digits = 4, align = "c"))
Scenario Method t p n
toeplitz CommI -12.0405 0.0000 100
toeplitz CommII -6.0473 0.0000 100
block CommI -10.0030 0.0000 100
block CommII -0.2519 0.8016 100
sparse CommI -14.6176 0.0000 100
sparse CommII -14.6176 0.0000 100
grouped CommI -11.3909 0.0000 100
grouped CommII -4.8783 0.0000 100

10 Diagnostics and sanity checks

if (exists("linear_analysis")) {
  cat("\n\n## Linear diagnostics\n\n")
  cat("Oracle dominance violations:", length(linear_analysis$oracle_violations), "\n\n")
  if (length(linear_analysis$oracle_violations) > 0) {
    print(head(linear_analysis$df_full[linear_analysis$oracle_violations, ], 10))
  }
  flat_sparse <- subset(linear_analysis$status, Scenario == "sparse" & grepl("Comm", Method))
  cat("Sparse P=I negative-control commutator statuses:\n\n")
  print(knitr::kable(flat_sparse, align = "c"))
}

10.1 Linear diagnostics

Oracle dominance violations: 0

Sparse P=I negative-control commutator statuses:

Scenario Method OK Flat AllNonfinite
11 sparse CommI 0 100 0
12 sparse CommII 0 100 0
if (exists("logistic_analysis")) {
  cat("\n\n## Logistic diagnostics\n\n")
  cat("Oracle dominance violations:", length(logistic_analysis$oracle_violations), "\n\n")
  if (length(logistic_analysis$oracle_violations) > 0) {
    print(head(logistic_analysis$df_full[logistic_analysis$oracle_violations, ], 10))
  }
  flat_sparse <- subset(logistic_analysis$status, Scenario == "sparse" & grepl("Comm", Method))
  cat("Sparse P=I negative-control commutator statuses:\n\n")
  print(knitr::kable(flat_sparse, align = "c"))
}

10.2 Logistic diagnostics

Oracle dominance violations: 0

Sparse P=I negative-control commutator statuses:

Scenario Method OK Flat AllNonfinite
11 sparse CommI 0 100 0
12 sparse CommII 0 100 0

Interpretation checks:

  • Oracle dominance violations should be zero or explainable by numerical non-convergence. If violations appear, do not use the summary tables for the paper until the offending rows are inspected.
  • In the sparse scenario, \(P=I\). The Type I and Type II commutator objectives should be flat or numerically zero. Flat selection statuses in this scenario are expected and indicate that the negative control is working.
  • For rank-deficient penalties, the nullity-adjusted balance should prevent the large-\(\lambda\) endpoint from being treated as \(\operatorname{df}=0\) when the true limit is \(\dim\ker(P)>0\).

11 Optional plots

if (has_ggplot2) {
  metric_long <- function(df) {
    metric_cols <- c("Metric_Unreg", "Metric_Oracle", "Metric_CV", "Metric_CommI", "Metric_CommII")
    M <- as.matrix(df[, metric_cols, drop = FALSE])
    data.frame(
      Model = rep(df$Model, each = length(metric_cols)),
      Scenario = rep(df$Scenario, each = length(metric_cols)),
      Method = rep(c("Unreg", "Oracle", "CV", "CommI", "CommII"), times = nrow(df)),
      Metric = as.vector(t(M)),
      stringsAsFactors = FALSE
    )
  }

  # Optional: Define a consistent, colorblind-friendly palette
  method_palette <- c(
    "Unreg" = "#999999",    # Gray
    "Oracle" = "#0072B2",   # Blue
    "CV" = "#009E73",       # Green
    "CommI" = "#D55E00",    # Orange
    "CommII" = "#CC79A7"    # Pink
  )

  if (exists("linear_analysis")) {
    p_linear <- ggplot2::ggplot(metric_long(linear_analysis$df_clean),
                                ggplot2::aes(x = Method, y = Metric, fill = Method)) +
      ggplot2::geom_boxplot(outlier.alpha = 0.25, outlier.size = 0.5) +
      ggplot2::scale_fill_manual(values = method_palette, guide = "none") +  # Hide legend (redundant with x-axis)
      ggplot2::facet_wrap(~ Scenario, scales = "free_y") +
      ggplot2::labs(title = "Linear model test MSE", y = "MSE", x = NULL) +
      ggplot2::theme_bw() +
      ggplot2::theme(
        strip.background = ggplot2::element_rect(fill = "gray90"),
        panel.grid.minor = ggplot2::element_blank()
      )
    print(p_linear)
  }

  if (exists("logistic_analysis")) {
    p_logistic <- ggplot2::ggplot(metric_long(logistic_analysis$df_clean),
                                  ggplot2::aes(x = Method, y = Metric, fill = Method)) +
      ggplot2::geom_boxplot(outlier.alpha = 0.25, outlier.size = 0.5) +
      ggplot2::scale_fill_manual(values = method_palette, guide = "none") +
      ggplot2::facet_wrap(~ Scenario, scales = "free_y") +
      ggplot2::labs(title = "Logistic model test AUC", y = "AUC", x = NULL) +
      ggplot2::theme_bw() +
      ggplot2::theme(
        strip.background = ggplot2::element_rect(fill = "gray90"),
        panel.grid.minor = ggplot2::element_blank()
      )
    print(p_logistic)
  }
} else {
  cat("Package ggplot2 is not installed; skipping plots.\n")
}

12 Export results

if (isTRUE(params$save_results)) {
  dir.create(params$output_dir, showWarnings = FALSE, recursive = TRUE)
  prefix <- file.path(params$output_dir, params$output_prefix)

  export_obj <- list(
    params = params,
    lambda_grid_eval = lambda_grid_eval,
    linear_df = if (exists("linear_df")) linear_df else NULL,
    logistic_df = if (exists("logistic_df")) logistic_df else NULL,
    linear_analysis = if (exists("linear_analysis")) linear_analysis else NULL,
    logistic_analysis = if (exists("logistic_analysis")) logistic_analysis else NULL
  )
  saveRDS(export_obj, paste0(prefix, ".rds"))

  if (exists("linear_df")) write.csv(linear_df, paste0(prefix, "_linear_raw.csv"), row.names = FALSE)
  if (exists("logistic_df")) write.csv(logistic_df, paste0(prefix, "_logistic_raw.csv"), row.names = FALSE)

  tex_lines <- character(0)
  if (exists("linear_analysis")) {
    tex_lines <- c(
      tex_lines,
      "% Linear model tables",
      capture.output(latex_ready_table(
        metric_table(linear_analysis),
        caption = "Linear-model test MSE by method, reported as median (MAD).",
        label = "tab:linear_metric_revised"
      )),
      "",
      capture.output(latex_ready_table(
        relative_table(linear_analysis, type = "linear"),
        caption = "Linear-model MSE ratio to oracle, reported as median (MAD). Values above 1 are worse than oracle.",
        label = "tab:linear_relative_revised"
      )),
      ""
    )
  }
  if (exists("logistic_analysis")) {
    tex_lines <- c(
      tex_lines,
      "% Logistic model tables",
      capture.output(latex_ready_table(
        metric_table(logistic_analysis),
        caption = "Logistic-model test AUC by method, reported as median (MAD).",
        label = "tab:logistic_metric_revised"
      )),
      "",
      capture.output(latex_ready_table(
        relative_table(logistic_analysis, type = "logistic"),
        caption = "Logistic-model AUC ratio and AUC gap to oracle, reported as median (MAD). Ratios near 1 and gaps near 0 are best.",
        label = "tab:logistic_relative_revised"
      )),
      ""
    )
  }
  writeLines(tex_lines, paste0(prefix, "_latex_tables.tex"))

  cat("Saved results to: ", normalizePath(params$output_dir), "\n\n", sep = "")
  cat("Files written with prefix: ", params$output_prefix, "\n", sep = "")
}

Saved results to: D:.dir_Books\2025_BiasVariance_2025_New_Commutator_2026_V2_Commutator_results

Files written with prefix: commutator_validation_revised

13 Result interpretation

The results above provide empirical validation of the commutator framework as a geometric diagnostic and low-cost tuning heuristic, complementing classical cross-validation selection.

The results pass the main internal checks: both linear and logistic experiments have \(400 / 400\) finite oracle rows, zero oracle-dominance violations, and zero extreme rows. The sparse (P=I) case also behaves exactly as a negative control: both Type I and Type II commutator selectors are flat in all 100 sparse replications, which is the expected outcome when (A) and (P) commute.

The simulation results are scientifically coherent. For linear regression, CV is consistently close to the oracle. Type I is competitive or better than CV in Toeplitz, statistically indistinguishable from CV in Grouped and Sparse, and worse in Block. Type II is unstable in the rank-deficient structured-penalty scenarios, especially Block and Toeplitz, while the Sparse equality is not a success but the expected flat (\(P=I\)) negative-control behavior.

For logistic regression, CV remains closest to the oracle overall. Type II is very close to CV in Block and reasonably close in Grouped/Toeplitz, but Type I is systematically worse than CV. In Sparse, both commutator methods collapse to the unregularized baseline because the commutator objective is flat for (\(P=I\)). This is theoretically correct and reflects a negative control setting, rather that a tuning failure.

The following decision rules support meaningful interpretation of the empirical results.

  1. The oracle-dominance count for each model. If it is nonzero, inspect those rows before using any performance tables.

  2. The sparse ridge scenario serves as a negative control. If the commutator selectors return flat_objective, that is expected because \(P=I\) commutes with every curvature matrix \(A\).

  3. For linear models, compare methods using MSE ratios to oracle. A ratio close to \(1\) means oracle-level performance, while values \(> 1\) indicate excess test error.

  4. For logistic models, compare methods using both AUC ratios and AUC gaps to oracle. Ratios close to \(1\) and gaps close to \(0\) indicate oracle-level performance.

  5. The paired tests are only as secondary evidence. Practical effect sizes in the median/MAD tables should drive the interpretation.

A conservative conclusion compatible with this validation design is:

The pipeline simulation results support Type I and Type II commutator scores as geometry-aware diagnostics for regularization tuning when the curvature and penalty operators do not commute. In commuting cases, such as spherical ridge penalties, the commutator score is correctly uninformative. Rank-deficient structured penalties require a nullity-adjusted balance factor, and Type II orderings should be interpreted with conditioning diagnostics. Therefore, the commutator selector is best viewed as a computationally-light complement to cross-validation rather than a universal replacement.

SOCR Resource Visitor number Web Analytics SOCR Email