This DSPA Appendix presents the foundations of causal inference in data science.
In previous DSPA chapters, we introduced many alternative model-based and model-free methods for supervised and unsupervised regression, classification, clustering, and forecasting.
In model-based supervised problems, linear regression and correlation analyses play important roles. However, establishing a correlation between a pair of features does not necessarily indicate that changes in one variable drive the behavior of the other, i.e., causation is not guaranteed by correlation.
Let’s look at two simple examples.
Example 1: In children, shoe size is strongly correlated with outcomes on math tests; however, an increase in either variable does not cause the rise (or fall) of the other. Shoe size and cognition move synchronously up and down, driven by age and development, without a direct causal effect of one on the other.
Example 2: Let’s show a simulation of an explicit causal relation that is undetectable by correlation analysis.
library(plotly)   # needed for the interactive scatter plot below

n <- 1024
t <- seq(from=1, to=n, by=1)
x1 <- t; x2 <- t;
for (i in 2:n) {
x1[i] <- 0.5*x1[i-1] + rnorm(1, mean = 0, sd = 1)
x2[i] <- 0.4*x1[i-1]*x1[i-1] + rnorm(1, mean = 0, sd = 1)
}
# ordered grid of x1 values for plotting the underlying quadratic model x2 = (2/5) x1^2
x1_ord <- seq(from=min(x1), to=max(x1), length.out=n)
x2_ord <- 0.4*x1_ord*x1_ord
# plot(x1,x2)
fig <- plot_ly(x=~x1, y=~x2, type="scatter",
name = 'data', mode='markers')
fig <- fig %>% add_trace(
x=~x1_ord, y = ~x2_ord, name = 'model',
mode='lines', line = list(color = 'red',
width = 2, dash = 'dash'))
fig

The graph does not illustrate a clear \(x_1\)-to-\(x_2\) relationship, either correlational or causal. At the same time, the apparent stochasticity of the system hides the simple (quadratic) causal relationship defined by:
\[x_1(n) = \frac{1}{2}x_1(n-1) + \epsilon_1; \ x_1(1)=1, \ 1\leq n \leq 1024,\]
\[x_2(n) = \frac{2}{5}x_1^2(n-1) + \epsilon_2; \ \epsilon_1,\epsilon_2 \sim N(0,1).\]
Using any (multivariate) observational data, we will showcase how to use computational statistics, data science, information theory, and causal inference to uncover such hidden causal relationships.
The study of causality has been shaped by several foundational frameworks. Granger and Wiener developed a predictive notion of causality for time series, while Pearl developed an interventional framework based on structural models and graphs. A process \(X\) is (Granger-)causally related to another process \(Y\), when predicting prospective \(Y\) realizations solely based on past \(Y\) records can be improved based on the past information from the first process \(X\) along with past knowledge about \(Y\).
It is important to distinguish the two major paradigms of causal reasoning explored in modern statistics:
Predictive (Granger-Wiener) Causality: \(X\) causes \(Y\) if past values of \(X\) help predict future values of \(Y\) above and beyond past values of \(Y\) itself. This is an observational criterion — it relies on statistical associations in the data and tests whether one time series carries incremental information about another. It does not require a mechanistic model of how \(X\) influences \(Y\).
Interventional (Pearl) Causality: Judea Pearl’s framework is built on Structural Causal Models (SCMs), Directed Acyclic Graphs (DAGs), and the do-calculus. Here, \(X\) causes \(Y\) if intervening on \(X\) (setting \(X = x\) by external manipulation) changes the distribution of \(Y\). This is captured by the do-operator: \(P(Y \mid do(X = x))\), which represents the distribution of \(Y\) when \(X\) is forcibly set to \(x\). This is fundamentally different from the observational conditional \(P(Y \mid X = x)\), which may be distorted by confounders.
The critical distinction is that Granger causality asks “Does \(X\) help predict \(Y\)?”, whereas Pearl’s framework asks “What happens to \(Y\) if we manipulate \(X\)?” These are different questions, and they can yield different answers — especially when unmeasured confounders are present. In this module, we focus primarily on Granger and transfer-entropy methods, but students should be aware that these detect predictive causal relationships, not interventional ones.
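To make this distinction concrete, the following minimal simulation sketch (all variable names are illustrative, not from the original text) features a hidden confounder W that drives both X and Y, while X has no causal effect on Y. The observational regression of Y on X finds a strong association, but emulating the intervention do(X = x) by assigning X at random breaks the W → X link, and the association vanishes.

set.seed(42)
N <- 5000
W <- rnorm(N)                     # hidden confounder
X_obs <- W + rnorm(N, sd = 0.5)   # X is driven by W
Y_obs <- W + rnorm(N, sd = 0.5)   # Y is driven by W; X has NO causal effect on Y
# Observational conditional P(Y | X = x): a strong (spurious) association
coef(lm(Y_obs ~ X_obs))["X_obs"]  # approx. 0.8, far from the true causal effect of 0
# Emulate do(X = x): assign X by external randomization, independent of W
X_do <- rnorm(N)
Y_do <- W + rnorm(N, sd = 0.5)    # Y's structural mechanism is unchanged
coef(lm(Y_do ~ X_do))["X_do"]     # approx. 0: intervening on X does not move Y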
Granger causality tests whether past values of \(X\) improve the prediction of \(Y\). However, this does not guarantee a direct mechanistic causal link from \(X\) to \(Y\). A critical threat is the unmeasured confounder: if an unobserved variable \(W\) drives both \(X\) and \(Y\), but \(X\) reacts faster than \(Y\), then \(X\) will appear to “Granger-cause” \(Y\) even though there is no direct causal link between them.
For example, suppose a central bank policy change \(W\) simultaneously affects interest rates \(X\) and stock prices \(Y\). If interest rates adjust within a day but stock prices take a week to fully respond, past interest rates will help predict future stock prices — yielding a spurious Granger-causal finding. Transfer entropy, while model-free and sensitive to nonlinear relationships, is equally susceptible to this confounding issue because it too operates on observational data. Only methods that explicitly model the confounders — such as Pearl’s do-calculus with appropriate DAGs — can properly address this problem. When interpreting Granger-causality or transfer entropy results, always consider whether an unmeasured common driver might explain the observed predictive relationship.
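The following minimal sketch (assuming the lmtest package is installed) reproduces this confounding mechanism: a hidden autoregressive driver W moves X immediately but Y only after a three-step delay, so X spuriously "Granger-causes" Y even though it has no causal effect on it.

library(lmtest)   # provides grangertest()
set.seed(99)
N <- 2000
W <- as.numeric(arima.sim(model = list(ar = 0.9), n = N))    # hidden common driver
X <- W + rnorm(N, sd = 0.3)                          # X reacts to W instantly
Y <- c(rep(0, 3), W[1:(N-3)]) + rnorm(N, sd = 0.3)   # Y reacts to W with a 3-step delay
# The F-test rejects the null "X does not Granger-cause Y" (tiny p-value),
# even though there is no direct causal X -> Y link
grangertest(Y ~ X, order = 3)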
Assume we have three time-varying processes indexed by time \(t\geq 0\): a putative cause \(X\), a target \(Y\), and side information \(Z\).
The complete information up to time \(t\) for a random process will be denoted by \(X^t=\cup_{s<t}{ \{X_s\}}\). Also, for time \(t+1\), let’s denote the prediction error by:
\[e_{t+1}= \underbrace{{Y}_{t+1}}_{actual}- \underbrace{\hat{Y}_{t+1}}_{predicted},\]
\[\underbrace{R(Y^{t+1}| Y^t,Z^t)}_{given\ only\ Y^t\ and\ Z^t} =E(g(e_{t+1}))=E(g({Y}_{t+1}-\hat{Y}_{t+1}))=E(g({Y}_{t+1}-f(Y^t,Z^t))),\] \[\underbrace{R(Y^{t+1}| X^t,Y^t,Z^t)}_{given\ also\ X^t} =E(g(e_{t+1}))=E(g({Y}_{t+1}-\hat{Y}_{t+1}))=E(g({Y}_{t+1}-f(X^t,Y^t,Z^t))).\]
As with other model-based techniques, e.g., linear models, the estimate \(\hat{Y}_{t+1}=f(X^t,Y^t,Z^t)\) is computed by estimating (fitting) the function \(f\) that optimizes (minimizes) the expected loss, \(R(Y^{t+1}| Y^t,Z^t)\) or \(R(Y^{t+1}| X^t,Y^t,Z^t)\).
Granger-causality: The process \(X\) is not Granger-causal to the process \(Y\), relative to the side information \(Z\), if and only if: \[\underbrace{R(Y^{t+1}| Y^t,Z^t)}_{independent\ of\ X^t} = R(Y^{t+1}| X^t,Y^t,Z^t).\]
For example, a linear vector-autoregressive model function \(f\) with \(l\) lags may look like:
\[\hat{Y}(t)=\underbrace{f(Y^t)}_{independent\ of\ X^t}=\beta_0 + \sum_{s=1}^l{\beta_s Y(t-s)}+\epsilon_t, \ \ \ \ (1)\] \[\tilde{Y}(t)=\underbrace{f(X^t,Y^t)}_{dependent\ on\ X^t}=\tilde{\beta}_0 + \sum_{s=1}^l{\tilde{\beta}_s Y(t-s)}+ \sum_{s=1}^l{\tilde{\gamma}_s X(t-s)} + \tilde{\epsilon}_t. \ \ \ \ (2)\]
Similarly, one can choose alternative models to estimate the expected prediction errors based on deep learning, neural networks, support vector machines, random forest, or any other supervised technique to fit the model.
In each situation, the variable \(X\) does not Granger-cause \(Y\) if and only if the expected prediction errors of the restricted model \(\hat{Y}(t)\) and the unrestricted model \(\tilde{Y}(t)\) are equal. In other words, the two models are statistically indistinguishable, and any differences in prediction are due only to random factors (noise).
One parametric approach to testing Granger causality is a one-way analysis of variance (one-way ANOVA) comparing the residuals of the restricted and unrestricted models, i.e., assessing the statistical difference between the two sets of residuals. When performing multiple such tests, e.g., when contrasting several lags \(l\), one needs to adjust the resulting p-values for multiple hypothesis testing using false discovery rate (FDR), family-wise error rate (FWER), or Bonferroni corrections.
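As a sketch of this workflow, the grangertest() function from the lmtest package implements the restricted-versus-unrestricted F-test, and the base-R p.adjust() applies an FDR correction across the candidate lags (the simulated series below are illustrative):

library(lmtest)
set.seed(7)
n <- 1000
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
y <- 0.4*c(0, x[1:(n-1)]) + rnorm(n)   # y depends on x lagged by one step
lags <- 1:5
# F-test p-values for H0: "x does not Granger-cause y", one per candidate lag
p_vals <- sapply(lags, function(l) grangertest(y ~ x, order = l)$`Pr(>F)`[2])
# Benjamini-Hochberg (FDR) adjustment across the multiple lag tests
data.frame(lag = lags, p = p_vals, p_fdr = p.adjust(p_vals, method = "fdr"))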
We can also formulate a more general probabilistic definition of causality using conditional probabilities, \(p(\cdot\ |\ \cdot)\). This approach avoids the need to specify explicit prediction-function models in advance.
Recall that \(Y_{t+1}\) and \(X^t\) are (statistically) independent given the past information \((X^t,Y^t)\) if and only if \[p(Y_{t+1}|X^t,Y^t, Z^t)=p(Y_{t+1}|Y^t, Z^t).\] In other words, knowing or ignoring past information, \(X^t\), about the process \(X\) does not affect the probability distribution of the future outcome, \(Y_{t+1}\).
Probability-based Granger causality: The process \(X\) does not Granger-cause the process \(Y\), relative to some side information \(Z\), if and only if \(Y_{t+1} \perp X^t\ |\ Y^t,Z^t\). That is, given the past information about \(Y^t\) and \(Z^t\), \(Y_{t+1}\) is independent of (orthogonal to) \(X^t\).
Probabilistic Granger causality does not require an explicit function model coupling the two processes \(X\) and \(Y\). However, it still requires a strategy to compute the conditional dependence of \(Y_{t+1}\) on the other previously observed variables, \(p(Y_{t+1}|X^t,Y^t, Z^t)\).
Now, we will connect Granger causality with information theory.
Reviewing some of the information theory details in the DSPA Information Theory and Statistical Learning Appendix may be useful before proceeding further in this section.
For a bivariate process \(W=(X,Y)\), let’s denote the pair of marginal and the joint distribution functions by \(p_X(x)\), \(p_Y(y)\), and \(p_{X,Y}(x,y)\). In the discrete and continuous cases, the joint (Shannon/Differential) entropy between the univariate processes \(X\) and \(Y\) is defined by:
\[\underbrace{H(X,Y)}_{Shannon\ entropy}=-\sum_{x\in X}{\sum_{y\in Y}{p_{X,Y}(x,y)\log ( p_{X,Y}(x,y)) }},\]
\[\underbrace{h(X,Y)}_{Differential\ entropy}=-\int_{x\in \Omega_X}{\int_{y\in \Omega_Y}{p_{X,Y}(x,y)\log ( p_{X,Y}(x,y)) }}\,dx\,dy.\]
The remaining uncertainty of \(Y\) given knowledge of \(X\) is captured by the conditional entropy:
\[H(Y|X) = H(X,Y) - H(X),\]
where the entropy of the single variable \(X\) is:
\[H(X)=-\sum_{x\in X}{p_{X}(x)\log ( p_{X}(x)) }.\]
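For discrete (or discretized) data, these quantities can be estimated directly from empirical frequencies. The short base-R sketch below (with illustrative variable names) verifies the chain-rule identity \(H(Y|X) = H(X,Y) - H(X)\) on a binned bivariate Gaussian sample:

set.seed(1)
x <- rnorm(2000); y <- 0.7*x + rnorm(2000, sd = 0.5)
xb <- cut(x, breaks = 4); yb <- cut(y, breaks = 4)   # discretize into 4 bins each
shannonH <- function(p) { p <- p[p > 0]; -sum(p*log(p)) }
p_xy <- table(xb, yb)/length(xb)   # empirical joint distribution
p_x <- rowSums(p_xy)               # empirical marginal of X
H_xy <- shannonH(as.vector(p_xy))
H_x <- shannonH(p_x)
H_y_given_x <- H_xy - H_x          # conditional entropy via the chain rule
c(H_XY = H_xy, H_X = H_x, H_Y_given_X = H_y_given_x)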
Granger causality can also be assessed via transfer entropy, which detects directed, dynamical information flow without assuming any particular functional model of the process interactions.
Transfer entropy quantifies the amount of directed information transfer between the random processes \(X\) and \(Y\). As a non-parametric statistic, the transfer entropy measures the reduction of uncertainty in prospective values of \(Y\) given the past values of \(X\) and \(Y\).
The transfer entropy is the following difference of conditional entropies:
\[T_{X\rightarrow Y\ |\ Z}=H\left (\underbrace{Y_{t+1}}_{future}\ |\ \underbrace{Y^t, Z^t}_{past}\right ) - H\left (\underbrace{Y_{t+1}}_{future}\ |\ \underbrace{X^t, Y^t, Z^t}_{past}\right ).\] Suppressing the side information \(Z\) for brevity, this difference of conditional Shannon entropies expands in terms of joint entropies as: \[T_{X\rightarrow Y}=\left (H(Y^t, X^t) - H(Y_{t+1},Y^t, X^t)\right )- \left (H(Y^t) - H(Y_{t+1},Y^t)\right ).\]
Transfer entropy can also be expressed as a conditional mutual information:
\[T_{X\rightarrow Y\ |\ Z}=I(Y_{t+1}\ ;\ X^t\ |\ Y^t,Z^t),\]
where
\[I(Y_{t+1}\ ;\ X^t\ |\ Y^t,Z^t)=\] \[\sum_{(y^t,z^t)\in \Omega_Y^t\times \Omega_Z^t}{p_{(Y^t,Z^t)}(y^t,z^t)\sum_{x^t\in \Omega_X^t}{\sum_{y_{t+1}\in \Omega_Y^{t+1}}{ \left (p_{(Y_{t+1},X^t|Y^t,Z^t)}(y_{t+1},x^t|y^t,z^t) \times \log\left ( \frac{p_{(Y_{t+1},X^t| Y^t,Z^t)}(y_{t+1},x^t|y^t,z^t)} {p_{(Y_{t+1}| Y^t,Z^t)}(y_{t+1}|y^t,z^t)\times p_{(X^t| Y^t,Z^t)}(x^t|y^t,z^t)} \right ) \right ) }}}=\]
\[\sum_{(y^t,z^t)\in \Omega_Y^t\times \Omega_Z^t}{\sum_{x^t\in \Omega_X^t}{\sum_{y_{t+1}\in \Omega_Y^{t+1}}{ \left (p_{(Y_{t+1},X^t,Y^t,Z^t)}(y_{t+1},x^t,y^t,z^t) \times \log\left ( \frac{p_{(Y^t,Z^t)}(y^t,z^t)\times p_{(Y_{t+1},X^t,Y^t,Z^t)}(y_{t+1},x^t,y^t,z^t)} {p_{(Y_{t+1}, Y^t,Z^t)}(y_{t+1},y^t,z^t)\times p_{(X^t,Y^t,Z^t)}(x^t,y^t,z^t)} \right ) \right ) }}}.\]
In terms of transfer entropy, \(X\) does not cause \(Y\), relative to side information \(Z\), if and only if \(T_{X\rightarrow Y\ |\ Z}=0\), i.e.,
\[H(Y_{t+1}\ |\ Y^t, Z^t)=H(Y_{t+1}\ |\ X^t,Y^t, Z^t).\]
As the transfer entropy is not a symmetric function, \(T_{X\rightarrow Y\ |\ Z}\not=T_{Y\rightarrow X\ |\ Z}\), it allows us to infer directionality of the information flow, i.e., causality. More specifically, we can define the net information flow measure, \(F_{X\rightarrow Y\ |\ Z}\), as:
\[F_{X\rightarrow Y\ |\ Z}=T_{X\rightarrow Y\ |\ Z}-T_{Y\rightarrow X\ |\ Z}.\]
The net information flow measure, \(F_{X\rightarrow Y\ |\ Z}\), quantifies the dominant direction of the information flow; positive values indicate a dominant information flow from \(X\longrightarrow Y\), rather than the opposite direction, and similarly, negative values indicate a reversed dominant information flow from \(Y\longrightarrow X\). Hence, it suggests which process yields more predictive information about the other.
In the special case of vector-autoregressive processes, such as those in equations (1) and (2) above, transfer entropy reduces to Granger causality; for jointly Gaussian processes the two measures are equivalent, with \(GrangerCausality_{\{X\rightarrow Y\}}=2\,T_{X\rightarrow Y}\) (Barnett, Barrett & Seth, 2009). Using the classical quadratic loss \(g(e)=\|e\|_2^2\) and the linear vector-autoregressive model \(f\) of equations (1) and (2), the Granger causality of a bivariate process can be expressed as:
\[GrangerCausality_{\{X\rightarrow Y\}}=\log \left ( \frac{Var(\epsilon_t)}{Var(\tilde{\epsilon}_t)}\right ),\]
where \(\epsilon_t\) and \(\tilde{\epsilon}_t\) are the residuals of the restricted model (1) and the unrestricted model (2), respectively.
This explicit connection between the transfer entropy and the linear Granger-causality facilitates the estimation of the transfer entropy, \(T_{X\rightarrow Y\ |\ Z}\), and the net information flow metric, \(F_{X\rightarrow Y\ |\ Z}\).
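As noted above, for jointly Gaussian processes the linear Granger causality equals twice the transfer entropy. The sketch below gives a rough empirical check using RTransferEntropy::calc_te(), which is introduced more fully in a later section; since calc_te() discretizes the data before estimation, expect agreement in order of magnitude rather than exact equality:

library(RTransferEntropy)
set.seed(11)
n <- 5000
x <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
y <- 0.5*c(0, x[1:(n-1)]) + rnorm(n)
# Linear GC: log variance ratio of restricted vs. unrestricted lag-1 residuals
r_res <- resid(lm(y[-1] ~ y[-n]))
r_unr <- resid(lm(y[-1] ~ y[-n] + x[-n]))
GC <- log(var(r_res)/var(r_unr))
TE <- calc_te(x, y)        # transfer entropy x -> y (quantile-binned estimate)
c(GC = GC, twoTE = 2*TE)   # should be of comparable magnitude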
Let’s look at a pair of linear and non-linear synthetic data examples.
set.seed(1234)
# initialization (Gaussian noise)
n <- 10000
x1 <- x2 <- x3 <- x4 <- x5 <- x6 <- rep(0, n + 1)
# instantiation (linear relationships)
for (i in 2:(n + 1)) {
x1[i] <- 0.8 * sqrt(3)* x1[i-1] - 0.8*x1[i-1] + rnorm(1, mean=0, sd=1)
x2[i] <- 0.5*x1[i-1] + rnorm(1, mean=0, sd=1)
x3[i] <- -0.3*x1[i-1] + rnorm(1, mean=0, sd=1)
x4[i] <- -0.6*x1[i-1] + 0.3*sqrt(3)*x4[i-1] + 0.2*sqrt(3)*x5[i-1] + rnorm(1, mean=0, sd=1)
x5[i] <- -0.2*sqrt(3)*x4[i-1] + 0.2*sqrt(3)*x5[i-1] + rnorm(1, mean=0, sd=1)
x6[i] <- 0.3*sqrt(5)*x3[i-1] + 0.4*sqrt(3)*x4[i-1] + rnorm(1, mean=0, sd=1)
}
x1 <- x1[-1]; x2 <- x2[-1]; x3 <- x3[-1]
x4 <- x4[-1]; x5 <- x5[-1]; x6 <- x6[-1]
lin.system <- data.frame(x1, x2, x3, x4, x5, x6)
# Ground-truth causal graph (known from the simulation equations)
library(igraph)
g_ground_truth <- graph(c(1,1, 1,2, 1,3, 1,4, 4,4, 5,4, 4,5, 5,5, 3,6, 4,6), n=6)
V(g_ground_truth)$label <- c("x1","x2","x3","x4","x5","x6")
plot(g_ground_truth, main="Ground-Truth Causal Graph (Linear Simulation)",
vertex.label.color="black", vertex.size=15, edge.arrow.size=0.5)According to the definition above, \(linearGC_{\{X\rightarrow Y\}}=\log \left (
\frac{Var(\epsilon_t)}{Var(\hat{\epsilon}_t)}\right )\), we
define a function, linearGC(), to compute the linear
Granger-causality for a bivariate process \((X,Y)\).
linearGC <- function(X, Y){
n<-length(X)
X.past <- X[1:(n-1)]
Y.past <- Y[1:(n-1)]
Y.future <- Y[2:n]
regression.uni <- lm(Y.future ~ Y.past)
regression.mult <- lm(Y.future ~ Y.past + X.past)
var.eps.uni <- (summary(regression.uni)$sigma)^2
var.eps.mult <- (summary(regression.mult)$sigma)^2
linGC <- log(var.eps.uni/var.eps.mult)
return(linGC)
}

Granger causality hinges on the idea that including past values of \(X\) reduces the prediction error for \(Y\). Let’s make this tangible by comparing the residuals of the restricted model (using only \(Y\)’s own past) with those of the unrestricted model (using both \(X\)’s and \(Y\)’s pasts) for a pair where we know a causal link exists: \(x_1 \rightarrow x_2\).
library(plotly)
# Extract the known causal pair: x1 -> x2
X <- lin.system$x1
Y <- lin.system$x2
n_len <- length(X)
X.past <- X[1:(n_len-1)]
Y.past <- Y[1:(n_len-1)]
Y.future <- Y[2:n_len]
# Fit restricted (Y past only) and unrestricted (X + Y past) models
reg_restricted <- lm(Y.future ~ Y.past)
reg_unrestricted <- lm(Y.future ~ Y.past + X.past)
resid_restricted <- residuals(reg_restricted)
resid_unrestricted <- residuals(reg_unrestricted)
# Show a subset of residuals for visual clarity (first 500 time steps)
idx <- 1:500
fig_resid <- plot_ly() %>%
add_lines(x = idx, y = resid_restricted[idx], name = "Restricted (Y past only)",
line = list(color = 'red', width = 1)) %>%
add_lines(x = idx, y = resid_unrestricted[idx], name = "Unrestricted (X+Y past)",
line = list(color = 'blue', width = 1)) %>%
layout(title = "Residual Comparison: x1 → x2 (first 500 steps)<br>Red = Restricted, Blue = Unrestricted",
xaxis = list(title = "Time"), yaxis = list(title = "Residual"))
fig_resid

# Compare residual distributions
var_restricted <- var(resid_restricted)
var_unrestricted <- var(resid_unrestricted)
cat("Restricted model residual variance: ", round(var_restricted, 4), "\n")## Restricted model residual variance: 1.3333
## Unrestricted model residual variance: 1.0023
cat("Variance reduction ratio (restricted/unrestricted):",
round(var_restricted / var_unrestricted, 4), "\n")## Variance reduction ratio (restricted/unrestricted): 1.3302
## Linear GC (log ratio): 0.2853
# Overlay histograms of the residual distributions
fig_hist <- plot_ly() %>%
add_histogram(x = resid_restricted, name = "Restricted",
marker = list(color = 'rgba(255,0,0,0.4)'),
nbinsx = 80) %>%
add_histogram(x = resid_unrestricted, name = "Unrestricted",
marker = list(color = 'rgba(0,0,255,0.4)'),
nbinsx = 80) %>%
layout(title = "Residual Distributions: x1 → x2",
xaxis = list(title = "Residual Value"),
yaxis = list(title = "Count"),
barmode = "overlay")
fig_hist

Notice how the unrestricted model’s residuals (blue) are noticeably tighter around zero compared to the restricted model’s residuals (red). The variance reduction is the essence of Granger causality: by including past \(x_1\) values, we meaningfully improve the prediction of \(x_2\).
Next, we employ the function RTransferEntropy::calc_te() to estimate the pairwise information flow among the 6-variate synthetic data process \((X_1, \cdots, X_6)\).
# install.packages('RTransferEntropy')
library(future)
library(RTransferEntropy)
# Use multisession for cross-platform compatibility (multicore is faster on Unix/Mac)
plan(multisession)
## Estimate the Granger-causality and the transfer entropy, based on linearGC()
n_idx <- seq_len(ncol(lin.system))
var.names <- colnames(lin.system)   # variable labels (used below for matrices and graphs)
# apply the GC and TE estimating functions to all possible pairs of columns in the DF
ff.GC.value <- function(a, b) linearGC(lin.system[,a], lin.system[,b])
GC.matrix <- outer(n_idx, n_idx, Vectorize(ff.GC.value))
# Note: calc_te can be slow for large datasets; reduce shuffles for speed if needed
ff.TE.value <- function(a, b) calc_te(lin.system[,a], lin.system[,b], shuffles = 100)
TE.matrix <- outer(n_idx, n_idx, Vectorize(ff.TE.value))
str(TE.matrix)
## num [1:6, 1:6] 0 0.00248 0.00201 0.00246 0.00088 ...
To compare Granger-causality and Transfer Entropy on a common scale, we normalize each matrix by its maximum value so that all entries fall between 0 and 1.
# Normalize matrices to [0, 1] by dividing by the maximum value
GC.norm <- GC.matrix / max(GC.matrix)
TE.norm <- TE.matrix / max(TE.matrix)
# 6x6 Granger-Causality and Transfer Entropy matrices for 6-variate simulation
library(corrplot)
corrplot(GC.norm, method = "circle", is.corr = FALSE,
         title = "Granger Causality (Linear Model, Normalized)",
         tl.cex = 0.9, tl.col = 'black', mar=c(1, 1, 1, 1))

corrplot(TE.norm, method = "circle", is.corr = FALSE,
         title = "Transfer Entropy (Linear Model, Normalized)",
         tl.cex = 0.9, tl.col = 'black', mar=c(1, 1, 1, 1))

The linear Granger causality and the non-linear transfer entropy show similar results. Thus, both techniques capture analogous dependencies of the 6-variate process. This congruence is natural due to the simple linear associations between the variables in this simulation.
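As a rough quantitative check of this visual agreement, we can correlate the off-diagonal entries of the two normalized matrices (a small sketch; the exact value will vary with the simulation seed):

# Correlate off-diagonal GC and TE entries; a high positive correlation
# indicates the two measures rank the pairwise dependencies similarly
offdiag <- row(GC.norm) != col(GC.norm)
cor(GC.norm[offdiag], TE.norm[offdiag])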
Recall that the net information flow \(F_{X\rightarrow Y} = T_{X\rightarrow Y} - T_{Y\rightarrow X}\) quantifies the dominant direction of information transfer. Positive values indicate the dominant flow is from \(X\) to \(Y\); negative values indicate the reverse. Let us compute and visualize this for the linear simulation.
# Compute net information flow: F_{X->Y} = T_{X->Y} - T_{Y->X}
# The transpose of TE.matrix gives T_{Y->X} for each (X,Y) entry
NetFlow.matrix <- TE.matrix - t(TE.matrix)
rownames(NetFlow.matrix) <- colnames(NetFlow.matrix) <- var.names
# Round for display
print("Net Information Flow Matrix (rows -> columns):")
print(round(NetFlow.matrix, 4))
## [1] "Net Information Flow Matrix (rows -> columns):"
## x1 x2 x3 x4 x5 x6
## x1 0.0000 0.0346 0.0159 0.0482 0.0045 0.0135
## x2 -0.0346 0.0000 0.0006 -0.0016 0.0010 0.0072
## x3 -0.0159 -0.0006 0.0000 0.0003 0.0001 0.0476
## x4 -0.0482 0.0016 -0.0003 0.0000 0.0351 0.0843
## x5 -0.0045 -0.0010 -0.0001 -0.0351 0.0000 -0.0082
## x6 -0.0135 -0.0072 -0.0476 -0.0843 0.0082 0.0000
# Visualize with a diverging color palette (red = reverse flow, blue = forward flow)
corrplot(NetFlow.matrix, method = "circle", is.corr = FALSE,
         col = COL2('RdBu', 20),
         title = "Net Information Flow (Linear Simulation)\nPositive = Row -> Col, Negative = Col -> Row",
         tl.cex = 0.9, tl.col = 'black', mar=c(1, 1, 2, 1))

Interpretation: The large positive value at row x1, column x2 confirms the dominant information flow from \(x_1\) to \(x_2\), consistent with the ground-truth causal structure \(x_1 \rightarrow x_2\). Similarly, the strong positive values from \(x_1\) to \(x_3\) and \(x_1\) to \(x_4\) match the known simulation equations.
Rather than hardcoding the network edges by hand, we can dynamically construct a causal graph from the computed Transfer Entropy matrix by applying a threshold. Edges where the normalized TE exceeds the threshold are retained, providing a data-driven visualization of the inferred causal structure.
# Dynamically generate graph from the normalized TE matrix
TE.norm.diag <- TE.norm
diag(TE.norm.diag) <- 0 # remove self-loops for cleaner visualization
# Choose a threshold (e.g., keep edges where normalized TE > 0.3)
threshold <- 0.3
adj_matrix <- ifelse(TE.norm.diag > threshold, TE.norm.diag, 0)
# Build igraph object from adjacency matrix
g_dynamic <- graph_from_adjacency_matrix(adj_matrix, mode = "directed",
weighted = TRUE, diag = FALSE)
V(g_dynamic)$label <- var.names
E(g_dynamic)$width <- E(g_dynamic)$weight * 5
plot(g_dynamic,
main = paste0("Dynamic TE Network (Linear Simulation, threshold = ", threshold, ")"),
layout = layout_with_kk,
edge.arrow.size = 0.7,
vertex.label.color = "black",
vertex.size = 18,
vertex.color = "lightblue")Compare this data-driven network with the ground-truth graph shown earlier. The dynamically constructed graph successfully recovers the major causal pathways (e.g., \(x_1 \rightarrow x_2\), \(x_1 \rightarrow x_3\), \(x_4 \leftrightarrow x_5\), \(x_4 \rightarrow x_6\)) from the data alone.
Questions: In the linear simulation, \(x_1\) influences \(x_2\), \(x_3\), and \(x_4\), but \(x_2\) does not influence any other variable. How does this show up in the Granger causality and transfer entropy matrices? What would happen to the GC values if we added a new variable \(x_7\) that was driven purely by noise (independent of all other variables)?
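To probe the last question empirically, the following sketch reuses linearGC() and the lin.system data frame from above, appending a pure-noise series x7; its estimated GC values with every other variable should hover near zero, up to sampling noise.

set.seed(2024)
x7 <- rnorm(nrow(lin.system))   # pure noise, causally disconnected from x1..x6
# GC in both directions between x7 and each simulated variable
gc_to_x7 <- sapply(1:6, function(j) linearGC(lin.system[, j], x7))
gc_from_x7 <- sapply(1:6, function(j) linearGC(x7, lin.system[, j]))
round(rbind(gc_to_x7, gc_from_x7), 5)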
Next we will introduce non-linear (quadratic) relations in the 6-variate process we considered earlier.
set.seed(5678)
# initialization (Gaussian noise)
n <- 10000
x1 <- x2 <- x3 <- x4 <- x5 <- x6 <- rep(0, n + 1)
# instantiation (nonlinear relationships)
for (i in 2:(n + 1)) {
x1[i] <- 0.8 * sqrt(3)* x1[i-1] -0.9*x1[i-1] + rnorm(1, mean=0, sd=1)
x2[i] <- 0.5*x1[i-1]^2 + rnorm(1, mean=0, sd=1)
x3[i] <- -0.5*x1[i-1] + rnorm(1, mean=0, sd=1)
x4[i] <- -0.5*x1[i-1]^2 + 0.3*sqrt(3)*x4[i-1] + 0.3*sqrt(2)*x5[i - 1] +
rnorm(1, mean=0, sd=1)
x5[i]<- -0.3*sqrt(3)*x4[i-1] + 0.3*sqrt(3)*x5[i-1] + rnorm(1, mean=0, sd=1)
x6[i]<- 0.3*sqrt(3)*x4[i-1]^2 + 0.25*sqrt(3)*x5[i-1]^2 + rnorm(1, mean=0, sd=1)
}
x1 <- x1[-1]; x2 <- x2[-1]; x3 <- x3[-1]
x4 <- x4[-1]; x5 <- x5[-1]; x6 <- x6[-1];
nl.system <- data.frame(x1, x2, x3, x4, x5, x6)
# Ground-truth causal graph (known from the simulation equations)
g_nl_truth <- make_empty_graph(n = 6) %>%
# Linear relations in gray
add_edges(c(1,1, 1,3, 4,4, 5,4, 4,5, 5,5)) %>%
set_edge_attr("color", value = "gray") %>%
# (Quadratic) non-linear relations in green
add_edges(c(1,2, 1,4, 4,6, 5,6), color = "green")
V(g_nl_truth)$label <- c("x1","x2","x3","x4","x5","x6")
plot(g_nl_truth, main=paste("Ground-Truth Causal Graph (Nonlinear Simulation)",
     "Linear (gray) and nonlinear (green) relations", sep="\n"),
     vertex.label.color="black", vertex.size=15, edge.arrow.size=0.5)

Similarly to the linear case, we can compute and display the Granger causality and transfer entropy measures for this 6-variate nonlinear simulated process.
## Enable parallel computing
plan(multisession)
## Estimate the Granger-causality and the transfer entropy, based on linearGC()
n_idx <- seq_len(ncol(nl.system))
# apply the GC and TE estimating functions to all possible pairs of columns in the DF
nl.ff.GC.value <- function(a, b) linearGC(nl.system[,a], nl.system[,b])
nl.GC.matrix <- outer(n_idx, n_idx, Vectorize(nl.ff.GC.value))
nl.ff.TE.value <- function(a, b) calc_te(nl.system[,a], nl.system[,b], shuffles = 100)
nl.TE.matrix <- outer(n_idx, n_idx, Vectorize(nl.ff.TE.value))
var.names <- c("x1", "x2", "x3", "x4", "x5", "x6")
rownames(nl.TE.matrix) <- colnames(nl.TE.matrix) <- var.names
rownames(nl.GC.matrix) <- colnames(nl.GC.matrix) <- var.names
# Normalize matrices to [0, 1]
nl.GC.norm <- nl.GC.matrix / max(nl.GC.matrix)
nl.TE.norm <- nl.TE.matrix / max(nl.TE.matrix)
corrplot(nl.GC.norm, method = "circle", is.corr = FALSE,
         title = "Granger Causality (Nonlinear Model, Normalized)",
         tl.cex = 0.9, tl.col = 'black', mar=c(1, 1, 1, 1))

corrplot(nl.TE.norm, method = "circle", is.corr = FALSE,
         title = "Transfer Entropy (Nonlinear Model, Normalized)",
         tl.cex = 0.9, tl.col = 'black', mar=c(1, 1, 1, 1))

Notice the key difference: the linear Granger-causality plot largely misses the nonlinear (quadratic) causal relationships; for instance, the \(x_1 \rightarrow x_2\) link (driven by \(x_1^2\)) appears weak or absent in the GC matrix but is clearly captured by transfer entropy. This demonstrates that TE is sensitive to nonlinear dependencies that linear GC cannot detect.
# Compute net information flow for the nonlinear system
nl.NetFlow.matrix <- nl.TE.matrix - t(nl.TE.matrix)
rownames(nl.NetFlow.matrix) <- colnames(nl.NetFlow.matrix) <- var.names
print("Net Information Flow Matrix (Nonlinear Simulation, rows -> columns):")## [1] "Net Information Flow Matrix (Nonlinear Simulation, rows -> columns):"
## x1 x2 x3 x4 x5 x6
## x1 0.0000 0.1097 0.0399 0.0791 0.0034 0.0066
## x2 -0.1097 0.0000 0.0013 0.0042 0.0233 0.0419
## x3 -0.0399 -0.0013 0.0000 0.0004 0.0013 0.0042
## x4 -0.0791 -0.0042 -0.0004 0.0000 0.0583 0.1000
## x5 -0.0034 -0.0233 -0.0013 -0.0583 0.0000 0.0084
## x6 -0.0066 -0.0419 -0.0042 -0.1000 -0.0084 0.0000
corrplot(nl.NetFlow.matrix, method = "circle", is.corr = FALSE,
         col = COL2('RdBu', 20),
         title = "Net Information Flow (Nonlinear Simulation)\nPositive = Row->Col, Negative = Col->Row",
         tl.cex = 0.9, tl.col = 'black', mar=c(1, 1, 2, 1))

# Dynamically generate graph from the normalized TE matrix (nonlinear system)
nl.TE.norm.diag <- nl.TE.norm
diag(nl.TE.norm.diag) <- 0
threshold_nl <- 0.3
adj_matrix_nl <- ifelse(nl.TE.norm.diag > threshold_nl, nl.TE.norm.diag, 0)
g_nl_dynamic <- graph_from_adjacency_matrix(adj_matrix_nl, mode = "directed",
weighted = TRUE, diag = FALSE)
V(g_nl_dynamic)$label <- var.names
E(g_nl_dynamic)$width <- E(g_nl_dynamic)$weight * 5
plot(g_nl_dynamic,
main = paste0("Dynamic TE Network (Nonlinear Simulation, threshold = ", threshold_nl, ")"),
layout = layout_with_kk,
edge.arrow.size = 0.7,
vertex.label.color = "black",
vertex.size = 18,
vertex.color = "lightgreen")Questions: Why is Transfer Entropy asymmetric, i.e., why is \(T_{X \rightarrow Y} \neq T_{Y \rightarrow X}\) in general? What does a positive net information flow \(F_{X \rightarrow Y} > 0\) tell us that the raw transfer entropy \(T_{X \rightarrow Y}\) alone does not? In the nonlinear simulation, which causal links does Granger causality miss that Transfer Entropy captures, and why?
Let’s use the complete monthly SOCR Macroecon Market Data for the US (1979 - 2020) to illustrate causality.
We will begin by loading and plotting the longitudinal dataset along with some DJIA and NASDAQ (linear) models.
library(magrittr)
# load SOCR data
US_data <- read.csv("https://umich.instructure.com/files/20026184/download?download_frd=1", header = TRUE)
# order dates chronologically
US_data_ord <- US_data[order(as.Date(US_data$Date, format = "%m/%d/%Y")), ]
US_data_ord$Date <- as.Date(US_data_ord$Date, format = "%m/%d/%Y")
str(US_data_ord)
## 'data.frame': 499 obs. of 17 variables:
## $ Date : Date, format: "1979-01-01" "1979-02-01" ...
## $ US_Debt_B : num 797 797 797 805 805 ...
## $ US_GDP_B : num 6742 6742 6742 6749 6749 ...
## $ Debt2GDP_Pct : num 31.5 31.5 31.5 31.1 31.1 ...
## $ US_CapacityUtilizationRate: num 85.9 86.2 86.2 85.1 85.5 ...
## $ Jobless_Claims : int 2393000 2451000 2335000 2351000 2246000 2282000 2375000 2471000 2426000 2522000 ...
## $ Industrial_Production : num 53.3 53.6 53.7 53.2 53.6 ...
## $ Total_Reserves_B : num 43.1 40.7 40.2 40.7 40.2 40.1 40.9 40.7 41.1 42.3 ...
## $ Monetary_Base_B : num 144 142 142 144 144 ...
## $ CoincidentEcon_ActivityInd: num 45.5 45.7 45.9 46 46.2 ...
## $ UMCSENT : num 72.1 73.9 68.4 66 68.1 65.8 60.4 64.5 66.7 62.1 ...
## $ DJIA : num 839 809 862 855 822 ...
## $ DJIA_3m_Shift : num 855 822 842 846 888 ...
## $ NASDAQ : num 482 464 494 496 481 ...
## $ NASDAQ_3m_Shift : num 496 481 500 506 533 ...
## $ Recession : int 0 0 0 0 0 0 0 0 0 0 ...
## $ RecessionShifted : int 0 0 0 0 0 0 0 0 0 0 ...
# dim(US_data_ord)   # 499 17 (including the Date column)
# NOTE: We avoid attach() to prevent global scoping issues.
# Instead, we use the data= argument in lm() and explicit df$column references.

# Fit and plot linear models according to specified predictors and outcomes
fitPlot_LM_Model <- function (Y, X) {
nX <- length(X)
nY <- length(Y)
b <- array(dim=nX+1) # DJIA model coefficients (intercept + nX slopes)
c <- array(dim=nX+1) # NASDAQ model coefficients (intercept + nX slopes)
if (nX < 1 || nY != 2) return ("Error 1")
if (nX == 1) {
formulaDJIA <- paste0(Y[1], " ~ ", X[1], collapse = " ")
formulaNASDAQ <- paste0(Y[2], " ~ ", X[1], collapse = " ")
}
if (nX > 1) {
formulaDJIA <- paste0(Y[1], " ~ ", X[1], paste0(" +", X[-1], collapse = " "))
formulaNASDAQ <- paste0(Y[2], " ~ ", X[1], paste0(" +", X[-1], collapse = " "))
}
### Y[1]: DJIA
print(paste0("Fitting LM: ", formulaDJIA))
modDJIA <- lm(formula = formulaDJIA, data = US_data_ord)
summary(modDJIA)
coef(modDJIA)
# Y[2]: NASDAQ
print(paste0("Fitting LM: ", formulaNASDAQ))
modNASDAQ <- lm(formula = formulaNASDAQ, data = US_data_ord)
summary(modNASDAQ)
coef(modNASDAQ)
for (i in 1:(nX+1)) {
b[i] <- round(coef(modDJIA)[i], 2)
c[i] <- round(coef(modNASDAQ)[i], 2)
}
# define the string for the DJIA LM
modDJIA_label <- paste0("DJIA ~ ", b[1], paste0(" +", b[-1], "*", X, collapse = " "), "\n\n")
modNASDAQ_label <- paste0("NASDAQ ~ ", c[1], paste0(" +", c[-1], "*", X, collapse = " "))
modDJIA_NASDAQ_label <- paste0(modDJIA_label, modNASDAQ_label)
library(plotly)
myPlot <- US_data_ord %>%
plot_ly(x = ~Date) %>%
add_markers(y = ~DJIA, name="DJIA Data",
marker = list(color = "blue",
line = list(color = "blue", width = 1))) %>%
add_trace(x = ~Date, y = fitted(modDJIA), name="DJIA Model", type = "scatter",
mode='lines+markers', marker = list(color = "orange",
line = list(color = "orange", width = 2))) %>%
add_markers(y = ~NASDAQ, name="NASDAQ Data", marker = list(color = "green",
line = list(color = "green", width = 1))) %>%
add_trace(x = ~Date, y = fitted(modNASDAQ), name="NASDAQ Model", type = "scatter",
mode='lines+markers', marker = list(color = "red",
line = list(color = "red", width = 1))) %>%
add_trace(x = ~Date, y = ~Recession*30000, type = 'bar',
marker = list(color='gray', line = list(color='gray', width=4)),
opacity=0.2, name="Recessions", text = "Recessions", hoverinfo = 'text') %>%
layout(title=modDJIA_NASDAQ_label, font=list(size=8))
return (myPlot)
}

#### Run the Full model-fitting and plotting function
myPlot <- fitPlot_LM_Model(Y=c("DJIA_3m_Shift", "NASDAQ_3m_Shift"),
X=c("US_Debt_B", "US_GDP_B", "Debt2GDP_Pct", "US_CapacityUtilizationRate", "Jobless_Claims",
"Industrial_Production", "Total_Reserves_B", "Monetary_Base_B",
"CoincidentEcon_ActivityInd", "UMCSENT"))## [1] "Fitting LM: DJIA_3m_Shift ~ US_Debt_B +US_GDP_B +Debt2GDP_Pct +US_CapacityUtilizationRate +Jobless_Claims +Industrial_Production +Total_Reserves_B +Monetary_Base_B +CoincidentEcon_ActivityInd +UMCSENT"
## [1] "Fitting LM: NASDAQ_3m_Shift ~ US_Debt_B +US_GDP_B +Debt2GDP_Pct +US_CapacityUtilizationRate +Jobless_Claims +Industrial_Production +Total_Reserves_B +Monetary_Base_B +CoincidentEcon_ActivityInd +UMCSENT"
#### Run a Smaller model-fitting and plotting function
myPlot1 <- fitPlot_LM_Model(Y=c("DJIA_3m_Shift", "NASDAQ_3m_Shift"),
X=c("Debt2GDP_Pct", "Jobless_Claims", "Monetary_Base_B", "UMCSENT"))## [1] "Fitting LM: DJIA_3m_Shift ~ Debt2GDP_Pct +Jobless_Claims +Monetary_Base_B +UMCSENT"
## [1] "Fitting LM: NASDAQ_3m_Shift ~ Debt2GDP_Pct +Jobless_Claims +Monetary_Base_B +UMCSENT"
The DJIA and NASDAQ markets represent complex dynamic processes that are linked through many socioeconomic, cyclical, psychological, and technological factors. The multiplicity of these effects and their dynamic nature lead to highly complex interactions: some causal, some correlational, and some random. Clearly, it is vital to understand the mechanistic effects of different observable processes on macro- and micro-economic indices.
Let’s employ transfer entropy and Granger causality to examine the dependency relations among the factors in this dataset (a \(499\times 17\) data matrix). After dropping the non-numeric Date column, we can display the resulting \(16\times 16\) matrix of normalized causality values. The normalization divides each cell by the maximum value in the matrix, so that all values lie between 0 and 1. The graph highlights the pairs of features that were most strongly interconnected over the period 1979-2020: the cells with the highest information flow going from one economic index to another, along with the direction of that flow, which may suggest causal mechanistic dependence.
# Prepare data: remove date column (non-numeric)
lin.system <- US_data_ord[, -1]
## Enable parallel computing
plan(multisession)
## Estimate the Granger-causality and the transfer entropy
n_idx <- seq_len(ncol(lin.system))
var.names <- colnames(lin.system)
# apply the GC and TE estimating functions to all possible pairs of columns
# Note: This may take several minutes for 17 variables with TE bootstrapping
ff.GC.value <- function(a, b) linearGC(lin.system[,a], lin.system[,b])
GC.matrix <- outer(n_idx, n_idx, Vectorize(ff.GC.value))
ff.TE.value <- function(a, b) calc_te(lin.system[,a], lin.system[,b], shuffles = 100)
TE.matrix <- outer(n_idx, n_idx, Vectorize(ff.TE.value))
rownames(TE.matrix) <- colnames(TE.matrix) <- var.names
rownames(GC.matrix) <- colnames(GC.matrix) <- var.names
# Normalize matrices to [0, 1] by dividing by the maximum value
GC.norm <- GC.matrix / max(GC.matrix)
TE.norm <- TE.matrix / max(TE.matrix)
# Plot the pairwise causal relations (properly normalized, no artificial boosting)
corrplot(GC.norm, method = "circle", is.corr = FALSE,
         title = "Granger Causality (MacroEcon, Normalized)",
         tl.cex = 0.7, tl.col = 'black', mar=c(1, 1, 1, 1))

corrplot(TE.norm, method = "circle", is.corr = FALSE,
         title = "Transfer Entropy (MacroEcon, Normalized)",
         tl.cex = 0.7, tl.col = 'black', mar=c(1, 1, 1, 1))

# Compute net information flow for the macroeconomic system
NetFlow.matrix <- TE.matrix - t(TE.matrix)
rownames(NetFlow.matrix) <- colnames(NetFlow.matrix) <- var.names
# Visualize with a diverging color palette
corrplot(NetFlow.matrix, method = "circle", is.corr = FALSE,
         col = COL2('RdBu', 20),
         title = "Net Information Flow (MacroEcon)\nPositive = Row->Col, Negative = Col->Row",
         tl.cex = 0.6, tl.col = 'black', mar=c(1, 1, 2, 1))

# Print the top 5 strongest net flows (by absolute magnitude)
net_flow_vec <- as.vector(NetFlow.matrix)
names(net_flow_vec) <- paste0(rep(var.names, each=length(var.names)), "→", rep(var.names, length(var.names)))
sorted_flows <- sort(abs(net_flow_vec), decreasing = TRUE)
# Exclude self-flows (diagonal)
diag_names <- paste0(var.names, "→", var.names)
sorted_flows <- sorted_flows[!names(sorted_flows) %in% diag_names]
cat("\nTop 5 strongest net information flows:\n")
##
## Top 5 strongest net information flows:
# NetFlow.matrix is antisymmetric, so each |flow| magnitude appears twice
# (once per direction); step through the sorted magnitudes in pairs
# to report each flow only once
for (i in seq(1, 10, by = 2)) {
  idx <- which(abs(net_flow_vec) == sorted_flows[i])[1]
  cat(sprintf("%s : %.4f\n", names(net_flow_vec)[idx], net_flow_vec[idx]))
}
## Recession→RecessionShifted : 0.0751
## DJIA→DJIA_3m_Shift : 0.0482
## CoincidentEcon_ActivityInd→DJIA_3m_Shift : -0.0476
## US_Debt_B→Debt2GDP_Pct : -0.0445
## Debt2GDP_Pct→DJIA_3m_Shift : -0.0437
Instead of hand-picking edges, we dynamically construct the network by thresholding the normalized Transfer Entropy matrix.
library(igraph)
# Dynamically generate graph from the normalized TE matrix
TE.norm.diag <- TE.norm
diag(TE.norm.diag) <- 0
# Choose a threshold (top information flows)
threshold_macro <- 0.35
adj_matrix_macro <- ifelse(TE.norm.diag > threshold_macro, TE.norm.diag, 0)
g_macro <- graph_from_adjacency_matrix(adj_matrix_macro, mode = "directed",
weighted = TRUE, diag = FALSE)
V(g_macro)$label <- var.names
E(g_macro)$width <- E(g_macro)$weight * 4
plot(g_macro,
main = paste0("Dynamic TE Network (MacroEcon, threshold = ", threshold_macro, ")"),
layout = layout_with_kk,
edge.arrow.size = 0.5,
vertex.label.color = "black",
vertex.size = 12,
vertex.label.cex = 0.7,
vertex.color = "lightyellow")Note the different patterns in the paired causal matrix plots above (linear Granger causality vs. the non-linear transfer entropy). Several specific relationships stand out and illustrate the complementary insights these methods provide:
Jobless Claims → Market Indices: Both Granger causality and Transfer Entropy detect a strong information flow from Jobless Claims to the DJIA and NASDAQ indices. This makes economic sense: rising jobless claims signal weakening labor markets, which reduces consumer spending and corporate earnings expectations, thereby pressuring equity indices downward. The predictive relationship is strong because labor market data is released on a weekly schedule, whereas market indices adjust continuously, creating a temporal lag that Granger causality readily captures.
Monetary Base → Financial Variables: Transfer entropy reveals strong nonlinear information flows from the Monetary Base to several other variables (including Total Reserves and the equity indices) that are less prominent in the linear Granger causality plot. This suggests that the Federal Reserve’s expansion of the monetary base — particularly during the quantitative easing periods after the 2008 financial crisis — affected markets through nonlinear channels (e.g., threshold effects where markets respond disproportionately to large balance-sheet expansions).
Industrial Production and Capacity Utilization: These real-economy indicators show moderate bidirectional information flows with market indices in the TE plot, consistent with the well-known feedback loop between economic activity and equity valuations. The net information flow analysis reveals that the dominant direction tends to run from economic indicators toward the markets, rather than the reverse — aligning with the economic intuition that production data informs market expectations more than markets drive production in the short run.
Consumer Sentiment (UMCSENT): The University of Michigan Consumer Sentiment Index shows weaker but still detectable transfer entropy to equity markets. This is consistent with consumer sentiment being a soft, forward-looking indicator that carries incremental information beyond hard economic data, particularly around recession turning points.
Questions: In the macroeconomic analysis, why might transfer entropy detect information flows from the Monetary Base to equity indices that Granger causality misses? What does this imply about the nature of the relationship between central bank policy and financial markets? Can you think of a potential unmeasured confounder that might create a spurious Granger-causal relationship between two of these economic variables?
Uncovering directional causal effects is generally difficult and requires a number of assumptions. Data science and statistical inference strategies can assist with discriminating simple linear associations (correlations) from mechanistic effects (causal relations). In this DSPA appendix, we explored several approaches to causal inference:
Granger causality operates under the framework of linear vector-autoregressive modeling. It tests whether including past values of one variable significantly reduces the prediction error for another. While intuitive and computationally efficient, it is limited to detecting linear predictive relationships and is susceptible to spurious findings from unmeasured confounders.
Transfer entropy provides a model-free, information-theoretic alternative that can capture nonlinear causal dependencies. By measuring the reduction in uncertainty about a target variable’s future given a source variable’s past (beyond what the target’s own past provides), TE generalizes Granger causality to arbitrary dynamical relationships. We demonstrated through synthetic simulations that linear Granger-causality may fail to detect explicit non-linear (quadratic) causal patterns, which are successfully untangled by transfer entropy estimation.
Net information flow (\(F_{X\rightarrow Y} = T_{X\rightarrow Y} - T_{Y\rightarrow X}\)) leverages the asymmetry of transfer entropy to quantify the dominant direction of information transfer, providing a clearer picture of which process is the primary driver in a coupled system.
Important caveats remain: Both Granger causality and transfer entropy operate on observational data and cannot, by themselves, establish interventional causality in Pearl’s sense. Unmeasured confounders can produce spurious predictive relationships that mimic causation. Students should interpret these measures as evidence of predictive information flow rather than definitive mechanistic causation, and should always consider the possibility of latent common drivers.
The application to US macroeconomic data demonstrated how these methods can quantify predictive information flows among economic indicators and financial markets, revealing both expected relationships (e.g., jobless claims predicting market movements) and subtler nonlinear dependencies (e.g., monetary base effects) that linear methods alone would understate.
Optional: using a custom _render.R script. Instead of clicking the “Knit” button, run the following in your R console. This allows you to define the pandoc_args cleanly without the GUI injecting its own defaults:
# Install if needed: install.packages("stringi")
library(stringi)
# Read the file
file_path <- "DSPA_Appendix_7A_Causality.Rmd"
original_text <- readLines(file_path, encoding = "UTF-8")
# Convert to ASCII (transliterates where possible, replaces others with '')
clean_text <- stri_trans_general(original_text, "latin-ascii")
# Overwrite the file (or save as a new version to be safe)
# writeLines(clean_text, "DSPA_Appendix_7A_Causality_Clean.Rmd", useBytes = TRUE)
rmarkdown::render(
"DSPA_Appendix_7A_Causality.Rmd",
output_format = rmarkdown::html_document(
pandoc_args = c("+RTS", "-K4000m", "-M12G", "-RTS")
)
)