Differential calculus models the properties of derivatives of functions, whereas integral calculus characterizes antiderivatives (integrals). The derivative and integral operators are inverse operations of each other (cf. the fundamental theorem of calculus), and both are defined using the concept of a limit.
The limit of a function \(f(x)\) as \(x\to a\) is equal to \(L\), i.e., \(\lim_{x\to a} {f(x)}=L\), if \(\forall \epsilon\gt 0\), \(\exists \delta_{\epsilon}\gt 0\), so that \(\forall x:\ {0\lt |x-a|\lt \delta_{\epsilon}}\), \(|f(x)-L|\lt \epsilon\). In other words, the limit of \(f(x)\) as \(x\) tends to \(a\) is equal to \(L\) when for every desired (function \(f\)) closeness (\(\epsilon\)) to \(L\), there is a small interval around \(a\) (excluding \(a\)) that achieves that closeness under the mapping \(f\).
Two algebraic properties of limits include the sum and product rules, which hold whenever the limits on the right-hand side exist: \[\lim_{x\to a}\left (f(x)+g(x)\right )=\lim_{x\to a}f(x)+\lim_{x\to a}g(x), \ \ \ \ \lim_{x\to a}\left (f(x)\, g(x)\right )=\lim_{x\to a}f(x)\cdot \lim_{x\to a}g(x).\]
Below, we show two examples of functions whose limits we can explore as \(t\to 0\) and \(t\to -1\). Note that \(f_1(t)\) and \(f_2(t)\) have singularities at \(t=0\) and \(t=-1\), respectively; however, the piecewise definition of \(f_2(t)\) resolves its singularity and controls the growth of the function near \(t=-1\).
\[f_1(t)=\frac{1}{t},\] and \[f_2(t)=\begin{cases} 20\left ( 1+ \frac{1}{t+1}\right ) & |t+1|\geq 0.1 \\ 0 & |t+1|\lt 0.1 \end{cases}.\]
library(plotly)
# Define Domain values
t <- seq(from=-pi, to=pi, length.out = 1000)
# Define corresponding functional values
f1_val <- 1/t
f2_val <- ifelse(abs(t+1)<0.1, 0, 20*(1 + 1/(t+1)))
# define the two functions as R closures (for reference)
funct1 <- function (t) {
1/t
}
funct2 <- function (t) {
if (abs(t+1)<0.1) return(0)
else return(20*(1 + 1/(t+1)))
}
# plots
plot_ly(x = ~t, y = ~f1_val, type = 'scatter', name="y=f1(t)=1/t",
mode = 'lines', opacity=0.7) %>%
add_trace(x = ~t, y = ~f2_val, type = 'scatter', name="y=f2(t)=20(1 + 1/(t+1))",
mode = 'lines', opacity=0.7) %>%
layout(title="Function Limits", legend = list(orientation = 'h'))
When \(f(a)\) is well-defined, the (first order) derivative of \(f(x)\) at \(x=a\) is defined by: \[f'(a)\equiv \lim_{x\to a} \left ( \frac{f(x)-f(a)}{x-a} \right ) ,\]
wherever this limit exists. In studies of physical systems, when \(f(t)\) represents position as a function of time, the derivative can be interpreted as velocity. More generally, the derivative represents the instantaneous rate of change of \(f\), i.e., the slope of the tangent line to the graph of \(f\).
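As a quick illustration of this limit definition, the following minimal R sketch (the function and evaluation point are arbitrary illustrative choices, not part of the original example) approximates \(f'(a)\) by difference quotients with shrinking step sizes.
f_demo <- function(x) x^3 - 5*x      # illustrative function; its exact derivative is 3x^2 - 5
a0 <- 2                              # illustrative point; exact f_demo'(2) = 7
h <- 10^(-(1:6))                     # shrinking step sizes
(f_demo(a0 + h) - f_demo(a0)) / h    # the difference quotients converge to 7 as h -> 0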
Suppressing the argument \((x)\) for brevity, some algebraic properties of derivatives include linearity, \((af+bg)'=af'+bg'\), the product rule, \((fg)'=f'g+fg'\), the quotient rule, \(\left(\frac{f}{g}\right)'=\frac{f'g-fg'}{g^2}\), and the chain rule, \((f\circ g)'=(f'\circ g)\, g'\). A related result, L'Hôpital's rule, states that when \(\lim_{x\to a}f(x)=\lim_{x\to a}g(x)=0\) (or both limits are infinite) and the limit of the ratio of derivatives exists,
\[\lim _{x\to a}{\frac {f(x)}{g(x)}}=\lim _{x\to a}{\frac {f'(x)}{g'(x)}}.\]
In many physical applications, derivatives play a role in finding solutions to optimization problems or solutions to differential equations expressing relations between functions and their derivatives. For instance, the trigonometric functions sine and cosine satisfy the (ordinary) differential equation \(f''(x) = -f(x)\), which is used for modeling a pendulum, a spring, and other oscillatory harmonic motions. Later, we will see that a simpler differential equation, \(f'(x) = k f(x)\), is used for modeling population growth and radioactive decay, as its solution is an exponential function, \(f(x) = e^{kx}\).
Interpreting derivatives as graph tangents (slopes of tangent lines) allows their use for sketching functions. Points on the graph of a differentiable function where the slope is zero, \(f'(x) = 0\), indicate that the tangent line is horizontal. These points are particularly important because they are candidate (critical) points where the function may attain a local or global minimum or maximum.
All extrema occur either on the domain boundary or at critical points where \(f'(x) = 0\). To determine whether an extremum is a minimum or a maximum, we need to examine the concavity of the function. For an optimization problem, the function is a cost (loss, or objective) function, and its second-order derivative (or the positive-definiteness of the Hessian matrix for multivariate functions) determines the type of extremum where \(f'(x) = 0\). In the univariate case, \(f''(x) < 0\) implies a relative maximum for a locally concave-down function; conversely, \(f''(x) > 0\) implies a relative (local) minimum for a concave-up function. Below we will demonstrate these ideas for polynomial functions.
The Mean Value Theorem states that for any differentiable function \(f(x)\) on an interval \([a, b],\ a\lt b\), there exists some point \(c \in (a,b)\) such that the derivative \(f'(c)\) is equal to the average slope of \(f\) between \(a\) and \(b\): \[f'(c)=\frac{f(b)-f(a)}{b-a}.\]
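A minimal numerical illustration of the Mean Value Theorem, assuming the cubic \(f(x)=x^3-5x\) on the arbitrarily chosen interval \([0,3]\), locates a point \(c\) where \(f'(c)\) equals the average slope.
f_mvt <- function(x) x^3 - 5*x                     # illustrative function
df_mvt <- function(x) 3*x^2 - 5                    # its derivative
a_mvt <- 0; b_mvt <- 3
avg_slope <- (f_mvt(b_mvt) - f_mvt(a_mvt)) / (b_mvt - a_mvt)   # average slope over [0, 3] = 4
# find c in (a, b) with f'(c) equal to the average slope
c_point <- uniroot(function(x) df_mvt(x) - avg_slope, interval=c(a_mvt, b_mvt))$root
c_point; df_mvt(c_point)                           # c = sqrt(3) ~ 1.732 and f'(c) = 4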
The antiderivative (indefinite integral) is the inverse operation to differentiation. Let’s start with a simple example - a model of exponential growth, which assumes that at any time \(t\), the growth rate of a temporally dynamic process, \(N'(t)=\frac{dN}{dt}\), is proportional to the current process value (e.g., population size), \(N(t)\).
\[\frac{dN}{dt} = kN,\]
where \(N(t)\) represents the process value at time \(t\) (e.g., population size) and \(k\) is a proportionality constant that controls the rate of change. Solving this simple ordinary differential equation for \(N(t)\) can be accomplished by integrating the derivative \(N'(t)\), i.e.,
\[\int {N'(t)dt}=N(t) +c.\]
Recall that the exponential function, \(f(t)=e^t\), is (up to constant multiples) the only function that is unchanged by differentiation:
\[\frac{d(e^t)}{dt} = e^t.\] More generally, for any function \(f(t)\), an antiderivative (indefinite integral) of \(f(t)\) is any function \(F(t)\) such that \(F'(t)=f(t)\). If \(F(t)\) is one antiderivative of \(f(t)\), then all antiderivatives of \(f(t)\) are captured by the indefinite integral \[\int {f(t)dt} = F(t)+c,\]
where \(c\) is an arbitrary constant, \(\int\) is the integral sign, \(f(t)\) is the integrand, and \(t\) is the integration variable.
The solution to our exponential growth process example above in terms of integration is:
\[N = N_o e^{kt},\]
since differentiating \(N_o e^{kt}\) yields \[\frac{dN(t)}{dt}\equiv (N_o e^{kt})'=k N_o e^{kt}\equiv k N(t).\]
Finding \(N(t)\) is equivalent to finding the integral (antiderivative) of \(g(t)\): \[g(t)\equiv \frac{dN(t)}{dt}=k N_o e^{kt}.\] Hence, \(g(t)\) is the derivative of \(N(t)\) and, conversely, \(N(t)\) is an antiderivative (integral) of \(g(t)\), \[N(t) = \int {g(t)dt}+c.\] Note that the antiderivative is unique up to an arbitrary integration constant (\(c\)), and the term \(dt\) identifies the variable of integration, similarly to \(\frac{d}{dt}\) identifying the variable of differentiation in derivatives. Consider a pair of different antiderivatives,
\[N_1(t) = N_o e^{kt} + 1, \ \ \ \ N_{-5}(t) = N_o e^{kt} -5.\]
These represent two antiderivative examples of \(g(t)\), since the derivatives of both of these functions are equal \[\frac{dN_1}{dt} = g(t)\equiv kN_o e^{kt} \equiv \frac{dN_{-5}}{dt}.\] Thus, we have that \(N_1(t) = \int {g(t)dt}+c_{(1)}\) and \(N_{-5}(t) = \int {g(t)dt}+c_{(-5)}\). Geometrically, we can interpret this property as the graphs of \(N_1(t)\) and \(N_{-5}(t)\) having the same slope (derivative) for any \(t\), but different offsets, \(c_{(1)}\) and \(c_{(-5)}\).
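The following short sketch (using the Deriv package and the arbitrary illustrative values \(N_o=100\), \(k=0.3\)) verifies that differentiating \(N_o e^{kt}\) indeed returns \(k\,N(t)\).
library(Deriv)
N0 <- 100; k <- 0.3                            # illustrative constants (assumed values)
N_fun <- function(t) N0 * exp(k * t)           # N(t) = N0 * e^(kt)
dN_fun <- Deriv(N_fun, "t")                    # symbolic derivative N'(t)
dN_fun
t_test <- c(0, 1, 5)
all.equal(dN_fun(t_test), k * N_fun(t_test))   # TRUE: N'(t) = k N(t)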
The (definite) integral of \(f(x)\) from \(a\) to \(b\) with respect to \(x\) is denoted by
\[\int_a^b{f(x)dx}=\lim_{n\to\infty}{\sum_{i=1}^n{\left (f(x_i^*)\Delta x\right )}},\] where the interval \([a,b]\) is divided into \(n\) subintervals of equal width, \(\Delta x\), and the points \(x_i^*\) belong to the corresponding partitioning intervals.
The (definite) integral gives the (signed) area between the graph of \(f\) and the horizontal axis over the interval \([a,b]\). It can be defined formally as the limit of Riemann sums, i.e., the total area of rectangular approximations as the partition becomes finer and finer.
library(plotly)
# Define Domain values
t <- seq(from=-pi/4, to=pi*(9/8), length.out = 1000)
t6 <- seq(from=-pi/4, to=pi*(9/8), length.out = 6)
t15 <- seq(from=-pi/4, to=pi*(9/8), length.out = 15)
t51 <- seq(from=-pi/4, to=pi*(9/8), length.out = 51)
t500 <- seq(from=-pi/4, to=pi*(9/8), length.out = 500)
# Define corresponding values
f_val <- sin(t)*(cos(2*t)+t)
f_val6 <- sin(t6)*(cos(2*t6)+t6)
f_val15 <- sin(t15)*(cos(2*t15)+t15)
f_val51 <- sin(t51)*(cos(2*t51)+t51)
f_val500 <- sin(t500)*(cos(2*t500)+t500)
# compute the corresponding integral and Riemann sum approximations
funct <- function (t) {
sin(t)*(cos(2*t)+t)
}
int_val <- round(integrate(funct, lower=-pi/4, upper=pi*(9/8))$value, 3)
int_val6 <- round(sum(f_val6 * (pi*(9/8)+pi/4)/6), 3)
int_val15 <- round(sum(f_val15 * (pi*(9/8)+pi/4)/15), 3)
int_val51 <- round(sum(f_val51 * (pi*(9/8)+pi/4)/51), 3)
int_val500 <- round(sum(f_val500 * (pi*(9/8)+pi/4)/500), 3)
# plots
pl_list <- list()
pl_list[[1]] <- plot_ly(x = ~t, y = ~f_val, type = 'scatter',
name=paste0("y=f(x)=sin(x)*(cos(2x)+x); Area=", int_val),
mode = 'lines', fill = 'tozeroy', opacity=0.3) %>%
add_trace(x = ~t6, y = ~f_val6, type = 'bar',
name = paste0('Coarse Sampling (6); Area=', int_val6))
pl_list[[2]] <- plot_ly(x = ~t, y = ~f_val, type = 'scatter',
name=paste0("y=f(x)=sin(x)*(cos(2x)+x); Area=", int_val),
mode = 'lines', fill = 'tozeroy', opacity=0.3) %>%
add_bars(x = ~t15, y = ~f_val15, type = 'bar',
name = paste0('Regular Sampling (15); Area=', int_val15))
pl_list[[3]] <- plot_ly(x = ~t, y = ~f_val, type = 'scatter',
name=paste0("y=f(x)=sin(x)*(cos(2x)+x); Area=", int_val),
mode = 'lines', fill = 'tozeroy', opacity=0.3) %>%
add_trace(x = ~t51, y = ~f_val51, type = 'bar',
name = paste0('Fine Sampling (51); Area=', int_val51))
pl_list[[4]] <- plot_ly(x = ~t, y = ~f_val, type = 'scatter',
name=paste0("y=f(x)=sin(x)*(cos(2x)+x); Area=", int_val),
mode = 'lines', fill = 'tozeroy', opacity=0.3) %>%
add_trace(x = ~t500, y = ~f_val500, type = 'bar',
name = paste0('High-Fidelity Sampling (500); Area=', int_val500))
pl_list %>%
subplot(nrows = length(pl_list)) %>%
layout(title="Integral Approximations by Riemann Sums", legend = list(orientation = 'h'))
The fundamental theorem of calculus demonstrates that integration and differentiation are inverse operations:
\[\int_a^b{f'(x)dx}=f(b)-f(a),\] \[\left( \int_a^x{f(y)dy}\right )'=f(x),\] \[\left( \underbrace{\int_{h(x)}^{g(x)}{f(y)dy}}_{\text{function of 2 arguments}\choose {F(u=h(x), v=g(x))}}\right )' = \frac{d}{dx}F(u(x), v(x)) \overbrace{=}^{\text{by Chain Rule}} \frac{dF}{du}\frac{du}{dx}+\frac{dF}{dv}\frac{dv}{dx}\ \ \underbrace{\overbrace{=}^{\frac{dF}{du}=-f(u)}}_{F=\int_{u}{fdy}} -f(h(x))h'(x)+f(g(x))g'(x).\]
Mind the dummy variable of integration \(y\) and the variable upper/lower limits, \(x,\ u=h(x),\ v=g(x)\), indicating that the definite integral is a function of its limits.
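The first statement of the fundamental theorem can be checked numerically; here is a minimal sketch, assuming the illustrative pair \(F(x)=x^3-5x\) with \(F'(x)=3x^2-5\) on the arbitrary interval \([1,4]\).
f_ftc  <- function(x) x^3 - 5*x          # illustrative antiderivative F
df_ftc <- function(x) 3*x^2 - 5          # its derivative f = F'
a_ftc <- 1; b_ftc <- 4
integrate(df_ftc, lower=a_ftc, upper=b_ftc)$value   # numerical integral of F' over [1, 4]
f_ftc(b_ftc) - f_ftc(a_ftc)                         # F(b) - F(a) = 44 - (-4) = 48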
In addition to computing areas, integrals can also be used to calculate other measurable properties such as arc length or volume. For instance, the volume of a solid of revolution, obtained by rotating a generating curve about an axis, can be computed by integrating the areas of its circular cross-sections, as illustrated by the rotational vase surface below.
library(plotly)
# Define a vase-generating function (2D), which will be rotated to form curved kime manifold
kime.manifold <- data.frame( s=seq(0, 1.3, length.out = 200) )
kime.manifold$theta <- seq(0, 2*pi, length.out = 200)
g <- function (t) { sqrt(t^3-2.09*t^2+1.097*t) }
h <- function (t) { t }
# rendering (u==s, v==theta) parametric surfaces requires x,y,z arguments to be 2D arrays
# In our case, the three coordinates have to be 200*200 parameterized tensors/arrays
x2 <- g(kime.manifold$s) %o% cos(kime.manifold$theta)
y2 <- g(kime.manifold$s) %o% sin(kime.manifold$theta)
z2 <- h(kime.manifold$s) %o% rep(1, 200)
#2D plot data (Vase Profile)
xx2 <- g(kime.manifold$s) %o% cos(kime.manifold$theta)
yy2 <- rep(0,200*200)
zz2 <- h(kime.manifold$s) %o% rep(1, 200)
# calculate the diameter of every circle
xxx2<-as.vector(xx2)
zzz2<-as.vector(zz2)
newdata<-data.frame(zzz2,xxx2)
zzzz2<-as.matrix(zzz2)
#calculate every diameter
a=vector()
for(i in 1:200){
t=max(newdata[which(zzz2==zzzz2[i,]),2])
a=c(a,t)
}
# Define kime-curve parameter
xCurve <- g(kime.manifold$s) * cos(kime.manifold$theta)
yCurve <- g(kime.manifold$s) * sin(kime.manifold$theta)
zCurve <- h(kime.manifold$s)
kimeCurve <- data.frame(xCurve, yCurve, zCurve)
#plot 2D into 3D and make the text of the diameter (time), height (r), and phase (phi)
f <- list(family = "Courier New, monospace", size = 18, color = "black")
x <- list(title = "k1", titlefont = f)
y <- list(title = "k2", titlefont = f)
z <- list(title = "space (x)", titlefont = f)
plot_ly(kimeCurve, x = ~xCurve, y = ~yCurve, z = ~zCurve, type = 'scatter3d',
mode = 'lines', showscale = FALSE, hovertemplate = 'Curve (arc)<extra></extra>',
line = list(width = 22), name="Curve (arc)") %>%
add_trace(x = 0, y = ~g(kime.manifold$s), z = ~kime.manifold$s, type = 'scatter3d',
mode = 'lines', line = list(width = 22), name="(Rotated) Generating Curve: y=sqrt(z^3-2.09*z^2+1.097*z)",
hovertemplate = '(Rotated) Generating Curve<extra></extra>') %>%
add_trace(x = ~xx2, y = ~yy2, z = ~zz2, type = "surface", name="Vase Profile",
hovertemplate = 'Vase Profile<extra></extra>') %>%
add_trace(x=~x2, y=~y2, z=~z2, name="Rotational Vase", hovertemplate = 'Rotational Vase<extra></extra>',
colors = c("blue", "yellow"),type="surface",
opacity=0.3, showscale = FALSE) %>%
# trace the main Z-axis
add_trace(x=0, y=0, z=~zz2[, 1], type="scatter3d", mode="lines",
hovertemplate = 'Z-Axis<extra></extra>',
line = list(width = 10, color="navy blue"), name="Z-axis", hoverinfo="none") %>%
layout(title = "Generating Curves, Arcs, Surfaces & Volumes", legend = list(orientation = 'h'),
scene = list(xaxis = list(title="x"), yaxis = list(title="y"), zaxis = list(title="z")))
Integrating over unbounded intervals, such as from \(0\) to \(\infty\), or using integrand functions that may be undefined at some points, e.g., \(\frac{1}{\sqrt{x}}\), may require the use of improper integrals. The latter can be computed by taking appropriate limits of an integral over an interval that grows towards infinity or towards the singularity points where the integrand function may be undefined.
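As a brief numerical illustration (a sketch using two standard textbook integrals, not tied to a particular application), R’s integrate() accepts infinite limits and can handle integrable endpoint singularities.
integrate(function(x) 1/sqrt(x), lower=0, upper=1)    # integrable singularity at x=0; exact value 2
integrate(function(x) exp(-x), lower=0, upper=Inf)    # unbounded interval; exact value 1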
The definite integral can be considered as a total accumulation. Suppose we are interested in a quantity \(F(t)\). Then \(F'(t) = \frac{dF}{dt}\) represents the rate of change of \(F(t)\), and \(F(t)\) is an antiderivative of \(F'(t)\). Integrating the rate of change of \(F\) over an interval \([a,b]\) gives the total change in \(F\)
\[\underbrace{A_F(a,b)}_{accumulation}=\int^b_a F'(t)dt = F(t)|^b_a = F(b) - F(a).\] Hence, the definite integral \(\int^b_a F'(t)dt\) is the total change in \(F(t)\) over the domain \(a\leq t\leq b\).
Under an unrestricted food supply, the Coho salmon growth rate \(y(x)\), as a function of the dissolved oxygen concentration in water \(x\), can be modeled by \[y(x) = 7.3 (x + 3.5) e^{-0.05x}.\] This relationship between dissolved oxygen concentration and the growth rate of juvenile coho salmon under unrestricted or limited food supply was studied by Fisher (1963) and Warren (1971).
The lower curve is the straight line \[y = 28,\] where \(y\) is the growth rate and \(x\) is the dissolved oxygen concentration; it corresponds to the restricted ration. A simple way to compare the effect of diet (restricted vs. unrestricted ration) on growth is to compare the average growth rates (\(\overline y\)) for the two diets. Since the lower curve has a constant growth rate, we must have \(\overline y_r = 28\).
library(plotly)
x <- seq(from=1, to=25, length.out = 100)
y <- function(x) {
return (7.3*(x + 3.5)*exp(-0.05*x))
}
y_seq <- y(x)
plot_ly(x=~x, y=~y_seq, type="scatter", mode="lines+markers", name="Unrestricted Food") %>%
add_trace(x=~x, y=~rep(28, length(x)), type="scatter", mode="lines+markers", name="Restricted Food") %>%
layout(title="Growth Rate of Coho Salmon", xaxis=list(title="Dissolved oxygen (mg/l)"),
yaxis=list(title="Growth Rate (mg/g/day)"),
legend = list(title=list(text='<b>Models</b>'), orientation = 'h'))
Over the dissolved-oxygen domain \([3,30]\), the restricted diet (the constant lower curve) yields the average salmon growth rate \[\overline y_{r} = \frac{1}{27} \int^{30}_3 28\,dx = \frac{1}{27}[28x]^{30}_3 = \frac{1}{27}[(28)(30)-(28)(3)] = \frac{1}{27}(28)(27)=28.\]
Similarly, for the unrestricted diet (unlimited food supply), the average growth rate is \[\overline y_u = \frac{1}{27} \int^{30}_3 7.3(x + 3.5)e^{-0.05x}dx = \frac{7.3}{27} \int^{30}_3 3.5e^{-0.05x}dx + {\frac{7.3}{27}}\int^{30}_3 xe^{-0.05x}dx=\] \[\underbrace{0.95 \int^{30}_3 e^{-0.05x}dx}_{I} + \underbrace{0.27\int^{30}_3 xe^{-0.05x}dx}_{II}.\]
\[I=0.95 \int^{30}_3 e^{-0.05x}dx = 0.95\left [-\frac{1}{0.05} e^{-0.05x}\right]^{30}_3= 19\left (-e^{-0.05(30)}+e^{-0.05(3)}\right ) \approx 12.07.\]
The second integral, \(II\), can be computed using integration by parts, \(\int u\,dv = uv - \int v\,du\). Let \(u(x) = x\) and let \(dv= e^{-0.05x}dx\) be the differential of \(v(x)=\int dv = -\frac{e^{-0.05x}}{0.05}\). Then,
\[II=0.27 \int^{30}_3 xe^{-0.05x}dx = 0.27 \int^{30}_3 u(x)dv(x)=\] \[0.27\int^{30}_3 xe^{-0.05x}dx = 0.27\left (\left [{\frac{-xe^{-0.05x}}{0.05}}\right ]^{30}_3 - \int^{30}_3 \frac{-e^{-0.05x}}{0.05}dx\right ) = \frac{0.27}{0.05} \bigg(\left [-30e^{-1.5}+3e^{-0.15}\right ]- \left [{\frac {e^{-0.05x}}{0.05}}\right ]^{30}_3\bigg)=\]
\[5.4\left [-30e^{-1.5}+3e^{-0.15} + \frac{e^{-0.15}}{0.05} - \frac{e^{-1.5}}{0.05}\right ] = 46.72.\] Therefore,
\[\overline y_u = I + II = 12.07+46.72=58.79\] and the average growth rate under the unrestricted diet, \(\overline y_u \approx 58.79\), is approximately double the corresponding average growth rate under restricted food, \(\overline y_{r} = 28\), despite the closeness of the growth rates near \(x=3\ mg/l\) on the dissolved oxygen axis.
The following R code confirms the result using numerical integration with the pracma::integral() function. Note that the \(\overline y_u\) results are approximately equal: in the manual estimation above (\(\overline y_u \approx 58.79\)) we rounded intermediate values to two decimal places, whereas the numerical integration in R relies on higher-precision calculations (\(\overline y_u=58.7855362914505\)).
f <- function(x) {7.3*(x + 3.5)*exp(-0.05*x)}
paste0("Integral = ", (1/27)*pracma::integral(fun=f, xmin=3, xmax=30))
## [1] "Integral = 58.7855362914505"
The velocity \(v(t)\) is the rate of change of position \(x(t)\). The distance traveled, the total change in position, is the integral of the velocity
\[\underbrace{D}_{distance}=\int_{t_o}^{t_1}{v(t)dt}\equiv \int_{t_o}^{t_1}{dx}=x(t_1)-x(t_o).\]
Consider a free-falling object with velocity \(v(t) = g \cdot t\), where \(g\) is the constant acceleration due to gravity and \(t\) is time. The distance \(D\) covered after \(T\) seconds is given by
\[D = \int^T_0 v(t)dt = \int^T_0 (gt)dt = g \int^T_0 t \, dt = g\left .\frac{t^2}{2}\right |^T_0 = \frac{1}{2}gT^2.\]
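A one-line numerical check of this formula (a sketch, using \(g=9.8\ m/s^2\) and the arbitrary value \(T=5\) seconds):
g_accel <- 9.8; T_sec <- 5                                        # assumed illustrative values
integrate(function(t) g_accel * t, lower=0, upper=T_sec)$value    # numerical distance: 122.5
0.5 * g_accel * T_sec^2                                           # closed-form (1/2) g T^2: 122.5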
Let’s look at some examples of differentiating (with D() and Deriv::Deriv()), integrating (with integrate() and cubature::adaptIntegrate()), and evaluating (with eval()) some discrete signals and continuous (analytical) functions, e.g., \(f_1(x)=x^3 - 5x\). Remember that in R there are two base differentiation functions: D(), which applies simple univariate differentiation and outputs an expression, and deriv(), which uses D() for multivariate differentiation and outputs either an expression or an executable function.
Let’s compute the first two derivatives of \(f_1(x)=x^3 - 5x\):
\[f_1'(x)=3 x^2 - 5, \ \ f_1''(x)=6 x.\]
library(Deriv)
library(plotly)
f1 <- function (x) {
(x^3 - 5*x)
}
# First derivative - note conversion from "function" type to "expression" type using body()
D_f1 <- D(body(f1)[[2]],'x'); D_f1 # 3 * x^2 - 5
## 3 * x^2 - 5
# Deriv() function returns the result as an expression
D_f1 <- Deriv(f1) # using the more general symbolic differentiation
# Second derivative
D_D_f1 <- Deriv(D_f1); D_D_f1 # second derivative; base D(D(.,'x'),'x') would return 3 * (2 * x), which Deriv() simplifies to 6 * x
## function (x)
## 6 * x
# evaluate f1 and its first 2 derivatives using eval() function.
x<-1:10
paste0("f1(", x,")=", f1(x)) # for expressions use eval(f1)
## [1] "f1(1)=-4" "f1(2)=-2" "f1(3)=12" "f1(4)=44" "f1(5)=100"
## [6] "f1(6)=186" "f1(7)=308" "f1(8)=472" "f1(9)=684" "f1(10)=950"
## [1] "D_f1(1)=-2" "D_f1(2)=7" "D_f1(3)=22" "D_f1(4)=43" "D_f1(5)=70"
## [6] "D_f1(6)=103" "D_f1(7)=142" "D_f1(8)=187" "D_f1(9)=238" "D_f1(10)=295"
## [1] "D_D_f1(1)=6" "D_D_f1(2)=12" "D_D_f1(3)=18" "D_D_f1(4)=24"
## [5] "D_D_f1(5)=30" "D_D_f1(6)=36" "D_D_f1(7)=42" "D_D_f1(8)=48"
## [9] "D_D_f1(9)=54" "D_D_f1(10)=60"
# Plot f1, D(f1), D(D(f1))
x1 <- seq(from=-5, to=8, by=0.1)
plot_ly(x = ~x1, y = ~f1(x1), name="f1(x)", type = "scatter", mode='lines+markers') %>%
add_trace(y = ~D_f1(x1), name="D(f1)(x)", type = "scatter", mode='lines+markers') %>%
add_trace(y = ~D_D_f1(x1), name="D(D(f1))(x)", type = "scatter", mode='lines+markers') %>%
layout(title="Original Function (f1(x)=x^3 - 5x)", font=list(size=12),
xaxis = list(title = "x1"),
yaxis = list(title="Functional Value"), showlegend= T)
# Partial Derivatives f2(x,y)=sin(cos(5*x + y^3))
f2 <- expression(sin(cos(5*x + y^3)))
D_f2_dx <- D(f2, 'x'); D_f2_dx
## -(cos(cos(5 * x + y^3)) * (sin(5 * x + y^3) * 5))
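# (assumed) the corresponding partial derivative with respect to y, shown below
D_f2_dy <- D(f2, 'y'); D_f2_dy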
## -(cos(cos(5 * x + y^3)) * (sin(5 * x + y^3) * (3 * y^2)))
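# (assumed) multivariate differentiation with deriv() returns an expression (output below)
D_f2_xy <- deriv(f2, c("x", "y")); D_f2_xy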
## expression({
## .expr3 <- 5 * x + y^3
## .expr4 <- cos(.expr3)
## .expr6 <- cos(.expr4)
## .expr7 <- sin(.expr3)
## .value <- sin(.expr4)
## .grad <- array(0, c(length(.value), 2L), list(NULL, c("x",
## "y")))
## .grad[, "x"] <- -(.expr6 * (.expr7 * 5))
## .grad[, "y"] <- -(.expr6 * (.expr7 * (3 * y^2)))
## attr(.value, "gradient") <- .grad
## .value
## })
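# (assumed) the deriv() result is an object of class "expression", as shown below
class(D_f2_xy)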
## [1] "expression"
## function (x, y)
## {
## .expr3 <- 5 * x + y^3
## .expr4 <- cos(.expr3)
## .expr6 <- cos(.expr4)
## .expr7 <- sin(.expr3)
## .value <- sin(.expr4)
## .grad <- array(0, c(length(.value), 2L), list(NULL, c("x",
## "y")))
## .grad[, "x"] <- -(.expr6 * (.expr7 * 5))
## .grad[, "y"] <- -(.expr6 * (.expr7 * (3 * y^2)))
## attr(.value, "gradient") <- .grad
## .value
## }
# General n-th order derivative function
GeneralDerivative <- function(expression, argName, order=1){
if(order<1) stop("Derivative Order must be >=1")
if(order==1) D(expression, argName)
else GeneralDerivative(D(expression, argName), argName, order-1) # recursive
}
GeneralDerivative(body(f1)[[2]],"x", 2) # convert the function to expression first
## 3 * (2 * x)
For integration, we can use the integrate() method, specifying a function and its lower and upper limits. To compute higher-order (double, triple, and higher) integrals, we can use the cubature::adaptIntegrate() function.
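# (assumed) numerical integration of f1 over [-5, 8], matching the output below
integrate(f1, lower=-5, upper=8)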
## 770.25 with absolute error < 1.1e-11
# For visualization split f1 = f1+ - f1- parts
# https://en.wikipedia.org/wiki/Positive_and_negative_parts
f1_pos <- x1 # positive part: f1>=0
f1_neg <- x1 # negative part: f1<0
for (i in 1:length(x1)) {
if(f1(x1[i])>=0) f1_pos[i] = f1(x1[i])
else f1_pos[i] = 0
if(f1(x1[i])<0) f1_neg[i] = f1(x1[i])
else f1_neg[i] = 0
}
plot_ly(x = ~x1, y = ~f1_pos, name="f1_pos(x)", type='scatter', mode='lines', fill='tozeroy') %>%
# add_trace(y = ~f1_neg, name="f1_neg(x)", type = "scatter", mode='lines', fill='red') %>%
add_trace(y=~f1_neg, name="f1_neg(x)", connectgaps=0, fillcolor="rgba(0,40,100,0.2)", fill='tonexty') %>%
layout(title=paste0("Integral of (f1=(f1+) - (f1-)); Area=",
round(integrate(f1, lower=-5, upper=8)$value,2)),
font=list(size=12), xaxis = list(title = "x"),
yaxis = list(title="Functional Value"), showlegend= T)
## 770.25 with absolute error < 1.1e-11
# For higher order integrals, we can use cubature::adaptIntegrate() function
library(cubature)
f3 <- function(x) { 4/3 * (x[1] * x[2] + x[3]) }
adaptIntegrate(f3, lowerLimit = c(0, 0, 0), upperLimit = c(1.0, 1.0, 1.0))
## $integral
## [1] 1
##
## $error
## [1] 2.220446e-16
##
## $functionEvaluations
## [1] 33
##
## $returnCode
## [1] 0
# For differentiation and integration of discrete data (rather than analytical functions),
# we can use the diff() and cumsum() functions
diff(f1(x1))
## [1] 6.851 6.557 6.269 5.987 5.711 5.441 5.177 4.919 4.667 4.421
## [11] 4.181 3.947 3.719 3.497 3.281 3.071 2.867 2.669 2.477 2.291
## [21] 2.111 1.937 1.769 1.607 1.451 1.301 1.157 1.019 0.887 0.761
## [31] 0.641 0.527 0.419 0.317 0.221 0.131 0.047 -0.031 -0.103 -0.169
## [41] -0.229 -0.283 -0.331 -0.373 -0.409 -0.439 -0.463 -0.481 -0.493 -0.499
## [51] -0.499 -0.493 -0.481 -0.463 -0.439 -0.409 -0.373 -0.331 -0.283 -0.229
## [61] -0.169 -0.103 -0.031 0.047 0.131 0.221 0.317 0.419 0.527 0.641
## [71] 0.761 0.887 1.019 1.157 1.301 1.451 1.607 1.769 1.937 2.111
## [81] 2.291 2.477 2.669 2.867 3.071 3.281 3.497 3.719 3.947 4.181
## [91] 4.421 4.667 4.919 5.177 5.441 5.711 5.987 6.269 6.557 6.851
## [101] 7.151 7.457 7.769 8.087 8.411 8.741 9.077 9.419 9.767 10.121
## [111] 10.481 10.847 11.219 11.597 11.981 12.371 12.767 13.169 13.577 13.991
## [121] 14.411 14.837 15.269 15.707 16.151 16.601 17.057 17.519 17.987 18.461
# Compare the discrete and continuous integration (for f1(x));
# note: a Riemann-sum approximation of the integral would be sum(f1(x1)) * (x1[2]-x1[1])
round(cumsum(f1(x1))[length(f1(x1))]/(x1[length(x1)]-x1[1]), 2)
## [1] 606.88
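# (assumed) the continuous (numerical) integral of f1 for comparison, shown below
integrate(f1, lower=-5, upper=8)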
## 770.25 with absolute error < 1.1e-11
The first and second derivatives of a function can be used to determine three special types of points in its domain: relative maxima, relative minima, and inflection points on the function graph. For univariate functions, relative maxima and minima appear as peaks/crests or valleys on the curve, respectively. Relative indicates that these extrema may be local, not global, as the curve may rise or fall even further elsewhere on the graph. Inflection points are special points in the domain where the rate of change (e.g., velocity) reaches an extremum.
For instance, the logistic growth model describes population size by:
\[N(t)=N_o\frac{1+b}{a+be^{-kt}}=N_o(1+b)(1+be^{-kt})^{-1},\] where \(k\) is a growth coefficient, \(N_o\) is the initial population size at \(t=0\), and the term \(N_o(1+b)\) represents the carrying capacity of the environment. The growth rate is
\[\frac{dN}{dt}=N_o(1+b)(-1)(1+be^{-kt})^{-2}(be^{-kt})(-k) =N_o(1+b)bke^{-kt}(1+be^{-kt})^{-2},\]
where the real-valued parameters satisfy \(k>0\) and \(b>1\).
At critical points, the tangent line to the function graph will be horizontal with a slope of zero and the maximum growth rate occurs when the derivative of the growth rate equals zero. \[\frac{d}{dt}\left (\frac{dN}{dt}\right )\equiv \frac{d^2N}{dt^2}=0.\] Below we show an example of modeling an observed data \(y=y(x)\), using a spline model (\(f(x)\)), computing the first two derivatives \(f'(x)\) and \(f''(x)\) of the model and displaying the critical and inflection points of the model.
# Observed data
x = seq(1:21)
y = c(7,5,6,5,5,6,7,8,7,7,6,6,7,8,9,7,6,5,5,7,8)
# Spline Model
# loess_model <- loess(y ~ x)
spline_model <- smooth.spline(x = x, y = y)
x_seq <- seq(min(x),max(x), (max(x) - min(x))/1000)
pred <- predict(spline_model, x_seq)
# Define the 3 functions, f(x), D(f(x))=f', and D(D(f(x)))=f''
library(splines)
f <- splinefun(x_seq, pred$y)
f_prime <- f(x_seq, deriv=1)
f_doublePrime <- f(x_seq, deriv=2)
# Estimate the extrema points: max & min
extrem <- ifelse (abs(f_prime)<0.01, TRUE, FALSE)
# Estimate the inflection points: where the rate of change (e.g., velocity) reaches a max
#inflex <- c(FALSE, diff(diff(pred)>0)!=0)
inflex <- ifelse (abs(f_doublePrime)<0.005, TRUE, FALSE)
f_x <- f(x_seq)
plot_ly(x = ~x, y = ~y, name="Data", type='scatter', mode='lines') %>%
add_trace(x=~x_seq, y=~f_x, name="Smooth Model") %>%
add_trace(x=~x_seq[extrem], y=~f_x[extrem], name="Extrema Points",
mode='markers', marker = list(size = 10)) %>%
add_trace(x=~x_seq[inflex], y=~f_x[inflex], name="Inflection Points",
mode='markers', marker = list(size = 10)) %>%
layout(title="Extrema (Min/Max) and Maximum Growth Rate (Inflection) Points",
font=list(size=12), xaxis = list(title = "x"),
yaxis = list(title="Functional Value (y=f(x)"),
legend = list(title=list(text='<b>Models</b>'), orientation = 'h'))
The inflection points correspond to vanishing second derivative values, \(f''(x)=0\), where the graph's curvature changes from concave up to concave down or vice versa. In the population growth example, the maximum growth rate occurs when the population function \(N(t)\) is at its inflection point: the slope is decreasing (leveling off) to the right of the inflection point and increasing to the left of it.
We can also demonstrate the analytics of localizing the inflection points corresponding to maximum growth.
\[0=\frac{d^2N}{dt^2}\ \ \mbox{at } (t_1, N_1=N(t_1)),\] \[0=\frac{d}{dt}\left [N_o(1+b)bke^{-kt}(1+be^{-kt})^{-2}\right ]\ \ \mbox{at }(t_1, N_1),\] \[0=\underbrace{N_o(1+b)bk^2e^{-kt_1}(1+be^{-kt_1})^{-3}}_{>0} \underbrace{(be^{-kt_1}-1)}_{0}.\] Therefore, \[0=be^{-kt_1}-1,\] which yields that \[t_1=\left (\frac{1}{k}\right )\ln(b).\] Substituting \(t=t_1\) in \(N(t)\) gives
\[N_1=N_o(1+b)\left (1+\underbrace{be^{-k(\frac{1}{k}\ln(b))}}_{1}\right )^{-1}=\frac{N_o(1+b)}{2}.\] Hence, the population growth rate is highest when the population size equals one-half of the carrying capacity, \(\frac{N_o(1+b)}{2}\).
Many additional details about function optimization are provided in DPSA Chapter 13.
Representations of numbers in different dimensions capture key concepts in biophysics and biomathematics. Such representations are particularly relevant in image processing and foundational to understanding the spatial behavior of physical phenomena.
The simplest representation is the scalar. Scalars are single numbers (e.g., integers, doubles) and are zero-dimensional, since they are expressed as a single element, e.g.,
\[1, 2.7, -30, 2000, 3.14, e, \pi, \cdots \]
Vectors are 1-dimensional and are often called “lists” in some programming languages. Their dimensions are described as (\(1 \times n\)) or (\(n \times 1\)), where \(n\) can be any positive integer; (\(1 \times 1\)) vectors are often considered scalars. Note that the terms “list”, “vector”, and “1D array” all refer to 1-dimensional representations, although they have different formal definitions depending on the field (e.g. linear algebra vs. computer science vs. physics) and even programming language (e.g. Python vs. C++). The term “vector” is the most common in biophysics and will be the one used in BPAD. Vectors are often represented as vertical column vectors, e.g.,
\[\left[\begin{array}{ccc} 10 \\ 5 \end{array}\right],\] and
\[\left[\begin{array}{ccc} e \\ 25 \\ 77 \end{array}\right].\]
Matrices are 2-dimensional, with dimensions of (\(n \times m\)), where both \(n\) and \(m\) are positive integers. Matrices are analogous to rectangular arrays, or tables, where \(n\) is the number of rows and \(m\) is the number of columns, e.g.,
\[\left[\begin{array}{ccc} 10 & 4 & -1 \\ 5 & 0 & 3 \\ 6 & -200 & 3 \end{array}\right],\] and \[\left[\begin{array}{ccc} 3 & 7 \\ 25 & -9\\ 77 & -20 \end{array}\right].\]
Tensors are multi-dimensional arrays with an order corresponding to their dimension. For example, second order tensor components can correspond to those in a matrix, although they are not exactly the same from a mathematical perspective. Put simply, 2-dimensional physical quantities such as stress in fluid dynamics are represented by tensors while non-physical quantities such as the data that constitute images are typically considered matrices. The following is the Cauchy stress tensor, which completely defines the stress in all three spatial dimensions of a material when deformed.
\[\sigma = \left[\begin{array}{ccc} \sigma_{xx} & \sigma_{xy} & \sigma_{xz} \\ \sigma_{yx} & \sigma_{yy} & \sigma_{yz} \\ \sigma_{zx} & \sigma_{zy} & \sigma_{zz} \end{array}\right].\]
As the most general representation, tensors can be subject to distinct mathematical manipulations that you may have encountered in previous calculus and linear algebra courses. Addition and subtraction are typically allowed for all tensors, provided that the dimensions match. Multiplication has different rules between the different representations, and additional manipulations like the scalar product, vector product, and transposition are possible at higher dimensions. Several examples of these are shown below.
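The objects used in the printed examples below are not shown in the original chunk; the following is a minimal sketch of definitions (the names a, b, v, w, u, m1, m2 are assumptions) that is consistent with the displayed outputs and with the multiplication and inversion code further down.
# (assumed definitions, reconstructed to match the outputs below)
a <- 3; a                                     # scalar
b <- -5; b                                    # scalar
v <- c(3, -5); v                              # vector (1D)
w <- c(6, 70); w                              # vector (1D)
u <- matrix(c(6, 70), nrow=1); u              # 1x2 matrix (row vector)
m1 <- matrix(1:12, nrow=3, byrow=TRUE); m1    # 3x4 matrix
m2 <- matrix(24:35, nrow=4, byrow=TRUE); m2   # 4x3 matrix
## Addition and subtraction (element-wise; dimensions must match)
a + b; v + w; w - v
## Scalar multiplication, matrix addition (with a transpose), and a scalar product
2*m1; 2*m2; m1 + t(m2); a*b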
## [1] 3
## [1] -5
## [1] 3 -5
## [1] 6 70
## [,1] [,2]
## [1,] 6 70
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 5 6 7 8
## [3,] 9 10 11 12
## [,1] [,2] [,3]
## [1,] 24 25 26
## [2,] 27 28 29
## [3,] 30 31 32
## [4,] 33 34 35
## [1] -2
## [1] 9 65
## [1] 3 75
## [,1] [,2] [,3] [,4]
## [1,] 2 4 6 8
## [2,] 10 12 14 16
## [3,] 18 20 22 24
## [,1] [,2] [,3]
## [1,] 48 50 52
## [2,] 54 56 58
## [3,] 60 62 64
## [4,] 66 68 70
## [,1] [,2] [,3] [,4]
## [1,] 25 29 33 37
## [2,] 30 34 38 42
## [3,] 35 39 43 47
## [1] -15
## Vector multiplication (the "outer" dimension of the first factor must match the "inner" dimension of the second)
### as column vectors, v (2x1) and w (2x1) do not conform for matrix multiplication; however, v (2x1) %*% u (1x2) does
v %*% u
## [,1] [,2]
## [1,] 18 210
## [2,] -30 -350
## [,1] [,2] [,3]
## [1,] 300 310 320
## [2,] 756 782 808
## [3,] 1212 1254 1296
##Note: element-wise multiplication in R is accomplished with just the * operator, not %*%.
## Matrix inversion
M <- cbind(v,w)
inverseM <- solve(M)
M
## v w
## [1,] 3 6
## [2,] -5 70
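# (assumed) display the computed inverse, shown below
inverseM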
## [,1] [,2]
## v 0.29166667 -0.0250
## w 0.02083333 0.0125
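# (assumed) verify the inverse; rounding removes floating-point noise so the
# product displays as the 2x2 identity matrix shown below
round(M %*% inverseM, 10)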
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
Later in the course, we will dive more deeply into the application of these mathematical concepts to biophysical imaging and image manipulation. As a preview, the following examples will illustrate the application of vectors, matrices, and tensors in biophysics, particularly in functional magnetic resonance imaging, or fMRI.
Just as in images that you might find on the internet or take on your phone, fMRI images are composed of intensity values (e.g., grayscale or RGB color vectors) stored in matrix form. At a fixed time, each pixel in an fMRI slice captures a single value in 2 dimensions. Through image processing, the slices/matrices captured by fMRI are stacked into 3D tensors, where the 2D picture-element (pixel) generalizes to a 3D volume-element (voxel). For longitudinal scans taken over time, these 3D tensors become 4D (spacetime) tensors containing information in all three spatial dimensions in addition to the 1D time dimension. Therefore, taking different dimensional slices of the resulting 4D tensors yields lower-dimensional objects, such as matrices, vectors, and scalars, just as in the hypothetical examples above. In the 5D spacekime universe, the spacekime analytical representation generalizes the 4D spacetime modeling of data to 2D complex-time (kime), where the longitudinal dynamics incorporate the kime magnitude and kime phase.
Let’s look at an fMRI time-series example.
library(brainR)
library(httr)
library(TCIU)
library(fancycut)
library(RColorBrewer)
httr::set_config(config(ssl_verifypeer=0L))
# 4D fMRI data: https://socr.umich.edu/HTML5/BrainViewer/data/fMRI_FilteredData_4D.nii.gz
fMRIURL <- "https://socr.umich.edu/HTML5/BrainViewer/data/fMRI_FilteredData_4D.nii.gz"
fMRIFile <- file.path(tempdir(), "fMRI_FilteredData_4D.nii.gz")
invisible(GET(fMRIURL, write_disk(fMRIFile, overwrite=TRUE)))
brainVolume <- readNIfTI(fMRIFile, reorient=FALSE)
brainVolDims <- dim(brainVolume); brainVolDims
## [1] 64 64 21 180
# get middle values for each dimension for following plots and images
midX <- brainVolDims[1] / 2; midX
## [1] 32
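# (assumed) analogous midpoints for the remaining dimensions, matching the outputs below
# and used later when extracting the time-series vector and the cross-sectional matrix
midY <- brainVolDims[2] / 2; midY
midZ <- brainVolDims[3] / 2; midZ
midt <- brainVolDims[4] / 2; midt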
## [1] 32
## [1] 10.5
## [1] 90
#Vector of intensity values exactly in the middle (spatially) of the brain throughout time-course
vec <- brainVolume[midX,midY,midZ,]; vec
## [1] 18874 18635 18381 18642 18812 18687 18588 18762 18771 18792 18624 18829
## [13] 18689 18956 18735 18901 18908 18799 18700 18380 18518 18593 18577 18626
## [25] 18760 18819 18711 18670 19064 18466 18676 18969 19057 18558 18583 18664
## [37] 18527 18756 18563 18643 18753 18711 18576 18621 18737 18884 18771 18480
## [49] 18668 18803 18515 18819 18600 18682 18888 19021 18754 18253 18882 19108
## [61] 18881 18610 18797 18750 18439 18650 18510 18691 18464 18469 18729 18790
## [73] 18865 18801 18897 18930 18754 18912 18815 18707 18620 18194 18855 18814
## [85] 18512 18601 18795 18930 18914 18784 18708 18836 18552 18462 18513 18608
## [97] 18658 18674 18860 18911 18934 18940 18834 18581 18649 18836 18534 18539
## [109] 18710 18488 18480 18633 18890 18695 18712 18800 18825 18791 18884 18572
## [121] 18595 18871 18679 18474 18774 18982 18478 18547 18454 18785 18869 18919
## [133] 18910 18705 19004 18846 18768 18988 18702 18337 18709 18607 18496 18293
## [145] 18879 18600 18714 19098 18674 18526 18469 18827 18942 18645 18782 18777
## [157] 18422 18404 18626 18785 18572 18619 18714 18795 18677 18530 18904 18869
## [169] 18676 18813 18810 18876 18736 18551 18870 18729 18727 18629 18521 18719
#Matrix of one cross-sectional image at the midpoint of the Z-axis and midway through the time-course
mat <- brainVolume[,,midZ,midt]; mat[, 1:12]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 0 0 0 0 0 0 0 0 0 0 0 0
## [2,] 0 0 0 0 0 0 0 0 0 0 0 0
## [3,] 0 0 0 0 0 0 0 0 0 0 0 0
## [4,] 0 0 0 0 0 0 0 0 0 0 0 0
## [5,] 0 0 0 0 0 0 0 0 0 0 0 0
## [6,] 0 0 0 0 0 0 0 0 0 0 0 0
## [7,] 0 0 0 0 0 0 0 0 0 0 0 0
## [8,] 0 0 0 0 0 0 0 0 0 0 0 0
## [9,] 0 0 0 0 0 0 0 0 0 0 0 0
## [10,] 0 0 0 0 0 0 0 0 0 0 0 0
## [11,] 0 0 0 0 0 0 0 0 0 0 0 0
## [12,] 0 0 0 0 0 0 0 0 0 0 0 0
## [13,] 0 0 0 0 0 0 0 0 0 0 0 0
## [14,] 0 0 0 0 0 0 0 0 0 0 0 0
## [15,] 0 0 0 0 0 0 0 0 0 0 0 0
## [16,] 0 0 0 0 0 0 0 0 0 0 0 0
## [17,] 0 0 0 0 0 0 0 0 0 0 0 0
## [18,] 0 0 0 0 0 0 0 0 0 0 0 0
## [19,] 0 0 0 0 0 0 0 0 0 0 0 0
## [20,] 0 0 0 0 0 0 0 0 0 0 0 0
## [21,] 0 0 0 0 0 0 0 0 0 0 0 0
## [22,] 0 0 0 0 0 0 0 0 0 0 1861 3227
## [23,] 0 0 0 0 0 0 0 0 0 0 4234 9497
## [24,] 0 0 0 0 0 0 0 0 1744 3852 9967 12776
## [25,] 0 0 0 0 0 0 2198 2036 3577 8732 12040 12729
## [26,] 0 0 0 0 0 1732 1828 0 5249 11616 12995 13322
## [27,] 0 0 0 0 0 1845 0 2374 9009 14269 14210 15067
## [28,] 0 0 0 0 0 0 2119 6032 13940 16228 14855 13189
## [29,] 0 0 0 0 0 0 2652 9992 16152 16797 14756 12982
## [30,] 0 0 0 0 0 0 3852 13319 16937 14628 13191 13261
## [31,] 0 0 0 0 0 0 5908 15906 15658 15160 14990 13437
## [32,] 0 0 0 0 0 0 4426 15658 16646 16125 14607 14197
## [33,] 0 0 0 0 0 0 2686 10113 12572 15426 15077 15449
## [34,] 0 0 0 0 0 0 5614 6672 7234 13993 14113 15289
## [35,] 0 0 0 0 0 2054 5234 6263 13252 15275 14218 15679
## [36,] 0 0 0 0 0 0 2251 8873 14475 14115 13649 14445
## [37,] 0 0 0 0 0 0 0 6686 12000 13804 14000 13912
## [38,] 0 0 0 0 0 0 0 2677 8574 14289 13635 12251
## [39,] 0 0 0 0 0 0 0 2639 6630 13346 12643 13265
## [40,] 0 0 0 0 0 0 0 2059 7220 12913 13494 14259
## [41,] 0 0 0 0 0 0 0 0 4060 9161 12655 14042
## [42,] 0 0 0 0 0 0 0 0 0 3918 11169 14145
## [43,] 0 0 0 0 0 0 0 0 0 2094 7743 14148
## [44,] 0 0 0 0 0 0 0 0 0 1741 2363 7648
## [45,] 0 0 0 0 0 0 0 0 0 0 0 2167
## [46,] 0 0 0 0 0 0 0 0 0 0 0 0
## [47,] 0 0 0 0 0 0 0 0 0 0 0 0
## [48,] 0 0 0 0 0 0 0 0 0 0 0 0
## [49,] 0 0 0 0 0 0 0 0 0 0 0 0
## [50,] 0 0 0 0 0 0 0 0 0 0 0 0
## [51,] 0 0 0 0 0 0 0 0 0 0 0 0
## [52,] 0 0 0 0 0 0 0 0 0 0 0 0
## [53,] 0 0 0 0 0 0 0 0 0 0 0 0
## [54,] 0 0 0 0 0 0 0 0 0 0 0 0
## [55,] 0 0 0 0 0 0 0 0 0 0 0 0
## [56,] 0 0 0 0 0 0 0 0 0 0 0 0
## [57,] 0 0 0 0 0 0 0 0 0 0 0 0
## [58,] 0 0 0 0 0 0 0 0 0 0 0 0
## [59,] 0 0 0 0 0 0 0 0 0 0 0 0
## [60,] 0 0 0 0 0 0 0 0 0 0 0 0
## [61,] 0 0 0 0 0 0 0 0 0 0 0 0
## [62,] 0 0 0 0 0 0 0 0 0 0 0 0
## [63,] 0 0 0 0 0 0 0 0 0 0 0 0
## [64,] 0 0 0 0 0 0 0 0 0 0 0 0
#Tensor Visualization: TCIU
fMRIData <- brainVolume[,,,]
load("mask.RData")
#"intensities" are calculated and represent p-values (although the actual value is not important in this example, it is more of a usability/normalization operation)
pval = fmri_stimulus_detect(fmridata=fMRIData, mask = mask, stimulus_idx = c(1:160)[rep(c(TRUE,FALSE), c(10,10))],
method = "HRF",
ons = c(1, 21, 41, 61, 81, 101, 121, 141),
dur = c(10, 10, 10, 10, 10, 10, 10, 10))
# process the p-value data to add the (motor-area) activation points to the mesh brain plot;
# the plot is constructed from 1 - p values
one_minus_p_3d = (1- pval)
dim_pval = dim(one_minus_p_3d)
# convert 3d array 1 - p value to a dataframe to be the input for plotly
pval_df =
data.frame(x = rep(c(1:dim_pval[1]), dim_pval[2]*dim_pval[3]),
y = rep(rep(c(1:dim_pval[2]), each=dim_pval[1]), dim_pval[3]),
z = rep(c(1:dim_pval[3]), each=dim_pval[1]*dim_pval[2]))
names(pval_df) = c("x", "y", "z")
pval_df$one_minus_p_3dn = as.vector(one_minus_p_3d)
pval_df$p_val = 1 - pval_df$one_minus_p_3dn
multi_pranges = TRUE
p_threshold <- 0.05
color_pal = "YlOrRd"
pval_df = dplyr::filter(pval_df, pval_df$p_val <= p_threshold)
pval_df$cut_invs = wafflecut(pval_df$p_val,
c('[0,1e-8]', '(1e-8, 1e-7]', '(1e-7, 1e-6]', '(1e-6, 1e-5]',
'(1e-5, 1e-4]', '(1e-4, 1e-3]', '(1e-3, 1e-2]', '(1e-2, 5e-2]'))
pval_df$colorgrp = as.numeric(pval_df$cut_invs)
pval_df$cut_invs = wafflecut(pval_df$p_val,
c('[0,1e-7]', '(1e-7, 1e-5]',
'(1e-5, 1e-3]', '(1e-3, 5e-2]'))
label_p = levels(unique(pval_df$cut_invs))
#plot the image
p = plot_ly()
color_choice = rev(brewer.pal(9, color_pal))
pval_df$corresp_color = color_choice[pval_df$colorgrp]
for (i in sort(unique(pval_df$colorgrp))){
pts_grp = pval_df[pval_df$colorgrp == i, ]
p = add_markers(p, data=pts_grp,
x=~x, y=~y, z=~z, mode="markers",  # map coordinates from pts_grp (the current p-value group)
marker = list(opacity=0.6, symbol="circle",
size=rev(c(1:length(color_choice)))[i]+2*as.numeric(I(multi_pranges==FALSE)),
color=~corresp_color,
line = list(color = ~corresp_color,
width = 0.3)
)
)
}
p = layout(p, showlegend=FALSE,
title = "Tensor Visualization",
scene = list(
xaxis=list(title="x"),
yaxis=list(title="y", scaleanchor="x", scaleratio=1),
zaxis=list(title="z", scaleanchor="x", scaleratio=1),
aspectratio = list(x=1, y=1, z=1.0)
)
)
p
Displacement (often denoted by \(\Delta x\)) is a change in spatial position. Velocity (\(v\)) is the rate of change of the displacement in time, and acceleration (\(a\)) is the rate of change of the velocity in time. All three are vector quantities, since they must encode both a magnitude and a direction. For this reason, complementary terms are typically used for their magnitudes: distance (for displacement) and speed (for velocity); acceleration does not have a parallel term. The three concepts are connected through differentiation: \(v(t)=\frac{dx}{dt}\) and \(a(t)=\frac{dv}{dt}=\frac{d^2x}{dt^2}\).
Depending on the system, alternative projectile motion models may be used to describe the particle trajectory with specific well-defined functions. For instance, in the y-direction, the projectile displacement may be modeled as a function of time:
\[y=f(t)=y_o + v_o t - \frac{1}{2} 9.8 t^2.\] The corresponding projectile velocity (\(v(t)\)) and projectile acceleration (\(a(t)\)) functions are:
\[v=f'(t)=v_o - 9.8 t,\\ a=f''(t)=- 9.8.\]
library(Deriv)
library(plotly)
y0 <- 2 #starting position on y-axis
v0 <- 50 #starting velocity in the y-axis
displacement <- function (t) {
(y0 + v0 * t - 0.5 * 9.8 * t^2)
}
displacement
## function (t) {
## (y0 + v0 * t - 0.5 * 9.8 * t^2)
## }
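# (assumed) symbolic velocity via Deriv(), matching the output below and used in the plot
velocity <- Deriv(displacement); velocity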
## function (t)
## v0 - 9.8 * t
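# (assumed) symbolic acceleration as the derivative of velocity, shown below
acceleration <- Deriv(velocity); acceleration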
## function (t)
## -9.8
# Plot f1, D(f1), D(D(f1))
t <- seq(from=0, to=10, by=0.1)
plot_ly(x = ~t, y = ~displacement(t), name="Displacement (Delta x)", type = "scatter", mode='lines+markers') %>%
add_trace(y = ~velocity(t), name="Velocity (v)", type = "scatter", mode='lines+markers') %>%
add_trace(y = ~acceleration(t), name="Acceleration (a=-9.8)", type = "scatter", mode='lines+markers') %>%
layout(title="Projectile Motion in the y-direction", font=list(size=12),
xaxis = list(title = "time"), yaxis = list(title="Functional Value"),
legend = list(title=list(text='<b>Functions</b>'), orientation = 'h'), showlegend= T)
Polynomials are functions that consist of constants, variables, and weight coefficients, where all variables can only take non-negative integer exponents. Here are a couple of examples of univariate and multivariate polynomial functions: \[a(x) = 5x + 5x^2 + 5x^5,\] \[b(x,y) = x^2 + xy^3 - 9y^2 + 3.\] Exponential functions are those that contain a variable in the exponent. The sign of the exponent determines whether the function is increasing (unbounded) or decreasing (asymptotically converging to a value). The following are two exponential function examples: \[c(x) = -70,000e^{4x} + 10^{19},\] \[d(x) = 9e^{-8x + 2}.\] The inverses of exponential functions are called logarithms and are characterized by their base, e.g., \(\log()\equiv\log_{10}()\) and \(\ln()\equiv \log_e()\). Recall the change of base property for the logarithmic function:
\[\log_{\underbrace{b}_{\text{new base}}}(x)=\frac{\log_a(x)}{\log_{\underbrace{a}_{\text{old base}}}(b)}.\]
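A quick numerical check of the change-of-base property (with arbitrary illustrative values \(x=8\), old base \(a=2\), new base \(b=10\)):
log(8, base=10)                      # log base 10 of 8, ~0.90309
log(8, base=2) / log(10, base=2)     # the same value via the change-of-base property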
The following are a few logarithmic function examples: \[f(x) = 3\log(2x),\] \[g(x) = -2\ln\left (\frac{x}{3}\right).\] The following examples show plots of polynomial and exponential functions. As seen in the last comparison graph, exponential functions are the fastest growing, followed by high-degree polynomials, and finally by logarithmic functions. The relative rate of change (speed of increase or decrease of the function) is key in analyzing the behavior of biophysical phenomena that can be modeled by such functions at short and long time scales.
# Polynomials
exA <- function (x) {
5 * x + 5 * x^2 + x^5
}
linear <- function (x) {
1000 * x + 2
}
quadratic <- function (x) {
100 * x^2 + 3 * x - 2
}
cubic <- function (x) {
-60 * x^3 + 2 * x^2 - 4
}
x <- seq(from=-7, to=7, by=0.1)
plot_ly(x = ~x, y = ~exA(x), name="Example a (5th order poly)", type = "scatter", mode='lines+markers') %>%
add_trace(y = ~linear(x), name="Linear Example", type = "scatter", mode='lines+markers') %>%
add_trace(y = ~quadratic(x), name="Quadratic Example", type = "scatter", mode='lines+markers') %>%
add_trace(y = ~cubic(x), name="Cubic Example", type = "scatter", mode='lines+markers') %>%
layout(title="Polynomial Examples", font=list(size=12),
xaxis = list(title = "x"), yaxis = list(title="f(x)"), showlegend= T,
legend = list(title=list(text='<b>Functions</b>'), orientation = 'h'))
The next two graphs show the growth and decay of exponential functions (top) and logarithmic functions (bottom).
# Exponential Functions
x <- seq(from=-1, to=5, by=0.1)
exC <- function (x) {
9 * exp(2*x)
}
exD <- function (x) {
9 * exp(-8*x + 2)
}
plot_ly(x = ~x, y = ~exC(x), name="Example c (Exp Growth)", type = "scatter", mode='lines+markers') %>%
add_trace(y = ~exD(x), name="Example d (Exp Decay)", type = "scatter", mode='lines+markers') %>%
layout(title="Exponential Examples", font=list(size=12),
xaxis = list(title = "x"), yaxis = list(title="f(x)"), showlegend= T,
legend = list(title=list(text='<b>Functions</b>'), orientation = 'h'))
# Logarithmic Functions
x <- seq(from=-5, to=5, by=0.1)
exF <- function (x) {
3 * log(2 * x,base=10)
}
exG <- function (x) {
-2 * log(x / 3)
}
posx <- seq(from=0, to=5, by=0.1)
plot_ly(x = ~posx, y = ~exF(posx), name="Example f (Log growth)", type = "scatter", mode='lines+markers') %>%
add_trace(y = ~exG(posx), name="Example g (Natural log decay)", type = "scatter", mode='lines+markers') %>%
layout(title="Logarithmic Examples", font=list(size=12),
xaxis = list(title = "x"), yaxis = list(title="f(x)"), showlegend= T,
legend = list(title=list(text='<b>Functions</b>'), orientation = 'h'))
# Comparison
basicLinear <- function (x) {
x
}
basicQuad <- function (x) {
x^2
}
basicCubic <- function (x) {
x^3
}
basicPosExp <- function (x) {
exp(x)
}
basicLn <- function (x) {
log(x)
}
x2 <- seq(from=-7, to=7, by=0.1)
posx2 <- c(rep(c(0), each=length(x2)/2), seq(from=0, to=7, by=0.1))
plot_ly(x = ~x2, y = ~basicLinear(x2), name="Linear", type = "scatter", mode='lines+markers') %>%
add_trace(y = ~basicQuad(x2), name="Quadratic", type = "scatter", mode='lines+markers') %>%
add_trace(y = ~basicCubic(x2), name="Cubic", type = "scatter", mode='lines+markers') %>%
add_trace(y = ~basicPosExp(x2), name="Positive Exponential", type = "scatter", mode='lines+markers') %>%
add_trace(y = ~basicLn(posx2), name="Natural Logarithmic", type = "scatter", mode='lines+markers') %>%
layout(title="Comparison Examples", font=list(size=12),
xaxis = list(title = "x"), yaxis = list(title="f(x)"), showlegend= T,
legend = list(title=list(text='<b>Functions</b>'), orientation = 'h'))
In mathematics and physics, Taylor’s Series are often used to approximate more complex functions, particularly when exact expressions are difficult to derive or evaluate. In many cases, it is enough to use only the first few terms in the series.
In general, an infinitely differentiable function \(f:\mathbb{C}\longrightarrow \mathbb{C}\) can be expanded as a power series (Taylor series) around each \(a\in \mathbb{C}\):
\[f(x) = f(a)+{\frac {f'(a)}{1!}}(x-a)+{\frac {f''(a)}{2!}}(x-a)^{2}+{\frac {f'''(a)}{3!}}(x-a)^{3}+\cdots +{\frac {f^{(k)}(a)}{k!}}(x-a)^{k}+\cdots\\ \equiv \sum_{i=0}^{\infty}{{\frac {f^{(i)}(a)}{i!}}(x-a)^{i}},\]
where the factorial symbol \(k!=\prod_{i=1}^k {i}\) and \(f^{(k)}(a)\) is the \(k\)-th derivative of \(f\) at the point \(a\). A Taylor series expansion around the origin (\(a = 0\)) is called a Maclaurin series.
Many important functions like \(\sin(x)\) and \(e^x\) have well-defined closed-form Taylor’s Series expansions.
\[e^x = \sum^{\infty}_{n=0}{\frac{x^n}{n!}} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!}+ \cdots + \frac{x^k}{k!}+ \cdots\ ,\] \[\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!}+ \cdots + (-1)^k\frac{x^{2k+1}}{(2k+1)!}+ \cdots\ .\]
The following code helps visualize various Taylor’s Series approximations of \(\sin(x)\). In this demonstration, all expansions approximate \(\sin(x)\) around \(x = 0\), so they are most accurate near the origin. As more terms are added, the approximations become more accurate farther away from \(x = 0\).
sine <- function (x) {
sin(x)
}
linearSin <- function (x) {
x
}
cubicSin <- function (x) {
x - (x^3 / (3 * 2 * 1))
}
fifthSin <- function (x) {
x - (x^3 / (3 * 2 * 1)) + (x^5 / (5 * 4 * 3 * 2 * 1))
}
ninethSin <- function (x) {
x - (x^3 / (3 * 2 * 1)) + (x^5 / (5 * 4 * 3 * 2 * 1)) -
(x^7 / (7*6*5*4*3*2*1)) + (x^9 / (9*8*7*6*5*4*3*2*1))
}
x <- seq(from=-5, to=5, by=0.1)
plot_ly(x = ~x, y = ~sine(x), name="Full Sine Function", type = "scatter", mode='lines+markers') %>%
add_trace(y = ~linearSin(x), name="Linear Sine Approximation", type = "scatter", mode='lines+markers') %>%
add_trace(y = ~cubicSin(x), name="Cubic Sine Approximation", type = "scatter", mode='lines+markers') %>%
add_trace(y = ~fifthSin(x), name="5th Order Sine Approximation", type = "scatter", mode='lines+markers') %>%
add_trace(y = ~ninethSin(x), name="9th Order Sine Approximation", type = "scatter", mode='lines+markers') %>%
layout(title="Taylor's Series Approximations of sin(x)", font=list(size=12),
xaxis = list(title = "x"), yaxis = list(title="f(x)"), showlegend= T,
legend = list(title=list(text='<b>Functions</b>'), orientation = 'h'))
Complex numbers include both a real and an imaginary component. Among the numerous applications of complex numbers in biophysics, Euler’s formula is an important representation of complex numbers of unit modulus, which can be visualized via the Argand diagram.
\[e^{i \theta } =\cos(\theta) + i \sin(\theta)\]
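Since R supports complex arithmetic natively, Euler’s formula can be verified numerically; here is a minimal sketch with the arbitrary angle \(\theta=\pi/3\).
theta <- pi/3                                        # arbitrary illustrative angle
exp(1i * theta)                                      # 0.5 + 0.8660254i
complex(real = cos(theta), imaginary = sin(theta))   # the same complex number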
labl <- c(TeX("\\theta"), TeX("|z|"), TeX("z=x+iy"), TeX("x"), TeX("y"), "y")
x <- c(0.2, 0.25, 0.75, 1.05, 0.1, (-0.1+sqrt(2)/2))
y <- c(0.1, 0.5, 0.8, 0.2, 1.1, sqrt(2)/4)
plot_ly() %>%
add_paths(x = c(0, sqrt(2)/2), y = c(0, sqrt(2)/2), name="radius") %>%
add_paths(x = c(sqrt(2)/2, sqrt(2)/2), y = c(0,sqrt(2)/2), name="y") %>%
add_markers(x=sqrt(2)/2, y=sqrt(2)/2) %>%
add_text(x = ~x, y = ~y, type = 'scatter', mode = 'text',
text = ~labl, textposition = 'middle right',
textfont = list(color = 'black', size = 16)) %>%
layout(showlegend=FALSE, shapes = list(list(type = 'circle', title="circle",
xref= 'x', x0 = -1, x1 = 1, yref= 'y', y0 = -1, y1 = 1,
line = list(color='green', size=4), opacity = 0.7))) %>%
config(mathjax = 'cdn')
In essence, we can use a two-dimensional graph, where the x-axis contains the real part and the y-axis the imaginary part, to represent each complex number as a vector in the complex plane. Vector addition and subtraction apply, as do the relationships governing conversions between Cartesian and polar representations of complex numbers.
Here is a visualization of the following two complex numbers: \[z_1 = 1 + 4i\] \[z_2 = 2 - 7i\]
#For visualization purposes only; does not represent the complex number Argand diagrams directly!
compl1 <- data.frame(x=1, y = 4) # z1 = 1 + 4i
compl2 <- data.frame(x=2, y = -7) # z2 = 2 - 7i
plot_ly(type = "scatter", mode='lines+markers') %>%
add_annotations(ax = 0, ay = 0, axref='x', ayref='y', x = compl1$x, y=compl1$y, xref='x', yref='y', text="") %>%
add_annotations(ax = 0, ay = 0, axref='x', ayref='y', x = compl2$x, y=compl2$y, xref='x', yref='y', text="") %>%
layout(title = "Argand Diagram Representation of Complex Numbers: z1 and z2", font=list(size=12),
xaxis = list(title="Re(z)"),
yaxis = list(title="Im(z)"))
And to visualize the vector addition of two complex numbers: \[z_3 = z_1 + z_2 = 3 - 3i\]
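Before plotting, note that R’s native complex type can confirm this sum directly (a brief check, separate from the visualization code below).
z1 <- 1 + 4i
z2 <- 2 - 7i
z1 + z2    # 3-3i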
#For visualization purposes only; does not represent the complex number Argand diagrams directly!
labl <- c(TeX("z_1"), TeX("z_2"), TeX("z_3=z_1 + z_2"))
x <- c(0.3, 1.6, 0.5)
y <- c(3, -1, -4)
compl1 <- data.frame(x=1, y = 4) # z1 = 1 + 4i
compl2 <- data.frame(x=2, y = -7) # z2 = 2 - 7i
plot_ly(type = "scatter", mode='lines+markers') %>%
add_annotations(ax = 0, ay = 0, axref='x', ayref='y', x = compl1$x, y=compl1$y,
xref='x', yref='y', text="") %>%
add_annotations(ax = 0, ay = 0, axref='x', ayref='y', x = compl2$x, y=compl2$y,
xref='x', yref='y', text="") %>%
add_annotations(ax = 1, ay = 4, axref='x', ayref='y', x = compl2$x, y=compl2$y,
xref='x', yref='y', text="") %>%
add_text(x = ~x, y = ~y, type = 'scatter', mode = 'text',
text = ~labl, textposition = 'middle right',
textfont = list(color = '#000000', size = 16)) %>%
layout(title = "Argand Diagram Representation of Complex Numbers: z1 + z2 = z3", showlegend = FALSE,
font=list(size=12), xaxis = list(title="Re(z)"), yaxis = list(title="Im(z)")) %>%
config(mathjax = 'cdn')
The Fourier transform allows us to decompose (analyze, FT) spatiotemporal data into periodic frequency components (Fourier k-space) and, vice versa, to synthesize k-space observations back into spacetime (IFT).
Let’s simulate some periodic data over time \(1, 2, \cdots , n\) using the \(\sin(\cdot)\) function.
library(plotly)
n <- 100
t <- seq(0,n)
y1 <- sin(2*pi*t/n)
y5 <- sin(2*pi*5*t/n)
plot_ly(x=t, y=y1, type="scatter", mode="markers+lines", name="Sine (period: 1)") %>%
add_trace(x=t, y=y5, name="Sine (period: 5)") %>%
add_trace(x=t, y=(y1+y5), name="Sine (1) + Sin (5)") %>%
layout(title="Sine Functions with Different Periods",
xaxis=list(title="time",
tickvals=c(0, 3*n/(2*pi), 3*n/(pi)),
automargin = T, #seq(0, L, length=L+2),
ticktext = c("0","pi", "2pi")),
yaxis=list(title="Intensity"), legend=list(orientation='h', y=-0.5))
\[\sin\left (2\pi \omega \frac{t}{n}\right ) \text{ is periodic, with } period\ \frac{n}{\omega}\ \text{and } frequency\ \frac{\omega}{n}\\ \sin(-\theta) = -\sin(\theta)\\ \sin(0) = 0\\ \sin(\pi/2) = 1\\ \sin(\theta+\pi) = -\sin{\theta}\ .\]
Finite sums of sine functions are always periodic. Sine waves of the form \(\sin(2\pi \omega t/n)\) can be used to model data that pass through the origin at \(t=0\) and oscillate between peak and trough.
\[\cos\left (2\pi \omega \frac{t}{n}\right ) \text{ is periodic, with period}\ \frac{n}{\omega}\ \text{ and frequency } \frac{\omega}{n} \\ \cos(-\theta) = \cos{\theta}\\ \cos(0) = 1\\ \cos(\pi/2) = 0\\ \cos(\theta+\pi) = -\cos{\theta}\\ \sin^2(\theta) + \cos^2(\theta)=1\\ \sin(\theta+\phi) = \sin(\theta)\cos(\phi) + \sin(\phi)\cos(\theta)\\ \sin(\theta+\pi/2) = \cos(\theta)\\ \cos(\theta+\phi) = \cos(\theta)\cos(\phi) - \sin(\theta)\sin(\phi)\\ \cos(\theta-\pi/2) = \sin(\theta)\ .\]
Effectively, we only need sine functions, since a cosine is just a phase-shifted sine, \(\cos(\theta)=\sin(\theta+\pi/2)\).
For any integer \(\omega\),
\[\sum_{t=0}^{n-1}{\sin\left (2\pi\omega \frac{t}{n}\right )} = 0\ ,\] since pairs of time points cancel each other out in the overall sum
\[\sin\left (2\pi\omega \frac{t}{n}\right ) = -\sin\left (2\pi\omega \frac{t+n/2\omega}{n}\right ) .\]
\[\sum_{t=0}^{n-1}{\cos\left (2\pi\omega\frac{t}{n}\right )} = \begin{cases} 0\ , & \omega \neq 0 \\ n\ , & \omega = 0 \end{cases}\ .\]
Sines of different frequencies, \(\omega_1\not= \omega_2\), are orthogonal; that is, the inner product of two sine waves of different (integer) frequencies is \(0\)
\[\sum_{t=1}^{n}{\sin{\left(2\pi\omega_1 \frac{t}{n}\right)} \sin{\left(2\pi\omega_2 \frac{t}{n}\right)}}=\\ \frac{1}{2}\sum_{t=1}^{n}{\left [\cos{\left(2\pi(\omega_1-\omega_2) \frac{t}{n}\right)} - \cos{\left(2\pi(\omega_1+\omega_2)\frac{t}{n}\right)}\right ]}=\\ \begin{cases} n/2\ , & \omega_1 = \omega_2 \neq 0 \\ -n/2\ , & \omega_1= -\omega_2 \neq 0 \\ 0\ , & \text{otherwise} \end{cases}\ .\]
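As a quick numerical sanity check of these identities (a minimal sketch using only base R functions):
# Numerical check of the sine identities over one full period (integer frequencies)
n <- 100; t <- 0:(n-1)
sum(sin(2*pi*3*t/n) * sin(2*pi*5*t/n))   # different frequencies: ~0 (orthogonal)
sum(sin(2*pi*3*t/n)^2)                   # same nonzero frequency: ~n/2 = 50
sum(sin(2*pi*4*t/n))                     # a single sine sums to ~0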
In general, it is difficult to resolve either very high or very low frequency sine waves.
Orthogonal and normalized vectors yield an (orthonormal) basis representation. Recall that a set of vectors \(\vec{v}_1, \vec{v}_2, \cdots ,\vec{v}_n\) in a vector space \(V\) over a scalar field \(\mathbb{F}\) form a basis when any vector \(\vec{x}\in V\) can be expressed as
\[\vec{x} = \sum_{i=1}^{n}{c_i \vec{v}_i} , \ \ \{c_i\}\in \mathbb{F}\ .\] In an \(n\)-dimensional vector space, any set of \(n\) mutually orthogonal nonzero vectors forms a basis. Hence, we can use the sines and cosines as a basis for expressing many other functions, \(x(t)\), e.g., polynomials.
To express \(x(t)\) in terms of the trigonometric (sine and cosine) basis:
\[x(t) = \sum_{\omega=0}^{n/2-1}{\left [a_{\omega}\sin{\left(2\pi\omega \frac{t}{n}\right)} + b_{\omega}\cos{\left(2\pi\omega\frac{t}{n}\right)}\right ]}\ .\]
Note that the sum only runs up to \(\omega = n/2-1\), with \(b_0 = \overline{x}\) (the signal mean) and \(a_0=0\). Using the trigonometric identities, we can eliminate the cosine basis waves and use only sine waves (or vice-versa).
\[x(t) = \sum_{\omega=0}^{n/2-1}{\alpha_{\omega} \sin{\left(2\pi\omega\frac{t}{n} + \phi_{\omega}\right)}}\ ,\] where the coefficients of the sine basis are \[\alpha_{\omega} = \sqrt{a_{\omega}^2+b_{\omega}^2}\ .\]
The Euler identity, \(e^{i \theta} = \cos(\theta) + i \sin(\theta)\), also allows expressing signal \(x(t)\) in terms of complex exponentials. This further facilitates the replacement of the \(\sin\) basis with an exponential basis
\[x(t) = \sum_{\omega=0}^{n-1}{c_{\omega} e^{(2\ \pi\ i\ \omega\ t/n)}}\ .\]
The Fourier transform of a spacetime signal \(x(t)\), \(t=0, \cdots , (n-1)\), is a complex-valued frequency-space signal
\[\hat{x}(\omega) \equiv \sum_{t=0}^{n-1}{x(t) e^{-2\ \pi\ i\ t\ \omega/n}} \] where \(\hat{x}(\omega)= r e^{i\phi}\in\mathbb{C}\) with a real amplitude \(r\in\mathbb{R}^+\) and real phase \(\phi\in\mathbb{R}\).
The inverse Fourier transform recovers the spacetime signal \(x(t)\) from a frequency-space signal \(\hat{x}(\omega)\)
\[x(t) = \frac{1}{n}\sum_{\omega=0}^{n-1}{\hat{x}(\omega) e^{2\pi\ i\ t\ \omega/n}}\ .\]
Let’s first start with the forward Fourier transform (FFT).
n <- 100
t <- seq(0,n)
y <- 1.5*sin(2*pi*t/n) + 0.5*sin(2*pi*5*t/n)
plot_ly(x=t, y=y, type="scatter", mode="markers+lines", name="Sine (period: 1)") %>%
layout(title="Spacetime signal \n x(t)=1.5*sin(2*pi*t/n) + 0.5*sin(2*pi*5*t/n)",
xaxis=list(title="time",
tickvals=c(0, 3*n/(2*pi), 3*n/(pi)),
automargin = T, #seq(0, L, length=L+2),
ticktext = c("0","pi", "2pi")),
yaxis=list(title="Intensity"), legend=list(orientation='h', y=-0.2))
FT_y <- fft(y)
magnitude <- Mod(FT_y) # sqrt(Re(FT_y)*Re(FT_y) + Im(FT_y)*Im(FT_y))
phase <- Arg(FT_y) # atan(Im(FT_y)/Re(FT_y))
plot_ly(x=Re(FT_y), y=Im(FT_y), type="scatter", mode="markers", name="FT") %>%
layout(title="(Complex-Valued) Discrete Fourier Transform of hat{x}(omega)",
xaxis=list(title="Re(FT(x))"), yaxis=list(title="Im(FT(x))"),
legend=list(orientation='h', y=-0.2))
plot_ly(x=t, y=Im(FT_y), type="scatter", mode="markers+lines", name="Im(FT(x))") %>%
add_trace(x=t, y=Re(FT_y), type="scatter", name="Re(FT(x))") %>%
add_trace(x=t, y=magnitude, type="scatter", name="|FT(x)|") %>%
add_trace(x=t, y=phase, type="scatter", name="Phase(FT(x))") %>%
layout(title="Magnitude, Phase, Real and Imaginary Parts of the Discrete Fourier Transform",
xaxis=list(title="frequency"), yaxis=list(title="Intensity"),
legend=list(orientation='h', y=-0.2))
Next we can use the inverse Fourier transform (IFT) to recover (synthesize) the signal back in spacetime, using its FT frequency-space representation. We need to recognize that the numerically computed IFT is complex-valued (with negligible imaginary part here), so we plot and compare just its real part, \(Re(IFT(\hat{x}))\), to the original signal \(x(t)\). To show the exact match between the original signal and its reconstruction \(IFT(FT(x))\), the original signal \(x(t)\) is plotted as a smooth curve, whereas the synthesized signal is plotted as a point scatter.
IFT_FT_x <- Re(fft(FT_y, inverse=TRUE)/length(t))
plot_ly(x=t, y=y, type="scatter", mode="lines", name="x(t)") %>%
add_trace(x=t, y=IFT_FT_x, mode="markers", name="Re(IFT(FT(x)))") %>%
layout(title="Comparing the Original Signal to its Discrete IFT Reconstruction",
xaxis=list(title="time"), yaxis=list(title="Intensity"),
legend=list(orientation='h', y=-0.2))
The Fourier transform has many applications because its spectral decomposition explicates periodic frequency patterns in data and it facilitates solving differential and integral equations; see this article, DOI: 10.1016/j.padiff.2022.100280, for examples.
For instance,
\[\frac{d}{dt}\left( c_{\omega} e^{2\pi i \omega t/n}\right) = \frac{2\pi i \omega}{n} c_{\omega} e^{2\pi i \omega t/n}\ ,\]
which implies that
\[\widehat{x'}(\omega) = \frac{2\pi i \omega}{n}\ \hat{x}(\omega), \] i.e., differentiation in spacetime corresponds to multiplication by \(\frac{2\pi i \omega}{n}\) in frequency space; this and various other properties deriving from the FT linearity facilitate scientific computing. For example, a time shift, \(y(t)=x(t-t_0)\), corresponds to a pure phase modulation in frequency space
\[\hat{y}(\omega) = \hat{x}(\omega) e^{-2\pi i t_0 \omega/n}\ .\]
\[\underbrace{(x * h)}_{convolution}(t) \equiv \sum_{s=0}^{n-1}{x(s) h(t-s)}\ .\] \[\underbrace{\widehat{(x * h)}}_{FT(convolution)}(\omega) = \underbrace{\hat{x}(\omega) \cdot \hat{h}(\omega)}_{multiplication}\ .\]
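A minimal numerical sketch of this (circular) convolution theorem, using R's fft(); the signal and kernel below are arbitrary illustrative choices:
# Verify numerically: FT(x * h) = FT(x) . FT(h) for circular convolution
n <- 64; t <- 0:(n-1)
x <- sin(2*pi*3*t/n) + 0.2*rnorm(n)                  # an arbitrary test signal
h <- exp(-t/5)                                       # an arbitrary decaying kernel
conv_direct <- sapply(0:(n-1), function(k) sum(x * h[((k - t) %% n) + 1]))  # direct circular convolution
conv_fft <- Re(fft(fft(x) * fft(h), inverse = TRUE) / n)                    # via the Fourier domain
max(abs(conv_direct - conv_fft))                     # ~1e-13, i.e., numerically identical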
Let’s consider a stationary time-process \(x(t)\), with autocovariance \(\gamma(h)\). Suppose \(\mu_t = \mathbb{E}(x_t)\) is the expected value of the stationary process. The autocovariance is
\[{K}_{xx}(t_1,t_2) = {cov}\left(x_{t_1}, x_{t_2}\right) = \mathbb{E}((x_{t_1} - \mu_{t_1})(x_{t_2} - \mu_{t_2})) = \mathbb{E}(x_{t_1} x_{t_2}) - \mu_{t_1} \mu_{t_2}\ .\]
When \(x = x(t)\) and \(y = x(t + h)\), the covariance \({K}_{xy}(t,t+h)=\gamma(t,t+h)=\gamma(h)\) is then a function of their time separation (or lag). We’ll reflect on the Fourier transform of \(\gamma\)
\[\hat{\gamma}(\omega) = \sum_{h=0}^{n-1}{\gamma(h) e^{-2\pi i h \omega/n}}\] relative to the Fourier transform of the process \(\hat{x}\).
Since \(x(t)\) is random, so are \(\hat{x}(\omega)\), its amplitude \(|\hat{x}(\omega)|\), and its power \(|\hat{x}(\omega)|^2\).
However, the power spectrum is a real, non-random scalar
\[S(\omega) = \mathbb{E}{|\hat{x}(\omega)|^2}\ .\]
Wiener-Khinchin theorem: The power spectrum is the Fourier transform of the autocovariance
\[\hat{\gamma}(\omega) = S(\omega) = \mathbb{E}(|\hat{x}(\omega)|^2)\ .\] Having an estimate of one term helps us estimate the other term. For instance, if we can estimate \(\mathbb{E}(|\hat{x}(\omega)|^2)\), we can estimate \(\gamma(h)\), by inverting the Fourier transform.
To explicate the basic terms in the Fourier transform, \(x(t) e^{-2\pi i \omega t/n}\), consider the following relations
\[a_{\omega} \sin(2\pi \omega t/n) + b_{\omega} \cos(2\pi \omega t/n) \underbrace{=}_{algebra} \\ \sqrt{a_\omega^2 + b_\omega^2}\left(\frac{a_{\omega}}{\sqrt{a_\omega^2+b_\omega^2}} \sin(2\pi \omega t/n) + \frac{b_{\omega}}{\sqrt{a_\omega^2+b_\omega^2}}\cos(2\pi \omega t/n) \right) \underbrace{=}_{\text{sine and cosine definition}} \\ \sqrt{a_\omega^2+b_\omega^2}\left(\cos(\phi_\omega) \sin(2\pi \omega t/n) + \sin(\phi_\omega) \cos(2\pi \omega t/n)\right) \underbrace{=}_{trig\ angle-addition\ identity} \\ \sqrt{a_\omega^2+b_\omega^2} \sin{(2\pi \omega t/n + \phi_\omega)}\ .\]
The inner product of two real vectors \(\vec{u}\) and \(\vec{v}\) is a scalar \(\langle \vec{u}, \vec{v} \rangle=\vec{u}^T_{1\times p} \vec{v}_{p\times 1}\in\mathbb{R}\). In the special case when \(\vec{u}\equiv\vec{v}\), \(\langle \vec{u}, \vec{u} \rangle=|\vec{u}|^2\). However, in the more general case of complex vectors, \(\vec{u} = a+ib\), then \(\langle \vec{u}, \vec{u} \rangle \not=\vec{u}^T \vec{u} = (a+ib)(a+ib) = a^2 - b^2 + 2iab\in\mathbb{C}\) and \(\langle \vec{u}, \vec{u} \rangle \not= |\vec{u}|^2\).
To make the inner product consistent, we need to use the complex conjugate (transpose), \(\vec{u}^\dagger\equiv (\vec{u}^*)^T\). So, \[\langle \vec{u}, \vec{u} \rangle=\vec{u}^\dagger \vec{u} = (a-bi)(a+bi) = a^2 + b^2=|\vec{u}|^2\ .\]
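A two-element illustration of why the conjugate is needed (a minimal sketch with an arbitrary complex vector):
u <- c(1+2i, 3-1i)           # an arbitrary complex vector
sum(u * u)                   # u^T u: complex-valued, not the squared length
sum(Conj(u) * u)             # conjugate inner product: 15+0i = |u|^2
sum(Mod(u)^2)                # the squared length, computed directly: 15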
In the Fourier transform case, the inner product of the spacetime vector of \(x(t)\) with the basis vector \(e^{2\pi i \omega t/n}\) also requires complex-conjugation of the basis vector, \(x(t) e^{-2\pi i \omega t/n}\).
\[x(t) = \frac{1}{n}\sum_{\omega=0}^{n-1}{\hat{x}(\omega) e^{2\pi i t \omega/n}}.\]
For a given frequency number \(\nu\), multiplying both sides of the equation by the conjugate base \(e^{-2\pi i t \nu/n}\) yields
\[x(t) e^{-2\pi i t \nu/n} = \frac{1}{n}\sum_{\omega=0}^{n-1}{\hat{x}(\omega) e^{2\pi i t (\omega-\nu)/n}}\]
and summing across all timepoints, \(t\), we obtain
\[\sum_{t=1}^{n}{x(t) e^{-2\pi i t \nu/n}} = \frac{1}{n}\sum_{t=1}^{n} {\sum_{\omega=0}^{n-1}{\hat{x}(\omega) e^{2\pi i t (\omega-\nu)/n}}}\ .\]
Next we can exchange the order of the summation indices and rearrange the terms to get
\[\sum_{t=1}^{n}{x(t) e^{-2\pi i t \nu/n}} = \frac{1}{n}\sum_{\omega=0}^{n-1} {\hat{x}(\omega)\sum_{t=1}^{n}{ e^{2\pi i t (\omega-\nu)/n}}}\]
For all frequencies \(\omega-\nu\not=0\), the sines and cosines average to zero, so the right hand side simplifies
\[\sum_{t=1}^{n}{x(t) e^{-2\pi i t \nu/n}} = \frac{1}{n}\hat{x}(\nu)n = \hat{x}(\nu).\]
Let’s use one week of hourly data (168 hours) from the DSPA Beijing air-quality dataset (2008-2016), which records the PM2.5 air pollutant concentration in micrograms per cubic meter.
library(plotly)
library(ggplot2)
library(zoo) # use: index() converts Date to its index
library(reshape2)
beijing.pm25<-read.csv("https://umich.instructure.com/files/1823138/download?download_frd=1")
beijing.pm25[beijing.pm25$Value==-999, 9]<-NA
beijing.pm25[is.na(beijing.pm25$Value), 9]<-floor(mean(beijing.pm25$Value, na.rm = T))
id = 1:nrow(beijing.pm25)
range <- 168 # (1 week hourly data)
mat = matrix(0, nrow=range, ncol=3)
stat = function(iid){
c(mean(beijing.pm25[iid,"Value"]), quantile(beijing.pm25[iid,"Value"], c(0.2,0.8)))
}
for (i in 1:range){
iid = which(id%%range==i-1)
mat[i,] = stat(iid)
}
mat <- as.data.frame(mat)
colnames(mat) <- c("mean","20%","80%")
# mat$time = c(1:range)
mat$time <- readr::parse_datetime(beijing.pm25$Date..LST., "%m/%d/%Y %H:%M")[1:range]
dt <- melt(mat,id="time")
trend <- lm(value ~ index(value), data = dt)
detrended.trajectory <- trend$residuals
detrendMod3 <- detrended.trajectory[seq(1, length(detrended.trajectory), 3)]
plot_ly(data = dt, x=~time[1:range], y=~value[(range+1):(2*range)], type="scatter", mode="markers+lines") %>%
add_trace(x=~time[1:range], y=detrendMod3, mode="lines", name="detrended") %>%
layout(title="Beijing daily 24-hour average PM2.5 (2008-2016)",
xaxis=list(title="Time"), yaxis=list(title="AQI (PM2.5)"),
legend=list(orientation='h', y=-0.2))
f.data <- GeneCycle::periodogram(detrendMod3)
harmonics <- c(1:10)
plot_ly(x=f.data$freq[harmonics]*length(detrendMod3),
y=f.data$spec[harmonics]/sum(f.data$spec), type="bar") %>%
layout(title="Beijing Air Quality (PM2.5) Harmonics over 168-hour (week) period",
xaxis=list(title="Harmonics (~Hz)"), yaxis=list(title="Amplitute Density"),
legend=list(orientation='h', y=-0.2))
The strongest harmonics are the \(1Hz\), \(2Hz\), and \(4Hz\) components. The PM2.5 measurements have a cyclical (hourly) periodicity reflected by the first harmonics (\(1Hz\) here corresponds to a \(1\)-hour cycle, \(2Hz\) to every \(2\) hours, etc.).
The general form of the Fourier transformation of an integrable function \(f:\mathbb{R}\to\mathbb{C}\) and its inverse are analytically defined by \[FT(f)(\omega)\equiv \hat{f}(\omega)=\int_{\mathbb{R}}{f(x)e^{-2\pi i \omega x}\ dx}, \] \[IFT(\hat{f})(x)\equiv \hat{\hat{f}}(x)=\int_{\mathbb{R}}{\hat{f}(\omega)e^{2\pi i \omega x}\ d\omega}. \]
This directly generalizes to multivariate functions \(g:\mathbb{R}^n \to\mathbb{C}\): \[FT(g)(\omega)\equiv \hat{g}(\omega)=\int_{\mathbb{R}^n}{g(x)e^{-2\pi i \langle \omega ,x\rangle}\ dx}. \]
Properties of the Fourier transformation include:
Let’s formulate the Fourier transform and its inverse (FT/IFT) for 2D images to hint to the multidimensional interpretation of harmonics and spectral methods.
\[\hat{f}(u,v)=F(u,v)=\int_{-\infty}^{\infty}{\int_{-\infty}^{\infty}{f(x,y)e^{-i2\pi(ux+vy)}dxdy}},\] \[f(x,y)=\hat{\hat{f}}(x,y)=\hat{F}(x,y)=\int_{-\infty}^{\infty}{\int_{-\infty}^{\infty}{F(u,v)e^{i2\pi(ux+vy)}dudv}},\] where \(u\) and \(v\) are the spatial frequencies, \(F(u,v)=F_R(u,v)+iF_I(u,v)\) is a complex number for each pair of arguments, \[|F(u,v)|=\sqrt{F_R^2(u,v)+F_I^2(u,v)}\] is the magnitude spectrum, and \[\arctan\left (\frac{F_I(u,v)}{F_R(u,v)}\right )\] is the phase angle spectrum.
The complex exponential \[e^{-i2\pi(ux+vy)}=\cos(2\pi(ux+vy)) +i\ \sin(2\pi(ux+vy))\] represents the real and imaginary (complex) sinusoidal terms in the 2D plane. The extrema of its real part (\(\cos(2\pi(ux+vy))\)) occur at \(2\pi(ux+vy)=n\pi\). Using vector notation, \[2\pi(ux+vy)=2\pi \langle U, X \rangle =n\pi,\] where, for a fixed frequency vector \(U=(u,v)^T\), the extrema locations \(X=(x,y)^T\) form sets of equally spaced parallel lines with normal \(U\) and wavelength \(\frac{1}{\sqrt{u^2+v^2}}\).
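R's fft() also operates on matrices (transforming along both dimensions), so a minimal 2D FT/IFT sketch on a small synthetic "image" (an arbitrary sum of plane waves) looks like this:
img <- outer(1:64, 1:64, function(x, y) sin(2*pi*3*x/64) + cos(2*pi*5*y/64))  # synthetic 64x64 image
F2D <- fft(img)                                       # forward 2D discrete Fourier transform
magnitudeSpec <- Mod(F2D)                             # magnitude spectrum |F(u,v)|
phaseSpec <- Arg(F2D)                                 # phase-angle spectrum
imgRec <- Re(fft(F2D, inverse = TRUE) / length(img))  # inverse 2D FT (normalized)
max(abs(img - imgRec))                                # ~1e-13: exact reconstruction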
The populations of a predator/prey pair can be described by a bivariate statistical model, because the population of each species may depend on the other. Take the wolf-rabbit pairing: as the wolf population increases, the rabbit population is expected to decrease, since more wolves catch and eat more rabbits and the rabbits may not be able to reproduce fast enough to replenish their numbers. This will eventually lead to some wolves starving due to a lack of food, causing the wolf population to decrease and the rabbit population to rise back up, as there are fewer wolves around to hunt the rabbits.
Let’s represent the time-dependent size of a prey species (e.g., rabbits) by \(x(t)\) and assume their food supply is unlimited and the environment is stable. Then, as there are more potential couplings, the rate of population growth would be proportional to the current population-size. Symbolically, this relation can be expressed as a differential equation:
\[\underbrace{\frac{d}{dt} x(t)}_{rate\ of\ change}=\overbrace{a}^{Growth\ rate} × \underbrace{\ x(t)\ }_{population\ size} \\ \Longrightarrow \underbrace{x(t)= Ae^{at}}_{Solution}\]
The proportionality constant \(a >0\) represents the growth (birth ratio per rabbit) and the parameter \(A=x(0)\) is the initial population size. To adjust for the obvious problem with this model, namely that the population grows unbounded over time, we can introduce a constraint that the growth rate \(a=a(t)\) decreases as a function of the population size, \(x(t)\). An alternative strategy is to introduce environmental effects, such as the presence of another (predator) population \(y(t)\), e.g., wolves, that controls the size of the initial population by hunting rabbits. Thus, the rabbit population \(x(t)\) will decrease proportionally to the size of the wolf population, \(y(t)\), multiplied by the number of rabbits. This approach introduces predator-prey interactions between the two species that may temper both the growth of the prey and the demise of the predator. The mathematical model of this dual-species interaction can be represented as another simple differential equation
\[\frac{d}{dt} x(t)=a\ x(t) -b\ x(t)\ y(t).\]
The additional parameter \(b\) represents the rate of this proportional decrease of the prey species (rabbits) based on the size of the predator population (wolves). Clearly, the first equation only handles the rate of change of the prey population, \(x(t)\). Assuming that without any rabbits the wolf population will rapidly decrease at a rate of \(c\), while growing proportionally to the rabbit population, \(x\), we can add another equation to model the second species:
\[\frac{d}{dt} y(t) =-c\ y(t)+ \delta y(t)\ x(t).\]
This yields a system of two coupled first-order (nonlinear) ordinary differential equations that model the predator-prey process:
\[ \begin{cases} \frac{d}{dt} x(t)=a\ x(t) -b\ x(t)\ y(t)\\ \frac{d}{dt} y(t) =-c\ y(t)+ \delta\ y(t)\ x(t) \end{cases} , \] where the four parameters \(a,b,c,\delta\ge 0\) describe the interactions between the two species.
This system of differential equations is called the Lotka-Volterra model and represents both the individual growth and decay of each species along with their mutual interaction. An example solution of the system is shown in the figure below. Note that increases of the rabbit (prey) population naturally correspond to proportional increases of the wolf (predator) population size. In turn, the growth of predators reduces the prey population, and this periodic growth-and-decay pattern persists until extraneous factors occur and disturb the symbiotic relation, which is captured as a closed 2D curve in phase space. This process forms a cyclical dependence that centers around an equilibrium point where both populations can symbiotically coexist, as illustrated below. This cycle can thus be transformed into a bivariate probability problem, where the resulting probability reflects the chances of observing the wolf and rabbit populations at particular sizes.
## Prey Predator
## Min. 3.226947e-03 2.832805
## 1st Qu. 1.620593e+00 9.388367
## Median 1.976610e+00 9.785843
## Mean 2.033186e+00 9.832447
## 3rd Qu. 2.094876e+00 10.065521
## Max. 2.000000e+01 24.214637
## N 3.010000e+02 301.000000
## sd 1.726485e+00 2.381933
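A minimal sketch of how such a trajectory can be generated numerically with deSolve::ode is shown below; the parameter values and initial populations are illustrative choices and are not necessarily those used to produce the summary above.
library(deSolve)
library(plotly)
# Lotka-Volterra rate equations: dx/dt = a*x - b*x*y,  dy/dt = -c*y + delta*x*y
lotkaVolterra <- function(t, state, parms) {
  x <- state[1]; y <- state[2]
  dx <- parms["a"] * x - parms["b"] * x * y
  dy <- -parms["c"] * y + parms["delta"] * x * y
  list(c(dx, dy))
}
params <- c(a = 2/3, b = 4/3, c = 1, delta = 1)   # illustrative interaction parameters
state  <- c(Prey = 2, Predator = 1)               # illustrative initial population sizes
times  <- seq(0, 60, by = 0.2)
sol <- as.data.frame(ode(y = state, times = times, func = lotkaVolterra, parms = params))
plot_ly(sol, x = ~time, y = ~Prey, type = "scatter", mode = "lines", name = "Prey") %>%
  add_trace(y = ~Predator, name = "Predator") %>%
  layout(title = "Lotka-Volterra Predator-Prey Dynamics",
         xaxis = list(title = "time"), yaxis = list(title = "population size"),
         legend = list(orientation = 'h'))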
Although much of classical Newtonian physics is deterministic, many important phenomena particular to quantum mechanics, biophysics, and statistics are naturally probabilistic and require a modeling of extrinsic environmental effects and accounting for intrinsic noise. This section provides a brief review of concepts that form the basis of probability and statistics that will be useful later in the course. For more information, see the Probability & Statistics EBook and the Data Science and Predictive Analytics Textbook.
Chance represents the possibility of observing a specific outcome in an experiment, observation, or trial. As many such situations encapsulate uncertainties, the occurrence of an outcome event, prior to conducting the experiment, is expected to be stochastic. The chance expresses the likelihood of observing each possible outcome. For a well-posed mathematical problem, chance is defined as a probability function mapping possible outcomes to concrete numerical likelihood values. This probability quantifies the expected likelihood of observing each specific (atomic or composite) event and is often measured as the expected fraction of the positive experiment state space to the entire state space including all possible observable outcomes.
Although there are alternative formulations of probability, the most common approaches are based on the 1933 Kolmogorov axioms of probability theory. The three axioms require that event probabilities (values of the probability mapping: \(Outcomes \longrightarrow Likelihoods\)) are numbers in \([0,1]\subset \mathbb{R}\). Higher and lower event probabilities correspond to more likely and less likely outcomes, respectively.
For additional information, please review the fundamentals of probability theory and the complementary examples provided in the Probability and Statistics EBook.
Probability theory plays a major role in all experimental studies, statistical inference, mathematical modeling, computational science, data science, artificial intelligence, machine learning, computer science applications, the general understanding of regularities/singularities, and complex systems dynamics.
When studying different processes, we often rely on observational data, representing random samples of the corresponding process. Four characteristics are used most often to describe distributions of these processes. These descriptors include the first four (sample) moments: mean, standard deviation (or variance), skewness, and kurtosis. In mathematics, these are called “moments”, or qualitative measures of the shape of the distribution of the underlying process.
The (sample) mean (or average) is the value estimating the centrality of the distribution. Mathematically:
\[\bar x = \frac{1}{N} \sum_i{f_ix_i} = \frac{\sum_i{f_ix_i}}{\sum_if_i} = \sum_i{x_ip_i},\] where \(x_i\) are the distinct observed values, \(f_i\) are their frequencies, \(N=\sum_i f_i\) is the total number of observations, and \(p_i = f_i/N\) are the corresponding relative frequencies.
The standard deviation, the square root of the variance, measures how dispersed the data are around the mean. The square root is taken to return the units of the variance back to the units of the raw data. Mathematically, the sample standard deviation is:
\[s\equiv \hat\sigma = \sqrt{{\frac {1}{N-1}}\sum _{i=1}^{N}\left(x_{i}-{\bar {x}}\right)^{2}} \equiv \sqrt{\sum_{\text{unique } j}{p_j(x_j - \bar x)}^2} .\] Skewness measures how asymmetric a distribution is. Trivial skewness indicates that the distribution is balanced on both sides (symmetric), positive skew means that most of the data is on the left side of the distribution, and negative skew means that most of the data is on the right side of the distribution. Mathematically, the (sample) skewness is:
\[Skewness = \frac{m_3}{m_2^{3/2}} = \frac{\frac{1}{N}\sum_{i=1}^N{(x_i - \bar x)^3}}{\left [\frac{1}{N}\sum_{i=1}^N{(x_i - \bar x)^2}\right ]^{3/2}},\] where \(m_2\) is the second central moment and \(s=m_2^{1/2}\) is the sample standard deviation. Note that using \(N\) for normalization makes this definition of skewness biased; this is the definition used by the R functions applied below.
Kurtosis is a measure of “tailedness” and quantifies how probable outliers are in a distribution. Mathematically: \[Kurtosis = \frac{\sum_{i=1}^N{(x_i - \bar x)^4}}{Ns^4}.\] Note again that using \(N\) for normalization makes this definition of kurtosis biased; this is the definition used by the R functions applied below.
The following code uses a real positron emission tomography (PET) image of the brain to illustrate the first four statistical moments of the intensity distribution for a neuroimaging PET signal.
## In R, we can use mean(), sd(), and skewness() to take the arithmetic mean, sample standard deviation, and sample skewness.
## kurtosis() is also available through the PerformanceAnalytics library. The formulas used for all of these functions match those above.
# 3D PET data: http://socr.umich.edu/HTML5/BrainViewer/data/PET_FDG_3D_vol.nii.gz
library(httr)
library(PerformanceAnalytics)
library(brainR)
httr::set_config(config(ssl_verifypeer=0L))
PETURL <- "http://socr.umich.edu/HTML5/BrainViewer/data/PET_FDG_3D_vol.nii.gz"
PETFile <- file.path(tempdir(), "PET_FDG_3D_vol.nii.gz")
invisible(GET(PETURL, write_disk(PETFile, overwrite=TRUE)))
brainVolume <- readNIfTI(PETFile, reorient=FALSE) #overwrite old brainVolume variable
plot_ly(z=t(brainVolume[,, 40]), type="heatmap")
# We will use data from only one slice (matrix) of the data and convert it into a vector
PETvec <- c(brainVolume[,, 40])
# Visualize the raw data/distribution
# hist(PETvec)
plot_ly(x = PETvec, type = "histogram")
The first four sample moments of the PET slice intensity distribution, \[\mu,\ \sigma,\ s,\ \kappa,\] are reported below:
## [1] 0.4216673
## [1] 0.3104582
## [1] 0.06881796
## [1] -1.574418
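The four values above can be reproduced with the descriptive-statistics functions mentioned earlier; a minimal sketch (mean() and sd() are base R, while skewness() and kurtosis() come from the PerformanceAnalytics package loaded above):
mean(PETvec)       # mu: sample mean of the PET slice intensities
sd(PETvec)         # sigma: sample standard deviation
skewness(PETvec)   # s: (moment) sample skewness
kurtosis(PETvec)   # kappa: sample kurtosis (PerformanceAnalytics reports excess kurtosis by default)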
The Probability Distributome Project’s distribution navigator and probability calculators provide powerful demonstrations of using probability models to study natural processes.
A binomial distribution is a discrete probability distribution and an exact physical model for any experiment that can be characterized as a series of trials where the number of trials \(n\) is fixed, each trial is independent of the others, each trial has exactly two possible outcomes (“success” or “failure”), and the probability of success \(p\) is the same in every trial. The random variable \(X\) counting the number of successes then has mass function \[P(X = k) = {n \choose k}\ p^k (1-p)^{n-k}, \ \ k = 0, 1, \cdots, n.\]
Additionally, the expected value and variance are simplified for binomial distributions:
\[E(X) = np\] \[Var(X) = np(1-p)\]
# For a biased coin that is heads 70% of the time and flipped 100 times:
x <- seq(1,100,by=1)
flip100 <- dbinom(x,100,0.7) * 100
plot_ly(x = ~rbinom(100, 100, 0.7), name="Simulated 100 coin flips", type = "histogram") %>%
add_trace(x = ~x, y = ~flip100, name = "Binomial Distribution (%)", type = "scatter", mode = "lines") %>%
layout(title="Flipping a Biased Coin (70% heads) 100 times", font=list(size=12),
xaxis = list(title = "x"), yaxis = list(title="f(x)"), showlegend= T,
legend = list(orientation = 'h'))
Many numerical measurements are well approximated by a normal (or Gaussian) distribution, particularly as the number of samples increases. If a phenomenon can be described by many small, independent effects contributing additively to each observation, the normal distribution generally holds. The shape of this distribution is the familiar continuous bell curve.
The shape is particularly important because of its relationship to the distribution’s mean and standard deviation: approximately \(68\%\) of observations fall within one standard deviation of the mean, about \(95\%\) within two, and about \(99.7\%\) within three (the 68-95-99.7 rule illustrated below).
It is even more convenient to compare a general normal distribution, whose mean and standard deviation can be any numbers, to the special case of the standard normal distribution, which has a mean of 0 and a standard deviation of 1. This conversion is known as taking a Z-score:
\[Z = \frac{X - \mu}{\sigma},\] where \(X\) is the raw observation, \(\mu\) is the mean, and \(\sigma\) is the standard deviation of the distribution; the Z-score expresses how many standard deviations an observation lies above or below the mean.
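As a small worked example (with illustrative, made-up numbers), suppose adult heights are roughly \(N(\mu=68, \sigma=4)\) inches and we want the probability of observing a height at or below 74 inches:
mu <- 68; sigma <- 4        # illustrative mean and standard deviation (inches)
x  <- 74                    # the raw observation
z  <- (x - mu) / sigma; z   # Z-score: 1.5 standard deviations above the mean
pnorm(z)                    # ~0.933, identical to pnorm(x, mean = mu, sd = sigma)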
For the following visualization, we will be using an SOCR dataset that illustrates how human height follows a normal distribution.
library(plotly)
#This data is from the SOCR website: it details 25,000 synthetic records of human heights and weights of 18 year old children.
rawdata <- rio::import("http://socr.ucla.edu/docs/resources/SOCR_Data/SOCR_Data_Dinov_020108_HeightsWeights.html",
format="html")
heightdata <- as.numeric(rawdata[-1,c(2)])
plot_ly(x = ~heightdata, type = "histogram") %>%
layout(title="25,000 synthetic heights for 18 year old humans", font=list(size=12),
xaxis = list(title = "Height (Inches)"),
yaxis = list(title="People"), showlegend= T)
The plot below shows the 68−95−99.7 rule for the Normal (Gaussian) distribution.
N <- 1000
x <- rnorm(N, 0, 1)
norm <- rnorm(N, 0, 1)
# hist(x, probability=T,
# col='lightblue', xlab=' ', ylab=' ', axes=F,
# main='Normal Distribution')
# lines(density(x, bw=0.4), col='red', lwd=3)
normDensity <- density(norm, bw=0.5)
dens <- data.frame(x = normDensity$x, y = normDensity$y)
miny <- 0
maxy <- max(dens$y)
xLabels <- c("μ-3σ","μ-2σ", "μ-σ", "μ", "μ+σ", "μ+2σ", "μ+3σ")
labelColors <- c("green", "red", "orange", "black", "orange", "red", "green")
xLocation <- c(-3, -2, -1, 0, 1, 2, 3)
yLocation <- 0.2
data <- data.frame(xLabels, xLocation, yLocation)
plot_ly(dens) %>%
add_histogram(x = norm, name="Normal Histogram") %>%
add_lines(data = dens, x = ~x, y = ~y+0.05, yaxis = "y2",
line = list(width = 3), name="N(0,1)") %>%
add_annotations(x = ~xLocation, y = ~yLocation, type = 'scatter', ax = 20, ay = 20,
mode = 'text', text = ~xLabels, textposition = 'middle right',
textfont = list(color = labelColors, size = 16)) %>%
add_segments(x=-3, xend=-3, y=0, yend=100, name="99.7%", line=list(dash="dash", color="green")) %>%
add_segments(x=-2, xend=-2, y=0, yend=90, name="95%", line=list(dash="dash", color="red")) %>%
add_segments(x=-1, xend=-1, y=0, yend=80, name="68%", line=list(dash="dash", color="orange")) %>%
add_segments(x=1, xend=1, y=0, yend=80, name="68%", line = list(dash = "dash", color="orange")) %>%
add_segments(x=2, xend=2, y=0, yend=90, name="95%", line=list(dash="dash", color="red")) %>%
add_segments(x=3, xend=3, y=0, yend=100, name="99.7%", line=list(dash="dash", color="green")) %>%
add_segments(x=-3, xend=3, y=100, yend=100, name="99.7%", line=list(dash="dash", color="green")) %>%
add_segments(x=-2, xend=2, y=90, yend=90, name="95%", line=list(dash="dash", color="red")) %>%
add_segments(x=-1, xend=1, y=80, yend=80, name="68%", line=list(dash="dash", color="orange")) %>%
layout(bargap=0.1, xaxis=list(name=""), yaxis=list(title="density/frequency"),
yaxis2 = list(overlaying = "y", side = "right", # title="prob",
range = c(miny, maxy+0.1), showgrid = F, zeroline = F),
legend = list(orientation = 'h'), title="Normal 68-95-99.7% Rule")
A Poisson distribution is another discrete probability distribution that expresses the probability of the number of events occurring in a fixed period of time. To be modeled properly by this distribution, the phenomena must have a known average rate of occurrence and all events must be independent. In lieu of time, events can occur in other intervals such as distance, area, or volume. Notably, this distribution can be seen and derived as a limiting case of the Binomial distribution.
The mass function is defined as: \[P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!},\ \forall k\in \mathbb{N}_0,\] where \(\lambda\) is the average (expected) number of events in the fixed interval, \(k\) is the observed number of events, and \(e\) is Euler's number.
And like the binomial distribution, the expected value and variance are simplified:
\[E(X) = \lambda,\] \[Var(X) = \lambda.\]
Particular to biophysics, this distribution is important for positron emission tomography (PET) imaging, since the rate of emission counts is effectively modeled by this mass function. The count data are converted, using the Fourier transform and other reconstruction steps, into the scans that are used in the clinic.
Finally, note that the Poisson distribution, although it is discrete, becomes well-approximated by the normal distribution at high \(\lambda\) values. The following visualizations will depict this, as well as a practice problem that uses the above mass function equation.
*Note: the definitions of \(k\), \(\lambda\) and \(X\) might be confusing. An illustrative example: if we hypothetically know that a comet falls to the Earth on average once every 20 days and we want the chance of 5 comets falling in the next 20 days, then the expected count for this 20-day interval is \(\lambda = 1\), \(X\) is the random variable counting the number of comets falling in any 20-day interval, and \(k = 5\). Plugging \(k=5\) and \(\lambda=1\) into the mass function above gives the probability we are looking for.
# Practice problem: estimate the likelihood that a certain pair of PET detectors is going to generate 15 counts/clicks in one minute, when the average rate per 5 minutes is 10.
## Here, lambda = 10 clicks / 5 minute = 2 clicks per one minute. k = 15
prob <- ((2^15 * exp(-2))) / factorial(15); prob # a really low probability
## [1] 3.391262e-09
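Equivalently, the same probability can be obtained directly from R's built-in Poisson mass function:
dpois(15, lambda = 2)   # same value as the manual computation above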
# Visualization of high N Poisson vs. Normal distributions
x <- seq(1000,3000,by=10)
poisEx <- dpois(x,2000)
normEx <- dnorm(x, mean=2000, sd=sqrt(2000))
plot_ly(x = ~x, y = ~poisEx, name="Poisson Distribution", type = "scatter", mode='markers') %>%
add_trace(y = ~normEx, name="Normal Distribution ", type = "scatter", mode='lines') %>%
layout(title="Poisson vs. Normal (Lambda = 2000)", font=list(size=12),
xaxis = list(title = "x"), yaxis = list(title="f(x)"), showlegend= T,
legend = list(orientation = 'h'))
x <- seq(0,5,by=1)
poisEx <- dpois(x,2)
normEx <- dnorm(x, mean=2, sd=sqrt(2))
plot_ly(x = ~x, y = ~poisEx, name="Poisson Distribution", type = "bar") %>%
add_trace(y = ~normEx, name="Normal Distribution ", type = "scatter", mode='lines') %>%
layout(title="Poisson vs. Normal (Lambda = 2)", font=list(size=12),
xaxis = list(title = "x"), yaxis = list(title="f(x)"), showlegend= T,
legend = list(orientation = 'h'))
However, not all processes or datasets can be modeled accurately using a single random variable. In these cases, we introduce two or more random variables, each of which may follow a distinct distribution with its own parameters (e.g., both may be normally distributed, but with different means and standard deviations).
In the following example from the SOCR Time Complexity, Inferential Uncertainty (TCIU), and Spacekime Analytics resource, we are able to easily visualize the resulting surface created by the combination of two distributions, one in time (orange) and one in space (green).
library(extraDistr)
library(pracma)
# set parameters
a <- 2 # base of population exponential growth/decay w.r.t. time, a>0
Ao <- 100 # initial bacterial colony size at time t=0, Ao >0
rho <- 0.3 # initial dependence (correlation), -1 < rho = Corr(t,phi) < 1
s <- seq(0, 10, length=100) # Time parameter grid
phi <- seq(-pi, pi, length=100) # Phase parameter grid
# Base function definitions
# f(t)
f <- function (t=0) {
return ( Ao * a^t )
}
# g(phi)
g <- function (phi=0) {
# Any other symmetric distribution over a finite domain can be used, see
# https://en.wikipedia.org/wiki/Symmetric_probability_distribution#Partial_list_of_examples
# draw random Phases: phiNew ~ Phi[-pi, pi)
# l = rlaplace(1000, mu=0, sigma=0.5)
# hist(l)
# Laplace Distribution
phiNew = rlaplace(length(phi), mu=0, sigma=0.5)
phi <- phiNew
return ( (1/(2*a))^(phi/pi) )
}
gMax <- max(abs(g(phi)))
fMax <- max(abs(f(s)))
# Gauss-copula
# C_{XY}^{Gauss} (x,y)=\frac{1}{\sqrt{1-ρ^2}} \times \exp \left (-\frac{ρ^2 (a^2+b^2 )-2abρ)}{2(1-ρ^2 )} \right ),
# where the pairwise correlation $-1 < \rho < 1$ and
# a(x)= \sqrt{2}\times erf^{-1} (2\times F_X (x)-1),
# b(y)= \sqrt{2}\times erf^{-1} (2\times F_Y (y)-1),
# and the error function (whose inverse, erfinv, is used below) is
# $$erf(s)= \frac{2}{\sqrt{\pi}}\int_{0}^{s}{e^{-t^2}dt}.$$
CGauss <- function (t=0, phi=0) {
A = sqrt(2) * erfinv(2*(f(t)/fMax)-1)
B = sqrt(2) * erfinv(2*(g(phi)/gMax)-1)
return ( (1/(sqrt(1-rho^2))) *
exp(-(rho^2*(A^2+B^2) - 2*A*B*rho)/(2*(1-rho^2))) # note: A^2+B^2, matching the Gauss-copula formula above
)
}
# F(t,phi)
F_fun <- function (t=0, phi=0, rho = rho) {
return ( f(t) * g(phi) * CGauss(t,phi) )
}
# define kime-surface
xsurf <- s # longitudinal trajectory
ysurf <- phi # phase trajectory
z_fun <- function(xs,ys) { log(F_fun(xs, ys)) } # xs + ys }
zsurf <- outer(xsurf, ysurf, FUN="z_fun")
# Clean zsurf, as %> table(is.na(zsurf)) # FALSE TRUE # 9911 89
# and table(is.infinite(zsurf)) # FALSE TRUE 9920 80
# replace missing values in zsurf by column (MARGIN=2) average
zsurfSmooth <- sweep(zsurf, MARGIN = 2,
STATS = apply(zsurf, MARGIN = 2, median, na.rm=TRUE),
FUN = function(x,s) ifelse(is.na(x), s, x))
zsurfSmooth2 <- sweep(zsurfSmooth, MARGIN = 2,
STATS = apply(zsurfSmooth, MARGIN = 2, median, na.rm=TRUE),
FUN = function(x,s) ifelse(is.infinite(x), s, x))
library(fields)
zsurfSmooth3 <- image.smooth(zsurfSmooth2)
# str(zsurfSmooth3) # List of 3: x, y, and z=f(x,y)
# 3D Kime-surface Plot
axisFont <- list(family = "Courier New, monospace", size = 18, color = "black") # dedicated name, to avoid masking the function f() defined above
xlab <- list(title = "time", titlefont = axisFont)
ylab <- list(title = "phase", titlefont = axisFont)
zlab <- list(title = "Kime-surface", titlefont = axisFont)
plot_ly(x=~s, y=~phi, z=~t(zsurfSmooth3$z), type="surface", opacity=0.95,
showlegend=FALSE, showscale= FALSE, name="Kime-Surface") %>%
# add projection curve phi=0
add_trace(x=~s, y=0, z=~zsurfSmooth3$z[,1], type="scatter3d",
mode="lines", line=list(width=20), name="Phi=0") %>%
# add projection curve s=5
add_trace(x=5, y=~phi, z=~zsurfSmooth3$z[length(s)/2,], type="scatter3d",
mode="lines", line=list(width=20), name="s=5") %>%
layout(title = "Dynamical System Simulation Example",
scene = list(aspectmode = "manual",
xaxis = xlab, yaxis = ylab, zaxis = zlab,
aspectratio = list(x=1, y=1, z=1))) # 1:1:1 aspect ratio
Try the SOCR 2D Distribution Calculator, its 3D counterpart, and the SOCR 2D/3D Hierarchical Clustering and Gaussian Mixture Modeling and Clustering Shiny App.
A “random variable” is a term used in probability and statistics to describe a measurable quantity whose value depends on the outcome of a random experiment. It can be either discrete or continuous: in the former case it can only take particular values from a countable list (such as the integers), while in the latter it can take any value, usually within some interval. Typically, the random variable is represented by a capital letter (\(X\)). This is notable when defining a function for a probability, such as \(F(x) = P\left (\underbrace{X}_{random} \leq \overbrace{x}^{value}\right )\), since the lower-case \(x\) represents an input value, while the upper-case \(X\) represents the random variable.
The following are examples for both types of random variable: discrete and continuous.
library(plotly)
# A good example of a discrete random variable is the Advanced Placement examination grade. In the US, this exam is given to high school students, and is used by most American institutions for college admissions and for determining college credit. The grade can only take the values of 1, 2, 3, 4, and 5: therefore, since any intervening value like decimals are not possible, it is a discrete random variable.
# Data from https://wiki.socr.umich.edu/index.php/SOCR_Data_Dinov_030708_APExamScores for 2007 AP Biology.
biology <- c(rep(5, 27996), rep(4, 29338), rep(3, 30749), rep(2, 33664), rep(1, 23049))
plot_ly(x = ~biology, type = "histogram") %>%
layout(title="2007 AP Biology Examination Results", font=list(size=12),
xaxis = list(title = "Score"),
yaxis = list(title="Frequency"),
showlegend= FALSE)
# On the other hand, latitudes and longitudes are continuous random variables. Movement in any direction can be made in as small or as big of an increment as is physically possible, although the surface of the Earth is the limiting interval.
# Data from https://wiki.socr.umich.edu/index.php/SOCR_Data_121608_OzoneData for California Ozone Data.
rawdata <- rio::import("https://wiki.socr.umich.edu/index.php/SOCR_Data_121608_OzoneData",
format="html")
plot_ly(x = ~rawdata$LONGITUDE, type = "histogram") %>%
layout(title = "Longitudes Where Ozone Data Was Collected", font=list(size=12),
xaxis = list(title = "Longitude"),
yaxis = list(title = "Frequency"), showlegend=FALSE)
Although Cartesian coordinates are likely the first and most familiar coordinate system, other systems based more directly on magnitude and direction like the polar, spherical, and cylindrical coordinate systems can substantially simplify model systems where circular, spherical, and cylindrical symmetries are present.
In 2D, unlike the Cartesian system of the x- and y-coordinates, polar coordinates rely on the “radius”, \(r\), the distance from the origin (or a reference point), as well as \(\theta\), the angle from a reference direction, which is typically the positive x-axis. The following functions are used to convert between the two:
\[x = r \cos{\theta},\]
\[y = r \sin{\theta}.\] In 3D, the \(x\)-, \(y\)-, and \(z\)-coordinates can be replaced by the spherical and cylindrical coordinates. The spherical system can be viewed as a 3D extension of the polar coordinate system: a “radius”, \(r\), and two reference angles (\(\theta\) and \(\phi\)) to account for the additional dimension. Notably, \(\theta\) is defined as the angle from the \(z\)-axis down to the \(x\)-\(y\)-plane, and \(\phi\) is the angle starting from the positive x-axis to the y-axis (which confusingly is the definition of \(\theta\) from the polar system). The following functions are used to convert between the two:
From Cartesian to spherical: \[r = \sqrt{x^2 + y^2 + z^2},\] \[\theta = \arctan{\frac{\sqrt{x^2 + y^2}}{z}},\] \[\phi = \arctan{\left (\frac{y}{x}\right )}.\] From spherical to Cartesian: \[x = r \sin{\theta}\cos{\phi},\] \[y = r \sin{\theta}\sin{\phi},\] \[z = r\cos{\theta}.\]
On the other hand, the cylindrical coordinate system maintains the definitions of the polar system (although \(\rho\) is used instead of \(r\) and \(\phi\) is used instead of \(\theta\)), in addition to a \(z\)-coordinate as in the 3D Cartesian system. The following functions are used to convert between the two:
\[x = \rho \cos{\phi},\] \[y = \rho \sin{\phi},\] \[z = z.\]
# 2D conversion functions
cart2polar <- function(x, y) {
r = sqrt(x^2 + y^2)
theta = atan2(y, x)
c(r, theta) #output a list where [1] is r, and [2] is theta
}
polar2cart <-function(r, theta) {
x = r * cos(theta)
y = r * sin(theta)
c(x, y) #output a list where [1] is x, and [2] is y
}
#3D conversion functions
cart2spherical <- function(x, y, z) {
r = sqrt(x^2 + y^2 + z^2)
theta = atan(sqrt(x^2 + y^2) / z)
phi = atan(y/x)
c(r, theta, phi) #output a list where [1] is r, [2] is theta, and [3] is phi
}
spherical2cart <- function(r, theta, phi) {
x = r * sin(theta) * cos(phi)
y = r * sin(theta) * sin(phi)
z = r * cos(theta)
c(x, y, z) #output a list where [1] is x, [2] is y, and [3] is z
}
cart2cylindrical <- function(x, y, z) {
rho = sqrt(x^2 + y^2)
theta = atan2(y, x)
z_cyl = z
c(rho, theta, z_cyl) #output a list where [1] is rho, [2] is theta, and [3] is z (cylindrical)
}
cylindrical2cart <- function(rho, theta, z) {
x = rho * cos(theta)
y = rho * sin(theta)
z_cart = z
c(x, y, z_cart) #output a list where [1] is x, [2] is y, and [3] is z (Cartesian)
}
# 2D Example: x = 2, y = 3 (1st quadrant Cartesian)
firstquad <- cart2polar(2, 3); firstquad
## [1] 3.6055513 0.9827937
## [1] 2 3
# 2D Example: r = 5, theta = -pi + 1 (3rd quadrant polar)
thirdquad <- polar2cart(5, -3.14 + 1); thirdquad
## [1] -2.694807 -4.211652
## [1] 5.00 -2.14
## [1] 2 8 -4
## [1] 9.165151 -1.119163 1.325818
## [1] 8.246211 1.325818 -4.000000
#Drawing a circle:
## In polar coordinates, maintain same magnitude (r), change direction (0 to 2pi radians)
direction <- seq(0, 2 * 3.14, by=0.1); direction
magnitude <- rep(1, length(direction)); magnitude # constant radius (r = 1) for every direction
## [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
## [20] 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7
## [39] 3.8 3.9 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1 5.2 5.3 5.4 5.5 5.6
## [58] 5.7 5.8 5.9 6.0 6.1 6.2
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [39] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
circle <- data.frame(x=double(), y=double())
#convert polar coordinates to Cartesian for plotting
for (i in seq_along(direction)) {
coordinate <- polar2cart(magnitude[i], direction[i])
circle <- rbind(circle, data.frame(x=coordinate[1], y=coordinate[2]))
}
x_cart <- circle$x #x coordinates
y_cart <- circle$y #y coordinates
plot_ly(x = ~x_cart, y = ~y_cart, name="Circle", type = "scatter", mode='markers') %>%
layout(title="Circle", font=list(size=12),
xaxis = list(title = ""),
yaxis = list(title="", scaleanchor="x", scaleratio=1), showlegend= F)
A partial derivative is the derivative of a function of several variables, where the derivative is taken with respect to one of the constituent variables while the rest are held constant. For example, for a function defined as \(f(x,y,z) = 2x + 3xy^3 - 4z\), the partial derivative of \(f(x, y, z)\) with respect to \(y\) is \(\frac{\partial f}{\partial y} = 9xy^2\); since \(2x\) and \(-4z\) do not involve \(y\), they are treated as constants and are eliminated by the derivative. This is key to the power that partial derivatives have in modeling real systems. A variable of interest is rarely defined in only a single dimension (i.e., only in time or in one dimension of space). Partial derivatives give us a tool to analyze and describe multivariate systems by isolating the impact of each independent variable.
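R can confirm such partial derivatives symbolically via the base D() function; a minimal sketch for the example \(f(x,y,z) = 2x + 3xy^3 - 4z\):
f_expr <- expression(2*x + 3*x*y^3 - 4*z)   # the example function from above
D(f_expr, "y")    # partial derivative w.r.t. y: 3*x*(3*y^2), i.e., 9xy^2
D(f_expr, "x")    # partial derivative w.r.t. x: 2 + 3*y^3
D(f_expr, "z")    # partial derivative w.r.t. z: -4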
Thus, it follows that a partial differential equation (PDE) is defined as an equation that relates various partial derivatives of a function with multiple variables. However, what is notable about most PDEs is that there is often no explicit algebraic solution. For this reason, PDEs are studied in many fields using methods to numerically approximate their solutions to gain an understanding of the phenomena that they physically represent.
One key example are the Navier-Stokes equations, which are a set of PDEs that represent the motion of viscous fluids. Although these equations are fundamental to many disciplines in the physical sciences and branches of engineering, they have the aforementioned property of not having an explicit algebraic solution: the Clay Mathematics Institute has a US$ 1 million prize offer out for anyone who can prove that smooth solutions to the equations always exist in three dimensions. Luckily, there are many reasonable assumptions that scientists and engineers make to find practical use out of the equations.
A simplified version of the Navier-Stokes equations for incompressible flow is shown below: \[\frac{\partial \textbf{u}}{\partial t} + (\textbf{u} \cdot\nabla)\textbf{u} - \nu \nabla^2 \textbf{u} = -\nabla w + \textbf{g},\] where \(\textbf{u}\) is the flow-velocity vector field, \(\nu\) is the kinematic viscosity, \(w\) is the (density-normalized) pressure term, and \(\textbf{g}\) represents body accelerations acting on the fluid (e.g., gravity).
In this form, it is less obvious that the equation is a PDE, but once we recall that the Laplacian operator (\(\nabla ^2\)) in 3D Cartesian coordinates is defined as \[\nabla^2f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2},\] it becomes clear that these equations are partial differential equations in four variables (three spatial dimensions and one time dimension).
For a continuously differentiable vector field \(F: V \rightarrow \rm I\!R^n\), where \(V = \{v \in \rm I\!R^n \}\) is compact with a piecewise smooth boundary \(\partial V\), let’s denote by \(\vec{n}\) the outward pointing unit normal vector on the closed and orientable boundary manifold \(S = \partial V\). For the given vector field (\(F\)), the flux (\(F \cdot \vec{n}\)) describes the magnitude of the flow over the closed surface boundary (\(\partial V\)), and the divergence (\(\nabla \bullet F\)) is a scalar field measuring the local net outflow (source density) of the field. In Cartesian coordinates, the divergence of the field \(F = (F_1, F_2, ..., F_n)\) at a given spatial position, \(v = (x_1, x_2, ..., x_n) \in V\), is the sum of the field's partial derivatives at the location \(v\):
\[\nabla \bullet F(v) = \langle \nabla | F(v)\rangle = \left \langle \left ( \frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, ..., \frac{\partial}{\partial x_n}\right )\ | (F_1, F_2, ..., F_n)\right \rangle = \sum_{i = 1}^n{\frac{\partial F_i(x_1, x_2, ..., x_n)}{\partial x_i}}.\]
The flux of the field \(F\) across the boundary surface \(S = \partial V\) measures the amount of flow passing through the boundary surface manifold per unit time. For a given small boundary patch with surface area \(\Delta S\), the relative flux is \(F \cdot \vec{n}\Delta S\).
For example, let’s calculate the flux of the field \(F(x, y, z) = (1, 2, 4z)\) through the part of the paraboloid \(z = x^2 + y^2\) that lies above the region \(x^2 + y^2 \leq 1\). We can parametrize the surface as \({\bf r}(r, \theta) =(r \cos(\theta), r \sin(\theta), r^2)\), with \(0\le r\le 1\) and \(0\le\theta\le 2\pi\). Then, \({\bf r}_r \times {\bf r}_{\theta} = (−2r^2 \cos(\theta), −2r^2 \sin(\theta), r)\) and the field on the surface is \(F({\bf r}(r, \theta)) = (1, 2, 4r^2)\). Hence,
\[{\text {flux}}(F)=\int_{\mathcal{S}} {F\cdot ds} = \int_{0}^{2\pi} {\int_0^1 {(−2r^2 \cos(\theta) − 4r^2 \sin(\theta) + 4r^3)\, dr}\,d\theta} = 2\pi.\]
For a continuously differentiable vector field \(F: V \rightarrow \rm I\!R^3\), where \(V = \{v \in \rm I\!R^3 \}\) is compact with a piecewise smooth boundary \(\partial V\), \(F=(F_1, F_2, F_3)\).
The curl of \(F\) is the vector field \[{\text {curl}}(F)=\left ( \frac{\partial F_3}{\partial y}- \frac{\partial F_2}{\partial z},\ \frac{\partial F_1}{\partial z}- \frac{\partial F_3}{\partial x},\ \frac{\partial F_2}{\partial x}- \frac{\partial F_1}{\partial y} \right ) \underbrace{\overbrace{=}^{symbolic}}_{cross\ product}\] \[\nabla\times F=\left (\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right ) \times (F_1, F_2, F_3).\]
Contrast this curl vector definition with the real-valued divergence of \(F\): \[{\text{div}}(F)=\frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z} \underbrace{\overbrace{=}^{symbolic}}_{dot\ product}\] \[\nabla\cdot F=\left (\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right ) \cdot (F_1, F_2, F_3).\]
The divergence measures the expansion of a field and fields of zero divergence are incompressible.
The Laplacian of a function \(f\) is defined as
\[\Delta f \equiv \nabla^2 f = \nabla \cdot( \nabla f)= {\text {div}}({\text {grad}}(f))= \frac{\partial^2 f}{\partial x^2}+\frac{\partial^2 f}{\partial y^2} +\frac{\partial^2 f}{\partial z^2}.\]
The following example of using partial differential equations comes from the Spacekime Analytics Book. Suppose \(U(y)\) and \(V(y)\) are continuously differentiable scalar functions, then integration by parts yields:
\[\int{UV'dy} = UV - \int{U'Vdy}\]
Connecting definite and indefinite integrals, the fundamental theorem of calculus states that integrating the derivative of a function over an interval can be expressed in terms of the functional values at the boundary of the interval. The higher-dimensional analog of the fundamental theorem of calculus is the Gauss-Ostrogradsky divergence theorem, which expresses the relation between the flux and divergence of the vector field \(F\):
\[\int_V{\underbrace{(\nabla \bullet F) }_{divergence}(v)dv} = \underbrace{\oint_{\partial V}{F \cdot \vec{n} ds}}_{flux}.\]
For example, suppose \(V \subseteq \rm I\!R^3\) is a 3D solid circular cone with height, \(h = 1\), and base-radius, \(r = 1\). Let’s define \(F: V \rightarrow \rm I\!R^3\) by \(F = (F_1 = x-y, F_2 = x + z, F_3 = z-y)\). The positively oriented cone surface boundary is expressed by:
\[S = \partial V: \{v = (x, y, z) \in \rm I\!R^3 | \{x^2 + y^2 - z^2 = 0\} \cap \{0 \leq z \leq 1\} \}.\]
We can verify the divergence theorem by independently computing and comparing both sides, which represent the volume integral of the divergence and the surface integral of the flux, respectively.
The figure found below shows a 3D rendering of the vector field \(F = (x - y, x + z, z - y)\).
The divergence of this linear field at any point \(v \in \rm I\!R^3\) is constant:
\[\nabla \bullet F(v) = \left \langle \left (\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right ) | (x-y, x + z, z - y)\right \rangle (v) = 1 + 0 + 1 = 2\]
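A quick symbolic sanity check of this constant divergence, using the base D() function:
D(expression(x - y), "x")   # dF1/dx = 1
D(expression(x + z), "y")   # dF2/dy = 0
D(expression(z - y), "z")   # dF3/dz = 1, so the divergence is 1 + 0 + 1 = 2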
Hence, the volume integral of the divergence over the solid unitary cone (\(h = 1, r= 1\)) is:
\[\int_V{(\nabla \bullet F)(v)dv} = \int_V{2dv} = 2(\pi \cdot1^2 \cdot \frac{1}{3}) = \frac{2 \pi}{3}\]
For the flux, the piecewise smooth cone surface boundary \(S = \partial V\) consists of the base disc (\(S_d\)) and the attached cone surface (\(S_c\)), so we can break the surface integral over the closed surface (\(S\)) into two pieces:
\[\oint_S{F \cdot \vec{n}ds} = \oint_{S_d\uplus S_c}{F \cdot \vec{n}ds} = \oint_{S_d}{F \cdot \vec{n}ds} + \oint_{S_c}{F \cdot \vec{n}ds}\]
On the base disc (\(z=1\), outward normal \(\vec{n}=(0,0,1)\)), \(F \cdot \vec{n} = z - y = 1-y\), so \(\oint_{S_d}{F \cdot \vec{n}ds} = \int_{x^2+y^2\le 1}{(1-y)\,dA} = \pi\). A direct computation over the lateral cone surface gives \(\oint_{S_c}{F \cdot \vec{n}ds} = -\frac{\pi}{3}\). Hence,
\[\oint_S{F \cdot \vec{n}ds} = \pi - \frac{\pi}{3} = \frac{2\pi}{3} = \int_V{(\nabla \bullet F)(v)dv},\] which verifies the divergence theorem for this example.
See the spacekime analytics textbook:
library(plotly)
library(deSolve)
library(dplyr)   # provides filter(), used below
fz <- function(x, y) {sqrt(x^2 + y^2)}
fu <- function(x,y, z) {x - y}
fv <- function(x, y, z) {x + z}
fw <- function(x, y, z) {z - y}
x = seq(-1, 1, 0.1)
y = seq(-1, 1, 0.1)
z = seq(-1, 1, 0.1)
df = expand.grid(x = x, y = y, z = z)
df = filter(df, z > fz(x, y))
df['u'] = fu(df$x, df$y, df$z)
df['v'] = fv(df$x, df$y, df$z)
df['w'] = fw(df$x, df$y, df$z)
fig <- plot_ly(
type="cone",
x= df$x,
y= df$y,
z= df$z,
u= df$u,
v= df$v,
w= df$w,
text="-> wind <-",
hoverinfo="u+v+w+text",
marker = list(
colorscale = "Viridis",
cmin=0,
cmax=100
)
)
fig
For a more in-depth summary of the linear algebra relevant to medical physics and biophysics of disease, see DSPA Chapter 4 examples.
Linear algebra is a branch of mathematics that studies linear relations using vectors, vector spaces, linear equations, linear transformations, and matrices. Although it is generally challenging to visualize complex data, e.g., large vectors, tensors, and tables in n-dimensional Euclidean spaces (\(n \geq 3\)), linear algebra allows us to represent, model, synthesize, and summarize such complex data.
Virtually all natural processes permit first-order linear approximations. These are useful because linear equations are easy to write, interpret, and solve, and such first-order approximations can be sufficient to practically assess a process, determine general trends, identify potential patterns, and suggest associations in the data.
Linear equations represent the simplest type of model for many processes. Higher-order models may include additional nonlinear terms, e.g., a Taylor-series expansion. Linear algebra provides the foundation for the representation, analytics, solution, inference, and visualization of first-order affine models. Linear algebra is a small part of the broader mathematical field of functional analysis, which is effectively the infinite-dimensional version of linear algebra.
Specifically, linear algebra allows us to computationally manipulate, model, solve, and interpret complex systems of equations representing large numbers of dimensions/variables. Arbitrarily large problems can be mathematically transformed into simple matrix equations of the form \(Ax = b\) or \(Ax = \lambda x\). The classic example is the use of linear algebra to solve systems of equations.
The product \(AB\) between matrices \(A\) and \(B\) is defined only if the number of columns in \(A\) equals the number of rows in \(B\). That is, we can multiply an \(m \times n\) matrix \(A\) by an \(n \times k\) matrix \(B\), and the result will be an \(m \times k\) matrix \(AB_{m \times k}\). Each element of the product matrix, \((AB)_{i,j}\), represents the product of the \(i\)-th row in \(A\) and the \(j\)-th column in \(B\), which are of the same size \(n\). Matrix multiplication is row-by-column.
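For example, in R the %*% operator implements this row-by-column product, and the dimensions must be conformable:
A <- matrix(1:6, nrow = 2, ncol = 3)    # a 2 x 3 matrix
B <- matrix(1:12, nrow = 3, ncol = 4)   # a 3 x 4 matrix
dim(A %*% B)                            # 2 4: the product is a 2 x 4 matrix
# B %*% A would throw an error: the dimensions are non-conformable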
With just this much information, one can translate a system of equations into one that can be solved in the language of linear algebra (and by coding languages like R). For example, consider the system of linear equations shown below: \[a + b + 2c = 6\] \[3a -2b + c = 2\] \[2a + b - c = 3\] Using our understanding of matrix multiplication, we can write it in the equivalent matrix notation (\(Ax = b\)): \[\begin{bmatrix} 1 & 1 & 2\\ 3 & -2 & 1 \\ 2 & 1 & -1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} 6 \\ 2 \\ 3 \end{bmatrix}\]
With one more fact, that multiplying a matrix by its inverse effectively “cancels” it, we get:
\[A^{-1}Ax \equiv x = A^{-1}b\]
\[\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} 1 & 1 & 2\\ 3 & -2 & 1 \\ 2 & 1 & -1 \end{bmatrix}^{-1} \begin{bmatrix} 6 \\ 2 \\ 3 \end{bmatrix}\]
(Formally, multiplying a matrix by its inverse yields the identity matrix, the analog of “1” in scalar algebra: multiplying any matrix by the identity matrix simply returns the original matrix. The identity matrix is always square, with 1’s on the diagonal and 0’s off the diagonal; the 3x3 identity matrix is shown below.)
\[\begin{bmatrix} 1 & 0 & 0\\ 0& 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\]
From here, we can either solve the system by hand, or in code:
# Row-wise entries of the coefficient matrix A
A_matrix_values <- c(1, 1, 2, 3, -2, 1, 2, 1, -1)
# matrix() fills column-wise, so transpose to recover the row-wise layout
A <- t(matrix(A_matrix_values, nrow=3, ncol=3))
b <- c(6, 2, 3)
# To solve directly, using the equation above:
A.inverse <- solve(A)
x <- A.inverse %*% b; x
##      [,1]
## [1,] 1.35
## [2,] 1.75
## [3,] 1.45
# Equivalently, solve the linear system Ax = b in one step:
solve(A, b)
## [1] 1.35 1.75 1.45
Thus, while this was a simple example, it is illustrative of the power that linear algebra has to offer in the study of the physical sciences, including biophysics.
Finally, although the topic of linear algebra is vast, only a few topics are summarized in BIOPHYS 430; others can be found in related documents (the construction of matrices and matrix operations are covered in the Scalars, Vectors, Matrices, and Tensors section above, while least squares estimation, linear regression, and variance-covariance matrices are covered in a later chapter, BPAD Chapter XII). The last topic covered here is eigenvalues and eigenvectors, since their definitions are key to understanding more complex topics in the field.
Eigen-spectrum decomposition of linear operators (matrices) into eigenvalues and eigenvectors enables us to easily understand linear transformations. The eigenvectors represent the “axes” (directions) along which a linear transformation acts by stretching, compressing, or flipping, and the eigenvalues represent the amounts of stretching or compression along the corresponding eigenvector directions. In higher dimensions, there are more directions along which we need to understand the behavior of the linear transformation. The eigen-spectrum makes the linear transformation easier to understand, especially when many of the eigenvectors are linearly independent (or, ideally, orthogonal).
For matrix \(A\), if we have \(A \overrightarrow{v} = \lambda \overrightarrow{v}\), then we say that \(\overrightarrow{v}\) (a non-zero vector) is a right eigenvector of matrix \(A\) and the scale factor \(\lambda\) is the eigenvalue corresponding to that eigenvector.
With some calculations, we know that \(A \overrightarrow{v} = \lambda \overrightarrow{v}\) is the same as \((\lambda I_n - A)\overrightarrow{v} = \overrightarrow{0}\), where \(I_n\) is the \(n \times n\) identity matrix. Solving this equation yields the eigenvalues and eigenvectors. Thankfully, we don’t need to do that by hand: R’s eigen() function will do the calculations for us.
# Define a simple 2x2 matrix (the 2x2 identity) and display it
identity <- diag(2)
identity
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
# Compute its eigen-spectrum (eigenvalues and eigenvectors)
eigen(identity)
## eigen() decomposition
## $values
## [1] 1 1
##
## $vectors
##      [,1] [,2]
## [1,]    0   -1
## [2,]    1    0
# Verifying that (lambda * I_n - A) v = 0 for the eigenvectors
(eigen(identity)$values*diag(2)-identity)%*%eigen(identity)$vectors
## [,1] [,2]
## [1,] 0 0
## [2,] 0 0
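To see a less trivial eigen-spectrum, consider a symmetric \(2 \times 2\) matrix (chosen here purely for illustration) and verify that \(A\overrightarrow{v} = \lambda \overrightarrow{v}\) holds for its leading eigen-pair:
# Eigen-spectrum of a simple symmetric 2x2 matrix
A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)
es <- eigen(A)
es$values                               # eigenvalues: 3 and 1
v1 <- es$vectors[, 1]                   # eigenvector corresponding to lambda = 3
cbind(A %*% v1, es$values[1] * v1)      # the two columns coincide: A v = lambda v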
The final mathematical topic that we cover in this mathematical foundations chapter is dimensionality reduction, a topic covered in more depth in the DSPA Chapter 4 examples. Here, we summarize its importance, utility, and key methods.
Put simply, dimensionality reduction reduces the number of features when dealing with a very large number of variables by extracting a set of “uncorrelated” principal variables, thereby reducing the complexity of the data. Although we are constructing new “uncorrelated” variables as functions of the old features, we are still approximately preserving important properties, such as the distances between cases or subjects. If we are able to reduce the complexity down to a few dimensions, we can then plot the data and untangle its intrinsic characteristics. Since many biophysical phenomena and datasets are multi-dimensional, techniques under the umbrella of dimensionality reduction are key to their analysis. The three techniques for dimensionality reduction covered here are PCA/ICA, t-SNE, and UMAP.
Principal Component Analysis (PCA) and Independent Component Analysis (ICA) are similar strategies; both seek a new basis (vectors representing the principal directions) onto which the data are projected in order to maximize an objective function specific to each technique. These basis vectors are just linear combinations of the original features in the data/signal. Specifically, PCA aims to explain the variance in the original signal by diagonalizing the covariance of the data and yielding high-energy orthogonal vectors in terms of the signal variance: it is a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables through a process called orthogonal transformation.
Briefly, the principal components are defined mathematically as:
\[pc_k = a_k^TX=\sum_{i=1}^N{a_{i,k}X_i},\] where \(X=(X_1, \ldots, X_N)^T\) is the vector of original (observed) variables and \(a_k=(a_{1,k}, \ldots, a_{N,k})^T\) is the vector of loading coefficients defining the \(k\)-th principal direction.
Using various optimization techniques to define principal components, we can explain a large amount of the complexity generated by the actual variables in the real data.
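As a minimal illustration, R's built-in prcomp() function can compute the principal components of simulated correlated data; the derived components are (approximately) uncorrelated, and the summary reports the proportion of variance explained by each:
# PCA of simulated correlated data using the built-in prcomp() function
set.seed(1234)
x1 <- rnorm(200)
x2 <- 0.8*x1 + rnorm(200, sd = 0.3)     # x2 is strongly correlated with x1
X  <- cbind(x1, x2)
pca <- prcomp(X, center = TRUE, scale. = TRUE)
summary(pca)                            # proportion of variance explained per component
round(cor(pca$x), 3)                    # the principal component scores are uncorrelated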
On the other hand, ICA exploits higher-order statistics (e.g., 3rd- and 4th-order moments, skewness and kurtosis), effectively minimizing the mutual information of the transformed output, aiming to find basis vectors representing statistically independent components of the original data. This optimization is carried out on the following mathematical representation of each variable, \(X_i\):
\[X_i = a_{i,1}s_1+...+a_{i,n}s_n,\]
where \(s_1, \ldots, s_n\) are the (latent) statistically independent source components and the \(a_{i,j}\) are the unknown mixing coefficients.
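As a minimal sketch, assuming the fastICA package is installed, two independent sources can be linearly mixed and then (approximately) recovered from the observed mixtures:
# ICA of two linearly mixed independent sources (fastICA package assumed installed)
library(fastICA)
set.seed(1234)
s1 <- sin(seq(0, 20, length.out = 500))          # smooth periodic source
s2 <- runif(500, min = -1, max = 1)              # uniform-noise source
S  <- cbind(s1, s2)
A  <- matrix(c(0.6, 0.4, 0.4, 0.6), 2, 2)        # mixing matrix
X  <- S %*% A                                    # observed mixtures
ica <- fastICA(X, n.comp = 2)
round(cor(ica$S, S), 2)                          # each estimated component tracks one source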
T-distributed Stochastic Neighbor Embedding (t-SNE) is a recent machine learning strategy for nonlinear dimensionality reduction that is useful for embedding (e.g., scatter-plotting) high-dimensional data into lower-dimensional (1D, 2D, 3D) spaces. The method models similar objects (points in the high-dimensional space) by nearby points in the embedding and dissimilar objects by distant points. The two steps in t-SNE are (1) constructing a probability distribution over pairs of the original high-dimensional objects, where similar objects have a high probability of being paired and, correspondingly, dissimilar objects have a small probability of being paired; and (2) defining a similar probability distribution over the points in the derived low-dimensional embedding and minimizing the Kullback-Leibler divergence between the high- and low-dimensional distributions with respect to the locations of the points in the embedding map. Either Euclidean or non-Euclidean distance measures between objects may be used as similarity metrics.
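A minimal sketch, assuming the Rtsne package is installed, embeds the 4D iris measurements into 2D (duplicate rows must be removed first, as required by Rtsne()):
# t-SNE embedding of the 4D iris measurements into 2D (Rtsne package assumed installed)
library(Rtsne)
set.seed(1234)
dup <- duplicated(iris[, 1:4])                   # Rtsne() requires distinct rows
tsne <- Rtsne(as.matrix(iris[!dup, 1:4]), dims = 2, perplexity = 30)
plot(tsne$Y, col = as.integer(iris$Species[!dup]), pch = 19,
     xlab = "t-SNE 1", ylab = "t-SNE 2")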
Finally, Uniform Manifold Approximation and Projection (UMAP) is a dimensionality reduction technique proposed by McInnes and Healy in 2018. Similar to t-SNE, UMAP first constructs a high-dimensional graph representation of the initial data and then employs graph-layout algorithms to project the original high-dimensional data into a lower-dimensional space, iteratively aiming to preserve the graph structure as much as possible. The initial high-dimensional graph representation uses simplicial complexes, i.e., weighted graphs whose edge weights represent the likelihood that two points are connected (belong to the same neighborhood). For a given point, the UMAP connectedness metric computes distances to other points based on the overlap of their respective neighborhoods.
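Similarly, a minimal sketch assuming the umap package is installed embeds the same 4D iris measurements into 2D using the default UMAP settings:
# UMAP embedding of the iris measurements (umap package assumed installed)
library(umap)
set.seed(1234)
iris_umap <- umap(as.matrix(iris[, 1:4]))        # defaults: 2D layout, 15 nearest neighbors
plot(iris_umap$layout, col = as.integer(iris$Species), pch = 19,
     xlab = "UMAP 1", ylab = "UMAP 2")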
All three techniques described above have a substantial mathematical component not presented here, but they are covered in depth in DSPA Chapter 4.