Exponential functions are some of the most important and prevalent functions in biophysics, appearing most often when describing certain rates of growth and decay of living and chemical systems, but also in the functions governing certain probability distributions and mass transport across membranes, among others.
Mathematically: \[\underbrace{f(x)}_{value}=a\times b^x; \ \ \underbrace{a\in \mathbb{R}, b\in \mathbb{R}^+, b\not= 1}_{parameters}, \underbrace{x\in \mathbb{R}}_{argument}\]
But, how do we know if a process is exponential, or at least, can be approximated as one?
In general, an exponential growth process (\(b > 1\)) is a phenomenon where the rate of increase of a quantity is proportional to the present value of that quantity (Hobbie & Roth, Intermediate Physics for Medicine and Biology, 2015). Conversely, an exponential decay process (\(0<b<1\)) is a phenomenon where the rate of decrease of a quantity is proportional to the present value of that quantity. Thus, it is intuitive that the dynamics of a population of sheep may be simulated by a combination of growth and decay exponential models: the present population of sheep at any point in time relies on a certain number of sheep at a previous time (and in the wild, a certain number of predators at a previous time as well).
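As a quick illustration of this defining property, here is a minimal sketch (with a made-up growth rate and starting value): when each step's change is proportional to the current value, the resulting trajectory is exactly exponential.

# Minimal sketch (hypothetical rate and starting value): a quantity whose
# per-step change is proportional to its current value grows exponentially
rate <- 0.1                        # hypothetical per-step growth rate
y <- numeric(50); y[1] <- 100      # hypothetical starting value
for (t in 2:50) y[t] <- y[t - 1] + rate * y[t - 1]  # change is proportional to current value
all.equal(y, 100 * (1 + rate)^(0:49))               # TRUE: identical to f(t) = 100 * 1.1^t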
The following applications, examples, and simulations illustrate the different types of exponential growth and decay functions that are prevalent in biophysics.
However, before venturing into exponential modeling, a quick aside about mathematical modeling in biology. Unlike those developed in many classical physics areas, biological models are not always developed purely from first principles (e.g. Newton’s laws, the laws of thermodynamics). Although biologists obviously take care not to violate those laws, biological systems tend to have a level of complexity and diversity that is not always capturable in a simple equation: many aspects of biology are not even quantifiable using the instruments currently available. That being said, it is a skill honed by all scientists working in biology to be able to decide what level of simplification is appropriate for the study of the phenomena in question. In many cases, different scientists focus on diverse aspects of a particular system, leading to diverse modeling strategies. Note how each of the following examples handles the simplification of systems as we proceed through this section: which aspects of the phenomena are the most important to capture? Which are not as important? What are the limitations of the model? In all cases, it is the underlying assumptions and priorities of the modeling scientist that lead to the final model.
Suppose a savings account has an interest rate of \(5\%\) and the interest is credited to the account once a year (APR). Each year, the account balance increases by \(5\%\) of its present value. Starting out with \(\$100\), at the end of the first year \(\$5\) of interest is credited and the new balance is \(\$105\). At the end of the second year, \(5\%\) of \(\$105\), i.e., \(\$5.25\), is credited to the account and the balance grows to \(\$110.25\), etc. The annual interest compounding leads to an exponential chain reaction (the account balance increasing):
\[\underbrace{B_o=100}_{principal}; \ B_1=B_o\times (1.00+\underbrace{0.05}_{APR}); \ B_2=B_1\times (1.05)=B_o\times (1.05)^2; \\ B_3=B_2\times (1.05)=B_o\times (1.05)^3;\cdots ; B_n=B_o\times (1.05)^n . \]
Similarly, a retiree who spends down their life savings by drawing a \(10\%\) distribution annually, while still earning the same \(5\%\) APR, sees the initial principal of \(\$1,000\) decay by \(5\%\) annually.
library(plotly)
time_months <- 1:1200 # 1,200 months = 100 years
# Balances are compounded once per year (integer number of full years elapsed)
dataGrowth <- 100*(1.05)^(time_months %/% 12)
dataDecay <- 1000*(0.95)^(time_months %/% 12)
df <- data.frame(time_months, dataGrowth, dataDecay)
fig <- plot_ly(df, x = ~time_months)
fig <- fig %>% add_trace(y = ~dataGrowth, name = 'exp Growth model',
                         mode = 'lines+markers', type="scatter")
fig <- fig %>% add_trace(y = ~dataDecay, name = 'exp Decay model',
                         mode = 'lines+markers', type="scatter")
# For 2D plots, the axis titles go directly in layout(); scene= is only for 3D plots
fig <- fig %>% layout(title = "Exponential Growth (income) & Decay (expenditure)",
                      xaxis = list(title = "Time (months)"),
                      yaxis = list(title = "Account Values (measured quantity)"))
fig
The continuously compounded interest formula is obtained in the limit as the number of compounding periods increases. Suppose that compounding is done \(N\) times a year and the APR is fixed at \(\alpha=0.05\). Then after \(t\) years, the number of compounding increments will be \(N\times t\), and the rate of increase per period is \(\frac{\alpha}{N}=\frac{0.05}{N}\). Therefore, the account balance after \(t\) years (\(N\times t\) compounding periods) would be:
\[Y=B_o\left (1+\frac{\alpha}{N} \right )^{N\times t} \underbrace{\longrightarrow}_{N\rightarrow\infty} \lim_{N\rightarrow\infty}{B_o\left (1+\frac{\alpha}{N} \right )^{N\times t}}= B_o e^{\alpha t} .\]
Note that this definition agrees with our earlier intuitive formulation that exponential functions have the unique property that their rate of change is proportional to the function itself:
\[ \frac{d}{dt}\left ( \underbrace{e^{\alpha t}}_{y} \right)=\alpha e^{\alpha t}=\alpha y .\]
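As a quick numerical sanity check (a sketch reusing \(\alpha = 0.05\) from above), we can watch the discretely compounded balance converge to \(e^{\alpha t}\) as the number of compounding periods per year grows:

# Discrete compounding approaches continuous compounding as N grows
alpha <- 0.05; t <- 10
sapply(c(1, 4, 12, 365, 1e6), function(N) (1 + alpha / N)^(N * t))
exp(alpha * t)  # limiting value, exp(0.5) ~ 1.6487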
Let’s examine various strategies for understanding the complexity of “optimally” representing the intricate balance between an intervention suppressing tumor growth (local control, LC) and the health side-effects of chemotherapy (RP2) in oncology.
The goal is to understand what may be appropriate and simple mechanisms to communicate this balance amongst basic scientists, clinical investigators, and AI algorithms.
In this project, we are using (perhaps over-simplified) exponential models of growth (tumor) and decay (RP2). We’ll simulate realistic scenarios and investigate the univariate patterns and the bivariate relations of LC and RP2. There are lots of parameters and assumptions in this model-based approach. Later we can transition from model-based to data-driven estimates of the parameters to make the optimal surfaces more realistic.
The goal is to answer the following question: “How can this Data+AI+Forecasting be easily communicated (and manipulated) by clinicians and used in patient onco-counseling?” As all graphs are really dynamic, the hope is that we may be able to transform classical age-growth curves into optimal-cancer-treatment surfaces.
Before proceeding, please review the DSPA R Introduction Chapter.
Suppose for the specific cancer type, tumor cells reproduce at a rate \(r>0\). Then, if \(y_{o}\) is some initial tumor size, the total tumor size (local-control) representing the total number of cells at time \(t\) is given by: \[\underbrace{y\left(t\right)}_{\text{tumor}}=y_{o}e^{rt} .\]
Similarly, we assume that the \(RP_2\) side-effects (pain) scale exponentially with the chemotherapeutic strength at a rate \(q\) (in the simulations below, the pain decays between administrations, \(q<0\), while each new administration elevates the baseline pain level). Then, if \(z_{o}\) is some initial default pain level, the \(RP_2\) representing the side-effects at time \(t\) is given as a pain value: \[\underbrace{z\left(t\right)}_{{\text{RP}}_2}=z_{o}e^{qt} .\]
Each time a chemotherapeutic agent is introduced, it destroys a fraction \(f\) of the tumor cells that exist at that moment in time. We are trying to develop a model for \(y\left(t\right)\) as a function of time and of the strength of the chemotherapy, which is directly proportional to the side-effects of the treatment applied as \(T\)-time-separated treatment administrations. We want to explore different cases and explicate the relations among \(f\), \(T\), and \(r\).
Generally, increasing the chemotherapy dose tends to improve patients’ tumor LC; however, this is highly correlated with an increasing risk of radiation-induced lung toxicities (RILTs), such as radiation pneumonitis (RP) of grade \(\geq2\) (\(RP_2\)); see this publication, PMC6279602.
Build a set of functions:

- `tumor_run()`: the first helper function runs the exponential-growth model each time a chemotherapy administration event occurs, \(y_o \times e^{r \times time},\ r>0\);
- `tumor_redux()`: the second function captures the entire duration of the experiment, \(\sum_{k=1}^{Duration/T} {y_k \times e^{r \times time}}\), where \(y_{k+1}=y_k\times (1-f)=y_o\times (1-f)^{k+1}\);
- `pain_run()`: the third helper function runs the pain exponential-decay model following each chemotherapy administration event, \(z_o \times e^{q \times time},\ q<0\);
- `painSideEffects_RP2()`: the last function tracks the side-effects, pain and RP2, \(\sum_{k=1}^{Duration/T} {z_k \times e^{q \times time}}\), where \(z_{k+1}=z_k\times (1+f)=z_o\times (1+f)^{k+1}\).

We will examine different scenarios to explore the effects on the clinical outcome (local-control, tumor size) of this model-based simulation:
# Helper: run the exponential-growth model between consecutive chemo administrations
tumor_run <- function(y0, T, r) {
  time <- seq(from = 0, to = T - 1, by = 1)
  y0 * exp(r * time)   # number of cells at each time step
}
# Aggregate function capturing the entire duration of the experiment
tumor_redux <- function(y0, T, r, f, duration) {
num_runs <- seq(0, duration %/% T - 2, by = 1)
time_tot <- seq(0, duration - 1, by = 1)
cells <- tumor_run(y0, T, r)
for (val in num_runs) {
last_num <- cells[length(cells)]
new_start <- last_num * (1 - f)
new_cells <- tumor_run(new_start, T, r)
cells <- c(cells, new_cells)
}
cells
}
# Helper: run the pain exponential-decay model between consecutive chemo administrations
pain_run <- function(z0, T, q) {
  time <- seq(from = 0, to = T - 1, by = 1)
  z0 * exp(q * time)   # pain level at each time step
}
# Pain/Side-effects function (returns RP2)
painSideEffects_RP2 <- function(z0, T, q, f, duration) {
  num_runs <- seq(0, duration %/% T - 2, by = 1)
  time_tot <- seq(0, duration - 1, by = 1)
  pain <- pain_run(z0, T, q)   # use the pain helper (not tumor_run)
  for (val in num_runs) {
    last_pain <- pain[length(pain)]
    new_start <- last_pain * (1 + f)   # each administration elevates the pain baseline by (1 + f)
    new_pain <- pain_run(new_start, T, q)
    pain <- c(pain, new_pain)
  }
  pain
}
elapsed = 100
time <- seq(0, elapsed - 1, by = 1)
ctrl_y0 = 10 # starting tumor size (number of cancer cells)
ctrl_T = 10 # treatment-frequency (each T days)
ctrl_r = 0.05 # tumor exponential parameter (growth rate)
ctrl_f = 0.3 # treatment efficacy (fraction of elimination), proportion (1 - f) cells remain after each treatment
# Mark the chemo administration days (every ctrl_T days) for the bar overlays in the plots
treatment <- rep(0, elapsed)
treatment[seq_len(elapsed) %% ctrl_T == 0] <- 5
# Pain RP2
ctrl_z0 = 5
ctrl_q = -0.05
########## LC
LC <- tumor_redux(ctrl_y0, ctrl_T, ctrl_r, ctrl_f, elapsed)
long_interspersed <- tumor_redux(ctrl_y0, ctrl_T + 10, ctrl_r, ctrl_f, elapsed)
low_reproduction <- tumor_redux(ctrl_y0, ctrl_T, ctrl_r - 0.02, ctrl_f, elapsed)
high_elimination <- tumor_redux(ctrl_y0, ctrl_T, ctrl_r, ctrl_f + 0.3, elapsed)
########### RP2
RP2 <- painSideEffects_RP2(ctrl_z0, ctrl_T, ctrl_q+0.02, ctrl_f, elapsed)
long_interspersedRP2 <- painSideEffects_RP2(ctrl_z0, ctrl_T + 10, ctrl_q+0.02, ctrl_f, elapsed)
heavyCompoundingRP2 <- painSideEffects_RP2(ctrl_z0, ctrl_T, ctrl_q + 0.04, ctrl_f, elapsed)
high_eliminationRP2 <- painSideEffects_RP2(ctrl_z0, ctrl_T, ctrl_q+0.02, ctrl_f - 0.3, elapsed)
df <- data.frame(time, LC, long_interspersed, low_reproduction, high_elimination,
RP2, long_interspersedRP2, heavyCompoundingRP2, high_eliminationRP2)
# fig <- fig %>% add_annotations(text="ctrl_y0 = 10 # starting tumor size (number of cancer cells)\n
# ctrl_T = 10 # treatment-frequency (each T days)\n
# ctrl_r = 0.05 # tumor exponential parameter (growth rate) \n
# ctrl_f = 0.3 # treatment efficacy (fraction of elimination),
# proportion (1 - f) cells remain after each treatment", ax = 50)
# Plot LC Growth Models
library(plotly)
fig <- plot_ly(df, x=~time, width = 1000)
fig <- fig %>% add_trace(y = ~LC, name = 'Controlled Growth',
                         mode = 'lines+markers', type="scatter")
fig <- fig %>% add_trace(y = ~long_interspersed, name = 'High T separation (T+10)',
                         mode = 'lines+markers', type="scatter")
fig <- fig %>% add_trace(y = ~low_reproduction, name = 'Slow Reproduction (r-0.02)',
                         mode = 'lines+markers', type="scatter")
fig <- fig %>% add_trace(y = ~high_elimination, name = 'High fraction of elimination (f+0.3)',
                         mode = 'lines+markers', type="scatter")
fig <- fig %>% add_trace(x = ~time, y = ~55*treatment, type = 'bar', # Chemo Treatments
                         marker = list(color='gray', line = list(color='gray', width=1)),
                         opacity=0.2, name="ChemoTreat", text = "Chemo Treatments",
                         hoverinfo = 'text')
fig <- fig %>% layout(title = "Tumor Growth (Cell Local Control) w.r.t. Different Chemo Interventions vs. Time (T=10, r=0.05, f=0.3)",
                      xaxis = list(title = "Time"),
                      yaxis = list(title = "Number of tumor cells"),
                      legend = list(title = list(text='<b>Models</b>')))
fig
# Plot RP2 Pain Models
library(plotly)
# Traces: RP2, long_interspersedRP2, heavyCompoundingRP2, high_eliminationRP2
fig <- plot_ly(df, x=~time, width = 1000)
fig <- fig %>% add_trace(y = ~RP2, name = 'Controlled Pain',
                         mode = 'lines+markers', type="scatter")
fig <- fig %>% add_trace(y = ~long_interspersedRP2, name = 'High T separation',
                         mode = 'lines+markers', type="scatter")
fig <- fig %>% add_trace(y = ~heavyCompoundingRP2, name = 'Compounded Pain (q+0.04)',
                         mode = 'lines+markers', type="scatter")
fig <- fig %>% add_trace(y = ~high_eliminationRP2, name = 'High fraction of elimination (f-0.3)',
                         mode = 'lines+markers', type="scatter")
fig <- fig %>% add_trace(x = ~time, y = ~5*treatment, type = 'bar', # Chemo Treatments
                         marker = list(color='gray', line = list(color='gray', width=1)),
                         opacity=0.2, name="Chemo Treatment", text = "Chemo Treatments",
                         hoverinfo = 'text')
fig <- fig %>% layout(title = "RP2 (Pain) Model w.r.t. Different Chemo Interventions vs. Time (T=10, q=-0.05, f=0.3)",
                      xaxis = list(title = "Time"),
                      yaxis = list(title = "Pain level (RP2)"),
                      legend = list(title = list(text='<b>Models</b>')))
fig
This function, `plot_3d()`, plots functions that can be described in the parametric form \(y=f(x,t)\). This will allow us to render 3D parametric manifolds as 2D graphs, with the third dimension \(t\) represented as a dynamic variable using a horizontal slider.
\[\underbrace{y\left(t\right)}_{\text{LC}}=y_{o}e^{rt}, \ \ \ \underbrace{z\left(t\right)}_{{\text{RP}}_2}=z_{o}e^{qt} .\] Solving the first equation for \(e^{t}=(LC/y_o)^{1/r}\) and substituting into the second eliminates time: \[RP_2=z_{o}\left (\frac{LC}{y_o}\right )^{\frac{q}{r}} .\]
xLC <- seq(0, 275, length=100)
x_RP2 <- seq(0, 25, length=100)
# Let's first embed the objective function in 3D by cine-animating along the time-axis
RP2_objective_vals <- ctrl_z0 * (xLC/ctrl_y0)^(ctrl_q/ctrl_r)   # RP2 as a function of LC
plot_ly(x=~xLC, y=~RP2_objective_vals, mode = 'lines+markers', type="scatter") %>%
  layout(title = "RP2=f(LC) (T=10, r=0.05, q=-0.05, y0=10, z0=5, f=0.3)",
         xaxis = list(title = "LC"), yaxis = list(title = "RP2"))
# plot_3d <- function(f, x, t) {
# x_range <- x
# x_label <- "x"
# slide_range <- t
# frames <- list()
# plt <- plot_ly()
# for (i in 1:length(slide_range)) {
# slide_val <- slide_range[i]
# # Generate the w values for f(x,t)
# w <-(matrix(apply(expand.grid(x_range,slide_val),1,f),length(x_range)))
#
# # At the start, make only the middle frame visible
# visible <- i==as.integer(length(slide_range)/2)
#
# # Add this trace (first frame) to the plot
# plt <- add_trace(plt, x=x_range, y=w, mode='lines+markers', type="scatter", visible=visible,
# name=as.character(slide_val), showlegend=FALSE,
# colors = colorRamp(rainbow(8)), opacity=0.5, hoverinfo="none")
#
# # Configure this step in the slider to make this frame visible and none other
# step <- list(args = list('visible', rep(FALSE, length(slide_range))),
# method = 'restyle', label=paste0("t=", round(slide_val,3)))
# step$args[[2]][i] = TRUE
# frames[[i]] = step
# }
#
# # Show the plot + slider focused on the middle plot
# plt %>%
# layout(
# title = paste0("Time-Dynamic 2D Plot LC vs. RP2"),
# scene = list(yaxis=list(title="w=f(x,t)"), xaxis=list(title=x_label)),
# sliders = list(list(active = as.integer(length(slide_range)/2),
# currentvalue = list(prefix = "t:"),
# steps = frames))) %>%
# hide_colorbar()
# }
#
# RP2_objective <- function(LC) {
# return(ctrl_z0 * (LC/ctrl_y0)^(ctrl_q/ctrl_r) ) # see the earlier definitions of the parameters
# }
#
# plot_3d(RP2_objective, xLC, time)
# fig <- plot_ly() %>%
# add_trace(data = df, x = ~time, y = ~LC, z = ~RP2, type="mesh3d",
# contour=list(show=TRUE, color="#000", width=15, lwd=10),
# opacity=0.5, hoverinfo="none") %>%
# # trace the boundary of the saddle point surface
# add_trace(x = ~time, y = ~LC, z = ~RP2, type="scatter3d", mode="lines",
# line = list(width = 1, color="red"), name="Optimal Treatment Surface",
# hoverinfo="none")
# fig
fig <- plot_ly() %>%
add_trace(data = df, x = ~RP2, y = ~LC, z = ~time, type="mesh3d",
# colors = colorRamp(rainbow(8)),
opacity=0.7, name="Optim-Treat-Surf", # hoverinfo="none",
contour=list(show=TRUE, color="#000", width=15),
hovertemplate = 'RP2: %{x:0.2f}<br>LC: %{y:0.2f}<br>Time: %{z:0.0f}') %>%
# trace the boundary of the Optimal Treatment Surface
add_trace(x = ~RP2, y = ~LC, z = ~time, type="scatter3d", mode="lines",
line = list(width = 1, color="red"), name="Opt-Surf-Trace",
hovertemplate = 'RP2: %{x:0.2f}<br>LC: %{y:0.2f}<br>Time: %{z:0.0f}') %>%
layout(title = "Optimal Treatment Surface", showlegend = FALSE,
scene = list(
xaxis = list(title = "RP2"),
yaxis = list(title = "LC"),
zaxis = list(title = "Time")
))
fig
A dose \(D\) of drug is given that causes the plasma concentration to rise from \(0\) to \(C_0\). The concentration then falls according to \(C = C_0 e^{-bt}\). At time \(T\), what dose must be given to raise the concentration to \(C_0\) again? What will happen if the original dose is administered over and over again at intervals of T?
Visually, the solution to this problem is the same as the visualization in Part 1.4.4 of this chapter (Side-effects (Pain, RP2)).
However, to solve it algebraically, note that at time \(T\) the fraction remaining is \(C/C_0 = e^{-bT}\). The plasma concentration must therefore be increased by \(C_0(1-e^{-bT})\) to return the concentration to \(C_0\), so the dose required is \(D_{const} = D(1-e^{-bT})\).
If the original dose \(D\) is administered over and over again at intervals of \(T\), the concentration just after a dose has contributions of \(C_0\) from the most recent dose, \(C_0e^{-bT}\) from the dose one interval earlier, \(C_0(e^{-bT})^2\) from two intervals earlier, and so on, until we can say that:
\[C = C_0\left(1 + e^{-bT} + (e^{-bT})^2 + (e^{-bT})^3 + \cdots\right)\]
And if you can recall from Calculus 1, the geometric series in parentheses sums to \(1/(1- e^{-bT})\). Thus:
\[C_{final} = \frac{C_0}{1-e^{-bT}}\]
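As a numeric sketch (with illustrative values \(C_0 = 1\) and \(e^{-bT} \approx 0.5\), i.e., dosing once per half-life), the peak concentration after each successive dose indeed converges to \(C_{final} = C_0/(1-e^{-bT})\):

# Peak concentration just after the n-th dose: C_n = C0 * (1 + r + ... + r^(n-1)), r = exp(-b*T)
C0 <- 1; b <- 0.1155; T <- 6      # illustrative values, chosen so that r = exp(-b*T) ~ 0.5
r <- exp(-b * T)
peaks <- C0 * cumsum(r^(0:19))    # peaks after doses 1 through 20
tail(peaks, 1)                    # ~ 2.0, already near the limit
C0 / (1 - r)                      # closed-form limit C_final ~ 2.0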
The solution to the differential equation \(dy/dt = a - by\) for the initial condition \(y(0)=0\) is \(y = (a/b)(1-e^{-bt})\). Plot the solution for \(a = 5\) g \(min^{-1}\) and for \(b = 0.1, 0.5\), and \(1.0\) \(min^{-1}\). Discuss why the final value and the time to reach the final value change as they do. Also make a plot for \(b = 0.1\) and \(a = 10\) to see how that changes the situation.
# Solution of dy/dt = a - b*y with the initial condition y(0) = 0
y <- function(a, b, t) {
  (a / b) * (1 - exp(-b * t))
}
t <- seq(0, 50, 1)  # include t = 0 to show the initial condition
plot_ly(x=~t) %>%
add_trace(y=~y(a=5, b=0.1, t=t), type='scatter', mode='lines', name='a=5, b=0.1') %>%
add_trace(y=~y(a=5, b=0.5, t=t), type='scatter', mode='lines', name='a=5, b=0.5') %>%
add_trace(y=~y(a=5, b=1.0, t=t), type='scatter', mode='lines', name='a=5, b=1.0') %>%
add_trace(y=~y(a=10, b=0.1, t=t), type='scatter', mode='lines', name='a=10, b=0.1') %>%
layout(
xaxis=list(title="t"),
yaxis=list(title="y")
)
As \(b\) increases, both the final value \(a/b\) and the time to reach it (set by the time constant \(1/b\)) decrease. The plot with \(b = 0.1\) and \(a = 10\) has twice the initial slope (the initial slope equals \(a\)) and an asymptotic value of \(a/b = 100\).
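Both claims are easy to verify numerically with the `y()` function defined above: the asymptote is \(a/b\), and the initial slope \(y'(0)\) equals \(a\) (checked here with a finite difference):

a <- 10; b <- 0.1
a / b                              # asymptotic value: 100
h <- 1e-6
(y(a, b, h) - y(a, b, 0)) / h      # numerical initial slope: ~ 10 = a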
A classic example illustrating exponential decay is the decay of radioactive isotopes. Luckily, the most basic of the exponential equations are a good model:
\[y = y_0 e^{-bt}\]
Where \(y\) is the amount of the isotope left after \(t\) hours, \(y_0\) is the starting amount, and \(b\) is the fractional decay rate, which in this case is always positive. However, scientists have come up with a clever way to characterize the isotopes: they defined their half-lives, \(T_{1/2}\) (or, less commonly, their doubling times, \(T_2\)), a value that is unaffected by the amount of the material. To do so, they set \(y/y_0 = 0.5\) and solved for \(t\), since that is the time at which \(y = 0.5y_0\):
\[0.5 = e^{-bt} = e^{-bT_{1/2}}\] \[T_{1/2} = \frac{\ln(2)}{b} = \frac{0.693}{b}\] (Note: the corresponding sections in the textbook (Chapter 2) include additional information on the differential equations that generate these exponential equations for radioactive decay (2.2), multiple decay paths (2.7) and decay plus input at a constant rate (2.8).)
However, when there are several independent paths by which an isotope can disappear (such as if it can decay radioactively or biologically simultaneously), there might be multiple decay rates, \(b_i\). In this case, we can find a total decay rate, \(b\), by adding all \(b_i\) together.
\[b = \sum_i{b_i}\]
Additionally, we can take the sum of the reciprocals of each decay path’s half-life to get a total half-life, \(T\).
\[\frac{1}{T} = \sum_i{\frac{1}{T_i}}\]
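For instance (a hypothetical two-path example), an isotope with a 6-hour radioactive half-life that is also cleared biologically with a 24-hour half-life has an effective half-life of \(1/(1/6+1/24) = 4.8\) hours. The same arithmetic in R:

T_radioactive <- 6   # hours (hypothetical radioactive half-life)
T_biological <- 24   # hours (hypothetical biological half-life)
T_total <- 1 / (1 / T_radioactive + 1 / T_biological)  # effective half-life: 4.8 hours
b_total <- log(2) / T_total                            # equivalently, b = sum of the b_i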
However, it is not entirely realistic to just model the decay of a system like an isotope: most of the time we also have to make sure it doesn’t decay completely by adding more of the material. In a situation where we have a material that is exponentially decaying but we are also adding more of the material at a constant rate, \(a\), we have the model:
\[y = \frac{a}{b}(1 - e^{-bt})\]
library(plotly)
t <- seq(0, 50, 1) # time over 50 hours
y0 <- 100 # starting with 100 kg of Technetium-99m (an illustrative amount)
b <- 0.1155 # h^-1, decay constant from the textbook (half-life of 6 h)
b1 <- 0.3 #h^-1, theoretical decay in stomach
b2 <- 0.02 #h^-1, theoretical decay in liver
a1 <- 3 #low constant input
a2 <- 10 #high constant input
radioDecay <- function(t, y0, b) {
y <- y0 * exp(-b * t)
}
multiplePath <- function(t, y0, b, b1, b2) {
total_b <- b + b1 + b2
y <- radioDecay(t, y0, total_b)
}
decayPlusConstant <- function(t, b, a) {
y <- (a / b) * (1 - exp(-b * t))
}
plot_ly() %>%
  add_trace(x = ~t, y = ~radioDecay(t, y0, b), name = "Only radioactive decay", mode = 'lines+markers', type = 'scatter') %>%
  add_trace(x = ~t, y = ~multiplePath(t, y0, b, b1, b2), name = "Radioactive and biological decay", mode = 'lines+markers', type = 'scatter') %>%
  add_trace(x = ~t, y = ~decayPlusConstant(t, b, a1), name = "Low constant input (3 kg/hr)", mode = 'lines+markers', type = 'scatter') %>%
  add_trace(x = ~t, y = ~decayPlusConstant(t, b, a2), name = "High constant input (10 kg/hr)", mode = 'lines+markers', type = 'scatter') %>%
  layout(title = "Decay of Technetium-99m",
         xaxis = list(title = "Time (hours)"),
         yaxis = list(title = "Amount of material (kg)"))
In Chapter 1, we covered several types of distributions that are commonly used to describe or approximate certain real phenomena. However, not all datasets will be modeled accurately by a single distribution or type of distribution. Using the mixtools R package, we can fit real data to a mixture of probability distributions with particular parameters and different weights using an Expectation-Maximization (EM) algorithm. Put simply, the algorithm finds parameter estimates by maximum likelihood and is specifically useful when the equations relating the data and the parameters cannot be solved directly. This example is simplified from the [DSPA Data Visualization notes](https://socr.umich.edu/DSPA2/DSPA2_notes/02_Visualization.html#26_Distributions).
#Read in data and parse it
crystallography_data <- read.csv(file = "https://umich.instructure.com/files/13375767/download?download_frd=1",
header=TRUE)
col_num <- dim(crystallography_data)[2]; col_num
## [1] 9
sampleColNames <- c("AC1338","AC1432","AC1593", "AC1679", "AC1860", "AC1874", "AC1881", "AC1903", "Rec")
sampleMixtureParam <- c(3, 3, 3, 3, 3, 3, 3, 3, 3)
df_sampleMixtureParam <- data.frame(t(sampleMixtureParam))
colnames(df_sampleMixtureParam) <- sampleColNames; # df_sampleMixtureParam
#Generate vectors holding the fits
fit_W <- vector(mode = "list", length = col_num)
# install.packages("mixtools")
library(mixtools)
# Fit mixture models (suppress the verbose EM trace; nullfile() works on all platforms)
capture.output(
  for(i in 1:col_num) {
    data_no_NA <- crystallography_data[complete.cases(crystallography_data[, i]), i]
    fit_W[[i]] <- weibullRMM_SEM(data_no_NA, k=df_sampleMixtureParam[1,i], verb=F)
  },
  file=nullfile()
)
# Custom design of Weibull-Mixture Model plot
# Note: d3() hard-codes a 3-component mixture (k = 3, as set in sampleMixtureParam),
# and data_no_NA relies on the loop index i from the calling environment.
weibullMM.plot <- function(mix.object, k = 2, main = "") { # mix.object <- fit_W[[i]]
  data_no_NA <- crystallography_data[complete.cases(crystallography_data[, i]), i]
  d3 <- function(x) { # construct the mixture using the estimated parameters
    mix.object$lambda[1]*dweibull(x, shape=mix.object$shape[1], scale=mix.object$scale[1]) +
      mix.object$lambda[2]*dweibull(x, shape=mix.object$shape[2], scale=mix.object$scale[2]) +
      mix.object$lambda[3]*dweibull(x, shape=mix.object$shape[3], scale=mix.object$scale[3])
  }
x <- seq(min(data_no_NA), max(data_no_NA), 0.001)
hist(data_no_NA, col="pink", freq=F, breaks=15, main = main, xlab="Intensities")
lines(x, d3(x), lwd=3, col="black", xlim=c(4,23), ylim=c(0, 0.25))
mixColors <- colorRampPalette(c("blue", "red"))(k)
for (i in 1:k) {
d = function(x) { # Construct each of the Weibull components using the estimated parameters
mix.object$lambda[i]*dweibull(x, shape=mix.object$shape[i], scale=mix.object$scale[i])
}
lines(x, d(x), lwd=3, col=mixColors[i])
}
}
# Plot Mixture Models and Report model parameter estimates
for(i in 1:2) { # this only plots the first 2 samples to save space
weibullMM.plot(fit_W[[i]], df_sampleMixtureParam[1,i],
paste0("Mixture of ", df_sampleMixtureParam[1, sampleColNames[i]],
" Weibull Models of ", sampleColNames[i]))
}
Cells can repair DNA damage caused by x-ray exposure (see Sect. 16.9). Wang et al. (2001) found that the amount of damage is characterized by two time constants. Assume the DNA damage, D (%), as a function of time, t (hrs), is given by the following data:
library(knitr)
data <- data.frame(
t = c(0, 0.25, 0.5, 0.75, 1.0, 1.5, 2, 4, 6, 8),
D = c(100, 46, 28, 21, 18, 16, 14, 9.0, 5.8, 3.7)
)
kable(data, caption="Problem 2.32 Data")
| t | D |
|------|-------|
| 0.00 | 100.0 |
| 0.25 | 46.0 |
| 0.50 | 28.0 |
| 0.75 | 21.0 |
| 1.00 | 18.0 |
| 1.50 | 16.0 |
| 2.00 | 14.0 |
| 4.00 | 9.0 |
| 6.00 | 5.8 |
| 8.00 | 3.7 |
Plot the data on semilog paper. Fit the data to Eq. 2.27 by eye or using a spreadsheet and determine \(A_1\), \(A_2\), \(b_1\), and \(b_2\).
Eq. 2.27:
\[y = y_1 + y_2 = A_1 e^{-b_1 t} + A_2 e^{-b_2 t}\]
library(plotly)
data_D <- data.frame(
t = c(0, 0.25, 0.5, 0.75, 1.0, 1.5, 2, 4, 6, 8),
D = c(100, 46, 28, 21, 18, 16, 14, 9.0, 5.8, 3.7)
)
a1.0 = 70
a2.0 = 20
b1.0 = 4
b2.0 = 0.2
start=list(a1 = a1.0, a2 = a2.0, b1 = b1.0, b2=b2.0)
fit <- nls(D ~ I(a1 * exp(-b1 * t) + a2 * exp(-b2 * t)),
data = data_D,
start = start)
summary(fit)
##
## Formula: D ~ I(a1 * exp(-b1 * t) + a2 * exp(-b2 * t))
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a1 78.295920 0.311888 251.04 2.70e-13 ***
## a2 21.696429 0.247110 87.80 1.47e-10 ***
## b1 4.481156 0.041853 107.07 4.47e-11 ***
## b2 0.219630 0.004878 45.02 8.04e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.207 on 6 degrees of freedom
##
## Number of iterations to convergence: 3
## Achieved convergence tolerance: 6.456e-06
time <- seq(data_D$t[1], data_D$t[length(data_D$t)], 0.1)
pred_D <- predict(fit, list(t=time))
plot_ly(data_D, x=~t, y=~D) %>%
  add_markers(name='data') %>%
  add_trace(x=~time, y=~pred_D, mode='lines', name='fit to Eq. 2.27') %>%
  layout(
    title="Semilog",
    xaxis = list(title="t (hours)"),
    yaxis = list(title="D (%)", type="log")  # semilog: log D vs. linear t reveals the two decay constants
  )
From `nls()`, the final equation (with coefficients rounded) is:
\[D = 78.3 e^{-4.48t} + 21.7 e^{-0.22t}\]
Despite the nominal similarity to “logarithmic”, the logistic function is its own function type, and one that is both related to the exponential functions and important in many biological systems. Known primarily for its common S-shaped (sigmoid) curve, its mathematical definition is:
\[f(x) = \frac{K}{1 + e^{-k(x - x_0)}}\]
Where:

- \(K\) is the maximum value of the curve (its upper asymptote, e.g., a population's carrying capacity),
- \(k\) is the logistic growth rate, which sets the steepness of the curve, and
- \(x_0\) is the \(x\)-value of the sigmoid's midpoint.
library(plotly)
x0 = 5 #midpoint
K = 10 #max value
k = 0.1 # logistic growth rate
x <- seq(-50, 70, 1)
logis <- function(x, K, k, x0) {
y <- K / (1 + exp(-k * (x - x0)))
}
plot_ly(x = ~x, y = ~logis(x, K, k, x0), mode = 'lines', type = 'scatter') %>%
layout(title = "Logistic Equation Example (K = 10, k = 0.1, x0 = 5)")
Developed by Pierre François Verhulst between 1838 and 1847 primarily as a model of population growth, the logistic function’s shape can be easily remembered by this intuition: at first, when the quantity is small, growth is approximately exponential; then, as the quantity approaches the maximum \(K\) (e.g., a population’s carrying capacity), growth slows; and at saturation, growth stops.
With this in mind, the logistic function can be seen as an alternative to the exponential equation for certain systems where saturation’s effect on growth is an important aspect to capture, such as in the oxygen-hemoglobin dissociation curve.
In statistics, the logistic curve is also key in transforming binary or ordinal categorical outcome variables into real-valued quantities. In this case, the logistic curve used is of almost the same form as above, \(\left ( y = f(x) = \frac{1}{1 + e^{-x}} \right )\), where \(y\) is interpreted as the probability of the event of interest (so \(K = 1\) and \(0 < y < 1\)) and \(x\) is a linear function of the predictor variables.
But the point of the logistic transformation is that we can rearrange the curve and solve for \(x\), which is the log-odds or logit function when \(y\) is the probability of an event of interest:
\[y = f(x) = \frac{1}{1 + e^{-x}} \Leftrightarrow x = \ln{\left ( \frac{y}{1-y}\right )}.\]
Thus, we can use a logistic regression equation model to estimate the probability of specific outcomes.
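In base R these two maps are built in: `plogis()` is the logistic function \(y = 1/(1+e^{-x})\) and `qlogis()` is its inverse, the logit. A minimal check that they undo one another:

x <- seq(-3, 3, by = 1)
y <- plogis(x)            # logistic: 1 / (1 + exp(-x))
log(y / (1 - y))          # logit computed by hand recovers x
all.equal(qlogis(y), x)   # TRUE: qlogis() is the inverse of plogis()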
In the DSPA materials linked above, the example given is Heart Transplant Surgery. Although we encourage you to follow the process laid out in that example to fully grasp how to carry out the transformation, the mathematics behind it, and its significance, here we will cover it in summary.
As in that example, let’s look at a theoretical case where we need to estimate the probability of surviving a heart transplant based on a surgeon’s experience. Suppose a group of 20 patients undergo heart transplantation with different surgeons having experience in a range of 0 to 10, where 0 is the least experience and 10 is the most, each value representing 100’s of operating/surgery hours (i.e. a 10-rated surgeon has had about 1,000 operating hours). How does the surgeon’s experience affect the probability of the patient surviving?
The data below shows the outcome of the surgery (1 = survival) or (0 = death) according to the surgeon’s experience in 100’s of hours of practice.
| Surgeon’s Experience (SE) | 1 | 1.5 | 2 | 2.5 | 3 | 3.5 | 4 | 4.5 | 5 | 5.5 | 6 | 6.5 | 7 | 7.5 | 8 | 8.5 | 9 | 9.5 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Clinical Outcome (CO) | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
As you can tell, the clinical outcome is a binary variable: we cannot simply run a simple linear regression/fit (e.g., in Excel) to generate a model that makes predictions, but we can use a logistic transformation to do so. In R, we can estimate a logistic regression model for the clinical outcome (CO) as a function of the surgeon’s experience (SE) using the generalized linear model (glm) function, as shown below. The graph of the regression on the original data is shown below, with the clinical outcome on the y-axis vs. the surgeon’s experience on the x-axis:
mydata <- read.csv("https://umich.instructure.com/files/405273/download?download_frd=1") # 01_HeartSurgerySurvivalData.csv
# Fit a logistic regression of clinical outcome (CO) on surgeon's experience (SE)
mylogit <- glm(CO ~ SE, data = mydata, family = "binomial")
library(ggplot2)
ggplot(mydata, aes(x=SE, y=CO)) + geom_point() +
  stat_smooth(method="glm", method.args=list(family = "binomial"), se=FALSE)
summary(mylogit)
##
## Call:
## glm(formula = CO ~ SE, family = "binomial", data = mydata)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.1030 1.7629 -2.327 0.0199 *
## SE 0.7583 0.3139 2.416 0.0157 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 27.726 on 19 degrees of freedom
## Residual deviance: 16.092 on 18 degrees of freedom
## AIC: 20.092
##
## Number of Fisher Scoring iterations: 5
And from the coefficients given in this output summary, we are able to generate the mathematical model, where CO is the probability of surviving open heart surgery and SE is the 100’s of hours of operating experience of the surgeon: \[CO = \frac{1}{1 + e^{-(-4.1030 + 0.7583SE)}}\]
For example, we can find the probability of surviving heart surgery with a doctor that has 400 operating hours of experience by plugging in \(SE = 4\), which outputs a value of \(CO = 0.26\), or a \(26\%\) chance of survival.
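Rather than plugging the coefficients in by hand, the same prediction can be obtained directly from the fitted model object (a quick sketch reusing `mylogit` from above):

# Predicted survival probability for a surgeon with SE = 4 (~400 operating hours)
predict(mylogit, newdata = data.frame(SE = 4), type = "response")  # ~ 0.26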
In this way, we can use a logistic curve to get a quantitative understanding of binary and categorical data. Again, please see the DSPA materials on logistic transformations for a more in-depth look at this example and on logistic transformations in general.
One of the tricks that scientists and engineers use when graphing specific types of data is the manipulation of the scaling of graphs. A great example of this are log-log plots, two-dimensional graphs where both horizontal and vertical axes use logarithmic scales as opposed to linear ones, as well as semi-log plots, where only one of the axes uses the logarithmic scale. These types of graphs are useful because they allow us to easily visualize power-law and exponential relationships and estimate parameters: relationships of the form \(y = Bx^n\) appear as straight lines on a log-log plot, while relationships of the form \(y = \phi a^{\psi x}\) appear as straight lines on a semi-log plot (where \(y\) is the axis on the logarithmic scale). Notably, the former has a special name: a power law. Practically, this means that while linearly-scaled axes have the same, equal spacing for a distance from 1 to 2 as from 10 to 11, on a typical log-log plot, where the scale factor is 10, the distance between 0.1 and 1 is the same as between 1 and 10, as well as between 10 and 100, and so on. This was shown previously in Problem 2.32, but the following is an additional (interactive!) example. Note which function is linear in each plot type.
library(plotly)
xax <- seq(0.05, 0.5, 0.01)  # positive x values so the log-scaled axes render all points
# parameters for log-log (power law)
B <- 2
n <- 4
# parameters for semilog (exponential)
a <- 2
phi <- 3
psi <- 6
# generate lines in different views
linear <- function(x) {
  10 * x
}
loglog <- function(x, B, n) {
  B * x ^ n
}
semilog <- function(x, phi, a, psi) {
  phi * a^(psi * x)
}
plot_ly(type = 'scatter', mode = 'lines') %>%
add_trace(x = ~xax, y = ~linear(xax), name='10x') %>%
add_trace(x = ~xax, y = ~semilog(xax, phi, a, psi), name='phi * a^(psi * x)') %>%
add_trace(x = ~xax, y = ~loglog(xax, B, n), name='B * x ^ n (power law)') %>%
layout(
title = "Linear vs. log-log vs. semilog",
xaxis = list(title = "x"),
yaxis = list(title = "y"),
updatemenus = list(list(
active = 0,
buttons = list(
list(
label = 'linear',
method = 'update',
args = list(
list(
visible = c(T,T,T)),
list(
yaxis = list(type = 'linear'),
xaxis = list(type = 'linear'))
)
),
list(
label = 'semilog',
method = 'update',
args = list(
list(
visible = c(T,T,T)),
list(yaxis = list(type = 'log'),
xaxis = list(type = 'linear')))
),
list(
label = 'log-log',
method = 'update',
args = list(
list(
visible = c(T,T,T)),
list(
yaxis = list(type = 'log'),
xaxis = list(type = 'log')))
)
)
)
)
)