After Thomas Bayes died in 1761, his unpublished work on the posterior distribution of the binomial parameter under a flat prior was rediscovered and published in the Philosophical Transactions of the Royal Society of London in 1764.
The Monte Carlo (MC) Method is a simple technique used to estimate analytic quantities that may not have closed-form expressions, or that are difficult to compute exactly, e.g., integration, differentiation, or optimization of non-trivial functions. MC simulation may be used to approximate probability distributions, estimate likelihoods, or calculate expectations. The intuition of MC sampling is rooted in the fact that we can estimate any desired (posterior) expectation using ergodic averages. In other words, we can compute any population characteristic, associated with a corresponding sample-driven statistic, of a (posterior) distribution, provided we can generate \(N\) simulated samples from that distribution:
\[E(f(s))_P = \int {f(s)p(s)ds} \approx \frac{1}{N}\sum_{i=1}^N {f(s^{(i)})},\] where \(P\) and \(p\) are the probability (posterior) distribution and density functions of interest, \(E\) represents the expectation, \(f(s)\) is the population characteristic of interest (corresponding to a statistic), and \(s^{(i)}\) is the \(i^{th}\) simulated sample from the distribution \(P\). For instance, we can estimate the population mean (linked to the sample arithmetic average statistic) by:
\[E(x)_P =\int{xp(x)dx} \approx \frac{1}{N}\sum_{i=1}^N {x^{(i)}},\] where \(p(x)\) is the density of the distribution function \(P\). Similarly, assuming a trivial mean, to estimate the population variance (corresponding to the sample-variance statistic) we compute:
\[E(x^2)_P =\int{x^2p(x)dx} \approx \frac{1}{N-1}\sum_{i=1}^N {\left ( x^{(i)}\right ) ^2}.\]
Many integrals, differentials, or sums that cannot be exactly computed in closed analytic form, can still be expressed as expectations of some random variables with respect to some probability distribution, which are estimable by MC methods.
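For a concrete illustration (assuming, for simplicity, a standard normal population and the characteristic \(f(s)=s^2\), whose true expectation is 1), a few lines of R show this ergodic-average approximation in action:

# Monte Carlo estimate of E(X^2) for X ~ N(0,1); the analytic answer is 1
set.seed(1234)
N <- 100000                # number of simulated samples
x <- rnorm(N)              # draws from the target distribution
mean(x^2)                  # MC estimate of the expectation, close to 1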
Markov chains were introduced by Andrey Markov in the early \(20^{th}\) century; Markov Chain Monte Carlo (MCMC) methods, which combine such chains with Monte Carlo sampling, were developed in the mid-\(20^{th}\) century, starting with the Metropolis algorithm. Markov Chains refer to one-step dependent sequences of random variables obeying the Markov property (past and future are conditionally independent given the present). For example:
Sampling with replacement from a space (e.g., an urn) containing a set of identifiable objects yields a process that obeys the Markov property (the draws are in fact independent), whereas sampling without replacement does not, since the objects still available depend on the entire history of prior draws.
A simple random walk on an integer lattice \(Z^d \subset R^d\) (the discrete analogue of Brownian motion, the Wiener process) represents a Markov chain whose transition probabilities are defined by \(p(x,x\pm e_i)=\frac{1}{2d}\), for each \(x\in Z^d\), where \(e_i\) are the standard basis vectors of \(Z^d\). At each step, this random walk process, currently at position \(x\), moves to a randomly chosen nearest neighbor, \(x\pm e_i\).
Consider cancer staging and the corresponding transitions between stages. The Cancer Tumor-Node-Metastasis (TNM) staging system describes the size of the primary tumor (T), whether the cancer has spread to the lymph nodes (N), and whether it has spread to a different part of the body (M). \(T=1\ (small), 2, 3, 4\ (large)\) refers to the size of the cancer and how far it has spread into nearby tissue; \(N=0\) (no lymph nodes containing cancer cells), \(1, 2, 3\) (many lymph nodes containing cancer cells); and \(M=0\) (no spread), \(1\) (cancer has spread), refers to whether the cancer has spread to another part of the body. Most types of cancer have four stages, \(CS=1\ (early), 2, 3, 4\ (advanced)\). Stage 1 implies that a cancer is relatively small and contained within the organ it started in. Stage 2 means the cancer has not started to spread into surrounding tissue, but the tumor is larger than in stage 1, or that cancer cells have spread into lymph nodes close to the tumor. Stage 3 usually means the cancer is larger and may have started to spread into surrounding tissues, possibly with cancer cells in the lymph nodes in the area. And stage 4 means the cancer has spread from its point of origin to another body organ, indicating metastatic progression.
Let’s use cancer staging to illustrate the fundamentals of MC representation, its properties, and computational characteristics. For each stage, let \(X_t\) denote the cancer stage of a patient at time \(t\). Cancer onset, progression, prognosis and outcomes have stochastic behavior. We have 5 states, \(X_t=0\ (no\ cancer), 1, 2, 3, 4\), and initially \(X_0=0\) (no cancer), with \(P(X_0=0)=1\). Next we can define the marginal probabilities for transition from one cancer stage to another (these depend upon the type of cancer), say \(P(X_1=0)=0.3\), \(P(X_1=1)=0.4\), \(P(X_1=2)=0.2\), \(P(X_1=3)=0.07\), and \(P(X_1=4)=0.03\), which sum to one. Computing the distribution of \(X_t\) for \(t\geq 2\) requires conditional probabilities.
Suppose that at time \(t\), the cancer patient is at stage \(s\). Then, we can compute the conditional probabilities \(P(X_{t+1}=s_k|X_t=s)\) of the patient being in a specific stage \(s_k\) at the next time point \(t+1\). For instance, if at time \(t\) the patient is in stage \(1\), we may have \(P(X_{t+1}=0|X_t=1)=0.15\), \(P(X_{t+1}=1|X_t=1)=0.5\), \(P(X_{t+1}=2|X_t=1)=0.2\), \(P(X_{t+1}=3|X_t=1)=0.1\), and \(P(X_{t+1}=4|X_t=1)=0.05\).
Would these probabilities change if we condition further on the oncological history of the same patient up to time \(t\)? Because of the stochastic coin-toss like mechanism of many biomedical, physiologic and natural processes, these conditional likelihoods may not be affected by knowing a longer staging history of the disease provenance. To predict the likelihood of being in a certain state at the next time period, only the current state of the disease matters. Thus, for any cancer provenance record (\(s_0, s_1, \cdots, s_{t-1}\)), we may have \(P(X_{t+1}=0|X_0=s_0, X_1=s_1, \cdots, X_{t-1}=s_{t-1}, X_t=1)=P(X_{t+1}=0|X_t=1)=0.15\), \(P(X_{t+1}=1|X_0=s_0, \cdots, X_t=1)=P(X_{t+1}=1|X_t=1)=0.5\), \(P(X_{t+1}=2|X_0=s_0, \cdots, X_t=1)=P(X_{t+1}=2|X_t=1)=0.2\), \(P(X_{t+1}=3|X_0=s_0, \cdots, X_t=1)=P(X_{t+1}=3|X_t=1)=0.1\), and \(P(X_{t+1}=4|X_0=s_0, \cdots, X_t=1)=P(X_{t+1}=4|X_t=1)=0.05\), exactly the same values as above.
This is because the coin-toss like cancer staging outcome (not necessarily a fair coin) at time \(t+1\) is independent of the prior coin flips and therefore does not depend on \(X_0, X_1, \cdots, X_{t-1}\). It only depends on the current cancer stage, \(X_t\), at time \(t\). This rule, that the conditional distribution of \(X_{t+1}\) depends only on the state at the prior time, \(X_t\), is referred to as the Markov property, or memoryless property.
Another characteristic of Markov processes, time homogeneity, is that the conditional distribution of \(X_{t+1}\) given (say) \(X_t=s\) is the same for all time points, i.e., \(P(X_{t'+1} | X_{t'}=s) = P(X_{t''+1} | X_{t''}=s)\), for all \(t'\) and \(t''\). This reflects the assumption that, although it may depend on human intervention, the biophysiological mechanism of cancer stage transitions obeys the same rules independently of time.
Collectively, the transition probabilities comprise the Markov transition matrix, \(P\), where each element \(P_{i,j}\) represents the conditional probability of observing cancer stage \(s_j\) at time \(t+1\) given that the current, time \(t\), stage is \(s_i\), i.e., \(P_{i,j} = P(X_{t'+1} =s_j| X_{t'}=s_i)\).
Thus, a Markov chain model for cancer progression, using the above 5 cancer stages (model states), may be represented mathematically as a transition matrix or as a graph (here \(k=5\)): \[P= \begin{pmatrix} P_{1,1} & P_{1,2} & \cdots & P_{1,k} \\ P_{2,1} & P_{2,2} & \cdots & P_{2,k} \\ \vdots & \vdots & \ddots & \vdots \\ P_{k,1} & P_{k,2} & \cdots & P_{k,k} \end{pmatrix} .\] The axioms of probability theory dictate that each entry is non-negative, \(P_{i,j}\geq 0\), and that each row sums to one, \(\sum_{j=1}^k P_{i,j}=1\) for every \(i\), since from any current state the chain must transition to some state.
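For instance, we can encode a hypothetical (illustrative, not clinically derived) version of this transition matrix in R, check that each row is a proper probability distribution, and simulate a short patient trajectory. The specific numbers are assumptions; the second row reuses the stage-1 transition probabilities quoted above.

# Hypothetical 5x5 transition matrix for states 0 (no cancer) through 4
P <- matrix(c(0.70, 0.15, 0.10, 0.04, 0.01,
              0.15, 0.50, 0.20, 0.10, 0.05,
              0.05, 0.15, 0.50, 0.20, 0.10,
              0.02, 0.08, 0.15, 0.50, 0.25,
              0.01, 0.04, 0.10, 0.25, 0.60),
            nrow=5, byrow=TRUE,
            dimnames=list(paste0('Stage_', 0:4), paste0('Stage_', 0:4)))
rowSums(P)                                  # each row must sum to 1 (probability axioms)
# Simulate one hypothetical 10-step patient trajectory starting at Stage_1
set.seed(1234)
state <- 2                                  # row index 2 corresponds to Stage_1
trajectory <- integer(10)
for (t in 1:10) {
  state <- sample(1:5, size=1, prob=P[state, ])   # one Markov transition
  trajectory[t] <- state - 1                      # record the stage label (0-4)
}
trajectory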
Below we show the mathematical matrix representation of the MC process above as a (directed, cyclic, complete) graphical object illustrating all states and their corresponding transition probabilities.
# install.packages("network")
library('igraph')
nodes <- data.frame(c('Stage_0','Stage_1','Stage_2', 'Stage_3', 'Stage_4'))
node_names <- c('Stage_0', 'Stage_1', 'Stage_2', 'Stage_3', 'Stage_4')
edges <- data.frame(from=c( 'Stage_0', 'Stage_0', 'Stage_0', 'Stage_0', 'Stage_0',
'Stage_1', 'Stage_1', 'Stage_1', 'Stage_1', 'Stage_1',
'Stage_2', 'Stage_2', 'Stage_2', 'Stage_2', 'Stage_2',
'Stage_3', 'Stage_3', 'Stage_3', 'Stage_3', 'Stage_3',
'Stage_4', 'Stage_4', 'Stage_4', 'Stage_4', 'Stage_4'),
to=c('Stage_0', 'Stage_1', 'Stage_2', 'Stage_3', 'Stage_4',
'Stage_0', 'Stage_1', 'Stage_2', 'Stage_3', 'Stage_4',
'Stage_0', 'Stage_1', 'Stage_2', 'Stage_3', 'Stage_4',
'Stage_0', 'Stage_1', 'Stage_2', 'Stage_3', 'Stage_4',
'Stage_0', 'Stage_1', 'Stage_2', 'Stage_3', 'Stage_4')
)
edge_names <- rep('P{i,j}', 25)
vertex_size <- rep(20, 5)
for (i in 1:5) {
vertex_size[i] <- 70-5*(i-1)
for (j in 1:5) {
edge_names[j + (i-1)*5] <- paste0('P{', i-1, ',', j-1, '}')
# edge_names[j + (i-1)*5]
}
}
g <- graph_from_data_frame(edges, directed=TRUE, vertices=nodes)
plot(g, layout=layout_with_fr, vertex.size=vertex_size, vertex.label=node_names, edge.label=edge_names, edge.label.cex = 0.7, edge.curved=0.3, margin=c(0.1, 0.1, 0.1, 0.1), main='Cancer Staging Transition Graph', ylim=c(-1,1),xlim=c(-1,1), asp = 1)
In quantum physics, the Heisenberg uncertainty principle suggests that there is a limit to simultaneously identifying the position and momentum of particles in the universe, which implies that the future is difficult to predict exactly, albeit it may be probabilistically quantified. Similarly, in logic, Gödel's incompleteness theorems suggest that no (closed, arithmetic-capable) formal system can be both complete and consistent. This also imposes some theoretical bounds on the precision of our forecasting and inference.
Monte Carlo experiments represent computational algorithms utilizing repeated random sampling of data and calculation of specific statistics that estimate concrete population characteristics. Two examples of using randomness to solve challenging problems involve the estimation of the area of a 2D region in the plane with complex boundary and compressed sensing for rapid estimation of complex signals using (sub-Nyquist frequency) stochastic sampling. The Metropolis-Hastings algorithm and Gibbs sampling represent a couple of specific Markov Chain Monte Carlo (MCMC) methods of simulation and estimation.
Earlier in Chapter 7, we presented the Bayes conditional probability formula in terms of events. Below, we will expand that formulation of the relation between the posterior, prior, conditional and marginal probabilities.
Gibbs sampling can be applied to estimate Bayesian models, facilitating subsequent Bayesian inference. The fundamental idea is to obtain computationally accurate MCMC estimates of statistical quantities or mathematical operations that are complicated or analytically intractable.
Conditional probability reflects partial knowledge about some observed data. Let’s consider the bivariate density and distribution situations to avoid the limitations of univariate cases and reduce the complexities of more general multivariate processes. For discrete-state processes, \(X,Y\in \Omega\), the probability mass function (PMF), \(f(.,.)\geq 0\), consists of the complete set of probability values assigned to all possible \((x, y)\) pair states, where \(\sum_{(x, y) \in \Omega} f(x, y) = 1\). The PMF values, \(f(x, y)\) correspond to the probability of observing the event \(X = x\) and \(Y = y\). For continuous-state processes, the probability density function (PDF), is a non-negative function \(f(x,y)\geq 0\) that integrates to one \(\int_{-\infty}^\infty { \int_{-\infty}^{\infty} { f(x, y) \ d x \ d y }} = 1\).
The PDF assigns probabilities, \(f(x, y) \, d x \, d y\) , to outcome “areas” corresponding to observing the event \(x < X < x + d x\) and \(y < Y < y + d y\), where the infinitesimal \(d x\) and \(d y\) represent the small increments in the two planar directions. In mixed situations where one of the variable may be continuous and the other discrete, probability mass-density functions may be defined in the same spirit yielding probabilities of outcomes, \(f(x, y) \, d x\), defined for mixed continuous-discrete events like \(x < X < x + d x\) and \(Y = y\).
Often we may be able to observe \(X\) but not \(Y\). The joint distribution represents the probability of the outcome \((x,y)\in A\), \(P((X,Y)\in A)\), without any other knowledge and before observing either of the random processes \(X\) or \(Y\). Suppose we observe \(X = x\) first; what should be the distribution of \(Y\)? Given \(X=x\), the probability function of \(Y\) has one fewer unknown variable and can be written analytically as a conditional probability function. For instance, when \(X\) and \(Y\) are discrete random variables and \((x,y)\in \Omega\), the conditional probability mass function of \(Y\) given \(X\) is:
\[f(y | x) = \frac{f(x, y)}{\sum_{y' \in \Omega_x} f(x, y')}, \qquad y \in \Omega _x,\]
where \[\Omega_x = \{\ y : \text{such that } (x, y) \in \Omega \}.\]
Similarly, when \(X\) and \(Y\) are continuous random variables, then \[ f(y | x) = \frac{f(x, y)}{\int_{- \infty}^\infty f(x, y') \ d y'}, \qquad - \infty < y < \infty,\] represents the conditional probability density function of \(Y\) given \(X\).
The joint multivariate distribution is the product of the marginal and the conditional distributions, \(f(x, y)= f_X(x)\times f(y | x)\). In the discrete case, the marginal distribution of \(X\) (ignoring \(Y\)) is \(f_X(x) = \sum_{y \in \Omega_x} f(x, y)\), which represents the probability of the event \(X = x\). When \(X\) and \(Y\) are both continuous, the marginal is \(f_X(x) = \int_{- \infty}^\infty f(x, y) \ d y\) and, for any event \(E\subset \Re\), \(P(X \in E) = \int_E f_X(x) \ d x = \int_E d x \int_{- \infty}^\infty f(x, y) \ d y\) represents the probability of the event \(X \in E\), which is equivalent to \((X, Y) \in E \times \Re\).
The conditional distributions are expressed as: \[\begin{align*} f(y | x) & = \frac{f(x, y)}{f_X(x)} \\ f(x | y) & = \frac{f(x, y)}{f_Y(y)} \end{align*}.\]
Notice that there are separate marginal distributions for each variable. In our bivariate case, one marginal for \(X\) and another for \(Y\).
This bivariate situation generalizes to \(X\) and \(Y\) variable vectors where multiple variables can be conditioned upon in the above relation between joint, marginal and conditional probabilities.
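To make these relations concrete, here is a small hypothetical discrete example in R (the joint PMF values are assumptions chosen only for illustration): the marginals are the row and column sums of the joint table, and each conditional distribution is the joint divided by the corresponding marginal.

# Hypothetical joint PMF f(x, y) of X in {0,1} (rows) and Y in {1,2,3} (columns)
f_xy <- matrix(c(0.10, 0.20, 0.10,
                 0.25, 0.15, 0.20), nrow=2, byrow=TRUE)
sum(f_xy)                            # must equal 1
f_x <- rowSums(f_xy)                 # marginal PMF of X
f_y <- colSums(f_xy)                 # marginal PMF of Y
f_y_given_x0 <- f_xy[1, ] / f_x[1]   # conditional PMF of Y given X = 0
f_y_given_x0; sum(f_y_given_x0)      # a proper distribution (sums to 1)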
Review the many hands-on examples of constructing and validating Bayesian confidence intervals that are included in the SOCR General Confidence Intervals Activity.
Bayesian inference is based on probability theory as a consistent model for describing uncertainty using probability distributions. For instance, statistical inference frequently involves probability distribution models whose true parameters are unknown but may be estimated from observations. A prior distribution of the parameter (or parameters, if the model distribution has multiple parameters, as in \(Normal(\mu, \sigma^2)\)) is the initial belief about the parameter distribution before any data are observed. Of course, if there is no reason to believe in a certain distribution without any prior evidence, we can use a uniform prior. In both the Bayesian and frequentist inference approaches, the distribution of the data given the parameters (\(\theta\)) is based on the same underlying statistical model. However, the frequentist representation of this distribution is expressed as \(f^{freq}_{\theta}(x)\), emphasizing \(x\) as random (described by a probability distribution) and the parameter \(\theta\) as fixed. The corresponding Bayesian representation is written as \(f^{Bayes}(x | \theta)\), treating both \(X\) and \(\theta\) as random and conditioning on the parameter \(\theta\) as given (i.e., known). This conditional distribution describes the probability of observing \(X\) for a given fixed value of \(\theta\).
Thus, in the Bayesian framework, assuming a prior distribution, \(g()\), for the parameter \(\theta\), the joint = marginal times conditional, \(f(X, \theta) = g(\theta) \times f(x | \theta)\), describes the uncertainty about \(X\) and \(\theta\) before observing the data.
Taking a data sample, \(X = x\), the conditional distribution of the parameter \(\theta\) given \(X=x\) can be computed, \(f(\theta | x) = \frac{f(x, \theta)}{f_X(x)}\), and is called the posterior distribution, representing the probability distribution describing the post-evidence uncertainty in the parameter \(\theta\) after taking a data sample \(x\).
Both the (marginal) prior and the (conditional) posterior distributions capture the probability of the parameter of interest. The prior is the initial distribution before seeing any data and the posterior the probability of the same parameter after observing the data. The Bayesian rule or Bayesian theorem provides a mechanism of inverting the conditioning and computing one conditional posterior probability from the other, e.g., having \(f(x | \theta)\), we can compute the conditional posterior probability of \(\theta\) given \(x\):
\[f(\theta |x) = \frac{g(\theta)\times f(x | \theta)}{f_X(x)}.\]
Bayesian inference employs the Bayesian rule to compute the conditional posterior probability from the joint distribution model and the prior belief.
The numerator on the right-hand side above represents an unnormalized posterior distribution, \(f(\theta |x) \propto g(\theta)\times f(x | \theta)\). In other words, \(\text{unnormalized posterior}=\text{unnormalized prior} \times \text{likelihood}\).
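As a simple, hedged illustration of this relation (a hypothetical binomial example with a flat prior, separate from the clinical case-study used later), a grid approximation in R shows how the unnormalized posterior, prior times likelihood, is rescaled into a proper posterior density.

# Grid approximation: theta = success probability, data = 7 successes out of n = 10
theta <- seq(0.001, 0.999, length.out = 999)
prior <- rep(1, length(theta))                  # flat (uniform) prior, unnormalized
likelihood <- dbinom(7, size = 10, prob = theta)
unnorm_post <- prior * likelihood               # unnormalized posterior
posterior <- unnorm_post / sum(unnorm_post * diff(theta)[1])  # normalize on the grid
theta[which.max(posterior)]                     # posterior mode, near 0.7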
Bayesian inference suggests unique answers for each problem; each problem yields a single posterior distribution. On the other hand, frequentist inference typically provides ranges of answers, e.g., confidence intervals at certain confidence levels. Bayesian inference is subjective, as it heavily depends on prior distributions, which may be difficult to justify objectively. Frequentist inference methods (e.g., parametric hypothesis tests) tend to be simple to compute from data samples (e.g., using sampling distributions). Bayesian inference is often much more difficult because the posterior, which combines the likelihood \(f(x | \theta)\) and the prior, is frequently analytically intractable. Hence, MCMC methods were developed as stochastic approaches to approximate these analytically intractable quantities. MCMC transforms analytic challenges into computational problems that are tractable using programming, storage, and compute infrastructure.
Frequentist inference treats parameters as fixed and data as random, even after it is observed (think of the sampling distribution of the mean), whereas Bayesian inference treats both data and parameters as random until data is observed, and then treats the data as fixed to compute the conditional posterior probability. Yet, the two inference strategies tend to agree for large sample sizes.
Suppose \(\hat{\theta}_n\) and \(\theta_o\) denote the maximum likelihood estimate (MLE) and the true unknown parameter value, respectively. Frequentist inference treats \(\hat{\theta}_n\) as random and \(\theta_o\) as fixed and computes the asymptotic distribution: \[\hat{\theta}_n - \theta_o \sim Normal\left(0, I_n(\theta_o)^{- 1} \right),\] where the reciprocal of the Fisher information, \(I_n(\theta)\), is related to the lower bound of the Cramer-Rao inequality, see this paper.
This result may not be directly usable, as frequentist inference has no access to \(\theta_o\). However, plugging the estimate \(\hat{\theta}_n\) in for \(\theta_o\) in the Fisher information does yield a useful result: \[\hat{\theta}_n - \theta_o \sim Normal\left(0, I_n(\hat{\theta}_n)^{- 1} \right).\]
A corresponding Bayesian inference treats \(\theta_o\) as random and \(\hat{\theta}_n\) as fixed, since it is no longer random once the data are observed. Both inference strategies agree that \(\hat{\theta}_n - \theta_o\) is random, as a function of a random variable is itself a random variable, and its asymptotic distribution does not depend on the prior distribution, under regularity conditions (e.g., a prior that is smooth and strictly positive in a neighborhood of the true unknown parameter value).
In practice, having large amounts of data suggests that small perturbations of the prior don’t substantially alter the posterior probability or final inference.
The main downside of Bayesian inference is that two Bayesian scientists that look at the same data could arrive at different conclusions based on their prior values or prior beliefs.
To mitigate the Bayesian subjectivity, we can examine how priors behave under a change of variables. Suppose the prior distribution for the parameter \(\theta\) is \(g\), and consider another parameter \(\phi = h(\theta)\), where \(h\) is an invertible function with inverse \(h^{-1}\). The change-of-variables theorem for integration implies that: \[d \theta = \left| \frac{d\, h^{-1}(\phi)}{d \phi} \right| d \phi .\]
Thus, \[P(\phi \in A) = \int_{h(\theta) \in A} {g(\theta) \, d \theta } = \int_A {g(h^{-1}(\phi)) \left| \frac{d\, h^{-1}(\phi)}{d \phi} \right| \ d \phi} .\]
Therefore, the induced prior for \(\phi\) is \(g(h^{-1} (\phi)) \left| \frac{d\, h^{-1}(\phi)}{d \phi} \right|\).
This suggests that a flat uninformative (diffuse) prior for one parameter may transform to a non-flat, or informative, prior for another parameter.
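A quick hedged simulation illustrates this point: a flat prior for \(\theta\) on \((0,1)\) induces a decidedly non-flat (logistic-shaped) prior for \(\phi = {\rm logit}(\theta)\). The choice of the logit transform here is just an illustrative assumption.

# Flat prior on theta in (0,1) transformed by phi = logit(theta)
set.seed(1234)
theta <- runif(100000)                       # draws from the flat prior
phi <- log(theta / (1 - theta))              # phi = logit(theta)
hist(phi, breaks = 100, freq = FALSE,
     main = "Induced (non-flat) prior for phi = logit(theta)")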
Monte Carlo simulations may be used to estimate probabilities (likelihoods) or expectations (integrals) involving functions that may be difficult to process analytically (i.e., to compute a closed-form analytical expression of the desired solutions). Probabilities are special cases of expectations, since each probability can be expressed as the expectation of an indicator function \(g\): \(E(g(X)) = P( g(X) = 1)\). Let's assume we want to compute \(\theta = E(g(X)) = \int_{\mathbb{R}} {g(x)P_X(x)dx}\), but the integral can't be exactly solved using simple analytical terms.
Instead, we can employ Monte Carlo simulation to estimate \(\theta\). Let’s generate a random sample of observations \(\{ X_1, X_2, \cdots \}\), independent and identically distributed (IID). Then, for each iteration \(n\), the average of the random variables is: \[\hat{\theta}_n = \frac{1}{n} \sum_{i = 1}^n g(X_i).\]
If \(Y_i = g(X_i)\), then \(\hat{\theta}_n\) can be written explicitly as a sample arithmetic average: \[\overline{Y}_n = \frac{1}{n} \sum_{i = 1}^n {Y_i}\]
The law of large numbers (LLN) implies that \(\hat{\theta}_n \longrightarrow \theta\) as \(n \longrightarrow \infty\).
Having a sufficiently large simulation sample would allow us to calculate a good estimate of \(\theta\).
By the central limit theorem (CLT): \[\hat{\theta}_n \sim Normal\left( \theta, \frac{\sigma^2}{n} \right),\] as \(n\longrightarrow \infty\) and \(\sigma^2 = \mathop{\rm var}( g(X))\) represents the population variance.
Neither \(\theta\) nor \(\sigma\) are known. However, we can use the sample variance as an estimate \[\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i = 1}^n \left ( g(X_i) - \hat{\theta}_n \right )^2. \]
Thus, \[\hat{\theta}_n \sim Normal\left( \theta, \frac{\hat{\sigma}^2}{n} \right). \]
Let’s construct a 95% confidence interval for \(\theta\): \[\hat{\theta}_n \pm 1.96 \times \frac{\hat{\sigma}_n}{\sqrt{n}}. \]
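The following R sketch puts these pieces together, assuming (for illustration only) \(X\sim N(0,1)\) and \(g(x)=e^{x}\), so that the true value is \(\theta=E(e^X)=e^{1/2}\approx 1.6487\): the MC point estimate, the estimated standard deviation, and the 95% confidence interval.

# Monte Carlo estimate of theta = E(g(X)) with a CLT-based 95% CI
set.seed(1234)
n <- 100000
g_x <- exp(rnorm(n))                       # g(X_i) = exp(X_i), X_i ~ N(0,1)
theta_hat <- mean(g_x)                     # point estimate of theta
sigma_hat <- sd(g_x)                       # estimate of sd(g(X))
c(theta_hat - 1.96 * sigma_hat / sqrt(n),  # 95% CI; true theta = exp(0.5) ~ 1.6487
  theta_hat + 1.96 * sigma_hat / sqrt(n))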
Let’s now go back to Markov chains, sequences \(\{X_1\), \(X_2\), \(\ldots \}\) of random variables obeying the Markov property that the conditional distribution of \(X_{n + 1}\) given the entire transition history back to the initial time actually only depends on the current state \(X_n\): \[f(x_{n + 1} | x_1, x_2, \cdots, x_n) = f(x_{n + 1} | x_n).\]
Referring back to our 5-state cancer staging Markov process, as the conditional distribution does not depend on \(n\), we have stationary transition probabilities. If \(g\) represents the distribution of the chain at the initial state, the factorization \(\text{joint} = \text{marginal} \times \text{conditional}\), i.e., \(f(x, y) = f_X(x)\times f(y | x)\), implies that
\[\begin{align*} f(x_1, x_2) &= g(x_1) f(x_2 | x_1) \\ f(x_1, x_2, x_3) &= g(x_1) f(x_3 | x_2, x_1) f(x_2 | x_1) \\ f(x_1, x_2, x_3, x_4) &= g(x_1) f(x_4 | x_3 , x_2, x_1) f(x_3 | x_2, x_1) f(x_2 |x_1) \\ f(x_1, x_2, x_3, x_4, x_5) &= g(x_1) f(x_5, x_4, x_3, x_2 | x_1) \\ & = g(x_1) f(x_5 | x_4, x_3, x_2, x_1) f(x_4 | x_3, x_2, x_1) f(x_3 | x_2, x_1) f(x_2 | x_1). \end{align*} \]
All of the Markov chain probabilities are computed by multiplying the initial state distribution \(g\) and the conditional transition probability distribution \(f(\,\cdot\, | \,\cdot\,)\).
The marginal distributions of the different variables can also be estimated by: \[\begin{align*} f(x_2) & = \int f(x_1, x_2) \, d x_1 \\ & = \int f(x_2 | x_1) g(x_1) \, d x_1 \\ f(x_3) & = \iint f(x_1, x_2, x_3) \, d x_1 \, d x_2 \\ & = \iint f(x_3 | x_2) f(x_2 | x_1) g(x_1) \, d x_1 \, d x_2 \\ f(x_4) & = \iiint f(x_1, x_2, x_3, x_4) \, d x_1 \, d x_2 \, d x_3 \\ & = \iiint f(x_4 | x_3) f(x_3 \mid x_2) f(x_2 \mid x_1) g(x_1) \, d x_1 \, d x_2 \, d x_3 \\ f(x_5) & = \iiiint f(x_1, x_2, x_3, x_4, x_5) \, d x_1 \, d x_2 \, d x_3 \, d x_4 \\ & = \iiiint f(x_5 | x_4) f(x_4 | x_3) f(x_3 | x_2) f(x_2 | x_1) g(x_1) \, d x_1 \, d x_2 \, d x_3\, d x_4 \end{align*}\]
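For the discrete 5-state cancer-staging chain, these marginal distributions reduce to repeated matrix-vector products. Here is a minimal hedged sketch, reusing the hypothetical transition matrix P from the earlier code and an assumed initial distribution g concentrated at Stage_0.

# Marginal state distributions of the discrete chain: f(x_t) = g %*% P^(t-1)
# (P is the hypothetical 5x5 transition matrix defined in the earlier sketch)
g <- c(1, 0, 0, 0, 0)           # assumed initial distribution: start in Stage_0
marg <- g
for (t in 2:5) {
  marg <- marg %*% P            # one-step update of the marginal distribution
  print(round(drop(marg), 3))   # distribution of X_t over the five stages
}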
An ergodic law, ergodicity, or ergodic theory represents the idea that for certain dynamical systems, like Markov chains, the time average of their properties is equal to the average over the entire state space. In physics, ergodic methods are used to study geodesic flows on Riemannian manifolds, and in mathematics there are ergodic applications in harmonic analysis, Lie group representations, the theory of Diophantine approximations, etc.
Ergodic theory characterizes the behavior of a dynamical system that runs indefinitely. The Poincaré recurrence theorem suggests that, in the long run, almost all points in any subset of the phase space are eventually revisited. In general, ergodic theorems determine the conditions required to guarantee the almost certain existence of the time average of a function along the trajectories and its relation to the space average. For example, for one special class of ergodic systems, this time average is statistically the same for almost all initial points, i.e., the evolving dynamical system forgets its initial state.
One special case of the ergodic law for Markov chains states that the ergodic average still converges, \(\hat{\theta}_n \longrightarrow \theta\) as \(n \longrightarrow \infty\), even though the chain samples are dependent.
The CLT suggests that for sufficiently large \(b<<n\), we have \[\hat{\theta}_b \approx Normal\left( \theta, \frac{\sigma^2}{b} \right) .\] When \(n\) is a multiple of \(b\), i.e., \(m\times b = n\), for an integer \(m\), we can divide the Markov chain into \(m\) batches: \[ X_{m, 1}, X_{m, 2}, \ldots, X_{m, b}, \ \forall\ m=1, 2, 3, \cdots, \frac{n}{b}.\] Computing the batch means yields: \[\hat{\theta}_{b, m} = \frac{1}{b} \sum_{i = 1}^b {X_{m, i}}. \] Again, assume that \(b\) is sufficiently large, then we have \[\hat{\theta}_{b, m} \sim Normal\left( \theta, \frac{\sigma^2}{b} \right) . \] Therefore, we can use the batches and their batch means, \(\hat{\theta}_{b, m}\), to obtain Bayesian simulation based estimates of the unknown process variance, \(\frac{\sigma^2}{b}\), using the sample variance statistic:
\[\frac{1}{m} \sum_{j = 1}^m ( \hat{\theta}_{b, j} - \hat{\theta}_n )^2. \] As the last expression estimates \(\sigma^2 / b\), we can solve for the numerator and estimate the variance by: \[\hat{\sigma}^2_{b, m} = \frac{b}{m} \sum_{j = 1}^m ( \hat{\theta}_{b, j} - \hat{\theta}_n )^2.\] Therefore, for sufficiently large \(b\) and \(m\), the MCMC simulation yields a good estimator of \(\sigma^2\).
Similarly, we can perform Bayesian inference on the parameter \(\theta\) as well as compute confidence intervals for \(\theta\). For instance, when we run the MCMC simulation long enough (large sample), we get large \(b\) and \(m\) such that \(n = b m\), the expression below represents a 95% confidence interval for \(\theta\): \[\hat{\theta}_n \pm 1.96 \times \frac{\hat{\sigma}_{b, m}}{\sqrt{n}}. \]
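To make the batch-means recipe concrete, here is a minimal hedged R sketch. It uses a simulated AR(1) series as a stand-in for an MCMC sample (an assumption for illustration only); the mcmc::metrop function used later returns batch means directly.

# Batch-means Monte Carlo standard error for a correlated (AR(1)) chain
set.seed(1234)
n <- 100000; b <- 1000; m <- n / b            # n = m*b samples, m batches of length b
x <- as.numeric(arima.sim(list(ar = 0.5), n)) # a stationary, correlated chain
theta_hat   <- mean(x)                        # overall (ergodic) average
batch_means <- colMeans(matrix(x, nrow = b))  # the m batch means
sigma2_hat  <- b * mean((batch_means - theta_hat)^2)  # estimate of sigma^2
theta_hat + c(-1, 1) * 1.96 * sqrt(sigma2_hat / n)    # 95% CI for theta (true value 0)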
Some additional details and demonstrations are provided in the SOCR Bayesian Estimator.
Remember that in Chapter 21, we discussed the importance of randomly sampling from diverse arrays of probability distributions. The Metropolis random walk algorithm provides a protocol for simulating data from the distribution of a continuous random vector with a given (possibly unnormalized) PDF \(h\). A sketch of the Metropolis algorithm is provided below.
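The following minimal R sketch is a hedged illustration of the algorithm's steps, not the mcmc package implementation used later; a standard normal density serves as a stand-in for the target \(h\), and sigma is an assumed proposal standard deviation.

# Random-walk Metropolis sketch for a univariate unnormalized density h
set.seed(1234)
h <- function(x) exp(-x^2 / 2)       # stand-in unnormalized target density
sigma <- 1                           # proposal standard deviation (tuning parameter)
n_iter <- 10000
x <- numeric(n_iter); x[1] <- 0      # step 1: initialize the chain
for (t in 1:(n_iter - 1)) {
  y <- rnorm(1, mean = 0, sd = sigma)         # step 2: propose a random step y
  r <- h(x[t] + y) / h(x[t])                  # step 3: odds ratio r = h(x+y)/h(x)
  u <- runif(1)                               # step 4: accept the move if u < r,
  x[t + 1] <- if (u < r) x[t] + y else x[t]   #         otherwise stay at x
}
mean(x); sd(x)   # should be close to 0 and 1 for the standard normal target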
In the Metropolis algorithm, we can only adjust the proposal variance, \(\Sigma\), which in the scalar version corresponds to \(\sigma^2\). Too small a variance yields MCMC chains that take tiny steps forward and require a very long time to equilibrate. On the flip side, a large variance forces the algorithm to propose very large steps, \(y_n\), most of which may be discarded (not accepted or used) in the last step because the ratio \(r\) calculated in step 3 may be too small.
This process generates a positively recurrent Markov chain that obeys the ergodic LLN law and has \(h\) as its equilibrium distribution. This algorithm works for any distribution function \(h\geq 0\).
The R function mcmc::metrop implements the random-walk Metropolis algorithm. It requires a predefined R function that computes \(\log (h(x))\); in the degenerate case of \(h(x) = 0\) for some \(x\), this function should return \(\log h(x) = - \infty\). The metrop function parameters include a scale parameter (scalar, vector, or matrix) and an output function, \(g\), coded as an R function. The scale argument to metrop specifies the multivariate normal proposal step \(y\) in the Metropolis algorithm: if z is a standard multivariate normal random vector, then \(y\) in the algorithm is \(scale \times z\) (when scale is a scalar or vector) and \(scale\ \%*\%\ z\) (when scale is a matrix of appropriate dimensions).
The acceptance rate represents the proportion of the time that \(u < r\) in step 4 of the algorithm, e.g., acceptance = 20%. When the proposal scale is too small, the acceptance rate may be close to one, but the chain takes tiny steps and may need a very long (execution) time to transition from one part of the sample space to another. For large scale parameters, the algorithm takes large steps: when \(h(x)\) is moderately large (i.e., \(x\) resembles a sample from the equilibrium distribution), the proposal \(x + y\) will tend to land far out in the tails of the distribution, corresponding to a very small \(r = \frac{h(x + y)}{h(x)}\) and frequent rejections.
The Metropolis algorithm only requires an unnormalized density, so for Bayesian inference \(h\) can be defined as the unnormalized posterior: \[\text{unnormalized posterior} \propto \text{likelihood} \times \text{unnormalized prior}. \]
MCMC-based Bayesian inference is always possible when we can provide specific functions that evaluate the likelihood and unnormalized posterior.
We will demonstrate frequentist and Bayesian inference using a clinical case-study of hospital admissions; N = 58,863 (cases) and k = 9 (features), including AdmissionLengthDays, Death_1 (binary survival outcome), Admission_Type, Insurance_Type, EnglishLanguage_1, Religion_Type, Married_1, Race, and Dx (a free-text diagnosis field).
First, we will load and inspect the data.
hosAdmissions <- read.csv("https://umich.instructure.com/files/5005926/download?download_frd=1")
head(hosAdmissions)
## ID AdmissionLengthDays Death_1 Admission_Type Insurance_Type
## 1 42862 4.277778 0 elective private
## 2 5967 26.122917 0 elective private
## 3 50952 7.028472 1 emergency medicare
## 4 17138 34.187500 0 elective medicare
## 5 27703 6.165972 0 elective government
## 6 29593 3.286806 0 elective medicare
## EnglishLanguage_1 Religion_Type Married_1 Race
## 1 1 episcopalian 1 white
## 2 0 catholic 1 white
## 3 1 catholic 0 asian - japanese
## 4 1 catholic 1 white
## 5 0 protestant quaker 0 black/african american
## 6 0 christian scientist 0 white
## Dx
## 1 mitral stenosis\\mitral valve replacement /sda
## 2 duodenal mass/sda
## 3 ? serotonin syndrome
## 4 abdominal aortic aneurysm/sda
## 5 absence/sda
## 6 adenoid cystic carcinoma/sda
# remove the case ID:
hosAdmissions <- hosAdmissions[, -1]
# Clean the diagnostic variable (free text feature!)
preproc_fun = function(x) # text data
{ require("tm")
x = gsub("<.*?>", " ", x) # regex removing HTML tags
x = iconv(x, "latin1", "ASCII", sub="") # remove non-ASCII characters
x = gsub("[^[:alnum:]]", " ", x) # remove non-alpha-numeric values
x = tolower(x) # convert to lowercase characters
# x = removeNumbers(x) # removing numbers
x = stripWhitespace(x) # removing white space
x = gsub("^\\s+|\\s+$", "", x) # remove leading and trailing white space
return(x)
}
# Clean the DX feature by itself
Dx11 <- preproc_fun(hosAdmissions$Dx); hosAdmissions$Dx <- Dx11
## Loading required package: tm
## Loading required package: NLP
Each row of this data frame represents one hospital admission record. The response variable, \(y\) (random outcome), is a binary survival outcome (1 for death, 0 for survival). The frequentist inference will be based on logistic regression modeling.
## [1] 0
## [1] 58863 9
Let’s fit a logistic prediction model for the clinical outcome \(y\) representing the binary variable Death_1.
# model_glm <- glm(y ~ ., data = hosAdmissions, family = binomial)
model_glm <- glm(y ~ AdmissionLengthDays + Married_1 + Insurance_Type + Admission_Type, data = hosAdmissions, family = binomial)
summary(model_glm)
##
## Call:
## glm(formula = y ~ AdmissionLengthDays + Married_1 + Insurance_Type +
## Admission_Type, family = binomial, data = hosAdmissions)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.326120 0.131779 -32.829 < 2e-16 ***
## AdmissionLengthDays 0.001042 0.001273 0.819 0.412896
## Married_1 -0.003940 0.029010 -0.136 0.891965
## Insurance_Typemedicaid 0.274587 0.121530 2.259 0.023857 *
## Insurance_Typemedicare 0.920552 0.110573 8.325 < 2e-16 ***
## Insurance_Typeprivate 0.382308 0.113017 3.383 0.000718 ***
## Insurance_Typeself pay 1.017459 0.156722 6.492 8.46e-11 ***
## Admission_Typeemergency 1.707001 0.073893 23.101 < 2e-16 ***
## Admission_Typenewborn -0.875436 0.148996 -5.876 4.21e-09 ***
## Admission_Typeurgent 1.657608 0.111239 14.901 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 38120 on 58862 degrees of freedom
## Residual deviance: 35470 on 58853 degrees of freedom
## AIC: 35490
##
## Number of Fisher Scoring iterations: 7
Stratify the population, say on Race=white, refit the logistic model, and report the results.
model_glm_white <- glm(y ~ AdmissionLengthDays + as.factor(Married_1) + as.factor(Insurance_Type) + as.factor(Admission_Type), data = hosAdmissions, family = binomial, subset = hosAdmissions$Race == "white")
summary(model_glm_white)
##
## Call:
## glm(formula = y ~ AdmissionLengthDays + as.factor(Married_1) +
## as.factor(Insurance_Type) + as.factor(Admission_Type), family = binomial,
## data = hosAdmissions, subset = hosAdmissions$Race == "white")
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.408236 0.189887 -23.215 < 2e-16 ***
## AdmissionLengthDays 0.002208 0.001518 1.455 0.145737
## as.factor(Married_1)1 0.033884 0.034740 0.975 0.329387
## as.factor(Insurance_Type)medicaid 0.298104 0.186195 1.601 0.109370
## as.factor(Insurance_Type)medicare 0.972134 0.171160 5.680 1.35e-08 ***
## as.factor(Insurance_Type)private 0.364072 0.173671 2.096 0.036052 *
## as.factor(Insurance_Type)self pay 0.854595 0.245315 3.484 0.000495 ***
## as.factor(Admission_Type)emergency 1.702102 0.084834 20.064 < 2e-16 ***
## as.factor(Admission_Type)newborn -0.825623 0.188062 -4.390 1.13e-05 ***
## as.factor(Admission_Type)urgent 1.479738 0.149358 9.907 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 26399 on 40926 degrees of freedom
## Residual deviance: 24603 on 40917 degrees of freedom
## AIC: 24623
##
## Number of Fisher Scoring iterations: 7
Report the variance-covariance matrix of the estimates of the model coefficients:
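The calls generating the two outputs below are not echoed in the text; a plausible reconstruction (an assumption, based on the printed results) is:

# (Assumed) calls that produce the two outputs below
summary(model_glm_white)$coefficients   # coefficient estimates, SEs, z values, p values
vcov(model_glm_white)                   # variance-covariance matrix of the estimates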
## Estimate Std. Error z value
## (Intercept) -4.408236433 0.189887179 -23.2150293
## AdmissionLengthDays 0.002208128 0.001517869 1.4547558
## as.factor(Married_1)1 0.033884030 0.034740425 0.9753488
## as.factor(Insurance_Type)medicaid 0.298103910 0.186195000 1.6010307
## as.factor(Insurance_Type)medicare 0.972133761 0.171159880 5.6796824
## as.factor(Insurance_Type)private 0.364071672 0.173670514 2.0963355
## as.factor(Insurance_Type)self pay 0.854595497 0.245315401 3.4836602
## as.factor(Admission_Type)emergency 1.702102077 0.084834009 20.0639118
## as.factor(Admission_Type)newborn -0.825622895 0.188062297 -4.3901564
## as.factor(Admission_Type)urgent 1.479737666 0.149358060 9.9073171
## Pr(>|z|)
## (Intercept) 3.210399e-119
## AdmissionLengthDays 1.457369e-01
## as.factor(Married_1)1 3.293873e-01
## as.factor(Insurance_Type)medicaid 1.093701e-01
## as.factor(Insurance_Type)medicare 1.349451e-08
## as.factor(Insurance_Type)private 3.605244e-02
## as.factor(Insurance_Type)self pay 4.946070e-04
## as.factor(Admission_Type)emergency 1.525943e-89
## as.factor(Admission_Type)newborn 1.132692e-05
## as.factor(Admission_Type)urgent 3.868958e-23
## (Intercept) AdmissionLengthDays
## (Intercept) 3.605714e-02 -2.153387e-05
## AdmissionLengthDays -2.153387e-05 2.303925e-06
## as.factor(Married_1)1 -4.616676e-04 -3.524855e-07
## as.factor(Insurance_Type)medicaid -2.882354e-02 -3.075147e-06
## as.factor(Insurance_Type)medicare -2.886613e-02 6.731918e-07
## as.factor(Insurance_Type)private -2.886567e-02 -5.812371e-07
## as.factor(Insurance_Type)self pay -2.883529e-02 8.082561e-06
## as.factor(Admission_Type)emergency -7.045865e-03 -2.026943e-06
## as.factor(Admission_Type)newborn -7.298670e-03 -5.844913e-06
## as.factor(Admission_Type)urgent -6.945212e-03 -8.437119e-06
## as.factor(Married_1)1
## (Intercept) -4.616676e-04
## AdmissionLengthDays -3.524855e-07
## as.factor(Married_1)1 1.206897e-03
## as.factor(Insurance_Type)medicaid 9.691941e-05
## as.factor(Insurance_Type)medicare -2.370309e-04
## as.factor(Insurance_Type)private -4.300747e-04
## as.factor(Insurance_Type)self pay 5.090034e-05
## as.factor(Admission_Type)emergency 1.559534e-04
## as.factor(Admission_Type)newborn 8.603678e-04
## as.factor(Admission_Type)urgent 6.969787e-05
## as.factor(Insurance_Type)medicaid
## (Intercept) -2.882354e-02
## AdmissionLengthDays -3.075147e-06
## as.factor(Married_1)1 9.691941e-05
## as.factor(Insurance_Type)medicaid 3.466858e-02
## as.factor(Insurance_Type)medicare 2.880777e-02
## as.factor(Insurance_Type)private 2.879171e-02
## as.factor(Insurance_Type)self pay 2.882194e-02
## as.factor(Admission_Type)emergency 1.063638e-06
## as.factor(Admission_Type)newborn 1.113055e-04
## as.factor(Admission_Type)urgent 6.101094e-05
## as.factor(Insurance_Type)medicare
## (Intercept) -2.886613e-02
## AdmissionLengthDays 6.731918e-07
## as.factor(Married_1)1 -2.370309e-04
## as.factor(Insurance_Type)medicaid 2.880777e-02
## as.factor(Insurance_Type)medicare 2.929570e-02
## as.factor(Insurance_Type)private 2.890530e-02
## as.factor(Insurance_Type)self pay 2.882195e-02
## as.factor(Admission_Type)emergency 9.022303e-05
## as.factor(Admission_Type)newborn 3.258590e-04
## as.factor(Admission_Type)urgent 1.185816e-04
## as.factor(Insurance_Type)private
## (Intercept) -2.886567e-02
## AdmissionLengthDays -5.812371e-07
## as.factor(Married_1)1 -4.300747e-04
## as.factor(Insurance_Type)medicaid 2.879171e-02
## as.factor(Insurance_Type)medicare 2.890530e-02
## as.factor(Insurance_Type)private 3.016145e-02
## as.factor(Insurance_Type)self pay 2.879496e-02
## as.factor(Admission_Type)emergency 1.708531e-04
## as.factor(Admission_Type)newborn -8.262959e-04
## as.factor(Admission_Type)urgent 1.467773e-04
## as.factor(Insurance_Type)self pay
## (Intercept) -2.883529e-02
## AdmissionLengthDays 8.082561e-06
## as.factor(Married_1)1 5.090034e-05
## as.factor(Insurance_Type)medicaid 2.882194e-02
## as.factor(Insurance_Type)medicare 2.882195e-02
## as.factor(Insurance_Type)private 2.879496e-02
## as.factor(Insurance_Type)self pay 6.017965e-02
## as.factor(Admission_Type)emergency -9.802637e-05
## as.factor(Admission_Type)newborn 2.327653e-04
## as.factor(Admission_Type)urgent 7.672538e-05
## as.factor(Admission_Type)emergency
## (Intercept) -7.045865e-03
## AdmissionLengthDays -2.026943e-06
## as.factor(Married_1)1 1.559534e-04
## as.factor(Insurance_Type)medicaid 1.063638e-06
## as.factor(Insurance_Type)medicare 9.022303e-05
## as.factor(Insurance_Type)private 1.708531e-04
## as.factor(Insurance_Type)self pay -9.802637e-05
## as.factor(Admission_Type)emergency 7.196809e-03
## as.factor(Admission_Type)newborn 6.911878e-03
## as.factor(Admission_Type)urgent 6.877517e-03
## as.factor(Admission_Type)newborn
## (Intercept) -7.298670e-03
## AdmissionLengthDays -5.844913e-06
## as.factor(Married_1)1 8.603678e-04
## as.factor(Insurance_Type)medicaid 1.113055e-04
## as.factor(Insurance_Type)medicare 3.258590e-04
## as.factor(Insurance_Type)private -8.262959e-04
## as.factor(Insurance_Type)self pay 2.327653e-04
## as.factor(Admission_Type)emergency 6.911878e-03
## as.factor(Admission_Type)newborn 3.536743e-02
## as.factor(Admission_Type)urgent 6.907991e-03
## as.factor(Admission_Type)urgent
## (Intercept) -6.945212e-03
## AdmissionLengthDays -8.437119e-06
## as.factor(Married_1)1 6.969787e-05
## as.factor(Insurance_Type)medicaid 6.101094e-05
## as.factor(Insurance_Type)medicare 1.185816e-04
## as.factor(Insurance_Type)private 1.467773e-04
## as.factor(Insurance_Type)self pay 7.672538e-05
## as.factor(Admission_Type)emergency 6.877517e-03
## as.factor(Admission_Type)newborn 6.907991e-03
## as.factor(Admission_Type)urgent 2.230783e-02
To perform the analogous MCMC Bayesian inference, we will use the R function mcmc::metrop, which takes as input the log unnormalized posterior probability distribution (from which we need to generate MCMC samples). Let's rewrite the Bayesian rule (\(\text{unnormalized posterior} \propto \text{likelihood} \times \text{unnormalized prior}\)) in terms of logs: \[\text{log unnormalized posterior} = \text{log likelihood} + \text{log unnormalized prior}. \] Let's examine the two terms on the right-hand side.
The frequentist and Bayesian analyses share the same (log) likelihood function. Thus, the frequentist model matrix can be reused in the Bayesian analysis.
hosAdmissions$Married_1 <- as.factor(hosAdmissions$Married_1)
hosAdmissions$Insurance_Type <- as.factor(hosAdmissions$Insurance_Type)
hosAdmissions$Admission_Type <- as.factor(hosAdmissions$Admission_Type)
model_glm <- glm(y ~ AdmissionLengthDays + Married_1 + Insurance_Type + Admission_Type,
data = hosAdmissions, family = binomial, x=TRUE) # return the design matrix X
design_mat <- model_glm$x
response <- model_glm$y
We need to define a function that computes the log likelihood, \(\log(f(x | \beta))\).
log_likelihood <- function(beta) {
# housekeeping
stopifnot(is.numeric(beta))
stopifnot(is.finite(beta))
stopifnot(length(beta) == ncol(design_mat))
# Compute the linear predictor eta = X %*% beta
eta <- drop(design_mat %*% beta)
# use log1p to calculate the function x --> log(1+x)
# correctly for small x, avoiding singularity problems
logp <- ifelse(eta < 0, eta - log1p(exp(eta)), - log1p(exp(-eta)))
logq <- ifelse(eta < 0, -log1p(exp(eta)), -eta -log1p(exp(-eta)))
sum(ifelse(response == 1, logp, logq))
}
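As a quick, hedged sanity check (assuming the glm fit above), the function can be evaluated at the frequentist MLE and compared against the model's reported log-likelihood; for a binary logistic regression the two should agree.

# Sanity check: the hand-coded log likelihood at the MLE should match logLik(model_glm)
log_likelihood(coef(model_glm))      # hand-coded Bernoulli log likelihood at the MLE
as.numeric(logLik(model_glm))        # glm's reported maximized log likelihood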
The reason for computing log likelihoods, \(\log(f(x | \beta))\), instead of the raw likelihoods, \(f(x | \beta)\), is that it makes the numerical calculations more stable (avoiding overflow and underflow). For instance, let \(\eta = X\times \beta\), \(p=\frac{\exp{(\eta)}}{1+\exp{(\eta)}}\), and \(q=1-p=\frac{1}{1+\exp{(\eta)}}\). Then, \(\log(p)=\eta-\log(1+\exp{(\eta)})\) and \(\log(q)=-\log(1+\exp{(\eta)})\) are computable even for large \(|\eta|\), either positive or negative.
The demo vignette of the mcmc package (vignette("demo", "mcmc")) includes additional examples.
Ideally, we would choose informative Bayesian priors, but we also don't want to introduce strong subjectivity. (Conjugate priors assume the prior and posterior distributions are of the same probability distribution family.) Here, we place independent priors on the regression coefficients, with unnormalized joint prior density \[\prod_{i = 1}^{k} p(\beta_i) q(\beta_i),\] where \(p(\beta) = {\rm logit}^{- 1}(\beta)\) is the mean-value parameter corresponding to the linear predictor \(\beta\) and \(q(\beta) = 1 - p(\beta)\); each factor \(p(\beta_i)q(\beta_i)\) is the density of a standard logistic distribution. The same \(p\) and \(q\) functions are used to calculate the log likelihood for various \(\beta\) values, see log_likelihood.
# log unnormalized prior: sum over coefficients of log(p(beta_i)) + log(q(beta_i))
log_unnormalized_prior <- function(beta) {
stopifnot(is.numeric(beta))
stopifnot(is.finite(beta))
stopifnot(length(beta) == ncol(design_mat))
logp <- ifelse(beta < 0, beta - log1p(exp(beta)), - log1p(exp(- beta)))
logq <- ifelse(beta < 0, - log1p(exp(beta)), - beta - log1p(exp(- beta)))
sum(logp) + sum(logq)
}
log_unnormalized_posterior <- function(beta) {
log_likelihood(beta) + log_unnormalized_prior(beta)
}
We are ready to run the MCMC Bayesian inference.
# install.packages("mcmc")
library(mcmc)
set.seed(1234)
mcmc_out <- metrop(log_unnormalized_posterior, rep(0, ncol(design_mat)), nbatch = 100, blen = 100)
# Runs a "random-walk" Metropolis algorithm with multivariate normal proposal
# Generates a Markov chain (MC) with equilibrium distribution having the
# specified unnormalized density
mcmc_out$accept
## [1] 0.3907
The main output, mcmc_out, of the MCMC call, metrop, includes: the acceptance rate (accept); the matrix of batch means (batch), whose column dimension is the dimension of the result of outfun when outfun is a function, or the dimension of the state when outfun is missing; the final state of the chain; and the running time of the simulation, as returned by system.time().
Sometimes, we need to adjust the proposal (scale) to tune the acceptance rate (mcmc_out$accept). In general, an acceptance rate of \(\sim 20\%\) is good for random-walk Metropolis. An acceptance rate that is too low may keep the sampler stuck for long stretches and prevent it from visiting parts of the state space in any reasonable number of iterations, while an acceptance rate that is too high means the chain takes tiny steps and explores the space very slowly. The acceptance rate reflects the proportion of the time that \(u < r\) in step 4 of the algorithm.
The metrop function accepts as its first argument the output of a previous run, and it reuses all the arguments of that run except for any that are modified in the new call. The initial state of the next run is the final state of the previous run, and the random number generator seed at the start of the next run is the seed at the end of the previous run. Thus, if no other arguments are modified, consecutive calls simply continue the same Markov chain.
## [1] 0.9311
## [1] 0.7905
## [1] 0.6642
## [1] 0.7219
## [1] -0.05332039 0.11673006
## attr(,"conf.level")
## [1] 0.95
## [1] 100
It may be appropriate to check that the batch length is long enough for the validity of this confidence interval for the true unknown acceptance rate.
Similar to diagnostic plots for linear models, analogous MCMC diagnostic plots may be drawn to confirm there are no obvious problems with the simulation estimates. For instance, we can display the time series of any component of the Markov chain output, e.g., the acceptance indicators or their batch means. Ideally, these time series should look stationary.
# Recall the actual model
# model_glm <- glm(y ~ AdmissionLengthDays + as.factor(Married_1) + as.factor(Insurance_Type) + as.factor(Admission_Type), data = hosAdmissions, family = binomial, x=TRUE)
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'ggplot2'
## The following object is masked from 'package:NLP':
##
## annotate
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:igraph':
##
## groups
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
# Batch Means for AdmissionLengthDays
#plot(ts(mcmc_out$batch[ , 1]), main="Batch Means for AdmissionLengthDays Coefficient")
plot_ly(x=c(1:length(ts(mcmc_out$batch[ , 1]))), y=ts(mcmc_out$batch[ , 1]),
type="scatter", mode="markers+lines") %>%
layout(title="Batch Means for AdmissionLengthDays Coefficient",
xaxis=list(title="time"), yaxis=list(title="mcmc_out"))
# Married_1
# plot(ts(mcmc_out$batch[ , 2]), main="Batch Means for Married_1 Coefficient")
plot_ly(x=c(1:length(ts(mcmc_out$batch[ , 2]))), y=ts(mcmc_out$batch[ , 2]),
type="scatter", mode="markers+lines") %>%
layout(title="Batch Means for Married_1 Coefficient",
xaxis=list(title="time"), yaxis=list(title="mcmc_out"))
Autocorrelation function (ACF) plots may also be used to examine the serial correlation of the time series of batch means. This internal correlation should not be significantly nonzero, except at lag zero; otherwise, the batch length may not be long enough.
It appears as if some of the low-lag autocorrelations may be significantly non-trivial, i.e., they exceed the blue dashed lines that mark the cutoff points for \(0.05\)-level tests of the null hypothesis that the true unknown autocorrelation is zero. To mitigate this problem, we can increase the batch length and re-plot the resulting autocorrelations.
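For example (a hedged sketch, assuming the mcmc_out object from the run above), the batch-means autocorrelations can be inspected directly and, if needed, the chain can be re-run with a longer batch length by passing the previous output back to metrop; the blen value below is an illustrative assumption.

# ACF of the batch means for the first output component
acf(ts(mcmc_out$batch[, 1]), main = "ACF of batch means (component 1)")
# If low-lag autocorrelations are non-negligible, increase the batch length and re-run
mcmc_out_longer <- metrop(mcmc_out, blen = 500)
acf(ts(mcmc_out_longer$batch[, 1]), main = "ACF of batch means, blen = 500")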
Are there examples of questions that may be answerable using Bayesian inference but are not within reach of frequentist inference? Questions about the probability of parameters only make sense in a Bayesian setting, as frequentist inference considers all parameters as fixed, not random.
Once we apply the Bayes rule and sample from the posterior probability distribution, we can examine functions of those parameters to answer specific probabilistic questions about them. To calculate the probabilities of specific events, we output the indicators of these events.
# Output function: indicator of which coefficient is currently the largest;
# its batch means estimate the posterior probability that each coefficient is the maximum
outputIndicatorFun <- function(beta) {
stopifnot(is.numeric(beta))
stopifnot(is.finite(beta))
stopifnot(length(beta) == ncol(design_mat))
as.numeric(beta == max(beta))
}
Let’s call the MCMC with a reference to the output indicator function.
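The exact call is not echoed in the text; a plausible (assumed) form, given the timing and acceptance output below, passes the previous run back to metrop together with the indicator output function and a longer batch length.

# (Assumed) call producing the timing and acceptance output below
system.time(
  mcmc_out <- metrop(mcmc_out, outfun = outputIndicatorFun, blen = 1000)
)
mcmc_out$accept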
## user system elapsed
## 0.95 0.00 2.04
## [1] 0.71766
Notice that increasing blen drives the computing time up; however, longer runs yield more precise answers.
## NULL
colnames(mcmc_out$batch) <- names(model_glm_white$coefficients)
# skip over the intercept column and deal with the remaining 10-1=9 coefficient estimates
mcmc_out_ts <- as.ts(mcmc_out$batch[ , 2:10])
# report the overall means for each coefficient by batch averaging
colMeans(mcmc_out_ts)
## AdmissionLengthDays as.factor(Married_1)1
## 0.08101 0.10191
## as.factor(Insurance_Type)medicaid as.factor(Insurance_Type)medicare
## 0.09991 0.10775
## as.factor(Insurance_Type)private as.factor(Insurance_Type)self pay
## 0.10386 0.09003
## as.factor(Admission_Type)emergency as.factor(Admission_Type)newborn
## 0.10722 0.09921
## as.factor(Admission_Type)urgent
## 0.10589
Let’s report only the coefficient estimates corresponding to the larger probabilities (\(p>0.1\)).
## as.factor(Married_1)1 as.factor(Insurance_Type)medicare
## 0.10191 0.10775
## as.factor(Insurance_Type)private as.factor(Admission_Type)emergency
## 0.10386 0.10722
## as.factor(Admission_Type)urgent
## 0.10589
# Monte Carlo standard errors: sd of the batch means divided by sqrt(number of batches)
apply(mcmc_out_ts_highProb, 2, sd) / sqrt(mcmc_out$nbatch)
## (Intercept) AdmissionLengthDays as.factor(Married_1)1
## [1,] 0.1862559 0.3226046 0.3345570
## [2,] 0.2821392 0.4448249 0.1720344
## [3,] 0.5178081 0.2188139 0.2388471
## [4,] 0.2007440 0.2975295 0.2927687
## [5,] 0.2483227 0.4301076 0.2633860
## [6,] 0.3803786 0.1634675 0.3900100
## [7,] 0.1924431 0.2509151 0.3922987
## [8,] Inf 0.2767801 0.2315708
## [9,] 0.1971004 0.2112527 0.1515867
## [10,] 0.1691363 0.3436282 0.1801522
## [11,] 0.3377586 0.1605995 0.2104048
## [12,] 0.2631620 0.2556055 0.2317012
## [13,] 0.2159409 0.1688904 0.3573216
## [14,] 0.2422708 0.2371698 0.1532255
## [15,] 0.3551496 0.2836045 0.7852654
## [16,] 0.1638370 0.2890539 0.3803786
## [17,] 0.4886795 0.2299257 0.6579049
## [18,] 0.4631417 0.2178435 0.1697960
## [19,] 0.2246482 0.1672693 0.6708174
## [20,] 0.3148065 0.9617498 0.3205833
## [21,] 0.4602873 0.2251724 0.3410538
## [22,] 0.1475739 0.4652090 1.1768961
## [23,] 0.4300163 0.2375867 0.3971407
## [24,] 0.2347372 0.2777448 0.2112527
## [25,] 0.2074163 0.2815715 0.1560164
## [26,] 0.1800456 0.2261409 0.2223399
## [27,] 0.5263239 0.2699985 0.3101393
## [28,] 0.2188139 0.2238680 0.1802777
## [29,] 0.2130205 0.2236058 0.2078452
## [30,] 0.1904548 Inf 0.3331599
## [31,] 0.1815398 0.3021004 0.7343645
## [32,] 0.4060677 0.4269051 0.2924022
## [33,] 0.2829088 0.2270739 0.3661456
## [34,] 0.3389581 0.1866505 0.2371698
## [35,] 0.1850888 0.3926327 0.6533803
## [36,] 0.4177039 0.1830718 0.3118481
## [37,] 0.2372851 0.5064554 0.1385063
## [38,] 0.2964961 0.2573009 0.4300163
## [39,] 0.2951203 0.5999973 0.3985249
## [40,] 0.1574033 0.4301076 1.0535441
## [41,] 1.2178058 0.9205747 0.2831341
## [42,] 0.2344433 0.2821392 0.2497823
## [43,] 0.1730842 0.1866051 0.4300163
## [44,] 0.2176419 0.5196129 0.4472116
## [45,] 0.1718141 0.3148065 0.2525679
## [46,] 0.2043920 0.9943343 0.4687337
## [47,] 0.2486647 0.2432933 0.3168098
## [48,] 0.1916494 0.6188991 0.1597991
## [49,] 0.2927687 0.2614456 0.2738600
## [50,] 0.1695739 0.4711592 0.2815715
## [51,] 0.3951084 0.3068582 0.1709464
## [52,] 0.4581062 0.7596831 0.3397406
## [53,] 0.1929757 0.3807001 0.1352851
## ... (remaining rows of this 100-row MCMC output matrix omitted for brevity;
## the columns correspond to the predictor-level terms, e.g.,
## as.factor(Insurance_Type)medicaid, medicare, private, self pay, and
## as.factor(Admission_Type)emergency, newborn, urgent) ...
The Monte Carlo standard errors (MCSE) of the corresponding Monte Carlo estimates of the posterior probabilities are computed via apply(mcmc_out_ts_highProb, 2, sd) / sqrt(mcmc_out$nbatch), where the second argument, margin=2, indicates that the function is applied to each column separately. The division by sqrt(mcmc_out$nbatch) reflects the fact that the individual batch means have variance \(\frac{\sigma^2}{b}\), where \(b\) is the batch length (\(out\$blen\)), whereas, by the CLT, the overall mean has variance \(\frac{\sigma^2}{n}\), where \(n\) is the total number of iterations, \(n = out\$blen \times out\$nbatch\). The square-root law implies that to guarantee one more significant digit of precision (\(10\times\) accuracy), we need to run \(100\) times as many iterations.
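To illustrate the square-root law numerically, here is a minimal sketch (assuming mcmc_out_ts_highProb is the matrix of batch means, one row per batch, and mcmc_out$nbatch is the number of batches): recomputing the MCSE from only a quarter of the batches should roughly double it.
nb <- mcmc_out$nbatch
# MCSE from the full run vs. from the first quarter of the batches
mcse_full    <- apply(mcmc_out_ts_highProb, 2, sd) / sqrt(nb)
mcse_quarter <- apply(mcmc_out_ts_highProb[1:(nb %/% 4), , drop = FALSE], 2, sd) / sqrt(nb %/% 4)
round(rbind(full_run = mcse_full, quarter_run = mcse_quarter), 4)
# Using 4x fewer iterations roughly doubles the Monte Carlo standard error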
Let's display some diagnostic plots of the mcmc_out_ts_highProb results.
library(tidyr)
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:igraph':
##
## crossing
# df: data.frame form of the MCMC batch output (mcmc_out_ts_highProb), created earlier
df_long <- gather(df, factors, value, colnames(df), factor_key=TRUE)
df_long$index <- rep(c(1:(dim(df)[1])), times=dim(df)[2])
plot_ly(data=df_long, x=~index, y=~value, color=~factors,
        type="scatter", mode="markers+lines") %>%
  layout(title="MCMC Output (Time-Series High-Probability)",
         legend = list(orientation = 'h', y=-0.5))
Most of these diagnostic plots appear fine, so we can generate the final overall summary of the MCMC Bayesian inference.
library(knitr)
bar <- rbind(colMeans(mcmc_out_ts_highProb), apply(mcmc_out_ts_highProb, 2, sd) / sqrt(mcmc_out$nbatch))
rownames(bar) <- c("probability of death", "MCSE")
kable(bar, digits = 2, caption = "Estimated (highest) posterior probability of dying following an admission to the hospital. These probability values may not add to one due to rounding error and the fact that we only chose the high-probability predictors.")
|  | as.factor(Married_1)1 | as.factor(Insurance_Type)medicare | as.factor(Insurance_Type)private | as.factor(Admission_Type)emergency | as.factor(Admission_Type)urgent |
|---|---|---|---|---|---|
| probability of death | 0.10 | 0.11 | 0.10 | 0.11 | 0.11 |
| MCSE | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
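To turn the point estimates and MCSEs in the table above into explicit simulation-error statements, here is a minimal sketch of approximate 95% Monte Carlo confidence intervals (estimate ± 1.96 × MCSE); note these quantify Monte Carlo error of the estimates, not posterior credible intervals.
# Approximate 95% Monte Carlo (simulation-error) intervals for the estimates above
est  <- colMeans(mcmc_out_ts_highProb)
mcse <- apply(mcmc_out_ts_highProb, 2, sd) / sqrt(mcmc_out$nbatch)
kable(round(cbind(estimate = est, lower = est - 1.96*mcse, upper = est + 1.96*mcse), 3))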
We can also display histograms of the marginal posteriors for the parameters.
# mcmc_out <- metrop(out, blen = 1, nbatch = 1e5)
theta <- mcmc_out$batch[ , 1]
# hist(theta, freq = FALSE)
plot_ly(x = theta, type = "histogram") %>%
layout(title="Histogram of Theta", xaxis=list(title="Theta"), yaxis=list(title="Density"))
# Other parameters
beta <- mcmc_out$batch[ , 2]
# hist(beta, freq = FALSE)
plot_ly(x = beta, type = "histogram") %>%
layout(title="Histogram of Beta", xaxis=list(title="Beta"), yaxis=list(title="Density"))
# display smooth parameter density plots
out <- density(beta)
# plot(out)
plot_ly(x = out$x, y=out$y, type = "scatter", mode="lines") %>%
layout(title="Beta Density", xaxis=list(title="Beta"), yaxis=list(title="Density"))
Of course, we can also try to forecast one of the clinical outcomes (“Death_1”, “AdmissionLengthDays”, or “Dx”) using the machine learning methods we saw in the previous chapters.
dim(hosAdmissions)
## [1] 58863 9
hosAdmissions$Married_1 <- as.numeric(hosAdmissions$Married_1)
hosAdmissions$Insurance_Type <- as.numeric(hosAdmissions$Insurance_Type)
hosAdmissions$Admission_Type <- as.numeric(hosAdmissions$Admission_Type)
y <- hosAdmissions[ , "Death_1"] # outcome
x <- hosAdmissions[ , c("AdmissionLengthDays", "Admission_Type", "Insurance_Type", "Married_1", "Race", "Dx")]
# split training/testing data (80:20)
sub <- sample(nrow(x), floor(nrow(x)*0.8))
x <- x[sub, ]; y <- y[sub]
data_train <- cbind(x[ , c("AdmissionLengthDays","Married_1")], y); dim(data_train) # training data
## [1] 47090 3
x_test <- hosAdmissions[-sub, c("AdmissionLengthDays", "Admission_Type", "Insurance_Type", "Married_1", "Race", "Dx")]
y_test <- hosAdmissions[-sub, "Death_1"]
data_test <- cbind(x_test[ , c("AdmissionLengthDays","Married_1")], y_test) # testing data
dim(data_test)
## [1] 11773 3
library(neuralnet)
# hosAdmissions_NNmodel <- neuralnet(y ~ AdmissionLengthDays + Admission_Type + Insurance_Type + Married_1 + Race + Dx, data=data_train, hidden=3)  # can't directly use categorical/unstructured (free-text) features
set.seed(1234)
hosAdmissions_NNmodel <- neuralnet(y ~ AdmissionLengthDays + Married_1, data=data_train, hidden=1)
plot(hosAdmissions_NNmodel)
# evaluate model on testing data
# Compute the posterior probabilities
hosAdmissions_pred<-compute(hosAdmissions_NNmodel, x_test[, c("AdmissionLengthDays","Married_1")])
# dichotomize the outcome
plot(hosAdmissions_pred$net.result)
# plot(data_test$y_test, hosAdmissions_pred$net.result)
# hist(hosAdmissions_pred$net.result)
pred_results_bin <- as.factor(ifelse (hosAdmissions_pred$net.result>0.107, 1, 0))
table(pred_results_bin, data_test$y_test)
##
## pred_results_bin 0 1
## 0 10581 1192
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 10581 1192
## 1 0 0
##
## Accuracy : 0.8988
## 95% CI : (0.8932, 0.9041)
## No Information Rate : 0.8988
## P-Value [Acc > NIR] : 0.5077
##
## Kappa : 0
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 1.0000
## Specificity : 0.0000
## Pos Pred Value : 0.8988
## Neg Pred Value : NaN
## Prevalence : 0.8988
## Detection Rate : 0.8988
## Detection Prevalence : 1.0000
## Balanced Accuracy : 0.5000
##
## 'Positive' Class : 0
##
Let’s also try decision tree classification for the same hospitalization case-study.
library(C50)
# The Dx free text is an important feature, but it needs to be cleaned prior to processing; here is an instance of a "cleaning" function
preproc_fun = function(x) # text data
{ require("tm")
x = gsub("<.*?>", " ", x) # regex removing HTML tags
x = iconv(x, "latin1", "ASCII", sub="") # remove non-ASCII characters
x = gsub("[^[:alnum:]]", " ", x) # remove non-alpha-numeric values
x = tolower(x) # convert to lowercase characters
# x = removeNumbers(x) # removing numbers
x = stripWhitespace(x) # removing white space
x = gsub("^\\s+|\\s+$", "", x) # remove leading and trailing white space
return(x)
}
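# Quick illustration of the cleaning function on a hypothetical (made-up) Dx string:
#   preproc_fun("<b>Coronary Artery Disease</b> / SDA!!")   # -> "coronary artery disease sda"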
# Clean the DX feature by itself, if not already pre-cleaned upon loading the hosAdmissions dataset
# Dx11 <- preproc_fun(x[ , "Dx"]); x$Dx <- Dx11
hosAdmissions_DTmodel <- C5.0(x=x[ , c("AdmissionLengthDays", "Admission_Type", "Dx")], y=as.factor(y), control = C5.0Control(winnow = TRUE, minCases= 100))
# hosAdmissions_DTmodel
summary(hosAdmissions_DTmodel)
##
## Call:
## C5.0.default(x = x[, c("AdmissionLengthDays", "Admission_Type", "Dx")], y
## = as.factor(y), control = C5.0Control(winnow = TRUE, minCases = 100))
##
##
## C5.0 [Release 2.07 GPL Edition] Thu Jun 20 10:11:13 2024
## -------------------------------
##
## Class specified by attribute `outcome'
##
## Read 47090 cases (4 attributes) from undefined.data
##
## No attributes winnowed
##
## Decision tree:
##
## AdmissionLengthDays > 1.638194: 0 (44140/3583)
## AdmissionLengthDays <= 1.638194:
## :...Admission_Type <= 1: 0 (290/8)
## Admission_Type > 1:
## :...Admission_Type > 2: 0 (321/37)
## Admission_Type <= 2:
## :...AdmissionLengthDays <= 0.4736111: 1 (548/132)
## AdmissionLengthDays > 0.4736111: 0 (1791/616)
##
##
## Evaluation on training data (47090 cases):
##
## Decision Tree
## ----------------
## Size Errors
##
## 5 4376( 9.3%) <<
##
##
## (a) (b) <-classified as
## ---- ----
## 42298 132 (a): class 0
## 4244 416 (b): class 1
##
##
## Attribute usage:
##
## 100.00% AdmissionLengthDays
## 6.26% Admission_Type
##
##
## Time: 1.4 secs
# Predict death (survival) based on C50 model using testing data
# Clean Dx, if not previously cleaned!
# Dx11 <- preproc_fun(hosAdmissions[-sub, "Dx"]); hosAdmissions[-sub, "Dx"] <- Dx11
# Extract ONLY the hospital-admission TESTING cases that have the common Dx with the training cases (otherwise, prediction on test cases may fail)
common <- intersect(hosAdmissions[sub, ]$Dx, hosAdmissions[-sub, ]$Dx)
hosAdmission_test <- hosAdmissions[which(hosAdmissions$Dx %in% common), ]
hosAdmission_test <- hosAdmission_test[-sub, c("AdmissionLengthDays", "Admission_Type", "Dx")]
# View(hosAdmission_test)
# Same preprocessing for the TESTING data outcome variable (death)
y_test <- hosAdmissions[which(hosAdmissions$Dx %in% common), ]
y_test <- y_test[-sub, "Death_1"]
hosAdmissions_DTpred <- predict(hosAdmissions_DTmodel, hosAdmission_test)
caret::confusionMatrix(hosAdmissions_DTpred, as.factor(y_test))
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 7833 729
## 1 26 79
##
## Accuracy : 0.9129
## 95% CI : (0.9068, 0.9187)
## No Information Rate : 0.9068
## P-Value [Acc > NIR] : 0.02536
##
## Kappa : 0.1549
##
## Mcnemar's Test P-Value : < 2e-16
##
## Sensitivity : 0.99669
## Specificity : 0.09777
## Pos Pred Value : 0.91486
## Neg Pred Value : 0.75238
## Prevalence : 0.90677
## Detection Rate : 0.90377
## Detection Prevalence : 0.98789
## Balanced Accuracy : 0.54723
##
## 'Positive' Class : 0
##
Lastly, we can employ RandomForest classification of the clinical outcome variable, death.
require(randomForest)
y <- hosAdmissions[sub, "Death_1"]
data_train <- cbind(hosAdmissions[sub, c("AdmissionLengthDays","Married_1")], y); dim(data_train) # training data
## [1] 47090 3
hosAdmissions_RFmodel_sm <- randomForest(as.factor(y) ~ AdmissionLengthDays + Married_1, data=data_train, importance=TRUE, ntree=3000, mtry=2)
# hosAdmissions_RFmodel <- randomForest(as.factor(y) ~ . , data=data_train, importance=TRUE, ntree=2000, mtry=2)
varImpPlot(hosAdmissions_RFmodel_sm, cex=0.8)
hosAdmissions_RFmodel_sm
##
## Call:
## randomForest(formula = as.factor(y) ~ AdmissionLengthDays + Married_1, data = data_train, importance = TRUE, ntree = 3000, mtry = 2)
## Type of random forest: classification
## Number of trees: 3000
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 13.99%
## Confusion matrix:
## 0 1 class.error
## 0 39812 2618 0.06170163
## 1 3970 690 0.85193133
hosAdmissions_RFmodel_pred<-predict(hosAdmissions_RFmodel_sm, hosAdmissions[-sub, ], type = 'class')
caret::confusionMatrix(hosAdmissions_RFmodel_pred, as.factor(data_test$y_test))
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 9956 1022
## 1 625 170
##
## Accuracy : 0.8601
## 95% CI : (0.8537, 0.8663)
## No Information Rate : 0.8988
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.098
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.9409
## Specificity : 0.1426
## Pos Pred Value : 0.9069
## Neg Pred Value : 0.2138
## Prevalence : 0.8988
## Detection Rate : 0.8457
## Detection Prevalence : 0.9325
## Balanced Accuracy : 0.5418
##
## 'Positive' Class : 0
##
# Use caret::train to fine-tune RF
# control <- trainControl(method="repeatedcv", number=10, repeats=3, search="grid")
# metric <- "Accuracy"
# set.seed(1234)
# tunegrid <- expand.grid(.mtry=c(1:3))
# rf_gridsearch <- train(as.factor(y) ~ AdmissionLengthDays + Race + Admission_Type, data=data_train, method="rf", metric=metric, tuneGrid=tunegrid, trControl=control)
# print(rf_gridsearch)
# plot(rf_gridsearch)
# Alternative parameter tuning using randomForest::tuneRF()
set.seed(234)
best_mtry <- tuneRF(hosAdmissions[sub, c("AdmissionLengthDays","Admission_Type", "Race")], y, stepFactor=1.5, improve=1e-5, ntree=500)
## mtry = 1 OOB error = 0.07936883
## Searching left ...
## Searching right ...
## mtry OOBError
## 1 1 0.07936883
# Try NLP on the clinical diagnosis "Dx"
library(text2vec)
library(data.table)
# split data into train:test (80:20)
set.seed(1234)
sub <- sample(nrow(hosAdmissions), floor(nrow(hosAdmissions)*0.8))
y_train <- hosAdmissions[sub , ]  # training subset (includes the Dx text and the Death_1 outcome)
y_test <- hosAdmissions[-sub , ]  # testing subset (includes the Dx text and the Death_1 outcome)
# coerce Dx as string/character
y_train$Dx <- as.character(y_train$Dx)
y_test$Dx <- as.character(y_test$Dx)
# either a simple (tolower case) function
preproc_fun = tolower
# or a more elaborate "cleaning" function
preproc_fun = function(x) # text data
{ require("tm")
x = gsub("<.*?>", " ", x) # regex removing HTML tags
x = iconv(x, "latin1", "ASCII", sub="") # remove non-ASCII characters
x = gsub("[^[:alnum:]]", " ", x) # remove non-alpha-numeric values
x = tolower(x) # convert to lowercase characters
# x = removeNumbers(x) # removing numbers
x = stripWhitespace(x) # removing white space
x = gsub("^\\s+|\\s+$", "", x) # remove leading and trailing white space
return(x)
}
# define the tokenization function
token_fun = word_tokenizer
# iterator for both training and testing sets
iter_train = itoken(y_train$Dx,
preprocessor = preproc_fun,
tokenizer = token_fun,
ids = sub,
progressbar = TRUE)
iter_test = itoken(y_test$Dx,
preprocessor = preproc_fun,
tokenizer = token_fun,
# ids = -sub,
ids = seq_len(nrow(y_test)),   # ids for the 11,773 testing records
progressbar = TRUE)
reviewVocab = create_vocabulary(iter_train)
reviewVocab
## Number of docs: 47090
## 0 stopwords: ...
## ngram_min = 1; ngram_max = 1
## Vocabulary:
## term term_count doc_count
## <char> <int> <int>
## 1: 020807 1 1
## 2: 10 1 1
## 3: 111 1 1
## 4: 15 1 1
## 5: 17 1 1
## ---
## 4308: disease 3185 3159
## 4309: artery 4379 3102
## 4310: coronary 4498 3229
## 4311: sda 5366 5365
## 4312: newborn 6221 6221
# Next, compute the document term matrix (DTM)
reviewVectorizer = vocab_vectorizer(reviewVocab)
t0 = Sys.time()
dtm_train = create_dtm(iter_train, reviewVectorizer)
dtm_test = create_dtm(iter_test, reviewVectorizer)
dim(dtm_train); dim(dtm_test)
## [1] 47090 4312
## [1] 11773 4312
# confirm that the training data review DTM dimensions are consistent with training review IDs, i.e., #rows = number of documents, and #columns = number of unique terms (n-grams), dim(dtm_train)[[2]]
identical(dim(dtm_train)[1], length(sub))
## [1] TRUE
# TM/NLP analytics
library(glmnet)
nFolds = 10
t0 = Sys.time()
glmnet_classifier = cv.glmnet(x = dtm_train,
y = y_train$Death_1,
family = "binomial",
# LASSO L1 penalty
alpha = 1,
# interested in the area under ROC curve or MSE
type.measure = "auc",
# n-fold internal (training data) stats cross-validation
nfolds = nFolds,
# threshold: high value is less accurate / faster training
thresh = 1e-2,
# again lower number of iterations for faster training
maxit = 1e3
)
lambda.best <- glmnet_classifier$lambda.min
lambda.best
## [1] 0.001270929
Sys.time() - t0
## Time difference of 13.50766 secs
# some prediction plots
plot(glmnet_classifier)
# plot(glmnet_classifier, xvar="lambda", label="TRUE")
mtext("CV LASSO: Number of Nonzero (Active) Coefficients", side=3, line=2.5)
# report the mean internal cross-validated error
print(paste("max AUC =", round(max(glmnet_classifier$cvm), 4)))
## [1] "max AUC = 0.7509"
# report TESTING data prediction accuracy
xTest = dtm_test
yTest = y_test$Death_1
predLASSO <- predict(glmnet_classifier,
        s = glmnet_classifier$lambda.1se, newx = xTest)
# Note: by default, predict() returns the linear predictor (log-odds scale),
# so this "MSE" is computed on the link scale rather than the probability scale
testMSE_LASSO <- mean((predLASSO - yTest)^2); testMSE_LASSO
## [1] 7.554096
# Binarize the LASSO prediction; predLASSO is on the log-odds scale, so the
# cutoff of -2 corresponds to a predicted probability of about 0.12
binPredLASSO <- ifelse(predLASSO < -2, 0, 1)
table(binPredLASSO, yTest)
## yTest
## binPredLASSO 0 1
## 0 8267 521
## 1 2371 614
## [1] 0.7453573
# predict
predTest <- predict(glmnet_classifier,
s = glmnet_classifier$lambda.1se,
newx = dtm_test, type="response")
predVSrealDeath <- cbind(predTest, y_test$Death_1)
cor(predTest, y_test$Death_1)
## [,1]
## s1 0.2784294
##
## 0 1
## 0 9993 873
## 1 645 262
# install.packages("RTextTools")
# library("RTextTools")
#
# # DxCorpus <- create_corpus(iter_train, reviewVectorizer)
# # DxDTM <- get_dtm(DxCorpus)
# DxDTM_train <- create_dtm(iter_train, reviewVectorizer)
# dim(DxDTM_train)
# image(DxDTM_train[1:dim(DxDTM_train)[2], ], xlab="Terms", ylab="Training records/documents")
#
# DxDTM_test <- create_dtm(iter_test, reviewVectorizer)
# dim(DxDTM_test)
# image(DxDTM_test[1:dim(DxDTM_test)[2], ], xlab="Terms", ylab="Testing records/documents")
Data-driven Bayesian inference treats all unknowns (parameters, as well as data prior to being observed) as random variables. The inference is probabilistic rather than deterministic (as in the frequentist paradigm), and it explicitly quantifies the intrinsic process uncertainty.
Bayesian modeling propagates uncertainty from the initial prior to the derived posterior distributions via Bayes' theorem (the probabilistic inversion of conditioning). When the posterior is not available in a simple standard form, e.g., for complex or non-conjugate priors, MCMC sample-based simulation provides a computationally expensive, but scientifically valid, protocol for modeling and inference. MCMC provides a general recipe for obtaining representative samples from any posterior density.
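To make this recipe concrete, here is a minimal random-walk Metropolis sketch in base R. The target (the posterior of a Normal mean under a flat prior, given simulated data y_sim) and all settings below are illustrative assumptions, not part of the hospital-admissions case-study.
# Minimal random-walk Metropolis sketch: only the unnormalized log-posterior
# is needed to draw representative samples from the target density
set.seed(1234)
y_sim <- rnorm(50, mean = 2, sd = 1)                      # simulated data (assumption)
log_post <- function(mu) sum(dnorm(y_sim, mean = mu, sd = 1, log = TRUE))   # flat prior on mu

n_iter <- 10000; mu_cur <- 0; draws <- numeric(n_iter)
for (i in seq_len(n_iter)) {
  mu_prop <- mu_cur + rnorm(1, sd = 0.5)                  # symmetric Gaussian proposal
  # accept with probability min(1, posterior ratio), computed on the log scale
  if (log(runif(1)) < log_post(mu_prop) - log_post(mu_cur)) mu_cur <- mu_prop
  draws[i] <- mu_cur
}
mean(draws[-(1:1000)])    # posterior-mean estimate after discarding a burn-in period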