After Thomas Bayes died in 1761, his unpublished work on the flat-prior posterior of the binomial distribution was rediscovered and published in the Philosophical Transactions of the Royal Society of London in 1764.
The Monte Carlo (MC) Method is a simple technique used to estimate analytical quantities that may lack closed-form expressions or are difficult to compute exactly, e.g., integration, differentiation, or optimization of non-trivial functions. MC simulation may be used to approximate probability distributions, estimate likelihoods, or calculate expectations. The intuition of MC sampling is rooted in the fact that we can estimate any desired (posterior) expectation using ergodic averages. In other words, we can compute any population characteristic, associated with a corresponding sample-driven statistic, of a (posterior) distribution provided we can generate \(N\) simulated samples from that distribution:
\[E(f(s))_P = \int {f(s)p(s)ds} \approx \frac{1}{N}\sum_{i=1}^N {f(s^{(i)})},\] where \(P\) and \(p\) are the probability (posterior) distribution and density functions of interest, \(E\) represents the expectation, \(f(s)\) is the population characteristic of interest (corresponding to a statistic), and \(s^{(i)}\) is the \(i^{th}\) simulated sample from the distribution \(P\). For instance, we can estimate the population mean (linked to the sample arithmetic average statistic) by:
\[E(x)_P =\int{xp(x)dx} \approx \frac{1}{N}\sum_{i=1}^N {x^{(i)}},\] where \(p(x)\) is the density of the distribution function \(P\). Similarly, assuming a trivial (zero) mean, to estimate the population variance (corresponding to the sample-variance statistic) we compute:
\[E(x^2)_P =\int{x^2p(x)dx} \approx \frac{1}{N-1}\sum_{i=1}^N {\left ( x^{(i)}\right ) ^2}.\]
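As a quick numeric sketch of these MC estimates (the Normal(0, 2) sample below is purely illustrative and not part of the original example), we can compare the simulation-based averages to the known population values:

set.seed(1234)
N <- 1e5
x <- rnorm(N, mean = 0, sd = 2)   # N simulated samples from Normal(mean = 0, sd = 2)
mean(x)                           # MC estimate of the population mean (true value 0)
sum(x^2) / (N - 1)                # MC estimate of the variance E(X^2) (true value 4)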
Many integrals, differentials, or sums that cannot be exactly computed in closed analytic form, can still be expressed as expectations of some random variables with respect to some probability distribution, which are estimable by MC methods.
Markov chains were introduced by Andrey Markov in the early \(20^{th}\) century; the Markov Chain Monte Carlo (MCMC) method uses such chains to draw Monte Carlo samples from a target distribution. Markov Chains refer to one-step dependent sequences of random variables obeying the Markov property (past and future are conditionally independent given the present). For example:
Sampling with replacement vs. without replacement from a space (e.g., an urn) containing a set of identifiable objects represents a process obeying vs. not obeying the Markov property, respectively.
A simple Brownian motion (Wiener process) random walk on an integer lattice \(Z^d \subset R^d\) represents a Markov chain whose transition probabilities are defined by \(p(x,x\pm e_i)=\frac{1}{2d}\), for each \(x\in Z^d\), where \(e_i\) are the standard basis (unit) vectors in \(Z^d\). At each step, this random walk process, currently at position \(x\), moves to a randomly chosen nearest neighbor, \(x\pm e_i\).
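A small simulation sketch (not part of the original text) of this lattice random walk in \(d=2\) dimensions illustrates the nearest-neighbor transition mechanism; the step count and seed below are arbitrary choices:

set.seed(1234)
d <- 2; steps <- 1000
moves <- rbind(diag(d), -diag(d))              # the 2d possible steps +/- e_i, each taken with probability 1/(2d)
path <- matrix(0, nrow = steps + 1, ncol = d)  # start the walk at the origin
for (t in 1:steps) {
  path[t + 1, ] <- path[t, ] + moves[sample(1:(2 * d), 1), ]
}
plot(path[, 1], path[, 2], type = "l", xlab = "x", ylab = "y", main = "Simple random walk on Z^2")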
Consider cancer staging and the corresponding transitions between stages. The Cancer Tumor-Node-Metastasis (TNM) staging system describes the size of the primary tumor (T), whether the cancer has spread to the lymph nodes (N), and whether it has spread to a different part of the body (M). \(T=1(small), 2, 3, 4(large)\) refers to the size of the cancer and how far it has spread into nearby tissue, \(N=0\) (no lymph nodes containing cancer cells), \(1, 2, 3\) (many lymph nodes containing cancer cells), and \(M=0\) (no spread), \(1\) (cancer has spread), refers to whether the cancer has spread to another part of the body. Most types of cancer have four stages, \(CS=1(early), 2, 3,\ to\ 4\ (advanced)\). Stage 1 implies that a cancer is relatively small and contained within the organ it started in. Stage 2 means the cancer has not started to spread into surrounding tissue but the tumor is larger than in stage 1, or that cancer cells have spread into lymph nodes close to the tumor. Stage 3 usually means the cancer is larger and may have started to spread into surrounding tissues, possibly with cancer cells in the lymph nodes in the area. And stage 4 means the cancer has spread from its point of origination to another body organ, indicating metastatic progression.
Let’s use cancer staging to illustrate the fundamentals of MC representation, its properties, and computational characteristics. For each stage, let \(X_t\) denote the cancer stage of a patient at time \(t\). Cancer onset, progression, prognosis and outcomes have stochastic behavior. We have 5 states, \(X_n=0\ (no\ cancer), 1, 2, 3, 4\), and initially \(X_0=0\) (no cancer), with \(P(X_0=0)=1\). Next, we can define the marginal probabilities for transitioning from one cancer stage to another (these depend upon the type of cancer), say \(P(X_1=0)=0.05\), \(P(X_1=1)=0.6\), \(P(X_1=2)=0.2\), \(P(X_1=3)=0.1\), \(P(X_1=4)=0.05\), which sum to one. Computing the distribution of \(X_t\) for \(t\geq 2\) requires conditional probabilities.
Suppose that at time \(t\), the cancer patient is at stage \(s_0\). Then, we can compute the conditional probabilities, \(P(X_{t+1}=s_k|X_t=s_0)\), of the patient being in a specific stage \(s_k\) at the next time point \(t+1\). For instance, if at time \(t\) the patient is in stage \(1\), we may have \(P(X_{t+1}=0|X_t=1)=0.15\), \(P(X_{t+1}=1|X_t=1)=0.5\), \(P(X_{t+1}=2|X_t=1)=0.2\), \(P(X_{t+1}=3|X_t=1)=0.1\), and \(P(X_{t+1}=4|X_t=1)=0.05\).
Would these probabilities change if we condition further on the oncological history of the same patient up to time \(t\)? Because of the stochastic coin-toss-like mechanism of many biomedical, physiologic and natural processes, these conditional likelihoods may not be affected by knowing a longer staging history of the disease provenance. To predict the likelihood of being in a certain state at the next time period, only the current state of the disease matters. Thus, for any cancer provenance record (\(s_0, s_1, \cdots, s_{t-1}\)), we may have \(P(X_{t+1}=0|X_0=s_0, X_1=s_1, \cdots, X_{t-1}=s_{t-1}, X_t=1)=P(X_{t+1}=0|X_t=1)=0.15\), \(P(X_{t+1}=1|X_0=s_0, X_1=s_1, \cdots, X_{t-1}=s_{t-1}, X_t=1)=P(X_{t+1}=1|X_t=1)=0.5\), \(P(X_{t+1}=2|X_0=s_0, X_1=s_1, \cdots, X_{t-1}=s_{t-1}, X_t=1)=P(X_{t+1}=2|X_t=1)=0.2\), \(P(X_{t+1}=3|X_0=s_0, X_1=s_1, \cdots, X_{t-1}=s_{t-1}, X_t=1)=P(X_{t+1}=3|X_t=1)=0.1\), and \(P(X_{t+1}=4|X_0=s_0, X_1=s_1, \cdots, X_{t-1}=s_{t-1}, X_t=1)=P(X_{t+1}=4|X_t=1)=0.05\).
This is because the coin-toss-like cancer staging outcome (not necessarily a fair coin) at time \(t+1\) is independent of the prior coin flips and therefore does not depend on \(X_0, X_1, \cdots, X_{t-1}\). It only depends on the current cancer stage, \(X_t\), at time \(t\). This rule, that the conditional distribution of \(X_{t+1}\) depends only on the state at the prior time, \(X_t\), is referred to as the Markov property, or memoryless property.
Another characteristic of Markov processes, time homogeneity, is that the conditional distribution of \(X_{t+1}\) given (say) \(X_t=s_0\) is the same for all time points, i.e., \(P(X_{t'+1} | X_{t'}=s_0) = P(X_{t''+1} | X_{t''}=s_0)\), for all \(t'\) and \(t''\). This reflects the assumption that, albeit dependent on human intervention, the biophysiological mechanism of cancer-stage transitions obeys the same rules independently of time.
Collectively, the transition probabilities comprise the Markov transition matrix, \(P\), where each element \(P_{i,j}\) represents the conditional probability of observing cancer stage \(s_j\) at time \(t+1\) given that the current, time \(t\), stage is \(s_i\), i.e., \(P_{i,j} = P(X_{t'+1} =s_j| X_{t'}=s_i)\).
Thus, a Markov chain model for cancer progression, using the above 5 cancer stages (model states), may be represented mathematically as a transition matrix or a graph (here \(k=5\)): \[P= \begin{pmatrix} P_{1,1} & P_{1,2} & \cdots & P_{1,k} \\ P_{2,1} & P_{2,2} & \cdots & P_{2,k} \\ \vdots & \vdots & \ddots & \vdots \\ P_{k,1} & P_{k,2} & \cdots & P_{k,k} \end{pmatrix} .\] The axioms of probability theory dictate that each entry is non-negative, \(P_{i,j}\geq 0\), and each row sums to one, \(\sum_{j=1}^k P_{i,j}=1\).
Below we show the MC process above as a (directed, cyclic, complete) graphical object illustrating all states and their corresponding transition probabilities.
# install.packages("igraph")
library('igraph')
nodes <- data.frame(c('Stage_0', 'Stage_1', 'Stage_2', 'Stage_3', 'Stage_4'))
node_names <- c('Stage_0', 'Stage_1', 'Stage_2', 'Stage_3', 'Stage_4')
edges <- data.frame(from=c('Stage_0', 'Stage_0', 'Stage_0', 'Stage_0', 'Stage_0',
'Stage_1', 'Stage_1', 'Stage_1', 'Stage_1', 'Stage_1',
'Stage_2', 'Stage_2', 'Stage_2', 'Stage_2', 'Stage_2',
'Stage_3', 'Stage_3', 'Stage_3', 'Stage_3', 'Stage_3',
'Stage_4', 'Stage_4', 'Stage_4', 'Stage_4', 'Stage_4'),
to=c('Stage_0', 'Stage_1', 'Stage_2', 'Stage_3', 'Stage_4',
'Stage_0', 'Stage_1', 'Stage_2', 'Stage_3', 'Stage_4',
'Stage_0', 'Stage_1', 'Stage_2', 'Stage_3', 'Stage_4',
'Stage_0', 'Stage_1', 'Stage_2', 'Stage_3', 'Stage_4',
'Stage_0', 'Stage_1', 'Stage_2', 'Stage_3', 'Stage_4'))
edge_names <- rep('P{i,j}', 25)
vertex_size <- rep(20, 5)
for (i in 1:5) {
vertex_size[i] <- 70-5*(i-1)
for (j in 1:5) {
edge_names[j + (i-1)*5] <- paste0('P{', i-1, ',', j-1, '}')
# edge_names[j + (i-1)*5]
}
}
g <- graph_from_data_frame(edges, directed=TRUE, vertices=nodes)
plot(g, layout=layout_with_fr, vertex.size=vertex_size, vertex.label=node_names, edge.label=edge_names, edge.label.cex = 0.7, edge.curved=0.3, margin=c(0.1, 0.1, 0.1, 0.1), main='Cancer Staging Transition Graph', ylim=c(-1,1),xlim=c(-1,1), asp = 1)
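To complement the graph, here is a hypothetical numeric transition matrix for this chain (the probability values are illustrative assumptions, except for the stage-1 row, which reuses the conditional probabilities given above); its rows sum to one, and matrix powers give multi-step transition probabilities:

# Hypothetical 5x5 transition matrix for the cancer-staging chain (illustrative values only)
P <- matrix(c(0.70, 0.15, 0.10, 0.04, 0.01,
              0.15, 0.50, 0.20, 0.10, 0.05,
              0.05, 0.15, 0.50, 0.20, 0.10,
              0.02, 0.08, 0.20, 0.50, 0.20,
              0.01, 0.04, 0.10, 0.25, 0.60),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("Stage_", 0:4), paste0("Stage_", 0:4)))
rowSums(P)                   # each row must sum to 1 (axioms of probability)
P2 <- P %*% P                # two-step transition probabilities
round(P2["Stage_1", ], 3)    # stage distribution two time-steps after being in Stage 1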
In quantum physics, the Heisenberg uncertainty principle suggests that there is a limit to simultaneously identifying the position and momentum of particles in the universe, which implies that the future is difficult to predict exactly, albeit it may be probabilistically quantified. Similarly, in logic, Gödel’s incompleteness theorem suggests that no formal system supporting basic arithmetic can be both complete and consistent at the same time. This also imposes some theoretical bounds on the precision of our forecasting and inference.
Monte Carlo experiments represent computational algorithms utilizing repeated random sampling of data and calculation of specific statistics that estimate concrete population characteristics. Two examples of using randomness to solve challenging problems involve the estimation of the area of a 2D region in the plane with complex boundary and compressed sensing for rapid estimation of complex signals using (sub-Nyquist frequency) stochastic sampling. The Metropolis-Hastings algorithm and Gibbs sampling represent a couple of specific Markov Chain Monte Carlo (MCMC) methods of simulation and estimation.
Earlier in Chapter 7, we presented the Bayes conditional probability formula in terms of events. Below, we will expand that formulation of the relation between the posterior, prior, conditional and marginal probabilities.
Gibbs sampling can be applied to estimate Bayesian models, facilitating subsequent Bayesian inference. The fundamental idea is to use MCMC to obtain computational estimates of statistical quantities and mathematical operations that are complicated or analytically intractable.
Conditional probability reflects partial knowledge about some observed data. Let’s consider the bivariate density and distribution situations to avoid the limitations of univariate cases and reduce the complexities of more general multivariate processes. For discrete-state processes, \(X,Y\in \Omega\), the probability mass function (PMF), \(f(.,.)\geq 0\), consists of the complete set of probability values assigned to all possible \((x, y)\) pair states, where \(\sum_{(x, y) \in \Omega} f(x, y) = 1\). The PMF values, \(f(x, y)\) correspond to the probability of observing the event \(X = x\) and \(Y = y\). For continuous-state processes, the probability density function (PDF), is a non-negative function \(f(x,y)\geq 0\) that integrates to one \(\int_{-\infty}^\infty { \int_{-\infty}^{\infty} { f(x, y) \ d x \ d y }} = 1\).
The PDF assigns probabilities, \(f(x, y) \, d x \, d y\), to outcome “areas” corresponding to observing the event \(x < X < x + d x\) and \(y < Y < y + d y\), where the infinitesimals \(d x\) and \(d y\) represent small increments in the two planar directions. In mixed situations where one of the variables may be continuous and the other discrete, probability mass-density functions may be defined in the same spirit, yielding probabilities of outcomes, \(f(x, y) \, d x\), defined for mixed continuous-discrete events like \(x < X < x + d x\) and \(Y = y\).
Often we may be able to observe \(X\) but not \(Y\). The joint distribution represents the probability of the outcome \((x,y)\in A\), \(P((X,Y)\in A)\), without any other knowledge and before observing either of the random processes \(X\) or \(Y\). Suppose we observe \(X = x\) first; what would be the distribution of \(Y\)? Given \(X\), the probability function has one unknown variable and has a conditional probability function analytic representation. For instance, when \(X\) and \(Y\) are discrete random variables and \((x,y)\in \Omega\), then
\[f(y | x) = \frac{h(x, y)}{\sum_{y \in \Omega_x} h(x, y)}, \qquad y \in \Omega _x,\] where \(h(x,y)\) denotes the joint PMF and
\[\Omega_x = \{\ y : \text{such that } (x, y) \in \Omega \}\] is the support of \(Y\) given \(X=x\); this is the conditional probability mass function of \(Y\) given \(X\).
Similarly, when \(X\) and \(Y\) are continuous random variables, then \[ f(y | x) = \frac{h(x, y)}{\int_{- \infty}^\infty h(x, y) \ d y} \qquad - \infty < y < \infty,\] represents the conditional probability density function of \(Y\) given \(X\).
The joint multivariate distribution is the product of the marginal and the conditional distributions, \(f(x, y)= f_X(x)\times f(y | x)\). In the discrete case, the marginal distribution of \(X\) (ignoring \(Y\)) is \(f_X(x) = \sum_{y \in \Omega_x} f(x, y)\), where \(\Omega_x\) is the support of \(X\), represents the probability of the event \(X = x\). And when \(X\) and \(Y\) are both continuous, the marginal is \(f_X(x) = \int_{- \infty}^\infty f(x, y) \ d y\) and for any event \(E\subset \Re\), \(P(X \in E) = \int_E f_X(x) \ d x = \int_E d x \int_{- \infty}^\infty f(x, y) \ d y\) represents the probability of the event \(X \in E\), which is equivalent to \((X, Y) \in E \times \Re\).
The conditional distributions are expressed as: \[\begin{align*} f(y | x) & = \frac{f(x, y)}{f_X(x)} \\ f(x | y) & = \frac{f(x, y)}{f_Y(y)} \end{align*}.\]
Notice that there are separate marginal distributions for each variable. In our bivariate case, one marginal for \(X\) and another for \(Y\).
This bivariate situation generalizes to \(X\) and \(Y\) variable vectors where multiple variables can be conditioned upon in the above relation between joint, marginal and conditional probabilities.
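A small numeric sketch (using a made-up joint PMF, not taken from the text) illustrates how the marginal and conditional distributions are obtained from the joint distribution:

# Hypothetical joint PMF f(x, y) for X in {1, 2} and Y in {1, 2, 3}
f_xy <- matrix(c(0.10, 0.20, 0.10,
                 0.25, 0.15, 0.20),
               nrow = 2, byrow = TRUE,
               dimnames = list(X = c("1", "2"), Y = c("1", "2", "3")))
sum(f_xy)                            # the PMF values must sum to 1
f_x <- rowSums(f_xy)                 # marginal of X: f_X(x) = sum_y f(x, y)
f_y_given_x1 <- f_xy[1, ] / f_x[1]   # conditional f(y | X = 1) = f(x, y) / f_X(x)
f_y_given_x1; sum(f_y_given_x1)      # a proper conditional PMF that sums to 1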
Review the many hands-on examples of constructing and validating Bayesian confidence intervals that are included in the SOCR General Confidence Intervals Activity.
Bayesian inference is based on probability theory as a consistent model for describing uncertainty using probability distributions. For instance, statistical inference frequently involves probability distribution models, whose true parameters are unknown but may be estimated from observations. A prior distribution of the parameter (or parameters, if the model distribution has multiple parameters like in \(Normal(\mu, \sigma^2)\)) is the initial belief about the parameter distribution before any data are observed. Of course, if there is no reason to believe in a certain distribution without any prior evidence, we can use a uniform prior. In both Bayesian and frequentist inference approaches, the distribution of the data given the parameters (\(\theta\)) are identically interpreted based on some underlying statistical model. However, frequentist representation of this distribution is expressed as \(f^{freq}_{\theta}(x)\) emphasizing \(x\) as random (described by a probability distribution) and the parameter \(\theta\) as fixed. The corresponding Bayesian distribution representation is written as \(f^{Bayes}(x | \theta)\), emphasizing as random both \(X\) and \(\theta\), and conditioning on the \(\theta\) parameter as being given (i.e., known). This conditional distribution describes the probability of observing \(X\) for a given fixed value of \(\theta\).
Thus, in the Bayesian framework, assuming a prior distribution, \(g()\), for the parameter \(\theta\), the joint = marginal times conditional, \(f(X, \theta) = g(\theta) \times f(x | \theta)\), describes the uncertainty about \(X\) and \(\theta\) before observing the data.
Taking a data sample, \(X = x\), the conditional distribution of the parameter \(\theta\) given \(X=x\) can be computed, \(f(\theta | x) = \frac{f(x, \theta)}{f_X(x)}\), and is called the posterior distribution, representing the probability distribution describing the post-evidence uncertainty in the parameter \(\theta\) after taking a data sample \(x\).
Both, the (marginal) prior and the (conditional) posterior distributions capture the probability of the parameter of interest. The prior is the initial distribution before seeing any data and the posterior the probability of the same parameter after observing the data. The Bayesian rule or Bayesian theorem provides a mechanism of inverting the conditioning and computing one conditional posterior probability from the other, e.g., having \(f(x | \theta)\), we can compute the conditional posterior probability of \(\theta\) given \(x\):
\[f(\theta |x) = \frac{g(\theta)\times f(x | \theta)}{f_X(x)}.\]
Bayesian inference employs the Bayesian rule to compute the conditional posterior probability from the joint distribution model and the prior belief.
The numerator in the right-hand side above represents an un-normalized posterior distribution, \(f(\theta |x) \sim g(\theta)\times f(x | \theta)\). In other words, \(\text{posterior} \sim \text{likelihood} \times \text{prior}\).
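As a simple sketch of this relation (a grid approximation with an assumed Beta(2, 2) prior and hypothetical binomial data of 7 successes out of 10 trials, not part of the original example), the un-normalized posterior is the prior times the likelihood, which we normalize numerically at the end:

# Grid approximation of the posterior for a binomial success probability theta
theta <- seq(0.001, 0.999, length.out = 999)
prior <- dbeta(theta, 2, 2)                        # prior g(theta) (assumed)
likelihood <- dbinom(7, size = 10, prob = theta)   # f(x | theta) for x = 7 successes out of 10
unnorm_post <- prior * likelihood                  # un-normalized posterior = prior x likelihood
posterior <- unnorm_post / sum(unnorm_post * (theta[2] - theta[1]))   # numeric normalization
plot(theta, posterior, type = "l", ylab = "density",
     main = "Prior x Likelihood -> Posterior (grid approximation)")
lines(theta, prior, lty = 2)                       # dashed prior for comparison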
Bayesian inference suggests unique answers for each problem; each problem yields a single posterior distribution. On the other hand, frequentist inference always provides ranges of answers, e.g., confidence intervals of certain confidence levels. Bayesian inference is subjective as it heavily depends on prior distributions, which may be difficult to justify objectively. The frequentist inference methods (e.g., parametric hypothesis tests) tend to be simple to compute using samples of data (e.g., using sampling distributions). Bayesian inference is much more difficult due to general challenges of analytically estimating the likelihoods, \(f(x | \theta)\). Hence, MCMC methods were developed to provide stochastic approaches for computing analytically intractable likelihoods. MCMC transforms the analytical challenges into computational challenges, which are tractable using programming, storage, and compute infrastructure.
Frequentist inference treats parameters as fixed and data as random, even after it is observed (think of the sampling distribution of the mean), whereas Bayesian inference treats both data and parameters as random until data is observed, and then treats the data as fixed to compute the conditional posterior probability. Yet, the two inference strategies tend to agree for large sample sizes.
Suppose \(\hat{\theta}_n\) and \(\theta_o\) denote the maximum likelihood estimate (MLE) and the true unknown parameter value. Frequentist inference treats \(\hat{\theta}_n\) as random and \(\theta_o\) as fixed and computes the asymptotic distribution: \[\hat{\theta}_n - \theta_o \sim Normal\left(0, I_n(\theta)^{- 1} \right),\] where the reciprocal of the Fisher information, \(I(\theta)\), is related to the lower bound of the Cramer-Rao inequality, see this paper.
This inference may not be useful as frequentist inference has no access to \(\theta_o\). However, plugging in a specific value for \(\theta_o\) does yield a useful result: \[\hat{\theta}_n - \theta_o \sim Normal\left(0, I_n(\hat{\theta}_n)^{- 1} \right).\]
A corresponding Bayesian inference is based on a random \(\theta_o\) and a fixed \(\hat{\theta}_n\), not random once the data are observed. Both inference strategies agree that \(\hat{\theta}_n - \theta_o\) is random, as a function of a random variable is itself a random variable, and its distribution is independent of the prior distribution, under regularity conditions (assuming smooth prior and locally strictly positive function around the true unknown parameter value).
In practice, with large amounts of data, this suggests that small perturbations of the prior don’t substantially alter the posterior probability or the final inference.
The main downside of Bayesian inference is that two Bayesian scientists that look at the same data could arrive at different conclusions based on their prior values or prior beliefs.
To mitigate the Bayesian subjectivity, we can employ a change-of-variables for priors. Suppose the prior distribution for the parameter \(\theta\) is \(g\), and consider another parameter \(\phi = h(\theta)\), where \(h\) is an invertible function with inverse \(h^{-1}\). The change-of-variable theorem for integration implies that: \[d \theta = \left| \frac{d\, h^{-1}(\phi)}{d \phi} \right| d \phi .\]
Thus, \[P(\phi \in E) = \int_{h(\theta) \in E} {g(\theta) \, d \theta } = \int_E {g(h^{-1}(\phi)) \left| \frac{d\, h^{-1}(\phi)}{d \phi} \right| \ d \phi} .\]
Thus, we can employ the following induced prior for \(\phi\): \(g(h^{-1} (\phi)) \left| \frac{d\, h^{-1}(\phi)}{d \phi} \right|\).
This suggests that a flat uninformative (diffuse) prior for one parameter may transform to a non-flat, or informative, prior for another parameter.
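For example (a sketch assuming \(\theta \sim Uniform(0,1)\) and the transformation \(\phi = h(\theta) = \log(\theta/(1-\theta))\), which are illustrative choices, not from the text), a flat prior on \(\theta\) induces a non-flat, standard logistic prior on \(\phi\):

# A flat prior on theta becomes an informative (logistic-shaped) prior on phi = logit(theta)
set.seed(1234)
theta <- runif(1e5)                      # draws from the flat Uniform(0, 1) prior on theta
phi <- log(theta / (1 - theta))          # transformed parameter phi = h(theta)
hist(phi, breaks = 100, freq = FALSE, main = "Induced prior on phi = logit(theta)")
curve(dlogis(x), add = TRUE, lwd = 2)    # the induced density is the standard logistic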
Monte Carlo simulations may be used to estimate probabilities (likelihoods) or expectations (integrals) involving functions that may be difficult to process analytically (i.e., to compute a closed-form analytical expression of the desired solutions). Likelihoods are special cases of integrals, and each probability can be expressed as an expectation (e.g., for an indicator (characteristic) function \(g\), \(E(g(X)) = P( g(X) = 1)\)). Let’s assume we want to compute \(\theta = E(g(X)) = \int_R {g(x)P_X(x)dx}\), but the integral can’t be solved exactly using simple analytical terms.
Instead, we can employ Monte Carlo simulation to estimate \(\theta\). Let’s generate a random sample of observations \(\{ X_1, X_2, \cdots \}\), independent and identically distributed (IID). Then, for each iteration \(n\), the average of the random variables is: \[\hat{\theta}_n = \frac{1}{n} \sum_{i = 1}^n g(X_i).\]
If \(Y_i = g(X_i)\), then \(\hat{\theta}_n\) can be written explicitly as a sample arithmetic average: \[\overline{Y}_n = \frac{1}{n} \sum_{i = 1}^n {Y_i}\]
The law of large numbers (LLN) implies that \(\hat{\theta}_n \longrightarrow \theta\) as \(n \longrightarrow \infty\).
Having a sufficiently large simulation sample would allow us to calculate a good estimate of \(\theta\).
By the central limit theorem (CLT): \[\hat{\theta}_n \sim Normal\left( \theta, \frac{\sigma^2}{n} \right),\] as \(n\longrightarrow \infty\) and \(\sigma^2 = \mathop{\rm var}( g(X))\) represents the population variance.
Neither \(\theta\) nor \(\sigma\) are known. However, we can use the sample variance as an estimate \[\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i = 1}^n \left ( g(X_i) - \hat{\theta}_n \right )^2. \]
Thus, \[\hat{\theta}_n \sim Normal\left( \theta, \frac{\hat{\sigma}^2}{n} \right). \]
Let’s construct a 95% confidence interval for \(\theta\): \[\hat{\theta}_n \pm 1.96 \times \frac{\hat{\sigma}_n}{\sqrt{n}}. \]
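A minimal sketch of this construction (the choice \(g(X)=X^2\) with \(X\sim Normal(0,1)\), so the true \(\theta=1\), is an assumption for illustration):

# Estimate theta = E(g(X)) = E(X^2) for X ~ Normal(0, 1), with a CLT-based 95% CI
set.seed(1234)
n <- 1e4
g_x <- rnorm(n)^2                                          # g(X_i) = X_i^2
theta_hat <- mean(g_x)                                     # MC estimate of theta
se_hat <- sd(g_x) / sqrt(n)                                # estimated standard error
c(theta_hat - 1.96 * se_hat, theta_hat + 1.96 * se_hat)    # 95% confidence interval for theta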
Let’s now go back to Markov chains, sequences \(\{X_1\), \(X_2\), \(\ldots \}\) of random variables obeying the Markov property that the conditional distribution of \(X_{n + 1}\) given the entire transition history back to the initial time actually only depends on the current state \(X_n\): \[f(x_{n + 1} | x_1, x_2, \cdots, x_n) = f(x_{n + 1} | x_n).\]
Referring back to our 5-state cancer staging Markov process, as the conditional distribution does not depend on \(n\), we have stationary transition probabilities. If \(g\) represents the chain distribution at the initial state, repeatedly applying the factorization joint \(=\) marginal \(\times\) conditional implies that
\[\begin{align*} f(x_1, x_2) &= g(x_1) f(x_2 | x_1) \\ f(x_1, x_2, x_3) &= g(x_1) f(x_3 | x_2, x_1) f(x_2 | x_1) \\ f(x_1, x_2, x_3, x_4) &= g(x_1) f(x_4 | x_3 , x_2, x_1) f(x_3 | x_2, x_1) f(x_2 |x_1) \\ f(x_1, x_2, x_3, x_4, x_5) &= g(x_1) f(x_5, x_4, x_3, x_2 | x_1) \\ & = g(x_1) f(x_5 | x_4, x_3, x_2, x_1) f(x_4 | x_3, x_2, x_1) f(x_3 | x_2, x_1) f(x_2 | x_1). \end{align*} \]
All of the Markov chain probabilities are computed by multiplying the initial state distribution \(g\) and the conditional transition probability distribution \(f(\,\cdot\, | \,\cdot\,)\).
The marginal distributions of the different variables can also be estimated by: \[\begin{align*} f(x_2) & = \int f(x_1, x_2) \, d x_1 \\ & = \int f(x_2 | x_1) g(x_1) \, d x_1 \\ f(x_3) & = \iint f(x_1, x_2, x_3) \, d x_1 \, d x_2 \\ & = \iint f(x_3 | x_2) f(x_2 | x_1) g(x_1) \, d x_1 \, d x_2 \\ f(x_4) & = \iiint f(x_1, x_2, x_3, x_4) \, d x_1 \, d x_2 \, d x_3 \\ & = \iiint f(x_4 | x_3) f(x_3 \mid x_2) f(x_2 \mid x_1) g(x_1) \, d x_1 \, d x_2 \, d x_3 \\ f(x_5) & = \iiiint f(x_1, x_2, x_3, x_4, x_5) \, d x_1 \, d x_2 \, d x_3 \, d x_4 \\ & = \iiiint f(x_5 | x_4) f(x_4 | x_3) f(x_3 | x_2) f(x_2 | x_1) g(x_1) \, d x_1 \, d x_2 \, d x_3\, d x_4 \end{align*}\]
The ergodic law (ergodicity, or ergodic theory) represents the idea that for certain dynamical systems, like Markov chains, the time average of their properties equals the average over the entire state space. In physics, ergodic methods are used to study geodesic flows on Riemannian manifolds, and in mathematics there are ergodic applications in harmonic analysis, the representation theory of Lie groups, the theory of Diophantine approximations, etc. Ergodic theory characterizes the behavior of a dynamical system that runs indefinitely. The Poincaré recurrence theorem suggests that in the long run, almost all points in any subset of the phase space are eventually revisited. In general, ergodic theorems determine the conditions required to guarantee the almost certain existence of the time average of a function along the trajectories and its relation to the space average. For one special class of ergodic systems, this time average is statistically the same for almost all initial points, i.e., the evolving dynamical system forgets its initial state.
One special case of ergodic law for Markov chains suggests that: \(\hat{\theta}_n \longrightarrow \theta\) as \(n \longrightarrow \infty\).
The CLT suggests that for sufficiently large \(b \ll n\), we have \[\hat{\theta}_b \approx Normal\left( \theta, \frac{\sigma^2}{b} \right) .\] When \(n\) is a multiple of \(b\), i.e., \(m\times b = n\) for an integer \(m\), we can divide the Markov chain into \(m\) batches: \[ X_{j, 1}, X_{j, 2}, \ldots, X_{j, b}, \ \forall\ j=1, 2, 3, \cdots, m=\frac{n}{b}.\] Computing the batch means yields: \[\hat{\theta}_{b, j} = \frac{1}{b} \sum_{i = 1}^b {X_{j, i}}. \] Again, assuming that \(b\) is sufficiently large, we have \[\hat{\theta}_{b, j} \sim Normal\left( \theta, \frac{\sigma^2}{b} \right) . \] Therefore, we can use the batches and their batch means, \(\hat{\theta}_{b, j}\), to obtain Bayesian simulation-based estimates of the unknown process variance, \(\frac{\sigma^2}{b}\), using the sample variance statistic:
\[\frac{1}{m} \sum_{j = 1}^m ( \hat{\theta}_{b, j} - \hat{\theta}_n )^2. \] As the last expression estimates \(\sigma^2 / b\), we can solve for the numerator and estimate the variance by: \[\hat{\sigma}^2_{b, m} = \frac{b}{m} \sum_{j = 1}^m ( \hat{\theta}_{b, j} - \hat{\theta}_n )^2.\] Therefore, for sufficiently large \(b\) and \(m\), the MCMC simulation yields a good estimator of \(\sigma^2\).
Similarly, we can perform Bayesian inference on the parameter \(\theta\) as well as compute confidence intervals for \(\theta\). For instance, when we run the MCMC simulation long enough (large sample), we get large \(b\) and \(m\) such that \(n = b\, m\), and the expression below represents a 95% confidence interval for \(\theta\): \[\hat{\theta}_n \pm 1.96 \times \frac{\hat{\sigma}_{b, m}}{\sqrt{n}}. \]
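Here is a minimal sketch of the batch-means construction (using a simulated AR(1) chain with known mean 0 as a stand-in for an MCMC sample; the chain, batch length, and constants below are assumptions for illustration):

# Batch-means estimate of the Monte Carlo standard error for a correlated chain
set.seed(1234)
n <- 1e5; b <- 1000; m <- n / b                  # n = m * b
x <- as.numeric(arima.sim(list(ar = 0.7), n))    # AR(1) chain, true mean theta = 0
theta_hat <- mean(x)                             # overall estimate of theta
batch_means <- colMeans(matrix(x, nrow = b))     # m batch means, each over b consecutive values
sigma2_hat <- (b / m) * sum((batch_means - theta_hat)^2)   # estimate of sigma^2
theta_hat + c(-1.96, 1.96) * sqrt(sigma2_hat / n)          # 95% CI for theta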
Some additional details and demonstrations are provided in the SOCR Bayesian Estimator.
Remember that in Chapter 21, we discussed the importance of randomly sampling from diverse arrays of probability distributions. The Metropolis random walk algorithm provides a protocol for simulating data from the distribution of a continuous random vector for a given PDF \(h\). The pseudo-code implementation of the Metropolis algorithm is provided below.
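In outline, the random-walk Metropolis algorithm proceeds as follows (the steps below follow the standard formulation, with numbering that matches the references in the next paragraphs): (1) initialize the chain at an arbitrary state \(x_1\) and set \(n=1\); (2) propose a random step \(y_n \sim Normal(0, \Sigma)\) and form the candidate state \(x_n + y_n\); (3) compute the acceptance ratio \(r = \frac{h(x_n + y_n)}{h(x_n)}\); (4) draw \(u \sim Uniform(0,1)\) and accept the candidate, setting \(x_{n+1} = x_n + y_n\), if \(u < r\), otherwise keep \(x_{n+1} = x_n\); (5) increment \(n\) and repeat from step (2).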
In the Metropolis algorithm, we can only adjust the proposal variance, \(\Sigma\), which in the scalar version corresponds to \(\sigma^2\). A too-small variance yields an MCMC chain that takes tiny steps forward and requires a very long time to equilibrate. On the flip side, a large variance would force the algorithm to take very large steps, \(y_n\), most of which may be discarded (not accepted or used) in the last step because the acceptance ratio \(r\) calculated in step 3 may be too small.
This process generates a positively recurrent Markov chain that obeys the ergodic LLN law and has \(h\) as its equilibrium distribution. This algorithm works for any distribution function \(h\geq 0\).
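Before using a packaged implementation, here is a minimal hand-coded sketch of these steps, assuming a univariate standard normal target \(h\) and an arbitrary proposal standard deviation (both are illustrative choices, not from the text):

# Minimal random-walk Metropolis sketch for a univariate target
set.seed(1)
h <- function(x) exp(-x^2 / 2)       # (un-normalized) standard normal target density
n_iter <- 10000; sigma <- 1          # number of iterations and proposal standard deviation
x <- numeric(n_iter); x[1] <- 0      # step 1: initial state
for (n in 1:(n_iter - 1)) {
  y <- rnorm(1, 0, sigma)            # step 2: propose a normal step
  r <- h(x[n] + y) / h(x[n])         # step 3: acceptance ratio
  u <- runif(1)                      # step 4: accept the candidate if u < r
  x[n + 1] <- if (u < r) x[n] + y else x[n]
}
mean(x); sd(x)                       # should approximate 0 and 1, respectively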
The R function mcmc::metrop implements the random-walk Metropolis algorithm. It requires a predefined R function that computes \(\log (h(x))\); in the degenerate case of \(h(x) = 0\) for some \(x\), this function should return \(\log h(x) = - \infty\).
The metrop function parameters include a scale parameter vector and an output function, \(g\), coded as an R function.
The scale argument to the metrop function specifies the multivariate normal random vector \(y\) in the Metropolis algorithm. If z is a standard multivariate normal random vector, then \(y\) in the algorithm is \(scale \times z\) (when scale is a scalar or vector) and \(scale\ \%*\%\ z\) (when scale is a matrix of appropriate dimensions).
We may need to adjust the proposal scale to tune the acceptance rate, i.e., the proportion of the time that \(u < r\) in step 4 of the algorithm; an acceptance rate of roughly 20% is often targeted. When the proposal steps are very small and \(h\) is a continuous distribution function, the acceptance rate may be close to one, which means the chain takes tiny steps and may require a very long execution time to transition from one part of the sample space to another. For large scale parameters, the algorithm will take large steps; when \(h(x)\) is moderately large and \(x\) resembles a sample from the equilibrium distribution, \(x + y\) will often land far in the tails of the distribution, corresponding to a very small \(r = \frac{h(x + y)}{ h(x)}\).
The Metropolis algorithm only requires the target density up to a normalizing constant, so \(h\) can be defined as the un-normalized posterior: \[\text{un-normalized posterior} \sim \text{likelihood} \times \text{un-normalized prior}. \]
MCMC-based Bayesian inference is always possible when we can provide specific functions that evaluate the likelihood and the unnormalized prior.
We will demonstrate frequentist and Bayesian inference using a clinical case-study of hospital admissions with N=58,863 cases and k=9 features, which are shown in the data preview below.
First, we will load and inspect the data.
hosAdmissions <- read.csv("https://umich.instructure.com/files/5005926/download?download_frd=1")
head(hosAdmissions)
## ID AdmissionLengthDays Death_1 Admission_Type Insurance_Type
## 1 42862 4.277778 0 elective private
## 2 5967 26.122917 0 elective private
## 3 50952 7.028472 1 emergency medicare
## 4 17138 34.187500 0 elective medicare
## 5 27703 6.165972 0 elective government
## 6 29593 3.286806 0 elective medicare
## EnglishLanguage_1 Religion_Type Married_1 Race
## 1 1 episcopalian 1 white
## 2 0 catholic 1 white
## 3 1 catholic 0 asian - japanese
## 4 1 catholic 1 white
## 5 0 protestant quaker 0 black/african american
## 6 0 christian scientist 0 white
## Dx
## 1 mitral stenosis\\mitral valve replacement /sda
## 2 duodenal mass/sda
## 3 ? serotonin syndrome
## 4 abdominal aortic aneurysm/sda
## 5 absence/sda
## 6 adenoid cystic carcinoma/sda
# remove the case ID:
hosAdmissions <- hosAdmissions[, -1]
Each row of this data frame represents one hospital admission record. The response variable, \(y\) (random outcome), is a binary survival outcome (1 for death, 0 for survival). The frequentist inference will be based on logistic regression modeling.
# Check sizes (note: the case ID column was removed above, so length(cases) reports 0)
cases <- unique(hosAdmissions$ID); length(cases); dim(hosAdmissions)
## [1] 0
## [1] 58863 9
# define the binary outcome variable
y <- hosAdmissions$Death_1
Let’s fit a logistic prediction model for the clinical outcome \(y\) representing the binary variable Death_1.
# model_glm <- glm(y ~ ., data = hosAdmissions, family = binomial)
model_glm <- glm(y ~ AdmissionLengthDays + Married_1 + Insurance_Type + Admission_Type, data = hosAdmissions, family = binomial)
summary(model_glm)
##
## Call:
## glm(formula = y ~ AdmissionLengthDays + Married_1 + Insurance_Type +
## Admission_Type, family = binomial, data = hosAdmissions)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.6321 -0.5806 -0.4508 -0.1959 3.1413
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.326120 0.131779 -32.829 < 2e-16 ***
## AdmissionLengthDays 0.001042 0.001273 0.819 0.412896
## Married_1 -0.003940 0.029010 -0.136 0.891965
## Insurance_Typemedicaid 0.274587 0.121530 2.259 0.023857 *
## Insurance_Typemedicare 0.920552 0.110573 8.325 < 2e-16 ***
## Insurance_Typeprivate 0.382308 0.113017 3.383 0.000718 ***
## Insurance_Typeself pay 1.017459 0.156722 6.492 8.46e-11 ***
## Admission_Typeemergency 1.707001 0.073893 23.101 < 2e-16 ***
## Admission_Typenewborn -0.875436 0.148996 -5.876 4.21e-09 ***
## Admission_Typeurgent 1.657608 0.111239 14.901 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 38120 on 58862 degrees of freedom
## Residual deviance: 35470 on 58853 degrees of freedom
## AIC: 35490
##
## Number of Fisher Scoring iterations: 7
Stratify the population, say on Race=white, refit the logistic model, and report the results.
model_glm_white <- glm(y ~ AdmissionLengthDays + as.factor(Married_1) + as.factor(Insurance_Type) + as.factor(Admission_Type), data = hosAdmissions, family = binomial, subset = hosAdmissions$Race == "white")
summary(model_glm_white)
##
## Call:
## glm(formula = y ~ AdmissionLengthDays + as.factor(Married_1) +
## as.factor(Insurance_Type) + as.factor(Admission_Type), family = binomial,
## data = hosAdmissions, subset = hosAdmissions$Race == "white")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.6673 -0.5748 -0.4364 -0.1905 3.1237
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.408236 0.189887 -23.215 < 2e-16
## AdmissionLengthDays 0.002208 0.001518 1.455 0.145737
## as.factor(Married_1)1 0.033884 0.034740 0.975 0.329387
## as.factor(Insurance_Type)medicaid 0.298104 0.186195 1.601 0.109370
## as.factor(Insurance_Type)medicare 0.972134 0.171160 5.680 1.35e-08
## as.factor(Insurance_Type)private 0.364072 0.173671 2.096 0.036052
## as.factor(Insurance_Type)self pay 0.854595 0.245315 3.484 0.000495
## as.factor(Admission_Type)emergency 1.702102 0.084834 20.064 < 2e-16
## as.factor(Admission_Type)newborn -0.825623 0.188062 -4.390 1.13e-05
## as.factor(Admission_Type)urgent 1.479738 0.149358 9.907 < 2e-16
##
## (Intercept) ***
## AdmissionLengthDays
## as.factor(Married_1)1
## as.factor(Insurance_Type)medicaid
## as.factor(Insurance_Type)medicare ***
## as.factor(Insurance_Type)private *
## as.factor(Insurance_Type)self pay ***
## as.factor(Admission_Type)emergency ***
## as.factor(Admission_Type)newborn ***
## as.factor(Admission_Type)urgent ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 26399 on 40926 degrees of freedom
## Residual deviance: 24603 on 40917 degrees of freedom
## AIC: 24623
##
## Number of Fisher Scoring iterations: 7
Report the variance-covariance matrix of the estimates of the model coefficients:
summ_out <- summary(model_glm_white)
summ_out$coefficients
## Estimate Std. Error z value
## (Intercept) -4.408236433 0.189887179 -23.2150293
## AdmissionLengthDays 0.002208128 0.001517869 1.4547558
## as.factor(Married_1)1 0.033884030 0.034740425 0.9753488
## as.factor(Insurance_Type)medicaid 0.298103910 0.186195000 1.6010307
## as.factor(Insurance_Type)medicare 0.972133761 0.171159880 5.6796824
## as.factor(Insurance_Type)private 0.364071672 0.173670514 2.0963355
## as.factor(Insurance_Type)self pay 0.854595497 0.245315401 3.4836602
## as.factor(Admission_Type)emergency 1.702102077 0.084834009 20.0639118
## as.factor(Admission_Type)newborn -0.825622895 0.188062297 -4.3901564
## as.factor(Admission_Type)urgent 1.479737666 0.149358060 9.9073171
## Pr(>|z|)
## (Intercept) 3.210399e-119
## AdmissionLengthDays 1.457369e-01
## as.factor(Married_1)1 3.293873e-01
## as.factor(Insurance_Type)medicaid 1.093701e-01
## as.factor(Insurance_Type)medicare 1.349451e-08
## as.factor(Insurance_Type)private 3.605244e-02
## as.factor(Insurance_Type)self pay 4.946070e-04
## as.factor(Admission_Type)emergency 1.525943e-89
## as.factor(Admission_Type)newborn 1.132692e-05
## as.factor(Admission_Type)urgent 3.868958e-23
var_cov <- vcov(model_glm_white)
var_cov
## (Intercept) AdmissionLengthDays
## (Intercept) 3.605714e-02 -2.153387e-05
## AdmissionLengthDays -2.153387e-05 2.303925e-06
## as.factor(Married_1)1 -4.616676e-04 -3.524855e-07
## as.factor(Insurance_Type)medicaid -2.882354e-02 -3.075147e-06
## as.factor(Insurance_Type)medicare -2.886613e-02 6.731918e-07
## as.factor(Insurance_Type)private -2.886567e-02 -5.812371e-07
## as.factor(Insurance_Type)self pay -2.883529e-02 8.082561e-06
## as.factor(Admission_Type)emergency -7.045865e-03 -2.026943e-06
## as.factor(Admission_Type)newborn -7.298670e-03 -5.844913e-06
## as.factor(Admission_Type)urgent -6.945212e-03 -8.437119e-06
## as.factor(Married_1)1
## (Intercept) -4.616676e-04
## AdmissionLengthDays -3.524855e-07
## as.factor(Married_1)1 1.206897e-03
## as.factor(Insurance_Type)medicaid 9.691941e-05
## as.factor(Insurance_Type)medicare -2.370309e-04
## as.factor(Insurance_Type)private -4.300747e-04
## as.factor(Insurance_Type)self pay 5.090034e-05
## as.factor(Admission_Type)emergency 1.559534e-04
## as.factor(Admission_Type)newborn 8.603678e-04
## as.factor(Admission_Type)urgent 6.969787e-05
## as.factor(Insurance_Type)medicaid
## (Intercept) -2.882354e-02
## AdmissionLengthDays -3.075147e-06
## as.factor(Married_1)1 9.691941e-05
## as.factor(Insurance_Type)medicaid 3.466858e-02
## as.factor(Insurance_Type)medicare 2.880777e-02
## as.factor(Insurance_Type)private 2.879171e-02
## as.factor(Insurance_Type)self pay 2.882194e-02
## as.factor(Admission_Type)emergency 1.063638e-06
## as.factor(Admission_Type)newborn 1.113055e-04
## as.factor(Admission_Type)urgent 6.101094e-05
## as.factor(Insurance_Type)medicare
## (Intercept) -2.886613e-02
## AdmissionLengthDays 6.731918e-07
## as.factor(Married_1)1 -2.370309e-04
## as.factor(Insurance_Type)medicaid 2.880777e-02
## as.factor(Insurance_Type)medicare 2.929570e-02
## as.factor(Insurance_Type)private 2.890530e-02
## as.factor(Insurance_Type)self pay 2.882195e-02
## as.factor(Admission_Type)emergency 9.022303e-05
## as.factor(Admission_Type)newborn 3.258590e-04
## as.factor(Admission_Type)urgent 1.185816e-04
## as.factor(Insurance_Type)private
## (Intercept) -2.886567e-02
## AdmissionLengthDays -5.812371e-07
## as.factor(Married_1)1 -4.300747e-04
## as.factor(Insurance_Type)medicaid 2.879171e-02
## as.factor(Insurance_Type)medicare 2.890530e-02
## as.factor(Insurance_Type)private 3.016145e-02
## as.factor(Insurance_Type)self pay 2.879496e-02
## as.factor(Admission_Type)emergency 1.708531e-04
## as.factor(Admission_Type)newborn -8.262959e-04
## as.factor(Admission_Type)urgent 1.467773e-04
## as.factor(Insurance_Type)self pay
## (Intercept) -2.883529e-02
## AdmissionLengthDays 8.082561e-06
## as.factor(Married_1)1 5.090034e-05
## as.factor(Insurance_Type)medicaid 2.882194e-02
## as.factor(Insurance_Type)medicare 2.882195e-02
## as.factor(Insurance_Type)private 2.879496e-02
## as.factor(Insurance_Type)self pay 6.017965e-02
## as.factor(Admission_Type)emergency -9.802637e-05
## as.factor(Admission_Type)newborn 2.327653e-04
## as.factor(Admission_Type)urgent 7.672538e-05
## as.factor(Admission_Type)emergency
## (Intercept) -7.045865e-03
## AdmissionLengthDays -2.026943e-06
## as.factor(Married_1)1 1.559534e-04
## as.factor(Insurance_Type)medicaid 1.063638e-06
## as.factor(Insurance_Type)medicare 9.022303e-05
## as.factor(Insurance_Type)private 1.708531e-04
## as.factor(Insurance_Type)self pay -9.802637e-05
## as.factor(Admission_Type)emergency 7.196809e-03
## as.factor(Admission_Type)newborn 6.911878e-03
## as.factor(Admission_Type)urgent 6.877517e-03
## as.factor(Admission_Type)newborn
## (Intercept) -7.298670e-03
## AdmissionLengthDays -5.844913e-06
## as.factor(Married_1)1 8.603678e-04
## as.factor(Insurance_Type)medicaid 1.113055e-04
## as.factor(Insurance_Type)medicare 3.258590e-04
## as.factor(Insurance_Type)private -8.262959e-04
## as.factor(Insurance_Type)self pay 2.327653e-04
## as.factor(Admission_Type)emergency 6.911878e-03
## as.factor(Admission_Type)newborn 3.536743e-02
## as.factor(Admission_Type)urgent 6.907991e-03
## as.factor(Admission_Type)urgent
## (Intercept) -6.945212e-03
## AdmissionLengthDays -8.437119e-06
## as.factor(Married_1)1 6.969787e-05
## as.factor(Insurance_Type)medicaid 6.101094e-05
## as.factor(Insurance_Type)medicare 1.185816e-04
## as.factor(Insurance_Type)private 1.467773e-04
## as.factor(Insurance_Type)self pay 7.672538e-05
## as.factor(Admission_Type)emergency 6.877517e-03
## as.factor(Admission_Type)newborn 6.907991e-03
## as.factor(Admission_Type)urgent 2.230783e-02
# It may be easier to display the matrix in RStudio as a table
View(var_cov)
To perform the analogous MCMC Bayesian inference we will use the R function mcmc::metrop, which takes as input the log unnormalized posterior probability distribution (that we need to generate MCMC samples from). Let’s rewrite the Bayesian rule (\(unnormalized\ posterior\ \sim \ likelihood \times \ unnormalized\ prior\)) in terms of logs: \[\text{log unnormalized posterior} \sim \text{log likelihood} + \text{log unnormalized prior}. \] Let’s examine the two terms on the right-hand side.
The frequentist and Bayesian analyses share the same (log) likelihood function. Thus, the frequentist model matrix can be reused in the Bayesian analysis.
model_glm <- glm(y ~ AdmissionLengthDays + as.factor(Married_1) + as.factor(Insurance_Type) + as.factor(Admission_Type), data = hosAdmissions, family = binomial, x=TRUE) # return the design matrix X
design_mat <- model_glm$x
response <- model_glm$y
We need to define a function that computes the log likelihood, \(\log(f(x | \beta))\).
log_likelihod <- function(beta) {
# housekeeping
stopifnot(is.numeric(beta))
stopifnot(is.finite(beta))
stopifnot(length(beta) == ncol(design_mat))
# Compute the linear predictor eta = X %*% beta
eta <- drop(design_mat %*% beta)
# use log1p to calculate the function x --> log(1+x)
# correctly for small x, avoiding singularity problems
logp <- ifelse(eta < 0, eta - log1p(exp(eta)), - log1p(exp(-eta)))
logq <- ifelse(eta < 0, -log1p(exp(eta)), -eta -log1p(exp(-eta)))
sum(ifelse(response == 1, logp, logq))
}
The reason for computing log likelihoods, \(\log(f(x | \beta))\), instead of the raw likelihoods, \(f(x | \beta)\), is that it makes the asymptotic calculations numerically stable. For instance, let \(\eta = X\times \beta\), \(p=\frac{\exp{(\eta)}}{1+\exp{(\eta)}}\), and \(q=1-p=\frac{1}{1+\exp{(\eta)}}\). Then, \(\log(p)=\eta-\log(1+\exp{(\eta)})\) and \(\log(q)=-\log(1+\exp{(\eta)})\) are computable for large \(|\eta|\), either positive or negative.
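A quick numeric illustration (the value \(\eta=800\) is an arbitrary choice) of why the log1p formulation used above matters for large \(|\eta|\):

# Why the log1p formulation matters for large |eta|
eta <- 800
log(exp(eta) / (1 + exp(eta)))                             # naive: exp(800) overflows, giving NaN
ifelse(eta < 0, eta - log1p(exp(eta)), -log1p(exp(-eta)))  # stable: ~0, correct to machine precision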
The mcmc package demo vignette (vignette(“demo”, “mcmc”)) includes additional examples and details.
Ideally, we would choose informative Bayesian priors, but we also don’t want to introduce too strong a subjectivity. Conjugate priors assume the prior and posterior distributions are of the same probability distribution family. Here, we use the following un-normalized prior for the coefficient vector \(\beta\):
\[\prod_{i = 1}^{k} p(\beta_i) q(\beta_i),\] where \(p(\beta) = {\rm logit}^{- 1}(\beta)\) is the mean-value parameter corresponding to the linear predictor \(\beta\) and \(q(\beta) = 1 - p(\beta)\); the same \(p\) and \(q\) functions are used to calculate the log likelihood for various \(\beta\) values, see log_likelihod.
log_unnormalized_prior <- function(beta) {
stopifnot(is.numeric(beta))
stopifnot(is.finite(beta))
stopifnot(length(beta) == ncol(design_mat))
logp <- ifelse(beta < 0, beta - log1p(exp(beta)), - log1p(exp(- beta)))
logq <- ifelse(beta < 0, - log1p(exp(beta)), - beta - log1p(exp(- beta)))
sum(logp) + sum(logq)
}
log_unnormalized_posterior <- function(beta) {
log_likelihod(beta) + log_unnormalized_prior(beta)
}
We are ready to run the MCMC Bayesian inference.
library(mcmc)
# An illustrative log un-normalized density (uniform on the probability simplex);
# it is not used in the hospital-admissions regression example below
logUnnormalizedPDF <- function(x) {
if (all(x >= 0) && sum(x) <= 1) return(1)
else return(-Inf)
}
set.seed(1234)
mcmc_out <- metrop(log_unnormalized_posterior, rep(0, ncol(design_mat)), nbatch = 100, blen = 100)
# Runs a "random-walk" Metropolis algorithm with multivariate normal proposal
# Generates a Markov chain (MC) with equilibrium distribution having the
# specified unnormalized density
mcmc_out$accept
## [1] 0.3907
The main output of the MCMC call, mcmc_out, is a list whose components include: accept, the observed acceptance rate of the Metropolis proposals; batch, an nbatch \(\times\) p matrix of batch means, where p is the dimension of the result of outfun if outfun is a function, or the dimension of the state when outfun is missing; initial and final, the initial and final states of the Markov chain; and time, the running time of the simulation as reported by system.time().
Sometimes, we need to adjust the proposal to tune the acceptance rate (mcmc_out$accept). In general, an acceptance rate of \(\sim 20\%\) is good. Forcing the acceptance rate to be too low or too high may keep the sampler from efficiently visiting all parts of the state space. The acceptance rate reflects the proportion of the time that \(u < r\) in step 4 of the algorithm.
The metrop function accepts as its first argument the output of a previous run, and it uses all the same arguments as the previous run except for any that are modified in the call for the new run. The initial state of the next run is the final state of the previous run, and the random number generator seed at the start of the next run is the same as at the end of the previous run. This yields a continuous feed-forward iteration if no other arguments are modified.
mcmc_out <- metrop(mcmc_out, scale = 0.1)
mcmc_out$accept
## [1] 0.9311
mcmc_out <- metrop(mcmc_out, scale = 0.3)
mcmc_out$accept
## [1] 0.7905
mcmc_out <- metrop(mcmc_out, scale = 0.5)
mcmc_out$accept
## [1] 0.6642
mcmc_out <- metrop(mcmc_out, scale = 0.4)
mcmc_out$accept
## [1] 0.7219
t.test(mcmc_out$batch)$conf.int
## [1] -0.05332039 0.11673006
## attr(,"conf.level")
## [1] 0.95
mcmc_out$blen # check the length of batches
## [1] 100
It may be appropriate to check that the batch length is long enough for the validity of this confidence interval for the true unknown acceptance rate.
Similar to diagnostic plots for linear models, analogous MCMC diagnostic plots may be drawn to confirm there are no clear problems with the simulation estimates. For instance, we can display the time series of any component of the Markov chain output, e.g., acceptance indicators or their batch means. Ideally, these time series should appear stationary.
# Recall the actual model
# model_glm <- glm(y ~ AdmissionLengthDays + as.factor(Married_1) + as.factor(Insurance_Type) + as.factor(Admission_Type), data = hosAdmissions, family = binomial, x=TRUE)
# Batch means for the AdmissionLengthDays coefficient (column 2 of the batch matrix; column 1 is the intercept)
plot(ts(mcmc_out$batch[ , 2]), main="Batch Means for AdmissionLengthDays Coefficient")
# Batch means for the Married_1 coefficient (column 3)
plot(ts(mcmc_out$batch[ , 3]), main="Batch Means for Married_1 Coefficient")
Autocorrelation (ACF) plots may also be used to examine the serial correlation of the time series of batch means. These autocorrelations should not be significantly nonzero, except at lag zero; otherwise, it may suggest that the batch length is not long enough.
acf(ts(mcmc_out$batch[ , 2]), main="Autocorrelation of AdmissionLengthDays Coefficient")
acf(ts(mcmc_out$batch[ , 3]), main="Autocorrelation of Married_1 Coefficient")
acf(ts(mcmc_out$batch), main="Autocorrelation of the Coefficient Batch Means")
It appears as if some of the low-lag autocorrelations may be significantly non-trivial, i.e., the blue dashed error lines indicate the cutoff points for \(0.05\) level tests of the null hypothesis that the true unknown autocorrelation is zero. To mitigate this problem, we can increase the batch length and replot the resulting autocorrelations.
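For instance (a sketch of the idea; the batch length below and the object name mcmc_out_long are arbitrary illustrative choices), we can rerun the sampler with longer batches and re-examine the ACF:

# Rerun with longer batches (reusing the previous state and arguments) and recheck the ACF
mcmc_out_long <- metrop(mcmc_out, blen = 1000)
acf(ts(mcmc_out_long$batch[ , 2]), main = "Autocorrelation with longer batches")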
Are there examples of questions that may be answerable using Bayesian inference but may not be within the reach of frequentist inference? Questions about probabilities of parameters only make sense in a Bayesian setting, as frequentist inference considers all parameters as fixed, not random.
Once we apply the Bayes rule and sample the posterior probability distribution, we can examine functions of those parameters to answer specific probabilistic questions about them. In order to calculate the probabilities of specific events, we need to output the indicators of these events.
outputIndicatorFun <- function(beta) {
stopifnot(is.numeric(beta))
stopifnot(is.finite(beta))
stopifnot(length(beta) == ncol(design_mat))
# indicator vector: 1 for the coefficient(s) equal to the current maximum, 0 otherwise
as.numeric(beta == max(beta))
}
Let’s call the MCMC with a reference to the output indicator function.
mcmc_out <- metrop(mcmc_out, blen = 1000, outfun = outputIndicatorFun)
mcmc_out$time
## user system elapsed
## 6.48 0.00 6.48
mcmc_out$accept
## [1] 0.71766
Notice that increasing blen drives the computing time higher. However, longer runs yield more precise answers.
colnames(mcmc_out$batch)
## NULL
colnames(mcmc_out$batch) <- names(model_glm_white$coefficients)
# skip over the intercept column and deal with the remaining 10-1=9 coefficient estimates
mcmc_out_ts <- as.ts(mcmc_out$batch[ , 2:10])
# report the overall means for each coefficient by batch averaging
colMeans(mcmc_out_ts)
## AdmissionLengthDays as.factor(Married_1)1
## 0.08101 0.10191
## as.factor(Insurance_Type)medicaid as.factor(Insurance_Type)medicare
## 0.09991 0.10775
## as.factor(Insurance_Type)private as.factor(Insurance_Type)self pay
## 0.10386 0.09003
## as.factor(Admission_Type)emergency as.factor(Admission_Type)newborn
## 0.10722 0.09921
## as.factor(Admission_Type)urgent
## 0.10589
Let’s report only the coefficients whose estimated probabilities (of being the largest coefficient) exceed \(0.1\).
mcmc_out_ts_highProb <- mcmc_out_ts[ , colMeans(mcmc_out_ts) > 0.1]
colMeans(mcmc_out_ts_highProb)
## as.factor(Married_1)1 as.factor(Insurance_Type)medicare
## 0.10191 0.10775
## as.factor(Insurance_Type)private as.factor(Admission_Type)emergency
## 0.10386 0.10722
## as.factor(Admission_Type)urgent
## 0.10589
# Compute the Monte Carlo standard errors of the batch means
# (standard deviation of the batch means divided by the square root of the number of batches)
apply(mcmc_out_ts_highProb, 2, sd) / sqrt(mcmc_out$nbatch)
## [92,] 0.1406660 1.1768961
## [93,] 0.5616418 0.3342437
## [94,] 0.2347372 0.3721025
## [95,] 0.2367664 0.8328997
## [96,] 0.2058468 0.2058468
## [97,] 0.1632061 0.8321912
## [98,] 0.6422620 0.2589041
## [99,] 0.1831395 0.1860513
## [100,] 0.1910803 0.1850888
## as.factor(Admission_Type)emergency as.factor(Admission_Type)newborn
## [1,] 0.2911113 0.3851040
## [2,] 0.3342160 0.6579049
## [3,] 0.4031131 0.4726920
## [4,] 0.2195765 0.2520493
## [5,] 0.5889490 0.5022574
## [6,] 0.1907718 0.2088520
## [7,] 0.2685886 0.2486647
## [8,] 0.3531422 0.2031011
## [9,] 1.0392258 0.2103851
## [10,] 0.2141633 Inf
## [11,] 0.2332893 0.2223399
## [12,] 0.2464738 0.3581181
## [13,] 0.2062997 0.2710332
## [14,] 0.3286320 0.4845415
## [15,] 0.1880130 0.1713590
## [16,] 0.2232722 0.2932132
## [17,] 0.2200671 0.6794813
## [18,] 0.4726920 0.2573009
## [19,] 0.1992624 0.1777039
## [20,] 0.2511287 0.2226018
## [21,] 0.2706235 0.4004121
## [22,] 0.1433535 0.4886795
## [23,] 0.4631417 0.2989567
## [24,] 0.1273423 0.5196129
## [25,] 0.3093312 0.2899785
## [26,] 0.2271220 0.1815398
## [27,] 0.2685886 0.3168098
## [28,] 0.2573009 0.3491062
## [29,] 0.4647559 0.2520493
## [30,] 0.2585822 0.1614165
## [31,] 0.3144361 0.1937669
## [32,] 0.1691669 0.3880111
## [33,] 0.1419852 0.1711821
## [34,] 0.1929794 1.3416347
## [35,] 0.2878063 0.1751050
## [36,] 0.2261409 0.2311780
## [37,] 0.3397406 0.2672005
## [38,] 0.1561250 0.1813803
## [39,] 0.1581132 0.1903714
## [40,] 0.1850888 0.2288152
## [41,] 0.2073330 0.1672785
## [42,] 0.4448249 0.2087010
## [43,] 0.2375867 0.4159135
## [44,] 0.7745932 0.3162263
## [45,] 0.2005387 0.2404374
## [46,] 0.1919531 0.4239856
## [47,] 0.2924022 0.2326045
## [48,] 0.3377808 1.0356163
## [49,] 0.1992624 0.1766737
## [50,] 0.3400299 0.1949671
## [51,] 0.3630795 0.3044515
## [52,] 0.1592729 0.2924022
## [53,] 0.1866051 1.1578542
## [54,] 0.5999973 0.3426226
## [55,] 0.1546656 0.2456087
## [56,] 0.2399879 0.2472989
## [57,] 0.3798416 0.3038733
## [58,] 0.6188991 0.2259900
## [59,] 0.2927687 0.1999991
## [60,] 0.4301076 0.2050457
## [61,] 0.2498885 0.4374488
## [62,] 0.1754413 0.3038733
## [63,] 0.5178081 0.2000467
## [64,] 0.2007440 0.2166936
## [65,] 0.1641361 0.2540320
## [66,] 0.1846432 0.2740277
## [67,] 0.2290531 0.3168098
## [68,] 0.4031131 0.1666883
## [69,] 0.6445005 0.2614456
## [70,] 0.3266902 0.5267720
## [71,] 0.2152797 0.1988669
## [72,] 0.3101393 0.2127538
## [73,] 0.3414326 0.2964961
## [74,] 0.2038089 0.7745932
## [75,] 0.3331599 0.1775748
## [76,] 0.1562446 0.3630795
## [77,] 0.2412402 0.3649400
## [78,] 0.3859514 0.1355166
## [79,] 0.1722500 0.2078452
## [80,] 0.4533732 0.1839568
## [81,] 0.7343645 0.3254723
## [82,] 0.3759457 0.3426077
## [83,] 0.1526933 0.4159135
## [84,] 0.1756612 0.1532255
## [85,] 0.2667414 0.1304755
## [86,] 0.2223399 0.1487787
## [87,] 0.3123158 0.1989317
## [88,] 0.2159409 1.1578542
## [89,] 0.1563143 0.2053950
## [90,] 0.3772293 0.1314877
## [91,] 0.3021004 0.2205099
## [92,] 0.3239300 0.1754413
## [93,] 0.1854051 0.3914262
## [94,] 0.3162263 0.2701339
## [95,] 0.2115567 0.2483227
## [96,] 0.2152797 0.6089029
## [97,] 0.3342160 0.6202787
## [98,] 0.1617361 1.3369749
## [99,] 0.1943242 0.2648193
## [100,] 0.3235935 0.2442847
## as.factor(Admission_Type)urgent
## [1,] 0.2706235
## [2,] 0.2924022
## [3,] 0.2088004
## [4,] 0.1512652
## [5,] 0.4301076
## [6,] 0.5587676
## [7,] 0.3721672
## [8,] 0.5979133
## [9,] 0.1923174
## [10,] 0.6800598
## [11,] 0.3851040
## [12,] 0.1833536
## [13,] 0.5312599
## [14,] 0.2614456
## [15,] 0.3176555
## [16,] 0.2757787
## [17,] 0.1851601
## [18,] 0.2105189
## [19,] 2.3237795
## [20,] 0.3093312
## [21,] 0.3093230
## [22,] 0.3685005
## [23,] 0.1750511
## [24,] 0.2130205
## [25,] 0.1904548
## [26,] 0.4239856
## [27,] 0.1724031
## [28,] 0.2573009
## [29,] 0.2435983
## [30,] 0.2968024
## [31,] 0.1575466
## [32,] 0.2906138
## [33,] 0.1830728
## [34,] 0.1265843
## [35,] 0.1917120
## [36,] 0.2376914
## [37,] 0.4652090
## [38,] 0.3274906
## [39,] 0.2078452
## [40,] 0.3266902
## [41,] 0.3170896
## [42,] 0.2299257
## [43,] 0.2218046
## [44,] 0.2166936
## [45,] 0.2667414
## [46,] 0.2423524
## [47,] 0.2382556
## [48,] 0.4159135
## [49,] 0.2566185
## [50,] 0.2469545
## [51,] 0.2757787
## [52,] 0.2030339
## [53,] 0.2497093
## [54,] 0.1787523
## [55,] 0.3093312
## [56,] 0.1757751
## [57,] 0.4269051
## [58,] 0.3616529
## [59,] 0.2757819
## [60,] 0.4100915
## [61,] 0.1504726
## [62,] 0.2888578
## [63,] 0.1530263
## [64,] 0.7348436
## [65,] 0.1983938
## [66,] 0.1528239
## [67,] 0.2155907
## [68,] 0.3377808
## [69,] 0.1736874
## [70,] 0.1536752
## [71,] 0.2223399
## [72,] 0.2248344
## [73,] 0.1964158
## [74,] 0.2103851
## [75,] 0.2650478
## [76,] 0.2596371
## [77,] 0.3967316
## [78,] 0.2767801
## [79,] 0.1622998
## [80,] 0.3066985
## [81,] 0.7343645
## [82,] 0.2464738
## [83,] 0.3414326
## [84,] 0.2103851
## [85,] 0.5267720
## [86,] 0.9205747
## [87,] 0.2672005
## [88,] 2.3157084
## [89,] 0.2139212
## [90,] 0.1845202
## [91,] 0.1975542
## [92,] 0.3215035
## [93,] 0.3242640
## [94,] 0.1626971
## [95,] 0.2483227
## [96,] 0.1684748
## [97,] 0.3922987
## [98,] 0.1183272
## [99,] 0.3985249
## [100,] 0.2310048
The Monte Carlo standard errors (MCSE) of the corresponding Monte Carlo estimates of the posterior probabilities are computed via the command apply(mcmc_out_ts_highProb, 2, sd) / sqrt(mcmc_out$nbatch), where the second argument (MARGIN = 2) indicates that the function is applied to each column separately. The denominator, sqrt(mcmc_out$nbatch), converts the standard deviation of the batch means, which have variance approximately \(\frac{\sigma^2}{b}\), where \(b\) is the batch length (\(out\$blen\)), into the Monte Carlo standard error of the overall mean, \(\hat{\mu}\), which by the CLT has variance \(\frac{\sigma^2}{n}\), where \(n = out\$blen \times out\$nbatch\) is the total number of iterations.
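In symbols, if \(\bar{x}_1, \ldots, \bar{x}_{nbatch}\) denote the batch means, the quantities reported above are estimated as
\[\hat{\mu} = \frac{1}{nbatch}\sum_{j=1}^{nbatch}{\bar{x}_j}, \qquad \widehat{MCSE}(\hat{\mu}) = \frac{sd(\bar{x}_1, \ldots, \bar{x}_{nbatch})}{\sqrt{nbatch}} \approx \sqrt{\frac{\sigma^2}{blen \times nbatch}} = \sqrt{\frac{\sigma^2}{n}}.\]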
The square root law implies that to gain one more significant digit of precision (a \(10\times\) improvement in accuracy) we need to run \(100\) times as many iterations. Let's display some diagnostic plots of the mcmc_out_ts_highProb results.
plot(mcmc_out_ts_highProb)
acf(mcmc_out_ts_highProb)
Most of these autocorrelation plots appear reasonable, so we can generate the final overall summary of the MCMC Bayesian inference.
library(knitr)
bar <- rbind(colMeans(mcmc_out_ts_highProb), apply(mcmc_out_ts_highProb, 2, sd) / sqrt(mcmc_out$nbatch))
rownames(bar) <- c("probability of death", "MCSE")
kable(bar, digits = 2, caption = "Estimated (highest) posterior probability of dying following an admission to the hospital. These probability values may not add to one due to rounding error and the fact that we only chose the high-probability predictors.")
| | as.factor(Married_1)1 | as.factor(Insurance_Type)medicare | as.factor(Insurance_Type)private | as.factor(Admission_Type)emergency | as.factor(Admission_Type)urgent |
|---|---|---|---|---|---|
| probability of death | 0.10 | 0.11 | 0.10 | 0.11 | 0.11 |
| MCSE | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
We can also display histograms of the marginal posteriors for the parameters.
# mcmc_out <- metrop(out, blen = 1, nbatch = 1e5)
theta <- mcmc_out$batch[ , 1]
hist(theta, freq = FALSE)
# Other parameters
beta <- mcmc_out$batch[ , 2]
hist(beta, freq = FALSE)
# display smooth parameter density plots
out <- density(beta)
plot(out)
Of course, we can also try to forecast one of the clinical outcomes (“Death_1”, “AdmissionLengthDays”, or “Dx”) using the machine learning methods we saw in the previous chapters.
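The code chunk producing the output below is not echoed here. The following minimal sketch outlines one possible workflow, assuming data_train/data_test splits with outcome columns y and y_test and the predictors AdmissionLengthDays and Married_1 (the names referenced in the output), and using a simple logistic-regression baseline; the actual output below corresponds to a classifier that always predicts the majority class (no deaths).
# Minimal forecasting sketch (assumed objects: data_train, data_test, y, y_test)
library(caret)                                       # for confusionMatrix()
fit_glm <- glm(y ~ AdmissionLengthDays + Married_1, data = data_train, family = binomial)
pred_prob <- predict(fit_glm, newdata = data_test, type = "response")
pred_results_bin <- ifelse(pred_prob > 0.5, 1, 0)    # binarize the predicted death risk
table(pred_results_bin, data_test$y_test)            # raw cross-tabulation
confusionMatrix(as.factor(pred_results_bin), as.factor(data_test$y_test))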
## [1] 58863 9
## [1] 47090 3
## [1] 11773 3
##
## pred_results_bin 0 1
## 0 10580 1193
## Loading required package: lattice
## Loading required package: ggplot2
## Warning in confusionMatrix.default(pred_results_bin, data_test$y_test):
## Levels are not in the same order for reference and data. Refactoring data
## to match.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 10580 1193
## 1 0 0
##
## Accuracy : 0.8986664
## 95% CI : (0.893074, 0.9040609)
## No Information Rate : 0.8986664
## P-Value [Acc > NIR] : 0.5077103
##
## Kappa : 0
## Mcnemar's Test P-Value : < 0.00000000000000022
##
## Sensitivity : 1.0000000
## Specificity : 0.0000000
## Pos Pred Value : 0.8986664
## Neg Pred Value : NaN
## Prevalence : 0.8986664
## Detection Rate : 0.8986664
## Detection Prevalence : 1.0000000
## Balanced Accuracy : 0.5000000
##
## 'Positive' Class : 0
##
Let’s also try decision tree classification for the same hospitalization case-study.
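Again the chunk itself is not echoed; a minimal sketch using rpart (one possible decision-tree implementation, assuming the same data_train/data_test objects as above) could look like this.
# Decision-tree sketch (rpart chosen for illustration; exact settings not echoed)
library(rpart)
library(caret)
tree_fit <- rpart(as.factor(y) ~ AdmissionLengthDays + Married_1, data = data_train, method = "class")
tree_pred <- predict(tree_fit, newdata = data_test, type = "class")   # predicted class labels
confusionMatrix(tree_pred, as.factor(data_test$y_test))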
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 10521 1041
## 1 59 152
##
## Accuracy : 0.9065659
## 95% CI : (0.9011652, 0.9117646)
## No Information Rate : 0.8986664
## P-Value [Acc > NIR] : 0.002155746
##
## Kappa : 0.1919116
## Mcnemar's Test P-Value : < 0.00000000000000022204
##
## Sensitivity : 0.9944234
## Specificity : 0.1274099
## Pos Pred Value : 0.9099637
## Neg Pred Value : 0.7203791
## Prevalence : 0.8986664
## Detection Rate : 0.8936550
## Detection Prevalence : 0.9820776
## Balanced Accuracy : 0.5609167
##
## 'Positive' Class : 0
##
Lastly, we can employ random forest (randomForest) classification of the clinical outcome variable, death.
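The random forest model call is echoed in the output below; a sketch reproducing it (with the test-set evaluation step assumed to reuse data_test and confusionMatrix()) follows.
# Random forest sketch; the model call matches the one echoed in the output below
library(randomForest)
library(caret)
rf_fit <- randomForest(as.factor(y) ~ AdmissionLengthDays + Married_1,
                       data = data_train, importance = TRUE, ntree = 3000, mtry = 2)
rf_fit                                               # OOB error rate and training confusion matrix
rf_pred <- predict(rf_fit, newdata = data_test)      # predicted class labels on the test set
confusionMatrix(rf_pred, as.factor(data_test$y_test))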
## Loading required package: randomForest
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##
## margin
## [1] 47090 3
##
## Call:
## randomForest(formula = as.factor(y) ~ AdmissionLengthDays + Married_1, data = data_train, importance = TRUE, ntree = 3000, mtry = 2)
## Type of random forest: classification
## Number of trees: 3000
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 13.98%
## Confusion matrix:
## 0 1 class.error
## 0 39804 2627 0.06191228112
## 1 3955 704 0.84889461258
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 9942 1010
## 1 638 183
##
## Accuracy : 0.8600187
## 95% CI : (0.8536197, 0.8662391)
## No Information Rate : 0.8986664
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.108037
## Mcnemar's Test P-Value : <0.0000000000000002
##
## Sensitivity : 0.9396975
## Specificity : 0.1533948
## Pos Pred Value : 0.9077794
## Neg Pred Value : 0.2228989
## Prevalence : 0.8986664
## Detection Rate : 0.8444746
## Detection Prevalence : 0.9302642
## Balanced Accuracy : 0.5465462
##
## 'Positive' Class : 0
##
## Warning in randomForest.default(x, y, mtry = mtryStart, ntree = ntreeTry, :
## The response has five or fewer unique values. Are you sure you want to do
## regression?
## mtry = 1 OOB error = 0.07893820256
## Searching left ...
## Searching right ...
## mtry OOBError
## 1 1 0.07893820256
##
## Attaching package: 'text2vec'
## The following object is masked from 'package:igraph':
##
## normalize
## Loading required package: tm
## Loading required package: NLP
##
## Attaching package: 'NLP'
## The following object is masked from 'package:ggplot2':
##
## annotate
##
## Number of docs: 47090
## 0 stopwords: ...
## ngram_min = 1; ngram_max = 1
## Vocabulary:
## term term_count doc_count
## 1: circulating 1 1
## 2: aktered 1 1
## 3: subderal 1 1
## 4: spoon 1 1
## 5: instemi 1 1
## ---
## 4309: disease 3145 3116
## 4310: artery 4331 3054
## 4311: coronary 4462 3199
## 4312: sda 5351 5350
## 4313: newborn 6248 6248
##
## [1] 47090 4313
## [1] 11773 4313
## [1] TRUE
## Loading required package: Matrix
## Loading required package: foreach
## Loaded glmnet 2.0-10
## [1] 0.001529383544
## Time difference of 10.07343006 secs
## [1] "max AUC = 0.7512"
## [1] 7.894190891
## yTest
## binPredLASSO 0 1
## 0 10624 1130
## 1 5 14
## [1] 0.7447707881
## [,1]
## 1 0.2752648448
##
## 0 1
## 0 10158 926
## 1 471 218
## Loading required package: SparseM
##
## Attaching package: 'SparseM'
## The following object is masked from 'package:base':
##
## backsolve
##
## [1] 47090 4313
##
## [1] 11773 4313
Data-driven Bayesian inference treats all unknowns (parameters, as well as data prior to being observed) as random variables. The inference is probabilistic rather than deterministic (the way frequentist inference is), which explicitly quantifies the intrinsic process uncertainty.
Bayesian modeling propagates uncertainty from the initial prior to the derived posterior distribution using Bayes' theorem (inversion of conditioning). When the posterior is not available in simple standard form, e.g., for complex or non-conjugate priors, MCMC sample-based simulation provides a computationally expensive, but statistically valid, protocol for modeling and inference. MCMC provides a general recipe for obtaining representative samples from virtually any posterior density.
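To make the last point concrete, here is a minimal sketch of a random-walk Metropolis sampler for a generic (unnormalized) log-posterior; this is an illustration only, not the mcmc::metrop implementation used above.
# Random-walk Metropolis sketch for an arbitrary log-posterior logpost(theta)
metropolis_sample <- function(logpost, theta0, n_iter = 10000, scale = 0.5) {
  d <- length(theta0)
  draws <- matrix(NA_real_, nrow = n_iter, ncol = d)
  theta <- theta0
  lp <- logpost(theta)
  for (i in seq_len(n_iter)) {
    prop <- theta + rnorm(d, sd = scale)        # symmetric random-walk proposal
    lp_prop <- logpost(prop)
    if (log(runif(1)) < lp_prop - lp) {         # Metropolis acceptance step
      theta <- prop
      lp <- lp_prop
    }
    draws[i, ] <- theta                         # no burn-in or thinning, for simplicity
  }
  draws
}
# Example: sampling a standard normal "posterior"; the ergodic averages
# should approximate the true mean (0) and standard deviation (1)
samples <- metropolis_sample(function(th) -0.5 * sum(th^2), theta0 = 0)
mean(samples[, 1]); sd(samples[, 1])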