1 Power Analysis in Experimental Design

1.1 Background

Power analysis represents a statistical approach to explicate the relations between a number of parameters that affect most experimental designs. It is well known that data are proxies of the natural phenomena, or processes, about which we try to make inference, and the size of a sample is associated with our ability to derive useful information about the underlying process or make predictions about its past, present or future states. In some situations, given the sample-size and a certain degree of confidence, we can compute the power to statistically detect an effect of interest. Similarly, we can determine the likelihood of detecting an effect of a certain size, subject to a predefined level of confidence and specific sample size constraints. This power, or probability, to detect the effect of interest may be low, medium, or high, which would help us determine the potential value of the experiment.

In most experimental designs, power analyses establish a relation between 5 quantities:

  • Statistical test, an explicit reference to the statistical inference that will be conducted on the data collected by the experiment
  • Sample size, there are pros and cons to running large, or small, experiments
  • Effect size, how strong is the expected effect that we are trying to uncover by the experiment
  • Significance level, false-positive rate \(\alpha=P(Type I error) =\) probability of finding an effect that is not there
  • Power \(=1-\beta=1 - P(Type II error) =sensitivity=\) probability of finding an effect that is there, where \(\beta=P(Type II error)\) is the false-negative rate

In mathematical terms, specifying any 3 of the last 4 parameters allows us to estimate the remaining one. Note that there is no general closed-form expression (e.g., implicit or explicit function) encoding the relation between all 5 terms.

1.2 R-based Power Analysis

The R package pwr provides the core functionality to conduct power analysis for some situations. It includes the following methods:

function         Corresponding Statistical Inference
---------------  -------------------------------------------------------------------
cohen.ES         Conventional effect sizes
ES.h             Effect size calculation for proportions
ES.w1            Effect size calculation in the chi-squared test for goodness of fit
ES.w2            Effect size calculation in the chi-squared test for association
pwr.2p.test      Two proportions test (equal sample sizes, n)
pwr.2p2n.test    Two proportions test (unequal n)
pwr.anova.test   Balanced one-way ANOVA
pwr.chisq.test   Chi-squared test
pwr.f2.test      General linear model (GLM)
pwr.norm.test    Mean of a normal distribution (known variance)
pwr.p.test       Single-sample proportion
pwr.r.test       Correlation
pwr.t.test       T-tests (one-sample, two-sample, paired)
pwr.t2n.test     Two-sample t-test with unequal n

As each method explicitly specifies the statistical inference procedure, we only need to specify 3 of the remaining 4 quantities (effect size, sample size, significance level, and power) to calculate the last one. A common practice is to use the default significance level \(\alpha=0.05\), which leaves only 2 of the 3 remaining parameters to specify. For instance, given an effect size (from prior research or an oracle) and a desired power, we can calculate an appropriate sample size for the experimental design.
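As a minimal sketch of this sample-size calculation (assuming the pwr package is installed, with an illustrative medium effect size \(d=0.5\) and the default \(\alpha=0.05\)), leaving n unspecified makes pwr.t.test() solve for it:

```r
library(pwr)

# Solve for the per-group sample size of a two-sample t-test:
# the omitted parameter (here n) is the one pwr computes
pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.8, type = "two.sample")
```

The returned object reports n per group, which should be rounded up to the next integer when planning the experiment.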

Determining an effective and appropriate effect size is often a challenge that can be tackled by running simulations, collecting pilot data, or using Cohen’s social-studies protocol, which outlines how to categorize the effect size as small, medium, or large.

1.3 Cohen’s Protocol for categorizing the effect size

Let’s look at some examples.

  • pwr.t.test(n = n, d = d, sig.level = a, power = b, type = c("two.sample", "one.sample", "paired")): In this method definition, \(n\) is the sample size, \(d\) is Cohen’s effect size, \(a\) is the significance level, \(b\) is the desired power, and type indicates the specific parametric t-test we choose.

  • pwr.t2n.test(n1 = n1, n2 = n2, d = d, sig.level = a, power = b): This is a more general call for independent t-tests with unequal sample sizes \(n1\) and \(n2\). Cohen’s d categorizes the effect size into three values, \(0.2\), \(0.5\), and \(0.8\), representing small, medium, and large effect sizes, respectively.

  • pwr.anova.test(k = k, n = n, f = f, sig.level = a, power = b): A one-way analysis of variance (ANOVA) test with \(k\) groups, a common sample size \(n\) within each group, and effect size \(f\). Cohen’s f values of 0.1, 0.25, and 0.4 represent small, medium, and large effect sizes, respectively.

  • pwr.r.test(n = n, r = r, sig.level = a, power = b): Correlation coefficient analysis, where \(n\) is the sample size and \(r\) is the correlation, which uses the population correlation coefficient as a measure of the effect size. Cohen’s r values of 0.1, 0.3, and 0.5 represent small, medium, and large effect sizes, respectively.

  • pwr.f2.test(u = u, v = v, f2 = f2, sig.level = a, power = b): Multivariate linear models, including multiple linear regression, with \(u\), \(v\), and \(f2\) representing the numerator degrees of freedom, the denominator degrees of freedom, and the effect-size measure, respectively. Cohen’s f2 values of 0.02, 0.15, and 0.35 approximately represent small, medium, and large effect sizes, respectively.

  • pwr.chisq.test(w = w, N = N, df = df, sig.level = a, power = b): Chi-square Test with \(w\) the effect size, \(N\) the total sample size, and \(df\) the degrees of freedom. Cohen’s w values of 0.1, 0.3, and 0.5 represent small, medium, and large effect sizes, respectively.
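The conventional small/medium/large values listed above can also be retrieved programmatically with cohen.ES() from the pwr package, which avoids hard-coding them. A brief sketch:

```r
library(pwr)

# Look up Cohen's conventional effect-size values for a given test
cohen.ES(test = "anova", size = "medium")  # f = 0.25
cohen.ES(test = "r", size = "large")       # r = 0.5
```

This is convenient when scripting power analyses over several tests, since the conventions differ by test (d, f, r, w, f2).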

1.4 R power calculation examples

1.4.1 One-way ANOVA

Let’s run a power analysis for a one-way ANOVA comparing 5 groups. Specifically, we are interested in estimating the sample size needed in each group to secure a power \(\geq 0.80\), given a moderate effect size (\(f=0.25\)) and a significance level of \(\alpha=0.025\).

# install.packages("pwr")
library(pwr)

pwr.anova.test(k=5, f=0.25, sig.level=0.025, power=0.8)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 5
##               n = 46.12892
##               f = 0.25
##       sig.level = 0.025
##           power = 0.8
## 
## NOTE: n is number in each group

This suggests that at least \(47\) participants per group will be required (\(n=46.12892\), rounded up).

Would that sample size estimate increase or decrease when we increase or decrease the effect-size? Inspect the following two examples.

# install.packages("pwr")
# library(pwr)

pwr.anova.test(k=5, f=0.1, sig.level=0.025, power=0.8) # small effect-size
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 5
##               n = 282.3918
##               f = 0.1
##       sig.level = 0.025
##           power = 0.8
## 
## NOTE: n is number in each group
pwr.anova.test(k=5, f=0.4, sig.level=0.025, power=0.8) # large effect-size
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 5
##               n = 18.71997
##               f = 0.4
##       sig.level = 0.025
##           power = 0.8
## 
## NOTE: n is number in each group

For a one-way ANOVA test, Cohen’s effect size \(f\) is categorized as 0.1 (small), 0.25 (medium), and 0.4 (large), and is computed by:

\[f=\sqrt{\frac{\sum_{i=1}^k{p_i\times(\mu_i-\mu)^2}}{\sigma^2}},\] where \(n\) is the total number of observations in all groups, \(n_i\) is the number of observations in group \(i\), \(p_i=\frac{n_i}{n}\), \(\mu_i\) and \(\mu\) are the group \(i\) and overall means, and \(\sigma^2\) is the within-group variance. Similar analytical expressions exist for other statistical tests and there are corresponding sample-driven estimates of these effects that can be used for the practical calculations.
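As a sketch of this calculation, assuming hypothetical group means, equal group proportions, and a common within-group variance (all values below are illustrative assumptions, not data from the text):

```r
# Hypothetical summaries for k = 5 equally sized groups (assumed values)
mu_i   <- c(10, 11, 12, 13, 14)  # group means
p_i    <- rep(1/5, 5)            # group proportions p_i = n_i/n (equal groups)
sigma2 <- 25                     # common within-group variance

mu <- sum(p_i * mu_i)            # overall (weighted) mean
f  <- sqrt(sum(p_i * (mu_i - mu)^2) / sigma2)
f                                # about 0.28, a medium effect by Cohen's convention
```

In practice, these population quantities are replaced by sample estimates (e.g., from pilot data), yielding an estimated \(\hat{f}\) for the power calculation.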

1.4.2 Two-sample T-test

Let’s run a power analysis for a two-sample, one-sided t-test using a significance level of \(\alpha=0.001\), \(n=30\) participants per group, and a large effect size of \(d=0.8\).

# install.packages("pwr")
# library(pwr)

pwr.t.test(n=30, d=0.8, sig.level=0.001, alternative="greater") # large effect-size
## 
##      Two-sample t test power calculation 
## 
##               n = 30
##               d = 0.8
##       sig.level = 0.001
##           power = 0.4526868
##     alternative = greater
## 
## NOTE: n is number in *each* group

This yields a power of only \(0.4526868\) to detect the effect.
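Conversely, leaving n unspecified solves for the per-group sample size needed to reach a target power of 0.8 under the same design (a sketch, same settings as the example above):

```r
library(pwr)

# Same one-sided two-sample design, but now solve for n at power = 0.8
pwr.t.test(d = 0.8, sig.level = 0.001, power = 0.8, alternative = "greater")
```

The reported n (per group) should again be rounded up to the next integer.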

1.5 Power and Sample Size Graphs

The pwr package also provides some functions to generate power and sample size plots.

1.5.1 Correlation Test

For instance, we can plot sample-size vs. effect-size curves showing the sample size required to detect correlations in the range \(0.1\leq \rho\leq 0.8\), for a range of power values between \(0.3\) and \(0.85\).

# install.packages("pwr")
# library(pwr)

r <- seq(0.1, 0.8, 0.01) # define a range of correlations and sampling rate within this range
nr <- length(r)

p <- seq(0.3, 0.85, 0.1) # define a range for the power values, and their sampling rate
np <- length(p)

# Compute the corresponding sample sizes for all combinations of correlations and power values
sampleSize <- array(numeric(nr*np), dim=c(nr, np))
for (i in 1:np) {
  for (j in 1:nr) {
    # solve for sample size (n)
    testResult <- pwr.r.test(n = NULL, r = r[j], sig.level = 0.05, power = p[i], alternative = "two.sided")
    sampleSize[j, i] <- ceiling(testResult$n) # round sample sizes up to nearest integer
    # print(sprintf("sampleSize[%d,%d]=%s", j,i, round(sampleSize[j, i], 2)))
  }
}

# Graph the power plot
xRange <- range(r)
yRange <- round(range(sampleSize))
colors <- rainbow(length(p))
plot(xRange, yRange, type="n", xlab="Correlation Coefficient (r)", ylab="Sample Size (n)")
# Add power curves
for (i in 1:np) lines(r, sampleSize[ , i], type="l", lwd=2, col=colors[i])
# add annotations (grid lines, title, legend)
abline(v=0, h=seq(0, yRange[2], 100), lty=2, col="light gray")
abline(h=0, v=seq(xRange[1], xRange[2], 0.1), lty=2, col="light gray")
title("Effect-size (X) vs. Sample-size (Y) for \n different Power values in 
      (0.3, 0.85), Significance=0.05 (Two-tailed Correlation Test)")
legend("topright", title="Power", as.character(p), fill=colors)
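Alternatively, recent versions of the pwr package provide a generic plot() method for power objects (it relies on ggplot2 if available), which draws a power-vs.-sample-size curve for a single design without the manual looping above. A brief sketch:

```r
library(pwr)

# Power as a function of sample size for a two-sample t-test design
ptest <- pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.8)
plot(ptest)  # power curve with the target (n, power) design point marked
```

This is a quick diagnostic for one design; the manual approach above remains useful for overlaying many effect-size/power combinations in a single figure.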