
*Power analysis* is a statistical approach for explicating the relations among several parameters that affect most experimental designs. Data are proxies of the natural phenomena, or processes, about which we try to make inference, and the *size of a sample* is associated with our ability to derive useful information about the underlying process or to make predictions about its past, present, or future states. In some situations, given the sample size and a predefined *significance level*, we can compute the **power** to statistically detect an effect of a certain size, i.e., the probability that the experiment detects the effect if it truly exists. This power may be *low*, *medium*, or *high*, which helps us determine the *potential* value of the experiment before it is run.

In most experimental designs, *power analyses* establish a relation between 5 quantities:

- *Statistical test*: an explicit reference to the statistical inference that will be conducted on the data collected by the experiment.
- *Sample size*: there are pros and cons to running large, or small, experiments; larger samples can detect smaller effects but are more costly.
- *Effect size*: how strong the effect that we are trying to uncover is expected to be.
- *Significance level*: the false-positive rate, \(\alpha=P(\text{Type I error})\), i.e., the probability of finding an effect that is not there.
- *Power*: \(1-P(\text{Type II error})\), also called sensitivity, i.e., the probability of finding an effect that is there.

In mathematical terms, knowing any 3 of the last 4 parameters *may* allow us to estimate the remaining one. Note, however, that there is no general closed-form expression (explicit or implicit function) relating all 5 terms; the relation depends on the specific statistical test and is typically solved numerically.
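To make this concrete for one specific test, consider the textbook large-sample (normal) approximation for a two-sided, two-sample comparison of means, with \(n\) subjects per group, a common standard deviation \(\sigma\), and standardized effect \(d=(\mu_1-\mu_2)/\sigma\). This approximation is added here only for illustration; it neglects the t-distribution correction that exact calculators apply, as well as the tiny contribution of the opposite tail:

\[\text{power}\approx\Phi\left(d\sqrt{\frac{n}{2}}-z_{1-\alpha/2}\right)\quad\Longleftrightarrow\quad n\approx\frac{2\left(z_{1-\alpha/2}+z_{\text{power}}\right)^2}{d^2},\]

where \(\Phi\) is the standard normal CDF and \(z_q=\Phi^{-1}(q)\). For example, with \(\alpha=0.05\) (\(z_{0.975}\approx1.960\)), power \(0.80\) (\(z_{0.80}\approx0.842\)), and \(d=0.5\), we get \(n\approx 2(2.802)^2/0.25\approx 62.8\), i.e., about 63 subjects per group. Each quantity can be isolated in a similar way for this particular test, but the relation changes from test to test.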

The R package `pwr` provides the core functionality to conduct power analysis in several common situations. It includes the following functions:

| Function | Corresponding statistical inference |
|---|---|
| `cohen.ES` | Conventional effect sizes |
| `ES.h` | Effect size calculation for proportions |
| `ES.w1` | Effect size calculation in the chi-squared test for goodness of fit |
| `ES.w2` | Effect size calculation in the chi-squared test for association |
| `pwr.2p.test` | Two proportions (equal sample sizes, \(n\)) |
| `pwr.2p2n.test` | Two proportions (unequal sample sizes) |
| `pwr.anova.test` | Balanced one-way ANOVA |
| `pwr.chisq.test` | Chi-square test |
| `pwr.f2.test` | General linear model (GLM) |
| `pwr.norm.test` | Mean of a normal distribution (known variance) |
| `pwr.p.test` | Single-sample proportion |
| `pwr.r.test` | Correlation |
| `pwr.t.test` | t-tests (one-sample, two-sample, paired) |
| `pwr.t2n.test` | Two-sample t-test of means with unequal sample sizes |
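As a quick illustration of how the effect-size helpers pair with the power functions, here is a minimal sketch for comparing two proportions. The success rates 0.65 and 0.45 are hypothetical planning values, not data; `ES.h()` converts them into Cohen's \(h\) (a difference of arcsine-transformed proportions), and `pwr.2p.test()` solves for the per-group sample size \(n\), which we leave as `NULL`.

```
# install.packages("pwr")
library(pwr)

p1 <- 0.65  # hypothetical success rate in group 1 (assumption)
p2 <- 0.45  # hypothetical success rate in group 2 (assumption)

# Cohen's h: difference of arcsine-square-root transformed proportions
h <- ES.h(p1, p2)

# Solve for the common per-group sample size (n = NULL marks the unknown)
res <- pwr.2p.test(h = h, n = NULL, sig.level = 0.05, power = 0.80)
n_per_group <- ceiling(res$n)  # round up to whole subjects

print(round(h, 3))
print(n_per_group)
```

With these assumed rates \(h\approx0.40\), so roughly 48 subjects per group would be needed; rates that are closer together quickly inflate \(n\).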

Because each function fixes the statistical inference procedure, we only need to specify 3 of the remaining 4 quantities (effect size, sample size, significance level, and power) to calculate the last one. A common practice is to use the default significance level, \(\alpha=0.05\), which leaves us to specify 2 of the 3 remaining parameters. For instance, given an effect size (e.g., from prior research or expert knowledge) and a desired power, we can calculate an appropriate sample size for the experimental design.
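A minimal sketch of this "solve for the missing parameter" pattern with `pwr.t.test()`: whichever of `n`, `d`, `power`, and `sig.level` is `NULL` is the one solved for. Because `sig.level` defaults to 0.05, it has to be passed explicitly as `NULL` when it is the unknown. The particular numbers (\(d=0.5\), \(n=30\) or \(64\)) are illustrative choices.

```
library(pwr)

# 1) Effect size and power given: solve for n (per group)
n_needed <- pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80,
                       type = "two.sample")$n

# 2) n and effect size given: solve for power
power_at_64 <- pwr.t.test(n = 64, d = 0.5, sig.level = 0.05,
                          type = "two.sample")$power

# 3) n and power given: solve for the smallest detectable effect size d
d_detectable <- pwr.t.test(n = 30, sig.level = 0.05, power = 0.80,
                           type = "two.sample")$d

# 4) Solving for the significance level requires sig.level = NULL explicitly
alpha_needed <- pwr.t.test(n = 64, d = 0.5, sig.level = NULL, power = 0.80,
                           type = "two.sample")$sig.level

round(c(n = n_needed, power = power_at_64, d = d_detectable,
        alpha = alpha_needed), 4)
```

The first call should report \(n\approx63.77\), i.e., 64 subjects per group after rounding up; consistently, the second call shows that \(n=64\) per group yields a power slightly above 0.80.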

Determining an appropriate effect size is often a challenge. It can be tackled by running simulations, by collecting (pilot) data, or by using Cohen's conventions, proposed for the behavioral and social sciences, which categorize effect sizes as small, medium, or large.
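Cohen's conventional values for each supported test can be retrieved programmatically with `cohen.ES()`, which returns the effect size for a given `test` and `size`. A small sketch that tabulates them:

```
library(pwr)

tests <- c("p", "t", "r", "anova", "chisq", "f2")
sizes <- c("small", "medium", "large")

# Rows = tests, columns = conventional sizes; each cell is cohen.ES()$effect.size
conventions <- t(sapply(tests, function(tst)
  sapply(sizes, function(sz) cohen.ES(test = tst, size = sz)$effect.size)))
conventions
```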

Let’s look at some examples.

`pwr.t.test(n = n, d = d, sig.level = a, power = b, type = c("two.sample", "one.sample", "paired"))`

: In this method definition, \(n\) is the sample size (per group for two-sample tests), \(d\) is Cohen's effect size, \(a\) is the significance level, \(b\) is the desired power, and `type` indicates the specific parametric t-test we choose.

`pwr.t2n.test(n1 = n1, n2 = n2, d = d, sig.level = a, power = b)`

: A more general call for independent two-sample t-tests with unequal sample sizes \(n1\) and \(n2\). **Cohen's d** values of \(0.2\), \(0.5\), and \(0.8\) represent *small*, *medium*, and *large* effect sizes, respectively.

`pwr.anova.test(k = k, n = n, f = f, sig.level = a, power = b)`

: A balanced one-way analysis of variance (ANOVA) with \(k\) groups, common sample size \(n\) within each group, and effect size \(f\). **Cohen's f** values of 0.1, 0.25, and 0.4 represent small, medium, and large effect sizes, respectively.

`pwr.r.test(n = n, r = r, sig.level = a, power = b)`

: Correlation analysis, where \(n\) is the sample size and the population correlation coefficient \(r\) serves as the effect size. **Cohen's r** values of 0.1, 0.3, and 0.5 represent small, medium, and large effect sizes, respectively.

`pwr.f2.test(u = u, v = v, f2 = f2, sig.level = a, power = b)`

: General linear models, including multiple linear regression, where \(u\) and \(v\) are the numerator and denominator degrees of freedom of the F-test and \(f^2\) is the effect size. **Cohen's \(f^2\)** values of 0.02, 0.15, and 0.35 approximately represent small, medium, and large effect sizes, respectively.

`pwr.chisq.test(w = w, N = N, df = df, sig.level = a, power = b)`

: Chi-square test with effect size \(w\), total sample size \(N\), and \(df\) degrees of freedom. **Cohen's w** values of 0.1, 0.3, and 0.5 represent small, medium, and large effect sizes, respectively.
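Two further sketches use the same solve-for-the-`NULL` pattern. For regression, `pwr.f2.test()` solves for the denominator degrees of freedom \(v\); for the overall F-test of \(u\) predictors in a model with an intercept, the total sample size follows as \(n=v+u+1\). For a chi-square goodness-of-fit test, `ES.w1()` turns null and alternative cell proportions into Cohen's \(w\). The number of predictors and both sets of proportions are hypothetical values chosen for illustration.

```
library(pwr)

# (a) Multiple regression: u = 3 predictors (assumption), medium effect f2 = 0.15.
#     pwr.f2.test solves for v; total sample size is n = v + u + 1.
u <- 3
reg <- pwr.f2.test(u = u, f2 = 0.15, sig.level = 0.05, power = 0.80)
n_regression <- ceiling(reg$v) + u + 1

# (b) Chi-square goodness of fit over 4 categories.
#     P0 (null) and P1 (alternative) are hypothetical proportions.
P0 <- rep(0.25, 4)
P1 <- c(0.30, 0.30, 0.20, 0.20)
w  <- ES.w1(P0, P1)   # sqrt(sum((P1 - P0)^2 / P0)), which equals 0.2 here
gof <- pwr.chisq.test(w = w, df = length(P0) - 1,
                      sig.level = 0.05, power = 0.80)
N_gof <- ceiling(gof$N)  # total sample size, rounded up

c(n_regression = n_regression, w = w, N_gof = N_gof)
```

With these inputs \(w=0.2\) exactly, and a medium regression effect with three predictors needs a total sample of about 77 subjects.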

Let’s run a power analysis for a one-way ANOVA comparing 5 groups. Specifically, we want to estimate the sample size needed in each group to secure a power of at least \(0.80\), given a medium effect size (\(f=0.25\)) and a significance level of 0.025.

```
# install.packages("pwr")
library(pwr)
pwr.anova.test(k=5, f=0.25, sig.level=0.025, power=0.8)
```

```
##
## Balanced one-way analysis of variance power calculation
##
## k = 5
## n = 46.12892
## f = 0.25
## sig.level = 0.025
## power = 0.8
##
## NOTE: n is number in each group
```

This suggests that at least \(47\) participants will be required in each group (the computed \(n=46.12892\) is rounded up), i.e., \(5\times 47=235\) participants in total.

How does this sample-size estimate change when the effect size decreases or increases? Inspect the following two examples.

```
# install.packages("pwr")
# library(pwr)
pwr.anova.test(k=5, f=0.1, sig.level=0.025, power=0.8) # small effect-size
```

```
##
## Balanced one-way analysis of variance power calculation
##
## k = 5
## n = 282.3918
## f = 0.1
## sig.level = 0.025
## power = 0.8
##
## NOTE: n is number in each group
```

```
pwr.anova.test(k=5, f=0.4, sig.level=0.025, power=0.8) # large effect-size
```

```
##
## Balanced one-way analysis of variance power calculation
##
## k = 5
## n = 18.71997
## f = 0.4
## sig.level = 0.025
## power = 0.8
##
## NOTE: n is number in each group
```
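Together with the first example, these outputs show the required per-group size growing from about 19 (large, \(f=0.4\)) to 47 (medium, \(f=0.25\)) to 283 (small, \(f=0.1\)) after rounding up: the smaller the effect, the larger the sample. The same comparison can be produced in one call by sweeping over the effect sizes; this sketch reuses the settings above (\(k=5\), \(\alpha=0.025\), power 0.80).

```
library(pwr)

f_values <- c(small = 0.10, medium = 0.25, large = 0.40)

# Per-group sample size (rounded up) for each effect size
n_per_group <- sapply(f_values, function(f)
  ceiling(pwr.anova.test(k = 5, f = f, sig.level = 0.025, power = 0.80)$n))

n_per_group      # per-group sample sizes
5 * n_per_group  # total sample sizes across the k = 5 groups
```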

For a one-way ANOVA test, Cohen’s effect size \(f\) is conventionally categorized as 0.1 (small), 0.25 (medium), and 0.4 (large), and is computed as:

\[f=\sqrt{\frac{\sum_{i=1}^k{p_i\times(\mu_i-\mu)^2}}{\sigma^2}},\] where \(n\) is the total number of observations in all groups, \(n_i\) is the number of observations in group \(i\), \(p_i=\frac{n_i}{n}\), \(\mu_i\) and \(\mu\) are the group \(i\) and overall means, and \(\sigma^2\) is the within-group variance. Similar analytical expressions exist for other statistical tests and there are corresponding sample-driven estimates of these effects that can be used for the practical calculations.
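A minimal sketch of this formula in R, followed by the sample-size calculation it feeds. The group means, group sizes, and common within-group standard deviation are hypothetical planning values (assumptions), and the groups are balanced because `pwr.anova.test()` assumes a balanced design:

```
library(pwr)

# Hypothetical planning values (assumptions for illustration only)
mu_i  <- c(10, 12, 14)   # anticipated group means
n_i   <- c(20, 20, 20)   # group sizes; only their proportions enter f
sigma <- 5               # anticipated common within-group SD

p_i <- n_i / sum(n_i)                              # p_i = n_i / n
mu  <- sum(p_i * mu_i)                             # overall (weighted) mean
f   <- sqrt(sum(p_i * (mu_i - mu)^2) / sigma^2)    # Cohen's f

# Per-group sample size needed to detect this f with 80% power
n_req <- pwr.anova.test(k = length(mu_i), f = f,
                        sig.level = 0.05, power = 0.80)$n
c(f = round(f, 4), n_per_group = ceiling(n_req))
```

Here \(f=\sqrt{8/75}\approx0.327\), between Cohen's medium and large benchmarks.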

Let’s run a power analysis for a two-sample, one-sided t-test using a significance level of \(\alpha=0.001\), \(n=30\) participants per group, and a large effect size of \(d=0.8\).

```
# install.packages("pwr")
# library(pwr)
pwr.t.test(n=30, d=0.8, sig.level=0.001, alternative="greater") # large effect-size
```

```
##
## Two-sample t test power calculation
##
## n = 30
## d = 0.8
## sig.level = 0.001
## power = 0.4526868
## alternative = greater
##
## NOTE: n is number in *each* group
```

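This yields a power of \(0.4526868\): even a large true effect would be detected in fewer than half of such experiments, mainly because of the very stringent significance level.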
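Since simulation is one way to reason about power, here is a Monte Carlo sanity check of this number (a sketch, not part of `pwr`): we repeatedly draw two groups of 30 normal observations (SD 1) whose means differ by \(d=0.8\), run a one-sided pooled-variance t-test with `t.test()`, and record how often \(p<0.001\). The seed and the number of replicates are arbitrary choices.

```
library(pwr)
set.seed(1234)  # arbitrary seed for reproducibility

n <- 30; d <- 0.8; alpha <- 0.001; n_sims <- 5000

# Each replicate: group x has mean d, group y has mean 0 (both SD 1);
# TRUE if the one-sided pooled t-test (H1: mean(x) > mean(y)) rejects
rejections <- replicate(n_sims, {
  x <- rnorm(n, mean = d)
  y <- rnorm(n, mean = 0)
  t.test(x, y, alternative = "greater", var.equal = TRUE)$p.value < alpha
})
sim_power <- mean(rejections)  # empirical rejection rate

analytic_power <- pwr.t.test(n = n, d = d, sig.level = alpha,
                             alternative = "greater")$power
c(simulated = sim_power, analytic = analytic_power)
```

With 5000 replicates the Monte Carlo standard error is below 0.01, so the simulated rejection rate should agree with the analytic power of about 0.45 to within a couple of percentage points.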

The `pwr` package also provides some functions to generate power and sample-size plots.

For instance, we can plot effect-size vs. sample-size curves for detecting different levels of correlation, \(0.1\leq \rho\leq 0.8\), for a number of power values in \([0.3, 0.85]\), sampled in steps of \(0.1\).

```
# install.packages("pwr")
library(pwr)
r <- seq(0.1, 0.8, 0.01)   # range of correlations and sampling step within this range
nr <- length(r)
p <- seq(0.3, 0.85, 0.1)   # range of power values and their sampling step (0.3, ..., 0.8)
np <- length(p)
# Compute the corresponding sample sizes for all combinations of correlations and power values
sampleSize <- array(numeric(nr*np), dim=c(nr, np))
for (i in 1:np) {
  for (j in 1:nr) {
    # solve for sample size (n)
    testResult <- pwr.r.test(n = NULL, r = r[j], sig.level = 0.05, power = p[i], alternative = "two.sided")
    sampleSize[j, i] <- ceiling(testResult$n)  # round sample sizes up to nearest integer
    # print(sprintf("sampleSize[%d,%d]=%s", j, i, round(sampleSize[j, i], 2)))
  }
}
# Graph the power plot
xRange <- range(r)
yRange <- round(range(sampleSize))
colors <- rainbow(length(p))
plot(xRange, yRange, type="n", xlab="Correlation Coefficient (r)", ylab="Sample Size (n)")
# Add power curves
for (i in 1:np) lines(r, sampleSize[ , i], type="l", lwd=2, col=colors[i])
# add annotations (grid lines, title, legend)
abline(v=0, h=seq(0, yRange[2], 100), lty=2, col="lightgray")
abline(h=0, v=seq(xRange[1], xRange[2], 0.1), lty=2, col="lightgray")
title("Effect-size (X) vs. Sample-size (Y) for \n different Power values in (0.3, 0.85), Significance=0.05 (Two-tailed Correlation Test)")
legend("topright", title="Power", as.character(p), fill=colors)
```