Where we are

  • Phase 3 trials are complicated

  • Plan for semester is to discuss
    • Design elements (how many patients to enroll; what is outcome to measure)
    • Conduct elements (how to allocate patients to treatment arms; when to stop trial)
    • Analysis elements (what findings will be reported)

A glimpse into your future

Why do sample size calculations?

  1. Too few subjects will…

    • …lead to statistical insignificance. Failing to reject \(H_0\) is not the same as accepting \(H_0\)

    • …enroll patients to experimental therapy with no scientific purpose

  2. Too many subjects will…

    • …identify clinically insignificant effects

    • …result in design not being funded

    • …take patients away from other potentially effective treatments

Limitations of sample size calculations

  • Formal calculations are necessary but are ultimately approximations. There are/will be
    • incorrect assumptions
    • logistical constraints
    • changes to design mid-trial
    • subject withdrawal

Summary of approach for calculating sample size \(n\)

  1. State null hypothesis (\(H_0\)) and alternative hypothesis (\(H_1\)) precisely

  2. Determine primary outcome to measure in each subject that will distinguish between \(H_0\) and \(H_1\)

  3. Calculate test statistic and its distribution under \(H_0\) and \(H_1\)

Hypothesis testing setup

Decision                \(H_0\) True     \(H_1\) True
---------------------   --------------   --------------
Reject \(H_0\)          type I error     true positive
Do Not Reject \(H_0\)   true negative    type II error
  • \(\alpha = \Pr(\text{type I error}) = \text{"type I error rate"}\)
  • \(\beta = \Pr(\text{type II error}) = \text{"type II error rate"}\)
  • \(1-\beta = 1-\Pr(\text{type II error}) = \text{power}\)

How to choose \(\alpha\), \(\beta\) in design?

  1. Conventional regulatory perspective: \(\alpha \le 0.05\) and \(\beta \in [0.10, 0.20]\)

  2. Rule-of-thumb perspective:

    • Set \(\alpha = \beta\) when the choice between control and experimental treatments is approximately symmetric
    • Set \(\alpha > \beta\) if there is no safe, effective standard of care, or the experimental treatment is inexpensive, simple, and exciting
    • Set \(\alpha < \beta\) if the reverse holds.
  3. Annoying statistical perspective that doesn't directly answer question:

    • \(\alpha\) and \(\beta\) are theoretical constructs only interpretable in context of long-run frequencies of perfect replications of identical trials
    • Furthermore, not easily comparable: \(\alpha\) is single number (false positive rate under null hypothesis), and \(\beta\) is function (false negative rate under range of alternative hypotheses)

Other ingredients in sample size recipes

  • \(n\): sample size. Larger \(n\) provides more information to distinguish between \(H_0\) and \(H_1\)

  • \(\delta\): distance between \(H_0\) and \(H_1\). Larger \(\delta\Rightarrow\) smaller \(n\)

  • \(\sigma^2\): variance or dispersion parameter representing amount of noise in data. Larger \(\sigma^2\Rightarrow\) larger \(n\)

  • Derive equation that contains \(\alpha\), \(\beta\), \(n\), \(\delta\), and \(\sigma^2\)

  • Can fix any four elements and solve for the fifth

Do not consider \(\delta\) or \(\sigma^2\) as 'tuning' parameters

  • Mortality rate under standard of care is 50%. New therapy may reduce mortality by 10%

  • \(\delta = 0.10\), \(\sigma^2\) determined by binomial distribution

  • \(\alpha=0.05\); \(\beta=0.20\)

  • Find required \(n=194\)

  • Investigator can only recruit and fund a 100-patient study

  • 100 patients would give 50% power when \(\delta=0.10\), but you observe that, if \(\delta=0.14\), you would have about 80% power

  • Now you've powered study to detect 14% absolute reduction in mortality. Realistic?
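
The 50% and 80% power figures above can be reproduced under one plausible set of conventions — a sketch, not necessarily the original calculation: a two-sided one-sample \(z\)-test at \(\alpha=0.05\), with the Bernoulli variance evaluated at the midpoint of the null and alternative rates. The function name `one_sample_binary_power` is ours.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_binary_power(p0, p1, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test of H0: p = p0
    against the alternative p = p1 with n subjects.  The variance is
    evaluated at the midpoint (p0 + p1)/2 -- one common convention
    (an assumption here, not necessarily the lecture's)."""
    z = NormalDist()
    delta = abs(p0 - p1)
    p_mid = (p0 + p1) / 2
    se = sqrt(p_mid * (1 - p_mid) / n)
    return z.cdf(delta / se - z.inv_cdf(1 - alpha / 2))

# Mortality 50% under standard of care, n = 100 patients:
print(round(one_sample_binary_power(0.50, 0.40, 100), 2))  # ~0.52 (about 50% power)
print(round(one_sample_binary_power(0.50, 0.36, 100), 2))  # ~0.81 (about 80% power)
```

Under these conventions the n = 100 study has roughly 50% power at \(\delta=0.10\) and roughly 80% power at \(\delta=0.14\), matching the numbers above.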

Sample size formulas: To Do

  1. One and two arms (Normal; constant variance)
  2. Two arms (Normal; non-constant variance)
  3. Paired outcomes (Normal)
  4. Two arms (Normal; non-inferiority)
  5. Time-to-event (next lecture: with censoring)

Some assumptions

  • Simple hypothesis tests
  • One-sided testing (simple extension to two-sided)
  • Larger outcomes are better (\(\delta>0\))

One arm, Normal with constant variance

  • Collect data \(Y_1,\ldots,Y_n \sim N(\mu,\sigma^2)\) from \(n\) iid subjects

  • \(H_0\): \(\mu = m_0\)

  • \(H_1\): \(\mu = m_0 + \delta\)

  • Estimate \(\mu\) with \(\bar Y = \sum_i Y_i/n \sim N(\mu,\sigma^2/n)\)

[Figures: distribution of \(\bar Y\) for \(n=5\), varying \(n\), and \(n=25\)]

One arm, Normal with constant variance

  • Test statistic is \(T=(\bar Y - m_0)/(\sigma/\sqrt{n})\)

  • Under \(H_0\), \(T\sim N(0,1)\)

  • Under \(H_1\), \(T\sim N(\delta/(\sigma/\sqrt{n}),1)\)

  • Will reject \(H_0\) if \(T> z_{1-\alpha}\)

[Figures: distribution of \(T\) for \(n=5\), varying \(n\), and \(n=25\)]

One arm, Normal with constant variance

\[ \begin{aligned} 1 - \beta &= \Pr(\text{Reject }H_0|H_1)\\ &= \Pr(T > z_{1-\alpha}|H_1)\\ &= \Pr\left(\dfrac{\bar Y - m_0}{\sigma/\sqrt{n}} > z_{1-\alpha}\big|H_1\right)\\ &= \Pr\left(\dfrac{\bar Y - m_0 - \delta}{\sigma/\sqrt{n}} > z_{1-\alpha} - \dfrac{\delta}{\sigma/\sqrt{n}}\big|H_1\right) \end{aligned} \]

  • This probability, plotted against \(\delta/\sigma\), is called "power curve"

[Figures: power curves for \(n=5\), varying \(n\), and \(n=25\)]

One arm, Normal with constant variance

\[ 1-\beta = \Pr\left(\dfrac{\bar Y - m_0 - \delta}{\sigma/\sqrt{n}} > z_{1-\alpha} - \dfrac{\delta}{\sigma/\sqrt{n}}\big|H_1\right) \]

Claim: LHS of inequality is \(\sim N(0,1)\) under \(H_1\), so RHS is, by definition, \(z_\beta\) (equal to \(-z_{1-\beta}\))

\[ \begin{aligned} 1-\beta &= \Pr\left(Z > z_{1-\alpha} - \dfrac{\delta}{\sigma/\sqrt{n}}\big|H_1\right)\\ \Rightarrow - z_{1-\beta}&= z_{1-\alpha} - \dfrac{\delta}{\sigma/\sqrt{n}} \\ \Rightarrow n &= \dfrac{(z_{1-\alpha} + z_{1-\beta})^2}{(\delta/\sigma)^2} \end{aligned} \]
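
As a quick sketch of the final formula in code (the function name is ours; standard library only), rounding up to the next whole subject:

```python
from math import ceil
from statistics import NormalDist

def one_arm_n(delta, sigma, alpha=0.05, beta=0.20):
    """One-arm sample size: n = (z_{1-alpha} + z_{1-beta})^2 / (delta/sigma)^2,
    rounded up to a whole number of subjects."""
    z = NormalDist()
    return ceil((z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)) ** 2
                / (delta / sigma) ** 2)

# Detect an effect of half a standard deviation with alpha=0.05, 80% power:
print(one_arm_n(delta=0.5, sigma=1.0))  # 25
```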

Estimate of \(\sigma\)

  • In practice \(\sigma\) will be estimated, i.e. \(t\)-tests instead of \(z\)-tests

  • \(n = \dfrac{(t_{1-\alpha,\text{df}} + t_{1-\beta,\text{df}})^2}{(\delta/\sigma)^2}\)

  • But df depends on \(n\), so the equation must be solved iteratively for \(n\)

With \(\delta/\sigma = 0.5\), find the root in \(n\) of \((z_{1-\alpha} + z_{1-\beta})^2/0.25-n=0\) (\(z\)-based) versus \((t_{1-\alpha,n-1} + t_{1-\beta,n-1})^2/0.25-n=0\) (\(t\)-based)
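
A sketch of the iteration, starting from the \(z\)-based solution and assuming Fisher's first-order approximation \(t_{q,\mathrm{df}} \approx z_q + (z_q^3 + z_q)/(4\,\mathrm{df})\) in place of exact \(t\) quantiles (in practice one would use exact quantiles from a statistics library):

```python
from math import ceil
from statistics import NormalDist

def t_quantile(q, df):
    """Fisher's first-order approximation to the t quantile (an assumption
    used here to stay standard-library-only)."""
    zq = NormalDist().inv_cdf(q)
    return zq + (zq ** 3 + zq) / (4 * df)

def one_arm_n_t(delta_over_sigma, alpha=0.05, beta=0.20, max_iter=50):
    """Iterate n -> (t_{1-alpha,n-1} + t_{1-beta,n-1})^2 / (delta/sigma)^2
    until n stabilizes, starting from the z-based solution."""
    z = NormalDist()
    n = ceil((z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)) ** 2
             / delta_over_sigma ** 2)
    for _ in range(max_iter):
        n_new = ceil((t_quantile(1 - alpha, n - 1) + t_quantile(1 - beta, n - 1)) ** 2
                     / delta_over_sigma ** 2)
        if n_new == n:
            return n
        n = n_new
    return n

print(one_arm_n_t(0.5))  # slightly larger than the z-based n = 25
```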

Still not 100% correct

  • What incorrect assumption about distribution of \(T\) under \(H_1\) is made in previous plot?

Word of the day – SPECIAL EDITION!

A noble spirit embiggens the smallest man

Two arms, Normal with constant (equal) variance

  • Collect \(Y_1,\ldots,Y_{n_A} \sim N(\mu_A,\sigma^2)\) from \(n_A\) iid subjects

  • Independently, collect \(X_1,\ldots,X_{n_B} \sim N(\mu_B,\sigma^2)\) from \(n_B\) iid subjects

  • \(H_0\): \(\mu_A-\mu_B = 0\)

  • \(H_1\): \(\mu_A-\mu_B = \delta\)

  • Estimate \(\mu_A\) and \(\mu_B\) with \(\bar Y = \sum_i Y_i/n_A\) and \(\bar X = \sum_i X_i/n_B\), respectively

Two arms, Normal with constant (equal) variance

  • \(T = \dfrac{\bar Y - \bar X}{\sigma\sqrt{1/n_A + 1/n_B}}\)

  • \(\sigma\sqrt{1/n_A + 1/n_B}\) is standard error of \(\bar Y - \bar X\).

Two arms, Normal with constant (equal) variance

\[ \begin{aligned} 1 - \beta &= \Pr(\text{Reject }H_0| H_1)\\ &= \Pr(T > z_{1-\alpha}|H_1)\\ &= \Pr\left(\dfrac{\bar Y - \bar X}{\sigma\sqrt{1/n_A + 1/n_B}} > z_{1-\alpha}\big|H_1\right)\\ &= \Pr\left(\dfrac{\bar Y - \bar X-\delta}{\sigma\sqrt{1/n_A + 1/n_B}} > z_{1-\alpha} - \dfrac{\delta}{\sigma\sqrt{1/n_A + 1/n_B}}\big|H_1\right)\\ &= \Pr\left(Z > z_{1-\alpha} - \dfrac{\delta}{\sigma\sqrt{1/n_A + 1/n_B}}\big|H_1\right) \end{aligned} \]

[Figure: power curve with \(n_A+n_B=80\)]

Two arms, Normal with constant (equal) variance

\[ \Rightarrow \dfrac{\delta}{\sigma\sqrt{1/n_A + 1/n_B}} = z_{1-\alpha} + z_{1-\beta} \]

Write \(n_B = r n_A\), so that \[ \begin{aligned} \dfrac{\delta}{\sigma\sqrt{1/n_A + 1/(r n_A)}} &= z_{1-\alpha} + z_{1-\beta}\\ \dfrac{\delta}{(\sigma/\sqrt{n_A})\sqrt{1 + 1/r}} &= z_{1-\alpha} + z_{1-\beta}\\ \Rightarrow n_A &= \dfrac{(z_{1-\alpha} + z_{1-\beta})^2}{(\delta/\sigma)^2}(1+1/r) \end{aligned} \]
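
The two-arm formula can be sketched in code (function name ours), rounding each arm up:

```python
from math import ceil
from statistics import NormalDist

def two_arm_n(delta, sigma, r=1.0, alpha=0.05, beta=0.20):
    """Two-arm sample sizes with common sigma and allocation n_B = r * n_A:
    n_A = (1 + 1/r) (z_{1-alpha} + z_{1-beta})^2 / (delta/sigma)^2."""
    z = NormalDist()
    n_a = ((1 + 1 / r) * (z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)) ** 2
           / (delta / sigma) ** 2)
    n_a = ceil(n_a)
    return n_a, ceil(r * n_a)

# Equal allocation (r = 1), half-SD effect, alpha=0.05, 80% power:
n_a, n_b = two_arm_n(delta=0.5, sigma=1.0)
print(n_a, n_b)  # 50 50
```

Note the familiar doubling: each arm of the two-arm trial needs about twice the one-arm \(n\) for the same \(\delta/\sigma\).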

More Simpsons…

Two arms, Normal with non-constant variance

  • Collect \(Y_1,\ldots,Y_{n_A} \sim N(\mu_A,\sigma_A^2)\) from \(n_A\) iid subjects

  • Independently, collect \(X_1,\ldots,X_{n_B} \sim N(\mu_B,\sigma_B^2)\) from \(n_B\) iid subjects

  • \(H_0\): \(\mu_A-\mu_B = 0\)

  • \(H_1\): \(\mu_A-\mu_B = \delta\)

  • Estimate \(\mu_A\) and \(\mu_B\) with \(\bar Y = \sum_i Y_i/n_A\) and \(\bar X = \sum_i X_i/n_B\), respectively

Two arms, Normal with non-constant variance

  • \(T = \dfrac{\bar Y - \bar X}{\sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}}\)

  • \(\sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}\) is standard error of \(\bar Y - \bar X\).

Two arms, Normal with non-constant variance

  • …algebra…

  • \(n_A = \dfrac{(z_{1-\alpha} + z_{1-\beta} )^2}{\delta^2/(\sigma^2_A + \sigma^2_B/r)}\)
  • \(n_B = r n_A\)
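
A sketch of the unequal-variance formula (function name ours); as a sanity check, with \(\sigma_A=\sigma_B\) it reduces to the equal-variance answer:

```python
from math import ceil
from statistics import NormalDist

def two_arm_n_unequal(delta, sigma_a, sigma_b, r=1.0, alpha=0.05, beta=0.20):
    """n_A = (z_{1-alpha} + z_{1-beta})^2 * (sigma_A^2 + sigma_B^2 / r) / delta^2;
    n_B = r * n_A."""
    z = NormalDist()
    n_a = ((z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)) ** 2
           * (sigma_a ** 2 + sigma_b ** 2 / r) / delta ** 2)
    n_a = ceil(n_a)
    return n_a, ceil(r * n_a)

print(two_arm_n_unequal(0.5, 1.0, 1.0))  # (50, 50): matches the equal-variance case
print(two_arm_n_unequal(0.5, 2.0, 1.0))  # (124, 124): arm A's variance dominates
```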

Two arms, Normal, paired with constant variance

  • Collect \(Y_{1j},\ldots,Y_{nj} \sim N(\mu_j,\sigma^2)\) from \(i=1,\ldots,n\) subjects at \(j=1,2\) time points

  • \(\text{Cor}(Y_{i1}, Y_{i2})=\rho\)

  • \(H_0\): \(\mu_1-\mu_2 = 0\)

  • \(H_1\): \(\mu_1-\mu_2 = \delta\)

  • Estimate \(\mu_1\) and \(\mu_2\) with \(\bar Y_1 = \sum_i Y_{i1}/n\) and \(\bar Y_2 = \sum_i Y_{i2}/n\), respectively

Two arms, Normal, paired with constant variance

  • Recognize that \(Y_{i1} - Y_{i2}\sim N(\mu_1-\mu_2,2\sigma^2[1-\rho])\) (iid)

  • \(n = \dfrac{(z_{1-\alpha} + z_{1-\beta} )^2}{\delta^2/(2\sigma^2[1-\rho])}\)
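
A sketch of the paired formula (function name ours), illustrating how strong within-subject correlation shrinks the required \(n\):

```python
from math import ceil
from statistics import NormalDist

def paired_n(delta, sigma, rho, alpha=0.05, beta=0.20):
    """Paired design: n = (z_{1-alpha} + z_{1-beta})^2 * 2 sigma^2 (1 - rho) / delta^2,
    since Y_{i1} - Y_{i2} has variance 2 sigma^2 (1 - rho)."""
    z = NormalDist()
    return ceil((z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)) ** 2
                * 2 * sigma ** 2 * (1 - rho) / delta ** 2)

# Uncorrelated vs. strongly correlated repeated measurements:
print(paired_n(0.5, 1.0, rho=0.0), paired_n(0.5, 1.0, rho=0.8))  # 50 10
```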

Summary of Normal formulae

Description                                    Formula
--------------------------------------------   -------------------------------------------------------------------
One arm; constant variance                     \(n = \dfrac{(z_{1-\alpha} + z_{1-\beta})^2}{(\delta/\sigma)^2}\)
Two arms; constant variance                    \(n_A = (1+1/r)\dfrac{(z_{1-\alpha} + z_{1-\beta})^2}{(\delta/\sigma)^2}; \quad n_B = rn_A\)
Two arms; non-constant variance                \(n_A = \dfrac{(z_{1-\alpha} + z_{1-\beta})^2}{\delta^2/ (\sigma_A^2+\sigma_B^2/r)}; \quad n_B = rn_A\)
One arm; two observations; constant variance   \(n = \dfrac{(z_{1-\alpha} + z_{1-\beta} )^2}{\delta^2/(2\sigma^2[1-\rho])}\)

Word of the day!

Can use as approximations for binary outcomes

  • \(Y_1,\ldots,Y_{n_A} \sim \text{Bernoulli}(p_A)\) from \(n_A\) iid subjects
  • \(X_1,\ldots,X_{n_B} \sim \text{Bernoulli}(p_B)\) from \(n_B\) iid subjects

  • \(\delta = p_A - p_B\)
  • \(\sigma_A^2 \equiv p_A(1-p_A)\); \(\sigma_B^2\equiv p_B(1-p_B)\) (Bernoulli variances)
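
A sketch of the binary-outcome approximation (function name ours): plug the Bernoulli variances into the unequal-variance two-arm formula.

```python
from math import ceil
from statistics import NormalDist

def two_arm_binary_n(p_a, p_b, r=1.0, alpha=0.05, beta=0.20):
    """Normal approximation for binary outcomes: delta = p_A - p_B,
    sigma_A^2 = p_A (1 - p_A), sigma_B^2 = p_B (1 - p_B), plugged into the
    unequal-variance two-arm formula with n_B = r * n_A."""
    z = NormalDist()
    delta = p_a - p_b
    var = p_a * (1 - p_a) + p_b * (1 - p_b) / r
    n_a = ceil((z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)) ** 2 * var / delta ** 2)
    return n_a, ceil(r * n_a)

# 60% vs 50% response rate, equal allocation, alpha=0.05 (one-sided), 80% power:
print(two_arm_binary_n(0.6, 0.5))  # (303, 303)
```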

Non-inferiority designs

  • Suppose not interested in proving that experimental treatment is better than standard of care but only that it's not too much worse. When might this happen?

    • Experimental treatment is less toxic
    • Experimental treatment is easier to administer
    • Experimental treatment is significantly cheaper
    • Reduced dosage of standard of care

Superiority versus non-inferiority

  • Superiority trials ask "Is experimental treatment better than standard of care?"

  • Frequentist hypothesis testing defaults to \(H_0\) in absence of evidence

  • In superiority context, \(H_0:\delta = 0\), underpowered trial would be wasteful and unethical but would generally not carry forward inferior therapies
  • Applying same \(H_0\) to non-inferiority question, underpowered trial would be wasteful and unethical and carry forward potentially inferior therapies

Superiority versus non-inferiority

Figure 2 of Antman (2001)

Two arms, Normal, non-inferiority

  • Collect \(Y_1,\ldots,Y_{n_A} \sim N(\mu_A,\sigma^2)\) from \(n_A\) iid subjects (experimental therapy)
  • Independently, collect \(X_1,\ldots,X_{n_B} \sim N(\mu_B,\sigma^2)\) from \(n_B\) iid subjects (standard of care)
  • Want to establish that difference in average response is no more than \(\delta\) (in favor of standard Arm B). Why?
  • Highlights why we "fail to reject \(H_0\)" rather than "accept \(H_0\)"

Two arms, Normal, non-inferiority

  • \(H_0\): \(\mu_A-\mu_B = -\delta\)

  • \(H_1\): \(\mu_A-\mu_B = 0\)

  • \(T = \dfrac{\bar Y - \bar X + \delta}{\sigma\sqrt{1/n_A + 1/n_B}}\)

  • Yields \(\dfrac{\delta}{\sigma\sqrt{1/n_A + 1/n_B}} = z_{1-\alpha} + z_{1-\beta}\) (same as before)

  • Real challenge in non-inferiority trial design is not the sample size calculation itself but coming up with a suitable \(\delta\)
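
Since the sample size formula is the same as before with the margin in \(\delta\)'s role, a sketch (function name ours) makes the sensitivity to the margin concrete:

```python
from math import ceil
from statistics import NormalDist

def noninferiority_n(margin, sigma, r=1.0, alpha=0.05, beta=0.20):
    """Per-arm sizes for a non-inferiority comparison: the margin plays the
    role delta played in the superiority formula, n_B = r * n_A."""
    z = NormalDist()
    n_a = ceil((1 + 1 / r) * (z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)) ** 2
               / (margin / sigma) ** 2)
    return n_a, ceil(r * n_a)

# Halving the margin roughly quadruples the required sample size:
print(noninferiority_n(0.5, 1.0), noninferiority_n(0.25, 1.0))  # (50, 50) (198, 198)
```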

References

Antman, E.M. (2001) Clinical trials in cardiovascular medicine. Circulation, 103, e101–e104.