Where we are

  • Phase 3 trials are complicated

  • Plan for semester is to discuss
    • Design elements (how many patients to enroll; what is outcome to measure)
    • Conduct elements (how to allocate patients to treatment arms; when to stop trial)
    • Analysis elements (what findings will be reported)

A glimpse into your future

Why do sample size calculations?

  1. Too few subjects will…

    • …lead to statistical insignificance. Failing to reject \(H_0\) is not the same as accepting \(H_0\)

    • …enroll patients to experimental therapy with no scientific purpose

  2. Too many subjects will…

    • …identify clinically insignificant effects

    • …result in design not being funded

    • …take patients away from other potentially effective treatments

Limitations of sample size calculations

  • Formal calculations are necessary but are ultimately approximations. There are/will be
    • incorrect assumptions
    • logistical constraints
    • changes to design mid-trial
    • subject withdrawal

Summary of approach for calculating sample size \(n\)

  1. State null hypothesis (\(H_0\)) and alternative hypothesis (\(H_1\)) precisely

  2. Determine primary outcome to measure in each subject that will distinguish between \(H_0\) and \(H_1\)

  3. Calculate test statistic and its distribution under \(H_0\) and \(H_1\)

Hypothesis testing setup

Decision                \(H_0\) True     \(H_1\) True
---------------------   --------------   --------------
Reject \(H_0\)          type I error     true positive
Do Not Reject \(H_0\)   true negative    type II error
  • \(\alpha = \Pr(\text{type I error}) = \text{"type I error rate"}\)
  • \(\beta = \Pr(\text{type II error}) = \text{"type II error rate"}\)
  • \(1-\beta = 1-\Pr(\text{type II error}) = \text{power}\)

How to choose \(\alpha\), \(\beta\) in design?

  1. Conventional regulatory perspective: \(\alpha \le 0.05\) and \(\beta \in [0.10, 0.20]\)

  2. Rule-of-thumb perspective:

    • Set \(\alpha = \beta\) when the choice between control and experimental treatments is approximately symmetric
    • Set \(\alpha > \beta\) if there is no safe, effective standard of care, or the experimental treatment is inexpensive, simple, and exciting
    • Set \(\alpha < \beta\) if the reverse holds.
  3. Annoying statistical perspective that doesn't directly answer question:

    • \(\alpha\) and \(\beta\) are theoretical constructs only interpretable in context of long-run frequencies of perfect replications of identical trials
    • Furthermore, not easily comparable: \(\alpha\) is single number (false positive rate under null hypothesis), and \(\beta\) is function (false negative rate under range of alternative hypotheses)

Other ingredients in sample size recipes

  • \(n\): sample size. Larger \(n\) provides more information to distinguish between \(H_0\) and \(H_1\)

  • \(\delta\): distance between \(H_0\) and \(H_1\). Larger \(\delta\Rightarrow\) smaller \(n\)

  • \(\sigma^2\): variance or dispersion parameter representing amount of noise in data. Larger \(\sigma^2\Rightarrow\) larger \(n\)

  • Derive equation that contains \(\alpha\), \(\beta\), \(n\), \(\delta\), and \(\sigma^2\)

  • Can fix any four elements and solve for the fifth

Do not consider \(\delta\) or \(\sigma^2\) as 'tuning' parameters

  • Mortality rate under standard of care is 50%. New therapy may reduce mortality by 10%

  • \(\delta = 0.10\), \(\sigma^2\) determined by binomial distribution

  • \(\alpha=0.05\); \(\beta=0.20\)

  • Find required \(n=194\)

  • Investigator can only recruit and fund a 100-patient study

  • 100 patients would give 50% power when \(\delta=0.10\), but you observe that, if \(\delta=0.14\), you would have about 80% power

  • Now you've powered study to detect 14% absolute reduction in mortality. Realistic?
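
The 50% and 80% power figures above can be reproduced under one plausible set of conventions — a sketch, not necessarily the original calculation: a two-sided one-sample \(z\)-test at \(\alpha=0.05\), with the Bernoulli variance evaluated at the midpoint of the null and alternative rates. The function name `one_sample_binary_power` is ours.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_binary_power(p0, p1, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test of H0: p = p0
    against the alternative p = p1 with n subjects.  The variance is
    evaluated at the midpoint (p0 + p1)/2 -- one common convention
    (an assumption here, not necessarily the lecture's)."""
    z = NormalDist()
    delta = abs(p0 - p1)
    p_mid = (p0 + p1) / 2
    se = sqrt(p_mid * (1 - p_mid) / n)
    return z.cdf(delta / se - z.inv_cdf(1 - alpha / 2))

# Mortality 50% under standard of care, n = 100 patients:
print(round(one_sample_binary_power(0.50, 0.40, 100), 2))  # ~0.52 (about 50% power)
print(round(one_sample_binary_power(0.50, 0.36, 100), 2))  # ~0.81 (about 80% power)
```

Under these conventions the n = 100 study has roughly 50% power at \(\delta=0.10\) and roughly 80% power at \(\delta=0.14\), matching the numbers above.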

Sample size formulas: To Do

  1. One and two arms (Normal; constant variance)
  2. Two arms (Normal; non-constant variance)
  3. Paired outcomes (Normal)
  4. Two arms (Normal; non-inferiority)
  5. Time-to-event (next lecture: with censoring)

Some assumptions

  • Simple hypothesis tests
  • One-sided testing (simple extension to two-sided)
  • Larger outcomes are better (\(\delta>0\))

One arm, Normal with constant variance

  • Collect data \(Y_1,\ldots,Y_n \sim N(\mu,\sigma^2)\) from \(n\) iid subjects

  • \(H_0\): \(\mu = m_0\)

  • \(H_1\): \(\mu = m_0 + \delta\)

  • Estimate \(\mu\) with \(\bar Y = \sum_i Y_i/n \sim N(\mu,\sigma^2/n)\)

[Figures: distribution of \(\bar Y\) for \(n=5\), varying \(n\), and \(n=25\)]

One arm, Normal with constant variance

  • Test statistic is \(T=(\bar Y - m_0)/(\sigma/\sqrt{n})\)

  • Under \(H_0\), \(T\sim N(0,1)\)

  • Under \(H_1\), \(T\sim N(\delta/(\sigma/\sqrt{n}),1)\)

  • Will reject \(H_0\) if \(T> z_{1-\alpha}\)

[Figures: distribution of \(T\) for \(n=5\), varying \(n\), and \(n=25\)]

One arm, Normal with constant variance

\[ \begin{aligned} 1 - \beta &= \Pr(\text{Reject }H_0|H_1)\\ &= \Pr(T > z_{1-\alpha}|H_1)\\ &= \Pr\left(\dfrac{\bar Y - m_0}{\sigma/\sqrt{n}} > z_{1-\alpha}\big|H_1\right)\\ &= \Pr\left(\dfrac{\bar Y - m_0 - \delta}{\sigma/\sqrt{n}} > z_{1-\alpha} - \dfrac{\delta}{\sigma/\sqrt{n}}\big|H_1\right) \end{aligned} \]

  • This probability, plotted against \(\delta/\sigma\), is called "power curve"

[Figures: power curves for \(n=5\), varying \(n\), and \(n=25\)]

One arm, Normal with constant variance

\[ 1-\beta = \Pr\left(\dfrac{\bar Y - m_0 - \delta}{\sigma/\sqrt{n}} > z_{1-\alpha} - \dfrac{\delta}{\sigma/\sqrt{n}}\big|H_1\right) \]

Claim: LHS of inequality is \(\sim N(0,1)\) under \(H_1\), so RHS is, by definition, \(z_\beta\) (equal to \(-z_{1-\beta}\))

\[ \begin{aligned} 1-\beta &= \Pr\left(Z > z_{1-\alpha} - \dfrac{\delta}{\sigma/\sqrt{n}}\big|H_1\right)\\ \Rightarrow - z_{1-\beta}&= z_{1-\alpha} - \dfrac{\delta}{\sigma/\sqrt{n}} \\ \Rightarrow n &= \dfrac{(z_{1-\alpha} + z_{1-\beta})^2}{(\delta/\sigma)^2} \end{aligned} \]
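
As a quick sketch of the final formula in code (the function name is ours; standard library only), rounding up to the next whole subject:

```python
from math import ceil
from statistics import NormalDist

def one_arm_n(delta, sigma, alpha=0.05, beta=0.20):
    """One-arm sample size: n = (z_{1-alpha} + z_{1-beta})^2 / (delta/sigma)^2,
    rounded up to a whole number of subjects."""
    z = NormalDist()
    return ceil((z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)) ** 2
                / (delta / sigma) ** 2)

# Detect an effect of half a standard deviation with alpha=0.05, 80% power:
print(one_arm_n(delta=0.5, sigma=1.0))  # 25
```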

Estimate of \(\sigma\)

  • In practice \(\sigma\) will be estimated, i.e. \(t\)-tests instead of \(z\)-tests

  • \(n = \dfrac{(t_{1-\alpha,\text{df}} + t_{1-\beta,\text{df}})^2}{(\delta/\sigma)^2}\)

  • But df depends on \(n\), so the equation must be solved iteratively for \(n\)

With \(\delta/\sigma = 0.5\), find the root in \(n\) of \((z_{1-\alpha} + z_{1-\beta})^2/0.25-n=0\) (\(z\)-based) versus \((t_{1-\alpha,n-1} + t_{1-\beta,n-1})^2/0.25-n=0\) (\(t\)-based)
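
A sketch of the iteration, starting from the \(z\)-based solution and assuming Fisher's first-order approximation \(t_{q,\mathrm{df}} \approx z_q + (z_q^3 + z_q)/(4\,\mathrm{df})\) in place of exact \(t\) quantiles (in practice one would use exact quantiles from a statistics library):

```python
from math import ceil
from statistics import NormalDist

def t_quantile(q, df):
    """Fisher's first-order approximation to the t quantile (an assumption
    used here to stay standard-library-only)."""
    zq = NormalDist().inv_cdf(q)
    return zq + (zq ** 3 + zq) / (4 * df)

def one_arm_n_t(delta_over_sigma, alpha=0.05, beta=0.20, max_iter=50):
    """Iterate n -> (t_{1-alpha,n-1} + t_{1-beta,n-1})^2 / (delta/sigma)^2
    until n stabilizes, starting from the z-based solution."""
    z = NormalDist()
    n = ceil((z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)) ** 2
             / delta_over_sigma ** 2)
    for _ in range(max_iter):
        n_new = ceil((t_quantile(1 - alpha, n - 1) + t_quantile(1 - beta, n - 1)) ** 2
                     / delta_over_sigma ** 2)
        if n_new == n:
            return n
        n = n_new
    return n

print(one_arm_n_t(0.5))  # slightly larger than the z-based n = 25
```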

Still not 100% correct

  • What incorrect assumption about distribution of \(T\) under \(H_1\) is made in previous plot?

Word of the day – SPECIAL EDITION!

A noble spirit embiggens the smallest man

Two arms, Normal with constant (equal) variance

  • Collect \(Y_1,\ldots,Y_{n_A} \sim N(\mu_A,\sigma^2)\) from \(n_A\) iid subjects

  • Independently, collect \(X_1,\ldots,X_{n_B} \sim N(\mu_B,\sigma^2)\) from \(n_B\) iid subjects

  • \(H_0\): \(\mu_A-\mu_B = 0\)

  • \(H_1\): \(\mu_A-\mu_B = \delta\)

  • Estimate \(\mu_A\) and \(\mu_B\) with \(\bar Y = \sum_i Y_i/n_A\) and \(\bar X = \sum_i X_i/n_B\), respectively

Two arms, Normal with constant (equal) variance

  • \(T = \dfrac{\bar Y - \bar X}{\sigma\sqrt{1/n_A + 1/n_B}}\)

  • \(\sigma\sqrt{1/n_A + 1/n_B}\) is standard error of \(\bar Y - \bar X\).

Two arms, Normal with constant (equal) variance

\[ \begin{aligned} 1 - \beta &= \Pr(\text{Reject }H_0| H_1)\\ &= \Pr(T > z_{1-\alpha}|H_1)\\ &= \Pr\left(\dfrac{\bar Y - \bar X}{\sigma\sqrt{1/n_A + 1/n_B}} > z_{1-\alpha}\big|H_1\right)\\ &= \Pr\left(\dfrac{\bar Y - \bar X-\delta}{\sigma\sqrt{1/n_A + 1/n_B}} > z_{1-\alpha} - \dfrac{\delta}{\sigma\sqrt{1/n_A + 1/n_B}}\big|H_1\right)\\ &= \Pr\left(Z > z_{1-\alpha} - \dfrac{\delta}{\sigma\sqrt{1/n_A + 1/n_B}}\big|H_1\right) \end{aligned} \]

[Figure: power curve with \(n_A+n_B=80\)]

Two arms, Normal with constant (equal) variance

\[ \Rightarrow \dfrac{\delta}{\sigma\sqrt{1/n_A + 1/n_B}} = z_{1-\alpha} + z_{1-\beta} \]

Write \(n_B = r n_A\), so that \[ \begin{aligned} \dfrac{\delta}{\sigma\sqrt{1/n_A + 1/(r n_A)}} &= z_{1-\alpha} + z_{1-\beta}\\ \dfrac{\delta}{(\sigma/\sqrt{n_A})\sqrt{1 + 1/r}} &= z_{1-\alpha} + z_{1-\beta}\\ \Rightarrow n_A &= \dfrac{(z_{1-\alpha} + z_{1-\beta})^2}{(\delta/\sigma)^2}(1+1/r) \end{aligned} \]
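
The two-arm formula can be sketched in code (function name ours), rounding each arm up:

```python
from math import ceil
from statistics import NormalDist

def two_arm_n(delta, sigma, r=1.0, alpha=0.05, beta=0.20):
    """Two-arm sample sizes with common sigma and allocation n_B = r * n_A:
    n_A = (1 + 1/r) (z_{1-alpha} + z_{1-beta})^2 / (delta/sigma)^2."""
    z = NormalDist()
    n_a = ((1 + 1 / r) * (z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)) ** 2
           / (delta / sigma) ** 2)
    n_a = ceil(n_a)
    return n_a, ceil(r * n_a)

# Equal allocation (r = 1), half-SD effect, alpha=0.05, 80% power:
n_a, n_b = two_arm_n(delta=0.5, sigma=1.0)
print(n_a, n_b)  # 50 50
```

Note the familiar doubling: each arm of the two-arm trial needs about twice the one-arm \(n\) for the same \(\delta/\sigma\).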

More Simpsons…

Two arms, Normal with non-constant variance

  • Collect \(Y_1,\ldots,Y_{n_A} \sim N(\mu_A,\sigma_A^2)\) from \(n_A\) iid subjects

  • Independently, collect \(X_1,\ldots,X_{n_B} \sim N(\mu_B,\sigma_B^2)\) from \(n_B\) iid subjects

  • \(H_0\): \(\mu_A-\mu_B = 0\)

  • \(H_1\): \(\mu_A-\mu_B = \delta\)

  • Estimate \(\mu_A\) and \(\mu_B\) with \(\bar Y = \sum_i Y_i/n_A\) and \(\bar X = \sum_i X_i/n_B\), respectively

Two arms, Normal with non-constant variance

  • \(T = \dfrac{\bar Y - \bar X}{\sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}}\)

  • \(\sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}\) is standard error of \(\bar Y - \bar X\).

Two arms, Normal with non-constant variance

  • …algebra…

  • \(n_A = \dfrac{(z_{1-\alpha} + z_{1-\beta} )^2}{\delta^2/(\sigma^2_A + \sigma^2_B/r)}\)
  • \(n_B = r n_A\)
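
A sketch of the unequal-variance formula (function name ours); as a sanity check, with \(\sigma_A=\sigma_B\) it reduces to the equal-variance answer:

```python
from math import ceil
from statistics import NormalDist

def two_arm_n_unequal(delta, sigma_a, sigma_b, r=1.0, alpha=0.05, beta=0.20):
    """n_A = (z_{1-alpha} + z_{1-beta})^2 * (sigma_A^2 + sigma_B^2 / r) / delta^2;
    n_B = r * n_A."""
    z = NormalDist()
    n_a = ((z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)) ** 2
           * (sigma_a ** 2 + sigma_b ** 2 / r) / delta ** 2)
    n_a = ceil(n_a)
    return n_a, ceil(r * n_a)

print(two_arm_n_unequal(0.5, 1.0, 1.0))  # (50, 50): matches the equal-variance case
print(two_arm_n_unequal(0.5, 2.0, 1.0))  # (124, 124): arm A's variance dominates
```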

Two arms, Normal, paired with constant variance

  • Collect \(Y_{1j},\ldots,Y_{nj} \sim N(\mu_j,\sigma^2)\) from \(i=1,\ldots,n\) subjects at \(j=1,2\) time points

  • \(\text{Cor}(Y_{i1}, Y_{i2})=\rho\)

  • \(H_0\): \(\mu_1-\mu_2 = 0\)

  • \(H_1\): \(\mu_1-\mu_2 = \delta\)

  • Estimate \(\mu_1\) and \(\mu_2\) with \(\bar Y_1 = \sum_i Y_{i1}/n\) and \(\bar Y_2 = \sum_i Y_{i2}/n\), respectively

Two arms, Normal, paired with constant variance

  • Recognize that \(Y_{i1} - Y_{i2}\sim N(\mu_1-\mu_2,2\sigma^2[1-\rho])\) (iid)

  • \(n = \dfrac{(z_{1-\alpha} + z_{1-\beta} )^2}{\delta^2/(2\sigma^2[1-\rho])}\)
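
A sketch of the paired formula (function name ours), illustrating how strong within-subject correlation shrinks the required \(n\):

```python
from math import ceil
from statistics import NormalDist

def paired_n(delta, sigma, rho, alpha=0.05, beta=0.20):
    """Paired design: n = (z_{1-alpha} + z_{1-beta})^2 * 2 sigma^2 (1 - rho) / delta^2,
    since Y_{i1} - Y_{i2} has variance 2 sigma^2 (1 - rho)."""
    z = NormalDist()
    return ceil((z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)) ** 2
                * 2 * sigma ** 2 * (1 - rho) / delta ** 2)

# Uncorrelated vs. strongly correlated repeated measurements:
print(paired_n(0.5, 1.0, rho=0.0), paired_n(0.5, 1.0, rho=0.8))  # 50 10
```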

Summary of Normal formulae

Description                                    Formula
--------------------------------------------   -------------------------------------------------------------------
One arm; constant variance                     \(n = \dfrac{(z_{1-\alpha} + z_{1-\beta})^2}{(\delta/\sigma)^2}\)
Two arms; constant variance                    \(n_A = (1+1/r)\dfrac{(z_{1-\alpha} + z_{1-\beta})^2}{(\delta/\sigma)^2}; \quad n_B = rn_A\)
Two arms; non-constant variance                \(n_A = \dfrac{(z_{1-\alpha} + z_{1-\beta})^2}{\delta^2/ (\sigma_A^2+\sigma_B^2/r)}; \quad n_B = rn_A\)
One arm; two observations; constant variance   \(n = \dfrac{(z_{1-\alpha} + z_{1-\beta} )^2}{\delta^2/(2\sigma^2[1-\rho])}\)

Word of the day!

Can use as approximations for binary outcomes

  • \(Y_1,\ldots,Y_{n_A} \sim \text{Bernoulli}(p_A)\) from \(n_A\) iid subjects
  • \(X_1,\ldots,X_{n_B} \sim \text{Bernoulli}(p_B)\) from \(n_B\) iid subjects

  • \(\delta = p_A - p_B\)
  • \(\sigma_A^2 \equiv p_A(1-p_A)\); \(\sigma_B^2\equiv p_B(1-p_B)\) (Bernoulli variances)
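
A sketch of the binary-outcome approximation (function name ours): plug the Bernoulli variances into the unequal-variance two-arm formula.

```python
from math import ceil
from statistics import NormalDist

def two_arm_binary_n(p_a, p_b, r=1.0, alpha=0.05, beta=0.20):
    """Normal approximation for binary outcomes: delta = p_A - p_B,
    sigma_A^2 = p_A (1 - p_A), sigma_B^2 = p_B (1 - p_B), plugged into the
    unequal-variance two-arm formula with n_B = r * n_A."""
    z = NormalDist()
    delta = p_a - p_b
    var = p_a * (1 - p_a) + p_b * (1 - p_b) / r
    n_a = ceil((z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)) ** 2 * var / delta ** 2)
    return n_a, ceil(r * n_a)

# 60% vs 50% response rate, equal allocation, alpha=0.05 (one-sided), 80% power:
print(two_arm_binary_n(0.6, 0.5))  # (303, 303)
```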

Non-inferiority designs

  • Suppose not interested in proving that experimental treatment is better than standard of care but only that it's not too much worse. When might this happen?

    • Experimental treatment is less toxic
    • Experimental treatment is easier to administer
    • Experimental treatment is significantly cheaper
    • Reduced dosage of standard of care

Superiority versus non-inferiority

  • Superiority trials ask "Is experimental treatment better than standard of care?"

  • Frequentist hypothesis testing defaults to \(H_0\) in absence of evidence

  • In superiority context, \(H_0:\delta = 0\), underpowered trial would be wasteful and unethical but would generally not carry forward inferior therapies
  • Applying same \(H_0\) to non-inferiority question, underpowered trial would be wasteful and unethical and carry forward potentially inferior therapies

Superiority versus non-inferiority

Figure 2 of Antman (2001)

Two arms, Normal, non-inferiority

  • Collect \(Y_1,\ldots,Y_{n_A} \sim N(\mu_A,\sigma^2)\) from \(n_A\) iid subjects (experimental therapy)
  • Independently, collect \(X_1,\ldots,X_{n_B} \sim N(\mu_B,\sigma^2)\) from \(n_B\) iid subjects (standard of care)
  • Want to establish that difference in average response is no more than \(\delta\) (in favor of standard Arm B). Why?
  • Highlights why we "fail to reject \(H_0\)" rather than "accept \(H_0\)"

Two arms, Normal, non-inferiority

  • \(H_0\): \(\mu_A-\mu_B = -\delta\)

  • \(H_1\): \(\mu_A-\mu_B = 0\)

  • \(T = \dfrac{\bar Y - \bar X + \delta}{\sigma\sqrt{1/n_A + 1/n_B}}\)

  • Yields \(\dfrac{\delta}{\sigma\sqrt{1/n_A + 1/n_B}} = z_{1-\alpha} + z_{1-\beta}\) (same as before)

  • Real challenge in non-inferiority trial design is not the sample size calculation itself but coming up with a suitable \(\delta\)
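
Since the sample size formula is the same as before with the margin in \(\delta\)'s role, a sketch (function name ours) makes the sensitivity to the margin concrete:

```python
from math import ceil
from statistics import NormalDist

def noninferiority_n(margin, sigma, r=1.0, alpha=0.05, beta=0.20):
    """Per-arm sizes for a non-inferiority comparison: the margin plays the
    role delta played in the superiority formula, n_B = r * n_A."""
    z = NormalDist()
    n_a = ceil((1 + 1 / r) * (z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)) ** 2
               / (margin / sigma) ** 2)
    return n_a, ceil(r * n_a)

# Halving the margin roughly quadruples the required sample size:
print(noninferiority_n(0.5, 1.0), noninferiority_n(0.25, 1.0))  # (50, 50) (198, 198)
```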

References

Antman, E.M. (2001) Clinical trials in cardiovascular medicine. Circulation, 103, e101–e104.