Do events occur at higher rate in arm B than arm A?
\(\lambda_B(t)/\lambda_A(t)>1\Leftrightarrow\log\lambda_B(t)-\log\lambda_A(t)>0\)
"Isavuconazole versus voriconazole for primary treatment of invasive mould disease caused by Aspergillus and other filamentous fungi (SECURE): a phase 3, randomised-controlled, non-inferiority trial" (Maertens et al., 2016)
https://www.ncbi.nlm.nih.gov/pubmed/26684607
Novel antifungal (isavuconazole): want to demonstrate similar efficacy but improved safety (fewer adverse events) compared to voriconazole
"1:1 allocation"; "Roughly 255 patients"; "80% power"; "upper limit of 95% CI"; "treatment difference of 10% or less (prespecified non-inferiority margin)"; "20% mortality rate was assumed for both drugs"
#Per-arm sample size approximately matches ceiling(2*(qnorm(0.975) + qnorm(0.8))^2/(0.10^2)*(0.2*0.8));
## [1] 252
"Tenofovir alafenamide versus tenofovir disoproxil fumarate, coformulated with elvitegravir, cobicistat, and emtricitabine, for initial treatment of HIV-1 infection: two randomised, double-blind, phase 3, non-inferiority trials" (Sax et al., 2015)
https://www.ncbi.nlm.nih.gov/pubmed/25890673
Study of anti-retroviral treatment in untreated HIV-infected patients. Standard formulation of tenofovir causes substantial renal (kidney) toxicity. Two separate studies
"two-sided 95% CI"; "840 patients in each study"; "at least 95% power"; "overall response rate of 85% for viral suppression at week 48"; "with a prespecified noninferiority margin of 12%"
#Expect close to 420 ceiling(2*(qnorm(0.975) + qnorm(0.95))^2/(0.12^2)*(0.15*0.85));
## [1] 231
#Closer to their answer if we assume maximum binomial variance? ceiling(2*(qnorm(0.975) + qnorm(0.95))^2/(0.12^2)*(0.50*0.50));
## [1] 452
na = nb = 420;#Stated sample size nsim = 2e4; true_pa = 0.85; true_pb = 0.85; noninf_boundary = 0.12; sim_arma = rbinom(nsim,na,true_pa); sim_armb = rbinom(nsim,nb,true_pb); mean_pa = sim_arma / na; mean_pb = sim_armb / nb; #Power mean(((mean_pa - mean_pb) - qnorm(0.975) * sqrt(mean_pa * (1 - mean_pa) / na + mean_pb * (1 - mean_pb) / nb)) > (-noninf_boundary));
## [1] 0.998
#Avg width of CI mean(2 * qnorm(0.975) * sqrt(mean_pa * (1 - mean_pa) / na + mean_pb * (1 - mean_pb) / nb));
## [1] 0.0965
na = nb = 230;#Our calculation nsim = 2e4; true_pa = 0.85; true_pb = 0.85; noninf_boundary = 0.12; sim_arma = rbinom(nsim,na,true_pa); sim_armb = rbinom(nsim,nb,true_pb); mean_pa = sim_arma / na; mean_pb = sim_armb / nb; #Power mean(((mean_pa - mean_pb) - qnorm(0.975) * sqrt(mean_pa * (1 - mean_pa) / na + mean_pb * (1 - mean_pb) / nb)) > (-noninf_boundary));
## [1] 0.952
#Avg width of CI mean(2 * qnorm(0.975) * sqrt(mean_pa * (1 - mean_pa) / na + mean_pb * (1 - mean_pb) / nb));
## [1] 0.13
Calculate Wald-type test statistic
Find its distribution under alternative hypothesis, often a location-shifted Normal
Solve for \(n\)
We've derived (many) Normal-based sample size formulas
Often, outcomes will be yes/no or time-to-event
Simplest strategy for sample size calculations:
Not all patients will experience the outcome of interest and so are not equally informative
Question is not "how many patients do I need?" but "how much information do I need?" or "how many events do I need?"
Using central limit theorem, sample size formulas we've already studied provide basis for suitable answers
Usually denoted by \(\lambda(t)\), hazard is rate of occurence; instantaneous probability
Mathematically, \[ \begin{aligned} \lambda(t) &= \lim_{\epsilon\rightarrow 0} \dfrac{ \Pr(t<X<t+\epsilon|X>t)}{\epsilon}\\ &= \dfrac{1}{\Pr(X>t)} \lim_{\epsilon\rightarrow 0} \dfrac{\Pr(X>t) - \Pr(X>t+\epsilon)}{\epsilon}\\ &= \dfrac{1}{\Pr(X>t)}\left(-\dfrac{d}{dt} \Pr(X>t)\right)\\ &= \dfrac{f(t)}{\Pr(X>t)} \end{aligned} \]
\(\lambda(t)>0\). Larger values indicate events, i.e. death, disease progression, recurrence, exacerbation of symptoms, occurring at higher rate
Do events occur at higher rate in arm B than arm A?
\(\lambda_B(t)/\lambda_A(t)>1\Leftrightarrow\log\lambda_B(t)-\log\lambda_A(t)>0\)
Assume complete follow-up, time-independent hazards, i.e. \(\lambda_A(t)\equiv\lambda_A\)
Collect \(W_1,\ldots,W_{n_B} \sim \text{Exp}(\lambda_B)\) from \(n_B\) iid subjects (standard of care/control/placebo)
\(I(\lambda_A)\) is expected Fisher information from one subject for estimating \(\lambda_A\).
\((\hat\lambda_A - \lambda_A)/(\lambda_A/\sqrt{n_A})\overset{\cdot}{\sim} N(0,1)\)
set.seed(1); lambda = 0.025; base_draws = matrix(rexp(100 * 1e4), nrow = 1e4); nsamp = 5; ggplot() + geom_histogram(aes(x = ( 1/rowMeans(base_draws[,1:nsamp,drop=F]/lambda) - lambda)/ (lambda / sqrt(nsamp)), y = ..density..), bins=30) + stat_function(aes(x = c(-2,2)), fun = dnorm, n = 301, args = list(mean = 0, sd = 1)) + theme(text = element_text(size = 14)) + labs(x = "n = 5", y = "");
\[ \begin{aligned} -\log \hat\lambda_A = \log \bar V &\overset{\cdot}{\sim}N\left(-\log \lambda_A,E(V_1)^2\dfrac{\lambda_A^2}{n_A}\right)\\ &\overset{d}{=} N(-\log\lambda_A,1/n_A) \end{aligned} \] Similarly, \(\log \bar W \overset{\cdot}{\sim} N(-\log\lambda_B,1/n_B)\)
\(T = \dfrac{\log \bar V - \log \bar W}{\sqrt{1/n_A + 1/n_B}}\)
Let \(n_B = rn_A\)
Under \(H_1\), \(T\sim N\left(\delta\sqrt{n_A}/\sqrt{1 + 1/r},1\right)\)
\(n_B = r n_A\)
Sample size formulae assumes that event time is measured for all subjects. Usually cannot count on this:
Trial may end before all subjects experience event ("administrative censoring")
Subjects will not enroll instantaneously ("staggered enrollment")
Subjects may be lost to follow-up before experiencing event ("lost to follow-up")
Likelihood contribution from individual \(i\) in arm A is \(\lambda_A \exp\{-\lambda_A R_i \}\) if \(V_i < t_0\) (exponential density) and \(\Pr(V_i > t_0)= \exp\{-\lambda_A t_0\}\) if \(V_i \geq t_0\)
Use this to show that MLE for hazard is \(\hat\lambda_A = \dfrac{\sum_i 1_{[V_i<t_0]}}{\sum_i R_i}\) (Observed events divided by total follow-up)
(Expected) Fisher information is \((1-\exp\{-\lambda_A t_0\})/\lambda_A^2\), so that \[ \hat\lambda_A \overset{\cdot}{\sim} N\left(\lambda_A, \dfrac{1}{n_A}\dfrac{\lambda_A^2}{1-\exp\{-\lambda_A t_0\}} \right) \]
Applying Delta method,
\[ \begin{aligned} -\log\hat\lambda_A &\overset{\cdot}{\sim} N\left(-\log\lambda_A,\left(\dfrac{1}{\lambda_A}\right)^2\dfrac{1}{n_A}\dfrac{\lambda_A^2}{1-\exp\{-\lambda_A t_0\}} \right)\\ &\overset{d}{=} N\left(-\log\lambda_A,\dfrac{1}{n_A}\dfrac{1}{1-\exp\{-\lambda_A t_0\}} \right) \end{aligned} \]
Compare to no-censoring case: asymptotic variance of \(\log\hat\lambda_A\) multiplicatively increases by \(\left(1-\exp\{-\lambda_A t_0\}\right)^{-1}\), i.e. inverse of probability of arm A subject experiencing event.
As \(t_0\) increases, amount of censoring […]?
Under \(H_0\), \(T\sim N(0,1)\)
\(1- \exp\{-\lambda_A t_0\} = \Pr(V_i < t_0)\)
\[ \begin{aligned} n_A &= \dfrac{(z_{1-\alpha} + z_{1-\beta} )^2}{\delta^2}\times\\ &\quad([1- \exp\{-\lambda_A t_0\}]^{-1} + [1- \exp\{-\lambda_B t_0\}]^{-1}/r) \end{aligned} \]
For simplicity, may be reasonable to assume
\[ ([1- \exp\{-\lambda_A t_0\}]^{-1} + [1- \exp\{-\lambda_B t_0\}]^{-1}/r)\approx (1+1/r)/\pi, \]
where
\[ \pi = \dfrac{1}{1+r}\left(1 - \exp\{-\lambda_A t_0\}\right) + \dfrac{r}{1+r}\left(1-\exp\{-\lambda_B t_0\}\right) \]
so that
\[ \begin{aligned} n_A &= \dfrac{(z_{1-\alpha} + z_{1-\beta} )^2}{\pi\delta^2}(1 + 1/r) \end{aligned} \]
\(\pi\) is crude approximation for average probability of observing event in single subject under \(H_1\)
If \(\pi\) is small, more subjects are required
Denote \(\pi^{-1}\) by \(\text{IF}\), which always exceeds 1 and stands for "inflation factor".
ceiling(2*((qnorm(0.975) + qnorm(0.90))^2/(0.14^2)) * 0.5^2);
## [1] 269
Reference: "Efficacy and safety of belimumab in patients with active systemic lupus erythematosus: a randomised, placebo-controlled, phase 3 trial" (Navarra et al., 2011)
Assumption 1: Censoring mechanism is independent of event mechanism
Assumption 2: Hazard is constant (will discuss this soon)
Apply appropriate Normal-based sample size formula (tells you how many events needed)
Identify estimate of \(\text{IF}=\pi^{-1}\)
Multiplicatively increase (1) by (2) (tells you how many subjects to enroll to get needed events`)
\[ \begin{aligned} \dfrac{1}{n_A}\dfrac{\sum_i \varepsilon_i}{\hat\lambda_A^2}\rightarrow_P \dfrac{\Pr(V_i < C_i)}{\lambda_A^2} \equiv \dfrac{\Pr(\text{observing event})}{\lambda_A^2} \end{aligned} \]
\[ \begin{aligned} \dfrac{1}{n_A}\dfrac{\sum_i \varepsilon_i}{\hat\lambda_A^2} = \dfrac{1}{n_A}\dfrac{\sum_i R_i}{\hat\lambda_A} \rightarrow_P \dfrac{E[R_i]}{\lambda_A} \end{aligned} \]
Patients enroll throughout the trial (rather than time zero), \({E_{iA}}\sim \text{Unif}(0,t_0)\), \({E_{iB}}\sim \text{Unif}(0,t_0)\)
Data are \(R_i = \min\{V_i, t_0-{E_{iA}}\} \overset{d}{=}\min\{V_i, {E_{iA}}\}\) and \(S_i =\min\{W_i,t_0-{E_{iB}}\}\overset{d}{=}\min\{W_i,{E_{iB}}\}\)
Expected follow-up time (length of "stick") in arm A is \[ \begin{aligned} E[R_i] &= \dfrac{t_0\lambda_A - (1 - \exp\{-\lambda_A t_0\})}{t_0\lambda_A^2} \end{aligned} \] and probability of observing event is \[ \begin{aligned} \Pr(\varepsilon_i = 1) = \Pr(V_i < t_0 - E_{iA}) = \dfrac{t_0\lambda_A - (1 - \exp\{-\lambda_A t_0\})}{t_0\lambda_A} \end{aligned} \]
Similar calculation for Arm B, then calculate \(\pi\) as before:
\[ \begin{aligned} \pi = \dfrac{1}{1+r}\Pr(V_i < t_0 - E_{iA}) + \dfrac{r}{1+r}\Pr(W_i < t_0 - E_{iB}) \end{aligned} \]
Expected follow-up time in arm A \[ \begin{aligned} E[R_i] &= \dfrac{t_0\lambda_A - \exp\{-\lambda_A(t_1-t_0)\}(1 - \exp\{-\lambda_A t_0\})}{t_0\lambda_A^2} \end{aligned} \] and probability of observing event is \[ \begin{aligned} \Pr(V_i< t_1-E_{iA}) = \dfrac{t_0\lambda_A - \exp\{-\lambda_A(t_1-t_0)\}(1 - \exp\{-\lambda_A t_0\})}{t_0\lambda_A} \end{aligned} \]
https://www.ncbi.nlm.nih.gov/pubmed/22735384
BRAF-inhibitor (arm A) versus chemotherapy (arm B).
Wanted 3:1 randomization ratio in favor of BRAF-inhibitor arm: \(n_B = (1/3)n_A\), or \(r=1/3\)
Median time to progression or death is 2 months on chemotherapy. Expect increase to 6 months on dabrafenib. For exponential distribution, \(\text{median} = \log(2)/\lambda\). So aim to establish
\[ \delta = \log \lambda_B - \log \lambda_A = \log(\log(2)/2) - \log(\log(2)/6) = \log(3) \]
r=1/3; lambdaA = log(2)/6; lambdaB = log(2)/2; #na (na = ceiling((1+1/r)*(qnorm(0.98) + qnorm(0.997))^2/log(lambdaB/lambdaA)^2));
## [1] 77
#nb (nb = ceiling(r * na));
## [1] 26
Planned to enroll and randomize 200 patients (implies \(\pi = 102/200=0.51\) and \(\text{IF}=1.96\))
Enrollment period from Dec 23, 2010 to Sept 1, 2011 (\(t_0=8.25\) months)
Administratively censored on Dec 19, 2011 (\(t_1-t_0=3.5\) months)
t0 = 8.25;t1 = t0 + 3.5; piA = (t0*lambdaA - exp(-lambdaA*(t1-t0))* (1-exp(-t0*lambdaA)))/(t0*lambdaA); piB = (t0*lambdaB - exp(-lambdaB*(t1-t0))* (1-exp(-t0*lambdaB)))/(t0*lambdaB); pi = piA/(1+r) + piB*r/(1+r); 1/pi;#IF
## [1] 1.53
https://www.ncbi.nlm.nih.gov/pubmed/24439929
Treatment for non-small cell lung cancer.
Testing for \(\delta = \log(1.57)\) (median survival of 7 months in arm A versus 11 months in arm B)
Targeting power \(1-\beta = 0.90\); type I error rate \(\alpha = 0.025\).
r=1/2; lambdaA = log(2)/11; lambdaB = log(2)/7; #na (na = ceiling((1+1/r)*(qnorm(0.975) + qnorm(0.90))^2/log(lambdaB/lambdaA)^2));
## [1] 155
#nb (nb = ceiling(r * na));
## [1] 78
Planned to enroll 330 patients (implies \(\text{IF}=330/217=1.52\))
Enrollment period from April 27, 2010 to Nov 16, 2011 (\(t_0=18.5\) months)
Administratively censored on Oct 29, 2012 (\(t_1-t_0=11.5\) months)
t0 = 18.5;t1 = t0 + 11.5; piA = (t0*lambdaA - exp(-lambdaA*(t1-t0))* (1-exp(-t0*lambdaA)))/(t0*lambdaA); piB = (t0*lambdaB - exp(-lambdaB*(t1-t0))* (1-exp(-t0*lambdaB)))/(t0*lambdaB); pi = piA/(1+r) + piB*r/(1+r); 1/pi;#IF
## [1] 1.32
Why were our calculated IFs smaller than papers' IFs? One possible reason: subjects will drop out of studies with long follow-up (for reasons unrelated to outcome), e.g. \(F_{iA}\sim \text{Exp}(\eta)\)
\[ \begin{aligned} R_i &= \min\{V_i, {F_{iA}}, t_1-{E_{iA}}\}\\ &\overset{d}{=}\min\{\min\{V_i,{F_{iA}}\}, t_1-{E_{iA}}\}\\ &\overset{d}{=}\min\{V^*_i, t_1-{E_{iA}}\} \end{aligned} \]
What is distribution of \(V^*_i\)?
\[ \begin{aligned} \Pr(V^*_i > u) &= \Pr(V_i>u, F_{iA} >u)\\ &= \exp\{-\lambda_A u\} \exp\{-\eta u\}\\ &= \exp\{-(\lambda_A+\eta) u\}\\ \Rightarrow V^*_i&\sim\text{Exp}(\lambda_A+\eta) \end{aligned} \]
Expected follow-up time in arm A is \[ \begin{aligned} E[R_i] &= \dfrac{t_0(\lambda_A+\eta) - \exp\{-(\lambda_A+\eta)(t_1-t_0)\}(1 - \exp\{-(\lambda_A+\eta) t_0\})}{t_0(\lambda_A+\eta)^2} \end{aligned} \]
and probability of observing event is
\[ \begin{aligned} &\Pr(V_i< \min\{F_{iA}, t_1- E_{iA}\}) =\\ &=\dfrac{\lambda_A\left[t_0(\lambda_A+\eta) - \exp\{-(\lambda_A+\eta)(t_1-t_0)\}(1 - \exp\{-(\lambda_A+\eta) t_0\})\right]}{t_0(\lambda_A+\eta)^2} \end{aligned} \]
So far, have assumed \(\lambda_A(t)\equiv \lambda_A\) (likewise for \(\lambda_B\)).
Can relax this to assume that hazards change over time but remain proportional:
Very common test for equality of survival curves (or hazards) is non-parametric "log-rank test"
Assuming \(H_0\), i.e. equality of curves, and conditional on data, each event should occur with probability proportional to the number at risk (uncensored and free of event) at that time.
Let \(j=1,\ldots,\pi(n_A+n_B)\) index the number of events observed out of \(n_A+n_B\) total patients
If \(n_{Aj}\) arm A subjects and \(n_{Bj}\) arm B subjects at risk at time \(t_j\) (\(n_j = n_{Aj} + n_{Bj}\)), then, knowing that subject will experience event at \(t_j\), expect \(p_{Aj}=n_{Aj}/n_j\) events in arm A and \(p_{Bj}=n_{Bj}/n_j\) events in arm B.
We observe event in arm B (suppose), so discrepancy between observed and expected at time \(j\) is \(O_{Bj} - p_{Bj} = 1 - n_{Bj}/n_j= n_{Aj}/n_j\)
Variance of this discrepancy is \(V_j = p_{Bj}(1-p_{Bj})\)
\[ \begin{aligned} T = \dfrac{\sum_j O_{Bj} - p_{Bj}}{\sqrt{\sum_j p_{Bj}(1-p_{Bj}}} \end{aligned} \]
Under \(H_1\),
\[ \begin{aligned} E[O_{Bj} - p_{Bj}] &= \dfrac{\lambda_B(t_j) n_{Bj}}{\lambda_A(t) n_{Aj} + \lambda_B(t_j) n_{Bj}} - \dfrac{ n_{Bj}}{n_{Aj} + n_{Bj}}\\ &= \dfrac{e^\delta n_{Bj}}{ n_{Aj} + e^\delta n_{Bj}}- \dfrac{ n_{Bj}}{n_{Aj} + n_{Bj}}\\ &= \text{...algebra...}\\ &= \left(\dfrac{e^\delta - 1}{n_{Aj} + n_{Bj} e^\delta}\right)\left(\dfrac{n_{Aj}n_{Bj}}{n_{Aj}+n_{Bj}}\right)\\ &\approx \left(\dfrac{\delta}{n_{Aj} + n_{Bj}}\right)\left(\dfrac{n_{Aj}n_{Bj}}{n_{Aj}+n_{Bj}}\right)\quad\text{(next slide)}\\ &=\delta p_{Bj}p_{Aj}=\delta p_{Bj}(1-p_{Bj}) \end{aligned} \]
\((n_{Aj} + n_{Bj})\left(\dfrac{e^\delta - 1}{n_{Aj} + n_{Bj} e^\delta}\right)\) versus \(\delta\) for \(n_{Aj}=n_{Bj}=50\)
Asymptotically, under \(H_1\),
\[ \begin{aligned} E[T] &\approx \delta \dfrac{\sum_j p_{Bj}(1-p_{Bj}) }{\sqrt{\sum_j p_{Bj}(1-p_{Bj}) }} \\ &= \delta \sqrt{\sum_j p_{Bj}(1-p_{Bj})}\\ &\approx \delta \sqrt{\pi (n_A+n_B)\dfrac{n_An_B}{(n_A+n_B)^2}}\quad\left(\text{using } p_{Bj}\approx n_B/(n_A+n_B)\right)\\ &= \delta \sqrt{\dfrac{\pi n_A}{1+1/r}}\\ \end{aligned} \]
Sample size formula is identical (!):
\(n_B = r n_A\)
(Schoenfeld, 1981), (Schoenfeld, 1983)
Our approach emphasizes difference between number of events required and number of subjects to enroll in order to observe events.
Can be difficult to re-create others' power calculations. Important to provide details on assumptions used to calculate power/sample size
Simulation studies to confirm, explore power characteristics very useful
Implict assumption is that censoring mechanism is (i) unrelated to event mechanism and (ii) non-differential across arms
See Chapter 4 (Cook and DeMets, 2007)
Cook, T.D. and DeMets, D.L. (2007) Introduction to Statistical Methods for Clinical Trials. CRC Press.
Hauschild, A., Grob, J.-J., Demidov, L.V., Jouary, T., Gutzmer, R., Millward, M., et al. (2012) Dabrafenib in braf-mutated metastatic melanoma: A multicentre, open-label, phase 3 randomised controlled trial. The Lancet, 380, 358–365.
Maertens, J.A., Raad, I.I., Marr, K.A., Patterson, T.F., Kontoyiannis, D.P., Cornely, O.A., et al. (2016) Isavuconazole versus voriconazole for primary treatment of invasive mould disease caused by aspergillus and other filamentous fungi (secure): A phase 3, randomised-controlled, non-inferiority trial. The Lancet, 387, 760–769.
Sax, P.E., Wohl, D., Yin, M.T., Post, F., DeJesus, E., Saag, M., et al. (2015) Tenofovir alafenamide versus tenofovir disoproxil fumarate, coformulated with elvitegravir, cobicistat, and emtricitabine, for initial treatment of hiv-1 infection: Two randomised, double-blind, phase 3, non-inferiority trials. The Lancet, 385, 2606–2615.
Schoenfeld, D. (1981) The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika, 68, 316–319.
Schoenfeld, D.A. (1983) Sample-size formula for the proportional-hazards regression model. Biometrics, 499–503.
Wu, Y.-L., Zhou, C., Hu, C.-P., Feng, J., Lu, S., Huang, Y., et al. (2014) Afatinib versus cisplatin plus gemcitabine for first-line treatment of asian patients with advanced non-small-cell lung cancer harbouring egfr mutations (lux-lung 6): An open-label, randomised phase 3 trial. The lancet oncology, 15, 213–222.