Feature that distinguishes later-phase clinical trials from nearly all other forms of clinical research
Generally understood to minimize different types of "bias".
Variability due to randomization not always appreciated
Patient \(i\) has two "potential" outcomes \(O_i(A)\) when \(T_i=A\) and \(O_i(B)\) when \(T_i=B\)
Observe \(O_i(A)\) if assigned to arm \(A\), and observe \(O_i(B)\) if assigned to arm \(B\).
Only ever observe one outcome
Want to measure individual treatment effect, \(O_i(A)-O_i(B)\). Is this possible?
Can potentially estimate population-average effect conditional on treatment status: \(E[O_i(A)|T_i=A]-E[O_i(B)|T_i=B]\)
Then assume \(E[O_i(A)|T_i=A]-E[O_i(B)|T_i=B] = E[O_i(A)-O_i(B)]\)
Use as basis to conclude (or not) that treatment has differential effect on outcome
Assumption is that subject's outcome on specific arm is independent of event of being on that arm
Randomization ensures this independence: only systematic difference between arm \(A\) and arm \(B\) is treatment assignment
There exists large, target population (population to infer about)
Randomly sample from this population, i.e.
One sample is set of patients taking treatment \(A\) (arm \(A\))
Other is set of patients taking treatment \(B\) (arm \(B\))
Specific treatment assignments can be made arbitrarily. Is inference, e.g. two-sample \(t\)-test, valid?
Obtain non-probability sample from target population.
Randomize sample to arm \(A\), \(B\). Is inference valid?
Statistical tests (\(t\)-test, permutation test, regression) will be valid as tests of randomization
but inference to larger target population will be unverifiable assumption
Arguably more realistic framework for clinical trials
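Under the randomization model, the reference distribution of a test statistic comes from re-randomizing the enrolled patients, not from any sampling assumption. A minimal sketch of such a permutation test follows. The outcomes `y`, the sample size of 40, the effect of 0.5, and the number of permutations `n_perm` are all hypothetical choices made for illustration, not values from the slides.

```r
set.seed(1)
n <- 40
# Randomize a fixed set of enrolled patients, 20 per arm (1 = arm A, 0 = arm B)
trt <- sample(rep(c(1, 0), each = n / 2))
# Hypothetical outcomes; no claim that these patients were sampled at random
y <- rnorm(n) + 0.5 * trt

# Observed difference in arm means
obs_diff <- mean(y[trt == 1]) - mean(y[trt == 0])

# Re-randomize the arm labels many times, holding the outcomes fixed
n_perm <- 2000
perm_diff <- replicate(n_perm, {
  trt_star <- sample(trt)
  mean(y[trt_star == 1]) - mean(y[trt_star == 0])
})

# Two-sided permutation p-value
p_value <- mean(abs(perm_diff) >= abs(obs_diff))
p_value
```

The p-value refers only to the randomization actually performed; extending the conclusion to a larger target population still rests on the unverifiable assumption noted above.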
\(T\) binary treatment
\(O\) outcome
\(U\) unknown variables or risk factors
Simple random sample from target population
Randomization properly applied
\(T\)-\(O\) association is causal effect of \(T\) on \(O\)
\[ \begin{aligned} E[O_i(A)] &= E[\alpha + \delta + U_i]\\ &= \alpha + \delta + E[U_i] \end{aligned} \]
\[ \begin{aligned} E[O_i(A)|T_i=A] &= E[\alpha + \delta + U_i|T_i=A]\\ &= \alpha + \delta + E[U_i|T_i=A] \end{aligned} \]
Similarly, \(E[O_i(B)] =\alpha + E[U_i]\), and \(E[O_i(B)|T_i=B]=\alpha + E[U_i|T_i=B]\)
So, population-average (treatment) effect is \[ \begin{aligned} E[O_i(A) - O_i(B)] &= \delta \end{aligned} \] and population-average effect conditional on treatment status is \[ \begin{aligned} &E[O_i(A)|T_i=A] - E[O_i(B)|T_i=B] \\ &\quad = \delta + E[U_i|T_i=A] - E[U_i|T_i=B] \end{aligned} \]
These are equal when \(U_i\perp T_i\)
n_pop = 2e4; #size of target population
alpha = 0;
delta = 0.5;
U = rnorm(n_pop);
OA = alpha + delta + U; #potential outcome A
OB = alpha + U; #potential outcome B
n_sim = 1e4;
(n_subj = 2*ceiling(2*(qnorm(.975)+qnorm(.9))^2/delta^2));
## [1] 170
sim_id = rep(1:n_sim,each=n_subj);
samp_id = sample(n_pop,n_sim*n_subj,replace=T);
trt_id = rbinom(n_sim*n_subj,1,0.5);
O = OA[samp_id]*(trt_id==1) + OB[samp_id]*(trt_id==0);
arm_means = tapply(O,list(sim_id,trt_id),mean);
head(arm_means);
##         0     1
## 1 -0.0464 0.439
## 2  0.1411 0.436
## 3 -0.0624 0.389
## 4 -0.0173 0.518
## 5 -0.0380 0.620
## 6  0.1281 0.395
summary(arm_means[,2] - arm_means[,1]);
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  -0.041   0.392   0.498   0.498   0.603   1.060
arm_vars = tapply(O,list(sim_id,trt_id),var);
arm_size = tapply(O,list(sim_id,trt_id),length);
test_stat = (arm_means[,2] - arm_means[,1]) /
  (sqrt((arm_vars[,1]*(arm_size[,1]-1) + arm_vars[,2]*(arm_size[,2]-1)) /
          (arm_size[,1]+arm_size[,2]-2)) *
     sqrt(1/arm_size[,1]+1/arm_size[,2]));
#summary of estimated effect sizes across simulations
summary(test_stat);
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   -0.27    2.55    3.25    3.26    3.94    7.10
#rejection rate (power), nominally 0.90
mean(abs(test_stat)>qt(0.975,n_subj-2));
## [1] 0.892
Treatment blinding broken: patients with better prognosis more likely to be assigned to arm \(A\).
\(T\) and \(U\) have marginal association
\(T\)-\(O\) association exists directly and through unmeasured \(U\)
Potential outcomes remain the same
\(\Pr(T_i = A|U_i) = 1/(1+\exp\{-0.05\times U_i\})\):
\(\Pr(T_i = A|U_i=-1)=\) 0.488
\(\Pr(T_i = A|U_i=0)=0.5\)
\(\Pr(T_i = A|U_i=1)=\) 0.512
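The assignment probabilities listed above, and the confounding bias they imply, can be checked numerically. The sketch below assumes \(U_i\sim N(0,1)\), as in the simulation code, and uses numerical integration to approximate \(E[U_i|T_i=A]-E[U_i|T_i=B]\). The helper name `p_A` is introduced here for illustration only.

```r
# Probability of assignment to arm A given the unmeasured factor U
p_A <- function(u) plogis(0.05 * u)   # identical to 1/(1+exp(-0.05*u))
round(p_A(c(-1, 0, 1)), 3)

# Marginal probability of arm A when U ~ N(0, 1)
pr_A <- integrate(function(u) p_A(u) * dnorm(u), -Inf, Inf)$value

# Conditional means of U within each arm
EU_A <- integrate(function(u) u * p_A(u) * dnorm(u), -Inf, Inf)$value / pr_A
EU_B <- integrate(function(u) u * (1 - p_A(u)) * dnorm(u), -Inf, Inf)$value / (1 - pr_A)

# Bias term added to delta in the difference of observed arm means
bias <- EU_A - EU_B
bias
```

Even this slight tilt in assignment probabilities shifts the expected difference in arm means by roughly 0.05, which is the order of the bias seen in the simulated differences.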
samp_id = sample(n_pop,n_sim*n_subj,replace=T);
trt_id = rbinom(n_sim*n_subj,1,1/(1+exp(-0.05*U[samp_id])));
O = OA[samp_id]*(trt_id==1) + OB[samp_id]*(trt_id==0);
arm_means = tapply(O,list(sim_id,trt_id),mean);
summary(tapply(U[samp_id],list(sim_id,trt_id),mean));
##        0                 1
##  Min.   :-0.446   Min.   :-0.408
##  1st Qu.:-0.103   1st Qu.:-0.052
##  Median :-0.030   Median : 0.021
##  Mean   :-0.030   Mean   : 0.020
##  3rd Qu.: 0.042   3rd Qu.: 0.094
##  Max.   : 0.449   Max.   : 0.519
head(arm_means);
##         0     1
## 1  0.0837 0.461
## 2 -0.0377 0.356
## 3  0.0142 0.495
## 4 -0.1690 0.577
## 5  0.0569 0.418
## 6  0.1177 0.497
summary(arm_means[,2] - arm_means[,1]);
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  -0.077   0.446   0.549   0.550   0.654   1.150
arm_vars = tapply(O,list(sim_id,trt_id),var);
arm_size = tapply(O,list(sim_id,trt_id),length);
test_stat = (arm_means[,2] - arm_means[,1]) /
  (sqrt((arm_vars[,1]*(arm_size[,1]-1)+arm_vars[,2]*(arm_size[,2]-1)) /
          (arm_size[,1]+arm_size[,2]-2)) *
     sqrt(1/arm_size[,1]+1/arm_size[,2]));
#summary of estimated effect sizes across simulations
summary(test_stat);
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   -0.46    2.91    3.58    3.60    4.29    7.98
#rejection rate (power), nominally 0.90
mean(abs(test_stat)>qt(0.975,n_subj-2));
## [1] 0.946
No direct \(T\)-\(O\) association, i.e. \(\delta=0\)
All other settings same as in Simulation 2
\(T\)-\(O\) association exists through unmeasured \(U\), even though no direct (i.e. causal) relationship
So, population-average effect is \[ E[O_i(A) - O_i(B)] = 0 \] and population-average effect conditional on treatment status is \[ \begin{aligned} &E[O_i(A)|T_i=A] - E[O_i(B)|T_i=B] \\ &\quad = E[U_i|T_i=A] - E[U_i|T_i=B] \end{aligned} \]
OA = OB;
O = OA[samp_id]*(trt_id==1) + OB[samp_id]*(trt_id==0);
arm_means = tapply(O,list(sim_id,trt_id),mean);
summary(tapply(U[samp_id],list(sim_id,trt_id),mean));
##        0                 1
##  Min.   :-0.446   Min.   :-0.408
##  1st Qu.:-0.103   1st Qu.:-0.052
##  Median :-0.030   Median : 0.021
##  Mean   :-0.030   Mean   : 0.020
##  3rd Qu.: 0.042   3rd Qu.: 0.094
##  Max.   : 0.449   Max.   : 0.519
head(arm_means);
##         0        1
## 1  0.0837 -0.03885
## 2 -0.0377 -0.14398
## 3  0.0142 -0.00522
## 4 -0.1690  0.07693
## 5  0.0569 -0.08214
## 6  0.1177 -0.00342
summary(arm_means[,2] - arm_means[,1]);
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  -0.577  -0.054   0.048   0.050   0.154   0.651
arm_vars = tapply(O,list(sim_id,trt_id),var);
arm_size = tapply(O,list(sim_id,trt_id),length);
test_stat = (arm_means[,2] - arm_means[,1]) /
  (sqrt((arm_vars[,1]*(arm_size[,1]-1)+arm_vars[,2]*(arm_size[,2]-1)) /
          (arm_size[,1]+arm_size[,2]-2)) *
     sqrt(1/arm_size[,1]+1/arm_size[,2]));
#summary of estimated effect sizes across simulations
summary(test_stat);
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   -3.64   -0.36    0.32    0.33    1.01    4.52
#rejection rate (type 1 error), nominally 0.05
mean(abs(test_stat)>qt(0.975,n_subj-2));
## [1] 0.0631
Probability of sampling subject \(i\) varies unbeknownst to us, i.e. non-probability sampling
Sampled subjects have better average response than population (selection bias)
\(E_i\) indicates that subject \(i\) was sampled (enrolled)
In Simulations 1–3, \(\Pr(E_i=1)\propto\) (?)
Now, make \(\Pr(E_i=1|X_i)\propto X_i\sim\text{Unif}(0,1)\), \(X_i\) unobserved
Treatment effect is \(\delta=0.5\)
Model is \(O_i = \alpha + \delta 1_{[T_i=A]} + kU_i + X_i\)
Set \(k=\sqrt{11/12}\) so that \(\mathrm{var}(O_i|T_i)=1\) (sample size calculation still applies)
Causal \(T\)-\(O\) association exists
Conditioning on \(E\) breaks indirect association through \(X\)
Population-average treatment effect is \(E[O_i(A) - O_i(B)] = \delta\)
Conditional on treatment (and selection status), \[ \begin{aligned} &E[O_i(A)|T_i=A,E_i=1]\\ &\quad= \alpha + \delta + kE[U_i|T_i=A,E_i=1] + E[X_i|T_i=A,E_i=1]\\ &\quad= \alpha + \delta+ kE[U_i] + E[X_i|E_i=1] \end{aligned} \]
Similar calculation for arm \(B\) gives \(E[O_i(A)|T_i=A,E_i=1] - E[O_i(B)|T_i=B,E_i=1] = \delta\)
delta = 0.5;
X = runif(n_pop);
samp_id = sample(n_pop,n_sim*n_subj,prob=X,replace=T);
trt_id = rbinom(n_sim*n_subj,1,0.5);
OA = alpha + delta + sqrt(11/12)*U + X;
OB = alpha + sqrt(11/12)*U + X;
O = OA[samp_id]*(trt_id==1) + OB[samp_id]*(trt_id==0);
arm_means = tapply(O,list(sim_id,trt_id),mean);
head(arm_means);
##       0    1
## 1 0.623 1.09
## 2 0.692 1.09
## 3 0.624 1.14
## 4 0.726 1.22
## 5 0.764 1.25
## 6 0.626 1.08
summary(arm_means[,2] - arm_means[,1]);
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  -0.129   0.399   0.501   0.501   0.603   1.030
arm_vars = tapply(O,list(sim_id,trt_id),var);
arm_size = tapply(O,list(sim_id,trt_id),length);
test_stat = (arm_means[,2] - arm_means[,1]) /
  (sqrt((arm_vars[,1]*(arm_size[,1]-1)+arm_vars[,2]*(arm_size[,2]-1)) /
          (arm_size[,1]+arm_size[,2]-2)) *
     sqrt(1/arm_size[,1]+1/arm_size[,2]));
#summary of estimated effect sizes across simulations
summary(test_stat);
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   -0.92    2.64    3.33    3.34    4.01    7.12
#power stays nominal, even with non-probability sample
mean(abs(test_stat)>qt(0.975,n_subj-2));
## [1] 0.913
\[ \begin{aligned} O_i &= \alpha + 2\delta X_i 1_{[T_i=A]} + U_i\\ &= \alpha + 2\delta X_i^* + U_i\\ \end{aligned} \]
\(T\)-\(O\) association exists through \(X^*\)
Conditioning on \(E\) breaks indirect association through \(X\)
Conditional on treatment, \[ \begin{aligned} &E[O_i(A)|T_i=A,E_i=1]\\ &\quad= \alpha + 2\delta E[X_i|T_i=A,E_i=1] + E[U_i|T_i=A,E_i=1]\\ &\quad= \alpha + 2\delta E[X_i|E_i=1] + E[U_i]\\ \end{aligned} \]
Similar calculation for arm \(B\) gives \[ \begin{aligned} E[O_i(A)|T_i=A,E_i=1] - E[O_i(B)|T_i=B,E_i=1]\\ = 2\delta E[X_i|E_i=1] \end{aligned} \]
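The size of this selection bias can be worked out directly. When \(\Pr(E_i=1|X_i)\propto X_i\) and \(X_i\sim\text{Unif}(0,1)\), the density of \(X_i\) among enrolled subjects is \(2x\) on \((0,1)\). The sketch below evaluates \(E[X_i|E_i=1]\) by numerical integration and compares the resulting difference in arm means with the population-average effect; the variable names are illustrative.

```r
delta <- 0.5

# Density of X among enrolled subjects when Pr(E = 1 | X) is proportional to X
dens_enrolled <- function(x) 2 * x

# E[X | E = 1]
mean_X_enrolled <- integrate(function(x) x * dens_enrolled(x), 0, 1)$value

# Expected difference in arm means among enrolled subjects
expected_diff <- 2 * delta * mean_X_enrolled

# Population-average effect, 2 * delta * E[X] with E[X] = 1/2
pop_effect <- 2 * delta * 0.5

c(mean_X_enrolled = mean_X_enrolled,
  expected_diff   = expected_diff,
  pop_effect      = pop_effect)
```

Under these assumptions the trial targets an effect of about 0.667 instead of the population value of 0.5, because subjects with larger \(X_i\) both benefit more and are more likely to enroll.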
delta = 0.5;
OA = alpha + 2*delta*X + U;
OB = alpha + U;
samp_id = sample(n_pop,n_sim*n_subj,prob=X,replace=T);
O = OA[samp_id]*(trt_id==1) + OB[samp_id]*(trt_id==0);
arm_means = tapply(O,list(sim_id,trt_id),mean);
head(arm_means);
##          0     1
## 1 -0.06521 0.814
## 2  0.02783 0.837
## 3 -0.08994 0.634
## 4 -0.01849 0.803
## 5  0.07747 0.695
## 6  0.00988 0.423
summary(arm_means[,2] - arm_means[,1]);
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   0.081   0.563   0.666   0.667   0.771   1.230
arm_vars = tapply(O,list(sim_id,trt_id),var);
arm_size = tapply(O,list(sim_id,trt_id),length);
test_stat = (arm_means[,2] - arm_means[,1]) /
  (sqrt((arm_vars[,1]*(arm_size[,1]-1)+arm_vars[,2]*(arm_size[,2]-1)) /
          (arm_size[,1]+arm_size[,2]-2)) *
     sqrt(1/arm_size[,1]+1/arm_size[,2]));
#summary of estimated effect sizes across simulations
summary(test_stat);
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    0.50    3.64    4.31    4.32    5.00    9.21
#non-nominal power
mean(abs(test_stat)>qt(0.975,n_subj-2));
## [1] 0.99
Simulation | Description of Violation | \(H_0\) true? | Source of Bias |
---|---|---|---|
1 | No violations | No | No bias |
2 | Imperfect blinding | No | Confounding |
3 | Imperfect blinding | Yes | Confounding |
4 | Non-probability sample | No | No bias |
5 | Non-probability sample, treatment effect depends on \(X\) | No | Selection bias |
Hareyama et al. (2002); see also p144, Cook and DeMets (2007)
Study of low- versus high-dose radiation on invasive carcinoma
HDR group: 61 eligible patients with numerically odd birth month
LDR group: 71 eligible patients with numerically even birth month
No findings of differences between groups
Primary concern: patients may differ by birth month (confounding). More importantly, investigators could deduce treatment group before inviting patient to enroll (selection bias)
Chalmers et al. (1977); see also p70, Friedman, Furberg and DeMets (2010)
Meta-analysis (study of studies) of anticoagulant therapies for myocardial infarction
26 non-randomized studies; 6 randomized studies
20/26 non-randomized and 1/6 randomized studies supported therapy
When "pooled", non-randomized studies collectively showed 50% relative decrease in mortality. Randomized studies showed 20% relative decrease in mortality
Difference may be due to differential patient selection in non-randomized studies.
Bartlett et al. (1985)
Randomized trial of ECMO to treat newborns having respiratory failure
"Play-the-winner": more successful treatment received higher randomization probability
1/1 patients died while on conventional therapy; 0/11 patients died while on ECMO
Questioned for having just one patient on control arm. A priori, lead investigator convinced of superiority of ECMO
Treatment assignment \(T\) must be independent of…
Clinical trials assume (but cannot verify) that target population is same as sampling population.
Have only considered simple, or complete, randomization so far
Equivalent to coin flip: \(\Pr(T_i=A)=\Pr(T_i=B)=0.5\)
Rarely used in practice
n_subj;
## [1] 170
trt_id = rbinom(n_sim*n_subj,1,0.5);
#size of one arm
size_armA = tapply(trt_id,list(sim_id),sum);
#absolute difference in arm sizes
quantile(abs(n_subj - size_armA - size_armA), probs=seq(0.5,1,by=0.1));
##  50%  60%  70%  80%  90% 100%
##    8   10   14   16   22   50
Cosmetically unappealing to have such large imbalances
Extreme imbalance may affect power calculations, which assume fixed sample sizes (next slide)
May want to ensure balance on specific important prognostic covariates
Potential for imbalances at interim analyses or temporal differences
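The simulated imbalance quantiles above can be cross-checked against the exact distribution, since under simple randomization the size of arm \(A\) is Binomial(\(n\), 0.5). A minimal sketch, with `n_subj` redefined locally so the block runs on its own; the helper `exact_q` is a hypothetical name that returns the smallest imbalance whose cumulative probability reaches `p`.

```r
n_subj <- 170

# All possible sizes of arm A and the corresponding absolute imbalance
n_A <- 0:n_subj
abs_imb <- abs(n_subj - 2 * n_A)
pmf <- dbinom(n_A, n_subj, 0.5)

# Exact distribution of the absolute imbalance
imb_vals <- sort(unique(abs_imb))
imb_pmf <- tapply(pmf, abs_imb, sum)[as.character(imb_vals)]
imb_cdf <- cumsum(imb_pmf)

# Smallest imbalance d with Pr(|imbalance| <= d) >= p
exact_q <- function(p) imb_vals[which(imb_cdf >= p)[1]]
sapply(c(0.5, 0.9), exact_q)
```

An imbalance of 8 or more patients is therefore expected in about half of all trials of this size, with no simulation error involved.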
#Power with exactly n_subj/2 subjects in each arm
pnorm(0.5/sqrt(1/(0.5*n_subj)+1/(0.5*n_subj)) - qnorm(0.975));
## [1] 0.903
#Realized power over 10,000 random allocations
summary(pnorm(0.5/sqrt(1/size_armA+1/(n_subj-size_armA)) - qnorm(0.975)));
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   0.876   0.901   0.903   0.901   0.903   0.903
Let \(D_i\) be the number of patients on arm A minus the number on arm B after \(i\) assignments: \(D_i>0\Leftrightarrow\) more patients on arm A, and \(D_i<0\Leftrightarrow\) more patients on arm B
For patient \(i+1\) , increase probability of assignment to currently under-represented arm to \(q\), with \(1/2<q<1\)
\[ \begin{aligned} \Pr(T_{i+1}=A)=\begin{cases} q, & D_i<0\\ 1/2, & D_i=0\\ 1-q, & D_i>0 \end{cases} \end{aligned} \]
\(q\) close to 1 encourages balance, \(q\) close to 0.5 minimizes selection bias
For large (even) \(n\), \(\Pr(D_n=0)\approx 2 - 1/q\)
Example, for \(q=2/3\), there is 50% probability of perfect balance
Thoughts on making \(q\) closer to 1?
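The biased-coin rule and the large-\(n\) approximation \(\Pr(D_n=0)\approx 2-1/q\) can be checked by simulation. The sketch below is one possible implementation under stated assumptions: the function name `biased_coin`, the trial size of 100, and the 2,000 replicates are illustrative choices.

```r
set.seed(1)

# Assign n patients sequentially; returns 1 for arm A, 0 for arm B
biased_coin <- function(n, q) {
  trt <- integer(n)
  D <- 0   # running imbalance: (number on A) - (number on B)
  for (i in seq_len(n)) {
    # Favor the currently under-represented arm with probability q
    p_A <- if (D < 0) q else if (D > 0) 1 - q else 0.5
    trt[i] <- rbinom(1, 1, p_A)
    D <- D + 2 * trt[i] - 1
  }
  trt
}

q <- 2/3
n <- 100
n_rep <- 2000

# Final imbalance D_n in each simulated trial
D_n <- replicate(n_rep, {
  trt <- biased_coin(n, q)
  2 * sum(trt) - n
})

prop_balanced <- mean(D_n == 0)
c(simulated = prop_balanced, approximation = 2 - 1/q)
```

Rerunning with `q` closer to 1 raises the proportion of perfectly balanced trials, at the cost of making the next assignment easier to guess whenever the arms are unequal.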
Extension of biased-coin design, correction probability changing with degree of imbalance. Two parameters: \(\gamma\), \(\rho\)
Initialize "urn" with \(\gamma\) balls of each of two colors (one color per treatment arm), e.g. \(\gamma=1\)
For patient \(i\), draw ball and assign treatment corresponding to that ball
Put ball back in urn. Also put \(\rho\) balls of opposite color
As imbalance grows, correction probability in favor of under-represented arm – represented by number of balls of each type – increases
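The urn steps above can be sketched as follows. This is an illustrative implementation, not code from the slides: the function names `urn_design` and `imbalance`, the trial size, and the number of replicates are assumptions, and simple randomization is simulated alongside for comparison.

```r
set.seed(1)

# Urn randomization: start with gamma balls per arm; after each draw,
# return the ball and add rho balls for the opposite arm
urn_design <- function(n, gamma = 1, rho = 1) {
  balls <- c(A = gamma, B = gamma)
  trt <- character(n)
  for (i in seq_len(n)) {
    # Draw one ball with probability proportional to the urn contents
    trt[i] <- sample(c("A", "B"), 1, prob = balls)
    other <- if (trt[i] == "A") "B" else "A"
    balls[other] <- balls[other] + rho
  }
  trt
}

# Final imbalance: (number on A) - (number on B)
imbalance <- function(trt) sum(trt == "A") - sum(trt == "B")

n <- 100
n_rep <- 1000
D_urn <- replicate(n_rep, imbalance(urn_design(n)))
D_simple <- replicate(n_rep, imbalance(sample(c("A", "B"), n, replace = TRUE)))

# Average absolute imbalance under each scheme
c(urn = mean(abs(D_urn)), simple = mean(abs(D_simple)))
```

Because the correction is strongest early on, when a few added balls change the urn composition the most, the design behaves more like simple randomization as enrollment grows.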
Choose block size, \(b\). Divide each group of \(b\) consecutive patients equally to both arms (\(b/2\) to each arm)
With \(b=4\), there are six possible blocks:
\(T_i\) | \(T_{i+1}\) | \(T_{i+2}\) | \(T_{i+3}\) |
---|---|---|---|
A | A | B | B |
A | B | A | B |
A | B | B | A |
B | B | A | A |
B | A | B | A |
B | A | A | B |
Guarantees treatment balance (\(D_n=0\)) after every \(b\) patients
Large potential for selection bias, e.g. every \(b\)th patient assignment will be known with certainty
If \(b\) is large, potential for large imbalances during block
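A permuted-block allocation list can be generated by shuffling \(b/2\) labels of each arm within every block. A minimal sketch, where `permuted_blocks` and its arguments are hypothetical names chosen for illustration:

```r
set.seed(1)

# Generate assignments for n_blocks consecutive blocks of size b
permuted_blocks <- function(n_blocks, b) {
  stopifnot(b %% 2 == 0)
  unlist(lapply(seq_len(n_blocks), function(j) {
    # Each block holds b/2 patients per arm in random order
    sample(rep(c("A", "B"), each = b / 2))
  }))
}

b <- 4
n_blocks <- 5
trt <- permuted_blocks(n_blocks, b)

# Running imbalance D_i after each patient
D <- cumsum(ifelse(trt == "A", 1, -1))

# Imbalance at the end of each block, and the largest imbalance within blocks
D[seq(b, n_blocks * b, by = b)]
max(abs(D))

# Number of distinct blocks for b = 4
choose(b, b / 2)
```

The running imbalance returns to zero after every block and never exceeds \(b/2\) in between, which is also why the last assignment in each block is predictable when earlier assignments are known.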
Bartlett, R.H., Roloff, D.W., Cornell, R.G., Andrews, A.F., Dillon, P.W. and Zwischenberger, J.B. (1985) Extracorporeal circulation in neonatal respiratory failure: A prospective randomized study. Pediatrics, 76, 479–487.
Chalmers, T.C., Matta, R.J., Smith, H.J. and Kunzler, A.-M. (1977) Evidence favoring the use of anticoagulants in the hospital phase of acute myocardial infarction. New England Journal of Medicine, 297, 1091–1096.
Cook, T.D. and DeMets, D.L. (2007) Introduction to Statistical Methods for Clinical Trials. CRC Press.
Efron, B. (1971) Forcing a sequential experiment to be balanced. Biometrika, 58, 403–417.
Friedman, L.M., Furberg, C. and DeMets, D.L. (2010) Fundamentals of Clinical Trials, 4th ed. Springer.
Hareyama, M., Sakata, K.-i., Oouchi, A., Nagakura, H., Shido, M., Someya, M., et al. (2002) High-dose-rate versus low-dose-rate intracavitary therapy for carcinoma of the uterine cervix. Cancer, 94, 117–124.
Lachin, J.M. (1988) Statistical properties of randomization in clinical trials. Controlled Clinical Trials, 9, 289–311.
Rubin, D.B. (1978) Bayesian inference for causal effects: The role of randomization. The Annals of Statistics, 6, 34–58.
Rubin, D.B. (2011) Causal inference using potential outcomes. Journal of the American Statistical Association.
Wei, L. and Lachin, J.M. (1988) Properties of the urn randomization in clinical trials. Controlled Clinical Trials, 9, 345–364.
Zelen, M. (1974) The randomization and stratification of patients to clinical trials. Journal of Chronic Diseases, 27, 365–375.