Bios619 Lecture 11: Randomization

Randomization

Distinguishing feature of later-phase clinical trials from nearly all other forms of clinical research
Generally understood to minimize different types of "bias".
Variability due to randomization not always appreciated

Population inference

Question: what is difference between "random sample" and "random ized sample"?

Notation

\(T\in\{A,B\}\) is treatment
\(O\) is outcome

Potential outcomes

Rubin (1978), Rubin (2011)

Patient \(i\) has two "potential" outcomes \(O_i(A)\) when \(T_i=A\) and \(O_i(B)\) when \(T_i=B\)
Observe \(O_i(A)\) if assigned to arm \(A\), and observe \(O_i(B)\) if assigned to arm \(B\).
Only ever observe one outcome

Potential outcomes

Want to measure individual treatment effect, \(O_i(A)-O_i(B)\). Is this possible?
Can potentially estimate population-average effect conditional on treatment status: \(E[O_i(A)|T_i=A]-E[O_i(B)|T_i=B]\)

Potential outcomes

Then assume \(E[O_i(A)|T_i=A]-E[O_i(B)|T_i=B] = E[O_i(A)-O_i(B)]\)
Use as basis to conclude (or not) that treatment has differential effect on outcome

Why randomize?

Assumption is that subject's outcome on specific arm is independent of event of being on that arm
Randomization ensures this independence: only systematic difference between arm \(A\) and arm \(B\) is treatment assignment

Inferential Framework 1

(Lachin, 1988)

There exists large, target population (population to infer about)
Randomly sample from this population, i.e.
One sample is set of patients taking treatment \(A\) (arm \(A\))
Other is set of patients taking treatment \(B\) (arm \(B\))
Specific treatment assignments can be made arbitrarily. Is inference, e.g. two-sample \(t\)-test, valid?

Inferential Framework 2

Target population exists (conceptually) but impossible to sample from this:

Pre-specified inclusion/exclusion criteria (specific extent of disease, performance status, treatment history, patient characteristics)
Catchment area, geographical constraints

Obtain non-probability sample from target population.
Randomize sample to arm \(A\), \(B\). Is inference valid?

Inferential Framework 2

Statistical tests (\(t\)-test, permutation test, regression) will be valid as tests of randomization
but inference to larger target population will be unverifiable assumption
Arguably more realistic framework for clinical trials

Randomized trials as DAGs

\(T\) binary treatment
\(O\) outcome
\(U\) unknown variables or risk factors

Simulation 1

Simple random sample from target population
Randomization properly applied
\(T\)-\(O\) association is causal effect of \(T\) on \(O\)

Simulation 1

\(O_i = \alpha + \delta 1_{[T_i=A]} + U_i\);

\[ \begin{aligned} E[O_i(A)] &= E[\alpha + \delta + U_i]\\ &= \alpha + \delta + E[U_i] \end{aligned} \]

\[ \begin{aligned} E[O_i(A)|T_i=A] &= E[\alpha + \delta + U_i|T_i=A]\\ &= \alpha + \delta + E[U_i|T_i=A] \end{aligned} \]

Similarly, \(E[O_i(B)] =\alpha + E[U_i]\), and \(E[O_i(B)|T_i=B]=\alpha + E[U_i|T_i=B]\)

Simulation 1

So, population-average (treatment) effect is \[ \begin{aligned} E[O_i(A) - O_i(B)] &= \delta \end{aligned} \] and population-average effect conditional on treatment status is \[ \begin{aligned} &E[O_i(A)|T_i=A] - E[O_i(B)|T_i=B] \\ &\quad = \delta + E[U_i|T_i=A] - E[U_i|T_i=B] \end{aligned} \]

These are equal when \(U_i\perp T_i\)

Word of the day!

Simulation 1

\(\alpha = 0\)
\(\delta = 0.5\)
\(U_i\overset{iid}{\sim}N(0,1)\);
\(\Pr(T_i=A|U_i)=0.5\)
All patients have equal chance of being sampled

n_pop = 2e4;#size of target population
alpha = 0;
delta = 0.5;
U = rnorm(n_pop);
OA = alpha + delta + U;#potential outcome A
OB = alpha + U;#potential outcome B

Simulation 1

n_sim = 1e4;
(n_subj = 2*ceiling(2*(qnorm(.975)+qnorm(.9))^2/delta^2));

## [1] 170

sim_id = rep(1:n_sim,each=n_subj);
samp_id = sample(n_pop,n_sim*n_subj,replace=T)
trt_id = rbinom(n_sim*n_subj,1,0.5);
O = OA[samp_id]*(trt_id==1) + OB[samp_id]*(trt_id==0);
arm_means = tapply(O,list(sim_id,trt_id),mean);

Simulation 1

head(arm_means);

##         0     1
## 1 -0.0464 0.439
## 2  0.1411 0.436
## 3 -0.0624 0.389
## 4 -0.0173 0.518
## 5 -0.0380 0.620
## 6  0.1281 0.395

summary(arm_means[,2] - arm_means[,1]);

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -0.041   0.392   0.498   0.498   0.603   1.060

Simulation 1

arm_vars = tapply(O,list(sim_id,trt_id),var);
arm_size = tapply(O,list(sim_id,trt_id),length);
test_stat = 
  (arm_means[,2] - arm_means[,1]) / 
  (sqrt((arm_vars[,1]*(arm_size[,1]-1) + arm_vars[,2]*(arm_size[,2]-1)) /
          (arm_size[,1]+arm_size[,2]-2))*sqrt(1/arm_size[,1]+1/arm_size[,2]));
#summary of estimated effect sizes across simulations
summary(test_stat);

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -0.27    2.55    3.25    3.26    3.94    7.10

#rejection rate (power), nominally 0.90
mean(abs(test_stat)>qt(0.975,n_subj-2));

## [1] 0.892

Simulation 2

Treatment blinding broken: patients with better prognosis more likely to be assigned to arm \(A\).
\(T\) and \(U\) have marginal association
\(T\)-\(O\) association exists directly and through unmeasured \(U\)
Potential outcomes remain the same

Simulation 2

\(\Pr(T_i = A|U_i) = 1/(1+\exp\{-0.05\times U_i\})\):
- \(\Pr(T_i = A|U_i=-1)=\) 0.488
- \(\Pr(T_i = A|U_i=0)=0.5\)
- \(\Pr(T_i = A|U_i=1)=\) 0.512

samp_id = sample(n_pop,n_sim*n_subj,replace=T)
trt_id = rbinom(n_sim*n_subj,1,1/(1+exp(-0.05*U[samp_id])));
O = OA[samp_id]*(trt_id==1) + 
  OB[samp_id]*(trt_id==0);
arm_means = tapply(O,list(sim_id,trt_id),mean);

Simulation 2

summary(tapply(U[samp_id],list(sim_id,trt_id),mean));

##        0                1         
##  Min.   :-0.446   Min.   :-0.408  
##  1st Qu.:-0.103   1st Qu.:-0.052  
##  Median :-0.030   Median : 0.021  
##  Mean   :-0.030   Mean   : 0.020  
##  3rd Qu.: 0.042   3rd Qu.: 0.094  
##  Max.   : 0.449   Max.   : 0.519

Simulation 2

head(arm_means);

##         0     1
## 1  0.0837 0.461
## 2 -0.0377 0.356
## 3  0.0142 0.495
## 4 -0.1690 0.577
## 5  0.0569 0.418
## 6  0.1177 0.497

summary(arm_means[,2] - arm_means[,1]);

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -0.077   0.446   0.549   0.550   0.654   1.150

Simulation 2

arm_vars = tapply(O,list(sim_id,trt_id),var);
arm_size = tapply(O,list(sim_id,trt_id),length);
test_stat = 
  (arm_means[,2] - arm_means[,1]) /
  (sqrt((arm_vars[,1]*(arm_size[,1]-1)+arm_vars[,2]*(arm_size[,2]-1)) /
          (arm_size[,1]+arm_size[,2]-2))*sqrt(1/arm_size[,1]+1/arm_size[,2]));
#summary of estimated effect sizes across simulations
summary(test_stat);

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -0.46    2.91    3.58    3.60    4.29    7.98

#rejection rate (power), nominally 0.90
mean(abs(test_stat)>qt(0.975,n_subj-2));

## [1] 0.946

Simulation 3

No direct \(T\)-\(O\) association, i.e. \(\delta=0\)
All other settings same as in Simulation 2
\(T\)-\(O\) association exists through unmeasured \(U\), even though no direct (i.e. causal) relationship

Simulation 3

So, population-average effect is \[ E[O_i(A) - O_i(B)] = 0 \] and population-average effect conditional on treatment status is \[ \begin{aligned} &E[O_i(A)|T_i=A] - E[O_i(B)|T_i=B] \\ &\quad = E[U_i|T_i=A] - E[U_i|T_i=B] \end{aligned} \]

Simulation 3

OA = OB;
O = OA[samp_id]*(trt_id==1) + 
  OB[samp_id]*(trt_id==0);
arm_means = tapply(O,list(sim_id,trt_id),mean);

Simulation 3

summary(tapply(U[samp_id],list(sim_id,trt_id),mean));

##        0                1         
##  Min.   :-0.446   Min.   :-0.408  
##  1st Qu.:-0.103   1st Qu.:-0.052  
##  Median :-0.030   Median : 0.021  
##  Mean   :-0.030   Mean   : 0.020  
##  3rd Qu.: 0.042   3rd Qu.: 0.094  
##  Max.   : 0.449   Max.   : 0.519

Simulation 3

head(arm_means);

##         0        1
## 1  0.0837 -0.03885
## 2 -0.0377 -0.14398
## 3  0.0142 -0.00522
## 4 -0.1690  0.07693
## 5  0.0569 -0.08214
## 6  0.1177 -0.00342

summary(arm_means[,2] - arm_means[,1]);

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -0.577  -0.054   0.048   0.050   0.154   0.651

Simulation 3

arm_vars = tapply(O,list(sim_id,trt_id),var);
arm_size = tapply(O,list(sim_id,trt_id),length);
test_stat = 
  (arm_means[,2] - arm_means[,1]) /
  (sqrt((arm_vars[,1]*(arm_size[,1]-1)+arm_vars[,2]*(arm_size[,2]-1)) /
          (arm_size[,1]+arm_size[,2]-2))*sqrt(1/arm_size[,1]+1/arm_size[,2]));
#summary of estimated effect sizes across simulations
summary(test_stat);

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -3.64   -0.36    0.32    0.33    1.01    4.52

#rejection rate (type 1 error), nominally 0.05
mean(abs(test_stat)>qt(0.975,n_subj-2));

## [1] 0.0631

Simulation 4

Probability of sampling subject \(i\) varies unbeknownst to us, i.e. non-probability sampling
Sampled subjects have better average response than population (selection bias)
\(E_i\) indicates that subject \(i\) was sampled (enrolled)
- In Simulations 1–3, \(\Pr(E_i=1)\propto\) (?)
- Now, make \(\Pr(E_i=1|X_i)\propto X_i\sim\text{Unif}(0,1)\), \(X_i\) unobserved
Treatment effect is \(\delta=0.5\)
Model is \(O_i = \alpha + \delta 1_{[T_i=A]} + kU_i + X_i\)
Set \(k=\sqrt{11/12}\) so that \(\mathrm{var}(O_i|T_i)=1\) (sample size calculation still applies)

Simulation 4

Causal \(T\)-\(O\) association exists
Conditioning on \(E\) breaks indirect association through \(X\)

Simulation 4

Population-average treatment effect is \(E[O_i(A) - O_i(B)] = \delta\)
Conditional on treatment (and selection status), \[ \begin{aligned} &E[O_i(A)|T_i=A,E_i=1]\\ &\quad= \alpha + \delta + kE[U_i|T_i=A,E_i=1] + kE[X_i|T_i=A,E_i=1]\\ &\quad= \alpha + \delta+ kE[U_i] + kE[X_i|E_i=1]\\ \end{aligned} \]
Similar calculation for arm \(B\) gives \(E[O_i(A)|T_i=A,E_i=1] - E[O_i(B)|T_i=B,E_i=1] = \delta\)

Simulation 4

delta = 0.5;
X = runif(n_pop);
samp_id = sample(n_pop,n_sim*n_subj,prob=X,replace=T)
trt_id = rbinom(n_sim*n_subj,1,0.5);
OA = alpha + delta + sqrt(11/12)*U + X;
OB = alpha + sqrt(11/12)*U + X;
O = OA[samp_id]*(trt_id==1) + 
  OB[samp_id]*(trt_id==0);
arm_means = tapply(O,list(sim_id,trt_id),mean);

Simulation 4

head(arm_means);

##       0    1
## 1 0.623 1.09
## 2 0.692 1.09
## 3 0.624 1.14
## 4 0.726 1.22
## 5 0.764 1.25
## 6 0.626 1.08

summary(arm_means[,2] - arm_means[,1]);

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -0.129   0.399   0.501   0.501   0.603   1.030

Simulation 4

arm_vars = tapply(O,list(sim_id,trt_id),var);
arm_size = tapply(O,list(sim_id,trt_id),length);
test_stat = 
  (arm_means[,2] - arm_means[,1]) /
  (sqrt((arm_vars[,1]*(arm_size[,1]-1)+arm_vars[,2]*(arm_size[,2]-1)) /
          (arm_size[,1]+arm_size[,2]-2))*sqrt(1/arm_size[,1]+1/arm_size[,2]));
#summary of estimated effect sizes across simulations
summary(test_stat);

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -0.92    2.64    3.33    3.34    4.01    7.12

#power stays nominal, even with non-probability sample
mean(abs(test_stat)>qt(0.975,n_subj-2));

## [1] 0.913

Simulation 5

Same non-probability sample mechanism as in Simulation 4
Sampled subjects have better average response than population but only when on arm \(A\):

\[ \begin{aligned} O_i &= \alpha + 2\delta X_i 1_{[T_i=A]} + U_i\\ &= \alpha + 2\delta X_i^* + U_i\\ \end{aligned} \]

Simulation 5

\(T\)-\(O\) association exists through \(X^*\)
Conditioning on \(E\) breaks indirect association through \(X\)

Simulation 5

Population-average treatment effect is \[ \begin{aligned} E[O_i(A)-O_i(B)] &= (\alpha + 2\delta E[X_i] + E[U_i])\\ &\quad - (\alpha + E[U_i])\\ &= 2\delta E[X_i] = \delta \end{aligned} \]

Simulation 5

Conditional on treatment, \[ \begin{aligned} &E[O_i(A)|T_i=A,E_i=1]\\ &\quad= \alpha + 2\delta E[X_i|T_i=A,E_i=1] + E[U_i|T_i=A,E_i=1]\\ &\quad= \alpha + 2\delta E[X_i|E_i=1] + E[U_i]\\ \end{aligned} \]
Similar calculation for arm \(B\) gives \[ \begin{aligned} E[O_i(A)|T_i=A,E_i=1] - E[O_i(B)|T_i=B,E_i=1]\\ = 2\delta E[X_i|E_i=1] \end{aligned} \]

Simulation 5

delta = 0.5;
OA = alpha + 2*delta*X + U;
OB = alpha + U;
samp_id = sample(n_pop,n_sim*n_subj,prob=X,replace=T)
O = OA[samp_id]*(trt_id==1) + 
  OB[samp_id]*(trt_id==0);
arm_means = tapply(O,list(sim_id,trt_id),mean);

Simulation 5

head(arm_means);

##          0     1
## 1 -0.06521 0.814
## 2  0.02783 0.837
## 3 -0.08994 0.634
## 4 -0.01849 0.803
## 5  0.07747 0.695
## 6  0.00988 0.423

summary(arm_means[,2] - arm_means[,1]);

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.081   0.563   0.666   0.667   0.771   1.230

Simulation 5

arm_vars = tapply(O,list(sim_id,trt_id),var);
arm_size = tapply(O,list(sim_id,trt_id),length);
test_stat = 
  (arm_means[,2] - arm_means[,1]) /
  (sqrt((arm_vars[,1]*(arm_size[,1]-1)+arm_vars[,2]*(arm_size[,2]-1)) / 
          (arm_size[,1]+arm_size[,2]-2))*sqrt(1/arm_size[,1]+1/arm_size[,2]));
#summary of estimated effect sizes across simulations
summary(test_stat);

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.50    3.64    4.31    4.32    5.00    9.21

#non-nominal power
mean(abs(test_stat)>qt(0.975,n_subj-2));

## [1] 0.99

Five Simulations

Label	Description of Violation	\(H_0\)	Source of Bias
1	No Violations	F	No bias
2	Imperfect blinding	F	Confounding
3	Imperfect blinding	T	Confounding
4	Non-probability sample	F	No bias
5	Non-probability sample	F	Selection bias

Case Study 1: Radiation Therapy for Carcinoma of Uterine Cervix

Hareyama et al. (2002); see also p144, Cook and DeMets (2007)

Study of low- versus high-dose radiation on invasive carcinoma
HDR group: 61 eligible patients with numerically odd birth month
LDR group: 71 eligible patients with numerically even birth month
No findings of differences between groups

Primary concern: patients may differ by birth month (confounding). More importantly, investigators could deduce treatment group before inviting patient to enroll (selection bias)

Case Study 2: Anticoagulants for MI

Chalmers et al. (1977); see also p70, Friedman, Furberg and DeMets (2010)

Meta-analysis (study of studies) of anticoagulant therapies for myocardial infarction
26 non-randomized studies; 6 randomized studies
20/26 non-randomized and 1/6 randomized studies supported therapy
When "pooled", non-randomized studies collectively showed 50% relative decrease in mortality. Randomized studies showed 20% relative decrease in mortality

Difference may be due to differential patient selection in non-randomized studies.

Case Study 3: ECMO in Neonates

Bartlett et al. (1985)

Randomized trial of ECMO to treat newborns having respiratory failure
"Play-the-winner": more successful treatment received higher randomization probability
1/1 patients died while on conventional therapy; 0/11 patients died while on ECMO

Questioned for having just one patient on control arm. A priori, lead investigator convinced of superiority of ECMO

Key Points So Far

Treatment assignment \(T\) must be independent of…
1. …confounders
2. …unknown other variables
3. …sampling mechanism
Clinical trials assume (but cannot verify) that target population is same as sampling population.

Randomization comes in multiple flavors

Have only considered simple, or complete, randomization so far
Equivalent to coin flip: \(\Pr(T_i=A)=\Pr(T_i=B)=0.5\)
Rarely used in practice

Imbalance in treatment assignments

Calculate treatment imbalance after \(i\)th patient, \(D_i = \sum_{j=1}^i 1_{[T_j=A]} - i/2\)

n_subj;

## [1] 170

trt_id = rbinom(n_sim*n_subj,1,0.5);
#size of one arm
size_armA = tapply(trt_id,list(sim_id),sum);
quantile(abs(n_subj - size_armA - size_armA),
         p=seq(0.5,1,by=0.1));

##  50%  60%  70%  80%  90% 100% 
##    8   10   14   16   22   50

20% of simulated trials had \(|D_n|>\) 16 at end of trial

Imbalance in treatment assignments

Cosmetically unappealing to have such large imbalances
Extreme imbalance may affect power calculations, which assume fixed sample sizes (next slide)
May want to ensure balance on specific important prognostic covariates
Potential for imbalances at interim analyses or temporal differences

Power is a random variable

Power function for two-samples, with \(\delta=0.5\), \(\sigma^2=1\) (Lecture 8): \[ \begin{aligned} 1-\beta &= \Pr\left(Z>z_{1-\alpha}-\dfrac{0.5}{\sqrt{1/n_A+1/n_B}}\right)\\ &= \Phi\left(\dfrac{0.5}{\sqrt{1/n_A+1/n_B}}-z_{1-\alpha}\right) \end{aligned} \]

Power is a random variable

#Power with exactly n_subj subjects in each
pnorm(0.5/sqrt(1/(0.5*n_subj)+1/(0.5*n_subj)) - 
        qnorm(0.975));

## [1] 0.903

#Realized power over 10,000 random allocations
summary(pnorm(0.5/sqrt(1/size_armA+1/(n_subj-size_armA)) - 
                qnorm(0.975)));

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.876   0.901   0.903   0.901   0.903   0.903

Restricted randomization

Permuted block
Stratified permuted blocks
Biased coin
Urn randomization

Biased Coin

Efron (1971)

\(D_i>0\Leftrightarrow\) more patients on arm A, and \(D_i<0\Leftrightarrow\) more patients on arm B
For patient \(i+1\) , increase probability of assignment to currently under-represented arm to \(q\), with \(1/2<q<1\)

\[ \begin{aligned} \Pr(T_{i+1}=A)=\begin{cases} q, & D_i<0\\ 1/2, & D_i=0\\ 1-q, & D_i>0 \end{cases} \end{aligned} \]

\(q\) close to 1 encourages balance, \(q\) close to 0.5 minimizes selection bias

Biased Coin

Efron (1971)

For large (even) \(n\), \(\Pr(D_n=0)\approx 2 - 1/q\)
Example, for \(q=2/3\), there is 50% probability of perfect balance
Thoughts on making \(q\) closer to 1?

Urn randomization

Wei and Lachin (1988)

Extension of biased-coin design, correction probability changing with degree of imbalance. Two parameters: \(\gamma\), \(\rho\)

Initialize "urn" with \(\gamma\) balls of two colors (corresponding to two treatment arms), e.g. \(\gamma=1\)
For patient \(i\), draw ball and assign treatment corresponding to that ball
Put ball back in urn. Also put \(\rho\) balls of opposite color

As imbalance grows, correction probability in favor of under-represented arm – represented by number of balls of each type – increases

Permuted-block randomization

Zelen (1974)

Choose block size, \(b\). Divide each group of \(b\) consecutive patients equally to both arms (\(b/2\) to each arm)
With \(b=4\), there are six possible blocks:

\(T_i\)	\(T_{i+1}\)	\(T_{i+2}\)	\(T_{i+3}\)
A	A	B	B
A	B	A	B
A	B	B	A
B	B	A	A
B	A	B	A
B	A	A	B

Randomly select block (row) for each group of \(b\) consecutive patients

Permuted-block randomization

Guarantees treatment balance (\(D_n=0\)) after every \(b\) patients
Large potential for selection bias, e.g. every \(b\)th patient assignment will be known with certainty
If \(b\) is large, potential for large imbalances during block

References

Bartlett, R.H., Roloff, D.W., Cornell, R.G., Andrews, A.F., Dillon, P.W. and Zwischenberger, J.B. (1985) Extracorporeal circulation in neonatal respiratory failure: A prospective randomized study. Pediatrics, 76, 479–487.

Chalmers, T.C., Matta, R.J., Smith, H.J. and Kunzler, A.-M. (1977) Evidence favoring the use of anticoagulants in the hospital phase of acute myocardial infarction. New England Journal of Medicine, 297, 1091–1096.

Cook, T.D. and DeMets, D.L. (2007) Introduction to Statistical Methods for Clinical Trials. CRC Press.

Efron, B. (1971) Forcing a sequential experiment to be balanced. Biometrika, 58, 403–417.

Friedman, L.M., Furberg, C. and DeMets, D.L. (2010) Fundamentals of Clinical Trials, 4th ed. Springer.

Hareyama, M., Sakata, K.-i., Oouchi, A., Nagakura, H., Shido, M., Someya, M., et al. (2002) High-dose-rate versus low-dose-rate intracavitary therapy for carcinoma of the uterine cervix. Cancer, 94, 117–124.

Lachin, J.M. (1988) Statistical properties of randomization in clinical trials. Controlled Clinical Trials, 9, 289–311.

Rubin, D.B. (1978) Bayesian inference for causal effects: The role of randomization. The Annals of statistics, 34–58.

Rubin, D.B. (2011) Causal inference using potential outcomes. Journal of the American Statistical Association.

Wei, L. and Lachin, J.M. (1988) Properties of the urn randomization in clinical trials. Controlled Clinical Trials, 9, 345–364.

Zelen, M. (1974) The randomization and stratification of patients to clinical trials. Journal of Chronic Diseases, 27, 365–375.