Some comments on constructing a DAG

  • Any measured variable with a parent should be included
  • Any variable with at least two arrows pointing out must be included

    • The boxed 'E' (for enrollment) is always conditioned upon. If 'E' has no parents, its inclusion and conditioning are implicit
    • I have used 'U' to denote residual error in our models. Because these errors are not measured, they are not, strictly speaking, necessary

Simulation 6

  • Everything kept same as in Simulation 5, but with \(\delta=0\)

\[ \begin{aligned} O_i &= \alpha + 2\delta X_i 1_{[T_i=A]} + U_i\\ &= \alpha + 2\delta X_i^* + U_i\\ &= \alpha + U_i\\ \end{aligned} \]

  • No \(T\)-\(O\) association

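n_pop = 2e4;#size of target population
n_sim = 1e4;#number of simulated trials
#subjects per trial: two arms, 90% power at two-sided 0.05 for a 0.5 SD difference
(n_subj = 2*ceiling(2*(qnorm(.975)+qnorm(.9))^2/0.5^2));
## [1] 170
sim_id = rep(1:n_sim,each=n_subj);#trial label for each enrolled subject
alpha = 0;
delta = 0;
U = rnorm(n_pop);#residual error
X = runif(n_pop);#effect modifier that also drives enrollment
OA = alpha + 2*delta*X + U;#potential outcome under treatment A
OB = alpha + U;#potential outcome under treatment B
#selection bias: probability of enrollment is proportional to X
samp_id = sample(n_pop,n_sim*n_subj,prob=X,replace=TRUE);
trt_id = rbinom(n_sim*n_subj,1,0.5);#proper 1:1 randomization (1 = A, 0 = B)
O = OA[samp_id]*(trt_id==1) + 
  OB[samp_id]*(trt_id==0);#observed outcome
arm_means = tapply(O,list(sim_id,trt_id),mean);#per-trial, per-arm means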

head(arm_means);
##          0       1
## 1 -0.09103  0.0108
## 2  0.03032 -0.0264
## 3  0.02114  0.1939
## 4 -0.00382 -0.0908
## 5  0.11227 -0.1016
## 6 -0.01591  0.0819
summary(arm_means[,2] - arm_means[,1]);
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -0.722  -0.103   0.000  -0.001   0.103   0.555

arm_vars = tapply(O,list(sim_id,trt_id),var);
arm_size = tapply(O,list(sim_id,trt_id),length);
test_stat = 
  (arm_means[,2] - arm_means[,1]) /
  (sqrt((arm_vars[,1]*(arm_size[,1]-1)+arm_vars[,2]*(arm_size[,2]-1)) / 
          (arm_size[,1]+arm_size[,2]-2))*sqrt(1/arm_size[,1]+1/arm_size[,2]));
#summary of pooled two-sample t statistics across simulations
summary(test_stat);
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -4.66   -0.67    0.00   -0.01    0.67    3.46
#empirical type I error rate (nominal level 0.05)
mean(abs(test_stat)>qt(0.975,n_subj-2));
## [1] 0.051
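
The hand-computed statistic above is the usual pooled-variance two-sample t statistic. As a minimal sketch, the check below compares that formula against `t.test(..., var.equal = TRUE)` on one hypothetical trial. The seed and the single simulated trial are illustrative assumptions and are not one of the 1e4 trials from the lecture.

```r
# One hypothetical trial under delta = 0 (illustrative only)
set.seed(2018)
n_subj <- 170
trt_id <- rbinom(n_subj, 1, 0.5)   # 1 = A, 0 = B
O <- rnorm(n_subj)                 # outcome does not depend on arm

# Arm sizes, means, and variances
n0 <- sum(trt_id == 0); n1 <- sum(trt_id == 1)
m0 <- mean(O[trt_id == 0]); m1 <- mean(O[trt_id == 1])
v0 <- var(O[trt_id == 0]); v1 <- var(O[trt_id == 1])

# Same formula as test_stat above, written for a single trial
pooled_sd <- sqrt((v0 * (n0 - 1) + v1 * (n1 - 1)) / (n0 + n1 - 2))
by_hand <- (m1 - m0) / (pooled_sd * sqrt(1 / n0 + 1 / n1))

# t.test() reports mean(x) - mean(y), so arm 1 goes first
built_in <- t.test(O[trt_id == 1], O[trt_id == 0], var.equal = TRUE)

# The two statistics should agree up to floating-point error
all.equal(by_hand, unname(built_in$statistic))
```

The degrees of freedom reported by `t.test()` in this setting are `n_subj - 2`, which matches the reference distribution used in `qt(0.975, n_subj - 2)`.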

Six Simulations

To replace "Five Simulations" slide in previous lecture

| Label | Sampling | Randomization | \(\delta\) | Unbiased estimate of \(\delta^{\dagger}\) | Unbiased hypothesis test\({}^\ddagger\) |
|---|---|---|---|---|---|
| 1 | Simple Random | Proper | 0.5 | Yes | Yes |
| 2 | Simple Random | Confounding | 0.5 | No | No |
| 3 | Simple Random | Confounding | 0 | No | No |
| 4 | Selection Bias | Proper | 0.5 | Yes | Yes |
| 5 | Selection Bias | Proper | 0.5 | No | Yes |
| 6 (new) | Selection Bias | Proper | 0 | Yes | Yes |

\(^{\dagger}\)\(E[\hat\delta] = \delta\) (depends on specific true value of \(\delta\))

\(^{\ddagger}\)\(\Pr(\text{Reject } H_0|H_0 \text{ True}) \leq\alpha\) (property of family of data generating models; always based upon \(H_0:\delta=0\)). Simulations 2 and 3 are in same family, and 5 and 6 are in same family, because only difference is true value of \(\delta\)

Simulations 5 and 6

  • Population average treatment effect was \(\delta\)

  • Population average treatment effect conditional on enrollment was \(2\delta E[X_i|E_i=1]\)

  • Thus when \(\delta \neq 0\), clinical trial yields biased estimate of \(\delta\). This is problematic insofar as population average treatment effect is actually what we are trying to estimate (or what we think we're trying to estimate)
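
To make the size of this bias concrete: in the simulation code, enrollment probability is proportional to \(X_i\) with \(X_i\sim\text{Uniform}(0,1)\), so \(E[X_i|E_i=1] = E[X_i^2]/E[X_i] = (1/3)/(1/2) = 2/3\), and the effect among enrolled subjects is \(4\delta/3\) rather than \(\delta\). The sketch below checks this numerically. The seed and the number of enrolled subjects (`n_enrolled`) are illustrative assumptions, chosen only to keep Monte Carlo error small.

```r
set.seed(5)
n_pop <- 2e4
n_enrolled <- 2e5   # hypothetical; large so that Monte Carlo error is small
delta <- 0.5        # value used in Simulation 5

X <- runif(n_pop)
U <- rnorm(n_pop)
OA <- 2 * delta * X + U   # potential outcome under A (alpha = 0)
OB <- U                   # potential outcome under B

# Enrollment with probability proportional to X, as in the lecture code
samp_id <- sample(n_pop, n_enrolled, prob = X, replace = TRUE)

pop_effect      <- mean(OA - OB)                    # close to delta
enrolled_mean_X <- mean(X[samp_id])                 # close to 2/3
enrolled_effect <- mean(OA[samp_id] - OB[samp_id])  # close to 4 * delta / 3

c(pop_effect = pop_effect,
  enrolled_mean_X = enrolled_mean_X,
  enrolled_effect = enrolled_effect)
```

With \(\delta=0\) both effects are zero, which is why Simulation 6 gives an unbiased estimate even though the sampling mechanism is unchanged.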

Some comments on confounding and selection bias

  • Confounding: when one or more variables are associated with both treatment and outcome.

    • Consistent definition across statistical sciences
    • Indicated in DAGs by arrows coming from same variable into both treatment and outcome, e.g. treatment assignments are affected by some factor that is prognostic for outcome
    • Used interchangeably to describe both cause (confounding as action) and effect (confounding as bias), i.e. would be acceptable to say "Confounding confounded estimate of \(\delta\)"
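
As a minimal sketch of confounding as both cause and effect, the simulation below lets treatment assignment depend on a prognostic factor. The variable `C`, the logistic assignment rule, and the seed are hypothetical choices for illustration; they are not the settings used in Simulations 2 and 3.

```r
set.seed(11)
n <- 1e5
delta <- 0                            # no true treatment effect

C <- rnorm(n)                         # hypothetical prognostic factor
trt <- rbinom(n, 1, plogis(2 * C))    # assignment depends on C: confounding
O <- delta * trt + C + rnorm(n)       # outcome depends on C, not on treatment

# Unadjusted difference in arm means absorbs the effect of C
naive <- mean(O[trt == 1]) - mean(O[trt == 0])

# Adjusting for the measured confounder recovers delta in this linear setting
adjusted <- unname(coef(lm(O ~ trt + C))["trt"])

c(naive = naive, adjusted = adjusted)
```

Here the unadjusted difference is far from zero even though \(\delta=0\), while the adjusted coefficient is close to zero. Adjustment works in this sketch only because the confounder is measured and the outcome model is correctly specified.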

  • Selection bias:

    • "a systematic error or bias that causes a sample to be unrepresentative of the population from which it came" (Piantadosi, 2017)
    • "an association between outcomes and treatment assignments" (Cook and DeMets, 2007)
    • "patients are differentially excluded from analyses" (Haneuse, 2016)
    • "individuals being more likely to be selected for study than others, biasing the sample" (Wikipedia)
    • "systematic inclination or tendency for elements or units selected for study…to differ from those not selected" (Meinert, 2012)
    • "the allocation process is predictive" (Friedman, Furberg and DeMets, 2010; Lachin, 1988)
    • "the experimenter may consciously or unconsciously bias the experiment through his choice of subjects" (Wei, 1978)
    • "bias resulting from inappropriate selection of controls in case-control studies, bias resulting from differential loss-to-follow up, incidence–prevalence bias, volunteer bias, healthy-worker bias, and nonresponse bias" (Hernán, Hernández-Díaz and Robins, 2004)
  • Semantic difference from confounding is that selection bias as a cause does not always imply selection bias as an effect

  • More specifically, we saw that selection bias can cause bias in estimates of \(\delta\) but, assuming randomization is done properly, cannot bias the hypothesis test itself

  • This is the difference between external validity and internal validity. Randomization ensures the latter: even randomized trials subject to extreme selection bias remain internally valid, i.e. their hypothesis tests keep the stated type I error

  • Cook and DeMets (2007) state "if a trial is double-blind and randomized, there can be no selection bias". What this means is:
    • hypothesis tests will always be unbiased
    • estimates of treatment effect \(\delta\) will be unbiased for study population (which may not be well-characterized)

Quiz 2 content

  • I will not be expecting you to construct DAG from model

  • In general, you cannot construct a model from a DAG unless you have additional information. I won't be expecting you to do this

  • I do want you to be able to use the rules of Bayes-ball to make conditional independence claims as they pertain to understanding surrogate outcomes, causal inference, and regression models under randomized designs

  • Simulations from Lectures 11 and 11b were presented to help illustrate ideas. DAGs are useful in determining the validity of \(E[O_i(A)|T_i=A]-E[O_i(B)|T_i=B] = E[O_i(A)-O_i(B)]\)
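
As a sketch of why this identity holds under proper randomization, assume \(T_i\) is independent of the potential outcomes \((O_i(A), O_i(B))\), which is the conditional independence claim a DAG with no open path between \(T\) and \(O\) other than the direct arrow would support. Within an enrolled sample, every expectation below is implicitly conditional on \(E_i=1\).

\[ \begin{aligned} E[O_i(A)|T_i=A]-E[O_i(B)|T_i=B] &= E[O_i(A)]-E[O_i(B)] && \text{(independence of } T_i \text{ and potential outcomes)}\\ &= E[O_i(A)-O_i(B)] && \text{(linearity of expectation)} \end{aligned} \]

Under confounding the first step fails, because conditioning on \(T_i\) changes the distribution of the potential outcomes.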

References

Cook, T.D. and DeMets, D.L. (2007) Introduction to Statistical Methods for Clinical Trials. CRC Press.

Friedman, L.M., Furberg, C. and DeMets, D.L. (2010) Fundamentals of Clinical Trials, 4th ed. Springer.

Haneuse, S. (2016) Distinguishing selection bias and confounding bias in comparative effectiveness research. Medical Care, 54, e23.

Hernán, M.A., Hernández-Díaz, S. and Robins, J.M. (2004) A structural approach to selection bias. Epidemiology, 15, 615–625.

Lachin, J.M. (1988) Statistical properties of randomization in clinical trials. Controlled Clinical Trials, 9, 289–311.

Meinert, C.L. (2012) Clinical Trials: Design, Conduct and Analysis, 2nd ed. Oxford University Press.

Piantadosi, S. (2017) Clinical Trials: A Methodologic Perspective, 3rd ed. John Wiley & Sons.

Wei, L. (1978) On the random allocation design for the control of selection bias in sequential experiments. Biometrika, 65, 79–84.

Wikipedia. Bias (statistics). URL https://en.wikipedia.org/wiki/Bias_(statistics) [accessed 5 April 2018]