Bonus Discussion: Observed Power

What happens if we declare non-significance but didn't do sample size calculation beforehand? Were we sufficiently powered to begin with?

Recall power curve given by \[ \begin{aligned} 1 - \beta &= \Pr\left(Z > z_{1-\alpha} - \dfrac{\delta}{\sigma/\sqrt{n}}\right) \end{aligned} \] where \(Z\) is standard normal. Can we calculate observed power by plugging in observed value of test statistic?

\[ \begin{aligned} 1 - \beta &= \Pr\left(Z > z_{1-\alpha} - \dfrac{\hat\delta}{\hat\sigma/\sqrt{n}}\right) \end{aligned} \] Question: how does this relate to calculation of \(p\)-value?

Bonus Discussion: Observed Power

\[ \begin{aligned} p\text{-value} &=\quad ? \end{aligned} \]
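One way to fill in the blank, assuming the one-sided z-test behind the power curve above: the p-value is \(p = \Pr(Z > z_{\text{obs}})\) with \(z_{\text{obs}} = \hat\delta/(\hat\sigma/\sqrt{n})\), so \(z_{\text{obs}} = z_{1-p}\) and the plug-in power is \(1 - \Phi(z_{1-\alpha} - z_{1-p})\), a function of \(p\) and \(\alpha\) only. A minimal R sketch (the function name `observed_power` and \(\alpha = 0.05\) are illustrative assumptions):

```r
# Observed power as a function of the one-sided p-value alone.
# Assumes a one-sided z-test; alpha = 0.05 is an illustrative choice.
observed_power <- function(p, alpha = 0.05) {
  z_obs <- qnorm(1 - p)                  # test statistic implied by p
  1 - pnorm(qnorm(1 - alpha) - z_obs)    # plug z_obs into the power curve
}

p_values <- c(0.01, 0.05, 0.20, 0.50)
data.frame(p = p_values, observed_power = round(observed_power(p_values), 3))
```

At \(p = \alpha\) the observed power is exactly 0.5, and it falls below 0.5 for any non-significant result, so it carries no information beyond the p-value.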

"Observed power" and p-values are 1-1 transformations of each other.

Take away from sample size calculations in time-to-event setting

  • If all patients are followed until their event, only the hazard ratio (relative rate) matters

  • Once censoring is introduced, individual hazards in each arm become important (see below)

  • In some cases, 'surrogate outcomes' may be observed before actual event of interest. Statistically, the hazard for the surrogate outcome is larger in each arm

Definition of surrogate outcome (Prentice, 1989; Fleming and DeMets, 1996)

  • Often, one or more indicators of disease progression measured in patients. One would qualify as a surrogate outcome if:

    1. it is predictive of primary clinical outcome
    2. it (fully) captures effect of intervention on clinical outcome

Good idea / bad idea

Universal surrogates do not exist

  • Good idea:
    • Treatment: Cancer prevention education
    • Outcome: Lung cancer diagnosis
    • Proposed surrogate: cigarette smoking diary
  • Bad idea:
    • Treatment: Possible cancer chemoprevention, e.g. multivitamin use
    • Outcome: Lung cancer diagnosis
    • Proposed surrogate: cigarette smoking diary

Surrogate must be tied to treatment and outcome

Other important qualities of surrogates

  • Measured simply and non-invasively

  • Justified on biologically mechanistic grounds

  • Occurs quickly

Why surrogates?

  • If clinical outcome is very bad and/or irreversible, potential for intervention before occurrence
  • Decrease cost, duration of clinical trial

Why surrogates?

Simple example of gain in efficiency from ideal (perfect) surrogate outcome:

  • \(H_0\): \(\log\lambda_B-\log\lambda_A = 0\)
  • \(H_1\): \(\log\lambda_B-\log\lambda_A = \log 1.25\)

  • \(n_A=n_B\) (\(r=1\), equal arms)
  • \((z_{1-\alpha}+z_{1-\beta})^2=6.2\)
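These design values can be turned into a per-arm number directly. The sketch below assumes the usual formula for comparing log hazards of exponential outcomes, \(n_A = (1 + 1/r)(z_{1-\alpha}+z_{1-\beta})^2/(\log \text{HR})^2\), which is not restated in this section, and assumes that 6.2 corresponds to one-sided \(\alpha = 0.05\) and power 0.80. The variable names are illustrative.

```r
# Per-arm number needed to detect a hazard ratio of 1.25 with no censoring.
# Assumed formula: n_A = (1 + 1/r) * (z_{1-alpha} + z_{1-beta})^2 / (log HR)^2
alpha <- 0.05          # assumed one-sided type I error
power <- 0.80          # assumed power
r <- 1                 # allocation ratio n_B / n_A (equal arms)
hazard_ratio <- 1.25

z_sum_sq <- (qnorm(1 - alpha) + qnorm(power))^2
n_A <- (1 + 1 / r) * z_sum_sq / log(hazard_ratio)^2

round(z_sum_sq, 1)     # 6.2, the value quoted above
ceiling(n_A)           # 249
```

Under these assumptions the result matches the no-censoring number used in the examples that follow. It depends on the hazards only through their ratio.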

Example: Number to enroll based upon primary outcome

  • Suppose primary outcome occurs at small rate, i.e. under \(H_1\), \(\lambda_A=0.02\) (mean event time of 50 months), \(\lambda_B=0.025\) (mean event time of 40 months)

  • No censoring: \(n_A \approx\) 249

  • Censoring at \(t_0=36\) months, uniform staggered entry from \((0,t_0)\), expected drop-out rate of \(\eta=0.02\) patients/month.

  • Can use simulation to estimate probability of observing event:

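nsim <- 1e5           # number of simulated patients
time_close <- 36      # months from start of enrollment to study close (t_0)
rate_ltfu <- 0.02     # drop-out (loss to follow-up) hazard per month (eta)
# event hazard averaged over the two arms under H_1
avg_rate_event <- (0.02 + 0.025) / 2
# time from enrollment to study close under uniform staggered entry
enroll_to_close <- runif(nsim, 0, time_close)
# time from enrollment to loss to follow-up
enroll_to_ltfu <- rexp(nsim, rate_ltfu)
# time from enrollment to event
enroll_to_event <- rexp(nsim, avg_rate_event)
# inflation factor: reciprocal of estimated probability that event is observed
(IF <- 1 / mean(enroll_to_event < pmin(enroll_to_close, enroll_to_ltfu)))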
## [1] 3.85
  • With censoring: \(\text{IF}\times n_A \approx\) 957

Example: Number to enroll based upon proposed surrogate outcome

  • Proposed surrogate occurs at 4x rate, i.e. under \(H_1\), \(\lambda_A^*=0.08\) (mean event time of 12.5 months), \(\lambda_B^*=0.10\) (mean event time of 10 months).

  • Note: \(\log\lambda_B^*-\log \lambda_A^* = \log\lambda_B - \log\lambda_A = \log 1.25\)

  • No censoring: \(n_A \approx\) 249 (same as primary outcome)

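# surrogate hazard averaged over the two arms under H_1
avg_rate_event <- (0.08 + 0.10) / 2
# time from enrollment to surrogate event; censoring times reused from above
enroll_to_event <- rexp(nsim, avg_rate_event)
(IF <- 1 / mean(enroll_to_event < pmin(enroll_to_close, enroll_to_ltfu)))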
## [1] 1.63
  • With censoring: \(\text{IF}\times n_A \approx\) 404
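The simulated inflation factors can be checked against a closed form. Assuming, as in the simulations, exponential event and drop-out times with hazards \(\lambda\) and \(\eta\), and time from enrollment to study close uniform on \((0,t_0)\), the probability of observing the event is \(\frac{\lambda}{\lambda+\eta}\left[1-\frac{1-e^{-(\lambda+\eta)t_0}}{(\lambda+\eta)t_0}\right]\). A sketch (the function name `inflation_factor` is illustrative):

```r
# Closed-form inflation factor: 1 / Pr(event observed before censoring).
# Assumes exponential event and drop-out times and Uniform(0, t0) time
# from enrollment to study close, matching the simulation set-up.
inflation_factor <- function(rate_event, rate_ltfu, t0) {
  total_rate <- rate_event + rate_ltfu
  p_event <- rate_event / total_rate *
    (1 - (1 - exp(-total_rate * t0)) / (total_rate * t0))
  1 / p_event
}

IF_primary   <- inflation_factor((0.02 + 0.025) / 2, rate_ltfu = 0.02, t0 = 36)
IF_surrogate <- inflation_factor((0.08 + 0.10) / 2,  rate_ltfu = 0.02, t0 = 36)

round(c(primary = IF_primary, surrogate = IF_surrogate), 2)
```

The closed-form values are about 3.87 and 1.62, which agree with the simulated 3.85 and 1.63 up to Monte Carlo error.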

Some surrogates

| Disease | Surrogate | Clinical Outcome |
|---|---|---|
| AIDS | Viral load, CD4 counts | AIDS event, death |
| Cancer | Tumor response | Death |
| Cancer | Progression or death | Death |
| Glaucoma | Intraocular pressure | Vision loss |
| Heart disease | Blood pressure, cholesterol | Myocardial infarction |
| Osteoporosis | Bone mineral density | Bone fractures |
| Vaccine | Immune response | Vaccine efficacy |

Precise definition

  • For true surrogate outcome, should reject \(H_0\) based upon surrogate outcome if and only if reject \(H_0\) based upon clinical outcome

  • Let \(S(t)\) denote the surrogate history at time \(t\).
  • May be time-to-event or longitudinal measurements

Prentice's criteria, in terms of the hazard of the clinical outcome in arms \(A\) and \(B\), are as follows:

  1. \(\lambda_A(t|S(t))=\lambda_B(t|S(t))=\lambda(t|S(t))\)

  2. \(\lambda(t|S(t))\neq \lambda(t)\)

Visualizing surrogate outcomes via Directed Acyclic Graphs (DAGs)

Directed Acyclic Graph (DAG)

  • Visualizes joint distribution, e.g. \(f(X_1,\ldots,X_5)\)

  • 'Directed' means that variable \(X_i\) (a 'node') has parents \(X_{\pi_i}\). Arrows instead of segments.

  • 'Acyclic' means that no arrows may lead back to a starting point

Example

  • \(X_{\pi_1}=X_{\pi_2}=\{ \}\) (\(X_1\) and \(X_2\) have no parents)
  • \(X_{\pi_3}=\{X_1,X_2\}\)
  • \(X_{\pi_4}=\{X_3\}\)
  • \(X_{\pi_5}=\{X_1,X_3\}\)

Joint distribution in a DAG

  • Result: \(f(X_1,\ldots,X_r)=\prod_{i=1}^r f(X_i|X_{\pi_i})\)

  • Example, cont'd:

\[ \begin{aligned} f(X_1,\ldots,X_5) &= f(X_5|X_1,X_2,X_3,X_4)f(X_4|X_3,X_2,X_1)\\ &\quad\times f(X_3|X_2,X_1)f(X_2|X_1)f(X_1)\\ &= f(X_5|X_1,X_3)f(X_4|X_3)\\ &\quad\times f(X_3|X_2,X_1)f(X_2)f(X_1) \end{aligned} \]

So \(X_5\) is conditionally independent of \(\{X_2,X_4\}\) given \(\{X_1,X_3\}\), or \(X_5\perp \{X_2,X_4\}|\{X_1,X_3\}\)
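The factorization can be checked by simulation. The sketch below assumes a linear Gaussian model with unit coefficients on every arrow of the example DAG; the coefficients and the helper `partial_cor` are illustrative choices. For jointly Gaussian variables, zero partial correlation is equivalent to conditional independence.

```r
set.seed(1)
n <- 1e5

# Simulate from the example DAG: each node is the sum of its parents plus noise.
X1 <- rnorm(n)
X2 <- rnorm(n)
X3 <- X1 + X2 + rnorm(n)
X4 <- X3 + rnorm(n)
X5 <- X1 + X3 + rnorm(n)

# Partial correlation of a and b given the columns of `given`,
# computed from residuals of least-squares fits.
partial_cor <- function(a, b, given) {
  cor(resid(lm(a ~ given)), resid(lm(b ~ given)))
}

Z <- cbind(X1, X3)
pc_52 <- partial_cor(X5, X2, Z)   # near 0: X5 independent of X2 given {X1, X3}
pc_54 <- partial_cor(X5, X4, Z)   # near 0: X5 independent of X4 given {X1, X3}
c_52  <- cor(X5, X2)              # clearly nonzero without conditioning
round(c(pc_52 = pc_52, pc_54 = pc_54, c_52 = c_52), 3)
```

Both partial correlations should be close to zero, while the marginal correlation between \(X_5\) and \(X_2\) is not.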

Visualizing conditional distributions

Graphical representation of \(\{X_2,X_4,X_5\}|\{X_1,X_3\}\)

Word of the day!

Use rules of Bayes-ball to determine conditional associations

  • Send imaginary ball through DAG (doesn't need to follow direction of arrows)

  • Ball may pass through each node (either circle if unobserved random variable or square if observed random variable) according to specific rules (next slide)

  • (Conditional) independence between two nodes holds if and only if ball is blocked along every path between them

Use rules of Bayes-ball to determine conditional associations

Rule 1. Chain \(X_1\to X_2\to X_3\), \(X_2\) unobserved: ball is not blocked by \(X_2\), i.e. \(X_1\not\perp X_3\)

Rule 2. Chain \(X_1\to X_2\to X_3\), \(X_2\) observed: ball is blocked by \(X_2\), i.e. \(X_1\perp X_3|X_2\)

Use rules of Bayes-ball to determine conditional associations

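Rule 3. Fork \(X_1\leftarrow X_2\to X_3\), \(X_2\) unobserved: ball is not blocked by \(X_2\), i.e. \(X_1\not\perp X_3\)

Rule 4. Fork \(X_1\leftarrow X_2\to X_3\), \(X_2\) observed: ball is blocked by \(X_2\), i.e. \(X_1\perp X_3|X_2\)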

Use rules of Bayes-ball to determine conditional associations

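Rule 5. Collider \(X_1\to X_2\leftarrow X_3\), \(X_2\) unobserved: ball is blocked by \(X_2\), i.e. \(X_1\perp X_3\)

Rule 6. Collider \(X_1\to X_2\leftarrow X_3\), \(X_2\) observed: ball is not blocked by \(X_2\), i.e. \(X_1\not\perp X_3|X_2\)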

Example, revisited

  • Rule 2 blocks ball from leaving \(X_2\) through \(X_3\) to \(X_4\)

  • Rule 2 blocks ball from leaving \(X_2\) through \(X_3\) to \(X_5\)

  • Rule 6 allows ball to leave \(X_2\) through \(X_3\) to \(X_1\), but rule 4 blocks ball from leaving \(X_1\) to \(X_5\)

Example, revisited

  • Rule 4 blocks ball from leaving \(X_4\) through \(X_3\) toward \(X_5\)

  • Rule 2 blocks ball from leaving \(X_4\) through \(X_3\) to \(X_1\)

Example, revisited

So \(X_2\perp \{X_4,X_5\}|\{X_1,X_3\}\); \(X_4\perp X_5|\{X_1,X_3\}\)
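These conclusions can also be checked mechanically. The sketch below does not implement Bayes-ball itself. It uses the moralized ancestral graph criterion, which gives the same answers for a DAG: keep only ancestors of the nodes involved, connect co-parents, drop arrow directions, delete the conditioning nodes, and test whether the two sets are still connected. The function names `ancestors` and `d_separated` are illustrative.

```r
# Parent sets of the example DAG (node i has parents parents[[i]]).
parents <- list(integer(0), integer(0), c(1, 2), 3, c(1, 3))

# All ancestors of `nodes`, including the nodes themselves.
ancestors <- function(nodes, parents) {
  found <- nodes
  repeat {
    new <- setdiff(unlist(parents[found]), found)
    if (length(new) == 0) return(sort(found))
    found <- c(found, new)
  }
}

# TRUE if every node in x is d-separated from every node in y given z.
d_separated <- function(x, y, z, parents) {
  keep <- ancestors(c(x, y, z), parents)
  k <- length(parents)
  adj <- matrix(FALSE, k, k)
  for (child in keep) {
    pa <- parents[[child]]
    adj[pa, child] <- adj[child, pa] <- TRUE   # drop arrow directions
    adj[pa, pa] <- TRUE                        # connect co-parents
  }
  adj[z, ] <- adj[, z] <- FALSE                # delete conditioning nodes
  reached <- x
  repeat {                                     # graph search starting from x
    nxt <- setdiff(which(colSums(adj[reached, , drop = FALSE]) > 0), reached)
    if (length(nxt) == 0) break
    reached <- c(reached, nxt)
  }
  !any(y %in% reached)
}

d_separated(2, c(4, 5), c(1, 3), parents)   # TRUE
d_separated(4, 5, c(1, 3), parents)         # TRUE
d_separated(1, 2, 3, parents)               # FALSE: conditioning on collider X3
```

The first two calls reproduce the independence statements above. The third illustrates Rule 6: \(X_1\) and \(X_2\) are marginally independent but become dependent once \(X_3\) is observed.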

Visualization of surrogate relationships via DAGs

  • Random variables:
    • Disease (\(D\))
    • Treatment (\(T\))
    • Surrogate outcome (\(S\))
    • Clinical outcome (\(O\))
  • Condition on knowledge at time of enrollment (just \(D\))
  • Main question is: how do \(O\) and \(T\) relate after \(S\) is observed?

  • Prentice criteria satisfied if (i) \(O\perp T|S,D\) (ii) \(O \not\perp S|D\)

Ideal Surrogate

  • Treatment and disease both affect clinical outcome only through surrogate
  • Both criteria satisfied

Second-path Surrogate

  • Both criteria satisfied (but treatment may not be very effective)

Missing-path Surrogate

  • Treating surrogate only
  • Only first criterion satisfied

Off-target

  • Treatment changes outcome apart from surrogate
  • Only second criterion satisfied
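The contrast between an ideal and an off-target surrogate can be illustrated by simulation. The sketch below assumes linear Gaussian models, a randomized binary treatment, and the structures \(T\to S\to O\) (ideal) versus \(T\to S\to O\) plus a direct arrow \(T\to O\) (off-target). Conditioning on \(D\) is omitted for brevity, and all coefficients are illustrative.

```r
set.seed(2)
n <- 1e5
treat <- rbinom(n, 1, 0.5)               # randomized treatment T
surrogate <- 1.0 * treat + rnorm(n)      # surrogate S responds to treatment

# Ideal: treatment affects outcome O only through S.
outcome_ideal <- 0.8 * surrogate + rnorm(n)
# Off-target: treatment also affects O through a path that bypasses S.
outcome_off <- 0.8 * surrogate - 0.5 * treat + rnorm(n)

# Coefficient of T after adjusting for S estimates the direct effect;
# criterion (i) requires it to be zero.
direct_ideal <- unname(coef(lm(outcome_ideal ~ surrogate + treat))["treat"])
direct_off   <- unname(coef(lm(outcome_off ~ surrogate + treat))["treat"])
round(c(ideal = direct_ideal, off_target = direct_off), 2)
```

In the ideal case the adjusted treatment coefficient is near zero, so \(O\perp T|S\) is plausible. In the off-target case it stays near the assumed direct effect of \(-0.5\), so criterion (i) fails even though \(S\) still predicts \(O\).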

Unmeasured Confounders

  • Unmeasured confounders create path between treatment and clinical outcome

Good idea / bad idea

Normal Hematocrit Trial (NHT) (Besarab et al., 1998)

  • "The Effects of Normal as Compared with Low Hematocrit Values in Patients with Cardiac Disease Who Are Receiving Hemodialysis and Epoetin" (Besarab et al., 1998)

  • Many patients with end-stage renal disease have anemia, i.e. low hematocrit (% of blood volume made up of red blood cells), and receive treatment (Epogen) to address this

  • Study hypothesizes that this could help cardiac disease patients undergoing hemodialysis

NHT

  • 1233 patients enrolled over 2.5 years, randomized to treatment regime that uses Epogen to maintain hematocrit at lower-than-normal level or increase it to normal levels

  • Hematocrit used as surrogate for health, but this was not a true surrogate-outcome trial: primary outcome was MI or death.

NHT

  • Achieved targeted hematocrit levels in both arms, but normalized-hematocrit arm experienced higher rate of MI or death, which was not expected

  • Study stopped for safety

  • Post-hoc analysis of MI or death against average hematocrit levels (instead of randomized treatment arm) suggested opposite: better prognosis as hematocrit increased

NHT

  • "The higher hematocrit values themselves do not appear to account for the disparate outcomes" (p589).

  • Subsequent studies (Goodkin, 2009; Coyne, 2012) further investigated trial outcomes.

  • Conclusion is that there is likely off-target effect (negatively) affecting survival not captured by hematocrit alone

Plausible DAG from NHT

Nocturnal Oxygen Therapy Trial (NOTT) (Nocturnal Oxygen Therapy Trial Group and others, 1980)

  • 203 patients with COPD randomized to 12 hours of oxygen at night or continuous oxygen

  • Several primary outcomes, all surrogate outcomes: forced expiratory volume (FEV), forced vital capacity (FVC), functional residual capacity (FRC), and quality of life

  • Mortality was secondary outcome

NOTT (Nocturnal Oxygen Therapy Trial Group and others, 1980)

  • No significant differences between groups in any surrogate outcomes

  • Mortality rate in nocturnal oxygen group was nearly double that in continuous oxygen group

Plausible DAG from NOTT

Alternatives to Prentice's criteria

  • Acknowledging that ideal surrogate will almost never exist, less rigorous identification process of surrogate endpoints based on two weaker criteria (Burzykowski, Molenberghs and Buyse, 2005):

    1. Assessment of correlation between proposed surrogate and clinical outcome ('individual-level surrogacy').
    2. Assessment of correlation between effect of treatment on surrogate and effect of treatment on clinical outcome ('trial-level surrogacy')
  • Relaxed analogs of Prentice criteria:

    1. individual-level surrogacy :: \(O \not\perp S|D\)
    2. trial-level surrogacy :: \(O\perp T|S,D\)

Pathological complete response (pCR) as surrogate outcome in BrCA (Cortazar et al., 2014)

  • Neoadjuvant (pre-surgical) chemotherapy. After surgery, resected tissue is inspected for evidence of cancer.

  • pCR is absence of cancer in breast tissue and lymph nodes. Relatively quickly observed

  • Can pCR be used as surrogate outcome?

Figure 2 (Cortazar et al., 2014)

Figure 6 (Cortazar et al., 2014)

pCR as surrogate outcome in BrCA (Cortazar et al., 2014)

  • Subtle interpretation: is pCR 'useful'?

    • Evidence that pCR predicts long-term survival (individual-level surrogacy)

    • Less evidence that treatment effect on pCR correlates with treatment effect on long-term survival (trial-level surrogacy)

Word of the day!

Discussion points

  • Prentice criteria are strict

  • Identifying and validating true surrogate outcome using this definition is a Catch-22. Why?

Discussion points

  • Many cautionary tales regarding surrogate outcomes:

    • Retrospectively easy to see why proposed surrogate failed

    • But difficult and expensive to prospectively show that surrogate outcome works

Discussion points

  • Validating surrogate assumes that it will be suitable for future therapies

  • This violates purest definition of surrogate outcome: it must be treatment-specific

  • May be tenuous assumption. Treatment mechanisms change

See also

  • Buyse et al. (2010)
  • Cook and DeMets (2007)
  • Fleming and Powers (2012)

References

Besarab, A., Bolton, W.K., Browne, J.K., Egrie, J.C., Nissenson, A.R., Okamoto, D.M., et al. (1998) The effects of normal as compared with low hematocrit values in patients with cardiac disease who are receiving hemodialysis and epoetin. New England Journal of Medicine, 339, 584–590.

Burzykowski, T., Molenberghs, G. and Buyse, M. (2005) The Evaluation of Surrogate Endpoints. Springer.

Buyse, M., Sargent, D.J., Grothey, A., Matheson, A. and De Gramont, A. (2010) Biomarkers and surrogate end points: the challenge of statistical validation. Nature Reviews Clinical Oncology, 7, 309–317.

Cook, T.D. and DeMets, D.L. (2007) Introduction to Statistical Methods for Clinical Trials. CRC Press.

Cortazar, P., Zhang, L., Untch, M., Mehta, K., Costantino, J.P., Wolmark, N., et al. (2014) Pathological complete response and long-term clinical benefit in breast cancer: The CTNeoBC pooled analysis. The Lancet, 384, 164–172.

Coyne, D.W. (2012) The health-related quality of life was not improved by targeting higher hemoglobin in the normal hematocrit trial. Kidney International, 82, 235–241.

Fleming, T.R. and DeMets, D.L. (1996) Surrogate end points in clinical trials: Are we being misled? Annals of Internal Medicine, 125, 605–613.

Fleming, T.R. and Powers, J.H. (2012) Biomarkers and surrogate endpoints in clinical trials. Statistics in Medicine, 31, 2973–2984.

Goodkin, D.A. (2009) The normal hematocrit cardiac trial revisited. Seminars in Dialysis, pp. 495–502. Wiley Online Library.

Nocturnal Oxygen Therapy Trial Group and others. (1980) Continuous or nocturnal oxygen therapy in hypoxemic chronic obstructive lung disease: A clinical trial. Annals of Internal Medicine, 93, 391–398.

Prentice, R.L. (1989) Surrogate endpoints in clinical trials: Definition and operational criteria. Statistics in Medicine, 8, 431–440.