Townsley, et al., 2017

  • Acquired aplastic anemia: deficiency of blood cells caused by damaged bone marrow

  • "Investigator-initiated, nonrandomized, historically controlled, phase 1–2 study" (Townsley et al., 2017)

    • Investigator-initiated means that an academic institution is the main sponsor of the study
    • Phase 1–2 indicates that both safety and efficacy are of interest, but this is not a dose-escalation study
  • Three cohorts of different dosing schedules of eltrombopag added to immunosuppression therapy

Townsley, et al., 2017

  • Primary efficacy endpoint was Complete Response, defined by achieving certain minimum counts of neutrophils, hemoglobin, and platelets

  • Unclear rationale for making safety primary objective:

    • Not powered for safety outcomes
    • Non-specific safety endpoints

Townsley, et al., 2017

  • Original protocol proposed Simon two-stage minimax with \({p_0} = 0.10\), \({p_1} = 0.30\), \(\alpha = 0.05\), and \(\beta = 0.20\). Arrived at \(n=25\) required patients:
library(clinfun);
#arguments are ph2simon(pu = p0, pa = p1, ep1 = alpha, ep2 = beta)
ph2simon(0.10, 0.30, 0.05, 1 - 0.80);
## 
##  Simon 2-stage Phase II design 
## 
## Unacceptable response rate:  0.1 
## Desirable response rate:  0.3 
## Error rates: alpha =  0.05 ; beta =  0.2 
## 
##         r1 n1 r  n EN(p0) PET(p0)
## Optimal  1 10 5 29  15.01  0.7361
## Minimax  1 15 5 25  19.51  0.5490
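
The operating characteristics in the Minimax row can be checked directly with base R binomial functions. The sketch below is illustrative only: `simon_oc` is a hypothetical helper (not part of `clinfun`), and it assumes the usual Simon rule of stopping when stage-1 responses are at most `r1` and declaring the treatment promising when total responses exceed `r`.

```r
# Hypothetical helper (not part of clinfun): operating characteristics of a
# Simon two-stage design at a given true response rate p.
simon_oc <- function(r1, n1, r, n, p) {
  pet <- pbinom(r1, n1, p)   # probability of early termination after stage 1
  x1 <- (r1 + 1):n1          # stage-1 response counts that continue to stage 2
  # P(X1 = x1) * P(stage 2 brings the total above r), summed over x1
  reject <- sum(dbinom(x1, n1, p) *
                  pbinom(r - x1, n - n1, p, lower.tail = FALSE))
  c(PET = pet, EN = n1 + (1 - pet) * (n - n1), reject = reject)
}

oc_null <- simon_oc(r1 = 1, n1 = 15, r = 5, n = 25, p = 0.10)
oc_alt  <- simon_oc(r1 = 1, n1 = 15, r = 5, n = 25, p = 0.30)
round(oc_null, 4)  # 'reject' here is the type I error
round(oc_alt, 4)   # 'reject' here is the power
```

Under \(p_0\), `PET` and `EN` should agree with the `PET(p0)` and `EN(p0)` columns printed by `ph2simon`.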

Townsley, et al., 2017

  • Amendment B increased planned accrual to 31 patients to ensure at least 25 evaluable patients. Reverse-engineered corresponding two-stage design by increasing power to \(1-\beta = 0.865\)
ph2simon(0.10, 0.30, 0.05, 1 - 0.865);
## 
##  Simon 2-stage Phase II design 
## 
## Unacceptable response rate:  0.1 
## Desirable response rate:  0.3 
## Error rates: alpha =  0.05 ; beta =  0.135 
## 
##         r1 n1 r  n EN(p0) PET(p0)
## Optimal  1 12 6 34  19.50  0.6590
## Minimax  2 24 6 31  27.05  0.5643
  • Post-hoc justification not really necessary

Townsley, et al., 2017

  • Protocol originally planned for one cohort.

  • Amendment H added second cohort of 31 patients on reduced schedule

  • Amendment P increased planned enrollment to 33 patients to ensure that all screened patients can enroll

  • Amendment Q added third cohort on expanded schedule

  • Final enrollment was 30, 31, and 31 patients

Townsley, et al., 2017

  • A few typos here, e.g. upper bound of 95% CI for CR rate at 6 months in cohort 1 is less than point estimate; table caption states null hypothesis is 30% CR (should be 10%)

  • Upper bound of 95% CI for overall response (OR) rate at 6 months in cohort 3 exceeds 100% (!)

Townsley, et al., 2017

The response probabilities, including complete response probability and partial response probability, will be estimated using the sample proportions, and their inferences, including confidence intervals and hypotheses testing, will be evaluated using Binomial distributions. (Protocol Section 9.2, First paragraph)
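
For reference, a minimal base R sketch of what "evaluated using Binomial distributions" could look like. The counts (10 of 30 in cohort 1, 8 of 31 in cohort 2) are the complete response counts used elsewhere in these slides; treating each cohort as a single-stage binomial sample is an assumption made here for illustration.

```r
# Exact (Clopper-Pearson) 95% CI for cohort 1: 10 complete responses in 30
ci1 <- binom.test(x = 10, n = 30)$conf.int
round(ci1, 3)

# Exact one-sided test of H0: p <= 0.10 for cohort 2: 8 responses in 31
# (single-stage calculation; it ignores any two-stage design)
p2 <- binom.test(x = 8, n = 31, p = 0.10, alternative = "greater")$p.value
p2
```

`binom.test` is part of the `stats` package, so no extra packages are needed.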

Townsley, et al., 2017

require(binom);
R = 10; n = 30;#cohort 1
binom.confint(R, n);#nothing matches
##           method  x  n  mean lower upper
## 1  agresti-coull 10 30 0.333 0.191 0.513
## 2     asymptotic 10 30 0.333 0.165 0.502
## 3          bayes 10 30 0.339 0.180 0.504
## 4        cloglog 10 30 0.333 0.175 0.500
## 5          exact 10 30 0.333 0.173 0.528
## 6          logit 10 30 0.333 0.190 0.516
## 7         probit 10 30 0.333 0.185 0.513
## 8        profile 10 30 0.333 0.183 0.511
## 9            lrt 10 30 0.333 0.183 0.511
## 10     prop.test 10 30 0.333 0.179 0.529
## 11        wilson 10 30 0.333 0.192 0.512
#assuming reported upper bound of 31 is a typo for 51,
#they used a t-based interval instead (counter to protocol)
#elements are: point estimate, lower bound, upper bound
(R/n) + qt(c(0.5,0.025,0.975),df=n-1)*sqrt((R/n)*(1-R/n)/(n-1));
## [1] 0.333 0.154 0.512
R = 8; n = 31;#cohort 2
binom.confint(R, n);#nothing matches
##           method x  n  mean lower upper
## 1  agresti-coull 8 31 0.258 0.135 0.435
## 2     asymptotic 8 31 0.258 0.104 0.412
## 3          bayes 8 31 0.266 0.122 0.418
## 4        cloglog 8 31 0.258 0.122 0.418
## 5          exact 8 31 0.258 0.119 0.446
## 6          logit 8 31 0.258 0.135 0.437
## 7         probit 8 31 0.258 0.130 0.431
## 8        profile 8 31 0.258 0.127 0.427
## 9            lrt 8 31 0.258 0.127 0.427
## 10     prop.test 8 31 0.258 0.125 0.449
## 11        wilson 8 31 0.258 0.137 0.432
(R/n) + qt(c(0.5,0.025,0.975),df=n-1)*sqrt((R/n)*(1-R/n)/(n-1));
## [1] 0.2581 0.0949 0.4212
#two-sided t-based p-value for p0 = 0.10 matches table to 2 sig dig
2*pt(((R/n) - 0.10)/sqrt((R/n)*(1-R/n)/(n-1)), df = n-1, lower.tail = FALSE);
## [1] 0.0571

Townsley, et al., 2017

  • Reported p-values do not account for design (they should)
  • Also, lack of consistency in use of one- versus two-sided: Simon's design is inherently one-sided, but they report two-sided p-values
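#Minimax design from Amendment B; R = observed responses in cohort 2
r1 = 2; n1 = 24; r = 6; n = 31; p0 = 0.10; R = 8;
#One-sided p-value: sum over stage-1 counts x1 > r1 (trial continues)
#of P(X1 = x1) * P(X2 >= R - x1), with X2 ~ Binomial(n - n1, p0)
sum(pbinom(R-1-((r1+1):n1), n - n1, p0, lower.tail = FALSE) *
                dbinom((r1+1):n1, n1, p0));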
## [1] 0.00959

Townsley, et al., 2017

  • Some unavoidable messiness
  • Also some avoidable messiness and protocol deviations
  • Unclear which dose schedule was carried forward for phase 3 study (probably schedule 3)
  • Does not mean trial was bad / wrong
  • Large randomized, placebo-controlled study currently enrolling patients.

    • Aim is to improve 3 month CR rate
    • Usually scientific aim in comparative trial is "harder", e.g. survival

Boonstra, et al., 2017

  • "open-label, single-center phase II study of ixazomib in patients with relapsed or refractory cutaneous and PTCLs" (Boonstra et al., 2017)

  • Study of proteasome inhibitor

  • Heterogeneous patient population

Boonstra, et al., 2017

  • Ixazomib targets genomic pathway (NF-κB → GATA3 → cancer cell survival)

    • Approved as second-line therapy for multiple myeloma
  • Proteasome is protein recycler. Ixazomib is proteasome inhibitor. Idea is to take away ingredient key for cancer cell survival

  • All -mib drugs are proteasome inhibitors

Boonstra, et al., 2017

  • In vitro studies demonstrated mechanism of action in CTCL and PTCL cell lines

  • Primary endpoint was best response in 6 months

  • The posited 'null' best objective response rate (ORR) was 30%. Given the previously reported ORR with bortezomib the trial was powered to detect an improved ORR of 60% with probability .90, based upon a two-sided type I error equal to .10.

  • In Simon design terms: \({p_0} = 0.30\), \({p_1} = 0.60\), one-sided \(\alpha = 0.05\) (equivalent to the two-sided .10 above), and \(\beta = 0.10\)

Boonstra, et al., 2017

library(ph2mult);
binom.design(type = "admissible", p0 = 0.30, p1 = 0.60, 
             signif.level = 0.05, power.level = 0.9, plot.out = TRUE);

##            r1 n1  r  n EN.p0. PET.p0.  error power
## Optimal     3 10 12 28   16.3   0.650 0.0419 0.912
## Admissible  3 11 11 25   17.0   0.570 0.0418 0.909
## Minimax     7 18 10 23   18.7   0.859 0.0499 0.905
  • \(\{r_1,n_1,r,n\} = \{3,11,11,25\}\)

Boonstra, et al., 2017

  • Ended up with 12 evaluable patients at first stage

  • Per protocol, 2 patients who withdrew prior to first response assessment (1 month) were replaced
  • But also per protocol, all patients who received at least one dose were evaluable (12/13 received one dose or more)

Boonstra, et al., 2017

Figure 2

Boonstra, et al., 2017

  • 1/12 patients had CR/PR by 6 months

  • Stopped at interim futility analysis

  • UMCC will soon be opening dose-escalation + efficacy trial of Ixazomib plus Romidepsin in only PTCL patients
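
To put the interim result in context, a short sketch of the relevant binomial probabilities. The stage-1 rule (stop if 3 or fewer responses among 11) comes from the admissible design above; applying a plain binomial calculation to the 12 evaluable patients is a simplifying assumption, not the trial's actual analysis.

```r
# Probability of early termination (<= 3 responses of 11) under the
# null (0.30) and alternative (0.60) response rates
pet <- setNames(pbinom(3, size = 11, prob = c(0.30, 0.60)), c("p0", "p1"))
round(pet, 3)

# Assumption: treat the 12 evaluable patients as one binomial sample.
# Probability of 1 or fewer responses if the true ORR were 0.60
p_low <- pbinom(1, size = 12, prob = 0.60)
p_low
```

A very small `p_low` means the observed 1/12 would be highly unusual if the targeted ORR of 60% were true.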

Summary of where we are at

  • Simon-like designs are popular as single-arm phase II designs, particularly in oncology and AIDS trials

  • Useful starting point for sample size but usually not sensible to rely on optimality criteria

More on Bayes

Frequentist

  • \(\gamma\) treated as fixed, unknown

  • Point estimation may be likelihood-based, i.e. MLE, but hypothesis testing is frequency-based: what is distribution of data under many replications of data-generating mechanism?

  • \(p\)-value reflects probability of data given design

Bayes

  • \(\gamma\) treated as random

  • Posterior: \(\pi(\gamma|R) \propto L(R|\gamma) \pi(\gamma)\)

  • What is distribution of \(\gamma\) given data?

  • Ignores data-generating mechanism

Likelihood of phase 2 data

  • Let \(R_k\) denote number of responses among first \(k\) patients. Then, \[L(R_k|\gamma) = \gamma^{R_k}(1-\gamma)^{k - R_k}\]

  • What to use for \(\pi(\gamma)\)?

Beta distribution

  • Density \(f(x)\propto x^{a_1-1}(1-x)^{a_2-1}\)

  • Mean: \(E(X) = a_1/(a_1+a_2)\)

  • Variance: \(V(X) = a_1a_2/([a_1+a_2]^2[a_1+a_2+1]) = E(X)E(1-X)/(a_1+a_2+1)\)
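
A quick numerical check of the mean and variance formulas, as a sketch. `beta_moments` is a hypothetical helper name, and Beta(2,4) is just an example choice.

```r
# Hypothetical helper: closed-form mean and variance of Beta(a1, a2)
beta_moments <- function(a1, a2) {
  m <- a1 / (a1 + a2)
  v <- m * (1 - m) / (a1 + a2 + 1)   # E(X) E(1 - X) / (a1 + a2 + 1)
  c(mean = m, var = v)
}

mom <- beta_moments(2, 4)
mom

# Monte Carlo comparison using random Beta(2, 4) draws
set.seed(1)
x <- rbeta(1e5, 2, 4)
c(mean = mean(x), var = var(x))
```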

Beta distribution

  • Prior that yields posterior in same distributional family is called conjugate

  • Beta distribution is conjugate to Binomial likelihood: \[ \begin{aligned} \pi(\gamma|R_k)&\propto L(R_k|\gamma) \pi(\gamma)\\ &\propto\gamma^{R_k}(1-\gamma)^{k - R_k} \gamma^{a_1-1}(1-\gamma)^{a_2-1}\\ &=\gamma^{a_1+R_k-1}(1-\gamma)^{a_2+k - R_k-1} \end{aligned} \]

Clinical interpretation of Beta prior

  • A priori, \(\gamma\sim \text{Beta}(a_1,a_2)\), where \(a_1+a_2\) represents number of historical patients' worth of data, with \(a_1\) of them being responders

  • Incorporate information on \(R_k\) responders out of \(k\) patients and, a posteriori, \(\gamma \sim \text{Beta}(a_1+R_k,a_2+k-R_k)\)
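
The update rule is simple enough to write as a one-line function. In this sketch, `update_beta` is a hypothetical helper name and the data (4 responders among 10 patients, Beta(1,2) prior) are made up for illustration.

```r
# Hypothetical helper: conjugate Beta-Binomial update
update_beta <- function(a1, a2, R_k, k) {
  stopifnot(R_k >= 0, R_k <= k)
  c(a1 = a1 + R_k, a2 = a2 + k - R_k)
}

# Illustrative data: Beta(1, 2) prior, then 4 responders among 10 patients
post <- update_beta(a1 = 1, a2 = 2, R_k = 4, k = 10)
post

post[["a1"]] / sum(post)                            # posterior mean
qbeta(c(0.025, 0.975), post[["a1"]], post[["a2"]])  # 95% credible interval
```

The posterior behaves as if \(a_1+a_2+k\) patients had been observed in total, with \(a_1+R_k\) responders.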

Beta(1,2) distribution

Beta(2,4) distribution

Beta(5,10) distribution

Beta(15,30) distribution
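
The four priors named in the slide titles above share the same mean but differ in how many "prior patients" they represent. A sketch that tabulates this (column names are illustrative):

```r
# The four Beta priors from the preceding slide titles
priors <- data.frame(a1 = c(1, 2, 5, 15), a2 = c(2, 4, 10, 30))
priors$n_prior <- priors$a1 + priors$a2          # prior "sample size"
priors$mean <- priors$a1 / priors$n_prior
priors$sd <- sqrt(priors$mean * (1 - priors$mean) / (priors$n_prior + 1))
priors$lower <- qbeta(0.025, priors$a1, priors$a2)  # central 95% interval
priors$upper <- qbeta(0.975, priors$a1, priors$a2)
round(priors, 3)
```

Each density can be drawn with, e.g., `curve(dbeta(x, 5, 10), from = 0, to = 1)`.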

One-arm Bayesian Phase 2 design

  • Enroll patients until stopping rule is satisfied. After patient \(k\), calculate \[\Pr(\gamma \geq p_0 + \delta|R_k).\] Stop trial if this falls below \(\pi_L\) (futility) or above \(\pi_U\) (efficacy)

  • \(\delta\) fixed. Key idea is that \(p_0\) can have distribution to reflect uncertainty about historical control

  • "What is posterior probability that response rate (\(\gamma\)) exceeds historical response rate (\(p_0\)) by at least \(\delta\)?"

  • Detailed in (Thall and Simon, 1994)
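
A minimal sketch of how the posterior probability could be computed when \(p_0\) has its own Beta distribution. Assumptions made here, not taken from the paper: \(\gamma\) and \(p_0\) are independent, \(p_0\sim\text{Beta}(b_1,b_2)\) (the names `b1`, `b2` and `post_prob` are hypothetical), and the average over \(p_0\) is approximated on an equal-probability grid of quantiles. The numbers in the example call are made up.

```r
# Pr(gamma >= p0 + delta | R_k), with
#   gamma | R_k ~ Beta(a1 + R_k, a2 + k - R_k)   (conjugate update)
#   p0          ~ Beta(b1, b2)                   (no new data on p0)
post_prob <- function(R_k, k, a1, a2, b1, b2, delta, n_grid = 2000) {
  p0_grid <- qbeta(ppoints(n_grid), b1, b2)  # equal-probability grid for p0
  # For each p0 value, posterior tail probability of gamma; then average
  mean(pbeta(p0_grid + delta, a1 + R_k, a2 + k - R_k, lower.tail = FALSE))
}

# Illustrative values only: 8 responders among 15 patients
pp <- post_prob(R_k = 8, k = 15, a1 = 0.6, a2 = 1.4,
                b1 = 30, b2 = 70, delta = 0.15)
pp
```

Comparing `pp` with \(\pi_L\) and \(\pi_U\) gives the stop or continue decision after patient \(k\).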

Decision 1: prior on \(\gamma\)

  • Size of \(a_1+a_2\) is important. Suppose \(k=10\) and \(R_k=9\).

  • If \(\gamma\sim\text{Beta}(a_1=0.5,a_2=4.5)\), then \(\gamma|R_k\sim\text{Beta}(9.5,5.5)\), and \(\Pr(\gamma \geq 0.3|R_k)\) is

#shape1 = a1 + R_k = 9.5, shape2 = a2 + k - R_k = 5.5
pbeta(q = 0.3, shape1 = 9.5, shape2 = 5.5, lower.tail = FALSE);
## [1] 0.996

Decision 1: prior on \(\gamma\)

  • If \(\gamma\sim\text{Beta}(a_1=5,a_2=45)\), then \(\gamma|R_k\sim\text{Beta}(14,46)\), and \(\Pr(\gamma \geq 0.3|R_k)\) is
pbeta(q = 0.3, shape1 = 14, shape2 = 46, lower.tail = FALSE);
## [1] 0.114

Decision 1: prior on \(\gamma\)

  • Same data, different outcomes
  • Do not want prior to be so informative that it overwhelms data, e.g. restrict to \(a_1+a_2\leq10\)

Decision 2: prior on \(p_0\)

What is \(p_0\)? Control to be used in phase 3? How "historical" is it? Does it reflect same clinical population?

  • Suggests using prior on \(p_0\)

  • No additional data collected, so prior should be informative

Decision 3: choice of \(\delta\), \(\pi_L\), \(\pi_U\)

Tuning parameters selected based on clinical reasoning and frequentist operating characteristics (type I error under null; power under specified alternative). Some typical choices

  • \(\delta\in(0.05,0.15)\)
  • \(\pi_L<0.10\)
  • \(\pi_U>0.90\)

Ex: Proposed Design for Newly Diagnosed Metastatic Prostate Cancer

  • Treatment is surgical procedure, to decrease burden of circulating tumor cells (CTCs), followed by standard therapy

  • "Response" means achieving \(<5 \text{ CTCs} / 7.5 \text{ cc}\) post-op

  • Historical comparison is no surgery (just standard therapy)

Example

  • \(\gamma\sim\text{Beta}(2.8,1.2)\) gives probability of response post-op (\(E[\gamma] = 2.8/4 = 0.7\))

  • \(p_0\sim\text{Beta}(120,180)\) gives probability of response on standard therapy alone (\(E[p_0] = 120/300 = 0.4\))

  • Stop trial if \(\Pr(\gamma\geq p_0+0.3|R_k)<\pi_L\equiv 0.02\) (Max of 36 patients)

Example

Simulation 1: \(\gamma = 0.7\) (in truth)

Simulation 2: \(\gamma = 0.55\) (in truth)

Simulation 3: \(\gamma = 0.4\) (in truth)

Example

  • Was also important to ensure low surgery-related toxicity: additional stopping rule implemented if too much toxicity is observed

  • \(\gamma^T\sim\text{Beta}(0.4,3.6)\) gives probability of Grade 3 surgical toxicity.

  • Stop trial if \(\Pr(\gamma^T<0.10|R_k^T)<0.10\).
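
The toxicity rule can be turned into a table of stopping boundaries. This is a sketch under assumptions: \(R_k^T\) is taken to be the number of patients with Grade 3 surgical toxicity among the first \(k\), the same maximum of 36 patients is assumed, and `tox_stop` / `tox_bound` are hypothetical names.

```r
a1_tox <- 0.4
a2_tox <- 3.6

# TRUE if Pr(gamma_T < 0.10 | t_k toxicities among k patients) < 0.10
tox_stop <- function(t_k, k) {
  pbeta(0.10, a1_tox + t_k, a2_tox + k - t_k) < 0.10
}

# Smallest number of toxicities that triggers stopping after k patients
# (NA if no count from 0 to k triggers stopping)
tox_bound <- sapply(1:36, function(k) {
  hits <- which(sapply(0:k, tox_stop, k = k))
  if (length(hits) == 0) NA else hits[1] - 1
})
data.frame(k = 1:36, stop_if_toxicities_at_least = tox_bound)
```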

Example

  • Like mTPI, decisions based upon posterior probabilities can be translated into prespecified cutpoints after patient \(k\):

    • \(R_k < \ell_k \Rightarrow\Pr(\gamma \geq p_0 + \delta|R_k) < \pi_L \Rightarrow\) stop trial for futility

    • \(R_k > u_k \Rightarrow\Pr(\gamma \geq p_0 + \delta|R_k) > \pi_U \Rightarrow\) stop trial for efficacy
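
For the prostate cancer example (futility rule only), the cutpoints \(\ell_k\) could be tabulated as sketched below. Priors, \(\delta = 0.3\), \(\pi_L = 0.02\), and the maximum of 36 patients come from the example; independence of \(\gamma\) and \(p_0\), the quantile-grid approximation, and the names `post_prob` and `ell` are assumptions of this sketch.

```r
k_max <- 36
pi_L  <- 0.02
delta <- 0.3

# Equal-probability grid approximating p0 ~ Beta(120, 180)
p0_grid <- qbeta(ppoints(2000), 120, 180)

# Pr(gamma >= p0 + delta | R_k) with gamma ~ Beta(2.8, 1.2) a priori
post_prob <- function(R_k, k) {
  mean(pbeta(p0_grid + delta, 2.8 + R_k, 1.2 + k - R_k, lower.tail = FALSE))
}

# ell[k]: smallest R_k that does NOT trigger stopping; stop if R_k < ell[k]
ell <- sapply(1:k_max, function(k) {
  pr <- sapply(0:k, post_prob, k = k)
  min(which(pr >= pi_L)) - 1
})
data.frame(k = 1:k_max, stop_if_responses_below = ell)
```

Because the posterior probability increases with \(R_k\) for fixed \(k\), a single cutpoint per \(k\) fully describes the rule.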

Example: Operating Characteristics

Example

  • Design is Bayesian, but assessment is still frequentist:

    1. Fix \(\gamma\) at different true values (uninteresting, interesting)
    2. Simulate trial and tune stopping rules, maximum sample size so that type I error and power are maintained under stopping rules
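
A sketch of step 2 for the prostate cancer example, using only its futility rule. Priors, \(\delta\), \(\pi_L\), the maximum sample size, and the three true values of \(\gamma\) come from the example slides; the independence assumption, the grid approximation, the number of simulated trials, and all function names are choices made here for illustration.

```r
set.seed(2017)
k_max <- 36; pi_L <- 0.02; delta <- 0.3

# Futility cutpoints: stop after patient k if R_k < ell[k]
p0_grid <- qbeta(ppoints(2000), 120, 180)   # grid for p0 ~ Beta(120, 180)
post_prob <- function(R_k, k) {
  mean(pbeta(p0_grid + delta, 2.8 + R_k, 1.2 + k - R_k, lower.tail = FALSE))
}
ell <- sapply(1:k_max, function(k) {
  min(which(sapply(0:k, post_prob, k = k) >= pi_L)) - 1
})

# Simulate trials at a fixed true response rate
simulate_design <- function(gamma_true, n_sim = 5000) {
  res <- replicate(n_sim, {
    R <- cumsum(rbinom(k_max, 1, gamma_true))  # running response count
    stop_at <- which(R < ell)                  # patients where rule fires
    if (length(stop_at) > 0) c(stopped = 1, n = stop_at[1])
    else c(stopped = 0, n = k_max)
  })
  c(gamma = gamma_true,
    pr_stop = mean(res["stopped", ]),  # probability of stopping for futility
    mean_n = mean(res["n", ]))         # expected sample size
}

oc <- t(sapply(c(0.7, 0.55, 0.4), simulate_design))
round(oc, 3)
```

If `pr_stop` is too high at the interesting value of \(\gamma\) or too low at the uninteresting one, \(\pi_L\), \(\delta\), or the maximum sample size would be retuned and the simulation rerun.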

References

Boonstra, P.S., Polk, A., Brown, N., Hristov, A.C., Bailey, N.G., Kaminski, M.S., et al. (2017) A single center phase II study of ixazomib in patients with relapsed or refractory cutaneous or peripheral T-cell lymphomas. American Journal of Hematology, 92, 1287–1294.

Thall, P.F. and Simon, R. (1994) Practical Bayesian guidelines for phase IIB clinical trials. Biometrics, 50, 337–349.

Townsley, D.M., Scheinberg, P., Winkler, T., Desmond, R., Dumitriu, B., Rios, O., et al. (2017) Eltrombopag added to standard immunosuppression for aplastic anemia. New England Journal of Medicine, 376, 1540–1550.