Bios619 Lecture 6: Phase 2 Two-stage Examples, Bayesian Phase 2

Townsley, et al., 2017

Acquired aplastic anemia: deficiency of blood cells caused by damaged bone marrow
"Investigator-initiated, nonrandomized, historically controlled, phase 1–2 study" (Townsley et al., 2017)
- Investigator-initiated means that an academic institution is the main sponsor of the study
- "Phase 1–2" indicates that both safety and efficacy are of interest, but this is not a dose-escalation study
Three cohorts of different dosing schedules of eltrombopag added to immunosuppression therapy

Townsley, et al., 2017

Primary efficacy endpoint was Complete Response, defined by achieving certain minimum counts of neutrophils, hemoglobin, and platelets
Unclear rationale for making safety primary objective:
- Not powered for safety outcomes
- Non-specific safety endpoints

Townsley, et al., 2017

Original protocol proposed Simon two-stage minimax with \({p_0} = 0.10\), \({p_1} = 0.30\), \(\alpha = 0.05\), and \(\beta = 0.20\). Arrived at \(n=25\) required patients:

require(clinfun);
ph2simon(0.10, 0.30, 0.05, 1 - 0.80);

## 
##  Simon 2-stage Phase II design 
## 
## Unacceptable response rate:  0.1 
## Desirable response rate:  0.3 
## Error rates: alpha =  0.05 ; beta =  0.2 
## 
##         r1 n1 r  n EN(p0) PET(p0)
## Optimal  1 10 5 29  15.01  0.7361
## Minimax  1 15 5 25  19.51  0.5490
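
The operating characteristics in this table can be checked directly from binomial probabilities. The following is a minimal sketch in base R, assuming the usual Simon convention (stop for futility after stage 1 if responses are at most r1; declare the treatment promising if total responses exceed r). The helper name `two_stage_oc` is hypothetical and is not part of clinfun.

```r
# Operating characteristics of a two-stage design {r1, n1, r, n} at true rate p.
# Assumed convention: stop after stage 1 if X1 <= r1; reject H0 if X1 + X2 > r.
two_stage_oc <- function(r1, n1, r, n, p) {
  pet <- pbinom(r1, n1, p)                 # probability of early termination
  x1  <- (r1 + 1):n1                       # stage-1 counts that continue to stage 2
  rej <- sum(dbinom(x1, n1, p) *
             pbinom(r - x1, n - n1, p, lower.tail = FALSE))
  c(PET = pet, EN = n1 + (1 - pet) * (n - n1), reject = rej)
}

# Minimax design from the original protocol
null_oc <- two_stage_oc(r1 = 1, n1 = 15, r = 5, n = 25, p = 0.10)  # reject = type I error
alt_oc  <- two_stage_oc(r1 = 1, n1 = 15, r = 5, n = 25, p = 0.30)  # reject = power
round(rbind(null = null_oc, alternative = alt_oc), 4)
```

The PET and EN columns in the `null` row should reproduce the Minimax row of the table; the `reject` column gives the attained type I error and power.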

Townsley, et al., 2017

Amendment B increased planned accrual to 31 patients to ensure at least 25 evaluable patients. Reverse-engineered corresponding two-stage design by increasing power to \(1-\beta = 0.865\)

ph2simon(0.10, 0.30, 0.05, 1 - 0.865);

## 
##  Simon 2-stage Phase II design 
## 
## Unacceptable response rate:  0.1 
## Desirable response rate:  0.3 
## Error rates: alpha =  0.05 ; beta =  0.135 
## 
##         r1 n1 r  n EN(p0) PET(p0)
## Optimal  1 12 6 34  19.50  0.6590
## Minimax  2 24 6 31  27.05  0.5643

Post-hoc justification not really necessary

Townsley, et al., 2017

Protocol originally planned for one cohort.
Amendment H added second cohort of 31 patients on reduced schedule
Amendment P increased planned enrollment to 33 patients to ensure that all screened patients can enroll
Amendment Q added third cohort on expanded schedule
Final enrollment was 30, 31, and 31 patients

Townsley, et al., 2017

A few typos in the reported results, e.g. upper bound of 95% CI for CR rate at 6 months in cohort 1 is less than point estimate; table caption states null hypothesis is 30% CR (should be 10%)
Upper bound of 95% CI for overall response (OR) rate at 6 months in cohort 3 exceeds 100% (!)

Townsley, et al., 2017

The response probabilities, including complete response probability and partial response probability, will be estimated using the sample proportions, and their inferences, including confidence intervals and hypotheses testing, will be evaluated using Binomial distributions. (Protocol Section 9.2, First paragraph)

Townsley, et al., 2017

require(binom);
R = 10; n = 30;#cohort 1
binom.confint(R, n);#nothing matches

##           method  x  n  mean lower upper
## 1  agresti-coull 10 30 0.333 0.191 0.513
## 2     asymptotic 10 30 0.333 0.165 0.502
## 3          bayes 10 30 0.339 0.180 0.504
## 4        cloglog 10 30 0.333 0.175 0.500
## 5          exact 10 30 0.333 0.173 0.528
## 6          logit 10 30 0.333 0.190 0.516
## 7         probit 10 30 0.333 0.185 0.513
## 8        profile 10 30 0.333 0.183 0.511
## 9            lrt 10 30 0.333 0.183 0.511
## 10     prop.test 10 30 0.333 0.179 0.529
## 11        wilson 10 30 0.333 0.192 0.512

#assuming reported upper bound of 31 is a typo for 51,
#they used a t-based interval instead (counter to protocol)
(R/n) + qt(c(0.5,0.025,0.975),df=n-1)*sqrt((R/n)*(1-R/n)/(n-1));

## [1] 0.333 0.154 0.512

R = 8; n = 31;#cohort 2
binom.confint(R, n);#nothing matches

##           method x  n  mean lower upper
## 1  agresti-coull 8 31 0.258 0.135 0.435
## 2     asymptotic 8 31 0.258 0.104 0.412
## 3          bayes 8 31 0.266 0.122 0.418
## 4        cloglog 8 31 0.258 0.122 0.418
## 5          exact 8 31 0.258 0.119 0.446
## 6          logit 8 31 0.258 0.135 0.437
## 7         probit 8 31 0.258 0.130 0.431
## 8        profile 8 31 0.258 0.127 0.427
## 9            lrt 8 31 0.258 0.127 0.427
## 10     prop.test 8 31 0.258 0.125 0.449
## 11        wilson 8 31 0.258 0.137 0.432

(R/n) + qt(c(0.5,0.025,0.975),df=n-1)*sqrt((R/n)*(1-R/n)/(n-1));

## [1] 0.2581 0.0949 0.4212

#pvalue for p0 = 0.10 matches table to 2 sig dig
2*pt(((R/n) - 0.10)/sqrt((R/n)*(1-R/n)/(n-1)), df = n-1, lower = F);

## [1] 0.0571
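
For comparison, the analysis the protocol describes (inference based on the Binomial distribution) can be sketched in base R with `binom.test`, which returns the exact (Clopper-Pearson) interval and an exact test of \(p_0 = 0.10\). The one-sided alternative is an assumption, chosen to match the one-sided nature of Simon's design. Neither result accounts for the two-stage design.

```r
# Exact binomial inference for the cohort 1 and cohort 2 CR counts used above
cohorts <- list(cohort1 = c(R = 10, n = 30), cohort2 = c(R = 8, n = 31))

# Exact one-sided test of H0: rate = 10% against H1: rate > 10%
exact_fit <- lapply(cohorts, function(d) {
  binom.test(d[["R"]], d[["n"]], p = 0.10, alternative = "greater")
})
# Two-sided 95% Clopper-Pearson intervals
exact_ci <- lapply(cohorts, function(d) {
  binom.test(d[["R"]], d[["n"]])$conf.int
})

sapply(exact_ci, function(ci) round(as.numeric(ci), 3))
sapply(exact_fit, function(f) f$p.value)
```

The intervals should agree with the `exact` rows of the `binom.confint` output above.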

Townsley, et al., 2017

Reported p-values do not account for design (they should)
Also, lack of consistency in use of one- versus two-sided: Simon's design is inherently one-sided, but they report two-sided p-values

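r1 = 2; n1 = 24; r = 6; n = 31; p0 = 0.10; R = 8;
#One-sided p-value accounting for design: probability under p0 of
#continuing past stage 1 (X1 > r1) and seeing at least R total responses
x1 = (r1+1):n1;
sum(pbinom(R - 1 - x1, n - n1, p0, lower.tail = FALSE) *
      dbinom(x1, n1, p0));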

## [1] 0.00959

Townsley, et al., 2017

Some unavoidable messiness
Also some avoidable messiness and protocol deviations
Unclear which dose schedule was carried forward for phase 3 study (probably schedule 3)
Does not mean trial was bad / wrong
Large randomized, placebo-controlled study currently enrolling patients.
- Aim is to improve 3 month CR rate
- Usually scientific aim in comparative trial is "harder", e.g. survival

Boonstra, et al., 2017

"open-label, single-center phase II study of ixazomib in patients with relapsed or refractory cutaneous and PTCLs" (Boonstra et al., 2017)
Study of proteasome inhibitor
Heterogeneous patient population

Boonstra, et al., 2017

Ixazomib targets genomic pathway (NF-κB -> GATA3 -> cancer cell survival)
- Approved as second-line therapy for multiple myeloma
Proteasome is protein recycler. Ixazomib is proteasome inhibitor. Idea is to take away ingredient key for cancer cell survival
All -zomib drugs are proteasome inhibitors

Boonstra, et al., 2017

In vitro studies demonstrated mechanism of action in CTCL and PTCL cell lines
Primary endpoint was best response in 6 months
The posited 'null' best objective response rate (ORR) was 30%. Given the previously reported ORR with bortezomib the trial was powered to detect an improved ORR of 60% with probability .90, based upon a two-sided type I error equal to .10.
That is, \({p_0} = 0.30\), \({p_1} = 0.60\), one-sided \(\alpha = 0.05\) (two-sided 0.10), and \(\beta = 0.10\)

Boonstra, et al., 2017

library(ph2mult);
binom.design(type = "admissible", p0 = 0.30, p1 = 0.60, 
             signif.level = 0.05, power.level = 0.9, plot.out = T);

##            r1 n1  r  n EN.p0. PET.p0.  error power
## Optimal     3 10 12 28   16.3   0.650 0.0419 0.912
## Admissible  3 11 11 25   17.0   0.570 0.0418 0.909
## Minimax     7 18 10 23   18.7   0.859 0.0499 0.905

\(\{r_1,n_1,r,n\} = \{3,11,11,25\}\)

Boonstra, et al., 2017

Ended up with 12 evaluable patients at first stage
Per protocol, 2 patients who withdrew prior to first response assessment (1 month) were replaced
But also per protocol, all patients who received at least one dose were evaluable (12/13 received one dose or more)

Boonstra, et al., 2017

Figure 2

Boonstra, et al., 2017

1/12 patients had CR/PR by 6 months
Stopped at interim futility analysis
UMCC will soon be opening dose-escalation + efficacy trial of Ixazomib plus Romidepsin in only PTCL patients
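
As a rough check on how decisive the interim result was, the sketch below computes the binomial probability of seeing at most 1 responder among 12 evaluable patients under the design's null and alternative response rates. It ignores the replacement and evaluability issues noted earlier and simply treats the 12 patients as one binomial sample, which is an assumption.

```r
# Chance of at most 1 responder among 12 evaluable patients
n_eval <- 12; responders <- 1
p_under_null <- pbinom(responders, n_eval, 0.30)  # if true ORR were p0 = 0.30
p_under_alt  <- pbinom(responders, n_eval, 0.60)  # if true ORR were p1 = 0.60
c(null = p_under_null, alternative = p_under_alt)
```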

Summary of where we are at

Simon-like designs are popular as single-arm phase II designs, particularly in oncology and AIDS trials
Useful starting point for sample size but usually not sensible to rely on optimality criteria

Likelihood of phase 2 data

Let \(R_k\) denote # responses after patient \(k\). Then, \[L(R_k|\gamma) = \gamma^{R_k}(1-\gamma)^{k - R_k}\] What to use for \(\pi(\gamma)\)?

Beta distribution

Density \(f(x)\propto x^{a_1-1}(1-x)^{a_2-1}\)
Mean: \(E(X) = a_1/(a_1+a_2)\)
Variance: \(V(X) = a_1a_2/([a_1+a_2]^2[a_1+a_2+1]) = E(X)E(1-X)/(a_1+a_2+1)\)

Beta distribution

A prior that yields a posterior in the same distributional family is called conjugate
Beta distribution is conjugate to Binomial likelihood: \[ \begin{aligned} \pi(\gamma|R_k)&\propto L(R_k|\gamma) \pi(\gamma)\\ &\propto\gamma^{R_k}(1-\gamma)^{k - R_k} \gamma^{a_1-1}(1-\gamma)^{a_2-1}\\ &=\gamma^{a_1+R_k-1}(1-\gamma)^{a_2+k - R_k-1} \end{aligned} \]
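
The conjugacy result can be checked numerically. This is a minimal sketch, assuming illustrative values for the prior and data (they are not from any trial in these slides): it normalises likelihood times prior on a grid and compares the result with the closed-form Beta posterior.

```r
a1 <- 2; a2 <- 4      # illustrative prior (assumed values)
k  <- 10; R_k <- 3    # illustrative data (assumed values)

h          <- 0.001
gamma_grid <- seq(h / 2, 1 - h / 2, by = h)            # midpoints of 1000 bins
unnorm     <- gamma_grid^R_k * (1 - gamma_grid)^(k - R_k) * dbeta(gamma_grid, a1, a2)
grid_post  <- unnorm / sum(unnorm * h)                 # normalise to integrate to 1
exact_post <- dbeta(gamma_grid, a1 + R_k, a2 + k - R_k)

max(abs(grid_post - exact_post))   # small: the two curves agree up to grid error
```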

Clinical interpretation of Beta prior

A priori, \(\gamma\sim \text{Beta}(a_1,a_2)\), where \(a_1+a_2\) represents number of historical patients' worth of data, with \(a_1\) of them being responders
Incorporate information on \(R_k\) responders out of \(k\) patients and, a posteriori, \(\gamma \sim \text{Beta}(a_1+R_k,a_2+k-R_k)\)
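
One consequence of this interpretation, written out as a short derivation from the mean formula above: the posterior mean is a weighted average of the prior mean and the observed response proportion, with weights proportional to the "historical" sample size \(a_1+a_2\) and the trial sample size \(k\).

\[ E(\gamma|R_k) = \frac{a_1+R_k}{a_1+a_2+k} = \frac{a_1+a_2}{a_1+a_2+k}\cdot\frac{a_1}{a_1+a_2} + \frac{k}{a_1+a_2+k}\cdot\frac{R_k}{k} \]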

Beta(1,2) distribution

Beta(2,4) distribution

Beta(5,10) distribution

Beta(15,30) distribution
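
The four priors above share the same mean but differ in how concentrated they are. The sketch below tabulates their mean, standard deviation, and central 95% interval using the formulas from the Beta distribution slide; the plotting step only runs in an interactive session, and the column names are illustrative.

```r
# The four priors shown: same mean (1/3), increasing a1 + a2
priors <- data.frame(a1 = c(1, 2, 5, 15), a2 = c(2, 4, 10, 30))

priors$mean <- with(priors, a1 / (a1 + a2))
priors$sd   <- with(priors, sqrt(a1 * a2 / ((a1 + a2)^2 * (a1 + a2 + 1))))
priors$lo95 <- with(priors, qbeta(0.025, a1, a2))   # central 95% interval
priors$hi95 <- with(priors, qbeta(0.975, a1, a2))
round(priors, 3)

# Overlay the densities when run interactively
if (interactive()) {
  curve(dbeta(x, 15, 30), from = 0, to = 1, xlab = "gamma", ylab = "density")
  for (i in 1:3) curve(dbeta(x, priors$a1[i], priors$a2[i]), add = TRUE, lty = i + 1)
}
```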

One-arm Bayesian Phase 2 design

Enroll patients until stopping rule is satisfied. After patient \(k\), calculate \[\Pr(\gamma \geq p_0 + \delta|R_k).\] Stop trial if this falls below \(\pi_L\) (futility) or above \(\pi_U\) (efficacy)
\(\delta\) fixed. Key idea is that \(p_0\) can have distribution to reflect uncertainty about historical control
"What is posterior probability that response rate (\(\gamma\)) exceeds historical response rate (\(p_0\)) by at least \(\delta\)?"
Detailed in (Thall and Simon, 1994)

Decision 1: prior on \(\gamma\)

Size of \(a_1+a_2\) is important. Suppose \(k=10\) and \(R_k=9\),
If \(\gamma\sim\text{Beta}(a_1=0.5,a_2=4.5)\), then \(\gamma|R_k\sim\text{Beta}(9.5,5.5)\), and \(\Pr(\gamma \geq 0.3|R_k)\) is

pbeta(q = 0.3,shape1 = 9.5,shape2 = 5.5,lower = F);

## [1] 0.996

Decision 1: prior on \(\gamma\)

If \(\gamma\sim\text{Beta}(a_1=5,a_2=45)\), then \(\gamma|R_k\sim\text{Beta}(14,46)\), and \(\Pr(\gamma \geq 0.3|R_k)\) is

pbeta(q = 0.3,shape1 = 14,shape2 = 46,lower = F);

## [1] 0.114

Decision 1: prior on \(\gamma\)

Same data, different outcomes
Do not want the prior to be so informative that it overwhelms the data; e.g. restrict to \(a_1+a_2\leq10\)
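
To see how the conclusion depends on prior strength, the sketch below holds the prior mean at 0.1 (the mean of both priors above) and varies \(a_1+a_2\). The intermediate values of `prior_n` are illustrative assumptions; 5 and 50 correspond to the two priors on the preceding slides.

```r
k <- 10; R_k <- 9                   # data from the slides
prior_mean <- 0.10                  # mean of both priors considered above
prior_n    <- c(1, 5, 10, 25, 50)   # a1 + a2, the prior "sample size"

a1 <- prior_mean * prior_n
a2 <- (1 - prior_mean) * prior_n
post_prob <- pbeta(0.3, a1 + R_k, a2 + k - R_k, lower.tail = FALSE)
data.frame(prior_n, a1, a2, post_prob = round(post_prob, 3))
```

The posterior probability falls steadily as the prior is given more weight, even though the data are unchanged.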

Decision 2: prior on \(p_0\)

What is \(p_0\)? Control to be used in phase 3? How "historical" is it? Does it reflect same clinical population?

Suggests using prior on \(p_0\)
No additional data collected, so prior should be informative

Decision 3: choice of \(\delta\), \(\pi_L\), \(\pi_U\)

Tuning parameters selected based on clinical reasoning and frequentist operating characteristics (type I error under null; power under specified alternative). Some typical choices

\(\delta\in(0.05,0.15)\)
\(\pi_L<0.10\)
\(\pi_U>0.90\)

Ex: Proposed Design for Newly Diagnosed Metastatic Prostate Cancer

Treatment is surgical procedure, to decrease burden of circulating tumor cells (CTCs), followed by standard therapy
"Response" means achieving \(<5 \text{ CTCs} / 7.5 \text{ cc}\) post-op
Historical comparison is no surgery (just standard therapy)

Example

\(\gamma\sim\text{Beta}(2.8,1.2)\) gives probability of response post-op (\(E[\gamma] = 2.8/4 = 0.7\))
\(p_0\sim\text{Beta}(120,180)\) gives probability of response on standard therapy alone (\(E[p_0] = 120/300 = 0.4\))
Stop trial if \(\Pr(\gamma\geq p_0+0.3|R_k)<\pi_L\equiv 0.02\) (Max of 36 patients)
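
Because \(p_0\) has its own prior here, the stopping probability is an average over \(p_0\). A minimal sketch, assuming \(\gamma\) and \(p_0\) are independent and approximating the average with a fine grid over \(p_0\); the function name `post_prob_exceed` and the grid spacing are illustrative choices, and Thall and Simon (1994) should be consulted for the exact computation.

```r
# Pr(gamma >= p0 + delta | R_k) when p0 has its own Beta prior.
# Averages the posterior tail probability of gamma over a fine grid of p0 values.
post_prob_exceed <- function(R_k, k, delta = 0.3,
                             a1 = 2.8, a2 = 1.2,      # prior on gamma
                             b1 = 120, b2 = 180) {    # prior on p0
  p0_grid <- seq(0.00025, 0.99975, by = 0.0005)       # bin midpoints on (0, 1)
  weight  <- dbeta(p0_grid, b1, b2)
  weight  <- weight / sum(weight)                     # discrete approximation to the p0 prior
  tail_pr <- pbeta(p0_grid + delta, a1 + R_k, a2 + k - R_k, lower.tail = FALSE)
  sum(tail_pr * weight)
}

post_prob_exceed(R_k = 0, k = 0)    # prior probability, before any data
post_prob_exceed(R_k = 3, k = 10)   # 3 responders among the first 10 patients
post_prob_exceed(R_k = 7, k = 10)   # 7 responders among the first 10 patients
```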

Example

Simulation 1: \(\gamma = 0.7\) (in truth)

http://www.umich.edu/~philb/Winter2018Slides/BayesSim1.html

Simulation 2: \(\gamma = 0.55\) (in truth)

http://www.umich.edu/~philb/Winter2018Slides/BayesSim1.html

Simulation 3: \(\gamma = 0.4\) (in truth)

http://www.umich.edu/~philb/Winter2018Slides/BayesSim1.html

Example

Was also important to ensure low surgery-related toxicity: additional stopping rule implemented if too much toxicity is observed
\(\gamma^T\sim\text{Beta}(0.4,3.6)\) gives probability of Grade 3 surgical toxicity.
Stop trial if \(\Pr(\gamma^T<0.10|R_k^T)<0.10\).
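
The toxicity rule can be turned into a table of stopping counts. This is a sketch under two assumptions not stated on the slide: the rule is checked after every patient, and the maximum sample size is the same 36 patients used for efficacy.

```r
a_tox <- 0.4; b_tox <- 3.6     # Beta prior on the toxicity probability
k_max <- 36                    # assumed: same maximum sample size as for efficacy

# Posterior probability that the toxicity rate is below 10%
pr_tox_ok <- function(tox, k) pbeta(0.10, a_tox + tox, b_tox + k - tox)

# Smallest number of toxicities among k patients that triggers stopping
# (NA when no count among k patients would trigger stopping)
tox_bound <- sapply(1:k_max, function(k) {
  hits <- which(pr_tox_ok(0:k, k) < 0.10)
  if (length(hits) == 0) NA else hits[1] - 1
})
data.frame(k = 1:k_max, stop_if_tox_at_least = tox_bound)
```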

Example

Like MTPI, decisions based upon posterior probabilities can be translated into prespecified cutpoints after patient \(k\):
- \(R_k < \ell_k \Rightarrow\Pr(\gamma \geq p_0 + \delta|R_k) < \pi_L \Rightarrow\) stop trial for futility
- \(R_k > u_k \Rightarrow\Pr(\gamma \geq p_0 + \delta|R_k) > \pi_U \Rightarrow\) stop trial for efficacy
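
For the prostate cancer example, which has only a futility rule, the cutpoints \(\ell_k\) can be tabulated before the trial starts. A minimal sketch, assuming independence of \(\gamma\) and \(p_0\) and a grid approximation to the average over \(p_0\); variable names are illustrative.

```r
delta <- 0.3; pi_L <- 0.02; k_max <- 36

# Discrete approximation to the Beta(120, 180) prior on p0
p0_grid <- seq(0.00025, 0.99975, by = 0.0005)
p0_wt   <- dbeta(p0_grid, 120, 180); p0_wt <- p0_wt / sum(p0_wt)

# Pr(gamma >= p0 + delta | R_k responders among k patients), gamma ~ Beta(2.8, 1.2) a priori
post_prob <- function(R_k, k) {
  sum(pbeta(p0_grid + delta, 2.8 + R_k, 1.2 + k - R_k, lower.tail = FALSE) * p0_wt)
}

# ell_k: smallest response count at which the trial continues after patient k
ell <- sapply(1:k_max, function(k) {
  probs <- sapply(0:k, post_prob, k = k)
  min(which(probs >= pi_L)) - 1
})
data.frame(k = 1:k_max, stop_if_R_below = ell)
```

Once this table is fixed, running the trial requires only counting responders; no posterior calculation is needed at the bedside.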

Example: Operating Characteristics

Example

Design is Bayesian, but assessment is still frequentist:
1. Fix \(\gamma\) at different true values (uninteresting, interesting)
2. Simulate trial and tune stopping rules, maximum sample size so that type I error and power are maintained under stopping rules
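
The two steps above can be sketched as a small simulation for the prostate cancer example. The assumptions are that the futility rule is checked after every patient, that \(\gamma\) and \(p_0\) are independent, and that 2000 simulated trials per scenario is enough for a rough picture; the seed and variable names are arbitrary.

```r
set.seed(619)
delta <- 0.3; pi_L <- 0.02; k_max <- 36; n_sim <- 2000

# Futility boundaries (same rule as the example; p0 averaged over a grid)
p0_grid <- seq(0.00025, 0.99975, by = 0.0005)
p0_wt   <- dbeta(p0_grid, 120, 180); p0_wt <- p0_wt / sum(p0_wt)
post_prob <- function(R_k, k) {
  sum(pbeta(p0_grid + delta, 2.8 + R_k, 1.2 + k - R_k, lower.tail = FALSE) * p0_wt)
}
ell <- sapply(1:k_max, function(k) min(which(sapply(0:k, post_prob, k = k) >= pi_L)) - 1)

# One simulated trial: patient index at which the futility rule first fires,
# or k_max + 1 if it never fires
simulate_n <- function(gamma_true) {
  R <- cumsum(rbinom(k_max, size = 1, prob = gamma_true))
  stopped <- which(R < ell)
  if (length(stopped) > 0) stopped[1] else k_max + 1
}

# Step 1: fix gamma at each true value; step 2: simulate and summarise
oc <- sapply(c(0.40, 0.55, 0.70), function(g) {
  n_stop <- replicate(n_sim, simulate_n(g))
  c(gamma = g,
    pr_stop_futility = mean(n_stop <= k_max),
    mean_n = mean(pmin(n_stop, k_max)))
})
t(oc)
```

Tuning then amounts to changing `delta`, `pi_L`, or `k_max` and rerunning until the stopping probabilities under the uninteresting and interesting values of \(\gamma\) are acceptable.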

Word of the day!

References

Boonstra, P.S., Polk, A., Brown, N., Hristov, A.C., Bailey, N.G., Kaminski, M.S., et al. (2017) A single center phase II study of ixazomib in patients with relapsed or refractory cutaneous or peripheral T-cell lymphomas. American Journal of Hematology, 92, 1287–1294.

Thall, P.F. and Simon, R. (1994) Practical Bayesian guidelines for phase IIB clinical trials. Biometrics, 50, 337–349.

Townsley, D.M., Scheinberg, P., Winkler, T., Desmond, R., Dumitriu, B., Rios, O., et al. (2017) Eltrombopag added to standard immunosuppression for aplastic anemia. New England Journal of Medicine, 376, 1540–1550.
