Phase 1 wrap up

  • Initial look at safety of drug

  • Goal is to identify the dose level to be carried forward into future studies of efficacy (phase 2, phase 3)

  • In dose escalation studies, this dose level is sometimes called the maximum tolerated dose (MTD)

  • Designs depend upon context of drug and disease

Dose escalation designs that we learned

  • 3+3 / Storer's A Design

  • Biased Coin Design

  • Modified Toxicity Probability Interval (MTPI)

  • Continual Reassessment Method (CRM)

Dose escalation designs that we learned

| Characteristic | 3+3 | Biased Coin | MTPI | CRM |
|---|---|---|---|---|
| Description | Cohorts of 3 | De-esc (if DLT) / Stay / Esc (biased coin) | De-esc / Stay / Esc (based upon posterior dist.) | Dose-toxicity regression curve |
| MTD | Largest dose below observed 33% DLT rate | Post-trial analysis (e.g. isotonic regression, logistic regression) to find dose level closest to target | Post-trial analysis to find dose level closest to target | Dose-toxicity regression curve to find dose level closest to target |
| Sample size | Integrated into design | Pre-specified | Pre-specified | Pre-specified |
| Convergence to correct dose assignment | No | No | Yes | Yes |
| Convergence to correct MTD | No | Yes | Yes | Yes |

Phase 2 introduction

  • Looking for evidence of activity, with respect to what is already available to patients

  • "Phase 2" is a convenient term but loosely defined

  • Piantadosi uses the term middle development

Why is phase 2 needed?

Taken from (Piantadosi, 2017), p373

  • First evidence of activity using meaningful clinical outcome
  • Longer-term evaluation of safety
  • Feasibility of administering new treatment
  • Fine-tune dose, schedule
  • Practice for definitive study
  • Determine whether definitive study is warranted
  • "Depressure pipeline"

When is phase 2 step most critical?

| Factors in favor of Phase 2 | Factors in favor of skipping Phase 2 |
|---|---|
| Many competing experimental therapies in pipeline | Lack of available therapies for disease (either experimental or approved) |
| Pessimistic about likelihood of success | Optimistic about likelihood of success |
| Availability of short-term, easily measured surrogate efficacy outcomes | Efficacy usefully measured by hard clinical outcomes |
| Highly prevalent disease | Rare disease |
| Opportunity cost of a failed phase 3 trial would be high | Opportunity cost of time required to do phase 2 is high |

Drug Development Pipeline

  • Let \(W\) denote drugs that are worthwhile, \(W^C\) denote drugs that are not
  • Illustration of phase 2 goal:

Drug Development Pipeline

  • Let \(\Pr_\text{ph2}(W)\) denote the proportion of truly "worthwhile" drugs entering the phase 2 stage
  • Let \(\alpha\), \(\beta\) denote nominal type I and type II error rates in phase 2
  • Let \(S_2\) indicate that a drug is selected in phase 2 pipeline
  • From Bayes Rule, true positive finding from phase 2 is: \[ \begin{aligned} \Pr(W|S_2) &= \dfrac{\Pr(S_2|W)\Pr_\text{ph2}(W) }{\Pr(S_2|W)\Pr_\text{ph2}(W) + \Pr(S_2|W^C)\Pr_\text{ph2}(W^C) }\\ &= \dfrac{(1-\beta)\Pr_\text{ph2}(W) }{(1-\beta)\Pr_\text{ph2}(W) + \alpha(1-\Pr_\text{ph2}(W)) } \end{aligned} \]

Drug Development Pipeline

  • If 2% of drugs entering phase 2 study are truly worthwhile, \(\alpha = 0.05\), and \(\beta = 0.20\), then \((0.80*0.02)/(0.80*0.02 + 0.05*0.98)\approx 25\%\) of drugs leaving phase 2 study, i.e. entering phase 3 study, are truly worthwhile

Drug Development Pipeline

  • Phase 2 goal is to enrich phase 3 population with worthwhile drugs so that positive findings in phase 3 are highly likely to be true positives:

  • If 25% of drugs entering phase 3 study are truly worthwhile, \(\alpha = 0.05\), and \(\beta = 0.20\), then \((0.80*0.25)/(0.80*0.25 + 0.05*0.75)\approx 84\%\) of drugs leaving phase 3 study, i.e. submitted for regulatory approval, are truly worthwhile
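The enrichment arithmetic on the last two slides is easy to verify directly; `ppv` below is a small hypothetical helper implementing the Bayes Rule expression:

```r
# Pr(W | S2): probability a selected drug is truly worthwhile,
# given the prior proportion and nominal error rates
ppv = function(prior, alpha = 0.05, beta = 0.20) {
  (1 - beta) * prior / ((1 - beta) * prior + alpha * (1 - prior));
}
ppv(0.02);  # entering phase 2 with 2% worthwhile: ~0.25 of selections are worthwhile
ppv(0.25);  # entering phase 3 with 25% worthwhile: ~0.84 of selections are worthwhile
```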

General dichotomy of phase 2 designs

  • Phase 2A: any evidence of activity? Lower threshold, single arm, fewer patients

  • Phase 2B: evidence of greater efficacy? Higher threshold, randomized (multiple arms), more patients

  • In reality, blurry distinction between these two

Statistical setup for single arm phase 2

  • All patients enrolled to new therapy. Followed for outcome ('response')

  • Response often (forced to be) binary, \(Y\in\{0,1\}\).

  • May take time to occur, e.g. tumor shrinkage of X% by 3 months, clinical improvement of symptoms by 6 months. Should not take too long

Statistical setup for single arm phase 2

  • Two motivating questions:
  1. What is current response rate to best available therapy? (\(\color{orangeish}{p_0}\); historical)
  2. What response rate would suggest activity? (\(\color{greenish}{p_1}\))

Hypothesis testing setup

\[\gamma = \Pr(\text{response}) = \Pr(Y=1)\] \[ \begin{aligned} H_0: \gamma = \color{orangeish}{p_0 \text{ uninteresting scenario}}\\ H_1: \gamma = \color{greenish}{p_1 \text{ interesting scenario}} \end{aligned} \]

  • Reject \(H_0\) \(\Leftrightarrow\) conclude further study warranted

  • Do not reject \(H_0\) \(\Leftrightarrow\) conclude no further study warranted

Hypothesis testing setup

| Decision | \(\color{orangeish}{H_0 \text{ True}}\) | \(\color{greenish}{H_1 \text{ True}}\) |
|---|---|---|
| Reject \(H_0\) | Type I Error | True Positive |
| Do Not Reject \(H_0\) | True Negative | Type II Error |
  • One-sided hypothesis test.
  • Estimate \(\gamma\) with \(\hat\gamma = R/n\), \(R\) is number of responses
  • Reject \(H_0\) if \(R/n > r/n\) for some constant \(r\)
  • What sample size \(n\) is required?

Normal-based sample size

Assume that: \[ \begin{aligned} \hat\gamma | H_0 \sim N(p_0, \sigma^2_0); \hat\gamma | H_1 \sim N(p_1, \sigma^2_1) \end{aligned} \]

Normal-based sample size

\[ \begin{aligned} p_1 - p_0 = |p_0 - r/n| + |p_1 - r/n| \end{aligned} \]

Normal-based sample size

\[ \begin{aligned} |p_0 - r/n| + |p_1 - r/n| = c_1\sigma_0 + c_2\sigma_1 \end{aligned} \]

Normal quantiles

  • \(z_{1-\alpha}\) is smallest \(x\) such that \(\Pr(Z\leq x)\equiv\Phi(x) \geq 1-\alpha\)
  • \(z_{\beta}\) is smallest \(x\) such that \(\Phi(x) \geq \beta\)
  • \(z_{1-\beta}\) is smallest \(x\) such that \(\Phi(x) \geq 1-\beta\)

Claim: \(z_{1-\beta}=-z_\beta\)

Normal-based sample size

\[ \begin{aligned} c_1\sigma_0 + c_2\sigma_1 &= z_{1-\alpha} \sigma_0 + (- z_{\beta}) \sigma_1\\ &= z_{1-\alpha} \sigma_0 + z_{1-\beta} \sigma_1 \end{aligned} \]

Normal-based sample size

\[ \begin{aligned} p_1 - p_0 &= |p_0 - r/n| + |p_1 - r/n|\\ &= c_1\sigma_0 + c_2\sigma_1 \\ &= z_{1-\alpha} \sigma_0 + z_{1-\beta} \sigma_1\\ &\approx z_{1-\alpha} \sqrt{\dfrac{p_0(1-p_0)}{n}} + z_{1-\beta} \sqrt{\dfrac{p_1(1-p_1)}{n}}\\ \Rightarrow n &\approx \left(\dfrac{z_{1-\alpha} \sqrt{p_0(1-p_0)} + z_{1-\beta} \sqrt{p_1(1-p_1)}}{p_1-p_0}\right)^2 \end{aligned} \]

Applying formula

simp_sampsize = function(p0, p1, 
                         alpha=0.05, beta=0.20) {
  ceiling((qnorm(1-alpha) * sqrt(p0*(1-p0)) + 
             qnorm(1-beta) * sqrt(p1*(1-p1)))^2 /
            (p1-p0)^2);
}
x = matrix(mapply(simp_sampsize, 
                  p0 = rep((1:6)/10,each=6), 
                  p1 = rep((2:7)/10,times=6)),
           nrow=6,
           byrow=T,
           dimnames = list((1:6)/10, (2:7)/10));

Applying formula

      0.2   0.3   0.4   0.5   0.6   0.7
0.1    69    20    10     6     4     3
0.2         109    29    13     8     5
0.3               136    35    16     9
0.4                     151    38    16
0.5                           153    37
0.6                                 142
  • \(p_0\) along rows; \(p_1\) along columns;
  • \(\alpha = 0.05\); \(\beta = 0.20\)
  • \(n \propto (p_0 - p_1)^{-2}\)
  • Infeasible to test for improvement of 0.1 or less

Comments on formula

  • Large-sample approximation based on normality assumption. Actually easy to calculate "exact" sample size directly using binomial distribution (hopefully get similar results)

  • Phase 2 trials have limited budgets – negotiations are inevitable. One of your jobs is to communicate ramifications of different choices
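The "exact" calculation mentioned above can be sketched directly: search over \(n\) for the smallest value at which some cutoff \(r\) meets both error constraints under the binomial distribution. `exact_sampsize` is a hypothetical helper, not from any package:

```r
# Smallest n for which some cutoff r satisfies
# Pr(R > r | p0) <= alpha and Pr(R > r | p1) >= 1 - beta
exact_sampsize = function(p0, p1, alpha = 0.05, beta = 0.20, nmax = 500) {
  for (n in 2:nmax) {
    r = 0:(n - 1);
    ok = pbinom(r, n, p0, lower.tail = FALSE) <= alpha &
         pbinom(r, n, p1, lower.tail = FALSE) >= 1 - beta;
    if (any(ok)) return(c(n = n, r = min(r[ok])));
  }
  stop("no design found below nmax");
}
exact_sampsize(p0 = 0.20, p1 = 0.35);
```

Comparing the exact \(n\) to the normal-approximation answer for the same \((p_0, p_1)\) shows how close the approximation is.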

Determining \(r\)

  • Reject \(H_0\) if \(R/n > r/n\), that is, if sufficiently many patients respond

  • If \(p_0=0.20\), \(p_1=0.35\), then \(n=50\) (assuming \(\alpha = 0.05\), \(\beta = 0.20\)), and:

| \(r\) | \(\Pr(R>r \mid \gamma = 0.20)\) | \(\Pr(R>r \mid \gamma = 0.35)\) |
|---|---|---|
| 10 | 0.416 | 0.984 |
| 11 | 0.289 | 0.966 |
| 12 | 0.186 | 0.934 |
| 13 | 0.111 | 0.884 |
| 14 | 0.061 | 0.812 |
| 15 | 0.031 | 0.720 |
| 16 | 0.014 | 0.611 |

Determining \(r\)

  • So if 15 or more responses (out of \(n=50\)), move to phase 3

  • Reality check: 95% score-based confidence interval for \(\gamma\) is \((0.19, 0.44)\). Evidence not overwhelming.
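One way to reproduce the reported interval (assuming exactly 15 responses observed out of 50) is `prop.test` without continuity correction, which returns the Wilson score interval:

```r
# 95% Wilson score interval for an observed 15/50 response rate
prop.test(x = 15, n = 50, correct = FALSE)$conf.int;
```

This gives approximately \((0.19, 0.44)\), matching the slide.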

Case Study: (Flaherty et al., 2010)

  • Background: BRAF mutation, metastatic melanoma

  • Objectives: safety, PK, RP2D, response rate, duration of response, rate of progression

  • Phase 1 (3+3) + Phase 2 (one arm, single stage)

Case Study: (Flaherty et al., 2010)

  • Enrollment:

    • Dose escalation: any solid tumors (mostly BRAF-mutated melanoma)
    • Dose extension: only BRAF-mutated melanoma

Case Study: (Flaherty et al., 2010)

  • Sample size:

    • Unclear calculations: "we calculated that a sample of 32 patients would provide 95% confidence (\(\alpha\) = 0.05), with 80% power (\(\beta\) = 0.20), that an observed response rate of 40% would be consistent with a true response rate of more than 10%, which was considered justification for further study."
    • \(H_0: \gamma = 0.10\)
    • What is \(H_1\)?
    • Reject \(H_0\) if \(R > 12\), i.e. if \(R/32 \geq 13/32 \approx 0.4\)
pbinom(12,32,0.1,lower=F);#type I error
## [1] 5.51e-06

Case Study: (Flaherty et al., 2010)

  • Results:

    1. Failed initial dose escalation (crystalline formulation had no biological impact) (26 patients)
    2. Reformulated agent into capsules; ran second dose escalation (29 patients)

      1. 1/7 DLTs at 720mg
      2. 4/7 DLTs at 1120mg
      3. Interpolated new dose: 960mg
    3. Extension phase at 960mg (32 patients)

      1. 26/32 partial or complete responders;
      2. 2/32 complete responders

Case Study: (Flaherty et al., 2010)

  • Many on-the-fly changes. Some avoidable
  • Significant changes to treatment landscape since 2010

    • Three approved melanoma therapies at time of publication
    • The studied drug plus 10 more have been approved since

Stopping for futility

  • In one-stage design, all patients enrolled before decision

  • What if no responses after 5, 10, 15 patients? Worthwhile, ethical to continue?

  • Motivation for interim futility analyses

    • futile = no use in trying

    • lack of evidence for activity

Simple futility analysis

  • First stage assesses likelihood of no responses under \(H_1\), e.g.
| # patients enrolled | \(\Pr(R=0 \mid \gamma = 0.35)\) |
|---|---|
| 1 | 0.650 |
| 2 | 0.423 |
| 3 | 0.275 |
| 4 | 0.179 |
| 5 | 0.116 |
| 6 | 0.075 |
| 7 | 0.049 |
| 8 | 0.032 |
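The column above is just \((1-p_1)^k\) with \(p_1 = 0.35\), so it can be reproduced in one line:

```r
# Pr(R = 0 | gamma = 0.35) after k = 1, ..., 8 patients enrolled
round((1 - 0.35)^(1:8), 3);
```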

Two-stage design (Gehan, 1961)

  1. Enroll initial cohort of \(n_1\) patients. Stop for futility if no responders

  2. Otherwise, enroll remaining \(n-n_1\) patients, conduct standard hypothesis test at end

Gehan originally proposed \(n_1=14\) based upon \(H_1: \gamma = 20\%\). Circa 1960 (early chemotherapeutic era), 20% response rate was impressive. We use \(n_1=7\) based upon larger 35% response rate
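A plausible rationale for both choices of \(n_1\): each is roughly the smallest first-stage cohort for which observing zero responses has probability below 5% under \(H_1\):

```r
# Pr(R1 = 0 | H1) first drops below 0.05 at these cohort sizes
(1 - 0.20)^14;  # Gehan's n1 = 14 at a 20% response rate: ~0.044
(1 - 0.35)^7;   # our n1 = 7 at a 35% response rate: ~0.049
```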

Comparison of one-stage, modified Gehan designs

One stage

  • Continue to phase 3 if \(R>r\) responses out of \(n\) patients
n = 50; r = 14; p0 = 0.20; p1 = 0.35;
#Type I error = \Pr(R>r|\gamma = p0)
pbinom(q = r,size = n,prob = p0,lower.tail = F);
## [1] 0.0607
#Power = \Pr(R>r|\gamma = p1)
pbinom(q = r,size = n,prob = p1,lower.tail = F);
## [1] 0.812

Comparison of one-stage, modified Gehan designs

Modified Gehan design

  • Continue if \(R_1>0\) (first stage), \(R>r\) (second stage) \[ \begin{aligned} \text{Type I error} &= \Pr(R>r, R_1>0|\gamma = p_0)\\ &= \sum_{x = 1}^{n_1} \Pr(R > r, R_1 = x|\gamma = p_0)\\ &= \sum_{x = 1}^{n_1} \Pr(R-R_1 > r - x, R_1 = x|\gamma = p_0)\\ &= \sum_{x = 1}^{n_1} \Pr(R-R_1 > r - x|\gamma = p_0)\times\\ &\quad\quad\quad\Pr(R_1 = x|\gamma = p_0) \end{aligned} \]

Comparison of one-stage, modified Gehan designs

Modified Gehan design

n1=7; n=50; r=14; p0=0.20; p1=0.35;
(summand1 = pbinom(r-(1:n1), n - n1, p0, lower = F));
## [1] 0.0362 0.0733 0.1355 0.2289 0.3533 0.4997 0.6503
(summand2 = dbinom(1:n1, n1, p0));
## [1] 3.67e-01 2.75e-01 1.15e-01 2.87e-02 4.30e-03 3.58e-04 1.28e-05
#Type I error
sum(summand1 * summand2);
## [1] 0.0573
#Power
sum(pbinom(r-(1:n1), n - n1, p1, lower = F) * 
      dbinom(1:n1, n1, p1))
## [1] 0.785

Comparison of one-stage, modified Gehan designs

| | Type I Error | Power | Pr(Early Termination) | E[Enrollment] |
|---|---|---|---|---|
| One Stage | 0.061 | 0.812 | 0.00 | 50 |
| Modified Gehan | 0.057 | 0.785 | 0.21 | 41 |

Comments

  • Tradeoff: enroll 9 fewer patients (on average) for slight loss of power

  • However, futility analysis is conservative: \(\Pr(R_1=0|\gamma=p_0)=0.210\), i.e. almost 80% chance of enrolling max possible patients under \(H_0\)

  • Possible refinement: new stage 1 stopping rule: \(R_1>r_1\). Tune \(\{r_1,n_1\}\) to stop when \(H_0: \gamma = p_0\) appears likely

Refined two-stage design: First try

#Adjust your path as necessary
source("/Users/philb/Desktop/Work/Teaching/TwoStageSim.R");
#Require more than 2/10 responses
twostage_oc(r1 = 2, n1 = 10, r = 14, n = 50,
            p = seq(0.2, 0.45, by = 0.05));
##      Pr(Early Term) Pr(!Reject H0) Pr(Reject H0) Avg. Enrolled
## 0.2          0.6778        0.28060        0.0416          22.9
## 0.25         0.5256        0.29537        0.1790          29.0
## 0.3          0.3828        0.20413        0.4131          34.7
## 0.35         0.2616        0.09291        0.6455          39.5
## 0.4          0.1673        0.02798        0.8047          43.3
## 0.45         0.0996        0.00555        0.8949          46.0

Refined two-stage design: Second try

#Low power: increase r1, n1
twostage_oc(r1 = 4, n1 = 20, r = 14, n = 50,
            p = seq(0.2, 0.45, by = 0.05));
##      Pr(Early Term) Pr(!Reject H0) Pr(Reject H0) Avg. Enrolled
## 0.2          0.6296        0.31765        0.0527          31.1
## 0.25         0.4148        0.36170        0.2235          37.6
## 0.3          0.2375        0.25854        0.5040          42.9
## 0.35         0.1182        0.11951        0.7623          46.5
## 0.4          0.0510        0.03627        0.9128          48.5
## 0.45         0.0189        0.00722        0.9739          49.4

Refined two-stage design: Third try

#Too blunt: increase r1, n1 more to stop more often
twostage_oc(r1 = 7, n1 = 30, r = 14, n = 50,
            p = seq(0.2, 0.45, by = 0.05));
##      Pr(Early Term) Pr(!Reject H0) Pr(Reject H0) Avg. Enrolled
## 0.2          0.7608        0.18575        0.0535          34.8
## 0.25         0.5143        0.25839        0.2273          39.7
## 0.3          0.2814        0.20523        0.5134          44.4
## 0.35         0.1238        0.10066        0.7756          47.5
## 0.4          0.0435        0.03166        0.9248          49.1
## 0.45         0.0121        0.00645        0.9814          49.8

Process formalized by Simon

  • Goal: identify design set \(\{r_1, n_1, r, n\}\) that satisfies type I error (\(\alpha\)) and power (\(1-\beta\)) constraints. Lots of such sets exist

  • Simon proposed two designs that are best by some definition (Simon, 1989)

  • More than 2800 citations currently

Optimal Two-stage Design

\[ \begin{aligned} N &= n_1\mathbf{1}_{\{R_1\leq r_1\}}+n\,\mathbf{1}_{\{R_1>r_1\}}\\ \text{Avg. Enrolled} &= E[N|\gamma=p_0]\\ &= n_1\Pr(R_1\leq r_1 |\gamma = p_0)+n\Pr(R_1>r_1|\gamma = p_0) \end{aligned} \]

  • Among all sets of \(\{r_1, n_1, r, n\}\) satisfying type I error (\(\alpha\)) and power (\(1-\beta\)) constraints, identify set that minimizes \(E[N|\gamma=p_0]\)

  • Minimize number of subjects enrolled when trial should not have been run
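Plugging in Simon's optimal design for these hypotheses (\(r_1=5\), \(n_1=22\), \(n=72\), from the `ph2simon` output below) illustrates the formula:

```r
# E[N | gamma = p0] = n1*Pr(R1 <= r1 | p0) + n*Pr(R1 > r1 | p0)
r1 = 5; n1 = 22; n = 72; p0 = 0.20;
pet = pbinom(r1, n1, p0);   # Pr(early termination | p0), ~0.733
n1 * pet + n * (1 - pet);   # expected enrollment under H0, ~35.4
```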

Minimax Two-stage Design

  • Among all sets of \(\{r_1, n_1, r, n\}\) satisfying type I error (\(\alpha\)) and power (\(1-\beta\)) constraints, identify set that minimizes \(n\)

  • Aimed at identifying effective agents using as few patients as possible

Implemented in clinfun package (Seshan, 2015)

library(clinfun);
ph2simon(pu = 0.20, pa = 0.35, ep1 = 0.05, ep2 = 0.20);
## 
##  Simon 2-stage Phase II design 
## 
## Unacceptable response rate:  0.2 
## Desirable response rate:  0.35 
## Error rates: alpha =  0.05 ; beta =  0.2 
## 
##         r1 n1  r  n EN(p0) PET(p0)
## Optimal  5 22 19 72  35.37  0.7326
## Minimax  6 31 15 53  40.44  0.5711
  • Compared to the one-stage design (\(n=50\); \(r=14\)), there is a cost in terms of the number of patients

Should match our previous results

#Optimal design
twostage_oc(r1 = 5, n1 = 22, r = 19, n = 72,
            p = c(0.2,0.35));
##      Pr(Early Term) Pr(!Reject H0) Pr(Reject H0) Avg. Enrolled
## 0.2           0.733         0.2183        0.0491          35.4
## 0.35          0.163         0.0366        0.8005          63.9
#Minimax design
twostage_oc(r1 = 6, n1 = 31, r = 15, n = 53,
            p = c(0.2,0.35));
##      Pr(Early Term) Pr(!Reject H0) Pr(Reject H0) Avg. Enrolled
## 0.2          0.5711          0.379        0.0498          40.4
## 0.35         0.0462          0.152        0.8017          52.0

How stable are these designs?

Simon's Designs are not only options

  • Here is a set that is nearly minimax and nearly optimal:
twostage_oc(r1 = 6, n1 = 27, r = 16, n = 58,
            p = c(0.2,0.35));
##      Pr(Early Term) Pr(!Reject H0) Pr(Reject H0) Avg. Enrolled
## 0.2           0.713         0.2371        0.0495          35.9
## 0.35          0.115         0.0846        0.8007          54.4

Admissible Two-Stage Designs

  • Consider expected cost function for any possible design set satisfying \(\{\alpha,\beta\}\) constraints given by

\[ \begin{aligned} C(q,\{r_1, n_1, r, n\}) = q n + (1-q)E[N|\gamma=p_0] \end{aligned} \]

  • It is a weighted average of the maximum sample size and the expected sample size under \(H_0\)

  • Design set \(\{r_1, n_1, r, n\}\) is called admissible if it achieves smallest possible cost across all possible design sets for at least one \(q\in[0,1]\).
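As a quick sketch (using the maximum and expected sample sizes of Simon's optimal and minimax designs for our running example), the two cost lines cross somewhere between \(q=0\) and \(q=1\):

```r
# C(q) = q*n + (1-q)*E[N | p0]; smaller cost is better
cost = function(q, n, en) q * n + (1 - q) * en;
q = c(0, 0.5, 1);
rbind(optimal = cost(q, n = 72, en = 35.37),
      minimax = cost(q, n = 53, en = 40.44));
```

At \(q=0\) the optimal design has the smaller cost; at \(q=1\) the minimax design does; intermediate \(q\) can favor other admissible designs.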

Admissible Two-Stage Designs

  • Simon's optimal design is admissible because it minimizes risk when \(q=(?)\)
  • Simon's minimax design is admissible because it minimizes risk when \(q=(?)\)
  • There are usually other admissible designs that minimize risk for \(q\in(0,1)\)
  • Outlined in (Jung et al., 2004)

Implemented in ph2mult package (Zhu and Qin, 2016)

Code is buggy

library(ph2mult);
binom.design(type = "admissible", p0 = 0.20, p1 = 0.35, 
             signif.level = 0.05, power.level = 0.8, plot.out = T);

##              r1 n1  r  n EN.p0. PET.p0.  error power
## Optimal       5 22 19 72   35.4   0.733 0.0491 0.800
## Admissible    4 20 17 62   35.6   0.630 0.0473 0.800
## Admissible.1  6 27 16 58   35.9   0.713 0.0495 0.801
## Minimax       6 31 15 53   40.4   0.571 0.0498 0.802
twostage_oc(7,36,17,59,p=c(0.2,0.35));#I added this snippet
##      Pr(Early Term) Pr(!Reject H0) Pr(Reject H0) Avg. Enrolled
## 0.2          0.5660          0.398         0.036          46.0
## 0.35         0.0332          0.166         0.800          58.2

Discussion

  • Phase 2 trials are most often futility trials. Goal is to prune inefficacious treatments

  • Could trial stop for efficacy at stage 1? Should it?

  • Different implications from stopping early because \(R_1\leq r_1\) and failing to reject \(H_0\) because \(R \leq r\)

  • Nothing special about two stages. More stages possible

Discussion, cont'd

  • Strict sample sizes and pre-specified interim analyses may be difficult to adhere to in practice

  • Causes trial conduct to deviate from trial design and potentially invalidates type I error and power properties: inference is conditional on following design as it is laid out

Inference depends on design: Ex 1

  • Suppose that we conduct minimax design with \(p_0=0.20\), \(p_1=0.35\), \(\alpha = 0.05\), and \(1-\beta = 0.80\): \(\{r_1,n_1,r,n\} = \{6, 31, 15, 53\}\)

  • We successfully conclude the trial with \(R = 16 > r = 15\) responders.

  • Could report \(p\)-value:

\[ \begin{aligned} p &= \Pr(R\geq16, R_1>6|\gamma = p_0)\\ &= \sum_{x = 7}^{31} \Pr(R > 15, R_1 = x|\gamma = p_0)\\ &= \sum_{x = 7}^{31} \Pr(R-R_1 > 15 - x, R_1 = x|\gamma = p_0)\\ &= \sum_{x = 7}^{31} \Pr(R-R_1 > 15 - x|\gamma = p_0)\times\\ &\quad\quad\quad\Pr(R_1 = x|\gamma = p_0) \end{aligned} \]

r1 = 6; n1=31; r=15; n=53; p0=0.20;R = 16;
sum(pbinom(R-1-((r1+1):n1), n - n1, p0, lower = F) *
                dbinom((r1+1):n1, n1, p0));
## [1] 0.0498
  • Different from \(p\)-value of same data from one-stage design:
#Pr(R > 15|n,p0)
pbinom(R-1, n, p0, lower = F);
## [1] 0.0512
#1  - Pr(R <= 15|n,p0)
1 - pbinom(R-1, n, p0);
## [1] 0.0512

Inference depends on design: Ex 2

Adapted from (Lindley and Phillips, 1976). Two designs for testing \(H_0:\gamma = 0.2\)

Design 1: Enroll \(n=25\) patients. \(R\) is binomial. Observe \(R=8\) responses; \(p\)-value is \[\Pr(R\geq 8|\gamma=0.2) = \sum_{x=8}^{25} \binom{25}{x} 0.2^x 0.8^{25-x}\]

pbinom(7,25,0.2,lower=F);#1-binomial CDF
## [1] 0.109

Design 2: Enroll patients until \(f=17\) non-responders. \(R\) is negative binomial. Observe \(R=8\) responses; \(p\)-value is \[\Pr(R\geq 8|\gamma=0.2) = \sum_{x=8}^{\infty}\binom{x+16}{x}0.2^x 0.8^{17} \]

pnbinom(7,17,1-0.2,lower=F);#1 - neg-binom CDF 
## [1] 0.0892
  • If \(\alpha=0.10\) were significance threshold, then same data (but different designs) yield different conclusions

Likelihood principle

  • In both examples, the likelihood of the final data was the same for both designs, up to a constant not involving \(\gamma\)

  • Frequentist inference is based on what could have occurred and not just what did occur

  • This violates the likelihood principle: inference should be based only upon the likelihood function, not the design
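Example 2 makes this concrete: the binomial and negative binomial likelihoods of the observed data are proportional, with constant ratio \(\binom{25}{8}\big/\binom{24}{8} = 25/17\) at every value of \(\gamma\):

```r
# Likelihood ratio of the same data under the two designs, across gamma
gam = seq(0.05, 0.95, by = 0.15);
dbinom(8, size = 25, prob = gam) / dnbinom(8, size = 17, prob = 1 - gam);
# constant at choose(25, 8) / choose(24, 8) = 25/17 for every gamma
```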

References

Flaherty, K.T., Puzanov, I., Kim, K.B., Ribas, A., McArthur, G.A., Sosman, J.A., et al. (2010) Inhibition of mutated, activated BRAF in metastatic melanoma. New England Journal of Medicine, 363, 809–819.

Gehan, E.A. (1961) The determination of the number of patients required in a preliminary and a follow-up trial of a new chemotherapeutic agent. Journal of Chronic Diseases, 13, 346–353.

Jung, S.-H., Lee, T., Kim, K. and George, S.L. (2004) Admissible two-stage designs for phase II cancer clinical trials. Statistics in Medicine, 23, 561–569.

Lindley, D.V. and Phillips, L. (1976) Inference for a Bernoulli process (a Bayesian view). The American Statistician, 30, 112–119.

Piantadosi, S. (2017) Clinical Trials: A Methodologic Perspective, 3rd ed. John Wiley & Sons.

Seshan, V.E. (2015) clinfun: Clinical Trial Design and Data Analysis Functions. R package.

Simon, R. (1989) Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials, 10, 1–10.

Zhu, Y. and Qin, R. (2016) ph2mult: Phase II Clinical Trial Design for Multinomial Endpoints. R package.