Phase 1 wrap up

  • Initial look at safety of drug

  • Goal is to identify dose level carried forward in future studies of its efficacy (phase 2, phase 3)

  • In dose escalation studies, this dose level was sometimes called maximum tolerated dose

  • Designs depend upon context of drug and disease

Dose escalation designs that we learned

  • 3+3 / Storer's A Design

  • Biased Coin Design

  • Modified Toxicity Probability Interval (MTPI)

  • Continual Reassessment Method (CRM)

Dose escalation designs that we learned

Characteristic | 3+3 | Biased Coin | MTPI | CRM
Description | Cohorts of 3 | De-esc (if DLT) / Stay/Esc (Biased coin) | De-esc/Stay/Esc (Based upon Posterior Dist) | Dose-toxicity regression curve
MTD | Largest dose below observed 33% DLT rate | Post-trial analysis (e.g. isotonic regression, logistic regression) to find dose level closest to target | Post-trial analysis to find dose level closest to target | Dose-toxicity regression curve to find dose level closest to target
Sample size | Integrated into design | Pre-specified | Pre-specified | Pre-specified
Convergence to correct dose assignment | No | No | Yes | Yes
Convergence to correct MTD | No | Yes | Yes | Yes

Phase 2 introduction

  • Looking for evidence of activity, with respect to what is already available to patients

  • "Phase 2" is convenient word but loosely defined

  • Piantadosi uses term middle development

Why is phase 2 needed?

Taken from (Piantadosi, 2017), p373

  • First evidence of activity using meaningful clinical outcome
  • Longer-term evaluation of safety
  • Feasibility of administering new treatment
  • Fine-tune dose, schedule
  • Practice for definitive study
  • Determine whether definitive study is warranted
  • "Depressure pipeline"

When is phase 2 step most critical?

Factors in favor of Phase 2 | Factors in favor of skipping Phase 2
Many competing experimental therapies in pipeline | Lack of available therapies for disease (either experimental or approved)
Pessimistic about likelihood of success | Optimistic about likelihood of success
Availability of short-term, easily measured surrogate efficacy outcomes | Efficacy usefully measured by hard clinical outcomes
Highly prevalent disease | Rare disease
Opportunity cost for failed phase 3 trial would be high | Opportunity cost for time required to do phase 2 is high

Drug Development Pipeline

  • Let \(W\) denote drugs that are worthwhile, \(W^C\) denote drugs that are not
  • Illustration of phase 2 goal:

Drug Development Pipeline

  • Let \(\Pr_\text{ph2}(W)\) denote proportion of drugs entering phase 2 stage that are truly "worthwhile"
  • Let \(\alpha\), \(\beta\) denote nominal type I and type II error rates in phase 2
  • Let \(S_2\) indicate that a drug is selected in phase 2 pipeline
  • From Bayes' rule, the probability that a drug selected in phase 2 is truly worthwhile (the positive predictive value) is: \[ \begin{aligned} \Pr(W|S_2) &= \dfrac{\Pr(S_2|W)\Pr_\text{ph2}(W) }{\Pr(S_2|W)\Pr_\text{ph2}(W) + \Pr(S_2|W^C)\Pr_\text{ph2}(W^C) }\\ &= \dfrac{(1-\beta)\Pr_\text{ph2}(W) }{(1-\beta)\Pr_\text{ph2}(W) + \alpha(1-\Pr_\text{ph2}(W)) } \end{aligned} \]

Drug Development Pipeline

  • If 2% of drugs entering phase 2 study are truly worthwhile, \(\alpha = 0.05\), and \(\beta = 0.20\), then \((0.80*0.02)/(0.80*0.02 + 0.05*0.98)\approx 25\%\) of drugs leaving phase 2 study, i.e. entering phase 3 study, are truly worthwhile

Drug Development Pipeline

  • Phase 2 goal is to enrich phase 3 population with worthwhile drugs so that positive findings in phase 3 are highly likely to be true positives:

  • If 25% of drugs entering phase 3 study are truly worthwhile, \(\alpha = 0.05\), and \(\beta = 0.20\), then \((0.80*0.25)/(0.80*0.25 + 0.05*0.75)\approx 84\%\) of drugs leaving phase 3 study, i.e. submitted for regulatory approval, are truly worthwhile
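
The two pipeline calculations above can be reproduced with a short R sketch. The function name `ppv` and its argument `prior` are illustrative choices, not taken from the source.

```r
# Positive predictive value of one development phase via Bayes' rule.
# `ppv` and `prior` are illustrative names (assumptions, not from the source).
ppv = function(prior, alpha = 0.05, beta = 0.20) {
  (1 - beta) * prior / ((1 - beta) * prior + alpha * (1 - prior))
}
ph2 = ppv(prior = 0.02)  # share of worthwhile drugs among those leaving phase 2
ph3 = ppv(prior = 0.25)  # share leaving phase 3, using the rounded 25% from above
round(c(phase2 = ph2, phase3 = ph3), 3)
```

Varying `prior`, `alpha`, or `beta` shows how strongly the enrichment depends on the quality of the pipeline entering each phase.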

General dichotomy of phase 2 designs

  • Phase 2A: any evidence of activity? Lower threshold, single arm, fewer patients

  • Phase 2B: evidence of greater efficacy? Higher threshold, randomized (multiple arms), more patients

  • In reality, blurry distinction between these two

Statistical setup for single arm phase 2

  • All patients enrolled to new therapy. Followed for outcome ('response')

  • Response often (forced to be) binary, \(Y\in\{0,1\}\).

  • May take time to occur, e.g. tumor shrinkage of X% by 3 months, clinical improvement of symptoms by 6 months. Should not take too long

Statistical setup for single arm phase 2

  • Two motivating questions:
  1. What is current response rate to best available therapy? (\(\color{orangeish}{p_0}\); historical)
  2. What response rate would suggest activity? (\(\color{greenish}{p_1}\))

Hypothesis testing setup

\[\gamma = \Pr(\text{response}) = \Pr(Y=1)\] \[ \begin{aligned} H_0: \gamma = \color{orangeish}{p_0 \text{ uninteresting scenario}}\\ H_1: \gamma = \color{greenish}{p_1 \text{ interesting scenario}} \end{aligned} \]

  • Reject \(H_0\) \(\Leftrightarrow\) conclude further study warranted

  • Do not reject \(H_0\) \(\Leftrightarrow\) conclude no further study warranted

Hypothesis testing setup

Decision | \(\color{orangeish}{H_0 \text{ True}}\) | \(\color{greenish}{H_1 \text{ True}}\)
Reject \(H_0\) | Type I Error | True Positive
Do Not Reject \(H_0\) | True Negative | Type II Error
  • One-sided hypothesis test.
  • Estimate \(\gamma\) with \(\hat\gamma = R/n\), \(R\) is number of responses
  • Reject \(H_0\) if \(R/n > r/n\) for some constant \(r\)
  • What sample size \(n\) is required?

Normal-based sample size

Assume that: \[ \begin{aligned} \hat\gamma | H_0 \sim N(p_0, \sigma^2_0); \hat\gamma | H_1 \sim N(p_1, \sigma^2_1) \end{aligned} \]

Normal-based sample size

\[ \begin{aligned} p_1 - p_0 = |p_0 - r/n| + |p_1 - r/n| \end{aligned} \]

Normal-based sample size

\[ \begin{aligned} |p_0 - r/n| + |p_1 - r/n| = c_1\sigma_0 + c_2\sigma_1 \end{aligned} \]

Normal quantiles

  • \(z_{1-\alpha}\) is smallest \(x\) such that \(\Pr(Z\leq x)\equiv\Phi(x) \geq 1-\alpha\)
  • \(z_{\beta}\) is smallest \(x\) such that \(\Phi(x) \geq \beta\)
  • \(z_{1-\beta}\) is smallest \(x\) such that \(\Phi(x) \geq 1-\beta\)

Claim: \(z_{1-\beta}=-z_\beta\)
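
The claim follows from the symmetry of the standard normal distribution about zero, \(\Phi(-x) = 1-\Phi(x)\). A quick numerical check in R, where the value of `beta` is only an example:

```r
beta = 0.20
z_upper = qnorm(1 - beta)  # z_{1-beta}
z_lower = qnorm(beta)      # z_{beta}
# TRUE if z_{1-beta} equals -z_{beta} up to numerical tolerance
isTRUE(all.equal(z_upper, -z_lower))
```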

Normal-based sample size

\[ \begin{aligned} c_1\sigma_0 + c_2\sigma_1 &= z_{1-\alpha} \sigma_0 + (- z_{\beta}) \sigma_1\\ &= z_{1-\alpha} \sigma_0 + z_{1-\beta} \sigma_1 \end{aligned} \]

Normal-based sample size

\[ \begin{aligned} p_1 - p_0 &= |p_0 - r/n| + |p_1 - r/n|\\ &= c_1\sigma_0 + c_2\sigma_1 \\ &= z_{1-\alpha} \sigma_0 + z_{1-\beta} \sigma_1\\ &\approx z_{1-\alpha} \sqrt{\dfrac{p_0(1-p_0)}{n}} + z_{1-\beta} \sqrt{\dfrac{p_1(1-p_1)}{n}}\\ \Rightarrow n &\approx \left(\dfrac{z_{1-\alpha} \sqrt{p_0(1-p_0)} + z_{1-\beta} \sqrt{p_1(1-p_1)}}{p_1-p_0}\right)^2 \end{aligned} \]

Applying formula

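simp_sampsize = function(p0, p1, 
                         alpha=0.05, beta=0.20) {
  #Normal-approximation sample size, rounded up
  ceiling((qnorm(1-alpha) * sqrt(p0*(1-p0)) + 
             qnorm(1-beta) * sqrt(p1*(1-p1)))^2 /
            (p1-p0)^2);
}
#Rows index p0, columns index p1
x = matrix(mapply(simp_sampsize, 
                  p0 = rep((1:6)/10,each=6), 
                  p1 = rep((2:7)/10,times=6)),
           nrow = 6, byrow = TRUE,
           dimnames = list((1:6)/10, (2:7)/10));
x[lower.tri(x)] = NA;#p1 <= p0: not applicable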

Applying formula

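\(p_0\) \ \(p_1\) | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7
0.1 | 69 | 20 | 10 | 6 | 4 | 3
0.2 |  | 109 | 29 | 13 | 8 | 5
0.3 |  |  | 136 | 35 | 16 | 9
0.4 |  |  |  | 151 | 38 | 16
0.5 |  |  |  |  | 153 | 37
0.6 |  |  |  |  |  | 142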
  • \(p_0\) along rows; \(p_1\) along columns;
  • \(\alpha = 0.05\); \(\beta = 0.20\)
  • \(n \propto (p_0 - p_1)^{-2}\)
  • Infeasible to test for improvement of 0.1 or less

Comments on formula

  • Large-sample approximation based on normality assumption. Actually easy to calculate "exact" sample size directly using binomial distribution (hopefully get similar results)

  • Phase 2 trials have limited budgets – negotiations are inevitable. One of your jobs is to communicate ramifications of different choices
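
As a rough illustration of the "exact" calculation mentioned above, the sketch below searches upward for the first \(n\) whose exact binomial test meets both error constraints. The name `exact_sampsize` and the cap `n_max` are assumptions made for this example. Because the binomial distribution is discrete, power is not monotone in \(n\), so a slightly larger \(n\) can fail the constraints again.

```r
# First n (searching upward) whose exact one-sided binomial test satisfies
# type I error <= alpha and power >= 1 - beta.
# `exact_sampsize` and `n_max` are illustrative names, not from the source.
exact_sampsize = function(p0, p1, alpha = 0.05, beta = 0.20, n_max = 200) {
  for (n in 1:n_max) {
    r = qbinom(1 - alpha, n, p0)                  # smallest r with Pr(R > r | p0) <= alpha
    type1 = pbinom(r, n, p0, lower.tail = FALSE)  # exact type I error
    power = pbinom(r, n, p1, lower.tail = FALSE)  # exact power
    if (power >= 1 - beta) {
      return(c(n = n, r = r, type1 = type1, power = power))
    }
  }
  stop("no design found with n <= n_max")
}
ex = exact_sampsize(p0 = 0.20, p1 = 0.35)
ex
```

Comparing `ex` with the normal-based answer is one way to show collaborators what the approximation costs or saves.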

Determining \(r\)

  • Reject \(H_0\) if \(R/n > r/n\), that is, if sufficiently many patients respond

  • If \(p_0=0.20\), \(p_1=0.35\), then \(n=50\) (assuming \(\alpha = 0.05\), \(\beta = 0.20\)), and:

\(r\) | \(\Pr(R>r \mid \gamma = 0.20)\) | \(\Pr(R>r \mid \gamma = 0.35)\)
10 | 0.416 | 0.984
11 | 0.289 | 0.966
12 | 0.186 | 0.934
13 | 0.111 | 0.884
14 | 0.061 | 0.812
15 | 0.031 | 0.720
16 | 0.014 | 0.611

Determining \(r\)

  • So if 15 or more responses (out of \(n=50\)), move to phase 3

  • Reality check: with exactly 15 responses out of 50 (\(\hat\gamma = 0.30\)), 95% score-based confidence interval for \(\gamma\) is \((0.19, 0.44)\). Evidence not overwhelming.
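
The table of candidate thresholds and the score interval can be reproduced along these lines. This is a sketch: it uses `prop.test()` without continuity correction to obtain the score (Wilson) interval, and the object names are illustrative.

```r
n = 50; p0 = 0.20; p1 = 0.35
r = 10:16
# Pr(R > r) under the null and the alternative response rates
tab = cbind(r = r,
            type1 = pbinom(r, n, p0, lower.tail = FALSE),
            power = pbinom(r, n, p1, lower.tail = FALSE))
round(tab, 3)
# Score (Wilson) interval for 15 responses out of 50
ci = prop.test(x = 15, n = n, correct = FALSE)$conf.int
round(ci, 2)
```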

Case Study: (Flaherty et al., 2010)

  • Background: BRAF mutation, metastatic melanoma

  • Objectives: safety, PK, RP2D, response rate, duration of response, rate of progression

  • Phase 1 (3+3) + Phase 2 (one arm, single stage)

Case Study: (Flaherty et al., 2010)

  • Enrollment:

    • Dose escalation: any solid tumors (mostly BRAF-mutated melanoma)
    • Dose extension: only BRAF-mutated melanoma

Case Study: (Flaherty et al., 2010)

  • Sample size:

    • Unclear calculations: "we calculated that a sample of 32 patients would provide 95% confidence (\(\alpha\) = 0.05), with 80% power (\(\beta\) = 0.20), that an observed response rate of 40% would be consistent with a true response rate of more than 10%, which was considered justification for further study."
    • \(H_0: \gamma = 0.10\)
    • What is \(H_1\)?
    • Reject \(H_0\) if \(R/32> 0.4 \approx 12/32\)
pbinom(12,32,0.1,lower=F);#type I error
## [1] 5.51e-06

Case Study: (Flaherty et al., 2010)

  • Results:

    1. Failed initial dose escalation (crystalline formulation had no biological impact) (26 patients)
    2. Reformulated agent into capsules; ran second dose escalation (29 patients)

      1. 1/7 DLTs at 720mg
      2. 4/7 DLTs at 1120mg
      3. Interpolated new dose: 960mg
    3. Extension phase at 960mg (32 patients)

      1. 26/32 partial or complete responders;
      2. 2/32 complete responders

Case Study: (Flaherty et al., 2010)

  • Many on-the-fly changes. Some avoidable
  • Significant changes to treatment landscape since 2010

    • Three approved melanoma therapies at time of publication
    • This plus 10 more have been approved since

Stopping for futility

  • In one-stage design, all patients enrolled before decision

  • What if no responses after 5, 10, 15 patients? Worthwhile, ethical to continue?

  • Motivation for interim futility analyses

    • futile = no use in trying

    • lack of evidence for activity

Simple futility analysis

  • First stage assesses likelihood of no responses under \(H_1\), e.g.
Number of patients enrolled | \(\Pr(R=0 \mid \gamma = 0.35)\)
1 | 0.650
2 | 0.423
3 | 0.275
4 | 0.179
5 | 0.116
6 | 0.075
7 | 0.049
8 | 0.032
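
These probabilities are simply \((1-0.35)^k\) for \(k\) enrolled patients. A minimal R sketch, with illustrative object names:

```r
p1 = 0.35
n_enrolled = 1:8
# Probability of zero responses among the first k patients if gamma = p1
pr_none = dbinom(0, size = n_enrolled, prob = p1)
round(setNames(pr_none, n_enrolled), 3)
```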

Two-stage design (Gehan, 1961)

  1. Enroll initial cohort of \(n_1\) patients. Stop for futility if no responders

  2. Otherwise, enroll remaining \(n-n_1\) patients, conduct standard hypothesis test at end

Gehan originally proposed \(n_1=14\) based upon \(H_1: \gamma = 20\%\). Circa 1960 (early chemotherapeutic era), 20% response rate was impressive. We use \(n_1=7\) based upon larger 35% response rate
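
One way to recover both first-stage sizes is to look for the smallest \(n_1\) at which seeing no responders has probability at most 0.05 under \(H_1\). The helper name `gehan_n1` and the 0.05 cutoff are assumptions for this sketch; the cutoff is chosen because it reproduces the values quoted above.

```r
# Smallest n1 with Pr(no responses in n1 patients | gamma = p1) <= cutoff.
# `gehan_n1` and `cutoff` are illustrative; cutoff = 0.05 is an assumption.
gehan_n1 = function(p1, cutoff = 0.05) {
  ceiling(log(cutoff) / log(1 - p1))
}
gehan_n1(0.20)  # 14, Gehan's original setting
gehan_n1(0.35)  # 7, the setting used in these slides
```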

Comparison of one-stage, modified Gehan designs

One stage

  • Continue to phase 3 if \(R>r\) responses out of \(n\) patients
n = 50; r = 14; p0 = 0.20; p1 = 0.35;
#Type I error = \Pr(R>r|\gamma = p0)
pbinom(q = r,size = n,prob = p0,lower.tail = F);
## [1] 0.0607
#Power = \Pr(R>r|\gamma = p1)
pbinom(q = r,size = n,prob = p1,lower.tail = F);
## [1] 0.812

Comparison of one-stage, modified Gehan designs

Modified Gehan design

  • Continue if \(R_1>0\) (first stage), \(R>r\) (second stage) \[ \begin{aligned} \text{Type I error} &= \Pr(R>r, R_1>0|\gamma = p_0)\\ &= \sum_{x = 1}^{n_1} \Pr(R > r, R_1 = x|\gamma = p_0)\\ &= \sum_{x = 1}^{n_1} \Pr(R-R_1 > r - x, R_1 = x|\gamma = p_0)\\ &= \sum_{x = 1}^{n_1} \Pr(R-R_1 > r - x|\gamma = p_0)\times\\ &\quad\quad\quad\Pr(R_1 = x|\gamma = p_0) \end{aligned} \]

Comparison of one-stage, modified Gehan designs

Modified Gehan design

n1=7; n=50; r=14; p0=0.20; p1=0.35;
(summand1 = pbinom(r-(1:n1), n - n1, p0, lower = F));
## [1] 0.0362 0.0733 0.1355 0.2289 0.3533 0.4997 0.6503
(summand2 = dbinom(1:n1, n1, p0));
## [1] 3.67e-01 2.75e-01 1.15e-01 2.87e-02 4.30e-03 3.58e-04 1.28e-05
#Type I error
sum(summand1 * summand2);
## [1] 0.0573
#Power
sum(pbinom(r-(1:n1), n - n1, p1, lower = F) * 
      dbinom(1:n1, n1, p1))
## [1] 0.785

Comparison of one-stage, modified Gehan designs

Design | Type I Error | Power | Pr(Early Termination) | E[Enrollment]
One Stage | 0.061 | 0.812 | 0.00 | 50
Modified Gehan | 0.057 | 0.785 | 0.21 | 41
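
The last two columns for the modified Gehan design are evaluated under \(H_0\) and follow from the first-stage rule alone. A short sketch, with illustrative object names `pet` and `en`:

```r
n1 = 7; n = 50; p0 = 0.20
pet = dbinom(0, n1, p0)         # Pr(R1 = 0 | gamma = p0): early termination
en  = n1 * pet + n * (1 - pet)  # expected enrollment under H0
round(c(PET = pet, EN = en), 2)
```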


  • Tradeoff: enroll 9 fewer patients (on average, under \(H_0\)) for slight loss of power (0.812 to 0.785)

  • However, futility analysis is conservative: \(\Pr(R_1=0|\gamma=p_0)=0.210\), i.e. almost 80% chance of enrolling max possible patients under \(H_0\)

  • Possible refinement: new stage 1 rule: continue only if \(R_1>r_1\), i.e. stop for futility if \(R_1\leq r_1\). Tune \(\{r_1,n_1\}\) to stop when \(H_0: \gamma = p_0\) appears likely

Refined two-stage design: First try

#Adjust your path as necessary
#Require more than 2/10 responses
twostage_oc(r1 = 2, n1 = 10, r = 14, n = 50,
            p = seq(0.2, 0.45, by = 0.05));
##      Pr(Early Term) Pr(!Reject H0) Pr(Reject H0) Avg. Enrolled
## 0.2          0.6778        0.28060        0.0416          22.9
## 0.25         0.5256        0.29537        0.1790          29.0
## 0.3          0.3828        0.20413        0.4131          34.7
## 0.35         0.2616        0.09291        0.6455          39.5
## 0.4          0.1673        0.02798        0.8047          43.3
## 0.45         0.0996        0.00555        0.8949          46.0
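
The helper `twostage_oc` is called here, but its definition does not appear in these slides. The sketch below is a hypothetical stand-in named `twostage_oc_sketch`, which computes the same four operating characteristics from binomial probabilities by extending the modified Gehan calculation to a general first-stage threshold \(r_1\). It is not the original function.

```r
# Hypothetical stand-in for twostage_oc(); the original source is not shown.
# Stop after stage 1 if R1 <= r1; otherwise reject H0 at the end if R > r.
twostage_oc_sketch = function(r1, n1, r, n, p) {
  out = t(sapply(p, function(g) {
    x = (r1 + 1):n1                      # stage 1 counts that continue
    early = pbinom(r1, n1, g)            # Pr(R1 <= r1)
    reject = sum(dbinom(x, n1, g) *
                   pbinom(r - x, n - n1, g, lower.tail = FALSE))
    c(early, 1 - early - reject, reject, n1 * early + n * (1 - early))
  }))
  dimnames(out) = list(p, c("Pr(Early Term)", "Pr(!Reject H0)",
                            "Pr(Reject H0)", "Avg. Enrolled"))
  out
}
oc = twostage_oc_sketch(r1 = 2, n1 = 10, r = 14, n = 50,
                        p = seq(0.2, 0.45, by = 0.05))
signif(oc, 4)
```

Here "Pr(!Reject H0)" is the probability of reaching the second stage and then failing to reject, so the first three columns sum to one in every row.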

Refined two-stage design: Second try

#Low power: increase r1, n1
twostage_oc(r1 = 4, n1 = 20, r = 14, n = 50,
            p = seq(0.2, 0.45, by = 0.05));
##      Pr(Early Term) Pr(!Reject H0) Pr(Reject H0) Avg. Enrolled
## 0.2          0.6296        0.31765        0.0527          31.1
## 0.25         0.4148        0.36170        0.2235          37.6
## 0.3          0.2375        0.25854        0.5040          42.9
## 0.35         0.1182        0.11951        0.7623          46.5
## 0.4          0.0510        0.03627        0.9128          48.5
## 0.45         0.0189        0.00722        0.9739          49.4

Refined two-stage design: Third try

#Too blunt: increase r1, n1 more to stop more often
twostage_oc(r1 = 7, n1 = 30, r = 14, n = 50,
            p = seq(0.2, 0.45, by = 0.05));
##      Pr(Early Term) Pr(!Reject H0) Pr(Reject H0) Avg. Enrolled
## 0.2          0.7608        0.18575        0.0535          34.8
## 0.25         0.5143        0.25839        0.2273          39.7
## 0.3          0.2814        0.20523        0.5134          44.4
## 0.35         0.1238        0.10066        0.7756          47.5
## 0.4          0.0435        0.03166        0.9248          49.1
## 0.45         0.0121        0.00645        0.9814          49.8

Process formalized by Simon

  • Goal: identify design set \(\{r_1, n_1, r, n\}\) that satisfies type I error (\(\alpha\)) and power (\(1-\beta\)) constraints. Lots of such sets exist

  • Simon proposed two designs that are best by some definition (Simon, 1989)

  • More than 2800 citations currently

Optimal Two-stage Design

\[ \begin{aligned} N &= n_1\,\mathbf{1}_{\{R_1\leq r_1\}}+n\,\mathbf{1}_{\{R_1>r_1\}}\\ \text{Avg. Enrolled} &= E[N|\gamma=p_0]\\ &= n_1\Pr(R_1\leq r_1 |\gamma = p_0)+n\Pr(R_1>r_1|\gamma = p_0) \end{aligned} \]

  • Among all sets of \(\{r_1, n_1, r, n\}\) satisfying type I error (\(\alpha\)) and power (\(1-\beta\)) constraints, identify set that minimizes \(E[N|\gamma=p_0]\)

  • Minimize number of subjects enrolled when trial should not have been run
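
The criterion \(E[N|\gamma=p_0]\) is straightforward to evaluate for any candidate design. As an illustration, the sketch below computes it for the three trial-and-error designs from the earlier slides; the helper name `expected_n` is an assumption for this example. A full search for the optimal design would also have to check the \(\alpha\) and power constraints, which this sketch does not do.

```r
# E[N | gamma = p0] for a two-stage design that stops when R1 <= r1.
# `expected_n` is an illustrative helper, not from the source.
expected_n = function(r1, n1, n, p0) {
  pet = pbinom(r1, n1, p0)  # Pr(R1 <= r1 | gamma = p0)
  n1 * pet + n * (1 - pet)
}
# First, second, and third tries: (r1, n1) = (2, 10), (4, 20), (7, 30), n = 50
en = mapply(expected_n, r1 = c(2, 4, 7), n1 = c(10, 20, 30),
            MoreArgs = list(n = 50, p0 = 0.20))
round(en, 1)
```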

Minimax Two-stage Design

  • Among all sets of \(\{r_1, n_1, r, n\}\) satisfying type I error (\(\alpha\)) and power (\(1-\beta\)) constraints, identify set that minimizes \(n\)

  • Aimed at identifying effective agents using as few patients as possible

Implemented in clinfun package (Seshan, 2015)

library(clinfun);
ph2simon(pu = 0.20, pa = 0.35, ep1 = 0.05, ep2 = 0.20);
##  Simon 2-stage Phase II design 
## Unacceptable response rate:  0.2 
## Desirable response rate:  0.35 
## Error rates: alpha =  0.05 ; beta =  0.2 
##         r1 n1  r  n EN(p0) PET(p0)
## Optimal  5 22 19 72  35.37  0.7326
## Minimax  6 31 15 53  40.44  0.5711
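  • Compared to one-stage design (\(n=50\); \(r=14\)), there is cost in terms of maximum number of patients (\(n=72\) or \(n=53\) versus \(n=50\)); note, however, that the one-stage design's exact type I error (0.061) exceeds \(\alpha = 0.05\)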

Should match our previous results

#Optimal design
twostage_oc(r1 = 5, n1 = 22, r = 19, n = 72,
            p = c(0.2,0.35));
##      Pr(Early Term) Pr(!Reject H0) Pr(Reject H0) Avg. Enrolled
## 0.2           0.733         0.2183        0.0491          35.4
## 0.35          0.163         0.0366        0.8005          63.9
#Minimax design
twostage_oc(r1 = 6, n1 = 31, r = 15, n = 53,
            p = c(0.2,0.35));
##      Pr(Early Term) Pr(!Reject H0) Pr(Reject H0) Avg. Enrolled
## 0.2          0.5711          0.379        0.0498          40.4
## 0.35         0.0462          0.152        0.8017          52.0

How stable are these designs?

Simon's Designs are not only options

  • Here is set that is nearly minimax and nearly optimal:
twostage_oc(r1 = 6, n1 = 27, r = 16, n = 58,
            p = c(0.2,0.35));
##      Pr(Early Term) Pr(!Reject H0) Pr(Reject H0) Avg. Enrolled
## 0.2           0.713         0.2371        0.0495          35.9
## 0.35          0.115         0.0846        0.8007          54.4

Admissible Two-Stage Designs

  • Consider expected cost function for any possible design set satisfying \(\{\alpha,\beta\}\) constraints given by

\[ \begin{aligned} C(q,\{r_1, n_1, r, n\}) = q n + (1-q)E[N|\gamma=p_0] \end{aligned} \]

  • It is weighted average of maximum sample size and expected sample size under \(H_0\)

  • Design set \(\{r_1, n_1, r, n\}\) is called admissible if it achieves smallest possible cost across all possible design sets for at least one \(q\in[0,1]\).
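
To make the cost function concrete, the sketch below evaluates \(C(q,\cdot)\) for three designs already quoted on earlier slides (Simon's optimal and minimax designs, plus the "nearly minimax and nearly optimal" set) at one interior value of \(q\). Restricting attention to these three designs and the choice \(q = 0.1\) are assumptions for illustration; admissibility is defined over all design sets that satisfy the constraints.

```r
p0 = 0.20
# Columns: r1, n1, r, n (designs quoted on earlier slides)
designs = rbind(optimal    = c(5, 22, 19, 72),
                minimax    = c(6, 31, 15, 53),
                compromise = c(6, 27, 16, 58))
colnames(designs) = c("r1", "n1", "r", "n")
pet = pbinom(designs[, "r1"], designs[, "n1"], p0)        # Pr(R1 <= r1 | p0)
en  = designs[, "n1"] * pet + designs[, "n"] * (1 - pet)  # E[N | p0]
cost = function(q) q * designs[, "n"] + (1 - q) * en
round(cost(0.1), 2)
names(which.min(cost(0.1)))
```

Among these three, the compromise design has the lowest cost at \(q = 0.1\), which is the sense in which a design that is neither optimal nor minimax can still be preferred.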

Admissible Two-Stage Designs

  • Simon's optimal design is admissible because it minimizes cost when \(q=(?)\)
  • Simon's minimax design is admissible because it minimizes cost when \(q=(?)\)
  • There are usually other admissible designs that minimize cost for \(q\in(0,1)\)
  • Outlined in (Jung et al., 2004)

Implemented in ph2mult package (Zhu and Qin, 2016)

Code is buggy

library(ph2mult);
#Start of this call was lost in extraction; binom.design(type = ...) is assumed
binom.design(type = "admissible", p0 = 0.20, p1 = 0.35, 
             signif.level = 0.05, power.level = 0.8, plot.out = T);

##              r1 n1  r  n EN.p0. PET.p0.  error power
## Optimal       5 22 19 72   35.4   0.733 0.0491 0.800
## Admissible    4 20 17 62   35.6   0.630 0.0473 0.800
## Admissible.1  6 27 16 58   35.9   0.713 0.0495 0.801
## Minimax       6 31 15 53   40.4   0.571 0.0498 0.802
twostage_oc(7,36,17,59,p=c(0.2,0.35));#I added this snippet
##      Pr(Early Term) Pr(!Reject H0) Pr(Reject H0) Avg. Enrolled
## 0.2          0.5660          0.398         0.036          46.0
## 0.35         0.0332          0.166         0.800          58.2


Discussion

  • Phase 2 trials are most often futility trials. Goal is to prune inefficacious treatments

  • Could trial stop for efficacy at stage 1? Should it?

  • Different implications from stopping early because \(R_1\leq r_1\) and failing to reject \(H_0\) because \(R \leq r\)

  • Nothing special about two stages. More stages possible

Discussion, cont'd

  • Strict sample sizes and scheduled interim analyses may be difficult to adhere to

  • Deviating from the design during trial conduct potentially invalidates type I error and power properties: inference is conditional on following design as it is laid out

Inference depends on design: Ex 1

  • Suppose that we conduct minimax design with \(p_0=0.20\), \(p_1=0.35\), \(\alpha = 0.05\), and \(1-\beta = 0.80\): \(\{r_1,n_1,r,n\} = \{6, 31, 15, 53\}\)

  • We successfully conclude the trial with \(R = 16 > r = 15\) responders.

  • Could report \(p\)-value:

\[ \begin{aligned} p &= \Pr(R\geq16, R_1>6|\gamma = p_0)\\ &= \sum_{x = 7}^{31} \Pr(R > 15, R_1 = x|\gamma = p_0)\\ &= \sum_{x = 7}^{31} \Pr(R-R_1 > 15 - x, R_1 = x|\gamma = p_0)\\ &= \sum_{x = 7}^{31} \Pr(R-R_1 > 15 - x|\gamma = p_0)\times\\ &\quad\quad\quad\Pr(R_1 = x|\gamma = p_0) \end{aligned} \]

r1 = 6; n1=31; r=15; n=53; p0=0.20;R = 16;
sum(pbinom(R-1-((r1+1):n1), n - n1, p0, lower = F) *
                dbinom((r1+1):n1, n1, p0));
## [1] 0.0498
  • Different from \(p\)-value of same data from one-stage design:
#Pr(R > 15|n,p0)
pbinom(R-1, n, p0, lower = F);
## [1] 0.0512
#1  - Pr(R <= 15|n,p0)
1 - pbinom(R-1, n, p0);
## [1] 0.0512

Inference depends on design: Ex 2

Adapted from (Lindley and Phillips, 1976). Two designs for testing \(H_0:\gamma = 0.2\)

Design 1: Enroll \(n=25\) patients. \(R\) is binomial. Observe \(R=8\) responses; \(p\)-value is \[\Pr(R\geq 8|\gamma=0.2) = \sum_{x=8}^{25} \binom{25}{x} 0.2^x 0.8^{25-x}\]

pbinom(7,25,0.2,lower=F);#1-binomial CDF
## [1] 0.109

Design 2: Enroll patients until \(f=17\) non-responders. \(R\) is negative binomial. Observe \(R=8\) responses; \(p\)-value is \[\Pr(R\geq 8|\gamma=0.2) = \sum_{x=8}^{\infty}\binom{x+16}{x}0.2^x 0.8^{17} \]

pnbinom(7,17,1-0.2,lower=F);#1 - neg-binom CDF 
## [1] 0.0892
  • If \(\alpha=0.10\) were significance threshold, then same data (but different designs) yield different conclusions

Likelihood principle

  • In both examples, likelihood of final data was the same for both designs, up to a constant factor not involving \(\gamma\)

  • Frequentist inference based on what could have occurred and not just what did occur

  • Violates likelihood principle: inference should be based only upon likelihood function and not design
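
For the second example, the proportionality of the two likelihoods can be checked numerically. In this sketch the grid `g` of response rates is arbitrary; the ratio of the binomial to the negative binomial likelihood should be the same constant at every value of \(\gamma\).

```r
g = seq(0.05, 0.95, by = 0.05)                     # arbitrary grid of gamma values
lik_binom  = dbinom(8, size = 25, prob = g)        # Design 1: 8 responses in 25 patients
lik_nbinom = dnbinom(8, size = 17, prob = 1 - g)   # Design 2: 8 responses before 17th non-responder
ratio = lik_binom / lik_nbinom
range(ratio)  # both ends equal choose(25, 8) / choose(24, 8) = 25/17
```

Because the ratio does not depend on \(\gamma\), any analysis that uses only the likelihood function treats the two designs identically, while the two \(p\)-values above differ.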


