Model-based Phase 1 Designs

  • Statistical model for probability of DLT at dose level \(d_j\), \(\Pr_\beta(\text{DLT}|d_j)\), \(j = 1,\ldots, k\) dose levels, where \(\beta\) is parameter (or vector of parameters)

  • Two approaches:
    • Modified toxicity probability interval (MTPI) (Ji et al., 2010; Ji and Wang, 2013)

    • Continual reassessment method (CRM) (O’Quigley, Pepe and Fisher, 1990)

How they work

Main idea: Always seeking dose level whose probability of DLT is closest to some desired level \(p_T\)

  1. Enroll patient to current dose assignment
  2. Update data, \({ y} = \{\{y_1, n_1\},\{y_2, n_2\},\ldots,\{y_k, n_k\}\}\), where \(y_j\) is number of DLTs at dose level \(j\) and \(n_j\) is number of patients
  3. Fit model to \(y\) and use result to update dose assignment for next patient

Questions that need to be answered

  1. How many patients to enroll (when should trial stop)?
  2. How to use data to estimate \(\beta\)?

Brief review of parametric statistical models

  • Outcome (\(Y\))
    • \(Y\) is random variable;
    • \(y\) is realized value;
    • subscript \(i\) indicates specific observation (assumed independent)
  • Question 1: What can we say about the distribution of \(Y\)?

Generalized Linear Models

  • Three ingredients
  1. Distributional assumption on outcome
  2. Systematic function of covariates (if there are covariates), the predictor
  3. Linking function between predictor and outcome

Distributional assumption on outcome

  • \(Y\sim N(\mu,\sigma^2)\)
    • pdf: \(f(y|\mu,\sigma) = (2\pi\sigma^2)^{-1/2}\exp\{-(y-\mu)^2/2\sigma^2\}\)
  • \(Y\sim Bern(\mu)\)
    • pmf: \(\Pr(Y=y|\mu) = \mu^y(1-\mu)^{1-y}\)
  • Many others

(Second and third ingredients will be discussed later)

Likelihood

  • Data plugged into a distribution yields a likelihood
  • \(Y_i\sim N(\mu,\sigma^2)\) gives
    • \(L({ y}|\mu, \sigma^2) = (2\pi\sigma^2)^{-n/2}\exp\{-\sum_i (y_i-\mu)^2/2\sigma^2\}\)
  • \(Y_i\sim Bern(\mu)\) gives
    • \(L({ y}|\mu) = \mu^{\sum_i y_i}(1-\mu)^{n-\sum_i y_i}\)

Likelihood: key points

  • Likelihood is distribution of data given value of parameter.
    • Maximize likelihood with respect to parameters, i.e. view likelihood as function of parameters
    • Or treat parameter as a random quantity, conduct inference about posterior distribution
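As a minimal sketch of viewing the likelihood as a function of the parameter, the R code below evaluates the Bernoulli likelihood over a grid of \(\mu\) values and locates its maximum. The data vector and the names `bern_lik` and `mu_grid` are hypothetical, chosen only for illustration.

```r
# Hypothetical outcomes for six patients: 1 = DLT, 0 = no DLT
y <- c(0, 1, 0, 0, 1, 0)

# Bernoulli likelihood, viewed as a function of mu for fixed data y
bern_lik <- function(mu, y) mu^sum(y) * (1 - mu)^(length(y) - sum(y))

# Evaluate on a grid of candidate values and pick the maximizer
mu_grid <- seq(0.01, 0.99, by = 0.01)
lik <- sapply(mu_grid, bern_lik, y = y)
mu_hat <- mu_grid[which.max(lik)]

# The grid maximizer sits next to the sample proportion of DLTs
c(grid_mle = mu_hat, sample_proportion = mean(y))
```

The grid search is used here only to make the idea visible; for the Bernoulli model the maximizer is available in closed form as the sample proportion.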

MTPI

  • Pre-specified sample size \(n\)
  • After each patient, model gives decision to escalate up one level (E; go to \(j+1\)), stay at current level (S; \(j\)), or de-escalate down one level (D; \(j - 1\))
  • Decision is based upon distribution of \(\Pr_\beta(\text{DLT}|d_j)\equiv \mu_j\) and its proximity to target toxicity rate \(p_T\), e.g. \(p_T=0.3\)
  • No covariates

MTPI Intuition

  • \(\mu_j\) is a probability and so always in \((0,1)\) interval
  • Partition \((0,1)\) interval into regions corresponding to escalate, stay, or de-escalate
  • Choose region having the largest average posterior density (posterior probability of region divided by its width)

MTPI Intuition

\(n_j = 7\); \(y_j = 1\); Escalate

MTPI Intuition

\(n_j = 7\); \(y_j = 2\); Stay

MTPI Intuition

\(n_j = 7\); \(y_j = 3\); Stay

MTPI Intuition

\(n_j = 7\); \(y_j = 4\); De-escalate

MTPI Intuition

\(n_j = 20\); \(y_j = 5\); Stay

Density of \(\mu_j\) is calculated using Bayes' Theorem

Bayes' Theorem: \[ \begin{aligned} \Pr(A=a|B=b) &= \dfrac{\Pr(B=b|A=a) \Pr(A=a) }{\sum_{a'} \Pr(B=b|A=a') \Pr(A=a')}\\ &\propto \Pr(B=b|A=a) \Pr(A=a) \end{aligned} \] Bayesians view parameters as random quantities with distributions: \[ \begin{aligned} \pi(\mu| y) &= \dfrac{L( y|\mu) \pi(\mu)}{\int L( y|\mu) \pi(\mu) d\mu}\\ &=L( y|\mu) \pi(\mu) / f( y)\\ &\propto L( y|\mu) \pi(\mu)\\ (\text{Posterior} &\propto \text{Likelihood}\times \text{Prior}) \end{aligned} \]

MTPI Assumptions

  • \(\mu_1,\ldots,\mu_k\) are parameters to be estimated
  • \(L({ y}|\mu) = \prod_{j=1}^k \mu_j^{y_j}(1-\mu_j)^{n_j-y_j}\)
  • \(\pi(\mu) = \prod_{j=1}^k \pi(\mu_j) = \prod_{j=1}^k \text{Beta}(\mu_j|a_1,a_2) \propto \prod_{j=1}^k \mu_j^{a_1-1} (1-\mu_j)^{a_2-1}\)

How to interpret \(\pi(\mu_j)\)

  • \(\pi(\mu_j)\) is distribution of \(\mu_j\) before trial starts, i.e. before data are collected
  • \(\pi(\mu_j)=Beta(\mu_j|a_1, a_2)\) means \(\Pr(\mu_j\in[c, d]) = \dfrac{\Gamma(a_1+a_2)}{\Gamma(a_1)\Gamma(a_2)}\int_c^d t^{a_1-1} (1-t)^{a_2-1} dt\)
#prior Pr(mu_j <= 0.3 | a_1 = a_2 = 0.5)
pbeta(0.3, 0.5, 0.5);
## [1] 0.369
#prior Pr(0.25 <= mu_j <= 0.35 | a_1 = a_2 = 0.5);
pbeta(0.35, 0.5, 0.5) - pbeta(0.25, 0.5, 0.5);
## [1] 0.0697

Why Beta priors?

\[ \begin{aligned} \pi(\mu_j|n_j,y_j)&\propto \mu_j^{y_j}(1-\mu_j)^{n_j-y_j} \mu_j^{a_1-1} (1-\mu_j)^{a_2-1}\\ &= \mu_j^{y_j+a_1-1}(1-\mu_j)^{n_j-y_j+a_2-1} \end{aligned} \]

  • Means that \(\pi(\mu_j|{ y}) = \pi(\mu_j|n_j,y_j) = \text{Beta}(\mu_j|y_j+a_1,n_j-y_j+a_2)\)
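The conjugate update can be checked numerically. The sketch below uses hypothetical data at a single dose level: it normalizes likelihood times prior on a grid and compares the result with the closed-form Beta posterior. The grid spacing `h` and the data values are illustrative assumptions.

```r
a1 <- 0.5; a2 <- 0.5     # prior shape parameters
y_j <- 2; n_j <- 7       # hypothetical DLTs and patients at one dose level

# Grid of midpoints on (0, 1) with spacing h
h <- 0.001
mu <- seq(h / 2, 1 - h / 2, by = h)

# Posterior by brute force: likelihood x prior, normalized numerically
unnorm <- mu^y_j * (1 - mu)^(n_j - y_j) * dbeta(mu, a1, a2)
grid_post <- unnorm / sum(unnorm * h)

# Posterior from conjugacy: Beta(y_j + a1, n_j - y_j + a2)
exact_post <- dbeta(mu, y_j + a1, n_j - y_j + a2)

# Largest pointwise difference; only numerical integration error remains
max(abs(grid_post - exact_post))
```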

Why Beta priors?

  • Beta prior is conjugate:
    • Start with \(a_1+a_2\) effective subjects' worth of data (\(a_1\) DLTs and \(a_2\) non-DLTs)
    • Distribution of each \(\mu_j\) remains Beta after (real) data are collected (and independent across \(j\))
  • For each dose level, need only number of DLTs (\(y_j\)) and non-DLTs (\(n_j-y_j\)) at that dose level
  • These are probability distributions on probabilities

MTPI Dose Selection for next patient

  • Select the region (E, S, or D) with the highest average height of the posterior density
If this is largest: Then choose this:
\(\dfrac{\Pr(\mu_j < p_T - \epsilon | { y})}{(p_T - \epsilon)}\) E
\(\dfrac{\Pr(p_T - \epsilon < \mu_j < p_T + \epsilon| { y})}{(2 \epsilon)}\) S
\(\dfrac{\Pr(\mu_j > p_T + \epsilon| { y})}{(1 - p_T - \epsilon)}\) D

R script

mtpi_decision = function(y, 
                         n, 
                         target = 0.30, 
                         epsilon = 0.05,#equivalence interval around target 
                         prior_shape1 = 0.5,#a1
                         prior_shape2 = 0.5)#a2
{
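  #conjugate update: posterior is Beta(y + a1, n - y + a2)
  post_shape1 = y + prior_shape1;
  post_shape2 = n - y + prior_shape2;
  #average posterior density over (0, target - epsilon)
  E = pbeta(target - epsilon, post_shape1, post_shape2) / 
    (target - epsilon);
  #average posterior density over (target - epsilon, target + epsilon)
  S = (pbeta(target + epsilon, post_shape1, post_shape2) -
         pbeta(target - epsilon, post_shape1, post_shape2)) / 
    (2 * epsilon);
  #average posterior density over (target + epsilon, 1)
  D = pbeta(target + epsilon, post_shape1, post_shape2, lower.tail = FALSE) / 
    (1 - target - epsilon);
  c(E = E, S = S, D = D);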
}

R script

mtpi_decision(0,2);
##     E     S     D 
## 2.987 0.914 0.249
mtpi_decision(1,2);
##     E     S     D 
## 0.782 1.164 1.059
mtpi_decision(1,6);
##     E     S     D 
## 2.609 1.713 0.271

MTPI

  • Data are shared within a dose level but not between:

    • After \(n_1=6\) patients at dose level 1, \(y_1=1\) DLT, and MTPI says to escalate to dose level 2 for patient 7
    • After DLT / no DLT is observed for patient 7, \(\pi(\mu_2|{ y})=\text{Beta}(\mu_2|y_2+a_1,1-y_2+a_2)\), where \(y_2=0\) or \(y_2=1\)
    • No knowledge used about 1/6 DLT rate at dose level 1.

Safety rules

  • If posterior probability \(\Pr(\mu_j > p_T|{ y}) > \xi\), for some large \(\xi\) (e.g. \(\xi = 0.95\)), then de-escalate and never revisit level \(j\) (or higher)
  • If this is true for \(\mu_1\), then trial stops
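A minimal sketch of the safety rule, assuming a Beta(0.5, 0.5) prior and a hypothetical threshold \(\xi = 0.95\). The function name `too_toxic` and its argument names are illustrative, not part of any package.

```r
# TRUE when the posterior probability that mu_j exceeds the target is above xi
too_toxic <- function(y, n, target = 0.30, xi = 0.95, a1 = 0.5, a2 = 0.5) {
  # Posterior is Beta(y + a1, n - y + a2); take its upper tail beyond target
  pbeta(target, y + a1, n - y + a2, lower.tail = FALSE) > xi
}

# Hypothetical check after three patients at a dose level, for 0 to 3 DLTs
too_toxic(y = 0:3, n = 3)
```

If the rule fires at the lowest dose level, the trial would stop; at any other level, that level and all higher ones would be excluded from further assignment.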

Behavior can be completely pre-specified

n = 20;target = 0.30;epsilon = 0.05;shape_both = 0.5;
recommendations = unacceptable = 
  matrix(NA, n + 1, n, dimnames = list(0:(n),1:n));
for(i in 1:n) {
  recommendations[1:(i+1),i] =  
    c("E","S","D")[apply(matrix(mtpi_decision(0:i, i, 
                                              target = target, 
                                              epsilon = epsilon,
                                              prior_shape1 = shape_both,
                                              prior_shape2 = shape_both),
                                nrow=i+1),1,which.max)];
  unacceptable[1:(i+1), i] = 
    pbeta(target + epsilon, 
          shape_both + (0:i), 
          shape_both + i - (0:i), 
          lower = F) > 0.95;
}

Example

#cex_scale: multiplier for axis-label size (default of 1 is an assumed value)
plot_code = function(cex_scale = 1) {
  par(mar=c(5,4,0,2),oma =c(0,0.1,0,0.1),las=1);
  plot.new();plot.window(xlim = c(1,n), ylim = c(0,n));
  axis(1,at = 1:n);
  axis(2, at = 0:n, las = 2);axis(4, at = 0:n, las = 2);
  for(i in 1:n) {
    if(i%%2) segments(0, i, i-0.1, i, lty = 2, lwd = 2, col="grey50");
    text(recommendations[1:(i+1),i], x = i, y = 0:i, 
         col = ifelse(recommendations[1:(i+1),i]=="E",
                      "#377EB8",
                      ifelse(recommendations[1:(i+1),i]=="S",
                             "#4DAF4A",
                             "#E41A1C")));
    text("X", x = i, y = which(unacceptable[1:(i+1),i])-1, 
         col = "black");
  }
  mtext(expression(n[j]),side=1,line = 3,cex = cex_scale * 1.5);
  mtext(expression(y[j]),side=2,line = 3,cex = cex_scale * 1.5);
  }

Example

Estimated MTD at end of trial

  • Could calculate \(\hat\mu_j = (y_j + a_1) / (n_j + a_1 + a_2)\), \(j= 1, \ldots, k\)
    • This is the mean of the posterior Beta distribution
  • Choose dose \(j\) that minimizes \(|\hat\mu_j - p_T|\) across \(\{\hat\mu_1,\ldots,\hat\mu_k\}\)
  • Problems with this approach?

Estimated MTD at end of trial

  • Suppose \({ y} = \{\{3,8\},\{1,4\},\{4,8\}\}\) and \(a_1 = a_2 = 0.5\)
  • \(\{\hat\mu_1,\hat\mu_2,\hat\mu_3\} = \{0.39, 0.30, 0.50\}\)
  • \(\hat\mu_1>\hat\mu_2\)
  • Different amounts of uncertainty at each dose level, more information about dose level 1 than dose level 2, i.e. \(n_1=8\) vs. \(n_2=4\)
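The posterior means in this example can be reproduced in a few lines of R. This is a sketch of the naive estimate only; the variable names are illustrative.

```r
y <- c(3, 1, 4)          # DLTs at dose levels 1, 2, 3
n <- c(8, 4, 8)          # patients at dose levels 1, 2, 3
a1 <- 0.5; a2 <- 0.5     # prior shape parameters
target <- 0.30

# Posterior mean of each mu_j under independent Beta posteriors
mu_hat <- (y + a1) / (n + a1 + a2)
round(mu_hat, 2)

# Dose level whose posterior mean is closest to the target
mtd_naive <- which.min(abs(mu_hat - target))
mtd_naive

# The estimates are not monotone in dose: level 1 exceeds level 2
mu_hat[1] > mu_hat[2]
```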

Sample from posterior and enforce monotonicity

(Gelfand, Smith and Lee, 1992)

mu_draws = cbind(rbeta(1e6,3+shape_both,5+shape_both),
                 rbeta(1e6,1+shape_both,3+shape_both),
                 rbeta(1e6,4+shape_both,4+shape_both));
satisfied = which((mu_draws[,1] < mu_draws[,2]) & (mu_draws[,2] < mu_draws[,3]));
mu_draws_mono = mu_draws[satisfied,];
nrow(mu_draws_mono) / nrow(mu_draws);#proportion of draws retained
## [1] 0.191

Unconstrained posterior

\(\pi(\mu|{ y})\)

#posterior mean (closer to target is better)
colMeans(mu_draws);
## [1] 0.389 0.300 0.500
#posterior pr(mu_j > p_t) (smaller is better)
colMeans(mu_draws > target);
## [1] 0.692 0.447 0.889
#posterior pr(p_T-eps < mu_j < p_t + eps) (larger is better)
colMeans((mu_draws > target - epsilon) * ((mu_draws < target + epsilon)));
## [1] 0.227 0.183 0.126

Constrained posterior

\(\pi(\mu|{ y},\mu_1<\mu_2<\mu_3)\)

#posterior mean given monotonicity
colMeans(mu_draws_mono);
## [1] 0.259 0.388 0.580
#posterior pr(mu_j > p_t) given monotonicity
colMeans(mu_draws_mono > target);
## [1] 0.335 0.743 0.985
#posterior pr(p_T-eps < mu_j < p_t + eps) given monotonicity
colMeans((mu_draws_mono > target - epsilon) * ((mu_draws_mono < target + epsilon)));
## [1] 0.2999 0.2584 0.0365

Unconstrained vs. constrained posterior densities

require(ggplot2);
ggplot() +
  geom_density(data = data.frame(mu_draws), aes(X2), fill = "orange", color = "orange", alpha = 0.2) +
  geom_density(data = data.frame(mu_draws_mono), aes(X2), fill = "orange", color = "orange", alpha = 0.5) +
  xlim(0, 1);

Unconstrained posterior density

require(ggplot2);
ggplot(data = data.frame(mu_draws[1:1e5,])) +
  lims(x=c(0,1),y=c(0,1)) +
  stat_density_2d(aes(x = X1, y = X2, fill = ..level..), geom = "polygon");

Constrained posterior density

require(ggplot2);
ggplot(data = data.frame(mu_draws_mono[1:1e5,])) +
  lims(x=c(0,1),y=c(0,1)) +
  stat_density_2d(aes(x = X1, y = X2, fill = ..level..), geom = "polygon");

MTPI Summary

  • Model-based alternative
  • Pre-specified tabular outputs of decisions provide alternative to simplicity of 3+3
  • No sharing between dose levels in dose assignments
  • Possible disconnect between dose assignments and final decision

Regression-based phase 1 designs

  • Use all current data to estimate \(\beta\), then assign dose \(d_j\) with estimated \(\hat p_j\) closest to target probability, e.g. \(\arg\min_j|\hat p_j - p_T|\). \(p_T\in(0,1)\) is target probability

  • Model is used during trial to make dose assignments and after trial to estimate MTD

Models with covariates

  • Previously we asked: What can we say about the distribution of \(Y\)?
  • If we have covariate(s), e.g. \(X\), then question is how does \(Y\) change with \(X\)?

Second and third ingredients of GLM describe relationship with covariate

  • Predictor is \(\alpha + x\beta\)
  • Link outcome and predictor via \(g(E[Y|X=x]) = \alpha + x\beta\), \(g\) monotone
    • If \(Y\) is Normal, \(E[Y|X=x]\in \mathcal{R}\), so common to use \(g(t)=g^{-1}(t)=t\)
    • If \(Y\) is Bernoulli, \(E[Y|X=x]=\Pr(Y=1|X=x)\) is in \((0,1)\), so \(g(t):(0,1)\rightarrow \mathcal{R}\)

Logistic likelihood

  • \(g(t) = \log(t/[1-t])\) \[ \begin{aligned} &g(\Pr(Y=1|X=x)) = \alpha + x\beta\\ &\Rightarrow \Pr(Y=1|X=x) = 1/(1+\exp\{-\alpha -x\beta\})\\ &\Rightarrow \Pr(Y=0|X=x) = 1 - 1/(1+\exp\{-\alpha -x\beta\})\\ &\Rightarrow L({ y}|{ x},\alpha,\beta) \\ &\quad= \prod_{i=1}^n \Pr(Y_i=y_i|X=x_i) \\ &\quad= \prod_{i=1}^n \left(\dfrac{1}{1+\exp\{-\alpha - x_i\beta\}}\right)^{y_i}\left(1-\dfrac{1}{1+\exp\{-\alpha - x_i\beta\}}\right)^{1-y_i}\\ \end{aligned} \]
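As a sketch of how this likelihood is used, the R code below writes the logistic log-likelihood by hand, maximizes it with `optim`, and compares the result with `glm`. The simulated data and the true values \(\alpha=-0.5\), \(\beta=1.2\) are assumptions made for illustration.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))   # simulated binary outcomes

# Log of the logistic likelihood; par = c(alpha, beta)
loglik_logit <- function(par, x, y) {
  eta <- par[1] + par[2] * x
  # log Pr(Y=1|x) = plogis(eta, log.p = TRUE); log Pr(Y=0|x) = plogis(-eta, log.p = TRUE)
  sum(y * plogis(eta, log.p = TRUE) + (1 - y) * plogis(-eta, log.p = TRUE))
}

# fnscale = -1 turns optim's minimizer into a maximizer
fit_optim <- optim(c(0, 0), loglik_logit, x = x, y = y,
                   method = "BFGS", control = list(fnscale = -1))
fit_glm <- glm(y ~ x, family = binomial(link = "logit"))

rbind(optim = fit_optim$par, glm = coef(fit_glm))
```

Working on the log scale avoids taking the log of a probability that has rounded to 0 or 1 during the search.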

Probit likelihood

  • \(g(t) = \Phi^{-1}(t)\), where \(\Phi(t) = \int_{-\infty}^t (2\pi)^{-1/2}\exp\{-u^2/2\}du\)
  • Interpretation:
    • Latent random variable \(Z \sim N(0,1)\)
    • Do not observe \(Z\) directly but rather \[ \begin{aligned} Y = \begin{cases} 1, & Z < \alpha + x\beta\\ 0, & Z > \alpha + x\beta \end{cases} \end{aligned} \]

Probit likelihood

  • Then, \[ \begin{aligned} \Pr(Y=1|X=x) &= \Pr(Z < \alpha + x\beta)\\ &= \Phi(\alpha + x\beta) \end{aligned} \]
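The latent-variable interpretation can be illustrated by simulation. In this sketch the values of \(\alpha\), \(\beta\) and \(x\) are hypothetical; the proportion of simulated \(Z\) falling below \(\alpha + x\beta\) should be close to \(\Phi(\alpha + x\beta)\).

```r
set.seed(2)
alpha <- -0.5; beta <- 1; x <- 0.8   # hypothetical parameter and covariate values

# Draw the unobserved latent variable and apply the threshold rule
z <- rnorm(1e5)
y <- as.integer(z < alpha + x * beta)

# Simulated Pr(Y = 1 | x) versus the probit formula
c(simulated = mean(y), exact = pnorm(alpha + x * beta))
```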

Probit likelihood

\[ \begin{aligned} &g(\Pr(Y=1|X=x)) = \alpha + x\beta\\ &\Rightarrow \Pr(Y=1|X=x) = \Phi(\alpha + x\beta)\\ &\Rightarrow \Pr(Y=0|X=x) = 1- \Phi(\alpha + x\beta)\\ &\Rightarrow L({ y}|{ x},\alpha,\beta) \\ &\quad= \prod_{i=1}^n \Pr(Y_i=y_i|X=x_i) \\ &\quad= \prod_{i=1}^n \left(\Phi(\alpha + x_i\beta)\right)^{y_i}\left(1-\Phi(\alpha + x_i\beta)\right)^{1-y_i}\\ \end{aligned} \]

Dose scale matters for regressions

How to measure dose?

Important to consider scale of dose in model:

  • Choice of model may restrict range of inputs
  • Dose level measurements have wide range

Modeling requires transforming biological measurements of dose (\(d_j\)) into quantity suitable to plug into model (\(s_j\)): \(\Pr_\beta(\text{DLT}|d_j) = f(s_j,\beta) = p_j\). Set of \(\{s_j\}\) often called skeleton

Example

Common problem: always need to ensure scale of covariate matches model. For example, different ways to measure size of person:

  • kg
  • log(kg)
  • bmi = kg/m\(^2\)

Constructing skeleton is selecting proper scale for dose

Possible skeletons

  • \(s_j = \log d_j\in\mathcal{R}\) (as in Storer's BC design)
  • \(s_j\) such that \(f(s_j,\beta=0)=\tilde {\text{Pr}}(\text{DLT}|d_j)\in(0,1)\) is anticipated rate of DLT (a priori guess at DLT)
    • When parameter is at its null value, (\(\beta=0\)) skeleton describes truth

Philosophy of regression-based phase I designs

  • Not feasible to efficiently estimate entire dose-toxicity curve
  • Instead want to estimate well the DLT rate at dose level closest to MTD
  • Simple, one-parameter models allow for local efficiency at expense of global efficiency

One-parameter logistic model

\[p_j = 1/(1+\exp\{-c - e^\beta s_j\})\]

  • \(c\) fixed constant (not estimated)
  • \(s_j=-c + \log(\tilde {\text{Pr}}(\text{DLT}|d_j)/[1-\tilde {\text{Pr}}(\text{DLT}|d_j)])\)
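A short sketch of how the skeleton is built for the one-parameter logistic model. The anticipated DLT rates and \(c = 3\) are illustrative assumptions; the check is that \(\beta = 0\) returns the anticipated rates exactly.

```r
prior_dlt <- c(12, 16, 22, 30, 40, 52) / 100  # hypothetical anticipated DLT rates
c_fixed <- 3                                   # fixed constant (not estimated)

# Skeleton: s_j = -c + logit(anticipated rate)
s <- -c_fixed + qlogis(prior_dlt)

# One-parameter logistic model: p_j = 1 / (1 + exp(-c - exp(beta) * s_j))
p_logistic <- function(beta, s, c_fixed) plogis(c_fixed + exp(beta) * s)

p_logistic(0, s, c_fixed)     # beta = 0 recovers the anticipated rates
p_logistic(0.3, s, c_fixed)   # beta > 0 lowers every p_j here, since all s_j < 0
```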

\(p_j = 1/(1+\exp\{-3 - e^\beta s_j\})\)

\(p_j = 1/(1+\exp\{-1 - e^\beta s_j\})\)

One-parameter probit model

\[p_j = \Phi(c + e^\beta s_j)\]

  • \(\Phi(x) = \int_{-\infty}^x (2\pi)^{-1/2} e^{-t^2/2} dt = \Pr(Z<x)\)
  • \(c\) fixed constant (not estimated)
  • \(s_j = -c + \Phi^{-1}(\tilde {\text{Pr}}(\text{DLT}|d_j))\)

\(p_j = \Phi(3 + e^\beta s_j)\)

\(p_j = \Phi(1 + e^\beta s_j)\)

Power model

\[p_j = s_j^{\exp\{\beta\}}\]

  • \(s_j = \tilde{\text{Pr}}(\text{DLT}|d_j)\)
  • Where have we seen this before?

\[ \begin{aligned} \log(p_j) = e^\beta \log(s_j)\\ \log(-\log(p_j)) = \beta + \log(-\log(s_j)) \end{aligned} \]

  • What one-parameter version is this?
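A quick numerical check of the two identities above, using a hypothetical skeleton and an arbitrary value of \(\beta\):

```r
s <- c(0.12, 0.16, 0.22, 0.30, 0.40, 0.52)  # hypothetical skeleton
beta <- 0.4                                  # arbitrary parameter value

# Power model
p <- s^exp(beta)

# log(p_j) = exp(beta) * log(s_j)
all.equal(log(p), exp(beta) * log(s))

# log(-log(p_j)) = beta + log(-log(s_j)): beta is a shift on this scale
all.equal(log(-log(p)), beta + log(-log(s)))
```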

\(p_j = s_j^{\exp\{\beta\}}\)

Conduct

  1. Enroll patient at current dose (choose your starting dose)
  2. Current data are \(\{\{y_1,n_1\},\{y_2,n_2\},\ldots,\{y_k,n_k\}\}\)
  3. Fit model to current data to estimate parameter \(\beta\) with \(\hat\beta\)
  4. Identify \(\arg\min_j|f(s_j,\hat\beta) - p_T|\), assign to next patient

How to use data to estimate \(\beta\)?

Suppose power model, with \(\theta = e^\beta\): \[ \begin{aligned} Y_i &= \begin{cases} 0, & \text{No DLT in patient } i\\ 1, & \text{DLT in patient } i \end{cases}\\ x_i\in\{s_1,s_2,\ldots,s_k\} &=\text{ skeleton value of dose for subject } i\\ \Pr(Y_i=1|\theta) &= x_i^\theta \end{aligned} \] So \(Y_i|\theta\) is Bernoulli with probability \(x_i^\theta\).

Option 1: Maximum Likelihood

Likelihood after \(n\) patients: \[ \begin{aligned} L({ y}|\theta) &= \prod_{i=1}^n x_i^{\theta y_i}(1-x_i^\theta)^{1-y_i}\\ \log L({ y}|\theta) &= \sum_{i=1}^n y_i\theta\log(x_i) + (1-y_i)\log(1-x_i^\theta)\\ \frac{d\log L({ y}|\theta)}{d \theta} &= \sum_{i=1}^n y_i\log(x_i) - (1-y_i)\dfrac{x_i^\theta}{1-x_i^\theta}\log(x_i) \end{aligned} \] Solve score to find maximum likelihood estimate (MLE) of \(\theta\)
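A minimal sketch of solving the score numerically. The skeleton, dose assignments, outcomes, and search interval are all hypothetical; the root of the score is cross-checked against direct maximization of the log-likelihood.

```r
# Hypothetical skeleton and trial data (dose level and DLT indicator per patient)
s <- c(0.12, 0.16, 0.22, 0.30, 0.40, 0.52)
dose <- c(2, 2, 3, 3, 4, 4, 4)
y <- c(0, 0, 0, 0, 1, 0, 1)
x <- s[dose]

# Log-likelihood and score for the power model, as functions of theta
loglik <- function(theta) sum(y * theta * log(x) + (1 - y) * log(1 - x^theta))
score <- function(theta) sum(y * log(x) - (1 - y) * x^theta / (1 - x^theta) * log(x))

# MLE as the root of the score; the search interval is an assumption
theta_hat <- uniroot(score, interval = c(0.1, 5), tol = 1e-10)$root

# Cross-check by maximizing the log-likelihood directly
theta_opt <- optimize(loglik, interval = c(0.1, 5), maximum = TRUE)$maximum

c(root_of_score = theta_hat, maximizer = theta_opt)
```

A finite root exists here only because the data contain both DLTs and non-DLTs.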

Option 1: Maximum Likelihood

If \(x_i\equiv x\) for all patients, then closed form solution is \[\hat{\theta} = \log \bar Y/\log x\] Problems with this approach?

Option 1: Maximum Likelihood

  • \(\bar Y = 0 \Rightarrow \hat\theta = \infty\), no (finite) MLE exists

  • \(\bar Y = 1 \Rightarrow \hat\theta = 0\Rightarrow s_j^0 = 1\) for all \(j\)

Ad-hoc solution: run trial according to '3+3' until \(\hat\theta\in(0,\infty)\), then switch to maximum likelihood
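The boundary cases can be seen directly from the closed-form solution. In this sketch the function name `theta_closed` and the skeleton value \(x = 0.2\) are illustrative.

```r
# Closed-form MLE when every patient is treated at the same skeleton value x
theta_closed <- function(ybar, x) log(ybar) / log(x)

x <- 0.2
theta_hat <- theta_closed(c(0, 1/2, 1), x)   # no DLTs, half DLTs, all DLTs
theta_hat

# Implied DLT probability at this dose, x^theta_hat, reproduces ybar
x^theta_hat
```

With no DLTs the estimate is infinite, and with all DLTs it is zero, which makes every dose level appear to have DLT probability 1.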

Option 2: Bayesian Analysis

Bayesian approach to phase 1 trials:

  1. Before trial begins, specify prior belief about parameter. Prior \(\pi(\theta)\) is distribution of \(\theta\) before trial starts

    • E.g. For power model, if prior at \(\theta=1\) is high (equivalently, at \(\beta=0\), where \(\theta=e^\beta\)), then confident that skeleton is about right

    • In contrast to MTPI, prior is deliberately informative: meant to overcome lack of data early on in trial

Option 2: Bayesian Analysis

  2. As data are collected, information is quantified in likelihood and incorporated in posterior
    • Impact of prior should (will) decrease as data are collected

Option 2: Bayesian Analysis

  3. Dose assignments for later patients reflect information gained from earlier patients (compare / contrast with MTPI)

Option 2: Bayesian Analysis

  • The likelihood looks different from that of MTPI: \[ \begin{aligned} L({ y}|\theta) &= \prod_{i=1}^n x_i^{\theta y_i} (1-x_i^\theta)^{1-y_i} \end{aligned} \]

Option 2: Bayesian Analysis (artificial example)

  • Power model (\(p_j = s_j^\theta\)); \(\{s_1,s_2,s_3\} = \{0.2,0.3,0.4\}\)
  • Targeting \(p_T = 0.3\)

  • \(\theta\in\{0.5, 1.0, 1.5\}\)
    • \(\theta=0.5\Rightarrow p_1=0.45;p_2=0.55;p_3=0.63\) (all dose levels toxic)
    • \(\theta=1.0\Rightarrow p_1=0.20;p_2=0.30;p_3=0.40\) (dose 2 is MTD)
    • \(\theta=1.5\Rightarrow p_1=0.09;p_2=0.16;p_3=0.25\) (dose 3 is MTD)

Option 2: Bayesian Analysis (artificial example)

  • A priori, \(\Pr(\theta=0.5)=0.3; \Pr(\theta=1)=0.5; \Pr(\theta=1.5) = 0.2\)
  • Prior mean of \(\theta\) is \[ \sum_\theta \theta \pi(\theta) = 0.5\times 0.3 + 1.0\times0.5 + 1.5\times 0.2 = 0.95\]
  • As data are collected, more (fewer) DLTs will tend to decrease (increase) \(\theta\), so that \(p_1,p_2,p_3\) increase (decrease).

Option 2: Bayesian Analysis (artificial example)

  • Suppose \(n=2\), with both patients at dose level 1, i.e. \(x_1=x_2=s_1=0.2\). Then \[ \begin{aligned} L({ y}|\theta) &= \prod_{i=1}^2 x_i^{\theta y_i} (1-x_i^\theta)^{1-y_i}\\ &= 0.2^{\theta \sum y_i}\times(1-0.2^\theta)^{2-\sum y_i} \end{aligned} \]

Option 2: Bayesian Analysis (artificial example)

\(L(y|\theta)\) for different \(\sum y_i\) with \(n=2\)

\(\sum y_i\) \(\theta=0.5\) \(\theta=1.0\) \(\theta=1.5\)
0 0.31 0.64 0.83
1 0.25 0.16 0.08
2 0.20 0.04 0.01

Option 2: Bayesian Analysis (artificial example)

Recall: \[\text{Posterior}\equiv\pi(\theta|{ y}) = \dfrac{L({ y}|\theta)\times \pi(\theta)}{\sum_\theta L({ y}|\theta)\times \pi(\theta)}\]

Option 2: Bayesian Analysis (artificial example)

\(\pi(\theta| y)\) for different \(\sum y_i\) with \(n=2\)

\(\sum y_i\) \(\theta=0.5\) \(\theta=1.0\) \(\theta=1.5\)
0 0.16 0.55 0.29
1 0.44 0.47 0.10
2 0.74 0.25 0.02

Option 2: Bayesian Analysis (artificial example)

Posterior mean of \(\theta\) is \[ \begin{aligned} \bar\theta({ y})&=\sum_\theta \theta \pi(\theta|{ y})\\ \left(\sum y_i=0\right)\Rightarrow&\quad= 0.5\times 0.16 + 1.0\times0.55 + 1.5\times 0.29 = 1.06\\ \left(\sum y_i=1\right)\Rightarrow&\quad= 0.5\times 0.44 + 1.0\times0.47 + 1.5\times 0.10 = 0.83\\ \left(\sum y_i=2\right)\Rightarrow&\quad= 0.5\times 0.74 + 1.0\times0.25 + 1.5\times 0.02 = 0.64 \end{aligned} \]

Option 2: Bayesian Analysis (artificial example)

\[ \begin{aligned} \sum y_i=0\Rightarrow \bar\theta({ y})=1.06 &\Rightarrow \{s_1^{\bar\theta},s_2^{\bar\theta},s_3^{\bar\theta}\}\\ &= \{0.2^{1.06},0.3^{1.06},0.4^{1.06}\}\\ &= \{0.18, 0.28, 0.38\} \end{aligned} \] So subject 3 assigned dose level 2 if no DLTs in first two subjects

Option 2: Bayesian Analysis (artificial example)

\[ \begin{aligned} \sum y_i=1\Rightarrow \bar\theta({ y})=0.83 &\Rightarrow \{s_1^{\bar\theta},s_2^{\bar\theta},s_3^{\bar\theta}\}\\ &= \{0.2^{0.83},0.3^{0.83},0.4^{0.83}\}\\ &= \{0.26, 0.37, 0.47\} \end{aligned} \] So subject 3 assigned dose level 1 if one DLT in first two subjects

Option 2: Bayesian Analysis (artificial example)

\[ \begin{aligned} \sum y_i=2\Rightarrow \bar\theta({ y})=0.64 &\Rightarrow \{s_1^{\bar\theta},s_2^{\bar\theta},s_3^{\bar\theta}\}\\ &= \{0.2^{0.64},0.3^{0.64},0.4^{0.64}\}\\ &= \{0.36, 0.46, 0.56\} \end{aligned} \] So subject 3 assigned dose level 1 if two DLTs in first two subjects (or would consider stopping, because lowest dose level exceeds target \(p_T=0.3\))
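The whole artificial example can be reproduced with a short R sketch. The function name `crm_step` is illustrative; the skeleton, prior, and target are those of the example.

```r
s <- c(0.2, 0.3, 0.4)          # skeleton
target <- 0.3                  # target DLT rate
theta <- c(0.5, 1.0, 1.5)      # support of the discrete prior
prior <- c(0.3, 0.5, 0.2)      # prior probabilities
x <- 0.2                       # both patients treated at dose level 1
n <- 2

# One update: likelihood -> posterior -> posterior mean -> next dose
crm_step <- function(sum_y) {
  lik <- x^(theta * sum_y) * (1 - x^theta)^(n - sum_y)
  post <- lik * prior / sum(lik * prior)
  theta_bar <- sum(theta * post)
  p_hat <- s^theta_bar
  list(posterior = round(post, 2),
       theta_bar = round(theta_bar, 2),
       p_hat = round(p_hat, 2),
       next_dose = which.min(abs(p_hat - target)))
}

# Results for 0, 1, and 2 DLTs among the first two patients
results <- lapply(0:2, crm_step)
sapply(results, function(r) r$theta_bar)   # 1.06 0.83 0.64
sapply(results, function(r) r$next_dose)   # 2 1 1
```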

Comments on Example

  • Prior guarantees existence of posterior mean, even when data are separated

  • Illustrates sensitivity of results to choice of prior

  • Continuous prior on \(\theta\) (or \(\beta\)) often used, e.g. \(\beta\sim N(0,\sigma^2)\)

  • Non-trivial computational component. Statistician required before, during, after
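With a continuous prior the sums become integrals. The sketch below computes the posterior mean of \(\beta\) by numerical integration under the power model with \(\beta\sim N(0,\sigma^2)\). The skeleton, data, and \(\sigma = 0.6\) are hypothetical, and this is an illustration of the calculation rather than the implementation used by any particular package.

```r
s <- c(0.12, 0.16, 0.22, 0.30, 0.40, 0.52)  # hypothetical skeleton
dose <- c(2, 2, 3, 3, 4, 4, 4)              # hypothetical dose levels given
y <- c(0, 0, 0, 0, 1, 0, 1)                 # hypothetical DLT indicators
x <- s[dose]
sigma <- 0.6                                # assumed prior sd of beta
target <- 0.30

# Power-model likelihood as a function of beta (vectorized for integrate)
lik <- function(beta) {
  sapply(beta, function(b) prod((x^exp(b))^y * (1 - x^exp(b))^(1 - y)))
}

# Posterior mean = integral of beta * L * prior, divided by integral of L * prior
num <- integrate(function(b) b * lik(b) * dnorm(b, 0, sigma), -Inf, Inf)$value
den <- integrate(function(b) lik(b) * dnorm(b, 0, sigma), -Inf, Inf)$value
beta_bar <- num / den

# Plug the posterior mean into the model and pick the dose closest to target
p_hat <- s^exp(beta_bar)
next_dose <- which.min(abs(p_hat - target))
c(beta_bar = beta_bar, next_dose = next_dose)
```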

CRM

  • Continual Reassessment Method (O’Quigley et al., 1990) is formalization of model-based phase 1 design, including
    • choice of model
    • how to make dose assignments
    • recommendation of MTD

One versus Two Parameters

  • CRM homes in on target dose level. It tries to find and stay at MTD
  • Priority is precise estimate at MTD, not capturing entire dose-toxicity curve
  • Two parameters may offer better fit but less precise fit at MTD

CRM Simulation 1

CRM Simulation 2

CRM Simulation 3

CRM R packages

  • dfcrm (Cheung, 2013); bcrm (Sweeting, Mander and Sabin, 2013);
    • Implement one-parameter power and logistic models
    • Conduct simulations to quantify performance
install.packages("dfcrm");
library(dfcrm);

Example (dfcrm)

sim1 = titesim(
  PI = c(14,15,16,17,30,50)/100,#True curve 
  prior = c(12,16,22,30,40,52)/100,#skeleton
  x0 = 2,#starting dose level
  n = 36,#sample size
  target = 0.30,#target rate of DLT
  nsim = 100,#number of simulated trials
  count = FALSE,#Don't display progress of simulations
  restrict = TRUE,#place restrictions on dose escalation
  scale = 0.6) #sd of normal prior on beta

Example

sim1;
## 
## Number of simulations:    100 
## Patient accrued:  36 
## Target DLT rate:  0.3 
##             1    2    3    4     5    6
## Truth    0.14 0.15 0.16 0.17  0.30 0.50
## Prior    0.12 0.16 0.22 0.30  0.40 0.52
## Selected 0.00 0.00 0.02 0.21  0.67 0.10
## Nexpt    0.24 1.94 3.59 9.48 16.30 4.45
## Ntox     0.02 0.23 0.58 1.50  4.71 2.17
## 
## The distribution of trial duration:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      37      37      37      37      37      37 
## 
## The trials are generated by a TITE-CRM starting at dose 2 
## 
## Restriction apply to avoid
##   (1) Skipping doses in escalation;
##   (2) Escalation immediately after a toxic outcome.
## 
## The working model is empiric 
##  ptox = dose^{exp(beta)} with doses = 0.12 0.16 0.22 0.3 0.4 0.52 
##  and beta is estimated by its posterior mean 
##  assuming a normal prior with mean 0 and variance 0.36 
## 
## The linear function is used to assign weights to patients.
## 
## Patient arrival is modeled as a fixed process
##  with rate 1 patients per 1 time units (= observation window).

How many patients to enroll?

  • Simulation study: assuming true dose-toxicity relationship, how many patients are required to identify MTD with certain probability?
  • Often constrained by logistics: difficult to enroll patients, no promise of therapeutic benefit
  • Storer's Design A commits to up to 6 patients times number of dose levels. Use this as lower bound.

Class exercise

anticipated_probabilities = c(12,16,22,30,40,52)/100
truth1 = c(2,4,12,30,45,55)/100;
truth2 = c(2,12,20,23,26,30)/100;
truth3 = c(14,14.5,15,15.5,30,50)/100;
truth4 = c(20,30,42,54,66,80)/100
  • Use the titesim function to design a simple trial:
  1. Choose a model (options are empiric or one-par logistic)
  2. Determine number of patients needed to identify true MTD with probability at least 0.70 under all truths and such that the expected number of patients treated at the true MTD is 15.
  3. Investigate sensitivity to prior scale
  4. What design elements did you choose? How many patients do you need?

Summary/Discussion

  • Significant conceptual leap from 3+3 to model-based designs. But both make assumptions

  • Probability of correctly identifying MTD increases with \(n\)

Further Reading

  • Typical to constrain model-based recommendations to maximize patient safety (Goodman, Zahurak and Piantadosi, 1995)
    • Cohorts of size \(>1\) at each dose level (moderates dose escalation)
    • Never escalate more than one dose level per patient
    • Never assign dose \(j\) when \(\hat p_j=f(s_j,\hat\beta)\) exceeds \(p_T\) by some tolerance, even when it is numerically closest. E.g. \(f(s_2,\hat\beta) = 0.23; f(s_3,\hat\beta) = 0.35; p_T = 0.30\), may feel that \(d_3\) is too high
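Two of these constraints can be sketched as a wrapper around the model-based recommendation. The function name `next_dose` and the tolerance `tol = 0.03` are hypothetical; the cohort-size constraint is not shown.

```r
# Constrained recommendation: the closest acceptable dose to target, where a dose
# is acceptable if (a) its estimated DLT rate does not exceed target + tol and
# (b) it is at most one level above the current dose
next_dose <- function(p_hat, current, target = 0.30, tol = 0.03) {
  allowed <- which(p_hat <= target + tol)        # drop doses estimated as too toxic
  allowed <- allowed[allowed <= current + 1]     # never escalate more than one level
  if (length(allowed) == 0) return(NA_integer_)  # no acceptable dose remains
  allowed[which.min(abs(p_hat[allowed] - target))]
}

# Dose 3 (0.35) is numerically closest to 0.30 but exceeds it by more than tol
next_dose(c(0.10, 0.23, 0.35, 0.50), current = 2)   # returns 2

# All doses acceptable, but only one level of escalation is allowed
next_dose(c(0.10, 0.20, 0.25, 0.30), current = 1)   # returns 2
```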

Further Reading

  • Selecting value of constant \(c\) in one parameter models (Chevret, 1993)

  • Other commentaries, studies of CRM (versus 3+3) (Korn et al., 1994; Piantadosi, Fisher and Grossman, 1998; Garrett-Mayer, 2006)

References

Cheung, K. (2013) dfcrm: Dose-Finding by the Continual Reassessment Method.

Chevret, S. (1993) The continual reassessment method in cancer phase I clinical trials: A simulation study. Statistics in Medicine, 12, 1093–1108.

Garrett-Mayer, E. (2006) The continual reassessment method for dose-finding studies: A tutorial. Clinical Trials, 3, 57–71.

Gelfand, A.E., Smith, A.F. and Lee, T.-M. (1992) Bayesian analysis of constrained parameter and truncated data problems using Gibbs sampling. Journal of the American Statistical Association, 87, 523–532.

Goodman, S.N., Zahurak, M.L. and Piantadosi, S. (1995) Some practical improvements in the continual reassessment method for phase I studies. Statistics in Medicine, 14, 1149–1161.

Ji, Y. and Wang, S.-J. (2013) Modified toxicity probability interval design: A safer and more reliable method than the 3+3 design for practical phase I trials. Journal of Clinical Oncology, 31, 1785–1791.

Ji, Y., Liu, P., Li, Y. and Nebiyou Bekele, B. (2010) A modified toxicity probability interval method for dose-finding trials. Clinical Trials, 7, 653–663.

Korn, E.L., Midthune, D., Chen, T.T., Rubinstein, L.V., Christian, M.C. and Simon, R.M. (1994) A comparison of two phase I trial designs. Statistics in Medicine, 13, 1799–1806.

O’Quigley, J., Pepe, M. and Fisher, L. (1990) Continual reassessment method: A practical design for phase 1 clinical trials in cancer. Biometrics, 33–48.

Piantadosi, S., Fisher, J. and Grossman, S. (1998) Practical implementation of a modified continual reassessment method for dose-finding trials. Cancer Chemotherapy and Pharmacology, 41, 429–436.

Sweeting, M., Mander, A. and Sabin, T. (2013) bcrm: Bayesian continual reassessment method designs for phase I dose-finding trials. Journal of Statistical Software, 54, 1–26.