Probabilistic sensitivity analysis with multistate in Stata

I had a question come in about how to alter a fitted model object to essentially conduct probabilistic sensitivity analysis within a multistate setting, using predictms from my multistate Stata package. So here goes.


Probabilistic sensitivity analysis (PSA)

I’m not a health economist. I had to Google probabilistic sensitivity analysis when starting to write this to check it was what I thought it was…

Ok, disclaimers out the way.

This post is mainly about how to manipulate an existing model object in Stata, or to define one given a set of parameter values. Such things are needed when conducting PSA.

I’ll start with the brcancer dataset:

. webuse brcancer, clear
(German breast cancer data)

. stset rectime, f(censrec) scale(365.25)

     failure event:  censrec != 0 & censrec < .
obs. time interval:  (0, rectime]
 exit on or before:  failure
    t for analysis:  time/365.25

        686  total observations
          0  exclusions
        686  observations remaining, representing
        299  failures in single-record/single-failure data
  2,111.978  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =  7.279945

and fit a proportional hazards Weibull model adjusting for hormonal therapy.

. streg hormon, distribution(weibull) nohr nolog

         failure _d:  censrec
   analysis time _t:  rectime/365.25

Weibull PH regression

No. of subjects =          686                  Number of obs    =         686
No. of failures =          299
Time at risk    =  2111.978097
                                                LR chi2(1)       =       10.36
Log likelihood  =   -694.53925                  Prob > chi2      =      0.0013

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      hormon |  -.3932402   .1248267    -3.15   0.002     -.637896   -.1485844
       _cons |  -2.195131   .1093755   -20.07   0.000    -2.409504   -1.980759
       /ln_p |    .250997   .0496958     5.05   0.000     .1535949     .348399
           p |   1.285306   .0638744                      1.166018    1.416798
         1/p |   .7780247   .0386646                      .7058172    .8576193

and we get a log hazard ratio of -0.393. Now I’m going to stick to using the predictms function from my multistate package in this post, so that you can hopefully see how to extend it to a multistate setting. predictms doesn’t necessarily need a complex multistate setting to work - we can use the survival option to use its prediction engine with a single event, standard survival analysis. We first store our model object:

. estimates store m1

mainly because I prefer the models() syntax of predictms, but also because it’s easier to extend to more complex settings. I create a time variable at which to calculate predictions at,

. range tvar 0 5 2
(684 missing values generated)

. list tvar in 1/2

     | tvar |
  1. |    0 |
  2. |    5 |

I’m going to predict the survival probability for a patient in the treated group, at $t=5$. I also need the time 0 in there, because predictms currently uses the minimum of your timevar (when you use one) to define the at risk enter time (this is behaviour I will improve in a future update).

We call predictms, predicting the transition probabilities for a patient in the hormonal therapy group.

. predictms , models(m1) at1(hormon 1) survival timevar(tvar) seed(398471)

Nice and simple. We get two variables:

. list _prob_at1* in 1/2, abbrev(15)

     | _prob_at1_1_1   _prob_at1_1_2 |
  1. |             1               0 |
  2. |        .55411          .44589 |

which is the probability of still being in the starting state, i.e. alive, and the probability of being dead, at time 0 and at 5 years.

Now let’s say we want to alter my log hazard ratio to assess its impact. Well, I want to set it to be -1, and recalculate my predicted survival at 5 years.

First I’ll make a copy of my estimated coefficient vector, which is stored in e(b).

. mat b1 = e(b)

My log hazard ratio is in the first element, so I’ll replace it with a -1, and check it’s worked:

. mat b1[1,1] = -1

. mat list b1

            _t:         _t:          /:
        hormon       _cons        ln_p
y1          -1  -2.1951315   .25099699

All good. Now the tricky part. There are lots of options for Stata’s maximum likelihood (ml) engine, that most of the time feels like only I know about…but not many spend their time delving into Stata’s source code. But everything here is fully documented! Essentially I want a Weibull streg model object, with coefficients set to the values in b1. So, we can pass those as initial values using the from() option, but tell it not to proceed with estimation beyond the initial setup step, using iter(0):

. streg hormon, dist(weib) from(b1) iter(0) nohr

         failure _d:  censrec
   analysis time _t:  rectime/365.25

Fitting full model:
Iteration 0:   log likelihood = -708.81541  
convergence not achieved

Weibull PH regression

No. of subjects =          686                  Number of obs    =         686
No. of failures =          299
Time at risk    =  2111.978097
                                                Wald chi2(1)     =       40.87
Log likelihood  =   -708.81541                  Prob > chi2      =      0.0000

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      hormon |         -1   .1564199    -6.39   0.000    -1.306577   -.6934227
       _cons |  -2.195131   .1132968   -19.38   0.000    -2.417189   -1.973074
       /ln_p |    .250997   .0526692     4.77   0.000     .1477673    .3542267
           p |   1.285306    .067696                      1.159243    1.425078
         1/p |   .7780247   .0409779                      .7017159    .8626318
Warning: convergence not achieved

which gives us exactly what we want. We of course get the convergence not achieved message, but our ‘estimated’ parameter vector is set to the values we specified! Check in e(b) as well:

. mat list e(b)

            _t:         _t:          /:
        hormon       _cons        ln_p
y1          -1  -2.1951315   .25099699

So now we have our streg model object, with our new hormone coefficient, we can simply store our new model object, and run predictms:

. estimates store m1

. predictms , models(m1) at1(hormon 1) survival timevar(tvar)

predictms doesn’t care if your model converged or not, it simply uses the values it finds in e(b). Even if I hadn’t fitted a first model, I could’ve just passed a matrix of my own values to streg, the only thing to make sure you have in memory is some form of survival data, as it will check your data is stset, and a variable called hormon. Those things can be easily created with some dummy data.

We can now generalise this to assess the assumption that our treatment effect comes from $N(-1,0.1^2)$, by making a large number of draws from the normal distribution, and calculating survival at 5 years for each run, as follows:

. local Nsim = 200

. capture set obs `Nsim'

. //to store our predictions
. gen s5 = .
(686 missing values generated)

. //loop over Nsim draws
. set seed 13490        // <- don't forget!

. forvalues i=1/`Nsim' {
  2.   local draw = rnormal(-1,0.1)
  3.   mat b1[1,1] = `draw'
  4.   qui streg hormon, dist(weib) from(b1) iter(0) nohr
  5.   //double check our hormon coefficient matches the random draw
.   assert [_t][hormon]==`draw'     
  6.   //store new model object
.   estimates store m1
  7.   //predict survival at 5 years
.   predictms , models(m1) at1(hormon 1) survival timevar(tvar)
  8.   //store my new estimate
.   qui replace s5 = _prob_at1_1_1[2] if _n==`i'
  9. }

which gives us our distribution of estimates of $S(5)$:

. summarize s5

    Variable |        Obs        Mean    Std. Dev.       Min        Max
          s5 |        200    .7226256    .0244609     .65435      .7742

Hopefully that shows the basis for how to manipulate a fitted model in Stata. Any parameter can be altered, and any prediction can be obtained, I simply tricked streg into turning it back into an estimates object, which means we can use a post-estimation function, such as predict or predictms.

  • I didn’t bother changing n() in my predictms calls. You should increase them to assess the impact of the simulation error, since the default method to calculate predictions is to use Monte Carlo simulation.
  • Next steps for PSA will be extracting the simulated times from predictms used to calculate the predictions, in order to apply costs to various components. I’ll add this in a future update.


Crowther MJ, Lambert PC. Parametric multi-state survival models: flexible modelling allowing transition-specific distributions with application to estimating clinically useful measures of effect differences. Statistics in Medicine 2017;36(29):4719-4742.


comments powered by Disqus