`survsim`

version 4.0.0

I may have got carried away this week, but `survsim`

version 4.0.0 is now released. It’s version *4* because this breaks some things introduced in 3 and 3.1, but, provides a whole lot more.

Key developments:

- computation time reduced by about 70% for multi-state model simulation, as the root-finding procedure is now vectorised
`reset`

added to`hazard#()`

to allow simulation from semi-Markov multi-state models- syntax changed for user-defined functions, time
`#t`

must now specified with`{t}`

, for more robust parsing - support for general multiple timescales, including:
`{t0}`

can be used in`user()`

and`tdefunction()`

to denote the time of entry into the initial state of the associated transition

- the competing risks syntax has now been removed and is now part of the more general
`msm`

syntax - bug fix; variables used directly in
`user()`

with a multi-state model were not passed to Mata -> now fixed - bug fix; if the same variable was used in multiple
`hazard#()`

s, parsing would only pick up the first -> now fixed

You can install the latest version of `survsim`

using:

```
net install survsim, from("https://www.mjcrowther.co.uk/code/survsim/")
```

or use `adoupdate`

if you already have version 3.0.0 or 3.1.0. It will be up on the SSC archive soon.

I’ll now go through the examples I posted earlier this week, that are affected by the changes, showing how to use the new syntax etc.

$~$

#### Simulating survival times from a user-defined (log) (cumulative) hazard function

I’ll simulate 300 observations, and generate a binary treatment group,

```
. set obs 300
number of observations (_N) was 0, now 300
. set seed 134987
. gen trt = runiform()>0.5
```

The most flexible form of simulating survival data with `survsim`

is by specifying a custom hazard or cumulative hazard function, such as:

$$ h(t) = h_{0}(t) \exp (\text{trt} \beta_{1}) $$ where $$ h_{0}(t) = \exp (-1 + 0.02 t - 0.03 t^2 + 0.005 t^3) $$

which can be done, on the `loghazard()`

scale for simplicity, using

```
. survsim stime1 died1, loghazard(-1 :+ 0.02:*{t} :- 0.03:*{t}:^2 :+ 0.005:*{t}:^3) ///
> maxtime(1.5) covariates(trt -0.5)
Warning: 199 survival times were above the upper limit of maxtime()
They have been set to maxtime()
You can identify them by _survsim_rc = 3
```

with a common right censoring time of 1.5 years specified using `maxtime(1.5)`

. We could make the treatment effect diminish over time by,

```
. survsim stime2 died2, loghazard(-1 :+ 0.02:*{t} :- 0.03:*{t}:^2 :+ 0.005:*{t}:^3) ///
> covariates(trt -0.5) tde(trt 0.03) maxtime(1.5)
Warning: 181 survival times were above the upper limit of maxtime()
They have been set to maxtime()
You can identify them by _survsim_rc = 3
```

which will form an interaction between `trt`

, its coefficient `0.03`

and linear time, which is the default. We could instead simulate from a model on the cumulative hazard scale, using the `logchazard()`

option instead.

$~$

#### Simulating competing risks data from specified cause-specific hazard functions

`survsim`

simulates competing risk data using the cause-specific hazard setting described by Beyersmann et al. (2009), utitlising the general simulation algorithm described by Crowther and Lambert (2013). Let’s simulate from a competing risk model with 2 competing events. The first cause-specific hazard has a Weibull distribution, with no covariates. The second cause-specific hazard model has an exponential distribution, with a beneficial treatment effect. Right censoring is applied at 10 years.

```
. clear
. set obs 1000
number of observations (_N) was 0, now 1,000
. gen trt = runiform()>0.5
. survsim time state event , hazard1(dist(weibull) lambda(0.1) gamma(1.2)) ///
> hazard2(dist(exponential) lambda(0.01) covariates(trt -0.5)) ///
> maxtime(10)
variables time0 to time1 created
variables state0 to state1 created
variables event1 to event1 created
```

We can see which events occurred with

```
. tabulate state1 event1
| event1
state1 | 0 1 | Total
-----------+----------------------+----------
1 | 208 0 | 208
2 | 0 750 | 750
3 | 0 42 | 42
-----------+----------------------+----------
Total | 208 792 | 1,000
```

Now let’s simulate from a competing risk model with 3 competing events. The first cause-specific hazard has a user defined baseline hazard function, with a harmful treatment effect. The second cause-specific hazard model has a Weibull distribition, with a beneficial treatment effect. The third cause-specific hazard has a user-defined baseline hazard function, with an initially beneficial treatment effect that reduces linearly with respect to log time. Right censoring is applied at 3 years. Phew.

```
. cap drop time* state* event*
. survsim time state event , hazard1(user(exp(-2 :+ 0.2:* log({t}) :+ 0.1:*{t})) covariates(trt 0.1)) ///
> hazard2(dist(weibull) lambda(0.01) gamma(1.3) covariates(trt -0.5)) ///
> hazard3(user(0.1 :* {t} :^ 1.5) covariates(trt -0.5) tde(trt 0.1) ///
> tdefunction(log({t}))) ///
> maxtime(3)
variables time0 to time1 created
variables state0 to state1 created
variables event1 to event1 created
. tabulate state1 event1
| event1
state1 | 0 1 | Total
-----------+----------------------+----------
1 | 362 0 | 362
2 | 0 338 | 338
3 | 0 28 | 28
4 | 0 272 | 272
-----------+----------------------+----------
Total | 362 638 | 1,000
```

You can see that this can get as complex as necessary. I currently let you use up to 50 cause-specific hazards, just in case you’re feeling particularly adventurous.

$~$

#### Simulating from an illness-death model

We first define the transition matrix for an illness-death model. It has three states:

- State 1 - A “healthy” state. Observations can move from state 1 to state 2 or 3.
- State 2 - An intermediate “illness” state. Observations can come from state 1, and move on to state 3.
- State 3 - An absorbing “death” state. Observations can come from state 1 or 2, but not leave.

This gives us three potential transitions between states:

- Transition 1 - State 1 -> State 2
- Transition 2 - State 1 -> State 3
- Transition 3 - State 2 -> State 3

which is defined by the following matrix:

```
. matrix tmat = (.,1,2\.,.,3\.,.,.)
```

The key is to think of the column/row numbers as the states, and the elements of the matrix as the transition numbers. Any transitions indexed with a missing value `.`

means that the transition between the row state and the column state is not possible. Let’s make it obvious, sticking with our “healthy”, “ill” and “dead” names for the states:

```
. mat colnames tmat = "healthy" "ill" "dead"
. mat rownames tmat = "healthy" "ill" "dead"
. mat list tmat
tmat[3,3]
healthy ill dead
healthy . 1 2
ill . . 3
dead . . .
```

Now we’ve defined the transition matrix, we can use `survsim`

to simulate some data. We’ll simulate 1000 observations, and generate a binary treatment group indicator, remembering to `set seed`

first (I bashed the keyboard to pick my seed).

```
. clear
. set obs 1000
number of observations (_N) was 0, now 1,000
. set seed 9865
. gen trt = runiform()>0.5
```

The first transition-specific hazard has a user defined baseline hazard function, with a harmful treatment effect. The second transition-specific hazard model has a Weibull distribition, with a beneficial treatment effect. The third transition-specific hazard has a user-defined baseline hazard function, with an initially beneficial treatment effect that reduces linearly with respect to log time. Right censoring is applied at 3 years.

```
. survsim time state event, transmatrix(tmat) ///
> hazard1(user(exp(-2 :+ 0.2:* log({t}) :+ 0.1:*{t})) covariates(trt 0.1)) ///
> hazard2(dist(weibull) lambda(0.01) gamma(1.3) covariates(trt -0.5)) ///
> hazard3(user(0.1 :* {t} :^ 1.5) covariates(trt -0.5) tde(trt 0.1) ///
> tdefunction(log({t}))) ///
> maxtime(3)
variables time0 to time2 created
variables state0 to state2 created
variables event1 to event2 created
```

The hazard number `#`

in each `hazard#()`

, represents the transition number in the transiton matrix. Simple as that. `survsim`

has created variables storing the times at which states were entered, with the associated state number and the associated event indicator. It begins by creating the `0`

variables, which represents the time at which observatations entered the inital state, `time0`

, and the associated state number, `state0`

. As `ltruncated()`

and `startstate()`

were not specified, all observations are assumed to start in state 1 at time 0. Subsequent transitions are simulated until all observations have either entered an absorbing state, or are right-censored at their `maxtime()`

. For simplicity, I will assume time is measured in years. We can see what `survsim`

has created:

```
. list if inlist(_n,1,4,16,112)
+----------------------------------------------------------------------------------+
| trt time0 state0 time1 state1 event1 time2 state2 event2 |
|----------------------------------------------------------------------------------|
1. | 0 0 1 3 1 0 . . . |
4. | 1 0 1 .95636156 2 1 3 2 0 |
16. | 0 0 1 1.0755764 2 1 2.4401409 3 1 |
112. | 1 0 1 2.3290322 3 1 . . . |
+----------------------------------------------------------------------------------+
```

All observations start initially in state 1 at time 0, which are stored in `state0`

and `time0`

, respectively. Then,

- Observation 1 is right-censored at 3 years, remaining in state 1
- Observation 4 moves to state 2 at 0.956 years, and is subsequently right-censored at 3 years, still in state 2
- Observation 16 moves to state 2 at 1.076 years, and then moves to state 3 at 2.440 years. Since state 3 is an absorbing state, there are no further transitions
- Observation 112 moves to state 3 at 2.329 years. Again, since state 3 is absorbing, there are no further transitions