The do file with all the code from this tutorial is available to download here

In this example, I look at a recent paper by Goldstein et al. (2017) that proposed a two-level model with complex level 1 variation. This will be a nice illustration of the use of `family(user)`

combined with the `family(null)`

options of `merlin`

.

Let’s start by defining our notation and model structure. We have, $$ y_{ij} = X_{1ij}\beta_{1} + Z_{1ij}b_{1j} + \epsilon_{ij} $$ where $y_{ij}$ is a repeatedly measured continuous outcome, for the $i$th patient measured at the $j$th time point. We have our fixed effect design matrix, $X_{1ij}$, and associated coefficients, $\beta_{1}$, and our random effects design matrix, $Z_{1ij}$, and our normally distributed random effects, $b_{1j}$. We have our standard residual error, level 1 variance function, $\epsilon_{ij}$, where under a standard mixed effects model, would be $$ \epsilon_{ij} \sim N(0,\sigma_{e}^{2}) $$ But what Goldstein and colleagues did was relax the homoscedasticity assumption, and allow the residual variance to vary as a function of both fixed and random effects, where the random effects are defined at the highest level, level 2. They did this by specifying a model for the log of the residual error variance, $$ \log (\sigma_{e}^{2}) = X_{2ij}\beta_{2} + Z_{2ij}b_{2i} $$ where $X_{2ij}$ is the fixed effects design matrix, with associated coefficients, $\beta_{2}$, and $Z_{2ij}$ is the random effects design matrix, $b_{2j}$, again normally distributed. By modelling on the log scale, we avoid any issues with variances going negative.

We now have two vectors of random effects defined at level 2, which of course may be correlated, and so they specified an overall multivariate normal distribution by stacking the random effects as follows,

$$ \left(\genfrac{}{}{0pt}{}{\mathbf{b}_{1i}}{\mathbf{b}_{2i}}\right) \sim N \left[\left(\genfrac{}{}{0pt}{}{0}{0}\right) , \left( \genfrac{}{}{0pt}{}{\mathbf{\Sigma}_{b_{1}}}{\mathbf{\Sigma}_{b_{1}b_{2}}} \genfrac{}{}{0pt}{}{}{\mathbf{\Sigma}_{b_{2}}} \right) \right] $$

They estimated their proposal with a Bayesian approach utilising Markov Chain Monte Carlo methods. Here we can subsume this proposed model within the extended framework, and show how to fit it using `merlin`

. As with many of my tutorials, I’ll start by showing you how to simulate such data.

Let’s simulate data for 300 patients, under the following model,

$$ y_{ij} = \beta_{01} + b_{01i} + \beta_{11} X_{1ij}+ \epsilon_{ij} $$ $$ \epsilon_{ij} \sim N(0,\sigma_{e}^{2}) $$ $$ \log (\sigma_{e}^{2}) = \beta_{02} + b_{02i} $$ $$ \left(\genfrac{}{}{0pt}{}{b_{01i}}{b_{02i}}\right) \sim N \left[\left(\genfrac{}{}{0pt}{}{0}{0}\right) , \left( \genfrac{}{}{0pt}{}{\sigma^{2}_{1}}{0} \genfrac{}{}{0pt}{}{0}{\sigma^{2}_{2}} \right) \right] $$

where we will observe a repeatedly measured biomarker, $y_{ij}$, with a random intercept, $b_{01i}$, and the residual error variance dependent on a random effect which is defined at the patient level, $b_{02i}$. We start by generating a unique `id`

variable for our 300 patients,

```
. clear all
. set seed 24575240
. set obs 300
number of observations (_N) was 0, now 300
. gen id = _n
```

We’ve declared 300 observations, and used the subscript notation `_n`

to generate our `id`

variable which now contains a vector of 1..300. We now need to define our variables at the patient `id`

level. Let’s define our two random effects, which we assume are normally distributed, and independent from each other.

```
. gen b21 = rnormal(0,0.5)
. gen b11 = rnormal(0,1)
.
```

where I’ll assume `b11`

is the random intercept for our biomarker trajectory, and `b21`

is the random intercept for our residual variance linear predictor. If we’d wanted to generate correlated random effects we could have used `drawnorm`

. I now need the standard deviation of my residual error, $\sigma_{e}$, which we can generate with

```
. gen sdre = sqrt(exp(b21))
```

Now let’s assume we observe 5 measurements of the biomarker per patient, and that they are recorded at 0, 1, 2, 3, 4 time points.

```
. expand 5
(1,200 observations created)
. bys id: gen time = _n-1
```

Finally let’s generate our observed biomarker, $y_{ij}$. We’ll assume $\beta_{01}=0.5$, $\beta_{11}=0.1$, and we can pass our residual error standard deviation to the `rnormal()`

function to generate our level 1 variability,

```
. gen y = 0.5 + b11 + 0.1*time + rnormal(0,sdre)
```

Now we have our data, we can fit our model…almost. In terms of understanding, it’s easiest if we define our `merlin`

command that we will use, first.

```
merlin (y time M1[id]@1, family(user, llfunction(lev1_logl))) ///
(M2[id]@1, family(null))
```

Hopefully you’ve looked at some of the simpler example uses of `merlin`

, as this one uses two of the more complex
syntax aspects, `family(user)`

and `family(null)`

. Even though our distributional model is Gaussian, we can’t use
the built-in distribution `family(gaussian)`

, because we want to have a residual variance function which isn’t just
a scalar. What we really want is for the log of the residual variance to have its own complex linear predictor, just
as if it were its own `merlin`

model. This is where `family(null)`

comes in. This lets us define and evaluate a complex
linear predictor, but it tells `merlin`

that that model definition doesn’t contribute to the overall log-likelihood. We
can then use our complex linear predictor in a user-defined log-likelihood function, as follows,

```
. mata:
------------------------------------------------- mata (type end to exit) ------------------------------------------------------------------------------------------------------------
: real matrix lev1_logl(gml)
> {
> y = merlin_util_depvar(gml) //response
> linpred1 = merlin_util_xzb(gml) //main lin. pred.
> varresid = exp(merlin_util_xzb_mod(gml,2)) //lev1 lin. pred
> return(lnnormalden(y,linpred1,sqrt(varresid))) //logl
> }
: end
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
```

The important thing to note is the use of the utility function `merlin_util_xzb_mod()`

, which let’s us extract the
complex linear predictor of a named *model* (see the help files for `merlin`

for more details), as opposed to
`merlin_util_xzb()`

which extracts the complex linear predictor of the current *model*, i.e. the model which
corresponds to where the `family(user,...)`

was entered. In the above case `merlin_util_xzb(gml) = merlin_util_xzb_mod(gml,1)`

.

So let’s now fit the model:

```
. merlin (y time M1[id]@1, family(user, llfunction(lev1_logl))) ///
> (M2[id]@1, family(null))
Fitting fixed effects model:
Fitting full model:
Iteration 0: log likelihood = -2517.3658 (not concave)
Iteration 1: log likelihood = -2433.9253
Iteration 2: log likelihood = -2422.6132
Iteration 3: log likelihood = -2419.4191
Iteration 4: log likelihood = -2419.4134
Iteration 5: log likelihood = -2419.4131
Iteration 6: log likelihood = -2419.4131
Mixed effects regression model Number of obs = 1,500
Log likelihood = -2419.4131
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
y: |
time | .0944808 .0174975 5.40 0.000 .0601863 .1287754
M1[id] | 1 . . . . .
_cons | .6403802 .074414 8.61 0.000 .4945315 .7862289
-------------+----------------------------------------------------------------
null: |
M2[id] | 1 . . . . .
_cons | -.1008029 .0557784 -1.81 0.071 -.2101266 .0085209
-------------+----------------------------------------------------------------
id: |
sd(M1) | 1.063407 .0541458 .9624069 1.175007
sd(M2) | .5488374 .06484 .4353938 .6918394
------------------------------------------------------------------------------
```

And we get good estimates of our true parameters described above. If you want to check it for bias, then increase the sample size to something large, say n = 10000, to see that everything is working as expected.

Now we can extend the level 1 variance function model in a number of ways, simply through the complex linear predictor, for example through extending to higher levels, time-dependent effects and non-linear covariate effects, as discussed in Goldstein et al. (2017). All of these extensions can be done without changing a single thing in our `lev1_logl()`

function…I think that’s pretty cool. We can of course also incorporate a non-linear mixed effects model, and still model the level 1 variance function as above, combining this example with this one, for example. Job done.