Recent Publications

A list of all my publications can be found here.

Simulation studies are computer experiments that involve creating data by pseudorandom sampling. The key strength of simulation studies is the ability to understand the behaviour of statistical methods, because some ‘truth’ is known from the process of generating the data. This allows us to consider properties of methods, such as bias. While widely used, simulation studies are often poorly designed, analysed and reported. This article outlines the rationale for using simulation studies and offers guidance for design, execution, analysis, reporting and presentation. In particular, we provide: a structured approach for planning and reporting simulation studies; coherent terminology for simulation studies; guidance on coding simulation studies; a critical discussion of key performance measures and their computation; ideas on structuring tabular and graphical presentation of results; and new graphical presentations. With a view to describing current practice and identifying areas for improvement, we review 100 articles taken from Volume 34 of Statistics in Medicine that included at least one simulation study.
arXiv, 2017
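As a small illustration of the bias performance measure and its Monte Carlo standard error, here is a minimal Python sketch. It is not code from the article: the data-generating mechanism (normal data with known variance), the two competing variance estimators and the function names `simulate` and `performance` are all hypothetical choices made for this example.

```python
import numpy as np

def performance(estimates, true_value):
    """Bias across repetitions and its Monte Carlo standard error (MCSE)."""
    n_sim = len(estimates)
    bias = np.mean(estimates) - true_value
    mcse = np.std(estimates, ddof=1) / np.sqrt(n_sim)
    return bias, mcse

def simulate(n_sim=20000, n_obs=10, sigma=2.0, seed=2017):
    """Hypothetical simulation study comparing two estimators of a variance."""
    rng = np.random.default_rng(seed)
    # Data-generating mechanism: n_sim datasets of n_obs N(0, sigma^2) draws,
    # so the true variance sigma^2 is known by construction
    data = rng.normal(loc=0.0, scale=sigma, size=(n_sim, n_obs))
    true_var = sigma ** 2
    # Two candidate methods targeting the same estimand
    est_div_n = data.var(axis=1, ddof=0)    # divides by n (biased)
    est_div_n1 = data.var(axis=1, ddof=1)   # divides by n - 1 (unbiased)
    return {
        "divide_by_n": performance(est_div_n, true_var),
        "divide_by_n_minus_1": performance(est_div_n1, true_var),
    }

if __name__ == "__main__":
    for method, (bias, mcse) in simulate().items():
        print(f"{method}: bias = {bias:.3f} (MCSE {mcse:.3f})")
```

With these settings the divide-by-n estimator should show a bias close to -sigma^2/n = -0.4, while the divide-by-(n - 1) estimator's bias should lie within a few MCSEs of zero; reporting the MCSE alongside the bias is what makes that comparison interpretable.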

Multivariate data occurs in a wide range of fields, with ever more flexible model specifications being proposed, often within a multivariate generalised linear mixed effects (MGLME) framework. In this article, we describe an extended framework, encompassing multiple outcomes of any type, each of which could be repeatedly measured (longitudinal), with any number of levels, and with any number of random effects at each level. Many standard distributions are described, as well as non-standard user-defined non-linear models. The extension focuses on a complex linear predictor for each outcome model, allowing sharing and linking between outcome models in an extremely flexible way, either by linking random effects directly, or the expected value of one outcome (or function of it) within the linear predictor of another. Non-linear and time-dependent effects are also seamlessly incorporated into the linear predictor through the use of splines or fractional polynomials. We further propose level-specific random effect distributions and numerical integration techniques to improve usability, relaxing the normally distributed random effects assumption to allow multivariate $t$-distributed random effects. We consider some special cases of the general framework, describing some new models in the fields of clustered survival data, joint longitudinal-survival models, and discuss various potential uses of the implementation. User-friendly and easily extendable software is provided.
arXiv, 2017

With the release of Stata 14 came the mestreg command to fit multilevel mixed effects parametric survival models, assuming normally distributed random effects and estimated by maximum likelihood using Gaussian quadrature. In this article, I present the user-written stmixed command, which serves as both an alternative and a complementary program to mestreg for fitting multilevel parametric survival models. The key extensions include incorporation of the flexible parametric Royston-Parmar survival model and the ability to fit multilevel relative survival models. The methods are illustrated with a commonly used dataset of patients with kidney disease suffering recurrent infections, and with a simulated example illustrating a simple approach to simulating clustered survival data using survsim (Crowther and Lambert, 2012, 2013).
arXiv, 2017
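To show the general idea behind simulating clustered survival data, here is a minimal Python sketch using the standard inversion method for a Weibull proportional hazards model with a normally distributed random intercept on the log-hazard scale. It is not survsim (which is a Stata command) and not the example from the article; the function name, parameter values and the binary covariate `treat` are hypothetical.

```python
import numpy as np

def simulate_clustered_weibull(n_clusters=100, cluster_size=20, lam=0.1, gamma=1.5,
                               beta=-0.5, sd_cluster=0.5, max_time=10.0, seed=1234):
    """Hypothetical clustered Weibull PH data: h(t) = lam*gamma*t^(gamma-1)*exp(eta)."""
    rng = np.random.default_rng(seed)
    n = n_clusters * cluster_size
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    # Random intercept shared by everyone in the same cluster
    b = rng.normal(0.0, sd_cluster, size=n_clusters)[cluster]
    treat = rng.integers(0, 2, size=n)  # hypothetical binary covariate
    eta = beta * treat + b
    # Inversion: solve S(t) = exp(-lam * t**gamma * exp(eta)) = U for t
    u = rng.uniform(size=n)
    t = (-np.log(u) / (lam * np.exp(eta))) ** (1.0 / gamma)
    # Administrative censoring at max_time
    event = (t <= max_time).astype(int)
    time = np.minimum(t, max_time)
    return cluster, treat, time, event
```

Setting `beta=0.0` and `sd_cluster=0.0` reduces this to a plain Weibull, whose median is (log 2 / lam)^(1/gamma); comparing the empirical median against that value is a quick sanity check on the inversion step.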

Multi-state models are increasingly being used to model complex disease profiles. By modelling transitions between disease states, accounting for competing events at each transition, we can gain a much richer understanding of patient trajectories and how risk factors have an impact across the entire disease pathway. In this article we concentrate on parametric multi-state models, both Markov and semi-Markov, and develop a flexible framework where each transition can be specified by a variety of parametric models including exponential, Weibull, Gompertz, Royston-Parmar proportional hazards models or log-logistic, log-normal, generalised gamma accelerated failure time models, possibly sharing parameters across transitions. We also extend the framework to allow time-dependent effects. We then use an efficient and generalisable simulation method to calculate transition probabilities from any fitted multi-state model, and show how it facilitates the simple calculation of clinically useful measures, such as expected length of stay in each state, and differences and ratios of proportions within each state as a function of time, for specific covariate patterns. We illustrate our methods using a dataset of patients with primary breast cancer. User-friendly Stata software is provided.
In Stats in Med, 2017
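The simulation approach to transition probabilities can be illustrated with a minimal Python sketch for an illness-death model (1 = healthy, 2 = ill, 3 = dead) with Weibull transition hazards and a clock-reset (semi-Markov) time scale. This is only a sketch under those assumptions, not the multistate package or the article's implementation; the hazard parameters and function names are hypothetical. Patients are simulated through the model, and state occupancy probabilities and restricted length of stay are read off as proportions and averages.

```python
import numpy as np

def weibull_times(rng, lam, gamma, size):
    """Draw event times with hazard lam * gamma * t**(gamma - 1) by inversion."""
    u = rng.uniform(size=size)
    return (-np.log(u) / lam) ** (1.0 / gamma)

def illness_death_probs(times, n_sim=100000, seed=42,
                        h12=(0.2, 1.2), h13=(0.05, 1.0), h23=(0.3, 1.1)):
    """Monte Carlo probabilities of being in states 1, 2, 3 at each time.

    Each transition hazard is Weibull with (lam, gamma); values are hypothetical.
    """
    rng = np.random.default_rng(seed)
    # Competing transitions out of state 1: draw a latent time for each, earliest wins
    t12 = weibull_times(rng, *h12, n_sim)
    t13 = weibull_times(rng, *h13, n_sim)
    exit1 = np.minimum(t12, t13)
    went_ill = t12 < t13
    # Clock reset: time from illness to death is measured from entry into state 2
    death = np.where(went_ill, exit1 + weibull_times(rng, *h23, n_sim), exit1)
    probs = []
    for t in times:
        p1 = np.mean(exit1 > t)    # still healthy
        p3 = np.mean(death <= t)   # dead, by either route
        probs.append((p1, 1.0 - p1 - p3, p3))
    return np.array(probs), exit1, death

def length_of_stay_healthy(exit1, t_max):
    """Expected time spent in state 1 up to t_max (restricted mean)."""
    return np.mean(np.minimum(exit1, t_max))
```

For example, `probs, exit1, death = illness_death_probs([1.0, 2.0, 5.0])` gives one row per time point, and `length_of_stay_healthy(exit1, 5.0)` the expected time healthy over the first 5 time units. Because the probability of remaining in state 1 has the closed form exp(-(0.2 t^1.2 + 0.05 t)) here, it offers a simple check on the simulation.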


As part of my research I have developed a range of software packages in Stata. More details, including tutorials, can be found on the package-specific pages:

  • megenreg - to fit extended multivariate generalised linear and non-linear mixed effects models
  • staft - to fit flexible parametric accelerated failure time models
  • multistate - for multi-state survival analysis
  • stjm - to fit joint models of longitudinal and survival data
  • survsim - for the simulation of simple and complex survival data
  • stgenreg - to fit general parametric survival models
  • stmixed - to fit multilevel parametric survival models
  • stmix - to fit two-component mixture parametric survival models
  • extfunnel - to produce extended funnel plots for meta-analysis
  • metapow - for simulation-based sample size calculations for designing trials based on an existing meta-analysis

Each package can be installed by typing ssc install cmdname within Stata. That said, I’m starting to move things over to Git repositories, so keep an eye on the package pages for installation instructions.

Recent Posts

You can see a list of all my posts here.

A major update to the multistate package in Stata, and other news in my multistate world


Some details on the importance of good starting values with megenreg, and my plans to reduce the worry


The program that can do everything…almost


Some history into how I learnt to code, and how I continually try and get better


A few thoughts on what I’ll be blogging about.



Flexible AFT models

Flexible parametric accelerated failure time models


MRC New Investigator Research Grant (01/03/2017 - 29/02/2020)


Multi-state survival analysis


Extended multivariate generalised linear and non-linear mixed effects models


My core teaching is on the MSc Medical Statistics course at the University of Leicester.

I teach a number of short courses; some of the teaching material is freely available on the course pages:


My research group currently consists of:

Research Staff

  • Dr Emma Martin, Post-doctoral Research Associate in Biostatistics, University of Leicester. Emma is funded by my MRC New Investigator Research Grant to work on a variety of projects in multi-state survival models and joint models.

  • Micki Hill, Research Assistant in Biostatistics, University of Leicester. I co-supervise Micki, who is funded by the charity Duchenne UK to work on Project HERCULES, a multidisciplinary collaborative project in which she will be working on multi-state survival models.

PhD students

As main supervisor:

  • Alessandro Gasparini, University of Leicester (1st October 2016 - Present) [GitHub]
    Alessandro has been working on frailty survival models and an R Shiny app for summarising simulation studies. His main project centres on informative observations in joint modelling of longitudinal and survival data. More details on his PhD can be found here.

  • Nuzhat Ashra, University of Leicester (25th September 2017 - Present). Nuzhat is funded by an MRC IMPACT studentship and SPD Development Company, to work on joint modelling of biomarkers to predict miscarriage. She will also be working on some extensions to the stjm command in Stata, including dynamic predictions.

  • To be appointed. I have funding for a three-year PhD studentship to work on multi-state survival models with applications in cancer epidemiology, starting in October 2018. Applications will open very soon.

As co-supervisor:

  • Sam Brilleman, Monash University
    [Homepage] [GitHub]
    Sam’s been working on a variety of projects, but a core project has been development of an R package to fit an extensive array of Bayesian joint models using Stan. More details on his PhD can be found here.