Selected Publications

A list of all my publications can be found here, with some recent ones described below:

In this paper I describe some substantial extensions to the survsim command for simulating survival data. survsim can now simulate survival data from a parametric distribution, a custom/user-defined distribution, a fitted merlin model, a specified cause-specific hazards competing risks model, or a specified general multi-state model. I illustrate the command with examples from each setting, demonstrating the flexibility it offers for evaluating statistical methods.
Pre-print, 2020
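The simplest setting above, simulating from a standard parametric distribution, rests on inverting the survival function: if U ~ Uniform(0, 1), then T = S⁻¹(U) has survival function S. The Python sketch below illustrates that idea for a Weibull proportional hazards model with administrative censoring. It is not survsim's own Stata implementation, and the function name `simulate_weibull` and its parameters are hypothetical. Hazards without a closed-form inverse, such as user-defined ones, need numerical integration and root-finding instead.

```python
import math
import random


def simulate_weibull(n, lam, gamma, max_time, seed=None):
    """Simulate right-censored Weibull survival times by inverse transform sampling.

    Assumes the proportional hazards parameterisation
        h(t) = lam * gamma * t**(gamma - 1),  S(t) = exp(-lam * t**gamma),
    with administrative censoring at max_time.
    Returns a list of (stime, died) tuples.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # 1 - random() lies in (0, 1], so log(u) is always defined.
        u = 1.0 - rng.random()
        # Setting S(T) = u and solving for T gives the inverse transform.
        t = (-math.log(u) / lam) ** (1.0 / gamma)
        if t <= max_time:
            data.append((t, 1))         # event observed
        else:
            data.append((max_time, 0))  # censored at end of follow-up
    return data


if __name__ == "__main__":
    sample = simulate_weibull(n=5, lam=0.1, gamma=1.2, max_time=5.0, seed=2020)
    for stime, died in sample:
        print(f"stime={stime:.3f} died={died}")
```

Because the closed-form inverse exists here, each draw costs one uniform and one power. That is why standard distributions are the cheap case.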

In this article, I present the community-contributed stmixed command for fitting multilevel survival models. It serves both as an alternative to Stata’s official mestreg command and as a complementary command with substantial extensions. stmixed can fit multilevel survival models with any number of levels and random effects at each level, including flexible spline-based approaches (such as Royston–Parmar and the log-hazard equivalent) and user-defined hazard models. Simple or complex time-dependent effects can be included, as can expected mortality for a relative survival model. Left-truncation (delayed entry) is supported, and t-distributed random effects are provided as an alternative to Gaussian random effects. I illustrate the methods with a commonly used dataset of patients with kidney disease suffering recurrent infections, and with a simulated example showing a simple way to simulate clustered survival data using survsim (Crowther and Lambert 2012, Stata Journal 12: 674–687; 2013, Statistics in Medicine 32: 4118–4134). stmixed is part of the merlin family (Crowther 2017, arXiv Working Paper No. arXiv:1710.02223; 2018, arXiv Working Paper No. arXiv:1806.01615).
Stata Journal, 2019

Simulation studies are computer experiments which involve creating data by pseudorandom sampling. The key strength of simulation studies is the ability to understand the behaviour of statistical methods because some ‘truth’ is known from the process of generating the data. This allows us to consider properties of methods, such as bias. While widely used, simulation studies are often poorly designed, analysed and reported. This article outlines the rationale for using simulation studies and offers guidance for design, execution, analysis, reporting and presentation. In particular, we provide: a structured approach for planning and reporting simulation studies; coherent terminology for simulation studies; guidance on coding simulation studies; a critical discussion of key performance measures and their computation; ideas on structuring tabular and graphical presentation of results; and new graphical presentations. With a view to describing current practice and identifying areas for improvement, we review 100 articles taken from Volume 34 of Statistics in Medicine which included at least one simulation study.
Stat Med, 2019
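As a rough illustration of performance measures and their Monte Carlo standard errors (MCSE), here is a minimal Python sketch covering bias, empirical standard error and coverage, applied to a toy study that estimates the mean of normally distributed data. The function names `performance` and `toy_study` are hypothetical, and Wald 95% intervals with z = 1.96 are an assumption of the sketch.

```python
import math
import random


def performance(estimates, ses, theta, z=1.96):
    """Bias, empirical SE and coverage, each with its Monte Carlo SE (MCSE).

    estimates, ses: per-repetition point estimates and model-based SEs.
    theta: the true value used in the data-generating mechanism.
    Coverage uses Wald intervals estimate +/- z * se (an assumption).
    """
    n = len(estimates)
    mean = sum(estimates) / n
    ss = sum((e - mean) ** 2 for e in estimates)  # sum of squared deviations
    emp_se = math.sqrt(ss / (n - 1))
    covered = sum(1 for e, s in zip(estimates, ses) if e - z * s <= theta <= e + z * s)
    cover = covered / n
    return {
        "bias": mean - theta,
        "bias_mcse": math.sqrt(ss / (n * (n - 1))),
        "empse": emp_se,
        "empse_mcse": emp_se / math.sqrt(2 * (n - 1)),
        "coverage": cover,
        "coverage_mcse": math.sqrt(cover * (1 - cover) / n),
    }


def toy_study(nsim, nobs, theta, seed=None):
    """Toy simulation study: estimate the mean of N(theta, 1) data, nsim times."""
    rng = random.Random(seed)
    estimates, ses = [], []
    for _ in range(nsim):
        x = [rng.gauss(theta, 1.0) for _ in range(nobs)]
        m = sum(x) / nobs
        sd = math.sqrt(sum((v - m) ** 2 for v in x) / (nobs - 1))
        estimates.append(m)
        ses.append(sd / math.sqrt(nobs))
    return performance(estimates, ses, theta)


if __name__ == "__main__":
    for name, value in toy_study(nsim=1000, nobs=50, theta=0.5, seed=2019).items():
        print(f"{name:>14}: {value:.4f}")
```

Reporting each MCSE alongside its measure shows whether the number of repetitions was large enough for the differences between methods to be meaningful.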

merlin can do a lot of things: from simple models, such as a linear regression or a Weibull survival model, to a three-level logistic mixed effects model, or a multivariate joint model of multiple longitudinal outcomes (of different types) and a recurrent event and survival with non-linear effects… the list goes on. merlin can even do things I haven’t thought of yet. Using a single dataset, I’ll attempt to show you the full range of merlin’s capabilities, and discuss some future directions for its implementation in Stata.

Multivariate data occurs in a wide range of fields, with ever more flexible model specifications being proposed, often within a multivariate generalised linear mixed effects (MGLME) framework. In this article, we describe an extended framework, encompassing multiple outcomes of any type, each of which could be repeatedly measured (longitudinal), with any number of levels, and with any number of random effects at each level. Many standard distributions are described, as well as non-standard user-defined non-linear models. The extension focuses on a complex linear predictor for each outcome model, allowing sharing and linking between outcome models in an extremely flexible way, either by linking random effects directly, or by including the expected value of one outcome (or a function of it) within the linear predictor of another. Non-linear and time-dependent effects are also seamlessly incorporated into the linear predictor through the use of splines or fractional polynomials. We further propose level-specific random effect distributions and numerical integration techniques to improve usability, relaxing the normally distributed random effects assumption to allow multivariate t-distributed random effects. We consider some special cases of the general framework, describe some new models in the fields of clustered survival data and joint longitudinal-survival models, and discuss various potential uses of the implementation. User-friendly and easily extendable software is provided.
arXiv, 2017

Multi-state models are increasingly being used to model complex disease profiles. By modelling transitions between disease states, accounting for competing events at each transition, we can gain a much richer understanding of patient trajectories and of how risk factors act across the entire disease pathway. In this article we concentrate on parametric multi-state models, both Markov and semi-Markov, and develop a flexible framework where each transition can be specified by a variety of parametric models, including exponential, Weibull, Gompertz and Royston–Parmar proportional hazards models, or log-logistic, log-normal and generalised gamma accelerated failure time models, possibly sharing parameters across transitions. We also extend the framework to allow time-dependent effects. We then use an efficient and generalisable simulation method to calculate transition probabilities from any fitted multi-state model, and show how it facilitates the simple calculation of clinically useful measures, such as expected length of stay in each state, and differences and ratios of the proportions in each state as a function of time, for specific covariate patterns. We illustrate our methods using a dataset of patients with primary breast cancer. User-friendly Stata software is provided.
Stat Med, 2017
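To give a flavour of how simulation yields transition probabilities and length of stay, the Python sketch below simulates patient paths through an illness-death model (1 = healthy, 2 = ill, 3 = dead). Unlike the paper, which simulates from a fitted model, it assumes constant transition hazards so the answer can be checked analytically. The function name `simulate_illness_death` and its parameters are hypothetical.

```python
import math
import random


def simulate_illness_death(n, h12, h13, h23, t_eval, seed=None):
    """Estimate state occupancy probabilities at t_eval by simulating n patient paths.

    States: 1 = healthy, 2 = ill, 3 = dead. Constant (exponential) transition
    hazards are an illustrative assumption; with exponential hazards the Markov
    and semi-Markov (clock-reset) versions coincide.
    Also returns the restricted mean time spent in state 1 up to t_eval.
    """
    rng = random.Random(seed)
    counts = {1: 0, 2: 0, 3: 0}
    time_in_1 = 0.0
    for _ in range(n):
        # Exit time from state 1: the minimum of two competing exponential times.
        t1 = rng.expovariate(h12 + h13)
        time_in_1 += min(t1, t_eval)
        if t1 > t_eval:
            counts[1] += 1
            continue
        # Which transition occurred? Illness with probability h12 / (h12 + h13).
        if rng.random() < h12 / (h12 + h13):
            # Sojourn in state 2 is Exp(h23), so death occurs at t1 + sojourn.
            t2 = t1 + rng.expovariate(h23)
            counts[2 if t2 > t_eval else 3] += 1
        else:
            counts[3] += 1
    probs = {s: c / n for s, c in counts.items()}
    return probs, time_in_1 / n


if __name__ == "__main__":
    probs, los1 = simulate_illness_death(100000, 0.1, 0.05, 0.2, 5.0, seed=2017)
    print("P(state at t=5):", {s: round(p, 3) for s, p in probs.items()})
    print("Expected time in state 1 up to t=5:", round(los1, 3))
```

With constant hazards, the simulated occupancy of state 1 should approach exp(-(h12 + h13)t). The same recipe applies unchanged when each transition has its own parametric hazard, which is where simulation earns its keep over closed-form solutions.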


As part of my research I have developed a range of software packages in Stata and R. More details, including tutorials, can be found on the package-specific pages:

merlin ~ mixed effects regression for linear and non-linear models
Tutorials in Stata, Stata version history (stable release), Stata version history (development release)

multistate ~ multi-state survival analysis
Stata version history (stable release), Stata version history (development release)

survsim ~ simulation of simple and complex survival data
Tutorials in Stata, Stata version history (stable release), Stata version history (development release)

stmixed ~ multilevel parametric survival models
Stata version history (stable release), Stata version history (development release)

staft ~ flexible parametric accelerated failure time models
Stata version history (stable release), GitHub repo

sankey ~ Sankey graphs in Stata using Python
Stata version history (stable release)

stjm ~ joint models of longitudinal and survival data

stgenreg ~ general parametric survival models

stmix ~ two-component mixture parametric survival models

extfunnel ~ extended funnel plots for meta-analysis

metapow ~ simulation-based sample size calculations for designing trials based on an existing meta-analysis

Recent Posts

survsim can now simulate survival data from a semi-Markov or multiple timescale multi-state model, and is about 70% faster


survsim can now simulate survival data from a Markov multi-state model, defined by general transition-specific hazard functions


survsim can now simulate survival data from a parametric distribution, a user-defined distribution, from a fitted merlin model, or from a general cause-specific hazards competing risks model


Bringing together the strengths of stset and merlin


How merlin makes modelling of non-linear effects a whole lot simpler