Nonparametric analysis of clustered multistate process data.
clusteredMSM provides population-averaged transition
probability estimates, pointwise confidence intervals, simultaneous
confidence bands, and two-sample Kolmogorov-Smirnov-type tests for
multistate process data with cluster-correlated observations. Estimation
follows Bakoyannis
(2021); two-sample inference for the cluster-randomized and
independent-samples designs follows Bakoyannis &
Bandyopadhyay (2022). Both rest on the working-independence
Aalen-Johansen estimator with a cluster-bootstrap variance.
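As a rough illustration of the resampling idea behind a cluster-bootstrap variance, the sketch below draws one bootstrap replicate by resampling whole clusters with replacement. It uses base R only. The helper `cluster_bootstrap_sample()` and the `boot_cluster` column are hypothetical names, not part of clusteredMSM, and the package's internal bootstrap may differ in its details.

```r
# Hypothetical helper (not a clusteredMSM function): resample whole clusters
# with replacement, keeping every row of each drawn cluster together.
cluster_bootstrap_sample <- function(data, cluster) {
  ids   <- unique(data[[cluster]])
  drawn <- sample(ids, size = length(ids), replace = TRUE)
  pieces <- lapply(seq_along(drawn), function(b) {
    piece <- data[data[[cluster]] == drawn[b], , drop = FALSE]
    piece$boot_cluster <- b  # relabel so a cluster drawn twice stays distinct
    piece
  })
  do.call(rbind, pieces)
}

set.seed(1)
toy_clusters <- data.frame(cluster = rep(1:4, each = 3), id = 1:12)
boot <- cluster_bootstrap_sample(toy_clusters, "cluster")
table(boot$boot_cluster)  # every replicate cluster keeps all 3 of its rows
```

Repeating the draw B times and recomputing the estimate on each replicate gives an empirical variance that respects within-cluster correlation.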
Unlike its predecessor (the clustered-multistate
repository, which relied on the mstate package),
clusteredMSM is self-contained (depending only on
survival) and supports non-monotone multistate
processes, including illness-death with recovery and other
models with cyclic transitions.
# install.packages("devtools")
devtools::install_github("gbakoyannis/clusteredMSM")

After CRAN release:

install.packages("clusteredMSM")

clusteredMSM exposes one main function,
patp(), modelled after survival::Surv():
library(clusteredMSM)
# Synthetic clustered illness-death-with-recovery data (40 subjects,
# 8 clusters); see ?example_msm.
data(example_msm)
# Define the transition structure (illness-death with recovery)
tmat <- trans_mat(list(c(2, 3), c(1, 3), integer(0)),
                  names = c("Healthy", "Ill", "Dead"))
# One-sample analysis: P(Ill at t | Healthy at 0)
fit <- patp(msm(Tstart, Tstop, Sstart, Sstop) ~ 1,
            data = example_msm, tmat = tmat,
            id = "id", cluster = "cluster",
            h = 1, j = 2, s = 0,
            B = 1000, cband = TRUE)
fit

If the formula’s right-hand side has a grouping variable,
patp() automatically estimates both group-specific curves
and tests their equality:
# Two-sample analysis (estimate + test in one call)
tt <- patp(msm(Tstart, Tstop, Sstart, Sstop) ~ treatment,
           data = example_msm, tmat = tmat,
           id = "id", cluster = "cluster",
           h = 1, j = 2, B = 1000)
tt

The same example is shipped as a CSV under
inst/extdata/, so you can mimic the typical workflow of
reading a user-supplied file:
f <- system.file("extdata", "example_data.csv", package = "clusteredMSM")
mydata <- read.csv(f)
head(mydata)

Each row of your data represents one mutually exclusive time interval for one subject, with columns:
| Column | Description |
|---|---|
| Tstart | Numeric start time of the interval |
| Tstop | Numeric end time of the interval |
| Sstart | Integer state occupied during the interval |
| Sstop | Integer state at Tstop (or equal to Sstart if censored) |
| id | Subject identifier |
| cluster | (optional) cluster identifier |
| (group) | (optional) binary grouping variable |
The column names are arbitrary: the msm(...) call and the
id and cluster arguments tell the package which column plays
which role.
Censoring is encoded as Sstart == Sstop
on the final row of a subject’s record. A subject who enters an
absorbing state has no further rows after that transition.
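The censoring rule can be sketched in base R as follows. The data frame `d` and the variable names are made up for illustration, and the code assumes rows are already ordered by time within each subject.

```r
# Hypothetical toy data: subject "a" becomes ill at t = 2 and is censored
# while ill at t = 5; subject "b" dies (state 3, absorbing) at t = 3.
d <- data.frame(
  id     = c("a", "a", "b"),
  Tstart = c(0, 2, 0),
  Tstop  = c(2, 5, 3),
  Sstart = c(1, 2, 1),
  Sstop  = c(2, 2, 3)
)

# Final row of each subject (rows assumed sorted by time within id)
last_row <- !duplicated(d$id, fromLast = TRUE)

# A subject is censored when the final row has no state change
is_censored <- setNames(
  (d$Sstart == d$Sstop)[last_row],
  as.character(d$id[last_row])
)
is_censored  # a: TRUE (censored), b: FALSE (absorbed)
```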
Within each subject, intervals must be:

- Temporally contiguous: Tstop[k] == Tstart[k+1]
- State contiguous: Sstop[k] == Sstart[k+1]
Validation is strict and informative — any violation triggers an error with a clear message.
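To make the two contiguity rules concrete, here is a minimal base-R sketch of such a check. `check_contiguity()` is a hypothetical helper written for this illustration. It is not the package's validate_intervals(), whose checks and messages may differ, and it assumes the columns are named id, Tstart, Tstop, Sstart and Sstop.

```r
# Hypothetical checker: compares each row with the next row of the same id.
check_contiguity <- function(d) {
  d <- d[order(d$id, d$Tstart), ]
  n <- nrow(d)
  if (n < 2) return(invisible(TRUE))
  same_id  <- d$id[-1] == d$id[-n]
  time_ok  <- d$Tstop[-n] == d$Tstart[-1]   # Tstop[k] == Tstart[k+1]
  state_ok <- d$Sstop[-n] == d$Sstart[-1]   # Sstop[k] == Sstart[k+1]
  bad <- same_id & !(time_ok & state_ok)
  if (any(bad)) {
    stop("non-contiguous intervals for id(s): ",
         paste(unique(d$id[-1][bad]), collapse = ", "))
  }
  invisible(TRUE)
}

good <- data.frame(id = c(3, 3, 3),
                   Tstart = c(0, 1, 2), Tstop = c(1, 2, 3.5),
                   Sstart = c(1, 2, 1), Sstop = c(2, 1, 1))
bad <- good
bad$Tstart[2] <- 1.2                        # gap between rows 1 and 2

ok  <- check_contiguity(good)               # passes silently
msg <- tryCatch(check_contiguity(bad),
                error = function(e) conditionMessage(e))
msg
```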
Progressive illness-death (subject who got ill, then died):
| id | Tstart | Tstop | Sstart | Sstop |
|---|---|---|---|---|
| 1 | 0.0 | 1.5 | 1 | 2 |
| 1 | 1.5 | 3.0 | 2 | 3 |
Subject censored healthy:
| id | Tstart | Tstop | Sstart | Sstop |
|---|---|---|---|---|
| 2 | 0.0 | 4.0 | 1 | 1 |
Recovery (Healthy → Ill → Healthy → censored):
| id | Tstart | Tstop | Sstart | Sstop |
|---|---|---|---|---|
| 3 | 0.0 | 1.0 | 1 | 2 |
| 3 | 1.0 | 2.0 | 2 | 1 |
| 3 | 2.0 | 3.5 | 1 | 1 |
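The three example records above can be stacked into one data frame and passed through a textbook Aalen-Johansen product, which may help show how interval rows turn into transition probabilities. This is a sketch that ignores clustering and uses base R only. `aalen_johansen()` and `toy` are hypothetical names, and this is not the package's implementation.

```r
toy <- data.frame(
  id     = c(1, 1, 2, 3, 3, 3),
  Tstart = c(0.0, 1.5, 0.0, 0.0, 1.0, 2.0),
  Tstop  = c(1.5, 3.0, 4.0, 1.0, 2.0, 3.5),
  Sstart = c(1, 2, 1, 1, 2, 1),
  Sstop  = c(2, 3, 1, 2, 1, 1)
)

# Hypothetical helper: P(0, t) as the product over transition times u <= t
# of (I + dA(u)), where dA[h, j] = (# h->j transitions at u) / (# at risk in h).
aalen_johansen <- function(d, K, t) {
  P     <- diag(K)
  trans <- d[d$Sstart != d$Sstop, ]          # rows ending in a transition
  times <- sort(unique(trans$Tstop[trans$Tstop <= t]))
  for (u in times) {
    dA <- matrix(0, K, K)
    for (h in seq_len(K)) {
      at_risk <- sum(d$Sstart == h & d$Tstart < u & d$Tstop >= u)
      if (at_risk == 0) next
      for (j in setdiff(seq_len(K), h)) {
        dA[h, j] <- sum(trans$Sstart == h & trans$Sstop == j &
                          trans$Tstop == u) / at_risk
      }
      dA[h, h] <- -sum(dA[h, ])              # rows of dA sum to zero
    }
    P <- P %*% (diag(K) + dA)
  }
  P
}

P3 <- aalen_johansen(toy, K = 3, t = 3)
P3[1, ]  # from Healthy at 0: Healthy 2/3, Ill 0, Dead 1/3 at t = 3
```

With no censoring before t = 3, the first row matches the empirical state occupation: two of the three subjects are healthy and one is dead.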
| Function | Purpose |
|---|---|
| patp() | The main user-facing function: formula-based estimation and testing. |
| msm() | Constructor for multistate intervals; used inside the formula. |
| trans_mat() | Build a K x K transition matrix. |
| validate_intervals() | Validate user data (called automatically by patp(); usable directly). |
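For intuition about what a K x K transition structure encodes, the sketch below turns the same list-of-destinations input used in the example call into a logical adjacency matrix in base R. `adjacency_from_list()` is a hypothetical helper. The object actually returned by trans_mat() may be represented differently.

```r
# Hypothetical helper: entry [h, j] is TRUE when a direct h -> j
# transition is allowed.
adjacency_from_list <- function(x, names) {
  K <- length(x)
  m <- matrix(FALSE, K, K, dimnames = list(from = names, to = names))
  for (h in seq_len(K)) {
    if (length(x[[h]]) > 0) m[h, x[[h]]] <- TRUE
  }
  m
}

adj <- adjacency_from_list(list(c(2, 3), c(1, 3), integer(0)),
                           c("Healthy", "Ill", "Dead"))
adj

# Absorbing states are the rows with no outgoing transitions
absorbing <- names(which(rowSums(adj) == 0))
absorbing
```

The Ill to Healthy entry is what makes this structure non-monotone: a subject can return to a state already visited.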
Bakoyannis, G. (2021). Nonparametric analysis of nonhomogeneous multistate processes with clustered observations. Biometrics, 77(2), 533-546. doi:10.1111/biom.13327
Bakoyannis, G., & Bandyopadhyay, D. (2022). Nonparametric tests for multistate processes with clustered data. Annals of the Institute of Statistical Mathematics, 74(5), 837-867. doi:10.1007/s10463-021-00819-x
You can retrieve the BibTeX entries within R via
toBibtex(citation("clusteredMSM")).
GPL-3