RMSTpowerBoost: Sample Size and Power Calculations for RMST-based Clinical Trials

Introduction

Clinical trials with time-to-event endpoints often rely on hazard-based methods such as the proportional hazards (PH) model and the hazard ratio (HR). The HR can be hard to interpret when treatment effects vary over time, and the proportional hazards assumption is frequently violated in practice.

The Restricted Mean Survival Time (RMST) is the expected event-free time up to a pre-specified follow-up point, \(L\) (Royston and Parmar 2013; Uno et al. 2014). It yields a treatment contrast on the time scale, such as an average gain of several months over a clinically relevant horizon.

Recent work models RMST directly as a function of baseline covariates instead of estimating it only from a survival curve or hazard model. Methods based on Inverse Probability of Censoring Weighting (IPCW) (Tian et al. 2014) now cover stratified studies (Wang et al. 2019; Zhang and Schaubel 2024) and covariate-dependent censoring (Wang and Schaubel 2018).

Most available software emphasizes estimation from existing data rather than study design. As a result, trial statisticians often need custom code for sample size and power calculations.

RMSTpowerBoost implements direct RMST methods for power and sample size calculations:

This vignette outlines the main model families and shows how to use them.

To keep CRAN checks fast, this vignette uses precomputed outputs for the long-running examples by default. Set RMSTPOWERBOOST_FULL_VIGNETTES=true before knitting to recompute every analytical and bootstrap result.


Core Concepts of RMSTpowerBoost Package

The functions in this package are grounded in a regression-based formulation of the restricted mean survival time (RMST). For a given subject \(i\) with event time \(T_i\), covariate vector \(Z_i\) and treatment indicator \(\mathrm{Trt}_i\), the conditional RMST is modeled as

\[ \mathbb{E}[\min(T_i, L)\mid Z_i] \;=\; \beta_0 + \beta_{\text{effect}}\,\mathrm{Trt}_i + \beta_2^{\top} Z_i, \]

where \(L\) is the restriction time, and \(\beta_{\text{effect}}\) represents the modeled treatment contrast on the RMST scale, that is, the expected difference in event-free time between treatment arms after adjustment for the included covariates. The quantity \(\beta_{\text{effect}}\) therefore defines the effect size used throughout the analytical power and sample size functions in RMSTpowerBoost.

The Analytic Method

The analytical functions estimate power from a closed-form expression, so they are useful for rapid scenario exploration. The process is:

  1. One-Time Estimation: The function first analyzes the provided reference data to estimate two key parameters:
    • The treatment effect size (e.g., the difference in RMST or the log-RMST ratio).
    • The asymptotic variance of that effect estimator, which measures its uncertainty.
  2. Power Formula: It then plugs these fixed estimates into a standard power formula. For a given total sample size N, the power is calculated as: \[ \text{Power} = \Phi\left( \frac{|\beta_{\text{effect}}|}{\sigma_N} - z_{1-\alpha/2} \right) \] where:
    • \(\Phi\) is the cumulative distribution function (CDF) of the standard normal distribution.
    • \(\beta_{\text{effect}}\) is the treatment effect.
    • \(\sigma_N = \frac{\sigma_1}{\sqrt{N}}\) is the standard error of the effect for the target sample size N, which is scaled from the reference data’s variance.
    • \(z_{1-\alpha/2}\) is the critical value from the standard normal distribution (e.g., 1.96 for an alpha of 0.05).

The Bootstrap Method

The bootstrap functions estimate power empirically and therefore require more computation. The process is:

  1. Resample: The function simulates a “future trial” of a given sample_size by resampling with replacement from the reference data.
  2. Fit Model: On this new bootstrap sample, it performs the full analysis (e.g., calculating weights or pseudo-observations and fitting the specified model).
  3. Get P-Value: It extracts the p-value for the treatment effect from the fitted model.
  4. Repeat: This process is repeated thousands of times (n_sim).
  5. Calculate Power: The final estimated power is the proportion of simulations where the p-value was less than the significance level alpha. \[ \text{Power} = \frac{\text{Number of simulations with } p < \alpha}{n_{\text{sim}}} \]

The Sample Size Search Algorithm

rmst.ss() uses an iterative search algorithm to find the N required to achieve a target_power:

  1. Start: The search begins with a sample size of n_start.
  2. Calculate Power: It calculates the power for the current_n using either the analytic formula or a full bootstrap simulation.
  3. Check Condition:
    • If calculated_power >= target_power, the search succeeds and returns current_n.
    • If not, it increments the sample size (current_n = current_n + n_step) and repeats the process.
  4. Stopping Rules: The search terminates if the sample size exceeds max_n or, for bootstrap methods, if the power fails to improve for a set number of patience steps.

The Unified Interface

RMSTpowerBoost exposes all models through two top-level functions that use a familiar Surv()-based formula interface:

Function Purpose
rmst.power(Surv(time, status) ~ covariates, data, arm, sample_sizes, L, ...) Power curve over a vector of sample sizes
rmst.ss(Surv(time, status) ~ covariates, data, arm, target_power, L, ...) Minimum sample size to reach a power target

Key arguments that control model selection:

Argument Values Effect
type "analytical" (default) / "boot" Analytic vs. bootstrap method
strata column name / ~col / NULL Activates stratified model
strata_type "additive" / "multiplicative" Stratified model type
dep_cens TRUE / FALSE Dependent-censoring model
Smooth terms s(var) in formula Activates GAM (forces type = "boot")

The underlying model-specific functions remain exported for direct use when needed.

Selecting an Appropriate Model

Model choice depends on the study design and the assumptions you are willing to make. The table below summarizes typical settings for each approach:

Model Key Assumption / Scenario Use when
Linear IPCW Assumes a linear relationship between covariates and RMST. there is no strong evidence of non-linear effects or complex stratification.
Additive Stratified Assumes the treatment adds a constant amount of survival time across strata. treatment effects are expected to be comparable across centers or other strata.
Multiplicative Stratified Assumes the treatment multiplies survival time proportionally across strata. treatment effects are better expressed as relative changes in RMST across strata.
Semiparametric GAM Allows for non-linear covariate effects on RMST. covariates such as age or biomarkers are likely to have non-linear associations with the outcome.
Dependent Censoring Accounts for covariate-dependent censoring under a single censoring mechanism. censoring depends on measured covariates and competing risks are not being modeled explicitly.

Linear IPCW Models

These functions implement the foundational direct linear regression model for the RMST. This model is appropriate when a linear relationship between covariates and the RMST is assumed, and when censoring is independent of the event of interest.

Theory and Model

Based on the methods of (Tian et al. 2014), these functions model the conditional RMST as a linear function of covariates: \[\mathbb{E}[\min(T_i, L) | Z_i] = \beta_0 + \beta_{\text{effect}} \text{Treatment}_i + \beta_2 \text{Covariate}_{i}\] In this model, the expected RMST up to a pre-specified time L for subject i is modeled as a linear combination of their treatment arm and other variables \(Z_i\).

To handle right-censoring, the method uses Inverse Probability of Censoring Weighting (IPCW). This is achieved through the following steps:

  1. A survival curve for the censoring distribution is estimated using the Kaplan-Meier method (where “failure” is being censored).
  2. For each subject who experienced the primary event, a weight is calculated. This weight is the inverse of the probability of not being censored up to their event time.
  3. A standard weighted linear model (lm()) is then fitted using these weights. The model only includes subjects who experienced the event.

Analytical Methods

The analytical functions use a formula based on the asymptotic variance of the regression coefficients to calculate power or sample size, making them fast to evaluate.

Scenario: We use the veteran dataset to estimate power for a trial comparing standard vs. test chemotherapy (trt), adjusting for the Karnofsky performance score (karno).

Power Calculation

First, let’s inspect the prepared veteran dataset.

Now, we calculate the power for a range of sample sizes using a truncation time of 9 months (270 days).

Code
power_results_vet <- vignette_cache("veteran_power_calc", {
rmst.power(
  Surv(time, status) ~ karno,
  data         = vet,
  arm          = "arm",
  sample_sizes = c(100, 150, 200, 250),
  L            = 270
)
})

The results are returned as a data frame and a ggplot object.

Power Analysis for Veteran Dataset
N_per_Arm Power
100 0.1266
150 0.1687
200 0.2106
250 0.2521

Sample Size Calculation

We can also use the analytical method to find the required sample size to achieve a target power for a truncation time of one year (365 days).

Code
ss_results_vet <- vignette_cache("veteran_ss_calc", {
rmst.ss(
  Surv(time, status) ~ karno,
  data         = vet,
  arm          = "arm",
  target_power = 0.40,
  L            = 365,
  n_start = 1000, n_step = 250, max_n = 5000
)
})
Estimated Effect from Reference Data
Statistic Value
Assumed RMST Difference (from pilot) -3.9666

Bootstrap Methods

Passing type = "boot" to rmst.power() or rmst.ss() switches to a simulation-based approach. This method repeatedly resamples from the reference data, fits the model on each sample, and calculates power as the proportion of simulations where the treatment effect is significant. It relies less on closed-form large-sample approximations at the cost of greater computation time.

Power and Sample Size Calculation (bootstrap)

Here is how you would call the bootstrap method for power for the linear model. The following examples use the same veteran dataset, but with a smaller number of simulations for demonstration purposes. In practice, a larger number of simulations (e.g., 1,000 or more) is recommended to ensure stable results.

First we calculate the power for a range of sample sizes.

Code
power_boot_vet <- vignette_cache("linear_boot_example", {
rmst.power(
  Surv(time, status) ~ karno,
  data         = vet,
  arm          = "arm",
  sample_sizes = c(150, 200, 250),
  L            = 365,
  type         = "boot",
  n_sim        = 50
)
})

Here is how you would call the bootstrap method for sample size calculation, targeting 50% power and truncating at 180 days (6 months).

Code
ss_boot_vet <- vignette_cache("linear_boot_ss_example", {
rmst.ss(
  Surv(time, status) ~ karno,
  data         = vet,
  arm          = "arm",
  target_power = 0.5,
  L            = 180,
  type         = "boot",
  n_sim        = 100,
  patience     = 5
)
})


Additive Stratified Models

In multi-center clinical trials, the analysis is often stratified by a categorical variable with many levels, such as clinical center or a discretized biomarker. Estimating a separate parameter for each stratum can be inefficient when the number of strata is large. The additive stratified model removes the stratum-specific effects through conditioning.

Theory and Model

The semiparametric additive model for RMST, as developed by (Zhang and Schaubel 2024), is defined as: \[\mu_{ij} = \mu_{0j} + \beta'Z_i\] This model assumes that the effect of the covariates \(Z_i\) (which includes the treatment arm) is additive and constant across all strata \(j\). Each stratum has its own baseline RMST, denoted by \(\mu_{0j}\).

The common treatment effect, \(\beta\), is estimated with a stratum-centering approach applied to IPCW-weighted data. This avoids direct estimation of the many \(\mu_{0j}\) parameters.

Analytical Methods

Sample Size Calculation

Scenario: We use the colon dataset to design a trial stratified by the extent of local disease (extent), a factor with 4 levels. We want to find the sample size per stratum to achieve 60% power. Let’s inspect the prepared colon dataset.

Now, we run the sample size search for 60% power, truncating at 5 years (1825 days).

Code
ss_results_colon <- vignette_cache("colon_ss_calc", {
rmst.ss(
  Surv(time, status) ~ 1,
  data         = colon_death,
  arm          = "arm",
  strata       = "strata",
  strata_type  = "additive",
  target_power = 0.60,
  L            = 1825,
  n_start = 100, n_step = 100, max_n = 10000
)
})
Estimated Effect from Reference Data
Statistic Value
Assumed RMST Difference (from pilot) -36.7735

Power Calculation

This example calculates the power for a given set of sample sizes in a stratified additive model using the same colon dataset.

Code
power_results_colon <- vignette_cache("additive_power_calc", {
rmst.power(
  Surv(time, status) ~ 1,
  data         = colon_death,
  arm          = "arm",
  strata       = "strata",
  strata_type  = "additive",
  sample_sizes = c(1000, 3000, 5000),
  L            = 1825
)
})
Power for Additive Stratified Colon Trial
N_per_Stratum Power
1000 0.3259
3000 0.7432
5000 0.9213


Multiplicative Stratified Models

Use the multiplicative model when treatment is expected to act on the RMST scale through a relative effect, such as a percentage increase or decrease in survival time.

Theory and Model

The multiplicative model, based on the work of (Wang et al. 2019), is defined as: \[\mu_{ij} = \mu_{0j} \exp(\beta'Z_i)\] In this model, the covariates \(Z_i\) have a multiplicative effect on the baseline stratum-specific RMST, \(\mu_{0j}\). This structure is equivalent to a linear model on the log-RMST.

Formal estimation of \(\beta\) requires an iterative solver. This package instead fits a weighted log-linear model (lm(log(Y_rmst) ~ ...)) to approximate the log-RMST ratio and its variance.

Analytical Methods

Power Calculation

This function calculates the power for various sample sizes using the analytical method for the multiplicative stratified model.

Code
power_ms_analytical <- vignette_cache("ms_power_analytical_example", {
rmst.power(
  Surv(time, status) ~ 1,
  data         = colon_death,
  arm          = "arm",
  strata       = "strata",
  strata_type  = "multiplicative",
  sample_sizes = c(300, 400, 500),
  L            = 1825
)
})
Power for Multiplicative Stratified Model
N_per_Stratum Power
300 0.5062
400 0.6259
500 0.7225

Sample Size Calculation

The following example demonstrates the sample size calculation for the multiplicative model.

Code
ms_ss_results_colon <- vignette_cache("ms_ss_analytical_example", {
rmst.ss(
  Surv(time, status) ~ 1,
  data         = colon_death,
  arm          = "arm",
  strata       = "strata",
  strata_type  = "multiplicative",
  target_power = 0.6,
  L            = 1825
)
})
Sample Size for Multiplicative Stratified Model
Statistic Value
Assumed log(RMST Ratio) (from pilot) -0.0898

Bootstrap Methods

The bootstrap approach provides a simulation-based analysis for the multiplicative model. Pass type = "boot" together with strata_type = "multiplicative".

Power Calculation (bootstrap)

Code
power_ms_boot <- vignette_cache("ms_power_boot_example", {
rmst.power(
  Surv(time, status) ~ 1,
  data           = colon_death,
  arm            = "arm",
  strata         = "strata",
  strata_type    = "multiplicative",
  sample_sizes   = c(100, 300, 500),
  L              = 1825,
  type           = "boot",
  n_sim          = 30,
  parallel.cores = 1
)
})
Power for Multiplicative Stratified Model (Bootstrap)
Statistic Value
Mean RMST Ratio 1.0001
95% CI Lower 0.9579
95% CI Upper 1.0501

Sample Size Calculation (bootstrap)

Code
ss_ms_boot <- vignette_cache("ms_ss_boot_example", {
rmst.ss(
  Surv(time, status) ~ 1,
  data           = colon_death,
  arm            = "arm",
  strata         = "strata",
  strata_type    = "multiplicative",
  target_power   = 0.75,
  L              = 1825,
  type           = "boot",
  n_sim          = 30,
  n_start        = 100,
  n_step         = 50,
  patience       = 4,
  parallel.cores = 1
)
})
Sample Size for Multiplicative Stratified Model (Bootstrap)
Statistic Value
Mean RMST Ratio 1.0081
95% CI Lower 0.9538
95% CI Upper 1.0684

Semiparametric GAM Models

When a covariate is expected to have a non-linear effect on the outcome, standard linear models may be misspecified. Generalized Additive Models (GAMs) handle this by fitting smooth functions.

Theory and Model

These functions use a bootstrap simulation approach combined with a GAM. The method involves two main steps:

  1. Jackknife Pseudo-Observations: The time-to-event outcome is first converted into jackknife pseudo-observations for the RMST. This technique (Andersen et al. 2004; Parner and Andersen 2010) creates a continuous, uncensored variable that represents each subject’s contribution to the RMST. This makes the outcome suitable for use in a standard regression framework.

  2. GAM Fitting: A GAM is then fitted to these pseudo-observations. The model has the form: \[\mathbb{E}[\text{pseudo}_i] = \beta_0 + \beta_{\text{effect}} \cdot \text{Treatment}_i + \sum_{k=1}^{q} f_k(\text{Covariate}_{ik})\] Here, \(f_k()\) are the non-linear smooth functions (splines) that the GAM estimates from the data.

Power Calculation Formula

Because this is a bootstrap method, power is not calculated from a direct formula but is instead estimated empirically from the simulations: \[\text{Power} = \frac{1}{B} \sum_{b=1}^{B} \mathbb{I}(p_b < \alpha)\] Where:

  • \(B\) is the total number of bootstrap simulations (n_sim).

  • \(p_b\) is the p-value for the treatment effect in the \(b\)-th simulation.

  • \(\mathbb{I}(\cdot)\) is the indicator function, which is 1 if the condition is true and 0 otherwise.

Bootstrap Methods

Wrap smooth covariates in s() in the formula — this automatically routes to the GAM bootstrap engine regardless of the type argument.

Power Calculation

Scenario: We use the gbsg (German Breast Cancer Study Group) dataset, suspecting that the progesterone receptor count (pgr) has a non-linear effect on recurrence-free survival. Here is a look at the prepared gbsg data.

The following code shows how to calculate power. Wrapping pgr in s() signals a smooth non-linear effect.

Code
power_gam <- vignette_cache("gbsg_power_calc", {
rmst.power(
  Surv(rfstime, status) ~ s(pgr),
  data           = gbsg_prepared,
  arm            = "arm",
  sample_sizes   = c(50, 200, 400),
  L              = 2825,
  n_sim          = 50,
  parallel.cores = 1
)
})

print(power_gam$results_plot)

Sample Size Calculation

Scenario: We want to find the sample size needed to achieve 95% power for detecting an effect of pgr on recurrence-free survival.

Code
ss_gam <- vignette_cache("gbsg_ss_calc", {
rmst.ss(
  Surv(rfstime, status) ~ s(pgr),
  data           = gbsg_prepared,
  arm            = "arm",
  target_power   = 0.95,
  L              = 182,
  n_sim          = 50,
  patience       = 5,
  parallel.cores = 1
)
})


Covariate-Dependent Censoring Models

In many observational or longitudinal studies, the probability of being censored may depend on measured subject characteristics such as age, disease stage, or biomarker level, but not on the unobserved event time after conditioning on those covariates. This setting is referred to as covariate-dependent (or conditionally independent) censoring. Inverse-probability-of-censoring weighting (IPCW) can account for this dependence under the stated model and support valid estimation of the restricted mean survival time (RMST).

Theory and Model

We assume a single censoring mechanism described by a Cox proportional-hazards model for the censoring time: \[ \text{Pr}(C \le t \mid X) = 1 - G(t \mid X), \qquad G(t \mid X) = \exp\{-\Lambda_C(t \mid X)\}, \] where \(X\) denotes the observed covariates (e.g., age, sex, treatment arm). Each subject receives an inverse-probability weight \[ W_i = \frac{1}{\widehat{G}(Y_i \mid X_i)}, \qquad Y_i = \min(T_i, L), \] with \(T_i\) the event time and \(L\) the truncation horizon for RMST. Weights are stabilized and truncated to prevent numerical instability. The weighted regression \[ E[Y_i \mid A_i, X_i] = \beta_0 + \beta_A A_i + X_i^\top\beta_X \] is then fitted using weighted least squares. The variance of \(\hat\beta_A\) is estimated with a sandwich-type estimator that treats the censoring model as known.

Power Calculation Formula

Analytic power is computed as \[ \text{Power} = \Phi\!\left( \frac{|\hat\beta_A|}{\sigma_N} - z_{1-\alpha/2} \right), \qquad \sigma_N^2 = \frac{\widehat{\mathrm{Var}}(\hat\beta_A)}{N}, \] where \(\widehat{\mathrm{Var}}(\hat\beta_A)\) is the asymptotic variance estimated from the reference data and \(N\) is the total sample size. Because the same IPCW model is used for all subjects, no cause-specific components appear in the variance.

Analytical Methods

We demonstrate this approach using the mgus2 dataset, modeling censoring as a function of age and sex under a single censoring mechanism.

Power Calculation

Setting dep_cens = TRUE routes the call to the dependent-censoring adjusted estimator.

Code
dc_power_results <- vignette_cache("dc_power_calc", {
rmst.power(
  Surv(time, status) ~ age,
  data         = mgus_prepared,
  arm          = "arm",
  sample_sizes = c(100, 250, 500),
  L            = 120,
  dep_cens     = TRUE,
  alpha        = 0.05
)
})
Analytic Power Analysis under Covariate-Dependent Censoring
Statistic Value
Assumed RMST Difference (from pilot) -10.6135

Sample-Size Calculation

We next estimate the per-arm sample size required to achieve 80% power at the same truncation time (10 years).

Code
ss_dc_mgus <- vignette_cache("mgus2_ss_calc", {
rmst.ss(
  Surv(time, status) ~ age,
  data         = mgus_prepared,
  arm          = "arm",
  target_power = 0.80,
  L            = 120,
  dep_cens     = TRUE,
  alpha        = 0.05,
  n_start = 100, n_step = 50, max_n = 5000
)
})
Estimated RMST Difference from Reference Data
Statistic Value
Assumed RMST Difference (from pilot) -10.6135


Interactive Shiny Application

RMSTpowerBoost also includes a Shiny web application for users who prefer a graphical interface.

Accessing the Application

There are two ways to access the application:

  1. Live Web Version: Access the application directly in your browser without any installation.

  2. Run Locally from the R Package: If you have installed the RMSTpowerBoost-Package, you can run the application on your own machine with the following command:

Code
RMSTpowerBoost::run_app()

App Features

Conclusion

RMSTpowerBoost centers its interface on rmst.power() and rmst.ss() for RMST-based power and sample size calculations under linear, stratified, semiparametric, and covariate-dependent censoring models. Both functions use the familiar Surv(time, status) ~ covariates syntax, with model choice controlled by a small set of arguments (strata, strata_type, dep_cens, type).

In practice, these procedures depend on representative reference data because effect sizes and variance components are estimated from that dataset. Bootstrap-based methods can also require substantial computation.

Future development could extend simulation-based procedures for covariate-dependent censoring and add model diagnostic tools for assessing reference-data assumptions.

References

Andersen, Per K., Mette G. Hansen, and John P. Klein. 2004. “Regression Analysis of Restricted Mean Survival Time Based on Pseudo-Observations.” Lifetime Data Analysis 10 (4): 335–50. https://doi.org/10.1007/s10985-004-4771-0.
Parner, Erik T., and Per K. Andersen. 2010. “Regression Analysis of Censored Data Using Pseudo-Observations.” Stata Journal 10 (3): 408–22. https://doi.org/10.1177/1536867X1001000308.
Royston, Patrick, and Mahesh KB Parmar. 2013. “Restricted Mean Survival Time: An Alternative to the Hazard Ratio for the Design and Analysis of Randomized Trials with a Time-to-Event Outcome.” BMC Medical Research Methodology 13 (1): 1–13.
Tian, Lu, Lihui Zhao, and LJ Wei. 2014. “Predicting the Restricted Mean Event Time with the Subject’s Baseline Covariates in Survival Analysis.” Biostatistics 15 (2): 222–33.
Uno, Hajime, Brian Claggett, Lu Tian, et al. 2014. “Moving Beyond the Hazard Ratio in Quantifying the Between-Group Difference in Survival Analysis.” Journal of Clinical Oncology 32 (22): 2380.
Wang, Xin, and Douglas E Schaubel. 2018. “Modeling Restricted Mean Survival Time Under General Censoring Mechanisms.” Lifetime Data Analysis 24: 176–99.
Wang, Xin, Yingchao Zhong, Purna Mukhopadhyay, and Douglas E Schaubel. 2019. “Computationally Efficient Inference for Center Effects Based on Restricted Mean Survival Time.” Statistics in Medicine 38 (27): 5133–45.
Zhang, Yuan, and Douglas E Schaubel. 2024. “Semiparametric Additive Modeling of the Restricted Mean Survival Time.” Biometrical Journal 66 (6): e202200371.