Getting started with highMLR

Overview

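highMLR provides a single, unified interface for high-dimensional feature selection when the outcome is a (possibly censored) survival time. The same highmlr() call dispatches to one of several machine learning methods, chosen through the method argument (for example "coxnet", "rsf", or "univariate", all of which appear later in this vignette).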

All methods return a highmlr_fit object with a common structure, so the downstream verbs (print(), summary(), plot(), coef(), predict()) and the companion functions (highmlr_compare(), highmlr_stability(), highmlr_explain(), highmlr_screen(), highmlr_report()) work identically regardless of which method produced the fit.

A first fit

The package ships with two bundled high-dimensional survival datasets, hnscc and srdata. Both use OS for the survival time; the event indicator is Death in hnscc and event in srdata (1 = event, 0 = censored).

library(highMLR)
data(hnscc)

fit <- highmlr(
  hnscc,
  time   = "OS",
  status = "Death",
  method = "coxnet",
  resampling = "cv",
  folds = 5
)

print(fit)
plot(fit, top_n = 20)
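
To make the resampling = "cv", folds = 5 arguments more concrete, the sketch below shows one common way to assign rows to five cross-validation folds while keeping events and censored observations balanced across folds. It is an illustration in base R only: the make_folds() helper and the simulated toy data are hypothetical, and nothing here is taken from highMLR's internals, which may assign folds differently.

```r
# Toy survival data: the column names mirror hnscc (OS, Death), but the
# values are simulated here purely for illustration.
set.seed(1)
n   <- 100
toy <- data.frame(
  OS    = rexp(n, rate = 0.1),             # follow-up time
  Death = rbinom(n, size = 1, prob = 0.6)  # 1 = event, 0 = censored
)

# Hypothetical helper (not part of highMLR): assign each row to one of
# `k` folds, stratifying on the event indicator.
make_folds <- function(status, k = 5) {
  fold <- integer(length(status))
  for (s in unique(status)) {
    idx <- which(status == s)
    # Shuffle the rows in this stratum, then deal them out round-robin.
    idx <- idx[sample.int(length(idx))]
    fold[idx] <- rep_len(seq_len(k), length(idx))
  }
  fold
}

folds <- make_folds(toy$Death, k = 5)

# Cross-tabulate fold membership against event status.
table(fold = folds, Death = toy$Death)
```

Stratifying on the event indicator keeps the number of events per fold roughly equal, which matters when events are scarce relative to the number of features.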

The examples in this vignette are not evaluated at build time because the underlying packages (glmnet, ranger, aorsf, xgboost, grf, survex) can be slow on high-dimensional data. Copy the chunks into an interactive session to run them.

Comparing methods

highmlr_compare() runs several methods on the same data and returns a tidy side-by-side summary:

cmp <- highmlr_compare(
  hnscc, "OS", "Death",
  methods = c("coxnet", "rsf", "univariate")
)
cmp$summary
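
As a point of reference for the simplest of the three methods above, the following sketch ranks features by fitting one Cox model per feature with the survival package. This is only one plausible reading of what a "univariate" method does; the source does not describe highMLR's implementation, and the simulated data and variable names are assumptions made for the example.

```r
library(survival)  # recommended package distributed with R

# Simulated data: 5 candidate features, only x1 affects the hazard.
set.seed(42)
n <- 200
X <- as.data.frame(matrix(rnorm(n * 5), nrow = n))
names(X) <- paste0("x", 1:5)

event_time <- rexp(n, rate = 0.1 * exp(0.8 * X$x1))
cens_time  <- rexp(n, rate = 0.05)
time   <- pmin(event_time, cens_time)
status <- as.integer(event_time <= cens_time)  # 1 = event, 0 = censored

# Fit one Cox model per feature and collect its coefficient and p-value.
uni <- do.call(rbind, lapply(names(X), function(v) {
  fit <- coxph(Surv(time, status) ~ X[[v]])
  s   <- summary(fit)$coefficients
  data.frame(feature = v,
             coef    = s[1, "coef"],
             p_value = s[1, "Pr(>|z|)"])
}))

# Rank features by strength of univariate association.
uni[order(uni$p_value), ]
```

Per-feature screening like this ignores correlation between features, which is one reason to compare it against penalised and tree-based methods on the same data.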

Pre-screening when p is very large

For very wide data, reduce the candidate set first:

data(srdata)
keep <- highmlr_screen(srdata, "OS", "event",
                       filter = "variance", keep = 500)
fit_screened <- highmlr(srdata, "OS", "event",
                        features = keep, method = "coxnet")
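
For intuition, a variance filter can be sketched in a few lines of base R: compute the sample variance of every candidate column and keep the names of the top few. The variance_filter() helper and the simulated matrix below are hypothetical; the sketch assumes that a variance filter means "keep the highest-variance features" and that the result is a vector of feature names, neither of which is confirmed by the source for highmlr_screen().

```r
# Simulated wide data: 50 rows, 2000 candidate features with differing spread.
set.seed(7)
n <- 50
p <- 2000
X <- matrix(rnorm(n * p, sd = rep(runif(p, 0.1, 2), each = n)), nrow = n)
colnames(X) <- paste0("g", seq_len(p))

# Hypothetical helper (not highMLR code): names of the `keep` columns
# with the largest sample variance.
variance_filter <- function(x, keep = 500) {
  v <- apply(x, 2, var)
  names(sort(v, decreasing = TRUE))[seq_len(min(keep, ncol(x)))]
}

kept <- variance_filter(X, keep = 500)
length(kept)  # 500
```

A variance filter never looks at the outcome, so applying it before cross-validation does not leak survival information into the held-out folds.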

Explaining a fit

Time-dependent SHAP values (SurvSHAP(t)) are available via highmlr_explain(), and a one-file biomarker report can be generated with highmlr_report().

ex <- highmlr_explain(fit, new_data = hnscc, method = "survshap")
print(ex)
plot(ex)

Session information

sessionInfo()
#> R version 4.5.1 (2025-06-13 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26200)
#> 
#> Matrix products: default
#>   LAPACK version 3.12.1
#> 
#> locale:
#> [1] LC_COLLATE=C                           
#> [2] LC_CTYPE=English_United Kingdom.utf8   
#> [3] LC_MONETARY=English_United Kingdom.utf8
#> [4] LC_NUMERIC=C                           
#> [5] LC_TIME=English_United Kingdom.utf8    
#> 
#> time zone: Europe/London
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37     R6_2.6.1          fastmap_1.2.0     xfun_0.56        
#>  [5] cachem_1.1.0      knitr_1.51        htmltools_0.5.8.1 rmarkdown_2.31   
#>  [9] lifecycle_1.0.5   cli_3.6.6         sass_0.4.10       jquerylib_0.1.4  
#> [13] compiler_4.5.1    rstudioapi_0.18.0 tools_4.5.1       evaluate_1.0.5   
#> [17] bslib_0.10.0      yaml_2.3.10       otel_0.2.0        jsonlite_2.0.0   
#> [21] rlang_1.2.0