Cumulative Link Models with CmdStanR
clmstan fits cumulative link models (CLMs) for ordinal
categorical data using CmdStanR. It supports 11 link functions including
standard links (logit, probit, cloglog) and flexible parametric links
(GEV, AEP, Symmetric Power).
Models are pre-compiled using the instantiate package
for fast execution without runtime compilation.
Full documentation is available at: https://t-momozaki.github.io/clmstan/
This package requires:
# Install cmdstanr from r-universe (recommended)
install.packages("cmdstanr",
repos = c("https://stan-dev.r-universe.dev",
getOption("repos")))library(cmdstanr)
install_cmdstan() # Only needed once# From CRAN (when available)
install.packages("clmstan")
# From GitHub (development version)
# install.packages("devtools")
devtools::install_github("t-momozaki/clmstan")Note: During package installation, Stan models are compiled automatically. This may take a few minutes on first install.
library(clmstan)
# Example data
set.seed(123)
n <- 100
x <- rnorm(n)
latent <- 1.0 * x + rlogis(n)
y <- cut(latent, breaks = c(-Inf, -1, 0, 1, Inf), labels = 1:4)
data <- data.frame(y = y, x = x)
# Fit a cumulative link model with logit link
fit <- clm_stan(y ~ x, data = data, link = "logit",
chains = 4, iter = 2000, warmup = 1000)
# View results
fit$fit$summary(variables = c("beta", "c_transformed", "beta0"))| Link | Distribution | Use Case |
|---|---|---|
| logit | Logistic | Default, proportional odds |
| probit | Normal | Symmetric, latent variable interpretation |
| cloglog | Gumbel (max) | Asymmetric, proportional hazards |
| loglog | Gumbel (min) | Asymmetric |
| cauchit | Cauchy | Heavy tails |
| Link | Parameter | Description |
|---|---|---|
| tlink | \(\nu > 0\) | \(t\)-distribution, adjustable tail weight |
| aranda_ordaz | \(\lambda > 0\) | Generalized asymmetric link |
| sp | \(r > 0\), base | Symmetric Power, adjustable skewness |
| log_gamma | \(\lambda \in \mathbb{R}\) | Continuous symmetric/asymmetric adjustment |
| gev | \(\xi \in \mathbb{R}\) | Generalized Extreme Value |
| aep | \(\theta_1, \theta_2 > 0\) | Asymmetric Exponential Power |
# Fixed parameter
fit_t <- clm_stan(y ~ x, data = data, link = "tlink",
link_param = list(df = 8))
# Estimate parameter with Bayesian inference
fit_gev <- clm_stan(y ~ x, data = data, link = "gev",
link_param = list(xi = "estimate"))| Structure | Description |
|---|---|
| flexible | Free thresholds (default) |
| equidistant | Equal spacing between thresholds |
| symmetric | Symmetric around center |
# Equidistant thresholds
fit_equi <- clm_stan(y ~ x, data = data, threshold = "equidistant")clmstan uses weakly informative default priors:
| Parameter | Default Prior |
|---|---|
| Regression coefficients (\(\beta\)) | normal(0, 2.5) |
| Thresholds (\(c\)) | normal(0, 10) |
| Equidistant spacing (\(d\)) | gamma(2, 0.5) |
For link parameters estimated via Bayesian inference:
| Link | Parameter | Default Prior |
|---|---|---|
| tlink | \(\nu\) | gamma(2, 0.1) |
| aranda_ordaz | \(\lambda\) | gamma(0.5, 0.5) |
| sp | \(r\) | gamma(0.5, 0.5) |
| log_gamma | \(\lambda\) | normal(0, 1) |
| gev | \(\xi\) | normal(0, 2) |
| aep | \(\theta_1, \theta_2\) | gamma(2, 1) |
Use the prior() function with distribution helpers:
# Tighter prior on regression coefficients
fit <- clm_stan(y ~ x, data = data,
prior = prior(normal(0, 1), class = "b"))
# Multiple priors
fit <- clm_stan(y ~ x, data = data,
prior = c(
prior(normal(0, 1), class = "b"),
prior(normal(0, 5), class = "Intercept")
))When estimating link parameters, you can specify custom priors:
# Custom prior for t-link df parameter
fit <- clm_stan(y ~ x, data = data, link = "tlink",
link_param = list(df = "estimate"),
prior = prior(gamma(3, 0.2), class = "df"))
# Custom prior for GEV xi parameter
fit <- clm_stan(y ~ x, data = data, link = "gev",
link_param = list(xi = "estimate"),
prior = prior(normal(0, 0.5), class = "xi"))| Function | Parameters | Example |
|---|---|---|
normal(mu, sigma) |
\(\mu\): mean, \(\sigma\): SD | normal(0, 2.5) |
gamma(alpha, beta) |
\(\alpha\): shape, \(\beta\): rate | gamma(2, 0.1) |
student_t(df, mu, sigma) |
\(\nu\): df, \(\mu\): location, \(\sigma\): scale | student_t(3, 0, 2.5) |
cauchy(mu, sigma) |
\(\mu\): location, \(\sigma\): scale | cauchy(0, 2.5) |
flat() |
none | flat() |
Note: flat() creates an improper
uniform prior. Use with caution as it may lead to improper posteriors if
the data does not provide sufficient information. For thresholds with
ordered constraints, Stan’s internal transformation provides implicit
regularization.
| Class | Description | Compatible Distributions |
|---|---|---|
b |
Regression coefficients (\(\beta\)) | normal, student_t, cauchy, flat |
Intercept |
Thresholds (\(c\), flexible) | normal, student_t, cauchy, flat |
c1 |
First threshold (\(c_1\), equidistant) | normal, student_t, cauchy, flat |
cpos |
Positive thresholds (symmetric) | normal, student_t, cauchy, flat |
d |
Equidistant spacing (\(d\)) | gamma |
df |
t-link degrees of freedom (\(\nu\)) | gamma |
lambda_ao |
Aranda-Ordaz \(\lambda\) | gamma |
r |
Symmetric Power \(r\) | gamma |
lambda_lg |
Log-gamma \(\lambda\) | normal, student_t, cauchy |
xi |
GEV \(\xi\) | normal, student_t, cauchy |
theta1, theta2 |
AEP shape (\(\theta_1, \theta_2\)) | gamma |
MIT