Getting Started with seroreconstruct

Overview

seroreconstruct is an R package for Bayesian inference of influenza virus infection status from serial antibody measurements. It relaxes the traditional 4-fold rise rule by jointly modeling antibody boosting after infection, waning over time, and measurement error — estimating individual-level infection probabilities from hemagglutination inhibition (HAI) titer data.

The statistical framework is described in Tsang et al. (2022) Nature Communications 13:1557.

Input data

The package requires two data frames:

inputdata — one row per individual with columns:
- age_group: integer (0 = children, 1 = adults, 2 = older adults)
- start_time, end_time: follow-up period (integer days)
- time1, time2, time3: serum collection dates (integer days)
- HAI_titer_1, HAI_titer_2, HAI_titer3: HAI titers at each collection
inputILI — daily influenza-like illness activity, with rows corresponding to consecutive days. The row indices must cover the range of start_time to end_time in inputdata.

library(seroreconstruct)

data("inputdata")
data("flu_activity")

head(inputdata)
#>   age_group start_time end_time time1 time2 time3 HAI_titer_1 HAI_titer_2
#> 1         0        837     1192   837  1032  1192           0           0
#> 2         2        837     1192   837  1032  1192           0           0
#> 3         2        837     1192   837  1032  1192           0           0
#> 4         0        837     1192   837  1032  1192           4           4
#> 5         0        845     1191   845  1034  1191           0           0
#> 6         2        845     1191   845  1034  1191           0           0
#>   HAI_titer3
#> 1          0
#> 2          0
#> 3          0
#> 4          4
#> 5          0
#> 6          0
dim(inputdata)
#> [1] 1753    9

Fitting the model

The main function sero_reconstruct() runs a Bayesian MCMC to estimate infection probabilities and antibody dynamics parameters. For real analyses, use at least 100,000 iterations with appropriate burn-in and thinning.

fit <- sero_reconstruct(inputdata, flu_activity,
                        n_iteration = 200000, burnin = 100000, thinning = 10)

For this vignette, we use a short run for illustration:

fit <- sero_reconstruct(inputdata, flu_activity,
                        n_iteration = 2000, burnin = 1000, thinning = 1)
#> 1000
#> MCMC complete in 23 seconds. Use summary() to view estimates.
fit
#> seroreconstruct fit
#>   Individuals: 1753 
#>   Age groups: 3 
#>   Posterior samples: 1000 
#>   Runtime: 23 seconds
#> 
#> Use summary() to extract model estimates.

Viewing results

The summary() method extracts key estimates with 95% credible intervals:

summary(fit)
#>                                                                                                        Variable
#>                                                                                                Random error (%)
#>                                                                                              Two-fold error (%)
#>                                                           Fold-increase after infection for children (Boosting)
#>                                                                Fold-decrease after 1 year for children (Waning)
#>                                                                Fold-increase after 1 year for adults (Boosting)
#>                                                                  Fold-decrease after 1 year for adults (Waning)
#>                                                                              Infection probability for children
#>                                                                                Infection probability for adults
#>                                                                          Infection probability for older adults
#>                                             Infection probability for children with pre-epidemic HAI titer < 10
#>                                               Infection probability for adults with pre-epidemic HAI titer < 10
#>                                         Infection probability for older adults with pre-epidemic HAI titer < 10
#>                                                                        Relative risk for children (Ref: Adults)
#>                                                                    Relative risk for older adults (Ref: Adults)
#>      Relative risk for children with pre-epidemic HAI titer < 10 (Ref: Adults with pre-epidemic HAI titer < 10)
#>  Relative risk for older adults with pre-epidemic HAI titer < 10 (Ref: Adults with pre-epidemic HAI titer < 10)
#>  Point estimate Lower bound Upper bound
#>            2.45        1.46        3.87
#>            3.35        4.09        2.70
#>            8.16        4.41       12.57
#>            1.06        1.02        1.11
#>            6.20        4.46        8.03
#>            1.12        1.08        1.19
#>            0.17        0.15        0.20
#>            0.21        0.17        0.25
#>            0.14        0.09        0.18
#>            0.39        0.29        0.49
#>            0.30        0.24        0.36
#>            0.20        0.14        0.27
#>            0.84        0.66        1.07
#>            1.32        1.01        1.69
#>            0.66        0.41        0.95
#>            0.67        0.43        0.95

The summary table includes:

Random error (%) and Two-fold error (%): measurement error parameters
Boosting: fold-increase in antibody titer after infection
Waning: fold-decrease in antibody titer per year post-infection
Infection probability: overall and stratified by baseline HAI titer

Visualization

MCMC diagnostics

Check convergence with trace plots and posterior density plots:

plot_diagnostics(fit, params = c("random_error", "twofold_error"))

Antibody trajectories

Visualize posterior trajectories for individual participants. Red lines show trajectories where infection occurred; blue lines show trajectories without infection.

plot_trajectory(fit, id = 1)

Boosting and waning

oldpar <- par(mfrow = c(1, 2))
plot_boosting(fit)
plot_waning(fit)

par(oldpar)

Infection probabilities

Forest plot of posterior infection probabilities with 95% credible intervals:

plot_infection_prob(fit)

Tables

Extract parameter estimates and individual-level results as data frames:

# Model parameter estimates
table_parameters(fit)
#>                Parameter         Mean       Median        Lower        Upper
#> 1           random_error  0.002449224  0.002360517  0.001458128  0.003871297
#> 2          twofold_error  2.008823851  2.002093441  1.811109977  2.226452899
#> 3      boosting_children  3.028080030  3.065484041  2.142161893  3.652285927
#> 4        waning_children  0.078596897  0.077203527  0.027590924  0.147103681
#> 5        boosting_adults  2.631771904  2.629185001  2.156949228  3.004526332
#> 6          waning_adults  0.162706514  0.158313846  0.113593483  0.255005105
#> 7      inf_prob_children  0.533519831  0.535213356  0.372523502  0.721366919
#> 8        inf_prob_adults  0.379860281  0.381669558  0.293712880  0.475050602
#> 9  inf_prob_older_adults  0.236305160  0.232561598  0.158564652  0.334360678
#> 10              hai_coef -0.429737528 -0.450430439 -0.601178971 -0.253807632

# Per-individual infection probabilities
head(table_infections(fit))
#>   Individual Infection_prob Infection_time_mean Baseline_titer_mean
#> 1          1              0                  NA                0.52
#> 2          2              0                  NA                0.46
#> 3          3              0                  NA                0.49
#> 4          4              0                  NA                4.50
#> 5          5              0                  NA                0.50
#> 6          6              0                  NA                0.50

Subgroup analysis

Compare infection rates across groups by fitting independent MCMCs:

fit_by_age <- sero_reconstruct(inputdata, flu_activity,
                               n_iteration = 200000, burnin = 100000,
                               thinning = 10, group_by = ~age_group)

summary(fit_by_age)

Joint model with shared parameters

When comparing groups, measurement error is a lab assay property — it is shared across all groups. You can additionally share boosting/waning if the groups are expected to have similar antibody dynamics:

# Share all parameters except infection probability
fit_joint <- sero_reconstruct(inputdata, flu_activity,
                              n_iteration = 200000, burnin = 100000,
                              thinning = 10,
                              group_by = ~age_group,
                              shared = c("error", "boosting_waning"))

summary(fit_joint)

Simulation

Generate synthetic data for power analysis or validation:

data("para1")
data("para2")

sim <- simulate_data(inputdata, flu_activity, para1, para2)
names(sim)
#> [1] "age_group"   "start_time"  "end_time"    "time1"       "time2"      
#> [6] "time3"       "HAI_titer_1" "HAI_titer_2" "HAI_titer3"

The simulated data can be passed directly to sero_reconstruct() for simulation-recovery studies.