Getting Started with seroreconstruct

Overview

seroreconstruct is an R package for Bayesian inference of influenza virus infection status from serial antibody measurements. It relaxes the traditional 4-fold rise rule by jointly modeling antibody boosting after infection, waning over time, and measurement error — estimating individual-level infection probabilities from hemagglutination inhibition (HAI) titer data.

The statistical framework is described in Tsang et al. (2022) Nature Communications 13:1557.

Input data

The package requires two data frames:

  1. inputdata — one row per individual with columns:
    • age_group: integer (0 = children, 1 = adults, 2 = older adults)
    • start_time, end_time: follow-up period (integer days)
    • time1, time2, time3: serum collection dates (integer days)
    • HAI_titer_1, HAI_titer_2, HAI_titer3: HAI titers at each collection
  2. inputILI — daily influenza-like illness activity, with rows corresponding to consecutive days. The row indices must cover the range of start_time to end_time in inputdata.
library(seroreconstruct)

data("inputdata")
data("flu_activity")

head(inputdata)
#>   age_group start_time end_time time1 time2 time3 HAI_titer_1 HAI_titer_2
#> 1         0        837     1192   837  1032  1192           0           0
#> 2         2        837     1192   837  1032  1192           0           0
#> 3         2        837     1192   837  1032  1192           0           0
#> 4         0        837     1192   837  1032  1192           4           4
#> 5         0        845     1191   845  1034  1191           0           0
#> 6         2        845     1191   845  1034  1191           0           0
#>   HAI_titer3
#> 1          0
#> 2          0
#> 3          0
#> 4          4
#> 5          0
#> 6          0
dim(inputdata)
#> [1] 1753    9

Fitting the model

The main function sero_reconstruct() runs a Bayesian MCMC to estimate infection probabilities and antibody dynamics parameters. For real analyses, use at least 100,000 iterations with appropriate burn-in and thinning.

fit <- sero_reconstruct(inputdata, flu_activity,
                        n_iteration = 200000, burnin = 100000, thinning = 10)

For this vignette, we use a short run for illustration:

fit <- sero_reconstruct(inputdata, flu_activity,
                        n_iteration = 2000, burnin = 1000, thinning = 1)
#> 1000
#> MCMC complete in 23 seconds. Use summary() to view estimates.
fit
#> seroreconstruct fit
#>   Individuals: 1753 
#>   Age groups: 3 
#>   Posterior samples: 1000 
#>   Runtime: 23 seconds
#> 
#> Use summary() to extract model estimates.

Viewing results

The summary() method extracts key estimates with 95% credible intervals:

summary(fit)
#>                                                                                                        Variable
#>                                                                                                Random error (%)
#>                                                                                              Two-fold error (%)
#>                                                           Fold-increase after infection for children (Boosting)
#>                                                                Fold-decrease after 1 year for children (Waning)
#>                                                                Fold-increase after 1 year for adults (Boosting)
#>                                                                  Fold-decrease after 1 year for adults (Waning)
#>                                                                              Infection probability for children
#>                                                                                Infection probability for adults
#>                                                                          Infection probability for older adults
#>                                             Infection probability for children with pre-epidemic HAI titer < 10
#>                                               Infection probability for adults with pre-epidemic HAI titer < 10
#>                                         Infection probability for older adults with pre-epidemic HAI titer < 10
#>                                                                        Relative risk for children (Ref: Adults)
#>                                                                    Relative risk for older adults (Ref: Adults)
#>      Relative risk for children with pre-epidemic HAI titer < 10 (Ref: Adults with pre-epidemic HAI titer < 10)
#>  Relative risk for older adults with pre-epidemic HAI titer < 10 (Ref: Adults with pre-epidemic HAI titer < 10)
#>  Point estimate Lower bound Upper bound
#>            2.45        1.46        3.87
#>            3.35        4.09        2.70
#>            8.16        4.41       12.57
#>            1.06        1.02        1.11
#>            6.20        4.46        8.03
#>            1.12        1.08        1.19
#>            0.17        0.15        0.20
#>            0.21        0.17        0.25
#>            0.14        0.09        0.18
#>            0.39        0.29        0.49
#>            0.30        0.24        0.36
#>            0.20        0.14        0.27
#>            0.84        0.66        1.07
#>            1.32        1.01        1.69
#>            0.66        0.41        0.95
#>            0.67        0.43        0.95

The summary table includes:

Visualization

MCMC diagnostics

Check convergence with trace plots and posterior density plots:

plot_diagnostics(fit, params = c("random_error", "twofold_error"))

Antibody trajectories

Visualize posterior trajectories for individual participants. Red lines show trajectories where infection occurred; blue lines show trajectories without infection.

plot_trajectory(fit, id = 1)

Boosting and waning

oldpar <- par(mfrow = c(1, 2))
plot_boosting(fit)
plot_waning(fit)

par(oldpar)

Infection probabilities

Forest plot of posterior infection probabilities with 95% credible intervals:

plot_infection_prob(fit)

Tables

Extract parameter estimates and individual-level results as data frames:

# Model parameter estimates
table_parameters(fit)
#>                Parameter         Mean       Median        Lower        Upper
#> 1           random_error  0.002449224  0.002360517  0.001458128  0.003871297
#> 2          twofold_error  2.008823851  2.002093441  1.811109977  2.226452899
#> 3      boosting_children  3.028080030  3.065484041  2.142161893  3.652285927
#> 4        waning_children  0.078596897  0.077203527  0.027590924  0.147103681
#> 5        boosting_adults  2.631771904  2.629185001  2.156949228  3.004526332
#> 6          waning_adults  0.162706514  0.158313846  0.113593483  0.255005105
#> 7      inf_prob_children  0.533519831  0.535213356  0.372523502  0.721366919
#> 8        inf_prob_adults  0.379860281  0.381669558  0.293712880  0.475050602
#> 9  inf_prob_older_adults  0.236305160  0.232561598  0.158564652  0.334360678
#> 10              hai_coef -0.429737528 -0.450430439 -0.601178971 -0.253807632

# Per-individual infection probabilities
head(table_infections(fit))
#>   Individual Infection_prob Infection_time_mean Baseline_titer_mean
#> 1          1              0                  NA                0.52
#> 2          2              0                  NA                0.46
#> 3          3              0                  NA                0.49
#> 4          4              0                  NA                4.50
#> 5          5              0                  NA                0.50
#> 6          6              0                  NA                0.50

Subgroup analysis

Compare infection rates across groups by fitting independent MCMCs:

fit_by_age <- sero_reconstruct(inputdata, flu_activity,
                               n_iteration = 200000, burnin = 100000,
                               thinning = 10, group_by = ~age_group)

summary(fit_by_age)

Joint model with shared parameters

When comparing groups, measurement error is a lab assay property — it is shared across all groups. You can additionally share boosting/waning if the groups are expected to have similar antibody dynamics:

# Share all parameters except infection probability
fit_joint <- sero_reconstruct(inputdata, flu_activity,
                              n_iteration = 200000, burnin = 100000,
                              thinning = 10,
                              group_by = ~age_group,
                              shared = c("error", "boosting_waning"))

summary(fit_joint)

Simulation

Generate synthetic data for power analysis or validation:

data("para1")
data("para2")

sim <- simulate_data(inputdata, flu_activity, para1, para2)
names(sim)
#> [1] "age_group"   "start_time"  "end_time"    "time1"       "time2"      
#> [6] "time3"       "HAI_titer_1" "HAI_titer_2" "HAI_titer3"

The simulated data can be passed directly to sero_reconstruct() for simulation-recovery studies.

Citation

To cite seroreconstruct in publications:

Tsang TK, Perera RAPM, Fang VJ, Wong JY, Shiu EY, So HC, Ip DKM, Malik Peiris JS, Leung GM, Cowling BJ, Cauchemez S. (2022). Reconstructing antibody dynamics to estimate the risk of influenza virus infection. Nature Communications 13:1557. doi: 10.1038/s41467-022-29310-8