---
title: "Get started"
format: html
execute:
  eval: true
vignette: >
  %\VignetteIndexEntry{Get started}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
---

```{r results='hide', message=FALSE, warning=FALSE}
library(chmsflow)
```

## 1. Introduction

chmsflow harmonizes variables from the Canadian Health Measures Survey (CHMS) across cycles 1--6 and derives health indicators used in health research. It works with the [recodeflow](https://big-life-lab.github.io/recodeflow/) package to transform raw CHMS variables into analysis-ready versions using recoding rules defined in CSV metadata files.

### What chmsflow provides

The package includes two metadata CSV files (`variables.csv` and `variable-details.csv`) that define how raw CHMS variables are recoded, and 42 functions that derive new health indicators. The table below summarizes the available variables, organized by `section` and `subject` as defined in `variables.csv`:

| Section | Subject | Examples |
|---------|---------|----------|
| Sociodemographics | Age, sex, ethnicity | `clc_age`, `clc_sex`, `pgdcgt` |
| Socioeconomic | Income, education, occupation, marital status | `adj_hh_income`, `income_quintile`, `edudr04` |
| Health status | Blood pressure, hypertension | `sbp_adj_mmhg`, `htn_status`, `htn_control_status` |
| Health status | Chronic disease (diabetes, CKD, CVD) | `diab_status`, `ckd_status`, `cvd_status` |
| Health status | Medication (8 drug classes from ATC codes) | `ace_med`, `any_htn_med`, `diab_med` |
| Health status | Weight, height, cholesterol | `nonhdl_mmoll`, `waist_height_ratio`, `hwmdbmi` |
| Health status | Family history | `cvd_premature_famhist_status`, `fam_bp` |
| Health behaviour | Alcohol, diet | `alc_risk_score`, `fv_daily_times`, `healthy_diet_indicator` |
| Health behaviour | Exercise | `exercise_min_week`, `enough_exercise_indicator` |
| Health behaviour | Smoking | `pack_years`, `smoke` |

For the full variable list, see [Variable schema reference](variables_and_variable_details.html).

### Typical workflow

1. **Merge cycle components** -- At a Research Data Centre (RDC), combine household, clinic, and lab data into one object per cycle, named `cyclex` where `x` is the cycle number (e.g., `cycle4`). Keep medication data in a separate object named `cyclex_meds` (e.g., `cycle4_meds`).
2. **Recode medications first** -- If your analysis needs medication variables, recode them before any other variables. See [Recoding medications](recoding_medications.html).
3. **Recode other variables** -- Use `rec_with_table()` from recodeflow to transform source variables and derive new ones.
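
Step 1 can be sketched with base R's `merge()`. This is only an illustration: the toy data frames, the `clinicid` key, and every column name below are hypothetical stand-ins, and real CHMS files may use different identifiers. Objects are prefixed with `toy_` so they do not overwrite the data used elsewhere in this vignette.

```{r}
# Toy stand-ins for the household, clinic, and lab components of one cycle.
# The key column `clinicid` and all other names and values are hypothetical.
toy_household <- data.frame(clinicid = c(101, 102, 103), household_var = c(1, 2, 2))
toy_clinic <- data.frame(clinicid = c(101, 102, 103), clinic_var = c(120, 135, 118))
toy_lab <- data.frame(clinicid = c(101, 102, 103), lab_var = c(4.2, 5.1, 3.9))

# Merge the three components on the shared respondent ID
toy_cycle <- Reduce(
  function(x, y) merge(x, y, by = "clinicid", all = TRUE),
  list(toy_household, toy_clinic, toy_lab)
)

# Medication data is kept in its own object, as described in step 1
toy_cycle_meds <- data.frame(clinicid = c(101, 103), med_var = c("a", "b"))

str(toy_cycle)
```

Using `all = TRUE` keeps respondents who are missing from one component, so it is worth checking the row count of the merged object against the components.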

## 2. Installation

```{r, eval=FALSE}
# Install the release version from CRAN
install.packages("chmsflow")

# Install the most recent version from GitHub
devtools::install_github("Big-Life-Lab/chmsflow")
```

## 3. Quick start

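Use `rec_with_table()` from recodeflow to transform CHMS variables. The cycle data object must be named `cyclex`, where `x` is the cycle number (e.g., `cycle4`), for recoding to work properly. The examples below use the `cycle4` data object and the `variable_details` table that become available once chmsflow is loaded.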

```{r, warning=FALSE}
library(recodeflow)

# Recode a source variable (age)
cycle4_ages <- rec_with_table(
  cycle4, "clc_age",
  variable_details = variable_details, log = TRUE
)
head(cycle4_ages)
```

## 4. Variable types

chmsflow handles three types of variables, each recoded differently.

### 4.1 Source variables (direct mapping)

Source variables are mapped directly from raw CHMS columns. Variable names may differ across cycles, but chmsflow harmonizes them to a single name.

```{r, warning=FALSE}
# Recode sex (same variable name across all cycles)
cycle4_sexes <- rec_with_table(
  cycle4, "clc_sex",
  variable_details = variable_details, log = TRUE
)
head(cycle4_sexes)
```
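
The idea of harmonizing names can be illustrated with a small base R sketch. It is not how chmsflow or recodeflow implements the mapping (which is driven by `variable-details.csv`). The helper `harmonize_names()`, the toy data frames, and all column names are hypothetical placeholders.

```{r}
# Toy data: the same measurement stored under different column names in two cycles.
# All names and values are hypothetical placeholders.
toy_cycle_a <- data.frame(id = 1:2, old_name = c(34, 51))
toy_cycle_b <- data.frame(id = 3:4, new_name = c(47, 29))

# Per-cycle lookup from raw column name to one harmonized name
toy_name_map <- list(
  toy_cycle_a = c(old_name = "harmonized_name"),
  toy_cycle_b = c(new_name = "harmonized_name")
)

# Hypothetical helper: rename any column that appears in the lookup
harmonize_names <- function(data, map) {
  hits <- names(data) %in% names(map)
  names(data)[hits] <- unname(map[names(data)[hits]])
  data
}

toy_a <- harmonize_names(toy_cycle_a, toy_name_map$toy_cycle_a)
toy_b <- harmonize_names(toy_cycle_b, toy_name_map$toy_cycle_b)

# With a shared name, the two cycles can be stacked
toy_combined <- rbind(toy_a, toy_b)
names(toy_combined)
```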

### 4.2 Transformed variables (continuous to categorical)

Some variables convert continuous measurements into categories using thresholds defined in `variable-details.csv`.

```{r, warning=FALSE}
# Recode age into 4 groups
cycle4_categorical_ages <- rec_with_table(
  cycle4, "agegroup4",
  variable_details = variable_details, log = TRUE
)
head(cycle4_categorical_ages)
```
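
Conceptually, a continuous-to-categorical recode resembles base R's `cut()`. The sketch below uses made-up ages and made-up cut points purely for illustration. The actual thresholds for `agegroup4` are the ones defined in `variable-details.csv`.

```{r}
# Hypothetical ages and cut points, chosen only to show the mechanics
toy_ages <- c(25, 45, 65, 85)
toy_breaks <- c(0, 40, 60, 80, Inf) # five break points define four groups

# right = FALSE makes each interval closed on the left: [0, 40), [40, 60), ...
toy_age_groups <- cut(toy_ages, breaks = toy_breaks, right = FALSE, labels = 1:4)

data.frame(age = toy_ages, group = toy_age_groups)
```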

### 4.3 Derived variables (computed by functions)

Derived variables are computed by R functions referenced as `Func::` entries in `variable-details.csv`. These require their input variables to be present in the data. See [Derived variables](derived_variables.html) for details.

```{r, warning=FALSE}
# Derive adjusted systolic blood pressure
# bpmdpbps (raw SBP) must be in the data for sbp_adj_mmhg to be computed
cycle4_adjusted_SBPs <- rec_with_table(
  cycle4, c("bpmdpbps", "sbp_adj_mmhg"),
  variable_details = variable_details, log = TRUE
)
head(cycle4_adjusted_SBPs)
```
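
Because a derived variable needs its inputs in the data, it can help to check for them before recoding. The sketch below is one possible check in base R. The helper `missing_inputs()` and the toy data frames are hypothetical and not part of chmsflow or recodeflow.

```{r}
# Hypothetical helper: list required input columns that are absent from the data
missing_inputs <- function(data, required) {
  setdiff(required, names(data))
}

# Toy data frames with made-up values
toy_with_sbp <- data.frame(bpmdpbps = c(118, 142))
toy_without_sbp <- data.frame(clc_age = c(40, 55))

missing_inputs(toy_with_sbp, "bpmdpbps") # character(0): input is present
missing_inputs(toy_without_sbp, "bpmdpbps") # "bpmdpbps": input must be added first
```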

## 5. Next steps

- **Full walkthrough** -- End-to-end hypertension prevalence analysis in [Analysis walkthrough](analysis_walkthrough.html).
- **Medication recoding** -- Required before deriving hypertension or diabetes status. See [Recoding medications](recoding_medications.html).
- **Understanding the metadata** -- Learn about the CSV schema in [Variable schema reference](variables_and_variable_details.html).
- **Derived variables** -- How `Func::` and `DerivedVar::` entries work in [Derived variables](derived_variables.html).
- **Adding variables** -- Extend chmsflow with your own variables in [How to add variables](how_to_add_variables.html).
- **Missing data** -- How CHMS missing codes are handled with `haven::tagged_na()` in [Missing data (tagged_na)](tagged_na_usage.html).
- **Methodology** -- Why harmonization is non-trivial and how chmsflow works in [Methodology](methodology.html).
- **RDC setup** -- Using chmsflow at a Research Data Centre in [Using chmsflow at an RDC](using_chmsflow_at_an_rdc.html).