Introduction to gerda

library(gerda)

Overview

The gerda package provides functions to access and work with GERDA datasets. The German Election Database (GERDA) provides a comprehensive dataset of local, state, and federal election results in Germany. All datasets include turnout and vote shares for all major parties. Moreover, GERDA contains geographically harmonized datasets that account for changes in municipal boundaries and mail-in voting districts. GERDA covers federal, state (Bundesland), and municipal (Kreis and Gemeinde) elections.

In addition to election results, the package provides county-level socioeconomic covariates from INKAR, municipality-level data from the German Census 2022, and a party crosswalk that maps GERDA party names to standardized ParlGov attributes.

GERDA was compiled by Vincent Heddesheimer, Florian Sichart, Andreas Wiedemann and Hanno Hilbig. For additional information, see also the GERDA website (www.german-elections.com) and the accompanying publication: doi.org/10.1038/s41597-025-04811-5

This vignette will introduce you to the main functions of the package and demonstrate how to use them.

Available Datasets

To see a list of all available GERDA electoral result datasets, you can use the gerda_data_list() function:

gerda_data_list()
#> |data_name            |description                                                                          |
#> |:--------------------|:------------------------------------------------------------------------------------|
#> |municipal_unharm     |Local elections at the municipal level (1990-2020, unharmonized).                    |
#> |municipal_harm       |Local elections at the municipal level (1990-2020, harmonized).                      |
#> |state_unharm         |State elections at the municipal level (2006-2019, unharmonized).                    |
#> |state_harm           |State elections at the municipal level (2006-2019, harmonized).                      |
#> |federal_muni_raw     |Federal elections at the municipal level (1980-2025, raw data).                      |
#> |federal_muni_unharm  |Federal elections at the municipal level (1980-2025, unharmonized).                  |
#> |federal_muni_harm_21 |Federal elections at the municipal level (1990-2025, harmonized to 2021 boundaries). |
#> |federal_muni_harm_25 |Federal elections at the municipal level (1990-2025, harmonized to 2025 boundaries). |
#> |federal_cty_unharm   |Federal elections at the county level (1953-2021, unharmonized).                     |
#> |federal_cty_harm     |Federal elections at the county level (1990-2021, harmonized).                       |
#> |ags_crosswalks       |Crosswalks for municipalities (1990-2025).                                           |
#> |cty_crosswalks       |Crosswalks for counties (1990-2025).                                                 |
#> |ags_area_pop_emp     |Crosswalk covariates (area, population, employment) for municipalities (1990-2025).  |
#> |cty_area_pop_emp     |Crosswalk covariates (area, population, employment) for counties (1990-2025).        |

This function displays a formatted table with the names and descriptions of all available datasets. You can use the data_name column from this output to specify which dataset you want to load using the load_gerda_web() function.
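
As a small illustration of choosing a dataset name programmatically, the sketch below rebuilds a few rows of the table above by hand and picks out the harmonized federal datasets by name. It does not call the package; whether gerda_data_list() returns this table as a data frame is an assumption not confirmed here.

```r
# Hand-copied subset of the data_name column shown above;
# a stand-in, not package output.
datasets <- data.frame(
  data_name = c("municipal_unharm", "municipal_harm",
                "federal_muni_unharm", "federal_muni_harm_21",
                "federal_muni_harm_25", "federal_cty_harm"),
  stringsAsFactors = FALSE
)

# Harmonized names contain "_harm"; unharmonized names contain "_unharm",
# so a fixed match on "_harm" excludes them.
is_harmonized <- grepl("_harm", datasets$data_name, fixed = TRUE)
is_federal <- startsWith(datasets$data_name, "federal_")

harm_federal <- datasets$data_name[is_harmonized & is_federal]
print(harm_federal)
```

Any of the resulting names can then be passed to load_gerda_web().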

Loading Data

The main function for loading GERDA data is load_gerda_web(). This function allows you to load a specific dataset from a web source. Here’s an example of how to use it:

# Load the municipal harmonized dataset
municipal_harm_data <- load_gerda_web("municipal_harm", verbose = TRUE, file_format = "rds")
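
If you reload the same dataset often, it may be worth keeping a local copy. The following is a minimal sketch of one way to do that with base R; cached_load() is a hypothetical helper and not part of gerda, and a stand-in loader is used so the example runs offline.

```r
# Hypothetical helper (not part of gerda): cache a loaded dataset on disk.
# `loader` is any function that takes a dataset name and returns a data frame,
# for example load_gerda_web.
cached_load <- function(name, loader, cache_dir = tempdir()) {
  path <- file.path(cache_dir, paste0(name, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))  # reuse the local copy
  }
  data <- loader(name)     # first call: fetch the data
  saveRDS(data, path)
  data
}

# Stand-in loader so the sketch runs offline; it counts how often it is called.
n_calls <- 0
fake_loader <- function(name) {
  n_calls <<- n_calls + 1
  data.frame(dataset = name, value = 1:3, stringsAsFactors = FALSE)
}

cache_dir <- tempfile("gerda_cache_")
dir.create(cache_dir)
first  <- cached_load("municipal_harm", fake_loader, cache_dir)
second <- cached_load("municipal_harm", fake_loader, cache_dir)
```

With the package installed, cached_load("municipal_harm", load_gerda_web) would use the real loader in place of the stand-in. Delete the cached file to force a fresh download.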

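The load_gerda_web() function takes the following parameters: the name of the dataset to load (one of the data_name values listed by gerda_data_list()), verbose (set to TRUE to print status messages while loading), and file_format (the format of the file to download, "rds" in the example above).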

Example Workflow

Here’s an example of a typical workflow using the gerda package:

  1. List available datasets:
gerda_data_list()
  2. Load a dataset (in this case, the federal elections at the county level, harmonized):
federal_cty_harm <- load_gerda_web("federal_cty_harm", verbose = TRUE)

County-Level Covariates

The gerda package includes county-level socioeconomic and demographic covariates from INKAR (Indikatoren und Karten zur Raum- und Stadtentwicklung). These covariates can be easily merged with GERDA election data to enrich your analyses. INKAR data is available from 1995 to 2022, so covariates can be matched to federal elections from 1998 onwards (earlier elections fall outside the INKAR coverage window).

Quick Start

The easiest way to add covariates to your election data is using the add_gerda_covariates() function:

library(dplyr)

# Load election data and add covariates
merged <- load_gerda_web("federal_cty_harm") %>%
  add_gerda_covariates()

# Your data now includes 30 county-level covariates!

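Under the hood, add_gerda_covariates() merges the covariates onto the election data by county code and election year, so you do not have to specify the join keys yourself. The Advanced Usage section below shows the equivalent manual join.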

Available Covariates

The covariates dataset includes 30 variables across 10 categories. For the full list of variable names, units, and descriptions, see gerda_covariates_codebook().

Viewing the Codebook

To see detailed information about each covariate, including units and missing data patterns:

# Get the codebook
codebook <- gerda_covariates_codebook()
print(codebook)

# Find variables with good coverage
library(dplyr)
codebook %>%
  filter(missing_pct < 10) %>%
  select(variable, label, category)

Advanced Usage

For more control, you can access the raw covariates data:

# Get raw covariate data
covs <- gerda_covariates()

# Inspect before merging
summary(covs$unemployment_rate)

# Custom merge
elections <- load_gerda_web("federal_cty_harm")
merged <- elections %>%
  left_join(covs, by = c("county_code" = "county_code", "election_year" = "year"))
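
To see what a left join of this kind does with elections that fall outside the INKAR window, here is a toy example in base R. The two data frames are made-up stand-ins (codes, values, and the turnout column are for illustration only); only the key names county_code, election_year, and year are taken from the custom merge above.

```r
# Toy stand-ins for the election and covariate tables.
elections <- data.frame(
  county_code   = c("01001", "01001", "01002"),
  election_year = c(1994, 1998, 1998),
  turnout       = c(0.78, 0.80, 0.82),
  stringsAsFactors = FALSE
)
covs <- data.frame(
  county_code       = c("01001", "01002"),
  year              = c(1998, 1998),
  unemployment_rate = c(11.2, 9.5),
  stringsAsFactors = FALSE
)

# Left join: all.x = TRUE keeps every election row, matched or not.
merged <- merge(
  elections, covs,
  by.x = c("county_code", "election_year"),
  by.y = c("county_code", "year"),
  all.x = TRUE
)
print(merged)

# The 1994 row has no covariate match, so its covariate value is NA.
unmatched <- merged[is.na(merged$unemployment_rate), ]
```

All three election rows survive the join, and the unmatched row can be inspected or dropped explicitly rather than disappearing silently.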

Data Coverage

Coverage varies by variable: core indicators (demographics, economy, labor market) are available for all 7 federal election years (1998-2021). Newer INKAR indicators (e.g., childcare, some healthcare variables) are available for 2-3 recent elections only. Consult the codebook’s missing_pct column to check per-variable availability before analysis.
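
As a complement to the codebook, you can compute missingness directly on your own merged sample. The sketch below uses a toy data frame; the covariate names and values are hypothetical and chosen only to mimic one fully covered indicator and one that is available for recent elections only.

```r
# Toy merged data; covariate names and values are hypothetical.
merged <- data.frame(
  election_year     = c(1998, 2002, 2017, 2021),
  unemployment_rate = c(10.1, 9.8, 5.2, 5.0),
  childcare_rate    = c(NA, NA, 28.3, 30.1)
)

# Percentage of missing values per covariate.
covariates <- setdiff(names(merged), "election_year")
missing_pct <- colMeans(is.na(merged[covariates])) * 100
print(missing_pct)

# Election years in which a given covariate is observed.
years_observed <- merged$election_year[!is.na(merged$childcare_rate)]
print(years_observed)
```

Checking this on the merged sample rather than the full covariate table reflects the elections and counties you actually analyze.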

Census 2022 Data

The gerda package includes municipality-level data from the German Census 2022 (Zensus 2022). This cross-sectional snapshot covers approximately 10,800 municipalities and can be merged with any GERDA election dataset.

The main advantage of this covariate data is that it is observed at the municipal level (unlike the county-level INKAR data). This allows for more fine-grained analyses of local election outcomes. However, the census is a single time point (2022), so it does not vary across election years. This means that the resulting merged dataset will have time-invariant covariates, i.e. each municipality receives the same census values for all election years. Users should not conduct analyses that rely on over-time variation in these covariates.
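
The time-invariance described above can be illustrated with a toy panel in base R. The identifiers, column names, and values below are invented for illustration and are not taken from the census data.

```r
# Toy panel: two municipalities observed in two election years.
elections <- data.frame(
  ags           = c("A", "A", "B", "B"),
  election_year = c(2017, 2021, 2017, 2021),
  turnout       = c(0.75, 0.77, 0.70, 0.74),
  stringsAsFactors = FALSE
)

# Cross-sectional snapshot: one row per municipality (hypothetical variable).
census <- data.frame(
  ags           = c("A", "B"),
  share_over_65 = c(0.21, 0.25),
  stringsAsFactors = FALSE
)

merged <- merge(elections, census, by = "ags", all.x = TRUE)

# Within each municipality the census variable takes one distinct value.
distinct_per_muni <- tapply(
  merged$share_over_65, merged$ags,
  function(x) length(unique(x))
)
print(distinct_per_muni)
```

Because each municipality carries a single value across years, such a covariate is perfectly collinear with municipality fixed effects and cannot be identified in a within-municipality model.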

Quick Start

library(gerda)

# Add census data to municipal-level elections
muni_merged <- load_gerda_web("federal_muni_harm_21") |>
  add_gerda_census()

# Also works with county-level data (aggregated from municipalities)
county_merged <- load_gerda_web("federal_cty_harm") |>
  add_gerda_census()

Available Indicators

The census data includes 14 indicators across four categories; see gerda_census_codebook() for the full list.

Since the census is a 2022 snapshot, the same values are attached to all election years (see also the note above).

Viewing the Codebook

# Get the census codebook
census_cb <- gerda_census_codebook()
print(census_cb)

Data Coverage

Most census variables have >95% municipality coverage. avg_household_size_census22 has approximately 12.5% missing values because Destatis suppresses data for small municipalities under its disclosure rules.

Party Crosswalk Function

The party_crosswalk() function provides a mapping between GERDA party names and standardized party information from the ParlGov database. This is particularly useful for linking GERDA data with other political science datasets or for obtaining standardized party characteristics.

Usage

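The function takes two main parameters: a character vector of GERDA party names and the name of the ParlGov attribute to map them to (for example "left_right").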

Available Mapping Options

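You can map GERDA party names to various standardized party characteristics, including left-right positions (left_right) and English party names (party_name_english), both used in the example below.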

Example

# Map GERDA party names to left-right positions
parties <- c("cdu", "spd", "linke_pds", "fdp")
left_right_scores <- party_crosswalk(parties, "left_right")
print(left_right_scores)

# Map to English party names
english_names <- party_crosswalk(parties, "party_name_english")
print(english_names)
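
Once you have a mapping, attaching it to a results table is a simple lookup. The sketch below shows the idea with a named vector in base R; the lookup is written by hand as a stand-in and is not produced by party_crosswalk(), and the results table and its vote shares are made up.

```r
# Hand-written stand-in for a crosswalk result (illustration only).
lookup <- c(
  cdu = "Christian Democratic Union",
  spd = "Social Democratic Party of Germany",
  fdp = "Free Democratic Party"
)

# Toy long-format results; party keys and shares are made up.
results <- data.frame(
  party      = c("cdu", "spd", "fdp", "other_local_list"),
  vote_share = c(0.31, 0.27, 0.09, 0.04),
  stringsAsFactors = FALSE
)

# Index the named vector by party key; keys without an entry become NA.
results$party_name_english <- unname(lookup[results$party])
print(results)
```

Rows with NA after the lookup flag parties that have no entry in the mapping and may need manual handling.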

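This function is especially useful when you want to link GERDA data with other political science datasets or attach standardized party characteristics to election results.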

Conclusion

The gerda package provides easy access to a wide range of German election and related data. By using the gerda_data_list() function to explore available datasets and load_gerda_web() to load them, you can quickly incorporate this data into your research or analysis projects.

For more information or to provide feedback, please contact the authors or visit the GitHub repository at https://github.com/hhilbig/gerda.