The gerda package provides functions to access and work
with GERDA datasets. The German Election Database (GERDA) provides a
comprehensive dataset of local, state, and federal election results in
Germany. All datasets include turnout and vote shares for all major
parties. Moreover, GERDA contains geographically harmonized datasets
that account for changes in municipal boundaries and mail-in voting
districts. GERDA covers federal, state (Bundesland), and
municipal (Kreis and Gemeinde) elections.
In addition to election results, the package provides county-level socioeconomic covariates from INKAR, municipality-level data from the German Census 2022, and a party crosswalk that maps GERDA party names to standardized ParlGov attributes.
GERDA was compiled by Vincent Heddesheimer, Florian Sichart, Andreas Wiedemann and Hanno Hilbig. For additional information, see also the GERDA website (www.german-elections.com) and the accompanying publication: doi.org/10.1038/s41597-025-04811-5
This vignette will introduce you to the main functions of the package and demonstrate how to use them.
To see a list of all available GERDA electoral result datasets, you
can use the gerda_data_list() function:
gerda_data_list()
#> |data_name |description |
#> |:--------------------|:------------------------------------------------------------------------------------|
#> |municipal_unharm |Local elections at the municipal level (1990-2020, unharmonized). |
#> |municipal_harm |Local elections at the municipal level (1990-2020, harmonized). |
#> |state_unharm |State elections at the municipal level (2006-2019, unharmonized). |
#> |state_harm |State elections at the municipal level (2006-2019, harmonized). |
#> |federal_muni_raw |Federal elections at the municipal level (1980-2025, raw data). |
#> |federal_muni_unharm |Federal elections at the municipal level (1980-2025, unharmonized). |
#> |federal_muni_harm_21 |Federal elections at the municipal level (1990-2025, harmonized to 2021 boundaries). |
#> |federal_muni_harm_25 |Federal elections at the municipal level (1990-2025, harmonized to 2025 boundaries). |
#> |federal_cty_unharm |Federal elections at the county level (1953-2021, unharmonized). |
#> |federal_cty_harm |Federal elections at the county level (1990-2021, harmonized). |
#> |ags_crosswalks |Crosswalks for municipalities (1990-2025). |
#> |cty_crosswalks |Crosswalks for counties (1990-2025). |
#> |ags_area_pop_emp |Crosswalk covariates (area, population, employment) for municipalities (1990-2025). |
#> |cty_area_pop_emp |Crosswalk covariates (area, population, employment) for counties (1990-2025). |
This function displays a formatted table with the names and
descriptions of all available datasets. Pass a value from the
data_name column of this output as the file_name argument of
load_gerda_web() to specify which dataset you want to load.
The main function for loading GERDA data is
load_gerda_web(). This function allows you to load a
specific dataset from a web source. Here’s an example of how to use
it:
# Load the municipal harmonized dataset
municipal_harm_data <- load_gerda_web("municipal_harm", verbose = TRUE, file_format = "rds")
The load_gerda_web() function takes the following
parameters:
file_name: A character string with the name of the dataset to load, e.g. "federal_cty_harm" (as shown in the gerda_data_list() output). The function supports fuzzy matching, so a close misspelling produces a suggestion for the intended name.
verbose: If set to TRUE, it prints messages about the loading process (default is FALSE).
file_format: Specifies the format of the file to load, either "rds" or "csv" (default is "rds"). Both formats return the same tibble, so this choice only affects download size and speed.
Here’s an example of a typical workflow using the gerda
package:
gerda_data_list()
#> |data_name |description |
#> |:--------------------|:------------------------------------------------------------------------------------|
#> |municipal_unharm |Local elections at the municipal level (1990-2020, unharmonized). |
#> |municipal_harm |Local elections at the municipal level (1990-2020, harmonized). |
#> |state_unharm |State elections at the municipal level (2006-2019, unharmonized). |
#> |state_harm |State elections at the municipal level (2006-2019, harmonized). |
#> |federal_muni_raw |Federal elections at the municipal level (1980-2025, raw data). |
#> |federal_muni_unharm |Federal elections at the municipal level (1980-2025, unharmonized). |
#> |federal_muni_harm_21 |Federal elections at the municipal level (1990-2025, harmonized to 2021 boundaries). |
#> |federal_muni_harm_25 |Federal elections at the municipal level (1990-2025, harmonized to 2025 boundaries). |
#> |federal_cty_unharm |Federal elections at the county level (1953-2021, unharmonized). |
#> |federal_cty_harm |Federal elections at the county level (1990-2021, harmonized). |
#> |ags_crosswalks |Crosswalks for municipalities (1990-2025). |
#> |cty_crosswalks |Crosswalks for counties (1990-2025). |
#> |ags_area_pop_emp |Crosswalk covariates (area, population, employment) for municipalities (1990-2025). |
#> |cty_area_pop_emp |Crosswalk covariates (area, population, employment) for counties (1990-2025). |
The gerda package includes county-level socioeconomic
and demographic covariates from INKAR (Indikatoren und Karten zur Raum-
und Stadtentwicklung). These covariates can be easily merged with GERDA
election data to enrich your analyses. INKAR data is available from 1995
to 2022, so covariates can be matched to federal elections from 1998
onwards (earlier elections fall outside the INKAR coverage window).
The easiest way to add covariates to your election data is using the
add_gerda_covariates() function:
library(dplyr)
# Load election data and add covariates
merged <- load_gerda_web("federal_cty_harm") %>%
  add_gerda_covariates()
# Your data now includes 30 county-level covariates!
Under the hood, add_gerda_covariates() merges on county
code and election year. Among other steps, it automatically checks that
the required key columns (county_code or ags, and election_year) are present.
The covariates dataset includes 30 variables across 10 categories
(for the full list of variable names, units, and descriptions, see
gerda_covariates_codebook()).
To see detailed information about each covariate, including units and missing data patterns, call gerda_covariates_codebook().
For more control, you can also access the raw covariates data directly.
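A manual merge of this kind can be sketched with toy data. The two data frames below are hypothetical stand-ins for a GERDA county-level election dataset and the raw covariates table. The column unemployment_rate and all values are invented for illustration; only the key columns county_code and election_year come from the description above. This is not the package's own implementation, just a base R left join on the same keys.

```r
# Toy election data (hypothetical values), keyed by county and election year
elections <- data.frame(
  county_code   = c("01001", "01001", "01002"),
  election_year = c(1994, 1998, 1998),
  turnout       = c(0.78, 0.80, 0.75),
  stringsAsFactors = FALSE
)

# Toy covariates (hypothetical column and values); no row for 1994
covariates <- data.frame(
  county_code       = c("01001", "01002"),
  election_year     = c(1998, 1998),
  unemployment_rate = c(9.1, 7.4),
  stringsAsFactors = FALSE
)

# Left join: keep every election row, attach covariates where both keys match
merged_manual <- merge(
  elections, covariates,
  by = c("county_code", "election_year"),
  all.x = TRUE
)

print(merged_manual)
```

In this sketch the 1994 row is kept with NA in the covariate column, which is what a left join does for election years without a matching covariate record.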
Coverage varies by variable: core indicators (demographics, economy,
labor market) are available for all 7 federal election years
(1998-2021). Newer INKAR indicators (e.g., childcare, some healthcare
variables) are available for 2-3 recent elections only. Consult the
codebook’s missing_pct column to check per-variable
availability before analysis.
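As a complement to the codebook, the share of missing values can also be computed directly from a merged dataset. The sketch below uses a toy data frame for a single county; the column names unemployment_rate and childcare_rate and all values are hypothetical, chosen only to mimic one indicator observed in every election year and one observed in recent years only.

```r
# The seven federal election years from 1998 to 2021
election_year <- c(1998, 2002, 2005, 2009, 2013, 2017, 2021)

# Toy merged data for one county (hypothetical columns and values)
toy <- data.frame(
  election_year     = election_year,
  unemployment_rate = c(10.2, 9.8, 11.5, 8.1, 6.9, 5.7, 5.2),  # observed in all years
  childcare_rate    = c(NA, NA, NA, NA, 24.0, 28.5, 30.1)      # observed in 3 recent years
)

# Percentage of missing values per covariate column
covariate_cols <- setdiff(names(toy), "election_year")
missing_pct <- sapply(toy[covariate_cols], function(x) 100 * mean(is.na(x)))

print(round(missing_pct, 1))
```

A check like this makes it easy to drop or flag covariates that would otherwise shrink the usable sample to a few recent elections.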
The gerda package includes municipality-level data from
the German Census 2022 (Zensus 2022). This cross-sectional snapshot
covers approximately 10,800 municipalities and can be merged with any
GERDA election dataset.
The main advantage of this covariate data is that it is observed at the municipal level (unlike the county-level INKAR data). This allows for more fine-grained analyses of local election outcomes. However, the census is a single time point (2022), so it does not vary across election years. This means that the resulting merged dataset will have time-invariant covariates, i.e. each municipality receives the same census values for all election years. Users should not conduct analyses that rely on over-time variation in these covariates.
The census data includes 14 indicators across four categories.
Since the census is a 2022 snapshot, the same values are attached to all election years (see also the note above).
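The consequence of merging a single snapshot onto a panel can be illustrated with toy data. In the sketch below, both data frames and all values are hypothetical, and joining on ags alone is an assumption about how such a merge works rather than a description of the package's internals. The column avg_household_size_census22 is the only name taken from this vignette.

```r
# Toy election panel (hypothetical values): two municipalities, two elections each
elections <- data.frame(
  ags           = c("01001000", "01001000", "01002000", "01002000"),
  election_year = c(2017, 2021, 2017, 2021),
  turnout       = c(0.71, 0.74, 0.68, 0.70),
  stringsAsFactors = FALSE
)

# Toy census snapshot: one row per municipality, no year column
census <- data.frame(
  ags = c("01001000", "01002000"),
  avg_household_size_census22 = c(2.1, NA),  # NA mimics a suppressed value
  stringsAsFactors = FALSE
)

# Join on the municipality key only; every election year gets the same census row
merged_census <- merge(elections, census, by = "ags", all.x = TRUE)

# Number of distinct census values per municipality (1 means no over-time variation)
distinct_per_muni <- tapply(
  merged_census$avg_household_size_census22,
  merged_census$ags,
  function(x) length(unique(x))
)
print(distinct_per_muni)
```

Because each municipality carries exactly one census value across all years, these covariates are absorbed by municipality fixed effects and can only explain cross-sectional differences.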
Most census variables have >95% municipality coverage.
avg_household_size_census22 has approximately 12.5% missing
values because Destatis suppresses data for small municipalities under
its disclosure rules.
The party_crosswalk() function provides a mapping
between GERDA party names and standardized party information from the
ParlGov database. This is particularly useful for linking GERDA data
with other political science datasets or for obtaining standardized
party characteristics.
The function takes two main parameters:
party_gerda: A character vector of GERDA party names
destination: The name of the column from the ParlGov view_party table to map to
You can map GERDA party names to various standardized party characteristics, including:
left_right: Left-right position scores
party_name_english: English party names
party_name_short: Short party names
country_name: Country names
# Map GERDA party names to left-right positions
parties <- c("cdu", "spd", "linke_pds", "fdp")
left_right_scores <- party_crosswalk(parties, "left_right")
print(left_right_scores)
# Map to English party names
english_names <- party_crosswalk(parties, "party_name_english")
print(english_names)
This function is especially useful when you want to:
The gerda package provides easy access to a wide range
of German election and related data. By using the
gerda_data_list() function to explore available datasets
and load_gerda_web() to load them, you can quickly
incorporate this data into your research or analysis projects.
For more information or to provide feedback, please contact hhilbig@ucdavis.edu or visit the GitHub repository at https://github.com/hhilbig/gerda.