Type: Package
Title: Tidy Access to Women's Tennis Association (WTA) Data
Version: 0.1.0
Description: Scrapes and tidies publicly available data from the Women's Tennis Association website (https://www.wtatennis.com). Provides helpers to retrieve player biographies, singles and doubles career overviews, match histories, live rankings and aggregate statistics. Dynamic pages are rendered through a headless 'Chrome' session so 'JavaScript'-generated content is fully captured, and all outputs are returned as tidy data frames suitable for downstream analysis or visualisation.
License: Apache License (≥ 2)
URL: https://github.com/Angnar-97/matchpointR
BugReports: https://github.com/Angnar-97/matchpointR/issues
Encoding: UTF-8
Depends: R (≥ 4.1.0)
Imports: chromote, cli, jsonlite, magick, purrr, rvest, stringr, tibble, xml2
Suggests: httr2, knitr, rmarkdown, rsvg, testthat (≥ 3.0.0), withr
Config/testthat/edition: 3
RoxygenNote: 7.3.3
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2026-04-26 14:34:10 UTC; User
Author: Alejandro Navas González [aut, cre] (alias: Angnar)
Maintainer: Alejandro Navas González <angnar@telaris.es>
Repository: CRAN
Date/Publication: 2026-04-28 19:40:12 UTC

matchpointR: Tidy Access to Women's Tennis Association (WTA) Data

Description

matchpointR is a small scraper toolkit that turns the public pages of https://www.wtatennis.com into tidy data frames. It ships helpers for player biographies, career highlights, full match histories and live rankings.

Details

Dynamic content is rendered through a headless Chrome session using the chromote package, so JavaScript-generated sections (matches, rankings) are fully captured before parsing. Where possible the package reads structured JSON-LD (schema.org) data instead of scraping CSS classes, for resilience against site redesigns.
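
The JSON-LD approach described above can be illustrated with a minimal sketch. It uses xml2 and jsonlite (both package imports) on a made-up HTML fragment; the fragment and the example values are assumptions for illustration, not the real WTA markup or the package's internal code.

```r
library(xml2)
library(jsonlite)

# Hypothetical page fragment with made-up values; real WTA pages may differ.
html <- '<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Person",
 "name": "Example Player", "birthDate": "1990-01-01"}
</script>
</head><body></body></html>'

doc <- read_html(html)

# Select every JSON-LD script node and parse its text as JSON.
nodes <- xml_find_all(doc, "//script[@type='application/ld+json']")
blocks <- lapply(xml_text(nodes), fromJSON, simplifyVector = FALSE)

# Keep the first block whose @type is "Person".
is_person <- vapply(
  blocks,
  function(b) identical(b[["@type"]], "Person"),
  logical(1)
)
person <- blocks[is_person][[1]]

person[["name"]]
person[["birthDate"]]
```

Because the fields are addressed by schema.org property name rather than by CSS class, a purely visual redesign of the page would not necessarily break this kind of lookup.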

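Main functions

wta_player_url(): build a WTA player URL.

wta_get_player_basics(): basic bio for a player.

wta_get_player_overview(): career highlights for a player.

wta_get_player_matches(): match history for a player.

wta_get_rankings(): current singles or doubles rankings.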

Author

Alejandro Navas González (Angnar).

Author(s)

Maintainer: Alejandro Navas González <angnar@telaris.es> (Angnar)

See Also

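Useful links:

https://github.com/Angnar-97/matchpointR

Report bugs at https://github.com/Angnar-97/matchpointR/issues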


Fetch fully-rendered HTML with chromote

Description

Opens a headless Chrome session via chromote, waits for the page to settle, optionally clicks a "load more" button and/or scrolls, and returns the complete page source.

Usage

.chromote_get_html(
  url,
  wait = 8,
  click_more_selector = NULL,
  scroll = TRUE,
  max_clicks = 50L,
  session = NULL
)

Arguments

url

Character. Destination URL.

wait

Numeric. Seconds to wait after initial navigation. Default 8.

click_more_selector

Optional CSS selector for a "load more" button that should be clicked repeatedly until it disappears.

scroll

Logical. Scroll to the bottom after each click? Default TRUE.

max_clicks

Integer. Safety cap for the click loop. Default 50.

session

Optional pre-existing chromote::ChromoteSession. When supplied it is reused (callers are responsible for closing it).

Value

A character string containing the full page source.
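
As a rough illustration of the session handling described above, the following sketch fetches a page source with chromote. The name fetch_rendered_html() is hypothetical and this is not the package's internal implementation; the "load more" click loop and scrolling are left out, and running it requires a local Chrome installation.

```r
# Sketch only: fetch_rendered_html() is a hypothetical helper name.
fetch_rendered_html <- function(url, wait = 8, session = NULL) {
  own_session <- is.null(session)
  if (own_session) {
    session <- chromote::ChromoteSession$new()
    # Close only a session created here; a caller-supplied one stays open.
    on.exit(session$close(), add = TRUE)
  }

  session$Page$navigate(url)
  Sys.sleep(wait)  # fixed pause so client-side rendering can finish

  # Ask the browser for the current DOM serialised as HTML.
  res <- session$Runtime$evaluate("document.documentElement.outerHTML")
  res$result$value
}
```

Passing an existing session avoids starting a new browser tab for every request, which is why the caller, not the helper, is responsible for closing it in that case.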


Read dynamic HTML into an xml2 document

Description

Thin wrapper around .chromote_get_html() that parses the rendered HTML with xml2::read_html().

Usage

.read_html_dynamic(
  url,
  wait = 8,
  click_more_selector = NULL,
  scroll = TRUE,
  max_clicks = 50L,
  session = NULL
)

Arguments

url

Character. Destination URL.

wait

Numeric. Seconds to wait after initial navigation. Default 8.

click_more_selector

Optional CSS selector for a "load more" button that should be clicked repeatedly until it disappears.

scroll

Logical. Scroll to the bottom after each click? Default TRUE.

max_clicks

Integer. Safety cap for the click loop. Default 50.

session

Optional pre-existing chromote::ChromoteSession. When supplied it is reused (callers are responsible for closing it).

Value

An xml2::xml_document.


Get basic bio for a WTA player

Description

Parses the profile header of a WTA player page and returns a one-row tibble with name, nationality, birth date, birth place, height and handedness. The bulk of the data is read from the page's JSON-LD (schema.org Person) block, which is more stable than the visual markup; height is read from the profile bio block as a fallback.

Usage

wta_get_player_basics(player_url, download_images = TRUE)

Arguments

player_url

Character. Full URL to a player page. Build it with wta_player_url() if you only have the numeric id.

download_images

Logical. When TRUE (default) the headshot is downloaded into a magick-image object. Set to FALSE to skip the network round-trip and return only the image URL.

Value

A one-row tibble::tibble() with columns:

player_id

Numeric WTA id parsed from @id.

name, given_name, family_name

Name fields.

birth_date

Date of birth (ISO 8601 character).

nationality, birth_place, birth_country

Geography fields.

height

Height string as shown on the bio (e.g. 5' 9" (1.74m)).

handedness

Dominant hand ("Right-Handed" / "Left-Handed").

nationality_code

3-letter IOC/ISO code extracted from the flag image (e.g. "CZE", "USA").

player_image_url, nationality_flag_url

Headshot and flag URLs.

player_image

magick-image of the headshot, when download_images = TRUE.

nationality_flag

magick-image of the flag SVG, when download_images = TRUE and the suggested package rsvg is installed (otherwise NA).

Examples


wta_get_player_basics(wta_player_url(320301, "katerina-siniakova"))
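
Since height and birth_date are returned as character, downstream code usually has to convert them. A minimal sketch, assuming the height string follows the documented "5' 9\" (1.74m)" pattern; the birth date below is a made-up placeholder, not real player data.

```r
# Values in the documented formats; the date is a made-up placeholder.
height <- "5' 9\" (1.74m)"
birth_date <- "1990-01-01"

# Pull the metric part out of the parentheses and convert it to metres.
height_m <- as.numeric(sub(".*\\(([0-9.]+)m\\).*", "\\1", height))

# ISO 8601 date strings convert directly with as.Date().
birth <- as.Date(birth_date)

height_m
format(birth, "%Y")
```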


Get the match history for a WTA player

Description

Walks the dynamic "Matches" page of a player profile, clicking the "Show more" button until the full history is loaded, and returns one row per match with tournament, round, opponent, score and result.

Usage

wta_get_player_matches(player_url, max_clicks = 50L)

Arguments

player_url

Character. URL to the player page; the function normalises to the /matches path automatically.

max_clicks

Integer. Safety cap for the "Show more" click loop. Defaults to 50.

Value

A tibble::tibble() with one row per match and columns: tournament, tournament_date, round, opponent, opponent_seed, opponent_country, opponent_rank, score, result.

Examples


url <- wta_player_url(320301, "katerina-siniakova", "matches")
wta_get_player_matches(url)
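
The one-row-per-match layout lends itself to simple tallies. The sketch below uses made-up rows with a subset of the documented column names; the "W"/"L" coding of result is an assumption, so check the actual values returned before reusing it.

```r
# Made-up rows using documented column names.
# Assumption: `result` is coded "W" / "L".
matches <- data.frame(
  tournament = c("Event A", "Event A", "Event B"),
  round      = c("R32", "R16", "R32"),
  result     = c("W", "L", "W"),
  stringsAsFactors = FALSE
)

# Count wins and losses per tournament.
tally <- table(matches$tournament, matches$result)
tally

# Overall share of matches won.
win_share <- mean(matches$result == "W")
win_share
```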


Get a WTA player's career highlights

Description

Returns the structured "additional properties" block from the page's JSON-LD: current singles and doubles rank, career titles and career prize money. The result is supplemented with the career-high singles rank read from the bio side panel.

Usage

wta_get_player_overview(player_url)

Arguments

player_url

Character. URL to the player overview page.

Value

A long-format tibble::tibble() with columns metric and value. Rows include singles_rank, doubles_rank, singles_career_titles, doubles_career_titles, career_prize_money, career_high.

Examples


wta_get_player_overview(wta_player_url(320301, "katerina-siniakova"))
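
A long metric/value table is convenient for storage but awkward for single lookups. The following sketch, on made-up values, turns it into a named vector and a one-row wide data frame; treating value as character is an assumption about the returned type.

```r
# Made-up values in the documented long format.
# Assumption: `value` is stored as character.
overview <- data.frame(
  metric = c("singles_rank", "doubles_rank", "career_high"),
  value  = c("10", "2", "5"),
  stringsAsFactors = FALSE
)

# Named vector for direct lookup by metric name.
lookup <- setNames(overview$value, overview$metric)
doubles_rank <- as.integer(lookup[["doubles_rank"]])

# One-row wide layout, one column per metric.
wide <- as.data.frame(as.list(lookup), stringsAsFactors = FALSE)
wide
```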


Get the current WTA rankings

Description

Scrapes the rankings table at https://www.wtatennis.com/rankings/singles (or /doubles) and returns a tidy tibble. The initial page renders the first 50 rows; if the widget has not hydrated by the time the page is read, increase the browser dwell time with wait.

Usage

wta_get_rankings(type = c("singles", "doubles"), top = NULL, wait = 12)

Arguments

type

Character. One of "singles", "doubles". Defaults to "singles".

top

Integer. Limit the output to the top N ranked players. NULL (default) keeps every row rendered by the page.

wait

Numeric. Seconds to wait for the rankings widget to hydrate after navigation. Defaults to 12.

Value

A tibble::tibble() with one row per player and columns: rank, player_id, player, country, age, tournaments_played, points.

Examples


wta_get_rankings("singles", top = 50)
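
The type and top arguments follow common R idioms, sketched below. rankings_url() and keep_top() are hypothetical helper names, not package functions, and keep_top() assumes the rows are already ordered by rank.

```r
# Hypothetical helper: resolve `type` to the documented rankings URL.
rankings_url <- function(type = c("singles", "doubles")) {
  type <- match.arg(type)  # the first choice is the default; others must match
  paste0("https://www.wtatennis.com/rankings/", type)
}

# Hypothetical helper: keep the first `top` rows, or every row when NULL.
# Assumes `x` is already sorted by rank.
keep_top <- function(x, top = NULL) {
  if (is.null(top)) {
    return(x)
  }
  utils::head(x, top)
}

rankings_url()
rankings_url("doubles")
nrow(keep_top(data.frame(rank = 1:5), top = 3))
```

With match.arg(), an unsupported value such as "mixed" raises an error instead of silently building an invalid URL.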


Build a WTA player URL

Description

Convenience wrapper to assemble a canonical player URL from a numeric id and an optional slug.

Usage

wta_player_url(id, slug = NULL, section = c("overview", "matches"))

Arguments

id

Character or integer. The WTA numeric player id (e.g. 320301).

slug

Optional character. Player slug (e.g. "katerina-siniakova"). When omitted, the URL still resolves because WTA redirects to the canonical one.

section

Optional character. Page section to append as a path segment, one of "overview", "matches". Defaults to "overview", which maps to the bare player URL.

Value

A single character string with the full URL.

Examples

wta_player_url(320301, "katerina-siniakova")
wta_player_url(320301, "katerina-siniakova", "matches")
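
For readers curious how such a builder might work, here is a minimal sketch. build_player_url() is a hypothetical name, and the /players/<id>/<slug> path layout is an assumption about the WTA site, not taken from the package source.

```r
# Hypothetical re-implementation for illustration only.
# Assumption: player pages live under /players/<id>/<slug>[/<section>].
build_player_url <- function(id, slug = NULL,
                             section = c("overview", "matches")) {
  section <- match.arg(section)

  # format() avoids scientific notation for large numeric ids.
  id <- format(id, scientific = FALSE, trim = TRUE)

  # c() drops a NULL slug, so the segment is simply omitted.
  parts <- c("https://www.wtatennis.com/players", id, slug)

  # "overview" maps to the bare player URL; other sections are appended.
  if (section != "overview") {
    parts <- c(parts, section)
  }
  paste(parts, collapse = "/")
}

build_player_url(320301, "katerina-siniakova")
build_player_url(320301, "katerina-siniakova", "matches")
```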