| Type: | Package |
| Title: | Facilitate Analysis of Proteomic Data from Olink |
| Version: | 5.0.0 |
| Description: | A collection of functions to facilitate analysis of proteomic data from Olink, primarily NPX data that has been exported from Olink Software. The functions also work on QUANT data from Olink by log-transforming the QUANT data. The functions are focused on reading data, facilitating data wrangling and quality control analysis, performing statistical analysis, and generating figures to visualize the results of the statistical analysis. The goal of this package is to help users extract biological insights from proteomic data run on the Olink platform. |
| License: | AGPL (≥ 3) |
| Contact: | biostattools@olink.com |
| URL: | https://olink.com/ https://github.com/Olink-Proteomics/OlinkRPackage |
| Config/testthat/edition: | 3 |
| Config/testthat/parallel: | true |
| Config/testthat/start-first: | read_npx_l*, read_npx_w* |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 7.3.3 |
| Depends: | R (≥ 4.1.0) |
| Imports: | arrow (≥ 14.0.0), cli (≥ 3.6.2), data.table, dbplyr, dplyr (≥ 1.2.0), duckdb, forcats, ggplot2, grDevices, rlang, stringr, tibble, tidyr |
| Suggests: | broom, car, clusterProfiler, curl, emmeans, ggplotify, ggpubr, ggrepel, lme4, lmerTest, msigdbr (> 24.1.0), ordinal, readxl, pheatmap, scales, showtext, sysfonts, systemfonts, testthat (≥ 3.0.0), vdiffr, withr, writexl, zip, umap, rstatix, FSA, kableExtra, knitr |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-03-28 11:26:48 UTC; klev.diamanti |
| Author: | Kathleen Nevola |
| Maintainer: | Kathleen Nevola <biostattools@olink.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-28 12:00:02 UTC |
Common parameters for check functions.
Description
Common parameters for check functions.
Usage
.check_params(x, error)
Arguments
x |
Variable to check. |
error |
Scalar boolean to return an error instead of FALSE. |
Value
Boolean, TRUE or FALSE, indicating whether the variable is of the correct class.
If the output is FALSE and error = TRUE, an error is thrown instead.
Author(s)
Klev Diamanti
See Also
check_is_character
check_is_integer
check_is_numeric
check_is_boolean
check_is_scalar_character
check_is_scalar_integer
check_is_scalar_numeric
check_is_scalar_boolean
check_is_tibble
check_is_dataset
check_is_arrow_object
check_is_list
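The checkers listed above share a single contract. A minimal sketch, under the assumption that these internal helpers are reachable via the ':::' accessor:

```r
## Not run:
# Hedged sketch of the shared checker contract; the functions are
# internal, so ':::' access is an assumption.
OlinkAnalyze:::check_is_character(c("a", "b"))      # TRUE per the documented contract
OlinkAnalyze:::check_is_character(1L)               # FALSE: not a character vector
OlinkAnalyze:::check_is_character(1L, error = TRUE) # throws an error instead
## End(Not run)
```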
Common parameters for downstream analysis functions.
Description
Common parameters for downstream analysis functions.
Usage
.downstream_fun_args(df, check_log)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx. |
check_log |
A named list returned by check_npx. |
Author(s)
Klev Diamanti
Common parameters for getter functions in this file.
Description
Common parameters for getter functions in this file.
Usage
.get_olink_data_details(broad_platform, platform_name, data_type, quant_type)
Arguments
broad_platform |
Name of the broad_platform to filter for. If NULL (default), no filtering is applied. |
platform_name |
Name of the platform_name to filter for. If NULL (default), no filtering is applied. |
data_type |
Name of the data_type to filter the Olink data for. If NULL (default), no filtering is applied. |
quant_type |
Name of the quant_type to filter for. If NULL (default), no filtering is applied. |
Author(s)
Klev Diamanti
Common parameters for read_npx-related functions.
Description
Common parameters for read_npx-related functions.
Usage
.read_npx_args(
filename,
file,
out_df,
long_format,
olink_platform,
data_type,
.ignore_files,
quiet,
legacy
)
Arguments
filename |
Path to Olink software output file in wide or long format. Expecting extensions "xls" or "xlsx" for excel files, "csv" or "txt" for delim files, "parquet" for parquet files, and "zip" for compressed files. |
file |
Path to Olink software output file in wide or long format. Expecting extensions "xls" or "xlsx" for excel files, "csv" or "txt" for delim files, "parquet" for parquet files, and "zip" for compressed files. |
out_df |
The class of the output dataset. One of "tibble" or "arrow". Defaults to "tibble". |
long_format |
Boolean marking the format of the input file: TRUE for long format, FALSE for wide format. |
olink_platform |
Olink platform used to generate the input file. One of
"Target 48", "Flex", "Target 96", "Explore 3072", "Explore HT", "Focus", or "Reveal".
Defaults to NULL. |
data_type |
Quantification method of the input data. One of
"Ct", "NPX", or "Quantified". Defaults to NULL. |
.ignore_files |
Character vector of files included in the zip-compressed Olink software output files that should be ignored. Used only for zip-compressed input files (default = c("README.txt")). |
quiet |
Boolean to print a confirmation message when reading the input
file. Applies to excel or delimited input only. |
legacy |
Boolean to run the legacy version of the read_npx function.
Important: should be used only for wide format files from Target 96 or
Target 48 with NPX Software version earlier than 1.8! (default FALSE). |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Author(s)
Klev Diamanti
Utility function that adds quotation marks on elements printed by ansi_collapse from cli.
Description
Utility function that adds quotation marks on elements printed by ansi_collapse from cli.
Usage
ansi_collapse_quot(x, sep = "and")
Arguments
x |
Character vector. |
sep |
One of "or" and "and". |
Value
Scalar character with the elements of x collapsed by "and" or "or".
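A minimal usage sketch; the ':::' access and the exact rendering of the quoted output are assumptions, since the function is internal:

```r
## Not run:
# Quote each element and collapse, e.g. "a", "b", and "c"
OlinkAnalyze:::ansi_collapse_quot(c("a", "b", "c"))
# Collapse with "or" instead of "and"
OlinkAnalyze:::ansi_collapse_quot(c("a", "b"), sep = "or")
## End(Not run)
```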
Assign a subject to a plate for longitudinal randomization
Description
Assign a subject to a plate for longitudinal randomization.
Usage
assign_subject2plate(plate_map, manifest, subject_id)
Arguments
plate_map |
Character vector of locations available for samples, including plate, row, and column. |
manifest |
Sample manifest mapping sample IDs to subject IDs. |
subject_id |
ID of the subject. |
Value
plate_map with sample IDs added to plates, keeping samples from the same subject on the same plate.
Help function comparing the checksum reported by Olink software to the checksum of the Olink data file from the input zip-compressed file.
Description
Runs only if one of "MD5_checksum.txt" or "checksum_sha256.txt" is present in the input zip-compressed file. This function does not check whether the checksum_file is in an acceptable format.
Usage
check_checksum(checksum_file, npx_file)
Arguments
checksum_file |
Extracted checksum file from the zip-compressed file that contains the checksum file from Olink software. |
npx_file |
Extracted file from the zip-compressed file that contains the Olink data file from Olink software. |
Value
NULL, or an error if the files could not be opened or if the checksums
did not match.
Author(s)
Klev Diamanti
Check if col_key is valid.
Description
Check if col_key is valid.
Usage
check_col_key(col_key)
Arguments
col_key |
Column key for which to retrieve alternative column names. One of "sample_id", "sample_type", "assay_type", "olink_id", "uniprot", "assay", "panel", "block", "plate_id", "panel_version", "lod", "quant", "ext_npx", "count", "qc_warning", "assay_warn", "normalization", and "qc_version". |
Value
Nothing or an error message if col_key is not valid.
Check presence of columns in dataset.
Description
Check if the input dataset (tibble or ArrowObject) df contains columns specified in col_list. col_list supports both exact matches of column names and alternative column names. In the latter case, alternative column names are elements of a character vector, and exactly one of the elements is required to be present.
Usage
check_columns(df, col_list)
Arguments
df |
An Olink dataset (tibble or ArrowObject). |
col_list |
A list of character vectors. |
Details
col_list contains a collection of character vectors. If a character vector is scalar (length = 1), its element is expected to be present among the column names of df. When an element of col_list contains more than one element, the function checks whether the column names of df include at least one of the elements of that vector.
Value
Nothing or an error message if any column is missing.
Author(s)
Klev Diamanti, Albin Lundin, Lei Conze, Pascal Pucholt, Gil Henriques
Examples
## Not run:
tmp_data <- dplyr::tibble(
"A" = c(1L, 2L, 3L),
"B" = c(TRUE, TRUE, FALSE),
"C" = c("A", "B", "C"),
"D" = c(FALSE, FALSE, TRUE)
)
# OK
check_columns(df = tmp_data,
col_list = list("A", "B"))
# ERROR: E is missing
check_columns(df = tmp_data,
col_list = list("A", "E"))
# ERROR: E and F are missing
check_columns(df = tmp_data,
col_list = list("A", "E", "F"))
# OK
check_columns(df = tmp_data,
col_list = list("A", c("B", "C")))
# OK
check_columns(df = tmp_data,
col_list = list("A", c("B", "E")))
# ERROR: c(F, E) are missing
check_columns(df = tmp_data,
col_list = list("A", c("F", "E")))
# ERROR: c(F, E) and c(M, N) are missing
check_columns(df = tmp_data,
col_list = list("A", c("F", "E"), c("M", "N")))
## End(Not run)
Help function checking for DARID and PanelDataArchiveVersion combinations
Description
DarIDs D.07, 08, 10, and 14 need to be exported with Panel Data Archive Version 1.5 or later. This function identifies cases where DataAnalysisRefID values are paired with an earlier PanelDataArchiveVersion.
Usage
check_darid(df, col_names)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx. |
col_names |
A list of matched column names. This is the output of the check_npx_col_names function. |
Value
A warning message if any invalid combinations are found.
Author(s)
Kathleen Nevola, Kang Dong, Klev Diamanti
Help function checking if file exists.
Description
Check one file at a time if it exists.
Usage
check_file_exists(file, error = FALSE)
Arguments
file |
Path to Olink software output file in wide or long format. Expecting extensions "xls" or "xlsx" for excel files, "csv" or "txt" for delim files, "parquet" for parquet files, and "zip" for compressed files. |
error |
Scalar boolean to return an error instead of FALSE (default = FALSE). |
Value
TRUE if the file exists, and FALSE if not; an error if the file does
not exist and error = TRUE.
Author(s)
Klev Diamanti
Help function checking if file extension is acceptable.
Description
Use the variable accepted_npx_file_ext to check if the extension of the input file is acceptable. Expecting one of "xls" or "xlsx" for excel files, "csv" or "txt" for delim files, "parquet" for parquet files, and "zip" for compressed files.
Usage
check_file_extension(file)
Arguments
file |
Path to Olink software output file in wide or long format. Expecting extensions "xls" or "xlsx" for excel files, "csv" or "txt" for delim files, "parquet" for parquet files, and "zip" for compressed files. |
Value
The type of the file extension based on the global variable accepted_npx_file_ext.
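A minimal sketch of the intended call, assuming ':::' access to the internal helper; the exact return value depends on the global variable accepted_npx_file_ext:

```r
## Not run:
# Returns the extension type matched against accepted_npx_file_ext
OlinkAnalyze:::check_file_extension(file = "npx_data_ext.parquet")
OlinkAnalyze:::check_file_extension(file = "npx_data.csv")
## End(Not run)
```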
Help function to check Explore HT Fixed LOD file version
Description
When invalid DARID / PanelDataArchiveVersion combinations are detected
in an Explore HT NPX file (darid_invalid entries in check_log),
this helper checks that the Explore HT Fixed LOD file used for LOD
calculation meets a minimum version requirement (default: "6.0.0").
Usage
check_ht_fixed_lod_version(check_log, lod_file)
Arguments
check_log |
A named list returned by check_npx. |
lod_file |
A data frame (or tibble) representing the Fixed LOD file.
It must contain a column specifying the file version. |
Value
A logical scalar returned invisibly:
- TRUE: the Fixed LOD file is missing, invalid, or outdated (and a message or warning has been emitted).
- FALSE: the check is not relevant, or the Fixed LOD file meets the minimum version requirement.
Author(s)
Kathleen Nevola, Kang Dong
Help function checking if a variable is an R6 ArrowObject.
Description
Help function checking if a variable is an R6 ArrowObject.
Usage
check_is_arrow_object(x, error = FALSE)
Arguments
x |
Variable to check. |
error |
Scalar boolean to return an error instead of FALSE (default = FALSE). |
Value
Boolean, TRUE or FALSE, indicating whether the variable is of the correct class.
If the output is FALSE and error = TRUE, an error is thrown instead.
Author(s)
Klev Diamanti
See Also
check_is_character
check_is_integer
check_is_numeric
check_is_boolean
check_is_scalar_character
check_is_scalar_integer
check_is_scalar_numeric
check_is_scalar_boolean
check_is_tibble
check_is_dataset
check_is_arrow_object
check_is_list
Help function checking if a variable is a vector of booleans.
Description
Help function checking if a variable is a vector of booleans.
Usage
check_is_boolean(x, error = FALSE)
Arguments
x |
Variable to check. |
error |
Scalar boolean to return an error instead of FALSE (default = FALSE). |
Value
Boolean, TRUE or FALSE, indicating whether the variable is of the correct class.
If the output is FALSE and error = TRUE, an error is thrown instead.
Author(s)
Klev Diamanti
See Also
check_is_character
check_is_integer
check_is_numeric
check_is_boolean
check_is_scalar_character
check_is_scalar_integer
check_is_scalar_numeric
check_is_scalar_boolean
check_is_tibble
check_is_dataset
check_is_arrow_object
check_is_list
Help function checking if a variable is a vector of characters.
Description
Help function checking if a variable is a vector of characters.
Usage
check_is_character(x, error = FALSE)
Arguments
x |
Variable to check. |
error |
Scalar boolean to return an error instead of FALSE (default = FALSE). |
Value
Boolean, TRUE or FALSE, indicating whether the variable is of the correct class.
If the output is FALSE and error = TRUE, an error is thrown instead.
Author(s)
Klev Diamanti
See Also
check_is_character
check_is_integer
check_is_numeric
check_is_boolean
check_is_scalar_character
check_is_scalar_integer
check_is_scalar_numeric
check_is_scalar_boolean
check_is_tibble
check_is_dataset
check_is_arrow_object
check_is_list
Help function checking if a variable is a tibble or an ArrowObject dataset.
Description
Help function checking if a variable is a tibble or an ArrowObject dataset.
Usage
check_is_dataset(x, error = FALSE)
Arguments
x |
Variable to check. |
error |
Scalar boolean to return an error instead of FALSE (default = FALSE). |
Value
Boolean, TRUE or FALSE, indicating whether the variable is of the correct class.
If the output is FALSE and error = TRUE, an error is thrown instead.
Author(s)
Klev Diamanti
See Also
check_is_character
check_is_integer
check_is_numeric
check_is_boolean
check_is_scalar_character
check_is_scalar_integer
check_is_scalar_numeric
check_is_scalar_boolean
check_is_tibble
check_is_dataset
check_is_arrow_object
check_is_list
Help function checking if a variable is a vector of integers.
Description
Help function checking if a variable is a vector of integers.
Usage
check_is_integer(x, error = FALSE)
Arguments
x |
Variable to check. |
error |
Scalar boolean to return an error instead of FALSE (default = FALSE). |
Value
Boolean, TRUE or FALSE, indicating whether the variable is of the correct class.
If the output is FALSE and error = TRUE, an error is thrown instead.
Author(s)
Klev Diamanti
See Also
check_is_character
check_is_integer
check_is_numeric
check_is_boolean
check_is_scalar_character
check_is_scalar_integer
check_is_scalar_numeric
check_is_scalar_boolean
check_is_tibble
check_is_dataset
check_is_arrow_object
check_is_list
Help function checking if a variable is a list.
Description
Help function checking if a variable is a list.
Usage
check_is_list(x, error = FALSE)
Arguments
x |
Variable to check. |
error |
Scalar boolean to return an error instead of FALSE (default = FALSE). |
Value
Boolean, TRUE or FALSE, indicating whether the variable is of the correct class.
If the output is FALSE and error = TRUE, an error is thrown instead.
Author(s)
Klev Diamanti
See Also
check_is_character
check_is_integer
check_is_numeric
check_is_boolean
check_is_scalar_character
check_is_scalar_integer
check_is_scalar_numeric
check_is_scalar_boolean
check_is_tibble
check_is_dataset
check_is_arrow_object
check_is_list
Help function checking if a variable is a vector of numerics.
Description
Help function checking if a variable is a vector of numerics.
Usage
check_is_numeric(x, error = FALSE)
Arguments
x |
Variable to check. |
error |
Scalar boolean to return an error instead of FALSE (default = FALSE). |
Value
Boolean, TRUE or FALSE, indicating whether the variable is of the correct class.
If the output is FALSE and error = TRUE, an error is thrown instead.
Author(s)
Klev Diamanti
See Also
check_is_character
check_is_integer
check_is_numeric
check_is_boolean
check_is_scalar_character
check_is_scalar_integer
check_is_scalar_numeric
check_is_scalar_boolean
check_is_tibble
check_is_dataset
check_is_arrow_object
check_is_list
Help function checking if a variable is a scalar boolean.
Description
Help function checking if a variable is a scalar boolean.
Usage
check_is_scalar_boolean(x, error = FALSE)
Arguments
x |
Variable to check. |
error |
Scalar boolean to return an error instead of FALSE (default = FALSE). |
Value
Boolean, TRUE or FALSE, indicating whether the variable is of the correct class.
If the output is FALSE and error = TRUE, an error is thrown instead.
Author(s)
Klev Diamanti
See Also
check_is_character
check_is_integer
check_is_numeric
check_is_boolean
check_is_scalar_character
check_is_scalar_integer
check_is_scalar_numeric
check_is_scalar_boolean
check_is_tibble
check_is_dataset
check_is_arrow_object
check_is_list
Help function checking if a variable is a scalar character.
Description
Help function checking if a variable is a scalar character.
Usage
check_is_scalar_character(x, error = FALSE)
Arguments
x |
Variable to check. |
error |
Scalar boolean to return an error instead of FALSE (default = FALSE). |
Value
Boolean, TRUE or FALSE, indicating whether the variable is of the correct class.
If the output is FALSE and error = TRUE, an error is thrown instead.
Author(s)
Klev Diamanti
See Also
check_is_character
check_is_integer
check_is_numeric
check_is_boolean
check_is_scalar_character
check_is_scalar_integer
check_is_scalar_numeric
check_is_scalar_boolean
check_is_tibble
check_is_dataset
check_is_arrow_object
check_is_list
Help function checking if a variable is a scalar integer.
Description
Help function checking if a variable is a scalar integer.
Usage
check_is_scalar_integer(x, error = FALSE)
Arguments
x |
Variable to check. |
error |
Scalar boolean to return an error instead of FALSE (default = FALSE). |
Value
Boolean, TRUE or FALSE, indicating whether the variable is of the correct class.
If the output is FALSE and error = TRUE, an error is thrown instead.
Author(s)
Klev Diamanti
See Also
check_is_character
check_is_integer
check_is_numeric
check_is_boolean
check_is_scalar_character
check_is_scalar_integer
check_is_scalar_numeric
check_is_scalar_boolean
check_is_tibble
check_is_dataset
check_is_arrow_object
check_is_list
Help function checking if a variable is a scalar numeric.
Description
Help function checking if a variable is a scalar numeric.
Usage
check_is_scalar_numeric(x, error = FALSE)
Arguments
x |
Variable to check. |
error |
Scalar boolean to return an error instead of FALSE (default = FALSE). |
Value
Boolean, TRUE or FALSE, indicating whether the variable is of the correct class.
If the output is FALSE and error = TRUE, an error is thrown instead.
Author(s)
Klev Diamanti
See Also
check_is_character
check_is_integer
check_is_numeric
check_is_boolean
check_is_scalar_character
check_is_scalar_integer
check_is_scalar_numeric
check_is_scalar_boolean
check_is_tibble
check_is_dataset
check_is_arrow_object
check_is_list
Help function checking if a variable is a tibble dataset.
Description
Help function checking if a variable is a tibble dataset.
Usage
check_is_tibble(x, error = FALSE)
Arguments
x |
Variable to check. |
error |
Scalar boolean to return an error instead of FALSE (default = FALSE). |
Value
Boolean, TRUE or FALSE, indicating whether the variable is of the correct class.
If the output is FALSE and error = TRUE, an error is thrown instead.
Author(s)
Klev Diamanti
See Also
check_is_character
check_is_integer
check_is_numeric
check_is_boolean
check_is_scalar_character
check_is_scalar_integer
check_is_scalar_numeric
check_is_scalar_boolean
check_is_tibble
check_is_dataset
check_is_arrow_object
check_is_list
Help function to check if suggested libraries are installed when required.
Description
Help function to check if suggested libraries are installed when required.
Usage
check_library_installed(x, error = FALSE)
Arguments
x |
A character vector of R libraries. |
error |
Boolean to return an error (TRUE) or a boolean (FALSE, default). |
Value
Boolean indicating whether the library is installed, or an error if it is
not installed and error = TRUE.
Author(s)
Klev Diamanti
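A minimal sketch of the contract described above, assuming ':::' access to the internal helper; the package name passed here is a placeholder:

```r
## Not run:
# TRUE/FALSE depending on whether the suggested library is installed
OlinkAnalyze:::check_library_installed("ggplot2")
# With error = TRUE, a missing library raises an error instead
OlinkAnalyze:::check_library_installed("notARealPackage", error = TRUE)
## End(Not run)
```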
Check if check_log has identified required column in the dataset.
Description
Check if check_log has identified required column in the dataset.
Usage
check_log_colname(check_log, col_key)
Arguments
check_log |
A named list returned by check_npx. |
col_key |
Column key for which to retrieve alternative column names. One of "sample_id", "sample_type", "assay_type", "olink_id", "uniprot", "assay", "panel", "block", "plate_id", "panel_version", "lod", "quant", "ext_npx", "count", "qc_warning", "assay_warn", "normalization", and "qc_version". |
Value
Nothing or an error message if the required column is missing.
Check NPX data format
Description
This function performs various checks on NPX data, including checking column names, validating Olink identifiers, identifying assays with NA values for all samples, and detecting duplicate sample identifiers.
Usage
check_npx(df, preferred_names = NULL)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx. |
preferred_names |
A named character vector where names are internal column names and values are column names to be selected from the input data frame. Read the description for further information. |
Details
OlinkAnalyze uses pre-defined names of columns of data frames to perform downstream analyses. At the same time, different Olink platforms export data with different column names (e.g. different protein quantification metric). This function aims to instruct each function of OlinkAnalyze on the column it should be using for the downstream analysis. This should be seamless for data exported from Olink Software and imported to R using the read_npx function.
However, in certain cases the columns of interest might be named differently. This function allows assigning custom-named columns of a data frame to internally expected variables that will in turn instruct Olink Analyze functions to use them for downstream analysis. For example, if one wished to use the column PCNormalizedNPX for their analysis instead of the column NPX, then they can assign this new name to the internal variable quant to inform the package that in the downstream analysis PCNormalizedNPX should be used. See example 3.
Similarly, in case of multiple matches (e.g. the data frame contains both columns LOD and PlateLOD) the ties will need to be resolved by the user using the argument preferred_names from this function. See example 4.
The argument preferred_names is a named character vector with internal column names as names and column names of the current data set as values. Names of the input vector can be one or more of the following: "sample_id", "sample_type", "assay_type", "olink_id", "uniprot", "assay", "panel", "block", "plate_id", "panel_version", "lod", "quant", "ext_npx", "count", "qc_warning", "assay_warn", "normalization", and "qc_version"
Value
A list containing the following elements:
- col_names: List of column names from the input data frame marking the columns to be used in downstream analyses.
- oid_invalid: Character vector of invalid OlinkID.
- assay_na: Character vector of assays with all samples having NA values.
- sample_id_dups: Character vector of duplicate SampleID.
- sample_id_na: Character vector containing SampleID of samples with quantified values NA for all assays.
- col_class: Data frame with columns of incorrect type including column key col_key, column name col_name, detected column type col_class and expected column type expected_col_class.
- assay_qc: Character vector containing OlinkID of assays with at least one assay warning.
- non_unique_uniprot: Character vector of OlinkID mapped to more than one UniProt ID.
- darid_invalid: Character vector containing outdated combinations of DataAnalysisRefID and PanelDataArchiveVersion.
Author(s)
Masoumeh Sheikhi, Klev Diamanti
Examples
## Not run:
# Example 0: Use npx_data1 to check that check_npx works
check_npx_result <- OlinkAnalyze::npx_data1 |>
OlinkAnalyze::check_npx() |>
suppressWarnings()
# read NPX data
npx_file <- system.file("extdata",
"npx_data_ext.parquet",
package = "OlinkAnalyze")
npx_df <- OlinkAnalyze::read_npx(filename = npx_file)
# Example 1: run df as is
OlinkAnalyze::check_npx(df = npx_df)
# Example 2: SampleType missing from data frame
npx_df |>
dplyr::select(
-dplyr::all_of(
c("SampleType")
)
) |>
OlinkAnalyze::check_npx()
# Example 3: Use PCNormalizedNPX instead of NPX
OlinkAnalyze::check_npx(
df = npx_df,
preferred_names = c("quant" = "PCNormalizedNPX")
)
# Example 4: Use PCNormalizedNPX instead of NPX, and PlateLOD instead of LOD
npx_df |>
dplyr::mutate(
LOD = 1L,
PlateLOD = 2L
) |>
OlinkAnalyze::check_npx(
preferred_names = c("quant" = "PCNormalizedNPX",
"lod" = "PlateLOD")
)
## End(Not run)
Help function to identify Olink assays with all quantified values NA
Description
This function checks if there are assays with the quantified values for all samples NA.
Usage
check_npx_all_na_assays(df, col_names)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx. |
col_names |
A list of matched column names. This is the output of the check_npx_col_names function. |
Details
We have added the importFrom tags for "dbplyr" and "duckdb" because "devtools::check()" would otherwise raise a note that the two libraries are imported but never used. To avoid that, we used solutions taken from:
https://github.com/hadley/r-pkgs/issues/203
https://github.com/pbs-software/pbs-modelling/issues/95
Value
A character vector containing OlinkID of assays with quantified values NA for all samples, otherwise returns character(0).
Author(s)
Simon Forsberg, Masoumeh Sheikhi
Help function to identify Olink samples with all quantified values NA
Description
This function checks if there are samples with the quantified values for all assays NA.
Usage
check_npx_all_na_sample(df, col_names)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx. |
col_names |
A list of matched column names. This is the output of the check_npx_col_names function. |
Details
We have added the importFrom tags for "dbplyr" and "duckdb" because "devtools::check()" would otherwise raise a note that the two libraries are imported but never used. To avoid that, we used solutions taken from:
https://github.com/hadley/r-pkgs/issues/203
https://github.com/pbs-software/pbs-modelling/issues/95
Value
A character vector containing SampleID of samples with quantified values NA for all assays, otherwise returns character(0).
Author(s)
Simon Forsberg, Masoumeh Sheikhi, Klev Diamanti
Help function checking types of columns in data.
Description
This function checks if certain columns from df have the correct type to enable downstream analysis. Columns to be checked are marked as such in the columns col_class and col_class_check of column_name_dict.
Usage
check_npx_col_class(df, col_names)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx. |
col_names |
A list of matched column names. This is the output of the check_npx_col_names function. |
Value
A data frame with the columns col_name, col_key, col_class and expected_col_class marking columns with the incorrect type.
Author(s)
Klev Diamanti
Check, update and define column names used in downstream analyses
Description
OlinkAnalyze uses pre-defined names of columns of data frames to perform downstream analyses. At the same time, different Olink platforms export data with different column names (e.g. different protein quantification metric). This function aims to instruct each function of OlinkAnalyze on the column it should be using for the downstream analysis. This should be seamless for data exported from Olink Software and imported to R using the read_npx function.
However, in certain cases the columns of interest might be named differently. This function allows assigning custom-named columns of a data frame to internally expected variables that will in turn instruct Olink Analyze functions to use them for downstream analysis. For example, if one wished to use the column PCNormalizedNPX for their analysis instead of the column NPX, then they can assign this new name to the internal variable quant to inform the package that in the downstream analysis PCNormalizedNPX should be used. See example 3.
Similarly, in case of multiple matches (e.g. the data frame contains both columns LOD and PlateLOD) the ties will need to be resolved by the user using the argument preferred_names from this function. See example 4.
The argument preferred_names is a named character vector with internal column names as names and column names of the current data set as values. Names of the input vector can be one or more of the following: sample_id, sample_type, assay_type, olink_id, uniprot, assay, panel, block, plate_id, panel_version, lod, quant, ext_npx, count, qc_warning, assay_warn, normalization, and qc_version
Usage
check_npx_col_names(df, preferred_names = NULL)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx. |
preferred_names |
A named character vector where names are internal column names and values are column names to be selected from the input data frame. Read the description for further information. |
Value
List of column names from the input data frame marking the columns to be used in downstream analyses.
Author(s)
Klev Diamanti, Masoumeh Sheikhi
Examples
# read NPX data
npx_file <- system.file("extdata",
"npx_data_ext.parquet",
package = "OlinkAnalyze")
npx_df <- OlinkAnalyze::read_npx(filename = npx_file)
# Example 1: run df as is
OlinkAnalyze:::check_npx_col_names(df = npx_df)
# Example 2: SampleType missing from data frame
npx_df |>
dplyr::select(
-dplyr::all_of(
c("SampleType")
)
) |>
OlinkAnalyze:::check_npx_col_names()
# Example 3: Use PCNormalizedNPX instead of NPX
OlinkAnalyze:::check_npx_col_names(
df = npx_df,
preferred_names = c("quant" = "PCNormalizedNPX")
)
# Example 4: Use PCNormalizedNPX instead of NPX, and PlateLOD instead of LOD
npx_df |>
dplyr::mutate(
LOD = 1L,
PlateLOD = 2L
) |>
OlinkAnalyze:::check_npx_col_names(
preferred_names = c("quant" = "PCNormalizedNPX",
"lod" = "PlateLOD")
)
Help function checking for duplicate sample identifiers in data.
Description
This function checks if there are duplicate sample identifiers for any assay.
Usage
check_npx_duplicate_sample_ids(df, col_names)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx. |
col_names |
A list of matched column names. This is the output of the check_npx_col_names function. |
Value
A character vector of duplicate SampleID found in the data.
Author(s)
Masoumeh Sheikhi
Help function checking for assays mapping to multiple UniProt identifiers.
Description
Occasionally, updates in panel versions include updates in UniProt identifiers (e.g. change in formatting). This function identifies cases where an assay identifier OlinkID maps to multiple UniProt identifiers.
Usage
check_npx_nonunique_uniprot(df, col_names)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx. |
col_names |
A list of matched column names. This is the output of the check_npx_col_names function. |
Value
A character vector of assay identifiers OlinkID that map to more than one UniProt identifier.
Author(s)
Kathleen Nevola, Kang Dong, Klev Diamanti
Help function checking whether df contains invalid Olink identifiers
Description
This function checks if Olink identifiers (OlinkID) match the pattern of a prefix "OID" followed by 5 integer numbers.
Usage
check_npx_olinkid(df, col_names)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx. |
col_names |
A list of matched column names. This is the output of the check_npx_col_names function. |
Value
A character vector with invalid OlinkID.
Author(s)
Masoumeh Sheikhi
Help function checking data for assay QC warnings.
Description
Help function checking data for assay QC warnings.
Usage
check_npx_qcwarn_assays(df, col_names)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx. |
col_names |
A list of matched column names. This is the output of the check_npx_col_names function. |
Value
A character vector containing OlinkID of assays with at least one QC warning, otherwise character(0).
Author(s)
Klev Diamanti
Update column names to be used in downstream analyses
Description
OlinkAnalyze uses pre-defined names of columns of data frames to perform downstream analyses. However, in certain cases the columns of interest might be named differently. The aim of this function is to assign custom-named columns of a data frame to internally expected variables that will in turn enable analysis of Olink data. For example, if one wished to use the column PCNormalizedNPX for their analysis instead of the column NPX, then they can assign this new name to the internal variable quant to inform the package that in the downstream analysis PCNormalizedNPX should be used.
This function takes as input a named character vector with internal column names as names and column names of the current data set as values. Names of the input vector can be one or more of the following: sample_id, sample_type, assay_type, olink_id, uniprot, assay, panel, block, plate_id, panel_version, lod, quant, ext_npx, count, qc_warning, assay_warn, normalization, and qc_version.
Usage
check_npx_update_col_names(preferred_names)
Arguments
preferred_names |
A named character vector where names are internal column names and values are column names to be selected from the input data frame. Read the description for further information. |
Value
column_name_dict updated based on preferred_names.
Author(s)
Klev Diamanti, Masoumeh Sheikhi
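An illustrative sketch following the PCNormalizedNPX example in the description above; this is not an example from the original manual, and the function is internal, hence the ":::" accessor:
## Not run:
# Assign the custom column PCNormalizedNPX to the internal variable "quant",
# so that downstream analyses use PCNormalizedNPX instead of NPX.
col_names <- OlinkAnalyze:::check_npx_update_col_names(
  preferred_names = c(quant = "PCNormalizedNPX")
)
## End(Not run)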
Help function checking that broad_platform is expected.
Description
Help function checking that broad_platform is expected.
Usage
check_olink_broader_platform(
x,
platform_name = NULL,
data_type = NULL,
quant_type = NULL
)
Arguments
x |
Name of the broader Olink platform. One of "NGS" and "qPCR". |
platform_name |
Name of the platform_name to filter for. If NULL, no filtering is applied. Defaults to NULL. |
data_type |
Name of the data_type to filter for. If NULL, no filtering is applied. Defaults to NULL. |
quant_type |
Name of the quant_type to filter for. If NULL, no filtering is applied. Defaults to NULL. |
Value
NULL if broader Olink platform is expected, otherwise an error.
Author(s)
Klev Diamanti
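A hedged sketch of how the check behaves, inferred from the Value section above (a valid broader platform yields NULL, anything else errors); internal function, not part of the original manual's examples:
## Not run:
# Valid broader platform: returns NULL
OlinkAnalyze:::check_olink_broader_platform(x = "NGS")
# An unrecognized value is expected to throw an error
OlinkAnalyze:::check_olink_broader_platform(x = "unknown")
## End(Not run)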
Help function checking that data_type is expected.
Description
Help function checking that data_type is expected.
Usage
check_olink_data_type(
x,
broad_platform = NULL,
platform_name = NULL,
quant_type = NULL
)
Arguments
x |
The name of the Olink data type. One of "Ct", "NPX", and "Quantified". |
broad_platform |
Name of the broad_platform to filter for. If NULL, no filtering is applied. Defaults to NULL. |
platform_name |
Name of the platform_name to filter for. If NULL, no filtering is applied. Defaults to NULL. |
quant_type |
Name of the quant_type to filter for. If NULL, no filtering is applied. Defaults to NULL. |
Value
NULL if quantification method (data type) is expected, otherwise an error.
Author(s)
Klev Diamanti
Help function checking that olink_platform is expected.
Description
Help function checking that olink_platform is expected.
Usage
check_olink_platform(
x,
broad_platform = NULL,
data_type = NULL,
quant_type = NULL
)
Arguments
x |
Name of Olink platform. One of "Explore 3072", "Explore HT", "Flex", "Focus", "Reveal", "Target 48", and "Target 96". |
broad_platform |
Name of the broad_platform to filter for. If NULL, no filtering is applied. Defaults to NULL. |
data_type |
Name of the data_type to filter for. If NULL, no filtering is applied. Defaults to NULL. |
quant_type |
Name of the quant_type to filter for. If NULL, no filtering is applied. Defaults to NULL. |
Value
NULL if platform is expected, otherwise an error.
Author(s)
Klev Diamanti
Help function checking that quant_type is expected.
Description
Help function checking that quant_type is expected.
Usage
check_olink_quant_type(
x,
broad_platform = NULL,
platform_name = NULL,
data_type = NULL
)
Arguments
x |
The name of the Olink quantification type. One of "absolute" and "relative". |
broad_platform |
Name of the broad_platform to filter for. If NULL, no filtering is applied. Defaults to NULL. |
platform_name |
Name of the platform_name to filter for. If NULL, no filtering is applied. Defaults to NULL. |
data_type |
Name of the data_type to filter for. If NULL, no filtering is applied. Defaults to NULL. |
Value
NULL if quantification type is expected, otherwise an error.
Author(s)
Klev Diamanti
Utility function to check OSI values for validity
Description
Utility function to check OSI values for validity
Usage
check_osi(df, check_log = NULL, osi_score = NULL)
Arguments
df |
An Olink dataset. |
check_log |
Output log of check_npx(). Defaults to NULL. |
osi_score |
Name of OSI column to check. Defaults to NULL. |
Value
An Olink dataset with the OSI column checked and cleaned.
Help function checking that the requested output class of the read_npx* functions is acceptable.
Description
Help function checking that the requested output class of the read_npx* functions is acceptable.
Usage
check_out_df_arg(out_df)
Arguments
out_df |
The class of the output dataset. One of "tibble" or "arrow". Defaults to "tibble". |
Value
Error if out_df is not one of "tibble" and "arrow".
Author(s)
Klev Diamanti
Help function removing assays with all quantified values NA.
Description
This function filters out rows from a tibble or arrow object where the
assay identifier (one of
"OlinkID", "OID", "olinkid", "oid", or "olink_id")
matches those listed in check_log$assay_na, which contains assays
composed entirely of NA values in their quantification column.
Usage
clean_assay_na(df, check_log, remove_assay_na = TRUE, verbose = FALSE)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx(). |
check_log |
A named list returned by check_npx(). |
remove_assay_na |
Logical. If TRUE (default), removes assays whose quantification values are all NA. |
verbose |
Logical. If TRUE, prints messages describing the actions taken. Defaults to FALSE. |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Author(s)
Kang Dong, Klev Diamanti
Help function removing control assays based on assay type.
Description
This function filters out internal control assays (ext_ctrl, inc_ctrl,
amp_ctrl) from the dataset, unless user specified to retain them. The
function uses column mapping provided by check_log.
Usage
clean_assay_type(df, check_log, remove_control_assay = TRUE, verbose = FALSE)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx(). |
check_log |
A named list returned by check_npx(). |
remove_control_assay |
If TRUE (default), removes internal control assays. |
verbose |
Logical. If TRUE, prints messages describing the actions taken. Defaults to FALSE. |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Author(s)
Kang Dong, Klev Diamanti
Help function removing instances of assays flagged with warnings.
Description
The function is used to remove assay-level QC warnings from the dataset
before analysis. It uses the column marking assay QC warnings identified by
check_log to remove assays flagged as WARN in the dataset.
Usage
clean_assay_warning(
df,
check_log,
remove_assay_warning = TRUE,
verbose = FALSE
)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx(). |
check_log |
A named list returned by check_npx(). |
remove_assay_warning |
Logical. If TRUE (default), removes assay records flagged as WARN. |
verbose |
Logical. If TRUE, prints messages describing the actions taken. Defaults to FALSE. |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Author(s)
Kang Dong, Klev Diamanti
Help function converting types of columns to the expected ones.
Description
This function checks for mismatches between actual and expected column
classes in the input data frame and coerces those columns to the expected
class using information from check_log$col_class.
Usage
clean_col_class(df, check_log, convert_df_cols = TRUE, verbose = FALSE)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx(). |
check_log |
A named list returned by check_npx(). |
convert_df_cols |
Logical. If TRUE (default), coerces columns to their expected classes. |
verbose |
Logical. If TRUE, prints messages describing the actions taken. Defaults to FALSE. |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Author(s)
Kang Dong, Klev Diamanti
Help function removing a set of control samples from the dataset.
Description
This function removes rows from NPX data where the sample identifiers, as
defined in check_log, match samples provided in
control_sample_ids. The primary goal of the function is to filter
out technical replicates or control samples prior to downstream
analysis.
Usage
clean_control_sample_id(
df,
check_log,
control_sample_ids = NULL,
verbose = FALSE
)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx(). |
check_log |
A named list returned by check_npx(). |
control_sample_ids |
Character vector of sample identifiers of control samples. Defaults to NULL. |
verbose |
Logical. If TRUE, prints messages describing the actions taken. Defaults to FALSE. |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Author(s)
Kang Dong, Klev Diamanti
Examples
## Not run:
# use npx_data1 to check that clean_control_sample_id() works
log <- OlinkAnalyze::check_npx(
df = OlinkAnalyze::npx_data1
) |>
suppressWarnings() |>
suppressMessages()
out <- OlinkAnalyze:::clean_control_sample_id(
df = npx_data1,
check_log = log,
control_sample_ids = c("CONTROL_SAMPLE_AS 1", "CONTROL_SAMPLE_AS 2")
)
## End(Not run)
Help function removing samples with duplicate identifiers.
Description
This function filters out rows from a tibble or arrow object where the
sample identifier (one of
"SampleID", "sampleid", or "sample_id")
matches values listed in check_log$sample_id_dups, which identifies
samples with duplicated identifiers.
Usage
clean_duplicate_sample_id(
df,
check_log,
remove_dup_sample_id = TRUE,
verbose = FALSE
)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx(). |
check_log |
A named list returned by check_npx(). |
remove_dup_sample_id |
Logical. If TRUE (default), removes samples with duplicate identifiers. |
verbose |
Logical. If TRUE, prints messages describing the actions taken. Defaults to FALSE. |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Author(s)
Kang Dong, Klev Diamanti
Help function removing assays with invalid identifiers.
Description
This function filters out rows from a tibble or arrow object where the
assay identifier (one of
"OlinkID", "OID", "olinkid", "oid", or "olink_id")
matches values listed in check_log$oid_invalid, which identifies
invalid or malformed assay identifiers.
Usage
clean_invalid_oid(df, check_log, remove_invalid_oid = TRUE, verbose = FALSE)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx(). |
check_log |
A named list returned by check_npx(). |
remove_invalid_oid |
Logical. If TRUE (default), removes assays with invalid identifiers. |
verbose |
Logical. If TRUE, prints messages describing the actions taken. Defaults to FALSE. |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Author(s)
Kang Dong, Klev Diamanti
Help function unifying pairs of OlinkID and UniProt identifiers.
Description
This function checks the non-unique "OlinkID - UniProt" mappings, as defined
in check_log. It selects the first instance of UniProt ID per OlinkID and
replaces the original UniProt column with the unified mapping.
Usage
clean_nonunique_uniprot(
df,
check_log,
convert_nonunique_uniprot = TRUE,
verbose = TRUE
)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx(). |
check_log |
A named list returned by check_npx(). |
convert_nonunique_uniprot |
Logical. If TRUE (default), replaces non-unique UniProt mappings with the first UniProt ID per OlinkID. |
verbose |
Logical. If TRUE, prints messages describing the actions taken. Defaults to TRUE. |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Author(s)
Kang Dong, Klev Diamanti
Clean proteomics data quantified with Olink's PEA technology
Description
This function applies a series of cleaning steps to a data set exported by
Olink Software and imported in R by read_npx(). Some of the steps of this
function rely on results from check_npx().
This function removes samples and assays that are not suitable for downstream statistical analysis. Some of the data records that are removed include duplicate sample identifiers, external control samples, internal control assays, and samples or assays with quality control flags.
Usage
clean_npx(
df,
check_log = NULL,
remove_assay_na = TRUE,
remove_invalid_oid = TRUE,
remove_dup_sample_id = TRUE,
remove_control_assay = TRUE,
remove_control_sample = TRUE,
remove_qc_warning = TRUE,
remove_assay_warning = TRUE,
control_sample_ids = NULL,
convert_df_cols = TRUE,
convert_nonunique_uniprot = TRUE,
out_df = "tibble",
verbose = FALSE
)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx(). |
check_log |
A named list returned by check_npx(). |
remove_assay_na |
Logical. If TRUE (default), removes assays whose quantification values are all NA. |
remove_invalid_oid |
Logical. If TRUE (default), removes assays with invalid identifiers. |
remove_dup_sample_id |
Logical. If TRUE (default), removes samples with duplicate identifiers. |
remove_control_assay |
If TRUE (default), removes internal control assays. |
remove_control_sample |
If TRUE (default), removes external control samples. |
remove_qc_warning |
Logical. If TRUE (default), removes samples with QC status FAIL. |
remove_assay_warning |
Logical. If TRUE (default), removes assay records flagged as WARN. |
control_sample_ids |
Character vector of sample identifiers of control samples. Defaults to NULL. |
convert_df_cols |
Logical. If TRUE (default), coerces columns to their expected classes. |
convert_nonunique_uniprot |
Logical. If TRUE (default), replaces non-unique UniProt mappings with the first UniProt ID per OlinkID. |
out_df |
The class of the output dataset. One of "tibble" or "arrow". Defaults to "tibble". |
verbose |
Logical. If TRUE, prints messages describing the actions taken. Defaults to FALSE. |
Details
The pipeline performs the following steps:
- Remove assays with invalid identifiers: assays flagged as having invalid identifiers by check_npx(). This occurs when the original dataset provided by Olink Software has been modified.
- Remove assays with NA quantification values: assays lacking quantification data are reported with NA as quantification. These assays are identified in check_npx().
- Remove samples with duplicate identifiers: samples with identical identifiers detected by check_npx(). Duplicate sample identifiers cause errors in downstream analysis and are highly discouraged.
- Remove external control samples: uses the column marking sample type (e.g. SampleType) to exclude external control samples, and the column marking sample identifier (e.g. SampleID) to remove external control samples or any other samples one wants to exclude from downstream analysis.
- Remove samples failing quality control: samples with QC status FAIL.
- Remove internal control assays: uses the column marking assay type (e.g. AssayType) to exclude internal control assays.
- Remove assays with quality control warnings: assays with QC status WARN.
- Correct column data types: ensures that certain columns have the expected data type (class). These columns are identified in check_npx().
- Resolve multiple UniProt mappings per assay: ensures that each assay identifier (e.g. OlinkID) maps uniquely to a single UniProt ID.
Important:
- When the dataset lacks a column marking sample type (e.g. SampleType), external control samples should be removed based on their sample identifiers. This function does not auto-detect external control samples from their sample identifiers; please ensure external control samples have been removed prior to downstream statistical analysis.
- When the dataset lacks a column marking assay type (e.g. AssayType), internal control assays should be removed manually. This function does not auto-detect internal control assays; please ensure internal control assays have been removed prior to downstream statistical analysis.
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Author(s)
Kang Dong, Klev Diamanti
Examples
## Not run:
# run check_npx
check_log <- check_npx(
df = npx_data1
)
# run clean_npx
clean_npx(
df = npx_data1,
check_log = check_log
)
# run clean_npx with messages for all steps
clean_npx(
df = npx_data1,
check_log = check_log,
verbose = TRUE
)
## End(Not run)
Help function removing instances of samples that failed QC.
Description
This function uses the column marking QC warnings identified by
check_log to remove samples flagged FAIL in the dataset.
Usage
clean_qc_warning(df, check_log, remove_qc_warning = TRUE, verbose = FALSE)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx(). |
check_log |
A named list returned by check_npx(). |
remove_qc_warning |
Logical. If TRUE (default), removes samples with QC status FAIL. |
verbose |
Logical. If TRUE, prints messages describing the actions taken. Defaults to FALSE. |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Author(s)
Kang Dong, Klev Diamanti
Help function removing control samples based on sample type.
Description
This function filters out rows from a dataset where the sample type column
matches known control sample types: "SAMPLE_CONTROL", "PLATE_CONTROL" or
"NEGATIVE_CONTROL". If remove_control_sample is set to FALSE, or if the
sample type column is not present in the check_log, the function returns
the original data unchanged.
Usage
clean_sample_type(df, check_log, remove_control_sample = TRUE, verbose = FALSE)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx(). |
check_log |
A named list returned by check_npx(). |
remove_control_sample |
If TRUE (default), removes external control samples. |
verbose |
Logical. If TRUE, prints messages describing the actions taken. Defaults to FALSE. |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Author(s)
Kang Dong, Klev Diamanti
Help function converting the output dataset from read_npx* functions to "tibble" or "ArrowObject".
Description
Help function converting the output dataset from read_npx* functions to "tibble" or "ArrowObject".
Usage
convert_read_npx_output(df, out_df)
Arguments
df |
A "tibble" or "ArrowObject" from read_npx(). |
out_df |
The class of the output dataset. One of "tibble" or "arrow". Defaults to "tibble". |
Value
The dataset in the requested class.
Author(s)
Klev Diamanti
Create empty plate layout
Description
Create empty plate layout
Usage
generate_plate_holder(
nplates,
nspots,
nsamples,
plate_size,
num_ctrl,
rand_ctrl
)
Arguments
nplates |
number of plates |
nspots |
number of spots on each plate |
nsamples |
number of samples |
plate_size |
size of plate |
num_ctrl |
number of controls |
rand_ctrl |
whether controls are randomized |
Value
plate layout including plates, rows, and columns of available wells
Get names of all broader Olink platforms.
Description
Get names of all broader Olink platforms.
Usage
get_all_olink_broader_platforms()
Value
A character vector with names of all Olink broader platforms.
Author(s)
Klev Diamanti
Get names of all Olink quantification methods (data types).
Description
Get names of all Olink quantification methods (data types).
Usage
get_all_olink_data_types()
Value
A character vector with names of all Olink quantification methods (data types).
Author(s)
Klev Diamanti
Get names of all Olink platforms.
Description
Get names of all Olink platforms.
Usage
get_all_olink_platforms()
Value
A character vector with names of all Olink platforms.
Author(s)
Klev Diamanti
Get names of all Olink quantification types.
Description
Get names of all Olink quantification types.
Usage
get_all_olink_quant_types()
Value
A character vector with names of all Olink quantification types.
Author(s)
Klev Diamanti
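The four get_all_* helpers above can be sketched together; the values in the comments are taken from the argument descriptions elsewhere on this page, and the ordering of the returned vectors is an assumption (internal functions, hence ":::"):
## Not run:
OlinkAnalyze:::get_all_olink_broader_platforms() # e.g. "NGS", "qPCR"
OlinkAnalyze:::get_all_olink_data_types()        # e.g. "Ct", "NPX", "Quantified"
OlinkAnalyze:::get_all_olink_platforms()         # e.g. "Explore 3072", "Explore HT", ..., "Target 96"
OlinkAnalyze:::get_all_olink_quant_types()       # e.g. "absolute", "relative"
## End(Not run)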
Gets alternative column names for a given column key.
Description
Gets alternative column names for a given column key.
Usage
get_alt_colnames(col_key)
Arguments
col_key |
Column key for which to retrieve alternative column names. One of "sample_id", "sample_type", "assay_type", "olink_id", "uniprot", "assay", "panel", "block", "plate_id", "panel_version", "lod", "quant", "ext_npx", "count", "qc_warning", "assay_warn", "normalization", and "qc_version". |
Value
A character vector of alternative column names corresponding to the provided column key.
Examples
OlinkAnalyze:::get_alt_colnames(col_key = "sample_id")
Help function to get the file name of the checksum file from the list of contents of a zip-compressed file.
Description
Help function to get the file name of the checksum file from the list of contents of a zip-compressed file.
Usage
get_checksum_file(files)
Arguments
files |
A character vector listing file names included in the zip-compressed input file. |
Value
The file name of the checksum file, or NA if the file is absent.
Author(s)
Klev Diamanti
See Also
read_npx_zip
get_npx_file
check_checksum
Prints class type output from read_npx* functions.
Description
Prints class type output from read_npx* functions.
Usage
get_df_output_print()
Value
A scalar character vector with the class type of outputs from read_npx* functions.
Author(s)
Klev Diamanti
Help function to get the separator of a delimited file exported from Olink software.
Description
This function uses the first line of the provided file to determine the separator of the file.
Note: The function does not allow presence of commas and semicolons on the same line.
Usage
get_field_separator(file)
Arguments
file |
Path to Olink software output delimited file in wide or long format. Expecting file extensions "csv" or "txt". |
Value
The file delimiter ";", ",", " ", "|", or ":".
Author(s)
Klev Diamanti
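A hedged sketch of detecting the separator from a file's first line; the temporary file and its header are purely illustrative, and the function is internal, hence ":::":
## Not run:
tmp <- tempfile(fileext = ".csv")
writeLines("SampleID;OlinkID;NPX", tmp)
OlinkAnalyze:::get_field_separator(file = tmp) # expected ";"
## End(Not run)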
Gets all file extensions based on the file format.
Description
Gets all file extensions based on the file format.
Usage
get_file_ext(name_sub = NULL)
Arguments
name_sub |
Substring of file format. One of "excel", "delim", "parquet", "compressed", and NULL. If NULL, extensions for all file formats are returned. Defaults to NULL. |
Value
Character vector with accepted file extensions.
Author(s)
Klev Diamanti
Describes acceptable file extension for each file type.
Description
Describes acceptable file extension for each file type.
Usage
get_file_ext_summary()
Value
A scalar character vector with one sentence describing the acceptable extensions of each file type.
Author(s)
Klev Diamanti
Get all acceptable file formats.
Description
Get all acceptable file formats.
Usage
get_file_formats()
Value
A character vector with the acceptable file formats.
Author(s)
Klev Diamanti
Help function to get the file name of the Olink data file from the list of contents of a zip-compressed file.
Description
Help function to get the file name of the Olink data file from the list of contents of a zip-compressed file.
Usage
get_npx_file(files, excl_file_ext = c("zip"))
Arguments
files |
A character vector listing file names included in the zip-compressed input file. |
excl_file_ext |
Character vector of file extensions that should not be considered as Olink data file. Mainly used to avoid nested compressed files. |
Value
The file name of the Olink data file.
Author(s)
Klev Diamanti
See Also
read_npx_zip
get_checksum_file
check_checksum
Get names of selected broader Olink platforms.
Description
Get names of selected broader Olink platforms.
Usage
get_olink_broader_platforms(
platform_name = NULL,
data_type = NULL,
quant_type = NULL
)
Arguments
platform_name |
Name of the platform_name to filter for. If NULL, no filtering is applied. Defaults to NULL. |
data_type |
Name of the data_type to filter for. If NULL, no filtering is applied. Defaults to NULL. |
quant_type |
Name of the quant_type to filter for. If NULL, no filtering is applied. Defaults to NULL. |
Value
A character vector with names of Olink broader platforms filtered by platform_name, data_type, and quant_type.
Author(s)
Klev Diamanti
Get names of selected Olink quantification methods (data types).
Description
Get names of selected Olink quantification methods (data types).
Usage
get_olink_data_types(
broad_platform = NULL,
platform_name = NULL,
quant_type = NULL
)
Arguments
broad_platform |
Name of the broad_platform to filter for. If NULL, no filtering is applied. Defaults to NULL. |
platform_name |
Name of the platform_name to filter for. If NULL, no filtering is applied. Defaults to NULL. |
quant_type |
Name of the quant_type to filter for. If NULL, no filtering is applied. Defaults to NULL. |
Value
A character vector with names of Olink quantification methods (data types) filtered by broad_platform, platform_name, and quant_type.
Author(s)
Klev Diamanti
Get names of selected Olink platforms.
Description
Get names of selected Olink platforms.
Usage
get_olink_platforms(broad_platform = NULL, data_type = NULL, quant_type = NULL)
Arguments
broad_platform |
Name of the broad_platform to filter for. If NULL, no filtering is applied. Defaults to NULL. |
data_type |
Name of the data_type to filter for. If NULL, no filtering is applied. Defaults to NULL. |
quant_type |
Name of the quant_type to filter for. If NULL, no filtering is applied. Defaults to NULL. |
Value
A character vector with names of Olink platforms filtered by broad_platform, data_type, and quant_type.
Author(s)
Klev Diamanti
Get names of selected Olink quantification types.
Description
Get names of selected Olink quantification types.
Usage
get_olink_quant_types(
broad_platform = NULL,
platform_name = NULL,
data_type = NULL
)
Arguments
broad_platform |
Name of the broad_platform to filter for. If NULL, no filtering is applied. Defaults to NULL. |
platform_name |
Name of the platform_name to filter for. If NULL, no filtering is applied. Defaults to NULL. |
data_type |
Name of the data_type to filter for. If NULL, no filtering is applied. Defaults to NULL. |
Value
A character vector with names of Olink quantification types filtered by broad_platform, platform_name, and data_type.
Author(s)
Klev Diamanti
Example Sample Manifest
Description
Synthetic sample manifest to demonstrate use of functions in this package.
Usage
manifest
Format
This dataset contains columns:
- SubjectID
Subject Identifier, A-Z
- Visit
Visit Number, 1-6
- SampleID
138 unique sample IDs
- Site
Site1 or Site2
Details
A tibble with 138 rows and 4 columns. This manifest contains 26 example subjects, with 6 visits and 2 sites.
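A sketch of inspecting the manifest, consistent with the counts stated above (assumes the dplyr package, which this package imports; not run):
## Not run:
# 26 subjects, 6 visits, 2 sites, 138 unique samples
dplyr::n_distinct(manifest$SampleID)
dplyr::count(manifest, Site)
## End(Not run)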
Identifying which mapping file to use
Description
Identifying which mapping file to use
Usage
mapping_file_id(prod_uniq)
Arguments
prod_uniq |
Name of products (not_ref, ref) |
Value
Data frame of the mapping file to use for OlinkID mapping (eHT_e3072_mapping, reveal_eht_mapping, or reveal_e3072_mapping).
Combine reference and non-reference datasets
Description
The function is used by norm_internal_subset and
norm_internal_bridge to combine the reference dataset that has
Adj_factor = 0 and the non-reference dataset that used the adjustment
factors provided in adj_fct_df.
Usage
norm_internal_adjust(
ref_df,
ref_name,
ref_cols,
not_ref_df,
not_ref_name,
not_ref_cols,
adj_fct_df
)
Arguments
ref_df |
The reference dataset to be used in normalization (required). |
ref_name |
Project name of the reference dataset (required). |
ref_cols |
Named list of column names in the reference dataset (required). |
not_ref_df |
The non-reference dataset to be used in normalization (required). |
not_ref_name |
Project name of the non-reference dataset (required). |
not_ref_cols |
Named list of column names in the non-reference dataset (required). |
adj_fct_df |
Dataset containing the adjustment factors to be applied to the non-reference dataset (required). |
Details
The function calls norm_internal_adjust_ref and
norm_internal_adjust_not_ref and combines their outputs.
Value
Tibble or ArrowObject with the normalized dataset.
Author(s)
Klev Diamanti
Add adjustment factors to a dataset
Description
Add adjustment factors to a dataset
Usage
norm_internal_adjust_not_ref(df, name, cols, adj_fct_df, adj_fct_cols)
Arguments
df |
The dataset to be normalized (required). |
name |
Project name of the dataset (required). |
cols |
Named list of column names in the dataset (required). |
adj_fct_df |
Dataset containing the adjustment factors to be applied to the dataset df (required). |
adj_fct_cols |
Named list of column names in the dataset containing adjustment factors (required). |
Value
Tibble or ArrowObject with the normalized dataset with additional columns "Project" and "Adj_factor".
Author(s)
Klev Diamanti
Modify the reference dataset to be combined with the non-reference normalized dataset
Description
Modify the reference dataset to be combined with the non-reference normalized dataset
Usage
norm_internal_adjust_ref(ref_df, ref_name)
Arguments
ref_df |
The reference dataset to be used in normalization (required). |
ref_name |
Project name of the reference dataset (required). |
Value
Tibble or ArrowObject with the reference dataset with additional columns "Project" and "Adj_factor".
Author(s)
Klev Diamanti
Compute median value of the quantification method for each Olink assay
Description
The function computes the median value of the quantification method for
each Olink assay in the set of samples given by samples, and adds the
column Project.
Usage
norm_internal_assay_median(df, samples, name, cols)
Arguments
df |
The dataset to calculate medians from (required). |
samples |
Character vector of sample identifiers to be used for adjustment factor calculation in the dataset df (required). |
name |
Project name of the dataset that will be added in the column Project (required). |
cols |
Named list of column names identified in the dataset df (required). |
Details
This function is typically used by internal functions
norm_internal_subset and
norm_internal_reference_median that compute median
quantification value for each assay across multiple samples specified by
samples.
Value
Tibble or ArrowObject with one row per Olink assay and the columns OlinkID, Project, and assay_med
Author(s)
Klev Diamanti
Internal bridge normalization function
Description
Internal bridge normalization function
Usage
norm_internal_bridge(
ref_df,
ref_samples,
ref_name,
ref_cols,
not_ref_df,
not_ref_name,
not_ref_cols
)
Arguments
ref_df |
The reference dataset to be used in normalization (required). |
ref_samples |
Character vector of sample identifiers to be used for adjustment factor calculation in the reference dataset (required). |
ref_name |
Project name of the reference dataset (required). |
ref_cols |
Named list of column names in the reference dataset (required). |
not_ref_df |
The non-reference dataset to be used in normalization (required). |
not_ref_name |
Project name of the non-reference dataset (required). |
not_ref_cols |
Named list of column names in the non-reference dataset (required). |
Value
Tibble or ArrowObject with the normalized dataset.
Author(s)
Klev Diamanti
Internal function normalizing Olink Explore 3k to Olink Explore 3072
Description
Internal function normalizing Olink Explore 3k to Olink Explore 3072
Usage
norm_internal_cross_product(
ref_df,
ref_samples,
ref_name,
ref_cols,
prod_uniq,
not_ref_df,
not_ref_name,
not_ref_cols
)
Arguments
ref_df |
The reference dataset to be used in normalization (required). |
ref_samples |
Character vector of sample identifiers to be used for adjustment factor calculation in the reference dataset (required). |
ref_name |
Project name of the reference dataset (required). |
ref_cols |
Named list of column names in the reference dataset (required). |
prod_uniq |
Name of products (not_ref, ref) |
not_ref_df |
The non-reference dataset to be used in normalization (required). |
not_ref_name |
Project name of the non-reference dataset (required). |
not_ref_cols |
Named list of column names in the non-reference dataset (required). |
Value
Tibble or ArrowObject with a dataset with the following additional columns:
OlinkID_E3072: Corresponding assay identifier from Olink Explore 3072.
Project: Project of origin.
BridgingRecommendation: Recommendation of whether the assay is bridgeable or not. One of "NotBridgeable", "MedianCentering", or "QuantileSmoothing".
MedianCenteredNPX: NPX values adjusted based on the median of the pair-wise differences of NPX values between bridge samples.
QSNormalizedNPX: NPX values adjusted based on the quantile smoothing normalization among bridge samples.
Author(s)
Klev Diamanti
Internal reference median normalization function
Description
Internal reference median normalization function
Usage
norm_internal_reference_median(
ref_df,
ref_samples,
ref_name,
ref_cols,
reference_medians
)
Arguments
ref_df |
The reference dataset to be used in normalization (required). |
ref_samples |
Character vector of sample identifiers to be used for adjustment factor calculation in the reference dataset (required). |
ref_name |
Project name of the reference dataset (required). |
ref_cols |
Named list of column names in the reference dataset (required). |
reference_medians |
Dataset with columns "OlinkID" and "Reference_NPX" (required). Used for reference median normalization. |
Value
Tibble or ArrowObject with the normalized dataset.
Author(s)
Klev Diamanti
Update column names of non-reference dataset based on those of reference dataset
Description
This function handles cases when specific columns referring to the same thing
are named differently in the df1 and df2 normalization datasets. It only
renames the columns panel_version, qc_warn, and assay_warn based on
their names in the reference dataset.
Usage
norm_internal_rename_cols(ref_cols, not_ref_cols, not_ref_df)
Arguments
ref_cols |
Named list of column names identified in the reference dataset. |
not_ref_cols |
Named list of column names identified in the non-reference dataset. |
not_ref_df |
Non-reference dataset to be used in normalization. |
Value
not_ref_df with updated column names.
Author(s)
Klev Diamanti
Internal subset normalization function
Description
This function performs subset normalization using a subset of the samples from either or both reference and non-reference datasets. When all samples from each dataset are used, the function performs intensity normalization.
Usage
norm_internal_subset(
ref_df,
ref_samples,
ref_name,
ref_cols,
not_ref_df,
not_ref_samples,
not_ref_name,
not_ref_cols
)
Arguments
ref_df |
The reference dataset to be used in normalization (required). |
ref_samples |
Character vector of sample identifiers to be used for adjustment factor calculation in the reference dataset (required). |
ref_name |
Project name of the reference dataset (required). |
ref_cols |
Named list of column names in the reference dataset (required). |
not_ref_df |
The non-reference dataset to be used in normalization (required). |
not_ref_samples |
Character vector of sample identifiers to be used for adjustment factor calculation in the non-reference dataset (required). |
not_ref_name |
Project name of the non-reference dataset (required). |
not_ref_cols |
Named list of column names in the non-reference dataset (required). |
Value
Tibble or ArrowObject with the normalized dataset.
Author(s)
Klev Diamanti
Update MaxLOD to the maximum MaxLOD across normalized datasets.
Description
Update MaxLOD to the maximum MaxLOD across normalized datasets.
Usage
norm_internal_update_maxlod(df, cols)
Arguments
df |
Normalized Olink dataset (required). |
cols |
Named list of column names in the dataset (required). |
Value
The same dataset as the input df with the column reflecting MaxLOD updated.
NPX Data in Long format.
Description
This is a synthetic dataset aiming to demonstrate use-cases of functions from this package.
Usage
npx_data1
Format
In addition to standard read_npx() columns, this dataset also contains columns:
- Subject
Subject Identifier
- Treatment
Treated or Untreated
- Site
Site indicator, 5 unique values
- Time
Baseline, Week.6 and Week.12
- Project
Project ID number
Details
A tibble with 29,440 rows and 17 columns.
npx_data1 is an Olink NPX data file (tibble) in long format with 158 unique Sample identifiers (including 2 repeats each of control samples: CONTROL_SAMPLE_AS 1 and CONTROL_SAMPLE_AS 2). The data also contains 1104 assays uniquely identified using OlinkID over 2 Olink Panels.
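A sketch confirming the dimensions described above (assumes the dplyr package, which this package imports; not run):
## Not run:
dplyr::n_distinct(npx_data1$SampleID) # 158 unique sample identifiers
dplyr::n_distinct(npx_data1$OlinkID)  # 1104 assays
unique(npx_data1$Panel)               # 2 Olink panels
## End(Not run)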
NPX Data in Long format, a follow-up.
Description
This is a synthetic dataset that aims to demonstrate use-cases of functions from this package. The format is very similar to npx_data1. Both datasets can be used to demonstrate the use of normalization functionality.
Usage
npx_data2
Format
In addition to standard read_npx() columns, this dataset also contains columns:
- Subject
Subject Identifier
- Treatment
Treated or Untreated
- Site
Site indicator, 5 unique values
- Time
Baseline, Week.6 and Week.12
- Project
Project ID number
Details
A tibble with 32,384 rows and 17 columns.
npx_data2 is an Olink NPX data file (tibble) in long format with 174 unique Sample identifiers (including 2 repeats each of control samples: CONTROL_SAMPLE_AS 1 and CONTROL_SAMPLE_AS 2). The data also contains 1,104 assays uniquely identified using OlinkID over 2 Panels. This dataset also contains 16 bridge samples with SampleIDs that are also present in npx_data1. These samples are: A13, A29, A30, A36, A45, A46, A52, A63, A71, A73, B3, B4, B37, B45, B63 and B75.
Function which performs an ANOVA per protein.
Description
Performs an ANOVA F-test for each assay (by OlinkID) in every panel using car::Anova and Type III sum of squares. The function handles both factor and numerical variables and/or covariates.
Usage
olink_anova(
df,
variable,
check_log = NULL,
outcome = "NPX",
covariates = NULL,
model_formula,
return.covariates = FALSE,
verbose = TRUE
)
Arguments
df |
NPX data frame in long format with at least protein name (Assay), OlinkID, UniProt, Panel and a factor with at least 3 levels. |
variable |
Single character value or character array. Variable(s) to test. If length > 1, the included variable names will be used in crossed analyses. Also takes ':' or '*' notation. |
check_log |
A named list returned by check_npx(). |
outcome |
Character. The dependent variable. Default: NPX. |
covariates |
Single character value or character array. Default: NULL. Covariates to include. Takes ':' or '*' notation. Crossed analysis will not be inferred from main effects. |
model_formula |
(optional) Symbolic description of the model to be fitted in standard formula notation (e.g. "NPX~A*B"). If provided, this will override the variable and covariates arguments. |
return.covariates |
Boolean. Default: FALSE. Returns F-test results for the covariates. Note: Adjusted p-values will be NA for the covariates. |
verbose |
Boolean. Default: TRUE. If information about removed samples, factor conversion and final model formula is to be printed to the console. |
Details
Samples that have no variable information or missing factor levels are automatically removed from the analysis (specified in a message if verbose = TRUE). Character columns in the input dataframe are automatically converted to factors (specified in a message if verbose = TRUE). Numerical variables are not converted to factors. Control samples should be removed before using this function. Control assays (AssayType is not "assay", or Assay contains "control" or "ctrl") should be removed before using this function. If a numerical variable is to be used as a factor, this conversion needs to be done on the dataframe before the function call.
Crossed analysis, i.e. A*B formula notation, is inferred from the variable argument in the following cases:
c('A','B')
c('A:B')
c('A:B', 'B') or c('A:B', 'A')
Inference is specified in a message if verbose = TRUE.
For covariates, crossed analyses need to be specified explicitly, i.e. two main effects will not be expanded with a c('A','B') notation. Main effects present in the variable takes precedence. The formula notation of the final model is specified in a message if verbose = TRUE.
Adjusted p-values are calculated by stats::p.adjust according to the Benjamini & Hochberg (1995) method (“fdr”). The threshold is determined by logic evaluation of Adjusted_pval < 0.05. Covariates are not included in the p-value adjustment.
Value
A "tibble" containing the ANOVA results for every protein. The tibble is arranged by ascending p-values. Columns include:
Assay: "character" Protein symbol
OlinkID: "character" Olink specific ID
UniProt: "character" UniProt ID
Panel: "character" Name of Olink Panel
term: "character" term in model
df: "numeric" degrees of freedom
sumsq: "numeric" sum of square
meansq: "numeric" mean of square
statistic: "numeric" value of the statistic
p.value: "numeric" nominal p-value
Adjusted_pval: "numeric" adjusted p-value for the test (Benjamini&Hochberg)
Threshold: "character" if adjusted p-value is significant or not (< 0.05)
Examples
if (rlang::is_installed(pkg = c("broom", "car"))) {
#data
npx_df <- OlinkAnalyze::npx_data1 |>
dplyr::filter(
!grepl(
pattern = "control|ctrl",
x = .data[["SampleID"]],
ignore.case = TRUE
)
)
# check data
npx_df_check_log <- OlinkAnalyze::check_npx(
df = npx_df
)
# One-way ANOVA, no covariates.
# Results in a model NPX~Time
anova_results <- OlinkAnalyze::olink_anova(
df = npx_df,
check_log = npx_df_check_log,
variable = "Time"
)
# Two-way ANOVA, one main effect covariate.
# Results in model NPX~Treatment*Time+Site.
anova_results <- OlinkAnalyze::olink_anova(
df = npx_df,
check_log = npx_df_check_log,
variable = c("Treatment:Time"),
covariates = "Site"
)
# One-way ANOVA, interaction effect covariate.
# Results in model NPX~Treatment+Site:Time+Site+Time.
anova_results <- OlinkAnalyze::olink_anova(
df = npx_df,
check_log = npx_df_check_log,
variable = "Treatment",
covariates = "Site:Time"
)
}
Function which performs an ANOVA posthoc test per protein.
Description
Performs a post hoc ANOVA test using emmeans::emmeans with Tukey p-value
adjustment per assay (by OlinkID) for each panel at confidence level 0.95.
See olink_anova for details of input notation.
Usage
olink_anova_posthoc(
df,
check_log = NULL,
olinkid_list = NULL,
variable,
covariates = NULL,
outcome = "NPX",
model_formula,
effect,
effect_formula,
mean_return = FALSE,
post_hoc_padjust_method = "tukey",
verbose = TRUE
)
Arguments
df |
NPX data frame in long format with at least protein name (Assay), OlinkID, UniProt, Panel and a factor with at least 3 levels. |
check_log |
A named list returned by check_npx(). |
olinkid_list |
Character vector of OlinkIDs on which to perform post hoc analysis. If not specified, all assays in df are used. |
variable |
Single character value or character array. Variable(s) to test. If length > 1, the included variable names will be used in crossed analyses. Also takes ':' notation. |
covariates |
Single character value or character array. Default: NULL. Covariates to include. Takes ':' or '*' notation. Crossed analysis will not be inferred from main effects. |
outcome |
Character. The dependent variable. Default: NPX. |
model_formula |
(optional) Symbolic description of the model to be fitted in standard formula notation (e.g. "NPX~A*B"). If provided, this will override the variable and covariates arguments. |
effect |
Term on which to perform post-hoc. Character vector. Must be subset of or identical to variable. |
effect_formula |
(optional) A character vector specifying the names of the predictors over which estimated marginal means are desired, as defined in the specs argument of emmeans::emmeans. |
mean_return |
Boolean. If true, returns the mean of each factor level rather than the difference in means (default). Note that no p-value is returned for mean_return = TRUE and no adjustment is performed. |
post_hoc_padjust_method |
P-value adjustment method to use for post-hoc comparisons within an assay. Options include "tukey", "sidak", "bonferroni" and "none". |
verbose |
Boolean. Default: TRUE. If information about removed samples, factor conversion and final model formula is to be printed to the console. |
Details
The function handles both factor and numerical variables and/or covariates. Control samples should be removed before using this function. Control assays (AssayType is not "assay", or Assay contains "control" or "ctrl") should be removed before using this function. The posthoc test for a numerical variable compares the difference in means of the outcome variable (default: NPX) for 1 standard deviation difference in the numerical variable, e.g. mean NPX at mean(numerical variable) versus mean NPX at mean(numerical variable) + 1*SD(numerical variable).
Value
A "tibble" of posthoc tests for specified effect, arranged by ascending adjusted p-values. Columns include:
Assay: "character" Protein symbol
OlinkID: "character" Olink specific ID
UniProt: "character" UniProt ID
Panel: "character" Name of Olink Panel
term: "character" term in model
contrast: "character" the groups that were compared
estimate: "numeric" difference in mean NPX between groups
conf.low: "numeric" confidence interval for the mean (lower end)
conf.high: "numeric" confidence interval for the mean (upper end)
Adjusted_pval: "numeric" adjusted p-value for the test
Threshold: "character" if adjusted p-value is significant or not (< 0.05)
Examples
if (rlang::is_installed(pkg = c("car", "emmeans"))) {
# data
npx_df <- OlinkAnalyze::npx_data1 |>
dplyr::filter(
!grepl(
pattern = "control|ctrl",
x = .data[["SampleID"]],
ignore.case = TRUE
)
)
# check data
npx_df_check_log <- OlinkAnalyze::check_npx(
df = npx_df
)
# Two-way ANOVA, one main effect (Site) covariate.
# Results in model NPX~Treatment*Time+Site.
anova_results <- OlinkAnalyze::olink_anova(
df = npx_df,
check_log = npx_df_check_log,
variable = c("Treatment:Time"),
covariates = "Site"
)
# Posthoc test for the model NPX~Treatment*Time+Site,
# on the interaction effect Treatment:Time with covariate Site.
# Filtering out significant and relevant results.
significant_assays <- anova_results |>
dplyr::filter(
.data[["Threshold"]] == "Significant"
& .data[["term"]] == "Treatment:Time"
) |>
dplyr::select(
dplyr::all_of("OlinkID")
) |>
dplyr::distinct() |>
dplyr::pull()
# Posthoc, all pairwise comparisons
anova_posthoc_results <- OlinkAnalyze::olink_anova_posthoc(
df = npx_df,
check_log = npx_df_check_log,
variable = c("Treatment:Time"),
covariates = "Site",
olinkid_list = significant_assays,
effect = "Treatment:Time"
)
# Posthoc, treated vs untreated at each timepoint, adjusted for Site effect
anova_posthoc_results <- OlinkAnalyze::olink_anova_posthoc(
df = npx_df,
check_log = npx_df_check_log,
model_formula = "NPX~Treatment*Time+Site",
olinkid_list = significant_assays,
effect_formula = "pairwise~Treatment|Time"
)
}
Function which plots boxplots of selected variables
Description
Generates faceted boxplots of NPX vs. grouping variable(s) for a given list of proteins (OlinkIDs) using ggplot2::ggplot and ggplot2::geom_boxplot.
Usage
olink_boxplot(
df,
variable,
olinkid_list,
verbose = FALSE,
number_of_proteins_per_plot = 6,
posthoc_results = NULL,
ttest_results = NULL,
check_log = NULL,
...
)
Arguments
df |
NPX data frame in long format with at least protein name (Assay), OlinkID (unique), UniProt and at least one grouping variable. |
variable |
A character vector or character value indicating which column to use as the x-axis and fill grouping variable. The first or single value is used as x-axis, the second as fill. Further values in a vector are not plotted. |
olinkid_list |
Character vector indicating which proteins (OlinkIDs) to plot. |
verbose |
Boolean. If the plots are shown as well as returned in the list (default is false). |
number_of_proteins_per_plot |
Number of boxplots to include in the facet plot (default 6). |
posthoc_results |
Data frame from ANOVA posthoc analysis using olink_anova_posthoc() function. |
ttest_results |
Data frame from ttest analysis using olink_ttest() function. |
check_log |
A named list returned by check_npx(). |
... |
Color option passed to specify color order. |
Value
A list of objects of class "ggplot" (the actual ggplot object is entry 1 in the list). Box and whisker plot of NPX (y-axis) by variable (x-axis) for each Assay.
Examples
if (rlang::is_installed(pkg = c("broom", "car"))) {
npx_df <- npx_data1 |>
dplyr::filter(
!grepl(pattern = "control|ctrl",
x = .data[["SampleID"]],
ignore.case = TRUE)
)
anova_results <- OlinkAnalyze::olink_anova(
df = npx_df,
variable = "Site"
)
significant_assays <- anova_results |>
dplyr::filter(
.data[["Threshold"]] == "Significant"
) |>
dplyr::pull(
.data[["OlinkID"]]
)
OlinkAnalyze::olink_boxplot(
df = npx_df,
variable = "Site",
olinkid_list = significant_assays,
verbose = TRUE,
number_of_proteins_per_plot = 3L
)
}
Bridge selection function
Description
The bridge selection function will select a number of bridge samples based on the input data. It selects samples with good detection that pass QC and cover a good range of the data. If possible, Olink recommends 8-16 bridge samples. When running the selector, Olink recommends starting at sample_missing_freq = 0.10, which represents a maximum of 10% missingness per sample. If not enough samples are output, increase to 20%. The function accepts NPX Excel files with data < LOD replaced.
Usage
olink_bridge_selector(df, sample_missing_freq, n, check_log = NULL)
olink_bridgeselector(df, ..., n, check_log = NULL)
Arguments
df |
Tibble/data frame in long format such as produced by the Olink Analyze read_npx function. |
sample_missing_freq |
The threshold for sample wise missingness. |
n |
Number of bridge samples to be selected. |
check_log |
A named list returned by check_npx(). |
... |
Additional arguments. Currently only accepts sample_missing_freq. |
Details
olink_bridgeselector() is a synonym of olink_bridge_selector().
Value
A "tibble" with sample IDs and mean NPX for a defined number of bridging samples. Columns include:
SampleID: Sample ID
PercAssaysBelowLOD: Percent of Assays that are below LOD for the sample
MeanNPX: Mean NPX for the sample
Examples
check_log <- OlinkAnalyze::check_npx(df = npx_data1)
bridge_samples <- OlinkAnalyze::olink_bridge_selector(
df = npx_data1,
sample_missing_freq = 0.1,
n = 20L,
check_log = check_log
)
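Since olink_bridgeselector() is documented as a synonym whose ... argument currently only accepts sample_missing_freq, the same selection can presumably be written through the synonym as well. This is a sketch based on the documented signatures, not a verified call:

```r
# Assumption: sample_missing_freq is passed through `...` of the synonym.
# Reuses the check_log object created in the example above.
bridge_samples_syn <- OlinkAnalyze::olink_bridgeselector(
  df = npx_data1,
  sample_missing_freq = 0.1,
  n = 20L,
  check_log = check_log
)
```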
Plots for each bridgeable assay between two products.
Description
Plots for each bridgeable assay between two products.
Usage
olink_bridgeability_plot(
df,
check_log = NULL,
olink_id = NULL,
median_counts_threshold = 150L,
min_count = 10L
)
Arguments
df |
A tibble containing the cross-product bridge normalized dataset
generated by olink_normalization(). |
check_log |
A named list returned by check_npx(). |
olink_id |
Character vector of Olink assay identifiers OlinkID for which bridgeability plots will be created. If null, plots for all assays in data will be created. (default = NULL) |
median_counts_threshold |
Threshold indicating the minimum median counts for each product (default = 150). |
min_count |
Threshold indicating the minimum number of counts per data point (default = 10). Data below min_count are excluded. |
Value
An object of class "ggplot" containing 4 plots for each assay.
Author(s)
Amrita Kar, Klev Diamanti
Generates a combined plot per assay containing a violin and box plot of IQR ranges, a correlation plot of NPX values, a median count bar plot, and KS plots from the two products.
Examples
if (rlang::is_installed(pkg = c("ggpubr"))) {
npx_ht <- OlinkAnalyze:::data_ht_small |>
dplyr::filter(
.data[["SampleType"]] == "SAMPLE"
)
npx_3072 <- OlinkAnalyze:::data_3k_small |>
dplyr::filter(
.data[["SampleType"]] == "SAMPLE"
)
overlapping_samples <- intersect(
x = npx_ht$SampleID,
y = npx_3072$SampleID
)
data_norm <- OlinkAnalyze::olink_normalization(
df1 = npx_ht,
df2 = npx_3072,
overlapping_samples_df1 = overlapping_samples,
df1_project_nr = "Explore HT",
df2_project_nr = "Explore 3072",
reference_project = "Explore HT",
df1_check_log = check_npx(df = npx_ht) |>
suppressMessages() |>
suppressWarnings(),
df2_check_log = check_npx(df = npx_3072) |>
suppressMessages() |>
suppressWarnings()
)
data_norm_bridge_p <- OlinkAnalyze::olink_bridgeability_plot(
df = data_norm,
check_log = check_npx(df = data_norm) |>
suppressMessages() |>
suppressWarnings(),
olink_id = c("OID40770", "OID40835"),
median_counts_threshold = 150L,
min_count = 10L
)
}
Olink color scale for discrete ggplots
Description
Olink color scale for discrete ggplots
Usage
olink_color_discrete(..., alpha = 1, coloroption = NULL)
Arguments
... |
Optional. Additional arguments to pass to the underlying ggplot2 scale function. |
alpha |
transparency (optional) |
coloroption |
string, one or more of the following: c("red", "orange", "yellow", "green", "teal", "turqoise", "lightblue", "darkblue", "purple", "pink") |
Value
No return value, called for side effects
Examples
ggplot2::ggplot(
data = datasets::mtcars,
mapping = ggplot2::aes(
x = .data[["wt"]],
y = .data[["mpg"]],
color = as.factor(x = .data[["cyl"]])
)
) +
ggplot2::geom_point(
size = 4L
) +
OlinkAnalyze::olink_color_discrete() +
ggplot2::theme_bw()
ggplot2::ggplot(
data = datasets::mtcars,
mapping = ggplot2::aes(
x = .data[["wt"]],
y = .data[["mpg"]],
color = as.factor(x = .data[["cyl"]])
)
) +
ggplot2::geom_point(
size = 4L
) +
OlinkAnalyze::olink_color_discrete(
coloroption = c("lightblue", "red", "green")
) +
ggplot2::theme_bw()
Olink color scale for continuous ggplots
Description
Olink color scale for continuous ggplots
Usage
olink_color_gradient(..., alpha = 1, coloroption = NULL)
Arguments
... |
Optional. Additional arguments to pass to the underlying ggplot2 scale function. |
alpha |
transparency (optional) |
coloroption |
string, one or more of the following: c("red", "orange", "yellow", "green", "teal", "turqoise", "lightblue", "darkblue", "purple", "pink") |
Value
No return value, called for side effects
Examples
ggplot2::diamonds |>
dplyr::filter(
.data[["x"]] > 5
& .data[["x"]] < 6
& .data[["y"]] > 5
& .data[["y"]] < 6
) |>
dplyr::mutate(
diff = sqrt(
x = abs(
x = .data[["x"]] - .data[["y"]]
)
) * sign(
x = .data[["x"]] - .data[["y"]]
)
) |>
ggplot2::ggplot(
mapping = ggplot2::aes(
x = .data[["x"]],
y = .data[["y"]],
colour = .data[["diff"]]
)
) +
ggplot2::geom_point() +
ggplot2::theme_bw() +
OlinkAnalyze::olink_color_gradient()
Plot distributions of a given variable for all plates
Description
Displays a bar chart for each plate representing the distribution of the given grouping variable on each plate using ggplot2::ggplot and ggplot2::geom_bar.
Usage
olink_display_plate_dist(data, fill.color = "plate")
olink_displayPlateDistributions(data, fill.color = "plate")
Arguments
data |
tibble/data frame in long format returned from the olink_plate_randomizer function. |
fill.color |
Column name to be used as coloring variable for wells. |
Value
An object of class "ggplot" showing the percent distribution of fill.color in each plate (x-axis).
Examples
randomized_manifest <- OlinkAnalyze::olink_plate_randomizer(
Manifest = manifest
)
OlinkAnalyze::olink_display_plate_dist(
data = randomized_manifest,
fill.color = "Site"
)
Plot all plates colored by a variable
Description
Displays each plate in a facet with cells colored by the given variable using ggplot and ggplot2::geom_tile.
Usage
olink_display_plate_layout(
data,
fill.color,
PlateSize = 96L,
num_ctrl = 8L,
rand_ctrl = FALSE,
Product,
include.label = FALSE
)
olink_displayPlateLayout(
data,
fill.color,
PlateSize = 96L,
num_ctrl = 8L,
rand_ctrl = FALSE,
Product,
include.label = FALSE
)
Arguments
data |
tibble/data frame in long format returned from the olink_plate_randomizer function. |
fill.color |
Column name to be used as coloring variable for wells. |
PlateSize |
Integer. Either 96 or 48. 96 is default. |
num_ctrl |
Numeric. Number of controls on each plate (default = 8) |
rand_ctrl |
Logical. Whether controls are added to be randomized across the plate (default = FALSE) |
Product |
String. Name of Olink product used to set PlateSize if not provided. Optional. |
include.label |
Should the variable group be shown in the plot. |
Value
An object of class "ggplot" showing each plate in a facet with the
cells colored by values in column fill.color in input data.
Examples
randomized_manifest <- OlinkAnalyze::olink_plate_randomizer(
Manifest = manifest
)
OlinkAnalyze::olink_display_plate_layout(
data = randomized_manifest,
fill.color = "Site"
)
Function to plot the NPX distribution by panel
Description
Generates boxplots of NPX vs. SampleID colored by QC_Warning (default) or any other grouping variable and faceted by Panel using ggplot and ggplot2::geom_boxplot.
Usage
olink_dist_plot(df, check_log = NULL, color_g = "QC_Warning", ...)
Arguments
df |
NPX data frame in long format. Must have columns SampleID, NPX and Panel |
check_log |
A named list returned by check_npx(). |
color_g |
Character value indicating which column to use as fill color. (default: QC_Warning). |
... |
Color option passed to specify color order. |
Value
An object of class "ggplot" which displays NPX distribution for each sample per panel
Examples
# Optional: check and clean dataset
check_log <- OlinkAnalyze::check_npx(
df = npx_data1
)
cleaned_data <- OlinkAnalyze::clean_npx(
df = npx_data1,
check_log = check_log
)
OlinkAnalyze::olink_dist_plot(
df = npx_data1,
check_log = check_log,
color_g = "QC_Warning"
)
OlinkAnalyze::olink_dist_plot(
df = cleaned_data,
check_log = check_log,
color_g = "QC_Warning"
)
Olink fill scale for discrete ggplots
Description
Olink fill scale for discrete ggplots
Usage
olink_fill_discrete(..., alpha = 1, coloroption = NULL)
Arguments
... |
Optional. Additional arguments to pass to the underlying ggplot2 scale function. |
alpha |
transparency (optional) |
coloroption |
string, one or more of the following: c("red", "orange", "yellow", "green", "teal", "turqoise", "lightblue", "darkblue", "purple", "pink") |
Value
No return value, called for side effects
Examples
ggplot2::ggplot(
data = datasets::mtcars,
mapping = ggplot2::aes(
x = as.factor(x = .data[["cyl"]]),
y = .data[["mpg"]],
fill = as.factor(x = .data[["cyl"]])
)
) +
ggplot2::geom_boxplot() +
OlinkAnalyze::olink_fill_discrete() +
ggplot2::theme_bw()
Olink fill scale for continuous ggplots
Description
Olink fill scale for continuous ggplots
Usage
olink_fill_gradient(..., alpha = 1, coloroption = NULL)
Arguments
... |
Optional. Additional arguments to pass to the underlying ggplot2 scale function. |
alpha |
transparency (optional) |
coloroption |
string, one or more of the following: c("red", "orange", "yellow", "green", "teal", "turqoise", "lightblue", "darkblue", "purple", "pink") |
Value
No return value, called for side effects
Examples
ggplot2::diamonds |>
dplyr::filter(
.data[["x"]] > 5
& .data[["x"]] < 6
& .data[["y"]] > 5
& .data[["y"]] < 6
) |>
dplyr::mutate(
diff = sqrt(
x = abs(
x = .data[["x"]] - .data[["y"]]
)
) * sign(
x = .data[["x"]] - .data[["y"]]
)
) |>
ggplot2::ggplot(
mapping = ggplot2::aes(
x = .data[["x"]],
y = .data[["y"]],
fill = .data[["diff"]]
)
) +
ggplot2::geom_point(shape = 21L) +
ggplot2::theme_bw() +
OlinkAnalyze::olink_fill_gradient()
Retrieve non-overlapping assays between two NPX datasets
Description
For use in olink_normalization_format function. Generates a message stating
how many assays were not overlapping. Appends additional columns depending on
the normalization type to match normalized data output. For cross-product
normalization, splits any concatenated OlinkIDs.
Usage
olink_format_oid_no_overlap(lst_check)
Arguments
lst_check |
Normalization input list checks generated by the internal normalization input-check function. |
Value
A combined "tibble" of Olink data in long format containing only the non-overlapping assays from each input dataset.
Author(s)
Danai Topouza Klev Diamanti
Remove negative controls and plate controls from dataset. For use in olink_normalization_format function. Generates a message stating which control samples were removed.
Description
Remove negative controls and plate controls from dataset. For use in olink_normalization_format function. Generates a message stating which control samples were removed.
Usage
olink_format_rm_ext_ctrl(df, lst_check)
Arguments
df |
NPX dataset to be processed. |
lst_check |
Normalization input list checks generated by the internal normalization input-check function. |
Value
A "tibble" of Olink data in long format containing the input dataset with negative controls and plate controls removed.
Author(s)
Danai G. Topouza Klev Diamanti
Function to plot a heatmap of the NPX data
Description
Generates a heatmap using pheatmap::pheatmap of all samples from NPX
data.
Usage
olink_heatmap_plot(
df,
check_log = NULL,
variable_row_list = NULL,
variable_col_list = NULL,
center_scale = TRUE,
cluster_rows = TRUE,
cluster_cols = TRUE,
show_rownames = TRUE,
show_colnames = TRUE,
colnames = "both",
annotation_legend = TRUE,
fontsize = 10,
na_col = "black",
...
)
Arguments
df |
Data frame in long format with SampleID, NPX, OlinkID, Assay and columns of choice for annotations. |
check_log |
Output from check_npx on df. |
variable_row_list |
Columns in df to be annotated for rows of the heatmap. |
variable_col_list |
Columns in df to be annotated for columns of the heatmap. |
center_scale |
Logical. If data should be centered and scaled across assays (default TRUE). |
cluster_rows |
Logical. Determining if rows should be clustered (default TRUE). |
cluster_cols |
Logical. Determining if columns should be clustered (default TRUE). |
show_rownames |
Logical. Determining if row names are shown (default TRUE). |
show_colnames |
Logical. Determining if column names are shown (default TRUE). |
colnames |
Character. Determines how to label the columns. Must be 'assay', 'oid', or 'both' (default 'both'). |
annotation_legend |
Logical. Determining if legend for annotations should be shown (default TRUE). |
fontsize |
Fontsize (default 10) |
na_col |
Color of cells with NA values (default "black"). |
... |
Additional arguments used in pheatmap::pheatmap. |
Details
The values are by default centered and scaled across assays in the heatmap. Columns and rows are by default sorted by dendrogram. Unique sample names are required.
Value
An object of class ggplot, generated from the gtable
returned by pheatmap::pheatmap.
Examples
if (rlang::is_installed(pkg = c("ggplotify", "pheatmap"))) {
npx_data <- npx_data1 |>
dplyr::filter(
!stringr::str_detect(
string = .data[["SampleID"]],
pattern = "CONT"
)
)
check_log <- OlinkAnalyze::check_npx(
df = npx_data
)
# Heatmap
OlinkAnalyze::olink_heatmap_plot(
df = npx_data,
check_log = check_log
)
# Heatmap with annotation
OlinkAnalyze::olink_heatmap_plot(
df = npx_data,
check_log = check_log,
variable_row_list = c("Time", "Site")
)
# Heatmap with calls from pheatmap
OlinkAnalyze::olink_heatmap_plot(
df = npx_data,
check_log = check_log,
cutree_rows = 3L
)
}
Compute inter-quartile range (IQR) of the quantification column multiplied by a fixed value
Description
Compute inter-quartile range (IQR) of the quantification column multiplied by a fixed value
Usage
olink_iqr(df, quant_col, iqr_group, iqr_sd)
Arguments
df |
Olink dataset |
quant_col |
Character. Name of the quantification column. |
iqr_group |
Grouping variable(s) for which to compute the IQR. |
iqr_sd |
Fixed value by which to multiply the IQR. |
Value
Input dataset with two additional columns, iqr and iqr_sd
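A minimal sketch of a call, assuming olink_iqr is an internal helper (hence the ::: accessor) and that iqr_group accepts a character vector of grouping columns; the argument values shown are illustrative assumptions based on the documented signature:

```r
# Hypothetical usage: 3 * IQR of NPX per assay in the bundled example data.
# Per the Value section, the result should be the input dataset with two
# additional columns, iqr and iqr_sd.
iqr_df <- OlinkAnalyze:::olink_iqr(
  df = OlinkAnalyze::npx_data1,
  quant_col = "NPX",
  iqr_group = "OlinkID",
  iqr_sd = 3
)
```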
Function that performs a linear mixed model per protein.
Description
Fits a linear mixed effects model for every protein (by OlinkID) in every
panel, using lmerTest::lmer and stats::anova. The function handles both
factor and numerical variables and potential covariates.
Usage
olink_lmer(
df,
variable,
check_log = NULL,
outcome = "NPX",
random,
covariates = NULL,
model_formula,
return.covariates = FALSE,
verbose = TRUE
)
Arguments
df |
NPX data frame in long format with at least protein name (Assay), OlinkID, UniProt, 1-2 variables with at least 2 levels. |
variable |
Single character value or character array. Variables to test. If length > 1, the included variable names will be used in crossed analyses. Also takes ':' or '*' notation. |
check_log |
A named list returned by check_npx(). |
outcome |
Character. The dependent variable. Default: NPX. |
random |
Single character value or character array. |
covariates |
Single character value or character array. Default: NULL. Covariates to include. Takes ':' or '*' notation. Crossed analysis will not be inferred from main effects. |
model_formula |
(optional) Symbolic description of the model to be fitted in standard formula notation (e.g. "NPX~A*B"). If provided, this will override the variable and covariates arguments. |
return.covariates |
Boolean. Default: FALSE. Returns results for the covariates. Note: Adjusted p-values will be NA for the covariates. |
verbose |
Boolean. Default: TRUE. If information about removed samples, factor conversion and final model formula is to be printed to the console. |
Details
Samples that have no variable information or missing factor levels are
automatically removed from the analysis (specified in a message if
verbose = TRUE). Character columns in the input dataset are automatically
converted to factors (specified in a message if verbose = TRUE). Numerical
variables are not converted to factors. If a numerical variable is to be used
as a factor, this conversion needs to be done on the dataset before the
function call.
Crossed analysis, i.e. A*B formula notation, is inferred from the variable
argument in the following cases:
c('A','B')
c('A:B')
c('A:B', 'B') or c('A:B', 'A')
Inference is specified in a message if verbose = TRUE.
For covariates, crossed analyses need to be specified explicitly, i.e. two
main effects will not be expanded with a c('A','B') notation. Main effects
present in the variable takes precedence. The random variable only takes
main effects. The formula notation of the final model is specified in a
message if verbose = TRUE.
Output p-values are adjusted by stats::p.adjust according to the
Benjamini-Hochberg method (“fdr”). Adjusted p-values are logically evaluated
towards adjusted p-value<0.05.
Value
A "tibble" containing the results of fitting the linear mixed effects model to every protein by OlinkID, ordered by ascending p-value. Columns include:
Assay: "character" Protein symbol
OlinkID: "character" Olink specific ID
UniProt: "character" UniProt ID
Panel: "character" Name of Olink Panel
term: "character" term in model
sumsq: "numeric" sum of square
meansq: "numeric" mean of square
NumDF: "integer" numerator of degrees of freedom
DenDF: "numeric" denominator degrees of freedom
statistic: "numeric" value of the statistic
p.value: "numeric" nominal p-value
Adjusted_pval: "numeric" adjusted p-value for the test (Benjamini&Hochberg)
Threshold: "character" if adjusted p-value is significant or not (< 0.05)
Examples
if (rlang::is_installed(pkg = c("lme4", "lmerTest", "broom"))) {
#data
npx_df <- OlinkAnalyze::npx_data1 |>
dplyr::filter(
!grepl(
pattern = "control|ctrl",
x = .data[["SampleID"]],
ignore.case = TRUE
)
)
# check data
npx_df_check_log <- OlinkAnalyze::check_npx(
df = npx_df
)
# Results in model NPX ~ Time * Treatment + (1 | Subject) + (1 | Site)
lmer_results <- OlinkAnalyze::olink_lmer(
df = npx_df,
check_log = npx_df_check_log,
variable = c("Time", "Treatment"),
random = c("Subject", "Site")
)
}
Function which performs a point-range plot per protein on a linear mixed model
Description
Generates a point-range plot faceted by Assay using ggplot and
ggplot2::geom_pointrange based on a linear mixed effects model using
lmerTest::lmer and emmeans::emmeans. See olink_lmer for details of
input notation.
Usage
olink_lmer_plot(
df,
check_log = NULL,
variable,
outcome = "NPX",
random,
olinkid_list = NULL,
covariates = NULL,
x_axis_variable,
col_variable = NULL,
number_of_proteins_per_plot = 6L,
verbose = FALSE,
...
)
Arguments
df |
NPX data frame in long format with at least protein name (Assay), OlinkID, UniProt, 1-2 variables with at least 2 levels. |
check_log |
A named list returned by check_npx(). |
variable |
Single character value or character array. Variable(s) to test. If length > 1, the included variable names will be used in crossed analyses. Also takes ':' or '*' notation. |
outcome |
Character. The dependent variable. Default: NPX. |
random |
Single character value or character array. |
olinkid_list |
Character vector indicating which proteins (by OlinkID) for which to create figures. |
covariates |
Single character value or character array. Default: NULL. Covariates to include. Takes ':' or '*' notation. Crossed analysis will not be inferred from main effects. |
x_axis_variable |
Character. Which main effect to use as x-axis in the plot. |
col_variable |
Character. If provided, the interaction effect col_variable:x_axis_variable will be plotted with x_axis_variable on the x-axis and col_variable as color. |
number_of_proteins_per_plot |
Number of plots to include in the list of point-range plots. Defaults to 6 plots per figure. |
verbose |
Boolean. Default: TRUE. If information about removed samples, factor conversion and final model formula is to be printed to the console. |
... |
Color option for color ordering. |
Value
A list of objects of class "ggplot" showing point-range plot of NPX (y-axis) over x_axis_variable for each assay (facet), colored by col_variable if provided.
Examples
if (rlang::is_installed(pkg = c("lme4", "lmerTest", "broom", "emmeans"))) {
#data
npx_df <- OlinkAnalyze::npx_data1 |>
dplyr::filter(
!grepl(
pattern = "control|ctrl",
x = .data[["SampleID"]],
ignore.case = TRUE
)
)
# check data
npx_df_check_log <- OlinkAnalyze::check_npx(
df = npx_df
)
# Results in model NPX ~ Time * Treatment + (1 | Subject)
lmer_results <- OlinkAnalyze::olink_lmer(
df = npx_df,
check_log = npx_df_check_log,
variable = c("Time", "Treatment"),
random = c("Subject")
)
# List of significant proteins for the interaction effect Time:Treatment
assay_list <- lmer_results |>
dplyr::filter(
.data[["Threshold"]] == "Significant"
& .data[["term"]] == "Time:Treatment"
) |>
dplyr::distinct(.data[["OlinkID"]]) |>
dplyr::pull()
lst_pointrange_plots <- OlinkAnalyze::olink_lmer_plot(
df = npx_df,
check_log = npx_df_check_log,
variable = c("Time", "Treatment"),
random = c("Subject"),
x_axis_variable = "Time",
col_variable = "Treatment",
verbose = TRUE,
olinkid_list = assay_list,
number_of_proteins_per_plot = 10L
)
}
Function which performs a linear mixed model posthoc per protein.
Description
Similar to olink_lmer but performs a post-hoc analysis based on
a linear mixed effects model using lmerTest::lmer and
emmeans::emmeans on proteins. See olink_lmer for details
of input notation.
Usage
olink_lmer_posthoc(
df,
check_log = NULL,
olinkid_list = NULL,
variable,
outcome = "NPX",
random,
model_formula,
effect,
effect_formula,
covariates = NULL,
mean_return = FALSE,
post_hoc_padjust_method = "tukey",
verbose = TRUE
)
Arguments
df |
NPX data frame in long format with at least protein name (Assay), OlinkID, UniProt, 1-2 variables with at least 2 levels and subject identifier. |
check_log |
A named list returned by check_npx. |
olinkid_list |
Character vector of OlinkIDs on which to perform the post-hoc analysis. If not specified, all assays in df are used. |
variable |
Single character value or character array. Variable(s) to test. If length > 1, the included variable names will be used in crossed analyses. Also takes ':' or '*' notation. |
outcome |
Character. The dependent variable. Default: NPX. |
random |
Single character value or character array. |
model_formula |
(optional) Symbolic description of the model to be fitted in standard formula notation, e.g. "NPX ~ Time * Treatment + (1|Subject)". |
effect |
Term on which to perform post-hoc. Character vector. Must be subset of or identical to variable. |
effect_formula |
(optional) A character vector specifying the names of the predictors over which estimated marginal means are desired, as defined in the emmeans package. |
covariates |
Single character value or character array. Default: NULL. Covariates to include. Takes ':' or '*' notation. Crossed analysis will not be inferred from main effects. |
mean_return |
Boolean. If true, returns the mean of each factor level rather than the difference in means (default). Note that no p-value is returned for mean_return = TRUE and no adjustment is performed. |
post_hoc_padjust_method |
P-value adjustment method to use for post-hoc comparisons within an assay. Options include "tukey", "sidak", "bonferroni" and "none". |
verbose |
Boolean. Default: TRUE. Whether information about removed samples, factor conversion and the final model formula should be printed to the console. |
Details
The function handles both factor and numerical variables and/or covariates.
Differences in estimated marginal means are calculated for all pairwise
levels of a given variable. Degrees of freedom are estimated using
Satterthwaite’s approximation. The posthoc test for a numerical variable
compares the difference in means of the outcome variable (default:
NPX) for 1 standard deviation difference in the numerical variable,
e.g. mean NPX at mean(numerical variable) versus mean NPX at mean(numerical
variable) + 1*SD(numerical variable). The output tibble is arranged by
ascending Tukey adjusted p-values.
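The numerical-variable contrast described above can be sketched with a plain linear model. This is an illustration only: the variable names are made up, and stats::lm stands in for lmerTest::lmer. The estimated difference equals the fitted slope times one standard deviation of the predictor.

```r
# Illustrative sketch only: "age" is a hypothetical numerical variable.
set.seed(1)
age <- rnorm(n = 50L, mean = 50, sd = 10)
npx <- 0.05 * age + rnorm(n = 50L, sd = 0.01)
fit <- stats::lm(npx ~ age)
slope <- unname(stats::coef(fit)["age"])
# Difference in mean NPX for a 1 SD increase in the numerical variable:
delta <- slope * stats::sd(age)
```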
Value
A "tibble" containing the results of the pairwise comparisons between given variable levels for proteins specified in olinkid_list (or full df). Columns include:
Assay: "character" Protein symbol
OlinkID: "character" Olink specific ID
UniProt: "character" UniProt ID
Panel: "character" Name of Olink Panel
term: "character" term in model
contrast: "character" the groups that were compared
estimate: "numeric" difference in mean NPX between groups
conf.low: "numeric" confidence interval for the mean (lower end)
conf.high: "numeric" confidence interval for the mean (upper end)
Adjusted_pval: "numeric" adjusted p-value for the test
Threshold: "character" if adjusted p-value is significant or not (< 0.05)
Examples
if (rlang::is_installed(pkg = c("lme4", "lmerTest", "emmeans", "broom"))) {
#data
npx_df <- OlinkAnalyze::npx_data1 |>
dplyr::filter(
!grepl(
pattern = "control|ctrl",
x = .data[["SampleID"]],
ignore.case = TRUE
)
)
# check data
npx_df_check_log <- OlinkAnalyze::check_npx(
df = npx_df
)
# Results in model NPX ~ Time * Treatment + (1 | Subject)
lmer_results <- OlinkAnalyze::olink_lmer(
df = npx_df,
check_log = npx_df_check_log,
variable = c("Time", "Treatment"),
random = c("Subject")
)
# List of significant proteins for the interaction effect Time:Treatment
assay_list <- lmer_results |>
dplyr::filter(
.data[["Threshold"]] == "Significant"
& .data[["term"]] == "Time:Treatment"
) |>
dplyr::distinct(.data[["OlinkID"]]) |>
dplyr::pull()
# Run lmer posthoc on significant proteins
results_lmer_posthoc <- OlinkAnalyze::olink_lmer_posthoc(
df = npx_df,
check_log = npx_df_check_log,
olinkid_list = assay_list,
variable = c("Time", "Treatment"),
effect = "Time:Treatment",
random = "Subject",
verbose = TRUE
)
# Estimate treated vs untreated at each timepoint
results_lmer_posthoc <- OlinkAnalyze::olink_lmer_posthoc(
df = npx_df,
check_log = npx_df_check_log,
olinkid_list = assay_list,
model_formula = "NPX~Time*Treatment+(1|Subject)",
effect_formula = "pairwise~Treatment|Time",
verbose = TRUE
)
}
Calculate LOD using Negative Controls or Fixed LOD
Description
Calculate LOD using Negative Controls or Fixed LOD
Usage
olink_lod(data, check_log = NULL, lod_file_path = NULL, lod_method = "NCLOD")
Arguments
data |
NPX data frame. |
check_log |
A named list returned by check_npx. |
lod_file_path |
Location of the LOD file from Olink. Only needed if lod_method = "FixedLOD" or "Both". Default NULL. |
lod_method |
Method for calculating LOD using either fixed LOD ("FixedLOD"), negative controls ("NCLOD"), or both ("Both"). Default "NCLOD". |
Value
A dataframe with 2 additional columns, LOD and PCNormalizedLOD if
lod_method is FixedLOD or NCLOD. When Normalization = "Plate Control",
LOD and PCNormalizedLOD are identical.
If lod_method is "Both", 4 additional columns will be added:
NCLOD - LOD calculated from negative controls and normalized based on normalization column
NCPCNormalizedLOD - PC Normalized LOD calculated from negative controls
FixedLOD - LOD calculated from fixed LOD file and normalized based on normalization column
FixedPCNormalizedLOD - PC Normalized LOD calculated from fixed LOD file
Examples
\donttest{
try(
{
# This will fail if the files do not exist.
# Import NPX data
npx_data <- read_npx(filename = "path/to/npx_file")
# Check NPX data
check_log <- check_npx(df = npx_data)
# Clean NPX data
npx_data_clean <- clean_npx(
df = npx_data,
check_log = check_log
)
# Re-check NPX data
check_log_clean <- check_npx(df = npx_data_clean)
# Estimate LOD from negative controls
npx_data_lod_nc <- olink_lod(
data = npx_data_clean,
check_log = check_log_clean,
lod_method = "NCLOD"
)
# Estimate LOD from fixed LOD
## Locate the fixed LOD file
lod_file_path <- "path/to/lod_file"
npx_data_lod_Fixed <- olink_lod(
data = npx_data,
check_log = check_log_clean,
lod_file_path = lod_file_path,
lod_method = "FixedLOD"
)
# Estimate LOD from both negative controls and fixed LOD
npx_data_lod_both <- olink_lod(
data = npx_data,
check_log = check_log_clean,
lod_file_path = lod_file_path,
lod_method = "Both"
)
}
)
}
Compute median of quantified value
Description
Compute median of quantified value
Usage
olink_median(df, quant_col, median_group)
Arguments
df |
Olink dataset |
quant_col |
Character vector of name of quantification column |
median_group |
Grouping variable(s) within which to compute the median. |
Value
Input dataset with one additional column, median.
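A minimal sketch of the grouped-median idea in base R, assuming the behavior described above (this is not the internal implementation):

```r
# Toy data: two proteins, three samples each.
df <- data.frame(
  OlinkID = rep(c("OID00001", "OID00002"), each = 3L),
  NPX     = c(1, 2, 9, 4, 5, 6)
)
# ave() computes the median within each group and recycles it per row,
# giving the input dataset one additional column, median.
df$median <- ave(df$NPX, df$OlinkID, FUN = stats::median)
# OID00001 rows get 2; OID00002 rows get 5.
```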
Compute outliers based on median +/- iqr_sd * IQR
Description
Compute outliers based on median +/- iqr_sd * IQR
Usage
olink_median_iqr_outlier(df, quant_col, group, iqr_sd)
Arguments
df |
Olink dataset |
quant_col |
Character vector of name of quantification column |
group |
Grouping variable(s) within which to compute the median and IQR. |
iqr_sd |
Fixed value to multiply IQR with |
Value
Boolean vector, with length equal to the number of input rows, indicating whether each row is an outlier.
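The median +/- iqr_sd * IQR rule can be sketched in base R as follows (an illustration of the rule as described, not the internal code):

```r
# One group of quantified values; 40 sits far from the bulk of the data.
x <- c(10, 11, 12, 13, 14, 40)
iqr_sd <- 3
med <- stats::median(x)  # 12.5
iqr <- stats::IQR(x)     # 2.5, so the bounds are [5, 20]
# TRUE marks values outside median +/- iqr_sd * IQR.
is_outlier <- x < med - iqr_sd * iqr | x > med + iqr_sd * iqr
```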
Check datasets and reference_medians for Olink identifiers not shared across datasets.
Description
Check datasets and reference_medians for Olink identifiers not shared across datasets.
Usage
olink_norm_input_assay_overlap(lst_df, reference_medians, lst_cols, norm_mode)
Arguments
lst_df |
Named list of datasets to be normalized. |
reference_medians |
Dataset with columns "OlinkID" and "Reference_NPX". Used for reference median normalization. |
lst_cols |
Named list of vectors with the required column names for each dataset in lst_df. |
norm_mode |
Character string indicating the type of normalization to be performed. Expecting one of bridge, subset, ref_median or norm_cross_product. |
Value
A named list containing lst_df and reference_medians with assays shared across all datasets.
Author(s)
Klev Diamanti
Check inputs of olink_normalization function.
Description
This function is a wrapper of multiple help functions which check the inputs
of the olink_normalization function.
Usage
olink_norm_input_check(
df1,
df1_check_log = NULL,
df2,
df2_check_log = NULL,
overlapping_samples_df1,
overlapping_samples_df2,
df1_project_nr,
df2_project_nr,
reference_project,
reference_medians
)
Arguments
df1 |
First dataset to be used in normalization (required). |
df1_check_log |
A named list returned by check_npx for df1. |
df2 |
Second dataset to be used in normalization. |
df2_check_log |
A named list returned by check_npx for df2. |
overlapping_samples_df1 |
Samples to be used for adjustment factor calculation in df1 (required). |
overlapping_samples_df2 |
Samples to be used for adjustment factor calculation in df2. |
df1_project_nr |
Project name of first dataset (df1). |
df2_project_nr |
Project name of second dataset (df2). |
reference_project |
Project name of reference_project. Should be one of df1_project_nr or df2_project_nr. Indicates the project to which the other project is adjusted. |
reference_medians |
Dataset with columns "OlinkID" and "Reference_NPX". Used for reference median normalization. |
Details
The following checks are performed:
- Determines the normalization to be performed by intersecting the inputs with the internal global variable olink_norm_mode_combos. Returns the type of normalization to be performed from olink_norm_modes. Message with the normalization type. Error message if input is invalid.
- Checks if all inputs are of the expected class:
  - df1, df2 and reference_medians: tibble or R6 ArrowObject
  - overlapping_samples_df1, overlapping_samples_df2, df1_project_nr, df2_project_nr and reference_project: character vector
  Also checks the validity of the names of the projects and the reference project. Error if invalid input classes are detected.
- olink_norm_input_check_df_cols: Detects the column names of the input datasets df1 and df2 to allow for alternative names. Returns a named list of column names to use downstream. Warning if the Normalization column is missing from all datasets. Warning if LOD is missing or if there are multiple LOD columns. Error if required columns are missing. Error if the input datasets do not all have, or all lack, the Normalization column. Error if the input datasets have been quantified with different methods.
- Checks the validity of the dataset containing reference_medians. Error if required columns are missing based on olink_norm_ref_median_cols. Error if columns are not of the correct class based on olink_norm_ref_median_cols. Error if there are duplicate assay identifiers.
- olink_norm_input_check_samples: Checks the character vectors of reference sample identifiers for being present in df1 and/or df2, and for duplicate identifiers.
- olink_norm_input_clean_assays: Returns a named list with the updated df1, df2 and/or reference_medians. Removes assays that are not of the format OID followed by 5 digits. Removes assays that are marked with Normalization = EXCLUDED.
- olink_norm_input_assay_overlap: Returns a named list with the updated df1, df2 and/or reference_medians. Removes assays not shared between df1 and df2, or between df1 and reference_medians.
- Checks if all assays in df1 and df2 have been originally normalized with the same method, "Intensity" or "Plate control". A warning is thrown if not.
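The assay-identifier format check ("OID followed by 5 digits") can be illustrated with a simple pattern match; the regular expression below is an assumption for illustration, not the package's internal code:

```r
# Hypothetical assay identifiers; only well-formed OIDs pass.
ids <- c("OID01216", "OID1216", "ABC12345", "OID00042")
valid <- grepl(pattern = "^OID\\d{5}$", x = ids)
# "OID1216" (only 4 digits) and "ABC12345" fail the format check.
```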
Value
Named list of updated inputs to use for normalization:
- df1: dataset df1.
- df2: NULL if reference median normalization, or dataset df2.
- overlapping_samples_df1: character vector of reference samples from df1.
- overlapping_samples_df2: NULL if reference median normalization, or character vector of reference samples from df2.
- df1_project_nr: name of df1 project.
- df2_project_nr: NULL if reference median normalization, or name of df2 project.
- reference_project: NULL if reference median normalization, or name of reference project.
- reference_medians: NULL if bridge or subset normalization, or dataset with reference_medians.
- df1_cols: column names of df1 to use downstream.
- df2_cols: NULL if reference median normalization, or column names of df2 to use downstream.
- norm_mode: one of bridge, subset, ref_median and norm_cross_product, indicating the normalization to be performed.
Author(s)
Klev Diamanti
Check columns of a list of datasets to be normalized.
Description
This function takes as input a named list of datasets and checks if their columns allow the normalization to be performed. The input may contain "tibble", "ArrowTable" or a mixture of them.
Usage
olink_norm_input_check_df_cols(lst_df, lst_cols)
Arguments
lst_df |
Named list of datasets to be normalized. |
lst_cols |
Named list of column names for each dataset in lst_df, extracted from the check logs returned by check_npx. |
Value
NULL unless there is an error.
Author(s)
Klev Diamanti
Examples
# One dataset
lst_df_v1 <- list(
"p1" = npx_data1
) |>
lapply(function(l_df) {
l_df |>
dplyr::select(
-dplyr::any_of(c("Normalization"))
)
})
lst_df_v1_check <- lst_df_v1 |>
lapply(function(.x) {
check_npx(df = .x) |>
suppressWarnings() |>
suppressMessages() |>
(\(.) .$col_names)()
})
OlinkAnalyze:::olink_norm_input_check_df_cols(
lst_df = lst_df_v1,
lst_cols = lst_df_v1_check
)
# Two datasets
lst_df_v2 <- list(
"p1" = npx_data1,
"p2" = npx_data2
) |>
lapply(function(l_df) {
l_df |>
dplyr::select(
-dplyr::any_of(c("Normalization"))
)
})
lst_df_v2_check <- lst_df_v2 |>
lapply(function(.x) {
check_npx(df = .x) |>
suppressWarnings() |>
suppressMessages() |>
(\(.) .$col_names)()
})
OlinkAnalyze:::olink_norm_input_check_df_cols(
lst_df = lst_df_v2,
lst_cols = lst_df_v2_check
)
# Multiple datasets
lst_df_v3 <- list(
"p1" = npx_data1,
"p2" = npx_data2,
"p3" = npx_data1,
"p4" = npx_data2
) |>
lapply(function(l_df) {
l_df |>
dplyr::select(
-dplyr::any_of(c("Normalization"))
)
})
lst_df_v3_check <- lst_df_v3 |>
lapply(function(.x) {
check_npx(df = .x) |>
suppressWarnings() |>
suppressMessages() |>
(\(.) .$col_names)()
})
OlinkAnalyze:::olink_norm_input_check_df_cols(
lst_df = lst_df_v3,
lst_cols = lst_df_v3_check
)
Check reference samples to be used for normalization
Description
This function takes as input two named lists of character vectors with matching names and checks the validity of the reference samples. If only one set of df samples is provided, all checks are skipped, as reference median normalization is to be performed.
Usage
olink_norm_input_check_samples(
lst_df_samples,
lst_ref_samples,
lst_dup_samples,
norm_mode
)
Arguments
lst_df_samples |
Named list of all sample identifiers from datasets to be normalized. |
lst_ref_samples |
Named list of reference sample identifiers to be used for normalization. |
lst_dup_samples |
Named list of duplicate sample identifiers identified
by check_npx. |
norm_mode |
Character string indicating the type of normalization to be performed. Expecting one of bridge, subset, ref_median or norm_cross_product. |
Value
NULL if no warning or error.
Author(s)
Klev Diamanti
Examples
# Reference median normalization
OlinkAnalyze:::olink_norm_input_check_samples(
lst_df_samples = list(
"p1" = unique(npx_data1$SampleID)
),
lst_ref_samples = list(
"p1" = npx_data1 |>
dplyr::filter(
!grepl(pattern = "CONTROL_SAMPLE",
x = .data[["SampleID"]],
fixed = TRUE)
) |>
dplyr::pull(.data[["SampleID"]]) |>
unique() |>
sort() |>
head(n = 6L)
),
lst_dup_samples = list(
"p1" = character(0L)
),
norm_mode = "ref_median"
)
# Bridge normalization
ref_samples_bridge <- intersect(x = npx_data1$SampleID,
y = npx_data2$SampleID) |>
(\(x) x[!grepl(pattern = "CONTROL_SAMPLE", x = x, fixed = TRUE)])()
OlinkAnalyze:::olink_norm_input_check_samples(
lst_df_samples = list(
"p1" = unique(npx_data1$SampleID),
"p2" = unique(npx_data2$SampleID)
),
lst_ref_samples = list(
"p1" = ref_samples_bridge,
"p2" = ref_samples_bridge
),
lst_dup_samples = list(
"p1" = character(0L),
"p2" = character(0L)
),
norm_mode = "bridge"
)
# Subset normalization
ref_samples_subset_1 <- npx_data1 |>
dplyr::filter(
!grepl(pattern = "CONTROL_SAMPLE",
x = .data[["SampleID"]],
fixed = TRUE)
& .data[["QC_Warning"]] == "Pass"
) |>
dplyr::pull(
.data[["SampleID"]]
) |>
unique()
ref_samples_subset_2 <- npx_data2 |>
dplyr::filter(
!grepl(pattern = "CONTROL_SAMPLE",
x = .data[["SampleID"]],
fixed = TRUE)
& .data[["QC_Warning"]] == "Pass"
) |>
dplyr::pull(
.data[["SampleID"]]
) |>
unique()
OlinkAnalyze:::olink_norm_input_check_samples(
lst_df_samples = list(
"p1" = unique(npx_data1$SampleID),
"p2" = unique(npx_data2$SampleID)
),
lst_ref_samples = list(
"p1" = ref_samples_subset_1,
"p2" = ref_samples_subset_2
),
lst_dup_samples = list(
"p1" = character(0L),
"p2" = character(0L)
),
norm_mode = "subset"
)
Check classes of input in olink_normalization function
Description
Check if df1, df2 and/or reference_medians are tibble or ArrowDataset datasets; if overlapping_samples_df1 and/or overlapping_samples_df2 are character vectors; and if df1_project_nr, df2_project_nr and/or reference_project are scalar character vectors.
Usage
olink_norm_input_class(
df1,
df2,
overlapping_samples_df1,
overlapping_samples_df2,
df1_project_nr,
df2_project_nr,
reference_project,
reference_medians,
norm_mode
)
Arguments
df1 |
First dataset to be used in normalization (required). |
df2 |
Second dataset to be used in normalization. |
overlapping_samples_df1 |
Samples to be used for adjustment factor calculation in df1 (required). |
overlapping_samples_df2 |
Samples to be used for adjustment factor calculation in df2. |
df1_project_nr |
Project name of first dataset (df1). |
df2_project_nr |
Project name of second dataset (df2). |
reference_project |
Project name of reference_project. Should be one of df1_project_nr or df2_project_nr. Indicates the project to which the other project is adjusted. |
reference_medians |
Dataset with columns "OlinkID" and "Reference_NPX". Used for reference median normalization. |
norm_mode |
Scalar character from olink_norm_modes with the normalization to be performed. Output from olink_norm_input_validate. |
Value
NULL unless there is an error.
Author(s)
Klev Diamanti
Check datasets and reference_medians for unexpected Olink identifiers or excluded assays
Description
Check datasets and reference_medians for unexpected Olink identifiers or excluded assays
Usage
olink_norm_input_clean_assays(lst_df, reference_medians, lst_cols, norm_mode)
Arguments
lst_df |
Named list of datasets to be normalized. |
reference_medians |
Dataset with columns "OlinkID" and "Reference_NPX". Used for reference median normalization. |
lst_cols |
Named list of vectors with the required column names for each dataset in lst_df. |
norm_mode |
Character string indicating the type of normalization to be performed. Expecting one of bridge, subset, ref_median or norm_cross_product. |
Value
A named list containing lst_df and reference_medians stripped of unexpected Olink identifiers and excluded assays.
Author(s)
Klev Diamanti
Check if bridge or cross-platform normalization
Description
A function to check whether we are to perform simple bridge normalization, or cross-platform (Olink Explore 3072 - Olink Explore HT/Olink Reveal) normalization.
The function uses the internal dataset eHT_e3072_mapping to determine the product source of each dataset. If both datasets originate from the same Olink product, then it will return bridge. If the datasets to be normalized originate from Olink Explore HT and Olink Explore 3072 or Olink Reveal and Olink Explore 3072, it will return norm_cross_product. In any other case an error is thrown.
Usage
olink_norm_input_cross_product(
lst_df,
lst_cols,
reference_project,
product_ids,
ref_ids
)
Arguments
lst_df |
Named list of datasets to be normalized. |
lst_cols |
Named list of vectors with the required column names for each dataset in lst_df. |
reference_project |
Project name of reference_project. Should be one of df1_project_nr or df2_project_nr. Indicates the project to which the other project is adjusted. |
product_ids |
Named character vector with the Olink product name that each input dataset matches to. |
ref_ids |
Named character vector with df1_project_nr and df2_project_nr marked as "ref" and "not_ref". |
Value
Character string indicating the type of normalization to be performed, one of bridge, subset, ref_median or norm_cross_product, and the updated list of datasets in case of cross-platform normalization.
Author(s)
Klev Diamanti
Check that assays are normalized with the same method across datasets.
Description
Check that assays are normalized with the same method across datasets.
Usage
olink_norm_input_norm_method(lst_df, lst_cols)
Arguments
lst_df |
Named list of datasets to be normalized. |
lst_cols |
Named list of vectors with the required column names for each dataset in lst_df. |
Value
NULL if all assays are normalized with the same approach.
Author(s)
Klev Diamanti Kathleen Nevola
Check datasets of reference_medians
Description
Check datasets of reference_medians
Usage
olink_norm_input_ref_medians(reference_medians)
Arguments
reference_medians |
Dataset with columns "OlinkID" and "Reference_NPX". Used for reference median normalization. |
Value
NULL unless there is an error.
Author(s)
Klev Diamanti
Validate inputs of normalization function
Description
This function takes as input some of the inputs of the Olink normalization function and checks the validity of the input.
Usage
olink_norm_input_validate(
df1,
df2,
overlapping_samples_df1,
overlapping_samples_df2,
reference_medians
)
Arguments
df1 |
First dataset to be used in normalization (required). |
df2 |
Second dataset to be used in normalization. |
overlapping_samples_df1 |
Samples to be used for adjustment factor calculation in df1 (required). |
overlapping_samples_df2 |
Samples to be used for adjustment factor calculation in df2. |
reference_medians |
Dataset with columns "OlinkID" and "Reference_NPX". Used for reference median normalization. |
Details
Depending on the input the function will return:
-
Error: if the required components are lacking from the input or if the normalization cannot be performed.
-
Warning: if the normalization can be determined but extra inputs are provided. This will be followed by a message and the type of normalization to be performed.
-
Message: Information about the type of normalization to be performed.
Note that inputs are passed directly from the main
olink_normalization function.
Value
Scalar character from olink_norm_modes if normalization can be determined from the input, otherwise see details.
Author(s)
Klev Diamanti
Identify names of product for each project
Description
Identify names of product for each project
Usage
olink_norm_product_id(lst_df, lst_cols)
Arguments
lst_df |
Named list of datasets to be normalized. |
lst_cols |
Named list of vectors with the required column names for each dataset in lst_df. |
Value
Named character vector with the Olink product name that each input dataset matches to.
Author(s)
Kathy Nevola Klev Diamanti
Identify reference project.
Description
Identify reference project.
Usage
olink_norm_reference_id(lst_product, reference_project)
Arguments
lst_product |
Named character vector with the Olink product name that each input dataset matches to. |
reference_project |
Project name of reference_project. Should be one of df1_project_nr or df2_project_nr. Indicates the project to which the other project is adjusted. |
Value
Named character vector with df1_project_nr and df2_project_nr marked as "ref" and "not_ref".
Author(s)
Kathy Nevola Klev Diamanti
Normalize two Olink datasets
Description
Normalizes two Olink datasets to each other, or one Olink dataset to a reference set of median values.
Usage
olink_normalization(
df1,
df2 = NULL,
overlapping_samples_df1,
overlapping_samples_df2 = NULL,
df1_project_nr = "P1",
df2_project_nr = "P2",
reference_project = "P1",
reference_medians = NULL,
format = FALSE,
df1_check_log = NULL,
df2_check_log = NULL
)
Arguments
df1 |
First dataset to be used for normalization (required). |
df2 |
Second dataset to be used for normalization. Required for bridge and subset normalization. |
overlapping_samples_df1 |
Character vector of samples to be used for the calculation of adjustment factors in df1 (required). |
overlapping_samples_df2 |
Character vector of samples to be used for the calculation of adjustment factors in df2. Required for subset normalization. |
df1_project_nr |
Project name of first dataset (required). |
df2_project_nr |
Project name of second dataset. Required for bridge and subset normalization. |
reference_project |
Project to be used as reference project. Should be one of df1_project_nr or df2_project_nr. Indicates the project to which the other project is adjusted. |
reference_medians |
Dataset with columns "OlinkID" and "Reference_NPX". Required for reference median normalization. |
format |
Boolean that controls whether the normalized dataset will be formatted for input to downstream analysis. |
df1_check_log |
A named list returned by check_npx for df1. |
df2_check_log |
A named list returned by check_npx for df2. |
Details
The function handles four different types of normalization:
- Bridge normalization: One of the datasets is adjusted to another using overlapping samples (bridge samples). Overlapping samples need to have the same identifiers in both datasets. Normalization is performed using the median of the pair-wise differences between the bridge samples in the two datasets. The two datasets are provided as df1 and df2, and the one being adjusted to is specified in the input reference_project; overlapping samples are specified in overlapping_samples_df1. Only overlapping_samples_df1 should be provided regardless of the dataset used as reference_project.
- Subset normalization: One of the datasets is adjusted to another using a subset of samples from each. Normalization is performed using the differences of the medians between the subsets from the two datasets. Both overlapping_samples_df1 and overlapping_samples_df2 need to be provided, and sample identifiers do not need to be the same. A special case of subset normalization occurs when all samples (except control samples and samples with QC warnings) from each dataset are used for normalization; this special case is called intensity normalization. In intensity normalization all unique sample identifiers from df1 are provided as input in overlapping_samples_df1 and all unique sample identifiers from df2 are provided as input in overlapping_samples_df2.
- Reference median normalization: One of the datasets (df1) is adjusted to a predefined set of adjustment factors. This is effectively subset normalization, but using differences of medians to pre-recorded median values. df1, overlapping_samples_df1, df1_project_nr and reference_medians need to be specified. Dataset df1 is normalized using the differences in median between the overlapping samples and the reference medians.
- Cross-product normalization: One of the datasets is adjusted to another using the median of pair-wise differences of overlapping samples (bridge samples), or quantile smoothing using overlapping samples as reference to adjust the distributions. Overlapping samples need to have the same identifiers in both datasets. The two datasets are provided as df1 and df2, and the one being adjusted to is specified in the input reference_project. Note that in cross-product normalization the reference project is predefined, and in case the argument reference_project does not match the expected reference project an error will be returned. Overlapping samples are specified in overlapping_samples_df1. Only overlapping_samples_df1 should be provided regardless of the dataset used as reference_project. This functionality does not modify the column with the original quantification values (e.g. NPX); instead it normalizes them with 2 different approaches in the columns "MedianCenteredNPX" and "QSNormalizedNPX", and provides a recommendation in "BridgingRecommendation" about which of the two columns is to be used.
The output dataset is df1 if reference median normalization, or df2
appended to df1 if bridge, subset or cross-product normalization. The
output dataset contains all original columns from the original dataset(s),
and the columns:
- "Project" and "Adj_factor" in case of reference median, bridge and subset normalization. The former marks the project of origin based on df1_project_nr and df2_project_nr, and the latter the adjustment factor that was applied to the non-reference dataset.
- "Project", "OlinkID_E3072", "MedianCenteredNPX", "QSNormalizedNPX" and "BridgingRecommendation" in case of cross-product normalization. These columns correspond to the project of origin based on df1_project_nr and df2_project_nr, the assay identifier in the non-reference project, the bridge-normalized quantification value, the quantile smoothing-normalized quantification value, and the recommendation about which of the two normalized values is more suitable for downstream analysis.
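For intuition, the bridge adjustment factor described above (the median of pair-wise differences between bridge samples) can be sketched in base R; the sample identifiers and NPX values below are made up for illustration:

```r
# NPX for three bridge samples in the reference and non-reference project.
ref    <- c(S1 = 5.0, S2 = 6.0, S3 = 7.0)
nonref <- c(S1 = 5.4, S2 = 6.5, S3 = 7.6)
# Median of the pair-wise differences, applied to the non-reference project.
adj_factor <- unname(stats::median(ref - nonref))  # -0.5
nonref_adjusted <- nonref + adj_factor
```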
Value
Tibble or ArrowObject with the normalized dataset.
Examples
# prepare datasets
npx_df1 <- npx_data1 |>
dplyr::mutate(
Normalization = "Intensity"
)
npx_df2 <- npx_data2 |>
dplyr::mutate(
Normalization = "Intensity"
)
# check datasets
npx_df1_check <- check_npx(df = npx_df1)
npx_df2_check <- check_npx(df = npx_df2)
# bridge normalization
# overlapping samples - exclude control samples
overlap_samples <- intersect(x = npx_df1$SampleID,
y = npx_df2$SampleID) |>
(\(x) x[!grepl("^CONTROL_SAMPLE", x)])()
# normalize
olink_normalization(
df1 = npx_df1,
df2 = npx_df2,
overlapping_samples_df1 = overlap_samples,
df1_project_nr = "P1",
df2_project_nr = "P2",
reference_project = "P1",
df1_check_log = npx_df1_check,
df2_check_log = npx_df2_check
)
# subset normalization
# find a suitable subset of samples from each dataset:
# exclude control samples
# exclude samples that do not pass QC
df1_samples <- npx_df1 |>
dplyr::group_by(
dplyr::pick(
dplyr::all_of("SampleID")
)
)|>
dplyr::filter(
all(.data[["QC_Warning"]] == 'Pass')
) |>
dplyr::ungroup() |>
dplyr::filter(
!grepl(pattern = "^CONTROL_SAMPLE", x = .data[["SampleID"]])
) |>
dplyr::pull(
.data[["SampleID"]]
) |>
unique()
df2_samples <- npx_df2 |>
dplyr::group_by(
dplyr::pick(
dplyr::all_of("SampleID")
)
)|>
dplyr::filter(
all(.data[["QC_Warning"]] == 'Pass')
) |>
dplyr::ungroup() |>
dplyr::filter(
!grepl(pattern = "^CONTROL_SAMPLE", x = .data[["SampleID"]])
) |>
dplyr::pull(
.data[["SampleID"]]
) |>
unique()
# select a subset of samples from each set from above
df1_subset <- sample(x = df1_samples, size = 16L)
df2_subset <- sample(x = df2_samples, size = 20L)
# normalize
olink_normalization(
df1 = npx_df1,
df2 = npx_df2,
overlapping_samples_df1 = df1_subset,
overlapping_samples_df2 = df2_subset,
df1_project_nr = "P1",
df2_project_nr = "P2",
reference_project = "P1",
df1_check_log = npx_df1_check,
df2_check_log = npx_df2_check
)
# special case of subset normalization using all samples
olink_normalization(
df1 = npx_df1,
df2 = npx_df2,
overlapping_samples_df1 = df1_samples,
overlapping_samples_df2 = df2_samples,
df1_project_nr = "P1",
df2_project_nr = "P2",
reference_project = "P1",
df1_check_log = npx_df1_check,
df2_check_log = npx_df2_check
)
# reference median normalization
# For the sake of this example, draw random reference medians between -1 and 1
ref_med_df <- npx_data1 |>
dplyr::select(
dplyr::all_of(
c("OlinkID")
)
) |>
dplyr::distinct() |>
dplyr::mutate(
Reference_NPX = runif(n = dplyr::n(),
min = -1,
max = 1)
)
# normalize
olink_normalization(
df1 = npx_df1,
overlapping_samples_df1 = df1_subset,
reference_medians = ref_med_df,
df1_check_log = npx_df1_check
)
# cross-product normalization
# get reference samples
overlap_samples_product <- intersect(
x = unique(OlinkAnalyze:::data_ht_small$SampleID),
y = unique(OlinkAnalyze:::data_3k_small$SampleID)
) |>
(\(x) x[!grepl("CONTROL", x)])()
# check datasets
npx_ht_check <- check_npx(df = OlinkAnalyze:::data_ht_small)
npx_3k_check <- check_npx(df = OlinkAnalyze:::data_3k_small)
# normalize
olink_normalization(
df1 = OlinkAnalyze:::data_ht_small,
df2 = OlinkAnalyze:::data_3k_small,
overlapping_samples_df1 = overlap_samples_product,
df1_project_nr = "proj_ht",
df2_project_nr = "proj_3k",
reference_project = "proj_ht",
format = FALSE,
df1_check_log = npx_ht_check,
df2_check_log = npx_3k_check
)
Bridge normalization of all proteins between two NPX projects.
Description
Normalizes two NPX projects (data frames) using shared samples.
This function is a wrapper of olink_normalization.
Usage
olink_normalization_bridge(
project_1_df,
project_2_df,
bridge_samples,
project_1_name = "P1",
project_2_name = "P2",
project_ref_name = "P1",
format = FALSE,
project_1_check_log = NULL,
project_2_check_log = NULL
)
Arguments
project_1_df |
Data frame of the first project (required). |
project_2_df |
Data frame of the second project (required). |
bridge_samples |
Named list of 2 arrays containing SampleID of shared samples to be used for the calculation of adjustment factor. The names of the two arrays should be DF1 and DF2 corresponding to projects 1 and 2, respectively. Arrays should be of equal length and index of each entry should correspond to the same sample. (required) |
project_1_name |
Name of the first project (default: P1). |
project_2_name |
Name of the second project (default: P2). |
project_ref_name |
Name of the project to be used as reference set. Needs to be one of the project_1_name or project_2_name. It marks the project to which the other project will be adjusted to (default: P1). |
format |
Boolean that controls whether the normalized dataset will be formatted for input to downstream analysis. |
project_1_check_log |
A named list returned by check_npx. |
project_2_check_log |
A named list returned by check_npx. |
Details
In bridging normalization one of the projects is adjusted to another using shared samples (bridge samples). It is not necessary for the shared samples to be named the same in each project. Adjustment between the two projects is made using the median of the paired differences between the shared samples. The two data frames are inputs project_1_df and project_2_df, the one being adjusted to is specified in the input project_ref_name and the shared samples are specified in bridge_samples.
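The adjustment described above can be sketched in base R. This is a toy illustration of the median-of-paired-differences idea, not the package's internal implementation; the vectors npx_p1 and npx_p2 are hypothetical NPX values of the same bridge samples measured in each project:

```r
# Hypothetical NPX values of one assay for the same 5 bridge samples
# measured in the reference project (P1) and the project to adjust (P2)
npx_p1 <- c(5.1, 6.3, 4.8, 7.0, 5.5)
npx_p2 <- c(4.6, 5.9, 4.1, 6.4, 5.0)

# Adjustment factor: median of the paired differences between bridge samples
adj_factor <- median(npx_p1 - npx_p2)

# Shift all NPX values of P2 for this assay towards the reference project
npx_p2_adjusted <- npx_p2 + adj_factor
```

In the package this computation is carried out per assay, so each OlinkID receives its own adjustment factor.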
Value
A "tibble" of NPX data in long format containing normalized NPX values, including adjustment factors and name of project.
Examples
# prepare datasets
npx_df1 <- npx_data1 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::select(
-dplyr::all_of("Project")
) |>
dplyr::mutate(
Normalization = "Intensity"
)
npx_df2 <- npx_data2 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::select(
-dplyr::all_of("Project")
) |>
dplyr::mutate(
Normalization = "Intensity"
)
# Find overlapping samples, but exclude Olink control
overlap_samples <- dplyr::intersect(x = unique(npx_df1[["SampleID"]]),
y = unique(npx_df2[["SampleID"]]))
overlap_samples_list <- list("DF1" = overlap_samples,
"DF2" = overlap_samples)
# check npx
df1_check_log <- OlinkAnalyze::check_npx(df = npx_df1)
df2_check_log <- OlinkAnalyze::check_npx(df = npx_df2)
# Normalize
OlinkAnalyze::olink_normalization_bridge(
project_1_df = npx_df1,
project_2_df = npx_df2,
bridge_samples = overlap_samples_list,
project_1_name = "P1",
project_2_name = "P2",
project_ref_name = "P1",
project_1_check_log = df1_check_log,
project_2_check_log = df2_check_log
)
Identify if assays shared between Olink Explore 3072 and Olink Explore HT can be bridged
Description
The function uses a dataset from Olink Explore 3072 and a dataset from Olink Explore HT, and examines if the matched assays between the two products can be normalized to each other. The input datasets should be exported from Olink software and should not be altered prior to importing them to this function.
Usage
olink_normalization_bridgeable(lst_df, ref_cols, not_ref_cols, seed = 1)
Arguments
lst_df |
A named list of the 2 input datasets. First element should be the reference dataset from Olink Explore HT and the second element should originate from Olink Explore 3072. |
ref_cols |
A named list with the column names to use. Exported from olink_norm_input_check. |
not_ref_cols |
A named list with the column names from the non-reference dataset. Exported from olink_norm_input_check. |
seed |
Integer random seed (default: seed = 1). |
Details
All processes below assume that the first element from lst_df is the
reference dataset (e.g. Olink Explore HT), and the other element of the list
is the non-reference dataset (e.g. Olink Explore 3072). The input datasets
have to be pre-processed by olink_norm_input_check
which will take care of mapping of assay identifiers and various checks.
Also, the input datasets should exclusively contain datapoints from bridge
samples. When this function is called from the function
olink_normalization, then the list is created seamlessly in the
background, and the datasets have been already processed by
olink_norm_input_check.
The input ref_cols is a named list masking column names of the
reference dataset. This list is generated automatically from
olink_norm_input_check when it is called from
olink_normalization. In addition,
olink_normalization also uses
norm_internal_rename_cols to rename the columns of the
non-reference dataset according to the ones of the reference dataset, hence
all column names should match.
Value
A "tibble" in long format with the following columns:
OlinkID: Underscore-separated Olink identifiers of matching assays between Olink Explore HT and Olink Explore 3072.
BridgingRecommendation: A character vector indicating whether the matching assays are considered as bridgeable or not, and the recommended type of normalization to perform.
Author(s)
Amrita Kar Marianne Sandin Danai G. Topouza Klev Diamanti
Examples
# check_npx
data_ht_small_check <- OlinkAnalyze::check_npx(
df = OlinkAnalyze:::data_ht_small
)
data_3k_small_check <- OlinkAnalyze::check_npx(
df = OlinkAnalyze:::data_3k_small
)
# check input datasets
data_explore_check <- OlinkAnalyze:::olink_norm_input_check(
df1 = OlinkAnalyze:::data_3k_small,
df1_check_log = data_3k_small_check,
df2 = OlinkAnalyze:::data_ht_small,
df2_check_log = data_ht_small_check,
overlapping_samples_df1 = intersect(
x = unique(OlinkAnalyze:::data_3k_small$SampleID),
y = unique(OlinkAnalyze:::data_ht_small$SampleID)
) |>
(\(x) x[!grepl("CONTROL", x)])() |>
head(20L),
overlapping_samples_df2 = NULL,
df1_project_nr = "P1",
df2_project_nr = "P2",
reference_project = "P2",
reference_medians = NULL
)
# create lst_df
lst_df <- list(
data_explore_check$ref_df,
data_explore_check$not_ref_df
)
names(lst_df) <- c(data_explore_check$ref_name,
data_explore_check$not_ref_name)
# create ref_cols
ref_cols <- data_explore_check$ref_check_log$col_names
not_ref_cols <- data_explore_check$not_ref_check_log$col_names
# run olink_normalization_bridgeable
is_bridgeable_result <- OlinkAnalyze:::olink_normalization_bridgeable(
lst_df = lst_df,
ref_cols = ref_cols,
not_ref_cols = not_ref_cols,
seed = 1
)
Format the output of olink_normalization for seamless use with downstream analysis functions.
Description
For within-product bridging and subset normalization:
Adds non-overlapping assays between projects to the bridged file without adjustment.
Removes external controls, except sample controls.
For cross-product bridging:
Adds non-overlapping assays between projects and not bridgeable assays to the bridged file without adjustment.
Removes external controls, except sample controls.
Replaces the NPX values of the non-reference project by the Median Centered or QS Normalized NPX, according to the Bridging Recommendation.
Edits the BridgingRecommendation column to indicate whether an assay is NotBridgeable, NotOverlapping, MedianCentering, or QuantileSmoothing bridged.
Replaces OlinkID by the concatenation of each product's OlinkIDs to record the OlinkIDs from both projects for bridgeable assays. Assays that are NotBridgeable or NotOverlapping retain their original OlinkIDs and NPX values.
Replaces Panel by the concatenation of each product panel per assay. Assays that are NotBridgeable or NotOverlapping retain their original Panel value.
Removes MedianCenteredNPX, QSNormalizedNPX, OlinkID_E3072 columns.
For reference median normalization:
Adds non-overlapping assays from the dataset, but not from the reference medians, to the bridged file without adjustment.
Removes external controls, except sample controls.
In all cases, normalization and formatting changes are applied to the NPX column. The contents of the Count and PCNormalizedNPX columns remain unchanged.
Usage
olink_normalization_format(df_norm, lst_check)
Arguments
df_norm |
A "tibble" of Olink data in long format resulting from the olink_normalization function. |
lst_check |
Normalization input list checks generated by olink_norm_input_check. |
Value
A "tibble" of Olink data in long format containing both input datasets with the bridged NPX quantifications, with the above modifications.
Author(s)
Danai G. Topouza Klev Diamanti
Examples
# bridge samples
bridge_samples <- intersect(
x = unique(OlinkAnalyze:::data_ht_small$SampleID),
y = unique(OlinkAnalyze:::data_3k_small$SampleID)
) |>
(\(x) x[!grepl("CONTROL", x)])()
# check_npx
data_ht_small_check <- OlinkAnalyze::check_npx(
df = OlinkAnalyze:::data_ht_small
)
data_3k_small_check <- OlinkAnalyze::check_npx(
df = OlinkAnalyze:::data_3k_small
)
# run olink_normalization
df_norm <- OlinkAnalyze::olink_normalization(
df1 = OlinkAnalyze:::data_ht_small,
df2 = OlinkAnalyze:::data_3k_small,
overlapping_samples_df1 = bridge_samples,
df1_project_nr = "Explore HT",
df2_project_nr = "Explore 3072",
reference_project = "Explore HT",
format = FALSE,
df1_check_log = data_ht_small_check,
df2_check_log = data_3k_small_check
)
# generate lst_check
lst_check <- OlinkAnalyze:::olink_norm_input_check(
df1 = OlinkAnalyze:::data_ht_small,
df1_check_log = data_ht_small_check,
df2 = OlinkAnalyze:::data_3k_small,
df2_check_log = data_3k_small_check,
overlapping_samples_df1 = bridge_samples,
overlapping_samples_df2 = NULL,
df1_project_nr = "Explore HT",
df2_project_nr = "Explore 3072",
reference_project = "Explore HT",
reference_medians = NULL
)
# format output
OlinkAnalyze:::olink_normalization_format(
df_norm = df_norm,
lst_check = lst_check
)
Bridge and/or subset normalization of all proteins among multiple NPX projects.
Description
This function normalizes pairs of NPX projects (data frames) using shared samples or subsets of samples.
This function is a wrapper of olink_normalization_bridge and olink_normalization_subset.
Usage
olink_normalization_n(norm_schema)
Arguments
norm_schema |
A tibble with more than 1 row and (strictly) the following columns: "order", "name", "data", "samples", "normalization_type", "normalize_to". See "Details" for the structure of the data frame (required) |
Details
The input of this function is a tibble that contains all the necessary information to normalize multiple NPX projects. This tibble is called the normalization schema. The basic idea is that every row of the data frame is a separate project to be normalized. We assume that there is always one baseline project that does not normalize to any other. All other projects normalize to one or more projects. The function handles projects that are normalized in a chain, for example:
1. project 2 normalizes to project 1, and project 3 normalizes to project 2.
2. project 2 normalizes to project 1, and project 3 normalizes to the combined data frame of projects 1 and 2 (that is already normalized).
The function can also handle a mixed schema of bridge and subset normalization.
Specifications of the normalization schema data frame:
order: should strictly be a numeric or integer array with unique identifiers for each project. It is necessary that this array starts from 1 and that it contains no NAs.
name: should strictly be a character array with unique identifiers for each project. Each entry should represent the name of the project located in the same row. No NAs are allowed.
data: a named list of NPX data frames representing the projects to be normalized. Names of the items of the list should be identical to "names". No NAs are allowed.
samples: a two-level nested named list of sample identifiers from each NPX project from "data". Names of the first level of the nested list should be identical to "names" and to the names of the list from "data". Projects that will be used only as reference should have their corresponding element in the list as NA, while all other projects should contain a named list of 2 arrays containing identifiers of samples to be used for the calculation of adjustment factor. The names of the two arrays should be DF1 and DF2 corresponding to the reference project and the project in the current row, respectively. For bridge normalization arrays should be of equal length and the index of each entry should correspond to the same sample. For subset normalization arrays do not need to be of equal length and the order the samples appear in does not matter. DF1 might contain sample identifiers from more than one project as long as the project in the current row is to be normalized to multiple other projects.
normalization_type: a character array containing the flags "Bridge" or "Subset". Projects that will be used only as reference should have their corresponding element in the array as NA, while all other projects should contain a flag. For the time being the flag "Median" is not supported.
normalize_to: a character array pointing to the project this project is to be normalized to. Elements of the array should be exclusively from the "order" column. Elements of the array may be comma-separated if the project is to be normalized to multiple projects.
Value
A "tibble" of NPX data in long format containing normalized NPX values, including adjustment factors and name of project.
Examples
#### Bridge normalization of two projects
# prepare datasets
npx_df1 <- npx_data1 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::select(
-dplyr::all_of("Project")
) |>
dplyr::mutate(
Normalization = "Intensity"
)
npx_df2 <- npx_data2 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::select(
-dplyr::all_of("Project")
) |>
dplyr::mutate(
Normalization = "Intensity"
)
# Find overlapping samples, but exclude Olink control
overlap_samples <- dplyr::intersect(x = unique(npx_df1[["SampleID"]]),
y = unique(npx_df2[["SampleID"]]))
overlap_samples_list <- list("DF1" = overlap_samples,
"DF2" = overlap_samples)
# create tibble for input
norm_schema_bridge <- dplyr::tibble(
order = c(1L, 2L),
name = c("NPX_DF1", "NPX_DF2"),
data = list("NPX_DF1" = npx_df1,
"NPX_DF2" = npx_df2),
samples = list("NPX_DF1" = NA_character_,
"NPX_DF2" = overlap_samples_list),
normalization_type = c(NA_character_, "Bridge"),
normalize_to = c(NA_character_, "1")
)
# normalize
OlinkAnalyze::olink_normalization_n(
norm_schema = norm_schema_bridge
)
#### Subset normalization of two projects
# datasets
npx_df1 <- npx_data1 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::select(
-dplyr::all_of("Project")
) |>
dplyr::mutate(
Normalization = "Intensity"
)
npx_df2 <- npx_data2 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::select(
-dplyr::all_of("Project")
) |>
dplyr::mutate(
Normalization = "Intensity"
)
# Find a suitable subset of samples from both projects, but exclude Olink
# controls and samples that fail QC.
df1_samples <- npx_df1 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::group_by(
dplyr::across(
dplyr::all_of("SampleID")
)
) |>
dplyr::filter(
all(.data[["QC_Warning"]] == "Pass")
) |>
dplyr::pull(
.data[["SampleID"]]
) |>
unique() |>
sample(
size = 16L,
replace = FALSE
)
df2_samples <- npx_df2 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::group_by(
dplyr::across(
dplyr::all_of("SampleID")
)
) |>
dplyr::filter(
all(.data[["QC_Warning"]] == "Pass")
) |>
dplyr::pull(
.data[["SampleID"]]
) |>
unique() |>
sample(
size = 16L,
replace = FALSE
)
# create named list
subset_samples_list <- list("DF1" = df1_samples,
"DF2" = df2_samples)
# create tibble for input
norm_schema_subset <- dplyr::tibble(
order = c(1L, 2L),
name = c("NPX_DF1", "NPX_DF2"),
data = list("NPX_DF1" = npx_df1,
"NPX_DF2" = npx_df2),
samples = list("NPX_DF1" = NA_character_,
"NPX_DF2" = subset_samples_list),
normalization_type = c(NA_character_, "Subset"),
normalize_to = c(NA_character_, "1")
)
# Normalize
OlinkAnalyze::olink_normalization_n(
norm_schema = norm_schema_subset
)
#### Subset normalization of two projects using all samples
# datasets
npx_df1 <- npx_data1 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::select(
-dplyr::all_of("Project")
) |>
dplyr::mutate(
Normalization = "Intensity"
)
npx_df2 <- npx_data2 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::select(
-dplyr::all_of("Project")
) |>
dplyr::mutate(
Normalization = "Intensity"
)
# Find a suitable subset of samples from both projects, but exclude Olink
# controls and samples that fail QC.
df1_samples_all <- npx_df1 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::group_by(
dplyr::across(
dplyr::all_of("SampleID")
)
) |>
dplyr::filter(
all(.data[["QC_Warning"]] == "Pass")
) |>
dplyr::pull(
.data[["SampleID"]]
) |>
unique()
df2_samples_all <- npx_df2 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::group_by(
dplyr::across(
dplyr::all_of("SampleID")
)
) |>
dplyr::filter(
all(.data[["QC_Warning"]] == "Pass")
) |>
dplyr::pull(
.data[["SampleID"]]
) |>
unique()
# create named list
subset_samples_all_list <- list("DF1" = df1_samples_all,
"DF2" = df2_samples_all)
# create tibble for input
norm_schema_subset_all <- dplyr::tibble(
order = c(1L, 2L),
name = c("NPX_DF1", "NPX_DF2"),
data = list("NPX_DF1" = npx_df1,
"NPX_DF2" = npx_df2),
samples = list("NPX_DF1" = NA_character_,
"NPX_DF2" = subset_samples_all_list),
normalization_type = c(NA_character_, "Subset"),
normalize_to = c(NA_character_, "1")
)
# Normalize
OlinkAnalyze::olink_normalization_n(
norm_schema = norm_schema_subset_all
)
#### Multi-project normalization using bridge and subset samples
## NPX data frames to bridge
npx_df1 <- npx_data1 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::select(
-dplyr::all_of("Project")
) |>
dplyr::mutate(
Normalization = "Intensity"
)
npx_df2 <- npx_data2 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::select(
-dplyr::all_of("Project")
) |>
dplyr::mutate(
Normalization = "Intensity"
)
# manipulating the sample NPX datasets to create another two random ones
npx_df3 <- npx_data2 |>
dplyr::mutate(
SampleID = paste(.data[["SampleID"]], "_mod", sep = ""),
PlateID = paste(.data[["PlateID"]], "_mod", sep = ""),
NPX = sample(x = .data[["NPX"]], size = dplyr::n(), replace = FALSE)
) |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::select(
-dplyr::all_of("Project")
) |>
dplyr::mutate(
Normalization = "Intensity"
)
npx_df4 <- npx_data1 |>
dplyr::mutate(
SampleID = paste(.data[["SampleID"]], "_mod2", sep = ""),
PlateID = paste(.data[["PlateID"]], "_mod2", sep = ""),
NPX = sample(x = .data[["NPX"]], size = dplyr::n(), replace = FALSE)
) |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::select(
-dplyr::all_of("Project")
) |>
dplyr::mutate(
Normalization = "Intensity"
)
## samples to use for normalization
# Bridge samples with same identifiers between npx_df1 and npx_df2
overlap_samples <- dplyr::intersect(unique(npx_df1[["SampleID"]]),
unique(npx_df2[["SampleID"]]))
overlap_samples_df1_df2 <- list("DF1" = overlap_samples,
"DF2" = overlap_samples)
# Bridge samples with different identifiers between npx_df2 and npx_df3
overlap_samples_df2_df3 <- list(
"DF1" = sample(x = setdiff(x = unique(npx_df2[["SampleID"]]),
y = overlap_samples),
size = 10L,
replace = FALSE),
"DF2" = sample(x = setdiff(x = unique(npx_df3[["SampleID"]]),
y = overlap_samples),
size = 10L,
replace = FALSE)
)
# Samples to use for intensity normalization between npx_df4 and the
# normalized dataset of npx_df1 and npx_df2
overlap_samples_df12_df4 <- list(
"DF1" = sample(
x = c(unique(npx_df1[["SampleID"]]), unique(npx_df2[["SampleID"]])),
size = 100L,
replace = FALSE
) |>
unique(),
"DF2" = sample(
x = unique(npx_df4[["SampleID"]]),
size = 40L,
replace = FALSE
)
)
# create tibble for input
norm_schema_n <- dplyr::tibble(
order = c(1L, 2L, 3L, 4L),
name = c("NPX_DF1", "NPX_DF2", "NPX_DF3", "NPX_DF4"),
data = list("NPX_DF1" = npx_df1,
"NPX_DF2" = npx_df2,
"NPX_DF3" = npx_df3,
"NPX_DF4" = npx_df4),
samples = list("NPX_DF1" = NA_character_,
"NPX_DF2" = overlap_samples_df1_df2,
"NPX_DF3" = overlap_samples_df2_df3,
"NPX_DF4" = overlap_samples_df12_df4),
normalization_type = c(NA_character_, "Bridge", "Bridge", "Subset"),
normalize_to = c(NA_character_, "1", "2", "1,2")
)
OlinkAnalyze::olink_normalization_n(
norm_schema = norm_schema_n
)
An internal function to perform checks on the input of the function olink_normalization_n.
Description
An internal function to perform checks on the input of the function olink_normalization_n.
Usage
olink_normalization_n_check(norm_schema)
Arguments
norm_schema |
A tibble with more than 1 row and (strictly) the following columns: "order", "name", "data", "samples", "normalization_type", "normalize_to". See details in help for olink_normalization_n for the structure of the data frame. (required) |
Value
A character message. If the message is "TRUE", all checks passed; otherwise an error message is printed.
An internal function to perform checks on the input project names in the functions olink_normalization_bridge and olink_normalization_subset. The function is expected to run all checks on project names to make sure that normalization can be performed smoothly. It should work independently of the function calling it.
Description
An internal function to perform checks on the input project names in the functions olink_normalization_bridge and olink_normalization_subset. The function is expected to run all checks on project names to make sure that normalization can be performed smoothly. It should work independently of the function calling it.
Usage
olink_normalization_project_name_check(
project_1_name,
project_2_name,
project_ref_name
)
Arguments
project_1_name |
Name of project 1 (required) |
project_2_name |
Name of project 2 (required) |
project_ref_name |
Name of reference project (required) |
Value
A character message. If the message is "TRUE", all checks passed; otherwise an error message is printed.
Quantile smoothing normalization of all proteins between two NPX projects.
Description
This function uses bridge samples to map quantiles of the non-reference dataset to the ones of the reference dataset. Mapped quantiles are used to transform the quantifications of the non-reference dataset to the reference.
Usage
olink_normalization_qs(
lst_df,
ref_cols,
not_ref_cols,
bridge_samples,
prod_uniq
)
Arguments
lst_df |
A named list of the 2 input datasets. First element should be the reference dataset from Olink Explore HT and the second element should originate from Olink Explore 3072. (required) |
ref_cols |
A named list with the column names to use. Exported from olink_norm_input_check. (required) |
not_ref_cols |
A named list with the column names from the non-reference dataset. Exported from olink_norm_input_check. (required) |
bridge_samples |
Character vector of samples to be used for the quantile mapping. (required) |
prod_uniq |
Name of products (not_ref, ref) |
Details
In the case when a study is separated into multiple projects,
an additional normalization step is needed to allow the data to be
comparable across projects. Across different Olink products, some of the
assays exist in corresponding but distinct NPX spaces. For those assays,
the median of paired differences is insufficient for bridging as it only
considers one anchor point (the median/50% quantile). Instead, quantile
smoothing (QS) using multiple anchor points (5%, 10%, 25%, 50%, 75%, 90%
and 95% quantiles) is favored to map the Explore 3072 data to the Explore
HT distribution. The olink_normalization_qs() function performs quantile smoothing
bridging normalization between datasets from two Olink products (for example
Olink Explore 3072 and Olink Explore HT) by performing the following
steps:
An empirical cumulative distribution function is used to map datapoints for the bridging samples from one product to the equivalent space in the other product.
A spline regression model is constructed using unmapped and mapped data from one product, using anchor points from the quantiles defined above.
The spline regression model is used to predict the normalized NPX values for all datapoints.
More information on quantile smoothing and between product normalization can be found in the Bridging Olink Explore 3072 to Olink Explore HT tutorial.
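The three steps above can be sketched in base R. This is a schematic toy example on simulated data, not the package's internal implementation; the vectors npx_ref and npx_not_ref stand in for bridge-sample NPX values of one assay from the two products:

```r
set.seed(1)
# Simulated bridge-sample NPX values for one assay from two products
npx_ref     <- rnorm(n = 100, mean = 6, sd = 1.2)  # reference product
npx_not_ref <- rnorm(n = 100, mean = 5, sd = 1.0)  # product to be mapped

# 1. The ECDF maps each non-reference datapoint to the reference NPX space
mapped <- quantile(x = npx_ref, probs = ecdf(npx_not_ref)(npx_not_ref))

# 2. Spline regression through the anchor quantiles of unmapped vs mapped data
anchors <- c(0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)
fit <- smooth.spline(x = quantile(npx_not_ref, probs = anchors),
                     y = quantile(mapped, probs = anchors),
                     df = 4)

# 3. Predict normalized NPX values for all non-reference datapoints
npx_qs <- predict(fit, x = npx_not_ref)$y
```

The anchor quantiles keep the fitted mapping smooth and robust to outliers, in contrast to mapping every datapoint through the raw ECDF alone.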
Value
A "tibble" of Olink data in long format containing both input datasets with the quantile normalized quantifications.
Author(s)
Amrita Kar Marianne Sandin Masoumeh Sheikhi Klev Diamanti
Examples
# Bridge samples
bridge_samples <- intersect(
x = unique(OlinkAnalyze:::data_ht_small$SampleID),
y = unique(OlinkAnalyze:::data_3k_small$SampleID)
) |>
(\(x) x[!grepl("CONTROL", x)])()
# check_npx
data_ht_small_check <- OlinkAnalyze::check_npx(
df = OlinkAnalyze:::data_ht_small
)
data_3k_small_check <- OlinkAnalyze::check_npx(
df = OlinkAnalyze:::data_3k_small
)
# Run the internal function olink_norm_input_check
check_norm <- OlinkAnalyze:::olink_norm_input_check(
df1 = OlinkAnalyze:::data_ht_small,
df1_check_log = data_ht_small_check,
df2 = OlinkAnalyze:::data_3k_small,
df2_check_log = data_3k_small_check,
overlapping_samples_df1 = bridge_samples,
overlapping_samples_df2 = NULL,
df1_project_nr = "P1",
df2_project_nr = "P2",
reference_project = "P1",
reference_medians = NULL
)
# Named list of input datasets
lst_df <- list(
check_norm$ref_df,
check_norm$not_ref_df
)
names(lst_df) <- c(check_norm$ref_name, check_norm$not_ref_name)
ref_cols <- check_norm$ref_check_log$col_names
not_ref_cols <- check_norm$not_ref_check_log$col_names
qs_result <- OlinkAnalyze:::olink_normalization_qs(
lst_df = lst_df,
ref_cols = ref_cols,
not_ref_cols = not_ref_cols,
bridge_samples = bridge_samples,
prod_uniq = c("E3072", "HT")
)
An internal function to perform checks on the input samples in the functions olink_normalization_bridge and olink_normalization_subset. The function is expected to run all checks on SampleID to make sure that normalization can be performed smoothly. It should work independently of the function calling it.
Description
An internal function to perform checks on the input samples in the functions olink_normalization_bridge and olink_normalization_subset. The function is expected to run all checks on SampleID to make sure that normalization can be performed smoothly. It should work independently of the function calling it.
Usage
olink_normalization_sample_check(
list_samples,
check_mode,
project_1_all_samples,
project_2_all_samples
)
Arguments
list_samples |
Named list of 2 arrays containing SampleID of the subset or bridge samples to be used for normalization. The names of the two arrays should be DF1 and DF2 corresponding to projects 1 and 2, respectively. (required) |
check_mode |
Flag "bridge" or "subset" indicating the type of normalization the check should be tailored to (required) |
project_1_all_samples |
Array of all samples from project 1 (required) |
project_2_all_samples |
Array of all samples from project 2 (required) |
Value
A character message. If the message is "TRUE", all checks passed; otherwise an error message is printed.
Subset normalization of all proteins between two NPX projects.
Description
Normalizes two NPX projects (data frames) using all or a subset of samples.
This function is a wrapper of olink_normalization.
Usage
olink_normalization_subset(
project_1_df,
project_2_df,
reference_samples,
project_1_name = "P1",
project_2_name = "P2",
project_ref_name = "P1",
format = FALSE,
project_1_check_log = NULL,
project_2_check_log = NULL
)
Arguments
project_1_df |
Data frame of the first project (required). |
project_2_df |
Data frame of the second project (required). |
reference_samples |
Named list of 2 arrays containing SampleID of the subset of samples to be used for the calculation of median NPX within each project. The names of the two arrays should be DF1 and DF2 corresponding to projects 1 and 2, respectively. Arrays do not need to be of equal length and the order in which the samples appear does not matter. (required) |
project_1_name |
Name of the first project (default: P1). |
project_2_name |
Name of the second project (default: P2). |
project_ref_name |
Name of the project to be used as reference set. Needs to be one of the project_1_name or project_2_name. It marks the project to which the other project will be adjusted to (default: P1). |
format |
Boolean that controls whether the normalized dataset will be formatted for input to downstream analysis. |
project_1_check_log |
A named list returned by check_npx. |
project_2_check_log |
A named list returned by check_npx. |
Details
In subset normalization one of the projects is adjusted to another using a subset of all samples from each. Please note that the subsets of samples are not expected to be replicates of each other or to have the same SampleID. Adjustment between the two projects is made using the assay-specific differences in median between the subsets of samples from the two projects. The two data frames are inputs project_1_df and project_2_df, the one being adjusted to is specified in the input project_ref_name and the subsets of samples are specified in reference_samples.
A special case of subset normalization is to use all samples (except control samples) from each project as a subset.
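The median-difference adjustment can be sketched in base R. This is a toy illustration with hypothetical values; note that, unlike bridging, the two subsets need not be paired or of equal length:

```r
# Hypothetical NPX values of one assay from the sample subsets of each project
subset_p1 <- c(5.2, 6.1, 4.9, 7.3)       # subset from the reference project
subset_p2 <- c(4.4, 5.8, 4.0, 6.5, 5.1)  # subset from the project to adjust

# Assay-specific adjustment factor: difference of the subset medians
adj_factor <- median(subset_p1) - median(subset_p2)

# Apply the adjustment to all samples of the non-reference project
npx_p2_all <- c(4.2, 5.0, 6.7)
npx_p2_adjusted <- npx_p2_all + adj_factor
```
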
Value
A "tibble" of NPX data in long format containing normalized NPX values, including adjustment factors and name of project.
Examples
#### Subset normalization
# datasets
npx_df1 <- npx_data1 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::select(
-dplyr::all_of("Project")
) |>
dplyr::mutate(
Normalization = "Intensity"
)
npx_df2 <- npx_data2 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::select(
-dplyr::all_of("Project")
) |>
dplyr::mutate(
Normalization = "Intensity"
)
# Find a suitable subset of samples from both projects, but exclude Olink
# controls and samples that fail QC.
df1_samples <- npx_df1 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::group_by(
dplyr::across(
dplyr::all_of("SampleID")
)
) |>
dplyr::filter(
all(.data[["QC_Warning"]] == "Pass")
) |>
dplyr::pull(
.data[["SampleID"]]
) |>
unique() |>
sample(
size = 16L,
replace = FALSE
)
df2_samples <- npx_df2 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::group_by(
dplyr::across(
dplyr::all_of("SampleID")
)
) |>
dplyr::filter(
all(.data[["QC_Warning"]] == "Pass")
) |>
dplyr::pull(
.data[["SampleID"]]
) |>
unique() |>
sample(
size = 16L,
replace = FALSE
)
# create named list
subset_samples_list <- list("DF1" = df1_samples,
"DF2" = df2_samples)
# check npx
df1_check_log <- OlinkAnalyze::check_npx(df = npx_df1)
df2_check_log <- OlinkAnalyze::check_npx(df = npx_df2)
# Normalize
OlinkAnalyze::olink_normalization_subset(
project_1_df = npx_df1,
project_2_df = npx_df2,
reference_samples = subset_samples_list,
project_1_name = "P1",
project_2_name = "P2",
project_ref_name = "P1",
project_1_check_log = df1_check_log,
project_2_check_log = df2_check_log
)
#### Special case of subset normalization using all samples
# datasets
npx_df1 <- npx_data1 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::select(
-dplyr::all_of("Project")
) |>
dplyr::mutate(
Normalization = "Intensity"
)
npx_df2 <- npx_data2 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::select(
-dplyr::all_of("Project")
) |>
dplyr::mutate(
Normalization = "Intensity"
)
# Find a suitable subset of samples from both projects, but exclude Olink
# controls and samples that fail QC.
df1_samples_all <- npx_df1 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::group_by(
dplyr::across(
dplyr::all_of("SampleID")
)
) |>
dplyr::filter(
all(.data[["QC_Warning"]] == "Pass")
) |>
dplyr::pull(
.data[["SampleID"]]
) |>
unique()
df2_samples_all <- npx_df2 |>
dplyr::filter(
!stringr::str_detect(string = .data[["SampleID"]],
pattern = "CONTROL_")
) |>
dplyr::group_by(
dplyr::across(
dplyr::all_of("SampleID")
)
) |>
dplyr::filter(
all(.data[["QC_Warning"]] == "Pass")
) |>
dplyr::pull(
.data[["SampleID"]]
) |>
unique()
# create named list
subset_samples_all_list <- list("DF1" = df1_samples_all,
"DF2" = df2_samples_all)
# check npx
df1_check_log <- OlinkAnalyze::check_npx(df = npx_df1)
df2_check_log <- OlinkAnalyze::check_npx(df = npx_df2)
# Normalize
OlinkAnalyze::olink_normalization_subset(
project_1_df = npx_df1,
project_2_df = npx_df2,
reference_samples = subset_samples_all_list,
project_1_name = "P1",
project_2_name = "P2",
project_ref_name = "P1",
project_1_check_log = df1_check_log,
project_2_check_log = df2_check_log
)
Function which performs a Kruskal-Wallis Test or Friedman Test per protein
Description
Performs a Kruskal-Wallis Test for each assay (by OlinkID) in every panel
using stats::kruskal.test.
Performs a Friedman Test for each assay (by OlinkID) in every panel
using rstatix::friedman_test. The function handles factor variables.
Samples that have no variable information or missing factor levels are
automatically removed from the analysis (specified in a message
if verbose = TRUE).
Character columns in the input dataframe are automatically converted to
factors (specified in a message if verbose = TRUE).
Numerical variables are not converted to factors.
If a numerical variable is to be used as a factor, this conversion needs to
be done on the dataframe before the function call.
Inference is specified in a message if verbose = TRUE.
The formula notation of the final model is specified in a message if
verbose = TRUE.
Adjusted p-values are calculated by stats::p.adjust according to the
Benjamini & Hochberg (1995) method (“fdr”).
The threshold is determined by logic evaluation of Adjusted_pval < 0.05.
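The per-assay procedure (test, Benjamini-Hochberg adjustment, thresholding) can be illustrated with base R alone. This is a simplified sketch on invented data; the package additionally handles panels, factor conversion, sample removal, and verbose messaging.

```r
# Sketch: Kruskal-Wallis per assay, then Benjamini-Hochberg (fdr) adjustment
set.seed(123)
d <- data.frame(
  OlinkID = rep(c("OID1", "OID2", "OID3"), each = 30),
  NPX     = c(rnorm(30), rnorm(30, mean = rep(0:2, each = 10)), rnorm(30)),
  Site    = factor(rep(rep(c("S1", "S2", "S3"), each = 10), times = 3))
)
res <- do.call(rbind, lapply(split(d, d$OlinkID), function(a) {
  kw <- stats::kruskal.test(NPX ~ Site, data = a)
  data.frame(OlinkID   = a$OlinkID[1],
             statistic = unname(kw$statistic),
             p.value   = kw$p.value)
}))
res$Adjusted_pval <- stats::p.adjust(res$p.value, method = "fdr")
res$Threshold <- ifelse(res$Adjusted_pval < 0.05,
                        "Significant", "Non-significant")
```

Note that BH-adjusted p-values are never smaller than the nominal ones, so the 0.05 threshold on Adjusted_pval is more conservative than one on p.value.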
Usage
olink_one_non_parametric(
df,
check_log = NULL,
variable,
dependence = FALSE,
subject = NULL,
verbose = TRUE
)
Arguments
df |
NPX data frame in long format with at least protein name (Assay), OlinkID, UniProt, Panel and a factor with at least 3 levels. |
check_log |
A named list returned by |
variable |
Single character value. |
dependence |
Boolean. Default: FALSE. When the groups are independent, the Kruskal-Wallis test is run; when the groups are dependent, the Friedman test is run. |
subject |
Group information for the repeated measurement. If dependence = TRUE, this parameter needs to be specified. |
verbose |
Boolean. Default: TRUE. Whether information about removed samples, factor conversion and the final model formula should be printed to the console. |
Value
A tibble containing the Kruskal-Wallis Test or Friedman Test results for every protein.
Columns include:
Assay: "character" Protein symbol
OlinkID: "character" Olink specific ID
UniProt: "character" UniProt ID
Panel: "character" Name of Olink Panel
term: "character" term in model
df: "numeric" degrees of freedom
method: "character" which method was used
statistic: "named numeric" the value of the test statistic with a name describing it
p.value: "numeric" p-value for the test
Adjusted_pval: "numeric" adjusted p-value for the test (Benjamini&Hochberg)
Threshold: "character" if adjusted p-value is significant or not (< 0.05)
Examples
if (rlang::is_installed(pkg = c("broom", "rstatix"))) {
check_log <- check_npx(df = npx_data1)
# One-way Kruskal-Wallis Test
kruskal_results <- OlinkAnalyze::olink_one_non_parametric(
df = npx_data1,
check_log = check_log,
variable = "Site"
)
# Friedman Test
friedman_results <- OlinkAnalyze::olink_one_non_parametric(
df = npx_data1,
check_log = check_log,
variable = "Time",
subject = "Subject",
dependence = TRUE
)
}
Function which performs posthoc test per protein for the results from Friedman or Kruskal-Wallis Test.
Description
Performs a posthoc test using rstatix::wilcox_test or FSA::dunnTest with
Benjamini-Hochberg p-value adjustment per assay (by OlinkID) for each panel
at confidence level 0.95.
See olink_one_non_parametric for details of input notation.
The function handles both factor and numerical variables.
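A base-R analogue of the pairwise step, using stats::pairwise.wilcox.test with BH adjustment as a stand-in for the rstatix/FSA calls the function actually uses, can be sketched for a single assay on invented data:

```r
# Sketch: pairwise comparisons for one assay after a significant
# Kruskal-Wallis result, with Benjamini-Hochberg p-value adjustment
set.seed(42)
a <- data.frame(
  NPX  = c(rnorm(10), rnorm(10, mean = 1.5), rnorm(10, mean = 3)),
  Site = factor(rep(c("S1", "S2", "S3"), each = 10))
)
ph <- stats::pairwise.wilcox.test(x = a$NPX, g = a$Site,
                                  p.adjust.method = "BH")
ph$p.value  # matrix of adjusted p-values, one entry per group pair
```

The package applies this per OlinkID across all requested assays and returns the results in long, tidy form.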
Usage
olink_one_non_parametric_posthoc(
df,
check_log = NULL,
olinkid_list = NULL,
variable,
test = "kruskal",
subject = "Subject",
verbose = TRUE
)
Arguments
df |
NPX data frame in long format with at least protein name (Assay), OlinkID, UniProt, Panel and a factor with at least 3 levels. |
check_log |
A named list returned by |
olinkid_list |
Character vector of OlinkID's on which to perform post hoc analysis. If not specified, all assays in df are used. |
variable |
Single character value or character array. |
test |
Single character value indicating which post hoc test to run: "friedman" or "kruskal". |
subject |
Group information for the repeated measurement. If dependence = TRUE, this parameter needs to be specified. |
verbose |
Boolean. Default: TRUE. Whether information about removed samples, factor conversion and the final model formula should be printed to the console. |
Value
Tibble of posthoc tests for specified effect, arranged by ascending adjusted p-values.
Columns include:
Assay: "character" Protein symbol
OlinkID: "character" Olink specific ID
UniProt: "character" UniProt ID
Panel: "character" Name of Olink Panel
term: "character" term in model
contrast: "character" the groups that were compared
estimate: "numeric" the value of the test statistic with a name describing it
Adjusted_pval: "numeric" adjusted p-value for the test
Threshold: "character" if adjusted p-value is significant or not (< 0.05)
Examples
if (rlang::is_installed(pkg = c("FSA", "broom", "rstatix"))) {
check_log <- check_npx(df = npx_data1)
# One-way Kruskal-Wallis Test
kruskal_results <- OlinkAnalyze::olink_one_non_parametric(
df = npx_data1,
check_log = check_log,
variable = "Site"
)
# Friedman Test
friedman_results <- OlinkAnalyze::olink_one_non_parametric(
df = npx_data1,
check_log = check_log,
variable = "Time",
subject = "Subject",
dependence = TRUE
)
# Posthoc test for the results from Friedman Test
friedman_posthoc_results <- OlinkAnalyze::olink_one_non_parametric_posthoc(
df = npx_data1,
check_log = check_log,
variable = "Time",
test = "friedman",
olinkid_list = friedman_results |>
dplyr::filter(.data[["Threshold"]] == "Significant") |>
dplyr::select(
dplyr::all_of("OlinkID")
) |>
dplyr::distinct() |>
dplyr::pull()
)
}
Function that performs a two-way ordinal analysis.
Description
A two-way ordinal analysis of variance can address an experimental design with two independent variables, each of which is a factor variable. The main effect of each independent variable can be tested, as well as the effect of the interaction between the two factors.
Usage
olink_ordinal_regression(
df,
variable,
covariates = NULL,
return.covariates = FALSE,
check_log = NULL,
verbose = TRUE
)
olink_ordinalRegression(
df,
variable,
covariates = NULL,
return.covariates = FALSE,
check_log = NULL,
verbose = TRUE
)
Arguments
df |
NPX or Quantified_value data frame in long format with at least protein name (Assay), OlinkID, UniProt, Panel and a factor with at least 3 levels. |
variable |
Single character value or character array. Variable(s) to test. If length > 1, the included variable names will be used in crossed analyses. Also takes ':'/'*' notation. |
covariates |
Single character value or character array. Default: NULL. Covariates to include. Takes ':'/'*' notation. Crossed analysis will not be inferred from main effects. |
return.covariates |
Logical. Default: FALSE. Returns F-test results for the covariates. Note: Adjusted p-values will be NA for the covariates. |
check_log |
A named list returned by |
verbose |
Logical. Default: TRUE. Whether information about removed samples, factor conversion and the final model formula should be printed to the console. |
Details
Performs an ANOVA F-test for each assay (by OlinkID) in every panel using stats::Anova and Type III sums of squares. The dependent variable will be treated as an ordered factor. The function handles only factor variables and/or covariates.
Samples that have no variable information or missing factor levels are automatically removed from the analysis (specified in a message if verbose = TRUE). Character columns in the input dataframe are automatically converted to factors (specified in a message if verbose = TRUE). Crossed analysis, i.e. A*B formula notation, is inferred from the variable argument in the following cases:
c('A','B')
c('A:B')
c('A:B', 'B') or c('A:B', 'A')
Inference and the formula notation of the final model are specified in a message if verbose = TRUE.
Adjusted p-values are calculated by stats::p.adjust according to the Benjamini & Hochberg (1995) method (“fdr”). The threshold is determined by logic evaluation of Adjusted_pval < 0.05. Covariates are not included in the p-value adjustment.
Value
A tibble containing the ANOVA results for every protein. The tibble is arranged by ascending p-values. Columns include:
Assay: "character" Protein symbol
OlinkID: "character" Olink specific ID
UniProt: "character" UniProt ID
Panel: "character" Name of Olink Panel
term: "character" term in model
statistic: "numeric" value of the statistic
p.value: "numeric" nominal p-value
Adjusted_pval: "numeric" adjusted p-value for the test
Threshold: "character" if adjusted p-value is significant or not (< 0.05)
Examples
if (rlang::is_installed(pkg = c("ordinal", "broom"))) {
npx_df <- OlinkAnalyze::npx_data1 |>
dplyr::filter(
!grepl(
pattern = "control",
x = .data[["SampleID"]],
ignore.case = TRUE
)
)
check_log <- OlinkAnalyze::check_npx(df = npx_df)
# Two-way Ordinal Regression with CLM.
# Results in model NPX~Treatment+Time+Treatment:Time.
ordinalRegression_results <- OlinkAnalyze::olink_ordinal_regression(
df = npx_df,
variable = "Treatment:Time"
)
}
Function which performs a posthoc test per protein.
Description
Performs a post hoc ANOVA test using emmeans::emmeans with Tukey p-value
adjustment per assay (by OlinkID) for each panel at confidence level 0.95.
See olink_anova for details of input notation.
The function handles both factor and numerical variables and/or covariates. The posthoc test for a numerical variable compares the difference in means of the ordinal outcome variable (default: NPX) for 1 standard deviation difference in the numerical variable, e.g. mean ordinal NPX at mean(numerical variable) versus mean NPX at mean(numerical variable) + 1*SD(numerical variable).
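For intuition, the 1-SD contrast for a numerical predictor can be worked through with a plain linear model. This is an illustrative analogue on invented data only; the actual function fits an ordinal model and computes the contrast via emmeans.

```r
# Sketch: difference in predicted outcome between mean(x) and mean(x) + 1 SD
set.seed(7)
x   <- rnorm(50, mean = 10, sd = 2)
y   <- 0.8 * x + rnorm(50)
fit <- stats::lm(y ~ x)
new <- data.frame(x = c(mean(x), mean(x) + stats::sd(x)))
pred <- stats::predict(fit, newdata = new)
# For a linear model this difference equals slope * SD(x)
unname(diff(pred))
```

The ordinal-regression version reports the analogous quantity on the scale of the ordinal outcome rather than a raw mean difference.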
Usage
olink_ordinal_regression_posthoc(
df,
olinkid_list = NULL,
variable,
covariates = NULL,
effect,
effect_formula,
mean_return = FALSE,
post_hoc_padjust_method = "tukey",
check_log = NULL,
verbose = TRUE
)
olink_ordinalRegression_posthoc(
df,
olinkid_list = NULL,
variable,
covariates = NULL,
effect,
effect_formula,
mean_return = FALSE,
post_hoc_padjust_method = "tukey",
check_log = NULL,
verbose = TRUE
)
Arguments
df |
NPX data frame in long format with at least protein name (Assay), OlinkID, UniProt, Panel and a factor with at least 3 levels. |
olinkid_list |
Character vector of OlinkID's on which to perform post hoc analysis. If not specified, all assays in df are used. |
variable |
Single character value or character array. Variable(s) to test. If length > 1, the included variable names will be used in crossed analyses. Also takes ':' notation. |
covariates |
Single character value or character array. Default: NULL. Covariates to include. Takes ':'/'*' notation. Crossed analysis will not be inferred from main effects. |
effect |
Term on which to perform the post-hoc analysis. Character vector. Must be a subset of or identical to variable. |
effect_formula |
(optional) A character vector specifying the names of
the predictors over which estimated marginal means are desired as defined in
the |
mean_return |
Boolean. If TRUE, returns the mean of each factor level rather than the difference in means (default). Note that no p-value is returned for mean_return = TRUE and no adjustment is performed. |
post_hoc_padjust_method |
P-value adjustment method to use for post-hoc
comparisons within an assay. Options include |
check_log |
A named list returned by |
verbose |
Boolean. Default: TRUE. Whether information about removed samples, factor conversion and the final model formula should be printed to the console. |
Value
Tibble of posthoc tests for specified effect, arranged by ascending adjusted p-values. Columns include:
Assay: "character" Protein symbol
OlinkID: "character" Olink specific ID
UniProt: "character" UniProt ID
Panel: "character" Name of Olink Panel
term: "character" term in model
contrast: "character" the groups that were compared
estimate: "numeric" difference in mean of the ordinal NPX between groups
Adjusted_pval: "numeric" adjusted p-value for the test
Threshold: "character" if adjusted p-value is significant or not (< 0.05)
Examples
if (rlang::is_installed(pkg = c("ordinal", "emmeans"))) {
npx_df <- OlinkAnalyze::npx_data1 |>
dplyr::filter(
!grepl(
pattern = "control",
x = .data[["SampleID"]],
ignore.case = TRUE
)
)
check_log <- OlinkAnalyze::check_npx(df = npx_df)
# Two-way Ordinal Regression with CLM.
# Results in model NPX~Treatment+Time+Treatment:Time.
ordinalRegression_results <- OlinkAnalyze::olink_ordinal_regression(
df = npx_df,
variable = "Treatment:Time"
)
significant_assays <- ordinalRegression_results |>
dplyr::filter(
.data[["Threshold"]] == "Significant"
& .data[["term"]] == "Time"
) |>
dplyr::pull(
.data[["OlinkID"]]
) |>
unique()
# Posthoc test
ordRegr_results_posthoc <- OlinkAnalyze::olink_ordinal_regression_posthoc(
df = npx_df,
variable = c("Treatment:Time"),
olinkid_list = significant_assays,
effect = "Time",
check_log = check_log
)
}
OSI distribution plot
Description
Generates a density plot showing the distribution of the selected OSI score among dataset samples using ggplot2. OSI score can be one of "OSITimeToCentrifugation", "OSIPreparationTemperature", or "OSISummary". Olink external controls are excluded from this visualization.
Usage
olink_osi_dist_plot(df, check_log = NULL, osi_score = NULL)
Arguments
df |
data frame with OSI data present |
check_log |
check log from check NPX |
osi_score |
OSI column to graph, one of OSISummary, OSITimeToCentrifugation, or OSIPreparationTemperature |
Value
distribution plot (histogram overlayed with density plot) of osi values for corresponding osi_score column
Examples
# Creating fake OSI data from Site data
npx_df <- OlinkAnalyze::npx_data1 |>
dplyr::filter(
!grepl(pattern = "control",
x = .data[["SampleID"]],
ignore.case = TRUE)
) |>
dplyr::mutate(
OSISummary = as.numeric(as.factor(.data[["Site"]])),
OSISummary = .data[["OSISummary"]] - min(.data[["OSISummary"]],
na.rm = TRUE),
OSISummary = .data[["OSISummary"]] / max(.data[["OSISummary"]],
na.rm = TRUE)
)
check_log <- OlinkAnalyze::check_npx(
df = npx_df
)
# Generate figure
OlinkAnalyze::olink_osi_dist_plot(
df = npx_df,
check_log = check_log,
osi_score = "OSISummary"
)
Olink color panel for plotting
Description
Olink color panel for plotting
Usage
olink_pal(alpha = 1, coloroption = NULL)
Arguments
alpha |
transparency (optional) |
coloroption |
string, one or more of the following: c("red", "orange", "yellow", "green", "teal", "turqoise", "lightblue", "darkblue", "purple", "pink") |
Value
A character vector of palette hex codes for colors.
Examples
if (rlang::is_installed(pkg = c("scales"))) {
# Color matrices
scales::show_col(
colours = OlinkAnalyze::olink_pal()(10L),
labels = FALSE
)
scales::show_col(
colours = OlinkAnalyze::olink_pal(
coloroption = c("lightblue", "green")
)(2L),
labels = FALSE
)
# Contour plot
filled.contour(
x = datasets::volcano,
color.palette = OlinkAnalyze::olink_pal(),
asp = 1
)
filled.contour(
x = datasets::volcano,
color.palette = scales::hue_pal(),
asp = 1
)
}
Performs pathway enrichment using over-representation analysis (ORA) or gene set enrichment analysis (GSEA)
Description
This function performs enrichment analysis based on statistical test results
and full data using clusterProfiler's functions gsea and enrich for
MSigDB.
Usage
olink_pathway_enrichment(
df,
test_results,
check_log = NULL,
method = "GSEA",
ontology = "MSigDb",
organism = "human",
pvalue_cutoff = 0.05,
estimate_cutoff = 0
)
Arguments
df |
NPX data frame in long format with at least protein name ("Assay"), "OlinkID", "UniProt", "SampleID", QC warning ("QC_Warning" or "SampleQC"), quantification column ("NPX", "Ct" or "Quantified_value"), and one or more columns representing limit of detection ("LOD", "PlateLOD" or "MaxLOD"). |
test_results |
a data frame of statistical test results including the columns "Adjusted_pval" and "estimate". |
check_log |
A named list returned by |
method |
One of "GSEA" (default) or "ORA". |
ontology |
One of "MSigDb" (default), "MSigDb_com", "KEGG", "GO", and "Reactome". "MSigDb" contains "C2" and "C5" gene sets which encompass "KEGG", "GO", and "Reactome". "MSigDb_com" consists of "C2" and "C5" gene sets without "KEGG", as the latter is not permitted for commercial use. |
organism |
One of "human" (default) or "mouse". |
pvalue_cutoff |
(numeric) maximum adjusted p-value cutoff for ORA filtering of foreground set (default = 0.05). This argument is not used for GSEA. |
estimate_cutoff |
(numeric) minimum estimate cutoff for ORA filtering of foreground set (default = 0). This argument is not used for GSEA. |
Details
MSigDB is subset if the ontology argument is "KEGG", "GO", or "Reactome". The
argument test_results must contain estimates for all assays, otherwise an
error will be thrown. Results from a post-hoc statistical test can be used as
argument for test_results, but the user needs to select and filter one
contrast to improve interpretability of the results. Alternative statistical
results can be used as input as long as they include the columns "OlinkID",
"Assay", and "estimate". A column named "Adjusted_pval" is also required for
ORA. Any statistical result that contains exactly one estimate per protein
will work as long as the estimates are comparable to each other.
The R library clusterProfiler is originally developed by Guangchuang Yu at
the School of Basic Medical Sciences at Southern Medical University.
NB: We strongly recommend setting a seed prior to running this function to ensure reproducibility of the results.
An important note regarding Pathway Enrichment with Olink Data
It is important to note that the proteins assayed in Olink panels are sometimes related to specific biological areas and therefore do not represent an unbiased overview of the proteome as a whole, which is an assumption of pathway enrichment. Pathways can only be interpreted based on the background/context they came from; for this reason, an estimate for all assays measured must be provided. Furthermore, certain pathways cannot come up given Olink's coverage. Additionally, if only the Inflammation panel was run, the available pathways would be given based on a background of proteins related to inflammation. Both ORA and GSEA can provide mechanistic and disease-related insight and are best used when trying to uncover pathways/annotations of interest. It is recommended to use pathway enrichment only for hypothesis-generating data, which is better suited to data originating from Olink's NGS platforms "Explore 3072", "Explore HT", and "Reveal", or from multiple Target 96 panels from Olink's qPCR platform. For smaller lists of proteins it may be more informative to use biological annotation in directed research to discover which significant assays are related to keywords of interest.
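Conceptually, the ORA step is a hypergeometric (one-sided Fisher) test of overlap between the significant proteins and a gene set, evaluated against the measured background. A base-R sketch with invented protein lists:

```r
# Sketch: over-representation of a gene set via the hypergeometric test
background  <- paste0("P", 1:100)        # all measured proteins
significant <- paste0("P", 1:20)         # proteins passing the cutoffs
gene_set    <- paste0("P", c(1:10, 51:60))  # a hypothetical pathway

k <- length(intersect(significant, gene_set))  # hits in the foreground
m <- length(intersect(background, gene_set))   # set members measured
n <- length(background) - m                    # measured non-members
# P(X >= k) when drawing length(significant) proteins without replacement
p_enrich <- stats::phyper(q = k - 1, m = m, n = n,
                          k = length(significant), lower.tail = FALSE)
```

This also shows why the background matters: restricting the background to one panel changes m and n, and thus the p-value, which is the point made above about panel-specific protein coverage.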
Value
A data frame of enrichment results.
Columns for ORA include:
ID: Pathway ID from MSigDB.
Description: Description of Pathway from MSigDB.
GeneRatio: Ratio of input proteins that are annotated in a term.
BgRatio: Ratio of all genes that are annotated in this term.
pvalue: P-value of enrichment.
p.adjust: Benjamini-Hochberg adjusted p-value.
qvalue: False discovery rate (FDR), the estimated probability that the normalized enrichment score represents a false positive finding.
geneID: List of input proteins (Gene Symbols) annotated in a term, delimited by "/".
Count: Number of input proteins that are annotated in a term.
Columns for GSEA:
ID: Pathway ID from MSigDB.
Description: Description of Pathway from MSigDB.
setSize: Number of input proteins that are annotated in a term.
enrichmentScore: Enrichment score (ES), degree to which a gene set is over-represented at the top or bottom of the ranked list of genes.
NES: Normalized Enrichment Score (NES), normalized to account for differences in gene set size and in correlations between gene sets and expression data sets. NES can be used to compare analysis results across gene sets.
pvalue: P-value of enrichment.
p.adjust: Benjamini-Hochberg adjusted p-value.
qvalue: False discovery rate (FDR), the estimated probability that the normalized enrichment score represents a false positive finding.
rank: The position in the ranked list where the maximum enrichment score occurred.
leading_edge: Contains tags, list, and signal. Tags provide an indication of the percentage of genes contributing to the ES. List gives an indication of where in the list the ES is obtained. Signal represents the enrichment signal strength and combines the tag and list.
core_enrichment: List of input proteins (Gene Symbols) annotated in a term, delimited by "/".
Author(s)
Kathleen Nevola, Klev Diamanti
References
Wu, T. et al. (2021). clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation, 2(3):100141. doi: 10.1016/j.xinn.2021.100141.
Examples
if (rlang::is_installed(pkg = c("msigdbr", "clusterProfiler"))) {
npx_df <- npx_data1 |>
dplyr::filter(
!grepl(
pattern = "control",
x = .data[["SampleID"]],
ignore.case = TRUE
)
)
check_log <- check_npx(df = npx_df)
ttest_results <- OlinkAnalyze::olink_ttest(
df = npx_df,
variable = "Treatment",
alternative = "two.sided",
check_log = check_log
)
# GSEA
gsea_results <- OlinkAnalyze::olink_pathway_enrichment(
df = npx_df,
test_results = ttest_results,
check_log = check_log
)
# ORA
ora_results <- OlinkAnalyze::olink_pathway_enrichment(
df = npx_df,
test_results = ttest_results,
check_log = check_log,
method = "ORA"
)
}
Creates a heatmap of proteins related to pathways using enrichment results
from olink_pathway_enrichment.
Description
Creates a heatmap of proteins related to pathways using enrichment results
from olink_pathway_enrichment.
Usage
olink_pathway_heatmap(
enrich_results,
test_results,
method = "GSEA",
keyword = NULL,
number_of_terms = 20L
)
Arguments
enrich_results |
data frame of enrichment results from
|
test_results |
a data frame of statistical test results including the columns "Adjusted_pval" and "estimate". |
method |
One of "GSEA" (default) or "ORA". |
keyword |
(optional) keyword to filter enrichment results on. If not specified, displays top terms. |
number_of_terms |
number of terms to display (default is 20). |
Value
A heatmap as a ggplot object.
Examples
if (rlang::is_installed(pkg = c("msigdbr", "clusterProfiler"))) {
# Run olink_ttest or other stats test (see documentation )
npx_df <- npx_data1 |>
dplyr::filter(
!grepl(
pattern = "control",
x = .data[["SampleID"]],
ignore.case = TRUE
)
)
check_log <- check_npx(df = npx_df)
ttest_results <- OlinkAnalyze::olink_ttest(
df = npx_df,
variable = "Treatment",
alternative = "two.sided",
check_log = check_log
)
# Run olink_pathway_enrichment (see documentation)
# GSEA
gsea_results <- OlinkAnalyze::olink_pathway_enrichment(
df = npx_df,
test_results = ttest_results,
check_log = check_log
)
# ORA
ora_results <- OlinkAnalyze::olink_pathway_enrichment(
df = npx_df,
test_results = ttest_results,
check_log = check_log,
method = "ORA"
)
# Plot
OlinkAnalyze::olink_pathway_heatmap(
enrich_results = gsea_results,
test_results = ttest_results
)
OlinkAnalyze::olink_pathway_heatmap(
enrich_results = ora_results,
test_results = ttest_results,
method = "ORA",
keyword = "cell"
)
}
Creates bargraph of top/selected enrichment terms from GSEA or ORA results
from olink_pathway_enrichment
Description
Pathways are ordered by increasing p-value (unadjusted)
Usage
olink_pathway_visualization(
enrich_results,
method = "GSEA",
keyword = NULL,
number_of_terms = 20L
)
Arguments
enrich_results |
data frame of enrichment results from
|
method |
One of "GSEA" (default) or "ORA". |
keyword |
(optional) keyword to filter enrichment results on. If not specified, displays top terms. |
number_of_terms |
number of terms to display (default is 20). |
Value
A bargraph as a ggplot object.
Examples
if (rlang::is_installed(pkg = c("msigdbr", "clusterProfiler"))) {
# Run olink_ttest or other stats test (see documentation )
npx_df <- npx_data1 |>
dplyr::filter(
!grepl(
pattern = "control",
x = .data[["SampleID"]],
ignore.case = TRUE
)
)
check_log <- check_npx(df = npx_df)
ttest_results <- OlinkAnalyze::olink_ttest(
df = npx_df,
variable = "Treatment",
alternative = "two.sided",
check_log = check_log
)
# Run olink_pathway_enrichment (see documentation)
# GSEA
gsea_results <- OlinkAnalyze::olink_pathway_enrichment(
df = npx_df,
test_results = ttest_results,
check_log = check_log
)
# ORA
ora_results <- OlinkAnalyze::olink_pathway_enrichment(
df = npx_df,
test_results = ttest_results,
check_log = check_log,
method = "ORA"
)
# Plot
OlinkAnalyze::olink_pathway_visualization(
enrich_results = gsea_results
)
OlinkAnalyze::olink_pathway_visualization(
enrich_results = gsea_results,
keyword = "immune"
)
OlinkAnalyze::olink_pathway_visualization(
enrich_results = ora_results,
method = "ORA",
number_of_terms = 15L
)
}
Function to plot a PCA of the data
Description
Generates a PCA projection of all samples from NPX data along two
principal components (default PC2 vs. PC1) including the explained
variance and dots colored by QC_Warning using
stats::prcomp and ggplot2::ggplot.
The values are by default scaled and centered in the PCA and proteins with missing NPX values are by default removed from the corresponding assay.
Unique sample names are required.
Imputation by the median is done for assays with missingness <10% for multi-plate projects and <5% for single-plate projects.
The plot is printed, and a list of ggplot objects is returned. If byPanel = TRUE, the data processing (imputation of missing values, etc.) and the subsequent PCA are performed separately per panel. A faceted plot is printed, while the individual ggplot objects are returned.
The arguments outlierDefX and outlierDefY can be used to identify outliers in the PCA. Samples more than +/- outlierDefX and outlierDefY standard deviations from the mean of the plotted PC will be labelled. Both arguments have to be specified.
Usage
olink_pca_plot(
df,
check_log = NULL,
color_g = "QC_Warning",
x_val = 1L,
y_val = 2L,
label_samples = FALSE,
drop_assays = FALSE,
drop_samples = FALSE,
n_loadings = 0,
loadings_list = NULL,
byPanel = FALSE,
outlierDefX = NA,
outlierDefY = NA,
outlierLines = FALSE,
label_outliers = TRUE,
quiet = FALSE,
verbose = TRUE,
...
)
Arguments
df |
data frame in long format with Sample Id, NPX and column of choice for colors. |
check_log |
A named list returned by |
color_g |
Character value indicating which column to use for colors (default QC_Warning). Continuous color scale for Olink(R) Sample Index (OSI) columns OSITimeToCentrifugation, OSIPreparationTemperature and OSISummary is also supported. |
x_val |
Integer indicating which principal component to plot along the x-axis (default 1) |
y_val |
Integer indicating which principal component to plot along the y-axis (default 2) |
label_samples |
Logical. If TRUE, points are replaced with SampleID (default FALSE) |
drop_assays |
Logical. If TRUE, all assays with any missing values will be dropped. Takes precedence over sample dropping. |
drop_samples |
Logical. If TRUE, all samples with any missing values will be dropped. |
n_loadings |
Integer. Will plot the top n_loadings based on size. |
loadings_list |
Character vector indicating for which OlinkID's to plot as loadings. It is possible to use n_loadings and loadings_list simultaneously. |
byPanel |
Perform the PCA per panel (default FALSE) |
outlierDefX |
The number of standard deviations along the PC plotted on the x-axis that defines an outlier. See also Details. |
outlierDefY |
The number of standard deviations along the PC plotted on the y-axis that defines an outlier. See also Details. |
outlierLines |
Draw dashed lines at +/- outlierDefX and outlierDefY standard deviations from the mean of the plotted PCs (default FALSE) |
label_outliers |
Use ggrepel to label samples lying outside the limits set by the outlierLines (default TRUE) |
quiet |
Logical. If TRUE, the resulting plot is not printed |
verbose |
Logical. Whether warnings about the number of samples and/or assays dropped or imputed should be printed to the console. |
... |
coloroption passed to specify color order. |
Value
A list of objects of class "ggplot", each plot contains scatter plot of PCs
Examples
if (rlang::is_installed(pkg = c("ggrepel", "ggpubr"))) {
npx_data <- npx_data1 |>
dplyr::filter(
!grepl(pattern = "CONTROL",
x = .data[["SampleID"]],
ignore.case = TRUE)
)
check_log <- check_npx(npx_data)
# PCA using all the data
OlinkAnalyze::olink_pca_plot(
df = npx_data,
check_log = check_log,
color_g = "QC_Warning"
)
# PCA per panel
g <- OlinkAnalyze::olink_pca_plot(
df = npx_data,
check_log = check_log,
color_g = "QC_Warning",
byPanel = TRUE
)
g[[2L]] # Plot only the second panel
# Label outliers
OlinkAnalyze::olink_pca_plot(
df = npx_data,
check_log = check_log,
color_g = "QC_Warning",
outlierDefX = 2L,
outlierDefY = 4L
) # All data
OlinkAnalyze::olink_pca_plot(
df = npx_data,
check_log = check_log,
color_g = "QC_Warning",
outlierDefX = 2.5,
outlierDefY = 4L,
byPanel = TRUE
) # Per panel
# Retrieve the outliers
g <- OlinkAnalyze::olink_pca_plot(
df = npx_data,
check_log = check_log,
color_g = "QC_Warning",
outlierDefX = 2.5,
outlierDefY = 4L,
byPanel = TRUE
)
outliers <- lapply(g, function(x) {
return(x$data)
}) |>
dplyr::bind_rows() |>
dplyr::filter(
.data[["Outlier"]] == 1L
)
}
Randomly assign samples to plates
Description
Generates a scheme for how to plate samples with an option to keep subjects on the same plate and/or to keep studies together.
Usage
olink_plate_randomizer(
Manifest,
PlateSize = 96,
Product,
SubjectColumn,
iterations = 500,
available.spots,
num_ctrl = 8L,
rand_ctrl = FALSE,
seed,
study = NULL
)
Arguments
Manifest |
tibble/data frame in long format containing all sample IDs. The sample ID column must be named SampleID. |
PlateSize |
Integer. Either 96 or 48. 96 is default. |
Product |
String. Name of Olink product used to set PlateSize if not provided. Optional. |
SubjectColumn |
(Optional) Column name of the subject ID column. Cannot contain missing values. If provided, subjects are kept on the same plate. This argument is used for longitudinal studies and must be a separate column from the SampleID column. |
iterations |
Number of iterations for fitting subjects on the same plate. |
available.spots |
Numeric. Number of wells available on each plate. Maximum 40 for T48 and 88 for T96. Takes a vector equal to the number of plates to be used indicating the number of wells available on each plate. |
num_ctrl |
Numeric. Number of controls on each plate (default = 8) |
rand_ctrl |
Logical. Whether controls are added to be randomized across the plate (default = FALSE) |
seed |
Seed to set. Highly recommend setting this for reproducibility. |
study |
String. Optional. Name of column that includes study
information. For when multiple studies are being plated and randomizing
within studies. If |
Details
Variables of interest should if possible be randomized across plates to avoid confounding with potential plate effects. In the case of multiple samples per subject (e.g. in longitudinal studies), Olink recommends keeping each subject on the same plate. This can be achieved using the SubjectColumn argument.
Value
A "tibble" including SampleID, SubjectID etc. assigned to well positions. Columns include the same columns as Manifest with additional columns:
plate: Plate number
column: Column on the plate
row: Row on the plate
well: Well location on the plate
See Also
- olink_displayPlateLayout() for visualizing the generated plate layouts
- olink_displayPlateDistributions() for validating that sites are properly randomized
Examples
#Generate randomization scheme using complete randomization
randomized.manifest_a <- olink_plate_randomizer(manifest, seed=12345)
# Generate randomization scheme that keeps subjects on the same plate
# (for longitudinal studies)
randomized.manifest_b <- olink_plate_randomizer(manifest,
SubjectColumn="SubjectID",
available.spots=c(88,88),
seed=12345)
# Generate randomization scheme that keeps samples from the same
# study together
randomized.manifest_c <- olink_plate_randomizer(manifest, study = "Site")
# Visualize the generated plate layouts
olink_displayPlateLayout(randomized.manifest_a, fill.color = 'Site')
olink_displayPlateLayout(randomized.manifest_a, fill.color = 'SubjectID')
olink_displayPlateLayout(randomized.manifest_b, fill.color = 'Site')
olink_displayPlateLayout(randomized.manifest_b, fill.color = 'SubjectID')
olink_displayPlateLayout(randomized.manifest_c, fill.color = 'Site')
# Validate that sites are properly randomized
olink_displayPlateDistributions(randomized.manifest_a, fill.color = 'Site')
olink_displayPlateDistributions(randomized.manifest_b, fill.color = 'Site')
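The returned columns can also be used to verify the randomization directly; a minimal sketch, assuming one of the manifests generated above (Site is a column from the example manifest):

```r
# Count samples per plate to verify the layout is balanced
table(randomized.manifest_a$plate)
# Cross-tabulate a variable of interest against plates to check randomization
table(randomized.manifest_a$plate, randomized.manifest_a$Site)
```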
Function to plot an overview of a sample cohort per Panel.
Description
Generates a facet plot per Panel using ggplot2::ggplot and
ggplot2::geom_point and stats::IQR plotting IQR vs. median for all samples.
Horizontal dashed lines indicate +/- IQR_outlierDef standard deviations from the mean IQR (default 3). Vertical dashed lines indicate +/- median_outlierDef standard deviations from the mean sample median (default 3).
Usage
olink_qc_plot(
df,
check_log = NULL,
color_g = "QC_Warning",
plot_index = FALSE,
label_outliers = FALSE,
IQR_outlierDef = 3L,
median_outlierDef = 3L,
outlierLines = TRUE,
facetNrow = NULL,
facetNcol = NULL,
...
)
Arguments
df |
NPX data frame in long format. Must have columns SampleID, NPX and Panel. |
check_log |
A named list returned by check_npx(). |
color_g |
Character value indicating which column to use as fill color (default QC_Warning). Continuous color scale for Olink(R) Sample Index (OSI) columns OSITimeToCentrifugation, OSIPreparationTemperature and OSISummary is also supported. |
plot_index |
Boolean. If FALSE (default), a point will be plotted for a sample. If TRUE, a sample's unique index number is displayed. |
label_outliers |
Boolean. If TRUE, an outlier sample will be labelled with its SampleID. (default FALSE) |
IQR_outlierDef |
The number of standard deviations from the mean IQR that defines an outlier. (default 3) |
median_outlierDef |
The number of standard deviations from the mean sample median that defines an outlier. (default 3) |
outlierLines |
Draw dashed lines at +/- IQR_outlierDef and median_outlierDef standard deviations from the mean IQR and mean sample median (default TRUE). |
facetNrow |
The number of rows that the panels are arranged on. |
facetNcol |
The number of columns that the panels are arranged on. |
... |
coloroption passed to specify color order. |
Value
An object of class "ggplot". Scatterplot shows IQR vs median for all samples per panel
Examples
if (rlang::is_installed(pkg = c("ggrepel"))) {
# standard plot
OlinkAnalyze::olink_qc_plot(
df = npx_data1,
color_g = "QC_Warning",
label_outliers = TRUE
)
# Change the outlier threshold to +/-4SD
OlinkAnalyze::olink_qc_plot(
df = npx_data1,
color_g = "QC_Warning",
IQR_outlierDef = 4L,
median_outlierDef = 4L,
label_outliers = TRUE
)
# Identify the outliers
qc <- OlinkAnalyze::olink_qc_plot(
df = npx_data1,
color_g = "QC_Warning",
IQR_outlierDef = 4L,
median_outlierDef = 4L,
label_outliers = TRUE
)
outliers <- qc$data |>
dplyr::filter(
.data[["Outlier"]] == 1L
)
}
Function which performs a t-test per protein
Description
Performs a Welch 2-sample t-test or paired t-test at confidence level 0.95 for every protein (by OlinkID) for a given grouping variable using stats::t.test and corrects for multiple testing by the Benjamini-Hochberg method (“fdr”) using stats::p.adjust. Adjusted p-values are logically evaluated towards adjusted p-value<0.05. The resulting t-test table is arranged by ascending p-values.
Usage
olink_ttest(df, variable, pair_id, check_log = NULL, ...)
Arguments
df |
NPX data frame in long format with at least protein name (Assay), OlinkID, UniProt and a factor with 2 levels. |
variable |
Character value indicating which column should be used as the grouping variable. Needs to have exactly 2 levels. |
pair_id |
Character value indicating which column indicates the paired sample identifier. |
check_log |
A named list returned by check_npx(). |
... |
Options to be passed to t.test. See stats::t.test. |
Value
A "tibble" containing the t-test results for every protein. Columns include:
Assay: "character" Protein symbol
OlinkID: "character" Olink specific ID
UniProt: "character" UniProt ID
Panel: "character" Name of Olink Panel
estimate: "numeric" difference in mean NPX between groups
Group 1: "numeric" Column is named first level of variable when converted to factor, contains mean NPX for that group
Group 2: "numeric" Column is named second level of variable when converted to factor, contains mean NPX for that group
statistic: "named numeric" value of the t-statistic
p.value: "numeric" p-value for the test
parameter: "named numeric" degrees of freedom for the t-statistic
conf.low: "numeric" confidence interval for the mean (lower end)
conf.high: "numeric" confidence interval for the mean (upper end)
method: "character" which t-test method was used
alternative: "character" describes the alternative hypothesis
Adjusted_pval: "numeric" adjusted p-value for the test (Benjamini & Hochberg)
Threshold: "character" if adjusted p-value is significant or not (< 0.05)
Examples
if (rlang::is_installed(pkg = c("broom"))) {
npx_df <- OlinkAnalyze::npx_data1 |>
dplyr::filter(
!grepl(
pattern = "control",
x = .data[["SampleID"]],
ignore.case = TRUE
)
)
check_log <- OlinkAnalyze::check_npx(df = npx_df)
ttest_results <- OlinkAnalyze::olink_ttest(
df = npx_df,
variable = "Treatment",
alternative = "two.sided",
check_log = check_log
)
# Paired t-test
ttest_paired_results <- npx_df |>
dplyr::filter(
.data[["Time"]] %in% c("Baseline", "Week.6")
) |>
OlinkAnalyze::olink_ttest(
variable = "Time",
pair_id = "Subject",
check_log = check_log
)
}
Function to make a UMAP plot from the data
Description
Computes a manifold approximation and projection using umap::umap and plots the two specified components. Unique sample names are required and imputation by the median is done for assays with missingness < 10% for multi-plate projects and < 5% for single-plate projects.
Usage
olink_umap_plot(
df,
color_g = "QC_Warning",
x_val = 1L,
y_val = 2L,
check_log = NULL,
config = NULL,
label_samples = FALSE,
drop_assays = FALSE,
drop_samples = FALSE,
byPanel = FALSE,
outlierDefX = NA,
outlierDefY = NA,
outlierLines = FALSE,
label_outliers = TRUE,
quiet = FALSE,
verbose = TRUE,
...
)
Arguments
df |
Data frame in long format with SampleID, NPX and a column of choice for colors. |
color_g |
Character value indicating which column to use for colors (default QC_Warning). Continuous color scale for Olink(R) Sample Index (OSI) columns OSITimeToCentrifugation, OSIPreparationTemperature and OSISummary is also supported. |
x_val |
Integer indicating which UMAP component to plot along the x-axis (default 1) |
y_val |
Integer indicating which UMAP component to plot along the y-axis (default 2) |
check_log |
A named list returned by check_npx(). |
config |
object of class umap.config, specifying the parameters for the UMAP algorithm (default umap::umap.defaults) |
label_samples |
Logical. If TRUE, points are replaced with SampleID (default FALSE) |
drop_assays |
Logical. All assays with any missing values will be dropped. Takes precedence over sample drop. |
drop_samples |
Logical. All samples with any missing values will be dropped. |
byPanel |
Perform the UMAP per panel (default FALSE) |
outlierDefX |
The number of standard deviations along the UMAP dimension plotted on the x-axis that defines an outlier. See also 'Details'. |
outlierDefY |
The number of standard deviations along the UMAP dimension plotted on the y-axis that defines an outlier. See also 'Details'. |
outlierLines |
Draw dashed lines at +/- outlierDefX and outlierDefY standard deviations from the mean of the plotted UMAP dimensions (default FALSE) |
label_outliers |
Use ggrepel to label samples lying outside the limits set by the outlierLines (default TRUE) |
quiet |
Logical. If TRUE, the resulting plot is not printed |
verbose |
Logical. Whether warnings about the number of samples and/or assays dropped or imputed should be printed to the console. |
... |
coloroption passed to specify color order. |
Details
The plot is printed, and a list of ggplot objects is returned.
If byPanel = TRUE, the data processing (imputation of missing values etc) and subsequent UMAP is performed separately per panel. A faceted plot is printed, while the individual ggplot objects are returned. The arguments outlierDefX and outlierDefY can be used to identify outliers in the UMAP results. Samples more than +/- outlierDefX and outlierDefY standard deviations from the mean of the plotted UMAP component will be labelled. Both arguments have to be specified.
NOTE: UMAP is a non-linear data transformation that might not accurately preserve the properties of the data. Distances in the UMAP plane should therefore be interpreted with caution.
Value
A list of objects of class "ggplot", each plot contains scatter plot of UMAPs
Examples
if (rlang::is_installed(pkg = c("umap", "ggrepel", "ggpubr"))) {
npx_data <- npx_data1 |>
dplyr::mutate(
SampleID = paste(.data[["SampleID"]], "_", .data[["Index"]], sep = "")
)
check_log <- check_npx(df = npx_data)
# UMAP using all the data
OlinkAnalyze::olink_umap_plot(
df = npx_data,
color_g = "QC_Warning",
check_log = check_log
)
# UMAP per panel
g <- OlinkAnalyze::olink_umap_plot(
df = npx_data,
color_g = "QC_Warning",
byPanel = TRUE,
check_log = check_log
)
# Plot only the Inflammation panel
g$Inflammation
# Label outliers
OlinkAnalyze::olink_umap_plot(
df = npx_data,
color_g = "QC_Warning",
outlierDefX = 2L,
outlierDefY = 4L,
check_log = check_log
)
OlinkAnalyze::olink_umap_plot(
df = npx_data,
color_g = "QC_Warning",
outlierDefX = 3L,
outlierDefY = 2L,
byPanel = TRUE,
check_log = check_log
)
# Retrieve outliers
p <- OlinkAnalyze::olink_umap_plot(
df = npx_data,
color_g = "QC_Warning",
outlierDefX = 3L,
outlierDefY = 2L,
byPanel = TRUE,
check_log = check_log
)
outliers <- lapply(p, function(x) x$data) |>
dplyr::bind_rows() |>
dplyr::filter(
.data[["Outlier"]] == 1L
)
}
Easy volcano plot with Olink theme
Description
Generates a volcano plot using the results of the olink_ttest function using ggplot and ggplot2::geom_point. The estimated difference is plotted on the x-axis and the negative 10-log p-value on the y-axis. The horizontal dotted line indicates p-value=0.05. Dots are colored based on the Benjamini-Hochberg adjusted p-value cutoff 0.05 and can optionally be annotated by OlinkID.
Usage
olink_volcano_plot(p.val_tbl, x_lab = "Estimate", olinkid_list = NULL, ...)
Arguments
p.val_tbl |
a data frame of results generated by olink_ttest() |
x_lab |
Optional. Character value to use as the X-axis label |
olinkid_list |
Optional. Character vector of proteins (by OlinkID) to label in the plot. If not provided, default is to label all significant proteins. |
... |
Optional. Additional arguments for olink_color_discrete() |
Value
An object of class "ggplot", plotting significance (y-axis) by estimated difference between groups (x-axis) for each protein.
Examples
if (rlang::is_installed(pkg = c("broom", "ggrepel"))) {
npx_df <- npx_data1 |>
dplyr::filter(
!grepl(pattern = "control",
x = .data[["SampleID"]],
ignore.case = TRUE
)
)
check_log <- check_npx(df = npx_df)
ttest_results <- OlinkAnalyze::olink_ttest(
df = npx_df,
check_log = check_log,
variable = "Treatment",
alternative = "two.sided"
)
OlinkAnalyze::olink_volcano_plot(
p.val_tbl = ttest_results
)
}
Function which performs a Mann-Whitney U Test per protein
Description
Performs a two-sample Mann-Whitney U test (Wilcoxon rank-sum test) at confidence level 0.95 for every protein (by OlinkID) for a given grouping variable using stats::wilcox.test and corrects for multiple testing by the Benjamini-Hochberg method (“fdr”) using stats::p.adjust. Adjusted p-values are logically evaluated towards adjusted p-value < 0.05. The resulting Mann-Whitney U test table is arranged by ascending p-values.
Usage
olink_wilcox(df, variable, pair_id, check_log = NULL, ...)
Arguments
df |
NPX or Quantified_value data frame in long format with at least protein name (Assay), OlinkID, UniProt and a factor with 2 levels. |
variable |
Character value indicating which column should be used as the grouping variable. Needs to have exactly 2 levels. |
pair_id |
Character value indicating which column indicates the paired sample identifier. |
check_log |
A named list returned by check_npx(). |
... |
Options to be passed to wilcox.test. See stats::wilcox.test. |
Value
A data frame containing the Mann-Whitney U Test results for every protein. Columns include:
Assay: "character" Protein symbol
OlinkID: "character" Olink specific ID
UniProt: "character" UniProt ID
Panel: "character" Name of Olink Panel
estimate: "numeric" median of NPX differences between groups
statistic: "named numeric" the value of the test statistic with a name describing it
p.value: "numeric" p-value for the test
conf.low: "numeric" confidence interval for the median of differences (lower end)
conf.high: "numeric" confidence interval for the median of differences (upper end)
method: "character" which Wilcoxon method was used
alternative: "character" describes the alternative hypothesis
Adjusted_pval: "numeric" adjusted p-value for the test (Benjamini & Hochberg)
Threshold: "character" if adjusted p-value is significant or not (< 0.05)
Examples
if (rlang::is_installed(pkg = c("broom"))) {
npx_df <- npx_data1 |>
dplyr::filter(
!grepl(
pattern = "control",
x = .data[["SampleID"]],
ignore.case = TRUE
)
)
check_log <- OlinkAnalyze::check_npx(df = npx_df)
# Mann-Whitney U Test
wilcox_results <- OlinkAnalyze::olink_wilcox(
df = npx_df,
variable = "Treatment",
alternative = "two.sided",
check_log = check_log
)
# Paired Mann-Whitney U Test
wilcox_paired_results <- npx_df |>
dplyr::filter(
.data[["Time"]] %in% c("Baseline", "Week.6")
) |>
OlinkAnalyze::olink_wilcox(
variable = "Time",
pair_id = "Subject",
check_log = check_log
)
}
Add annotations to pheatmap arguments
Description
Add annotations to pheatmap arguments
Usage
pheatmap_annotate_heatmap(
df,
check_log,
colnames,
pheatmap_args,
variable_row_list,
variable_col_list
)
Arguments
df |
Data frame in long format with SampleID, NPX, OlinkID, Assay and columns of choice for annotations. |
check_log |
Output from check_npx() run on df. |
colnames |
Character. Determines how to label the columns. Must be 'assay', 'oid', or 'both' (default 'both'). |
variable_row_list |
Columns in df to use for row annotations. |
variable_col_list |
Columns in df to use for column annotations. |
Value
updated pheatmap arguments
add colors to pheatmap arguments
Description
add colors to pheatmap arguments
Usage
pheatmap_color_heatmap(
df,
check_log,
pheatmap_args,
variable_row_list,
variable_col_list
)
Arguments
df |
Data frame in long format with SampleID, NPX, OlinkID, Assay and columns of choice for annotations. |
check_log |
Output from check_npx() run on df. |
variable_row_list |
Columns in df to use for row annotations. |
variable_col_list |
Columns in df to use for column annotations. |
Value
updated pheatmap_args
extract ellipsis arguments and add to pheatmap arguments
Description
extract ellipsis arguments and add to pheatmap arguments
Usage
pheatmap_extract_ellipsis_arg(pheatmap_args, ...)
Arguments
pheatmap_args |
pheatmap argument list |
... |
additional arguments to be passed to pheatmap function |
Value
updated pheatmap arguments list with ellipsis variables
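The merging logic can be sketched in base R. This is an illustrative sketch only, not the package's actual implementation; the helper name merge_args is hypothetical:

```r
# Illustrative sketch: merge ellipsis arguments into an existing argument
# list, letting user-supplied values override existing entries
merge_args <- function(pheatmap_args, ...) {
  extra <- list(...)
  pheatmap_args[names(extra)] <- extra
  pheatmap_args
}

args <- merge_args(list(fontsize = 10), fontsize = 12, cutree_rows = 2)
# args$fontsize is now 12, and args$cutree_rows is 2
```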
Function to edit grob
Description
Function to edit grob
Usage
pheatmap_lst_styling(i, x, styling_location, styling)
Arguments
i |
index of attribute |
x |
plot |
styling_location |
list of locations for different attributes |
styling |
grob altered parameters for theme |
Value
updated plot with updated grob
Run the pheatmap using args
Description
Run the pheatmap using args
Usage
pheatmap_run(pheatmap_args)
Arguments
pheatmap_args |
List of arguments to pass to the pheatmap function. |
Value
ggplot of heatmap as generated by pheatmap
Add theme to pheatmap
Description
Add theme to pheatmap
Usage
pheatmap_set_plot_theme(
x,
fontsize,
col = "#737373",
font1 = "Arial Regular",
font2 = "Arial"
)
Arguments
x |
pheatmap plot |
fontsize |
size of font |
col |
color |
font1 |
font to use as default |
font2 |
secondary font to use |
Value
pheatmap with updated theme
Checking for needed packages and valid inputs
Description
Checking for needed packages and valid inputs
Usage
plot_heatmap_check_inputs(colnames, ...)
Arguments
colnames |
Character. Determines how to label the columns. Must be 'assay', 'oid', or 'both' (default 'both'). |
... |
Additional arguments used in |
Value
Null or error/warnings
remove low var assays and add colnames
Description
remove low var assays and add colnames
Usage
plot_heatmap_clean_df(df, check_log, colnames)
Arguments
df |
Data frame in long format with SampleID, NPX, OlinkID, Assay and columns of choice for annotations. |
check_log |
Output from check_npx() run on df. |
colnames |
Character. Determines how to label the columns. Must be 'assay', 'oid', or 'both' (default 'both'). |
Value
df without low-variance assays and with added column names
Convert long df to wide
Description
Convert long df to wide
Usage
plot_heatmap_df_to_wide(df, check_log, colnames)
Arguments
df |
Data frame in long format with SampleID, NPX, OlinkID, Assay and columns of choice for annotations. |
check_log |
Output from check_npx() run on df. |
colnames |
Character. Determines how to label the columns. Must be 'assay', 'oid', or 'both' (default 'both'). |
create list of arguments to pass to pheatmap function
Description
create list of arguments to pass to pheatmap function
Usage
plot_heatmap_pheatmap_args(
df_wide,
df,
check_log,
center_scale,
cluster_rows,
cluster_cols,
na_col,
show_rownames,
show_colnames,
annotation_legend,
fontsize,
variable_row_list,
variable_col_list,
colnames,
...
)
Arguments
df_wide |
wide version of df |
df |
Data frame in long format with SampleID, NPX, OlinkID, Assay and columns of choice for annotations. |
check_log |
Output from check_npx() run on df. |
center_scale |
Logical. If data should be centered and scaled across
assays (default |
cluster_rows |
Logical. Determining if rows should be clustered
(default |
cluster_cols |
Logical. Determining if columns should be clustered
(default |
na_col |
Color of cells with NA values. |
show_rownames |
Logical. Determining if row names are shown
(default |
show_colnames |
Logical. Determining if column names are shown
(default |
annotation_legend |
Logical. Determining if legend for annotations
should be shown (default |
fontsize |
Fontsize (default 10) |
variable_row_list |
Columns in df to use for row annotations. |
variable_col_list |
Columns in df to use for column annotations. |
colnames |
Character. Determines how to label the columns. Must be 'assay', 'oid', or 'both' (default 'both'). |
... |
Additional arguments used in |
Value
list of arguments for pheatmap
Check product name and set plate size accordingly
Description
If plate size is not provided, the function uses the accepted_olink_products tibble to map the product name to the plate size
Usage
product_to_platesize(product)
Arguments
product |
(String) Name of the product (needs to match one of the names in accepted_olink_platforms$name) |
Value
(Integer) Corresponding plate size per accepted_olink_platforms$plate_size
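A minimal usage sketch. The helper is internal, so it is accessed via the ::: operator here; the product name shown is one of the accepted platform names listed under read_npx:

```r
# Map a product name to its plate size; per the description above,
# "Target 96" maps to a 96-well plate and "Target 48" to a 48-well plate
OlinkAnalyze:::product_to_platesize(product = "Target 96")
```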
Read Olink data in R.
Description
Imports a file exported from Olink software that quantifies protein levels in NPX, Ct or absolute quantification.
Note: Do not modify the Olink software output file prior to
importing it with read_npx as it might fail.
Usage
read_npx(
filename,
out_df = "tibble",
long_format = NULL,
olink_platform = NULL,
data_type = NULL,
.ignore_files = c("README.txt"),
quiet = TRUE,
legacy = FALSE
)
read_NPX(
filename,
out_df = "tibble",
long_format = NULL,
olink_platform = NULL,
data_type = NULL,
.ignore_files = c("README.txt"),
quiet = TRUE,
legacy = FALSE
)
Arguments
filename |
Path to Olink software output file in wide or long format. Expecting extensions "xls" or "xlsx" for excel files, "csv" or "txt" for delim files, "parquet" for parquet files, and "zip" for compressed files. |
out_df |
The class of the output dataset. One of "tibble" or "arrow". Defaults to "tibble". |
long_format |
Boolean marking the format of the input file. One of TRUE (long format), FALSE (wide format), or NULL (default; the format is auto-detected). |
olink_platform |
Olink platform used to generate the input file. One of
"Target 48", "Flex", "Target 96", "Explore 3072", "Explore HT", "Focus", or "Reveal".
Defaults to NULL. |
data_type |
Quantification method of the input data. One of
"Ct", "NPX", or "Quantified". Defaults to
NULL. |
.ignore_files |
Character vector of files included in the zip-compressed Olink software output files that should be ignored. Used only for zip-compressed input files (default = c("README.txt")). |
quiet |
Boolean to print a confirmation message when reading the input
file. Applies to excel or delimited input only. |
legacy |
Boolean to run the legacy version of the read_npx function.
Important: should be used only with wide format files from Target 96 or
Target 48 with NPX Software versions earlier than 1.8! (default FALSE) |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Author(s)
Klev Diamanti, Kathleen Nevola, Pascal Pucholt, Christoffer Cambronero, Boxi Zhang, Olof Mansson, Marianne Sandin
Examples
file <- system.file("extdata",
"npx_data_ext.parquet",
package = "OlinkAnalyze")
read_npx(filename = file)
Help function to read long or wide format "Ct", "NPX", or "Quantified" data from delimited "csv" or "txt" files exported from Olink software in R.
Description
The function can handle delimited files in long and wide format.
Usage
read_npx_delim(file, out_df = "arrow")
read_npx_csv(file, out_df = "arrow")
Arguments
file |
Path to Olink software output delimited file in wide or long format. Expecting file extensions "csv" or "txt". |
out_df |
The class of the output dataset. One of "tibble" or "arrow". Defaults to "tibble". |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long or wide format.
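A hedged usage sketch. The file path is hypothetical, and the helper is internal, so it is accessed via the ::: operator:

```r
# Read a long- or wide-format CSV exported from Olink software into a tibble
df <- OlinkAnalyze:::read_npx_delim(file = "path/to/npx_data.csv",
                                    out_df = "tibble")
```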
Author(s)
Klev Diamanti, Christoffer Cambronero, Kathleen Nevola
See Also
read_npx
read_npx_parquet
read_npx_zip
read_npx_excel
read_npx_format
Help function to read long format "Ct", "NPX", or "Quantified" data from delimited "csv" or "txt" files exported from Olink software in R.
Description
The function can handle delimited files in long format.
Usage
read_npx_delim_long(file, sep)
Arguments
file |
Path to Olink software output delimited file in wide or long format. Expecting file extensions "csv" or "txt". |
sep |
Character separator of delimited input file. One of |
Value
Dataset, "tibble", with Olink data in long format.
Author(s)
Klev Diamanti, Christoffer Cambronero, Kathleen Nevola, Ola Caster
See Also
read_npx_delim_wide
read_npx_delim
Help function to read wide format "Ct", "NPX", or "Quantified" data from delimited "csv" or "txt" files exported from Olink software in R.
Description
The function can handle delimited files in wide format.
Usage
read_npx_delim_wide(file, sep)
Arguments
file |
Path to Olink software output delimited file in wide or long format. Expecting file extensions "csv" or "txt". |
sep |
Character separator of delimited input file. One of |
Value
Dataset, "tibble", with Olink data in wide format.
Author(s)
Klev Diamanti
See Also
read_npx_delim_long
read_npx_delim
Help function to read long or wide format "Ct", "NPX", or "Quantified" data from Microsoft "xls" or "xlsx" files exported from Olink software in R.
Description
Help function to read long or wide format "Ct", "NPX", or "Quantified" data from Microsoft "xls" or "xlsx" files exported from Olink software in R.
Usage
read_npx_excel(file, out_df = "arrow")
Arguments
file |
Path to Olink software output excel file in wide or long format. Expected file extensions "xls" or "xlsx". |
out_df |
The class of the output dataset. One of "tibble" or "arrow". Defaults to "tibble". |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long or wide format.
Author(s)
Klev Diamanti, Christoffer Cambronero, Kathleen Nevola
See Also
read_npx
read_npx_parquet
read_npx_zip
read_npx_format
read_npx_delim
Help function to read excel and delimited Olink data files in R and determine their format, data type and platform.
Description
This function processes Olink software excel or delimited files regardless of data type, platform or format.
Olink software excel files with the extension
"xls" or "xlsx"
are imported in R by the function read_npx_excel.
Olink software delimited files with suffix
"csv" or "txt"
are imported in R by the functions read_npx_delim or
read_npx_csv.
Files in wide format are subsequently handled by the function
read_npx_wide.
Olink software files in wide format always originate from Olink qPCR
platforms, and are further processed by the functions
read_npx_format_get_platform and
read_npx_format_get_quant to determine the data type and Olink
platform, respectively.
Usage
read_npx_format(
file,
out_df = "arrow",
long_format = NULL,
olink_platform = NULL,
data_type = NULL,
quiet = FALSE,
legacy = FALSE
)
Arguments
file |
Path to Olink software output file in wide or long format. Expecting file extensions "xls", "xlsx", "csv", or "txt". |
out_df |
The class of the output dataset. One of "tibble" or "arrow". Defaults to "tibble". |
long_format |
Boolean marking the format of the input file. One of TRUE (long format), FALSE (wide format), or NULL (default; the format is auto-detected). |
olink_platform |
Olink platform used to generate the input file. One of
"Target 48", "Flex", "Target 96", "Explore 3072", "Explore HT", "Focus", or "Reveal".
Defaults to NULL. |
data_type |
Quantification method of the input data. One of
"Ct", "NPX", or "Quantified". Defaults to
NULL. |
quiet |
Boolean to print a confirmation message when reading the input
file. Applies to excel or delimited input only. |
legacy |
Boolean to enforce returning a list containing
olink_platform, data_type and long_format information together with the
dataset. Used only when |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Author(s)
Klev Diamanti, Kathleen Nevola, Pascal Pucholt, Christoffer Cambronero, Boxi Zhang, Olof Mansson, Marianne Sandin
See Also
read_npx
read_npx_format_read
read_npx_format_get_format
read_npx_format_get_platform
read_npx_format_get_quant
read_npx_legacy
Help function checking whether a dataset contains NA or empty strings on its column names
Description
Help function checking whether a dataset contains NA or empty strings on its column names
Usage
read_npx_format_colnames(df, file)
Arguments
df |
A "tibble" or "ArrowObject"
from |
file |
Path to Olink software output file in wide or long format. Expecting extensions "xls" or "xlsx" for excel files, "csv" or "txt" for delim files, "parquet" for parquet files, and "zip" for compressed files. |
Value
Error if the file contains problematic column names; NULL otherwise.
Author(s)
Klev Diamanti
Help function to determine the format (wide or long) of the input dataset.
Description
The function uses the first read_n rows of the input dataset to determine the format of the input file.
The user can provide the file format via the input argument long_format. If the user did not specify the file format, it is auto-detected by this function. If the file format was provided as input, the function cross-checks that its auto-detection matches the user input; if not, it throws a warning and accepts the user's input as correct.
Usage
read_npx_format_get_format(df_top_n, file, long_format = NULL)
Arguments
df_top_n |
A tibble containing the first read_n rows of the input dataset. |
file |
Path to Olink software output file in wide or long format. Expecting file extensions "xls", "xlsx", "csv", or "txt". |
long_format |
Boolean marking format of input file. One of |
Value
A list with two elements:
A scalar boolean (is_long_format) marking if the input file is in long (TRUE) or wide (FALSE) format.
A character vector (data_cells) from the input file which allows detection of the quantification method. Used in function read_npx_format_get_quant.
Author(s)
Klev Diamanti
See Also
read_npx_format
read_npx_format_read
read_npx_format_get_platform
read_npx_format_get_quant
Help function to determine the Olink platform from the input dataset.
Description
This function uses the panel name from Olink software files in wide format to determine the qPCR platform that was used for the project that this dataset represents.
Usage
read_npx_format_get_platform(df_top_n, file, olink_platform = NULL)
Arguments
df_top_n |
A tibble containing the first read_n rows of the input dataset. |
file |
Path to Olink software output file in wide format. Expected one of file extensions "xls", "xlsx", "csv", or "txt". |
olink_platform |
Olink platform used to generate the input file.
One of |
Value
The name of the Olink platform. One of "Flex", "Focus", "Target 48", or "Target 96".
Author(s)
Klev Diamanti
See Also
read_npx_format
read_npx_format_read
read_npx_format_get_format
read_npx_format_get_quant
Help function to determine the type of quantification from the input file in wide format.
Description
This function uses information from the cell A2 from Olink software files in wide format to determine the quantification method of the data that the dataset contains.
Usage
read_npx_format_get_quant(file, data_type = NULL, data_cells)
Arguments
file |
Path to Olink software output file in wide format. Expected one of file extensions "xls", "xlsx", "csv", or "txt". |
data_type |
Quantification method of the input data. One of
"Ct", "NPX", or "Quantified". Defaults to
|
data_cells |
A character vector with the contents of the cell A2 from the Olink software file in wide format indicating the quantification method. |
Value
The name of the data type. One of "Ct", "NPX", or "Quantified".
Author(s)
Klev Diamanti
See Also
read_npx_format
read_npx_format_read
read_npx_format_get_format
read_npx_format_get_platform
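A toy version of this detection logic can be sketched as follows. The mapping rules and the helper name `guess_data_type` are assumptions for illustration; the real function may apply different checks on the A2 cell contents.

```r
# Hypothetical sketch (assumed cell contents, not the package's exact rules):
# map the text found in cell A2 of a wide-format Olink file to a data type.
guess_data_type <- function(data_cells) {
  if (any(grepl("NPX", data_cells, fixed = TRUE))) {
    "NPX"
  } else if (any(grepl("Ct", data_cells, fixed = TRUE))) {
    "Ct"
  } else if (any(grepl("Quantified", data_cells, fixed = TRUE))) {
    "Quantified"
  } else {
    stop("Unknown quantification method.")
  }
}

guess_data_type("NPX data")         # "NPX"
guess_data_type("Quantified value") # "Quantified"
```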
Help function to read excel and delimited Olink data files in R.
Description
This function reads Olink software excel and delimited files regardless of data type, platform or format.
Olink software excel files with the extension
"xls" or "xlsx"
are imported in R by the function read_npx_excel.
Olink software delimited files with suffix
"csv" or "txt"
are imported in R by the functions read_npx_delim or
read_npx_csv.
Files in long format are read with the header row as column names.
Files in wide format are read with generic column names V1, V2, etc.
This function also extracts the first read_n rows of the dataset to determine the Olink platform that generated the file, the data type and the file format.
Usage
read_npx_format_read(file, read_n = 3L)
Arguments
file |
Path to Olink software output file in wide or long format. Expecting file extensions "xls", "xlsx", "csv", or "txt". |
read_n |
Number of top n rows to read. |
Value
A list with two elements:
-
An ArrowObject (df) containing the full dataset in wide or long format.
-
A tibble (df_top_n) containing the first read_n rows of the full dataset. This subset of data is used to determine long_format, olink_platform and data_type.
Author(s)
Klev Diamanti
See Also
read_npx_format
read_npx_format_get_format
read_npx_format_get_platform
read_npx_format_get_quant
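The row-limited read that powers read_n can be illustrated with base R on a toy delimited file. This is a simplified sketch; the real function also handles excel files and returns an ArrowObject for the full dataset.

```r
# Minimal sketch of peeking at the first few rows of a delimited file to
# inspect its layout, as read_npx_format_read does with read_n (simplified;
# the real function also reads excel files and Arrow objects).
csv_file <- tempfile(fileext = ".csv")
writeLines(
  c("SampleID;Assay1;Assay2", "S1;1.2;3.4", "S2;2.1;4.3", "S3;0.5;1.1"),
  csv_file
)
# header = FALSE mimics the wide-format read, yielding columns V1, V2, ...
df_top_n <- utils::read.csv2(csv_file, header = FALSE, nrows = 3L)
df_top_n
```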
Olink legacy function for reading NPX or absolute quantification data in wide format into R from qPCR Olink products.
Description
This implementation of read_NPX does not cover the latest
versions of Olink files in wide format. Specifically, it supports:
-
Target 96 output files in wide format (T96 reports only NPX) with the bottom matrix containing one of the following combinations of rows:
Missing Data freq. and LOD.
Missing Data freq., LOD, and Normalization.
-
Target 48 NPX output files in wide format with the bottom matrix containing the following rows: Missing Data freq., Normalization, and LOD.
-
Target 48 absolute quantification output files in wide format with the bottom matrix containing the following rows: Assay warning, LLOQ, Lowest quantifiable level, Missing Data freq., Normalization, Plate_LOD, and ULOQ.
This function accepts data exported in wide format from Olink NPX Signature 1.7.1 or earlier, or from NPX Manager.
Usage
read_npx_legacy(
file,
out_df = "tibble",
olink_platform = NULL,
data_type = NULL,
quiet = TRUE
)
Arguments
file |
Path to Olink software output file in wide format. Expected one of file extensions "xls", "xlsx", "csv", or "txt". |
out_df |
The class of the output dataset. One of "tibble" or "arrow". Defaults to "tibble". |
olink_platform |
Olink platform used to generate the input file.
One of |
data_type |
Quantification method of the input data. One of
"Ct", "NPX", or "Quantified". Defaults to
|
quiet |
Boolean to print a confirmation message when reading the input
file. Applies to excel or delimited input only. |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Author(s)
Klev Diamanti, Kathleen Nevola, Pascal Pucholt, Christoffer Cambronero, Boxi Zhang, Olof Mansson, Marianne Sandin
See Also
read_npx_format_read
read_npx_format_get_format
read_npx_format_get_platform
read_npx_format_get_quant
Help function ensuring read_npx_legacy works
Description
Help function ensuring read_npx_legacy works
Usage
read_npx_legacy_check(file, df_top, data_type, olink_platform, bottom_mat_v)
Arguments
file |
Path to Olink software output file in wide format. Expected one of file extensions "xls", "xlsx", "csv", or "txt". |
df_top |
Top matrix of Olink dataset in wide format. |
data_type |
Quantification method of the input data. One of
"Ct", "NPX", or "Quantified". Defaults to
|
olink_platform |
Olink platform used to generate the input file.
One of |
bottom_mat_v |
Version of the rows in the bottom matrix of the Olink file in wide format based on the local environment variable olink_wide_bottom_matrix. |
Value
NULL unless an unsupported file is detected.
Author(s)
Klev Diamanti
Help function utilizing functions from read_npx_format and
read_npx_wide to streamline read_npx_legacy
Description
Help function utilizing functions from read_npx_format and
read_npx_wide to streamline read_npx_legacy
Usage
read_npx_legacy_help(
file,
out_df,
olink_platform = NULL,
data_type = NULL,
data_type_no_accept = c("Ct")
)
Arguments
file |
Path to Olink software output file in wide format. Expected one of file extensions "xls", "xlsx", "csv", or "txt". |
out_df |
The class of the output dataset. One of "tibble" or "arrow". Defaults to "tibble". |
olink_platform |
Olink platform used to generate the input file.
One of |
data_type |
Quantification method of the input data. One of
"Ct", "NPX", or "Quantified". Defaults to
|
data_type_no_accept |
Character vector of data types that should be rejected (default = "Ct"). |
Value
A list of objects containing the following:
-
olink_platform: auto-detected Olink platform. One of "Flex", "Focus", "Target 48", or "Target 96".
-
long_format: auto-detected Olink format. Should always be FALSE.
-
data_type: auto-detected Olink data type. One of "NPX" or "Quantified".
-
df_split: list of 2 tibbles. Top matrix from the Olink wide file, and middle combined with bottom matrix.
-
npxs_v: Olink NPX software version.
-
bottom_mat_v: bottom matrix version based on olink_wide_bottom_matrix.
-
format_spec: specifications of the wide format based on olink_wide_spec.
Author(s)
Klev Diamanti
Help function to read NPX data from long format parquet Olink software output file in R.
Description
Help function to read NPX data from long format parquet Olink software output file in R.
Usage
read_npx_parquet(file, out_df = "arrow")
Arguments
file |
Path to Olink software output parquet file in long format. Expecting file extension "parquet". |
out_df |
The class of the output dataset. One of "tibble" or "arrow". Defaults to "tibble". |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Author(s)
Klev Diamanti, Kathleen Nevola, Pascal Pucholt
See Also
read_npx
read_npx_zip
read_npx_excel
read_npx_format
read_npx_delim
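The underlying parquet round trip can be sketched with the arrow package (an Import of this package). This is a minimal illustration; read_npx_parquet adds Olink-specific validation and the out_df dispatch on top.

```r
# Minimal sketch of the parquet round trip behind read_npx_parquet
# (assuming the arrow package is installed; simplified toy data).
if (requireNamespace("arrow", quietly = TRUE)) {
  pq_file <- tempfile(fileext = ".parquet")
  arrow::write_parquet(
    data.frame(SampleID = c("S1", "S2"), NPX = c(1.1, 2.2)),
    pq_file
  )
  df <- arrow::read_parquet(pq_file)  # returns a tibble by default
  print(df)
}
```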
Convert Olink data in wide format with "Ct", "NPX", and "Quantified" data to long format.
Description
Convert Olink data in wide format with "Ct", "NPX", and "Quantified" data to long format.
Usage
read_npx_wide(df, file, data_type, olink_platform)
Arguments
df |
A tibble containing the full Olink dataset in wide format. |
file |
Path to Olink software output file in wide format. Expected one of file extensions "xls", "xlsx", "csv", or "txt". |
data_type |
Quantification method of the input data. One of
"Ct", "NPX", or "Quantified". Defaults to
|
olink_platform |
Olink platform used to generate the input file.
One of |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Author(s)
Klev Diamanti
See Also
read_npx_format
read_npx_wide_split_row
read_npx_wide_npxs_version
read_npx_wide_top
read_npx_wide_middle
read_npx_wide_bottom
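The core wide-to-long reshape can be illustrated on a toy dataset with tidyr (an Import of this package). This is heavily simplified; the real file also carries the head, top and bottom matrices described in the helper functions below.

```r
# Toy illustration of the wide-to-long reshape that read_npx_wide performs
# (heavily simplified; column names OID00001/OID00002 are made up).
if (requireNamespace("tidyr", quietly = TRUE)) {
  df_wide <- data.frame(
    SampleID = c("S1", "S2"),
    OID00001 = c(1.2, 2.3),
    OID00002 = c(3.4, 4.5)
  )
  df_long <- tidyr::pivot_longer(
    df_wide,
    cols = -SampleID,
    names_to = "OlinkID",
    values_to = "NPX"
  )
  print(df_long)  # one row per SampleID x OlinkID combination
}
```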
Convert the bottom matrix from Olink dataset in wide format to long.
Description
Use chunks of columns from read_npx_wide_top to convert the
bottom matrix df_bottom into a long format tibble.
Usage
read_npx_wide_bottom(
df,
file,
olink_platform,
data_type,
col_names,
format_spec,
df_plate_panel
)
Arguments
df |
Bottom matrix of Olink dataset in wide format df_bottom. |
file |
Path to Olink software output file in wide format. Expected one of file extensions "xls", "xlsx", "csv", or "txt". |
olink_platform |
Olink platform used to generate the input file.
One of |
data_type |
Quantification method of the input data. One of
"Ct", "NPX", or "Quantified". Defaults to
|
col_names |
Named list of character vectors containing column names from
each chunk of columns that df_top was split into in function.
|
format_spec |
A tibble derived from olink_wide_spec in the local environment containing the expected format of the Olink wide file based on the olink_platform and data_type. |
df_plate_panel |
Tibble with unique combinations of panels and plates from the combination of top and middle data frames. |
Value
A tibble with the bottom matrix of an Olink wide file in long format.
Author(s)
Klev Diamanti
See Also
read_npx_wide
read_npx_wide_split_row
read_npx_wide_npxs_version
read_npx_wide_top
read_npx_wide_middle
Additional checks of the bottom matrix of Olink dataset in wide format.
Description
The rows included in the bottom matrix have evolved through the years. To support as many of these versions as possible, we use the local environment variable olink_wide_bottom_matrix to mark the different versions. This function extracts the version and allows us to check the validity of the data.
Usage
read_npx_wide_bottom_version(df, file, data_type, olink_platform)
Arguments
df |
Bottom matrix of Olink dataset in wide format df_bottom. |
file |
Path to Olink software output file in wide format. Expected one of file extensions "xls", "xlsx", "csv", or "txt". |
data_type |
Quantification method of the input data. One of
"Ct", "NPX", or "Quantified". Defaults to
|
olink_platform |
Olink platform used to generate the input file.
One of |
Value
Tibble with the bottom matrix specifications for the Olink wide file.
Author(s)
Klev Diamanti
See Also
Additional checks of the top matrix of Olink dataset in wide format.
Description
Additional checks of the top matrix of Olink dataset in wide format.
Usage
read_npx_wide_check_top(df, file, format_spec)
Arguments
df |
Top matrix of Olink dataset in wide format df_top. |
file |
Path to Olink software output file in wide format. Expected one of file extensions "xls", "xlsx", "csv", or "txt". |
format_spec |
A tibble derived from olink_wide_spec in the local environment containing the expected format of the Olink wide file based on the olink_platform and data_type. |
Value
NULL unless an inconsistency is spotted.
Author(s)
Klev Diamanti
See Also
Split the middle matrix from Olink dataset in wide format.
Description
Use chunks of columns from read_npx_wide_top to split the
middle matrix df_mid into corresponding chunks of columns.
Usage
read_npx_wide_middle(df, file, data_type, col_names)
Arguments
df |
Middle matrix of Olink dataset in wide format df_mid. |
file |
Path to Olink software output file in wide format. Expected one of file extensions "xls", "xlsx", "csv", or "txt". |
data_type |
Quantification method of the input data. One of
"Ct", "NPX", or "Quantified". Defaults to
|
col_names |
Named list of character vectors containing column names from
each chunk of columns that df_top was split into in function.
|
Value
A list of data frames in long format from the middle matrix of an Olink wide file:
Data frame containing measurements of Olink assays df_mid_oid
Data frame containing plate identifiers df_mid_pid
Data frame containing QC warnings df_mid_qc_warn
Data frame containing measurements of internal control assays df_mid_int_ctrl
Data frame containing measurements of deviations from internal control assays df_mid_dev_int_ctrl
Author(s)
Klev Diamanti
See Also
read_npx_wide
read_npx_wide_split_row
read_npx_wide_npxs_version
read_npx_wide_top
read_npx_wide_bottom
Extract version of NPX Signature from the head matrix of Olink datasets in wide format.
Description
Extract version of NPX Signature from the head matrix of Olink datasets in wide format.
Usage
read_npx_wide_npxs_version(df)
Arguments
df |
Head matrix of Olink dataset in wide format df_head. |
Value
The version of the NPX Signature software.
Author(s)
Klev Diamanti
See Also
read_npx_wide
read_npx_wide_split_row
read_npx_wide_npxs_version
read_npx_wide_top
read_npx_wide_middle
read_npx_wide_bottom
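Extracting a version string from a free-text head-matrix cell can be sketched with a base-R regular expression. The cell layout ("NPX Signature 1.7.1") and the helper name are assumptions for illustration.

```r
# Hypothetical sketch: extract a software version such as "1.7.1" from a
# free-text cell of the head matrix (the exact cell contents are assumed).
extract_npxs_version <- function(x) {
  m <- regmatches(x, regexpr("[0-9]+\\.[0-9]+(\\.[0-9]+)?", x))
  if (length(m) == 0L) NA_character_ else m
}

extract_npxs_version("NPX Signature 1.7.1")  # "1.7.1"
```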
Help function to extract Panel_Version from Panel column.
Description
Help function to extract Panel_Version from Panel column.
Usage
read_npx_wide_panel_version(df)
Arguments
df |
A tibble containing the column Panel. |
Value
Same tibble as input with additional column Panel_Version.
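A toy version of this extraction can be sketched as below. It assumes panel names of the form "Name(v.1234)"; the real column contents may differ between Olink software versions, and `add_panel_version` is a made-up helper name.

```r
# Hypothetical sketch assuming panel names like "Name(v.1234)"; the real
# Panel column contents may differ between Olink software versions.
add_panel_version <- function(df) {
  df$Panel_Version <- sub(".*\\(v\\.([0-9.]+)\\).*", "\\1", df$Panel)
  df$Panel <- trimws(sub("\\(v\\.[0-9.]+\\)", "", df$Panel))
  df
}

add_panel_version(data.frame(Panel = "Olink Target 96 Inflammation(v.3021)"))
```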
Split Olink wide files to sub-matrices.
Description
Olink datasets in wide format contain 2 or 3 rows with all columns NA
marking sub-matrices of the data. This function takes advantage of that
feature and splits the dataset into 3 or 4 sub-matrices. Each sub-matrix is
used downstream to assemble a long data frame.
Specifically:
-
Head matrix consists of the first 2 rows of the wide dataset. This matrix contains the project name, the NPX Signature version that was used to generate the wide dataset and the quantification method.
-
Top matrix consists of the next 4 or 5 rows of the wide dataset, depending on the quantification method. This matrix contains data on assays and panels, columns with plate identifiers, columns with sample QC warnings and columns with deviations from the internal controls. Note that not all columns are present in all datasets and for all quantification methods. The local environment variable olink_wide_spec marks all the expected configurations.
-
Middle matrix is marked by rows with all columns NA above and below. This matrix contains sample identifiers, quantification measurements for all assays, plate identifiers, sample QC warnings and deviations from the internal controls.
-
Bottom matrix is located below the middle matrix and contains information on LOD, missing frequency, assay warnings and the data normalization approach. Note that this matrix is not available for all quantification methods.
Usage
read_npx_wide_split_row(df, file, data_type, format_spec)
Arguments
df |
A tibble containing the full Olink dataset in wide format. |
file |
Path to Olink software output file in wide format. Expected one of file extensions "xls", "xlsx", "csv", or "txt". |
data_type |
Quantification method of the input data. One of
"Ct", "NPX", or "Quantified". Defaults to
|
format_spec |
A tibble derived from olink_wide_spec in the local environment containing the expected format of the Olink wide file based on the olink_platform and data_type. |
Value
A named list of tibbles containing the sub-matrices of the Olink wide format file split on:
-
Head matrix as df_head
-
Top matrix as df_top
-
Middle matrix as df_mid
-
Bottom matrix as df_bottom
Author(s)
Klev Diamanti
See Also
read_npx_wide
read_npx_wide_npxs_version
read_npx_wide_top
read_npx_wide_middle
read_npx_wide_bottom
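The split-on-all-NA-rows mechanism can be sketched in base R on a toy data frame. This is a simplified illustration of the principle; the real function names the resulting sub-matrices and validates them against format_spec.

```r
# Simplified sketch of the mechanism read_npx_wide_split_row relies on:
# split a data frame into sub-matrices on rows whose columns are all NA.
split_on_na_rows <- function(df) {
  all_na <- apply(df, 1L, function(r) all(is.na(r)))
  grp <- cumsum(all_na)  # rows after each all-NA separator get a new group id
  parts <- split(df[!all_na, , drop = FALSE], grp[!all_na])
  unname(parts)
}

df <- data.frame(a = c(1, 2, NA, 3), b = c(4, 5, NA, 6))
length(split_on_na_rows(df))  # 2 sub-matrices
```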
Split the top matrix from Olink dataset in wide format.
Description
The function splits the top matrix df_top into chunks of columns, each of which contains separate information that will be combined with matching chunks from df_mid to convert the wide dataset into a long one.
Usage
read_npx_wide_top(df, file, olink_platform, format_spec)
Arguments
df |
Top matrix of Olink dataset in wide format df_top. |
file |
Path to Olink software output file in wide format. Expected one of file extensions "xls", "xlsx", "csv", or "txt". |
olink_platform |
Olink platform used to generate the input file.
One of |
format_spec |
A tibble derived from olink_wide_spec in the local environment containing the expected format of the Olink wide file based on the olink_platform and data_type. |
Value
A list of data frames from the top matrix in long format:
Data frame containing Olink assays df_top_oid
Data frame containing plate identifiers df_top_pid
Data frame containing QC warnings df_top_qc_warn
Data frame containing internal control assays df_top_int_ctrl
Data frame containing deviation from internal control assays df_top_dev_int_ctrl
Author(s)
Klev Diamanti
See Also
read_npx_wide
read_npx_wide_split_row
read_npx_wide_npxs_version
read_npx_wide_middle
read_npx_wide_bottom
Help function to read "Ct", "NPX", and "Quantified" data from zip-compressed Olink software output files in R.
Description
A zip-compressed input file might contain a file from the Olink software containing "Ct", "NPX", or "Quantified" data, a checksum file, and one or more files to be ignored.
Note: The zip-compressed file should contain exactly one Olink data file, at most one checksum file, and zero or more files to be ignored.
-
Olink file exported by Olink software in wide or long format. Expecting file extensions "xls", "xlsx", "csv", "txt", and "parquet". This file is subsequently provided as input to
read_npx.
-
Checksum file: one of "MD5_checksum.txt" or "checksum_sha256.txt", depending on the checksum algorithm. The file contains a single line with the checksum string.
-
File(s) to be ignored from the zip file. These files can be named as a character vector in the argument .ignore_files.
Usage
read_npx_zip(
file,
out_df = "arrow",
long_format = NULL,
olink_platform = NULL,
data_type = NULL,
.ignore_files = c("README.txt"),
quiet = FALSE
)
Arguments
file |
Path to Olink software output zip-compressed file in wide or long format. Expected file extension "zip". |
out_df |
The class of the output dataset. One of "tibble" or "arrow". Defaults to "tibble". |
long_format |
Boolean marking format of input file. One of |
olink_platform |
Olink platform used to generate the input file. One of
"Target 48", "Flex", "Target 96", "Explore 3072", "Explore HT", "Focus", or "Reveal".
Defaults to |
data_type |
Quantification method of the input data. One of
"Ct", "NPX", or "Quantified". Defaults to
|
.ignore_files |
Character vector of files included in the zip-compressed Olink software output files that should be ignored. Used only for zip-compressed input files (default = c("README.txt")). |
quiet |
Boolean to print a confirmation message when reading the input
file. Applies to excel or delimited input only. |
Value
Dataset, "tibble" or "ArrowObject", with Olink data in long format.
Author(s)
Klev Diamanti, Kathleen Nevola, Pascal Pucholt
See Also
read_npx
read_npx_parquet
read_npx_excel
read_npx_format
read_npx_delim
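The checksum verification that such a zip bundle enables can be sketched with base R. This is an illustrative sketch only; the real function also extracts the archive and dispatches the data file to read_npx.

```r
# Sketch of checksum verification for a bundled data file (base R only;
# the real read_npx_zip also extracts the archive and reads the data file).
data_file <- tempfile(fileext = ".csv")
writeLines("SampleID;NPX", data_file)
checksum <- unname(tools::md5sum(data_file))
# An "MD5_checksum.txt" shipped alongside would hold this single string;
# verification recomputes the checksum and compares:
identical(checksum, unname(tools::md5sum(data_file)))  # TRUE
```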
Combine top and middle matrices in long format.
Description
Combine corresponding chunks of columns from the top and middle matrices
that were computed from read_npx_wide_top and
read_npx_wide_middle, respectively.
Usage
red_npx_wide_top_mid_long(df_top_list, df_middle_list, data_type, format_spec)
Arguments
df_top_list |
List of data frames from the top matrix. Output of
function |
df_middle_list |
List of data frames from the middle matrix. Output of
function |
data_type |
Quantification method of the input data. One of
"Ct", "NPX", or "Quantified". Defaults to
|
format_spec |
A tibble derived from olink_wide_spec in the local environment containing the expected format of the Olink wide file based on the olink_platform and data_type. |
Value
Tibble in long format combining the top and middle matrices.
Author(s)
Klev Diamanti
See Also
read_npx_wide_top
read_npx_wide_middle
Utility function removing columns with all values NA from a dataset.
Description
Utility function removing columns with all values NA from a dataset.
Usage
remove_all_na_cols(df)
Arguments
df |
An Olink dataset. |
Value
The input Olink dataset without all-NA columns.
Author(s)
Klev Diamanti
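The behaviour can be sketched in base R on a toy data frame. This mirrors what remove_all_na_cols is documented to do, under the assumption of equivalent behaviour; the helper name here is made up.

```r
# Minimal base-R sketch of dropping columns whose values are all NA,
# mirroring what remove_all_na_cols does (assumed equivalent behaviour).
drop_all_na_cols <- function(df) {
  keep <- vapply(df, function(col) !all(is.na(col)), logical(1L))
  df[, keep, drop = FALSE]
}

drop_all_na_cols(data.frame(a = 1:2, b = c(NA, NA), c = c("x", NA)))
# keeps columns a and c; drops the all-NA column b
```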
Check and run check_npx() if not provided.
Description
Check and run check_npx() if not provided.
Usage
run_check_npx(df, check_log = NULL)
Arguments
df |
A "tibble" or "ArrowObject"
from |
check_log |
A named list returned by |
Details
This function acts as a wrapper for check_npx(). It will check if the
input check_log provided by the user is valid. If not, it will throw
relevant errors or warnings. Alternatively, if check_log was not provided
by the user, it will run check_npx() to provide check_log to enable
downstream functions to run.
Value
A list containing the following elements:
-
col_names List of column names from the input data frame marking the columns to be used in downstream analyses.
-
oid_invalid Character vector of invalid OlinkID.
-
assay_na Character vector of assays with all samples having NA values.
-
sample_id_dups Character vector of duplicate SampleID.
-
sample_id_na Character vector containing SampleID of samples with quantified values NA for all assays.
-
col_class Data frame with columns of incorrect type including column key col_key, column name col_name, detected column type col_class and expected column type expected_col_class.
-
assay_qc Character vector containing OlinkID of assays with at least one assay warning.
-
non_unique_uniprot Character vector of OlinkID mapped to more than one UniProt ID.
-
darid_invalid Character vector containing outdated combinations of DataAnalysisRefID and PanelDataArchiveVersion.
Author(s)
Klev Diamanti
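The "run the check if no log was provided" wrapper pattern can be sketched generically. This is a simplified illustration of the design, not the real run_check_npx; check_npx() returns a much richer log, and toy_checker is a made-up stand-in.

```r
# Generic sketch of the "compute if not provided" wrapper pattern used by
# run_check_npx (simplified; the real check_npx() returns a richer log).
run_check <- function(df, check_log = NULL, checker) {
  if (is.null(check_log)) {
    check_log <- checker(df)  # user gave no log: run the check now
  } else if (!is.list(check_log)) {
    stop("`check_log` must be the named list returned by the check function.")
  }
  check_log
}

toy_checker <- function(df) list(n_rows = nrow(df))
run_check(data.frame(x = 1:3), checker = toy_checker)  # $n_rows is 3
```
This design lets downstream functions accept a precomputed log (avoiding repeated validation) while still working when called directly.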
Function to set plot theme
Description
This function sets a coherent plot theme for functions.
Usage
set_plot_theme(font = "Arial")
Arguments
font |
Font family to use for text elements. Default: "Arial". |
Value
No return value, used as theme for ggplots
Examples
if (rlang::is_installed(pkg = c("showtext", "systemfonts",
"sysfonts", "curl"))) {
ggplot2::ggplot(
data = datasets::mtcars,
mapping = ggplot2::aes(
x = .data[["wt"]],
y = .data[["mpg"]],
color = as.factor(x = .data[["cyl"]])
)
) +
ggplot2::geom_point(
size = 4L
) +
OlinkAnalyze::set_plot_theme()
ggplot2::ggplot(
data = datasets::mtcars,
mapping = ggplot2::aes(
x = .data[["wt"]],
y = .data[["mpg"]],
color = as.factor(x = .data[["cyl"]])
)
) +
ggplot2::geom_point(
size = 4L
) +
OlinkAnalyze::set_plot_theme(
font = ""
)
}