---
title: "VectraPolarisData"
author:
- name: Julia Wrobel and Tusharkanti Ghosh
  affiliation: Department of Biostatistics and Informatics, Colorado School of Public Health
output:
  BiocStyle::html_document:
    toc_float: true
package: VectraPolarisData
abstract: |
  The VectraPolarisData ExperimentHub package provides two large multiplex immunofluorescence datasets collected by Akoya Biosciences Vectra 3 and Vectra Polaris platforms. Image preprocessing (cell segmentation and phenotyping) was performed using  Inform software. Data cover are formatted into objects of class SpatialExperiment.
vignette: |
  %\VignetteIndexEntry{VectraPolarisData}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```




# Loading the data

To retrieve a dataset, we can use a dataset's corresponding named function `<id>()`, where `<id>` should correspond to one a valid dataset identifier (see `?VectraPolarisData`). Below both the lung and ovarian cancer datasets are loaded this way.

```{r, message = FALSE, warning = FALSE}
library(VectraPolarisData)
spe_lung <- HumanLungCancerV3()
spe_ovarian <- HumanOvarianCancerVP()
```

Alternatively, data can loaded directly from Bioconductor's `r Biocpkg("ExperimentHub")` as follows. First, we initialize a hub instance and store the complete list of records in a variable `eh`. Using `query()`, we then identify any records made available by the `VectraPolarisData` package, as well as their accession IDs (EH7311 for the lung cancer data). Finally, we can load the data into R via `eh[[id]]`, where `id` corresponds to the data entry's identifier we'd like to load. E.g.:

```{r, eval = FALSE}
library(ExperimentHub)
eh <- ExperimentHub()        # initialize hub instance
q <- query(eh, "VectraPolarisData") # retrieve 'VectraPolarisData' records
id <- q$ah_id[1]             # specify dataset ID to load
spe <- eh[[id]]              # load specified dataset
```


# Data Representation

Both the `HumanLungCancerV3()` and `HumanOvarianCancerVP()` datasets are stored as `SpatialExperiment` objects. This allows users of our data to interact with methods built for `SingleCellExperiment`, `SummarizedExperiment`, and `SpatialExperiment` class methods in Bioconductor. See [this ebook](https://lmweber.org/OSTA-book/index.html#welcome) for more details on `SpatialExperiment`. To get cell level tabular data that can be stored in this format, raw multiplex.tiff images have been preprocessed, segmented and cell phenotyped using [`Inform`](https://www.akoyabio.com/phenoimager/software/inform-tissue-finder/) software from Akoya Biosciences.


The `SpatialExperiment` class was originally built for spatial transcriptomics data and follows the structure depicted in the schematic below (Righelli et al. 2021): 

```{r, echo=FALSE}
# image from Righelli et al. 2021
url <- "SPE.png"
```

<center><img src="`r url`"></center>


To adapt this class structure for multiplex imaging data we use slots in the following way:

* `assays` slot: `intensities`, `nucleus_intensities`, `membrane_intensities`
* `sample_id` slot: contains image identifier. For the VectraOvarianDataVP this also identifies the subject because there is one image per subject
* `colData` slot: Other cell-level characteristics of the marker intensities, cell phenotypes, cell shape characteristics
* `spatialCoordsNames` slot: The `x-` and `y-` coordinates describing the location of the center point in the image for each cell
* `metadata` slot: A dataframe of subject-level patient clinical characteristics.


# Transforming to other data formats

The following code shows how to transform the `SpatialExperiment` class object to a `data.frame` class object, if that is preferred for analysis. The example below is shown using the `HumanOvarianVP` dataset.

```{r, message = FALSE}
library(dplyr)

## Assays slots
assays_slot <- assays(spe_ovarian)
intensities_df <- assays_slot$intensities
rownames(intensities_df) <- paste0("total_", rownames(intensities_df))
nucleus_intensities_df<- assays_slot$nucleus_intensities
rownames(nucleus_intensities_df) <- paste0("nucleus_", rownames(nucleus_intensities_df))
membrane_intensities_df<- assays_slot$membrane_intensities
rownames(membrane_intensities_df) <- paste0("membrane_", rownames(membrane_intensities_df))

# colData and spatialData
colData_df <- colData(spe_ovarian)
spatialCoords_df <- spatialCoords(spe_ovarian)

# clinical data
patient_level_df <- metadata(spe_ovarian)$clinical_data

cell_level_df <- as.data.frame(cbind(colData_df, 
                                     spatialCoords_df,
                                     t(intensities_df),
                                     t(nucleus_intensities_df),
                                     t(membrane_intensities_df))
                               )


ovarian_df <- full_join(patient_level_df, cell_level_df, by = "sample_id")

```




# Citation Info

The objects provided in this package are rich data sources we encourage others to use in their own analyses. If you do include them in your peer-reviewed work, we ask that you cite our package and the original studies. 

To cite the `VectraPolarisData` package, use:

```{}
@Manual{VectraPolarisData,
    title = {VectraPolarisData: Vectra Polaris and Vectra 3 multiplex single-cell imaging data},
    author = {Wrobel, J and Ghosh, T},
    year = {2022},
    note = {Bioconductor R package version 1.0},
  }
```


To cite the `HumanLungCancerV3()` data in `bibtex` format, use:

```{}
@article{johnson2021cancer,
  title={Cancer cell-specific MHCII expression as a determinant of the immune infiltrate organization and function in the non-small cell lung cancer tumor microenvironment.},
  author={Johnson, AM and Boland, JM and Wrobel, J and Klezcko, EK and Weiser-Evans, M and Hopp, K and Heasley, L and Clambey, ET and Jordan, K and Nemenoff, RA and others},
  journal={Journal of Thoracic Oncology: Official Publication of the International Association for the Study of Lung Cancer},
  year={2021}
}
```

To cite the `HumanOvarianCancerVP()` data, use:

```{}
@article{steinhart2021spatial,
  title={The spatial context of tumor-infiltrating immune cells associates with improved ovarian cancer survival},
  author={Steinhart, Benjamin and Jordan, Kimberly R and Bapat, Jaidev and Post, Miriam D and Brubaker, Lindsay W and Bitler, Benjamin G and Wrobel, Julia},
  journal={Molecular Cancer Research},
  volume={19},
  number={12},
  pages={1973--1979},
  year={2021},
  publisher={AACR}
}
```



# Data Dictionaries

Detailed tables representing what is provided in each dataset are listed here

## HumanLungCancerV3

In the table below note the following shorthand:

* `[marker]` represents one of: `cd3`, `cd8`, `cd14`, `cd19`, `cd68`, `ck`, `dapi`, `hladr`,  
* `[cell region]` represents one of: entire_cell, membrane, nucleus

**Table 1: data dictionary for HumanLungCancerV3**

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-l6li{border-color:inherit;font-size:10px;text-align:left;vertical-align:top}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-jpc1{font-size:10px;text-align:left;vertical-align:top}
.tg .tg-0lax{text-align:left;vertical-align:top}
</style>
<table class="tg">
<thead>
  <tr>
    <th class="tg-0pky">Variable</th>
    <th class="tg-0pky"><span style="font-weight:bold">Slot</span></th>
    <th class="tg-0pky">Description</th>
    <th class="tg-0pky">Variable coding</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-l6li">[marker]</td>
    <td class="tg-l6li">assays: intensities</td>
    <td class="tg-l6li">mean total cell intensity for [marker]&nbsp;&nbsp;</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[marker]</td>
    <td class="tg-l6li">assays: nucleus_intensities</td>
    <td class="tg-l6li">mean nucleus intensity for [marker]</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[marker]</td>
    <td class="tg-l6li">assays: membrane_intensities</td>
    <td class="tg-l6li">mean membrane intensity for [marker]</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">sample_id</td>
    <td class="tg-l6li"></td>
    <td class="tg-l6li">image identifier, also subject id for the ovarian data</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">cell_id</td>
    <td class="tg-l6li" rowspan="13">colData<br><br><br><br><br><br><br><br></td>
    <td class="tg-l6li">cell identifier</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">slide_id</td>
    <td class="tg-l6li">slide identifier, also the patient id for the lung data</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">tissue category</td>
    <td class="tg-l6li">type of tissue (indicates a region of the image)</td>
    <td class="tg-l6li">Stroma or Tumor</td>
  </tr>
  <tr>
    <td class="tg-l6li">[cell region]_[marker]_min</td>
    <td class="tg-l6li">min [cell region] intensity for [marker]</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[cell region]_[marker]_max</td>
    <td class="tg-l6li">max [cell region] intensity for [marker]</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[cell region]_[marker]_std_dev</td>
    <td class="tg-l6li"><span style="font-weight:400;font-style:normal"> [cell region] std dev of intensity for [marker]</span></td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[cell region]_[marker]_total</td>
    <td class="tg-l6li">total [cell region] intensity for [marker]</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[cell region]_area_square_microns</td>
    <td class="tg-l6li">[cell region] area in square microns</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[cell region]_compactness</td>
    <td class="tg-l6li">[cell region] compactness</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[cell region]_minor_axis</td>
    <td class="tg-l6li"><span style="font-weight:400;font-style:normal">[cell region] length of minor axis</span></td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[cell region]_major_axis</td>
    <td class="tg-l6li"><span style="font-weight:400;font-style:normal">[cell region] length of major axis</span></td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[cell region]_axis_ratio</td>
    <td class="tg-l6li"><span style="font-weight:400;font-style:normal">[cell region] ratio of major and minor axis</span></td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-jpc1">phenotype_[marker]</td>
    <td class="tg-jpc1">cell phenotype label as determined by Inform software</td>
    <td class="tg-0lax"></td>
  </tr>
  <tr>
    <td class="tg-l6li">cell_x_position</td>
    <td class="tg-l6li" rowspan="2">spatialCoordsNames</td>
    <td class="tg-l6li">cell x coordinate</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">cell_y_position</td>
    <td class="tg-l6li">cell y coordinate</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">gender</td>
    <td class="tg-l6li" rowspan="12">metadata</td>
    <td class="tg-l6li">gender</td>
    <td class="tg-l6li">"M", "F"</td>
  </tr>
  <tr>
    <td class="tg-l6li">mhcII_status</td>
    <td class="tg-l6li">MHCII status, from Johnson et.al. 2021</td>
    <td class="tg-l6li">"low", "high"</td>
  </tr>
  <tr>
    <td class="tg-l6li">age_at_diagnosis</td>
    <td class="tg-l6li"><span style="font-weight:400;font-style:normal">age at diagnosis</span></td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">stage_at_diagnosis</td>
    <td class="tg-l6li"><span style="font-weight:400;font-style:normal">stage of the cancer when image was collected</span></td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">stage_numeric</td>
    <td class="tg-l6li">numeric version of stage variable</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">pack_years</td>
    <td class="tg-l6li">pack-years of cigarette smoking</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">survival_days</td>
    <td class="tg-l6li">time in days from date of diagnosis to date of death or censoring event</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">survival_status</td>
    <td class="tg-l6li">did the participant pass away?</td>
    <td class="tg-l6li">0 = no, 1 = yes</td>
  </tr>
  <tr>
    <td class="tg-l6li">cause_of_death</td>
    <td class="tg-l6li">cause of death</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">recurrence_or_lung_ca_death</td>
    <td class="tg-l6li">did the participant have a recurrence or death event?</td>
    <td class="tg-l6li">0 = no, 1 = yes</td>
  </tr>
  <tr>
    <td class="tg-l6li">time_to_recurrence_days</td>
    <td class="tg-l6li">time in days from date of diagnosis to first recurrent event</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">adjuvant_therapy</td>
    <td class="tg-l6li">whether or not the participant received adjuvant therapy</td>
    <td class="tg-l6li">"No", "Yes"</td>
  </tr>
</tbody>
</table>

## HumanOvarianCancerVP

In the table below note the following shorthand:

* `[marker]` represents one of: `cd3`, `cd8`, `cd19`, `cd68`, `ck`, `dapi`, `ier3`, `ki67`, `pstat3` 
* `[cell region]` represents one of: cytoplasm, membrane, nucleus

**Table 2: data dictionary for HumanOvarianCancerVP**

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-l6li{border-color:inherit;font-size:10px;text-align:left;vertical-align:top}
.tg .tg-fymr{border-color:inherit;font-weight:bold;text-align:left;vertical-align:top}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
</style>
<table class="tg">
<thead>
  <tr>
    <th class="tg-fymr">Variable</th>
    <th class="tg-0pky"><span style="font-weight:bold">Slot</span></th>
    <th class="tg-fymr">Description</th>
    <th class="tg-fymr">Variable coding</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-l6li">[marker]</td>
    <td class="tg-l6li">assays: intensities</td>
    <td class="tg-l6li">mean total cell intensity for [marker]&nbsp;&nbsp;</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[marker]</td>
    <td class="tg-l6li">assays: nucleus_intensities</td>
    <td class="tg-l6li">mean nucleus intensity for [marker]</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[marker]</td>
    <td class="tg-l6li">assays: membrane_intensities</td>
    <td class="tg-l6li">mean membrane intensity for [marker]</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">sample_id</td>
    <td class="tg-l6li"></td>
    <td class="tg-l6li">image identifier, also subject id for the ovarian data</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">cell_id</td>
    <td class="tg-l6li" rowspan="12">colData<br><br><br><br><br><br><br><br></td>
    <td class="tg-l6li">cell identifier</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">slide_id</td>
    <td class="tg-l6li">slide identifier</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">tissue category</td>
    <td class="tg-l6li">type of tissue (indicates a region of the image)</td>
    <td class="tg-l6li">Stroma or Tumor</td>
  </tr>
  <tr>
    <td class="tg-l6li">[cell region]_[marker]_min</td>
    <td class="tg-l6li">min [cell region] intensity for [marker]</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[cell region]_[marker]_max</td>
    <td class="tg-l6li">max [cell region] intensity for [marker]</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[cell region]_[marker]_std_dev</td>
    <td class="tg-l6li"><span style="font-weight:400;font-style:normal"> [cell region] std dev of intensity for [marker]</span></td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[cell region]_[marker]_total</td>
    <td class="tg-l6li">total [cell region] intensity for [marker]</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[cell region]_area_square_microns</td>
    <td class="tg-l6li">[cell region] area in square microns</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[cell region]_compactness</td>
    <td class="tg-l6li">[cell region] compactness</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[cell region]_minor_axis</td>
    <td class="tg-l6li"><span style="font-weight:400;font-style:normal">[cell region] length of minor axis</span></td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[cell region]_major_axis</td>
    <td class="tg-l6li"><span style="font-weight:400;font-style:normal">[cell region] length of major axis</span></td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">[cell region]_axis_ratio</td>
    <td class="tg-l6li"><span style="font-weight:400;font-style:normal">[cell region] ratio of major and minor axis</span></td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">cell_x_position</td>
    <td class="tg-l6li" rowspan="2">spatialCoordsNames</td>
    <td class="tg-l6li">cell x coordinate</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">cell_y_position</td>
    <td class="tg-l6li">cell y coordinate</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">diagnosis</td>
    <td class="tg-l6li" rowspan="13">metadata</td>
    <td class="tg-l6li"></td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">primary</td>
    <td class="tg-l6li">primary tumor from initial diagnosis?</td>
    <td class="tg-l6li">0 = no, 1 = yes</td>
  </tr>
  <tr>
    <td class="tg-l6li">recurrent</td>
    <td class="tg-l6li">tumor from a recurrent event (not initial diagnosis tumor)? </td>
    <td class="tg-l6li">0 = no, 1 = yes</td>
  </tr>
  <tr>
    <td class="tg-l6li">treatment_effect</td>
    <td class="tg-l6li">was tumor treated with chemo prior to imaging?</td>
    <td class="tg-l6li">0 = no, 1 = yes</td>
  </tr>
  <tr>
    <td class="tg-l6li">stage</td>
    <td class="tg-l6li">stage of the cancer when image was collected</td>
    <td class="tg-l6li">I,II,II,IV</td>
  </tr>
  <tr>
    <td class="tg-l6li">grade</td>
    <td class="tg-l6li">grade of cancer severity (nearly all 3)</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">survival_time</td>
    <td class="tg-l6li">time in months from date of diagnosis to date of death or censoring event</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">death</td>
    <td class="tg-l6li">did the participant pass away?</td>
    <td class="tg-l6li">0 = no, 1 = yes</td>
  </tr>
  <tr>
    <td class="tg-l6li">BRCA_mutation</td>
    <td class="tg-l6li">does the participant have a BRCA mutation?</td>
    <td class="tg-l6li">0 = no, 1 = yes</td>
  </tr>
  <tr>
    <td class="tg-l6li">age_at_diagnosis</td>
    <td class="tg-l6li">age at diagnosis</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">time_to_recurrence</td>
    <td class="tg-l6li">time in months from date of diagnosis to first recurrent event</td>
    <td class="tg-l6li"></td>
  </tr>
  <tr>
    <td class="tg-l6li">parpi_inhibitor</td>
    <td class="tg-l6li">whether or not the participant received PARPi inhibitor</td>
    <td class="tg-l6li">N = no, Y = yes</td>
  </tr>
  <tr>
    <td class="tg-l6li">debulking</td>
    <td class="tg-l6li">subjective rating of how the tumor removal process went</td>
    <td class="tg-l6li">optimal, suboptimal, interval</td>
  </tr>
</tbody>
</table>



**Note**: the `debulking` variable described as `optimal` if surgeon believes tumor area was reduced to 1 cm or below; `suboptimal` if surgeon was unable to remove significant amount of tumor due to various reasons; `interval` if tumor removal came after three cycles of chemo 
     

# Session Info

```{r}
sessionInfo()
```

     
