Version: 1.1.1
Title: Adaptive Machine Learning-Powered, Context-Matching Tool for Single-Cell and Spatial Transcriptomics Annotation
Description: Annotates single-cell and spatial-transcriptomic (ST) data using context-matching marker datasets. It creates a unified marker list ('Markers_list') from multiple sources: built-in curated databases ('Cellmarker2', 'PanglaoDB', 'scIBD', 'TCellSI', 'PCTIT', 'PCTAM'), Seurat objects with cell labels, or user-provided Excel tables. SlimR first uses adaptive machine learning for parameter optimization and then offers two automated annotation approaches: 'cluster-based' and 'per-cell'. Cluster-based annotation assigns one label per cluster using expression-based probability calculation and AUC validation. Per-cell annotation assigns labels to individual cells using three scoring methods with adaptive thresholds and ratio-based confidence filtering, plus optional UMAP spatial smoothing, which makes it well suited to heterogeneous clusters and rare cell types. The package also supports semi-automated workflows with heatmaps, feature plots, and combined visualizations for manual annotation. For more details, see Kabacoff (2020, ISBN:9787115420572).
License: MIT + file LICENSE
URL: https://github.com/zhaoqing-wang/SlimR
BugReports: https://github.com/zhaoqing-wang/SlimR/issues
Depends: R (≥ 3.5)
Imports: cowplot, dplyr, ggplot2, patchwork, pheatmap, readxl, scales, Seurat, tidyr, tools, tibble
Suggests: crayon, RANN, testthat (≥ 3.0.0)
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.3
Date: 2026-02-05
NeedsCompilation: no
Packaged: 2026-02-05 15:49:32 UTC; Runaw
Author: Zhaoqing Wang [aut, cre]
Maintainer: Zhaoqing Wang <zhaoqingwang@mail.sdu.edu.cn>
Repository: CRAN
Date/Publication: 2026-02-05 16:20:14 UTC

Apply UMAP-based spatial smoothing to scores

Description

Apply UMAP-based spatial smoothing to scores

Usage

.apply_umap_smoothing(
  seurat_obj,
  score_matrix,
  umap_reduction,
  k_neighbors,
  smoothing_weight,
  chunk_size,
  verbose
)

Compute AUCell-like rank-based scores

Description

Uses a ranking approach similar to AUCell: for each cell, genes are ranked by expression, and the score depends on where the marker genes fall in that ranking. Because the score uses only within-cell ranks rather than absolute expression values, it is relatively robust to batch effects and technical variation.

Usage

.compute_aucell_scores(expr_matrix, marker_sets, top_percent = 0.05)

Details

Key improvement: scores are computed as the area under the recovery curve (AUC) rather than as a simple proportion, giving partial credit to markers ranked just outside the top threshold.
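The recovery-curve idea can be illustrated with a small base-R sketch. It assumes a genes x cells matrix with row and column names, and the function name `aucell_like_scores()` is hypothetical; it is not the package's internal `.compute_aucell_scores()`. The sketch follows the standard AUCell construction, in which only markers inside the top `top_percent` of ranks contribute, so it does not reproduce the partial credit described above.

```r
# Hypothetical sketch of an AUCell-style recovery-curve score (base R only).
# expr_matrix: genes x cells numeric matrix with row and column names.
# marker_sets: named list of character vectors of marker genes.
aucell_like_scores <- function(expr_matrix, marker_sets, top_percent = 0.05) {
  n_top <- max(1L, ceiling(top_percent * nrow(expr_matrix)))
  scores <- matrix(0, nrow = ncol(expr_matrix), ncol = length(marker_sets),
                   dimnames = list(colnames(expr_matrix), names(marker_sets)))
  for (j in seq_len(ncol(expr_matrix))) {
    # Rank genes within this cell: rank 1 = highest expression.
    gene_rank <- rank(-expr_matrix[, j], ties.method = "first")
    names(gene_rank) <- rownames(expr_matrix)
    for (k in seq_along(marker_sets)) {
      markers <- intersect(marker_sets[[k]], rownames(expr_matrix))
      if (length(markers) == 0) next
      r <- gene_rank[markers]
      r <- r[r <= n_top]
      # Area under the recovery curve over cutoffs 1..n_top: a marker at rank r
      # is counted at every cutoff from r to n_top, contributing n_top - r + 1.
      scores[j, k] <- sum(n_top - r + 1) / (n_top * length(markers))
    }
  }
  scores
}

expr <- matrix(0, nrow = 8, ncol = 2,
               dimnames = list(paste0("G", 1:8), c("cell1", "cell2")))
expr[c("G1", "G2"), "cell1"] <- c(10, 9)
expr[c("G3", "G4"), "cell2"] <- c(10, 9)
s <- aucell_like_scores(expr, list(A = c("G1", "G2"), B = c("G3", "G4")),
                        top_percent = 0.25)
print(round(s, 2))
```

Dividing by `n_top * length(markers)` keeps every score in [0, 1], so cell types with different numbers of markers stay comparable.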


Compute weighted scores for per-cell annotation

Description

This function uses an improved weighting scheme that considers:

  1. Expression level (log-normalized)

  2. Detection rate (binary: above min_expression threshold)

  3. Marker specificity (how unique the marker is to this cell type)

  4. Expression variability (CV-based: more variable genes are more discriminative)

Usage

.compute_weighted_scores(expr_matrix, marker_sets, min_expression)
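The four factors above could be combined in many ways, and the exact formula is not given here. The following base-R sketch is one possible reading, with two loud assumptions: specificity is taken as 1 / (number of marker sets that list the gene), and variability as the coefficient of variation across cells. The helper `weighted_scores_sketch()` is hypothetical and is not the package's `.compute_weighted_scores()`.

```r
# Hypothetical weighting sketch; NOT the package's internal formula.
# expr_matrix: genes x cells matrix of log-normalized expression.
weighted_scores_sketch <- function(expr_matrix, marker_sets, min_expression = 0.1) {
  genes <- rownames(expr_matrix)
  sets <- lapply(marker_sets, function(m) unique(intersect(m, genes)))
  # Specificity (assumed): 1 / number of marker sets that list the gene.
  n_sets <- table(unlist(sets))
  # Variability (assumed): coefficient of variation across cells.
  gene_mean <- rowMeans(expr_matrix)
  gene_sd <- apply(expr_matrix, 1, sd)
  cv <- ifelse(gene_mean > 0, gene_sd / gene_mean, 0)
  scores <- matrix(0, nrow = ncol(expr_matrix), ncol = length(sets),
                   dimnames = list(colnames(expr_matrix), names(sets)))
  for (k in seq_along(sets)) {
    m <- sets[[k]]
    if (length(m) == 0) next
    w <- (1 / as.numeric(n_sets[m])) * (1 + cv[m])   # one weight per marker
    x <- expr_matrix[m, , drop = FALSE]
    detected <- x > min_expression                   # binary detection per marker and cell
    # Weighted mean over markers; undetected markers contribute zero.
    scores[, k] <- colSums(x * detected * w) / sum(w)
  }
  scores
}

expr <- matrix(c(2, 1, 0,
                 0, 1, 2), nrow = 3,
               dimnames = list(c("G1", "G2", "G3"), c("cell1", "cell2")))
s <- weighted_scores_sketch(expr, list(A = c("G1", "G2"), B = c("G2", "G3")))
s
```

Here G2 is shared by both sets, so it is down-weighted, and each cell scores highest for the set whose specific marker it expresses.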

Cellmarker2 dataset

Description

A dataset containing marker genes for different cell types from Cellmarker2

Usage

Cellmarker2

Format

A data frame with 8 columns:

Details

This dataset is used to filter and create a standardized marker list. The dataset can be filtered based on species, tissue class, tissue type, cancer type, and cell type to generate a list of marker genes for specific cell types.
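To make the "standardized marker list" concrete, the sketch below filters a toy table and splits it into a named list with one element per cell type, whose first column holds the marker genes (the format that gene_list arguments expect). The column names (species, tissue_class, cell_name, marker) and the helper `build_marker_list()` are hypothetical and do not reflect the real Cellmarker2 columns or Markers_filter_Cellmarker2(); the counts column only mimics the idea behind min_counts.

```r
# Hypothetical sketch: build a SlimR-style marker list from a toy table.
# The column names below are illustrative, not the real Cellmarker2 columns.
toy_db <- data.frame(
  species      = c("Human", "Human", "Human", "Human", "Mouse"),
  tissue_class = c("Blood", "Blood", "Blood", "Blood", "Blood"),
  cell_name    = c("T cell", "T cell", "T cell", "B cell", "T cell"),
  marker       = c("CD3E", "CD3E", "CD3D", "MS4A1", "Cd3e"),
  stringsAsFactors = FALSE
)

build_marker_list <- function(db, species, tissue_class, min_counts = 1) {
  sub <- db[db$species == species & db$tissue_class == tissue_class, ]
  # One list element per cell type; the first column holds the marker genes.
  lapply(split(sub, sub$cell_name), function(d) {
    counts <- table(d$marker)   # how often each marker is recorded
    out <- data.frame(marker = names(counts), counts = as.integer(counts),
                      stringsAsFactors = FALSE)
    out <- out[out$counts >= min_counts, , drop = FALSE]
    out <- out[order(-out$counts), , drop = FALSE]
    rownames(out) <- NULL
    out
  })
}

ml <- build_marker_list(toy_db, species = "Human", tissue_class = "Blood")
ml[["T cell"]]
```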

Source

http://117.50.127.228/CellMarker/

See Also

Other Section_0_Database: Cellmarker2_raw, Cellmarker2_table, Markers_list_PCTAM, Markers_list_PCTIT, Markers_list_TCellSI, Markers_list_scIBD, PanglaoDB, PanglaoDB_raw, PanglaoDB_table


Cellmarker2 raw dataset

Description

A dataset containing marker genes for different cell types from Cellmarker2

Usage

Cellmarker2_raw

Format

A data frame with 20 columns contained in the Cellmarker2 database:

Details

This dataset is used to filter and create a standardized marker list. The dataset can be filtered based on species, tissue class, tissue type, cancer type, and cell type to generate a list of marker genes for specific cell types.

Source

http://117.50.127.228/CellMarker/

See Also

Other Section_0_Database: Cellmarker2, Cellmarker2_table, Markers_list_PCTAM, Markers_list_PCTIT, Markers_list_TCellSI, Markers_list_scIBD, PanglaoDB, PanglaoDB_raw, PanglaoDB_table


Cellmarker2 table

Description

A dataset containing marker genes for different cell types from Cellmarker2

Usage

Cellmarker2_table

Format

A list of the available values for species, tissue_class, tissue_type, cancer_type, and cell_type.

Details

This list is used to choose filter values when creating a standardized marker list.

Source

http://117.50.127.228/CellMarker/

See Also

Other Section_0_Database: Cellmarker2, Cellmarker2_raw, Markers_list_PCTAM, Markers_list_PCTIT, Markers_list_TCellSI, Markers_list_scIBD, PanglaoDB, PanglaoDB_raw, PanglaoDB_table


Annotate Seurat Object with SlimR Cell Type Predictions

Description

This function assigns SlimR predicted cell types to a Seurat object based on cluster annotations, and stores the results in the meta.data slot.

Usage

Celltype_Annotation(
  seurat_obj,
  cluster_col,
  SlimR_anno_result,
  plot_UMAP = TRUE,
  annotation_col = "Cell_type_SlimR"
)

Arguments

seurat_obj

A Seurat object containing cluster information in meta.data.

cluster_col

Character string indicating the column name in meta.data that contains cluster IDs.

SlimR_anno_result

List generated by Celltype_Calculate(), containing a data.frame in $Prediction_results with the columns: (1) cluster_col, the cluster identifiers (should match cluster_col in meta.data); (2) Predicted_cell_type, the predicted cell type for each cluster.

plot_UMAP

logical(1); if TRUE, plot the UMAP with cell type annotations.

annotation_col

Name of the meta.data column to which the predicted cell types are written. (default = "Cell_type_SlimR")

Value

A Seurat object with updated meta.data containing the predicted cell types.

Note

If plot_UMAP = TRUE, this function will print a UMAP plot as a side effect.
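Conceptually, the cluster-to-cell assignment is a lookup of each cell's cluster ID in $Prediction_results. A minimal base-R sketch, using a plain data frame in place of meta.data; `assign_cluster_labels()` is a hypothetical helper, not the package's implementation:

```r
# Hypothetical sketch: map cluster-level predictions onto individual cells.
assign_cluster_labels <- function(meta, cluster_col, prediction_results,
                                  annotation_col = "Cell_type_SlimR") {
  # Compare as character so factor levels and numeric IDs both match.
  idx <- match(as.character(meta[[cluster_col]]),
               as.character(prediction_results[[cluster_col]]))
  meta[[annotation_col]] <- prediction_results$Predicted_cell_type[idx]
  meta  # cells whose cluster has no prediction receive NA
}

meta <- data.frame(seurat_clusters = factor(c("0", "1", "0", "2")),
                   row.names = c("AAAC", "AAAG", "AACT", "AAGT"))
pred <- data.frame(seurat_clusters = c("0", "1"),
                   Predicted_cell_type = c("T cell", "B cell"),
                   stringsAsFactors = FALSE)
meta <- assign_cluster_labels(meta, "seurat_clusters", pred)
meta$Cell_type_SlimR
```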

See Also

Other Section_3_Automated_Annotation: Celltype_Annotation_PerCell(), Celltype_Calculate(), Celltype_Calculate_PerCell(), Celltype_Verification(), Celltype_Verification_PerCell(), Parameter_Calculate(), percell_workflow

Examples

## Not run: 
sce <- Celltype_Annotation(seurat_obj = sce,
    cluster_col = "seurat_clusters",
    SlimR_anno_result = SlimR_anno_result,
    plot_UMAP = TRUE,
    annotation_col = "Cell_type_SlimR"
    )
    
## End(Not run)


Uses "marker_list" to generate a combined plot for cell annotation

Description

Uses "marker_list" to generate a combined plot for cell annotation

Usage

Celltype_Annotation_Combined(
  seurat_obj,
  gene_list,
  species,
  cluster_col = "seurat_clusters",
  assay = "RNA",
  save_path = NULL,
  colour_low = "white",
  colour_high = "navy"
)

Arguments

seurat_obj

The Seurat object to be annotated, with a cluster annotation column such as "seurat_clusters" in meta.data.

gene_list

A named list of marker tables: each element name is a cell type, and the first column of each element contains its marker genes. Such lists can be generated with functions such as "Markers_filter_Cellmarker2()", "Markers_filter_PanglaoDB()", "read_excel_markers()", and "read_seurat_markers()".

species

Species of the dataset, "Human" or "Mouse", used to standardize the gene-name format of the markers in "gene_list".

cluster_col

The meta.data column of the Seurat object that contains the cluster annotations, such as "seurat_clusters". Default parameters use "cluster_col = 'seurat_clusters'".

assay

Enter the assay used by the Seurat object, such as "RNA". Default parameters use "assay = 'RNA'".

save_path

The output path of the cell annotation picture. Example parameters use "save_path = './SlimR/Celltype_annotation_Bar/'".

colour_low

Color for lowest expression level. (default = "white")

colour_high

Color for highest expression level. (default = "navy")

Value

The cell annotation picture is saved in "save_path".

See Also

Other Section_4_Semi_Automated_Annotation: Celltype_Annotation_Features(), Celltype_Annotation_Heatmap()

Examples

## Not run: 
Celltype_Annotation_Combined(seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    cluster_col = "seurat_clusters",
    assay = "RNA",
    save_path = file.path(tempdir(),"SlimR_Celltype_Annotation_Combined"),
    colour_low = "white",
    colour_high = "navy"
    )
    
## End(Not run)


Annotate cell types using features plot with different marker databases

Description

This function dynamically selects the appropriate annotation method based on the gene_list_type parameter. It supports marker databases from Cellmarker2, PanglaoDB, Seurat (via FindAllMarkers), or Excel files.

Usage

Celltype_Annotation_Features(
  seurat_obj,
  gene_list,
  gene_list_type = "Default",
  species = NULL,
  cluster_col = "seurat_clusters",
  assay = "RNA",
  save_path = NULL,
  min_counts = 1,
  metric_names = NULL,
  colour_low = "white",
  colour_high = "navy",
  colour_low_mertic = "white",
  colour_high_mertic = "navy",
  ...
)

Arguments

seurat_obj

A valid Seurat object with cluster annotations in meta.data.

gene_list

A list of data frames containing marker genes and metrics. The format depends on gene_list_type: for "Cellmarker2", generated by Markers_filter_Cellmarker2(); for "PanglaoDB", by Markers_filter_PanglaoDB(); for "Seurat", by read_seurat_markers(); for "Excel", by read_excel_markers().

gene_list_type

Type of marker database to use. Must be one of: "Cellmarker2", "PanglaoDB", "Seurat", or "Excel".

species

Species of the dataset: "Human" or "Mouse" for gene name standardization.

cluster_col

Column name in meta.data defining clusters (default: "seurat_clusters").

assay

Assay layer in the Seurat object (default: "RNA").

save_path

Directory to save output PNGs. Must be explicitly specified.

min_counts

Minimum number of counts for Cellmarker2 annotations (default: 1).

metric_names

Optional. Renames the rows of the input metrics; not recommended unless necessary. (NULL is used as default parameter; used in "Seurat"/"Excel").

colour_low

Color for lowest expression level. (default = "white")

colour_high

Color for highest expression level. (default = "navy")

colour_low_mertic

Color for lowest metric level. (default = "white")

colour_high_mertic

Color for highest metric level. (default = "navy")

...

Additional parameters passed to the specific annotation function.

Value

Saves cell type annotation PNGs in save_path. Returns invisibly.

See Also

Other Section_4_Semi_Automated_Annotation: Celltype_Annotation_Combined(), Celltype_Annotation_Heatmap()

Examples

## Not run: 
# Example for Cellmarker2
Celltype_Annotation_Features(seurat_obj = sce,
    gene_list = Markers_list_Cellmarker2,
    species = "Human",
    cluster_col = "seurat_clusters",
    assay = "RNA",
    save_path = file.path(tempdir(),"SlimR_Celltype_annotation_Cellmarker2"),
    colour_low = "white",
    colour_high = "navy",
    colour_low_mertic = "white",
    colour_high_mertic = "navy"
    )

# Example for PanglaoDB
Celltype_Annotation_Features(seurat_obj = sce,
    gene_list = Markers_list_panglaoDB,
    species = "Human",
    cluster_col = "seurat_clusters",
    assay = "RNA",
    save_path = file.path(tempdir(),"SlimR_Celltype_annotation_PanglaoDB"),
    colour_low = "white",
    colour_high = "navy",
    colour_low_mertic = "white",
    colour_high_mertic = "navy"
    )

# Example for Seurat marker list
Celltype_Annotation_Features(seurat_obj = sce,
    gene_list = Markers_list_Seurat,
    species = "Human",
    cluster_col = "seurat_clusters",
    assay = "RNA",
    save_path = file.path(tempdir(),"SlimR_Celltype_annotation_Seurat"),
    colour_low = "white",
    colour_high = "navy",
    colour_low_mertic = "white",
    colour_high_mertic = "navy"
    )

# Example for Excel marker list
Celltype_Annotation_Features(seurat_obj = sce,
    gene_list = Markers_list_Excel,
    species = "Human",
    cluster_col = "seurat_clusters",
    assay = "RNA",
    save_path = file.path(tempdir(),"SlimR_Celltype_annotation_Excel"),
    colour_low = "white",
    colour_high = "navy",
    colour_low_mertic = "white",
    colour_high_mertic = "navy"
    )

## End(Not run)


Uses "marker_list" to generate a heatmap for cell annotation

Description

Uses "marker_list" to generate a heatmap for cell annotation

Usage

Celltype_Annotation_Heatmap(
  seurat_obj,
  gene_list,
  species,
  cluster_col = "seurat_clusters",
  assay = "RNA",
  min_expression = 0.1,
  specificity_weight = 3,
  colour_low = "navy",
  colour_high = "firebrick3"
)

Arguments

seurat_obj

The Seurat object to be annotated, with a cluster annotation column such as "seurat_clusters" in meta.data.

gene_list

A named list of marker tables: each element name is a cell type, and the first column of each element contains its marker genes. Such lists can be generated with functions such as "Markers_filter_Cellmarker2()", "Markers_filter_PanglaoDB()", "read_excel_markers()", and "read_seurat_markers()".

species

Species of the dataset, "Human" or "Mouse", used to standardize the gene-name format of the markers in "gene_list".

cluster_col

The meta.data column of the Seurat object that contains the cluster annotations, such as "seurat_clusters". Default parameters use "cluster_col = 'seurat_clusters'".

assay

Enter the assay used by the Seurat object, such as "RNA". Default parameters use "assay = 'RNA'".

min_expression

The min_expression parameter defines a threshold value to determine whether a cell's expression of a feature is considered "expressed" or not. It is used to filter out low-expression cells that may contribute noise to the analysis. Default parameters use "min_expression = 0.1".

specificity_weight

The specificity_weight parameter controls how much the expression variability (standard deviation) of a feature within a cluster contributes to its "specificity score". It amplifies or suppresses the impact of variability in the final score calculation. Default parameters use "specificity_weight = 3".

colour_low

Color for lowest probability level in Heatmap visualization of probability matrix. (default = "navy")

colour_high

Color for highest probability level in Heatmap visualization of probability matrix. (default = "firebrick3")
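Whether a cell counts as expressing a marker depends on min_expression. The sketch below computes the two per-cluster summaries this threshold implies: mean expression, and the fraction of cells with expression above min_expression. It assumes a genes x cells matrix and one cluster label per cell; `cluster_summaries()` is hypothetical, and how specificity_weight enters the final probability is not modeled here.

```r
# Hypothetical sketch: per-cluster summaries for a set of marker genes.
# expr_matrix: genes x cells; clusters: one cluster ID per cell (column).
cluster_summaries <- function(expr_matrix, clusters, min_expression = 0.1) {
  groups <- split(seq_len(ncol(expr_matrix)), clusters)
  mean_expr <- sapply(groups, function(i)
    rowMeans(expr_matrix[, i, drop = FALSE]))
  # Fraction of the cluster's cells whose expression exceeds the threshold.
  pct_expr <- sapply(groups, function(i)
    rowMeans(expr_matrix[, i, drop = FALSE] > min_expression))
  list(mean_expression = mean_expr, detection_fraction = pct_expr)
}

expr <- matrix(c(0.0,  2.0, 0.5,
                 0.05, 1.0, 0.0,
                 1.2,  0.0, 0.3,
                 0.8,  0.0, 0.0), nrow = 3,
               dimnames = list(c("CD3E", "MS4A1", "LYZ"), paste0("c", 1:4)))
res <- cluster_summaries(expr, clusters = c("0", "0", "1", "1"))
res$detection_fraction
```

Note that the 0.05 value for CD3E in cell c2 falls below the default threshold, so it is not counted as detected.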

Value

A heatmap comparing the clusters in "cluster_col" of the Seurat object with the cell types in the marker set "gene_list".

See Also

Other Section_4_Semi_Automated_Annotation: Celltype_Annotation_Combined(), Celltype_Annotation_Features()

Examples

## Not run: 
Celltype_Annotation_Heatmap(seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    cluster_col = "seurat_clusters",
    assay = "RNA",
    min_expression = 0.1,
    specificity_weight = 3,
    colour_low = "navy",
    colour_high = "firebrick3"
    )
    
## End(Not run)


Annotate Seurat Object with Per-Cell SlimR Predictions

Description

This function assigns SlimR per-cell predicted cell types directly to individual cells in a Seurat object's meta.data slot.

Usage

Celltype_Annotation_PerCell(
  seurat_obj,
  SlimR_percell_result,
  plot_UMAP = TRUE,
  annotation_col = "Cell_type_PerCell_SlimR",
  plot_confidence = FALSE
)

Arguments

seurat_obj

A Seurat object.

SlimR_percell_result

List generated by Celltype_Calculate_PerCell() containing Cell_annotations data.frame with Cell_barcode and Predicted_cell_type columns.

plot_UMAP

Logical; if TRUE, plot the UMAP with cell type annotations. Default: TRUE.

annotation_col

Column name to write in meta.data. Default: "Cell_type_PerCell_SlimR".

plot_confidence

Logical; if TRUE, also plot a UMAP colored by confidence scores. Default: FALSE.

Value

A Seurat object with updated meta.data containing:

Note

If plot_UMAP = TRUE, this function will print UMAP plot(s) as a side effect.
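Per-cell labels are keyed by Cell_barcode. Matching on the barcode, rather than relying on row order, keeps each label attached to the right cell even when the two tables are ordered differently. A minimal base-R sketch with a plain data frame standing in for meta.data; `add_percell_labels()` is hypothetical:

```r
# Hypothetical sketch: write per-cell predictions into meta.data by barcode.
add_percell_labels <- function(meta, cell_annotations,
                               annotation_col = "Cell_type_PerCell_SlimR") {
  # Row names of meta.data are the cell barcodes.
  idx <- match(rownames(meta), cell_annotations$Cell_barcode)
  meta[[annotation_col]] <- cell_annotations$Predicted_cell_type[idx]
  meta
}

meta <- data.frame(nCount_RNA = c(1200, 950, 3100),
                   row.names = c("AAAC", "AAAG", "AACT"))
ann <- data.frame(Cell_barcode = c("AACT", "AAAC", "AAAG"),
                  Predicted_cell_type = c("B cell", "T cell", "Unassigned"),
                  stringsAsFactors = FALSE)
meta <- add_percell_labels(meta, ann)
meta$Cell_type_PerCell_SlimR
```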

See Also

Other Section_3_Automated_Annotation: Celltype_Annotation(), Celltype_Calculate(), Celltype_Calculate_PerCell(), Celltype_Verification(), Celltype_Verification_PerCell(), Parameter_Calculate(), percell_workflow

Examples

## Not run: 
# Run per-cell annotation
result <- Celltype_Calculate_PerCell(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human"
)

# Annotate Seurat object
sce <- Celltype_Annotation_PerCell(
    seurat_obj = sce,
    SlimR_percell_result = result,
    plot_UMAP = TRUE,
    annotation_col = "Cell_type_PerCell_SlimR"
)

## End(Not run)


Uses "marker_list" to calculate probabilities, prediction results, and AUC, and to generate a heatmap for cell annotation

Description

Uses "marker_list" to calculate probabilities, prediction results, and AUC, and to generate a heatmap for cell annotation

Usage

Celltype_Calculate(
  seurat_obj,
  gene_list,
  species,
  cluster_col = "seurat_clusters",
  assay = "RNA",
  min_expression = 0.1,
  specificity_weight = 3,
  threshold = 0.6,
  compute_AUC = TRUE,
  plot_AUC = TRUE,
  AUC_correction = FALSE,
  colour_low = "navy",
  colour_high = "firebrick3"
)

Arguments

seurat_obj

The Seurat object to be annotated, with a cluster annotation column such as "seurat_clusters" in meta.data.

gene_list

A named list of marker tables: each element name is a cell type, and the first column of each element contains its marker genes. Such lists can be generated with functions such as "Markers_filter_Cellmarker2()", "Markers_filter_PanglaoDB()", "read_excel_markers()", and "read_seurat_markers()".

species

Species of the dataset, "Human" or "Mouse", used to standardize the gene-name format of the markers in "gene_list".

cluster_col

The meta.data column of the Seurat object that contains the cluster annotations, such as "seurat_clusters". Default parameters use "cluster_col = 'seurat_clusters'".

assay

Enter the assay used by the Seurat object, such as "RNA". Default parameters use "assay = 'RNA'".

min_expression

The min_expression parameter defines a threshold value to determine whether a cell's expression of a feature is considered "expressed" or not. It is used to filter out low-expression cells that may contribute noise to the analysis. Default parameters use "min_expression = 0.1".

specificity_weight

The specificity_weight parameter controls how much the expression variability (standard deviation) of a feature within a cluster contributes to its "specificity score". It amplifies or suppresses the impact of variability in the final score calculation. Default parameters use "specificity_weight = 3".

threshold

Threshold on the normalized similarity between an "alternative cell type" and the "predicted cell type" in the returned results. (default = 0.6)

compute_AUC

Logical indicating whether to calculate AUC values for predicted cell types. AUC measures how well the marker genes distinguish the cluster from others. When TRUE, adds an AUC column to the prediction results. (default: TRUE)

plot_AUC

Logical indicating whether to draw an AUC curve for the predicted cell type. When TRUE, an AUC_plot is added to the result. (default: TRUE)

AUC_correction

Logical value controlling AUC-based correction. (default = FALSE) When set to TRUE, the function: (1) computes AUC values for the candidate cell types (probability > threshold); (2) selects the cell type with the highest AUC as the final predicted type; (3) records the selected type's AUC value in the "AUC" column.

colour_low

Color for lowest probability level in Heatmap visualization of probability matrix. (default = "navy")

colour_high

Color for highest probability level in Heatmap visualization of probability matrix. (default = "firebrick3")
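The threshold and AUC options can be pictured with two small helpers. This is a sketch under stated assumptions. A cluster's probabilities are normalized by dividing by their maximum, although the package's normalization is not documented here. AUC is the standard ROC AUC computed from the Mann-Whitney rank formula, using a per-cell marker score to separate cells in the cluster from all other cells. Both `candidate_types()` and `roc_auc()` are hypothetical names.

```r
# Hypothetical helpers; not SlimR's internal implementation.

# Candidate cell types for one cluster: normalized score above `threshold`.
candidate_types <- function(probabilities, threshold = 0.6) {
  normalized <- probabilities / max(probabilities)   # assumed normalization
  names(sort(normalized[normalized > threshold], decreasing = TRUE))
}

# ROC AUC of a per-cell score for "in cluster" (TRUE) vs "other" (FALSE),
# via the Mann-Whitney U statistic; tied scores receive average ranks.
roc_auc <- function(score, in_cluster) {
  r <- rank(score)
  n_pos <- sum(in_cluster)
  n_neg <- sum(!in_cluster)
  (sum(r[in_cluster]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

probs <- c(`T cell` = 0.90, `NK cell` = 0.63, `B cell` = 0.20)
cands <- candidate_types(probs)

in_cluster <- c(TRUE, TRUE, FALSE, FALSE)
auc <- c(`T cell`  = roc_auc(c(3, 2.5, 0.1, 0.2), in_cluster),
         `NK cell` = roc_auc(c(2, 0.3, 0.5, 0.4), in_cluster))
best <- names(which.max(auc))   # AUC_correction-style pick among candidates
```

An AUC of 1 means every cell in the cluster scores above every other cell; 0.5 means the marker score does not separate them.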

Value

A list containing:

See Also

Other Section_3_Automated_Annotation: Celltype_Annotation(), Celltype_Annotation_PerCell(), Celltype_Calculate_PerCell(), Celltype_Verification(), Celltype_Verification_PerCell(), Parameter_Calculate(), percell_workflow

Examples

## Not run: 
SlimR_anno_result <- Celltype_Calculate(seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    cluster_col = "seurat_clusters",
    assay = "RNA",
    min_expression = 0.1,
    specificity_weight = 3,
    threshold = 0.6,
    compute_AUC = TRUE,
    plot_AUC = TRUE,
    AUC_correction = FALSE,
    colour_low = "navy",
    colour_high = "firebrick3"
    )
    
## End(Not run)


Per-cell annotation using marker expression and optional UMAP spatial smoothing

Description

Unlike cluster-based annotation, this function assigns cell type labels to each individual cell based on marker gene expression profiles. Optionally uses UMAP coordinates to smooth predictions via k-nearest neighbor voting.

Usage

Celltype_Calculate_PerCell(
  seurat_obj,
  gene_list,
  species,
  assay = "RNA",
  method = c("weighted", "mean", "AUCell"),
  min_expression = 0.1,
  use_umap_smoothing = FALSE,
  umap_reduction = "umap",
  k_neighbors = 15,
  smoothing_weight = 0.3,
  min_score = "auto",
  min_confidence = 1.2,
  return_scores = FALSE,
  ncores = 1,
  chunk_size = 5000,
  verbose = TRUE
)

Arguments

seurat_obj

Seurat object with normalized expression data.

gene_list

A standardized marker list (same format as Celltype_Calculate).

species

"Human" or "Mouse" for gene name formatting.

assay

Assay to use (default: "RNA").

method

Scoring method: "AUCell" (rank-based), "mean" (average expression), or "weighted" (expression weighted by detection). Default: "weighted".

min_expression

Minimum expression threshold for detection. Default: 0.1.

use_umap_smoothing

Logical. If TRUE, apply k-NN smoothing using UMAP coordinates to improve annotation consistency. Default: FALSE.

umap_reduction

Name of UMAP reduction in Seurat object. Default: "umap".

k_neighbors

Number of neighbors for UMAP smoothing. Default: 15.

smoothing_weight

Weight for neighbor votes vs cell's own score (0-1). Higher values give more weight to neighbors. Default: 0.3.

min_score

Minimum score threshold to assign a cell type. Cells below this threshold are labeled "Unassigned". Default: "auto" which adaptively sets the threshold based on number of cell types (1.5 / n_celltypes). Set to a numeric value (e.g., 0.1) to use a fixed threshold.

min_confidence

Minimum confidence threshold. Cells with confidence below this value are labeled "Unassigned". Confidence is calculated as the ratio of max score to second-highest score. Default: 1.2 (max must be 20% higher than second). Set to 1.0 to disable confidence filtering.

return_scores

If TRUE, return full score matrix. Default: FALSE.

ncores

Number of cores for parallel processing. Default: 1.

chunk_size

Number of cells to process per chunk (memory optimization). Default: 5000.

verbose

Print progress messages. Default: TRUE.
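The min_score and min_confidence rules described above can be written out directly. In this sketch `assign_labels()` is a hypothetical helper operating on a cells x cell types score matrix. It assumes non-negative scores and treats a zero runner-up score as infinite confidence, an edge case the documentation does not specify.

```r
# Sketch of the documented assignment rule (helper name is hypothetical).
# score_matrix: cells x cell types, non-negative scores.
assign_labels <- function(score_matrix, min_score = "auto", min_confidence = 1.2) {
  if (identical(min_score, "auto")) min_score <- 1.5 / ncol(score_matrix)
  labels <- character(nrow(score_matrix))
  confidence <- numeric(nrow(score_matrix))
  for (i in seq_len(nrow(score_matrix))) {
    s <- sort(score_matrix[i, ], decreasing = TRUE)
    second <- if (length(s) > 1) s[[2]] else 0
    # Confidence = best score / second-best score (Inf if the runner-up is 0).
    confidence[i] <- if (second > 0) s[[1]] / second else Inf
    labels[i] <- if (s[[1]] >= min_score && confidence[i] >= min_confidence) {
      names(s)[1]
    } else {
      "Unassigned"
    }
  }
  data.frame(Predicted_cell_type = labels, Confidence = confidence,
             row.names = rownames(score_matrix), stringsAsFactors = FALSE)
}

scores <- matrix(c(0.9, 0.20, 0.1,
                   0.6, 0.55, 0.1,
                   0.2, 0.10, 0.1), nrow = 3, byrow = TRUE,
                 dimnames = list(c("c1", "c2", "c3"),
                                 c("T cell", "B cell", "NK cell")))
res <- assign_labels(scores)
res
```

With three cell types the automatic threshold is 1.5 / 3 = 0.5. Cell c2 clears it but fails the 1.2 ratio (0.6 / 0.55 is about 1.09), while c3 fails the score threshold.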

Details

Scoring Methods

"weighted" (recommended): Combines normalized expression with detection rate. For each cell and cell type: score = mean(expr_i * weight_i) where weight_i is derived from the marker's specificity across the dataset.

"mean": Simple average of normalized marker expression. Fast but less discriminative for overlapping marker sets.

"AUCell": Rank-based scoring similar to AUCell package. For each cell, genes are ranked by expression, and the score is the proportion of marker genes in the top X% of expressed genes. Robust to technical variation.

UMAP Smoothing

When use_umap_smoothing = TRUE, the function:

  1. Computes initial per-cell scores

  2. Finds k nearest neighbors in UMAP space for each cell

  3. Smooths scores by weighted averaging with neighbors

  4. Re-assigns cell types based on smoothed scores

This helps reduce noise and improve consistency of annotations within spatially coherent regions.
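The four smoothing steps can be sketched in base R under two assumptions. First, the smoothed score is (1 - smoothing_weight) * own score + smoothing_weight * mean of the k neighbors' scores. Second, neighbors are found with a full distance matrix, which is only practical for small data; a fast nearest-neighbor search, such as the RANN package listed under Suggests, would be needed at scale. `smooth_scores_knn()` is hypothetical and is not the internal .apply_umap_smoothing().

```r
# Hypothetical sketch of k-NN score smoothing in UMAP space (base R only).
# umap: cells x 2 coordinates; score_matrix: cells x cell types, same row order.
smooth_scores_knn <- function(umap, score_matrix, k_neighbors = 15,
                              smoothing_weight = 0.3) {
  d <- as.matrix(dist(umap))          # O(n^2) memory: fine only for small data
  diag(d) <- Inf                      # a cell is not its own neighbor
  k <- min(k_neighbors, nrow(umap) - 1)
  smoothed <- score_matrix
  for (i in seq_len(nrow(umap))) {
    nn <- order(d[i, ])[seq_len(k)]   # indices of the k nearest cells
    neighbor_mean <- colMeans(score_matrix[nn, , drop = FALSE])
    smoothed[i, ] <- (1 - smoothing_weight) * score_matrix[i, ] +
      smoothing_weight * neighbor_mean
  }
  smoothed
}

umap <- matrix(c(0, 0,  0.1, 0,  0, 0.1,  10, 10), ncol = 2, byrow = TRUE)
scores <- matrix(c(0.8, 0.2,  0.7, 0.3,  0.4, 0.6,  0.1, 0.9),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(NULL, c("T cell", "B cell")))
sm <- smooth_scores_knn(umap, scores, k_neighbors = 2, smoothing_weight = 0.5)
# Step 4: re-assign each cell to its highest smoothed score.
labels <- colnames(sm)[max.col(sm, ties.method = "first")]
labels
```

In this toy input the third cell initially favors "B cell", but its two close neighbors favor "T cell", so smoothing flips its label; the distant fourth cell keeps "B cell".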

Value

A list containing:

See Also

Other Section_3_Automated_Annotation: Celltype_Annotation(), Celltype_Annotation_PerCell(), Celltype_Calculate(), Celltype_Verification(), Celltype_Verification_PerCell(), Parameter_Calculate(), percell_workflow

Examples

## Not run: 
# Basic per-cell annotation
result <- Celltype_Calculate_PerCell(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    method = "weighted"
)

# Add annotations to Seurat object, matching cells by barcode
sce$Cell_type_PerCell <- result$Cell_annotations$Predicted_cell_type[
    match(colnames(sce), result$Cell_annotations$Cell_barcode)
]

# With UMAP smoothing for more consistent annotations
result_smooth <- Celltype_Calculate_PerCell(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    use_umap_smoothing = TRUE,
    k_neighbors = 20,
    smoothing_weight = 0.3
)

## End(Not run)


Perform cell type verification and generate the validation dotplot

Description

This function verifies predicted cell types by selecting genes with high log2FC and high expression proportion, and generates the validation dotplot.

Usage

Celltype_Verification(
  seurat_obj,
  SlimR_anno_result,
  assay = "RNA",
  gene_number = 5,
  colour_low = "white",
  colour_high = "navy",
  annotation_col = "Cell_type_SlimR"
)

Arguments

seurat_obj

A Seurat object containing single-cell data.

SlimR_anno_result

A list containing SlimR annotation results with Expression_list (a list of expression matrices for each cell type) and Prediction_results (a data frame with cluster annotations).

assay

Enter the assay used by the Seurat object, such as "RNA". Default parameters use "assay = 'RNA'".

gene_number

Integer specifying number of top genes to select per cell type.

colour_low

Color for lowest expression level. (default = "white")

colour_high

Color for highest expression level. (default = "navy")

annotation_col

Character string specifying the column in meta.data to use for grouping.

Value

A ggplot object showing expression of top variable genes.
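One possible selection rule for the dotplot genes is sketched below. It keeps genes expressed (value > 0) in at least min_pct of a group's cells, then ranks them by log2 fold change of mean expression against all other cells. The min_pct and pseudocount values and the helper `top_genes_per_group()` are assumptions for illustration, not the package's exact criteria. The sketch needs at least two groups.

```r
# Hypothetical sketch of marker selection for a verification dot plot.
# expr_matrix: genes x cells (log-normalized); groups: one label per cell.
top_genes_per_group <- function(expr_matrix, groups, gene_number = 5,
                                min_pct = 0.25, pseudocount = 1) {
  lapply(split(seq_len(ncol(expr_matrix)), groups), function(i) {
    inside <- expr_matrix[, i, drop = FALSE]
    outside <- expr_matrix[, -i, drop = FALSE]
    # log2 fold change of mean expression, group vs all other cells
    lfc <- log2((rowMeans(inside) + pseudocount) /
                (rowMeans(outside) + pseudocount))
    pct <- rowMeans(inside > 0)   # fraction of group cells expressing the gene
    keep <- names(lfc)[pct >= min_pct]
    head(keep[order(lfc[keep], decreasing = TRUE)], gene_number)
  })
}

expr <- matrix(c(2, 0, 1, 0,
                 3, 0, 1, 0,
                 0, 2, 1, 1,
                 0, 2, 1, 2), nrow = 4,
               dimnames = list(c("CD3E", "MS4A1", "ACTB", "CD79A"),
                               paste0("c", 1:4)))
res <- top_genes_per_group(expr, groups = c("T", "T", "B", "B"), gene_number = 2)
res
```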

See Also

Other Section_3_Automated_Annotation: Celltype_Annotation(), Celltype_Annotation_PerCell(), Celltype_Calculate(), Celltype_Calculate_PerCell(), Celltype_Verification_PerCell(), Parameter_Calculate(), percell_workflow

Examples

## Not run: 
Celltype_Verification(seurat_obj = sce,
    SlimR_anno_result = SlimR_anno_result,
    assay = "RNA",
    gene_number = 5,
    colour_low = "white",
    colour_high = "navy",
    annotation_col = "Cell_type_SlimR"
    )
    
## End(Not run)


Verify per-cell annotations with marker expression dotplot

Description

This function verifies per-cell SlimR annotations by generating a dotplot showing marker gene expression across predicted cell types.

Usage

Celltype_Verification_PerCell(
  seurat_obj,
  SlimR_percell_result,
  assay = "RNA",
  gene_number = 5,
  colour_low = "white",
  colour_high = "navy",
  annotation_col = "Cell_type_PerCell_SlimR",
  min_cells = 10
)

Arguments

seurat_obj

A Seurat object with per-cell annotations.

SlimR_percell_result

A list from Celltype_Calculate_PerCell() containing Expression_list with marker genes per cell type.

assay

Assay to use. Default: "RNA".

gene_number

Number of top genes to show per cell type. Default: 5.

colour_low

Color for lowest expression. Default: "white".

colour_high

Color for highest expression. Default: "navy".

annotation_col

Column in meta.data with cell type annotations. Default: "Cell_type_PerCell_SlimR".

min_cells

Minimum number of cells required for a cell type to be included in the plot. Default: 10.

Value

A ggplot object showing marker gene expression dotplot.

See Also

Other Section_3_Automated_Annotation: Celltype_Annotation(), Celltype_Annotation_PerCell(), Celltype_Calculate(), Celltype_Calculate_PerCell(), Celltype_Verification(), Parameter_Calculate(), percell_workflow

Examples

## Not run: 
# After running Celltype_Calculate_PerCell and Celltype_Annotation_PerCell
dotplot <- Celltype_Verification_PerCell(
    seurat_obj = sce,
    SlimR_percell_result = result,
    gene_number = 5,
    annotation_col = "Cell_type_PerCell_SlimR"
)
print(dotplot)

## End(Not run)


Uses "marker_list" from Cellmarker2 for cell annotation

Description

Uses "marker_list" from Cellmarker2 for cell annotation

Usage

Celltype_annotation_Cellmarker2(
  seurat_obj,
  gene_list,
  species,
  cluster_col = "seurat_clusters",
  assay = "RNA",
  save_path = NULL,
  min_counts = 1,
  colour_low = "white",
  colour_high = "navy",
  colour_low_mertic = "white",
  colour_high_mertic = "navy"
)

Arguments

seurat_obj

The Seurat object to be annotated, with a cluster annotation column such as "seurat_clusters" in meta.data.

gene_list

The standard SlimR "Marker_list" built from the Cellmarker2 database by the "Markers_filter_Cellmarker2()" function.

species

Species of the dataset, "Human" or "Mouse", used to standardize the gene-name format of the markers in "gene_list".

cluster_col

The meta.data column of the Seurat object that contains the cluster annotations, such as "seurat_clusters". Default parameters use "cluster_col = 'seurat_clusters'".

assay

Enter the assay used by the Seurat object, such as "RNA". Default parameters use "assay = 'RNA'".

save_path

The output path of the cell annotation picture. Example parameters use "save_path = './SlimR/Celltype_annotation_Cellmarker2/'".

min_counts

The minimum count required for a gene in "gene_list", where the count is the number of times the same gene is recorded for this cell type in the same species and tissue location in the Cellmarker2 database. Default parameters use "min_counts = 1".

colour_low

Color for lowest expression level. (default = "white")

colour_high

Color for highest expression level. (default = "navy")

colour_low_mertic

Color for lowest metric level. (default = "white")

colour_high_mertic

Color for highest metric level. (default = "navy")

Value

The cell annotation picture is saved in "save_path".

See Also

Other Section_5_Other_Functions_Provided: Celltype_annotation_Excel(), Celltype_annotation_PanglaoDB(), Celltype_annotation_Seurat()

Examples

## Not run: 
Celltype_annotation_Cellmarker2(seurat_obj = sce,
    gene_list = Markers_list_Cellmarker2,
    species = "Human",
    cluster_col = "seurat_clusters",
    assay = "RNA",
    save_path = file.path(tempdir(),"SlimR_Celltype_annotation_Cellmarker2"),
    colour_low = "white",
    colour_high = "navy",
    colour_low_mertic = "white",
    colour_high_mertic = "navy"
    )
    
## End(Not run)


Uses "marker_list" from Excel input for cell annotation

Description

Uses "marker_list" from Excel input for cell annotation

Usage

Celltype_annotation_Excel(
  seurat_obj,
  gene_list,
  species,
  cluster_col = "seurat_clusters",
  assay = "RNA",
  save_path = NULL,
  metric_names = NULL,
  colour_low = "white",
  colour_high = "navy",
  colour_low_mertic = "white",
  colour_high_mertic = "navy"
)

Arguments

seurat_obj

The Seurat object to be annotated, with a cluster annotation column such as "seurat_clusters" in meta.data.

gene_list

The standard SlimR "Marker_list" built from Excel files by the "read_excel_markers()" function.

species

Species of the dataset, "Human" or "Mouse", used to standardize the gene-name format of the markers in "gene_list".

cluster_col

The meta.data column of the Seurat object that contains the cluster annotations, such as "seurat_clusters". Default parameters use "cluster_col = 'seurat_clusters'".

assay

Enter the assay used by the Seurat object, such as "RNA". Default parameters use "assay = 'RNA'".

save_path

The output path of the cell annotation picture. Example parameters use "save_path = './SlimR/Celltype_annotation_Excel/'".

metric_names

Renames the rows of the input metrics; not recommended unless necessary. (NULL is used as default parameter)

colour_low

Color for lowest expression level. (default = "white")

colour_high

Color for highest expression level. (default = "navy")

colour_low_mertic

Color for lowest metric level. (default = "white")

colour_high_mertic

Color for highest metric level. (default = "navy")

Value

The cell annotation plots are saved to "save_path".

See Also

Other Section_5_Other_Functions_Provided: Celltype_annotation_Cellmarker2(), Celltype_annotation_PanglaoDB(), Celltype_annotation_Seurat()

Examples

## Not run: 
Celltype_annotation_Excel(seurat_obj = sce,
    gene_list = Markers_list_Excel,
    species = "Human",
    cluster_col = "seurat_clusters",
    assay = "RNA",
    save_path = file.path(tempdir(),"SlimR_Celltype_annotation_Excel"),
    colour_low = "white",
    colour_high = "navy",
    colour_low_mertic = "white",
    colour_high_mertic = "navy"
    )
    
## End(Not run)


Uses "marker_list" from PanglaoDB for cell annotation

Description

Uses "marker_list" from PanglaoDB for cell annotation

Usage

Celltype_annotation_PanglaoDB(
  seurat_obj,
  gene_list,
  species,
  cluster_col = "seurat_clusters",
  assay = "RNA",
  save_path = NULL,
  metric_names = NULL,
  colour_low = "white",
  colour_high = "navy",
  colour_low_mertic = "white",
  colour_high_mertic = "navy"
)

Arguments

seurat_obj

The Seurat object to be annotated. Its meta.data must contain a cluster column such as "seurat_clusters".

gene_list

The standard SlimR "Marker_list" built from the PanglaoDB database with the "Markers_filter_PanglaoDB()" function.

species

Species of the markers, either "Human" or "Mouse"; used to standardize the gene-symbol format of the markers in "Marker_list".

cluster_col

The meta.data column of the Seurat object that holds the clusters to be annotated, such as "seurat_clusters". Default parameters use "cluster_col = 'seurat_clusters'".

assay

Enter the assay used by the Seurat object, such as "RNA". Default parameters use "assay = 'RNA'".

save_path

Directory where the cell annotation plots are saved. Example parameters use "save_path = './SlimR/Celltype_annotation_PanglaoDB/'".

metric_names

Warning: leave this parameter unset. It is used internally to check whether "Marker_list" matches the PanglaoDB output format.

colour_low

Color for lowest expression level. (default = "white")

colour_high

Color for highest expression level. (default = "navy")

colour_low_mertic

Color for lowest metric level. (default = "white")

colour_high_mertic

Color for highest metric level. (default = "navy")

Value

The cell annotation plots are saved to "save_path".

See Also

Other Section_5_Other_Functions_Provided: Celltype_annotation_Cellmarker2(), Celltype_annotation_Excel(), Celltype_annotation_Seurat()

Examples

## Not run: 
Celltype_annotation_PanglaoDB(seurat_obj = sce,
    gene_list = Markers_list_panglaoDB,
    species = "Human",
    cluster_col = "seurat_clusters",
    assay = "RNA",
    save_path = file.path(tempdir(),"SlimR_Celltype_annotation_PanglaoDB"),
    colour_low = "white",
    colour_high = "navy",
    colour_low_mertic = "white",
    colour_high_mertic = "navy"
    )
    
## End(Not run)


Uses "marker_list" from Seurat object for cell annotation

Description

Uses "marker_list" from Seurat object for cell annotation

Usage

Celltype_annotation_Seurat(
  seurat_obj,
  gene_list,
  species,
  cluster_col = "seurat_clusters",
  assay = "RNA",
  save_path = NULL,
  metric_names = NULL,
  colour_low = "white",
  colour_high = "navy",
  colour_low_mertic = "white",
  colour_high_mertic = "navy"
)

Arguments

seurat_obj

The Seurat object to be annotated. Its meta.data must contain a cluster column such as "seurat_clusters".

gene_list

The standard SlimR "Marker_list" built from a Seurat object with the "Read_seurat_markers()" function.

species

Species of the markers, either "Human" or "Mouse"; used to standardize the gene-symbol format of the markers in "Marker_list".

cluster_col

The meta.data column of the Seurat object that holds the clusters to be annotated, such as "seurat_clusters". Default parameters use "cluster_col = 'seurat_clusters'".

assay

Enter the assay used by the Seurat object, such as "RNA". Default parameters use "assay = 'RNA'".

save_path

Directory where the cell annotation plots are saved. Example parameters use "save_path = './SlimR/Celltype_annotation_Seurat/'".

metric_names

Row names to use for the input metrics; changing them is not recommended unless necessary. (default = NULL)

colour_low

Color for lowest expression level. (default = "white")

colour_high

Color for highest expression level. (default = "navy")

colour_low_mertic

Color for lowest metric level. (default = "white")

colour_high_mertic

Color for highest metric level. (default = "navy")

Value

The cell annotation plots are saved to "save_path".

See Also

Other Section_5_Other_Functions_Provided: Celltype_annotation_Cellmarker2(), Celltype_annotation_Excel(), Celltype_annotation_PanglaoDB()

Examples

## Not run: 
Celltype_annotation_Seurat(seurat_obj = sce,
    gene_list = Markers_list_Seurat,
    species = "Human",
    cluster_col = "seurat_clusters",
    assay = "RNA",
    save_path = file.path(tempdir(),"SlimR_Celltype_annotation_Seurat"),
    colour_low = "white",
    colour_high = "navy",
    colour_low_mertic = "white",
    colour_high_mertic = "navy"
    )
    
## End(Not run)


Create Marker_list from the Cellmarker2 database

Description

Create Marker_list from the Cellmarker2 database

Usage

Markers_filter_Cellmarker2(
  df,
  species = NULL,
  tissue_class = NULL,
  tissue_type = NULL,
  cancer_type = NULL,
  cell_type = NULL
)

Arguments

df

The standardized Cellmarker2 database, loaded in SlimR with data(Cellmarker2).

species

Species information in the Cellmarker2 database, typically "Human" or "Mouse". Available values can be retrieved from "Cellmarker2_table". For more information, see the official CellMarker 2.0 website: http://117.50.127.228/CellMarker/.

tissue_class

Tissue class in the Cellmarker2 database. Available values can be retrieved from "Cellmarker2_table". For more information, see the official CellMarker 2.0 website: http://117.50.127.228/CellMarker/.

tissue_type

Tissue type in the Cellmarker2 database. Available values can be retrieved from "Cellmarker2_table". For more information, see the official CellMarker 2.0 website: http://117.50.127.228/CellMarker/.

cancer_type

Cancer type in the Cellmarker2 database. Available values can be retrieved from "Cellmarker2_table". For more information, see the official CellMarker 2.0 website: http://117.50.127.228/CellMarker/.

cell_type

Cell type in the Cellmarker2 database. Available values can be retrieved from "Cellmarker2_table". For more information, see the official CellMarker 2.0 website: http://117.50.127.228/CellMarker/.
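Each filter argument defaults to NULL. A minimal base-R sketch of one plausible reading of these semantics follows: a NULL filter matches every row, and the surviving rows are split into a named list with one marker table per cell type. The helper name filter_markers_sketch() and the column names species, tissue_class, cell_name and marker are hypothetical; they are not the real Cellmarker2 schema or SlimR's implementation.

```r
# Hypothetical sketch: the column names (species, tissue_class, cell_name,
# marker) are assumptions, not the real Cellmarker2 schema.
filter_markers_sketch <- function(df, species = NULL, tissue_class = NULL) {
  keep <- rep(TRUE, nrow(df))
  # A NULL filter is skipped, so it matches every row
  if (!is.null(species))      keep <- keep & df$species %in% species
  if (!is.null(tissue_class)) keep <- keep & df$tissue_class %in% tissue_class
  sub <- df[keep, , drop = FALSE]
  # One data frame of unique markers per cell type, named by cell type
  lapply(split(sub$marker, sub$cell_name), function(g) {
    data.frame(marker = unique(g), stringsAsFactors = FALSE)
  })
}

toy <- data.frame(
  species      = c("Human", "Human", "Mouse", "Human"),
  tissue_class = c("Intestine", "Intestine", "Intestine", "Blood"),
  cell_name    = c("Enterocyte", "Enterocyte", "Enterocyte", "T cell"),
  marker       = c("VIL1", "FABP2", "Vil1", "CD3E"),
  stringsAsFactors = FALSE
)
res <- filter_markers_sketch(toy, species = "Human", tissue_class = "Intestine")
str(res)
```

Leaving every filter at NULL keeps the whole table, which is why the example above only sets species and tissue_class.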

Value

The standardized "Marker_list" in the SlimR package

See Also

Other Section_2_Standardized_Markers_List: Markers_filter_PanglaoDB(), Read_excel_markers(), Read_seurat_markers()

Examples

Cellmarker2 <- SlimR::Cellmarker2
Markers_list_Cellmarker2 <- Markers_filter_Cellmarker2(
    Cellmarker2,
    species = "Human",
    tissue_class = "Intestine",
    tissue_type = NULL,
    cancer_type = NULL,
    cell_type = NULL
    )


Create Marker_list from the PanglaoDB database

Description

Create Marker_list from the PanglaoDB database

Usage

Markers_filter_PanglaoDB(df, species_input, organ_input)

Arguments

df

The standardized PanglaoDB database, loaded in SlimR with data(PanglaoDB).

species_input

Species information in the PanglaoDB database, typically "Human" or "Mouse". Available values can be retrieved from "PanglaoDB_table". For more information, see the official PanglaoDB website: https://panglaodb.se/.

organ_input

Organ type in the PanglaoDB database. Available values can be retrieved from "PanglaoDB_table". For more information, see the official PanglaoDB website: https://panglaodb.se/.

Value

The standardized "Marker_list" in the SlimR package

See Also

Other Section_2_Standardized_Markers_List: Markers_filter_Cellmarker2(), Read_excel_markers(), Read_seurat_markers()

Examples

PanglaoDB <- SlimR::PanglaoDB
Markers_list_panglaoDB <- Markers_filter_PanglaoDB(
    PanglaoDB,
    species_input = 'Human',
    organ_input = 'GI tract'
    )


List of Macrophage subtype markers in the article "Macrophage diversity in cancer revisited in the era of single-cell omics"

Description

A dataset containing marker genes for different Macrophage subtypes from the article "Macrophage diversity in cancer revisited in the era of single-cell omics"

Usage

Markers_list_PCTAM

Format

A list with 7 tables.

Details

This list contains markers for 7 types of tumor-associated macrophages (TAMs) obtained from the article "Macrophage diversity in cancer revisited in the era of single-cell omics". The data source is "https://doi.org/10.1016/j.it.2022.04.008", and the reference literature is: Ruo-Yu Ma et al. (2022) doi:10.1016/j.it.2022.04.008.

Source

doi:10.1016/j.it.2022.04.008

See Also

Other Section_0_Database: Cellmarker2, Cellmarker2_raw, Cellmarker2_table, Markers_list_PCTIT, Markers_list_TCellSI, Markers_list_scIBD, PanglaoDB, PanglaoDB_raw, PanglaoDB_table


List of T cell subtype markers in the article "Pan-cancer single cell landscape of tumor-infiltrating T cells"

Description

A dataset containing marker genes for different T cell types from the article "Pan-cancer single cell landscape of tumor-infiltrating T cells"

Usage

Markers_list_PCTIT

Format

A list with 40 tables.

Details

This list contains markers for 40 types of pan-cancer tumor-infiltrating T cells (PCTIT) obtained from the article "Pan-cancer single cell landscape of tumor-infiltrating T cells". The data source is "https://doi.org/10.1126/science.abe6474", and the reference literature is: L. Zheng et al. (2021) doi:10.1126/science.abe6474.

Source

doi:10.1126/science.abe6474

See Also

Other Section_0_Database: Cellmarker2, Cellmarker2_raw, Cellmarker2_table, Markers_list_PCTAM, Markers_list_TCellSI, Markers_list_scIBD, PanglaoDB, PanglaoDB_raw, PanglaoDB_table


List of T cell subtype markers from TCellSI

Description

A dataset containing marker genes for different T cell subtypes from TCellSI

Usage

Markers_list_TCellSI

Format

A list with ten tables.

Details

This list contains markers for 10 types of T cells obtained from TCellSI. The data source is "https://github.com/GuoBioinfoLab/TCellSI/blob/main/data/markers.rda", and the reference literature is: Yang et al. (2024) doi:10.1002/imt2.231.

Source

https://github.com/GuoBioinfoLab/TCellSI/

See Also

Other Section_0_Database: Cellmarker2, Cellmarker2_raw, Cellmarker2_table, Markers_list_PCTAM, Markers_list_PCTIT, Markers_list_scIBD, PanglaoDB, PanglaoDB_raw, PanglaoDB_table


List of cell type markers from scIBD

Description

A dataset containing marker genes for different human intestine cell types from scIBD

Usage

Markers_list_scIBD

Format

A list with 101 tables.

Details

This list contains markers for 101 human intestinal cell types obtained from scIBD. The article DOI source is "https://doi.org/10.1038/s43588-023-00464-9", and the reference literature is: Nie et al. (2023) doi:10.1038/s43588-023-00464-9. Note: 'Markers_list_scIBD' was generated following section 2.5.2 with the parameters 'sort_by = "logFC"' and 'gene_filter = 20'.

Source

doi:10.1038/s43588-023-00464-9

See Also

Other Section_0_Database: Cellmarker2, Cellmarker2_raw, Cellmarker2_table, Markers_list_PCTAM, Markers_list_PCTIT, Markers_list_TCellSI, PanglaoDB, PanglaoDB_raw, PanglaoDB_table


PanglaoDB dataset

Description

A dataset containing marker genes for different cell types from PanglaoDB

Usage

PanglaoDB

Format

A data frame with 9 columns:

Details

This dataset is used to filter and create a standardized marker list.

Source

https://panglaodb.se/

See Also

Other Section_0_Database: Cellmarker2, Cellmarker2_raw, Cellmarker2_table, Markers_list_PCTAM, Markers_list_PCTIT, Markers_list_TCellSI, Markers_list_scIBD, PanglaoDB_raw, PanglaoDB_table


PanglaoDB raw dataset

Description

A dataset containing marker genes for different cell types from PanglaoDB

Usage

PanglaoDB_raw

Format

A data frame with 14 columns contained in the PanglaoDB database:

Details

This dataset is used to filter and create a standardized marker list.

Source

https://panglaodb.se/

See Also

Other Section_0_Database: Cellmarker2, Cellmarker2_raw, Cellmarker2_table, Markers_list_PCTAM, Markers_list_PCTIT, Markers_list_TCellSI, Markers_list_scIBD, PanglaoDB, PanglaoDB_table


PanglaoDB table

Description

A dataset containing marker genes for different cell types from PanglaoDB

Usage

PanglaoDB_table

Format

A list containing filter categories such as species, organ, and cell type.

Details

This list is used to choose filters for creation of standardized marker list.

Source

https://panglaodb.se/

See Also

Other Section_0_Database: Cellmarker2, Cellmarker2_raw, Cellmarker2_table, Markers_list_PCTAM, Markers_list_PCTIT, Markers_list_TCellSI, Markers_list_scIBD, PanglaoDB, PanglaoDB_raw


Adaptive Parameter Tuning for Single-Cell Data Annotation in SlimR

Description

This function automatically determines optimal min_expression, specificity_weight, and threshold parameters for single-cell data analysis based on dataset characteristics using adaptive algorithms derived from empirical analysis of single-cell datasets.

Usage

Parameter_Calculate(
  seurat_obj,
  features = NULL,
  assay = NULL,
  cluster_col = NULL,
  n_celltypes = 50,
  verbose = TRUE
)

Arguments

seurat_obj

A Seurat object containing single-cell data

features

Character vector of feature names (genes) to analyze. If NULL, will use highly variable features from the Seurat object.

assay

Name of the assay to use (default: NULL, which uses the object's default assay)

cluster_col

Column name in metadata containing cluster information

n_celltypes

Expected number of cell types in marker database (default: 50). Used for threshold recommendation calculation.

verbose

Whether to print progress messages (default: TRUE)

Value

A list containing:

See Also

Other Section_3_Automated_Annotation: Celltype_Annotation(), Celltype_Annotation_PerCell(), Celltype_Calculate(), Celltype_Calculate_PerCell(), Celltype_Verification(), Celltype_Verification_PerCell(), percell_workflow

Examples

## Not run: 
SlimR_params <- Parameter_Calculate(
  seurat_obj = sce,
  features = c("CD3E", "CD4", "CD8A"),
  assay = "RNA",
  cluster_col = "seurat_clusters",
  n_celltypes = 98,
  verbose = TRUE
  )

## End(Not run)


Create "Marker_list" from Excel files ".xlsx"

Description

Create "Marker_list" from Excel files ".xlsx"

Usage

Read_excel_markers(path, has_colnames = TRUE)

Arguments

path

Path to a marker file in ".xlsx" format. Each sheet is named after a cell type. The first row of each sheet is the header, the first column holds the marker genes, and the following columns hold metric values.

has_colnames

Logical value indicating whether the first row contains column names. If FALSE, the first column will be named "Markers" and subsequent columns will be named "Col1", "Col2", etc.
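The naming rule for has_colnames = FALSE can be illustrated with a small base-R sketch. The helper name_marker_columns() and the toy sheet are hypothetical; the sketch only shows the documented convention (first column "Markers", then "Col1", "Col2", ...) and how a sheet might sit in a "Marker_list"-like named list keyed by cell type. It does not read Excel files.

```r
# Hypothetical helper mirroring the documented has_colnames = FALSE rule
name_marker_columns <- function(sheet, has_colnames = TRUE) {
  if (!has_colnames) {
    extra <- if (ncol(sheet) > 1) paste0("Col", seq_len(ncol(sheet) - 1)) else character(0)
    names(sheet) <- c("Markers", extra)
  }
  sheet
}

# A sheet named after its cell type, read without a header row
sheet <- data.frame(V1 = c("CD3E", "CD4"), V2 = c(1.2, 0.8), V3 = c(0.9, 0.7),
                    stringsAsFactors = FALSE)
marker_list_demo <- list(`T cell` = name_marker_columns(sheet, has_colnames = FALSE))
names(marker_list_demo$`T cell`)
```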

Value

The standardized "Marker_list" in the SlimR package.

See Also

Other Section_2_Standardized_Markers_List: Markers_filter_Cellmarker2(), Markers_filter_PanglaoDB(), Read_seurat_markers()

Examples

## Not run: 
Markers_list_Excel <- Read_excel_markers(
    "D:/Laboratory/Marker_load.xlsx"
    )

## End(Not run)


Create "Marker_list" from Seurat object

Description

Create "Marker_list" from Seurat object

Usage

Read_seurat_markers(
  df,
  sources = c("Seurat", "presto"),
  sort_by = "FSS",
  gene_filter = 20
)

Arguments

df

Data frame generated by the "FindAllMarkers()" function; using the parameters "group.by = 'Cell_type'" and "only.pos = TRUE" is recommended.

sources

Source of the marker table; one of "Seurat" or "presto".

sort_by

Marker sorting parameter. For Seurat sources, select "avg_log2FC", "p_val_adj", or "FSS" (Feature Significance Score, FSS: the product of log2FC and the expression ratio). For presto sources, select "logFC", "padj", or "FSS". Default parameters use "sort_by = 'FSS'".

gene_filter

The number of top markers kept for each cell type, ranked by the "sort_by" parameter. Default parameters use "gene_filter = 20".
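To make sort_by = "FSS" and gene_filter concrete, the base-R sketch below ranks a FindAllMarkers-style table and keeps the top markers per cell type. The exact FSS formula is an assumption here: the "expression ratio" is taken to be pct.1 / pct.2, with pct.2 floored at 0.001. The helper top_markers_by_fss() is hypothetical and is not Read_seurat_markers().

```r
# Hypothetical sketch: assumes FSS = avg_log2FC * (pct.1 / pct.2), with pct.2
# floored at 0.001 to avoid division by zero. SlimR's exact formula may differ.
top_markers_by_fss <- function(markers, gene_filter = 20) {
  markers$FSS <- markers$avg_log2FC * (markers$pct.1 / pmax(markers$pct.2, 0.001))
  lapply(split(markers, markers$cluster), function(d) {
    d <- d[order(d$FSS, decreasing = TRUE), , drop = FALSE]
    head(d$gene, gene_filter)   # keep the top gene_filter markers
  })
}

# Toy table with a subset of the columns returned by Seurat's FindAllMarkers()
markers <- data.frame(
  gene       = c("CD3E", "CD4", "IL7R", "MS4A1", "CD79A"),
  cluster    = c("T", "T", "T", "B", "B"),
  avg_log2FC = c(2.0, 1.5, 1.0, 3.0, 2.5),
  pct.1      = c(0.90, 0.60, 0.80, 0.95, 0.90),
  pct.2      = c(0.10, 0.30, 0.40, 0.05, 0.10),
  stringsAsFactors = FALSE
)
res <- top_markers_by_fss(markers, gene_filter = 2)
res
```

The product favors genes that are both strongly up-regulated and expressed far more often inside the group than outside it. Sorting by avg_log2FC alone would ignore the second factor.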

Value

The standardized "Marker_list" in the SlimR package.

See Also

Other Section_2_Standardized_Markers_List: Markers_filter_Cellmarker2(), Markers_filter_PanglaoDB(), Read_excel_markers()

Examples

## Not run: 
# Example for Seurat sources markers
seurat_markers <- Seurat::FindAllMarkers(
    object = sce,
    group.by = "Cell_type",
    only.pos = TRUE)

Markers_list_Seurat <- Read_seurat_markers(seurat_markers,
    sources = "Seurat",
    sort_by = "avg_log2FC",
    gene_filter = 20
    )

# Example for presto sources markers
seurat_markers <- dplyr::filter(
    presto::wilcoxauc(
      X = sce,
      group_by = "Cell_type",
      seurat_assay = "RNA"
      ),
    padj < 0.05, logFC > 0.5
    )

Markers_list_Seurat <- Read_seurat_markers(seurat_markers,
    sources = "presto",
    sort_by = "logFC",
    gene_filter = 20
    )

## End(Not run)


Calculate Cluster Variability (Use in package)

Description

Measures the degree of separation between different cell clusters based on expression patterns.

Usage

calculate_cluster_variability(data.features, features)

Arguments

data.features

Data frame containing expression data and cluster labels

features

Feature names to include in analysis

Value

Numeric value representing cluster separation strength
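The internal formula is not documented here. As one common way to quantify "separation between clusters", the sketch below substitutes a between-cluster over total sum-of-squares ratio (an eta-squared-style measure), averaged over features. It returns 1 when clusters differ completely and 0 when cluster means coincide. It is a stand-in technique, not necessarily what calculate_cluster_variability() computes, and the helper name and the "cluster" column are hypothetical.

```r
# Swapped-in separation measure (not necessarily SlimR's): for each feature,
# between-cluster sum of squares / total sum of squares, averaged over features.
cluster_separation_sketch <- function(data.features, features, cluster_col = "cluster") {
  cl <- data.features[[cluster_col]]
  ratios <- vapply(features, function(f) {
    x <- data.features[[f]]
    total <- sum((x - mean(x))^2)
    if (total == 0) return(0)          # constant feature: no separation
    group_means <- ave(x, cl)          # each cell's cluster mean
    sum((group_means - mean(x))^2) / total
  }, numeric(1))
  mean(ratios)
}

df <- data.frame(g1 = c(0, 0, 5, 5), g2 = c(0, 5, 0, 5),
                 cluster = c("a", "a", "b", "b"), stringsAsFactors = FALSE)
cluster_separation_sketch(df, "g1")            # perfectly separated: 1
cluster_separation_sketch(df, c("g1", "g2"))   # average of 1 and 0: 0.5
```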

See Also

Other Section_1_Functions_Use_in_Package: calculate_expression(), calculate_expression_skewness(), calculate_probability(), compute_adaptive_parameters(), estimate_batch_effect(), extract_dataset_features()


Compute the average expression of a gene set (Use in package)

Description

Compute the average expression of a gene set (Use in package)

Usage

calculate_expression(
  object,
  features,
  assay = NULL,
  cluster_col = NULL,
  colour_low = "white",
  colour_high = "navy"
)

Arguments

object

Enter a Seurat object.

features

One or more marker genes.

assay

Enter the assay used by the Seurat object, such as "RNA". Default parameters use "assay = NULL".

cluster_col

Enter the meta.data column in the Seurat object to be annotated, such as "seurat_clusters". Default parameters use "cluster_col = NULL".

colour_low

Color for lowest expression level. (default = "white")

colour_high

Color for highest expression level. (default = "navy")

Value

Average expression of the given "features" and related information in the input Seurat object, grouped by "cluster_col".

See Also

Other Section_1_Functions_Use_in_Package: calculate_cluster_variability(), calculate_expression_skewness(), calculate_probability(), compute_adaptive_parameters(), estimate_batch_effect(), extract_dataset_features()


Calculate Expression Distribution Skewness (Use in package)

Description

Computes the average skewness of gene expression distributions across all features.

Usage

calculate_expression_skewness(expression_matrix)

Arguments

expression_matrix

Matrix of expression values

Value

Mean absolute skewness across all genes
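The returned value can be sketched in base R as follows: compute the moment-based skewness of each gene, mean((x - m)^3) / s^3 with population moments, then average the absolute values. Treating genes as rows and using population rather than sample moments are both assumptions. The helper mean_abs_skewness() is hypothetical and may differ from the package's estimator.

```r
# Moment-based skewness per gene (rows = genes), then mean of absolute values.
# Population moments are an assumption; SlimR may use another estimator.
mean_abs_skewness <- function(expression_matrix) {
  skew <- apply(expression_matrix, 1, function(x) {
    m <- mean(x)
    s <- sqrt(mean((x - m)^2))
    if (s == 0) return(0)              # constant gene: define skewness as 0
    mean((x - m)^3) / s^3
  })
  mean(abs(skew))
}

symmetric <- rbind(c(1, 2, 3), c(-1, 0, 1))
right_skewed <- rbind(c(0, 0, 0, 4))
mean_abs_skewness(symmetric)      # 0
mean_abs_skewness(right_skewed)   # 2 / sqrt(3), about 1.155
```

Single-cell counts are typically zero-inflated and right-skewed, so a larger value signals a more sparse, heavy-tailed dataset.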

See Also

Other Section_1_Functions_Use_in_Package: calculate_cluster_variability(), calculate_expression(), calculate_probability(), compute_adaptive_parameters(), estimate_batch_effect(), extract_dataset_features()


Calculate gene set expression and infer probabilities with control datasets (Use in package)

Description

Calculate gene set expression and infer probabilities with control datasets (Use in package)

Usage

calculate_probability(
  object,
  features,
  assay = NULL,
  cluster_col = NULL,
  min_expression = 0.1,
  specificity_weight = 3
)

Arguments

object

Enter a Seurat object.

features

One or more marker genes.

assay

Enter the assay used by the Seurat object, such as "RNA". Default parameters use "assay = NULL".

cluster_col

Enter the meta.data column in the Seurat object to be annotated, such as "seurat_clusters". Default parameters use "cluster_col = NULL".

min_expression

The min_expression parameter defines a threshold value to determine whether a cell's expression of a feature is considered "expressed" or not. It is used to filter out low-expression cells that may contribute noise to the analysis. Default parameters use "min_expression = 0.1".

specificity_weight

The specificity_weight parameter controls how much the expression variability (standard deviation) of a feature within a cluster contributes to its "specificity score". It amplifies or suppresses the impact of variability in the final score calculation. Default parameters use "specificity_weight = 3".
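The min_expression threshold can be illustrated with a base-R sketch. It counts a cell as "expressing" a marker when its value exceeds min_expression, then reports the fraction of such cells per cluster. The helper detection_rate() and the genes-by-cells layout are assumptions. The specificity_weight term is not modeled because its exact formula is not given here.

```r
# Hypothetical sketch: fraction of cells per cluster whose expression of each
# marker exceeds min_expression (genes in rows, cells in columns). The
# specificity_weight term is not modelled here.
detection_rate <- function(expr, clusters, min_expression = 0.1) {
  expressed <- expr > min_expression
  # Result: rows = genes, columns = clusters
  sapply(split(seq_along(clusters), clusters),
         function(idx) rowMeans(expressed[, idx, drop = FALSE]))
}

expr <- rbind(A = c(0.5, 0.3, 0.00, 0.05),
              B = c(0.0, 0.05, 1.0, 2.0))
res <- detection_rate(expr, clusters = c("1", "1", "2", "2"))
res
```

Values of 0.05 fall below the default threshold of 0.1 and are treated as noise, so gene A is detected only in cluster "1" and gene B only in cluster "2".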

Value

Average expression of the given "features" in the input Seurat object, grouped by "cluster_col".

See Also

Other Section_1_Functions_Use_in_Package: calculate_cluster_variability(), calculate_expression(), calculate_expression_skewness(), compute_adaptive_parameters(), estimate_batch_effect(), extract_dataset_features()


Compute Adaptive Parameters Based on Dataset Features (Use in package)

Description

Calculates optimal min_expression, specificity_weight, and threshold parameters using continuous adaptive algorithms based on dataset characteristics.

Usage

compute_adaptive_parameters(dataset_features, n_celltypes = 50)

Arguments

dataset_features

List of dataset characteristics from extract_dataset_features()

n_celltypes

Expected number of cell types in marker database

Value

List containing min_expression, specificity_weight, threshold, and rationale

See Also

Other Section_1_Functions_Use_in_Package: calculate_cluster_variability(), calculate_expression(), calculate_expression_skewness(), calculate_probability(), estimate_batch_effect(), extract_dataset_features()


Estimate Batch Effect Strength (Use in package)

Description

Roughly estimates the potential impact of batch effects using available metadata.

Usage

estimate_batch_effect(seurat_obj, assay)

Arguments

seurat_obj

Seurat object

assay

Assay name

Value

Batch effect score (0 indicates no detectable batch effect)

See Also

Other Section_1_Functions_Use_in_Package: calculate_cluster_variability(), calculate_expression(), calculate_expression_skewness(), calculate_probability(), compute_adaptive_parameters(), extract_dataset_features()


Extract Dataset Characteristics for Adaptive Parameter Calculation (Use in package)

Description

Computes various statistical features from single-cell data that are used as input for the parameter prediction model.

Usage

extract_dataset_features(
  seurat_obj,
  features,
  assay = NULL,
  cluster_col = NULL
)

Arguments

seurat_obj

Seurat object

features

Features to analyze

assay

Assay name

cluster_col

Cluster column name

Value

List of dataset characteristics including expression statistics, variability measures, and cluster properties

See Also

Other Section_1_Functions_Use_in_Package: calculate_cluster_variability(), calculate_expression(), calculate_expression_skewness(), calculate_probability(), compute_adaptive_parameters(), estimate_batch_effect()


Per-Cell Annotation Workflow Example

Description

Example workflow for using SlimR's per-cell annotation functions

Overview

The per-cell annotation workflow in SlimR provides an alternative to cluster-based annotation by scoring and labeling individual cells based on marker expression. This is useful when:

Basic Workflow

# 1. Prepare your Seurat object (must have normalized data)
library(SlimR)
library(Seurat)

# 2. Create or load marker list
Markers_list <- Markers_filter_Cellmarker2(
    Cellmarker2,
    species = "Human",
    tissue_class = "Intestine"
)

# 3. Run per-cell annotation
result <- Celltype_Calculate_PerCell(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    method = "weighted",          # "weighted", "mean", or "AUCell"
    min_expression = 0.1,
    min_score = 0.1,
    verbose = TRUE
)

# 4. Annotate Seurat object
sce <- Celltype_Annotation_PerCell(
    seurat_obj = sce,
    SlimR_percell_result = result,
    plot_UMAP = TRUE,
    plot_confidence = TRUE,
    annotation_col = "Cell_type_PerCell"
)

# 5. Verify annotations
dotplot <- Celltype_Verification_PerCell(
    seurat_obj = sce,
    SlimR_percell_result = result,
    gene_number = 5,
    annotation_col = "Cell_type_PerCell"
)
print(dotplot)

Advanced

UMAP Spatial Smoothing:

# Use UMAP coordinates to smooth predictions via k-NN
# This reduces noise and improves consistency in spatial regions

result_smooth <- Celltype_Calculate_PerCell(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    use_umap_smoothing = TRUE,
    k_neighbors = 20,              # Number of neighbors to consider
    smoothing_weight = 0.3,        # 30% smoothing weight
    verbose = TRUE
)

# Compare smoothed vs unsmoothed
sce$Cell_type_Smooth <- result_smooth$Cell_annotations$Predicted_cell_type
sce$Cell_type_Raw <- result$Cell_annotations$Predicted_cell_type

DimPlot(sce, group.by = "Cell_type_Raw") | 
  DimPlot(sce, group.by = "Cell_type_Smooth")
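The idea behind k_neighbors and smoothing_weight can be sketched in base R. Each cell's score vector is blended with the mean score of its k nearest neighbors in UMAP space: smoothed = (1 - w) * own + w * neighbor mean. This blending formula, the helper smooth_scores_knn(), and the brute-force distance matrix are assumptions for illustration. Celltype_Calculate_PerCell() does this internally and, per the text above, can use RANN for fast neighbor search.

```r
# Hypothetical sketch: blend each cell's score vector with the mean score of
# its k nearest neighbours in UMAP space:
#   smoothed = (1 - w) * own scores + w * neighbour mean
# Brute-force distances are used here; this only suits small inputs.
smooth_scores_knn <- function(umap, scores, k = 20, w = 0.3) {
  k <- min(k, nrow(scores) - 1)
  d <- as.matrix(dist(umap))
  diag(d) <- Inf                                  # a cell is not its own neighbour
  nn <- lapply(seq_len(nrow(d)), function(i) order(d[i, ])[seq_len(k)])
  neighbour_mean <- do.call(rbind, lapply(nn, function(idx)
    colMeans(scores[idx, , drop = FALSE])))
  (1 - w) * scores + w * neighbour_mean
}

umap <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11))
scores <- rbind(c(1.0, 0.0), c(0.4, 0.6), c(0.0, 1.0), c(0.1, 0.9))
colnames(scores) <- c("A", "B")
smoothed <- smooth_scores_knn(umap, scores, k = 1, w = 0.5)
colnames(smoothed)[max.col(smoothed)]   # cell 2 flips from "B" to "A"
```

Cell 2 sits next to a confident type-A cell, so pulling its scores toward its neighbor resolves the noisy call. This is the kind of spatial consistency smoothing aims for.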

Scoring Methods Comparison

# Method 1: Weighted (recommended for most cases)
# Combines expression with marker specificity and detection rate
result_weighted <- Celltype_Calculate_PerCell(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    method = "weighted"
)

# Method 2: Mean (simple, fast)
# Just averages normalized marker expression
result_mean <- Celltype_Calculate_PerCell(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    method = "mean"
)

# Method 3: AUCell (rank-based, robust to batch effects)
# Scores based on the proportion of markers ranked in each cell's top 5% of genes
result_aucell <- Celltype_Calculate_PerCell(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    method = "AUCell"
)
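Two of the scoring ideas above can be sketched in base R. The first is a plain mean of marker expression per cell. The second is a simplified rank-based score: the fraction of markers that fall within each cell's top 5% of genes. The published AUCell method computes the area under a recovery curve over the ranking, so the second function is a simplification matching the comment above rather than AUCell itself. Both helper names are hypothetical, and the genes-by-cells layout is an assumption.

```r
# Hypothetical sketches of two scoring ideas (genes in rows, cells in columns).
# "mean": average marker expression per cell.
score_mean <- function(expr, markers) {
  colMeans(expr[intersect(markers, rownames(expr)), , drop = FALSE])
}

# Rank-based: fraction of markers ranked within each cell's top 5% of genes.
score_top_fraction <- function(expr, markers, top = 0.05) {
  n_top <- max(1, ceiling(top * nrow(expr)))
  markers <- intersect(markers, rownames(expr))
  apply(expr, 2, function(x) {
    top_genes <- rownames(expr)[order(x, decreasing = TRUE)[seq_len(n_top)]]
    mean(markers %in% top_genes)
  })
}

expr <- matrix(0, nrow = 40, ncol = 2,
               dimnames = list(paste0("g", 1:40), c("c1", "c2")))
expr[c("g1", "g2"), "c1"] <- c(5, 4)
expr[c("g39", "g40"), "c2"] <- c(5, 4)
markers <- c("g1", "g2", "g99")   # g99 is absent and is ignored
score_mean(expr, markers)
score_top_fraction(expr, markers)
```

Because the rank-based score depends only on each cell's own gene ordering, it is unaffected by scaling a cell's expression values. This is the usual argument for its robustness to batch effects.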

Comparing Cluster vs Per-Cell Annotation

# Cluster-based annotation (original SlimR approach)
cluster_result <- Celltype_Calculate(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    cluster_col = "seurat_clusters"
)

sce <- Celltype_Annotation(
    seurat_obj = sce,
    cluster_col = "seurat_clusters",
    SlimR_anno_result = cluster_result,
    annotation_col = "Cell_type_Cluster"
)

# Per-cell annotation
percell_result <- Celltype_Calculate_PerCell(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human"
)

sce <- Celltype_Annotation_PerCell(
    seurat_obj = sce,
    SlimR_percell_result = percell_result,
    annotation_col = "Cell_type_PerCell"
)

# Compare
library(ggplot2)
library(patchwork)

p1 <- DimPlot(sce, group.by = "Cell_type_Cluster") + 
      ggtitle("Cluster-based")
p2 <- DimPlot(sce, group.by = "Cell_type_PerCell") + 
      ggtitle("Per-cell")

p1 | p2

# Check agreement
table(sce$Cell_type_Cluster, sce$Cell_type_PerCell)
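The contingency table can be summarized as a single number: the share of cells that receive the same label from both approaches. The helper label_agreement() is hypothetical. With the objects above it would be called as label_agreement(sce$Cell_type_Cluster, sce$Cell_type_PerCell). Comparisons involving NA labels are dropped.

```r
# Hypothetical helper: share of cells given the same label by both approaches.
# Comparisons involving NA labels are dropped via na.rm = TRUE.
label_agreement <- function(cluster_labels, percell_labels) {
  mean(as.character(cluster_labels) == as.character(percell_labels), na.rm = TRUE)
}

a <- c("T cell", "T cell", "B cell", "B cell")
b <- c("T cell", "NK cell", "B cell", "B cell")
label_agreement(a, b)   # 3 of 4 cells agree: 0.75
```

Converting both inputs with as.character() avoids errors when the two metadata columns are factors with different level sets.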

Performance Optimization

# For large datasets, adjust chunk_size to manage memory
result <- Celltype_Calculate_PerCell(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    chunk_size = 10000,  # Process 10k cells at a time
    verbose = TRUE
)

# For UMAP smoothing, install RANN for 10-100x speedup
# install.packages("RANN")

result_smooth <- Celltype_Calculate_PerCell(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    use_umap_smoothing = TRUE,
    k_neighbors = 15
    # RANN will be used automatically if installed
)
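The chunk_size option can be pictured with a base-R sketch. Cells (columns) are split into consecutive blocks of at most chunk_size, and each block is scored separately, so per-block intermediates stay small. The results are then concatenated in the original order. The helpers chunk_indices() and score_in_chunks() are hypothetical, not SlimR internals.

```r
# Hypothetical sketch of chunked scoring: cells (columns) are processed in
# blocks of chunk_size, so per-block intermediates stay small.
chunk_indices <- function(n_cells, chunk_size = 10000) {
  split(seq_len(n_cells), ceiling(seq_len(n_cells) / chunk_size))
}

score_in_chunks <- function(expr, score_fun, chunk_size = 10000) {
  chunks <- chunk_indices(ncol(expr), chunk_size)
  out <- unlist(lapply(chunks, function(idx) score_fun(expr[, idx, drop = FALSE])),
                use.names = FALSE)
  names(out) <- colnames(expr)
  out
}

expr <- matrix(as.numeric(1:75), nrow = 3)           # 3 genes x 25 cells
lengths(chunk_indices(ncol(expr), chunk_size = 10))  # 10, 10, 5
all.equal(score_in_chunks(expr, colMeans, chunk_size = 10), colMeans(expr))
```

Chunking only trades memory for loop overhead; as long as the scoring function is per-cell, the result is identical to scoring all cells at once.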

Accessing Results

# Cell-level annotations
head(result$Cell_annotations)
#   Cell_barcode Predicted_cell_type Max_score Confidence
# 1  AAACCTGAG... Enterocyte          0.85      0.62
# 2  AAACCTGCA... Goblet cell         0.72      0.45

# Summary statistics
result$Summary
#   Cell_type       Count Percentage
# 1 Enterocyte      5432  45.2
# 2 Goblet cell     2156  17.9

# Full probability matrix (if return_scores = TRUE)
result$Probability_matrix[1:5, 1:3]
#              Enterocyte Goblet_cell Stem_cell
# AAACCTGAG... 0.85       0.10        0.05

# Extract high-confidence cells
high_conf <- result$Cell_annotations$Cell_barcode[
    result$Cell_annotations$Confidence > 0.5
]

# Extract uncertain cells for manual review
uncertain <- result$Cell_annotations$Cell_barcode[
    result$Cell_annotations$Confidence < 0.2
]

See Also

Other Section_3_Automated_Annotation: Celltype_Annotation(), Celltype_Annotation_PerCell(), Celltype_Calculate(), Celltype_Calculate_PerCell(), Celltype_Verification(), Celltype_Verification_PerCell(), Parameter_Calculate()


Plot Method for pheatmap Objects

Description

This S3 method allows pheatmap objects (returned by Celltype_Calculate()) to be plotted using the generic plot() function. Without this method, attempting to use plot() on a pheatmap object results in an error.

Usage

## S3 method for class 'pheatmap'
plot(x, ...)

Arguments

x

A pheatmap object, typically from cluster_results$Heatmap_plot

...

Additional arguments (currently ignored)

Details

Pheatmap objects contain a gtable component that needs to be drawn using grid graphics. This method handles that automatically when plot() is called.

Alternative ways to display pheatmaps:

Value

Invisibly returns the input pheatmap object after displaying it
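The mechanism described under Details can be sketched with base R's grid package. An S3 plot() method starts a new grid page, draws the stored gtable component, and returns the object invisibly. The demo class "pheatmap_demo" and the rectangle grob standing in for a real heatmap are hypothetical; the sketch is not the package's source.

```r
# Hypothetical sketch of a plot() method for a pheatmap-like object whose
# drawable part is stored in x$gtable (the demo class name is made up).
plot.pheatmap_demo <- function(x, ...) {
  grid::grid.newpage()
  grid::grid.draw(x$gtable)
  invisible(x)
}

obj <- structure(list(gtable = grid::rectGrob()), class = "pheatmap_demo")
res <- plot(obj)   # dispatches to plot.pheatmap_demo()
```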

Examples

## Not run: 
# After running Celltype_Calculate()
cluster_results <- Celltype_Calculate(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human"
)

# Now both of these work:
print(cluster_results$Heatmap_plot)
plot(cluster_results$Heatmap_plot)

## End(Not run)