decontX {celda} | R Documentation |
Identifies contamination from factors such as ambient RNA in single cell genomic datasets.
## S4 method for signature 'SingleCellExperiment' decontX( x, assayName = "counts", z = NULL, batch = NULL, maxIter = 500, delta = 10, convergence = 0.001, iterLogLik = 10, varGenes = 5000, dbscanEps = 1, L = 50, seed = 12345, logfile = NULL, verbose = TRUE ) ## S4 method for signature 'ANY' decontX( x, z = NULL, batch = NULL, maxIter = 500, delta = 10, convergence = 0.001, iterLogLik = 10, varGenes = 5000, dbscanEps = 1, L = 50, seed = 12345, logfile = NULL, verbose = TRUE )
x |
A numeric matrix of counts or a SingleCellExperiment
with the matrix located in the assay slot under |
assayName |
Character. Name of the assay to use if |
z |
Numeric or character vector. Cell cluster labels. If NULL, Celda will be used to reduce the dimensionality of the dataset to 'L' modules, 'umap' from the 'uwot' package will be used to further reduce the dataset to 2 dimenions and the 'dbscan' function from the 'dbscan' package will be used to identify clusters of broad cell types. Default NULL. |
batch |
Numeric or character vector. Batch labels for cells. If batch labels are supplied, DecontX is run on cells from each batch separately. Cells run in different channels or assays should be considered different batches. Default NULL. |
maxIter |
Integer. Maximum iterations of the EM algorithm. Default 500. |
delta |
Numeric. Symmetric Dirichlet concentration parameter to initialize theta. Default 10. |
convergence |
Numeric. The EM algorithm will be stopped if the maximum difference in the contamination estimates between the previous and current iterations is less than this. Default 0.001. |
iterLogLik |
Integer. Calculate log likelihood every 'iterLogLik' iteration. Default 10. |
varGenes |
Integer. The number of variable genes to use in
Celda clustering. Variability is calcualted using
|
dbscanEps |
Numeric. The clustering resolution parameter used in 'dbscan' to estimate broad cell clusters. Used only when z is not provided. Default 1. |
L |
Integer. Number of modules for Celda clustering. Used to reduce the dimensionality of the dataset before applying UMAP and dbscan. Used only when z is not provided. Default 50. |
seed |
Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made. |
logfile |
Character. Messages will be redirected to a file named 'logfile'. If NULL, messages will be printed to stdout. Default NULL. |
verbose |
Logical. Whether to print log messages. Default TRUE. |
If x
is a matrix-like object, a list will be returned
with the following items:
decontXcounts
:The decontaminated matrix. Values obtained
from the variational inference procedure may be non-integer. However,
integer counts can be obtained by rounding,
e.g. round(decontXcounts)
.
contamination
:Percentage of contamination in each cell.
estimates
:List of estimated parameters for each batch. If z was not supplied, then the UMAP coordinates used to generated cell cluster labels will also be stored here.
z
:Cell population/cluster labels used for analysis.
runParams
:List of arguments used in the function call.
If x
is a SingleCellExperiment, then the decontaminated
counts will be stored as an assay and can be accessed with
decontXcounts(x)
. The contamination values and cluster labels
will be stored in colData(x)
. estimates
and runParams
will be stored in metadata(x)$decontX
. If z was not supplied, then
the UMAPs used to generated cell cluster labels will be stored in
reducedDims
slot in x
s <- simulateContaminatedMatrix() result <- decontX(s$observedCounts, s$z) contamination <- colSums(s$observedCounts - s$nativeCounts) / colSums(s$observedCounts) plot(contamination, result$contamination)