Package 'DoubletFinder' reference manual

Title:	DoubletFinder is a suite of tools for identifying doublets in single-cell RNA sequencing data.
Description:	DoubletFinder identifies doublets by generating artificial doublets from existing scRNA-seq data and defining which real cells preferentially co-localize with artificial doublets in gene expression space. Other DoubletFinder package functions are used for fitting DoubletFinder to different scRNA-seq datasets. For example, ideal DoubletFinder performance in real-world contexts requires (I) Optimal pK selection and (2) Homotypic doublet proportion estimation. pK selection is achieved using pN-pK parameter sweeps and maxima identification in mean-variance-normalized bimodality coefficient distributions. Homotypic doublet proportion estimation is achieved by finding the sum of squared cell annotation frequencies. For more information, see our Cell Sysmtes paper https://www.cell.com/cell-systems/fulltext/S2405-4712(19)30073-0 and our github https://github.com/chris-mcginnis-ucsf/DoubletFinder
Authors:	Chris McGinnis [aut, cre]
Maintainer:	Chris McGinnis <[email protected]>
License:	CC-0
Version:	2.0.4
Built:	2025-03-10 02:16:36 UTC
Source:	https://github.com/blaserlab/DoubletFinder

bimodality_coefficient

Description

Internal function to compute bimodality coefficient during BCmvn computation and pK estimation.

Usage

bimodality_coefficient(x)
bimodality_coefficient(x)

Arguments

`x`	Guassian kernel density estimation representing pANN distribution

Value

Bimodality coefficient value

References

Taken from the 'modes' R package (v0.7)

Examples

## Internal to summarizeSweep
gkde <- approxfun(bkde(res.temp$pANN, kernel="normal"))
x <- seq(from=min(res.temp$pANN), to=max(res.temp$pANN), length.out=nrow(res.temp))
sweep.stats$BCreal[i] <- bimodality_coefficient(gkde(x))
## Internal to summarizeSweep
gkde <- approxfun(bkde(res.temp$pANN, kernel="normal"))
x <- seq(from=min(res.temp$pANN), to=max(res.temp$pANN), length.out=nrow(res.temp))
sweep.stats$BCreal[i] <- bimodality_coefficient(gkde(x))

Core doublet prediction function of the DoubletFinder package. Generates artifical doublets from an existing, pre-processed Seurat object. Real and artificial data are then merged and pre-processed using parameters utilized for the existing Seurat object. PC distance matrix is then computed and used the measure the proportion of artificial nearest neighbors (pANN) for every real cell. pANN is then thresholded according to the number of expected doublets to generate final doublet predictions.

Usage

doubletFinder(seu, PCs, pN = 0.25, pK, nExp, reuse.pANN = FALSE, sct = FALSE)
doubletFinder(seu, PCs, pN = 0.25, pK, nExp, reuse.pANN = FALSE, sct = FALSE)

Arguments

`seu`	A fully-processed Seurat object (i.e., After NormalizeData, FindVariableGenes, ScaleData, and RunPCA have all been performed).
`PCs`	Number of statistically-significant principal components (e.g., as estimated from PC elbow plot)
`pN`	The number of generated artificial doublets, expressed as a proportion of the merged real-artificial data. Default is set to 0.25, based on observation that DoubletFinder performance is largely pN-invariant (see McGinnis, Murrow and Gartner 2019, Cell Systems).
`pK`	The PC neighborhood size used to compute pANN, expressed as a proportion of the merged real-artificial data. No default is set, as pK should be adjusted for each scRNA-seq dataset. Optimal pK values can be determined using mean-variance-normalized bimodality coefficient.
`nExp`	The total number of doublet predictions produced. This value can best be estimated from cell loading densities into the 10X/Drop-Seq device, and adjusted according to the estimated proportion of homotypic doublets.
`reuse.pANN`	Seurat metadata column name for previously-generated pANN results. Argument should be set to FALSE (default) for initial DoubletFinder runs. Enables fast adjusting of doublet predictions for different nExp.
`sct`	Logical representing whether SCTransform was used during original Seurat object pre-processing (default = FALSE).

Value

Seurat object with updated metadata including pANN and doublet classifications.

Examples

## Initial run, nExp set to 0.15 Poisson loading estimate (e.g., 1000 total doublet predictions)
nExp_poi <- round(0.15*nrow([email protected]))
seu <- doubletFinder(seu, PCs = 1:10, pN = 0.25, pK = 0.01, nExp = nExp_poi, reuse.pANN = FALSE, sct=FALSE)

## With homotypic adjustment
homotypic.prop <- modelHomotypic(annotations)
nExp_poi.adj <- round(nExp_poi*(1-homotypic.prop))
seu <- doubletFinder(seu, PCs = 1:10, pN = 0.25, pK = 0.01, nExp = nExp_poi.adj, reuse.pANN = "pANN_0.25_0.01_1000", sct=FALSE)
## Initial run, nExp set to 0.15 Poisson loading estimate (e.g., 1000 total doublet predictions)
nExp_poi <- round(0.15*nrow(seu@meta.data))
seu <- doubletFinder(seu, PCs = 1:10, pN = 0.25, pK = 0.01, nExp = nExp_poi, reuse.pANN = FALSE, sct=FALSE)

## With homotypic adjustment
homotypic.prop <- modelHomotypic(annotations)
nExp_poi.adj <- round(nExp_poi*(1-homotypic.prop))
seu <- doubletFinder(seu, PCs = 1:10, pN = 0.25, pK = 0.01, nExp = nExp_poi.adj, reuse.pANN = "pANN_0.25_0.01_1000", sct=FALSE)

find.pK

Description

Computes and visualizes the mean-variance normalized bimodality coefficient (BCmvn) score for each pK value tested during doubletFinder_ParamSweep. Optimal pK for any scRNA-seq data can be manually discerned as maxima in BCmvn distributions. If ground-truth doublet classifications are available, BCmvn is plotted along with mean ROC AUC for each pK.

Usage

 bcmvn <- find.pK(sweep.stats) bcmvn <- find.pK(sweep.stats)

Arguments

sweep.stats

pN-pK bimodality coefficient dataframe as produced by summarizeSweep.

Value

Dataframe of mean BC, BC variance, and BCmvn scores for each pK value. Includes mean AUC for each pK value if ground-truth doublet classifications are utilized during summarizeSweep.

Examples

sweep.list <- paramSweep(seu)
sweep.stats <- summarizeSweep(sweep.list, GT = FALSE)
bcmvn <- find.pK(sweep.stats)
sweep.list <- paramSweep(seu)
sweep.stats <- summarizeSweep(sweep.list, GT = FALSE)
bcmvn <- find.pK(sweep.stats)

kurtosis

Description

Internal function to compute bimodality coefficient during BCmvn computation and pK estimation.

Usage

kurtosis(x)
kurtosis(x)

Arguments

`x`	Guassian kernel density estimation representing pANN distribution

Value

Kurtosis value

References

Taken from the 'modes' R package (v0.7).

Examples

## Internal to bimodality_coefficient
sample.excess.kurtosis <- kurtosis(x)
## Internal to bimodality_coefficient
sample.excess.kurtosis <- kurtosis(x)

modelHomotypic

Description

Leverages user-provided cell annotations to model the proportion of homotypic doublets. Building on the assumption that literature-supported annotations reflect real transcriptional divergence, homotypic doublet proportions are modeled as the sum of squared annotation frequencies.

Usage

modelHomotypic(annotations)modelHomotypic(annotations)

Arguments

annotations

An nCell-length character vector of annotations.

Value

Numeric proportion of homotypic doublets.

Examples

## Initial run, nExp set to Poisson loading estimate (e.g., 913 total doublet predictions)
nExp_poi <- round(0.15*length([email protected]))
seu <- doubletFinder(seu, pN = 0.25, pK = 0.01, nExp = nExp_poi, reuse.pANN = FALSE)

## With homotypic adjustment
homotypic.prop <- modelHomotypic(annotations)
nExp_poi.adj <- round(nExp_poi*(1-homotypic.prop))
seu <- doubletFinder(seu, pN = 0.25, pK = 0.01, nExp = nExp_poi.adj, reuse.pANN = "pANN_0.25_0.01_913")
## Initial run, nExp set to Poisson loading estimate (e.g., 913 total doublet predictions)
nExp_poi <- round(0.15*length(seu@cell.names))
seu <- doubletFinder(seu, pN = 0.25, pK = 0.01, nExp = nExp_poi, reuse.pANN = FALSE)

## With homotypic adjustment
homotypic.prop <- modelHomotypic(annotations)
nExp_poi.adj <- round(nExp_poi*(1-homotypic.prop))
seu <- doubletFinder(seu, pN = 0.25, pK = 0.01, nExp = nExp_poi.adj, reuse.pANN = "pANN_0.25_0.01_913")

parallel_paramSweep

Description

Internal parallelization function for paramSweep.

Usage

**NOT RUN**
parallel_paramSweep(n, n.real.cells, real.cells, pK, pN, data, orig.commands, PCs, sct)
**NOT RUN**
parallel_paramSweep(n, n.real.cells, real.cells, pK, pN, data, orig.commands, PCs, sct)

Arguments

`n`	pN iteration counter.
`n.real.cells`	Number of real cells. Set automatically during paramSweep_v3.
`real.cells`	Vector of real cell IDs. Set automatically during paramSweep_v3.
`pK`	The PC neighborhood size used to compute pANN, expressed as a proportion of the merged real-artificial data. No default is set, as pK should be adjusted for each scRNA-seq dataset. Optimal pK values can be determined using mean-variance-normalized bimodality coefficient.
`pN`	The number of generated artificial doublets, expressed as a proportion of the merged real-artificial data. Default is set to 0.25, based on observation that DoubletFinder performance is largely pN-invariant (see McGinnis, Murrow and Gartner 2019, Cell Systems).
`data`	Count matrix. Set automatically during paramSweep_v3.
`orig.commands`	Count matrix. Set automatically during paramSweep_v3.
`PCs`	Number of statistically-sigificant PCs. Set according to paramSweep_v3 arguments.
`sct`	Logical representing whether Seurat object was pre-processed using 'sctransform'. Set according to paramSweep_v3 arguments (default = F).

Value

Parallelization function compatible with mclapply.

Author(s)

Implemented by Nathan Skeene, June 2019.

paramSweep

Description

Performs pN-pK parameter sweeps on a 10,000-cell subset of a pre-processed Seurat object. Will use all cells if Seurat object contains less than 10,000 cells. Results are fed into 'summarizeSweep' and 'find.pK' functions during optimal pK parameter selection workflow. Parameters tested: pN = 0.05-0.3, pK = 0.0005-0.3.

Usage

sweep.list <- paramSweep(seu, PCs, sct=FALSE)
sweep.list <- paramSweep(seu, PCs, sct=FALSE)

Arguments

`seu`	A fully-processed Seurat object (i.e., After NormalizeData, FindVariableGenes, ScaleData, RunPCA, and RunTSNE have all been performed).
`PCs`	Number of statistically-significant principal components (e.g., as estimated from PC elbow plot)
`sct`	Logical representing whether SCTransform was used during original Seurat object pre-processing (default = FALSE).
`num.cores`	Number of cores to use for parallelization, default=1.

Value

List of pANN vectors for every pN and pK combination. Output also contains pANN information for artificial doublets.

Examples

sweep.list <- paramSweep(seu, PCs = 1:10, sct=FALSE)
sweep.stats <- summarizeSweep(sweep.list, GT = FALSE)
bcmvn <- find.pK(sweep.stats)
sweep.list <- paramSweep(seu, PCs = 1:10, sct=FALSE)
sweep.stats <- summarizeSweep(sweep.list, GT = FALSE)
bcmvn <- find.pK(sweep.stats)

skewness

Description

Internal function to compute skewness during BCmvn computation and pK estimation.

Usage

skewness(x)
skewness(x)

Arguments

`x`	Guassian kernel density estimation representing pANN distribution

Value

Skewness value

References

Taken from the 'modes' R package (v0.7).

Examples

## Internal to bimodality_coefficient
G <- skewness(x)
## Internal to bimodality_coefficient
G <- skewness(x)

summarizeSweep

Description

Summarizes results from doubletFinder_ParamSweep, computing the bimodality coefficient across pN and pK parameter space. If ground-truth doublet classifications are available, then ROC analysis is performed, enabling optimal DoubletFinder parameter selection.

Usage

sweep.stats <- summarizeSweep(sweep.list, GT = FALSE)
sweep.stats <- summarizeSweep(sweep.list, GT = TRUE, GT.calls = classifications)
sweep.stats <- summarizeSweep(sweep.list, GT = FALSE)
sweep.stats <- summarizeSweep(sweep.list, GT = TRUE, GT.calls = classifications)

Arguments

`sweep.list`	List of pANN vectors across pN-pK space, as produced by doubletFinder_ParamSweep.
`GT`	Logical set to TRUE when ground-truth doublet classifications are available for ROC analysis. Default set to FALSE.
`GT.calls`	An nCell-length character vector of ground-truth doublet classifications (e.g., "Singlet" or "Doublet") used to gauge performance of logistic regression models trained using pANN vectors during ROC analysis.

Value

Dataframe with bimodality coefficient values at each pN-pK parameter set. If GT = TRUE, dataframe also includes AUC for each pN-pK parameter set computed during ROC analysis.

Examples

sweep.list <- paramSweep(seu)
sweep.stats <- summarizeSweep(sweep.list, GT = FALSE)
bcmvn <- find.pK(sweep.stats)
sweep.list <- paramSweep(seu)
sweep.stats <- summarizeSweep(sweep.list, GT = FALSE)
bcmvn <- find.pK(sweep.stats)

Package 'DoubletFinder'

Help Index

bimodality_coefficient

Description

Usage

Arguments

Value

References

Examples

doubletFinder

Description

Usage

Arguments

Value

Examples

find.pK

Description

Usage

Arguments

Value

Examples

kurtosis

Description

Usage

Arguments

Value

References

Examples

modelHomotypic

Description

Usage

Arguments

Value

Examples

parallel_paramSweep

Description

Usage

Arguments

Value

Author(s)

paramSweep

Description

Usage

Arguments

Value

Examples

skewness

Description

Usage

Arguments

Value

References

Examples

summarizeSweep

Description

Usage

Arguments

Value

Examples