Title: | Analyzing single-cell regulatory chromatin in R. |
---|---|
Description: | This package is designed to streamline scATAC analyses in R. |
Authors: | Jeffrey Granja [aut, cre], Ryan Corces [aut] |
Maintainer: | Jeffrey Granja <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.2 |
Built: | 2024-11-22 06:25:31 UTC |
Source: | https://github.com/GreenleafLab/ArchR |
This function will allow direct access to cellColData with a $
accessor.
.DollarNames.ArchRProject(x, pattern = "")
This function will allow adding directly to cellColData with a $
accessor.
## S3 method for class 'ArchRProject'
x[i, j]
This function provides a generic matching function for S4Vector objects primarily to avoid ambiguity.
x %bcin% table
x |
An |
table |
The set of |
This function provides the reciprocal of %bcin% for S4Vector objects primarily to avoid ambiguity.
x %bcni% table
x |
An |
table |
The set of |
This function is the reciprocal of %in%. See the match function in base R.
x %ni% table
x |
The value to search for in |
table |
The set of values to serve as the base for the match function. |
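As a minimal sketch of the behavior described above, the operator can be thought of as the negation of base R's %in%. The name `%notin%` below is hypothetical and is used only so that this illustration does not claim to reproduce ArchR's own definition of %ni%.

```r
# Hypothetical stand-in for %ni%: TRUE where a value of x is absent from table.
`%notin%` <- function(x, table) {
  # match() returns 0 (via nomatch) for values that are not found in table
  match(x, table, nomatch = 0L) == 0L
}

chroms <- c("chr1", "chrM", "chr2", "chrY")
excluded <- c("chrM", "chrY")

# Logical vector marking the chromosomes that are NOT in the excluded set
keep <- chroms %notin% excluded
chroms[keep]
```

Written this way, the operator reads naturally in filters such as "keep everything that is not in this set".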
This function will allow direct access to cellColData with a $
accessor.
## S3 method for class 'ArchRProject'
x$i
This function will allow adding directly to cellColData with a $
accessor.
## S3 replacement method for class 'ArchRProject'
x$i <- value
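To illustrate how a $ accessor and its replacement form can forward to a per-cell metadata table, here is a minimal sketch using a toy S3 class. The class name `ToyProject`, its fields, and the cell names are all hypothetical; this is not ArchR's implementation, only an example of the general mechanism.

```r
# Toy object holding per-cell metadata in a data.frame called cellColData
toy <- structure(
  list(cellColData = data.frame(
    nFrags = c(1200, 3400),
    row.names = c("cellA", "cellB")
  )),
  class = "ToyProject"
)

# Read access: toy$name returns the matching cellColData column
`$.ToyProject` <- function(x, i) {
  unclass(x)$cellColData[[i]]
}

# Write access: toy$name <- value adds or overwrites a cellColData column
`$<-.ToyProject` <- function(x, i, value) {
  obj <- unclass(x)
  obj$cellColData[[i]] <- value
  structure(obj, class = "ToyProject")
}

toy$Clusters <- c("C1", "C2")  # dispatches to the replacement method
toy$nFrags                     # dispatches to the accessor
toy$Clusters
```

Calling `unclass()` inside each method avoids re-dispatching to the same method while the underlying list is being read or modified.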
This function adds information about which peaks in the ArchR database contain input regions to a given ArchRProject. For each peak, a binary value is stored indicating whether each region is observed within the peak region.
addArchRAnnotations( ArchRProj = NULL, db = "ArchR", collection = "EncodeTFBS", name = collection, force = FALSE, logFile = createLogFile("addArchRAnnotations") )
ArchRProj |
An |
db |
A string indicating the database or a path to a database to use for peak annotation. Options include ArchR,
LOLA, and a valid path to a file of class |
collection |
A string indicating which collection within the database to collect for annotation.
For ArchR, options are "ATAC", "EncodeTFBS", "CistromeTFBS", or "Codex".
For LOLA, options include "EncodeTFBS", "CistromeTFBS", "CistromeEpigenome", "Codex", or "SheffieldDnase".
If supplying a custom |
name |
The name of the |
force |
A boolean value indicating whether to force the |
logFile |
The path to a file to be used for logging ArchR output. |
This function will set the default requirement of chromosomes to have a "chr" prefix.
addArchRChrPrefix(chrPrefix = TRUE)
chrPrefix |
A boolean describing the requirement of chromosomes to have a "chr" prefix. |
This function will set ArchR debugging, which will save an RDS file if an error is encountered.
addArchRDebugging(debug = FALSE)
debug |
A boolean describing whether to use debugging with ArchR. |
This function will set the genome across all ArchR functions.
addArchRGenome(genome = NULL, install = TRUE)
genome |
A string indicating the default genome to be used for all ArchR functions.
Currently supported values include "hg19", "hg38", "mm9", and "mm10".
This value is stored as a global environment variable, not part of the |
install |
A boolean value indicating whether the |
This function will set ArchR logging.
addArchRLogging(useLogs = TRUE)
useLogs |
A boolean describing whether to use logging with ArchR. |
This function will set the number of threads to be used for parallel computing across all ArchR functions.
addArchRThreads(threads = floor(parallel::detectCores()/2), force = FALSE)
threads |
The default number of threads to be used for parallel execution across all ArchR functions.
This value is stored as a global environment variable, not part of the |
force |
If you request more than the total number of CPUs minus 2, ArchR will set |
This function will set ArchR logging verbosity.
addArchRVerbose(verbose = TRUE)
verbose |
A boolean describing whether to print messages in addition to logging with ArchR. |
This function will compute background peaks controlling for total accessibility and GC-content and add this information to an ArchRProject.
addBgdPeaks( ArchRProj = NULL, nIterations = 50, w = 0.1, binSize = 50, seed = 1, method = "chromVAR", outFile = file.path(getOutputDirectory(ArchRProj), "Background-Peaks.rds"), force = FALSE )
ArchRProj |
An |
nIterations |
The number of background peaks to sample. See |
w |
The parameter controlling similarity of background peaks. See |
binSize |
The precision with which the similarity is computed. See |
seed |
A number to be used as the seed for random number generation. It is recommended to keep track of the seed used so that you can reproduce results downstream. |
method |
A string indicating whether to use chromVAR or ArchR for background peak identification. |
outFile |
The path to save the |
force |
A boolean value indicating whether to force the file indicated by |
This function adds new data to cellColData in a given ArchRProject.
addCellColData( ArchRProj = NULL, data = NULL, name = NULL, cells = NULL, force = FALSE )
ArchRProj |
An |
data |
The data to add to |
name |
The column header name to be used for this new data in |
cells |
The names of the cells corresponding to |
force |
A boolean value indicating whether or not to overwrite data in a given column when the value passed to |
This function will identify clusters from a reduced dimensions object in an ArchRProject or from a supplied reduced dimensions matrix.
addClusters( input = NULL, reducedDims = "IterativeLSI", name = "Clusters", sampleCells = NULL, seed = 1, method = "Seurat", dimsToUse = NULL, scaleDims = NULL, corCutOff = 0.75, knnAssign = 10, nOutlier = 5, maxClusters = 25, testBias = TRUE, filterBias = FALSE, biasClusters = 0.01, biasCol = "nFrags", biasVals = NULL, biasQuantiles = c(0.05, 0.95), biasEnrich = 10, biasProportion = 0.5, biasPval = 0.05, nPerm = 500, prefix = "C", ArchRProj = NULL, verbose = TRUE, tstart = NULL, force = FALSE, logFile = createLogFile("addClusters"), ... )
input |
Either (i) an |
reducedDims |
The name of the |
name |
The column name of the cluster label column to be added to |
sampleCells |
An integer specifying the number of cells to subsample and perform clustering on. The remaining cells that were not subsampled will be assigned to the cluster of the nearest subsampled cell. This enables a decrease in run time but can sacrifice granularity of clusters. |
seed |
A number to be used as the seed for random number generation required in cluster determination. It is recommended to keep track of the seed used so that you can reproduce results downstream. |
method |
A string indicating the clustering method to be used. Supported methods are "Seurat" and "Scran". |
dimsToUse |
A vector containing the dimensions from the |
scaleDims |
A boolean value that indicates whether to z-score the reduced dimensions for each cell. This is useful for minimizing the contribution
of strong biases (dominating early PCs) and lowly abundant populations. However, this may lead to stronger sample-specific biases since
it is over-weighting latent PCs. If set to |
corCutOff |
A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a correlation to
sequencing depth that is greater than the |
knnAssign |
The number of nearest neighbors to be used during clustering for assignment of outliers (clusters with fewer than nOutlier cells). |
nOutlier |
The minimum number of cells required for a group of cells to be called as a cluster. If a group of cells does not reach this threshold, then the cells will be considered outliers and assigned to nearby clusters. |
maxClusters |
The maximum number of clusters to be called. If the number of clusters exceeds this value, the clusters are merged in an unbiased manner using hclust and cutree. This is useful for constraining the cluster calls to a reasonable number when they converge on large numbers, and is also useful for the initial iteration of iterativeLSI. The default is 25. |
testBias |
A boolean value that indicates whether or not to test clusters for bias. |
filterBias |
A boolean value that indicates whether or not to filter clusters that are identified as biased. |
biasClusters |
A numeric value between 0 and 1 indicating that clusters that are smaller than the specified proportion of total cells are to be checked for bias. This should be set close to 0. We recommend a default of 0.01 which specifies clusters below 1 percent of the total cells. |
biasCol |
The name of a column in |
biasVals |
A set of numeric values used for testing bias enrichment if |
biasQuantiles |
A vector of two numeric values, each between 0 and 1, that describes the lower and upper quantiles of the bias values to use for computing bias enrichment statistics. |
biasEnrich |
A numeric value that specifies the minimum enrichment of biased cells over the median of the permuted background sets. |
biasProportion |
A numeric value between 0 and 1 that specifies the minimum proportion of biased cells in a cluster required to determine that the cluster is biased during testing for bias-enriched clusters. |
biasPval |
A numeric value between 0 and 1 that specifies the p-value to use when testing for bias-enriched clusters. |
nPerm |
An integer specifying the number of permutations to perform for testing bias-enriched clusters. |
prefix |
A character string to be added before each cluster identity. For example, if "Cluster" then cluster results will be "Cluster1", "Cluster2" etc. |
ArchRProj |
An |
verbose |
A boolean value indicating whether to use verbose output during execution of this function. Can be set to FALSE for a cleaner output. |
tstart |
A timestamp that is typically passed internally from another function (for example, "IterativeLSI") to measure how long the clustering analysis has been running relative to the start time when this process was initiated in another function. This argument is rarely manually specified. |
force |
A boolean value that indicates whether or not to overwrite data in a given column when the value passed to |
logFile |
The path to a file to be used for logging ArchR output. |
... |
Additional arguments to be provided to Seurat::FindClusters or scran::buildSNNGraph (for example, knn = 50, jaccard = TRUE) |
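The merging step described for maxClusters can be sketched with base R alone. This is an illustration under stated assumptions, not ArchR's implementation: the toy embedding, the use of per-cluster mean coordinates as the input to hclust, and the helper name `merge_clusters` are all assumptions made for this example.

```r
# Hypothetical helper: merge cluster labels down to at most max_clusters groups
# by hierarchically clustering the per-cluster mean coordinates.
merge_clusters <- function(embedding, labels, max_clusters) {
  ids <- sort(unique(labels))
  if (length(ids) <= max_clusters) {
    return(labels)  # nothing to merge
  }
  # Mean position of each cluster in the reduced-dimension space
  centroids <- do.call(rbind, lapply(ids, function(id) {
    colMeans(embedding[labels == id, , drop = FALSE])
  }))
  rownames(centroids) <- ids
  # Group similar clusters, then cut the tree at the requested number of groups
  tree <- hclust(dist(centroids))
  merged <- cutree(tree, k = max_clusters)
  unname(merged[as.character(labels)])
}

set.seed(1)
# Toy data: 4 clusters of 10 cells in 2 dimensions; two pairs lie close together
centers <- rbind(c(0, 0), c(0.5, 0), c(10, 10), c(10.5, 10))
labels <- rep(1:4, each = 10)
embedding <- centers[labels, ] + matrix(rnorm(80, sd = 0.05), ncol = 2)

merged <- merge_clusters(embedding, labels, max_clusters = 2)
table(original = labels, merged = merged)
```

In this toy case the two nearby pairs of clusters collapse into one merged group each, which is the kind of constraint maxClusters is described as providing.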
This function will add co-accessibility scores to peaks in a given ArchRProject.
addCoAccessibility( ArchRProj = NULL, reducedDims = "IterativeLSI", dimsToUse = 1:30, scaleDims = NULL, corCutOff = 0.75, cellsToUse = NULL, k = 100, knnIteration = 500, overlapCutoff = 0.8, maxDist = 1e+05, scaleTo = 10^4, log2Norm = TRUE, seed = 1, threads = getArchRThreads(), verbose = TRUE, logFile = createLogFile("addCoAccessibility") )
ArchRProj |
An |
reducedDims |
The name of the |
dimsToUse |
A vector containing the dimensions from the |
scaleDims |
A boolean value that indicates whether to z-score the reduced dimensions for each cell. This is useful for minimizing
the contribution of strong biases (dominating early PCs) and lowly abundant populations. However, this may lead to stronger sample-specific
biases since it is over-weighting latent PCs. If set to |
corCutOff |
A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a correlation to
sequencing depth that is greater than the |
cellsToUse |
A character vector of cellNames on which to compute co-accessibility, for cases where only a subset of the total cells should be used. |
k |
The number of k-nearest neighbors to use for creating single-cell groups for correlation analyses. |
knnIteration |
The number of k-nearest neighbor groupings to test for passing the supplied |
overlapCutoff |
The maximum allowable overlap between the current group and all previous groups to permit the current group to be added to the group list during k-nearest neighbor calculations. |
maxDist |
The maximum allowable distance in basepairs between two peaks to consider for co-accessibility. |
scaleTo |
The total insertion counts from the designated group of single cells is summed across all relevant peak regions from
the |
log2Norm |
A boolean value indicating whether to log2 transform the single-cell groups prior to computing co-accessibility correlations. |
seed |
A number to be used as the seed for random number generation required in knn determination. It is recommended to keep track of the seed used so that you can reproduce results downstream. |
threads |
The number of threads to be used for parallel computing. |
verbose |
A boolean value that determines whether standard output should be printed. |
logFile |
The path to a file to be used for logging ArchR output. |
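The role of overlapCutoff when building the list of cell groups can be sketched as a greedy filter. This is a minimal illustration that assumes overlap is measured as the fraction of a candidate group's cells shared with a previously kept group; the helper name `filter_groups` and the toy candidate groups are hypothetical, and ArchR's own procedure may differ in detail.

```r
# Hypothetical helper: keep a candidate cell group only if its overlap with
# every previously kept group does not exceed overlap_cutoff.
filter_groups <- function(groups, overlap_cutoff = 0.8) {
  kept <- list()
  for (g in groups) {
    # Fraction of the cells in g that are shared with each kept group
    overlaps <- vapply(
      kept,
      function(p) length(intersect(g, p)) / length(g),
      numeric(1)
    )
    if (length(overlaps) == 0 || max(overlaps) <= overlap_cutoff) {
      kept[[length(kept) + 1]] <- g
    }
  }
  kept
}

# Candidate groups given as vectors of cell indices (k = 10 cells each)
candidates <- list(
  1:10,   # kept: nothing to compare against yet
  2:11,   # shares 9 of 10 cells with the first group, so it is dropped
  6:15,   # shares 5 of 10 cells with the first group, so it is kept
  21:30   # shares no cells with any kept group, so it is kept
)

kept <- filter_groups(candidates, overlap_cutoff = 0.8)
length(kept)
```

A lower cutoff yields fewer, more independent groups; a higher cutoff admits more groups at the cost of redundancy between them.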
This function will combine the dimensionality reductions of two or more modalities into a single reduction.
addCombinedDims( ArchRProj = NULL, name = "CombinedDims", reducedDims = NULL, dimWeights = NULL, dimsToUse = NULL, scaleDims = NULL, corCutOff = 0.75 )
ArchRProj |
An |
name |
The name for the combinedDims to be stored as. |
reducedDims |
The name of the |
dimWeights |
A vector of weights to be used to weight each dimensionality reduction when combining. |
dimsToUse |
A vector containing the dimensions from the |
scaleDims |
A boolean value that indicates whether to z-score the reduced dimensions for each cell. This is useful for minimizing
the contribution of strong biases (dominating early PCs) and lowly abundant populations. However, this may lead to stronger sample-specific
biases since it is over-weighting latent PCs. If set to |
corCutOff |
A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a correlation to
sequencing depth that is greater than the |
This function will read in the .best file output from demuxlet and add the doublet classifications to the cellColData of the ArchRProject.
addDemuxletResults(ArchRProj = NULL, bestFiles = NULL, sampleNames = NULL)
ArchRProj |
An |
bestFiles |
The file path to the .best files created by Demuxlet. There should be one .best file for each sample in the |
sampleNames |
The sample names corresponding to the .best files. These must match the sample names present in the |
This function will compute peakAnnotation deviations for each ArrowFile independently while controlling for global biases (low-memory requirement).
addDeviationsMatrix( ArchRProj = NULL, peakAnnotation = NULL, matches = NULL, bgdPeaks = getBgdPeaks(ArchRProj, method = "chromVAR"), matrixName = NULL, out = c("z", "deviations"), binarize = FALSE, threads = getArchRThreads(), verbose = TRUE, parallelParam = NULL, force = FALSE, logFile = createLogFile("addDeviationsMatrix") )
ArchRProj |
An |
peakAnnotation |
The name of the |
matches |
A custom |
bgdPeaks |
A |
matrixName |
The name to be used for storage of the deviations matrix in the provided |
out |
A string or character vector that indicates whether to save the output matrices as deviations ("deviations"), z-scores ("z"), or both (c("deviations","z")). |
binarize |
A boolean value indicating whether the input matrix should be binarized before calculating deviations. This is often desired when working with insertion counts. |
threads |
The number of threads to be used for parallel computing. |
verbose |
A boolean value that determines whether standard output includes verbose sections. |
parallelParam |
A list of parameters to be passed for biocparallel/batchtools parallel computing. |
force |
A boolean value indicating whether to force the matrix indicated by |
logFile |
The path to a file to be used for logging ArchR output. |
For each sample in the ArrowFiles or ArchRProject provided, this function will independently assign inferred doublet information to each cell. This allows for removing strong heterotypic doublet-based clusters downstream. A doublet results from a droplet that contained two cells, causing the ATAC-seq data to be a mixture of the signal from each cell.
addDoubletScores( input = NULL, useMatrix = "TileMatrix", k = 10, nTrials = 5, dimsToUse = 1:30, LSIMethod = 1, scaleDims = FALSE, corCutOff = 0.75, knnMethod = "UMAP", UMAPParams = list(n_neighbors = 40, min_dist = 0.4, metric = "euclidean", verbose = FALSE), LSIParams = list(outlierQuantiles = NULL, filterBias = FALSE), outDir = getOutputDirectory(input), threads = getArchRThreads(), force = FALSE, parallelParam = NULL, verbose = TRUE, logFile = createLogFile("addDoubletScores") )
input |
An |
useMatrix |
The name of the matrix to be used for performing doublet identification analyses. Options include "TileMatrix" and "PeakMatrix". |
k |
The number of cells neighboring a simulated doublet to be considered as putative doublets. |
nTrials |
The number of times to simulate nCell (number of cells in the sample) doublets to use for doublet simulation when calculating doublet scores. |
dimsToUse |
A vector containing the dimensions from the |
LSIMethod |
A number or string indicating the order of operations in the TF-IDF normalization. Possible values are: 1 or "tf-logidf", 2 or "log(tf-idf)", and 3 or "logtf-logidf". |
scaleDims |
A boolean that indicates whether to z-score the reduced dimensions for each cell during the LSI method performed for doublet determination. This is useful for minimizing the contribution of strong biases (dominating early PCs) and lowly abundant populations. However, this may lead to stronger sample-specific biases since it is over-weighting latent PCs. |
corCutOff |
A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a correlation
to sequencing depth that is greater than the |
knnMethod |
The name of the dimensionality reduction method to be used for k-nearest neighbors calculation. Possible values are "UMAP" or "LSI". |
UMAPParams |
The list of parameters to pass to the UMAP function if "UMAP" is designated to |
LSIParams |
The list of parameters to pass to the |
outDir |
The relative path to the output directory for relevant plots/results from doublet identification. |
threads |
The number of threads to be used for parallel computing. |
force |
If the UMAP projection is not accurate (when R < 0.8 for the reprojection of the training data - this occurs when you
have a very homogenous population of cells), setting |
parallelParam |
A list of parameters to be passed for biocparallel/batchtools parallel computing. |
verbose |
A boolean value that determines whether standard output is printed. |
logFile |
The path to a file to be used for logging ArchR output. |
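The three LSIMethod labels name the order in which the log transform is applied during TF-IDF normalization. The sketch below shows this on a toy matrix with features in rows and cells in columns. It assumes that term frequency is the count divided by the per-cell total, that inverse document frequency is the number of cells divided by the per-feature total, and that a pseudocount of 1 is used inside each log. These choices, the absence of any additional scaling constant, and the helper name `tf_idf` are assumptions for illustration; the constants used by ArchR may differ.

```r
# Hypothetical helper showing the three orders of operations
tf_idf <- function(counts, method = 1) {
  tf <- sweep(counts, 2, colSums(counts), "/")  # per-cell term frequency
  idf <- ncol(counts) / rowSums(counts)         # per-feature inverse document frequency
  # idf has one value per row, so it recycles down each column of tf
  switch(as.character(method),
    "1" = tf * log(1 + idf),           # "tf-logidf"
    "2" = log(1 + tf * idf),           # "log(tf-idf)"
    "3" = log(1 + tf) * log(1 + idf),  # "logtf-logidf"
    stop("method must be 1, 2, or 3")
  )
}

# Toy insertion counts: 3 features (rows) by 3 cells (columns)
counts <- matrix(c(2, 0, 1,
                   1, 1, 0,
                   0, 3, 1), nrow = 3, byrow = TRUE)

m1 <- tf_idf(counts, method = 1)
m2 <- tf_idf(counts, method = 2)
m3 <- tf_idf(counts, method = 3)
round(m1, 3)
```

All three variants leave zero counts at zero and differ only in how strongly large values are compressed.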
This function will add the total counts of scATAC cells in the provided features to an ArchRProject.
addFeatureCounts( ArchRProj = NULL, features = NULL, name = NULL, addRatio = TRUE, threads = getArchRThreads(), logFile = createLogFile("addFeatureCounts") )
ArchRProj |
An |
features |
A |
name |
A character defining the name of the features. " |
addRatio |
A boolean indicating whether to add the " |
threads |
The number of threads to use for parallel execution. |
logFile |
The path to a file to be used for logging ArchR output. |
For each sample, this function will independently compute counts for each feature per cell in the provided ArchRProject or set of ArrowFiles.
addFeatureMatrix( input = NULL, features = NULL, matrixName = "FeatureMatrix", ceiling = 10^9, binarize = FALSE, verbose = TRUE, threads = getArchRThreads(), parallelParam = NULL, force = TRUE, logFile = createLogFile("addFeatureMatrix") )
input |
An |
features |
A |
matrixName |
The name to be used for storage of the feature matrix in the provided |
ceiling |
The maximum counts per feature allowed. This is used to prevent large biases in feature counts. |
binarize |
A boolean value indicating whether the feature matrix should be binarized prior to storage. This can be useful for downstream analyses when working with insertion counts. |
verbose |
A boolean value that determines whether standard output includes verbose sections. |
threads |
The number of threads to be used for parallel computing. |
parallelParam |
A list of parameters to be passed for biocparallel/batchtools parallel computing. |
force |
A boolean value indicating whether to force the matrix indicated by |
logFile |
The path to a file to be used for logging ArchR output. |
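The effect of the ceiling and binarize arguments can be illustrated on a small count matrix in base R. This is only a sketch of the two transformations as described above; the toy counts and the ceiling value of 4 are chosen for illustration (the default shown in the usage is 10^9), and the code does not reproduce how ArchR applies these steps internally.

```r
# Toy insertion counts: features in rows, cells in columns
counts <- matrix(c(0, 1, 7,
                   3, 0, 12), nrow = 2, byrow = TRUE)

# Ceiling: cap every entry at a maximum count (illustrative value)
count_ceiling <- 4
capped <- pmin(counts, count_ceiling)

# Binarize: record only whether a feature has any count in a cell
binarized <- (counts > 0) * 1

capped
binarized
```

Capping keeps the relative ordering of small counts while limiting the influence of a few very large values; binarizing discards magnitude entirely.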
For each sample, this function will add gene expression values from a paired scATAC-seq + scRNA-seq multimodal assay to the ArrowFiles or ArchRProject.
addGeneExpressionMatrix( input = NULL, seRNA = NULL, chromSizes = getChromSizes(input), excludeChr = c("chrM", "chrY"), scaleTo = 10000, verbose = TRUE, threads = getArchRThreads(), parallelParam = NULL, strictMatch = FALSE, force = TRUE, logFile = createLogFile("addGeneExpressionMatrix") )
input |
An |
seRNA |
A scRNA-seq |
chromSizes |
A GRanges object of the chromosome lengths. See |
excludeChr |
A character vector containing the |
scaleTo |
Each column in the calculated gene score matrix will be normalized to a column sum designated by |
verbose |
A boolean describing whether to print progress messages to the console. |
threads |
The number of threads to be used for parallel computing. |
parallelParam |
A list of parameters to be passed for biocparallel/batchtools parallel computing. |
strictMatch |
A boolean value indicating whether every cell in |
force |
A boolean value indicating whether to force the matrix indicated by |
logFile |
The path to a file to be used for logging ArchR output. |
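Normalizing each column to a fixed column sum, as described for scaleTo, can be sketched in base R as follows. The helper name `scale_columns` and the toy matrix are hypothetical, and the sketch assumes every cell has a non-zero total (a column of zeros would produce NaN values here).

```r
# Hypothetical helper: rescale every column so that it sums to scale_to
scale_columns <- function(m, scale_to = 10000) {
  totals <- colSums(m)
  sweep(m, 2, totals, "/") * scale_to
}

# Toy matrix: genes in rows, cells in columns
expr <- matrix(
  c(5, 15, 0,
    5, 5, 20),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("cell1", "cell2", "cell3"))
)

scaled <- scale_columns(expr, scale_to = 10000)
colSums(scaled)
```

After scaling, values are comparable across cells regardless of each cell's original depth.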
This function will integrate multiple subsets of scATAC cells with a scRNA experiment, compute matched scRNA profiles, and then store these in each sample's ArrowFile.
addGeneIntegrationMatrix( ArchRProj = NULL, useMatrix = "GeneScoreMatrix", matrixName = "GeneIntegrationMatrix", reducedDims = "IterativeLSI", seRNA = NULL, groupATAC = NULL, groupRNA = NULL, groupList = NULL, sampleCellsATAC = 10000, sampleCellsRNA = 10000, embeddingATAC = NULL, embeddingRNA = NULL, dimsToUse = 1:30, scaleDims = NULL, corCutOff = 0.75, plotUMAP = TRUE, UMAPParams = list(n_neighbors = 40, min_dist = 0.4, metric = "cosine", verbose = FALSE), nGenes = 2000, useImputation = TRUE, reduction = "cca", addToArrow = TRUE, scaleTo = 10000, genesUse = NULL, nameCell = "predictedCell", nameGroup = "predictedGroup", nameScore = "predictedScore", transferParams = list(), threads = getArchRThreads(), verbose = TRUE, force = FALSE, logFile = createLogFile("addGeneIntegrationMatrix"), ... )
ArchRProj |
An |
useMatrix |
The name of a matrix in the |
matrixName |
The name to use for the output matrix containing scRNA-seq integration to be stored in the |
reducedDims |
The name of the |
seRNA |
A |
groupATAC |
A column name in |
groupRNA |
A column name in either |
groupList |
A list of cell groupings for both ATAC-seq and RNA-seq cells to be used for RNA-ATAC integration.
This is used to constrain the integration to occur across biologically relevant groups. The format of this should be a list of groups
with subgroups of ATAC and RNA specifying cells to integrate from both platforms.
For example |
sampleCellsATAC |
An integer describing the number of scATAC-seq cells to be used for integration. This number will be evenly sampled across the total number of cells in the ArchRProject. |
sampleCellsRNA |
An integer describing the number of scRNA-seq cells to be used for integration. |
embeddingATAC |
A |
embeddingRNA |
A |
dimsToUse |
A vector containing the dimensions from the |
scaleDims |
A boolean value that indicates whether to z-score the reduced dimensions for each cell. This is useful for minimizing
the contribution of strong biases (dominating early PCs) and lowly abundant populations. However, this may lead to stronger sample-specific
biases since it is over-weighting latent PCs. If set to |
corCutOff |
A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a
correlation to sequencing depth that is greater than the |
plotUMAP |
A boolean determining whether to plot a UMAP for each integration block. |
UMAPParams |
The list of parameters to pass to the UMAP function if "plotUMAP = TRUE". See the function |
nGenes |
The number of variable genes determined by |
useImputation |
A boolean value indicating whether to use imputation for creating the Gene Score Matrix prior to integration. |
reduction |
The Seurat reduction method to use for integrating modalities. See |
addToArrow |
A boolean value indicating whether to add the log2-normalized transcript counts from the integrated matched RNA to the Arrow files. |
scaleTo |
Each column in the integrated RNA matrix will be normalized to a column sum designated by |
genesUse |
If desired, a character vector of gene names to use for integration instead of the genes determined by Seurat::variableGenes. |
nameCell |
A column name to add to |
nameGroup |
A column name to add to |
nameScore |
A column name to add to |
transferParams |
Additional params to be passed to |
threads |
The number of threads to be used for parallel computing. |
verbose |
A boolean value that determines whether standard output includes verbose sections. |
force |
A boolean value indicating whether to force the matrix indicated by |
logFile |
The path to a file to be used for logging ArchR output. |
... |
Additional params to be added to |
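The nested structure described for groupList (a list of groups, each with ATAC and RNA subgroups of cells) can be sketched as a plain R list. The cell names and group names below are hypothetical, and the element names `ATAC` and `RNA` follow the wording of the argument description; treat the exact names as an assumption to verify against the package.

```r
# Hypothetical cell names; in practice these would come from the ArchRProject
# and from the scRNA-seq object supplied as seRNA.
atac_cells <- c("S1#AAAC", "S1#AAAG", "S2#CCCA", "S2#CCCT")
rna_cells <- c("rna_1", "rna_2", "rna_3", "rna_4")

# One entry per biological group, each pairing ATAC cells with RNA cells
groupList <- list(
  groupA = list(ATAC = atac_cells[1:2], RNA = rna_cells[1:2]),
  groupB = list(ATAC = atac_cells[3:4], RNA = rna_cells[3:4])
)

str(groupList)
```

Grouping in this way restricts each block of the integration to cells that are expected to be biologically comparable.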
This function, for each sample, will independently compute counts for each tile per cell and then infer gene activity scores.
addGeneScoreMatrix( input = NULL, genes = getGenes(input), geneModel = "exp(-abs(x)/5000) + exp(-1)", matrixName = "GeneScoreMatrix", extendUpstream = c(1000, 1e+05), extendDownstream = c(1000, 1e+05), geneUpstream = 5000, geneDownstream = 0, useGeneBoundaries = TRUE, useTSS = FALSE, extendTSS = FALSE, tileSize = 500, ceiling = 4, geneScaleFactor = 5, scaleTo = 10000, excludeChr = c("chrY", "chrM"), blacklist = getBlacklist(input), threads = getArchRThreads(), parallelParam = NULL, subThreading = TRUE, force = FALSE, logFile = createLogFile("addGeneScoreMatrix") )
input |
An |
genes |
A stranded |
geneModel |
A string giving a "gene model function" used for weighting peaks for gene score calculation. This string
should be a function of |
matrixName |
The name to be used for storage of the gene activity score matrix in the provided |
extendUpstream |
The minimum and maximum number of basepairs upstream of the transcription start site to consider for gene activity score calculation. |
extendDownstream |
The minimum and maximum number of basepairs downstream of the transcription start site or transcription termination site (based on 'useTSS') to consider for gene activity score calculation. |
geneUpstream |
An integer describing the number of bp upstream of the gene by which to extend the gene body. This effectively makes the gene body larger, as there are proximal peaks that should be weighted equally to the gene body. This parameter is used if 'useTSS=FALSE'. |
geneDownstream |
An integer describing the number of bp downstream of the gene by which to extend the gene body. This effectively makes the gene body larger, as there are proximal peaks that should be weighted equally to the gene body. This parameter is used if 'useTSS=FALSE'. |
useGeneBoundaries |
A boolean value indicating whether gene boundaries should be employed during gene activity score calculation. Gene boundaries refers to the process of preventing tiles from contributing to the gene score of a given gene if there is a second gene's transcription start site between the tile and the gene of interest. |
useTSS |
A boolean describing whether to build gene model based on gene TSS or the gene body. |
extendTSS |
A boolean describing whether to extend the gene TSS. By default useTSS uses the 1bp TSS while this parameter enables the extension of this region with 'geneUpstream' and 'geneDownstream' respectively. |
tileSize |
The size of the tiles used for binning counts prior to gene activity score calculation. |
ceiling |
The maximum counts per tile allowed. This is used to prevent large biases in tile counts. |
geneScaleFactor |
A numeric scaling factor to weight genes based on the inverse of their length, i.e. (Scale Factor)/(Gene Length). The weight is scaled from 1 to the scale factor: small genes receive a weight near the scale factor, while extremely large genes receive a weight closer to 1. This scaling helps with the relative gene score value. |
scaleTo |
Each column in the calculated gene score matrix will be normalized to a column sum designated by |
excludeChr |
A character vector containing the |
blacklist |
A |
threads |
The number of threads to be used for parallel computing. |
parallelParam |
A list of parameters to be passed for biocparallel/batchtools parallel computing. |
subThreading |
A boolean determining whether to use additional threads within each multi-threaded subprocess when the number of threads is greater than the number of input samples. |
force |
A boolean value indicating whether to force the matrix indicated by |
logFile |
The path to a file to be used for logging ArchR output. |
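As a rough illustration of the `geneModel` and `geneScaleFactor` arguments above, the base R sketch below evaluates the default gene model string at a few distances and rescales inverse gene lengths into the range 1 to `geneScaleFactor`. It assumes the model's variable is `x` (a distance in bp, as in the default string). The distances, the gene lengths, and the min-max rescaling are illustrative assumptions, not ArchR's internal code.

```r
# Default gene model string from the usage above; `x` is assumed to be the
# distance (in bp) between a tile and the gene.
geneModel <- "exp(-abs(x)/5000) + exp(-1)"

# Hypothetical distances, for illustration only.
x <- c(0, 1000, 5000, 25000, 100000)

# Evaluate the model string with `x` in scope to get one weight per distance.
tileWeights <- eval(parse(text = geneModel))

# Hypothetical gene lengths in bp.
geneLength <- c(2000, 20000, 200000, 2000000)
geneScaleFactor <- 5

# Inverse-length weights, min-max rescaled to [1, geneScaleFactor]
# (an assumed rescaling): short genes approach the scale factor,
# very long genes approach 1.
invLength <- 1 / geneLength
geneWeights <- 1 + (invLength - min(invLength)) /
  (max(invLength) - min(invLength)) * (geneScaleFactor - 1)

print(round(tileWeights, 3))
print(round(geneWeights, 3))
```

With the default model, the weight falls off exponentially with distance and levels out at `exp(-1)`, so distant tiles still contribute a small, constant amount.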
This function will merge cells within each designated cell group for the generation of pseudo-bulk replicates and then merge these replicates into a single insertion coverage file.
addGroupCoverages( ArchRProj = NULL, groupBy = "Clusters", useLabels = TRUE, sampleLabels = "Sample", minCells = 40, maxCells = 500, maxFragments = 25 * 10^6, minReplicates = 2, maxReplicates = 5, sampleRatio = 0.8, kmerLength = 6, threads = getArchRThreads(), returnGroups = FALSE, parallelParam = NULL, force = FALSE, verbose = TRUE, logFile = createLogFile("addGroupCoverages") )
ArchRProj |
An |
groupBy |
The name of the column in |
useLabels |
A boolean value indicating whether to use sample labels to create sample-aware subgroupings during pseudo-bulk replicate generation. |
sampleLabels |
The name of a column in |
minCells |
The minimum number of cells required in a given cell group to permit insertion coverage file generation. |
maxCells |
The maximum number of cells to use during insertion coverage file generation. |
maxFragments |
The maximum number of fragments per cell group to use in insertion coverage file generation. This prevents the generation of excessively large files which would negatively impact memory requirements. |
minReplicates |
The minimum number of pseudo-bulk replicates to be generated. |
maxReplicates |
The maximum number of pseudo-bulk replicates to be generated. |
sampleRatio |
The fraction of the total cells that can be sampled to generate any given pseudo-bulk replicate. |
kmerLength |
The length of the k-mer used for estimating Tn5 bias. |
threads |
The number of threads to be used for parallel computing. |
returnGroups |
A boolean value that indicates whether to return sample-guided cell-groupings without creating coverages.
This is used mainly in |
parallelParam |
A list of parameters to be passed for biocparallel/batchtools parallel computing. |
force |
A boolean value that indicates whether or not to overwrite the relevant data in the |
verbose |
A boolean value that determines whether standard output includes verbose sections. |
logFile |
The path to a file to be used for logging ArchR output. |
This function will add the Harmony batch-corrected reduced dimensions to an ArchRProject.
addHarmony( ArchRProj = NULL, reducedDims = "IterativeLSI", dimsToUse = NULL, scaleDims = NULL, corCutOff = 0.75, name = "Harmony", groupBy = "Sample", verbose = TRUE, force = FALSE, ... )
ArchRProj |
An |
reducedDims |
The name of the |
dimsToUse |
A vector containing the dimensions from the |
scaleDims |
A boolean that indicates whether to z-score the reduced dimensions for each cell. This is useful for minimizing the contribution
of strong biases (dominating early PCs) and lowly abundant populations. However, this may lead to stronger sample-specific biases since
it is over-weighting latent PCs. If set to |
corCutOff |
A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a correlation
to sequencing depth that is greater than the |
name |
The name to store harmony output as a |
groupBy |
The name of the column in |
verbose |
A boolean value indicating whether to use verbose output during execution of this function. Can be set to FALSE for a cleaner output. |
force |
A boolean value that indicates whether or not to overwrite data in a given column when the value passed to |
... |
Additional arguments to be provided to harmony::HarmonyMatrix |
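The `scaleDims` and `corCutOff` arguments above can be pictured with a small base R sketch. The matrix and depth vector are made up, and treating a dimension as excluded when its absolute correlation with depth exceeds the cutoff is an assumption based on the argument description, not a copy of ArchR's code.

```r
# Hypothetical reduced dimensions: 20 cells (rows) x 4 dimensions (columns).
depth <- 1:20                                   # stand-in for per-cell sequencing depth
reduced <- cbind(
  LSI1 = log10(depth),                          # tracks depth closely
  LSI2 = rep(c(-1, 1), times = 10),             # nearly independent of depth
  LSI3 = 2 * rep(c(1, 1, -1, -1), times = 5),   # nearly independent of depth
  LSI4 = (depth - 10.5)^2                       # symmetric, uncorrelated with depth
)

corCutOff <- 0.75

# Correlation of each dimension with depth.
corToDepth <- apply(reduced, 2, function(d) cor(d, depth))

# Keep dimensions whose absolute correlation does not exceed the cutoff.
dimsToUse <- which(abs(corToDepth) <= corCutOff)
kept <- reduced[, dimsToUse, drop = FALSE]

# One reading of scaleDims: z-score the retained dimensions within each cell (row-wise).
scaled <- t(scale(t(kept)))

print(round(corToDepth, 3))
print(names(dimsToUse))
```

Here only `LSI1` is dropped, because it is the only dimension that closely follows sequencing depth.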
This function computes imputations weights that describe each cell as a linear combination of many cells based on a MAGIC diffusion matrix.
addImputeWeights( ArchRProj = NULL, reducedDims = "IterativeLSI", dimsToUse = NULL, scaleDims = NULL, corCutOff = 0.75, td = 3, ka = 4, sampleCells = 5000, nRep = 2, k = 15, epsilon = 1, useHdf5 = TRUE, randomSuffix = FALSE, threads = getArchRThreads(), seed = 1, verbose = TRUE, logFile = createLogFile("addImputeWeights") )
ArchRProj |
An |
reducedDims |
The name of the |
dimsToUse |
A vector containing the dimensions from the |
scaleDims |
A boolean that indicates whether to z-score the reduced dimensions for each cell. This is useful for minimizing the contribution
of strong biases (dominating early PCs) and lowly abundant populations. However, this may lead to stronger sample-specific biases since
it is over-weighting latent PCs. If set to |
corCutOff |
A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a correlation to
sequencing depth that is greater than the |
td |
The diffusion time parameter determines the number of smoothing iterations to be performed (see MAGIC from van Dijk et al Cell 2018). |
ka |
The k-nearest neighbors autotune parameter to equalize the effective number of neighbors for each cell, thereby diminishing the effect of differences in density. (see MAGIC from van Dijk et al Cell 2018). |
sampleCells |
The number of cells to sub-sample to compute an imputation block. An imputation block is a cell x cell matrix that describes the linear combination for imputation for numerical values within these cells. ArchR creates many blocks to keep this cell x cell matrix sparse for memory concerns. |
nRep |
An integer representing the number of imputation replicates to create when downsampling extremely low. |
k |
The number of nearest neighbors for smoothing to use for MAGIC (see MAGIC from van Dijk et al Cell 2018). |
epsilon |
The value for the standard deviation of the kernel for MAGIC (see MAGIC from van Dijk et al Cell 2018). |
useHdf5 |
A boolean value that indicates whether HDF5 format should be used to store the impute weights. |
randomSuffix |
A boolean value that indicates whether a random suffix should be appended to the saved imputation weights hdf5 files. |
threads |
The number of threads to be used for parallel computing. |
seed |
A number to be used as the seed for random number generation. It is recommended to keep track of the seed used so that you can reproduce results downstream. |
verbose |
A boolean value indicating whether to use verbose output during execution of this function. Can be set to FALSE for a cleaner output. |
logFile |
The path to a file to be used for logging ArchR output. |
This function will compute an iterative LSI dimensionality reduction on an ArchRProject.
addIterativeLSI( ArchRProj = NULL, useMatrix = "TileMatrix", name = "IterativeLSI", iterations = 2, clusterParams = list(resolution = c(2), sampleCells = 10000, maxClusters = 6, n.start = 10), firstSelection = "top", depthCol = "nFrags", varFeatures = 25000, dimsToUse = 1:30, LSIMethod = 2, scaleDims = TRUE, corCutOff = 0.75, binarize = TRUE, outlierQuantiles = c(0.02, 0.98), filterBias = TRUE, sampleCellsPre = 10000, projectCellsPre = FALSE, sampleCellsFinal = NULL, selectionMethod = "var", scaleTo = 10000, totalFeatures = 5e+05, filterQuantile = 0.995, excludeChr = c(), saveIterations = TRUE, UMAPParams = list(n_neighbors = 40, min_dist = 0.4, metric = "cosine", verbose = FALSE, fast_sgd = TRUE), nPlot = 10000, outDir = getOutputDirectory(ArchRProj), threads = getArchRThreads(), seed = 1, verbose = TRUE, force = FALSE, logFile = createLogFile("addIterativeLSI") )
ArchRProj |
An |
useMatrix |
The name of the data matrix to retrieve from the ArrowFiles associated with the |
name |
The name to use for storage of the IterativeLSI dimensionality reduction in the |
iterations |
The number of LSI iterations to perform. |
clusterParams |
A list of additional parameters to be passed to |
firstSelection |
First iteration selection method for features to use for LSI. Either "Top" for the top accessible/average or "Var" for the top variable features. "Top" should be used for all scATAC-seq data (binary) while "Var" should be used for all scRNA/other-seq data types (non-binary). |
depthCol |
A column in the |
varFeatures |
The number of N variable features to use for LSI. The top N features will be used based on the |
dimsToUse |
A vector containing the dimensions from the |
LSIMethod |
A number or string indicating the order of operations in the TF-IDF normalization. Possible values are: 1 or "tf-logidf", 2 or "log(tf-idf)", and 3 or "logtf-logidf". |
scaleDims |
A boolean that indicates whether to z-score the reduced dimensions for each cell. This is useful for minimizing the contribution of strong biases (dominating early PCs) and lowly abundant populations. However, this may lead to stronger sample-specific biases since it is over-weighting latent PCs. |
corCutOff |
A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a correlation to
sequencing depth that is greater than the |
binarize |
A boolean value indicating whether the matrix should be binarized before running LSI. This is often desired when working with insertion counts. |
outlierQuantiles |
Two numerical values (between 0 and 1) that describe the lower and upper quantiles of bias (number of accessible regions per cell, determined
by |
filterBias |
A boolean indicating whether to drop bias clusters when computing clusters during iterativeLSI. |
sampleCellsPre |
An integer specifying the number of cells to sample in iterations prior to the last in order to perform a sub-sampled LSI and sub-sampled clustering. This greatly reduces memory usage and increases speed for early iterations. |
projectCellsPre |
A boolean indicating whether to reproject all cells into the sub-sampled LSI (see |
sampleCellsFinal |
An integer specifying the number of cells to sample in order to perform a sub-sampled LSI in final iteration. |
selectionMethod |
The selection method to be used for identifying the top variable features. Valid options are "var" for log-variability or "vmr" for variance-to-mean ratio. |
scaleTo |
Each column in the matrix designated by |
totalFeatures |
The number of features to consider for use in LSI after ranking the features by the total number of insertions.
These features are the only ones used throughout the variance identification and LSI. These are an equivalent when using a |
filterQuantile |
A number between 0 and 1 that indicates the quantile above which features should be removed based on insertion counts prior
to the first iteration of the iterative LSI paradigm. For example, if |
excludeChr |
A character vector of chromosomes to exclude from the iterativeLSI procedure. |
saveIterations |
A boolean value indicating whether the results of each LSI iteration should be saved as compressed |
UMAPParams |
The list of parameters to pass to the UMAP function if "UMAP" if |
nPlot |
If |
outDir |
The output directory for saving LSI iterations if desired. Default is in the |
threads |
The number of threads to be used for parallel computing. |
seed |
A number to be used as the seed for random number generation. It is recommended to keep track of the seed used so that you can reproduce results downstream. |
verbose |
A boolean value that determines whether standard output includes verbose sections. |
force |
A boolean value that indicates whether or not to overwrite relevant data in the |
logFile |
The path to a file to be used for logging ArchR output. |
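The three `LSIMethod` options differ only in where the logarithm is applied. The base R sketch below shows one plausible reading of each on a tiny made-up binary matrix (features in rows, cells in columns). The exact formulas, including the use of `log(1 + ...)` and the `scaleTo` multiplier in method 2, are assumptions for illustration and may differ from ArchR's implementation. The helper name `tfidf` is hypothetical.

```r
# Tiny hypothetical binarized matrix: 4 features (rows) x 3 cells (columns).
mat <- matrix(c(1, 0, 1,
                1, 1, 1,
                0, 0, 1,
                1, 0, 0), nrow = 4, byrow = TRUE)

tfidf <- function(mat, LSIMethod = 2, scaleTo = 10000) {
  # Term frequency: each cell (column) is divided by its total count.
  tf <- sweep(mat, 2, colSums(mat), "/")
  # Inverse document frequency: number of cells over per-feature totals.
  idf <- ncol(mat) / rowSums(mat)
  if (LSIMethod == 1) {          # "tf-logidf"
    tf * log(1 + idf)
  } else if (LSIMethod == 2) {   # "log(tf-idf)"
    log(tf * idf * scaleTo + 1)
  } else if (LSIMethod == 3) {   # "logtf-logidf"
    log(tf + 1) * log(1 + idf)
  } else {
    stop("LSIMethod must be 1, 2, or 3")
  }
}

# Apply all three variants to the same matrix.
res <- lapply(1:3, function(m) tfidf(mat, LSIMethod = m))
print(round(res[[2]], 2))
```

In all three variants a zero entry stays zero, so the normalized matrix keeps the sparsity pattern of the input.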
This function calculates a module score from a set of features across all cells. This allows for grouping of multiple features together into a single quantitative measurement. Currently, this function only works for modules derived from the GeneScoreMatrix. Each module is added as a new column in cellColData.
addModuleScore( ArchRProj = NULL, useMatrix = NULL, name = "Module", features = NULL, nBin = 25, nBgd = 100, seed = 1, threads = getArchRThreads(), logFile = createLogFile("addModuleScore") )
ArchRProj |
An |
useMatrix |
The name of the matrix to be used for calculation of the module score. See |
name |
The name to be given to the designated module. If |
features |
A list of feature names to be grouped into modules. For example, |
nBin |
The number of bins to use to divide all features for identification of signal-matched features for background calculation |
nBgd |
The number of background features to use for signal normalization. |
seed |
A number to be used as the seed for random number generation required when sampling cells for the background set. It is recommended to keep track of the seed used so that you can reproduce results downstream. |
threads |
The number of threads to be used for parallel computing. |
logFile |
The path to a file to be used for logging ArchR output. |
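A background-corrected module score of the kind suggested by `nBin` and `nBgd` can be sketched in base R as follows. The matrix is simulated, and the specific recipe (bin features by mean signal, sample background features from the same bins as the module features, subtract the background mean from the module mean) is an assumed simplification, not necessarily what `addModuleScore` does internally.

```r
set.seed(1)

# Simulated score matrix: 200 features (rows) x 10 cells (columns).
mat <- matrix(rexp(200 * 10), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("cell", 1:10)))
moduleFeatures <- c("gene1", "gene2", "gene3")   # hypothetical module
nBin <- 25
nBgd <- 5    # kept small for this toy example (the documented default is 100)

# Assign every feature to one of nBin bins by the rank of its mean signal.
avg <- rowMeans(mat)
bins <- cut(rank(avg, ties.method = "first"), breaks = nBin, labels = FALSE)
names(bins) <- rownames(mat)

# For each module feature, sample background features from the same bin.
bgd <- unlist(lapply(moduleFeatures, function(f) {
  pool <- setdiff(names(bins)[bins == bins[[f]]], moduleFeatures)
  sample(pool, size = min(nBgd, length(pool)))
}))

# Module score per cell: mean module signal minus mean background signal.
moduleScore <- colMeans(mat[moduleFeatures, , drop = FALSE]) -
  colMeans(mat[bgd, , drop = FALSE])

print(round(moduleScore, 3))
```

Fixing the seed, as the `seed` argument does, makes the sampled background set and therefore the scores reproducible.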
This function will add a trajectory from a monocle CDS created from getMonocleTrajectories to an ArchRProject.
addMonocleTrajectory( ArchRProj = NULL, name = "Trajectory", useGroups = NULL, groupBy = "Clusters", monocleCDS = NULL, force = FALSE )
ArchRProj |
An |
name |
A string indicating the name of the fitted trajectory to be added in |
useGroups |
The cell groups to be used for creating trajectory analysis. |
groupBy |
A string indicating the column name from |
monocleCDS |
A monocle CDS object created from |
force |
A boolean value indicating whether to force the trajectory indicated by |
This function adds information about which peaks contain motifs to a given ArchRProject. For each peak, a binary value is stored indicating whether each motif is observed within the peak region.
addMotifAnnotations( ArchRProj = NULL, motifSet = "cisbp", annoName = "Motif", species = NULL, collection = "CORE", motifPWMs = NULL, cutOff = 5e-05, width = 7, version = 2, force = FALSE, logFile = createLogFile("addMotifAnnotations"), ... )
ArchRProj |
An |
motifSet |
The motif set to be used for annotation. Options include: (i) "JASPAR2016", "JASPAR2018", "JASPAR2020"
which gives the 2016, 2018 or 2020 version of JASPAR motifs, (ii) one of "cisbp", "encode", or "homer" which gives the
corresponding motif sets from the |
annoName |
The name of the |
species |
The name of the species relevant to the supplied |
collection |
If one of the JASPAR motif sets is used via |
motifPWMs |
A custom set of motif PWMs as a PWMList for adding motif annotations. |
cutOff |
The p-value cutoff to be used for motif search. The p-value is determined vs a background set of sequences
(see |
width |
The width in basepairs to consider for motif matches. See the |
version |
An integer specifying version 1 or version 2 of chromVARmotifs. See the GreenleafLab/chromVARmotifs repository on GitHub for more information. |
force |
A boolean value indicating whether to force the |
logFile |
The path to a file to be used for logging ArchR output. |
... |
Additional parameters to be passed to |
This function will add peak-to-gene links to a given ArchRProject.
addPeak2GeneLinks( ArchRProj = NULL, reducedDims = "IterativeLSI", useMatrix = "GeneIntegrationMatrix", dimsToUse = 1:30, scaleDims = NULL, corCutOff = 0.75, cellsToUse = NULL, k = 100, knnIteration = 500, overlapCutoff = 0.8, maxDist = 250000, scaleTo = 10^4, log2Norm = TRUE, predictionCutoff = 0.4, addEmpiricalPval = FALSE, seed = 1, threads = max(floor(getArchRThreads()/2), 1), verbose = TRUE, logFile = createLogFile("addPeak2GeneLinks") )
ArchRProj |
An |
reducedDims |
The name of the |
useMatrix |
The name of the matrix containing gene expression information to be used for determining peak-to-gene links. See |
dimsToUse |
A vector containing the dimensions from the |
scaleDims |
A boolean value that indicates whether to z-score the reduced dimensions for each cell. This is useful for minimizing
the contribution of strong biases (dominating early PCs) and lowly abundant populations. However, this may lead to stronger sample-specific
biases since it is over-weighting latent PCs. If set to |
corCutOff |
A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a
correlation to sequencing depth that is greater than the |
cellsToUse |
A character vector of cellNames to compute coAccessibility on if desired to run on a subset of the total cells. |
k |
The number of k-nearest neighbors to use for creating single-cell groups for correlation analyses. |
knnIteration |
The number of k-nearest neighbor groupings to test for passing the supplied |
overlapCutoff |
The maximum allowable overlap between the current group and all previous groups to permit the current group to be added to the group list during k-nearest neighbor calculations. |
maxDist |
The maximum allowable distance in basepairs between two peaks to consider for co-accessibility. |
scaleTo |
The total insertion counts from the designated group of single cells is summed across all relevant peak regions
from the |
log2Norm |
A boolean value indicating whether to log2 transform the single-cell groups prior to computing co-accessibility correlations. |
predictionCutoff |
A numeric describing the cutoff for RNA integration to use when picking cells for groupings. |
addEmpiricalPval |
Add empirical p-values based on randomly correlating peaks and genes not on the same seqname. |
seed |
A number to be used as the seed for random number generation required in knn determination. It is recommended to keep track of the seed used so that you can reproduce results downstream. |
threads |
The number of threads to be used for parallel computing. |
verbose |
A boolean value that determines whether standard output should be printed. |
logFile |
The path to a file to be used for logging ArchR output. |
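The roles of `maxDist`, `scaleTo`, and `log2Norm` can be sketched in base R on made-up data: candidate peak-gene pairs are restricted by distance, group-level counts are depth-normalized and log2-transformed, and each remaining pair is correlated across cell groups. All counts, positions, and the helper name `normalize` are hypothetical, and the per-column normalization is an assumed simplification of what ArchR does.

```r
# Hypothetical aggregated counts: rows are features, columns are 6 cell groups.
peakCounts <- matrix(c(10, 20, 30, 40, 50, 60,
                        5,  5,  5,  6,  5,  5),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("peak1", "peak2"), paste0("group", 1:6)))
geneCounts <- matrix(c(100, 210, 290, 405, 500, 610,
                       300, 280, 310, 290, 305, 295),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("geneA", "geneB"), paste0("group", 1:6)))

normalize <- function(mat, scaleTo = 10^4, log2Norm = TRUE) {
  # Scale each group (column) to a common total, then optionally log2(x + 1).
  out <- sweep(mat, 2, colSums(mat), "/") * scaleTo
  if (log2Norm) out <- log2(out + 1)
  out
}

# Hypothetical peak and gene-start positions (bp) on one chromosome.
peakPos <- c(peak1 = 1000000, peak2 = 1600000)
genePos <- c(geneA = 1100000, geneB = 1200000)
maxDist <- 250000

# Candidate pairs: every peak-gene combination within maxDist.
pairs <- expand.grid(peak = names(peakPos), gene = names(genePos),
                     stringsAsFactors = FALSE)
pairs <- pairs[abs(peakPos[pairs$peak] - genePos[pairs$gene]) <= maxDist, ]

# Correlate normalized accessibility with normalized expression across groups.
p <- normalize(peakCounts)
g <- normalize(geneCounts)
pairs$Correlation <- mapply(function(pk, gn) cor(p[pk, ], g[gn, ]),
                            pairs$peak, pairs$gene)
print(pairs)
```

Only `peak1` lies within `maxDist` of either gene, so `peak2` never enters the correlation step.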
This function adds information about which peaks contain input regions to a given ArchRProject. For each peak, a binary value is stored indicating whether each region is observed within the peak region.
addPeakAnnotations( ArchRProj = NULL, regions = NULL, name = "Region", force = FALSE, logFile = createLogFile("addPeakAnnotations") )
ArchRProj |
An |
regions |
A named |
name |
The name of |
force |
A boolean value indicating whether to force the |
logFile |
The path to a file to be used for logging ArchR output. |
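The binary peak-by-region matrix described above can be illustrated without any genomics packages. The sketch below uses plain data frames as stand-ins for the peak set and for a named list of region sets. The coordinates, the set names, and the helper `overlapsAny` are all hypothetical, and a real workflow would more likely rely on GenomicRanges overlap functions.

```r
# Hypothetical peaks and region sets as plain data frames (1-based, inclusive).
peaks <- data.frame(chr = c("chr1", "chr1", "chr2"),
                    start = c(100, 1000, 500),
                    end = c(600, 1500, 1000))
regions <- list(
  SetA = data.frame(chr = c("chr1", "chr2"), start = c(550, 2000), end = c(700, 2100)),
  SetB = data.frame(chr = "chr1", start = 1200, end = 1250)
)

# TRUE when a peak overlaps at least one region of the set on the same chromosome.
overlapsAny <- function(peaks, regionSet) {
  vapply(seq_len(nrow(peaks)), function(i) {
    any(regionSet$chr == peaks$chr[i] &
          regionSet$start <= peaks$end[i] &
          regionSet$end >= peaks$start[i])
  }, logical(1))
}

# Peaks x region-set binary matrix, one column per named region set.
matches <- vapply(regions, function(r) overlapsAny(peaks, r), logical(nrow(peaks)))
print(matches * 1)
```

Each column of `matches` corresponds to one named region set, which is why the region sets need names.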
This function, for each sample, will independently compute counts for each peak per cell in the provided ArchRProject using the "PeakMatrix".
addPeakMatrix( ArchRProj = NULL, ceiling = 4, binarize = FALSE, verbose = TRUE, threads = getArchRThreads(), parallelParam = NULL, force = TRUE, logFile = createLogFile("addPeakMatrix") )
ArchRProj |
An |
ceiling |
The maximum counts per feature allowed. This is used to prevent large biases in peak counts. |
binarize |
A boolean value indicating whether the peak matrix should be binarized prior to storage. This can be useful for downstream analyses when working with insertion counts. |
verbose |
A boolean value that determines whether standard output includes verbose sections. |
threads |
The number of threads to be used for parallel computing. |
parallelParam |
A list of parameters to be passed for biocparallel/batchtools parallel computing. |
force |
A boolean value indicating whether to force the "PeakMatrix" to be overwritten if it already exists in the given |
logFile |
The path to a file to be used for logging ArchR output. |
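The effect of `ceiling` and `binarize` on a count matrix is simple to show in base R. The counts below are made up, and `maxCount` is a hypothetical variable name standing in for the `ceiling` argument (chosen to avoid masking `base::ceiling`).

```r
# Hypothetical raw insertion counts: 3 peaks (rows) x 4 cells (columns).
counts <- matrix(c(0, 1, 7, 2,
                   3, 0, 0, 12,
                   1, 1, 4, 5), nrow = 3, byrow = TRUE)

maxCount <- 4                      # plays the role of the `ceiling` argument

# Counts above the ceiling are truncated to the ceiling.
capped <- pmin(counts, maxCount)

# binarize = TRUE: any peak with at least one insertion becomes 1.
binary <- (counts > 0) * 1

print(capped)
print(binary)
```

Capping limits the influence of a few very high counts while still keeping more information than full binarization.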
This function adds a peak set as a GRanges object to a given ArchRProject.
addPeakSet( ArchRProj = NULL, peakSet = NULL, genomeAnnotation = getGenomeAnnotation(ArchRProj), force = FALSE )
ArchRProj |
An |
peakSet |
A |
genomeAnnotation |
The genomeAnnotation (see |
force |
If a |
This function adds information to the projectSummary of an ArchRProject.
addProjectSummary(ArchRProj = NULL, name = NULL, summary = NULL)
ArchRProj |
An |
name |
The name of the summary information to add to the |
summary |
A vector to add as summary information to the |
This function will get insertions from coverage files, call peaks, and merge peaks to get a "Union Reproducible Peak Set".
addReproduciblePeakSet( ArchRProj = NULL, groupBy = "Clusters", peakMethod = "Macs2", reproducibility = "2", peaksPerCell = 500, maxPeaks = 150000, minCells = 25, excludeChr = c("chrM", "chrY"), pathToMacs2 = if (tolower(peakMethod) == "macs2") findMacs2() else NULL, genomeSize = NULL, shift = -75, extsize = 150, method = if (tolower(peakMethod) == "macs2") "q" else "p", cutOff = 0.1, additionalParams = "--nomodel --nolambda", extendSummits = 250, promoterRegion = c(2000, 100), genomeAnnotation = getGenomeAnnotation(ArchRProj), geneAnnotation = getGeneAnnotation(ArchRProj), plot = TRUE, threads = getArchRThreads(), parallelParam = NULL, force = FALSE, verbose = TRUE, logFile = createLogFile("addReproduciblePeakSet"), ... )
ArchRProj |
An |
groupBy |
The name of the column in |
peakMethod |
The name of peak calling method to be used. Options include "Macs2" for using macs2 callpeak or "Tiles" for using a TileMatrix. |
reproducibility |
A string that indicates how peak reproducibility should be handled. This string is dynamic and can be a
function of |
peaksPerCell |
The upper limit of the number of peaks that can be identified per cell-grouping in |
maxPeaks |
A numeric threshold for the maximum peaks to retain per group from |
minCells |
The minimum allowable number of unique cells that were used to create the coverage files on which peaks are called. This is important to allow for exclusion of pseudo-bulk replicates derived from very low cell numbers. |
excludeChr |
A character vector containing the |
pathToMacs2 |
The full path to the MACS2 executable. |
genomeSize |
The genome size to be used for MACS2 peak calling (see MACS2 documentation). This is required if genome is not hg19, hg38, mm9, or mm10. |
shift |
The number of basepairs to shift each Tn5 insertion. When combined with |
extsize |
The number of basepairs to extend the MACS2 fragment after |
method |
The method to use for significance testing in MACS2. Options are "p" for p-value and "q" for q-value. When combined with
|
cutOff |
The numeric significance cutOff for the testing method indicated by |
additionalParams |
A string of additional parameters to pass to MACS2 (see MACS2 documentation). |
extendSummits |
The number of basepairs to extend peak summits (in both directions) to obtain final fixed-width peaks. For example,
|
promoterRegion |
A vector of two integers specifying the distance in basepairs upstream and downstream of a TSS to be included as a promoter region.
Peaks called within one of these regions will be annotated as a "promoter" peak. For example, |
genomeAnnotation |
The genomeAnnotation (see |
geneAnnotation |
The geneAnnotation (see |
plot |
A boolean describing whether to plot peak annotation results. |
threads |
The number of threads to be used for parallel computing. |
parallelParam |
A list of parameters to be passed for biocparallel/batchtools parallel computing. |
force |
A boolean value indicating whether to force the reproducible peak set to be overwritten if it already exists in the given |
verbose |
A boolean value that determines whether standard output includes verbose sections. |
logFile |
The path to a file to be used for logging ArchR output. |
... |
Additional parameters to be passed to |
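The arithmetic behind `extendSummits` and `promoterRegion` can be sketched in base R. The summit and TSS positions are made up, the helper `promoterWindow` is hypothetical, and both the strand-aware window and the use of overlap as the "promoter" criterion are assumptions made for illustration.

```r
# Hypothetical 1-bp peak summits.
summits <- c(10000, 52340)
extendSummits <- 250

# Extend each summit in both directions to get fixed-width peaks.
peakStart <- summits - extendSummits
peakEnd <- summits + extendSummits
peakWidth <- peakEnd - peakStart + 1   # 2 * 250 + 1 = 501 bp

# Promoter window around a TSS: c(upstream, downstream), strand-aware.
promoterRegion <- c(2000, 100)
promoterWindow <- function(tss, strand) {
  if (strand == "+") {
    c(start = tss - promoterRegion[1], end = tss + promoterRegion[2])
  } else {
    c(start = tss - promoterRegion[2], end = tss + promoterRegion[1])
  }
}

# A peak is labeled "promoter" here if it overlaps the window.
win <- promoterWindow(tss = 11000, strand = "+")
isPromoter <- peakStart <= win[["end"]] & peakEnd >= win[["start"]]

print(peakWidth)
print(isPromoter)
```

Fixed-width peaks make counts comparable across peaks, since no peak gains signal simply by being wider.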
This function adds new data to sampleColData in an ArchRProject.
addSampleColData( ArchRProj = NULL, data = NULL, name = NULL, samples = NULL, force = FALSE )
ArchRProj |
An |
data |
A vector containing the data to be added to |
name |
The column header name to be used for this new data in |
samples |
The names of the samples corresponding to |
force |
A boolean value that indicates whether or not to overwrite data in a given column when the value passed to |
This function will fit a supervised trajectory in a lower dimensional space that can then be used for downstream analyses.
addSlingShotTrajectories( ArchRProj = NULL, name = "SlingShot", useGroups = NULL, principalGroup = NULL, groupBy = NULL, embedding = NULL, reducedDims = NULL, force = FALSE, seed = 1 )
ArchRProj |
An |
name |
A string indicating the name of the fitted trajectory to be added in |
useGroups |
A character vector that is used to select a subset of groups by name from the designated |
principalGroup |
The principal group which represents the group that will be the starting point for all trajectories. |
groupBy |
A string indicating the column name from |
embedding |
A string indicating the name of the |
reducedDims |
A string indicating the name of the |
force |
A boolean value indicating whether to force the trajectory indicated by |
seed |
A number to be used as the seed for random number generation for trajectory creation. |
This function, for each sample, will independently compute counts for each tile.
addTileMatrix( input = NULL, chromSizes = if (inherits(input, "ArchRProject")) getChromSizes(input) else NULL, blacklist = if (inherits(input, "ArchRProject")) getBlacklist(input) else NULL, tileSize = 500, binarize = TRUE, excludeChr = c("chrM", "chrY"), threads = getArchRThreads(), parallelParam = NULL, force = FALSE, logFile = createLogFile("addTileMatrix") )
input |
An |
chromSizes |
A named numeric vector containing the chromosome names and lengths. The default behavior is to retrieve
this from the |
blacklist |
A |
tileSize |
The size of the tiles used for binning counts in the "TileMatrix". |
binarize |
A boolean value indicating whether the "TileMatrix" should be binarized prior to storage. |
excludeChr |
A character vector containing the |
threads |
The number of threads to be used for parallel computing. |
parallelParam |
A list of parameters to be passed for biocparallel/batchtools parallel computing. |
force |
A boolean value indicating whether to force the "TileMatrix" to be overwritten if it already exists in the given |
logFile |
The path to a file to be used for logging ArchR output. |
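Binning insertions into fixed-size tiles, as controlled by `tileSize` and `binarize`, can be sketched in base R for a single cell and chromosome. The insertion positions and chromosome length are made up, and the 1-based tile boundary convention (tile 1 covers 1-500, tile 2 covers 501-1000, and so on) is an assumption.

```r
# Hypothetical Tn5 insertion positions (bp) for one cell on one chromosome.
insertions <- c(12, 480, 501, 505, 999, 1500, 1501, 2400)
tileSize <- 500
chromSize <- 3000   # hypothetical chromosome length

# 1-based tile index for each insertion.
tileIndex <- (insertions - 1) %/% tileSize + 1

# Number of tiles needed to cover the chromosome.
nTiles <- (chromSize - 1) %/% tileSize + 1

# Count insertions per tile over the whole chromosome.
tileCounts <- tabulate(tileIndex, nbins = nTiles)

# binarize = TRUE keeps only whether a tile was hit at all.
tileBinary <- as.integer(tileCounts > 0)

print(tileCounts)
print(tileBinary)
```

Smaller tiles give finer resolution at the cost of a larger and sparser matrix.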
This function will fit a supervised trajectory in a lower dimensional space that can then be used for downstream analyses.
addTrajectory( ArchRProj = NULL, name = "Trajectory", trajectory = NULL, groupBy = "Clusters", reducedDims = "IterativeLSI", embedding = NULL, preFilterQuantile = 0.9, postFilterQuantile = 0.9, useAll = FALSE, dof = 250, spar = 1, force = FALSE, seed = 1, logFile = createLogFile("addTrajectory") )
ArchRProj |
An |
name |
A string indicating the name of the fitted trajectory to be added in |
trajectory |
The order of cell groups to be used for constraining the initial supervised fitting procedure. For example, to get a trajectory from Cluster1 to Cluster2 to Cluster3, input should be c("Cluster1", "Cluster2", "Cluster3"). Cells will then be used from these 3 groups to constrain an initial fit in the group order. |
groupBy |
A string indicating the column name from |
reducedDims |
A string indicating the name of the |
embedding |
A string indicating the name of the |
preFilterQuantile |
Prior to the initial supervised trajectory fitting, cells whose Euclidean distance from the cell-grouping center is above the provided quantile will be excluded. |
postFilterQuantile |
After the initial supervised trajectory fitting, cells whose Euclidean distance from the cell-grouping center is above the provided quantile will be excluded. |
useAll |
A boolean describing whether to use cells outside of trajectory groups for post-fitting procedure. |
dof |
The number of degrees of freedom to be used in the spline fit. See |
spar |
The smoothing parameter to be used in the spline fit. See |
force |
A boolean value indicating whether to force the trajectory indicated by |
seed |
A number to be used as the seed for random number generation for trajectory creation. |
logFile |
The path to a file to be used for logging ArchR output. |
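The quantile filters described for `preFilterQuantile` and `postFilterQuantile` can be pictured with a small base R sketch. The coordinates below are random, and `filterByQuantile` is a hypothetical helper written for this example; ArchR applies its filter to the chosen `reducedDims` or `embedding` and may define the group center differently.

```r
set.seed(1)

# Hypothetical reduced-dimension coordinates for 20 cells in one group.
coords <- matrix(rnorm(40), ncol = 2,
                 dimnames = list(paste0("cell", 1:20), c("dim1", "dim2")))

# Illustrative helper: keep cells whose Euclidean distance from the group
# center is at or below the given quantile of all distances in the group.
filterByQuantile <- function(mat, q = 0.9) {
  center <- colMeans(mat)
  d <- sqrt(rowSums(sweep(mat, 2, center)^2))
  rownames(mat)[d <= quantile(d, q)]
}

kept <- filterByQuantile(coords, q = 0.9)
length(kept)
```

Cells far from the center of their group are dropped before fitting, so a few outliers do not pull the trajectory away from the bulk of the group.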
This function will compute a TSNE embedding and add it to an ArchRProject.
addTSNE( ArchRProj = NULL, reducedDims = "IterativeLSI", method = "RTSNE", name = "TSNE", perplexity = 50, maxIterations = 1000, learningRate = 200, dimsToUse = NULL, scaleDims = NULL, corCutOff = 0.75, saveModel = FALSE, verbose = TRUE, seed = 1, force = FALSE, threads = max(floor(getArchRThreads()/2), 1), ... )
ArchRProj |
An |
reducedDims |
The name of the |
method |
The method for computing a TSNE embedding to add to the |
name |
The name for the TSNE embedding to store in the given |
perplexity |
An integer describing the number of nearest neighbors to compute an |
maxIterations |
An integer describing the maximum number of iterations when computing a TSNE. This argument is passed to |
learningRate |
An integer controlling how much the weights are adjusted at each iteration. This argument is passed to |
dimsToUse |
A vector containing the dimensions from the |
scaleDims |
A boolean value that indicates whether to z-score the reduced dimensions for each cell. This is useful for minimizing
the contribution of strong biases (dominating early PCs) and lowly abundant populations. However, this may lead to stronger sample-specific
biases since it is over-weighting latent PCs. If set to |
corCutOff |
A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a correlation to sequencing
depth that is greater than the |
verbose |
A boolean value that indicates whether to print TSNE output. |
seed |
A number to be used as the seed for random number generation. It is recommended to keep track of the seed used so that you can reproduce results downstream. |
force |
A boolean value that indicates whether to overwrite the relevant data in the |
threads |
The number of threads to be used for parallel computing. |
... |
Additional parameters for computing the TSNE embedding to pass to |
This function will compute a UMAP embedding and add it to an ArchRProject.
addUMAP( ArchRProj = NULL, reducedDims = "IterativeLSI", name = "UMAP", nNeighbors = 40, minDist = 0.4, metric = "cosine", dimsToUse = NULL, scaleDims = NULL, corCutOff = 0.75, sampleCells = NULL, outlierQuantile = 0.9, saveModel = TRUE, verbose = TRUE, seed = 1, force = FALSE, threads = 1, ... )
ArchRProj |
An |
reducedDims |
The name of the |
name |
The name for the UMAP embedding to store in the given |
nNeighbors |
An integer describing the number of nearest neighbors to compute a UMAP. This argument is passed to |
minDist |
A number that determines how tightly the UMAP is allowed to pack points together. This argument is passed to |
metric |
A string that determines how distance is computed in the |
dimsToUse |
A vector containing the dimensions from the |
scaleDims |
A boolean value that indicates whether to z-score the reduced dimensions for each cell. This is useful for minimizing
the contribution of strong biases (dominating early PCs) and lowly abundant populations. However, this may lead to stronger sample-specific
biases since it is over-weighting latent PCs. If set to |
corCutOff |
A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a correlation to
sequencing depth that is greater than the |
sampleCells |
An integer specifying the number of cells to subsample and perform UMAP Embedding on. The remaining cells that were not subsampled will be re-projected using uwot::umap_transform to the UMAP Embedding. This enables a decrease in run time and memory but can lower the overall quality of the UMAP Embedding. Only recommended for an extremely large number of cells. |
outlierQuantile |
A numeric (0 to 1) describing the distance quantile in the subsampled cells (see |
saveModel |
A boolean value indicating whether or not to save the UMAP model in an RDS file for downstream usage such as projection of data into the UMAP embedding. |
verbose |
A boolean value that indicates whether to print UMAP output. |
seed |
A number to be used as the seed for random number generation. It is recommended to keep track of the seed used so that you can reproduce results downstream. |
force |
A boolean value that indicates whether to overwrite the relevant data in the |
threads |
The number of threads to be used for parallel computing. The default is 1 because setting it too high can cause C stack usage errors. |
... |
Additional parameters to pass to |
This function will open an interactive shiny session in the style of a browser track. It allows for normalization of the signal, which enables direct comparison across samples. Note that the genes displayed in this browser are derived from your geneAnnotation (i.e. the BSgenome object you used), so they may not match other online genome browsers that use different gene annotations.
ArchRBrowser( ArchRProj = NULL, features = getPeakSet(ArchRProj), loops = getCoAccessibility(ArchRProj), minCells = 25, baseSize = 10, borderWidth = 0.5, tickWidth = 0.5, facetbaseSize = 12, geneAnnotation = getGeneAnnotation(ArchRProj), browserTheme = "cosmo", threads = getArchRThreads(), verbose = TRUE, logFile = createLogFile("ArchRBrowser") )
ArchRProj |
An |
features |
A |
loops |
A |
minCells |
The minimum number of cells contained within a cell group to allow for this cell group to be plotted. This argument can be used to exclude pseudo-bulk replicates generated from low numbers of cells. |
baseSize |
The numeric font size to be used in the plot. This applies to all plot labels. |
borderWidth |
The numeric line width to be used for plot borders. |
tickWidth |
The numeric line width to be used for axis tick marks. |
facetbaseSize |
The numeric font size to be used in the facets (gray boxes used to provide track labels) of the plot. |
geneAnnotation |
The |
browserTheme |
A |
threads |
The number of threads to use for parallel execution. |
verbose |
A boolean value that determines whether standard output should be printed. |
logFile |
The path to a file to be used for logging ArchR output. |
A collection of some original and some borrowed color palettes to provide appealing color aesthetics for plots in ArchR.
ArchRPalettes
An object of class list of length 30.
This function will create an ArchRProject from the provided ArrowFiles.
ArchRProject( ArrowFiles = NULL, outputDirectory = "ArchROutput", copyArrows = TRUE, geneAnnotation = getGeneAnnotation(), genomeAnnotation = getGenomeAnnotation(), showLogo = TRUE, threads = getArchRThreads() )
ArrowFiles |
A character vector containing the relative paths to the ArrowFiles to be used. |
outputDirectory |
A name for the relative path of the outputDirectory for ArchR results. Relative to the current working directory. |
copyArrows |
A boolean value indicating whether ArrowFiles should be copied into |
geneAnnotation |
The |
genomeAnnotation |
The |
showLogo |
A boolean value indicating whether to show the ASCII ArchR logo after successful creation of an |
threads |
The number of threads to use for parallel execution. |
This function creates a confusion matrix based on two value vectors.
confusionMatrix(i = NULL, j = NULL)
i |
A character/numeric value vector to see concordance with j. |
j |
A character/numeric value vector to see concordance with i. |
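The idea of a confusion matrix between two label vectors can be sketched with base R's `table()`. This is a stand-in for illustration, not ArchR's implementation, and the object returned by `confusionMatrix()` may be of a different class. The labels below are made up.

```r
# Hypothetical labels for six cells from two different groupings.
i <- c("C1", "C1", "C2", "C2", "C2", "C3")
j <- c("A",  "A",  "A",  "B",  "B",  "B")

# Cross-tabulate: rows are the unique values of i, columns those of j,
# and each entry counts the cells that carry that pair of labels.
cm <- table(i, j)
print(cm)

# For each value of i, report the value of j it most often co-occurs with.
bestMatch <- colnames(cm)[apply(cm, 1, which.max)]
names(bestMatch) <- rownames(cm)
print(bestMatch)
```

A table like this is a quick way to check how well two clusterings, or a clustering and a sample annotation, agree.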
This function will correlate 2 matrices within an ArchRProject by name matching.
correlateMatrices( ArchRProj = NULL, useMatrix1 = NULL, useMatrix2 = NULL, useSeqnames1 = NULL, useSeqnames2 = NULL, removeFromName1 = c("underscore", "dash"), removeFromName2 = c("underscore", "dash"), log2Norm1 = TRUE, log2Norm2 = TRUE, reducedDims = "IterativeLSI", dimsToUse = 1:30, scaleDims = NULL, corCutOff = 0.75, k = 100, knnIteration = 500, overlapCutoff = 0.8, seed = 1, threads = getArchRThreads(), verbose = TRUE, logFile = createLogFile("correlateMatrices") )
ArchRProj |
An |
useMatrix1 |
A character describing the first matrix to use. See |
useMatrix2 |
A character describing the second matrix to use. See |
useSeqnames1 |
A character vector describing which seqnames to use in matrix 1. |
useSeqnames2 |
A character vector describing which seqnames to use in matrix 2. |
removeFromName1 |
A character vector describing how to filter names in matrix 1. Options include "underscore", "dash", "numeric" and "dot". The string portion prior to these will be kept. |
removeFromName2 |
A character vector describing how to filter names in matrix 2. Options include "underscore", "dash", "numeric" and "dot". The string portion prior to these will be kept. |
log2Norm1 |
A boolean describing whether to log2 normalize matrix 1. |
log2Norm2 |
A boolean describing whether to log2 normalize matrix 2. |
reducedDims |
The name of the |
dimsToUse |
A vector containing the dimensions from the |
scaleDims |
A boolean value that indicates whether to z-score the reduced dimensions for each cell. This is useful for minimizing
the contribution of strong biases (dominating early PCs) and lowly abundant populations. However, this may lead to stronger sample-specific
biases since it is over-weighting latent PCs. If set to |
corCutOff |
A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a correlation to
sequencing depth that is greater than the |
k |
The number of k-nearest neighbors to use for creating single-cell groups for correlation analyses. |
knnIteration |
The number of k-nearest neighbor groupings to test for passing the supplied |
overlapCutoff |
The maximum allowable overlap between the current group and all previous groups to permit the current group to be added to the group list during k-nearest neighbor calculations. |
seed |
A number to be used as the seed for random number generation required in knn determination. It is recommended to keep track of the seed used so that you can reproduce results downstream. |
threads |
The number of threads to be used for parallel computing. |
verbose |
A boolean value that determines whether standard output should be printed. |
logFile |
The path to a file to be used for logging ArchR output. |
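The interplay of `k`, `knnIteration`, and `overlapCutoff` can be sketched in base R as a greedy selection of neighbor groups. Everything below is an assumption made for illustration: the random matrix stands in for the reduced dimensions, distances are computed by brute force, and the acceptance rule is one plausible reading of the argument descriptions rather than ArchR's exact procedure.

```r
set.seed(1)

# Hypothetical reduced-dimension matrix: 200 cells x 5 dimensions.
nCells <- 200
rd <- matrix(rnorm(nCells * 5), nrow = nCells)

k <- 20               # cells per group
knnIteration <- 50    # number of candidate groups to test
overlapCutoff <- 0.8  # maximum allowed fraction of shared cells

# Pairwise Euclidean distances (brute force; fine for a toy example).
dmat <- as.matrix(dist(rd))

# Candidate groups: a randomly chosen cell plus its nearest neighbors.
seeds <- sample(nCells, knnIteration)
candidates <- lapply(seeds, function(s) order(dmat[s, ])[seq_len(k)])

# Greedily keep a candidate only if its overlap with every previously
# kept group is at or below the cutoff.
kept <- list()
for (grp in candidates) {
  overlaps <- vapply(kept, function(g) length(intersect(g, grp)) / k, numeric(1))
  if (length(overlaps) == 0 || max(overlaps) <= overlapCutoff) {
    kept[[length(kept) + 1]] <- grp
  }
}

length(kept)
```

Lowering `overlapCutoff` yields fewer but more independent groups, while raising `knnIteration` gives the procedure more candidates to choose from.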
This function will correlate 2 trajectory matrices from getTrajectory.
correlateTrajectories( seTrajectory1 = NULL, seTrajectory2 = NULL, corCutOff = 0.5, varCutOff1 = 0.8, varCutOff2 = 0.8, removeFromName1 = c("underscore", "dash"), removeFromName2 = c("underscore", "dash"), useRanges = FALSE, fix1 = "center", fix2 = "start", maxDist = 250000, log2Norm1 = TRUE, log2Norm2 = TRUE, force = FALSE, logFile = createLogFile("correlateTrajectories") )
seTrajectory1 |
A |
seTrajectory2 |
A |
corCutOff |
A numeric describing the cutoff for determining correlated features. |
varCutOff1 |
The "Variance Quantile Cutoff" to be used for identifying the top variable features across |
varCutOff2 |
The "Variance Quantile Cutoff" to be used for identifying the top variable features across |
removeFromName1 |
A character vector describing how to filter names in matrix 1. Options include "underscore", "dash", "numeric" and "dot". The string portion prior to these will be kept. |
removeFromName2 |
A character vector describing how to filter names in matrix 2. Options include "underscore", "dash", "numeric" and "dot". The string portion prior to these will be kept. |
useRanges |
A boolean describing whether to use range overlap matching for correlation analysis. |
fix1 |
A character describing where to resize the coordinates of |
fix2 |
A character describing where to resize the coordinates of |
maxDist |
An integer specifying the maximum distance between the coordinates of |
log2Norm1 |
A boolean describing whether to log2 normalize |
log2Norm2 |
A boolean describing whether to log2 normalize |
force |
A boolean value that determines whether analysis should continue if resizing coordinates in |
logFile |
The path to a file to be used for logging ArchR output. |
This function will create ArrowFiles from input files. These ArrowFiles are the main constituent for downstream analysis in ArchR.
createArrowFiles( inputFiles = NULL, sampleNames = names(inputFiles), outputNames = sampleNames, validBarcodes = NULL, geneAnnotation = getGeneAnnotation(), genomeAnnotation = getGenomeAnnotation(), minTSS = 4, minFrags = 1000, maxFrags = 1e+05, minFragSize = 10, maxFragSize = 2000, QCDir = "QualityControl", nucLength = 147, promoterRegion = c(2000, 100), TSSParams = list(), excludeChr = c("chrM", "chrY"), nChunk = 5, bcTag = "qname", gsubExpression = NULL, bamFlag = NULL, offsetPlus = 4, offsetMinus = -5, addTileMat = TRUE, TileMatParams = list(), addGeneScoreMat = TRUE, GeneScoreMatParams = list(), force = FALSE, threads = getArchRThreads(), parallelParam = NULL, subThreading = TRUE, verbose = TRUE, cleanTmp = TRUE, logFile = createLogFile("createArrows"), filterFrags = NULL, filterTSS = NULL )
inputFiles |
A character vector containing the paths to the input files to use to generate the ArrowFiles. These files can be in one of the following formats: (i) scATAC tabix files, (ii) fragment files, or (iii) bam files. |
sampleNames |
A character vector containing the names to assign to the samples that correspond to the |
outputNames |
The prefix to use for output files. Each input file should receive a unique output file name. This list should be in the same order as "inputFiles". For example, if the prefix is "PBMC", the output file will be named "PBMC.arrow". |
validBarcodes |
A list of valid barcode strings to be used for filtering cells read from each input file
(see |
geneAnnotation |
The geneAnnotation (see |
genomeAnnotation |
The genomeAnnotation (see |
minTSS |
The minimum numeric transcription start site (TSS) enrichment score required for a cell to pass filtering for use
in downstream analyses. Cells with a TSS enrichment score greater than or equal to |
minFrags |
The minimum number of mapped ATAC-seq fragments required per cell to pass filtering for use in downstream analyses.
Cells containing greater than or equal to |
maxFrags |
The maximum number of mapped ATAC-seq fragments allowed per cell to pass filtering for use in downstream analyses.
Cells containing greater than or equal to |
minFragSize |
The minimum fragment size to be included in the ArrowFile. Fragments smaller than this number are discarded. Must be less than maxFragSize. |
maxFragSize |
The maximum fragment size to be included in the ArrowFile. Fragments larger than this number are discarded. Must be greater than minFragSize. |
QCDir |
The relative path to the output directory for QC-level information and plots for each sample/ArrowFile. |
nucLength |
The length in basepairs that wraps around a nucleosome. This number is used for identifying fragments as sub-nucleosome-spanning, mono-nucleosome-spanning, or multi-nucleosome-spanning. |
promoterRegion |
An integer vector describing the number of basepairs upstream and downstream c(upstream, downstream) of the TSS to include as the promoter region for downstream calculation of things like the fraction of reads in promoters (FIP). |
TSSParams |
A list of parameters for computing TSS Enrichment scores. This includes the |
excludeChr |
A character vector containing the names of chromosomes to be excluded from downstream analyses. In most human/mouse analyses, this includes the mitochondrial DNA (chrM) and the male sex chromosome (chrY). However, this does not exclude the corresponding fragments from being stored in the ArrowFile. |
nChunk |
The number of chunks to divide each chromosome into to allow for low-memory parallelized reading of the |
bcTag |
The name of the field in the input bam file containing the barcode tag information. See |
gsubExpression |
A regular expression used to clean up the barcode tag string read in from a bam file. For example, if the barcode is appended to the readname or qname field like for the mouse atlas data from Cusanovich* and Hill* et al. (2018), the gsubExpression would be ":.*". This would remove the colon and everything after it, keeping the string before the colon as the barcode. |
bamFlag |
A vector of bam flags to be used for reading in fragments from input bam files. Should be in the format of a
|
offsetPlus |
The numeric offset to apply to a "+" stranded Tn5 insertion to account for the precise Tn5 binding site. This parameter only applies to bam file input and it is assumed that fragment files have already been offset which is the standard from 10x output. See Buenrostro et al. Nature Methods 2013. |
offsetMinus |
The numeric offset to apply to a "-" stranded Tn5 insertion to account for the precise Tn5 binding site. This parameter only applies to bam file input and it is assumed that fragment files have already been offset which is the standard from 10x output. See Buenrostro et al. Nature Methods 2013. |
addTileMat |
A boolean value indicating whether to add a "Tile Matrix" to each ArrowFile. A Tile Matrix is a counts matrix that, instead of using peaks, uses a fixed-width sliding window of bins across the whole genome. This matrix can be used in many downstream ArchR operations. |
TileMatParams |
A list of parameters to pass to the |
addGeneScoreMat |
A boolean value indicating whether to add a Gene-Score Matrix to each ArrowFile. A Gene-Score Matrix uses ATAC-seq signal proximal to the TSS to estimate gene activity. |
GeneScoreMatParams |
A list of parameters to pass to the |
force |
A boolean value indicating whether to force ArrowFiles to be overwritten if they already exist. |
threads |
The number of threads to be used for parallel computing. |
parallelParam |
A list of parameters to be passed for biocparallel/batchtools parallel computing. |
subThreading |
A boolean indicating whether to use threads within each multi-threaded subprocess when the number of threads is greater than the number of input samples. |
verbose |
A boolean value that determines whether standard output should be printed. |
logFile |
The path to a file to be used for logging ArchR output. |
cleanTmp |
A boolean value that determines whether to clean temp folder of all intermediate ".arrow" files. |
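The per-cell thresholds (`minTSS`, `minFrags`, `maxFrags`) and the use of `nucLength` to classify fragments can be illustrated with a short base R sketch. The QC values and fragment lengths are made up, and the exact boundary handling (an inclusive upper bound for `maxFrags`, half-open length intervals) is an assumption rather than a statement of how ArchR implements it.

```r
# Hypothetical per-cell QC metrics.
qc <- data.frame(
  barcode = c("AAAC", "AAAG", "AAAT", "AACA"),
  nFrags  = c(500, 3000, 150000, 8000),
  TSS     = c(6.1, 3.2, 9.0, 7.5),
  stringsAsFactors = FALSE
)

minTSS   <- 4
minFrags <- 1000
maxFrags <- 1e5

# Keep cells that meet both minimums and do not exceed the maximum
# (treating maxFrags as an inclusive upper bound is an assumption).
pass <- qc$TSS >= minTSS & qc$nFrags >= minFrags & qc$nFrags <= maxFrags
passing <- qc$barcode[pass]
print(passing)

# Classify hypothetical fragment lengths relative to nucLength
# (the interval boundaries used here are an assumption).
nucLength <- 147
fragLen <- c(60, 146, 147, 200, 294, 400)
fragClass <- cut(
  fragLen,
  breaks = c(0, nucLength, 2 * nucLength, Inf),
  labels = c("sub-nucleosome", "mono-nucleosome", "multi-nucleosome"),
  right = FALSE
)
print(table(fragClass))
```

In this toy data only one barcode passes: the others fail on fragment count, TSS enrichment, or the upper fragment limit respectively.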
This function will create a gene annotation object that can be used for creating ArrowFiles or an ArchRProject, etc.
createGeneAnnotation( genome = NULL, TxDb = NULL, OrgDb = NULL, genes = NULL, exons = NULL, TSS = NULL, annoStyle = NULL )
genome |
A string that specifies the genome (i.e. "hg38", "hg19", "mm10", "mm9"). If |
TxDb |
A |
OrgDb |
An |
genes |
A |
exons |
A |
TSS |
A |
annoStyle |
The annotation style used to map between gene names and various gene identifiers, e.g. "ENTREZID", "ENSEMBL". |
This function will create a genome annotation object that can be used for creating ArrowFiles or an ArchRProject, etc.
createGenomeAnnotation( genome = NULL, chromSizes = NULL, blacklist = NULL, filter = TRUE, filterChr = c("chrM") )
genome |
Either (i) a string that is a valid |
chromSizes |
A |
blacklist |
A |
filter |
A boolean value indicating whether non-standard chromosome scaffolds should be excluded.
These "non-standard" chromosomes are defined by |
filterChr |
A character vector indicating the seqlevels that should be removed if manual removal is desired for certain seqlevels.
If no manual removal is desired, |
This function will create a log file for ArchR functions. If ArchRLogging is not TRUE this function will return NULL.
createLogFile(name = NULL, logDir = "ArchRLogs", useLogs = getArchRLogging())
name |
A character string to add a more descriptive name in log file. |
logDir |
The path to a directory where log files should be written. |
This function gets a PeakMatrix from an ArchRProject and writes it to a set of files for STREAM (https://github.com/pinellolab/STREAM).
exportPeakMatrixForSTREAM( ArchRProj = NULL, useSeqnames = NULL, verbose = TRUE, binarize = FALSE, threads = getArchRThreads(), logFile = createLogFile("exportMatrixForSTREAM") )
ArchRProj |
An |
useSeqnames |
A character vector of chromosome names to be used to subset the data matrix being obtained. |
verbose |
A boolean value indicating whether to use verbose output during execution of this function. Can be set to FALSE for a cleaner output. |
binarize |
A boolean value indicating whether the matrix should be binarized before return. This is often desired when working with insertion counts. |
logFile |
The path to a file to be used for logging ArchR output. |
This function extends each region in a Genomic Ranges object by a designated upstream and downstream extension in a strand-aware fashion.
extendGR(gr = NULL, upstream = NULL, downstream = NULL)
gr |
A |
upstream |
The number of basepairs upstream (5') to extend each region in |
downstream |
The number of basepairs downstream (3') to extend each region in |
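What "strand-aware" extension means can be shown with a plain data frame standing in for a GRanges object. The `extendRegions` helper and the coordinates are hypothetical and only illustrate the coordinate arithmetic; unstranded regions are treated like "+" in this sketch, which is an assumption.

```r
# Hypothetical regions as a plain data frame (a stand-in for a GRanges object).
regions <- data.frame(
  chr    = c("chr1", "chr1"),
  start  = c(1000, 5000),
  end    = c(1100, 5100),
  strand = c("+", "-"),
  stringsAsFactors = FALSE
)

# Illustrative helper: "upstream" means toward the 5' end, which is the lower
# coordinate on the "+" strand and the higher coordinate on the "-" strand.
extendRegions <- function(df, upstream, downstream) {
  minus <- df$strand == "-"
  newStart <- df$start - ifelse(minus, downstream, upstream)
  newEnd   <- df$end   + ifelse(minus, upstream, downstream)
  df$start <- newStart
  df$end   <- newEnd
  df
}

extended <- extendRegions(regions, upstream = 200, downstream = 50)
print(extended)
```

The "+" region grows by 200 bp at its start and 50 bp at its end, while the "-" region grows by 50 bp at its start and 200 bp at its end.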
This function allows for removal of manually designated or more broadly undesirable seqlevels from a Genomic Ranges object or similar object.
filterChrGR( gr = NULL, remove = NULL, underscore = TRUE, standard = TRUE, pruningMode = "coarse" )
gr |
A |
remove |
A character vector indicating the seqlevels that should be removed if manual removal is desired for certain seqlevels.
If no manual removal is desired, |
underscore |
A boolean value indicating whether to remove all seqlevels whose names contain an underscore (for example "chr11_KI270721v1_random"). |
standard |
A boolean value indicating whether only standard chromosomes should be kept. Standard chromosomes are defined by
|
pruningMode |
The name of the pruning method to use (from |
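The effect of the `remove` and `underscore` arguments can be sketched on a plain character vector of sequence names. This is only an illustration of the selection logic with made-up names; the real function operates on the seqlevels of a Genomic Ranges object and also supports the `standard` filter, which is not reproduced here.

```r
# Hypothetical sequence names, as they might appear in a genome annotation.
seqs <- c("chr1", "chr2", "chrX", "chrM",
          "chr11_KI270721v1_random", "chrUn_GL000195v1")

remove <- c("chrM")   # manually designated names to drop
underscore <- TRUE    # also drop names that contain an underscore

keep <- !(seqs %in% remove)
if (underscore) {
  keep <- keep & !grepl("_", seqs, fixed = TRUE)
}

keptSeqs <- seqs[keep]
print(keptSeqs)
```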
This function will filter doublets from an ArchRProject after addDoubletScores() has been run.
filterDoublets( ArchRProj = NULL, cutEnrich = 1, cutScore = -Inf, filterRatio = 1 )
ArchRProj |
An |
cutEnrich |
The minimum numeric cutoff for |
cutScore |
The minimum numeric cutoff for |
filterRatio |
The maximum ratio of predicted doublets to filter based on the number of pass-filter cells.
For example, if there are 5000 cells, the maximum would be |
This function attempts to find the path to the MACS2 executable by searching the path and Python's pip.
findMacs2()
This function will get the default requirement of chromosomes to have a "chr" prefix.
getArchRChrPrefix()
This function will get the ArchR debugging setting. When debugging is enabled, an RDS file is saved if an error is encountered.
getArchRDebugging()
This function will retrieve the genome that is currently in use by ArchR. Alternatively, this function can return either the geneAnnotation or the genomeAnnotation associated with the globally defined genome if desired.
getArchRGenome(geneAnnotation = FALSE, genomeAnnotation = FALSE)
geneAnnotation |
A boolean value indicating whether the |
genomeAnnotation |
A boolean value indicating whether the |
This function will get the ArchR logging setting.
getArchRLogging()
This function will get the number of threads to be used for parallel execution across all ArchR functions.
getArchRThreads()
This function will get ArchR logging verbosity.
getArchRVerbose()
This function gets the names of all ArrowFiles associated with a given ArchRProject.
getArrowFiles(ArchRProj = NULL)
ArchRProj |
An |
This function gets the available matrices from the ArrowFiles in a given ArchRProject object.
getAvailableMatrices(ArchRProj = NULL)
ArchRProj |
An |
This function will get/compute background peaks controlling for total accessibility and GC-content from an ArchRProject.
getBgdPeaks( ArchRProj = NULL, nIterations = 50, w = 0.1, binSize = 50, seed = 1, method = "chromVAR", force = FALSE )
ArchRProj |
An |
nIterations |
The number of background peaks to sample. See |
w |
The parameter controlling similarity measure of background peaks. See |
binSize |
The precision with which the similarity is computed. See |
seed |
A number to be used as the seed for random number generation. It is recommended to keep track of the seed used so that you can reproduce results downstream. |
method |
A string indicating whether to use chromVAR or ArchR for background peak identification. |
force |
A boolean value indicating whether to force the file indicated by |
This function gets the blacklist (the regions to be excluded from analysis) as a GRanges object from the genomeAnnotation of a given ArchRProject.
getBlacklist(ArchRProj = NULL)
ArchRProj |
An |
This function gets the cellColData from a given ArchRProject.
getCellColData(ArchRProj = NULL, select = NULL, drop = FALSE)
ArchRProj |
An |
select |
A character vector of column names to select from |
drop |
A boolean value that indicates whether to drop the |
This function gets the cellNames from a given ArchRProject object.
getCellNames(ArchRProj = NULL)
ArchRProj |
An |
This function gets the chromosome lengths as a vector from the genomeAnnotation of a given ArchRProject.
getChromLengths(ArchRProj = NULL)
ArchRProj |
An |
This function gets the chromosome lengths as a GRanges object from the genomeAnnotation of a given ArchRProject.
getChromSizes(ArchRProj = NULL)
ArchRProj |
An |
This function obtains co-accessibility data from an ArchRProject.
getCoAccessibility( ArchRProj = NULL, corCutOff = 0.5, resolution = 1, returnLoops = TRUE )
ArchRProj |
An |
corCutOff |
A numeric describing the minimum numeric peak-to-peak correlation to return. |
resolution |
A numeric describing the bp resolution to use when returning loops. This helps with overplotting of correlated regions.
This only takes effect if |
returnLoops |
A boolean indicating whether to return the co-accessibility signal as a |
This function gets an embedding (e.g. UMAP) from a given ArchRProject.
getEmbedding(ArchRProj = NULL, embedding = "UMAP", returnDF = TRUE)
ArchRProj |
An |
embedding |
The name of the |
returnDF |
A boolean value indicating whether to return the embedding object as a |
This function gets the exon coordinates as a GRanges object from the geneAnnotation of a given ArchRProject.
getExons(ArchRProj = NULL, symbols = NULL)
ArchRProj |
An |
symbols |
A character vector containing the gene symbols for the genes where exons should be extracted. |
This function will identify available features from a given data matrix (e.g. "GeneScoreMatrix" or "TileMatrix") and return them for downstream plotting utilities.
getFeatures( ArchRProj = NULL, useMatrix = "GeneScoreMatrix", select = NULL, ignoreCase = TRUE )
ArchRProj |
An |
useMatrix |
The name of the data matrix as stored in the ArrowFiles of the |
select |
A string specifying a specific feature name (or rowname) to be found with |
ignoreCase |
A boolean value indicating whether to ignore the case (upper-case / lower-case) when searching via grep for the string passed to |
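The `select` and `ignoreCase` descriptions refer to a grep-based search over feature names. A minimal base-R sketch of that kind of lookup is shown below; the `features` vector and the helper `find_features` are hypothetical stand-ins, not the function's actual code.

```r
# Hypothetical feature names, standing in for the row names of a data matrix.
features <- c("CD34", "CD3D", "cd3e", "GATA1", "GATA2", "PAX5")

# Hypothetical helper, not part of ArchR.
find_features <- function(features, select = NULL, ignoreCase = TRUE) {
  # With no search string, return every available feature.
  if (is.null(select)) return(features)
  # Otherwise return the names matching the pattern.
  grep(select, features, ignore.case = ignoreCase, value = TRUE)
}

hits_any_case <- find_features(features, select = "cd3")
hits_exact_case <- find_features(features, select = "cd3", ignoreCase = FALSE)
print(hits_any_case)
print(hits_exact_case)
```

Because the search string is treated as a pattern, a short string such as "cd3" can match several features at once.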
This function will get footprints for all samples in a given ArchRProject and return a summarized experiment object that can be used for downstream analyses.
getFootprints( ArchRProj = NULL, positions = NULL, plotName = "Plot-Footprints", groupBy = "Clusters", useGroups = NULL, flank = 250, minCells = 25, nTop = NULL, threads = getArchRThreads(), verbose = TRUE, logFile = createLogFile("getFootprints") )
ArchRProj |
An |
positions |
A |
plotName |
The prefix to add to the file name for the output PDF file containing the footprint plots. |
groupBy |
The name of the column in |
useGroups |
A character vector that is used to select a subset of groups by name from the designated |
flank |
The number of basepairs from the position center (+/-) to consider as the flank. |
minCells |
The minimum number of cells required in a given cell group to permit footprint generation. |
nTop |
The number of genomic regions to consider. Only the top |
threads |
The number of threads to be used for parallel computing. |
verbose |
A boolean value that determines whether standard output includes verbose sections. |
logFile |
The path to a file to be used for logging ArchR output. |
This function retrieves the fragments from a given ArrowFile as a GRanges object.
getFragmentsFromArrow( ArrowFile = NULL, chr = NULL, cellNames = NULL, verbose = TRUE, logFile = createLogFile("getFragmentsFromArrow") )
ArrowFile |
The path to the ArrowFile from which fragments should be obtained. |
chr |
A name of a chromosome to be used to subset the fragments |
cellNames |
A character vector indicating the cell names of a subset of cells from which fragments should be extracted.
This allows for extraction of fragments from only a subset of selected cells. By default, this function will extract all cells
from the provided ArrowFile using |
verbose |
A boolean value indicating whether to use verbose output during execution of this function. Can be set to |
logFile |
The path to a file to be used for logging ArchR output. |
This function retrieves the fragments from a given ArchRProject as a GRangesList object.
getFragmentsFromProject( ArchRProj = NULL, subsetBy = NULL, cellNames = NULL, verbose = FALSE, logFile = createLogFile("getFragmentsFromProject") )
subsetBy |
A Genomic Ranges object to subset fragments by. |
cellNames |
A character vector indicating the cell names of a subset of cells from which fragments should be extracted.
This allows for extraction of fragments from only a subset of selected cells. By default, this function will extract all cells
from the provided ArrowFile using |
verbose |
A boolean value indicating whether to use verbose output during execution of this function. Can be set to |
logFile |
The path to a file to be used for logging ArchR output. |
ArchRProj |
An |
This function gets the geneAnnotation from a given ArchRProject.
getGeneAnnotation(ArchRProj = NULL)
ArchRProj |
An |
This function gets the gene start and end coordinates as a GRanges object from the geneAnnotation of a given ArchRProject.
getGenes(ArchRProj = NULL, symbols = NULL)
ArchRProj |
An |
symbols |
A character vector containing the gene symbols to subset from the |
This function gets the name of the genome from the genomeAnnotation used by a given ArchRProject.
getGenome(ArchRProj = NULL)
ArchRProj |
An |
This function gets the genomeAnnotation from a given ArchRProject.
getGenomeAnnotation(ArchRProj = NULL)
ArchRProj |
An |
This function will group, summarize and export a bigwig for each group in an ArchRProject.
getGroupBW( ArchRProj = NULL, groupBy = "Sample", normMethod = "ReadsInTSS", tileSize = 100, maxCells = 1000, ceiling = 4, verbose = TRUE, threads = getArchRThreads(), logFile = createLogFile("getGroupBW") )
ArchRProj |
An |
groupBy |
A string that indicates how cells should be grouped. This string corresponds to one of the standard or
user-supplied |
normMethod |
The name of the column in |
tileSize |
The numeric width of the tile/bin in basepairs for plotting ATAC-seq signal tracks. All insertions in a single bin will be summed. |
maxCells |
Maximum number of cells used for each bigwig. |
ceiling |
Maximum contribution of accessibility per cell in each tile. |
verbose |
A boolean specifying to print messages during computation. |
threads |
An integer specifying the number of threads to use for parallel computing. |
logFile |
The path to a file to be used for logging ArchR output. |
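To make the `tileSize` and `ceiling` descriptions concrete, the sketch below bins hypothetical insertion positions into fixed-width tiles, caps each cell's contribution per tile, and sums across cells. The `insertions` data frame and the helper `tile_signal` are assumptions for illustration; normalization via `normMethod` and bigwig export are not modeled here.

```r
# Hypothetical insertion positions (bp) and the cell each one came from.
insertions <- data.frame(
  cell = c("c1", "c1", "c1", "c1", "c1", "c1", "c2", "c2", "c3"),
  pos = c(5, 20, 40, 60, 80, 150, 10, 120, 250)
)

# Hypothetical helper, not part of ArchR.
tile_signal <- function(insertions, tileSize = 100, ceiling = 4) {
  # Assign each insertion to a tile (0-based tile index).
  tile <- floor(insertions$pos / tileSize)
  # Count insertions per cell and tile.
  counts <- table(cell = insertions$cell, tile = tile)
  # Cap the contribution of any single cell to a tile.
  counts[counts > ceiling] <- ceiling
  # Sum the capped counts across cells to get one value per tile.
  colSums(counts)
}

signal <- tile_signal(insertions, tileSize = 100, ceiling = 4)
print(signal)
```

In this toy input, cell `c1` has five insertions in the first tile but contributes only four after the cap, so a single cell with unusually many insertions cannot dominate a bin.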
This function will group, summarize and export a summarized experiment for an assay in an ArchRProject.
getGroupSE( ArchRProj = NULL, useMatrix = NULL, groupBy = "Sample", divideN = TRUE, scaleTo = NULL, threads = getArchRThreads(), verbose = TRUE, logFile = createLogFile("getGroupSE") )
ArchRProj |
An |
useMatrix |
The name of the matrix in the ArrowFiles. See getAvailableMatrices for the available options. |
groupBy |
The name of the column in |
divideN |
A boolean describing whether to divide by the number of cells. |
scaleTo |
Depth normalize to this value if not NULL. |
threads |
An integer specifying the number of threads to use for parallel computing. |
verbose |
A boolean specifying to print messages during computation. |
logFile |
The path to a file to be used for logging ArchR output. |
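The effect of `divideN` and `scaleTo` can be sketched in base R on a small feature-by-cell matrix. The matrix `mat`, the `groups` vector and the helper `group_matrix` are hypothetical, and the order of operations shown (sum, then divide, then depth-normalize) is an assumption rather than a statement about ArchR's internals.

```r
# Hypothetical feature-by-cell count matrix and a group label for each cell.
mat <- matrix(
  c(2, 0, 4, 6,
    1, 1, 0, 2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("f1", "f2"), c("c1", "c2", "c3", "c4"))
)
groups <- c("A", "A", "B", "B")

# Hypothetical helper, not part of ArchR.
group_matrix <- function(mat, groups, divideN = TRUE, scaleTo = NULL) {
  # Sum the columns (cells) that belong to each group.
  out <- t(rowsum(t(mat), group = groups))
  # Optionally convert sums to per-cell averages.
  if (divideN) {
    n <- as.vector(table(groups)[colnames(out)])
    out <- sweep(out, 2, n, "/")
  }
  # Optionally depth-normalize each group column to a common total.
  if (!is.null(scaleTo)) {
    out <- sweep(out, 2, colSums(out), "/") * scaleTo
  }
  out
}

avg <- group_matrix(mat, groups)
scaled <- group_matrix(mat, groups, divideN = FALSE, scaleTo = 100)
print(avg)
print(scaled)
```

Dividing by the number of cells makes groups of different sizes comparable, while `scaleTo` makes groups of different total signal comparable.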
This function summarizes a numeric cellColData entry across groupings in an ArchRProject.
getGroupSummary( ArchRProj = NULL, groupBy = "Sample", select = "TSSEnrichment", summary = "median", removeNA = TRUE )
ArchRProj |
An |
groupBy |
The name of the column in |
select |
A character vector containing the column names to select from |
summary |
A character vector describing which method to use for summarizing across groups. Options include "median", "mean", or "sum". |
removeNA |
A boolean indicating whether to remove NA values before applying the summary method. |
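The kind of per-group summary described here can be sketched with `tapply` in base R. The `cellData` data frame stands in for two cellColData columns and the helper `group_summary` is hypothetical; the sketch only mirrors the documented options ("median", "mean", "sum") and the NA handling.

```r
# Hypothetical per-cell metadata, standing in for two cellColData columns.
cellData <- data.frame(
  Sample = c("S1", "S1", "S1", "S2", "S2"),
  TSSEnrichment = c(8, 10, NA, 4, 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of ArchR.
group_summary <- function(values, groups, summary = "median", removeNA = TRUE) {
  # Map the method name onto the matching base-R function.
  fun <- switch(summary,
    median = stats::median,
    mean = mean,
    sum = sum,
    stop("summary must be 'median', 'mean' or 'sum'")
  )
  # Apply the summary within each group.
  tapply(values, groups, fun, na.rm = removeNA)
}

med <- group_summary(cellData$TSSEnrichment, cellData$Sample)
tot <- group_summary(cellData$TSSEnrichment, cellData$Sample, summary = "sum")
print(med)
print(tot)
```

With `removeNA = FALSE`, any group containing an NA would itself summarize to NA, which is why removal is a sensible default for per-cell quality metrics.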
This function gets imputation weights from an ArchRProject to impute numeric values.
getImputeWeights(ArchRProj = NULL)
ArchRProj |
An |
This function will look for fragment files and bam files in the input paths and return the full path and sample names.
getInputFiles(paths = NULL)
paths |
A character vector of paths to search for usable input files. |
This function will identify features that are definitional of each provided cell grouping where possible.
getMarkerFeatures( ArchRProj = NULL, groupBy = "Clusters", useGroups = NULL, bgdGroups = NULL, useMatrix = "GeneScoreMatrix", bias = c("TSSEnrichment", "log10(nFrags)"), normBy = NULL, testMethod = "wilcoxon", maxCells = 500, scaleTo = 10^4, threads = getArchRThreads(), k = 100, bufferRatio = 0.8, binarize = FALSE, useSeqnames = NULL, verbose = TRUE, logFile = createLogFile("getMarkerFeatures") )
ArchRProj |
An |
groupBy |
The name of the column in |
useGroups |
A character vector that is used to select a subset of groups by name from the designated |
bgdGroups |
A character vector that is used to select a subset of groups by name from the designated |
useMatrix |
The name of the matrix to be used for performing differential analyses. Options include "GeneScoreMatrix", "PeakMatrix", etc. |
bias |
A character vector indicating the potential bias variables (i.e. c("TSSEnrichment", "log10(nFrags)")) to account
for in selecting a matched null group for marker feature identification. These should be column names from |
normBy |
The name of a numeric column in |
testMethod |
The name of the pairwise test method to use in comparing cell groupings to the null cell grouping during marker feature identification. Valid options include "wilcoxon", "ttest", and "binomial". |
maxCells |
The maximum number of cells to consider from a single-cell group when performing marker feature identification. |
scaleTo |
Each column in the matrix designated by |
threads |
The number of threads to be used for parallel computing. |
k |
The number of nearby cells to use for selecting a biased-matched background while accounting for |
bufferRatio |
When generating optimal biased-matched background groups of cells to determine significance, it can be difficult
to find sufficient numbers of well-matched cells to create a background group made up of an equal number of cells. The |
binarize |
A boolean value indicating whether to binarize the matrix prior to differential testing. This is useful when
|
useSeqnames |
A character vector that indicates which |
verbose |
A boolean value that determines whether standard output is printed. |
logFile |
The path to a file to be used for logging ArchR output. |
This function will identify Markers and return a List of Features or a GRangesList for each group of significant marker features.
getMarkers( seMarker = NULL, cutOff = "FDR <= 0.1 & Log2FC >= 0.5", n = NULL, returnGR = FALSE )
seMarker |
A |
cutOff |
A valid-syntax logical statement that defines which marker features from |
n |
An integer that indicates the maximum number of features to return per group. |
returnGR |
A boolean indicating whether to return as a |
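Since `cutOff` is passed as a string containing a logical statement, it may help to see how such a string can be evaluated against a table of marker statistics. The sketch below uses a hypothetical `markers` data frame with `FDR` and `Log2FC` columns and a hypothetical helper `apply_cutoff`; it illustrates the idea and is not the package's own code.

```r
# Hypothetical marker statistics for one group of cells.
markers <- data.frame(
  name = c("geneA", "geneB", "geneC", "geneD"),
  FDR = c(0.01, 0.20, 0.05, 0.001),
  Log2FC = c(1.2, 2.0, 0.3, 0.8),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of ArchR.
apply_cutoff <- function(markers, cutOff = "FDR <= 0.1 & Log2FC >= 0.5", n = NULL) {
  # Evaluate the cutoff string using the columns of `markers` as variables.
  keep <- eval(parse(text = cutOff), envir = markers)
  out <- markers[which(keep), , drop = FALSE]
  # Optionally cap the number of features returned (rows keep their input order).
  if (!is.null(n)) out <- utils::head(out, n)
  out
}

sig <- apply_cutoff(markers)
top1 <- apply_cutoff(markers, n = 1)
print(sig)
```

Any column of the table can appear in the string, so a stricter filter such as "FDR <= 0.01 & Log2FC >= 1" needs no code changes.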
This function gets peak annotation matches from a given ArchRProject. The peaks in the returned object are in the same order as the peaks returned by getPeakSet().
getMatches(ArchRProj = NULL, name = NULL, annoName = NULL)
ArchRProj |
An |
name |
The name of the |
annoName |
The name of a specific annotation to subset within the |
This function gets a given data matrix from an individual ArrowFile.
getMatrixFromArrow( ArrowFile = NULL, useMatrix = "GeneScoreMatrix", useSeqnames = NULL, cellNames = NULL, ArchRProj = NULL, verbose = TRUE, binarize = FALSE, logFile = createLogFile("getMatrixFromArrow") )
ArrowFile |
The path to an ArrowFile from which the selected data matrix should be obtained. |
useMatrix |
The name of the data matrix to retrieve from the given ArrowFile. Options include "TileMatrix", "GeneScoreMatrix", etc. |
useSeqnames |
A character vector of chromosome names to be used to subset the data matrix being obtained. |
cellNames |
A character vector indicating the cell names of a subset of cells from which fragments should be extracted.
This allows for extraction of fragments from only a subset of selected cells. By default, this function will extract all cells from
the provided ArrowFile using |
ArchRProj |
An |
verbose |
A boolean value indicating whether to use verbose output during execution of this function. Can be set to FALSE for a cleaner output. |
binarize |
A boolean value indicating whether the matrix should be binarized before return. This is often desired when working with insertion counts. |
logFile |
The path to a file to be used for logging ArchR output. |
This function gets a given data matrix from an ArchRProject and returns it as a SummarizedExperiment.
This function will return the matrix you ask it for, without altering that matrix unless you tell it to. For example, if you added your PeakMatrix using addPeakMatrix() with binarize = TRUE, then getMatrixFromProject() will return a binarized PeakMatrix. Alternatively, you could set binarize = TRUE in the parameters passed to getMatrixFromProject() and the PeakMatrix will be binarized as you pull it out. No other normalization is applied to the matrix by this function.
getMatrixFromProject( ArchRProj = NULL, useMatrix = "GeneScoreMatrix", useSeqnames = NULL, verbose = TRUE, binarize = FALSE, threads = getArchRThreads(), logFile = createLogFile("getMatrixFromProject") )
ArchRProj |
An |
useMatrix |
The name of the data matrix to retrieve from the given ArchRProject. Options include "TileMatrix", "GeneScoreMatrix", etc. |
useSeqnames |
A character vector of chromosome names to be used to subset the data matrix being obtained. |
verbose |
A boolean value indicating whether to use verbose output during execution of this function. Can be set to FALSE for a cleaner output. |
binarize |
A boolean value indicating whether the matrix should be binarized before return.
This is often desired when working with insertion counts. Note that if the matrix has already been binarized previously, this should be set to |
logFile |
The path to a file to be used for logging ArchR output. |
This function will use monocle3 to find trajectories and then return a monocle CDS object that can be used as input for addMonocleTrajectory.
getMonocleTrajectories( ArchRProj = NULL, name = "Trajectory", useGroups = NULL, principalGroup = NULL, groupBy = NULL, embedding = NULL, clusterParams = list(), graphParams = list(), seed = 1 )
ArchRProj |
An |
name |
A string indicating the name of the fitted trajectory. |
useGroups |
A character vector that is used to select a subset of groups by name from the designated |
principalGroup |
The principal group which represents the group that will be the starting point for all trajectories. |
groupBy |
A string indicating the column name from |
embedding |
A string indicating the name of the |
clusterParams |
A list of parameters to be added when clustering cells for monocle3 with |
graphParams |
A list of parameters to be added when learning graphs for monocle3 with |
seed |
A number to be used as the seed for random number generation for trajectory creation. |
This function gets the outputDirectory from a given ArchRProject. If NULL, this returns the "QualityControl" directory.
getOutputDirectory(ArchRProj = NULL)
ArchRProj |
An |
This function obtains peak-to-gene links from an ArchRProject.
getPeak2GeneLinks( ArchRProj = NULL, corCutOff = 0.45, FDRCutOff = 1e-04, varCutOffATAC = 0.25, varCutOffRNA = 0.25, resolution = 1, returnLoops = TRUE )
ArchRProj |
An |
corCutOff |
A numeric describing the minimum numeric peak-to-gene correlation to return. |
FDRCutOff |
A numeric describing the maximum numeric peak-to-gene false discovery rate to return. |
varCutOffATAC |
A numeric describing the minimum variance quantile of the ATAC peak accessibility when selecting links. |
varCutOffRNA |
A numeric describing the minimum variance quantile of the RNA gene expression when selecting links. |
resolution |
A numeric describing the bp resolution to return loops as. This helps with overplotting of correlated regions. |
returnLoops |
A boolean indicating to return the peak-to-gene links as a |
This function gets a peakAnnotation from a given ArchRProject.
getPeakAnnotation(ArchRProj = NULL, name = NULL)
ArchRProj |
An |
name |
The name of the |
This function gets the peak set as a GRanges object from an ArchRProject.
getPeakSet(ArchRProj = NULL)
ArchRProj |
An |
This function gets the peak annotation positions (i.e. Motifs) from a given ArchRProject.
getPositions(ArchRProj = NULL, name = NULL, annoName = NULL)
ArchRProj |
An |
name |
The name of the |
annoName |
The name of a specific annotation to subset within the |
This function prints the projectSummary from an ArchRProject.
getProjectSummary(ArchRProj = NULL, returnSummary = FALSE)
ArchRProj |
An |
returnSummary |
A boolean value indicating whether to return a summary of the |
This function gets a dimensionality reduction object (i.e. UMAP, tSNE, etc) from a given ArchRProject.
getReducedDims( ArchRProj = NULL, reducedDims = "IterativeLSI", returnMatrix = TRUE, dimsToUse = NULL, scaleDims = NULL, corCutOff = 0.75 )
ArchRProj |
An |
reducedDims |
The name of the |
returnMatrix |
If set to "mat" or "matrix", the function will return the |
dimsToUse |
A vector containing the dimensions (i.e. 1:30) to return from the |
scaleDims |
A boolean describing whether to z-score the reduced dimensions for each cell. This is useful for minimizing the
contribution of strong biases (dominating early PCs) and lowly abundant populations. However, this may lead to stronger sample-specific
biases since it is over-weighting latent PCs. If |
corCutOff |
A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a correlation
to sequencing depth that is greater than the |
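The interplay between `corCutOff` and `scaleDims` can be sketched in base R. The example below rests on several assumptions: that dimensions whose absolute correlation with sequencing depth exceeds the cutoff are excluded (the description above is truncated and does not confirm this), that scaling means z-scoring each cell across the retained dimensions, and that filtering happens before scaling. The `dims` matrix, `depth` vector and helper `select_dims` are hypothetical.

```r
# Hypothetical reduced dimensions (cells x dims) and per-cell sequencing depth.
depth <- c(1000, 2000, 3000, 4000, 5000, 6000)
dims <- cbind(
  LSI1 = depth / 1000 + c(0.1, -0.1, 0.05, -0.05, 0.1, -0.1), # tracks depth
  LSI2 = c(1, -1, 2, -2, 0.5, -0.5),
  LSI3 = c(0.3, 0.1, -0.4, 0.2, -0.1, -0.2),
  LSI4 = c(-0.2, 0.4, 0.1, -0.3, 0.2, 0.0)
)

# Hypothetical helper, not part of ArchR.
select_dims <- function(dims, depth, corCutOff = 0.75, scaleDims = TRUE) {
  # Absolute correlation of each dimension with sequencing depth.
  corToDepth <- abs(stats::cor(dims, depth)[, 1])
  # Assumption: dimensions above the cutoff are dropped.
  out <- dims[, corToDepth <= corCutOff, drop = FALSE]
  # Assumption: z-score each cell (row) across the retained dimensions.
  if (scaleDims) out <- t(scale(t(out)))
  out
}

res <- select_dims(dims, depth)
print(colnames(res))
```

Here `LSI1` is built to follow depth almost exactly, so it is the only dimension removed; the remaining rows are centered and scaled per cell.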
This function gets the sampleColData from a given ArchRProject.
getSampleColData(ArchRProj = NULL, select = NULL, drop = FALSE)
ArchRProj |
An |
select |
A character vector containing the column names to select from |
drop |
A boolean value that indicates whether to drop the |
This function gets the names of all samples from a given ArchRProject.
getSampleNames(ArchRProj = NULL)
ArchRProj |
An |
This function will identify available seqnames from a given data matrix (i.e. "GeneScoreMatrix", or "TileMatrix") and return them for downstream plotting utilities.
getSeqnames(ArchRProj = NULL, useMatrix = "GeneScoreMatrix")
ArchRProj |
An |
useMatrix |
The name of the data matrix as stored in the ArrowFiles of the |
This function will download fragments for a small PBMC test dataset (2k Cells) spanning chr1 and 2 (~20MB).
getTestFragments(x)
This function will download an ArchRProject for a small PBMC test dataset (2k Cells) spanning chr1 and 2 (~2-300MB).
getTestProject()
This function will get a supervised trajectory from an ArchRProject (see addTrajectory), get data from a desired matrix, and smooth each value across the input trajectory.
getTrajectory( ArchRProj = NULL, name = "Trajectory", useMatrix = "GeneScoreMatrix", groupEvery = 1, log2Norm = TRUE, scaleTo = 10000, smoothWindow = 11, threads = getArchRThreads() )
ArchRProj |
An |
name |
A string indicating the name of the fitted trajectory in |
useMatrix |
The name of the data matrix from the |
groupEvery |
The number of sequential percentiles to group together when generating a trajectory. This is similar to smoothing
via a non-overlapping sliding window across pseudo-time. If |
log2Norm |
A boolean value that indicates whether the summarized trajectory matrix should be log2 transformed. If you are using a "MotifMatrix" set to FALSE. |
scaleTo |
Once the sequential trajectory matrix is created, each column in that matrix will be normalized to a column sum
indicated by |
smoothWindow |
An integer value indicating the smoothing window in size (relative to |
threads |
The number of threads to be used for parallel computing. |
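The grouping and smoothing described for `groupEvery` and `smoothWindow` can be approximated in base R. In the sketch below, cells are binned by pseudo-time percentile into non-overlapping groups, averaged, and then smoothed with a centered moving average. The input vectors, the helper `summarize_trajectory`, the use of a plain moving average, and measuring the window in bins are all assumptions; the function's actual smoothing may differ.

```r
# Hypothetical pseudo-time percentiles (1-100) for 200 cells and one feature value per cell.
pseudotime <- rep(1:100, each = 2)
value <- pseudotime / 10 # a feature that rises steadily along the trajectory

# Hypothetical helper, not part of ArchR.
summarize_trajectory <- function(value, pseudotime, groupEvery = 1, smoothWindow = 11) {
  # Group sequential percentiles into non-overlapping bins of width `groupEvery`.
  bin <- ceiling(pseudotime / groupEvery)
  binned <- as.vector(tapply(value, bin, mean))
  # Smooth with a centered moving average spanning `smoothWindow` bins.
  kernel <- rep(1 / smoothWindow, smoothWindow)
  smoothed <- as.vector(stats::filter(binned, kernel, sides = 2))
  list(binned = binned, smoothed = smoothed)
}

traj <- summarize_trajectory(value, pseudotime, groupEvery = 5, smoothWindow = 3)
print(length(traj$binned))
```

Note that a centered moving average leaves the first and last bins undefined (NA) in this sketch, since there are not enough neighbors on one side.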
This function gets the transcription start sites (TSSs) as a GRanges object of all genes from the geneAnnotation of a given ArchRProject.
getTSS(ArchRProj = NULL)
ArchRProj |
An |
This function will download data for a given tutorial and return the input files required for ArchR.
getTutorialData(tutorial = "hematopoiesis", threads = getArchRThreads())
tutorial |
The name of the available tutorial for which to retrieve the tutorial data. Currently, the only available option is "Hematopoiesis". "Hematopoiesis" is a small scATAC-seq dataset that spans the hematopoietic hierarchy from stem cells to differentiated cells. This dataset is made up of cells from peripheral blood, bone marrow, and CD34+ sorted bone marrow. |
threads |
The number of threads to be used for parallel computing. |
This function will read in processed 10x cell ranger files and identify barcodes that are associated with a cell that passed QC.
getValidBarcodes(csvFiles = NULL, sampleNames = NULL)
csvFiles |
A character vector of names from 10x CSV files to be read in for identification of valid cell barcodes. |
sampleNames |
A character vector containing the sample names to be associated with each individual entry in |
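As a sketch of what identifying valid barcodes from a per-barcode CSV might involve, the example below writes a tiny file and keeps the barcodes that were assigned to a cell. The column names `barcode` and `cell_id`, the "None" marker for non-cell barcodes, and the helper `valid_barcodes` are assumptions about the file layout made for illustration; check them against your own 10x output.

```r
# Write a small hypothetical CSV in the style of a 10x per-barcode summary.
# The column names `barcode` and `cell_id` and the "None" marker are assumptions.
csvFile <- tempfile(fileext = ".csv")
writeLines(c(
  "barcode,cell_id",
  "AAAC-1,_cell_0",
  "AAAG-1,None",
  "AACT-1,_cell_1"
), csvFile)

# Hypothetical helper, not part of ArchR.
valid_barcodes <- function(csvFiles, sampleNames) {
  stopifnot(length(csvFiles) == length(sampleNames))
  out <- lapply(csvFiles, function(f) {
    df <- utils::read.csv(f, stringsAsFactors = FALSE)
    # Keep barcodes that were assigned to a cell.
    df$barcode[df$cell_id != "None"]
  })
  # One entry per sample, named by the matching sample name.
  names(out) <- sampleNames
  out
}

valid <- valid_barcodes(csvFile, sampleNames = "SampleA")
print(valid)
```

The result is a named list with one character vector of barcodes per sample, which is why `csvFiles` and `sampleNames` need to be the same length.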
This function will rank the variability of the deviations computed by ArchR and label the top variable annotations.
getVarDeviations(ArchRProj = NULL, name = "MotifMatrix", plot = TRUE, n = 25)
ArchRProj |
An |
name |
The name of the |
plot |
A boolean value indicating whether the ranked variability should be plotted for each peakAnnotation in |
n |
The number of annotations to label with |
This function aligns ggplots vertically or horizontally.
ggAlignPlots(..., plotList = NULL, sizes = NULL, type = "v", draw = TRUE)
... |
All additional arguments will be interpreted as |
plotList |
A list of |
sizes |
A numeric vector or list of values indicating the relative size for each of the objects in |
type |
A string indicating whether vertical ("v") or horizontal ("h") alignment should be used for the multi-plot layout. |
draw |
A boolean value indicating whether to draw the plot(s) ( |
This function is a wrapper around ggplot geom_density_ridges or geom_violin to allow for plotting group distribution plots in ArchR.
ggGroup( x = NULL, y = NULL, xlabel = NULL, ylabel = NULL, groupOrder = NULL, groupSort = FALSE, size = 1, baseSize = 10, ridgeScale = 1, ratioYX = NULL, alpha = 1, title = "", pal = paletteDiscrete(values = x, set = "stallion"), addBoxPlot = TRUE, plotAs = "ridges", ... )
x |
A character vector containing the categorical x-axis values for each y-axis value. |
y |
A numeric vector containing the y-axis values for each point. |
xlabel |
The label to plot for the x-axis. |
ylabel |
The label to plot for the y-axis. |
groupOrder |
A character vector indicating a custom order for plotting x-axis categorical values. Should contain all possible
values of |
groupSort |
A boolean indicating whether to sort groups based on the average value of the group. |
size |
The line width for boxplot/summary lines. |
baseSize |
The base font size (in points) to use in the plot. |
ridgeScale |
A numeric indicating the relative size for each ridge in the ridgeplot. |
ratioYX |
The aspect ratio of the x and y axes on the plot. |
alpha |
A number indicating the transparency to use for each point. See |
title |
The title of the plot. |
pal |
A named custom palette (see |
addBoxPlot |
A boolean indicating whether to add a boxplot to the plot if |
plotAs |
A string indicating how the groups should be plotted. Acceptable values are "ridges" (for a |
... |
Additional parameters to pass to |
This function will plot x,y coordinate values summarized in hexagons in a standardized manner.
ggHex( x = NULL, y = NULL, color = NULL, pal = paletteContinuous(set = "solarExtra"), bins = 200, xlim = NULL, ylim = NULL, extend = 0.05, xlabel = "x", ylabel = "y", title = "", colorTitle = "values", baseSize = 6, ratioYX = 1, FUN = "median", hexCut = c(0.02, 0.98), addPoints = FALSE, ... )
x |
A numeric vector containing the x-axis values for each point. |
y |
A numeric vector containing the y-axis values for each point. |
color |
A numeric/categorical vector containing coloring information for each point. |
pal |
A custom continuous palette from |
bins |
The number of bins to be used for plotting the hexplot. |
xlim |
A numeric vector of two values indicating the lower and upper bounds of the x-axis on the plot. |
ylim |
A numeric vector of two values indicating the lower and upper bounds of the y-axis on the plot. |
extend |
A numeric value indicating the fraction to extend the x-axis and y-axis beyond the maximum and minimum values if |
xlabel |
The label to plot for the x-axis. |
ylabel |
The label to plot for the y-axis. |
title |
The title of the plot. |
colorTitle |
The label to use for the legend corresponding to |
baseSize |
The base font size (in points) to use in the plot. |
ratioYX |
The aspect ratio of the x and y axes on the plot. |
FUN |
The function to use for summarizing data into hexagons. Typically "mean" or something similar. |
hexCut |
If this is not null, a quantile cut is performed to threshold the top and bottom of the distribution of values.
This prevents skewed color scales caused by strong outliers. The format of this should be c(a,b) where |
addPoints |
A boolean value indicating whether individual points should be shown on the hexplot. |
... |
Additional params for plotting |
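The quantile cut described for `hexCut` can be illustrated in base R by clamping values to a lower and an upper quantile. The sketch assumes `c(a, b)` means the lower and upper quantile respectively, which the truncated description above does not spell out; the `values` vector and the helper `quantile_cut` are hypothetical.

```r
# Hypothetical color values with two strong outliers.
values <- c(-50, 1:98, 500)

# Hypothetical helper, not part of ArchR.
quantile_cut <- function(values, hexCut = c(0.02, 0.98)) {
  # Find the values at the lower and upper quantiles.
  limits <- stats::quantile(values, probs = hexCut, na.rm = TRUE)
  # Clamp everything outside those limits to the limits themselves.
  pmin(pmax(values, limits[[1]]), limits[[2]])
}

cut_values <- quantile_cut(values)
print(range(values))
print(range(cut_values))
```

After the cut, the two outliers no longer stretch the range, so a continuous color scale spends its resolution on the bulk of the data.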
This function is a wrapper around ggplot geom_point to allow for plotting one-to-one sample comparisons in ArchR.
ggOneToOne( x = NULL, y = NULL, size = 2, alpha = 1, xlabel = "x", ylabel = "y", title = "Correlation", min = 0.05, max = 0.9999, nPlot = 100 * 10^3, nKernel = 100, densityMax = 0.95, extend = 0.05, baseSize = 6, rastr = TRUE, pal = paletteContinuous(set = "blueYellow"), ... )
x |
A numeric vector containing the x-axis values for each point. |
y |
A numeric vector containing the y-axis values for each point. |
size |
The numeric size of the points to plot. |
alpha |
A number indicating the transparency to use for each point. See |
xlabel |
The label to plot for the x-axis. |
ylabel |
The label to plot for the y-axis. |
title |
The title of the plot. |
min |
The lower limit of the x and y axes as a numeric quantile between 0 and 1. |
max |
The upper limit of the x and y axes as a numeric quantile between 0 and 1. |
nPlot |
The number of points to plot. When this value is less than the total points, the |
nKernel |
The number of grid points in each direction to use when computing the kernel with |
densityMax |
The quantile that should be represented by the maximum color on the continuous scale designated by |
extend |
A numeric value indicating the fraction to extend the x-axis and y-axis beyond the maximum value on either axis. For example, 0.05 will extend the x-axis and y-axis by 5 percent on each end beyond |
baseSize |
The base font size (in points) to use in the plot. |
rastr |
A boolean value that indicates whether the plot should be rasterized. This does not rasterize lines and labels, just the internal portions of the plot. |
pal |
A custom palette from |
... |
Additional params to be supplied to ggPoint |
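The `extend` argument pads the axis limits by a fraction on each end. A minimal base-R sketch of that arithmetic follows; it assumes the fraction is taken relative to the span of the data, and the helper `extend_limits` is hypothetical rather than part of the package.

```r
# Hypothetical helper, not part of ArchR.
extend_limits <- function(x, extend = 0.05) {
  # Observed range of the data.
  lims <- range(x, na.rm = TRUE)
  # Pad both ends by a fraction of the observed span (assumed reference).
  pad <- diff(lims) * extend
  c(lims[1] - pad, lims[2] + pad)
}

padded <- extend_limits(c(0, 10, 20), extend = 0.05)
print(padded)
```

For data spanning 0 to 20, a 5 percent extension adds one unit of padding on each side, which keeps points at the extremes from sitting on the plot border.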
This function is a wrapper around ggplot geom_point to allow for a more intuitive plotting of ArchR data.
ggPoint( x = NULL, y = NULL, color = NULL, discrete = TRUE, discreteSet = "stallion", continuousSet = "solarExtra", labelMeans = TRUE, pal = NULL, defaultColor = "lightGrey", highlightPoints = NULL, colorDensity = FALSE, size = 1, xlim = NULL, ylim = NULL, extend = 0.05, xlabel = "x", ylabel = "y", title = "", randomize = FALSE, seed = 1, colorTitle = NULL, colorOrder = NULL, colorLimits = NULL, alpha = 1, baseSize = 10, legendSize = 3, ratioYX = 1, labelAsFactors = TRUE, fgColor = "black", bgColor = "white", bgWidth = 1, labelSize = 3, addFit = NULL, rastr = FALSE, dpi = 300, ... )
x |
A numeric vector containing the x-axis values for each point. |
y |
A numeric vector containing the y-axis values for each point. |
color |
A numeric/categorical vector used to determine the coloration for each point. |
discrete |
A boolean value indicating whether the supplied data is discrete ( |
discreteSet |
The name of a custom palette from |
continuousSet |
The name of a custom palette from |
labelMeans |
A boolean value indicating whether the mean of each categorical/discrete color should be labeled. |
pal |
A custom palette used to override discreteSet/continuousSet for coloring vector. |
defaultColor |
The default color for points that do not have another color applied (i.e. |
highlightPoints |
An integer vector describing which points to highlight. The remainder of points will be colored light gray. |
colorDensity |
A boolean value indicating whether the density of points on the plot should be indicated by color.
If |
size |
The numeric size of the points to be plotted. |
xlim |
A numeric vector of two values indicating the lower and upper bounds of the x-axis on the plot. |
ylim |
A numeric vector of two values indicating the lower and upper bounds of the y-axis on the plot. |
extend |
A numeric value indicating the fraction to extend the x-axis and y-axis beyond the maximum and minimum
values if |
xlabel |
The label to plot for the x-axis. |
ylabel |
The label to plot for the y-axis. |
title |
The title of the plot. |
randomize |
A boolean value indicating whether to randomize the order of the points when plotting. |
seed |
A numeric seed number for use in randomization. |
colorTitle |
A title to be added to the legend if |
colorOrder |
A vector that allows you to control the order of palette colors associated with the values in |
colorLimits |
A numeric vector of two values indicating the lower and upper bounds of colors if numeric. Values beyond these limits are thresholded. |
alpha |
A number indicating the transparency to use for each point. See |
baseSize |
The base font size (in points) to use in the plot. |
legendSize |
The size in inches to use for plotting the color legend. |
ratioYX |
The aspect ratio of the x and y axes on the plot. |
labelAsFactors |
A boolean indicating whether to label the |
fgColor |
The foreground color of the plot. |
bgColor |
The background color of the plot. |
bgWidth |
The background relative width size of the halos in the labeling. |
labelSize |
The numeric font size of labels. |
addFit |
A string indicating the method to use for adding a fit/regression line to the plot (see |
rastr |
A boolean value that indicates whether the plot should be rasterized using |
dpi |
The resolution in dots per inch to use for the plot. |
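The `extend` argument pads the axis range by a fraction of its width. The following is a minimal base-R sketch of that idea, assuming the padding is applied symmetrically to the data range when no explicit limits are given (the truncated description above does not confirm the exact condition). `pad_limits` is a hypothetical helper, not an ArchR function.

```r
# Hypothetical helper (not part of ArchR): pad a data range by a fraction
# of its width on both sides, as the `extend` argument is described to do.
pad_limits <- function(values, extend = 0.05) {
  # grDevices::extendrange() widens range(values) by `f` times its width
  grDevices::extendrange(values, f = extend)
}

x <- c(0, 2.5, 10)
xlim <- pad_limits(x, extend = 0.05)
print(xlim)
```

With the default `extend = 0.05`, a range of width 10 gains 0.5 units on each side.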
This function will import the feature matrix from a 10x feature hdf5 file.
import10xFeatureMatrix( input = NULL, names = NULL, strictMatch = TRUE, verbose = TRUE, featureType = "Gene Expression" )
input |
A character vector of paths to 10x feature hdf5 file(s). These will traditionally have a suffix similar to "filtered_feature_bc_matrix.h5". |
names |
A character vector of sample names associated with each input file. |
strictMatch |
Only relevant when multiple input files are used. A boolean that indicates whether rows (genes) that do not match perfectly in the matrices
should be removed ( |
verbose |
Only relevant when multiple input files are used. A boolean that indicates whether messaging about mismatches should be verbose ( |
featureType |
The name of the feature to extract from the 10x feature file. See https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/h5_matrices for more information. |
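When several input files are combined, rows (genes) that are not shared by every matrix can be dropped before merging. The sketch below illustrates that row-matching idea on toy matrices in base R. It is an assumption about the general approach, not the code used by `import10xFeatureMatrix`, and the gene and cell names are made up.

```r
# Two toy gene-by-cell matrices standing in for two samples (made-up data).
m1 <- matrix(1:6, nrow = 3,
             dimnames = list(c("GeneA", "GeneB", "GeneC"), c("s1_c1", "s1_c2")))
m2 <- matrix(1:6, nrow = 3,
             dimnames = list(c("GeneB", "GeneC", "GeneD"), c("s2_c1", "s2_c2")))
mats <- list(m1, m2)

# Keep only the rows (genes) present in every matrix, then bind the columns.
shared <- Reduce(intersect, lapply(mats, rownames))
combined <- do.call(cbind, lapply(mats, function(m) m[shared, , drop = FALSE]))
print(combined)
```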
This function uses imputation weights from an ArchRProject to impute a numerical matrix.
imputeMatrix( mat = NULL, imputeWeights = NULL, threads = getArchRThreads(), verbose = FALSE, logFile = createLogFile("imputeMatrix") )
mat |
A matrix or sparseMatrix of class dgCMatrix to be imputed. |
imputeWeights |
An R object containing impute weights as returned by |
threads |
The number of threads to be used for parallel computing. |
verbose |
A boolean value indicating whether to use verbose output during execution of this function. Can be set to FALSE for a cleaner output. |
logFile |
The path to a file to be used for logging ArchR output. |
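Imputation of this kind expresses each cell's values as a weighted combination of other cells' values. The base-R sketch below shows that operation as a single matrix product. The weights are hypothetical and hand-written here, whereas real impute weights come from the project. The orientation (features in rows, cells in columns) is also an assumption.

```r
# Toy feature-by-cell matrix: 2 features, 3 cells (made-up values).
mat <- matrix(c(1, 0,  0, 2,  4, 4), nrow = 2)

# Hypothetical cell-by-cell weights; column j holds the weights used to
# rebuild cell j, and each column sums to 1.
weights <- matrix(c(0.5, 0.5, 0,  0.25, 0.5, 0.25,  0, 0.5, 0.5), nrow = 3)

# Each imputed cell is a weighted sum of the original cells' columns.
imputed <- mat %*% weights
print(imputed)
```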
This function will install extra packages used in ArchR that are not installed by default.
installExtraPackages(force = FALSE)
force |
A boolean value indicating whether to force a reinstall of these packages. |
This function will load a previously saved ArchRProject and re-normalize paths for usage.
loadArchRProject(path = "./", force = FALSE, showLogo = TRUE)
path |
A character path to an |
force |
A boolean value indicating whether missing optional |
showLogo |
A boolean value indicating whether to show the ascii ArchR logo after successful creation of an |
This function takes a character vector of labels and uses a set of old and new labels to re-map from the old label set to the new label set.
mapLabels(labels = NULL, newLabels = NULL, oldLabels = names(newLabels))
labels |
A character vector containing labels to map. |
newLabels |
A character vector (same length as oldLabels) of new labels; each entry replaces the corresponding entry of oldLabels. |
oldLabels |
A character vector (same length as newLabels) of old labels; each entry is replaced by the corresponding entry of newLabels. |
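The remapping described above can be sketched in base R with `match()`. `remap_labels` is a hypothetical stand-in rather than the `mapLabels` implementation. It assumes that labels without a matching old label are left unchanged. The cluster and cell-type names are made up.

```r
# Hypothetical stand-in for the remapping idea (not ArchR's implementation).
remap_labels <- function(labels, newLabels, oldLabels = names(newLabels)) {
  stopifnot(length(newLabels) == length(oldLabels))
  idx <- match(labels, oldLabels)      # position of each label in oldLabels
  out <- as.character(labels)
  hit <- !is.na(idx)
  out[hit] <- unname(newLabels[idx[hit]])  # unmatched labels stay as they are
  out
}

clusters <- c("C1", "C2", "C1", "C9")
remapped <- remap_labels(clusters, newLabels = c(C1 = "T cell", C2 = "B cell"))
print(remapped)
```

Passing a named vector as `newLabels` supplies both sides of the mapping at once, which mirrors the `oldLabels = names(newLabels)` default in the usage above.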
This function gets the number of cells from an ArchRProject or ArrowFile.
nCells(input = NULL)
input |
An |
This function returns a GRanges object containing a non-overlapping set of regions derived from a supplied Genomic Ranges object.
nonOverlappingGR(gr = NULL, by = "score", decreasing = TRUE, verbose = FALSE)
gr |
A |
by |
The name of a column in |
decreasing |
A boolean value indicating whether the values in the column indicated via |
verbose |
A boolean value indicating whether the output should include extra reporting. |
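One way to picture score-ranked overlap removal is a greedy pass: visit regions from best to worst and keep a region only if it does not overlap one already kept. The base-R sketch below uses a plain data frame instead of a GRanges object. It illustrates the idea only and is not necessarily how `nonOverlappingGR` is implemented. The column names `chr`, `start`, `end`, and `score` are assumptions.

```r
# Greedy illustration on a data frame (hypothetical helper, not ArchR code).
greedy_non_overlapping <- function(regions, by = "score", decreasing = TRUE) {
  ord <- order(regions[[by]], decreasing = decreasing)
  keep <- integer(0)
  for (i in ord) {
    # Closed intervals overlap when each starts before the other ends.
    overlaps <- regions$chr[keep] == regions$chr[i] &
      regions$start[keep] <= regions$end[i] &
      regions$end[keep] >= regions$start[i]
    if (!any(overlaps)) keep <- c(keep, i)
  }
  regions[sort(keep), , drop = FALSE]
}

regions <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr2"),
  start = c(100, 150, 300, 150),
  end   = c(200, 250, 400, 250),
  score = c(5, 9, 2, 1)
)
kept <- greedy_non_overlapping(regions)
print(kept)
```

In this toy input the first region overlaps the higher-scoring second region on chr1, so it is dropped, while the chr2 region is kept because overlaps are only tested within a chromosome.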
Continuous Color Palette
paletteContinuous(set = "solarExtra", n = 256, reverse = FALSE)
set |
The name of a color palette provided in the |
n |
The number of unique colors to generate as part of this continuous color palette. |
reverse |
A boolean variable that indicates whether to return the palette colors in reverse order. |
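A continuous palette of `n` colors can be produced by interpolating between a few anchor colors. The sketch below does this with `grDevices::colorRampPalette()`. The anchor colors are hypothetical and do not reproduce any named ArchR set, and `continuous_palette` is an illustrative helper rather than the package function.

```r
# Hypothetical anchor colors (not one of ArchR's named palette sets).
anchors <- c("#2166AC", "#F7F7F7", "#B2182B")

# Interpolate the anchors into n colors, optionally in reverse order.
continuous_palette <- function(colors, n = 256, reverse = FALSE) {
  if (reverse) colors <- rev(colors)
  grDevices::colorRampPalette(colors)(n)
}

pal <- continuous_palette(anchors, n = 256)
pal_rev <- continuous_palette(anchors, n = 256, reverse = TRUE)
print(head(pal, 3))
```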
This function assesses the number of inputs and returns a discrete color palette that is tailored to provide the most possible color contrast from the designated color set.
paletteDiscrete(values = NULL, set = "stallion", reverse = FALSE)
values |
A character vector containing the sample names that will be used. Each entry in this character vector will be given a unique color from the designated palette set. |
set |
The name of a color palette provided in the |
reverse |
A boolean variable that indicates whether to return the palette colors in reverse order. |
This function will perform hypergeometric enrichment of a given peak annotation within the defined marker peaks.
peakAnnoEnrichment( seMarker = NULL, ArchRProj = NULL, peakAnnotation = NULL, matches = NULL, cutOff = "FDR <= 0.1 & Log2FC >= 0.5", background = "all", logFile = createLogFile("peakAnnoEnrichment") )
seMarker |
A |
ArchRProj |
An |
peakAnnotation |
A |
matches |
A custom |
cutOff |
A valid-syntax logical statement that defines which marker features from |
background |
A string that indicates whether to use a background set of matched peaks to compare against ("bgdPeaks") or all peaks ("all"). |
logFile |
The path to a file to be used for logging ArchR output. |
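A hypergeometric enrichment test asks whether marker peaks overlap an annotation more often than expected from the background. The worked example below uses `stats::phyper()` with made-up counts. The variable names are illustrative and the exact bookkeeping inside `peakAnnoEnrichment` may differ.

```r
# Illustrative counts (made up for this example).
nBackground <- 1000  # all peaks in the background set
nAnnotated  <- 100   # background peaks that overlap the annotation
nMarker     <- 50    # marker peaks passing the cutOff
nOverlap    <- 12    # marker peaks that overlap the annotation

# With lower.tail = FALSE, phyper(q, ...) returns P(X > q), so passing
# nOverlap - 1 gives the upper tail P(X >= nOverlap).
p <- phyper(nOverlap - 1, nAnnotated, nBackground - nAnnotated, nMarker,
            lower.tail = FALSE)

# Fold enrichment: observed overlap fraction over the background fraction.
enrichment <- (nOverlap / nMarker) / (nAnnotated / nBackground)
print(c(p = p, enrichment = enrichment))
```

Here 5 overlaps would be expected by chance (50 marker peaks at a 10% background rate), so 12 observed overlaps correspond to a 2.4-fold enrichment.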
This function will plot the coverage at an input region in the style of a browser track. It allows for normalization of the signal which enables direct comparison across samples. Note that the genes displayed in these plots are derived from your geneAnnotation (i.e. the BSgenome object you used) so they may not match other online genome browsers that use different gene annotations.
plotBrowserTrack( ArchRProj = NULL, region = NULL, groupBy = "Clusters", useGroups = NULL, plotSummary = c("bulkTrack", "featureTrack", "loopTrack", "geneTrack"), sizes = c(10, 1.5, 3, 4), features = getPeakSet(ArchRProj), loops = getCoAccessibility(ArchRProj), geneSymbol = NULL, useMatrix = NULL, log2Norm = TRUE, upstream = 50000, downstream = 50000, tileSize = 250, minCells = 25, normMethod = "ReadsInTSS", threads = getArchRThreads(), ylim = NULL, pal = NULL, baseSize = 7, scTileSize = 0.5, scCellsMax = 100, borderWidth = 0.4, tickWidth = 0.4, facetbaseSize = 7, geneAnnotation = getGeneAnnotation(ArchRProj), title = "", verbose = TRUE, logFile = createLogFile("plotBrowserTrack") )
ArchRProj |
An |
region |
A |
groupBy |
A string that indicates how cells should be grouped. This string corresponds to one of the standard or
user-supplied |
useGroups |
A character vector that is used to select a subset of groups by name from the designated |
plotSummary |
A character vector containing the features to be plotted. Possible values include "bulkTrack" (the ATAC-seq signal), "scTrack" (scATAC-seq signal), "featureTrack" (i.e. the peak regions), "geneTrack" (line diagrams of genes with introns and exons shown. Blue-colored genes are on the minus strand and red-colored genes are on the plus strand), and "loopTrack" (links between a peak and a gene). |
sizes |
A numeric vector containing up to 3 values that indicate the sizes of the individual components passed to |
features |
A |
loops |
A |
geneSymbol |
If |
useMatrix |
If geneSymbol is supplied, the name of the matrix from which the corresponding GeneScores/GeneExpression values are plotted, e.g. "GeneScoreMatrix". |
log2Norm |
If geneSymbol is supplied, a boolean value indicating whether to log2-normalize the corresponding GeneScores/GeneExpression matrix before plotting. |
upstream |
The number of basepairs upstream of the transcription start site of |
downstream |
The number of basepairs downstream of the transcription start site of |
tileSize |
The numeric width of the tile/bin in basepairs for plotting ATAC-seq signal tracks. All insertions in a single bin will be summed. |
minCells |
The minimum number of cells contained within a cell group to allow for this cell group to be plotted. This argument can be used to exclude pseudo-bulk replicates generated from low numbers of cells. |
normMethod |
The name of the column in |
threads |
The number of threads to use for parallel execution. |
ylim |
The numeric quantile y-axis limit to be used for "bulkTrack" plotting. This should be expressed as |
pal |
A custom palette (see |
baseSize |
The numeric font size to be used in the plot. This applies to all plot labels. |
scTileSize |
The width of the tiles in scTracks. Larger numbers may make cells overlap more. Default is 0.5 for about 100 cells. |
scCellsMax |
The maximum number of cells for scTracks. |
borderWidth |
The numeric line width to be used for plot borders. |
tickWidth |
The numeric line width to be used for axis tick marks. |
facetbaseSize |
The numeric font size to be used in the facets (gray boxes used to provide track labels) of the plot. |
geneAnnotation |
The |
title |
The title to add at the top of the plot next to the plot's genomic coordinates. |
verbose |
A boolean value that determines whether standard output should be printed. |
logFile |
The path to a file to be used for logging ArchR output. |
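The `tileSize` argument controls how insertions are binned: all insertions falling in the same tile are summed. The base-R sketch below shows that binning on made-up insertion positions. `tile_counts` is a hypothetical helper and assumes tiles are counted from the start of the plotted region.

```r
# Hypothetical helper: count insertions per fixed-width tile of a region.
tile_counts <- function(positions, region_start, region_end, tileSize = 250) {
  # Ignore insertions outside the plotted region.
  positions <- positions[positions >= region_start & positions <= region_end]
  n_tiles <- ceiling((region_end - region_start + 1) / tileSize)
  # 1-based index of the tile that contains each insertion.
  tile_idx <- floor((positions - region_start) / tileSize) + 1
  tabulate(tile_idx, nbins = n_tiles)
}

insertions <- c(1001, 1100, 1250, 1251, 1999, 5000)  # made-up positions
counts <- tile_counts(insertions, region_start = 1001, region_end = 2000)
print(counts)
```

A smaller `tileSize` gives a finer but noisier track, while a larger one smooths the signal by pooling more insertions per bin.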
This function will plot an embedding stored in an ArchRProject
plotEmbedding( ArchRProj = NULL, embedding = "UMAP", colorBy = "cellColData", name = "Sample", log2Norm = NULL, imputeWeights = if (!grepl("coldata", tolower(colorBy[1]))) getImputeWeights(ArchRProj), pal = NULL, size = 0.1, sampleCells = NULL, highlightCells = NULL, rastr = TRUE, quantCut = c(0.01, 0.99), discreteSet = NULL, continuousSet = NULL, randomize = TRUE, keepAxis = FALSE, baseSize = 10, plotAs = NULL, threads = getArchRThreads(), logFile = createLogFile("plotEmbedding"), ... )
ArchRProj |
An |
embedding |
The name of the embedding stored in the |
colorBy |
A string indicating whether points in the plot should be colored by a column in |
name |
The name of the column in |
log2Norm |
A boolean value indicating whether a log2 transformation should be performed on the values (if continuous) in plotting. |
imputeWeights |
The weights to be used for imputing numerical values for each cell as a linear combination of other cells' values.
See |
pal |
A custom palette used to override discreteSet/continuousSet for coloring cells. Typically created using |
size |
A number indicating the size of the points to plot if |
sampleCells |
A numeric value describing the number of cells to use for the plot. If using impute weights, this will occur after imputation. |
highlightCells |
A character vector of cellNames describing which cells to highlight if using |
rastr |
A boolean value that indicates whether the plot should be rasterized. This does not rasterize lines and labels, just the internal portions of the plot. |
quantCut |
If this is not |
discreteSet |
The name of a discrete palette from |
continuousSet |
The name of a continuous palette from |
randomize |
A boolean value that indicates whether to randomize points prior to plotting to prevent cells from one cluster being uniformly present at the front of the plot. |
keepAxis |
A boolean value that indicates whether the x- and y-axis ticks and labels should be plotted. |
baseSize |
The base font size to use in the plot. |
plotAs |
A string that indicates whether points ("points") should be plotted or a hexplot ("hex") should be plotted. By default
if |
threads |
The number of threads to be used for parallel computing. |
logFile |
The path to a file to be used for logging ArchR output. |
... |
Additional parameters to pass to |
This function will plot a heatmap of hypergeometric enrichment of a given peakAnnotation within the defined marker peaks.
plotEnrichHeatmap( seEnrich = NULL, pal = paletteContinuous(set = "comet", n = 100), n = 10, cutOff = 20, pMax = Inf, clusterCols = TRUE, binaryClusterRows = TRUE, labelRows = TRUE, rastr = TRUE, transpose = FALSE, returnMatrix = FALSE, logFile = createLogFile("plotEnrichHeatmap") )
seEnrich |
A |
pal |
A custom continuous palette (see |
n |
The number of top enriched peakAnnotations per column from the |
cutOff |
A numeric cutOff that indicates the minimum P-adj enrichment to be included in the heatmap. |
pMax |
A numeric representing the maximum P-adj for plotting in the heatmap. |
clusterCols |
A boolean indicating whether or not to cluster columns in the heatmap. |
binaryClusterRows |
A boolean indicating whether or not to cluster rows using binary classification in the heatmap. |
labelRows |
A boolean indicating whether or not to label all rows in the heatmap. |
rastr |
A boolean value that indicates whether the plot should be rasterized using |
transpose |
A boolean determining whether to transpose the heatmap in the plot. |
returnMatrix |
A boolean determining whether to return the matrix corresponding to the heatmap rather than generate a plot. |
logFile |
The path to a file to be used for logging ArchR output. |
This function will plot footprints for all samples in a given ArchRProject or a properly-formatted Summarized Experiment.
plotFootprints( seFoot = NULL, names = NULL, pal = NULL, flank = 250, flankNorm = 50, normMethod = "Subtract", smoothWindow = NULL, baseSize = 6, plot = TRUE, ArchRProj = NULL, plotName = paste0("Plot-Footprints-", normMethod), height = 6, width = 4, addDOC = TRUE, force = FALSE, logFile = createLogFile("plotFootprints") )
seFoot |
A summarized experiment object containing information on footprints returned by the |
names |
A character vector containing the names of the transcription factors to be plotted. These should match colnames of |
pal |
The name of a custom palette from |
flank |
The number of basepairs from the position center (+/-) to consider as the flank. |
flankNorm |
The number of basepairs to consider at the edge of the flank region (+/-) to be used for footprint normalization. |
normMethod |
The name of the normalization method to use to normalize the footprint relative to the Tn5 insertion bias. Options include "None", "Subtract", and "Divide". "Subtract" means subtracting the normalized Tn5 bias. "Divide" means dividing by the normalized Tn5 bias. |
smoothWindow |
The size in basepairs of the sliding window to be used for smoothing of the footprint signal. |
baseSize |
A numeric specifying the baseSize of font in the plots. |
plot |
A boolean value indicating whether or not the footprints should be plotted ( |
ArchRProj |
An |
plotName |
A string indicating the name/prefix of the file to be used for output plots. |
height |
The height in inches to be used for the output PDF file. |
width |
The width in inches to be used for the output PDF file. |
addDOC |
A boolean variable that determines whether to add the date of creation to the end of the PDF file name. This is useful for preventing overwriting of old plots. |
force |
A boolean value that must be set to TRUE if many footprints are requested when plot = FALSE. This guards against unintentionally storing a large number of footprint plots as an object. |
logFile |
The path to a file to be used for logging ArchR output. |
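The three `normMethod` options differ only in how the Tn5 bias is combined with the footprint signal. The sketch below illustrates this on plain numeric vectors. `normalize_footprint` is a hypothetical helper that assumes the signal and bias are already on comparable scales, and it does not reproduce the flank-based normalization performed by the package.

```r
# Hypothetical helper showing the three normMethod options on numeric vectors.
normalize_footprint <- function(signal, bias, normMethod = "Subtract") {
  switch(tolower(normMethod),
    none     = signal,          # leave the footprint signal untouched
    subtract = signal - bias,   # remove the bias additively
    divide   = signal / bias,   # remove the bias multiplicatively
    stop("normMethod must be 'None', 'Subtract', or 'Divide'")
  )
}

signal <- c(2, 4, 3)  # made-up footprint signal around a motif
bias   <- c(1, 2, 1)  # made-up Tn5 bias at the same positions
print(normalize_footprint(signal, bias, "Subtract"))
print(normalize_footprint(signal, bias, "Divide"))
```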
This function will plot a fragment size distribution for each sample. Only cells in the ArchRProject are used when making this plot.
plotFragmentSizes( ArchRProj = NULL, groupBy = "Sample", chromSizes = getChromSizes(ArchRProj), maxSize = 750, pal = NULL, returnDF = FALSE, threads = getArchRThreads(), logFile = createLogFile("plotFragmentSizes") )
ArchRProj |
An |
groupBy |
The name of the column in |
chromSizes |
A GRanges object of the chromosome lengths. See |
maxSize |
The maximum fragment size (in basepairs) to be included when plotting the fragment size distribution. |
pal |
A color palette representing the groups from groupBy in fragment size plot. |
returnDF |
A boolean value that indicates whether to return a |
threads |
An integer specifying the number of threads to use for calculation. By default this uses the number of threads set by |
logFile |
The path to a file to be used for logging ArchR output. |
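A fragment size distribution is a tally of fragment lengths up to `maxSize`. The sketch below computes one in base R from made-up lengths. Expressing the result as a percentage of retained fragments and the column names `fragmentSize` and `fragmentPercent` are assumptions made for this illustration.

```r
# Hypothetical helper: percentage of fragments at each size up to maxSize.
fragment_size_distribution <- function(sizes, maxSize = 750) {
  sizes <- sizes[sizes >= 1 & sizes <= maxSize]  # drop fragments beyond maxSize
  counts <- tabulate(sizes, nbins = maxSize)     # one bin per base pair of length
  data.frame(
    fragmentSize    = seq_len(maxSize),
    fragmentPercent = 100 * counts / sum(counts)
  )
}

sizes <- c(50, 50, 100, 200, 900)  # made-up fragment lengths in base pairs
dist <- fragment_size_distribution(sizes, maxSize = 750)
print(dist[dist$fragmentPercent > 0, ])
```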
This function will group, summarize and then plot data from an ArchRProject for visual comparison.
plotGroups( ArchRProj = NULL, groupBy = "Sample", colorBy = "colData", name = "TSSEnrichment", imputeWeights = if (!grepl("coldata", tolower(colorBy[1]))) getImputeWeights(ArchRProj), maxCells = 1000, quantCut = c(0.002, 0.998), log2Norm = NULL, pal = NULL, discreteSet = "stallion", ylim = NULL, size = 0.5, baseSize = 6, ratioYX = NULL, ridgeScale = 2, plotAs = "ridges", threads = getArchRThreads(), ... )
ArchRProj |
An |
groupBy |
The name of the column in |
colorBy |
A string indicating whether the numeric values to be used in the violin plot should be from a column in
|
name |
The name of the column in |
imputeWeights |
The weights to be used for imputing numerical values for each cell as a linear combination of other cells' values. See |
maxCells |
The maximum number of cells to consider when making the plot. |
quantCut |
If this is not null, a quantile cut is performed to threshold the top and bottom of the distribution of values.
This prevents skewed color scales caused by strong outliers. The format of this should be c(a,b) where |
log2Norm |
A boolean value indicating whether a log2 transformation should be performed on the values (if continuous) in plotting. |
pal |
A custom palette (see |
discreteSet |
The name of a discrete palette from |
ylim |
A vector of two numeric values indicating the lower and upper bounds of the y-axis on the plot. |
size |
The numeric size of the points to be plotted. |
baseSize |
The base font size to use in the plot. |
ratioYX |
The aspect ratio of the x and y axes on the plot. |
ridgeScale |
The scale factor for the relative heights of each ridge when making a ridgeplot with |
plotAs |
A string that indicates whether a ridge plot ("ridges") should be plotted or a violin plot ("violin") should be plotted. |
threads |
The number of threads to be used for parallel computing. |
... |
Additional parameters to pass to |
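The quantile cut described for `quantCut` thresholds values at a lower and an upper quantile so that strong outliers do not dominate the scale. The base-R sketch below shows the operation. `quantile_cut` is a hypothetical helper, and reading `c(a, b)` as the lower and upper quantile probabilities is an assumption, since the description above is truncated.

```r
# Hypothetical helper: clamp values to the [a, b] quantile range.
quantile_cut <- function(values, quantCut = c(0.002, 0.998)) {
  limits <- unname(stats::quantile(values, probs = quantCut, na.rm = TRUE))
  # Values below the lower limit are raised to it; values above the upper
  # limit are lowered to it.
  pmin(pmax(values, limits[1]), limits[2])
}

values <- c(1:99, 1000)  # one strong outlier
cut_values <- quantile_cut(values, quantCut = c(0.1, 0.9))
print(range(cut_values))
```

After the cut, the outlier at 1000 is replaced by the 90th percentile, so a color or y-axis scale built on `cut_values` spans the bulk of the data.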
This function will plot a heatmap of the results from markerFeatures
plotMarkerHeatmap( seMarker = NULL, cutOff = "FDR <= 0.01 & Log2FC >= 0.5", log2Norm = TRUE, scaleTo = 10^4, scaleRows = TRUE, plotLog2FC = FALSE, limits = c(-2, 2), grepExclude = NULL, pal = NULL, binaryClusterRows = TRUE, clusterCols = TRUE, labelMarkers = NULL, nLabel = 15, nPrint = 15, labelRows = FALSE, returnMatrix = FALSE, transpose = FALSE, invert = FALSE, logFile = createLogFile("plotMarkerHeatmap") )
seMarker |
A |
cutOff |
A valid-syntax logical statement that defines which marker features from |
log2Norm |
A boolean value indicating whether a log2 transformation should be performed on the values in
|
scaleTo |
Each column in the assay Mean from |
scaleRows |
A boolean value that indicates whether the heatmap should display row-wise z-scores instead of raw values. |
limits |
A numeric vector of two numbers that represent the lower and upper limits of the heatmap color scheme. |
grepExclude |
A character vector or string that indicates the |
pal |
A custom continuous palette from |
binaryClusterRows |
A boolean value that indicates whether a binary sorting algorithm should be used for fast clustering of heatmap rows. |
clusterCols |
A boolean value that indicates whether the columns of the marker heatmap should be clustered. |
labelMarkers |
A character vector listing the |
nLabel |
An integer value that indicates whether the top |
nPrint |
If provided |
labelRows |
A boolean value that indicates whether all rows should be labeled on the side of the heatmap. |
returnMatrix |
A boolean value that indicates whether the final heatmap matrix should be returned in lieu of plotting the actual heatmap. |
transpose |
A boolean value that indicates whether the heatmap should be transposed prior to plotting or returning. |
invert |
A boolean value that indicates whether the heatmap will display the features with the
lowest |
logFile |
The path to a file to be used for logging ArchR output. |
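The `scaleRows` and `limits` arguments can be pictured as two steps: convert each row to z-scores, then clamp the result to the color limits. The base-R sketch below shows those steps on a made-up matrix. `row_zscores` is a hypothetical helper and does not reproduce the depth scaling (`scaleTo`) or log2 transformation that may be applied beforehand.

```r
# Hypothetical helper: row-wise z-scores clamped to the heatmap limits.
row_zscores <- function(mat, limits = c(-2, 2)) {
  # Vectors recycle down columns, so these act on each row of the matrix.
  z <- (mat - rowMeans(mat)) / apply(mat, 1, stats::sd)
  # pmin()/pmax() keep the matrix dimensions of their first argument.
  pmin(pmax(z, limits[1]), limits[2])
}

mat <- rbind(featureA = c(1, 2, 3), featureB = c(10, 10, 40))  # made-up values
z <- row_zscores(mat)
print(round(z, 3))
```

Rows with zero variance produce `NaN` in this sketch and would need to be filtered first.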
This function will plot one group/column of differential markers as an MA or Volcano plot.
plotMarkers( seMarker = NULL, name = NULL, cutOff = "FDR <= 0.01 & abs(Log2FC) >= 0.5", plotAs = "Volcano", scaleTo = 10^4, rastr = TRUE )
seMarker |
A |
name |
The name of a column in |
cutOff |
A valid-syntax logical statement that defines which marker features from |
plotAs |
A string indicating whether to plot a volcano plot ("Volcano") or an MA plot ("MA"). |
rastr |
A boolean value that indicates whether the plot should be rasterized using |
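The `cutOff` argument is a string holding a logical expression over marker statistics such as `FDR` and `Log2FC`. The base-R sketch below shows how such a string can be evaluated against a table of statistics. The data frame is made up, and evaluating the string with `eval(parse())` is an illustration of the concept rather than a statement of what `plotMarkers` does internally.

```r
# Made-up marker statistics for four features.
markers <- data.frame(
  feature = c("peak1", "peak2", "peak3", "peak4"),
  FDR     = c(0.001, 0.2, 0.005, 0.009),
  Log2FC  = c(1.2, 2.0, -0.8, 0.1)
)

cutOff <- "FDR <= 0.01 & abs(Log2FC) >= 0.5"

# Evaluate the expression with the data frame columns as variables.
passes <- eval(parse(text = cutOff), envir = markers)
print(markers$feature[passes])
```

Because the expression uses `abs(Log2FC)`, features that are strongly depleted (such as `peak3` here) pass alongside enriched ones, which suits a volcano plot.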
This function will save a plot or set of plots as a PDF file in the outputDirectory of a given ArchRProject.
plotPDF( ..., name = "Plot", width = 6, height = 6, ArchRProj = NULL, addDOC = TRUE, useDingbats = FALSE, plotList = NULL )
... |
A vector of plots to be plotted (if the input is a list, use plotList instead). |
name |
The file name to be used for the output PDF file. |
width |
The width in inches to be used for the output PDF file. |
height |
The height in inches to be used for the output PDF. |
ArchRProj |
An |
addDOC |
A boolean variable that determines whether to add the date of creation to the end of the PDF file name. This is useful for preventing overwriting of old plots. |
useDingbats |
A boolean variable that determines whether to use dingbats characters for plotting points. |
plotList |
A |
This function plots side by side heatmaps of linked ATAC and Gene regions from addPeak2GeneLinks.
plotPeak2GeneHeatmap( ArchRProj = NULL, corCutOff = 0.45, FDRCutOff = 1e-04, varCutOffATAC = 0.25, varCutOffRNA = 0.25, k = 25, nPlot = 25000, limitsATAC = c(-2, 2), limitsRNA = c(-2, 2), groupBy = "Clusters", palGroup = NULL, palATAC = paletteContinuous("solarExtra"), palRNA = paletteContinuous("blueYellow"), verbose = TRUE, returnMatrices = FALSE, seed = 1, logFile = createLogFile("plotPeak2GeneHeatmap") )
ArchRProj |
An |
corCutOff |
A numeric describing the minimum numeric peak-to-gene correlation to return. |
FDRCutOff |
A numeric describing the maximum numeric peak-to-gene false discovery rate to return. |
varCutOffATAC |
A numeric describing the minimum variance quantile of the ATAC peak accessibility when selecting links. |
varCutOffRNA |
A numeric describing the minimum variance quantile of the RNA gene expression when selecting links. |
k |
An integer describing the number of k-means clusters to group peak-to-gene links prior to plotting heatmaps. |
nPlot |
An integer describing the maximum number of peak-to-gene links to plot in the heatmap. |
limitsATAC |
A numeric vector of two numbers that represent the lower and upper limits of the ATAC heatmap color scheme. |
limitsRNA |
A numeric vector of two numbers that represent the lower and upper limits of the RNA heatmap color scheme. |
groupBy |
The name of the column in |
palGroup |
A color palette describing the colors in |
palATAC |
A color palette describing the colors to be used for the ATAC heatmap. For example, paletteContinuous("solarExtra"). |
palRNA |
A color palette describing the colors to be used for the RNA heatmap. For example, paletteContinuous("blueYellow"). |
verbose |
A boolean value that determines whether standard output should be printed. |
returnMatrices |
A boolean value that determines whether the matrices should be returned, together with their k-means cluster IDs, instead of being plotted. |
seed |
A number to be used as the seed for random number generation. It is recommended to keep track of the seed used so that you can reproduce results downstream. |
logFile |
The path to a file to be used for logging ArchR output. |
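Because k-means starts from random centers, the `seed` argument matters for reproducing the row grouping. The base-R sketch below clusters a made-up matrix twice with the same seed and checks that the assignments agree. `cluster_links` is a hypothetical helper built on `stats::kmeans()`, and the matrix stands in for real peak-to-gene link profiles.

```r
# Made-up matrix standing in for link-by-group profiles (200 links, 6 groups).
set.seed(42)
links <- matrix(rnorm(200 * 6), nrow = 200)

# Hypothetical helper: seeded k-means clustering of rows.
cluster_links <- function(mat, k = 25, seed = 1) {
  set.seed(seed)  # fixes the random choice of starting centers
  stats::kmeans(mat, centers = k, iter.max = 50)$cluster
}

run1 <- cluster_links(links, k = 5, seed = 1)
run2 <- cluster_links(links, k = 5, seed = 1)
print(identical(run1, run2))
```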
This function will plot a trajectory that was created onto an embedding.
plotTrajectory( ArchRProj = NULL, embedding = "UMAP", trajectory = "Trajectory", colorBy = "colData", name = "Trajectory", log2Norm = NULL, imputeWeights = if (!grepl("coldata", tolower(colorBy[1]))) getImputeWeights(ArchRProj), pal = NULL, size = 0.2, rastr = TRUE, quantCut = c(0.01, 0.99), quantHex = 0.5, discreteSet = NULL, continuousSet = NULL, randomize = TRUE, keepAxis = FALSE, baseSize = 6, addArrow = TRUE, plotAs = NULL, smoothWindow = 5, logFile = createLogFile("plotTrajectory"), ... )
ArchRProj |
An |
embedding |
The name of the embedding to use to visualize the given |
trajectory |
The column name in |
colorBy |
A string indicating whether points in the plot should be colored by a column in |
name |
The name of the column in |
log2Norm |
A boolean value indicating whether a log2 transformation should be performed on the values from |
imputeWeights |
The weights to be used for imputing numerical values for each cell as a linear combination of other cells'
values. See |
pal |
The name of a custom palette from |
size |
A number indicating the size of the points to plot if |
rastr |
A boolean value that indicates whether the plot should be rasterized. This does not rasterize lines and labels, just the internal portions of the plot. |
quantCut |
If this is not |
quantHex |
The numeric xth quantile of all dots within each individual hexagon will determine the numerical value for
coloring to be displayed. This occurs when (i) |
discreteSet |
The name of a discrete palette from |
continuousSet |
The name of a continuous palette from |
randomize |
A boolean value that indicates whether to randomize points prior to plotting to prevent cells from one cluster being present at the front of the plot. |
keepAxis |
A boolean value that indicates whether the x and y axis ticks and labels should be plotted. |
baseSize |
The base font size to use in the plot. |
addArrow |
A boolean value that indicates whether to add a smoothed arrow in the embedding based on the aligned trajectory. |
plotAs |
A string that indicates whether points ("points") should be plotted or a hexplot ("hex") should be plotted. By default
if |
smoothWindow |
An integer value indicating the smoothing window used to create the inferred arrow overlay on the embedding. |
logFile |
The path to a file to be used for logging ArchR output. |
... |
Additional parameters to pass to |
This function will plot a heatmap of the results from getTrajectory
plotTrajectoryHeatmap( seTrajectory = NULL, varCutOff = 0.9, maxFeatures = 25000, scaleRows = TRUE, limits = c(-1.5, 1.5), grepExclude = NULL, pal = NULL, labelMarkers = NULL, labelTop = 50, labelRows = FALSE, rowOrder = NULL, useSeqnames = NULL, returnMatrix = FALSE, force = FALSE, logFile = createLogFile("plotTrajectoryHeatmap") )
seTrajectory |
A |
varCutOff |
The "Variance Quantile Cutoff" to be used for identifying the top variable features across the given trajectory. Only features with a variance above the provided quantile will be retained. |
maxFeatures |
The maximum number of features, ordered by variance, to consider from |
scaleRows |
A boolean value that indicates whether row-wise z-scores should be computed on the matrix provided by |
limits |
A numeric vector of two numbers that represent the lower and upper limits of the heatmap color scheme. |
grepExclude |
A character vector or string that indicates the |
pal |
A custom continuous palette (see |
labelMarkers |
A character vector listing the |
labelTop |
A number indicating how many of the top N features, based on variance, in |
labelRows |
A boolean value that indicates whether all rows should be labeled on the side of the heatmap. |
rowOrder |
To set the order of the rows to be plotted, provide the row indices here (integer, or character corresponding to rownames). |
useSeqnames |
A character vector that indicates which |
returnMatrix |
A boolean value that indicates whether the final heatmap matrix should be returned in lieu of plotting the actual heatmap. |
force |
A boolean value that indicates whether to continue when useSeqnames is longer than 1 and matrixClass is "Sparse.Assays.Matrix". This is not recommended because these matrices can be in different units. |
logFile |
The path to a file to be used for logging ArchR output. |
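The roles of varCutOff, scaleRows, and limits can be illustrated with a minimal base R sketch. This is not ArchR's implementation: the toy matrix, the variable names, and the exact filtering rule (variance strictly above the quantile) are assumptions made for illustration only.

```r
# Toy matrix: 6 features (rows) x 5 trajectory bins (columns); values are made up.
set.seed(1)
mat <- matrix(rnorm(30), nrow = 6,
              dimnames = list(paste0("f", 1:6), paste0("T", 1:5)))
mat["f1", ] <- c(-10, -5, 0, 5, 10)  # a deliberately high-variance feature

var_cut_off <- 0.5           # stands in for varCutOff
limits      <- c(-1.5, 1.5)  # stands in for limits

# 1. Keep only features whose variance is above the chosen quantile.
row_var <- apply(mat, 1, var)
keep    <- row_var > quantile(row_var, var_cut_off)
mat_top <- mat[keep, , drop = FALSE]

# 2. Row-wise z-scores (the scaleRows = TRUE case).
z <- (mat_top - rowMeans(mat_top)) / apply(mat_top, 1, sd)

# 3. Clip the z-scores to the heatmap color limits.
z_clipped <- pmin(pmax(z, limits[1]), limits[2])
```

With a higher var_cut_off, fewer rows survive step 1; the clipping in step 3 only affects how extreme values are colored, not which features are shown.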
This function will plot a TSS enrichment plot for each sample. Cells in ArchRProject are the only ones used when making this plot.
plotTSSEnrichment( ArchRProj = NULL, groupBy = "Sample", chromSizes = getChromSizes(ArchRProj), TSS = getTSS(ArchRProj), flank = 2000, norm = 100, smooth = 11, pal = NULL, returnDF = FALSE, threads = getArchRThreads(), logFile = createLogFile("plotTSSEnrichment") )
ArchRProj |
An |
groupBy |
The name of the column in |
chromSizes |
A GRanges object of the chromosome lengths. See |
TSS |
A |
flank |
An integer that specifies how far in bp (+/-) to extend the TSS for plotting. |
norm |
An integer that specifies the number of base pairs from the ends of the flanks to be used for normalization.
For example if |
smooth |
An integer that indicates the smoothing window (in basepairs) to be applied to the TSS plot. |
pal |
A color palette representing the groups from groupBy in the TSS plot. |
returnDF |
A boolean value that indicates whether to return a |
threads |
An integer specifying the number of threads to use for calculation. By default this uses the number of threads set by |
logFile |
The path to a file to be used for logging ArchR output. |
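How flank, norm, and smooth interact can be sketched in base R on a made-up insertion profile. This is an illustration under stated assumptions, not ArchR's code: the profile is synthetic, and the sketch assumes that normalization divides by the mean signal in the outermost norm base pairs on each side of the flank window and that smoothing is a centered rolling mean.

```r
flank  <- 2000  # documented default
norm   <- 100   # documented default
smooth <- 11    # documented default

pos <- seq(-flank, flank)  # positions relative to the TSS, in bp

# Made-up insertion profile: flat background of 2 plus a peak at the TSS.
profile <- 2 + 20 * exp(-(pos / 150)^2)

# Normalize by the mean signal in the outermost `norm` bp of each flank.
is_edge    <- abs(pos) > (flank - norm)
background <- mean(profile[is_edge])
enrichment <- profile / background

# Centered rolling mean over `smooth` positions
# (NA at both ends, where the window is incomplete).
smoothed <- as.numeric(stats::filter(enrichment, rep(1 / smooth, smooth), sides = 2))

peak_enrichment <- max(smoothed, na.rm = TRUE)
```

After normalization the flank ends sit near 1, so the height of the smoothed curve at the TSS reads directly as fold enrichment over background.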
This function will project bulk ATAC-seq data into the single-cell subspace.
projectBulkATAC( ArchRProj = NULL, seATAC = NULL, reducedDims = "IterativeLSI", embedding = "UMAP", n = 250, verbose = TRUE, threads = getArchRThreads(), force = FALSE, logFile = createLogFile("projectBulkATAC") )
ArchRProj |
An |
seATAC |
A |
reducedDims |
A string specifying the name of the |
embedding |
A string specifying the name of the |
n |
An integer specifying the number of subsampled "pseudo single cells" per bulk sample. |
verbose |
A boolean value indicating whether to use verbose output during execution of this function. Can be set to FALSE for a cleaner output. |
threads |
The number of threads to use for parallel execution. |
force |
A boolean value indicating whether to force the projection of bulk ATAC data even if fewer than 25% of the features are present in the bulk ATAC data set. |
logFile |
The path to a file to be used for logging ArchR output. |
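The idea of drawing n subsampled "pseudo single cells" from one bulk sample can be sketched in base R. This is only an illustration: the multinomial sampling scheme, the per-cell depth, and the feature names below are assumptions and are not taken from ArchR's implementation.

```r
set.seed(42)

n_pseudo <- 250  # stands in for n (documented default)
depth    <- 500  # hypothetical number of counts per pseudo cell

# Made-up bulk counts over four features for a single bulk sample.
bulk_counts <- c(peak1 = 900, peak2 = 50, peak3 = 0, peak4 = 50)

# Each column is one pseudo cell: `depth` counts drawn with probabilities
# proportional to the bulk counts.
pseudo <- rmultinom(n_pseudo, size = depth, prob = bulk_counts)
rownames(pseudo) <- names(bulk_counts)
colnames(pseudo) <- paste0("pseudo_", seq_len(n_pseudo))
```

Each pseudo cell is a sparse, noisy view of the same bulk profile, which is what lets bulk data be placed alongside real single cells in a reduced-dimension space.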
This function will recover an ArchRProject whose sampleColData or cellColData is broken due to differing versions of the Bioconductor S4Vectors package.
recoverArchRProject(ArchRProj)
ArchRProj |
An |
This function helps reformat fragment files so that they can be read by createArrowFiles. It handles anomalies that cause errors when reading tabix-indexed, bgzip-compressed fragment files.
reformatFragmentFiles( fragmentFiles = NULL, checkChrPrefix = getArchRChrPrefix() )
fragmentFiles |
A character vector of paths to the fragment files to be reformatted. |
checkChrPrefix |
A boolean value that determines whether seqnames should be checked to contain "chr". If set to |
This function will organize arrows and project output into a directory and save the ArchRProject for later usage.
saveArchRProject( ArchRProj = NULL, outputDirectory = getOutputDirectory(ArchRProj), overwrite = TRUE, load = TRUE, dropCells = FALSE, logFile = createLogFile("saveArchRProject"), threads = getArchRThreads() )
ArchRProj |
An |
outputDirectory |
A directory path to save all ArchR output and |
overwrite |
A boolean value indicating whether to overwrite existing files when writing to outputDirectory. |
dropCells |
A boolean indicating whether to drop cells that are not in |
logFile |
The path to a file to be used for logging ArchR output. |
threads |
The number of threads to use for parallel execution. |
This function will subset an ArchRProject by cells, save the output to a new directory, and re-load the subsetted ArchRProject.
subsetArchRProject( ArchRProj = NULL, cells = getCellNames(ArchRProj), outputDirectory = "ArchRSubset", dropCells = TRUE, logFile = NULL, threads = getArchRThreads(), force = FALSE )
ArchRProj |
An |
cells |
A vector of cells to subset |
outputDirectory |
A directory path to save all ArchR output and the subsetted |
dropCells |
A boolean indicating whether to drop cells that are not in |
logFile |
The path to a file to be used for logging ArchR output. |
threads |
The number of threads to use for parallel execution. |
force |
A boolean value indicating whether to overwrite the output directory if it already exists. |
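A typical call might look like the sketch below. The wrapper subset_by_sample is a hypothetical helper, not part of ArchR, and it assumes that the project's cellColData has a "Sample" column readable through the $ accessor; only arguments listed in the usage above are passed.

```r
# Hypothetical helper (not part of ArchR): keep the cells that belong to the
# chosen samples and write them to a new project directory.
subset_by_sample <- function(proj, samples, out_dir = "ArchRSubset") {
  # Assumes cellColData has a "Sample" column, read here via the $ accessor.
  keep <- ArchR::getCellNames(proj)[proj$Sample %in% samples]
  ArchR::subsetArchRProject(
    ArchRProj       = proj,
    cells           = keep,
    outputDirectory = out_dir,
    dropCells       = TRUE,
    force           = TRUE  # overwrite out_dir if it already exists
  )
}
```

Writing the subset to its own outputDirectory keeps the original project untouched; set force = FALSE if an existing directory should not be overwritten.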
This function returns an ArchRProject object that contains a specified subset of cells.
subsetCells(ArchRProj = NULL, cellNames = NULL)
ArchRProj |
An |
cellNames |
A character vector of |
This function returns a ggplot2 theme with a black border and black font.
theme_ArchR( color = "black", textFamily = "sans", baseSize = 10, baseLineSize = 0.5, baseRectSize = 0.5, plotMarginCm = 1, legendPosition = "bottom", legendTextSize = 5, axisTickCm = 0.1, xText90 = FALSE, yText90 = FALSE )
color |
The color to be used for text, lines, ticks, etc for the plot. |
textFamily |
The default font family to be used for the plot. |
baseSize |
The base font size (in points) to use in the plot. |
baseLineSize |
The base line width (in points) to be used throughout the plot. |
baseRectSize |
The base line width (in points) to use for rectangular boxes throughout the plot. |
plotMarginCm |
The width in centimeters of the whitespace margin around the plot. |
legendPosition |
The location to put the legend. Valid options are "bottom", "top", "left", and "right". |
legendTextSize |
The base text size (in points) for the legend text. |
axisTickCm |
The length in centimeters to be used for the axis ticks. |
xText90 |
A boolean value indicating whether the x-axis text should be rotated 90 degrees counterclockwise. |
yText90 |
A boolean value indicating whether the y-axis text should be rotated 90 degrees counterclockwise. |
This function will attempt to get or validate an input as a BSgenome.
validBSgenome(genome = NULL, masked = FALSE)
genome |
This option must be one of the following: (i) the name of a valid ArchR-supported genome ("hg38", "hg19", or "mm10"),
(ii) the name of a |
masked |
A boolean describing whether or not to access the masked version of the selected genome. See |