Title: | Analyzing chromatin accessibility data in R |
---|---|
Description: | Tools for analyzing chromatin accessibility data in R. Bulk and single-cell ATAC-seq data are supported. |
Authors: | Fabian Mueller [aut, cre] |
Maintainer: | Fabian Mueller <[email protected]> |
License: | GPL-2 |
Version: | 0.9.21 |
Built: | 2024-10-26 05:49:25 UTC |
Source: | https://github.com/GreenleafLab/ChrAccR |
NOTE: '[' operator for DsATAC does not reorder samples or deal with index multiplicity
## S4 method for signature 'DsATAC,ANY,ANY,ANY' x[i]
## S4 method for signature 'DsATAC,ANY,ANY,ANY' x[i]
x |
DsATAC object |
i |
sample names or indices |
add a sample annotation column to the sample annotation table
## S4 method for signature 'DsAcc' addSampleAnnotCol(.object, name, vals)
## S4 method for signature 'DsAcc' addSampleAnnotCol(.object, name, vals)
.object |
|
name |
a name for the new column |
vals |
vector of values |
a new DsAcc
object with added sample annotation
Fabian Mueller
Agregate counts across a set of regions, e.g. for footprinting analysis
## S4 method for signature 'DsATAC' aggregateRegionCounts( .object, regionGr, samples = getSamples(.object), countAggrFun = "sum", norm = "tailMean", normTailW = 0.1, kmerBiasAdj = TRUE, k = 6, sampleCovg = NULL, sampleKmerFreqM = NULL, regionKmerFreqM = NULL, silent = FALSE )
## S4 method for signature 'DsATAC' aggregateRegionCounts( .object, regionGr, samples = getSamples(.object), countAggrFun = "sum", norm = "tailMean", normTailW = 0.1, kmerBiasAdj = TRUE, k = 6, sampleCovg = NULL, sampleKmerFreqM = NULL, regionKmerFreqM = NULL, silent = FALSE )
.object |
|
regionGr |
|
samples |
sample identifiers |
countAggrFun |
aggration function to be used for summarizing the insertion counts at each position. Possible values include |
norm |
method used for normalizing the resulting position-wise counts.
Currently only |
normTailW |
fraction of the region window to be used on each side of the window to be used for normalization if |
kmerBiasAdj |
compute Tn5 bias and use it to adjust the counts as in Corces, et al., Science, (2018) |
k |
length of the kmer to be used for sequence bias correction. Only relevant if |
sampleCovg |
to save compute time, a sample coverage track list (as computed by |
sampleKmerFreqM |
to save compute time, a matrix of sample kmer frequency at insertion sites (as computed by |
regionKmerFreqM |
to save compute time, a matrix of region kmer frequencies (kmers X window width).
Must have the same number of rows as the specified (or computed) |
silent |
limit log messages |
a data.frame
containing position-wise counts (raw, normalized and optionally Tn5-bias-corrected) for each sample
Fabian Mueller
Performs peak calling based on insertion sites
## S4 method for signature 'DsATAC' callPeaks( .object, samples = getSamples(.object), method = "macs2_summit_fw_no", methodOpts = list(macs2.exec = "macs2", macs2.params = c("--shift", "-75", "--extsize", "150", "-p", "0.01"), fixedWidth = 250, genomeSizesFromObject = FALSE) )
## S4 method for signature 'DsATAC' callPeaks( .object, samples = getSamples(.object), method = "macs2_summit_fw_no", methodOpts = list(macs2.exec = "macs2", macs2.params = c("--shift", "-75", "--extsize", "150", "-p", "0.01"), fixedWidth = 250, genomeSizesFromObject = FALSE) )
.object |
|
samples |
sample identifiers for which peak calling is performed |
method |
peak calling method. Currently only |
methodOpts |
list of other options depending on the |
The following methods are currently supported
'macs2_summit_fw_no'
Fixed-width, non-overlapping peaks based on MACS2 summit calls:
1. Call peaks using system call to MACS2. You can specify the MACS2 executable in methodOpts$macs2.exec
.
2. Identify peak summits
3. extend peak summits on each side by a number of basepairs (specified in methodOpts$fixedWidth
; default: 250bp) to obtain unified peak widths
4. Find non-overlapping peaks by taking the peak with the best MACS2 score from each set of partially overlapping peaks
GRangesList
of peak coordinates for each sample
Fabian Mueller
Tools for analyzing chromatin accessibility data in R. Currently supports ATAC-seq and NOMe-seq data analysis.
clean the system mory by invoking the garbage collector
cleanMem(iter.gc = 1L)
cleanMem(iter.gc = 1L)
iter.gc |
number of times to invoke the garbage collector |
nothing of particular interest
Fabian Mueller
Collapse TF motif matrix of arbitrary values by aggregating values over motif cluster assignment
collapseMotifMatrix( X, motifClust = NULL, assembly = "hg38", motifs = "jaspar", aggrFun = mean )
collapseMotifMatrix( X, motifClust = NULL, assembly = "hg38", motifs = "jaspar", aggrFun = mean )
X |
matrix to be collapsed. Must have the motif names as rownames. E.g. matrix obtained by |
motifClust |
optional: motif clustering computed by |
assembly |
genome assembly for which the motif clustering should be retrieved. Only required if for automatic mode (i.e. |
motifs |
a character string specifying the motif set (currently only "jaspar" is supported) |
aggrFun |
function to use to aggregate values |
list containing two elements: X
: Collapsed matrix containing motif cluster aggregated values;
clustering
: clustering result used for aggregation (seegetMotifClustering
for details)
Fabian Mueller
Performs z-score normalization on the columns of a matrix. (Basically a wrapper around matrixStats
)
colZscores(X, na.rm = FALSE)
colZscores(X, na.rm = FALSE)
X |
input matrix |
na.rm |
should NAs be omitted? |
z-score normalized matrix
Fabian Mueller
computes differential accessibility for NOMe datasets using RnBeads
functionality
computeDiffAcc.rnb.nome( dsn, cmpCols, regionTypes = getRegionTypes(dsn), covgThres = 5L, allPairs = TRUE, adjPairCols = NULL, adjCols = NULL, skipSites = FALSE, disk.dump = rnb.getOption("disk.dump.big.matrices"), disk.dump.dir = tempfile(pattern = "diffMethTables_"), ... )
computeDiffAcc.rnb.nome( dsn, cmpCols, regionTypes = getRegionTypes(dsn), covgThres = 5L, allPairs = TRUE, adjPairCols = NULL, adjCols = NULL, skipSites = FALSE, disk.dump = rnb.getOption("disk.dump.big.matrices"), disk.dump.dir = tempfile(pattern = "diffMethTables_"), ... )
dsn |
|
cmpCols |
column names of the sample annotation of the dataset that will be used for comparison |
regionTypes |
which region types should be processed for differential analysis. |
covgThres |
coverage threshold for computing the summary statistics. See |
allPairs |
Logical indicating whether all pairwise comparisons should be conducted, when more than 2 groups are present |
adjPairCols |
argument passed on to |
adjCols |
not used yet |
skipSites |
flag indicating whether differential methylation in regions should be computed directly and not from sites. This leads to skipping of site-specific differential methylation |
disk.dump |
Flag indicating whether the resulting differential methylation object should be file backed, ie.e the matrices dumped to disk |
disk.dump.dir |
disk location for file backing of the resulting differential methylation object. Only meaningful if |
... |
arguments passed on to binary differential methylation calling. See |
an RnBDiffMeth
object. See class description for details.
Fabian Mueller
Fabian Mueller
Create a report summarizing differential accessibility analysis
## S4 method for signature 'DsATAC' createReport_differential(.object, reportDir)
## S4 method for signature 'DsATAC' createReport_differential(.object, reportDir)
.object |
|
reportDir |
directory in which the report will be created |
chromVarObj |
[optional] pre-computed result of a call to |
(invisible) muReportR::Report
object containing the report
Fabian Mueller
## Not run: dsa <- ChrAccRex::loadExample("dsAtac_ia_example") reportDir <- file.path(".", "ChrAccR_reports") setConfigElement("regionTypes", setdiff(getRegionTypes(dsa), c("promoters_gc_protein_coding", "t10k"))) setConfigElement("differentialColumns", c("stimulus", "cellType")) # adjust for the donor annotation in the differential test setConfigElement("differentialAdjColumns", c("donor")) # create the report createReport_differential(dsa, reportDir) ## End(Not run)
## Not run: dsa <- ChrAccRex::loadExample("dsAtac_ia_example") reportDir <- file.path(".", "ChrAccR_reports") setConfigElement("regionTypes", setdiff(getRegionTypes(dsa), c("promoters_gc_protein_coding", "t10k"))) setConfigElement("differentialColumns", c("stimulus", "cellType")) # adjust for the donor annotation in the differential test setConfigElement("differentialAdjColumns", c("donor")) # create the report createReport_differential(dsa, reportDir) ## End(Not run)
Create a report summarizing exploratory analyses of an accessibility dataset
## S4 method for signature 'DsATAC' createReport_exploratory( .object, reportDir, chromVarObj = NULL, itLsiObj = NULL, geneActSe = NULL )
## S4 method for signature 'DsATAC' createReport_exploratory( .object, reportDir, chromVarObj = NULL, itLsiObj = NULL, geneActSe = NULL )
.object |
|
reportDir |
directory in which the report will be created |
chromVarObj |
[optional] pre-computed result of a call to |
itLsiObj |
[for single-cell only; optional] pre-computed result of a call to |
geneActSe |
[for single-cell only; optional] pre-computed result of a call to |
(invisible) muReportR::Report
object containing the report
Fabian Mueller
## Not run: dsa <- ChrAccRex::loadExample("dsAtac_ia_example") dsa_qnorm <- transformCounts(dsa, method="quantile") setConfigElement("annotationColumns", c("cellType", "donor", "stimulus")) setConfigElement("regionTypes", setdiff(getRegionTypes(dsa), c("promoters_gc_protein_coding", "t10k"))) reportDir <- file.path(".", "ChrAccR_reports") createReport_exploratory(dsa_qnorm, reportDir) ## End(Not run)
## Not run: dsa <- ChrAccRex::loadExample("dsAtac_ia_example") dsa_qnorm <- transformCounts(dsa, method="quantile") setConfigElement("annotationColumns", c("cellType", "donor", "stimulus")) setConfigElement("regionTypes", setdiff(getRegionTypes(dsa), c("promoters_gc_protein_coding", "t10k"))) reportDir <- file.path(".", "ChrAccR_reports") createReport_exploratory(dsa_qnorm, reportDir) ## End(Not run)
Create a report summarizing steps and statistics
## S4 method for signature 'DsATAC' createReport_filtering(.object, reportDir, unfilteredObj, filterStats = NULL)
## S4 method for signature 'DsATAC' createReport_filtering(.object, reportDir, unfilteredObj, filterStats = NULL)
.object |
filtered |
reportDir |
directory in which the report will be created |
unfilteredObj |
unfiltered |
filterStats |
filtering statistics as output by |
(invisible) muReportR::Report
object containing the report
Fabian Mueller
Create a report summarizing normalization
## S4 method for signature 'DsATAC' createReport_normalization(.object, reportDir, unnormObj)
## S4 method for signature 'DsATAC' createReport_normalization(.object, reportDir, unnormObj)
.object |
normalized |
reportDir |
directory in which the report will be created |
unnormObj |
unnormalized |
(invisible) muReportR::Report
object containing the report
Fabian Mueller
Create a report summarizing an accessibility dataset
## S4 method for signature 'DsATAC' createReport_summary(.object, reportDir)
## S4 method for signature 'DsATAC' createReport_summary(.object, reportDir)
.object |
|
reportDir |
directory in which the report will be created |
(invisible) muReportR::Report
object containing the report
Fabian Mueller
## Not run: dsa <- ChrAccRex::loadExample("dsAtac_ia_example") reportDir <- file.path(".", "ChrAccR_reports") createReport_summary(dsa, reportDir) ## End(Not run)
## Not run: dsa <- ChrAccRex::loadExample("dsAtac_ia_example") reportDir <- file.path(".", "ChrAccR_reports") createReport_summary(dsa, reportDir) ## End(Not run)
Retrieve dimension reduction embedding and object using UMAP
## S4 method for signature 'DsATACsc' dimRed_UMAP( .object, regions, tfidf = TRUE, pcs = 1:50, normPcs = FALSE, umapParams = list(distMethod = "euclidean", min_dist = 0.5, n_neighbors = 25), rmDepthCor = 1 )
## S4 method for signature 'DsATACsc' dimRed_UMAP( .object, regions, tfidf = TRUE, pcs = 1:50, normPcs = FALSE, umapParams = list(distMethod = "euclidean", min_dist = 0.5, n_neighbors = 25), rmDepthCor = 1 )
.object |
|
regions |
character string specifying the region type to retrieve the UMAP coordinates from.
Alternatively, a |
tfidf |
normalize the counts using TF-IDF transformation |
pcs |
components to use to compute the SVD |
normPcs |
flag indicating whether to apply z-score normalization to PCs for each cell |
umapParams |
parameters to compute UMAP coordinates (passed on to
|
rmDepthCor |
correlation cutoff to be used to discard principal components associated with fragment depth (all iterationa). By default (value >=1) no filtering will be applied. |
The output object includes the final singular values/principal components (result$pcaCoord
), the low-dimensional coordinates (result$umapCoord
) as well as region set that provided the basis for the dimension reduction (result$regionGr
).
an S3
object containing dimensionality reduction results
Fabian Mueller
A class for accessibility datasets
coord
List of coordinates (GRanges objects) for accessibility summarized regions.
sampleAnnot
Sample annotation Table
genome
Genome assembly
diskDump
Flag indicating whether large matrices and objects will be kept on disk rather than in main memory.
pkgVersion
Version number of the ChrAccR package that created the object
Fabian Mueller
A class for storing ATAC-seq accessibility data
inherits from DsAcc
fragments
GRanges
object storing sequencing fragments. Alternativily pointers to files in which this data isf stored
as R data object
counts
List of count matrices for each summarized region type (dimension: regions X samples).
Depending on the settings for the slots diskDump
and sparseCounts
, the matrices are
either (a) regular matrices, (b) HDF5Array
/DelayedArray
or (c) sparse matrices.
countTransform
list of character vectors specifying which transformations have been applied to the count matrices
sparseCounts
Flag indicating whether count data will be stored as sparse matrices rather than regular matrices
diskDump.fragments
Flag indicating whether fragment data will be kept on disk rather than in main memory.
Fabian Mueller
Create a DsATAC dataset from multiple input bam files
DsATAC.bam( sampleAnnot, bamFiles, genome, regionSets = NULL, sampleIdCol = NULL, diskDump = FALSE, keepInsertionInfo = TRUE, pairedEnd = TRUE )
DsATAC.bam( sampleAnnot, bamFiles, genome, regionSets = NULL, sampleIdCol = NULL, diskDump = FALSE, keepInsertionInfo = TRUE, pairedEnd = TRUE )
sampleAnnot |
data.frame specifying the sample annotation table |
bamFiles |
either a character vector of the same length as sampleAnnot has rows, specifying the file paths of the bam files for each
sample or a single character string specifying the column name in |
genome |
genome assembly |
regionSets |
a list of GRanges objects which contain region sets over which count data will be aggregated |
sampleIdCol |
column name in the sample annotation table containing unique sample identifiers. If |
diskDump |
should large data objects (count matrices, fragment data, ...) be disk-backed to save main memory |
keepInsertionInfo |
flag indicating whether to maintain the insertion information in the resulting object. Only relevant when |
pairedEnd |
is the input data paired-end? Only relevant when |
DsATAC
object
Fabian Mueller
## Not run: # download and unzip the dataset datasetUrl <- "https://s3.amazonaws.com/muellerf/data/ChrAccR/data/tutorial/tcells.zip" downFn <- "tcells.zip" download.file(datasetUrl, downFn) unzip(downFn, exdir=".") # prepare the sample annotation table sampleAnnotFn <- file.path("tcells", "samples.tsv") bamDir <- file.path("tcells", "bam") sampleAnnot <- read.table(sampleAnnotFn, sep="\t", header=TRUE, stringsAsFactors=FALSE) # add a column that ChrAccR can use to find the correct bam file for each sample sampleAnnot[,"bamFilenameFull"] <- file.path(bamDir, sampleAnnot[,"bamFilename"]) # prepare the dataset dsa_fromBam <- DsATAC.bam(sampleAnnot, "bamFilenameFull", "hg38", regionSets=NULL, sampleIdCol="sampleId") ## End(Not run)
## Not run: # download and unzip the dataset datasetUrl <- "https://s3.amazonaws.com/muellerf/data/ChrAccR/data/tutorial/tcells.zip" downFn <- "tcells.zip" download.file(datasetUrl, downFn) unzip(downFn, exdir=".") # prepare the sample annotation table sampleAnnotFn <- file.path("tcells", "samples.tsv") bamDir <- file.path("tcells", "bam") sampleAnnot <- read.table(sampleAnnotFn, sep="\t", header=TRUE, stringsAsFactors=FALSE) # add a column that ChrAccR can use to find the correct bam file for each sample sampleAnnot[,"bamFilenameFull"] <- file.path(bamDir, sampleAnnot[,"bamFilename"]) # prepare the dataset dsa_fromBam <- DsATAC.bam(sampleAnnot, "bamFilenameFull", "hg38", regionSets=NULL, sampleIdCol="sampleId") ## End(Not run)
Create a DsATAC dataset from multiple input files output by 10x cellranger
DsATAC.cellranger( sampleAnnot, sampleDirPrefixCol, genome, dataDir = "", regionSets = NULL, addPeakRegions = TRUE, sampleIdCol = sampleDirPrefixCol, keepInsertionInfo = FALSE, diskDump.fragments = keepInsertionInfo )
DsATAC.cellranger( sampleAnnot, sampleDirPrefixCol, genome, dataDir = "", regionSets = NULL, addPeakRegions = TRUE, sampleIdCol = sampleDirPrefixCol, keepInsertionInfo = FALSE, diskDump.fragments = keepInsertionInfo )
sampleAnnot |
data.frame specifying the sample annotation table |
sampleDirPrefixCol |
column name specifying the directory prefix for each sample in the sample annotation table |
genome |
genome assembly |
dataDir |
directory where the files are located |
regionSets |
a list of GRanges objects which contain region sets over which count data will be aggregated |
addPeakRegions |
should a merged set of peaks be created as one of the region sets (merged, non-overlapping peaks of width=500bp from the peaks of individual samples) |
sampleIdCol |
column name or index in the sample annotation table containing unique sample identifiers |
keepInsertionInfo |
flag indicating whether to maintain the insertion information in the resulting object. |
diskDump.fragments |
Keep fragment coordinates stored on disk rather than in main memory. This saves memory, but increases runtime and I/O. |
DsATAC
object
Fabian Mueller
Create a DsATAC dataset from multiple input fragment bed files
DsATAC.fragmentBed( sampleAnnot, bedFiles, genome, regionSets = NULL, sampleIdCol = NULL, diskDump = FALSE, keepInsertionInfo = TRUE )
DsATAC.fragmentBed( sampleAnnot, bedFiles, genome, regionSets = NULL, sampleIdCol = NULL, diskDump = FALSE, keepInsertionInfo = TRUE )
sampleAnnot |
data.frame specifying the sample annotation table |
bedFiles |
either a character vector of the same length as sampleAnnot has rows, specifying the file paths of the bed files for each
sample or a single character string specifying the column name in |
genome |
genome assembly |
regionSets |
a list of GRanges objects which contain region sets over which count data will be aggregated |
sampleIdCol |
column name in the sample annotation table containing unique sample identifiers. If |
diskDump |
should large data objects (count matrices, fragment data, ...) be disk-backed to save main memory |
keepInsertionInfo |
flag indicating whether to maintain the insertion information in the resulting object. Only relevant when |
DsATAC
object
Fabian Mueller
Create a DsATAC dataset from multiple input files output by snakeATAC
DsATAC.snakeATAC( sampleAnnot, filePrefixCol, genome, dataDir = "", regionSets = NULL, sampleIdCol = filePrefixCol, type = "insBam", diskDump = FALSE, keepInsertionInfo = TRUE, bySample = FALSE, pairedEnd = TRUE )
DsATAC.snakeATAC( sampleAnnot, filePrefixCol, genome, dataDir = "", regionSets = NULL, sampleIdCol = filePrefixCol, type = "insBam", diskDump = FALSE, keepInsertionInfo = TRUE, bySample = FALSE, pairedEnd = TRUE )
sampleAnnot |
data.frame specifying the sample annotation table |
filePrefixCol |
column name specifying the file prefix for each sample in the sample annotation table. If |
genome |
genome assembly |
dataDir |
directory where the files are located. If it is the empty character ( |
regionSets |
a list of GRanges objects which contain region sets over which count data will be aggregated |
sampleIdCol |
column name or index in the sample annotation table containing unique sample identifiers |
type |
input data type. Currently only "insBed" (insertion beds), "insBam" (insertion info inferred from bam files (aligned reads); default) and "bam" (aligned reads) are supported |
diskDump |
should large data objects (count matrices, fragment data, ...) be disk-backed to save main memory |
keepInsertionInfo |
flag indicating whether to maintain the insertion information in the resulting object. Only relevant when |
bySample |
process sample-by-sample to save memory (currently only has an effect for |
pairedEnd |
is the input data paired-end? Only relevant when |
DsATAC
object
Fabian Mueller
Create a DsATACsc dataset from an ArchR
project
DsATACsc.archr( ap, useTiling = TRUE, keepInsertionInfo = FALSE, diskDump.fragments = keepInsertionInfo )
DsATACsc.archr( ap, useTiling = TRUE, keepInsertionInfo = FALSE, diskDump.fragments = keepInsertionInfo )
ap |
|
useTiling |
flag indicating whether to use tiling information from the ArchR project |
keepInsertionInfo |
flag indicating whether to maintain the insertion information in the resulting object. |
diskDump.fragments |
Keep fragment coordinates stored on disk rather than in main memory. This saves memory, but increases runtime and I/O. |
DsATACsc
object
Fabian Mueller
Create a DsATACsc dataset from multiple input fragment files
DsATACsc.fragments( sampleAnnot, fragmentFiles, genome, regionSets = NULL, sampleIdCol = NULL, minFragsPerBarcode = 500L, maxFragsPerBarcode = Inf, cellAnnot = NULL, keepInsertionInfo = FALSE, diskDump.fragments = keepInsertionInfo, cellQcStats = TRUE )
DsATACsc.fragments( sampleAnnot, fragmentFiles, genome, regionSets = NULL, sampleIdCol = NULL, minFragsPerBarcode = 500L, maxFragsPerBarcode = Inf, cellAnnot = NULL, keepInsertionInfo = FALSE, diskDump.fragments = keepInsertionInfo, cellQcStats = TRUE )
sampleAnnot |
data.frame specifying the sample annotation table |
fragmentFiles |
vector of fragment files or the column name in the sample annotation table containing thse file names. fragment files must be tab-separated with columns "chrom", "chromStart", "chromEnd", "barcode" and "duplicateCount" and must not contain a header line |
genome |
genome assembly |
regionSets |
a list of GRanges objects which contain region sets over which count data will be aggregated |
sampleIdCol |
column name or index in the sample annotation table containing unique sample identifiers |
minFragsPerBarcode |
minimum number of fragments required for a barcode to be kept. [Only relevant if |
maxFragsPerBarcode |
maximum number of fragments per barcode. Only barcodes with fewer fragments will be kept. [Only relevant if |
cellAnnot |
(optional) annotation table of all cells in the dataset. Must contain a |
keepInsertionInfo |
flag indicating whether to maintain the insertion information in the resulting object. |
diskDump.fragments |
Keep fragment coordinates stored on disk rather than in main memory. This saves memory, but increases runtime and I/O. |
cellQcStats |
flag indicating whether to compute additional cell QC statistics (TSS enrichment, etc.). |
DsATACsc
object
Fabian Mueller
A class for storing NOMe accessibility data
meth
List of GC methylation for sites and summarized regions.
covg
List of GC read coverage for sites and summarized regions.
Fabian Mueller
Create a DsNOMe dataset from multiple input files in bisSNP output format
DsNOMe.bisSNP(inputFns, sampleAnnot, genome, sampleIds = rownames(sampleAnnot))
DsNOMe.bisSNP(inputFns, sampleAnnot, genome, sampleIds = rownames(sampleAnnot))
inputFns |
a NAMED vector of input file names |
sampleAnnot |
data.frame specifying the sample annotation table |
genome |
genome assembly |
sampleIds |
character vector of sample names |
DsNOMe
object
Fabian Mueller
export count data as genome tracks (e.g. for visualization in the browser)
## S4 method for signature 'DsATAC' exportCountTracks( .object, type, outDir, formats = c("bed", "igv"), groupBy = NULL )
## S4 method for signature 'DsATAC' exportCountTracks( .object, type, outDir, formats = c("bed", "igv"), groupBy = NULL )
.object |
|
type |
character string specifying the region type |
outDir |
output directory. Must be existing. |
formats |
browser format. Currently only bed and "igv" are supported |
groupBy |
a column in the sample annotation table to group by (the mean will be computed) |
nothing of particular interest
Fabian Mueller
scan a directory containing fastq files and create a sample annotation table from the file names
fastqDirToTable(fqDir, tabFn = NULL, pat = "")
fastqDirToTable(fqDir, tabFn = NULL, pat = "")
fqDir |
string specifying a directory with fastq files |
tabFn |
filename specifying the where the table should be written to. If NULL (default), the table will just be returned as data frame |
pat |
(optional) regular expression that fastq file names have to pass |
data frame of parsed annotation
Fabian Mueller
Filter out regions based on a GRanges object
## S4 method for signature 'DsATAC' filterByGRanges(.object, gr, method = "black")
## S4 method for signature 'DsATAC' filterByGRanges(.object, gr, method = "black")
.object |
|
gr |
|
method |
character string specifying the method. Can be |
a new, filtered DsATAC
object
Fabian Mueller
Filter out cells with low TSS enrichment
## S4 method for signature 'DsATACsc' filterCellsTssEnrichment(.object, cutoff = 6)
## S4 method for signature 'DsATACsc' filterCellsTssEnrichment(.object, cutoff = 6)
.object |
|
cutoff |
TSS enrichment cutoff to filter cells |
modified DsATAC
object without filtered cells
Fabian Mueller
Filter out regions based on chromosome list
## S4 method for signature 'DsATAC' filterChroms(.object, exclChrom = c("chrX", "chrY", "chrM"))
## S4 method for signature 'DsATAC' filterChroms(.object, exclChrom = c("chrX", "chrY", "chrM"))
.object |
|
exclChrom |
vector of chromosome names to filter out |
a new DsATAC
object filtered for chromosomes
Fabian Mueller
Filter regions with low read counts
## S4 method for signature 'DsATAC' filterLowCovg( .object, thresh = 1L, reqSamples = 0.75, regionTypes = getRegionTypes(.object) )
## S4 method for signature 'DsATAC' filterLowCovg( .object, thresh = 1L, reqSamples = 0.75, regionTypes = getRegionTypes(.object) )
.object |
|
thresh |
regions with read counts below this threshold will be considered lowly covered regions (default: regions with fewer than 1 read will be discarded) |
reqSamples |
the percentile of samples required to meet or exceed the threshold in order for a region to be retained. must be in the interval [0, 1) (default: 0.75 = 75 percent) |
regionTypes |
character vector specifying the names of the region types to which filtering should be applied (default: all region types) |
a new DsATAC
object with low coverage regions removed
Fabian Mueller
get gene annotation for a GRanges
object by linking to the nearest gene
findNearestGeneForGr(gr, geneGr = NULL, maxDist = 1e+05)
findNearestGeneForGr(gr, geneGr = NULL, maxDist = 1e+05)
gr |
|
geneGr |
gene annotation from which to pull the annotation. Can be |
maxDist |
maximum distance for matching to nearest gene |
data.frame
containing information on the nearest gene for each element in gr
find the first occurrence of a name in a vector of strings
findOrderedNames(x, orderedNames, exact = TRUE, ignore.case = FALSE)
findOrderedNames(x, orderedNames, exact = TRUE, ignore.case = FALSE)
x |
character vector in which the name should be found |
orderedNames |
vector of names that will be queried. This method will go through them one by one and find the first occurrence in the order of the orderedNames provided |
exact |
should only be exact matches be reported |
ignore.case |
should casing be ignored |
the string that matches the first occurrence in the order of orderedNames
. Returns NA
if no match is found.
Fabian Mueller
Given a GAlignmentPairs
or GAlignments
object, return a GRanges
object containing the fragment (or insertion site for single-end data)
getATACfragments(ga, offsetTn = TRUE)
getATACfragments(ga, offsetTn = TRUE)
ga |
|
offsetTn |
apply offsets for Tn5 dimer cut site (+4 bp on genomic + strand; -4 bp on genomic - strand) |
GRanges
object containing derived insertions. For paired-end data (recommended), the width of the resulting ranges corresponds to the insert size
for single-end data, the width is set to 1bp
Fabian Mueller
retrieve the corresponding ChrAccRAnnotation package for a given genome
getChrAccRAnnotationPackage(genome)
getChrAccRAnnotationPackage(genome)
genome |
character string specifying the genome |
name of the annotation package, if installed. NULL
and a warning if the package is not installed
Fabian Mueller
Compute chromVar deviations
## S4 method for signature 'DsATAC' getChromVarDev(.object, type, motifs = "jaspar")
## S4 method for signature 'DsATAC' getChromVarDev(.object, type, motifs = "jaspar")
.object |
|
type |
character string specifying the region type |
motifs |
either a character string (currently only "jaspar" and sets contained in |
Deviations object as returned by chromVAR::computeDeviations
Fabian Mueller
Obtain Cicero gene activities
## S4 method for signature 'DsATAC' getCiceroGeneActivities( .object, regionType, promoterGr = NULL, maxDist = 250000L, corCutOff = 0.35, dimRedCoord = NULL, knn.k = 50 )
## S4 method for signature 'DsATAC' getCiceroGeneActivities( .object, regionType, promoterGr = NULL, maxDist = 250000L, corCutOff = 0.35, dimRedCoord = NULL, knn.k = 50 )
.object |
|
regionType |
region type of regions that will be linked to the promoter (typical some sort of peak annotation) |
promoterGr |
|
maxDist |
maximum distance to consider for region-region interactions |
corCutOff |
cutoff of correlation coefficients (Pearson) to consider for region-region interactions |
dimRedCoord |
matrix of reduced dimension coordinates. must have coordinates for all samples/cells in the dataset |
knn.k |
parameter k for Cicero's k-nearest-neighbor method |
an SummarizedExperiment
object containing gene activities for all cells/samples in the dataset
Fabian Mueller
retrieve the comparison information for an DsAcc object. Analogous to RnBeads::get.comparison.info
getComparisonInfo( dsa, cmpNames = NULL, regionTypes = getRegionTypes(dsa), allPairs = TRUE, adjPairCols = NULL, minGrpSize = 1L, maxGrpCount = NULL )
getComparisonInfo( dsa, cmpNames = NULL, regionTypes = getRegionTypes(dsa), allPairs = TRUE, adjPairCols = NULL, minGrpSize = 1L, maxGrpCount = NULL )
dsa |
|
cmpNames |
column names of the sample annotation of the dataset that will be used for comparison |
regionTypes |
which region types should be processed for differential analysis. |
allPairs |
Logical indicating whether all pairwise comparisons should be conducted, when more than 2 groups are present |
adjPairCols |
argument passed on to |
minGrpSize |
Minimum number of samples required to form a group in comparison |
maxGrpCount |
maximum number of groups to consider for comparisons |
a list containing one element for each comparison to be conducted. Each element is again a list containing:
comparison
the name of the comparison
pheno.colname
the column name of the sample annotation table the comparison is derived from
group.names
the names of the two groups being compared
group.inds
the sample indices of the samples belonging to the two groups
paired
flag indicating whether paired analysis is conducted
adj.sva
flag indicating whether adjustment for SVA is conducted
adj.celltype
flag indicating whether adjustment for cell type is conducted
adjustment.table
the covariate adjustment table. NULL
if the comparison is not adjusted
region.types
the region types applicable to the analysis
Fabian Mueller
Retrieve a table describing pairwise comparisons on a DsAcc
object
## S4 method for signature 'DsAcc' getComparisonTable( .object, cols = NULL, cols1vAll = NULL, compNames = NULL, minGroupSize = 2L, maxGroupCount = length(.object) - 1 )
## S4 method for signature 'DsAcc' getComparisonTable( .object, cols = NULL, cols1vAll = NULL, compNames = NULL, minGroupSize = 2L, maxGroupCount = length(.object) - 1 )
.object |
|
cols |
column names in the sample annotation table to consider for pairwise comparisons |
cols1vAll |
column names in the sample annotation table to consider for 1-vs-all comparisons |
compNames |
vector of character strings specifying a fixed comparison names to be parsed (format "$GRP1_NAME vs $GRP1_NAME [$ANNOTATION_COLUMN]") |
minGroupSize |
Minimum size of a group to be used in comparison. Affects the annotation columns that will be used for comparisons. |
maxGroupCount |
Maximum number of groups for a column to be considered for comparison. |
a data.frame
with comparison inforamtion containing columns for the comparison name (compName
),
column in the annotation table (compCol
)
and group names for the two groups in the comparison (grp1Name, grp2Name
),
Fabian Mueller
Get the value for a configuration item
getConfigElement(name)
getConfigElement(name)
name |
name of the config item |
the value of the config item
Fabian Mueller
Retrieve a consensus peak set from a set of peak lists
getConsensusPeakSet( grl, mode = "no_by_score", grouping = NULL, groupAgreePerc = 1, groupConsSelect = FALSE, scoreCol = "score", keepOvInfo = FALSE )
getConsensusPeakSet( grl, mode = "no_by_score", grouping = NULL, groupAgreePerc = 1, groupConsSelect = FALSE, scoreCol = "score", keepOvInfo = FALSE )
grl |
list or |
mode |
consensus mode. Currently only "no_by_score" (non-overlapping; i.e. select the peak with the highest score from each set of overlapping peaks) is supported. |
grouping |
vector of group memberships (numeric, character or factor). must be of the same length as |
groupAgreePerc |
percentile of members in a group required to contain a peak in order to keep it. E.g. a value of 1 (default) means that all replicates in a group are required to contain that peak in order to keep it. |
groupConsSelect |
if set, the peak set will also be checked for consistency, i.e. in order to retain a peak
it has to be consistently be present or absent in each group (as specified in |
scoreCol |
name of the column to be used as score in the |
keepOvInfo |
keep annotation columns in the elementMetadata of the results specifying whether a consensus peak overlaps with a peak in each sample |
GRanges
object the containing consensus peak set
Fabian Mueller
Return coordinates of sites/regions in a dataset
## S4 method for signature 'DsAcc' getCoord(.object, type)
## S4 method for signature 'DsAcc' getCoord(.object, type)
.object |
|
type |
character string specifying the rgion type or |
GRanges
object containing coordinates for covered
sites/regions
Fabian Mueller
Return table of count values
## S4 method for signature 'DsATAC' getCounts( .object, type, i = NULL, j = NULL, asMatrix = TRUE, naIsZero = TRUE, allowSparseMatrix = FALSE )
## S4 method for signature 'DsATAC' getCounts( .object, type, i = NULL, j = NULL, asMatrix = TRUE, naIsZero = TRUE, allowSparseMatrix = FALSE )
.object |
|
type |
character string specifying the region type |
i |
(optional) row (region) indices |
j |
(optional) column (sample) indices |
asMatrix |
return a matrix object instead of the internal representation |
naIsZero |
should |
allowSparseMatrix |
if |
Matrix containing counts for each region and sample
Fabian Mueller
Return a SummarizedExperiment
object of count values
## S4 method for signature 'DsATAC' getCountsSE(.object, type, naIsZero = TRUE)
## S4 method for signature 'DsATAC' getCountsSE(.object, type, naIsZero = TRUE)
.object |
|
type |
character string specifying the region type |
naIsZero |
should |
SummarizedExperiment
containing counts for each region and sample
Fabian Mueller
## Not run: dsa <- ChrAccRex::loadExample("dsAtac_ia_example") se <- getCountsSE(dsa, "IA_prog_peaks") se ## End(Not run)
## Not run: dsa <- ChrAccRex::loadExample("dsAtac_ia_example") se <- getCountsSE(dsa, "IA_prog_peaks") se ## End(Not run)
Return a list of genome-wide coverage from insertion sites
## S4 method for signature 'DsATAC' getCoverage(.object, samples = getSamples(.object))
## S4 method for signature 'DsATAC' getCoverage(.object, samples = getSamples(.object))
.object |
|
samples |
sample identifiers |
list
of Rle
objects of sample coverage tracks
Fabian Mueller
Return table of read coverage values
## S4 method for signature 'DsNOMe' getCovg(.object, type = "sites", asMatrix = FALSE)
## S4 method for signature 'DsNOMe' getCovg(.object, type = "sites", asMatrix = FALSE)
.object |
|
type |
character string specifying the rgion type or |
asMatrix |
return a matrix instead of a |
data.table
or matrix
containing read coverage for
each site/region and sample
Fabian Mueller
Retrieve a differential expression dataset computed with DESeq2
## S4 method for signature 'DsATAC' getDESeq2Dataset(.object, regionType, designCols = NULL, compTab = NULL, ...)
## S4 method for signature 'DsATAC' getDESeq2Dataset(.object, regionType, designCols = NULL, compTab = NULL, ...)
.object |
|
regionType |
character string specifying the region type |
designCols |
column names in the sample annotation potentially used to create the design matrix |
compTab |
if design columns are not specified, you can specify a comparison table directly. These comparison tables can be obtained by |
... |
parameters passed on to |
DESeqDataSet
as returned by DESeq2::DESeq
Fabian Mueller
## Not run: dsa <- ChrAccRex::loadExample("dsAtac_ia_example") dds <- getDESeq2Dataset(dsa, "IA_prog_peaks", designCols=c("donor", "stimulus", "cellType")) dds ## End(Not run)
## Not run: dsa <- ChrAccRex::loadExample("dsAtac_ia_example") dds <- getDESeq2Dataset(dsa, "IA_prog_peaks", designCols=c("donor", "stimulus", "cellType")) dds ## End(Not run)
Compute differential accessibility
## S4 method for signature 'DsATAC' getDiffAcc( .object, regionType, comparisonCol, grp1Name = NULL, grp2Name = NULL, adjustCols = character(0), method = "DESeq2", diffObj = NULL )
## S4 method for signature 'DsATAC' getDiffAcc( .object, regionType, comparisonCol, grp1Name = NULL, grp2Name = NULL, adjustCols = character(0), method = "DESeq2", diffObj = NULL )
.object |
|
regionType |
character string specifying the region type |
comparisonCol |
column name in the sample annotation table to base the comparison on |
grp1Name |
name of the first group in the comparison. if not specified, it will be taken as the first factor level specified in the
sample annotation table in |
grp2Name |
name of the second group (reference) in the comparison. if not specified, it will be taken as the first factor level specified in the
sample annotation table in |
adjustCols |
column names in the sample annotation potentially used to create the design matrix |
method |
Method for determining differential accessibility. Currently only |
diffObj |
optional differential object to avoid computing it for each comparison and thus reduce runtime |
a data.frame
containing differential accessibility information
Fabian Mueller
## Not run: dsa <- ChrAccRex::loadExample("dsAtac_ia_example") daTab <- getDiffAcc(dsa, "IA_prog_peaks", "stimulus", grp1Name="S", grp2Name="U", adjustCols=c("cellType", "donor")) ## End(Not run)
## Not run: dsa <- ChrAccRex::loadExample("dsAtac_ia_example") daTab <- getDiffAcc(dsa, "IA_prog_peaks", "stimulus", grp1Name="S", grp2Name="U", adjustCols=c("cellType", "donor")) ## End(Not run)
Return a GRanges
object of fragment data for a given sample
## S4 method for signature 'DsATAC' getFragmentGr(.object, sampleId)
## S4 method for signature 'DsATAC' getFragmentGr(.object, sampleId)
.object |
|
sampleId |
sample identifier |
GRanges
object containing fragment data
Fabian Mueller
Return a list of GRanges
objects of fragment data for a given set of samples
## S4 method for signature 'DsATAC' getFragmentGrl(.object, sampleIds, asGRangesList = FALSE)
## S4 method for signature 'DsATAC' getFragmentGrl(.object, sampleIds, asGRangesList = FALSE)
.object |
|
sampleIds |
sample identifiers |
asGRangesList |
should a |
A named list of GRanges
objects containing fragment data
Fabian Mueller
Return the number of fragments in the DsATAC
object
## S4 method for signature 'DsATAC' getFragmentNum(.object, sampleIds = getSamples(.object))
## S4 method for signature 'DsATAC' getFragmentNum(.object, sampleIds = getSamples(.object))
.object |
|
sampleIds |
sample identifiers |
a vector of fragment counts per sample
Fabian Mueller
Return the genome assembly
## S4 method for signature 'DsAcc' getGenome(.object)
## S4 method for signature 'DsAcc' getGenome(.object)
.object |
|
Character string containing the genome assembly
Fabian Mueller
retrieve the appropriate BSgenome
for an assembly string
getGenomeObject(assembly, adjChrNames = TRUE)
getGenomeObject(assembly, adjChrNames = TRUE)
assembly |
string specifying the assembly |
adjChrNames |
should the prefix "chr" be added to main chromosomes if not already present and chrMT be renamed to chrM? |
BSgenome
object
Retrieve groupings given a table containing some categorical columns
getGroupsFromTable(tt, cols = NULL, minGrpSize = 2, maxGrpCount = nrow(tt) - 1)
getGroupsFromTable(tt, cols = NULL, minGrpSize = 2, maxGrpCount = nrow(tt) - 1)
tt |
table to retrieve groupings for |
cols |
(Optional) predefined column names (in the form of a |
minGrpSize |
Minimum number of items required to form a group in comparison |
maxGrpCount |
Maximum number of groups to be considered |
List of groupings. Each element corresponds to a categorical column in the table and contains the row indices for each category
Fabian Mueller
compute kmer frequencies at insertion sites for each sample
## S4 method for signature 'DsATAC' getInsertionKmerFreq( .object, samples = getSamples(.object), k = 6, normGenome = FALSE )
## S4 method for signature 'DsATAC' getInsertionKmerFreq( .object, samples = getSamples(.object), k = 6, normGenome = FALSE )
.object |
|
samples |
sample identifiers |
k |
length of the kmer |
normGenome |
should the result be normalized by genome-wide kmer frequencies |
a matrix
containing kmer frequencies (one row for each kmer and one column for each sample in the dataset)
Fabian Mueller
Return a list of insertion sites (Tn5 cut sites) for each sample
## S4 method for signature 'DsATAC' getInsertionSites( .object, samples = getSamples(.object), asGRangesList = FALSE )
## S4 method for signature 'DsATAC' getInsertionSites( .object, samples = getSamples(.object), asGRangesList = FALSE )
.object |
|
samples |
sample identifiers |
asGRangesList |
should a |
list
or GRangesList
containing Tn5 cut sites for each sample
Fabian Mueller
retrieve motif annotation data
getJasparAnnot(ss, type = "humantfs")
getJasparAnnot(ss, type = "humantfs")
ss |
character vector or JASPAR identifiers |
type |
annotation type. Currently only |
list of data frames of TF annotation (a motif can have multiple annotated TFs)
Fabian Mueller
Retrieve the TF names (symbols) from a JASPAR identifier
getJasparSymbols(ss)
getJasparSymbols(ss)
ss |
character vector or JASPAR identifiers |
a list of TF names (symbols) for each identifier
Return table of methylation values
## S4 method for signature 'DsNOMe' getMeth(.object, type = "sites", asMatrix = FALSE)
## S4 method for signature 'DsNOMe' getMeth(.object, type = "sites", asMatrix = FALSE)
.object |
|
type |
character string specifying the rgion type or |
asMatrix |
return a matrix instead of a |
data.table
or matrix
containing methylation levels for
each site/region and sample
Fabian Mueller
Obtain cell_data_set
object for analysis using the monocle3
package
## S4 method for signature 'DsATAC' getMonocleCellDataSet(.object, regionType, binarize = TRUE)
## S4 method for signature 'DsATAC' getMonocleCellDataSet(.object, regionType, binarize = TRUE)
.object |
|
regionType |
name of the region type to be exported |
binarize |
should the counts be binarized |
a cell_data_set
object containing the counts for the specified region type
Fabian Mueller
Retrieve motif clustering of TF motifs
getMotifClustering( k = 0, distM = NULL, assembly = "hg38", motifs = "jaspar", clusterMethod = "pam" )
getMotifClustering( k = 0, distM = NULL, assembly = "hg38", motifs = "jaspar", clusterMethod = "pam" )
k |
number of clusters. |
distM |
distance matrix ( |
assembly |
genome assembly for which the motifs dissimilarity should be retrieved. Only the species information
of the assembly is really relevant. Can be |
motifs |
a character string specifying the motif set (currently only "jaspar" is supported) |
clusterMethod |
method to be used for motif clustering (currently only |
a list structure containing the clustering result
Fabian Mueller
Retrieve motif dissimilarity/distance matrix for TF motifs
getMotifDistMat(assembly = "hg38", mmObj = NULL, method = "jaspar")
getMotifDistMat(assembly = "hg38", mmObj = NULL, method = "jaspar")
assembly |
genome assembly for which the motifs dissimilarity should be retrieved. Only the species information
of the assembly is really relevant. Can be |
mmObj |
optional motifmatchr object as returned by |
method |
method of dissimilarity quantification. Currently only |
a matrix of motif DISsimilarities (dist
object)
Fabian Mueller
Retrieve motif a comparison table from JASPAR annotation website and construct a dissimilarity matrix for given motif IDs
getMotifDistMat.jaspar(motifIds = NULL, scoreCol = "Ncor")
getMotifDistMat.jaspar(motifIds = NULL, scoreCol = "Ncor")
motifIds |
string vector of motif ids whose dissimilarities are retrieved |
scoreCol |
namew of the annotation column in the JASPAR annotation that contains the motif similarity |
a matrix of motif DISsimilarities
Perform enrichment analysis for (TF) motifs of a query set of regions. Fisher's Exact Test is employed to test the association of motif present in the query set against the background of all regions of that type covered in the object
## S4 method for signature 'DsATAC' getMotifEnrichment(.object, type, idx, motifs = "jaspar")
## S4 method for signature 'DsATAC' getMotifEnrichment(.object, type, idx, motifs = "jaspar")
.object |
|
type |
character string specifying the region type |
idx |
logical vector or indices of the same length as |
motifs |
either a character string (currently only "jaspar" and sets contained in |
a data.frame
summarizing Fisher's Exact Test enrichment statistics for each motif
Fabian Mueller
Perform enrichment analysis for (TF) motif footprinting
## S4 method for signature 'DsATAC' getMotifFootprints( .object, motifNames, samples = getSamples(.object), motifFlank = 250L, type = ".genome", motifDb = "jaspar" )
## S4 method for signature 'DsATAC' getMotifFootprints( .object, motifNames, samples = getSamples(.object), motifFlank = 250L, type = ".genome", motifDb = "jaspar" )
.object |
|
motifNames |
character vector of motif names |
samples |
sample identifiers |
motifFlank |
number of base pairs flanking the motif on each side |
type |
(PLACEHOLDER ARGUMENT: NOT IMPLEMENTED YET) character string specifying the region type or |
motifDb |
either a character string (currently only "jaspar" and sets contained in |
a list
of footprinting results with one element for each motif. Each motif's results contain summary data frames with aggregated counts
across all motif occurrences and a ggplot
object for plotting footprints
Fabian Mueller
## Not run: dsa <- ChrAccRex::loadExample("dsAtac_ia_example") motifNames <- c("MA1419.1_IRF4", "MA0139.1_CTCF", "MA0037.3_GATA3") # motifNames <- grep("(IRF4|CTCF|GATA3)$", names(prepareMotifmatchr("hg38", "jaspar")$motifs), value=TRUE, ignore.case=TRUE) # alternative by searching samples <- c("TeffNaive_U_1001", "TeffNaive_U_1002", "TeffMem_U_1001", "TeffMem_U_1002") fps <- getMotifFootprints(dsa, motifNames, samples) fps[["MA1419.1_IRF4"]]$plot ## End(Not run)
## Not run: dsa <- ChrAccRex::loadExample("dsAtac_ia_example") motifNames <- c("MA1419.1_IRF4", "MA0139.1_CTCF", "MA0037.3_GATA3") # motifNames <- grep("(IRF4|CTCF|GATA3)$", names(prepareMotifmatchr("hg38", "jaspar")$motifs), value=TRUE, ignore.case=TRUE) # alternative by searching samples <- c("TeffNaive_U_1001", "TeffNaive_U_1002", "TeffMem_U_1001", "TeffMem_U_1002") fps <- getMotifFootprints(dsa, motifNames, samples) fps[["MA1419.1_IRF4"]]$plot ## End(Not run)
Find occurrences of motifs in a given genome
getMotifOccurrences(motifNames = NULL, motifDb = "jaspar", genome = "hg38")
getMotifOccurrences(motifNames = NULL, motifDb = "jaspar", genome = "hg38")
motifNames |
character vector of motif names |
motifDb |
either a character string (currently only "jaspar" and sets contained in |
genome |
character string specifying genome assembly |
a GenomicRangesList
containing motif occurrences
Fabian Mueller
Retrieve the set of non-verlapping regions by iteratively picking the region with maximum score for each set of consecutively overlapping regions
getNonOverlappingByScore(gr, scoreCol = "score")
getNonOverlappingByScore(gr, scoreCol = "score")
gr |
|
scoreCol |
name of the column to be used as score in the |
GRanges
object containing non-overlapping regions
Fabian Mueller
Return the number of regions of a given type
## S4 method for signature 'DsAcc' getNRegions(.object, type = "sites")
## S4 method for signature 'DsAcc' getNRegions(.object, type = "sites")
.object |
|
type |
character string specifying the rgion type or |
the number of regions of that type
Fabian Mueller
Retrieve a consensus set of ATAC peaks from the snakeATAC pipline run
getPeakSet.snakeATAC( sampleAnnot, filePrefixCol, genome, dataDir, sampleIdCol = filePrefixCol, type = "summits_no_fw", unifWidth = 500L, replicateCol = NA, replicatePercReq = 1, replicateConsSelect = FALSE, keepOvInfo = FALSE )
getPeakSet.snakeATAC( sampleAnnot, filePrefixCol, genome, dataDir, sampleIdCol = filePrefixCol, type = "summits_no_fw", unifWidth = 500L, replicateCol = NA, replicatePercReq = 1, replicateConsSelect = FALSE, keepOvInfo = FALSE )
sampleAnnot |
data.frame specifying the sample annotation table |
filePrefixCol |
column name specifying the file prefix for each sample in the sample annotation table |
genome |
genome assembly |
dataDir |
directory where the files are located |
sampleIdCol |
column name or index in the sample annotation table containing unique sample identifiers |
type |
input data type. Currently only "summits_no_fw" (non-overlapping, fixed-width peaks deduced from summits) |
unifWidth |
width of the peaks if the results have uniform peak lengths |
replicateCol |
column name specifying the replicate group for cross-checking coverage across replicates |
replicatePercReq |
percentile of replicates in a group required to contain a peak in order to keep it. E.g. a value of 1 (default) means that all replicates in a group are required to contain that peak in order to keep it. |
replicateConsSelect |
if set, the peak set will also be checked for consistency, i.e. in order to retain a peak
it has to be consistently be present or absent in each replicate group (as specified in |
keepOvInfo |
keep annotation columns in the elementMetadata of the results specifying whether a consensus peak overlaps with a peak in each sample |
GRanges
object containing consensus peak set
Fabian Mueller
[Experimental] Quick, heuristic version of TSS enrichment to just get scores for each sample in the dataset. Useful, e.g. for single cells.
## S4 method for signature 'DsATAC' getQuickTssEnrichment( .object, tssGr = NULL, sampleIds = getSamples(.object), tssW = 201L, distBg = 1900L )
## S4 method for signature 'DsATAC' getQuickTssEnrichment( .object, tssGr = NULL, sampleIds = getSamples(.object), tssW = 201L, distBg = 1900L )
.object |
|
tssGr |
|
sampleIds |
sampleIds for which TSS enrichment should be computed |
tssW |
width to consider arount the TSS |
distBg |
number of bases flanking each TSS that will be added on each side |
Computes TSS enrichment
as the ratio of total insertion sites at a window (of width tssW
bp) directly at the TSS and 2 background regions
symmetrically located (distBg
bp) upstream and downstream of the TSS
a vector of TSS enrichment values for each sample/cell in the dataset
Fabian Mueller
Retrieve a mapping from regions to GC indices in the dataset
## S4 method for signature 'DsNOMe' getRegionMapping(.object, type)
## S4 method for signature 'DsNOMe' getRegionMapping(.object, type)
.object |
|
type |
character string specifying a name for the region type |
list containing vectors of indices of GCs for each region of the specified type
Fabian Mueller
Return sample IDs in a dataset
## S4 method for signature 'DsAcc' getRegionTypes(.object, inclSites = FALSE)
## S4 method for signature 'DsAcc' getRegionTypes(.object, inclSites = FALSE)
.object |
|
inclSites |
include |
Character vector of sample IDs in the dataset
Fabian Mueller
Return sample annotation table of a dataset
## S4 method for signature 'DsAcc' getSampleAnnot(.object)
## S4 method for signature 'DsAcc' getSampleAnnot(.object)
.object |
|
data.frame
containing sample annotation
Fabian Mueller
Retrieve sample summary statistics from the output a snakeATAC pipline run
getSampleMetrics.snakeATAC(sampleAnnot, snakeDir, withPeaks = TRUE)
getSampleMetrics.snakeATAC(sampleAnnot, snakeDir, withPeaks = TRUE)
sampleAnnot |
data.frame specifying the sample annotation table. Must have valid rownames corresponding to the sample ids used in the snakeAtac filenames |
snakeDir |
snakeATAC base directory (where the files are located) |
withPeaks |
flag indicating whether to output peak statistics |
data.frame
containing sample summary statistics. the original sample annotation table will be appended to the summary output
Fabian Mueller
Return sample IDs in a dataset
## S4 method for signature 'DsAcc' getSamples(.object)
## S4 method for signature 'DsAcc' getSamples(.object)
.object |
|
Character vector of sample IDs in the dataset
Fabian Mueller
Retrieve a table of QC statistics for single cells
## S4 method for signature 'DsATACsc' getScQcStatsTab(.object)
## S4 method for signature 'DsATACsc' getScQcStatsTab(.object)
.object |
|
an data.frame
contain QC statistics for each cell
Fabian Mueller
retrieve TF annotation data
getTfAnnot(type = "humantfs")
getTfAnnot(type = "humantfs")
type |
annotation type. Currently only |
a data frame of TF annotation
Fabian Mueller
Get TSS enrichment data and plot
## S4 method for signature 'DsATAC' getTssEnrichment( .object, sampleId, tssGr = NULL, flank = 2000L, normTailW = 100L, smoothW = 25L, silent = FALSE )
## S4 method for signature 'DsATAC' getTssEnrichment( .object, sampleId, tssGr = NULL, flank = 2000L, normTailW = 100L, smoothW = 25L, silent = FALSE )
.object |
|
sampleId |
sample to be plotted |
tssGr |
|
flank |
number of bases flanking each TSS that will be added on each side |
normTailW |
number of bases on each side whose counts will be used to normalize the data |
smoothW |
radius of the window (in bp) that will be used to smooth the data, i.e. the total width of the smoothing window will be twice that number |
silent |
limit log messages |
a list containing TSS enrichment data and a ggplot
object containing TSS enrichment plot
Fabian Mueller
Get TSS enrichment data and plot
## S4 method for signature 'DsATAC' getTssEnrichmentBatch( .object, tssGr = NULL, sampleIds = getSamples(.object), tssW = 201L, flank = 2000L, normTailW = 200L, smoothW = 51L )
## S4 method for signature 'DsATAC' getTssEnrichmentBatch( .object, tssGr = NULL, sampleIds = getSamples(.object), tssW = 201L, flank = 2000L, normTailW = 200L, smoothW = 51L )
.object |
|
tssGr |
|
sampleIds |
sampleIds for which TSS enrichment should be computed |
tssW |
size of the core TSS window |
flank |
number of bases flanking each TSS that will be added on each side |
normTailW |
number of bases on each side whose counts will be used to normalize the data |
smoothW |
diameter of the window (in bp) that will be used to smooth the data |
a list containing TSS enrichment data
Fabian Mueller
Draw a sequence motif logo in a Complex Heatmap using grid.
adapted from seqLogo::seqLogo()
hmSeqLogo( pwm, x = unit(0.5, "npc"), y = unit(0.5, "npc"), width = 1, height = 1, ic.scale = TRUE )
hmSeqLogo( pwm, x = unit(0.5, "npc"), y = unit(0.5, "npc"), width = 1, height = 1, ic.scale = TRUE )
pwm |
PWM (from TFBSTools package) |
x |
x center coordinate where the motif should be drawn |
y |
y center coordinate where the motif should be drawn |
width |
drawing width |
height |
drawing height |
ic.scale |
|
Draws the motif
Fabian Mueller
## Not run: mm <- prepareMotifmatchr("hg38", "jaspar")$motifs[["MA0137.3_STAT1"]] hmSeqLogo(mm, unit(0.5, "npc"), unit(0.5, "npc"), 0.5, 0.5, ic.scale=TRUE) ## End(Not run)
## Not run: mm <- prepareMotifmatchr("hg38", "jaspar")$motifs[["MA0137.3_STAT1"]] hmSeqLogo(mm, unit(0.5, "npc"), unit(0.5, "npc"), 0.5, 0.5, ic.scale=TRUE) ## End(Not run)
for a character string of chromosome names, determine if it is a canonical chromosome (i.e. not not ChrUn*, *_random, ...)
isCanonicalChrom(ss)
isCanonicalChrom(ss)
ss |
character vector of chromosome names |
logical vector stating whether the given chromosome names correspond to canonical chromosomes
Fabian Mueller
Perform iterative LSI clustering and dimension reduction as described in doi:10.1038/s41587-019-0332-7
## S4 method for signature 'DsATACsc' iterativeLSI( .object, it0regionType = "t5k", it0nMostAcc = 20000L, it0pcs = 1:25, it0clusterResolution = 0.8, it0clusterMinCells = 200L, it0nTopPeaksPerCluster = 2e+05, it1pcs = 1:50, it1clusterResolution = 0.8, it1mostVarPeaks = 50000L, it2pcs = 1:50, it2clusterResolution = 0.8, rmDepthCor = 0.5, normPcs = FALSE, umapParams = list(distMethod = "euclidean", min_dist = 0.5, n_neighbors = 25) )
## S4 method for signature 'DsATACsc' iterativeLSI( .object, it0regionType = "t5k", it0nMostAcc = 20000L, it0pcs = 1:25, it0clusterResolution = 0.8, it0clusterMinCells = 200L, it0nTopPeaksPerCluster = 2e+05, it1pcs = 1:50, it1clusterResolution = 0.8, it1mostVarPeaks = 50000L, it2pcs = 1:50, it2clusterResolution = 0.8, rmDepthCor = 0.5, normPcs = FALSE, umapParams = list(distMethod = "euclidean", min_dist = 0.5, n_neighbors = 25) )
.object |
|
it0regionType |
character string specifying the region type to start with |
it0nMostAcc |
the number of the most accessible regions to consider in iteration 0 |
it0pcs |
the principal components to consider in iteration 0 |
it0clusterResolution |
resolution paramter for Seurat's clustering ( |
it0clusterMinCells |
the minimum number of cells in a cluster in order for it to be considered in peak calling (iteration 0) |
it0nTopPeaksPerCluster |
the number of best peaks to be considered for each cluster in the merged peak set (iteration 0) |
it1pcs |
the principal components to consider in iteration 0 |
it1clusterResolution |
resolution paramter for Seurat's clustering ( |
it1mostVarPeaks |
the number of the most variable peaks to consider after iteration 1 |
it2pcs |
the principal components to consider in the final iteration (2) |
it2clusterResolution |
resolution paramter for Seurat's clustering ( |
rmDepthCor |
correlation cutoff to be used to discard principal components associated with fragment depth (all iterationa) |
normPcs |
flag indicating whether to apply z-score normalization to PCs for each cell (all iterations) |
umapParams |
parameters to compute UMAP coordinates (passed on to |
In order to obtain a low dimensional representation of single-cell ATAC datasets in terms of principal components and UMAP coordinates, we recommend an iterative application of the Latent Semantic Indexing approach [10.1016/j.cell.2018.06.052] described in [doi:10.1038/s41587-019-0332-7]. This approach also identifies cell clusters and a peak set that represents a consensus peak set of cluster peaks in a given dataset. In brief, in an initial iteration clusters are identified based on the most accessible regions (e.g. genomic tiling regions). Here, the counts are first normalized using the term frequency–inverse document frequency (TF-IDF) transformation and singular values are computed based on these normalized counts in selected regions (i.e. the most accessible regions in the initial iteration). Clusters are identified based on the singular values using Louvain clustering (as implemented in the Seurat
package). Peak calling is then performed on the aggregated insertion sites from all cells of each cluster (using MACS2) and a union/consensus set of peaks uniform-length non-overlapping peaks is selected. In a second iteration, the peak regions whose TF-IDF-normalized counts which exhibit the most variability across the initial clusters provide the basis for a refined clustering using derived singular values. In the final iteration, the most variable peaks across the refined clusters are identified as the final peak set and singular values are computed again. Based on these final singular values UMAP coordinates are computed for low-dimensional projection.
The output object includes the final singular values/principal components (result$pcaCoord
), the low-dimensional coordinates (result$umapCoord
), the final cluster assignment of all cells (result$clustAss
), the complete, unfiltered initial cluster peak set (result$clusterPeaks_unfiltered
) as well as the final cluster-variable peak set (result$regionGr
).
an S3
object containing dimensionality reduction results, peak sets and clustering
Fabian Mueller
Combine two DsATAC
objects
## S4 method for signature 'DsATAC' join(.object, objectB, joinRegionTypes = "union")
## S4 method for signature 'DsATAC' join(.object, objectB, joinRegionTypes = "union")
.object |
|
objectB |
|
combineRegionTypes |
how to combine region types: 'union' (default): the resulting object will have counts aggregated over region types from both objects. 'intersect': only region types present in both objects will occur in the output |
a new DsATAC
object combining both input objects.
It contains untransformed counts.
Fabian Mueller
Retrieve the number of samples contained in a DsAcc object
## S4 method for signature 'DsAcc' length(x)
## S4 method for signature 'DsAcc' length(x)
x |
DsAcc object |
Sets the configuration from a configuration file (JSON)
loadConfig(cfgFile)
loadConfig(cfgFile)
cfgFile |
Config file in JSON format. As output by saveConfig |
nothing of particular interest. The configuration is set for the current environment
Fabian Mueller
Set the indices specified in a mask to NA
## S4 method for signature 'DsNOMe' maskMethNA(.object, mask, type = "sites", reaggregate = TRUE)
## S4 method for signature 'DsNOMe' maskMethNA(.object, mask, type = "sites", reaggregate = TRUE)
.object |
|
mask |
a mask, i.e. a logical matrix of indices to set to NA |
type |
character string specifying a name for the region type (default: sites) |
reaggregate |
redo region aggregation (only has an effect if type is sites and there are aggregated regions in the dataset) |
a new DsNOMe
object with sites/regions masked
Fabian Mueller
Merge cells into pseudobulk samples based on annotation
## S4 method for signature 'DsATACsc' mergePseudoBulk(.object, mergeGroups, cleanSampleAnnot = TRUE)
## S4 method for signature 'DsATACsc' mergePseudoBulk(.object, mergeGroups, cleanSampleAnnot = TRUE)
.object |
|
mergeGroups |
factor or character vector or column name in sample annotation table. Can alternatively be a (named) list containing sample indices or names for each group to merge. |
cleanSampleAnnot |
clean up sample annotation table in the new object |
a new DsATAC
object with cells merged into pseudobulk samples
Fabian Mueller
Merge signal and insertion data across samples
## S4 method for signature 'DsATAC' mergeSamples(.object, mergeGroups, countAggrFun = "sum")
## S4 method for signature 'DsATAC' mergeSamples(.object, mergeGroups, countAggrFun = "sum")
.object |
|
mergeGroups |
factor or character vector or column name in sample annotation table. Can alternatively be a (named) list containing sample indices or names for each group to merge. |
countAggrFun |
aggregation function for signal counts.
Currently |
a new DsATAC
object with samples merged
Fabian Mueller
Merge + and - strands of the dataset by adding read coverage and recomputing Methylation levels
## S4 method for signature 'DsNOMe' mergeStrands(.object, reaggregate = TRUE)
## S4 method for signature 'DsNOMe' mergeStrands(.object, reaggregate = TRUE)
.object |
|
reaggregate |
redo region aggregation (only has an effect if there are aggregated regions in the dataset) |
a new DsNOMe
object with the strands merged
Fabian Mueller
Normalize methylation levels
## S4 method for signature 'DsNOMe' normalizeMeth(.object, type = "sites", method = "quantile", reaggregate = TRUE)
## S4 method for signature 'DsNOMe' normalizeMeth(.object, type = "sites", method = "quantile", reaggregate = TRUE)
.object |
|
type |
character string specifying a name for the region type (default: sites) |
method |
normalization method to be applied. Currently only 'quantile' is supported |
reaggregate |
redo region aggregation (only has an effect if type is sites and there are aggregated regions in the dataset) |
a new DsNOMe
object with normalized methylation levels
Fabian Mueller
Plot insert size distribution
## S4 method for signature 'DsATAC' plotInsertSizeDistribution(.object, sampleId)
## S4 method for signature 'DsATAC' plotInsertSizeDistribution(.object, sampleId)
.object |
|
sampleId |
sample to be plotted |
ggplot
object containing insert size distribution plot
Fabian Mueller
prepare objects for a motifmatchr
analysis
prepareMotifmatchr(genome, motifs)
prepareMotifmatchr(genome, motifs)
genome |
character string specifying genome assembly |
motifs |
either a character string (currently only "jaspar" and sets contained in |
a list containing objects to be used as arguments for motifmatchr
Fabian Mueller
given a (count) matrix and dimension reduction result, return the projected UMAP coordinates in the embedding space
projectMatrix_UMAP(X, umapObj, binarize = TRUE, addPcCoord = FALSE)
projectMatrix_UMAP(X, umapObj, binarize = TRUE, addPcCoord = FALSE)
X |
matrix to be projected (features X samples) |
umapObj |
dimension reduction result as returned by |
binarize |
binarize the counts before projecting |
addPcCoord |
also add PC coordinates to the resulting matrix |
Projected UMAP coordinates
Fabian Mueller
convert a log2probratio PWM (PWMatrix
from TFBSTools package) to a matrix containing probabilities in [0,1]
PWMatrixToProbMatrix(x)
PWMatrixToProbMatrix(x)
x |
log2probratio PWM ( |
PWM probability matrix with values in
Fabian Mueller
Reads the MACS2 ouput as GRanges
readMACS2peakFile(fn)
readMACS2peakFile(fn)
fn |
Filename for MACS2 narrow peak file |
GRanges
object containing peak information
Fabian Mueller
Aggregate signal counts across a set of regions
## S4 method for signature 'DsATAC' regionAggregation( .object, regGr, type, signal = NULL, aggrFun = "median", dropEmpty = TRUE, bySample = TRUE, chunkSize = 5000L )
## S4 method for signature 'DsATAC' regionAggregation( .object, regGr, type, signal = NULL, aggrFun = "median", dropEmpty = TRUE, bySample = TRUE, chunkSize = 5000L )
.object |
|
regGr |
|
type |
character string specifying a name for the region type |
signal |
character string specifying a name for the region type for the signal to be aggregated
If it is |
aggrFun |
aggregation function for signal counts. Will only be used if |
dropEmpty |
discard all regions with no observed signal counts |
bySample |
[only relevant if |
chunkSize |
[only relevant if |
a new DsATAC
object with aggregated signal counts per regions
Fabian Mueller
Aggregate methylation levels and coverage values accross a set of regions
## S4 method for signature 'DsNOMe' regionAggregation( .object, regGr, type, methAggrFun = "weightedMean", dropEmpty = TRUE )
## S4 method for signature 'DsNOMe' regionAggregation( .object, regGr, type, methAggrFun = "weightedMean", dropEmpty = TRUE )
.object |
|
regGr |
|
type |
character string specifying a name for the region type |
methAggrFun |
aggregation function for methylation levels.
Currently |
dropEmpty |
discard all regions with no observed methylation levels |
Coverage values are aggregated by summing up coverage values for individual GCs
while the aggregation function for methylation levels is specified by the
methAggrFun
parameter.
a new DsNOMe
object with aggregated regions
Fabian Mueller
Overlap the insertion data with a list of region sets
## S4 method for signature 'DsATAC' regionSetCounts(.object, rsl, bySample = FALSE)
## S4 method for signature 'DsATAC' regionSetCounts(.object, rsl, bySample = FALSE)
.object |
|
rsl |
|
bySample |
for internal use: iterate over samples (instead of retrieving one giant insertion list for all samples) in order to save memory (at the tradeoff of compute time) |
a matrix of overlap counts for each region set and sample
Fabian Mueller
Removes fragment data from DsATAC
object (e.g. to save space)
## S4 method for signature 'DsATAC' removeFragmentData(object)
## S4 method for signature 'DsATAC' removeFragmentData(object)
object |
|
the modified object (without fragment data)
Fabian Mueller
Remove all region data from a DsATAC
object
## S4 method for signature 'DsATAC' removeRegionData(.object)
## S4 method for signature 'DsATAC' removeRegionData(.object)
.object |
|
a new DsATAC
object with region data removed
Fabian Mueller
Remove the specified sites or regions from an object
## S4 method for signature 'DsAcc' removeRegions(.object, indices, type)
## S4 method for signature 'DsAcc' removeRegions(.object, indices, type)
.object |
|
indices |
a vector of indices of sites/regions to be removed. Can be numeric, integer or logical. |
type |
character string specifying a name for the region type (sefault: sites) |
a new DsAcc
object with sites/regions removed
Fabian Mueller
Remove the specified sites or regions from an object
## S4 method for signature 'DsATAC' removeRegions(.object, indices, type)
## S4 method for signature 'DsATAC' removeRegions(.object, indices, type)
.object |
|
indices |
a vector of indices of sites/regions to be removed. Can be numeric, integer or logical. |
type |
character string specifying a name for the region type (sefault: sites) |
a new DsATAC
object with sites/regions removed
Fabian Mueller
Remove the specified sites or regions from an object
## S4 method for signature 'DsNOMe' removeRegions(.object, indices, type = "sites", reaggregate = TRUE)
## S4 method for signature 'DsNOMe' removeRegions(.object, indices, type = "sites", reaggregate = TRUE)
.object |
|
indices |
a vector of indices of sites/regions to be removed. Can be numeric, integer or logical. |
type |
character string specifying a name for the region type (sefault: sites) |
reaggregate |
redo region aggregation (only has an effect if type is sites and there are aggregated regions in the dataset) |
a new DsNOMe
object with sites/regions removed
Fabian Mueller
Remove the specified region type from an object
## S4 method for signature 'DsATAC' removeRegionType(.object, type)
## S4 method for signature 'DsATAC' removeRegionType(.object, type)
.object |
|
type |
character string specifying a name for the region type (sefault: sites) |
a new DsATAC
object with the region type removed
Fabian Mueller
Remove samples from a DsATAC
object
## S4 method for signature 'DsATAC' removeSamples(.object, indices)
## S4 method for signature 'DsATAC' removeSamples(.object, indices)
.object |
|
indices |
a vector of indices of samples to be removed. Can be numeric, integer or logical. |
a new DsATAC
object with sites/regions removed
Fabian Mueller
Performs z-score normalization on the rows of a matrix. (Basically a wrapper around matrixStats
)
rowZscores(X, na.rm = FALSE)
rowZscores(X, na.rm = FALSE)
X |
input matrix |
na.rm |
should NAs be omitted? |
z-score normalized matrix
Fabian Mueller
Run the complete ChrAccR analysis for ATAC-seq data
run_atac( anaDir, input = NULL, sampleAnnot = NULL, genome = NULL, sampleIdCol = NULL, regionSets = NULL, startStage = "raw", resetStage = NULL )
run_atac( anaDir, input = NULL, sampleAnnot = NULL, genome = NULL, sampleIdCol = NULL, regionSets = NULL, startStage = "raw", resetStage = NULL )
anaDir |
analysis directory |
input |
Input object. Can be either |
sampleAnnot |
sample annotation table ( |
genome |
genome assembly. Only relevant if not continuing existing analysis and input is not a |
sampleIdCol |
column name in the sample annotation table containing unique sample Only relevant if not continuing existing analysis and input is not a |
regionSets |
a list of GRanges objects which contain region sets over which count data will be aggregated. Only relevant if not continuing existing analysis and input is not a |
startStage |
stage where to start the analysis from. can be one of |
resetStage |
flag indicating whether to reset the analysis directory (i.e. deleting previously generated reports and datasets), when continuing previous analyses ( |
DsATAC
object (invisible)
Fabian Mueller
Run chromVAR analysis for ATAC-seq data
run_atac_chromvar(.object)
run_atac_chromvar(.object)
.object |
|
An S3 object containing a list of chromVAR Deviations objects as returned by chromVAR::computeDeviations
. One object for each region type specified in the chromVarRegionTypes
configuration.
Fabian Mueller
Run differential analyses for ATAC-seq data
run_atac_differential(dsa, anaDir, chromVarObj = NULL)
run_atac_differential(dsa, anaDir, chromVarObj = NULL)
dsa |
|
anaDir |
analysis directory |
chromVarObj |
[optional] pre-computed result of a call to |
S3
object containing differential analysis results and an analysis report object
Fabian Mueller
Run exploratory analyses for ATAC-seq data
run_atac_exploratory( dsa, anaDir, chromVarObj = NULL, itLsiObj = NULL, geneActSe = NULL )
run_atac_exploratory( dsa, anaDir, chromVarObj = NULL, itLsiObj = NULL, geneActSe = NULL )
dsa |
|
anaDir |
analysis directory |
chromVarObj |
[optional] pre-computed result of a call to |
itLsiObj |
[for single-cell only; optional] pre-computed result of a call to |
geneActSe |
[for single-cell only; optional] pre-computed result of a call to |
S3
object containing exploratory metrics and an analysis report object
Fabian Mueller
Run the filtering for ATAC-seq data
run_atac_filtering(dsa, anaDir)
run_atac_filtering(dsa, anaDir)
dsa |
|
anaDir |
analysis directory |
S3
object containing the filtered DsATAC
object, filtering statistics and an analysis report object
Fabian Mueller
Run count normalization for ATAC-seq data
run_atac_normalization(dsa, anaDir)
run_atac_normalization(dsa, anaDir)
dsa |
|
anaDir |
analysis directory |
S3
object containing the normalized DsATAC
object and an analysis report object
Fabian Mueller
Run peak calling for ATAC-seq data
run_atac_peakcalling(dsa, anaDir)
run_atac_peakcalling(dsa, anaDir)
dsa |
|
anaDir |
analysis directory |
S3
object containing the annotated DsATAC
object, per-sample peak calls, a consensus peak set and an analysis report object
Fabian Mueller
Run the summary QC analysis for ATAC-seq data
run_atac_qc(dsa, anaDir)
run_atac_qc(dsa, anaDir)
dsa |
|
anaDir |
analysis directory |
S3
object containing QC statistics and an analysis report object
Fabian Mueller
Run unsupervised analysis for single-cell ATAC-seq data (i.e. iterative LSI, clustering and cluster peak detection)
run_atac_sc_unsupervised(dsa, anaDir)
run_atac_sc_unsupervised(dsa, anaDir)
dsa |
|
anaDir |
analysis directory |
S3
object containing the annoted DsATAC
object, the results of running iterative LSI and an analysis report object
Fabian Mueller
Compute matrix statistics selecting the appropriate function depending on the matrix class of the input (supports sparse matrices and DelayedArrays)
safeMatrixStats(X, statFun = "rowSums", ...)
safeMatrixStats(X, statFun = "rowSums", ...)
X |
input matrix |
statFun |
statistic. E.g. |
... |
arguments passed on to the matrix stats function. E.g. |
result of the corresponding matrix statistic
Fabian Mueller
Samples pseudo-bulk samples from single-cells
## S4 method for signature 'DsATACsc' samplePseudoBulk(.object, nnData, nSamples, nCellsPerSample = 100)
## S4 method for signature 'DsATACsc' samplePseudoBulk(.object, nnData, nSamples, nCellsPerSample = 100)
.object |
|
nnData |
Data to use for nearest neighbor matching. Can either be the
name of a region type in |
nSamples |
number of pseudobulk samples to be returned |
nCellsPerSample |
number of cells to be aggregated per sample |
Samples pseudo-bulk samples from single-cells by sampling nSamples
individual cells
and then merging it with its nCellsPerSample - 1
nearest neighbors (according to nnData
).
S3
data structure containing a list of sampling results as well as
a DsATAC
object containing pseudo-bulk aggregates
Fabian Mueller
Save the current configuration to a configuration file (JSON)
saveConfig(dest)
saveConfig(dest)
dest |
Filename for the config file in JSON format. |
nothing of particular interest.
Fabian Mueller
Save a DsAcc dataset to disk for later loading
saveDsAcc(.object, path, forceDiskDump = FALSE, updateDiskRef = TRUE)
saveDsAcc(.object, path, forceDiskDump = FALSE, updateDiskRef = TRUE)
.object |
|
path |
destination to save the object to |
forceDiskDump |
force large matrices (counts) to be stored as HDF5 (even when the object was not created using |
updateDiskRef |
update disk dumped (HDF5) references (e.g. for count data) |
(invisibly) The object (with potentially updated disk dumped references)
Fabian Mueller
Set a configuration item to a given value
setConfigElement(name, value)
setConfigElement(name, value)
name |
name of the config item |
value |
value of the config item |
nothing of particular interest.
tmpDir
= temdir()
Directory for temporary files. Must be existing.
cleanMem
= TRUE
During runtime, regularly clean-out the memory in order to reduce memory overuse
colorSchemes
named list
of DISCRETE color schemes to be used for plotting. Each element should be a named vector specifying colors for groups/annotations.
colorSchemesCont
named list
of CONTINOUS color schemes to be used for plotting. Each element should be a vector specifying a range of colors.
geneModelVersions
Gene model versions to be used for various genomes
analysisName
= "ChrAccR analysis"
A title for the analysis (a string).
regionTypes
Region types to be used in the analysis
chromVarRegionTypes
= NULL
Region types to be used for chromVar analysis. If NULL
(default), ChrAccR will automatically look for region types with the keyword "peak"
in their name.
chromVarMotifs
= "jaspar_vert"
Character vector of names of TF motif sets to be used in ChromVAR analyses. By default the vertebrate set of the JASPAR database will be used.
chromVarMotifNamesForDimRed
Names of motifs to be used for dimension reduction plots in the reports. [only relevant for single-cell data]
genesOfInterest
Names of genes of interest to be highlighted in the reports (e.g. dimension reduction) in the reports.
[currently only relevant for single-cell data and only when scGeneActivity
is activated]
annotationColumns
Sample annotation columns to be used for reporting
annotationMinGroupSize
= 2
Minimum size of a group to be used in the reports. Influences which annotation columns are automatically selected for reporting.
annotationMaxGroupCount
= NULL
Maximum number of groups to be used in the reports. Influences which annotation columns are
automatically selected for reporting. If NULL
(default) it will effectively be the
number of samples - 1.
doPeakCalling
= FALSE
Perform per-sample peak calling and retrieve consensus peak set. Requires that macs2
is installed and can be called from the command line. [for bulk data analysis only]
peakCallingProfile
= NULL
If set to a string describing a valid profile, will apply a special profile for macs2
peak calling.
[only valid in combination with the doPeakCalling
option]
annotationPeakGroupColumn
Annotation column to base the consensus peak set replication filtering on.
annotationPeakGroupAgreePerc
= 1.0
Percent of samples that have to agree to identify consensus peaks. See getConsensusPeakSet
for details.
filteringCovgCount
= 1L
Minimum insertion count to filter count matrices by. See filterLowCovg,DsATAC-method
for details. [for bulk data analysis only]
filteringCovgReqSamples
= 0.75
Minimum required samples to apply low coverage filtering to. See filterLowCovg,DsATAC-method
for details. [for bulk data analysis only]
filteringSexChroms
= FALSE
Flag indicating whether to remove sex chromosomes.
filteringScMinFragmentsPerCell
= 1000L
Minimum number of fragments per cell to retain a cell in the analysis. [for single-cell data analysis only]
filteringScMaxFragmentsPerCell
= Inf
Maximum number of fragments allowed per cell to retain a cell in the analysis. [for single-cell data analysis only]
filteringScMinTssEnrichment
= 6
Minimum TSS enrichment score per cell to retain a cell in the analysis. [for single-cell data analysis only]
normalizationMethod
= "quantile"
Normalization method to use for count normalization. Allowed methods include the ones listed in transformCounts,DsATAC-method
. [for bulk data analysis only]
exploratoryLogNormCounts
= TRUE
Should a log-normalization be applied in the exploratory plot sections of the reports (dimension reduction, heatmaps)
exploratoryNSubsample
= 2e6
Number of regions to subsample in exploratory analysis in order to increase computational performance.
differentialColumns
Sample annotation columns to be used for differential testing and reporting
differentialColumns1vsAll
Sample annotation columns to be used for differential testing and reporting in a 1-vs-all group setting. Should be a subset of differentialColumns
.
differentialCompNames
Comparison names from which comparison information is derived. Must be in the format of "$GRP1_NAME vs $GRP2_NAME [$ANNOTATION_COLUMN]".
differentialAdjColumns
Sample annotation columns to be adjusted for in differential testing
differentialCutoffL2FC
Cutoff on log2 fold-change to be used for reporting differential accessibility.
lolaDbPaths
Precomputed LOLA databases to be used for enrichment analysis. If NULL
(default), ChrAccR will download an apropriate core database.
scIterativeLsiRegType
For single-cell analysis only: region type to be used for clustering and dimension reduction using iterative LSI. By default (NULL
),
ChrAccR will look for a region type named "tiling"
.
scIterativeLsiParams
Parameters to use for iterative LSI. See iterativeLSI,DsATACsc-method
for details.
scGeneActivity
= FALSE
For single-cell analysis only: Compute gene activity from accessibility.
Possible options are "RBF"
for radial-basis-function-weighted count aggregation (default when set to TRUE
) or
"Cicero"
for Cicero correlation-based aggregation
Fabian Mueller
transform count data for an ATAC seq dataset
## S4 method for signature 'DsATAC' transformCounts( .object, method = "quantile", regionTypes = getRegionTypes(.object), ... )
## S4 method for signature 'DsATAC' transformCounts( .object, method = "quantile", regionTypes = getRegionTypes(.object), ... )
.object |
|
method |
transformation method to be applied. Currently only 'log2', 'log10', 'quantile' (quantile normalization), 'percentile' (percentile normalization),'rankPerc' (rank percentile), 'vst' (DESeq2 Variance Stabilizing Transformation), 'batchCorrect' (limma batch effect removal), tf-idf', 'CPM' (counts per million), and 'RPKM' (RPKM normalization) are supported |
regionTypes |
character vector specifying a name for the region type in which count data should be normalized(default: all region types) |
... |
other arguments depending on the |
a new DsATAC
object with normalized count data
Fabian Mueller
Perform unsupervised analysis on single-cell data. Performs dimensionality reduction and clustering.
## S4 method for signature 'DsATACsc' unsupervisedAnalysisSc( .object, regionType, regionIdx = NULL, dimRedMethod = "tf-idf_irlba", usePcs = 1:50, clusteringMethod = "seurat_louvain" )
## S4 method for signature 'DsATACsc' unsupervisedAnalysisSc( .object, regionType, regionIdx = NULL, dimRedMethod = "tf-idf_irlba", usePcs = 1:50, clusteringMethod = "seurat_louvain" )
.object |
|
regionType |
character string specifying the region type |
regionIdx |
indices of regions to be used (logical or integer vector). If |
dimRedMethod |
character string specifying the dimensionality reduction method. Currently on |
usePcs |
integer vector specifying the principal components to use for UMAP and clustering |
clusteringMethod |
character string specifying the clustering method. Currently on |
an S3
object containing dimensionality reduction results and clustering
Fabian Mueller