| Title: | R Tools for Blaser Lab Data Analysis |
|---|---|
| Description: | This is a repository of R tools for Single Cell RNA seq and other lab data analysis functions. |
| Authors: | Brad Blaser [aut, cre] (ORCID: <https://orcid.org/0000-0002-3168-5423>) |
| Maintainer: | Brad Blaser <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.0.9208 |
| Built: | 2026-05-23 03:48:54 UTC |
| Source: | https://github.com/blaserlab/blaseRtools |
Useful links:
Report bugs at https://github.com/blaserlab/blaseRtools/issues
Add predefined sample-level cell metadata to cell data sets while importing
add_cds_factor_columns(cds, columns_to_add)
cds |
A cell data set object |
columns_to_add |
A named vector where the name of each element becomes the name of the new colData column and the value is the value for that particular sample. Best used when importing from a metadata table. |
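The shape of columns_to_add can be illustrated with base R alone. This is a minimal sketch, not the package's implementation: a plain data frame stands in for the cell metadata of a cell data set, the column names genotype and timepoint are hypothetical, and the conversion to factors is an assumption based on the function name.

```r
# Stand-in for the cell metadata of one sample (one row per cell).
# A real cell data set object is deliberately not used in this sketch.
cell_metadata <- data.frame(cell_id = paste0("cell_", 1:4))

# Named vector: each name becomes a new column, each value is the
# sample-level value repeated for every cell in that sample.
columns_to_add <- c(genotype = "WT", timepoint = "day3")

for (nm in names(columns_to_add)) {
  cell_metadata[[nm]] <- factor(rep(columns_to_add[[nm]], nrow(cell_metadata)))
}

print(cell_metadata)
```

When importing many samples, one such vector would typically be built per row of the metadata table.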
Add significance annotations to an existing ggplot
add_sig_annotations( plot, p_table, y_npc, group1_col = "group1", group2_col = "group2", label_col = "significance", x_levels = NULL, facet_cols = NULL, draw_brackets = TRUE, bracket_tip_npc = 0.015, bracket_margin_npc = 0.02, text_size_pt = NULL, star_y_npc_offset = 0.01, text_family = NULL, text_face = NULL, text_colour = "black", bracket_colour = "black", bracket_linewidth = 0.4, bracket_linetype = 1, bracket_lineend = "round", vjust = 0 )
plot |
A ggplot object with a discrete x-axis and continuous y-axis. |
p_table |
A data frame containing at minimum |
y_npc |
Numeric vector of label y positions in npc coordinates. |
group1_col, group2_col, label_col
|
Column names in |
x_levels |
Optional character vector giving x-axis order. |
facet_cols |
Optional character vector naming the facet columns in
|
draw_brackets |
Logical; whether brackets should be drawn. |
bracket_tip_npc |
Length of bracket tips in npc units. |
bracket_margin_npc |
Gap between label and bracket top line in npc units. |
text_size_pt |
Text size in points. Defaults to theme text size minus 2. |
star_y_npc_offset |
Upward offset, in npc units, applied only to
star-only labels like |
text_family, text_face, text_colour
|
Text styling parameters. |
bracket_colour, bracket_linewidth, bracket_linetype
|
Bracket styling. |
bracket_lineend |
Line ending for bracket segments. Defaults to "round". |
vjust |
Vertical justification for text. |
A ggplot object with annotation layers appended.
Similar to aggregate. Splits the matrix into groups as
specified by groupings, which can be one or more variables. Aggregation
function will be applied to all columns in data, or as specified in formula.
Warning: groupings will be made dense if it is sparse, though data will not.
## S3 method for class 'Matrix'
aggregate(x, groupings = NULL, form = NULL, fun = "sum", ...)
x |
a |
groupings |
an object coercible to a group of factors defining the groups |
form |
|
fun |
character string specifying the name of aggregation function to be applied to all columns in data. Currently "sum", "count", and "mean" are supported. |
... |
arguments to be passed to or from methods. Currently ignored |
aggregate.Matrix uses its own implementations of functions and should
be passed a string in the fun argument.
A sparse Matrix. The rownames correspond to the values of the
groupings or the interactions of groupings joined by a _.
There is an attribute crosswalk that includes the groupings as a
data frame. This is necessary because it is not possible to include
character or data frame groupings in a sparse Matrix. If needed, one can
cbind(attr(x,"crosswalk"),x) to combine the groupings and the
aggregates.
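The grouped-sum behaviour and the crosswalk attribute described above can be sketched with the Matrix package directly. This is an illustration under assumptions, not the code behind aggregate.Matrix: it uses Matrix::fac2sparse() to build a sparse group-indicator matrix and multiplies it with the data, and the toy data and the grp column name are hypothetical.

```r
library(Matrix)

# Toy sparse matrix: 4 observations (rows) by 2 variables (columns).
x <- Matrix(c(1, 0, 2, 0,
              0, 3, 0, 4),
            nrow = 4, sparse = TRUE,
            dimnames = list(paste0("obs", 1:4), c("a", "b")))

# One grouping variable; observations 1-2 are g1, observations 3-4 are g2.
groupings <- data.frame(grp = factor(c("g1", "g1", "g2", "g2")))

# Sparse indicator matrix: one row per group level, one column per observation.
indicator <- fac2sparse(groupings$grp)

# Grouped sums: the result stays sparse, with group levels as row names.
agg <- indicator %*% x

# Keep the groupings alongside the result, mimicking the crosswalk attribute.
attr(agg, "crosswalk") <- data.frame(grp = rownames(agg))

# Combine groupings and aggregates into one (dense) data frame if needed.
combined <- cbind(attr(agg, "crosswalk"), as.matrix(agg))
print(combined)
```

The final cbind() densifies the aggregates, so it is best reserved for results that are small after grouping.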
An instance of this class is best created by calling bb_parseape() on a GenBank or APE-formatted file. That function will parse the file, correctly format the sections and place them in the slots of the Ape object. Technically only "LOCUS" is a required slot for the Ape object; however, an Ape object is of little use without "ORIGIN" (sequence data), so bb_parseape() will fail without an "ORIGIN" section. Other slots are optional. Additional slots will be ignored by the constructor function. DNA sequence will be stored in a DNAStringSet object and features in a GRanges object. See https://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html for the GenBank file specification.
LOCUS: The LOCUS line of the GenBank file formatted as a character string.
DEFINITION: The DEFINITION line of the GenBank file formatted as a character string.
ACCESSION: The ACCESSION section of the GenBank file formatted as a character string.
VERSION: The VERSION section of the GenBank file formatted as a character string.
SOURCE: The SOURCE section of the GenBank file formatted as a character string.
COMMENT: The COMMENT section of the GenBank file formatted as a character string.
FEATURES: The FEATURES section of the GenBank file formatted as a character string. Created internally from the GRanges object. Caution: some FEATURE attributes may be lost in conversion.
ORIGIN: The DNA sequence.
end_of_file: The end of the file signal.
dna_biostring: The entire ORIGIN sequence formatted as a DNAStringSet of length 1.
granges: GenBank features formatted as a GRanges object.
Get the DNAStringSet Slot from an Ape Object
Ape.DNA(ape)
Save an Ape Instance as a Fasta File
Ape.fasta(ape, feature = NULL, out)
feature |
Name of feature to select when writing FASTA file. If null (default), the whole biostring will be saved as a fasta. |
out |
Name of FASTA file to write |
For the supplied Ape object, run FIMO to identify putative transcription factor binding sites in a DNA subsequence.
Ape.fimo(ape, fimo_feature, out = NULL)
ape |
An Ape instance |
fimo_feature |
A character vector of features from the Ape object that will be used to run fimo. |
out |
Directory that will be created to hold the fimo results. A date/time stamp will be appended. If null, the objects will not be saved and the function will only return a GRanges object |
Get the GRanges Slot from an Ape Object
Ape.granges(ape)
Save an Ape Instance as a GenBank Format File
Ape.save(ape, out)
out |
Name of GenBank/APE file to write |
Set the FEATURES Slot of an Ape Object
Ape.setFeatures(ape, gr)
ape |
An ape object |
gr |
A GRanges object. This object will become the new FEATURES and granges slots for the Ape object. So if you want to keep the old features, the new features need to be appended using c(old_gr, new_gr) as the value for the gr argument. |
Convert objects to Monocle3 cell_data_set objects
as.cell_data_set(x, ...)
## S3 method for class 'Seurat'
as.cell_data_set( x, assay = DefaultAssay(object = x), reductions = AssociatedDimReducs(object = x, assay = assay), default.reduction = DefaultDimReduc(object = x, assay = assay), graph = paste0(assay, "_snn"), group.by = NULL, ... )
x |
An object |
... |
Arguments passed to other methods |
assay |
Assays to convert |
reductions |
A vector of dimensional reductions add to the
|
default.reduction |
Name of dimensional reduction to use for clustering name |
graph |
Name of graph to be used for clustering results |
group.by |
Name of cell-level metadata column to use as identities; pass |
The Seurat method utilizes
as.SingleCellExperiment to transfer over expression
and cell-level metadata. The following additional information is also
transferred over:
Cell embeddings are transferred over to the
reducedDims slot. Dimensional reduction
names are converted to upper-case (e.g. “umap” to “UMAP”) to
match Monocle 3 style
Feature loadings are transferred to
cds@reduce_dim_aux$gene_loadings if present. NOTE: only the
feature loadings of the last dimensional reduction are transferred over
Standard deviations are added to
cds@reduce_dim_aux$prop_var_expl if present. NOTE: only the
standard deviations of the last dimensional reduction are transferred over
Clustering information is transferred over in the following manner: if
cell-level metadata entries “monocle3_clusters” and
“monocle3_partitions” exist, then these will be set as the clusters
and partitions, with no nearest neighbor graph being added to the object;
otherwise, Seurat's nearest-neighbor graph will be converted to an
igraph object and added to the cell_data_set
object along with Seurat's clusters. No partition information is added when
using Seurat's clusters
A cell_data_set object
Convert single cell experiment to Seurat
## S3 method for class 'cell_data_set'
as.Seurat( x, counts = "counts", data = NULL, assay = "RNA", project = "cell_data_set", loadings = NULL, clusters = NULL, ... )
loadings |
Name of dimensional reduction to save loadings to, if present;
defaults to first dimensional reduction present (eg.
|
clusters |
Name of clustering method to use for setting identity classes |
The cell_data_set method for as.Seurat
utilizes the SingleCellExperiment method of
as.Seurat to handle moving over expression data, cell
embeddings, and cell-level metadata. The following additional information
will also be transferred over:
Feature loadings from cds@reduce_dim_aux$gene_loadings will be
added to the dimensional reduction specified by loadings or the name
of the first dimensional reduction that contains "pca" (case-insensitive) if
loadings is not set
Monocle 3 clustering will be set as the default identity class. In addition, the Monocle 3 clustering will be added to cell-level metadata as “monocle3_clusters”, if present
Monocle 3 partitions will be added to cell-level metadata as “monocle3_partitions”, if present
Monocle 3 pseudotime calculations will be added to “monocle3_pseudotime”, if present
The nearest-neighbor graph, if present, will be converted to a
Graph object, and stored as
“assay_monocle3_graph”
as.Seurat.SingleCellExperiment
Generates a matrix of counts aggregated by gene and/or cell group.
bb_aggregate( obj, assay = "RNA", experiment_type = "Gene Expression", gene_group_df = NULL, cell_group_df = NULL, norm_method = c("log", "binary", "size_only"), pseudocount = 1, scale_agg_values = TRUE, max_agg_value = 3, min_agg_value = -3, binary_min = 0, exclude.na = TRUE )
obj |
A Seurat or cell data set object |
assay |
Gene expression assay to use for aggregation; currently only applies to Seurat objects, Default: 'RNA' |
gene_group_df |
A 2-column dataframe with gene names or ids and gene groupings, Default: NULL |
cell_group_df |
A 2-column dataframe with cell ids and cell groupings, Default: NULL |
norm_method |
Gene normalization method, Default: c("log", "binary", "size_only") |
pseudocount |
Pseudocount, Default: 1 |
scale_agg_values |
Whether to scale the aggregated values, Default: TRUE |
max_agg_value |
If scaling, make this the maximum aggregated value, Default: 3 |
min_agg_value |
If scaling, make this the minimum aggregated value, Default: -3 |
binary_min |
Minimum value below which a cell is considered not to express a feature, Default: 0 |
exclude.na |
Exclude NA?, Default: TRUE |
The best way to group genes or cells is by using bb_*meta and then select cell_id or feature_id plus one metadata column with your group labels.
A dense or sparse matrix.
cli_div, cli_alert
normalized_counts, my.aggregate.Matrix
Align a CDS object according to a metadata variable
bb_align(cds, align_by, n_cores = 8)
cds |
A cell data set object to align |
align_by |
A metadata column to align by |
n_cores |
Number of cores to use for dimension reduction, Default: 8 |
A modified cell data set object with aligned dimensions and new metadata columns holding prealignment umap coordinates.
Annotate a Plot using NPC Coordinates
bb_annotate_npc(label, x, y, ...)
label |
the text label to apply to the plot |
x |
NPC X coordinate |
y |
NPC Y coordinate |
Will copy and rename the files and generate two files: "blinding_key.csv" with the original and blinded file names, and "scoresheet.csv" with just the blinded filenames. Add columns as needed to scoresheet, for example, runx_count. Then run bb_unblind to rejoin scoresheet to the key and generate an unblinded result file.
bb_blind_images(analysis_file, file_column, output_dir)
analysis_file |
The analysis file for the experiment. It should contain 1 line for every biological sample and should have a filename for every file to be blinded. |
file_column |
The column name (substring match) in the analysis_file with the files to be blinded. |
output_dir |
The linux-style file path for the directory that will hold the blinded images. The directory will be created by the function. |
nothing
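The relationship between the blinding key, the scoresheet and the later unblinding step can be sketched in base R. This is only an illustration of the bookkeeping, with no files copied: the file names, the blinded_NNN naming scheme and the column names original and blinded are hypothetical and are not taken from bb_blind_images or bb_unblind.

```r
set.seed(1)  # only to make this sketch reproducible

# Stand-in analysis file: one row per biological sample.
analysis <- data.frame(
  sample = c("s1", "s2", "s3"),
  image_file = c("ctrl_1.tif", "ctrl_2.tif", "mut_1.tif")
)

# Assign blinded names in random order (hypothetical naming scheme).
blinded <- sprintf("blinded_%03d.tif", sample(seq_len(nrow(analysis))))

# The key links original and blinded names; the scoresheet shows only blinded names.
blinding_key <- data.frame(original = analysis$image_file, blinded = blinded)
scoresheet <- data.frame(blinded = sort(blinded))

# The scorer adds columns as needed, for example runx_count.
scoresheet$runx_count <- c(12, 7, 9)

# Unblinding: rejoin the scoresheet to the key.
unblinded <- merge(blinding_key, scoresheet, by = "blinded")
print(unblinded)
```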
This function does several things. It removes ranges with non-standard chromosomes and drops their levels. It will optionally set the genome to the user-provided value. Typically we would use "hg38" or "danRer11". This is the exported version because it is so useful.
bb_buff_granges(x, gen)
x |
A GRanges object to buff. |
gen |
An optional genome name to provide. Recommend "hg38" or "danRer11". |
A GRanges object
Use this function to transfer cell labels from one single cell dataset to another. If a cds is provided for either reference or query, it is converted to a seurat object and the labels are transferred to the query by anchor finding. The assignments will be in the form of a new cds column with name predicted."column name from reference". This should be unique on the first application of the function to a query dataset. However, if running queries against more than 1 reference data set it is possible that you will unintentionally generate the same column name which would overwrite the first assignment column. The function checks for this and aborts with the recommendation to supply a unique id to the unique_id parameter.
bb_cds_anno(query_cds, ref, transfer_col, unique_id = NULL)
query_cds |
The single cell data set you wish to annotate. Must be a CDS. |
ref |
The reference single cell data set. May be either a CDS or Seurat object. |
transfer_col |
The column from the reference data set that provides the labels. |
unique_id |
A unique identifier to add to the column with the transferred labels. It is recommended to provide an informative label when annotating against more than one reference. Default: NULL |
a CDS with two new cell metadata columns
cli_abort
components, SCTransform, RunPCA, RunUMAP, FindTransferAnchors, MapQuery
exprs
reexports
select, mutate-joins
as_tibble
Supply a cds subset and aggregation values and get a heatmap. Plot the return value using cowplot::plot_grid(). This function wraps several complicated functions from ComplexHeatmap and tries to apply default values for the most common use case: plotting top markers from a cds with cells grouped arbitrarily (usually by cluster of some type). This function provides the option to aggregate by genes as well in order to plot gene modules. For cell and gene aggregation, provide a name (in the form of a string) of the corresponding metadata column and the aggregation will be performed. There are many aesthetic parameters for this function and even more available for the internal ComplexHeatmap::Heatmap() function. The best way to adjust parameters not provided here is to scroll the popup box that appears when you type ComplexHeatmap::Heatmap() in RStudio and pass them in via the ellipsis. The default is to put genes in columns and cells in rows. You can flip this behavior by setting flip_axis = TRUE. This will require adjustment of some aesthetic parameters. Complex annotations (beyond labeling cherry-picked genes) are not currently supported.
bb_cds_heatmap( cds_subset, cellmeta_col = NULL, rowmeta_col = NULL, heatmap_highlights = NULL, three_colors = c("blue4", "ivory", "red3"), flip_axis = FALSE, name = NULL, heatmap_legend_param = list(title_gp = gpar(fontface = "plain", fontsize = 9), grid_width = unit(0.14, "in"), labels_gp = gpar(fontsize = 8)), row_dend_width = unit(5, "mm"), column_dend_height = unit(5, "mm"), column_dend_side = "bottom", show_row_names = T, row_names_gp = gpar(fontsize = 9), show_column_names = F, row_dend_gp = gpar(lwd = 0.5), column_dend_gp = gpar(lwd = 0.5), row_title = NULL, column_title = NULL, padding = 1.5, labels_rot = 45, ... )
cds_subset |
The subset of cells and genes you want to plot as a heatmap. Best approach is to pipe the cds through filter_cds() and into this function. |
cellmeta_col |
The name of a cell metadata column to aggregate cells by; one of cellmeta_col and rowmeta_col must not be NULL, Default: NULL |
rowmeta_col |
The name of a row metadata column to aggregate genes by; one of cellmeta_col and rowmeta_col must not be NULL, Default: NULL |
heatmap_highlights |
A vector of gene names to highlight using anno_mark(), Default: NULL |
three_colors |
A vector of colors for the main color scale, Default: c("blue4", "ivory", "red3") |
flip_axis |
Logical; whether to plot genes as rows (TRUE) or columns (FALSE), Default: FALSE |
name |
Name of the main color scale, Default: NULL |
heatmap_legend_param |
Graphical parameters for the main heatmap legend, Default: list(title_gp = gpar(fontface = "plain", fontsize = 9), grid_width = unit(0.14, "in"), labels_gp = gpar(fontsize = 8)) |
row_dend_width |
Row dendrogram width, Default: unit(5, "mm") |
column_dend_height |
Column dendrogram height, Default: unit(5, "mm") |
column_dend_side |
Side on which to plot the column dendrogram, Default: 'bottom' |
show_row_names |
Logical; whether or not to show rownames, Default: T |
row_names_gp |
Graphical parameters for the row names, Default: gpar(fontsize = 9) |
show_column_names |
Logical; whether or not to show column names, Default: F |
row_dend_gp |
Graphical parameters for the row dendrogram, Default: gpar(lwd = 0.5) |
column_dend_gp |
Graphical parameters for the column dendrogram, Default: gpar(lwd = 0.5) |
row_title |
Row title text, Default: NULL |
column_title |
Column title text, Default: NULL |
padding |
Padding between gene names on the heatmap highlights, Default: 1.5 |
labels_rot |
Rotation of the heatmap highlight labels, Default: 45 |
... |
Optional arguments to pass to ComplexHeatmap::Heatmap() |
A complex heatmap in the form of a gtree.
Use this function to identify ligand/receptor pairs expressed by cell clusters in human, mouse or zebrafish single cell data. A CellChat object is generated which can be used to visualize these connections using bb_cellchat_heatmap or other tools from package CellChat.
bb_cellchat( cds, group_var, n_cores = 12, species = c("human", "mouse", "zebrafish"), min_cells = 10, prob_type = c("triMean", "truncatedMean", "median"), prob_trim = NULL, project = TRUE, pop_size_arg = TRUE, ask = TRUE )
cds |
The cell data set object. It should usually be pre-filtered to contain a single biological sample. |
group_var |
The cell metadata column identifying cell groups for cell-cell communication inference. |
n_cores |
Number of cores for the analysis, Default: 12 |
species |
Species for the assay, Default: c("human", "mouse", "zebrafish") |
min_cells |
Cell clusters smaller than this value will be ignored, Default: 10 |
prob_type |
Method for computing the average gene expression per cell group. The default, "triMean", produces fewer but stronger interactions. When setting type = "truncatedMean", a value should be assigned to trim, which produces more interactions. Default: c("triMean", "truncatedMean", "median") |
prob_trim |
the fraction (0 to 0.25) of observations to be trimmed from each end of x before the mean is computed if using truncatedMean, Default: NULL |
project |
Whether or not to smooth gene expression, Default: TRUE |
pop_size_arg |
Whether to consider the proportion of cells in each group across all sequenced cells. Set population.size = FALSE if analyzing sorting-enriched single cells, to remove the potential artifact of population size. Set population.size = TRUE if analyzing unsorted single-cell transcriptomes, because abundant cell populations tend to send collectively stronger signals than rare cell populations. Default: TRUE |
see github::sqjin/CellChat
A CellChat object
normalized_counts
mutate-joins,pull
tibble
createCellChat, CellChatDB.human, CellChatDB.mouse, subsetData, identifyOverExpressedGenes, identifyOverExpressedInteractions, projectData, computeCommunProb, filterCommunication, computeCommunProbPathway, aggregateNet
plan
This will generate a heatmap from a CellChat object using ComplexHeatmap::Heatmap. Options are provided to filter for sender and receiver cells, to generate simple marginal annotations and for aesthetic control.
bb_cellchat_heatmap( object, source_filter = NULL, target_filter = NULL, interaction_filter = NULL, interaction_threshold = 0, colors = c("transparent", "red3"), rowanno = c(NULL, "Annotation", "Pathway"), rowanno_colors = NULL, colanno = c(NULL, "Source", "Target"), colanno_colors = NULL, pval_filter = 0.05, heatmap_name = "Interaction\nScore", heatmap_show_row_dend = TRUE, heatmap_row_dend_width = unit(5, "mm"), heatmap_show_column_dend = TRUE, heatmap_column_dend_height = unit(5, "mm"), heatmap_row_names_gp = gpar(fontsize = 10), heatmap_column_names_gp = gpar(fontsize = 10), heatmap_column_names_rot = 90, heatmap_column_title = NULL, heatmap_column_title_gp = gpar(fontsize = 12, fontface = "bold"), col_anno_name_gp = gpar(fontsize = 10, fontface = "bold"), row_anno_name_gp = gpar(fontsize = 10, fontface = "bold"), return_value = c("heatmap", "plot", "matrix") )
object |
The CellChat object to plot |
source_filter |
Optional filter for source cell clusters from the object metadata. Accepts a single string or vector of cell groups, Default: NULL |
target_filter |
Optional filter for target cell clusters from the object metadata. Accepts a single string or vector of cell groups, Default: NULL |
interaction_filter |
Optional filter to include only certain interactions in the figure. |
interaction_threshold |
Optional filter to only include interactions above a certain threshold. |
colors |
Color scale endpoints, Default: c("transparent", "red3") |
rowanno |
Options for simple row annotation; must be one of c(NULL, "Annotation", "Pathway") |
rowanno_colors |
Optional colors to replace the poor color selections from Complex heatmap. Must be supplied as a named list with one element each for "Annotation" and "Pathway". Not required if not showing these annotations. The list should be of the form: list(Annotation = c("name1" = "color value1", "name2" = "color_value2")), Default: NULL |
colanno |
Options for simple column annotation; must be one of c(NULL, "Source", "Target") |
colanno_colors |
See rowanno_colors, Default: NULL |
pval_filter |
Filter for significance of associations. CellChat returns pvalues of 0, 0.01, and 0.05; this function will filter and retain values less than or equal to the provided value. Default: 0.05 |
heatmap_name |
Name for the main color scale of the heatmap, Default: "Interaction\nScore" |
heatmap_show_row_dend |
Show row dendrograms? Default: TRUE |
heatmap_row_dend_width |
Width of row dendrograms Default: unit(5, "mm") |
heatmap_show_column_dend |
Show column dendrograms? Default: TRUE |
heatmap_column_dend_height |
Height of column dendrograms. Default: unit(5, "mm") |
heatmap_row_names_gp |
Row name graphical params, Default: gpar(fontsize = 10) |
heatmap_column_names_gp |
Column name graphical params, Default: gpar(fontsize = 10) |
heatmap_column_names_rot |
Column name rotation, Default: 90 |
heatmap_column_title |
Column title, Default: NULL |
heatmap_column_title_gp |
Column title graphical params, Default: gpar(fontsize = 12, fontface = "bold") |
col_anno_name_gp |
Column annotation name graphical params, Default: gpar(fontsize = 10, fontface = "bold") |
row_anno_name_gp |
Row annotation name graphical params, Default: gpar(fontsize = 10, fontface = "bold") |
return_value |
What to return; one of "heatmap", "plot", or "matrix". |
see github::sqjin/CellChat
a heatmap as a grid object; plot using cowplot::plot_grid
subsetCommunication
as_tibble, rownames
filter,mutate,select,mutate-joins,group_by,summarise
pivot_wider
rowAnnotation,columnAnnotation,draw-dispatch,Heatmap
colorRamp2
grid.grab
Take a cell_data_set object or a Seurat object and return the cell metadata in the form of a tibble. The unique cell identifier column is labeled cell_id by default. Prior versions of this function would only accept a cell_data_set. The input argument has been changed from cds to obj to reflect the fact that Seurat objects are now also accepted.
bb_cellmeta(obj, row_name = "cell_id", cds = NULL)
obj |
A cell_data_set or Seurat object. |
row_name |
Optional name to provide for cell unique identifier, Default: 'cell_id' |
cds |
Provided for compatibility with prior versions, Default: NULL |
If a value is supplied for cds, a warning will be issued and the function will pass the value of cds to obj.
A tibble
Requires a cds with an alt experiment established. Use bb_split_citeseq to generate this and to normalize binding data using the CLR method. Returns a ggplot.
bb_cite_umap( cds, antibody, assay = "CLR_counts", cell_size = 1, alpha = 1, alt_dim_x = NULL, alt_dim_y = NULL, plot_title = NULL, color_legend_title = NULL, order = TRUE, rescale = NULL, ncol = NULL )
cds |
The cds with an "Antibody Capture" alt experiment to plot. |
antibody |
The name of the antibody to plot. Equivalent to gene_short_name. Accepts a character vector. |
assay |
The binding assay to use, Default: "CLR_counts" |
cell_size |
Size of points to plot, Default: 1 |
alpha |
Alpha for the plotted points, Default: 1 |
alt_dim_x |
Alternate/reference dimensions to plot by. |
alt_dim_y |
Alternate/reference dimensions to plot by. |
plot_title |
Optional title for the plot, Default: NULL |
color_legend_title |
Optional title for the color scale, Default: NULL |
order |
Whether or not to order cells by gene expression. When ordered, non-expressing cells are plotted first, i.e. on the bottom. Default: TRUE |
rescale |
Optional redefinition of the color scale, Default: NULL |
ncol |
If specified, the number of columns for facet_wrap, Default: NULL |
a ggplot
Use this function to determine the differential representation of cells in clusters. It will determine fold change in a single experimental class over a single control or reference class. This value is normalized to the number of cells captured in all clusters from the class. Significance is determined using Fisher's exact test. This test may overestimate significance in large data sets. In this case, bb_cluster_representation2 may be more robust.
bb_cluster_representation( cds, cluster_var, class_var, experimental_class, control_class, pseudocount = 1, return_value = c("table", "plot") )
cds |
A cell data set object |
cluster_var |
The CDS cell metadata column holding cluster data. There can be any number of clusters in this column. |
class_var |
The CDS cell metadata column holding sample class data. There can be only 2 classes in this column. You may need to subset or reclass the samples to achieve this. |
experimental_class |
The value from the class column indicating the experimental group. |
control_class |
The value from the class column indicating the control or reference class. |
return_value |
Option to return either a plot or a data table for plotting in a separate step. Must be either "plot" or "table". |
A ggplot or a table of data for plotting
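The calculation described for bb_cluster_representation can be sketched for a single cluster in base R. This is a hedged illustration, not the package's code: the exact fold-change formula, the placement of the pseudocount and the layout of the contingency table are assumptions, and the toy counts and labels are hypothetical.

```r
# Toy cell metadata: 100 control cells and 100 treated cells in two clusters.
cells <- data.frame(
  cluster = c(rep("c1", 30), rep("c2", 70), rep("c1", 60), rep("c2", 40)),
  class   = c(rep("control", 100), rep("treated", 100))
)

counts <- table(cells$cluster, cells$class)
pseudocount <- 1
target <- "c1"

in_exp  <- counts[target, "treated"]   # experimental cells in the cluster
in_ctl  <- counts[target, "control"]   # control cells in the cluster
tot_exp <- sum(counts[, "treated"])    # all experimental cells captured
tot_ctl <- sum(counts[, "control"])    # all control cells captured

# Fold change normalized to the total cells per class (assumed formula).
fold_change <- ((in_exp + pseudocount) / tot_exp) /
  ((in_ctl + pseudocount) / tot_ctl)

# 2 x 2 table: in cluster vs. not in cluster, experimental vs. control.
contingency <- matrix(
  c(in_exp, tot_exp - in_exp, in_ctl, tot_ctl - in_ctl),
  nrow = 2,
  dimnames = list(c("in_cluster", "other"), c("treated", "control"))
)
p_value <- fisher.test(contingency)$p.value

print(c(log2_fold_change = log2(fold_change), p_value = p_value))
```

With tens of thousands of cells, even small shifts produce very small p-values from this test, which is the overestimation the description warns about.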
Use this function to determine the differential representation of cells in clusters. It uses a regression method to determine fold change between groups of biological samples. It can only compare two sample groups, e.g. control vs experimental at this point. See parameter descriptions for how to identify these properly.
bb_cluster_representation2( obj, sample_var, cluster_var, comparison_var, comparison_levels = NULL, color_pal = c("red3", "blue4"), sig_val = c("FDR", "PValue"), return_val = c("plot", "data") )
obj |
The (possibly filtered) single cell object to operate on. Can be either Seurat or monocle/CDS object. |
sample_var |
The metadata column holding the biological sample information. |
cluster_var |
The metadata column holding the clustering or other cell classification information. |
comparison_var |
The metadata column holding the comparison group information. There can be only two levels in this column. Character data will be converted to factors. |
comparison_levels |
A character vector identifying the order of the levels to compare. The first value will be shown with negative log2Fold Change and the second will be positive. If NULL (default), R will pick for you. |
color_pal |
Color palette for the comparison levels, Default: c("red3", "blue4") |
sig_val |
Report PValue or FDR, Default: "FDR" |
return_val |
Value to return, Default: c("plot", "data") |
http://bioconductor.org/books/3.13/OSCA.multisample/differential-abundance.html
This function takes a cell data set object and a cell metadata variable as input. The latter is specified as a named list: the name is the cell metadata variable name, and each element must be a character vector of length 2 specifying the levels of that variable to compare. For example, for a column named "genotype" with levels "WT", "heterozygote", and "homozygote", you would compare differential abundance between WT and homozygote like this: list(genotype = c("WT", "homozygote")). Cells with these classifications will get a differential abundance score. Negative values will be assigned to the first element (WT in this example) and positive values to the second element (homozygote in this example). Cells with other values for this variable ("heterozygote" in this example) will get an NA. Multiple comparisons can be performed for different levels within the same variable, or for different variables altogether, by providing additional elements to the list, e.g. list(genotype = c("WT", "heterozygote"), genotype = c("WT", "homozygote")). For each comparison, a new cell metadata column will be added in the form da_score_name_level1_level2, e.g. da_score_genotype_WT_homozygote.
bb_daseq(obj, comparison_list, sample_var = "sample")
obj |
a cell data set object |
comparison_list |
a named list as specified in description |
sample_var |
the cell metadata variable holding biological sample information, Default: 'sample' |
a cell data set
cli_div, cli_alert
map2, reexports, pmap, reduce
filter, pull, mutate-joins, select, join_by, rename
reducedDims
getDAcells
as_tibble
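As a rough illustration of the comparison_list convention, the base R sketch below builds a list with two comparisons on the same variable and derives the column names described above. The naming helper and the toy genotype vector are illustrative assumptions; bb_daseq itself is not called.

```r
# Two comparisons on the same metadata variable; repeated names are allowed in a list.
comparison_list <- list(
  genotype = c("WT", "heterozygote"),
  genotype = c("WT", "homozygote")
)

# Column names in the documented form da_score_name_level1_level2.
da_colnames <- mapply(
  function(var, lv) paste("da_score", var, lv[1], lv[2], sep = "_"),
  names(comparison_list),
  comparison_list,
  USE.NAMES = FALSE
)
da_colnames

# Toy cell metadata: only cells in the compared levels would receive a score;
# the rest would be NA for that comparison.
genotype <- c("WT", "heterozygote", "homozygote", "WT")
scored_in_second_comparison <- genotype %in% comparison_list[[2]]
scored_in_second_comparison
```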
Use doubletfinder to model and mark doublets
bb_doubletfinder(cds, doublet_prediction, qc_table, ncores = 1)
cds |
A cell data set object |
doublet_prediction |
Predicted proportion of doublets from 0 to 1 |
qc_table |
A table of qc calls from the blaseRtools qc function |
A tibble of low- and high-confidence doublet calls by barcode
Use this to extract gene sets from the MSIGDB. Most gene sets are known by "STANDARD_NAME". You can filter the gene set list by supplying a named filter list to the bb_extract_msig function. The name of each list element should be one of the metadata column names and the list element contents should be the values to filter for. Filtering works in an additive way, meaning if you supply a filter list with two elements it will extract gene sets passing filters 1 AND 2.
bb_extract_msig( filter_list = NULL, return_form = c("id_list", "name_list", "tibble") )
filter_list |
A named list to filter the MSIGDB by. Defaults to NULL which will return the whole MSIGDB |
return_form |
Select from a list of gene ids or a list of gene names by gene set. This is a useful format for the fgsea package. Alternatively, "tibble" can be selected and all filtered gene sets will be bound into a long-form (tidy) tibble. |
Gene set as a list or tibble.
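The AND behaviour of a named filter list can be sketched on a toy table. Only STANDARD_NAME is taken from the description above; the other column names and all values are hypothetical, and the real MSIGDB metadata may differ.

```r
# Toy stand-in for gene set metadata (CATEGORY_CODE and ORGANISM are hypothetical columns).
msig_toy <- data.frame(
  STANDARD_NAME = c("SET_A", "SET_B", "SET_C"),
  CATEGORY_CODE = c("H", "C2", "H"),
  ORGANISM = c("Homo sapiens", "Homo sapiens", "Mus musculus"),
  stringsAsFactors = FALSE
)

# Each list element names a metadata column and gives the values to keep.
filter_list <- list(CATEGORY_CODE = "H", ORGANISM = "Homo sapiens")

# A gene set is kept only if it passes every filter (filter 1 AND filter 2).
passes <- lapply(names(filter_list), function(col) {
  msig_toy[[col]] %in% filter_list[[col]]
})
keep <- Reduce(`&`, passes)

msig_toy$STANDARD_NAME[keep]
```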
This will only work if the backslashes have already been escaped to double backslashes. For example, readr does this automatically when reading Windows file paths from csv files. If you are using this as a standalone function, run scan(what = "character", n = 1) in the R console, paste the unmodified file path at the blank line, and supply the modified file path that is returned as the argument x.
bb_fix_file_path(x)
x |
A character string filepath copied from Windows with escaped backslashes. |
A linux-compatible filepath
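To show why the escaping matters, here is a base R sketch of the separator swap alone. It is an assumption about one part of the conversion; any drive-letter or mount-point mapping that bb_fix_file_path may perform is not shown, and the path is invented.

```r
# In R source code a Windows path must be written with escaped backslashes.
# The string below holds single backslashes once parsed.
x <- "C:\\Users\\me\\data.csv"

# Replace every backslash with a forward slash (fixed = TRUE avoids regex escaping).
swapped <- gsub("\\", "/", x, fixed = TRUE)
swapped
```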
Often you will wish to package a Signac object or share the object with someone who does not have access to the same directories as you. This is a problem because one of the internal sub-objects, the Fragment object, holds a file path to a directory containing an ATAC fragments file and its .tbi index. Often this file will only be accessible to you. This function allows you to replace the fragments file path that is held internally within the Signac object. Just copy the files to a shared location and provide that file location to the function.
For Signac objects with multiple internal Fragments objects, provide a vector of file paths.
For packaged fragments files, use system.file or fs::path_package to access this, usually from extdata.
bb_fragment_replacement(obj, new_paths)
obj |
A Signac object. |
new_paths |
A character vector of new file paths. You must have the same number of new file paths as Fragments sub-objects within the signac object. |
A modified Signac object.
c("Cells.Fragment", "CountFragments", "CreateFragmentObject", "FilterCells", "Fragment-class", "Fragments", "Fragments", "SplitFragments", "UpdatePath", "ValidateCells", "ValidateFragments", "ValidateHash", "head.Fragment", "subset.Fragment"), CreateFragmentObject
cli_abort
file_access
map2
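Before replacing paths, it may help to confirm that each fragments file and its index were actually copied. The sketch below uses empty placeholder files in a temporary directory with invented file names; it only illustrates the check, and the naming of the index as the fragments path plus ".tbi" is an assumption.

```r
# Hypothetical shared location, simulated here with a temporary directory.
shared_dir <- file.path(tempdir(), "shared_fragments")
dir.create(shared_dir, showWarnings = FALSE)

# One path per Fragments sub-object in the Signac object (two in this toy case).
new_paths <- file.path(
  shared_dir,
  c("sample1_fragments.tsv.gz", "sample2_fragments.tsv.gz")
)

# Create empty placeholder files so the check below can run.
invisible(file.create(new_paths, paste0(new_paths, ".tbi")))

# Each fragments file should sit next to its index.
ok <- file.exists(new_paths) & file.exists(paste0(new_paths, ".tbi"))
ok
```

With real files, the vector new_paths would then be passed as the new_paths argument.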
Make a dotplot of gene expression by cell population
bb_gene_dotplot( cds, markers, group_cells_by, reduction_method = "UMAP", norm_method = c("size_log", "log_only"), scale_expression_by_gene = FALSE, lower_threshold = 0, max.size = 10, group_ordering = "bicluster", gene_ordering = NULL, pseudocount = 1, scale_max = 3, scale_min = -3, colorscale_name = NULL, sizescale_name = NULL, ... )
cds |
A cell data set object |
markers |
A character vector of genes to plot |
group_cells_by |
A cds colData column. Use "multifactorial" to pick 2 categorical variables to put on X axis and to facet by. See ordering below. |
norm_method |
How to normalize gene expression. Size_factor and log normalized or only log normalized. |
scale_expression_by_gene |
Whether to scale expression values according to gene. Defaults to FALSE. |
lower_threshold |
Lower cutoff for gene expression |
max.size |
The maximum size of the dotplot |
group_ordering |
Defaults to "bicluster", the biclustering method from pheatmap. Optionally will take a vector of group values to set the axis order explicitly. If using group_cells_by = "multifactorial" you will need a df to define facet and axis levels. See example. |
gene_ordering |
Optional vector of gene names to order the plot. |
pseudocount |
Add to zero expressors. Default = 1 |
scale_max |
Expression scale max |
scale_min |
Expression scale min |
colorscale_name |
Label for the color scale |
sizescale_name |
Label for the size scale |
... |
Additional parameters to pass to facet_wrap. |
A ggplot
Based on Monocle3's gene module functions. Implemented with default values. Will convert a Seurat object to a cell data set using SeuratWrappers and then calculate modules. The function returns an object of the same type.
bb_gene_modules(obj, n_cores = 8, cds = NULL)
obj |
A single cell object of type Seurat or cell_data_set. |
n_cores |
Number of processor cores to use for the analysis, Default: 8 |
cds |
Provided for backward compatibility. If a value is supplied, it will return a warning and transfer to the obj argument. Default: NULL |
see https://cole-trapnell-lab.github.io/monocle3/docs/differential/#gene-modules
An object of the same type: Seurat or cell_data_set
graph_test, find_gene_modules
rename, mutate-joins, mutate, select
fct_shift
Plot expression of a gene or genes in pseudotime.
bb_gene_pseudotime( cds_subset, min_expr = NULL, cell_size = 0.75, nrow = NULL, ncol = 1, panel_order = NULL, color_cells_by = "pseudotime", trend_formula = "~ splines::ns(pseudotime, df=3)", label_by_short_name = TRUE, vertical_jitter = NULL, horizontal_jitter = NULL )
cds_subset |
A cell data set object subset with only cells and genes of interest |
min_expr |
Lower threshold of expression for plotting |
cell_size |
Size of point for plotting |
nrow |
Number of rows for facetting |
ncol |
Number of columns for facetting |
panel_order |
Character string for order of genes to plot |
color_cells_by |
A cds colData column |
trend_formula |
Formula for the trend line |
label_by_short_name |
Boolean to label by gene name or ID |
vertical_jitter |
Adjustment to vertical jitter. Optional |
horizontal_jitter |
Adjustment to horizontal jitter. Optional |
A ggplot
Takes in a Seurat or cell_data_set object, extracts UMAP dimensions and gene expression values. For Seurat, the default assay is "RNA"; it can be changed if necessary. For cell_data_set, the assay parameter does nothing; the function extracts log and size-factor normalized counts which are similar but not identical to the Seurat "RNA" assay. If a vector of genes is supplied to gene_or_genes, a faceted plot will be generated. If a dataframe is supplied, an aggregated plot will be generated with a facet for each gene group. The dataframe must have 2 columns: the first containing feature ids and the second containing grouping information. This is best generated using bb_rowmeta.
bb_gene_umap( obj, gene_or_genes, assay = "RNA", order = TRUE, cell_size = 1, alpha = 1, ncol = NULL, plot_title = NULL, color_legend_title = "Expression", max_expr_val = NULL, alt_dim_x = NULL, alt_dim_y = NULL, rasterize = FALSE, raster_dpi = 300, cds = NULL )
obj |
A Seurat or cell_data_set object. |
gene_or_genes |
Individual gene or genes or aggregated genes to plot. Supply a character string for a single gene, a vector for multiple genes or a dataframe for aggregated genes. See description. |
assay |
For Seurat objects only: the gene expression assay to get expression data from, Default: 'RNA' |
order |
Whether or not to order cells by gene expression. When ordered, non-expressing cells are plotted first, i.e. on the bottom. Caution: when many cells are overplotted it may lead to a misleading presentation. Generally bb_genebubbles is a better way to present, Default: TRUE |
cell_size |
Size of the points, Default: 1 |
alpha |
Transparency of the points, Default: 1 |
ncol |
Specify the number of columns if faceting, Default: NULL |
plot_title |
Optional title for the plot, Default: NULL |
color_legend_title |
Option to change the color scale title, Default: 'Expression' |
max_expr_val |
Maximum expression value to cap the color scale, Default: NULL |
alt_dim_x |
Alternate/reference dimensions to plot by. |
alt_dim_y |
Alternate/reference dimensions to plot by. |
rasterize |
Whether to render the graphical layer as a raster image. Default is FALSE. |
raster_dpi |
If rasterize then this is the DPI used. Default = 300. |
cds |
Provided for backward compatibility. If a value is supplied, a warning will be emitted. Default: NULL |
A ggplot
normalized_counts
as_tibble
pivot_longer
mutate-joins, mutate, select, arrange
ggplot, aes, geom_point, scale_colour_viridis_d, labs, facet_wrap, vars, theme, margin
Make a violin plot of gene expression
bb_gene_violinplot( cds, variable, genes_to_plot, experiment_type = "Gene Expression", pseudocount = 1, include_jitter = FALSE, ytitle = "Expression", plot_title = NULL, rows = 1, show_x_label = TRUE, legend_pos = "none", comparison_list = NULL, palette = NULL, violin_alpha = 1, jitter_alpha = 1, jitter_color = "black", jitter_fill = "transparent", jitter_size = 0.5, facet_scales = "fixed", order_genes = TRUE, jitter_match = FALSE, rasterize = FALSE, raster_dpi = 300 )
cds |
A cell data set object |
variable |
Stratification variable for x-axis |
genes_to_plot |
Either a character vector of gene short names or a tbl/df where the first column is gene short name and the second is the gene grouping. |
pseudocount |
Value to add to zero-cells |
include_jitter |
Include jitter points |
ytitle |
Title for y axis |
plot_title |
Main title for the plot |
rows |
Number of rows for facetting |
show_x_label |
Option to show x label |
legend_pos |
Position for the legend |
comparison_list |
Optional list of comparisons for ggpubr |
palette |
Color palette to use. Viridis is default. |
violin_alpha |
Alpha value for violin plot |
jitter_alpha |
Alpha value for jitter plot |
jitter_color |
Color for the jitter plot. Defaults to black and ignored if jitter_match == TRUE |
jitter_fill |
Fill for the jitter plot |
jitter_size |
Size of the jitter points |
facet_scales |
Scale option for facetting. "fixed" is default |
order_genes |
If true, put genes in the same order as variable parameter |
jitter_match |
If true, match jitter color to violin fill. |
rasterize |
Whether to render the graphical layer as a raster image. Default is FALSE. |
raster_dpi |
If rasterize then this is the DPI used. Default = 300. |
A ggplot
This is a very data-dense plot and is the recommended way for showing expression of single markers/genes by cell group. By default, this function will return an unfaceted ggplot with cell groups on the X axis and genes on the Y axis with dot size representing proportion of cells in the cell group expressing a gene and color scale representing per-cell expression.
It may also be of interest to add aesthetic variables such as facets or additional color scales. There are two ways this function will facilitate that. First, you can supply a vector of cell groups to the cell_grouping argument and the cells will be grouped by the composite value of these factors. Usually if you are doing this, you will also want to have access to the components of this composite variable to facet by. So, second, you can supply "data" to the return_value argument to get a tibble. From there you can modify as necessary and generate a ggplot, assigning aesthetics and scales as desired and using geom_point.
This function also supports visualizing citeseq data. These data should be allocated to an alternative experiment in the cds object. To show these data, set experiment_type to "Antibody Capture" or the name of the alternate experiment with citeseq data. The genes parameter should be the name assigned to the antibody derived tag. Expression threshold is particularly useful in this case because of the background binding observed with antibodies. The default is 0 and so by default any cell with more than 0 counts will be considered an expressor of that marker. This threshold is applied before scaling across markers. The best way to set this threshold is to visualize your markers of interest and isotypes with expression_threshold = 0 and scale_expr = FALSE. Then pick a threshold value based on the color scale and rerun with scale_expr either TRUE or FALSE.
bb_genebubbles( obj, genes, cell_grouping, experiment_type = "Gene Expression", scale_expr = TRUE, expression_threshold = 0, gene_ordering = c("bicluster", "as_supplied"), group_ordering = c("bicluster", "as_supplied"), return_value = c("plot", "data") )
obj |
A Seurat or cell_data_set object. |
genes |
Gene or genes to plot. |
cell_grouping |
Cell metadata column to group cells by. Supply more than one in a vector to generate a composite variable. |
experiment_type |
Experiment data to plot. Usually will be either "Gene Expression" or "Antibody Capture", Default: 'Gene Expression' |
scale_expr |
Whether to scale expression by gene, Default: TRUE |
expression_threshold |
Pre-scaling expression value below which a cell is considered not to express a marker. This value is fed to the binary_min parameter of bb_aggregate, Default = 0 |
gene_ordering |
By default, genes will be ordered by a clustering algorithm. Supply "as_supplied" to plot the genes in the order supplied to the "genes" argument, Default: c("bicluster", "as_supplied") |
group_ordering |
By default, cell groups will be ordered by a clustering algorithm. Supply "as_supplied" to plot the cell groups in the order supplied to "cell_grouping", Default: c("bicluster", "as_supplied") |
return_value |
Whether to return a plot or data in tibble form, Default: c("plot", "data") |
A ggplot or a tibble
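The two quantities shown in a bubble plot, and the composite grouping variable, can be sketched on toy data in base R. This is an illustration only: the column names and values are invented, and using the group mean as the color statistic is an assumption rather than a description of what bb_genebubbles computes.

```r
# Toy per-cell data for one marker (values are invented).
cells <- data.frame(
  cluster = c("A", "A", "B", "B", "B"),
  genotype = c("WT", "KO", "WT", "KO", "KO"),
  expr = c(0, 3, 5, 0, 2)
)
expression_threshold <- 0

# Composite grouping variable built from two metadata columns.
cells$group <- paste(cells$cluster, cells$genotype, sep = "_")

# Dot size: proportion of cells in the group above the threshold.
proportion <- tapply(cells$expr > expression_threshold, cells$group, mean)
# Dot color: summarized expression in the group (mean is an assumption).
mean_expr <- tapply(cells$expr, cells$group, mean)

bubble <- data.frame(
  group = names(proportion),
  proportion = as.vector(proportion),
  mean_expr = as.vector(mean_expr[names(proportion)])
)
bubble
```

Keeping cluster and genotype alongside the composite group in a table like this is what makes later faceting by either component possible.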
A function to find enriched go terms from a query list of gene names relative to a reference list of gene names.
bb_goenrichment( query, reference, group_pval = 0.01, go_db = c("org.Hs.eg.db", "org.Dr.eg.db", "org.Mm.eg.db") )
query |
A vector of gene names |
reference |
The background gene list. Usually will be as_tibble(rowData(cds_main)). |
group_pval |
P value to determine enrichment. Default: 0.01. |
go_db |
GO term database Default: c("org.Hs.eg.db", "org.Dr.eg.db", "org.Mm.eg.db") |
A list of items including the enrichment results.
Make a scatter plot of GO term associations
bb_goscatter( simMatrix, reducedTerms, size = "score", addLabel = TRUE, labelSize = 4 )
simMatrix |
Take from output of bb_gosummary |
reducedTerms |
Also take from output of bb_gosummary |
size |
Variable to map to point size. Defaults to "score". |
addLabel |
Boolean; whether or not to add text labels |
labelSize |
Optional label size |
A ggplot
A function to reduce go terms by semantic similarity
bb_gosummary( x, reduce_threshold = 0.8, go_db = c("org.Hs.eg.db", "org.Dr.eg.db", "org.Mm.eg.db") )
x |
A list of GO term enrichment results produced by bb_goenrichment. |
reduce_threshold |
The degree of term reduction. 0 to 1. Higher is more reduction. |
go_db |
The database to query. Choose from c("org.Hs.eg.db", "org.Dr.eg.db", "org.Mm.eg.db", ...). |
A list of items for downstream plotting
This is a thin wrapper around rtracklayer::import.bw. The purpose is to serve as a helper function for making Trace objects from bigwig files. This function converts the name of the numeric value metadata column to "coverage" for consistency with Trace objects made from Seurat/Signac objects. The numeric column it will look for is "score" by default. This appears to be the default name applied by import.bw and so this is the default value for this function. The option is provided to change it if necessary. The group variable must be supplied. Typically this will be something informative, like "ATAC" or some Histone mark that the data come from.
bb_import_bw(path, group, coverage_column = "score")
path |
file path to the bigwig file |
group |
a label to apply that describes the source data for the track |
coverage_column |
Name of the numeric metadata column holding the signal values. It is renamed to "coverage" in the returned object, Default: 'score' |
a granges object
The function reads peaks in .bed file format produced by SEACR. Optionally add a group variable and value for later filtering or faceting when combined with other peak files.
The function reads peak files produced by MACS. Optionally add a group variable and value for later filtering or faceting when combined with other peak files.
bb_import_seacr_peaks(file, group_variable = NULL, group_value = NULL)
bb_import_macs_narrowpeaks(file, group_variable = NULL, group_value = NULL)
file |
file path to the SEACR .bed file or the MACS narrowpeak file |
group_variable |
An optional variable name for additional group metadata. Default: NULL |
group_value |
A value supplied to the group metadata variable. Required if group_variable is not NULL. |
A GRanges object
read_delim
select, mutate
makeGRangesFromDataFrame
This function reads a 10X Genomics H5 file and returns a cell_data_set or CDS. There is an option to split citeseq data out to an alternate experiment slot.
bb_load_tenx_h5(filename, sample_metadata_tbl = NULL, split_citeseq = FALSE)
filename |
Path to the h5 file. |
sample_metadata_tbl |
A single row data frame with sample metadata. Usually this will be filtered from an experiment config file. |
split_citeseq |
Option to retain citeseq data within the main experiment slot or split it out to an alternate slot. |
A cell data set.
Converts a cell data set into a loupe file.
bb_loupeR(cds, output_dir = ".", output_file = "loupe")
cds |
Input cds. Only works on cell data set objects. For seurat objects, use built in loupeR functions |
output_dir |
Output directory, Default: '.' |
output_file |
Name of the loupe file. .cloupe will be appended for compatibility, Default: 'loupe' |
Nothing
cli_abort, cli_alert
create
select, mutate, across, reexports
starts_with
reducedDims
exprs
compare
create_loupe
This function takes either a gene name or sequence coordinates and returns an Ape object with DNA sequence from the selected genome reference. You can choose from hg38 and GRCz11. Features are generated from ensembl GFF files. Features overlapping the query range (gene or sequence) are returned as a GRanges object and as features within the Ape object. The features included can optionally be filtered using the "include_type" argument to the function. Using the "additional_granges" argument, you can provide additional features not present in the standard gene model which will be added to the Ape object GRanges slot and to the features slot.
bb_make_ape_genomic( query, genome = c("hg38", "GRCz11"), extend_left = 0, extend_right = 0, include_type = c("ncRNA_gene", "rRNA", "exon", "pseudogene", "pseudogenic_transcript", "ncRNA", "gene", "CDS", "lnc_RNA", "mRNA", "three_prime_UTR", "five_prime_UTR", "unconfirmed_transcript", "scRNA", "C_gene_segment", "D_gene_segment", "J_gene_segment", "V_gene_segment", "miRNA", "tRNA", "snRNA", "snoRNA", "lincRNA_gene", "lncRNA_gene", "unconfirmed_transcript"), additional_granges = NULL )
query |
Either a valid gene name or a named numeric vector of genome coordinates. This vector should be of the form: c(chr = 1, start = 1000, end = 2000). The vector must be numeric and must have those names. The chromosome number will be converted to "chr1" etc. internally. |
genome |
The genome to pull from, Default: c("hg38", "GRCz11") |
extend_left |
Number of bases to extend the query to the left or "upstream" relative to the + strand. |
extend_right |
Number of bases to extend the query to the right or "downstream" relative to the + strand. |
include_type |
The type of features to include from the standard gene model. Default: c("ncRNA_gene", "rRNA", "exon", "pseudogene", "pseudogenic_transcript", "ncRNA", "gene", "CDS", "lnc_RNA", "mRNA", "three_prime_UTR", "five_prime_UTR", "unconfirmed_transcript", "scRNA", "C_gene_segment", "D_gene_segment", "J_gene_segment", "V_gene_segment", "miRNA", "tRNA", "snRNA", "snoRNA", "lincRNA_gene", "lncRNA_gene", "unconfirmed_transcript") |
additional_granges |
A GRanges object with features to add to the Ape Object. Coordinates should all be relative to the reference, NOT the sequence extracted for the ape file. The Granges object can be constructed with the following syntax: GenomicRanges::makeGRangesFromDataFrame(data.frame(seqname = "chr6", start = 40523370, end = 40523380, strand = "+", type = "addl_feature", gene_name = "prkcda", label = "feature1"), keep.extra.columns = T). The gene_name argument here is optional. If you have defined features based on the extracted sequence, (i.e. relative to position 1 in the ORIGIN section of the Ape object), the best option is to use the feature setting function FEATURES(instance_of_Ape) <- GRanges_Object. |
An APE object
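A coordinate query and the effect of the extend arguments can be sketched in base R. The validation and the arithmetic below are assumptions made for illustration; they do not reproduce the function's internals.

```r
# Coordinate query in the documented form.
query <- c(chr = 1, start = 1000, end = 2000)

# The vector must be numeric and carry exactly these names.
stopifnot(is.numeric(query), identical(names(query), c("chr", "start", "end")))

# Illustrative extension values.
extend_left <- 500
extend_right <- 250

# Chromosome number converted to a UCSC-style sequence name.
seqname <- paste0("chr", query[["chr"]])

# Assumed effect of the extensions, relative to the + strand.
window <- c(
  start = query[["start"]] - extend_left,
  end = query[["end"]] + extend_right
)

seqname
window
```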
This function takes a specific ensembl transcript identifier, such as ENST00000348343.11, and gets the cDNA sequence from the corresponding transcriptomic reference. This is returned as an Ape object with the UTR's and the CDS annotated as features.
bb_make_ape_transcript(query, transcriptome = c("hg38", "GRCz11"))
query |
A specific ensembl transcript identifier. |
transcriptome |
Genome/transcriptome reference to use, Default: c("hg38", "GRCz11") |
TxDb.Hsapiens.UCSC.hg38.knownGene
BSgenome.Hsapiens.UCSC.hg38
org.Hs.eg.db
BSgenome.Drerio.UCSC.danRer11
org.Dr.eg.db
cli_abort
matchPattern
Problem 1: Signac objects and GRanges made from bigwigs are large and it is computationally expensive to get data from them when tweaking plots. Problem 2: For the most part we like how Signac plots genomic coverage of track-like data and we would like to show bulk data in a similar way with a similar computational interface. The trace object is a small intermediate object that holds the minimal amount of data you need to make a coverage plot showing accessibility or binding to a specific genomic region. Then you can use the trace object to quickly and easily generate tracks as needed for your plot. These tracks are all ggplots and are easy to configure for legible graphics using add-on layers. There are options for displaying groups by color or facet which are built in with good graphical defaults. If these are not suitable, they can be changed post hoc like any other ggplot.
bb_makeTrace( obj, gene_to_plot = NULL, genome = c("hg38", "danRer11"), extend_left = 0, extend_right = 0, peaks = NULL, bulk_group_col = "group", bulk_coverage_col = "coverage", fill_in = FALSE, fixed_width = 100 )
obj |
A Signac/Seurat object or a GRanges object. Import a bigwig file to a GRanges object using bb_import_bw to ensure proper formatting. The precise range can optionally be defined by gene_to_plot and the extend arguments. You may wish to add grouping metadata columns and to merge several bulk tracks. This can be done while importing using c(bw1, bw2). If you are importing a small bigwig or related object, you can skip the gene_to_plot argument. The boundaries of the trace object will be defined by the data you import. |
gene_to_plot |
The gene you want to center the trace object on. Must be a valid gene in the genome assembly being used. Optional. If omitted, the full range of the imported object will be used as the boundaries of the trace object. |
genome |
The genome assembly. Required. Must be either "hg38" or "danRer11". |
extend_left |
Bases to extend plot_range left, or upstream relative to the top strand. |
extend_right |
Bases to extend plot_range right, or downstream relative to the top strand. |
peaks |
An optional GRanges object holding peak data. Ignored for Signac/Seurat objects which carry this internally. |
bulk_group_col |
Used only when making a Trace from a GRanges object. This identifies the metadata column holding the grouping information for the Trace data. It will be converted literally to "group" when the object is made. The recommendation is to import the bigwigs using bb_import_bw, which will set the group column and name it appropriately, and then this can be left as the default. This is ignored for Signac/Seurat objects which report this by default in a column named group. |
bulk_coverage_col |
If you are making the object from a bigwig/GRanges, you need to identify which column in your bigwig/GRanges object holds the coverage data to plot on the y axis. Defaults to "coverage". Similar to bulk_group_col, if you import the bigwig using bb_import_bw, it should already be named appropriately. Will be ignored for Signac/Seurat objects which use "coverage" by default. |
fill_in |
Some track data may be very sparse. This probably depends on how it was generated. It seems that nextflow bigwigs have missing data where there is no signal but data from Signac is more complete with zeros. The problem with missing data is that it makes the line plot look jagged. This option allows you to fill in gaps in the trace data with 0's. This is done by tiling the regions without real signal, leaving the ranges with real signal untouched. Default is FALSE for compatibility with older code. TRUE will fill in the gaps. |
fixed_width |
Bin width for filling in gaps in sparse genome track data. Defaults to 100 bp. |
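The gap-filling idea behind fill_in and fixed_width can be sketched with plain data frames. This is a simplified base R illustration on invented coordinates, not the GRanges-based implementation: ranges with real signal are left untouched and the gaps between them are tiled with zero-coverage bins.

```r
# Sparse track: two ranges with signal inside a 500 bp region (invented values).
signal <- data.frame(start = c(1, 301), end = c(100, 400), coverage = c(5, 8))
region_start <- 1
region_end <- 500
fixed_width <- 100

# Candidate gaps before, between and after the signal ranges.
gap_start <- c(region_start, signal$end + 1)
gap_end <- c(signal$start - 1, region_end)
keep <- gap_start <= gap_end
gaps <- data.frame(start = gap_start[keep], end = gap_end[keep])

# Tile each gap into bins of at most fixed_width bp with coverage 0.
tiles <- do.call(rbind, lapply(seq_len(nrow(gaps)), function(i) {
  s <- seq(gaps$start[i], gaps$end[i], by = fixed_width)
  data.frame(
    start = s,
    end = pmin(s + fixed_width - 1, gaps$end[i]),
    coverage = 0
  )
}))

# Combine real signal with the zero-filled tiles and sort by position.
filled <- rbind(signal, tiles)
filled <- filled[order(filled$start), ]
filled
```

Plotting the filled table as a line returns the trace to zero between signal ranges instead of interpolating across the gaps.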
This function merges replicate peaks GRanges objects into 1.
bb_merge_narrowpeaks(peaks_gr)
peaks_gr |
A regular list of GRanges objects (not a GRangesList). |
A GRanges object
Use this function to generate data for making TSS enrichment plots or other metafeature plots that are centered on a single genomic locus. This function returns the data you need for the plot. Use the tibble element that is returned to plot the enrichment plot and the matrix for the heatmap. The problem currently is that the binwidths for the enrichment plot need to be smaller than the binwidths for the heatmap to look good. If you use good binwidths for the enrichment plot, the heatmap will crash. So either reduce the size of the heatmap matrix before plotting that or rerun the function with a different bin size. This function allows sample names to be added, so several samples can be column-bound together for comparison. Each gene is normalized to its own outer flanks so this should account for differences in sequencing depth to some degree. You also have the option to include all possible TSS in the plot (i.e. including zeros) which you may want to do if comparing several samples. To do this, set select_hits to FALSE.
bb_metafeature(query, targets, select_hits = TRUE, width = 2000, binwidth = 10, sample_id = NULL)
query |
A GRanges object. This should be from a bam file so you can plot read coverage across the metagene. |
targets |
A GRanges object. The targets you want to plot around. |
select_hits |
Do you want to plot only the targets that have overlapping query reads? Defaults to TRUE. |
width |
The width of the analysis in bp. |
binwidth |
The binwidth in bp. Width must be evenly divided by binwidth. |
sample_id |
An optional sample id if you want to join this matrix up with another one. |
A list including a matrix and a tibble.
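The binning and flank normalization described above can be illustrated with a small base-R sketch for a single target. It is only an approximation of the idea: the helper name bin_around_target, the n_flank_bins argument and the exact normalization (dividing by the mean count of the outermost bins) are assumptions, not the package's code.

```r
# Hypothetical helper: count reads in fixed-width bins centered on one target
# and normalize each bin to the mean count of the outer flank bins.
bin_around_target <- function(read_pos, target_pos, width = 2000, binwidth = 10,
                              n_flank_bins = 10) {
  stopifnot(width %% binwidth == 0)  # width must be evenly divided by binwidth
  breaks <- seq(-width / 2, width / 2, by = binwidth)
  rel <- read_pos - target_pos
  rel <- rel[rel >= -width / 2 & rel < width / 2]
  counts <- as.vector(table(cut(rel, breaks = breaks, right = FALSE)))
  n_bins <- length(counts)
  flank <- c(seq_len(n_flank_bins), seq(n_bins - n_flank_bins + 1, n_bins))
  flank_mean <- mean(counts[flank])
  # Leave raw counts in place when the flanks are empty to avoid dividing by zero.
  if (flank_mean == 0) counts else counts / flank_mean
}

set.seed(1)
reads <- c(round(rnorm(500, mean = 10000, sd = 100)),  # enrichment at the target
           sample(9000:10999, 400, replace = TRUE))    # uniform background
profile <- bin_around_target(reads, target_pos = 10000)
```

With width = 2000 and binwidth = 10 the profile has 200 bins. Stacking one such profile per target as rows gives a matrix of the kind used for the heatmap, and averaging the columns gives the enrichment trace.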
This function wraps the monocle3 method for data projection and label transfer into a function. This function takes in reference and query cds objects plus a character vector of label column names and returns the query CDS with the new labels.
bb_monocle_anno and bb_monocle_project use similar inputs and methods and both return cds objects, but the cds objects are different. Both wrap the monocle3 method for data projection and label transfer.
bb_monocle_anno takes in reference and query cds objects plus a character vector of label column names to transfer. It returns the query CDS with new cell metadata containing one or more columns with the labels corresponding to the nearest neighbor in the reference data. A suffix is appended to the new column name in the result. By default this is "_ref" but it can be changed using the suffix parameter.
bb_monocle_project also takes in reference and query cds objects and a character vector of label column names. In this case, a combined cds object is returned carrying the query and reference data projected into the reference data space. New columns are added indicating the reference and query data. The query cells are given the label of their nearest neighbor in the reference. These label column(s) are prefixed with "merged_".
Importantly, for both of these to work, the reference and query genes must have a shared namespace.
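The nearest-neighbor label transfer described above can be sketched in base R on toy coordinates. This is a conceptual illustration only: the helper transfer_labels, the toy coordinates and the column name cell_type are hypothetical, and the real functions rely on monocle3's projection and label transfer machinery rather than a brute-force search.

```r
# Hypothetical helper: give each query cell the label of its nearest
# reference cell in a shared low-dimensional space (brute-force search).
transfer_labels <- function(qry_coords, ref_coords, ref_labels) {
  nearest <- apply(qry_coords, 1, function(q) {
    # Squared Euclidean distance from this query cell to every reference cell.
    which.min(colSums((t(ref_coords) - q)^2))
  })
  ref_labels[nearest]
}

ref_coords <- rbind(c(0, 0), c(0, 1), c(5, 5), c(5, 6))
ref_labels <- c("HSC", "HSC", "B cell", "B cell")
qry_coords <- rbind(c(0.2, 0.4), c(4.8, 5.3), c(1, 1))

# Store the transferred labels under the original column name plus a suffix.
qry_meta <- data.frame(cell = c("q1", "q2", "q3"))
suffix <- "_ref"
qry_meta[[paste0("cell_type", suffix)]] <-
  transfer_labels(qry_coords, ref_coords, ref_labels)
```

Because the suffix is simply pasted onto the label name, a column that already exists under the same name would be overwritten, which is why the suffix should be chosen with care.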
bb_monocle_(cds_qry, cds_ref, labels, suffix, use_aligned)
bb_monocle_anno(cds_qry, cds_ref, labels, suffix = "_ref", use_aligned = TRUE)
bb_monocle_project(cds_qry, cds_ref, labels, suffix = "_ref", use_aligned = TRUE)
cds_qry |
A query cell data set. |
cds_ref |
A reference cell data set. |
labels |
A character vector of cell metadata column names to transfer. |
suffix |
A character string of length 1 to append to all of the transferred column names. There is no checking for name conflicts, so use this sensibly to prevent overwriting preexisting columns. Default = "_ref". |
use_aligned |
Whether to use aligned PCA coordinates from cds_ref, if they are available. Default = TRUE. |
see https://cole-trapnell-lab.github.io/monocle3/docs/projection/
A cell data set
cli_abort, cli_alert
sets
estimate_size_factors, preprocess_cds, reduce_dimension, save_transform_models, load_transform_models, preprocess_transform, reduce_dimension_transform, transfer_cell_labels, fix_missing_cell_labels
reducedDims
path
map, reduce
SummarizedExperiment-class
select, mutate-joins, join_by
Note: this function is deprecated in favor of bb_monocle_regression_better which returns log2FoldChange instead of estimates. The log2FoldChange value is the normalized_effect value returned from the monocle regression functions.
bb_monocle_regression(cds, gene_or_genes, stratification_variable = NULL, stratification_value = NULL, form, linking_function = "negbinomial")
cds |
A cell data set object. |
gene_or_genes |
Genes to regress by. |
stratification_variable |
Optional colData column to subset the cds by internal to the function. |
stratification_value |
Optional value to stratify by. |
form |
The regression formula in the form of "~var1+var2+..." |
linking_function |
For the generalized linear model. |
A tibble containing the regression results.
This function replaces bb_monocle_regression. It returns log2FoldChange instead of estimates. The log2FoldChange value is the normalized_effect value returned from the monocle regression functions.
bb_monocle_regression_better(cds, gene_or_genes, stratification_variable = NULL, stratification_value = NULL, form, linking_function = "negbinomial")
cds |
A cell data set object. |
gene_or_genes |
Genes to regress by. |
stratification_variable |
Optional colData column to subset the cds by internal to the function. |
stratification_value |
Optional value to stratify by. |
form |
The regression formula in the form of "~var1+var2+..." |
linking_function |
For the generalized linear model. |
A tibble containing the regression results.
This is the main function for reading a file in GenBank/ape/equivalent format and generating an instance of the Ape class. String manipulations are used to parse the input ape file. Biostrings and GRanges functions are called to generate DNAStringSet and GRanges objects to store sequence and feature data, respectively. The Ape constructor function is called internally at the end.
bb_parseape(input_file)
input_file |
The GenBank/ape file to parse and construct into an instance of the Ape class. |
An Ape object
Plots expression for one or more genes as a function of pseudotime
bb_plot_genes_in_pseudotime(cds, gene_or_genes, pseudotime_dim, min_expr = NULL, cell_size = 0.75, nrow = NULL, ncol = 1, panel_order = NULL, color_cells_by = pseudotime_dim, trend_formula_df = 3, label_by_short_name = TRUE, vertical_jitter = NULL, horizontal_jitter = NULL, legend_title = NULL)
cds |
Cell data set to plot. |
gene_or_genes |
Gene or genes for which to plot pseudotime. |
pseudotime_dim |
The column holding the pseudotime dimension to plot along. |
min_expr |
the minimum (untransformed) expression level to plot. |
cell_size |
the size (in points) of each cell used in the plot. |
nrow |
the number of rows used when laying out the panels for each gene's expression. |
ncol |
the number of columns used when laying out the panels for each gene's expression |
panel_order |
vector of gene names indicating the order in which genes should be laid out (left-to-right, top-to-bottom). If |
color_cells_by |
the cell attribute (e.g. the column of colData(cds)) to be used to color each cell. Defaults to the value provided for pseudotime_dim. |
trend_formula_df |
degrees of freedom for the model formula used to fit the expression trend over pseudotime. The formula takes the form of "~ splines::ns(pseudotime_dim, df = trend_formula_df)". |
label_by_short_name |
label figure panels by gene_short_name (TRUE) or feature ID (FALSE). |
vertical_jitter |
A value passed to ggplot to jitter the points in the vertical dimension. Prevents overplotting, and is particularly helpful for rounded transcript count data. |
horizontal_jitter |
A value passed to ggplot to jitter the points in the horizontal dimension. Prevents overplotting, and is particularly helpful for rounded transcript count data. |
a ggplot2 plot object
Will generate a ggplot from the colData of a SummarizedHeatmap object. Typically this will be placed at the top or bottom of a plot. If flipped, use the side argument to put the colData on the left or right.
bb_plot_heatmap_colData(obj, tile_color = "white", vars = colnames(colData(obj)), side = c("top", "bottom", "right", "left"), manual_pal = NULL)
obj |
a SummarizedHeatmap object |
tile_color |
Outline color for the tiles, Default: 'white' |
vars |
Variables to plot. Supply a named vector to change the axis text and legend titles for each variable, Default: colnames(colData(obj)) |
side |
Side on which to plot, Default: c("top", "bottom", "right", "left") |
manual_pal |
a color palette, preferably a named vector corresponding to values of colData, Default: NULL |
a ggplot
Takes in a SummarizedHeatmap object and returns a ggplot of the column dendrogram. This can be positioned with the side parameter. Default is to position it on the top. If flipping the heatmap so that the columns run vertically, you will need to change the side argument to left or right.
If column order is set explicitly when creating this object, the dendrogram slot will be NULL, and this function will abort.
bb_plot_heatmap_colDendro(obj, side = c("top", "bottom", "left", "right"), linewidth = 0.5)
obj |
a Summarized Heatmap |
side |
Orientation/side of the heatmap to put the dendrogram, Default: c("top", "bottom", "left", "right") |
linewidth |
Weight of the dendrogram plot, Default: 0.5 |
a ggplot
Use geom_text_repel to selectively highlight some column names. Useful when there are too many columns to label on the axis directly.
bb_plot_heatmap_colHighlight(obj, highlights = character(0), side = c("top", "bottom", "right", "left"), ...)
obj |
A summarized heatmap |
highlights |
A vector of columns to highlight, Default: character(0) |
side |
Side on which to put the highlight, Default: c("top", "bottom", "right", "left") |
... |
Other arguments to pass to geom_text_repel |
a ggplot
Takes in a Summarized Heatmap object and returns a ggplot of the matrix data.
bb_plot_heatmap_main(obj, tile_color = "white", high = "red3", mid = "white", low = "blue4", flip = FALSE)
obj |
A SummarizedHeatmap |
tile_color |
Outline of the color tiles, Default: 'white' |
high |
Color for high values, applied to scale_fill_gradient2, Default: 'red3' |
mid |
Color for mid values, applied to scale_fill_gradient2, Default: 'white' |
low |
Color for low values, applied to scale_fill_gradient2, Default: 'blue4' |
flip |
Whether to transpose the matrix, i.e. plot the rows as columns and columns as rows, Default: FALSE |
a ggplot
Use this function to create a plot annotating SummarizedHeatmap rowData.
bb_plot_heatmap_rowData(obj, tile_color = "white", vars = colnames(rowData(obj)), side = c("right", "left", "top", "bottom"), manual_pal = NULL)
obj |
A SummarizedHeatmap object |
tile_color |
Color for the tile outlines, Default: 'white' |
vars |
rowData variables to plot. Supply a named vector to change the names shown on the axis and legend, Default: colnames(rowData(obj)) |
side |
Side on which to plot, Default: c("right", "left", "top", "bottom") |
manual_pal |
Color palette for filling the tiles, preferably a named vector, Default: NULL |
a ggplot
Takes in a SummarizedHeatmap object and returns a ggplot of the rowDendro slot. This can be positioned with the side parameter. Default is to position it on the left. If flipping the heatmap so that the rows run vertically, you will need to change the side argument to top or bottom.
If row order is set explicitly when creating this object, the dendrogram slot will be NULL and this function will abort.
bb_plot_heatmap_rowDendro(obj, side = c("left", "right", "top", "bottom"), linewidth = 0.5)
obj |
a Summarized Heatmap |
side |
Orientation/side of the heatmap to plot, Default: c("left", "right", "top", "bottom") |
linewidth |
Weight of the lines for the dendrogram, Default: 0.5 |
A ggplot
Plots selected row names from a SummarizedHeatmap object. Useful when there are too many rows to label on the plot axis directly.
bb_plot_heatmap_rowHighlight(obj, highlights = character(0), side = c("right", "left", "top", "bottom"), ...)
obj |
A summarized Heatmap object |
highlights |
A vector of rows to highlight, Default: character(0) |
side |
Side on which to plot the highlight, Default: c("right", "left", "top", "bottom") |
... |
Other arguments to pass to geom_text_repel |
a ggplot
Use as argument for the "gene_or_genes" parameter for bb_gene_umap.
bb_plot_rowData_col(cds, rowData_col, filter_in = NULL, filter_out = NULL)
cds |
CDS from which to extract the gene metadata. Should be the same cds as the enclosing function. |
rowData_col |
Gene metadata column to aggregate by. |
filter_in |
Subset of values to focus on. Each will become a facet in the final plot. Default is to keep everything except NA values. |
filter_out |
Option to filter out any unwanted values. Default is to not filter out anything. |
A data frame in the format needed to pass into bb_gene_umap.
Generates the X axis for stacking other track plots on top of.
bb_plot_trace_axis(trace, xtitle = NULL)
trace |
A Trace object. |
xtitle |
An optional title for the X axis. Defaults to the genome and chromosome. |
A function to generate a line plot from tracklike genomic data. This pulls data from the Trace object trace_data slot. Check what numeric and categorical variables are available for plotting using Trace.data.
bb_plot_trace_data(trace, yvar = "coverage", yvar_label = "Coverage", facet_var = "group", color_var = "group", pal = NULL, legend_pos = "none", group_filter = NULL, group_variable = "group")
trace |
A Trace object |
yvar |
The trace_data metadata variable that will become the y axis. Defaults to "coverage". Must be numeric. |
yvar_label |
The y-axis label for the coverage track. Defaults to "Coverage". |
facet_var |
The trace_data metadata variable describing data facets. Each will be placed as a separate horizontal track with the value printed to the left. Optional but recommended. Defaults to "group". |
color_var |
The variable to color groups of traces by. Optional but recommended. Defaults to "group". |
pal |
A color palette. Can also be added after the fact. |
legend_pos |
Color legend position. Can also be added after the fact. Defaults to "none". |
group_filter |
Optional value to filter the trace data by. Should be a value from the "group" metadata variable in the trace object. |
group_variable |
Optional metadata variable to filter trace data by. When imported from Signac/Seurat objects, this value defaults to "group", so that is the default here. However, if the object was constructed manually, you may wish to filter on another variable. If so, supply it to this parameter. |
A function to generate a link plot from tracklike genomic data. Links will automatically be trimmed to lie entirely within the plot range. An additional, optional score cutoff can be provided.
bb_plot_trace_links(trace, cutoff = 0, link_low_color = "grey80", link_high_color = "red3", link_range = c(0, 1))
trace |
A Trace object containing a valid links slot. |
cutoff |
Score cutoff for link plotting. Defaults to 0. |
link_low_color |
The color of a link with value of 0, default = grey80 |
link_high_color |
The color of a link with value of 1, default = red3 |
link_range |
The range of the color scale in terms of link values, default = c(0,1) |
A function to generate a plot of the underlying gene model. The genes to be plotted are automatically selected according to the genome build and the plot range. The function automatically picks the longest principal transcript to show. Optionally, alternative transcripts can be shown by specifying the select_transcript argument. This must be an Ensembl transcript identifier lying within the plot range.
bb_plot_trace_model(trace, font_face = "italic", line_width = 0.5, select_transcript = NULL, icon_fill = "cornsilk", icon_alpha = 0.5, arrow_scale = 1, segment_length_bp = 1000, debug = FALSE)
trace |
A Trace object. |
font_face |
Font face option to use. Default = "italic". |
select_transcript |
Optional selected transcript(s) to plot. |
icon_fill |
The color to make the exon boxes. |
debug |
Boolean. Option to show the transcript ID on the final plot to confirm you have the right one. Default = FALSE. |
A function to generate a peak plot from tracklike genomic data.
bb_plot_trace_peaks(trace, group_filter = NULL, group_variable = "group", pal = NULL)
trace |
A Trace object. |
group_filter |
Optional value to filter the peak data by. Should be a value from the "group" metadata variable in the trace object. |
group_variable |
Optional metadata variable to filter trace data by. When imported from Signac/Seurat objects, this value defaults to "group", so that is the default here. However, if the object was constructed manually, you may wish to filter on another variable. If so, supply it to this parameter. |
fill_color |
The color to fill the peak graphics with. Defaults to grey60. |
Plot motif footprinting results
bb_plotfootprint(object, features, alt_main_title = NULL, alt_color_title = NULL, legend_pos = "right", colorscale = NULL, assay = NULL, group.by = NULL, idents = NULL, label = TRUE, repel = TRUE, show.expected = TRUE, normalization = "subtract", label.top = 3, label.idents = NULL, fontsize = 14, linesize = 0.2)
object |
A Seurat object |
features |
A vector of features to plot |
alt_main_title |
Alternative title for the main plot. Accepts markdown. |
alt_color_title |
Alternative title for the color scale |
legend_pos |
Position to place the legend |
colorscale |
Named vector of colors to apply to the top plot. |
assay |
Name of assay to use |
group.by |
A grouping variable |
idents |
Set of identities to include in the plot |
label |
TRUE/FALSE value to control whether groups are labeled. |
repel |
Repel labels from each other |
show.expected |
Plot the expected Tn5 integration frequency below the main footprint plot |
normalization |
Method to normalize for Tn5 DNA sequence bias. Options are "subtract", "divide", or NULL to perform no bias correction. |
label.top |
Number of groups to label based on highest accessibility in motif flanking region. |
label.idents |
Vector of identities to label. If supplied, |
fontsize |
Theme font size |
linesize |
Size to draw the footprint lines |
Print out a stats report
bb_print_full_stats(data, classification_variable, numeric_variable, test_type = c("Student", "Welch", "Wilcox"), output = NULL)
data |
A Tibble in tidy data format. Must contain or be filtered to contain only 2 levels in "classification_variable" for comparisons. |
classification_variable |
Column containing the class variable |
numeric_variable |
The column containing the numeric values to summarize and compare |
test_type |
Must be one of "Student", "Welch", and "Wilcox" |
output |
Output file; if null prints to screen. |
A text file
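The three test_type options correspond to standard two-group tests in base R. The sketch below shows one way they could map onto stats functions. The helper compare_two_groups and the example data are hypothetical, and the real function also writes a formatted report, which is not reproduced here.

```r
# Hypothetical helper: run the chosen two-group comparison on tidy data.
compare_two_groups <- function(data, classification_variable, numeric_variable,
                               test_type = c("Student", "Welch", "Wilcox")) {
  test_type <- match.arg(test_type)
  groups <- split(data[[numeric_variable]], data[[classification_variable]])
  stopifnot(length(groups) == 2)  # exactly two levels are required
  switch(test_type,
    Student = t.test(groups[[1]], groups[[2]], var.equal = TRUE),   # pooled variance
    Welch   = t.test(groups[[1]], groups[[2]], var.equal = FALSE),  # unequal variances
    Wilcox  = wilcox.test(groups[[1]], groups[[2]])                 # rank-based
  )
}

dat <- data.frame(
  genotype = rep(c("wt", "mut"), each = 5),
  value = c(5.1, 4.8, 5.5, 5.0, 4.9, 6.9, 7.4, 7.1, 6.8, 7.6)
)
student <- compare_two_groups(dat, "genotype", "value", "Student")
welch <- compare_two_groups(dat, "genotype", "value", "Welch")
wilcox <- compare_two_groups(dat, "genotype", "value", "Wilcox")
```

Student's test pools the variance of both groups, Welch's test does not assume equal variances, and the Wilcoxon test compares ranks, so it is the usual choice when the values are clearly non-normal.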
For a given GRanges object containing peaks, determine how many peaks overlap promoters, how many promoters are overlapped by peaks and the significance of enrichment of query peaks relative to promoters.
bb_promoter_overlap(query, tss = c("hg38_tss", "dr11_tss"), width = 200)
query |
A GRanges object containing peaks |
tss |
The tss data base to use. Must be one of "hg38_tss" or "dr11_tss" |
width |
The width around the tss to evaluate. Defaults to 200 bp. |
A list including overlap information and binomial test results.
Use this function to perform pseudobulk DGE analysis.
bb_pseudobulk_mf(cds, pseudosample_table, design_formula, count_filter = 10, result_recipe = "default", test = "Wald", reduced = NULL)
cds |
The cell data set object subset to analyze |
pseudosample_table |
A tibble indicating the sample groupings for analysis. This should include 1.) Unique sample identifiers 2.) Any sample-level cell metadata you wish to include in the regression model and 3.) Any Cell-level metadata you may wish to include such as clusters or partitions. Values will be coerced to factors. |
design_formula |
The regression-style formula for the analysis. In the form of "~ variable1 + variable2 + ... final_variable". The default behavior is to calculate results according to the final_variable in the design_formula with preceding variables as co-variates. The reference class is chosen according to alphabetical order. This behavior can be modified by specifying the result_recipe argument. |
count_filter |
The minimum number of counts required across all pseudosamples in order to keep a gene in the analysis. |
result_recipe |
See above for the default recipe. Alternatively, supply a 3-element vector in the form of c("variable", "experimental_level","reference_or_control_level") |
A list of results from pseudobulk analysis
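The aggregation step that underlies a pseudobulk analysis can be sketched in base R: counts from all cells of a pseudosample are summed per gene, and genes with too few total counts are dropped. The toy matrix, the sample assignments and the use of ">=" for count_filter are assumptions for illustration, and the downstream model fit is not shown.

```r
set.seed(42)
# Toy count matrix: 4 genes by 12 cells.
counts <- matrix(rpois(4 * 12, lambda = 3), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:12)))
counts["gene4", ] <- 0L  # a gene with no counts, to show the filter

# Each cell belongs to one pseudosample (hypothetical assignment).
cell_sample <- rep(c("s1", "s2", "s3", "s4"), each = 3)

# rowsum() sums rows by group, so transpose to put cells in rows first.
pseudobulk <- t(rowsum(t(counts), group = cell_sample))

# Keep genes with enough counts summed across all pseudosamples.
count_filter <- 10
pseudobulk_kept <- pseudobulk[rowSums(pseudobulk) >= count_filter, , drop = FALSE]

# Sample-level metadata, coerced to factors as the documentation describes.
sample_table <- data.frame(sample = colnames(pseudobulk),
                           genotype = factor(c("wt", "wt", "mut", "mut")))
```

The resulting genes-by-pseudosamples matrix, together with the sample table and a design formula such as "~ genotype", is the kind of input a bulk-style differential expression model expects.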
This function allows you to simulate single cell data from bulk RNA-seq data. It requires a TPM count matrix. The rationale is that TPM quantifies transcript counts per million reads, so you can think of this like 1 million UMI counts from a scRNA-seq experiment distributed in a certain way across the transcriptome. This function samples n_pseudocells with transcripts_per_pseudocell from this distribution. Then it creates a cell data set based on a matrix of these samples.
Importantly, this function cannot accurately identify transcriptional heterogeneity within the bulk data. The sampling effect may reveal some potential heterogeneity, but there is no method here for determining whether this is due to randomness or to heterogeneity within the data.
The intended use of this function is to project a pseudocell cds onto an actual single cell data set. This can be used to help identify regions of UMAP space that are shared between the pseudocells and the real cells.
Note: The matrix rownames and row metadata for the returned cds will contain only the rownames from the input tpm_matrix. So these should share the same namespace as the single cell cds that the pseudocell data will be projected onto.
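The sampling idea can be sketched with a multinomial draw in base R: the TPM vector of one bulk sample is treated as a probability distribution over genes, and each pseudocell is one draw of a fixed number of transcripts. The toy TPM values are hypothetical, and this is an illustration of the rationale rather than the package's code.

```r
set.seed(123)
# Toy TPM vector for one bulk sample (sums to one million).
tpm <- c(geneA = 600000, geneB = 300000, geneC = 99990, geneD = 10)

n_pseudocells <- 50
transcripts_per_pseudocell <- 2000

# rmultinom() normalizes prob internally, so TPM values can be passed directly.
# Each column is one pseudocell; rows inherit the gene names from names(tpm).
pseudocell_counts <- rmultinom(n_pseudocells,
                               size = transcripts_per_pseudocell,
                               prob = tpm)
colnames(pseudocell_counts) <- paste0("pseudocell_", seq_len(n_pseudocells))
```

Every pseudocell carries exactly transcripts_per_pseudocell counts, and low-abundance genes such as geneD are detected only sporadically, which is the sampling effect mentioned above.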
bb_pseudocells(tpm_matrix, n_pseudocells, transcripts_per_pseudocell, remove_genes = NULL)
tpm_matrix |
A matrix of TPM counts from a bulk RNA experiment. A cds will be generated for each column in this matrix and the result combined. |
n_pseudocells |
The number of pseudocells to create. Should be length 1 or, to specify a unique n_pseudocells for each dataset, a vector of the same length as the number of columns in tpm_matrix. This value will be recycled if necessary to match the number of columns in tpm_matrix. |
transcripts_per_pseudocell |
The number of transcripts to sample for each pseudocell. Should be similar to the median number of UMI in the single cell data the pseudocells will be projected onto. Should be length 1 or, to specify a unique transcripts_per_pseudocell for each dataset, a vector of the same length as the number of columns in tpm_matrix. This value will be recycled if necessary to match the number of columns in tpm_matrix. |
remove_genes |
A vector of genes to remove before sampling. Should be the same or similar to the genes removed from the single cell data the pseudocells will be projected onto, Default: NULL |
A cell data set
map, reduce, map2, pmap
tibble
count, select
components
new_cell_data_set, combine_cds, preprocess_cds, reduce_dimension
This function determines a gene expression trajectory using learn_graph from monocle3 and then calculates pseudotime dimensions along this trajectory using order_cells. So it is 2 functions wrapped into 1. Usually we will not adjust the parameters for learn_graph with the possible exception of close_loop and use_partition which are also available in this function with the same defaults. If you need to fine-tune the trajectory, use monocle3::learn_graph on the cds object first and then run this function to calculate pseudotime. The graph learning will not be repeated on an object unless force_graph is set to TRUE.
If you just want to look at the trajectory graph and not calculate pseudotime, change calculate_pseudotime to FALSE, or run monocle3::learn_graph.
After the pseudotime values are calculated, they are handled differently than in monocle3. In this function, they are copied from the hidden CDS slot and made an explicit cell metadata column. Pseudotime needs a starting point or anchor. There is no interactive option here as in monocle3. To identify this starting point, you identify a cell metadata variable and provide it to cluster_variable. This should identify a cohesive group of cells in UMAP space such as a leiden cluster, louvain cluster or partition. Then provide a value corresponding to the cluster of interest to cluster_value. The function will start pseudotime at the cell closest to the graph node in that cluster. The pseudotime value column will be named automatically as a composite of the cluster_variable and cluster_value parameters.
bb_pseudotime(cds, calculate_pseudotime = TRUE, cluster_variable, cluster_value, use_partition = TRUE, close_loop = TRUE, force_graph = FALSE)
cds |
The cell data set object to calculate pseudotime upon. Does not yet accept seurat objects. |
calculate_pseudotime |
Logical, whether to calculate the pseudotime dimension. If false, will only run learn_graph, Default: TRUE |
cluster_variable |
The cell metadata column from which the pseudotime = 0 cell will be selected. |
cluster_value |
The value of cluster_variable that identifies a cluster. The cell closest to the root node closest to the center of this cluster will have pseudotime of 0. |
use_partition |
Logical; If TRUE, learn_graph will construct trajectories within partitions. If FALSE, it will connect partitions, Default: TRUE |
close_loop |
Logical; Whether learn_graph will close looping trajectories, Default: TRUE |
force_graph |
Logical; If TRUE, the function will recalculate the graph., Default: FALSE |
A cell data set
cli_div, cli_alert
learn_graph, order_cells, pseudotime
SummarizedExperiment-class
A function to run qc tests on cds objects.
bb_qc(cds, cds_name, genome = c("human", "mouse", "zfish"), nmad_mito = 2, nmad_detected = 2, max_mito = NULL, min_log_detected = NULL)
cds |
A cell data set object to run qc functions on |
cds_name |
The name of the cds |
genome |
The species to use for identifying mitochondrial genes. Choose from "human", "mouse", "zfish", "human_mouse" for pdx. |
max_mito |
Manual cutoff for mitochondrial percentage. May be more strict, i.e. lower, than the automated cutoff but not less strict, Default: NULL |
min_log_detected |
Manual cutoff for log detected features. May be more strict, i.e. higher, than the automated cutoff but not less strict, Default: NULL |
A list of qc objects
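The interplay between an automated MAD-based cutoff and a manual override can be sketched as follows. This assumes the automated threshold is the median plus nmad_mito median absolute deviations, which is a common outlier rule but is not confirmed by the documentation. The helper mito_cutoff and the example values are hypothetical.

```r
# Hypothetical helper: automated upper cutoff for percent mitochondrial reads,
# with an optional manual cutoff that may only make the filter stricter.
mito_cutoff <- function(pct_mito, nmad_mito = 2, max_mito = NULL) {
  auto <- median(pct_mito) + nmad_mito * mad(pct_mito)
  # A manual cutoff is used only when it is lower (stricter) than the automated one.
  if (!is.null(max_mito)) min(auto, max_mito) else auto
}

pct_mito <- c(2, 3, 3, 4, 4, 5, 5, 6, 30, 45)
auto_cut <- mito_cutoff(pct_mito)                   # automated cutoff only
strict_cut <- mito_cutoff(pct_mito, max_mito = 5)   # stricter manual cutoff wins
loose_cut <- mito_cutoff(pct_mito, max_mito = 50)   # looser manual cutoff is ignored
keep <- pct_mito <= auto_cut
```

The same logic would apply in the opposite direction to min_log_detected, where a manual value can raise the lower bound on detected features but not lower it.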
This function reads a single sorted, deduplicated paired-end bam file and returns either a GRanges object or a GenomicAlignmentPairs object. The former requires much less memory but at the cost of retaining only the outer boundaries of each read pair. If read 1 has start S1 and end E1 and read 2 has start S2 and end E2, the GRanges object spans S1-E2.
bb_read_bam(sortedBam, genome = c("hg38", "danRer11"), return_type = c("GenomicAlignmentPairs", "GRanges"))
sortedBam |
File path to the bam file to load. |
genome |
One of "hg38" or "danRer11". This is used to clean up the granges object if necessary. |
return_type |
Type of object to return. GRanges is smaller. GenomicAlignmentPairs retains read pair data. |
An object according to return_type.
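The difference between the two return types can be illustrated with plain data frames. In this sketch, each read pair is collapsed to a single range spanning its outer boundaries, which is the information the GRanges return type is described as keeping. The column names and coordinates are made up for the example.

```r
# Toy read pairs: start and end of each mate.
pairs <- data.frame(
  chr = c("chr1", "chr1"),
  s1 = c(100, 500), e1 = c(150, 549),
  s2 = c(220, 610), e2 = c(270, 660)
)

# Collapse each pair to one range covering both mates and the gap between them.
# pmin/pmax keep this correct even if the mates are listed in the other order.
fragments <- data.frame(
  chr = pairs$chr,
  start = pmin(pairs$s1, pairs$s2),
  end = pmax(pairs$e1, pairs$e2)
)
fragments$width <- fragments$end - fragments$start + 1
```

The collapsed table needs three columns per pair instead of five, but the positions of the individual mates (E1 and S2) can no longer be recovered from it.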
This function reads the narrow peaks file (BED6 + 4 format) and turns it into a GRanges object.
bb_read_narrowpeak(file)
file |
The file path to the narrow peaks file. |
A GRanges object
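For orientation, the BED6 + 4 narrowPeak layout can be parsed in base R as shown below. The ten column names follow the standard narrowPeak definition. The example lines are made up, and the conversion to 1-based coordinates is shown as a plain data frame step rather than the GRanges construction the function performs.

```r
# Standard narrowPeak columns: six BED columns plus four peak statistics.
narrowpeak_cols <- c("chrom", "chromStart", "chromEnd", "name", "score", "strand",
                     "signalValue", "pValue", "qValue", "peak")

# Two made-up peaks in narrowPeak format.
example_lines <- c(
  "chr1\t9999\t10200\tpeak_1\t500\t.\t12.5\t20.1\t18.3\t95",
  "chr2\t49999\t50400\tpeak_2\t300\t.\t8.2\t11.7\t9.9\t210"
)
peaks <- read.table(text = example_lines, sep = "\t",
                    col.names = narrowpeak_cols, stringsAsFactors = FALSE)

# BED coordinates are 0-based and half-open; shift the start for 1-based,
# closed ranges of the kind used by GRanges.
peaks$start <- peaks$chromStart + 1L
peaks$end <- peaks$chromEnd
# The peak column is the 0-based summit offset from chromStart.
peaks$summit <- peaks$chromStart + peaks$peak + 1L
```

To read a real file, the text argument would be replaced by the file path.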
Rejoin qc and doubletfinder data to a cds object
bb_rejoin(cds, qc_data, doubletfinder_data)
cds |
A cell data set object to rejoin to |
qc_data |
A table of cell barcodes with qc data. Can be extracted from bb_qc with purrr::map(qc_result, 1) |
doubletfinder_data |
The doubletfinder result tbl |
A cell data set object with qc and doubletfinder data
Remove rows that have duplicates in a given column
bb_remove_dupes(data, column)
data |
A tibble. |
column |
A column to deduplicate |
A deduplicated tibble
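The description leaves open whether one copy of a duplicated value is kept or every row with a duplicated value is dropped. The base-R sketch below shows both readings on a toy data frame. Which one the function implements is not stated here, so check the result on your own data.

```r
dat <- data.frame(id = c("a", "b", "b", "c", "a"), value = 1:5)

# Reading 1: keep the first row for each value of the column.
keep_first <- dat[!duplicated(dat$id), ]

# Reading 2: drop every row whose value occurs more than once.
dup_values <- dat$id[duplicated(dat$id)]
drop_all <- dat[!(dat$id %in% dup_values), ]
```

Here keep_first retains one row each for "a", "b" and "c", whereas drop_all retains only the row for "c".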
Take a cell_data_set or Seurat object and return the gene/feature metadata in the form of a tibble. RNA is used as the default assay.
bb_rowmeta(obj, row_name = "feature_id", experiment_type = "Gene Expression", assay = "RNA", cds = NULL)
obj |
A cell_data_set or Seurat object |
row_name |
Optional name to provide for feature unique identifier, Default: 'feature_id' |
experiment_type |
The experiment type to display. Applies only to cds objects. Commonly will be either "Gene Expression" or "Antibody Capture", Default: 'Gene Expression' |
assay |
For a Seurat object, the feature assay to return. CDS objects with alternative experiments are not supported, Default: 'RNA' |
cds |
Provided for compatibility with prior versions, Default: NULL |
If a value is supplied for cds, a warning will be issued and the function will pass the value of cds to obj.
A tibble.
A function to generate automated cell labelings with Seurat
bb_seurat_anno(cds, reference)
cds |
A cell data set object |
reference |
Seurat reference data. |
A modified cds with Seurat cell assignments.
Reads standard Souporcell output files and aggregates reference/alternate allele counts across cells assigned to each Souporcell genotype. The returned matrix is designed for genotype-demultiplexing QC plots, especially heatmaps showing whether inferred genetic IDs have distinct SNP allele profiles.
bb_souporcell_matrix( souporcell_dir, return = c("continuous", "discrete"), orientation = c("snps_by_genotypes", "genotypes_by_snps"), variant_file = NULL, use_variant_names = TRUE, variant_name_style = c("chr_pos_ref_alt", "chr_pos", "id"), status_keep = "singlet", assignment_col = "assignment", status_col = "status", min_total_reads = 10, min_genotype_coverage = 0.7, require_finite = TRUE, min_range = 0.35, min_sd = 0.12, ref_threshold = 0.2, alt_threshold = 0.8, require_ref_like = 1L, require_alt_like = 1L, top_n = 75L, discrete_values = c(ref = 0, het = 0.5, alt = 1), genotype_prefix = "ID_", snp_prefix = "SNP_", sort_genotypes = TRUE, verbose = TRUE, return_metadata = FALSE )
souporcell_dir |
Character scalar. Path to a Souporcell output directory,
for example |
return |
Character scalar. One of |
orientation |
Character scalar. One of |
variant_file |
Character scalar or |
use_variant_names |
Logical scalar. If |
variant_name_style |
Character scalar. One of |
status_keep |
Character vector. Cell statuses from |
assignment_col |
Character scalar. Column in |
status_col |
Character scalar. Column in |
min_total_reads |
Numeric scalar. Minimum aggregate read depth required
for a genotype/SNP pair. Values below this are set to |
min_genotype_coverage |
Numeric scalar between 0 and 1. Fraction of genotype IDs that must have adequate coverage for a SNP to be retained. |
require_finite |
Logical scalar. If |
min_range |
Numeric scalar. Minimum ALT allele-fraction range across genotype IDs required for a SNP to be retained. |
min_sd |
Numeric scalar. Minimum ALT allele-fraction standard deviation across genotype IDs required for a SNP to be retained. |
ref_threshold |
Numeric scalar. ALT allele fractions less than or equal to this value are considered reference-like. |
alt_threshold |
Numeric scalar. ALT allele fractions greater than or equal to this value are considered alternate-like. |
require_ref_like |
Integer scalar. Minimum number of genotype IDs that must be reference-like for a SNP to be retained. |
require_alt_like |
Integer scalar. Minimum number of genotype IDs that must be alternate-like for a SNP to be retained. |
top_n |
Integer scalar or |
discrete_values |
Numeric vector of length 3. Values used for
reference-like, intermediate/heterozygous-like, and alternate-like states
when |
genotype_prefix |
Character scalar. Prefix for genotype names. |
snp_prefix |
Character scalar. Prefix for fallback SNP names when no usable variant label is available. |
sort_genotypes |
Logical scalar. If |
verbose |
Logical scalar. If |
return_metadata |
Logical scalar. If |
The function expects a Souporcell output directory containing alt.mtx,
ref.mtx, and clusters.tsv. If present, common_variants_covered_tmp.vcf
is used to label SNPs with genomic coordinates and alleles.
If return_metadata = FALSE, a numeric matrix. By default, rows are
SNPs and columns are Souporcell genotype IDs. If return_metadata = TRUE,
a list with elements matrix, snp_summary, variant_table,
alt_by_genotype, ref_by_genotype, depth_by_genotype,
allele_fraction_by_genotype, and clusters.
mat <- bb_souporcell_matrix( souporcell_dir = "results/g_pos_1/souporcell/K15", return = "discrete", orientation = "snps_by_genotypes", top_n = 75 )
pheatmap::pheatmap( mat, cluster_rows = TRUE, cluster_cols = TRUE, show_rownames = FALSE )
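To make the filtering parameters more concrete, here is a small base R sketch of how ALT allele fractions could be computed and filtered from per-genotype counts. It is not the package's code: the toy count matrices are invented, and the order and exact form of the filters are assumptions inferred from the parameter descriptions (min_total_reads, min_genotype_coverage, min_range, min_sd, ref_threshold, alt_threshold, require_ref_like, require_alt_like, discrete_values).

```r
# Toy aggregated counts: rows are SNPs, columns are genotype IDs.
alt <- matrix(c(0, 20, 10,
                9, 11, 10,
                0,  1, 30), nrow = 3, byrow = TRUE,
              dimnames = list(paste0("SNP_", 1:3), paste0("ID_", 0:2)))
ref <- matrix(c(25, 0, 10,
                10, 9, 10,
                 3, 2,  0), nrow = 3, byrow = TRUE, dimnames = dimnames(alt))

depth <- alt + ref
af <- alt / depth            # ALT allele fraction per genotype/SNP pair
af[depth < 10] <- NA         # below min_total_reads: treated as missing

coverage <- rowMeans(!is.na(af))   # fraction of genotypes with adequate depth
af_range <- apply(af, 1, function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else max(x) - min(x)
})
af_sd <- apply(af, 1, sd, na.rm = TRUE)
n_ref_like <- rowSums(af <= 0.2, na.rm = TRUE)   # ref_threshold
n_alt_like <- rowSums(af >= 0.8, na.rm = TRUE)   # alt_threshold

keep <- coverage >= 0.7 & af_range >= 0.35 & af_sd >= 0.12 &
  n_ref_like >= 1 & n_alt_like >= 1
keep[is.na(keep)] <- FALSE

continuous <- af[keep, , drop = FALSE]

# Discretize using ref = 0, het = 0.5, alt = 1.
discrete <- continuous
discrete[] <- 0.5
discrete[which(continuous <= 0.2)] <- 0
discrete[which(continuous >= 0.8)] <- 1
discrete[is.na(continuous)] <- NA
```

In this toy example only SNP_1 survives: SNP_2 varies too little across genotypes and SNP_3 is covered in too few of them.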
Extracts Peaks data from 10X counts matrix and feature metadata and saves it as an alternate experiment in the assigned CDS.
bb_split_atac(cds)
cds |
CDS to split the Peaks data from |
A cell data set
If you have cite-seq data together with gene expression data, this function will move the cite seq data to a new separate experiment. It will use Seurat to normalize these data using the CLR method and store them in a new assay.
bb_split_citeseq(cds)
cds |
the cell data set to split |
a new CDS
A Function To Add Tibble Columns To Cell Metadata
bb_tbl_to_coldata(obj, min_tbl, join_col = "cell_id", cds = NULL)
obj |
A Seurat or cell data set object |
min_tbl |
A tibble containing only the columns you want to add plus one column for joining. Cell IDs may not be duplicated but missing cells are ok; values will be replaced by NA. |
join_col |
The column in min_tbl containing the join information for the cell metadata (colData). Defaults to "cell_id". |
cds |
Retained for backwards compatibility. If supplied, will generate a warning and pass argument to obj. Default = NULL |
An object of the same class
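The join behaviour described for min_tbl (unique IDs required, missing cells filled with NA) can be sketched on plain data frames. This does not touch a Seurat or cell_data_set object; cell_meta stands in for the cell metadata, and the score column is a made-up example.

```r
# Stand-in for existing cell metadata.
cell_meta <- data.frame(cell_id = c("c1", "c2", "c3"),
                        sample = c("A", "A", "B"))

# Table with the join column plus the new column; cell c2 is absent.
min_tbl <- data.frame(cell_id = c("c1", "c3"), score = c(0.9, 0.2))

# Cell IDs in min_tbl may not be duplicated.
stopifnot(!anyDuplicated(min_tbl$cell_id))

# Left join by position lookup; unmatched cells get NA.
idx <- match(cell_meta$cell_id, min_tbl$cell_id)
new_cols <- setdiff(names(min_tbl), "cell_id")
cell_meta[new_cols] <- min_tbl[idx, new_cols, drop = FALSE]
```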
Convert a wide-form tibble to a matrix
bb_tbl_to_matrix(data)
data |
A wide form tibble to convert to a matrix. The first column will become the rownames. |
A matrix
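A minimal base R sketch of the conversion described above, where the first column supplies the row names. The column names and values are invented, and this is only an approximation of what bb_tbl_to_matrix does.

```r
# Wide-form table: first column holds the row identifiers.
wide <- data.frame(gene = c("g1", "g2"), s1 = c(1, 3), s2 = c(2, 4))

# Drop the identifier column, convert, then restore it as row names.
mat <- as.matrix(wide[, -1, drop = FALSE])
rownames(mat) <- wide[[1]]
```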
A Function To Add Tibble Columns To Feature Metadata
bb_tbl_to_rowdata( obj, assay = "RNA", min_tbl, join_col = "feature_id", cds = NULL )
obj |
A Seurat or cell data set object |
assay |
The assay to which to add the metadata column, Default = RNA |
min_tbl |
A tibble containing only the columns you want to add plus one column for joining. Features cannot be duplicated but missing features are ok and will be replaced by NA. |
join_col |
The column in min_tbl containing the join information for the cds rowData. Defaults to "feature_id". |
cds |
Retained for backwards compatibility. If supplied, will generate a warning and pass argument to obj. Default = NULL |
An object of the same class
Based on Monocle3's Partitions, Leiden, and Louvain clustering methods. Implemented mostly with default values. Seurat objects will be converted to cell_data_set objects for the clustering. The function produces a list of top markers for each cluster type and returns these assignments to the original object as new cell metadata columns.
bb_triplecluster( obj, n_top_markers = 50, outfile = NULL, n_cores = 8, cds = NULL )
obj |
A Seurat or cell_data_set object |
n_top_markers |
Number of top markers to identify per cell group, Default: 50 |
outfile |
Name of a csv file to hold the top marker results. If null, will place "top_markers.csv" in the working directory, Default: NULL |
n_cores |
Number of processor cores to use, Default: 8 |
cds |
Provided for backwards compatibility for existing code. If a value is supplied it will be transferred to obj and a warning message will be emitted, Default: NULL |
A modified Seurat or cell_data_set object
Will rejoin scoresheet and blinded key to produce unblinded results. If you change the names of either of those files, they have to be provided as arguments to the function. Otherwise keyfile and scorefile are optional.
bb_unblind_images( directory, keyfile = "blinding_key.csv", scorefile = "scoresheet.csv", analysis_file, file_column )
directory |
The linux-style filepath of the folder containing the scoresheet and blinded key. |
keyfile |
Optional: filename of the key file. Defaults to "blinding_key.csv". |
scorefile |
Optional: filename of the score file. Defaults to "scoresheet.csv". |
analysis_file |
Complete file path to the unblinded main analysis sheet. The function will left_join analysis_file and unblinded results. In the process, it will necessarily convert Windows file paths to Linux-style file paths. Samples not included in the blinding should return with NA values for the added columns. New data columns being added from scoresheet should be unique relative to analysis_file. |
file_column |
The column in analysis_file containing file paths for the files that were blinded. |
nothing
A function to generate a UMAP with colors mapped to colData variables
bb_var_umap( obj, var, assay = "RNA", value_to_highlight = NULL, foreground_alpha = 1, legend_pos = "right", cell_size = 0.5, alt_stroke_color = NULL, legend_title = NULL, plot_title = NULL, palette = NULL, alt_dim_x = NULL, alt_dim_y = NULL, overwrite_labels = FALSE, group_label_size = 3, alt_label_col = NULL, shape = 21, nbin = 100, facet_by = NULL, sample_equally = FALSE, rasterize = FALSE, raster_dpi = 300, show_trajectory_graph = FALSE, trajectory_graph_color = "grey28", trajectory_graph_segment_size = 0.75, label_root_node = FALSE, pseudotime_dim = var, label_principal_points = FALSE, graph_label_size = 2, cds = NULL, outline_cluster = FALSE, outline_color = "black", outline_size = 1, outline_type = "solid", outline_alpha = 1, ..., man_text_df = NULL, text_geom = "text", minimum_segment_length = 1, hexify = FALSE, n_hexbins = 100 )
obj |
A Seurat or cell data set object |
var |
The variable to map colors to. Special exceptions are "density", "local_n" and "log_local_n" which calculate the 2 d kernel density estimate or binned cell counts and maps to color scale. |
assay |
The gene expression assay to draw reduced dimensions from. Default is "RNA". Does not do anything with cell_data_set objects. |
value_to_highlight |
Option to highlight a single value |
foreground_alpha |
Alpha value for foreground points |
legend_pos |
Legend position |
cell_size |
Cell point size |
alt_stroke_color |
Alternative color for the data point stroke |
legend_title |
Title for the legend |
plot_title |
Main title for the plot |
palette |
Color palette to use. "Rcolorbrewer", "Viridis" are builtin options. Otherwise provide manual values. |
alt_dim_x |
Alternate/reference dimensions to plot by. |
alt_dim_y |
Alternate/reference dimensions to plot by. |
overwrite_labels |
Whether to overwrite the variable value labels |
group_label_size |
Size of the overwritten labels |
alt_label_col |
Alternate column to label cells by |
shape |
Shape for data points |
nbin |
Number of bins if using var %in% c("density", "local_n", "log_local_n") |
facet_by |
Variable or variables to facet by. |
sample_equally |
Whether or not you should downsample to the same number of cells in each plot. Default is FALSE or no. |
rasterize |
Whether to render the graphical layer as a raster image. Default is FALSE. |
raster_dpi |
If rasterize then this is the DPI used. Default = 300. |
show_trajectory_graph |
Whether to render the principal graph for the trajectory. Requires that learn_graph() has been called on cds. |
trajectory_graph_color |
The color to be used for plotting the trajectory graph. |
trajectory_graph_segment_size |
The size of the line segments used for plotting the trajectory graph. |
label_root_node |
Logical; whether to label the root node for the selected pseudotime trajectory. The function requires that a valid pseudotime column be identified, usually as the value of the "var" argument in the form of "pseudotime_cluster_value". If you wish to use var to color the cells in some other way, the pseudotime_dim argument needs to be supplied with the correct pseudotime dimension to pick the root node from. |
pseudotime_dim |
An alternative column to pick the pseudotime root node from, if not supplied to var. |
label_principal_points |
Logical indicating whether to label roots, leaves, and branch points with principal point names. This is useful for order_cells and choose_graph_segments in non-interactive mode. |
graph_label_size |
How large to make the branch, root, and leaf labels. |
cds |
Provided for backward compatibility with prior versions. If a value is supplied, a warning will be emitted and the value will be transferred to the obj argument, Default: NULL |
... |
Additional params for facetting. |
man_text_df |
A data frame in the form of text_x = numeric_vector, text_y = numeric_vector, label = character_vector for manually placing text labels. |
minimum_segment_length |
Minimum length of a line to draw from label to centroid. |
a ggplot
This function gets the comments slot from an Ape object.
COMMENTS(ape)
Generate median +/- se stat object for jitter ggplot
data_median_se(x)
x |
A numeric vector |
Generate mean +/- sd stat object for jitter ggplot
data_summary_mean_sd(x)
x |
A numeric vector |
Generate mean +/- se stat object for jitter ggplot
data_summary_mean_se(x)
x |
A numeric vector |
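Summary helpers of this kind are typically passed to ggplot2's stat_summary(fun.data = ...), which expects a function that returns a data frame with y, ymin and ymax. The sketch below shows what a mean +/- standard error helper could look like; mean_se_sketch is a hypothetical name, and the NA handling is an assumption rather than the documented behaviour of data_summary_mean_se.

```r
# Hypothetical mean +/- standard error summary for a numeric vector.
mean_se_sketch <- function(x) {
  x <- x[!is.na(x)]                # assumption: missing values are dropped
  m <- mean(x)
  se <- sd(x) / sqrt(length(x))    # standard error of the mean
  data.frame(y = m, ymin = m - se, ymax = m + se)
}

summary_row <- mean_se_sketch(c(1, 2, 3))
```

The median-based variants would swap mean() for median() and the spread term for the IQR or MAD.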
Generate median +/- iqr stat object for jitter ggplot
data_summary_median_iqr(x)
x |
A numeric vector |
Generate median +/- mad stat object for jitter ggplot
data_summary_median_mad(x)
x |
A numeric vector |
Cast a data frame into a wide sparse matrix
Similar in function to dcast, but produces a sparse Matrix as an output. Sparse matrices are beneficial for this application because such outputs are often very wide and sparse. Conceptually similar to a pivot operation.
dMcast( data, formula, fun.aggregate = "sum", value.var = NULL, as.factors = FALSE, factor.nas = TRUE, drop.unused.levels = TRUE )
data |
a data frame |
formula |
casting |
fun.aggregate |
name of aggregation function. Defaults to 'sum' |
value.var |
name of column that stores values to be aggregated numerics |
as.factors |
if TRUE, treat all columns as factors, including |
factor.nas |
if TRUE, treat factors with NAs as new levels. Otherwise, rows with NAs will receive zeroes in all columns for that factor |
drop.unused.levels |
should factors have unused levels dropped? Defaults to TRUE,
in contrast to |
Casting formulas are slightly different than those in dcast and follow
the conventions of model.matrix. See formula for
details. Briefly, the left hand side of the ~ will be used as the
grouping criteria. This can either be a single variable, or a group of
variables linked using :. The right hand side specifies what the
columns will be. Unlike dcast, using the + operator will append
the values for each variable as additional columns. This is useful for
things such as one-hot encoding. Using : will combine the columns as
interactions.
a sparse Matrix
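The pivot idea behind this function can be sketched directly with the Matrix package, which ships with most R installations. The example below does not call dMcast or use its formula interface; it builds the wide sparse matrix by hand from a long table, relying on sparseMatrix() summing repeated (row, column) pairs, which mirrors the default "sum" aggregation. The data are invented.

```r
library(Matrix)

# Long-form table: one row per (id, category) observation.
long <- data.frame(id = c("a", "a", "b", "c", "a"),
                   category = c("x", "y", "x", "z", "x"),
                   value = c(1, 2, 3, 4, 10))

rows <- factor(long$id)        # grouping variable (left-hand side)
cols <- factor(long$category)  # variable that becomes the columns

# Repeated (i, j) pairs are summed, so a/x becomes 1 + 10 = 11.
wide <- sparseMatrix(i = as.integer(rows), j = as.integer(cols),
                     x = long$value,
                     dims = c(nlevels(rows), nlevels(cols)),
                     dimnames = list(levels(rows), levels(cols)))
```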
This function cats the features slot from an Ape object.
FEATURES(ape)
This function provides a pipe-friendly method to filter cds objects.
filter_cds(cds, cells = "all", genes = "all")
cds |
The CDS to filter. |
cells |
Optional: a tibble of cell metadata for the cells you wish to keep. Use bb_cellmeta(). Default: 'all' |
genes |
Optional: a tibble of gene metadata for the genes you wish to keep. Use bb_rowmeta(). Default: 'all' |
A filtered CDS
Build significance annotation geoms
geom_sig_annotations( annotation_data, draw_brackets = TRUE, text_family = NULL, text_face = NULL, text_colour = "black", bracket_colour = "black", bracket_linewidth = 0.4, bracket_linetype = 1, bracket_lineend = "round", vjust = 0 )
annotation_data |
Output from |
draw_brackets |
Logical; draw brackets if |
text_family, text_face, text_colour
|
Text styling parameters. |
bracket_colour, bracket_linewidth, bracket_linetype
|
Bracket styling. |
bracket_lineend |
Line ending for bracket segments. Defaults to |
vjust |
Vertical justification for text. |
A list of ggplot layers.
Helpers to add significance labels and optional brackets to ggplot2 plots from a user-supplied comparison table.
Returns an object that can be added directly to a ggplot with +.
geom_sig_table( p_table, y_npc, group1_col = "group1", group2_col = "group2", label_col = "significance", x_levels = NULL, facet_cols = NULL, draw_brackets = TRUE, bracket_tip_npc = 0.015, bracket_margin_npc = 0.02, text_size_pt = NULL, star_y_npc_offset = -0.05, text_family = NULL, text_face = NULL, text_colour = "black", bracket_colour = "black", bracket_linewidth = 0.2, bracket_linetype = 1, bracket_lineend = "round", vjust = 0 )
p_table |
A data frame containing at minimum |
y_npc |
Numeric vector of label y positions in npc coordinates. |
group1_col, group2_col, label_col
|
Column names in |
x_levels |
Optional character vector giving x-axis order. |
facet_cols |
Optional character vector naming the facet columns in
|
draw_brackets |
Logical; whether brackets should be drawn. |
bracket_tip_npc |
Length of bracket tips in npc units. |
bracket_margin_npc |
Gap between label and bracket top line in npc units. |
text_size_pt |
Text size in points. Defaults to theme text size minus 2. |
star_y_npc_offset |
Upward offset, in npc units, applied only to
star-only labels like |
text_family, text_face, text_colour
|
Text styling parameters. |
bracket_colour, bracket_linewidth, bracket_linetype
|
Bracket styling. |
bracket_lineend |
Line ending for bracket segments. Defaults to |
vjust |
Vertical justification for text. |
Notes:
Facet-aware: if p_table contains facet columns matching the plot facets,
annotations are placed only in matching panels. Otherwise they are repeated
across all panels.
PDF-safe significance stars: star-only labels like "*", "**", "***" remain ASCII and are nudged upward slightly so they sit more like "ns".
Bracket segment ends default to lineend = "round".
An object that can be added to a ggplot with +.
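Because y_npc and the bracket parameters are given in npc units (0 at the bottom of the panel, 1 at the top), it can help to see how such values map onto the data scale. The sketch below is only an illustration: npc_to_data is a hypothetical helper, the panel range is invented, and placing the bracket bracket_margin_npc below the label is an assumption based on the parameter description.

```r
# Hypothetical helper: map npc positions onto a panel's y range.
npc_to_data <- function(npc, panel_range) {
  panel_range[1] + npc * (panel_range[2] - panel_range[1])
}

y_range <- c(0, 50)                 # invented panel limits
y_npc <- c(0.85, 0.95)              # one label height per comparison
bracket_margin_npc <- 0.02

label_y <- npc_to_data(y_npc, y_range)
bracket_y <- npc_to_data(y_npc - bracket_margin_npc, y_range)
```

Working in npc units keeps annotation heights stable when the y-axis limits change between plots or facets.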
create a split violin geom
geom_split_violin( mapping = NULL, data = NULL, stat = "ydensity", position = "identity", ..., draw_quantiles = NULL, trim = TRUE, scale = "area", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE )
mapping |
Set of aesthetic mappings, Default: NULL |
data |
Data to display, Default: NULL |
stat |
Statistic to plot, Default: 'ydensity' |
position |
Position, Default: 'identity' |
... |
extra arguments |
draw_quantiles |
Quantiles to draw, Default: NULL |
trim |
Trim ends?, Default: TRUE |
scale |
Analogous to violin scale, Default: 'area' |
na.rm |
Default: FALSE |
show.legend |
Default: NA |
inherit.aes |
Default: TRUE |
This function takes a GRanges object and returns a character vector. Only 4 metadata fields from the GRanges object will be included: locus_tag, type, fwdcolor and revcolor. Locus_tag must be unique. This will be checked by the Ape constructor. This function should mostly be used internally in the construction and FEATURE-setting of instances of the Ape class.
granges_to_features(gr)
gr |
A GRanges object. |
A character vector.
This is an S4 Class for holding the relevant metadata we need for key images. The idea is to generate this object for each of the images we will use in a grant, paper or other important document. That way when we want to reuse these images we know where they are.
One can construct a single image object using the Image() constructor method. When called as such, it will open a file chooser to identify the file from the network drive. Then it will provide an interactive menu to add the needed metadata (see below).
The workflow is to use this to create a new Image object when you are using a new key image in a grant or other document. Then you will add the image to the image catalog using ImageCatalog.add.
The Image constructor provides several validation checks, including that the image file must be accessible.
Each slot has its own getter and setter methods, which are named identically to the slot.
It is critical that your images are stored in a common network drive. Ideally this is X/Labs/Blaser/staff/keyence imaging data
Avoid Duplication!! Please keep all of your raw imaging data in X/Labs/Blaser/staff and subdirectories.
## S4 method for signature 'Image_'
show(object)
file_path(x)
file_path(x) <- value
species(x)
species(x) <- value
stage(x)
stage(x) <- value
genetics(x)
genetics(x) <- value
treatment(x)
treatment(x) <- value
microscope(x)
microscope(x) <- value
mag(x)
mag(x) <- value
filter(x)
filter(x) <- value
use(x)
use(x) <- value
note(x)
note(x) <- value
file_path: Path to file. Should start with ~/network/X/Labs/Blaser...
species: The species being imaged.
stage: The stage of the sample. Options include various times in hpf plus other for other timepoints.
genetics: Any genetic modifications.
treatment: Any treatments performed.
microscope: The microscope used.
mag: Magnification.
filter: The filter or camera setup used.
use: The document the image is being used in.
note: Any additional notes.
This object holds all of the individual Image objects.
Methods are provided for
viewing as a tibble: ImageCatalog.as_tibble
writing to tsv format: ImageCatalog.write
adding an image: ImageCatalog.add
deleting an image: ImageCatalog.delete
subsetting and extracting: brackets and double brackets
An ImageCatalog object can be made from a list of Image objects using ImageCatalog(list = ).
More commonly, you will generate the ImageCatalog from a tsv catalog file. To make an image catalog this way, run ImageCatalog(catalog_path =
## S4 method for signature 'ImageCatalog'
show(object)
## S4 method for signature 'ImageCatalog,ANY,ANY,ANY'
x[i, j, ..., drop = TRUE]
## S4 method for signature 'ImageCatalog,ANY,ANY'
x[[i, j, ..., drop = TRUE]]
## S4 method for signature 'ImageCatalog'
x$name
ImageCatalog.as_tibble(image_catalog)
ImageCatalog.write(image_catalog, out)
ImageCatalog.add(image_catalog, image)
ImageCatalog.delete(image_catalog, hash)
Run monocle3's learn_graph on a Seurat object
LearnGraph(object, reduction = DefaultDimReduc(object = object), ...)
object |
A |
reduction |
Name of reduction to use for learning the pseudotime graph |
... |
Arguments passed to |
A cell_data_set object with the pseudotime graph
This function cats the Locus slot from an Ape object.
LOCUS(ape)
Implementation of merge for Matrix. By explicitly
calling merge.Matrix it will also work for matrix, for
data.frame, and vector objects as a much faster alternative to
the built-in merge.
merge.Matrix( x, y, by.x, by.y, all.x = TRUE, all.y = TRUE, out.class = class(x)[1], fill.x = ifelse(is(x, "sparseMatrix"), FALSE, NA), fill.y = fill.x, ... )
join.Matrix( x, y, by.x, by.y, all.x = TRUE, all.y = TRUE, out.class = class(x)[1], fill.x = ifelse(is(x, "sparseMatrix"), FALSE, NA), fill.y = fill.x, ... )
x, y
|
|
by.x |
vector indicating the names to match from |
by.y |
vector indicating the names to match from |
all.x |
logical; if |
all.y |
logical; if |
out.class |
the class of the output object. Defaults to the class of x. Note that some output classes are not possible due to R coercion capabilities, such as converting a character matrix to a Matrix. |
fill.x, fill.y
|
the value to put in merged columns where there is no match. Defaults to 0/FALSE for sparse matrices in order to preserve sparsity, NA for all other classes |
... |
arguments to be passed to or from methods. Currently ignored |
all.x/all.y correspond to the four types of database joins in the following way:
left join: all.x=TRUE, all.y=FALSE
right join: all.x=FALSE, all.y=TRUE
inner join: all.x=FALSE, all.y=FALSE
full join: all.x=TRUE, all.y=TRUE
Note that NA values will match other NA values.
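The all.x/all.y combinations behave the same way in base R's merge() for data frames, so the four join types can be illustrated without merge.Matrix itself. The sketch below uses invented data and base merge() only; it also shows NA keys matching each other, which is base merge()'s default and matches the note above.

```r
x <- data.frame(key = c("a", "b", NA), x_val = 1:3)
y <- data.frame(key = c("b", "c", NA), y_val = 4:6)

left_join  <- merge(x, y, by = "key", all.x = TRUE,  all.y = FALSE)
right_join <- merge(x, y, by = "key", all.x = FALSE, all.y = TRUE)
inner_join <- merge(x, y, by = "key", all.x = FALSE, all.y = FALSE)
full_join  <- merge(x, y, by = "key", all.x = TRUE,  all.y = TRUE)

# The NA key in x is matched with the NA key in y.
na_row <- inner_join[is.na(inner_join$key), ]
```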
Often you will have a data table with repeats or batches of the same experiment. An effective way to control for batch effects is to normalize the data from each batch to a control group present in all of the experiments. To use this function, provide such a data table, identify the column holding the experimental group data, the identity of the control group to normalize by, the column holding the batch data, and the column holding the numerical data to normalize. Also select the function to average by (mean or median). The function will return the data table with three new columns: the average of the control group by batch, fold change of each observation relative to the batch average and the log2-transformed fold change. This function is pipe-friendly.
normalize_batch( data, group_col, norm_group, batch_col, data_col, fun = c("mean", "median") )
data |
a tibble |
group_col |
the column containing the experimental group identifier |
norm_group |
the experimental group you want to normalize to across batches |
batch_col |
the column containing the batch identifier |
data_col |
the column with your data |
fun |
averaging function to use, Default: c("mean", "median") |
A tibble with new columns indicating batch normalization group average, fold change for each observation relative to the batch average and log2 fold change
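The normalization described above can be sketched in base R as follows. This is not the package's implementation: the data are invented, and the new column names (ctrl_avg, fold_change, log2_fold_change) are hypothetical placeholders for the three columns the function adds.

```r
# Two batches with very different absolute signal levels.
data <- data.frame(
  batch  = c("b1", "b1", "b1", "b2", "b2", "b2"),
  group  = c("ctrl", "ctrl", "drug", "ctrl", "ctrl", "drug"),
  signal = c(10, 30, 40, 1, 3, 8)
)

# Average of the control group within each batch (mean here; median also works).
is_ctrl <- data$group == "ctrl"
ctrl_avg <- tapply(data$signal[is_ctrl], data$batch[is_ctrl], mean)

# Attach the batch-level control average to every observation.
data$ctrl_avg <- as.numeric(ctrl_avg[data$batch])
data$fold_change <- data$signal / data$ctrl_avg
data$log2_fold_change <- log2(data$fold_change)
```

After normalization the two drug observations are expressed on the same scale (2-fold and 4-fold over their own batch controls) even though the raw signals differ by a factor of five.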
arg_match
cli_abort
filter, group_by, summarise, select, mutate-joins, mutate
Facet-aware version. If p_table contains facet columns matching the plot,
annotations are placed only in the matching panels. If not, annotations are
repeated across all panels.
prepare_sig_annotations( plot, p_table, y_npc, group1_col = "group1", group2_col = "group2", label_col = "significance", x_levels = NULL, facet_cols = NULL, draw_brackets = TRUE, bracket_tip_npc = 0.015, bracket_margin_npc = 0.02, text_size_pt = NULL, star_y_npc_offset = 0.01 )
plot |
A ggplot object with a discrete x-axis and continuous y-axis. |
p_table |
A data frame containing at minimum |
y_npc |
Numeric vector of label y positions in npc coordinates. |
group1_col, group2_col, label_col
|
Column names in |
x_levels |
Optional character vector giving x-axis order. |
facet_cols |
Optional character vector naming the facet columns in
|
draw_brackets |
Logical; whether brackets should be drawn. |
bracket_tip_npc |
Length of bracket tips in npc units. |
bracket_margin_npc |
Gap between label and bracket top line in npc units. |
text_size_pt |
Text size in points. Defaults to theme text size minus 2. |
star_y_npc_offset |
Upward offset, in npc units, applied only to
star-only labels like |
Star-only labels like "***" are kept as ASCII for PDF robustness and
nudged upward slightly so they sit more like "ns".
A tibble with one row per annotation per matched panel.
Use this to update, install and/or load project data. Usual practice is to provide the path to a directory holding data package tarballs. This function will find the newest version, compare that to the versions in the cache and used in the package and give you the newest version. Alternatively, provide the path to a specific .tar.gz file to install and activate that one.
If a specific version is requested, i.e. a specific .tar.gz file, and this version is already cached, it will be linked and not reinstalled. If for some reason there are multiple hashes with the same version number (usually because a package was rebuilt without incrementing the version), then the latest hash of that version will be linked.
This function accepts multiple paths, i.e. multiple independent data packages, in the form of a character vector of length >= 1. After deciding which version to install based on the inputs, the function will load all of the data objects into a single environment called deconflicted.data. The problem with loading multiple data packages into the same environment is that there may be name conflicts and objects get overridden. The problem with keeping them in separate environments is that they are difficult to specify and access. Here is how this function deals with these problems:
If length(path) > 1, the function will require a vector for the argument deconflict_string of the same length. The first element of deconflict_string will be added as a suffix to the data object from the first package in path, etc. For example if the first value of the argument deconflict_string is ".my.project.data", then all objects in the package will be suffixed with .my.project.data.
Note that you will have to reference the object correctly in your code using the proper suffix.
Also note that all of the elements of deconflict_string must be unique. But an empty string, i.e. "", is also a valid input which means that all of the names of the data objects from that package will be unchanged. This is helpful if you have a lot of code using one data package but at a later time decide you need to add a different data package. Make the deconflict string c("", ".my.new.data") and you don't have to change any of your old code.
Make sure you include a separator like . or _ but not a space as the first character of each element of deconflict_string.
If only a single package is loaded, there will be no conflicts and by default, deconflict_string is set to "".
As before, all data elements are loaded as promises which means that they are loaded into memory only when called.
Since version 9211, this function also handles on-disk storage for monocle3 cds objects. If such an object is detected within extdata, it will be loaded into the same deconflicted.data environment using the functions provided by monocle.
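The suffixing and promise-based loading described above can be illustrated with a small base R sketch. It is not the implementation used by project_data; the helper register_promise, the load_log environment, and the toy loader functions are hypothetical names introduced only for illustration.

```r
# Stand-in for the shared environment; the name mirrors the documentation.
deconflicted.data <- new.env()

# Records which objects have actually been evaluated.
load_log <- new.env()
assign("forced", character(0), envir = load_log)

# Hypothetical helper: bind `name` in `envir` as a promise that calls
# `loader()` only when the object is first accessed.
register_promise <- function(name, loader, envir) {
  force(loader)
  delayedAssign(
    name,
    {
      assign("forced", c(get("forced", envir = load_log), name),
             envir = load_log)
      loader()
    },
    eval.env = environment(),
    assign.env = envir
  )
}

# Two toy "packages" that both provide an object called `counts`.
packages <- list(
  list(suffix = "",
       loaders = list(counts = function() matrix(1:4, nrow = 2))),
  list(suffix = ".my.new.data",
       loaders = list(counts = function() matrix(5:8, nrow = 2)))
)

for (pkg in packages) {
  for (nm in names(pkg$loaders)) {
    register_promise(paste0(nm, pkg$suffix), pkg$loaders[[nm]],
                     deconflicted.data)
  }
}

# Both names exist, but nothing has been evaluated yet.
object_names <- ls(deconflicted.data)
forced_before <- get("forced", envir = load_log)

# Accessing one object forces only that promise.
new_counts <- deconflicted.data$counts.my.new.data
forced_after <- get("forced", envir = load_log)
```

Here the empty suffix leaves the first package's object name unchanged, so existing code that refers to counts keeps working, while the second package's object is reached as counts.my.new.data.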
project_data(path, deconflict_string = "")
path |
Path or vector of paths to data directory/ies. |
deconflict_string |
Character vector used to disambiguate objects from packages in path. Default: "" |
loads data as promises as a side effect
cli_abort, cli_alert
read_delim, cols
path, path_file
pmap, map
str_detect, str_extract, str_remove, str_replace
filter, pull, arrange, slice
as_tibble
Row-binds a list of Matrix or matrix-like objects, filling in missing columns.
rBind.fill(x, ..., fill = NULL, out.class = class(rbind(x, x))[1])
x, ...
|
Objects to combine. If the first argument is a list and
|
fill |
value with which to fill unmatched columns |
out.class |
the class of the output object. Defaults to the class of x. Note that some output classes are not possible due to R coercion capabilities, such as converting a character matrix to a Matrix. |
Similar to rbind.fill.matrix, but works for
Matrix as well as all other R objects. It is completely
agnostic to class, and will produce an object of the class of the first input
(or of class matrix if the first object is one dimensional).
The implementation is recursive, so it can handle an arbitrary number of inputs, albeit inefficiently for large numbers of inputs.
This method is still experimental, but should work in most cases. If the
data sets consist solely of data frames, rbind.fill is
preferred.
a single object of the same class as the first input, or of class
matrix if the first object is one dimensional
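As a rough illustration of what "filling in missing columns" means, the following base R sketch pads two named matrices to the union of their columns before row-binding them. The helper rbind_fill_sketch is hypothetical and handles only two base matrices with column names; it is not the implementation of rBind.fill, which is recursive and class-agnostic.

```r
# Hypothetical helper: row-bind two matrices whose column sets differ,
# filling unmatched columns with `fill`.
rbind_fill_sketch <- function(a, b, fill = NA) {
  all_cols <- union(colnames(a), colnames(b))
  pad <- function(m) {
    out <- matrix(fill, nrow = nrow(m), ncol = length(all_cols),
                  dimnames = list(rownames(m), all_cols))
    out[, colnames(m)] <- m
    out
  }
  rbind(pad(a), pad(b))
}

a <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("x", "y")))
b <- matrix(5:8, nrow = 2, dimnames = list(c("r3", "r4"), c("y", "z")))

combined <- rbind_fill_sketch(a, b)
```

The result has one column per distinct column name across the inputs, and cells with no source value (for example row r1, column z) hold the fill value.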
CDS objects can be large, and because they are normally stored in memory this can lead to prolonged loading times and possibly system crashes. Use this function to save a monocle cds object to disk. The expected usage is to provide the function with a cds object to save and two directories. These directories should be within a data package project, but this is not strictly required. The extdata directory is where the cds data will be saved. It must be within inst (so, inst/extdata) for the data to be installed and loaded with the data package using project_data. The data directory is where normal .rda files are saved for typical R objects. A placeholder object with the same name as the cds is saved there. Its purpose is to permit documentation of the cds object and to mitigate namespace conflicts. When the real object is loaded, the placeholder is overwritten.
save_monocle_disk(cds_disk, data_directory, extdata_directory, alt_name = NULL)
cds_disk |
The cds object to save to disk. |
data_directory |
The package data directory to save to. |
extdata_directory |
The package extdata directory to save to. |
alt_name |
An alternative name to save the cds under. Useful if the function is used programmatically. |
nothing
file_access, path, path_file
cli_abort, cli_alert
convert_counts_matrix, save_monocle_objects
SaveAs
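A hedged sketch of the directory layout described above follows. The package root my.datapkg and the object name my_cds are assumptions made for illustration; the call to save_monocle_disk is attempted only if blaseRtools is installed and an object named my_cds exists in the session.

```r
# Hypothetical data package layout, created under a temporary directory.
pkg_root <- file.path(tempdir(), "my.datapkg")
data_dir <- file.path(pkg_root, "data")
extdata_dir <- file.path(pkg_root, "inst", "extdata")

dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(extdata_dir, recursive = TRUE, showWarnings = FALSE)

# `my_cds` is an assumed name for an existing monocle3 cell_data_set.
if (requireNamespace("blaseRtools", quietly = TRUE) && exists("my_cds")) {
  blaseRtools::save_monocle_disk(
    cds_disk = my_cds,
    data_directory = data_dir,
    extdata_directory = extdata_dir
  )
}
```

Keeping extdata under inst is what allows the saved cds to be installed with the data package and found later by project_data.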
Calculate standard error of the mean
se(x)
x |
A numeric vector |
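For reference, the standard error of the mean is conventionally the sample standard deviation divided by the square root of the number of observations. The sketch below uses a hypothetical function se_sketch to show that formula; whether se handles missing values or other edge cases in the same way is not stated here and is not assumed.

```r
# Hypothetical stand-in showing the conventional formula: sd(x) / sqrt(n).
se_sketch <- function(x) {
  stats::sd(x) / sqrt(length(x))
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
sem <- se_sketch(x)
```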
Show an Ape Object
## S4 method for signature 'Ape'
show(object)
Show a Trace Object
## S4 method for signature 'Trace'
show(object)
This is an S4 Class that is part of a solution to optimize plotting heatmaps. This class is derived from the SummarizedExperiment class. It inherits structure and methods and adds some structures.
Like the SummarizedExperiment class, it is built around a matrix. Like the SingleCellExperiment and cell_data_set classes, which are also derived from SummarizedExperiment, SummarizedHeatmap holds metadata about the columns and rows of the matrix. This enables plotting useful annotation information with a set of plotting functions (bb_plot_heatmap...).
Use the SummarizedHeatmap constructor to make an instance of the class from a matrix. Use colData and rowData to get or set these values. Internal validity checks will ensure the columns and rows match.
New to this object are the colDendro and rowDendro slots. These hold hierarchical clustering information used for ordering the heatmap plot and plotting the dendrograms. They are generated automatically when the object is created.
To set the order of the columns or rows manually, supply values to the rowOrder or colOrder parameters. This will prevent creation of dendrograms for the respective columns or rows.
SummarizedHeatmap( mat, colOrder = NULL, rowOrder = NULL, cluster_method = "ave", ... )
mat |
A matrix to build the object from. |
colOrder |
A character string corresponding to matrix column names. |
rowOrder |
A character string corresponding to matrix row names. |
cluster_method |
Clustering algorithm. See stats::hclust. |
... |
other arguments to pass into SummarizedExperiment |
A SummarizedHeatmap object
SummarizedExperiment-class, SummarizedExperiment
DataFrame-class, S4VectorsOverview
## Not run: 
if(interactive()){
 #EXAMPLE1
 mat <- matrix(rnorm(100), ncol=5)
 colnames(mat) <- letters[1:5]
 rownames(mat) <- letters[6:25]
 test_sh <- SummarizedHeatmap(mat)
 colData(test_sh)$sample_type <- c("vowel", "consonant", "consonant", "consonant", "vowel")
 colData(test_sh)$sample_type2 <- c("vowel2", "consonant2", "consonant2", "consonant2", "vowel2")
 isVowel <- function(char) char %in% c('a', 'e', 'i', 'o', 'u')
 rowData(test_sh)$feature_type <- ifelse(isVowel(letters[6:25]), "vowel", "consonant")
 rowData(test_sh)$feature_type2 <- paste0(rowData(test_sh)$feature_type, "2")
 }
## End(Not run)
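The kind of clustering information held in the colDendro and rowDendro slots can be approximated with base R alone. The sketch below is an assumption about the general approach (Euclidean distances via stats::dist, then stats::hclust with the documented default method "ave"); the distance measure and internal details used by SummarizedHeatmap are not confirmed here.

```r
set.seed(1)
mat <- matrix(rnorm(100), ncol = 5,
              dimnames = list(letters[6:25], letters[1:5]))

# Cluster rows and columns separately; "ave" partially matches "average".
row_hc <- stats::hclust(stats::dist(mat), method = "ave")
col_hc <- stats::hclust(stats::dist(t(mat)), method = "ave")

# Dendrogram objects suitable for plotting alongside a heatmap.
row_dendro <- stats::as.dendrogram(row_hc)
col_dendro <- stats::as.dendrogram(col_hc)

# Reorder the matrix to follow the dendrogram leaf order.
ordered_mat <- mat[row_hc$order, col_hc$order]
```

Supplying an explicit row or column order instead of clustering corresponds to indexing the matrix by a chosen vector of names, in which case no dendrogram exists for that dimension.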
An instance of this class is created by calling "bb_makeTrace". All slots in this object are GRanges objects. Validation checks ensure that data, peaks, links, and gene models are all bound by the same plot range on the same chromosome of the same genome. This differs from the standard GRangesList object in that each slot can have its own metadata columns. Currently, hg38 and danRer11 are the supported genomes. Use this class to plot coverage tracks from bulk or single cell ATAC, ChIP, or similar experiments. Link plotting is available for cicero-style links.
trace_data: A GRanges object with a metadata column for "score" or "coverage" intended to be plotted as a y-variable. Metadata variables may be included to annotate color and facets in the final plotting. This is particularly important for single cell data or when combining bulk tracks from different samples. Whole-sample bigWig files can be converted to GRanges objects using import.bw from rtracklayer. All track data is trimmed during import, but pre-trimming to the approximate range desired will significantly speed up processing. The plyranges package is recommended for GRanges manipulations such as filtering by chromosome, adding metadata, and binding GRanges objects together. Bulk tracks from different samples should be pre-normalized before importing.
peaks: A GRanges object containing peaks to plot in a track-style format.
links: A GRanges object with Cicero-style links.
gene_model: The gene model for plotting. Will be automatically generated by bb_makeTrace.
plot_range: The master GRange for the whole object. Validity checks and/or constructors ensure all other ranges are contained within.
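The pre-trimming suggested for trace_data can be sketched with GenomicRanges directly. The coverage values, the sample label, and the plot range below are invented for illustration, and the block runs only if GenomicRanges is installed. It shows one generic way to restrict a coverage GRanges to a region and is not a description of how bb_makeTrace trims its input.

```r
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  library(GenomicRanges)

  # Toy coverage track: ten 100 bp bins with a "score" metadata column
  # and a sample annotation that could later drive color or facets.
  cov <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(start = seq(1, 901, by = 100), width = 100),
    score = c(0, 2, 5, 9, 4, 3, 1, 0, 0, 1),
    sample = rep("sample_A", 10)
  )

  # Hypothetical plot range on the same chromosome.
  plot_range <- GRanges("chr1", IRanges(start = 201, end = 600))

  # Keep only the bins that overlap the plot range.
  trimmed <- subsetByOverlaps(cov, plot_range)
}
```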
Get the Trace Data Slot from a Trace Object
Trace.data(trace)
Get the gene_model Slot from a Trace Object
Trace.gene_model(trace)
Get the Links Slot from a Trace Object
Trace.links(trace)
Get the peaks Slot from a Trace Object
Trace.peaks(trace)
Get the plot_range Slot from a Trace Object
Trace.plot_range(trace)
Set the Trace Data Slot of a Trace Object
Trace.setData(trace, gr)
trace |
A trace object |
gr |
A GRanges object. This object will become the new trace_data. If the range is smaller, it will trim the other slots to match. Usually this is used to change range metadata only. |
Set the Links Slot of a Trace Object
Trace.setLinks(trace, gr)
trace |
A trace object |
gr |
A GRanges object. This object will become the new plot links. |
Set the peaks Slot of a Trace Object
Trace.setpeaks(trace, gr)
trace |
A trace object |
gr |
A GRanges object. This object will become the new plot peaks. |
Set the Plot Range Slot of a Trace Object
Trace.setRange(trace, gr)
trace |
A trace object |
gr |
A GRanges object. This object will become the new plot range. |