
Given mutation frequency (mf) data, construct a plot displaying the mutation subtypes observed in each sample or group of a cohort.

Usage

plot_spectra(
  mf_data,
  group_col = "sample",
  subtype_resolution = "base_6",
  response = "proportion",
  mf_type = "min",
  group_order = "none",
  group_order_input = NULL,
  dist = "cosine",
  cluster_method = "ward.D",
  custom_palette = NULL,
  x_lab = NULL,
  y_lab = NULL,
  rotate_xlabs = FALSE
)

Arguments

mf_data

A data frame containing the mutation frequency data at the desired subtype resolution, as returned by the calculate_mf function with subtype_resolution set to that resolution. The data must include the group_col column(s), a column containing the mutation subtypes, a column containing the desired response variable (mf, proportion, or sum) for the desired mf_type (min or max), and, if applicable, the column(s) by which to order the samples/groups.

group_col

The name of the column(s) in mf_data that contain the sample/group names. These will generally be the same values used for the cols_to_group argument of the calculate_mf function. However, you may also use columns in mf_data that define groups at a higher level of aggregation.

subtype_resolution

The subtype resolution of the mf data. Options are base_6, base_12, base_96, base_192, or type. Default is base_6.

response

The desired response variable to be plotted. Options are mf, proportion, or sum. Default is proportion. mf_data must contain the column named after the chosen response and mf_type: one of mf_min, mf_max, proportion_min, proportion_max, sum_min, or sum_max.

mf_type

The mutation counting method to use. Options are min or max. Default is min.

group_order

The method for ordering the samples/groups within the plot. Options include:

  • none: No ordering is performed. Default.

  • smart: Groups are automatically ordered based on the group names (alphabetical, numerical).

  • arranged: Groups are ordered based on one or more factor column(s) in mf_data. Column names are passed to the function using the group_order_input argument.

  • custom: Groups are ordered based on a custom vector of group names. The custom vector is passed to the function using the group_order_input argument.

  • clustered: Groups are ordered based on hierarchical clustering. The dissimilarity measure can be specified using the dist argument, and the agglomeration method using the cluster_method argument.

group_order_input

A character vector specifying details for the group order method. If group_order is arranged, group_order_input should contain the column name(s) to be used for ordering the samples. If group_order is custom, group_order_input should contain the custom vector of group names.

dist

The dissimilarity measure for hierarchical clustering. Options are cosine, euclidean, maximum, manhattan, canberra, binary, or minkowski. The default is cosine. Apart from cosine, these are the methods offered by stats::dist; see dist for details.

cluster_method

The agglomeration method for hierarchical clustering. Options are ward.D, ward.D2, single, complete, average (= UPGMA), mcquitty (= WPGMA), median (= WPGMC), or centroid (= UPGMC). The default is ward.D. See hclust for details.

custom_palette

A named vector of colors to be used for the mutation subtypes. The names of the vector should correspond to the mutation subtypes in the data. Alternatively, you can specify a color palette from the RColorBrewer package. See brewer.pal for palette options. You may visualize the palettes at the ColorBrewer website: https://colorbrewer2.org/. Default is NULL.

x_lab

The label for the x-axis. Default is the value of group_col.

y_lab

The label for the y-axis. Default is the value of response.

rotate_xlabs

A logical value indicating whether the x-axis labels should be rotated 90 degrees. Default is FALSE.

Examples

# Load example data
example_file <- system.file("extdata", "Example_files",
                            "example_mutation_data_filtered.rds",
                            package = "MutSeqR")
example_data <- readRDS(example_file)

# Example 1: plot the proportion of base_6 mutation subtypes
# for each sample, organized by dose group:

# Calculate the mutation frequency data at the 6-base resolution.
# Retain the dose_group column to use for ordering the samples.
mf_data <- calculate_mf(mutation_data = example_data,
                        cols_to_group = "sample",
                        subtype_resolution = "base_6",
                        retain_metadata_cols = "dose_group")
#> Performing internal depth correction to prevent double-counting...
#> Internal depth correction complete.
#> Joining with `by = join_by(normalized_subtype, sample)`
#> Joining with `by = join_by(sample, normalized_ref)`
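# Optional check (a sketch): per the response argument description,
# plot_spectra() needs the column named <response>_<mf_type>, so with
# response = "proportion" and the default mf_type = "min" used below,
# mf_data must contain proportion_min. Checking this before plotting
# gives a clearer error if calculate_mf() was run differently.
response_col <- paste0("proportion", "_", "min")
stopifnot(response_col %in% colnames(mf_data))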
# Set the desired order for the dose_group levels.
mf_data$dose_group <- factor(mf_data$dose_group,
                             levels = c("Control", "Low", "Medium", "High"))
# Plot the mutation spectra
plot <- plot_spectra(mf_data = mf_data,
                     group_col = "sample",
                     subtype_resolution = "base_6",
                     response = "proportion",
                     group_order = "arranged",
                     group_order_input = "dose_group")
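
# Variations on Example 1 (sketches that reuse mf_data from above).
# (a) Custom ordering: group_order_input holds the group names in the
# desired order. Reverse alphabetical order is used here purely for
# illustration; any ordering of the names in mf_data$sample will do.
custom_order <- rev(sort(unique(as.character(mf_data$sample))))
plot <- plot_spectra(mf_data = mf_data,
                     group_col = "sample",
                     subtype_resolution = "base_6",
                     response = "proportion",
                     group_order = "custom",
                     group_order_input = custom_order)

# (b) Plot the mutation frequency (mf_max) instead of the proportion,
# using a named color palette, custom axis labels, and rotated x labels.
# This assumes the subtype column is normalized_subtype, as the join
# messages above suggest; check colnames(mf_data) if it differs.
subtypes <- as.character(unique(mf_data$normalized_subtype))
subtype_palette <- setNames(grDevices::hcl.colors(length(subtypes), "Dark 3"),
                            subtypes)
plot <- plot_spectra(mf_data = mf_data,
                     group_col = "sample",
                     subtype_resolution = "base_6",
                     response = "mf",
                     mf_type = "max",
                     group_order = "arranged",
                     group_order_input = "dose_group",
                     custom_palette = subtype_palette,
                     x_lab = "Sample",
                     y_lab = "Mutation frequency (max)",
                     rotate_xlabs = TRUE)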

# Example 2: plot the proportion of base_6 mutation subtypes
# for each sample, ordered by hierarchical clustering:
plot <- plot_spectra(mf_data = mf_data,
                     group_col = "sample",
                     subtype_resolution = "base_6",
                     response = "proportion",
                     group_order = "clustered")
#> Warning: `scale_x_dendrogram()` was deprecated in ggh4x 0.3.0.
#> ℹ Please use `legendry::scale_x_dendro()` instead.
#> ℹ The deprecated feature was likely used in the MutSeqR package.
#>   Please report the issue at
#>   <https://github.com/EHSRB-BSRSE-Bioinformatics/MutSeqR/issues>.
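
# Illustration of the idea behind group_order = "clustered" (a minimal
# sketch in base R, NOT MutSeqR's internal code): compute pairwise cosine
# dissimilarities between the groups' subtype spectra, cluster them with
# stats::hclust(), and take the group order from the dendrogram leaves.
# The spectra below are hypothetical toy proportions, not the example data.
spectra <- rbind(
  sample_A = c(0.10, 0.05, 0.60, 0.05, 0.15, 0.05),
  sample_B = c(0.12, 0.04, 0.58, 0.06, 0.14, 0.06),
  sample_C = c(0.40, 0.10, 0.20, 0.10, 0.10, 0.10),
  sample_D = c(0.38, 0.12, 0.22, 0.08, 0.10, 0.10)
)
colnames(spectra) <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")

# Cosine dissimilarity = 1 - (a . b) / (|a| * |b|) for every pair of rows.
# stats::dist() has no cosine method, so it is computed directly here.
cosine_dissimilarity <- function(m) {
  norms <- sqrt(rowSums(m^2))
  similarity <- tcrossprod(m) / outer(norms, norms)
  as.dist(1 - similarity)
}

# ward.D mirrors the default cluster_method of plot_spectra().
hc <- hclust(cosine_dissimilarity(spectra), method = "ward.D")
cluster_order <- hc$labels[hc$order]
# Samples with similar spectra (A/B and C/D) end up next to each other.

# In plot_spectra(), the equivalent choices are made through the dist and
# cluster_method arguments, for example:
# plot_spectra(mf_data = mf_data, group_col = "sample",
#              group_order = "clustered",
#              dist = "euclidean", cluster_method = "average")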