
Compare the overall mutation spectra between groups
Source: R/spectra_comparison.R
spectra_comparison compares the mutation spectra of groups using a modified contingency table approach.
Arguments
- mf_data
A data frame containing the mutation frequency (MF) data. This is the output from calculate_mf(). MF data should be at the desired subtype resolution. Required columns are the exp_variable column(s), the subtype column, and sum_min or sum_max.
- exp_variable
The column names of the experimental variable(s) to be compared.
- mf_type
The type of mutation frequency to use. Default is "min" (recommended).
- contrasts
A filepath to a file OR a data frame that specifies the comparisons to be made between levels of the exp_variable(s). The table must consist of two columns, each containing a level of the exp_variable. For each row in contrasts, the level in the first column will be compared to the level in the second column. When using more than one exp_variable, separate the levels of each variable with a colon. Ensure that all variables listed in exp_variable are represented in each entry of the table. See Details for examples.
- cont_sep
The delimiter used to import the contrasts table. Default is tab.
Value
A data frame containing the G2 likelihood ratio statistic for each specified comparison, along with its p-value and the p-value adjusted for multiple comparisons.
Details
This function creates an R x 2 contingency table of the subtype counts, where R is the number of subtypes and the 2 columns are the groups being compared. The G2 likelihood ratio statistic is used to test whether the proportion (count / group total) of each mutation subtype is the same in both groups.
For large sample sizes, the p-value is computed by referring the G2 statistic to a chi-squared distribution. When N / (R - 1) < 20, where N is the total mutation count across both groups, the function uses an F-distribution instead in order to reduce the false positive rate.
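The calculation described above can be sketched in base R. This is a minimal illustration using the textbook form of the G2 statistic on made-up counts, not the package's internal code: the counts matrix and all variable names are hypothetical, and the F-distribution branch is only flagged, not implemented, because its exact form is not given here.

```r
# Hypothetical counts of six mutation subtypes in two groups (illustrative only)
counts <- matrix(c(30, 12, 18, 25, 9, 6,
                   22, 20, 15, 40, 11, 12),
                 ncol = 2,
                 dimnames = list(c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G"),
                                 c("control", "treated")))

# Expected counts if both groups share the same subtype proportions
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# G2 = 2 * sum(observed * log(observed / expected)); empty cells contribute 0
terms <- ifelse(counts > 0, counts * log(counts / expected), 0)
g2 <- 2 * sum(terms)

r <- nrow(counts)   # number of subtypes (R)
n <- sum(counts)    # total mutation count across both groups (N)
df <- r - 1         # (R - 1) * (2 - 1) degrees of freedom

# Large-sample p-value from the chi-squared distribution
p_chisq <- pchisq(g2, df = df, lower.tail = FALSE)

# Small-sample rule from the text: TRUE means the F-distribution would be used
use_f <- n / (r - 1) < 20
```

With these counts, N / (R - 1) = 220 / 5 = 44, so the chi-squared branch applies.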
The comparison assumes independence among the observations. For this reason, it is highly recommended to use mf_type = "min".
Examples of contrasts:
For `exp_variable = "dose"` with dose groups 0, 12.5, 25, 50, compare each treated dose to the control:
12.5 0
25 0
50 0
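If the contrasts are supplied as a file rather than a data frame, a table like the one above could be written with base R as sketched below. The column names and the temporary file path are hypothetical, and writing the file without a header row is an assumption, since the expected file layout beyond "two columns, tab-delimited by default" is not stated here.

```r
# Dose levels stored as character so that "12.5" is written exactly as typed
contrasts_tbl <- data.frame(group = c("12.5", "25", "50"),
                            reference = c("0", "0", "0"))

# Write a two-column, tab-delimited file (no header row: an assumption)
contrasts_file <- tempfile(fileext = ".txt")
write.table(contrasts_tbl, file = contrasts_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read it back to confirm the layout
check <- read.delim(contrasts_file, header = FALSE, sep = "\t",
                    colClasses = "character")
```

The resulting path could then be passed as contrasts, with cont_sep left at its tab default.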
Consider two variables, `exp_variable = c("dose", "tissue")`, with levels dose (0, 12.5, 25, 50) and tissue ("bone_marrow", "liver"). To compare the mutation spectra between tissues for each dose group, the contrast table would look like:
0:bone_marrow 0:liver
12.5:bone_marrow 12.5:liver
25:bone_marrow 25:liver
50:bone_marrow 50:liver
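A table like this can be built programmatically instead of typed by hand. The sketch below uses only base R. The column names col1 and col2 are arbitrary, and the levels are joined in the same order as the variables in exp_variable, following the example above.

```r
# Dose levels as character so they are pasted exactly as written
doses <- c("0", "12.5", "25", "50")

# Join the level of each variable with a colon: dose first, then tissue
contrasts <- data.frame(
  col1 = paste(doses, "bone_marrow", sep = ":"),
  col2 = paste(doses, "liver", sep = ":")
)
```

Each row pairs the two tissues within one dose group, so the table has one row per dose.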
Examples
# Load the example data
example_file <- system.file("extdata", "Example_files",
                            "example_mutation_data_filtered.rds",
                            package = "MutSeqR")
example_data <- readRDS(example_file)
# Example: compare 6-base mutation spectra between dose groups
# Calculate the mutation frequency data at the 6-base resolution
mf_data <- calculate_mf(mutation_data = example_data,
                        cols_to_group = "dose_group",
                        subtype_resolution = "base_6")
#> Performing internal depth correction to prevent double-counting...
#> Internal depth correction complete.
#> Joining with `by = join_by(dose_group, normalized_ref)`
# Create the contrasts table
contrasts <- data.frame(col1 = c("Low", "Medium", "High"),
                        col2 = rep("Control", 3))
# Run the comparison
spectra_comparison(mf_data = mf_data,
                   exp_variable = "dose_group",
                   mf_type = "min",
                   contrasts = contrasts)
#> Using chi-squared distribution to compute p-value
#> Using chi-squared distribution to compute p-value
#> Using chi-squared distribution to compute p-value
#>           contrasts       G2 p.value adj_p.value Significance
#> 1    Low vs Control 195.6281       0           0          ***
#> 2 Medium vs Control 503.3807       0           0          ***
#> 3   High vs Control 714.0200       0           0          ***