Get mutations at CpG sites. — get_cpg

Subset the mutation data and return only mutations found at positions that match a specific motif. The default motif is CpG ("CG"), but it can be customized. Note: this function needs to be reworked for variants longer than 1 bp.

Usage

get_cpg_mutations(
  mutation_data,
  regions,
  variant_types = c("-no_variant"),
  motif = "CG",
  filter_mut = TRUE
)

Arguments

mutation_data: A data frame or GRanges object containing the mutation data to be interrogated. If supplying a data frame, the genomic coordinates must be 1-based (true for mutation data imported using import_mut_data or import_vcf_data).
regions: A GRanges object containing the genomic regions of interest in which to look for CpG sites. The metadata column "sequence" must be populated with the raw nucleotide sequence to search for CpGs. This object can be obtained using the get_seq function.
variant_types: Use this parameter to choose which variation_types to include in the output. Provide a character vector of the variation_types that you want to include. Options are "ambiguous", "complex", "deletion", "insertion", "mnv", "no_variant", "snv", "sv", "uncategorized". Alternatively, provide a character vector of the variation_types that you want to exclude, each preceded by "-"; all variation_types except those excluded will be returned. Ex. inclusion: variant_types = "snv" returns only rows with variation_type == "snv". Ex. exclusion: variant_types = "-no_variant" returns all rows except those with variation_type == "no_variant" (default).
motif: The nucleotide motif to search for. Default is "CG", which returns CpG sites. In principle, any nucleotide string can be supplied to look at a different motif; use with caution.
filter_mut: A logical value indicating whether the function should exclude rows flagged in the filter_mut column from the output. Default is TRUE.
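
The inclusion and exclusion semantics of variant_types can be illustrated with a small base R sketch. The helper select_variant_types below is hypothetical and is not part of the package. It assumes that any entry prefixed with "-" switches the selection to exclusion mode, which the text above does not specify for mixed input.

```r
# Hypothetical helper (not a package function): mimics the documented
# include/exclude behaviour of the `variant_types` argument.
select_variant_types <- function(variant_types) {
  all_types <- c("ambiguous", "complex", "deletion", "insertion", "mnv",
                 "no_variant", "snv", "sv", "uncategorized")
  is_exclusion <- startsWith(variant_types, "-")
  if (any(is_exclusion)) {
    # Assumption: entries prefixed with "-" are removed from the full set.
    setdiff(all_types, sub("^-", "", variant_types[is_exclusion]))
  } else {
    # Plain entries are kept as an inclusion list.
    intersect(all_types, variant_types)
  }
}

select_variant_types("snv")          # inclusion: only "snv"
select_variant_types("-no_variant")  # exclusion: every type except "no_variant"
```

The returned vector would then be matched against the variation_type column to decide which rows to keep.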

Value

A GRanges object where each range is a mutation at a CpG site (a subset of mutations from the larger object provided to the function).
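
To make the idea of "mutations at motif positions" concrete, here is a minimal base R sketch of the underlying logic on toy data. It is not the package's implementation: the region, sequence, and mutation table are invented for illustration, and it assumes that every base covered by the motif (both the C and the G of "CG") counts as a motif position.

```r
# Hypothetical toy region: its sequence starts at 1-based genomic position 101.
region_start <- 101
region_seq <- "ACGTTCGGA"
motif <- "CG"

# 1-based offsets within the sequence where the motif begins
# (gregexpr returns -1 when there is no match, so drop non-positive values).
hits <- as.integer(gregexpr(motif, region_seq, fixed = TRUE)[[1]])
hits <- hits[hits > 0]

# Expand each hit to all bases covered by the motif, then shift the
# offsets to genomic coordinates.
offsets <- seq_len(nchar(motif)) - 1
motif_pos <- sort(unique(as.vector(outer(hits, offsets, "+")))) + region_start - 1

# Hypothetical single-base mutations with 1-based start positions.
mutations <- data.frame(
  start = c(102, 104, 107),
  variation_type = c("snv", "snv", "snv")
)

# Keep only the mutations that fall on a motif position.
cpg_mutations <- mutations[mutations$start %in% motif_pos, ]
cpg_mutations
```

In this toy example the mutations at positions 102 and 107 are retained, while the one at 104 is dropped because it does not lie within a "CG" occurrence.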