Create a GRanges object from the genomic target ranges and import raw nucleotide sequences.

Usage

get_seq(
  regions,
  rg_sep = "\t",
  is_0_based_rg = TRUE,
  species = NULL,
  genome = NULL,
  masked = FALSE,
  padding = 0,
  ucsc = FALSE
)

Arguments

regions

The regions metadata to import. Can be a file path, a data frame, or a GRanges object. File paths are read using rg_sep as the delimiter. Users can also select one of the built-in TwinStrand Mutagenesis Panels by supplying "TSpanel_human", "TSpanel_mouse", or "TSpanel_rat". Required columns for a regions file or data frame are "contig", "start", and "end". In a GRanges object, the required columns are "seqnames", "start", and "end".

rg_sep

The delimiter for importing the regions file. The default is tab-delimited ("\t").

is_0_based_rg

A logical value indicating whether the position coordinates in regions are 0-based (TRUE) or 1-based (FALSE). If TRUE, positions will be converted to 1-based (start + 1). Need not be supplied for TSpanels. Default is TRUE.

species

The species for which to retrieve the sequences. Species may be given as the scientific name or the common name, e.g. "Human" or "Homo sapiens". Used to choose the appropriate BSgenome package. Need not be supplied for TSpanels.

genome

The genome assembly version for which to retrieve the sequences, e.g. hg38, hg19, mm10, mm39, rn6, rn7. Used to choose the appropriate genome source (BSgenome package or UCSC). Need not be supplied for TSpanels.

masked

A logical value indicating whether to use the masked version of the BSgenome package when retrieving sequences. Default is FALSE.

padding

An integer value by which the function will extend the range of the target sequence on both sides. Start and end coordinates will be adjusted accordingly. Default is 0.

ucsc

A logical value. If TRUE, the function will retrieve the sequences from the UCSC genome browser using an API. If FALSE, the function will retrieve sequences using the appropriate BSgenome package, which will be installed as needed. Default is FALSE.
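The coordinate handling described for is_0_based_rg and padding can be illustrated with a small base R sketch. This is not the package's internal code: the helper adjust_coords() and the region values are hypothetical, and the order of the two adjustments (conversion first, then padding) is an assumption made for illustration.

```r
# Hypothetical regions table with the required columns (values are made up).
regions <- data.frame(
  contig = c("chr1", "chr2"),
  start  = c(99L, 499L),   # 0-based start positions
  end    = c(200L, 600L)
)

# Hypothetical helper, not part of the package: mirrors the documented
# behaviour of is_0_based_rg (start + 1) and padding (extend both sides).
adjust_coords <- function(df, is_0_based = TRUE, padding = 0L) {
  if (is_0_based) {
    df$start <- df$start + 1L      # convert 0-based starts to 1-based
  }
  df$start <- df$start - padding   # extend the range upstream
  df$end   <- df$end + padding     # extend the range downstream
  df
}

adjusted <- adjust_coords(regions, is_0_based = TRUE, padding = 10L)
print(adjusted)
```

With these made-up values, the first region's start moves from 99 to 100 after conversion and then to 90 after padding, while its end moves from 200 to 210.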

Value

A GRanges object containing the sequences of the targeted regions.

Details

Consult BSgenome::available.genomes(splitNameParts=FALSE, type=getOption("pkgType")) for a full list of the available BSgenome packages and their associated species/genome/masked values. The required BSgenome package will be installed if it is not already available. If ucsc = TRUE, the sequences are instead retrieved from the UCSC Genome Browser through its DAS API. See the UCSC website for available genomes: https://genome.ucsc.edu.
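As a rough sketch of how the list of available genomes might be browsed, the snippet below filters the output of BSgenome::available.genomes() for mouse assemblies. It assumes the BSgenome package is installed and that a Bioconductor repository is reachable; if either is not the case, the result is simply an empty character vector. The "Mmusculus" search pattern is only an example.

```r
# Start with an empty result so the code still runs without BSgenome
# or without network access.
mouse_genomes <- character(0)

if (requireNamespace("BSgenome", quietly = TRUE)) {
  # available.genomes() queries the configured repositories, so it can
  # fail when offline; fall back to an empty vector in that case.
  all_genomes <- tryCatch(
    BSgenome::available.genomes(),
    error = function(e) character(0)
  )
  # Keep only package names that mention the mouse organism abbreviation.
  mouse_genomes <- grep("Mmusculus", all_genomes, value = TRUE)
}

print(mouse_genomes)
```

Package names returned this way follow the same pattern as the names shown in the loading messages of the examples, such as BSgenome.Mmusculus.UCSC.mm10.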

Examples

# Example 1: Retrieve the sequences for TwinStrand Mouse Mutagenesis Panel
regions_seq <- get_seq(regions = "TSpanel_mouse")
#> 'getOption("repos")' replaces Bioconductor standard repositories, see
#> 'help("repositories", package = "BiocManager")' for details.
#> Replacement repositories:
#>     CRAN: https://cran.rstudio.com
#> Reference genome already installed.
#> Loading reference genome: BSgenome.Mmusculus.UCSC.mm10.

# Example 2: Retrieve the sequences for custom regions
# We will load the TSpanel_human regions file as an example
# and supply it to the function as a GRanges object.
human <- load_regions_file("TSpanel_human")
regions_seq <- get_seq(regions = human,
                       is_0_based_rg = FALSE,
                       species = "human",
                       genome = "hg38",
                       masked = FALSE,
                       padding = 0)
#> 'getOption("repos")' replaces Bioconductor standard repositories, see
#> 'help("repositories", package = "BiocManager")' for details.
#> Replacement repositories:
#>     CRAN: https://cran.rstudio.com
#> Reference genome already installed.
#> Loading reference genome: BSgenome.Hsapiens.UCSC.hg38.