Create a GRanges object from the genomic target ranges and import raw nucleotide sequences.
Usage
get_seq(
  regions,
  rg_sep = "\t",
  is_0_based_rg = TRUE,
  species = NULL,
  genome = NULL,
  masked = FALSE,
  padding = 0,
  ucsc = FALSE
)
Arguments
- regions
The regions metadata file to import. Can be a file path, a data frame, or a GRanges object. File paths will be read using the delimiter given by rg_sep. Users can also choose one of the built-in TwinStrand Mutagenesis Panels by inputting "TSpanel_human", "TSpanel_mouse", or "TSpanel_rat". Required columns for the regions file are "contig", "start", and "end". In a GRanges object, the required columns are "seqnames", "start", and "end".
- rg_sep
The delimiter for importing the regions file. The default is tab-delimited ("\t").
- is_0_based_rg
A logical value indicating whether the position coordinates in regions are 0-based (TRUE) or 1-based (FALSE). If TRUE, positions will be converted to 1-based (start + 1). Need not be supplied for TSpanels. Default is TRUE.
- species
The species for which to retrieve the sequences. Species may be given as the scientific name or the common name, for example "Human" or "Homo sapiens". Used to choose the appropriate BSgenome package. Need not be supplied for TSpanels.
- genome
The genome assembly version for which to retrieve the sequences, for example "hg38", "hg19", "mm10", "mm39", "rn6", or "rn7". Used to choose the appropriate genome source (a BSgenome package or UCSC). Need not be supplied for TSpanels.
- masked
A logical value indicating whether to use the masked version of the BS genome when retrieving sequences. Default is FALSE.
- padding
An integer value by which the function will extend the range of the target sequence on both sides. Start and end coordinates will be adjusted accordingly. Default is 0.
- ucsc
A logical value. If TRUE, the function will retrieve the sequences from the UCSC genome browser using an API. If FALSE, the function will retrieve sequences using the appropriate BSgenome package, which will be installed as needed. Default is FALSE.
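The combined effect of is_0_based_rg and padding on the coordinates can be sketched in base R. This is an illustration only, not the function's internal code: the helper name adjust_coords is hypothetical, the coordinates are made up, and the sketch assumes that the 0-based conversion is applied before the padding.

```r
# Hypothetical helper illustrating the documented coordinate adjustments.
# Not part of the package; the order of operations is an assumption.
adjust_coords <- function(regions, is_0_based_rg = TRUE, padding = 0) {
  if (is_0_based_rg) {
    # Convert 0-based start positions to 1-based (start + 1).
    regions$start <- regions$start + 1
  }
  # Extend each range by `padding` bases on both sides.
  regions$start <- regions$start - padding
  regions$end <- regions$end + padding
  regions
}

# Made-up example regions with the required columns.
regions <- data.frame(
  contig = c("chr1", "chr2"),
  start = c(100, 5000),
  end = c(200, 5100)
)

adjusted <- adjust_coords(regions, is_0_based_rg = TRUE, padding = 10)
print(adjusted)
```

With these inputs, each start is shifted by +1 and then reduced by the padding, while each end is only increased by the padding.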
Details
Consult
available.genomes(splitNameParts=FALSE, type=getOption("pkgType"))
for a full list of the available BSgenome packages and their associated
species/genome/masked values. The required BSgenome package will be
installed if not already available. If ucsc = TRUE, the function will
instead retrieve the sequences from the UCSC Genome Browser using the DAS
API. See the UCSC website for available genomes: https://genome.ucsc.edu.
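As a rough illustration of how the genome and masked values relate to a BSgenome package name, the sketch below filters a short hard-coded vector of package names. The vector stands in for the output of available.genomes(), which needs the BSgenome package and network access. The helper find_bsgenome is hypothetical and is not necessarily how get_seq selects a genome internally.

```r
# A small hard-coded sample standing in for available.genomes() output.
pkgs <- c(
  "BSgenome.Hsapiens.UCSC.hg19",
  "BSgenome.Hsapiens.UCSC.hg38",
  "BSgenome.Hsapiens.UCSC.hg38.masked",
  "BSgenome.Mmusculus.UCSC.mm10",
  "BSgenome.Mmusculus.UCSC.mm10.masked",
  "BSgenome.Rnorvegicus.UCSC.rn6"
)

# Hypothetical helper: pick package names matching a genome and masked flag.
find_bsgenome <- function(pkgs, genome, masked = FALSE) {
  # Names are dot-separated: BSgenome.<organism>.<provider>.<genome>[.masked]
  parts <- strsplit(pkgs, ".", fixed = TRUE)
  pkg_genome <- vapply(parts, function(p) p[4], character(1))
  pkg_masked <- vapply(
    parts,
    function(p) length(p) == 5 && p[5] == "masked",
    logical(1)
  )
  pkgs[pkg_genome == genome & pkg_masked == masked]
}

hg38_pkg <- find_bsgenome(pkgs, genome = "hg38")
mm10_masked_pkg <- find_bsgenome(pkgs, genome = "mm10", masked = TRUE)
print(hg38_pkg)
print(mm10_masked_pkg)
```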
Examples
# Example 1: Retrieve the sequences for TwinStrand Mouse Mutagenesis Panel
regions_seq <- get_seq(regions = "TSpanel_mouse")
#> 'getOption("repos")' replaces Bioconductor standard repositories, see
#> 'help("repositories", package = "BiocManager")' for details.
#> Replacement repositories:
#> CRAN: https://cran.rstudio.com
#> Reference genome already installed.
#> Loading reference genome: BSgenome.Mmusculus.UCSC.mm10.
# Example 2: Retrieve the sequences for custom regions
# We will load the TSpanel_human regions file as an example
# and supply it to the function as a GRanges object.
human <- load_regions_file("TSpanel_human")
regions_seq <- get_seq(regions = human,
                       is_0_based_rg = FALSE,
                       species = "human",
                       genome = "hg38",
                       masked = FALSE,
                       padding = 0)
#> 'getOption("repos")' replaces Bioconductor standard repositories, see
#> 'help("repositories", package = "BiocManager")' for details.
#> Replacement repositories:
#> CRAN: https://cran.rstudio.com
#> Reference genome already installed.
#> Loading reference genome: BSgenome.Hsapiens.UCSC.hg38.
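A further sketch shows how a custom regions file with the required columns might be prepared for the file-path input. The coordinates and the temporary file are made up for illustration, and the final get_seq call is left commented out because it needs the package and the matching BSgenome to be available.

```r
# Example 3 (sketch): Prepare a tab-delimited regions file
# Made-up regions with the required columns "contig", "start", and "end".
custom_regions <- data.frame(
  contig = c("chr1", "chr11"),
  start = c(1000, 25000),
  end = c(1500, 25400)
)

# Write a tab-delimited file, matching the default rg_sep = "\t".
regions_path <- tempfile(fileext = ".txt")
write.table(custom_regions, regions_path, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back to confirm the columns survive the round trip.
check <- read.delim(regions_path, sep = "\t")
print(colnames(check))

# Not run here: requires the package and the matching BSgenome.
# regions_seq <- get_seq(regions = regions_path,
#                        rg_sep = "\t",
#                        is_0_based_rg = TRUE,
#                        species = "human",
#                        genome = "hg38")
```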