Calls findMotifsGenome.pl to run motif analysis over a given set of regions.

find_motifs_genome(x, path, genome, motif_length = c(8, 10, 12),
  scan_size = 100, optimize_count = 8, background = "automatic",
  local_background = FALSE, only_known = FALSE, only_denovo = FALSE,
  fdr_num = 0, cores = parallel::detectCores(),
  cache = .calc_free_mem()/4, overwrite = FALSE, keep_minimal = FALSE)

Arguments

x

data.frame whose first three columns are chromosome, start, and end coordinates and whose fourth column is a region identifier; extra columns may be kept; x may alternatively be a path to an existing BED file in this format

path

where to write HOMER results

genome

ID of installed genome; check installed genomes using list_homer_packages(); examples include "hg38" and "mm10"; add an 'r' at the end to mask repeats, e.g. "mm10r"

motif_length

vector of motif lengths to consider [default is c(8, 10, 12)]

scan_size

size of sequence to scan; this can be a numeric specifying the number of bases to scan, centered on the region, or it can be set to "given" to scan the entire region; if using "given", the "-chopify" option is used to cut large background sequences to the average target sequence size [default: 100]

optimize_count

number of motifs to optimize [default: 8]

background

data.frame containing coordinates of the regions to use as the background; alternatively a path to an existing BED file; the default, "automatic", creates a background based on the GC content and (scan) size of the target sequences

local_background

a numeric scalar specifying the number of equal-size regions around peaks to use as the local background [not used by default, i.e. default: FALSE]

only_known

turns off searching for denovo motifs

only_denovo

turns off searching for known motif enrichment

fdr_num

number of randomizations to perform to calculate FDR [default: 0]

cores

number of cores to use [default: all cores available]

cache

number in MB to use as cache to store sequences in memory [default: calculates free memory and divides by 4]

overwrite

overwrite an existing HOMER results directory [default: FALSE]

keep_minimal

remove all extra clutter from results and keep only the essentials (knownResults.txt and homerMotifs.all.motifs) [default: FALSE]
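As a rough illustration of how the arguments above fit together, the sketch below builds a region table in the layout described for x and passes it to find_motifs_genome. The coordinates, the output location, and the chosen argument values are made-up placeholders, and the call is only attempted when find_motifs_genome is already loaded in the session, so the snippet also runs where HOMER is not installed.

```r
# Region table in the layout described for `x`:
# chromosome, start, end, then a region identifier.
# The coordinates are invented placeholders, not real peaks.
regions <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(1000L, 5000L, 20000L),
  end   = c(1200L, 5200L, 20200L),
  id    = c("region_1", "region_2", "region_3"),
  stringsAsFactors = FALSE
)

# Basic sanity checks before handing the table to HOMER.
stopifnot(
  ncol(regions) >= 4,
  all(regions$start < regions$end),
  !anyDuplicated(regions$id)
)

# Hypothetical output location for the HOMER results directory.
results_dir <- file.path(tempdir(), "homer_results")

# Only attempt the call if the function is available in this session
# (assumes the package providing it has been attached and HOMER is set up).
if (exists("find_motifs_genome", mode = "function")) {
  find_motifs_genome(
    x            = regions,
    path         = results_dir,
    genome       = "hg38",      # must be an installed HOMER genome
    motif_length = c(8, 10),
    scan_size    = 200,
    only_known   = TRUE,        # skip the de novo search
    cores        = 2
  )
}
```

Restricting cores and setting only_known here is just a way to keep an exploratory run short; the defaults shown in the usage section apply when these arguments are omitted.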

Value

Nothing; called for its side-effect of producing HOMER results

Details

find_motifs_genome runs the core HOMER motif enrichment function from within R and, as a side-effect, generates a HOMER results directory.

This results directory is inspectable via a file browser, and contains a summary of the results as HTML files as well as text files.

For our purposes, within the directory two key files exist:

  • knownResults.txt: known motifs that are enriched

  • homerMotifs.all.motifs: denovo motifs that are enriched

These two text files are the core results (everything else can be discarded by setting keep_minimal to TRUE) and are parsed downstream by read_known_results and read_denovo_results.
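Because the function returns nothing, one way to confirm that a run produced the core results is to check the results directory for the two files. The helper below, core_results_present, is a hypothetical name introduced only for this sketch; it assumes the file names HOMER writes (knownResults.txt and homerMotifs.all.motifs) and is demonstrated on a mock directory rather than on real HOMER output.

```r
# Hypothetical helper: report which of the two core result files
# exist in a HOMER results directory.
core_results_present <- function(path) {
  core_files <- c("knownResults.txt", "homerMotifs.all.motifs")
  present <- file.exists(file.path(path, core_files))
  names(present) <- core_files
  present
}

# Demonstration on a freshly created mock directory that contains
# only one of the two files.
mock_dir <- tempfile("mock_homer_results_")
dir.create(mock_dir)
file.create(file.path(mock_dir, "knownResults.txt"))

status <- core_results_present(mock_dir)
print(status)
```

A check like this can be run before calling read_known_results or read_denovo_results, so that a missing file is reported up front rather than surfacing as a read error.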

See also

read_known_results, read_denovo_results