Calls HOMER to run motif analysis over a given set of regions.

find_motifs_genome(x, path, genome, motif_length = c(8, 10, 12),
  scan_size = 100, optimize_count = 8, background = "automatic",
  local_background = FALSE, only_known = FALSE, only_denovo = FALSE,
  fdr_num = 0, cores = parallel::detectCores(),
  cache = .calc_free_mem()/4, overwrite = FALSE, keep_minimal = FALSE)



x: data.frame with the first three columns being chromosome, start, and end coordinates, with a fourth column corresponding to a region identifier; extra columns may be kept; x may alternatively be a path to an existing bed file of this format


path: where to write HOMER results


genome: ID of installed genome; check installed genomes using list_homer_packages(); examples include "hg38" and "mm10"; add an 'r' at the end to mask repeats, e.g. "mm10r"


motif_length: vector of motif lengths to consider [default: c(8, 10, 12)]


scan_size: size of sequence to scan; this can be a numeric specifying the number of bases to scan, centered on the region, or can be set to "given" to scan the entire region; with "given", the "-chopify" option is used to cut large background sequences to the average target sequence size [default: 100]


optimize_count: number of motifs to optimize [default: 8]


background: data.frame containing coordinates of the regions to use as the background; may alternatively be a path to an existing bed file; the default, "automatic", creates a background based on the GC content and (scan) size of the target sequences


local_background: a numeric scalar specifying the number of equal-size regions around peaks to use as the local background [by default this is not used, i.e. default: FALSE]


only_known: turns off searching for denovo motifs


only_denovo: turns off searching for known motif enrichment


fdr_num: number of randomizations to perform to calculate FDR [default: 0]


cores: number of cores to use [default: all cores available]


cache: number in MB to use as cache to store sequences in memory [default: calculates free memory and divides by 4]


overwrite: overwrite an existing HOMER results directory [default: FALSE]


keep_minimal: remove all extra clutter from results, keeping only the essentials (knownResults.txt and homerMotifs.all.motifs) [default: FALSE]


Nothing; called for its side-effect of producing HOMER results
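
As a rough illustration of how the arguments fit together, the sketch below builds a small input in the four-column layout expected for x and shows what a call might look like. The coordinates, region identifiers, output directory, and the run_homer switch are all made up for the example, and the call itself assumes that HOMER, the "mm10" genome, and the package providing find_motifs_genome are installed and attached, so it is disabled by default.

```r
# Hypothetical target regions: chromosome, start, end, then a region identifier.
regions <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(10000L, 52000L, 30500L),
  end   = c(10200L, 52200L, 30700L),
  id    = c("peak_1", "peak_2", "peak_3"),
  stringsAsFactors = FALSE
)

# Hypothetical output location for the HOMER results directory.
results_dir <- file.path(tempdir(), "homer_example")

# Illustrative switch (not a package option): set to TRUE only when HOMER,
# the "mm10" genome, and the package providing find_motifs_genome are available.
run_homer <- FALSE

if (run_homer) {
  find_motifs_genome(
    x              = regions,
    path           = results_dir,
    genome         = "mm10",
    motif_length   = c(8, 10, 12),
    scan_size      = 100,
    optimize_count = 8,
    cores          = 2
  )
}
```

Limiting cores to a small number, as done here, is only a cautious choice for a shared machine; the default uses all available cores.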


find_motifs_genome runs the core HOMER motif enrichment function from within R and, as a side-effect, generates a HOMER results directory.

This results directory is inspectable via a file browser, and contains a summary of the results as HTML files as well as text files.

For our purposes, within the directory two key files exist:

  • knownResults.txt: known motifs that are enriched

  • homerMotifs.all.motifs: denovo motifs that are enriched

These two text files are the core results (everything else can be discarded by setting keep_minimal to TRUE) and are parsed downstream by read_known_results and read_denovo_results.
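
Before handing a results directory to the downstream readers, it can be useful to confirm that the core files are actually there. The following is a minimal sketch of such a check; check_homer_results is a hypothetical helper, not part of the package, and the file names are the ones listed in the keep_minimal argument description. The demonstration uses a throwaway directory with an empty placeholder file rather than real HOMER output.

```r
# Hypothetical helper: report which core result files exist under `path`.
check_homer_results <- function(path,
                                files = c("knownResults.txt",
                                          "homerMotifs.all.motifs")) {
  present <- file.exists(file.path(path, files))
  names(present) <- files
  present
}

# Demonstration on a temporary directory holding only an empty placeholder
# for the known-motif file (this is not real HOMER output).
demo_dir <- file.path(tempdir(), "homer_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
file.create(file.path(demo_dir, "knownResults.txt"))

status <- check_homer_results(demo_dir)
print(status)
```

A named logical vector keeps the check easy to extend, for example by passing a different files argument if only the known-motif results are needed.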

See also

read_known_results, read_denovo_results