Calls findMotifsGenome.pl to run motif analysis over a given set of regions.

## Usage

find_motifs_genome(x, path, genome, motif_length = c(8, 10, 12),
scan_size = 100, optimize_count = 8, background = "automatic",
local_background = FALSE, only_known = FALSE, only_denovo = FALSE,
fdr_num = 0, cores = parallel::detectCores(),
cache = .calc_free_mem()/4, overwrite = FALSE, keep_minimal = FALSE)

## Arguments

• x data.frame with the first three columns being chromosome, start, and end coordinates, with a fourth column corresponding to a region identifier; extra columns may be kept; x may alternatively be a path to an existing BED file of this format

• path where to write HOMER results

• genome ID of installed genome; check installed genomes using list_homer_packages(); examples include "hg38" and "mm10"; add an 'r' at the end to mask repeats, e.g. "mm10r"

• motif_length vector of motif lengths to consider [default is c(8, 10, 12)]

• scan_size size of sequence to scan; this can be a numeric to specify the number of bases to scan centered on the region, or alternatively can be set to "given" to scan the entire region; if using "given", will use the "-chopify" option to cut large background sequences to the average target sequence size [default: 100]

• optimize_count number of motifs to optimize [default: 8]

• background data.frame containing coordinates of desired regions to use as the background; alternatively may be a path to an existing BED file; the default, "automatic", creates a background based on the GC content and (scan) size of the target sequences

• local_background a numeric scalar specifying the number of equal-size regions around peaks to use as the local background [not used by default, i.e. default: FALSE]

• only_known turns off searching for de novo motifs

• only_denovo turns off searching for known motif enrichment

• fdr_num number of randomizations to perform to calculate FDR [default: 0]

• cores number of cores to use [default: all cores available]

• cache number in MB to use as cache to store sequences in memory [default: calculates free memory and divides by 4]

• overwrite overwrite an existing HOMER results directory [default: FALSE]

• keep_minimal remove all extra clutter from results, keep only the essentials (knownResults.txt and homerMotifs.all.motifs) [default: FALSE]
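As an illustration of the argument layout, the following is a minimal sketch of building a region table in the format expected for x and passing it to find_motifs_genome. The coordinates, region identifiers, and output directory are invented for the example. The call is only attempted when find_motifs_genome is available in the session, and it additionally assumes a working HOMER installation with the "hg38" genome installed.

```r
# Toy target regions in the layout described for `x`:
# chromosome, start, end, then a region identifier (all values are made up).
regions <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(1000L, 5000L, 20000L),
  end   = c(1200L, 5300L, 20250L),
  id    = c("peak_1", "peak_2", "peak_3"),
  stringsAsFactors = FALSE
)

# Hypothetical location for the HOMER results directory.
results_dir <- file.path(tempdir(), "homer_results")

# Only attempt the call when the function is available in this session;
# it also needs HOMER itself and the "hg38" genome to be installed.
if (exists("find_motifs_genome", mode = "function")) {
  find_motifs_genome(
    x            = regions,
    path         = results_dir,
    genome       = "hg38",
    motif_length = c(8, 10, 12),
    scan_size    = "given",  # scan each whole region rather than a fixed-size window
    only_known   = TRUE,     # turn off the de novo motif search
    cores        = 2,
    overwrite    = TRUE
  )
}
```

Setting cores explicitly, as above, avoids the default of using every available core, which may matter on shared machines.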

## Value

Nothing; called for its side-effect of producing HOMER results

## Details

find_motifs_genome runs the core HOMER motif enrichment function from within R and, as a side effect, generates a HOMER results directory.

This results directory is inspectable via a file browser, and contains a summary of the results as HTML files as well as text files.

For our purposes, within the directory two key files exist:

• knownResults.txt known motifs that are enriched

• homerMotifs.all.motifs de novo motifs that are enriched

These two text files are the core results (everything else can be discarded by setting keep_minimal to TRUE) and are parsed downstream by read_known_results and read_denovo_results.
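Before parsing, it can be useful to confirm that a results directory actually contains the two core files. The sketch below uses only base R. The helper check_core_results is hypothetical and not part of the package, and the de novo file name is assumed to be homerMotifs.all.motifs, as given in the description of keep_minimal. A mock directory with an empty placeholder file stands in for real HOMER output.

```r
# Report which of the two core result files are present in a results directory.
# `check_core_results` is a hypothetical helper, not a package function.
check_core_results <- function(path) {
  core_files <- c("knownResults.txt", "homerMotifs.all.motifs")
  present <- file.exists(file.path(path, core_files))
  names(present) <- core_files
  present
}

# Demonstration on a mock directory (not real HOMER output).
mock_dir <- file.path(tempdir(), "mock_homer_results")
unlink(mock_dir, recursive = TRUE)          # start from a clean state
dir.create(mock_dir, showWarnings = FALSE)
file.create(file.path(mock_dir, "knownResults.txt"))

# Named logical vector: one entry per core file.
status <- check_core_results(mock_dir)
```

A directory that passes this check for both files should be ready to hand to read_known_results and read_denovo_results.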

## See also

read_known_results, read_denovo_results