MotifSampler

! Attention - update on August 04, 2020

Dear user,
You can already read about the new functionality for using a Position Specific Prior (PSP) in MotifSampler. The software that supports this is currently in test mode and is not yet available here. Please be patient. If you have questions in the meantime, contact us by email.

MotifSampler is a probabilistic de novo motif detection tool for DNA sequences upstream of coregulated genes from one species. Detection uses a stochastic optimization strategy (a Gibbs sampling approach) that searches among all possible sets of short DNA segments for those that are overrepresented in the sequence dataset compared to the surrounding nucleotides (also called the non-functional background). Optionally, position-specific regulatory evidence (beyond sequence specificity) can be used to guide the motif search. The output of MotifSampler must be provided to MotifRanking and/or FuzzyClustering to extract the most likely true motifs from the multiple solutions that MotifSampler reports.
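To make the Gibbs sampling idea concrete, here is a minimal toy sketch in Python. It is not the MotifSampler implementation: it assumes exactly one motif instance per sequence and a zero-order background estimated from the input itself, whereas MotifSampler takes a genome-specific background model file and a prior on the number of instances per sequence. All function names are hypothetical, and the input is assumed to contain only the uppercase letters A, C, G and T.

```python
import random

ALPHABET = "ACGT"

def build_profile(sequences, starts, width, skip, pseudocount=1.0):
    """Column-wise nucleotide frequencies from all current instances except `skip`."""
    counts = [{base: pseudocount for base in ALPHABET} for _ in range(width)]
    for i, (seq, start) in enumerate(zip(sequences, starts)):
        if i == skip:
            continue
        for col in range(width):
            counts[col][seq[start + col]] += 1.0
    profile = []
    for col_counts in counts:
        total = sum(col_counts.values())
        profile.append({base: c / total for base, c in col_counts.items()})
    return profile

def segment_weight(segment, profile, background):
    """Likelihood ratio of a segment under the motif profile versus the background."""
    weight = 1.0
    for col, base in enumerate(segment):
        weight *= profile[col][base] / background[base]
    return weight

def gibbs_motif_search(sequences, width=8, iterations=200, seed=0):
    """Return one sampled motif start position per sequence (toy model)."""
    rng = random.Random(seed)
    # Assumption: zero-order background estimated from the pooled input sequences.
    pooled = "".join(sequences)
    background = {b: (pooled.count(b) + 1.0) / (len(pooled) + 4.0) for b in ALPHABET}
    # Start from random positions, then resample one sequence at a time.
    starts = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(iterations):
        for i, seq in enumerate(sequences):
            profile = build_profile(sequences, starts, width, skip=i)
            weights = [segment_weight(seq[p:p + width], profile, background)
                       for p in range(len(seq) - width + 1)]
            # Sample the new position in proportion to its likelihood ratio.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Because the search is stochastic, different seeds can converge to different solutions. This is why repeated runs, followed by ranking or clustering of the reported solutions, are part of the workflow described here.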

To optimally run this tool and evaluate its output, please consult our guidelines (includes link to a case study).
We propose a comprehensive de novo motif detection workflow as a step-by-step approach.
Stand-alone executable: download.

The speed with which results are generated depends on the server load.

Last software revision : . (updated April 21, 2020)

Questions & suggestions: contact us.

Publications:

If you like our software, please cite the MotifSuite publication.

References :
- G. Thijs, K. Marchal, M. Lescot, S. Rombauts, B. De Moor, P. Rouze, and Y. Moreau. A Gibbs Sampling method to detect over-represented motifs in the upstream regions of co-expressed genes. 2002. Journal of Computational Biology, 9(3):447-464.
- G. Thijs, Y. Moreau, F. De Smet, J. Mathys, M. Lescot, S. Rombauts, P. Rouze, B. De Moor, and K. Marchal. INCLUSive: INtegrated Clustering, Upstream sequence retrieval and motif Sampling. 2002. Bioinformatics, 18(2):331-332.

Run MotifSampler:

To run this application, please fill in the required input in the blank fields. In the output section, a (randomized) file name has been generated. You can replace this automatically generated file name with a more meaningful one if desired (do not use spaces, dots, colons,... in this name). The program parameters have been set to default values. Please check whether these settings apply to your case (see our MotifSampler Guidelines) and change them where needed. Pressing Submit will start the MotifSampler software on our server. A URL pointing to the results will be sent by email.
The illustrative examples are the result of running MotifSuite on an E. coli sequence set containing the known EvgA motif (as derived from RegulonDB). Read more on the benchmark data in the case study.

Input:
Your email address: we will mail you the URL of the results.
-f <filename>: file with DNA sequences in Fasta format (EvgA example).
-q <filename>: (optional) file with position-specific prior (PSP) scores over DNA positions in PSP format (EvgA example).
-b <filename>: file with genome-specific background model (format, EvgA example). Choose one of the following options:
Upload your own background model file
Select a precompiled background model file on our server
Please enter your file:
Please select the organism and order:
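Before uploading, it can help to check that the sequence file really is valid Fasta. The following is a minimal sketch in Python with hypothetical helper names. The set of symbols treated as acceptable (A, C, G, T) is an assumption made for illustration; this page does not state which symbols MotifSampler accepts.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from Fasta-formatted text."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def unexpected_symbols(sequence, allowed="ACGT"):
    """List symbols outside the assumed nucleotide alphabet (illustrative choice)."""
    return sorted(set(sequence) - set(allowed))
```

For example, calling `unexpected_symbols` on each parsed sequence gives a quick report of ambiguity codes or stray characters that may need attention before submission.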

Output:
-o <filename>: file with solutions in annotated instances format (EvgA example)
-m <filename>: file with solutions in PWM format (EvgA example)

Parameters:
-r <value>: number of times one algorithm run should be repeated with the same parameter settings on the same input sequence dataset. Default <100>.
-s <0|1>: strands to analyze. Default <1>: both strands (the input sequences and their reverse complements). <0>: input sequences only.
-w <value>: length of the motif. Default <8>.
-n <value>: number of different motifs to search for. Default <1>.
-x <value>: maximal allowed overlap between different motifs (only used if -n > 1). Default <1>.
-M <value>: maximum number of instances of a motif to search for in any sequence. Default <2>.
-p <prior>: sets prior information on the number of instances of a motif to search for per sequence. The default is tuned towards mainly 1 instance per sequence (but 0 and 2 instances are also possible). See 5 prior types for more options on this parameter.
-Q <value>: sets a weight on the PSP information (provided in -q file) compared to applying a uniform PSP. Default <10>.
-z <0|1>: temporary parameter, for internal use only. Please do not change this setting unless you have been informed of its impact on motif sampling. Default <1>.
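For users of the stand-alone executable, the options above can be assembled into a command line. The sketch below is in Python and only builds the argument list; it does not run anything. The executable name `MotifSampler`, the helper names and the example file names are assumptions, so check them against your own installation. The flags themselves are the ones listed on this page.

```python
import shlex

# Optional flags documented above (besides -f, -b, -o, -m and -q).
KNOWN_FLAGS = {"r", "s", "w", "n", "x", "M", "p", "Q", "z"}

def build_motifsampler_command(fasta, background, instances_out, pwm_out,
                               executable="MotifSampler", psp=None, **options):
    """Build an argument list from the documented flags (executable name is assumed)."""
    argv = [executable, "-f", fasta, "-b", background,
            "-o", instances_out, "-m", pwm_out]
    if psp is not None:
        argv += ["-q", psp]  # optional position-specific prior file
    for flag, value in options.items():
        if flag not in KNOWN_FLAGS:
            raise ValueError("unknown option: -" + flag)
        argv += ["-" + flag, str(value)]
    return argv

def format_command(argv):
    """Render the argument list as a shell-safe string for inspection."""
    return " ".join(shlex.quote(arg) for arg in argv)

# Example with hypothetical file names and the documented defaults for -w, -r and -s.
command = build_motifsampler_command(
    "evga_sequences.fasta", "ecoli_background.txt",
    "evga_instances", "evga_pwm", w=8, r=100, s=1)
print(format_command(command))
```

Keeping the command as a list rather than a single string avoids quoting problems when a file path contains spaces, and the list can be handed to `subprocess.run` once the executable name has been confirmed.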
! Proceed with MotifRanking (or FuzzyClustering) to prioritize your MotifSampler output (why? step2/3).