NOrthoMotifSampler

NOrthoMotifSampler is a probabilistic de novo motif detection tool for orthologous regulatory DNA regions from phylogenetically related organisms. The basic idea is that selective pressure causes functional elements to evolve at a slower rate than non-functional sequences. Detection is done by means of a stochastic optimization strategy (a Gibbs sampling approach) that searches, across a set of orthologous regulatory regions, for sets of short DNA segments that are evolutionarily better conserved than the surrounding nucleotides (the non-functional background). The output of NOrthoMotifSampler should then be passed to MotifRanking and/or FuzzyClustering to extract the most likely true motifs from the multiple solutions that NOrthoMotifSampler reports.
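The Gibbs sampling idea can be illustrated with a deliberately simplified sketch: the classic single-species "site sampler" with one motif instance per sequence, on one strand only. It does not model phylogeny or conservation (no tree, no evolution rate), so it is not NOrthoMotifSampler's actual algorithm; all function names are hypothetical and sequences are assumed to be uppercase ACGT.

```python
import random

ALPHABET = "ACGT"


def background_freqs(seqs):
    """Nucleotide frequencies over all input sequences (+1 pseudocount each)."""
    counts = {b: 1.0 for b in ALPHABET}
    for s in seqs:
        for c in s:
            counts[c] += 1
    total = sum(counts.values())
    return {b: counts[b] / total for b in ALPHABET}


def site_profile(seqs, starts, w, skip, pseudo=1.0):
    """Frequency matrix of the current motif sites, leaving out sequence `skip`."""
    counts = [{b: pseudo for b in ALPHABET} for _ in range(w)]
    for idx, (s, st) in enumerate(zip(seqs, starts)):
        if idx != skip:
            for j in range(w):
                counts[j][s[st + j]] += 1
    total = (len(seqs) - 1) + pseudo * len(ALPHABET)
    return [{b: col[b] / total for b in ALPHABET} for col in counts]


def gibbs_site_sampler(seqs, w, iterations=1000, seed=0):
    """Return one motif start position per sequence (simplified sketch)."""
    rng = random.Random(seed)
    bg = background_freqs(seqs)
    # random initial site in every sequence
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # sequence whose site is resampled
        pwm = site_profile(seqs, starts, w, skip=i)
        s = seqs[i]
        weights = []
        for st in range(len(s) - w + 1):
            # likelihood ratio of motif model vs background for this window
            ratio = 1.0
            for j in range(w):
                ratio *= pwm[j][s[st + j]] / bg[s[st + j]]
            weights.append(ratio)
        # draw the new site proportionally to the likelihood ratios
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Given the returned positions, the sampled motif instances are `[s[p:p + w] for s, p in zip(seqs, starts)]`. Because the search is stochastic, different seeds can end in different solutions, which is one reason why several runs (see `-r`) and a downstream ranking step are useful.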

To optimally run this tool and evaluate its output, please consult our guidelines (to do; may include a link to a case study).
Stand-alone executable: download (to do).

The speed with which results are generated depends on the server load.

Last software revision: . (updated April 21, 2020)

Questions & suggestions: contact us.

Publications:

If you use our software, please cite the (to do) publication: (to do)

References:
(to do)

Run NOrthoMotifSampler:

To run this application, fill in the required input fields. In the output section, a (randomized) file name has been generated; you can overwrite it with a more meaningful name if desired (do not use spaces, dots, colons, etc. in this name). The program parameters are set to default values. Check whether these settings suit your case (see our NOrthoMotifSampler Guidelines) and change them where needed. Pressing Submit starts NOrthoMotifSampler on our server; a URL to the results will be sent to you by email.
Illustrative examples are the result of running NOrthoMotifSampler on orthologous sequences from yeast (Saccharomyces species) containing known binding sites for the Urs1h regulator (read more (to do)).

  Input:
  Your email address:   we will mail you the URL with the results.
  -f <filename>:   file with orthologous DNA sequences grouped per target gene, in Fasta format (Urs1h example).
  -b <filename>:  file listing the organisms with the names of the respective background model files (format, Saccharomyces example).
    MARK: This file format requires that the actual background model files (listed in the file of background model file names) are present in the MotifSuite upload directory or in the Background Model Database directory (/www/group/biocomp/extra/bioinformatics_prod/webtools/MotifSuite/bg-thijs/). If not, send me the files or store them yourself in /www/bioapp/MotifSuite/northomotifsampler_uploads/.
(to do:)
Ideally, the sequence FASTA file above should be loaded first to read the names of all involved organisms; the user would then be prompted in the '-b' section to supply the corresponding background model file for each organism (either selected from the background model database via a dropdown menu, or uploaded by the user, as is done in MotifSampler for a single organism).
  -c <filename>:   file describing the rooted phylogenetic tree with branch lengths, in Tree format (Saccharomyces example).
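The consistency check described in the to-do note (read the organisms from the FASTA file, then make sure each one has a background model file that is actually present) could look roughly like the sketch below. The exact FASTA header layout and background-listing format are not specified on this page, so both are assumptions: the organism is taken as the last whitespace-separated token of each header, and the listing is assumed to hold one `organism filename` pair per line. All function names are hypothetical.

```python
from pathlib import Path


def last_token(header):
    """Assumed header layout: the organism name is the last token."""
    return header.split()[-1]


def fasta_organisms(fasta_text, organism_of=last_token):
    """Collect distinct organism names from FASTA headers, in order of appearance."""
    organisms = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            name = organism_of(line[1:].strip())
            if name not in organisms:
                organisms.append(name)
    return organisms


def parse_background_listing(text):
    """Map organism -> background model file name (assumed 'organism filename' lines)."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        organism, filename = line.split()[:2]
        mapping[organism] = filename
    return mapping


def check_inputs(fasta_text, background_text, search_dirs):
    """Return a list of problems; an empty list means every organism is covered."""
    problems = []
    listing = parse_background_listing(background_text)
    for organism in fasta_organisms(fasta_text):
        if organism not in listing:
            problems.append(f"no background model listed for {organism}")
        elif not any((Path(d) / listing[organism]).is_file() for d in search_dirs):
            problems.append(
                f"background model file {listing[organism]} for {organism} not found")
    return problems
```

`search_dirs` would hold the upload directory and the Background Model Database directory mentioned in the note above; they are passed in rather than hard-coded so that the check can run anywhere.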

  Output:
  -o <filename>:   file with solutions in annotated instances format (Urs1h example)
  -m <filename>:   file with solutions in PWM format (Urs1h example)
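The exact layout of the `-m` PWM file is not described on this page; the sketch below only shows how a position frequency matrix relates to a set of aligned motif instances such as those in the `-o` output. It assumes the instances are already in motif orientation and of equal length; the function names and the pseudocount value are hypothetical choices.

```python
ALPHABET = "ACGT"


def pwm_from_instances(instances, pseudocount=0.25):
    """Rows = motif positions, columns = A, C, G, T frequencies."""
    w = len(instances[0])
    if any(len(s) != w for s in instances):
        raise ValueError("all instances must have the same length")
    rows = []
    for j in range(w):
        counts = {b: pseudocount for b in ALPHABET}
        for s in instances:
            counts[s[j].upper()] += 1
        total = sum(counts.values())
        rows.append([counts[b] / total for b in ALPHABET])
    return rows


def consensus(rows):
    """Most frequent nucleotide at every position."""
    return "".join(ALPHABET[max(range(len(ALPHABET)), key=row.__getitem__)] for row in rows)
```

For example, `consensus(pwm_from_instances(["TAGC", "TAGC", "TAGG"]))` gives `"TAGC"`. A small pseudocount keeps unseen nucleotides at a non-zero probability, which matters when the matrix is later used to score new sequences.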

  Parameters:
  -r <value>:   number of times one algorithm run should be repeated with the same parameter settings on the same input sequence dataset. Default <100>.
  -s <0|1>:   <1> (default): both strands are analyzed (the input sequences and their reverse complements). <0>: only the input sequences are analyzed.
  -w <value>:   length of the motif. Default <8>.
  -n <value>:   number of different motifs to search for. Default <1>.
  -x <value>:   maximal allowed overlap between different motifs (only used if -n > 1). Default <1>.
  -M <value>:   maximum number of instances of a motif to search for in any sequence. Default <2>.
  -p <prior>:   sets prior information on the number of motif instances to search for per sequence. The default favours one instance per sequence, while still allowing zero or two instances. See "5 types prior" for more options on this parameter.
  -i <value>:   minimal fraction of input sequences that must contain at least one motif instance. Default <0.5>.
  -N <value>:   maximal masked fraction of an input sequence at the start of a motif search. Default <0.2>.
  -j <value>:   motif evolution rate as a fraction of the background evolution rate (values below 1 mean slower motif evolution). Default <0.5>.
  -k <value>:   proportional weight of prior evolution counts (high k) compared to data-inferred evolution (k=1). Default <1000>.
  -Z <0|1>:   also output species-specific motif results. Default <0> (= no).
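Since the stand-alone executable is still listed as "to do", the following is only a sketch of how the options above might be assembled into a command line. The executable name and the assumption that it accepts exactly these flags on a shell command line are hypothetical; the flag letters and default values are taken from the lists above. The syntax of the `-p` prior is not documented here, so it is passed through verbatim.

```python
import shlex

# Defaults as listed above (-p has no simple scalar default and is handled separately).
DEFAULTS = {"r": 100, "s": 1, "w": 8, "n": 1, "x": 1, "M": 2,
            "i": 0.5, "N": 0.2, "j": 0.5, "k": 1000, "Z": 0}


def build_command(fasta, background, tree, out_instances, out_pwm,
                  executable="NOrthoMotifSampler", prior=None, **params):
    """Return an argument list; `executable` is a hypothetical program name."""
    unknown = set(params) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    opts = {**DEFAULTS, **params}
    for flag in ("s", "Z"):
        if opts[flag] not in (0, 1):
            raise ValueError(f"-{flag} must be 0 or 1")
    for flag in ("i", "N"):
        if not 0.0 <= opts[flag] <= 1.0:
            raise ValueError(f"-{flag} is a fraction and must lie in [0, 1]")
    cmd = [executable, "-f", fasta, "-b", background, "-c", tree,
           "-o", out_instances, "-m", out_pwm]
    for flag, value in opts.items():
        if flag == "x" and opts["n"] <= 1:
            continue  # -x is only used when more than one motif is searched for
        cmd += [f"-{flag}", str(value)]
    if prior is not None:
        cmd += ["-p", prior]  # prior syntax not documented here; passed as-is
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, motif length 10, two motifs.
    print(shlex.join(build_command("urs1h.fa", "bg_list.txt", "saccharomyces.tree",
                                   "urs1h_instances.txt", "urs1h.pwm", w=10, n=2)))
```

Validating `-s`, `-Z`, `-i` and `-N` before submission catches obviously invalid values early; other parameters are passed through unchecked because their valid ranges are not stated on this page.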


! Proceed with MotifRanking (or FuzzyClustering) to prioritize your NOrthoMotifSampler output (why? step 2/3).