FuzzyClustering

FuzzyClustering reorganizes the solutions (instances) reported by multiple runs of MotifSampler (or another stochastic motif detector) into a shorter list of ensemble motifs. An ensemble motif is composed of a subset of instances (a cluster) that often occur together across the multiple MotifSampler runs.

Each ensemble motif is characterized by:
1) a list of instances, each with an instance membership score. An instance membership reflects the likelihood that the instance belongs to the ensemble motif. Instances whose membership reaches a minimal value are reported as the final instances of the ensemble motif in the given sequence set. The instance memberships are also used to weight the contribution of each instance when calculating the PWM representation of the ensemble motif.
2) a list of motif identifiers (each referring to a motif detected by a single run of MotifSampler), each with a motif membership score. The motif memberships allow ranking the motifs detected by the individual MotifSampler runs by how well they correspond to the ensemble motif.
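
To illustrate how instance memberships can weight the PWM of an ensemble motif, here is a minimal Python sketch. It is not the FuzzyClustering implementation: the function name weighted_pwm, the optional pseudocount, and the choice to simply add each membership to the count of the observed base are all assumptions made for illustration.

```python
ALPHABET = "ACGT"


def weighted_pwm(instances, pseudocount=0.0):
    """Build a position weight matrix from (site, membership) pairs.

    Assumption: each instance adds its membership score, instead of 1,
    to the count of the base it shows at every position.
    """
    if not instances:
        raise ValueError("at least one instance is required")
    width = len(instances[0][0])
    # One dictionary of weighted base counts per motif position.
    counts = [{base: pseudocount for base in ALPHABET} for _ in range(width)]
    for site, membership in instances:
        if len(site) != width:
            raise ValueError("all instances must have the same length")
        for pos, base in enumerate(site.upper()):
            if base not in ALPHABET:
                raise ValueError(f"unexpected base {base!r} in {site!r}")
            counts[pos][base] += membership
    pwm = []
    for column in counts:
        total = sum(column.values())
        if total <= 0:
            raise ValueError("memberships sum to zero; cannot normalize")
        # Normalize the weighted counts to frequencies.
        pwm.append({base: column[base] / total for base in ALPHABET})
    return pwm


# Hypothetical instances with their membership scores.
example = [("ACG", 1.0), ("ACT", 0.5), ("TCG", 0.5)]
pwm = weighted_pwm(example)
```

In this sketch an instance with membership 0.5 influences each column half as much as an instance with membership 1.0.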

Based on user-defined (or default) thresholds, FuzzyClustering only reports ensemble motifs that 1) have instances in a sufficiently high fraction of the given sequence set, 2) correspond to a sufficient fraction of the motifs detected by MotifSampler and 3) reach a minimal PWM consensus score.
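
The three reporting criteria can be sketched as a simple filter. The Python below is only an illustration: the EnsembleMotif class, its field names and the use of "greater than or equal" comparisons are assumptions, while the default values are those documented for the -i, -j and -c parameters.

```python
from dataclasses import dataclass


@dataclass
class EnsembleMotif:
    # Hypothetical summary of one ensemble motif.
    name: str
    sequences_with_instance: int  # sequences holding at least one instance
    supporting_motifs: int        # input motifs assigned to this ensemble motif
    consensus_score: float        # PWM consensus score


def passes_thresholds(motif, n_sequences, n_motifs,
                      min_seq_fraction=0.5,
                      min_motif_fraction=0.2,
                      min_consensus=0.5):
    """Return True if the motif meets all three reporting criteria."""
    seq_fraction = motif.sequences_with_instance / n_sequences
    motif_fraction = motif.supporting_motifs / n_motifs
    return (seq_fraction >= min_seq_fraction
            and motif_fraction >= min_motif_fraction
            and motif.consensus_score >= min_consensus)


# Hypothetical data: 10 input sequences, 100 motifs from 100 runs.
candidates = [
    EnsembleMotif("ensemble_1", 6, 30, 1.2),
    EnsembleMotif("ensemble_2", 4, 30, 1.2),  # too few sequences covered
]
reported = [m.name for m in candidates if passes_thresholds(m, 10, 100)]
```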

To optimally run this tool and evaluate its output, please consult our guidelines (includes link to a case study).
Stand-alone executable: download.

The speed with which results are generated depends on the server load.

Last software revision : . (updated April 21, 2020)

Questions & suggestions: contact us.

Publications:

If you like our software, please cite the MotifSuite publication.

References :
- (reference article on spectral graph based clustering) A. Joshi, Y. Van de Peer, T. Michoel. (2008) Analysis of a Gibbs sampler method for model based clustering of gene expression data. Bioinformatics, 24(2),176-183.


Run FuzzyClustering:

To run this application, please fill in the required input fields. In the output section, a (randomized) file name has been generated. You can overwrite this automatically generated file name with a more meaningful description if desired (do not use spaces, dots, colons, ... in this name). The program parameters have been set to default values. Please check whether these settings apply to your case (see our FuzzyClustering Guidelines) and overwrite them where needed. Pressing Submit will start the FuzzyClustering software on our server. A URL pointing to the results will be sent by email.
Illustrative examples are the result of running MotifSuite on an E. coli sequence set containing the known EvgA motif (as derived from RegulonDB). Read more on the benchmark data in the case study.

  Input:
  Your email address:   we will mail you the URL of the results.
  -f <filename>:   file with multiple motif-detection solutions
      in annotated instances format (EvgA example).

  Output:
  -o <filename>:   file with output motifs in instances format and additional clustering information (EvgA example).
  -O <filename>:   file with output motifs in PWM format (EvgA example).

  Parameters:
  -p <value>:   minimal detection frequency of an instance in the input file (pre-filtering).
        Default <0.1> ; range [0,1[.
  -m <value>:   minimal fractional (instance and motif) membership in an ensemble motif.
        By default, the program computes this value for the instance membership and for the motif membership.
        If you provide a value (range [0,1[), it applies to both the instance and the motif membership.
  -i <value>:   minimal fraction of sequences where an ensemble motif must have at least one instance.
        Default <0.5> ; range ]0,1].
  -j <value>:   minimal fraction of motifs that correspond to the ensemble motif.
        Default <0.2> ; range ]0,1].
  -c <value>:   minimal consensus score of an ensemble motif (PWM). Default <0.5> ; range [0,2[.
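
For users of the stand-alone executable, the ranges listed above can be checked before a run is started. The following Python sketch validates the values and assembles the option list. The function name build_arguments and the example file names are hypothetical, and the name of the executable is deliberately left out because it is not given here; the returned list is meant to be appended to the path of the downloaded program.

```python
def build_arguments(input_file, instances_out, pwm_out,
                    p=0.1, m=None, i=0.5, j=0.2, c=0.5):
    """Check parameter ranges and return the option list as strings."""
    if not 0 <= p < 1:
        raise ValueError("-p must lie in [0,1[")
    if m is not None and not 0 <= m < 1:
        raise ValueError("-m must lie in [0,1[")
    if not 0 < i <= 1:
        raise ValueError("-i must lie in ]0,1]")
    if not 0 < j <= 1:
        raise ValueError("-j must lie in ]0,1]")
    if not 0 <= c < 2:
        raise ValueError("-c must lie in [0,2[")
    args = ["-f", input_file, "-o", instances_out, "-O", pwm_out,
            "-p", str(p), "-i", str(i), "-j", str(j), "-c", str(c)]
    if m is not None:
        # Only pass -m when the user overrides the computed default.
        args += ["-m", str(m)]
    return args


# Hypothetical file names, default parameter values.
args = build_arguments("samples.txt", "ensemble.txt", "ensemble_pwm.txt")
```

Leaving m as None omits the -m option, so the program keeps computing the membership thresholds itself.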