MotifComparison Extra

This page is linked to MotifComparison Guidelines and should be read accordingly.

Extra: comparing our p-BLiC metric with the design described in Habib et al. (2008)

The design of our p-BLiC metric as implemented in MotifComparison (hereafter p-BLiC[MC]) is largely based on the original Bayesian Likelihood 2-Component score (hereafter BLiC[Habib]) described by Habib et al. (2008).

The major difference in the BLiC computation is that BLiC[Habib] uses nucleotide counts in its formula (values generally higher than 1, proportional to the number of instances of a motif), whereas we use nucleotide frequencies (normalized values between 0 and 1). Nucleotide frequencies express the same information as nucleotide counts when used in a relative way, which is what comparing motifs does. They are also directly available in the PWM representation of a motif, which is the default format used in MotifComparison. In addition, using nucleotide frequencies prevents a (database) motif described by a large number of instances (high counts) from overshadowing the information coming from a (query) motif described by a small number of instances. It was indeed observed (Tanaka, E. et al. (2011)) that the BLiC[Habib] score may become non-selective: it becomes high for many query motifs compared with a database motif composed of a large number of instances, irrespective of the properties of the query motif. This makes it prone to reporting false positive similarities.
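The normalization from counts to frequencies can be illustrated with a minimal Python sketch. The function name `counts_to_frequencies` and the two example matrices are hypothetical and are not taken from MotifComparison; the sketch only shows that two motifs with the same base preferences but very different numbers of instances yield the same frequency matrix.

```python
def counts_to_frequencies(count_matrix):
    """Normalize each position (row) of a count matrix so that it sums to 1.

    Each row holds the counts for A, C, G, T at one motif position.
    """
    freqs = []
    for row in count_matrix:
        total = sum(row)
        freqs.append([c / total for c in row])
    return freqs


# Hypothetical motifs with identical base preferences:
# one built from 10 instances, the other from 1000 instances.
small_motif = [[8, 1, 1, 0], [0, 5, 5, 0]]
large_motif = [[800, 100, 100, 0], [0, 500, 500, 0]]

freq_small = counts_to_frequencies(small_motif)
freq_large = counts_to_frequencies(large_motif)

# After normalization the number of instances no longer plays a role.
same = all(
    abs(a - b) < 1e-12
    for row_s, row_l in zip(freq_small, freq_large)
    for a, b in zip(row_s, row_l)
)
print(same)
```

In count space the second motif carries 100 times more weight per position than the first; in frequency space both contribute equally to a comparison.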

A second difference concerns the unaligned flanks of the motif. In BLiC[Habib], the flanks are scored according to their distance from the background distribution, multiplied by a relaxing factor of 0.2. In our computation, the BLiC[MC] score is computed only on the aligned positions of the best possible alignment between the two motifs being compared. Unaligned flanks are left unscored, on the reasoning that (dis)similarity of the flanking part to the background distribution is not a prerequisite when the motifs being compared differ in length.
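The idea of scoring only the aligned positions can be sketched as follows. This is an illustration under stated assumptions, not the MotifComparison implementation: `column_similarity` is a simple stand-in and is not the BLiC formula, only ungapped offsets are tried, and the `min_overlap` parameter is hypothetical.

```python
def column_similarity(p, q):
    # Stand-in column score (NOT the BLiC formula):
    # 1 minus the total variation distance, so identical columns score 1.
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def best_aligned_score(query, target, min_overlap=2):
    """Try every ungapped offset and score only the overlapping columns.

    Returns (score, offset, number_of_aligned_columns) for the best offset,
    where offset is the target index aligned with query column 0.
    Unaligned flanks contribute nothing to the score.
    """
    best = None
    first = -(len(query) - min_overlap)
    last = len(target) - min_overlap
    for offset in range(first, last + 1):
        pairs = [
            (query[i], target[i + offset])
            for i in range(len(query))
            if 0 <= i + offset < len(target)
        ]
        if len(pairs) < min_overlap:
            continue
        score = sum(column_similarity(p, q) for p, q in pairs)
        if best is None or score > best[0]:
            best = (score, offset, len(pairs))
    return best


# Hypothetical frequency matrices (A, C, G, T per position).
query = [[0.8, 0.1, 0.1, 0.0], [0.0, 0.5, 0.5, 0.0]]
target = [
    [0.25, 0.25, 0.25, 0.25],
    [0.8, 0.1, 0.1, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]

score, offset, n_aligned = best_aligned_score(query, target)
print(score, offset, n_aligned)
```

Here the two flanking columns of the longer motif are simply ignored; in a flank-scoring scheme they would add an extra term to the score.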

A statistical calibration of the BLiC score is done by computing a p-value against a distribution of scores of random motifs (the negative control set). For BLiC[Habib], the random motifs were generated by sampling positions of motifs from the TRANSFAC database. In p-BLiC[MC], the random motifs are generated by shuffling positions in the given query and database motifs, on the reasoning that this best retains the overall characteristics of the motifs being compared (among which the motif length).
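A shuffling-based calibration of this kind can be sketched in Python as follows. Several points are assumptions made for illustration: "shuffling positions" is interpreted as permuting the column order of each motif, the score function is a simple stand-in rather than the BLiC score, and the number of random motifs, the seed and the +1 correction in the empirical p-value are choices of this sketch, not documented settings of MotifComparison.

```python
import random


def stand_in_score(query, target):
    # Stand-in similarity for equal-length motifs (NOT the BLiC score):
    # sum over positions of 1 minus the total variation distance.
    return sum(
        1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))
        for p, q in zip(query, target)
    )


def empirical_p_value(query, target, score_fn, n_random=1000, seed=0):
    """Estimate a p-value from scores of position-shuffled motifs."""
    rng = random.Random(seed)
    observed = score_fn(query, target)
    hits = 0
    for _ in range(n_random):
        shuffled_query = query[:]
        shuffled_target = target[:]
        rng.shuffle(shuffled_query)   # permute column order only, so
        rng.shuffle(shuffled_target)  # length and composition are kept
        if score_fn(shuffled_query, shuffled_target) >= observed:
            hits += 1
    # The +1 keeps the estimate away from exactly zero.
    return (hits + 1) / (n_random + 1)


# Hypothetical motif compared with an identical copy of itself.
motif = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.9, 0.1, 0.0],
    [0.0, 0.0, 0.9, 0.1],
    [0.1, 0.0, 0.0, 0.9],
]

p_value = empirical_p_value(motif, motif, stand_in_score)
print(p_value)
```

Because the random motifs are derived from the query and database motifs themselves, their length and per-column composition match those of the motifs being compared.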


Note that a correction to the original BLiC[Habib] score of Habib et al. (2008) was suggested more recently (2011). The corrected BLiC[Habib] formula uses the Jensen-Shannon divergence, which is by definition finite and symmetric, and which may circumvent the above-mentioned problem of high counts. The authors did not, however, assess the performance of this corrected formula.
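For reference, the two properties mentioned above can be checked on the standard definition of the Jensen-Shannon divergence between two nucleotide distributions. The Python sketch below shows only this generic definition (with base-2 logarithms, so the value lies between 0 and 1); it is not the corrected BLiC[Habib] formula itself, and the example columns are hypothetical.

```python
import math


def kl_divergence(p, q):
    # Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0.
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon(p, q):
    # Average KL divergence of p and q to their midpoint distribution m.
    # m_i > 0 whenever p_i > 0 or q_i > 0, so the result is always finite.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical columns (A, C, G, T).
col_a = [0.8, 0.1, 0.1, 0.0]
col_b = [0.0, 0.5, 0.5, 0.0]

js_ab = jensen_shannon(col_a, col_b)
js_ba = jensen_shannon(col_b, col_a)
js_disjoint = jensen_shannon([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0])
print(js_ab, js_ba, js_disjoint)
```

The divergence stays finite even when one column has a zero frequency where the other does not, a case in which the plain Kullback-Leibler divergence would be infinite.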



Feedback

Contact us if you have comments, questions or suggestions, or simply want to respond to the contents of this guideline. Thank you.