BlockAligner Helpfile

Overview

Required arguments
Optional arguments
Output definitions
INCLUSive motif model format
Example

The BlockAligner uses a local ungapped alignment strategy based on dynamic programming to mutually compare conserved promoter regions (i.e. blocks) represented by their respective motif models.
Some basic remarks on the program:

The program should be started from the command line. A full description of the required and optional arguments can be found below.
The final results are written in GFF format, either to STDOUT or to a file.
Progress of the program can be monitored on STDERR.
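
To make the idea of a local ungapped alignment between two motif models more concrete, here is a minimal Python sketch. It is not the BlockAligner implementation: the column distance (Euclidean), the scoring rule (threshold minus distance) and the function names are all assumptions chosen for illustration, and the gap score is not modelled.

```python
import math

def column_distance(p, q):
    """Assumed distance: Euclidean distance between two base-frequency columns."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def best_ungapped_overlap(query, database, threshold=0.4):
    """Return (score, query_start, db_start, length) of the best local ungapped alignment.

    Each aligned position pair scores threshold - distance (assumed scheme), so
    similar positions raise the score and dissimilar ones lower it.
    Start positions are 0-based.
    """
    best = (0.0, 0, 0, 0)
    # prev[j] = best score of an ungapped alignment ending at the previous
    # query position and database position j - 1 (0 if none is positive)
    prev = [0.0] * (len(database) + 1)
    prev_len = [0] * (len(database) + 1)
    for i in range(1, len(query) + 1):
        cur = [0.0] * (len(database) + 1)
        cur_len = [0] * (len(database) + 1)
        for j in range(1, len(database) + 1):
            s = threshold - column_distance(query[i - 1], database[j - 1])
            extended = prev[j - 1] + s  # only diagonal moves: no insertions or deletions
            if extended > 0:
                cur[j] = extended
                cur_len[j] = prev_len[j - 1] + 1
                if extended > best[0]:
                    n = cur_len[j]
                    best = (extended, i - n, j - n, n)
        prev, prev_len = cur, cur_len
    return best
```

Because only diagonal moves are allowed, every reported overlap is a contiguous stretch of positions in both models, which is what "ungapped" means here.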

Required Arguments

Switch	Argument	Description
-m	file	File containing the query motif models (in INCLUSive format). The format of this file is described below.
-d	file	File containing the database of models (in INCLUSive format) against which all query motifs are compared. The format of this file is described below.

Optional Arguments

Switch	Argument	Description
-t	value	Maximal distance between two conserved blocks for their common part to be considered the same motif (default 0.4).
-g	value	Gap score (default 0.4). Because a biological motif is often "gapped" (i.e. it consists of conserved nucleotides interspersed with some non-conserved nucleotides), a small non-match penalty (the "gap score") can be introduced. Note that this differs from a conventional gap score: insertions and deletions are not explicitly modelled (local ungapped alignment).
-w	value	Minimal length of a reported common motif (default 4).
-s	value	Number of shuffles (default 0). To assess the significance of the results, the alignment procedure can be repeated a number of times on the same motif models after randomly shuffling their columns. From the alignment scores of these randomly shuffled motif models, the parameters of an extreme value distribution are estimated, which makes it possible to assign a p-value to the real alignment. The higher the number of shuffles, the more accurate the parameter estimation of the extreme value distribution.
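
The shuffling step behind the -s option can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, a motif model is taken to be a list of positions (one row of four base frequencies per position, as in the INCLUSive file layout), and the fitting of the extreme value distribution is not shown.

```python
import random

def shuffle_positions(matrix, rng):
    """Return a copy of the motif model with its positions in random order.

    The base frequencies within each position are left untouched; only the
    order of the positions changes.
    """
    shuffled = [list(row) for row in matrix]
    rng.shuffle(shuffled)
    return shuffled

def shuffled_models(matrix, n_shuffles, seed=0):
    """Generate n_shuffles randomized versions of one motif model."""
    rng = random.Random(seed)
    return [shuffle_positions(matrix, rng) for _ in range(n_shuffles)]
```

Each shuffled model would then be aligned in the same way as the real one, and the resulting scores would serve as the background sample for the p-value.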

Output Description

Switch	Argument	Description
-o	file	Sets the output file in which the results are saved. By default the results are written to STDOUT.
-M	file	Sets the name of the matrix file in which the common matrices between both blocks are stored. If not provided, the matrices are not saved.

INCLUSive motif model format

An INCLUSive motif model is stored as an ASCII text file in a well-defined format. Below you can find an example of conserved blocks found in the intergenic regions of recN in Salmonella typhimurium and its orthologs. The file must always start with the word #INCLUSive at the first position of the file. Next come the lines giving the BlockID, the score, the width and the consensus of the motif model, in that order. Finally the data itself follows: each row represents one position in the motif model, and each column represents one of the 4 bases (A, C, G and T, in that order).

#INCLUSive Motif Model
#
#ID = block_recN|NC_003197_1
#Score = 562.831
#W = 60
#Consensus = TACGyCAGCCTCTTTACTGTATATAAAACCAGTTTATACTGTAywCAATwACAGTmATGG
0.0125109	0.128344	0.00868059	0.850465	
0.970187	0.00863404	0.00868059	0.0124982	
0.0125109	0.96631	0.00868059	0.0124982	
0.0125109	0.128344	0.846647	0.0124982	
0.0125109	0.607182	0.00868059	0.371627	
0.0125109	0.726891	0.00868059	0.251917	
0.970187	0.00863404	0.00868059	0.0124982	
0.0125109	0.00863404	0.966357	0.0124982	
0.0125109	0.96631	0.00868059	0.0124982	
0.0125109	0.846601	0.00868059	0.132208	
0.13222	0.00863404	0.00868059	0.850465	
0.0125109	0.96631	0.00868059	0.0124982	
0.0125109	0.00863404	0.00868059	0.970174	
0.0125109	0.00863404	0.00868059	0.970174	
...
...
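
As an illustration of the layout described above, the following Python sketch reads INCLUSive motif models from a string. The function name and the returned dictionary structure are hypothetical, and it is an assumption that each model in a multi-model file begins with its own #ID line. Elided lines such as "..." are not valid data rows and are rejected.

```python
def parse_inclusive(text):
    """Parse INCLUSive motif models from a string into a list of dicts."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("#INCLUSive"):
        raise ValueError("file must start with #INCLUSive")
    models, current = [], None
    for line in lines[1:]:
        line = line.strip()
        if line.startswith("#"):
            if "=" not in line:
                continue  # bare '#' separator line
            key, value = (part.strip() for part in line[1:].split("=", 1))
            if key == "ID":
                # assumption: an #ID line opens a new model
                current = {"ID": value, "matrix": []}
                models.append(current)
            elif current is not None:
                current[key] = value  # Score, W, Consensus kept as strings
        elif line and current is not None:
            fields = line.split()
            if len(fields) != 4:
                raise ValueError("expected 4 values (A, C, G, T): " + line)
            current["matrix"].append([float(f) for f in fields])
    return models
```

A simple consistency check after parsing is to compare the number of matrix rows with the value of the #W line.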

Example

Here is a step-by-step example on how to use the BlockAligner. The current version is a Linux version. To make sure that all the file specifications are clear, an example data set is provided as additional data file.

1. Software installation

The first step is the installation of the program. Download our software here. After saving it, make it executable (chmod 755 BlockAligner) and make sure that the program is included in your path. You can test whether it works by typing BlockAligner at the prompt without any options.
The output should look like this:

ssh|pmonsieu>BlockAligner
Seed = 2081726080

Usage: BlockAligner 

 Required Arguments
  -m <matrixFile>     File containing the query motif models.
  -d <matrixFile>     File containing database of models with which all query motifs will be compared.

 Optional Arguments
  -t <value>          Maximal distance between two motifs to be considered as the same motif (default 0.4)
  -g <value>          Gap score (default 0.4)
  -w <value>          Minimal length of reported common motif (default 4)
  -s <value>          Number of shuffles of blocks to assess significance (default = 0)
    
  -o <outFile>        Output file to write results to.
  -M <filename>       File to write common matrices.

  -v                  Version of MotifComparison

Version 3.1 -- the bug fix release
Questions and Remarks:

2. Input Matrices

Input files containing the query matrix / matrices and the database matrices need to have the INCLUSive format. We give here an example of a database file and a query file.

3. Run BlockAligner

We use the default parameters of BlockAligner except for

-o blockaligner.out The output is written to a text file
-M blockaligner.matrix Common matrices between query and database matrices are written to a matrix file
-w 8 The common part between two overlapping matrices needs to be at least 8 nucleotides long.
-s 100 We perform 100 shuffles in order to assign a significance to each alignment.

Command line: BlockAligner -d database.matrix -m query.matrix -o blockaligner.out -M blockaligner.matrix -s 100 -w 8 2> error.log

Note that in this example the STDERR is redirected to 'error.log'.
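
The same run can also be started from a script. The sketch below uses Python's standard subprocess module; the helper names are hypothetical, the option values are those of the example above, and it assumes that BlockAligner is on the path. How the program's exit code should be interpreted is not documented here, so it is simply returned.

```python
import subprocess

def build_command(database, query, out_file, matrix_file, shuffles=100, min_width=8):
    """Assemble the BlockAligner argument list used in this example."""
    return [
        "BlockAligner",
        "-d", database,
        "-m", query,
        "-o", out_file,
        "-M", matrix_file,
        "-s", str(shuffles),
        "-w", str(min_width),
    ]

def run_blockaligner(command, log_path="error.log"):
    """Run the command and send its STDERR (progress messages) to log_path."""
    with open(log_path, "w") as log:
        completed = subprocess.run(command, stderr=log, check=False)
    return completed.returncode
```

Calling run_blockaligner(build_command("database.matrix", "query.matrix", "blockaligner.out", "blockaligner.matrix")) would reproduce the run described above.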

The output file 'blockaligner.out' then contains tab-separated lines such as:

block_recN|NC_003197_76	72	5	block_lexA|NC_003197_24	97	76	21	3.3	+1	CTTTACTGTATAwAAAACCAG	CATrAyTGTATATACACCCAG	0.0142371	0
block_recN|NC_003197_76	72	8	block_uvrB|NC_003197_13	88	64	19	3.7	-1	TACTGTATAwAAAACCAGT	TACTGGATrAAAAAACAGT	3.52575e-05	0
block_recN|NC_003197_76	72	54	block_uvrB|NC_003197_78	87	27	9	1.7	-1	TTTTTCATA	TTTTTAACA	0.674001	0
block_recN|NC_003197_76	72	54	block_uvrB|NC_003197_82	68	28	9	1.7	-1	TTTTTCATA	TTTTTAACA	0.728504	0
block_recN|NC_003197_76	72	62	block_uvrB|NC_003197_92	80	58	9	2.13264	-1	ACAGGAAAA	ACAGGAATA	0.0330056	0
block_recN|NC_003197_76	72	10	block_uvrD|NC_003197_1	26	6	18	3.4	+1	CTGTATAwAAAACCAGTT	CTGTATAwATwCCCAGyT	8.71482e-05	0
block_recN|NC_003197_76	72	4	block_uvrD|NC_003197_32	8	0	8	1.4	+1	TCTTTACT	TCTTCTCT	0.334046	0
block_recN|NC_003197_76	72	48	block_dinI|NC_003197_82	13	4	9	1.2	+1	TmATGGTTT	TmsTrGmTT	0.29316	0
block_recN|NC_003197_76	72	6	block_dinI|NC_003197_89	38	1	27	5.1	+1	TTTACTGTATAwAAAACCAGTTTATAC	TTAmCTGTATAwATAwCCAGTATATTC	1.09177e-06	0

This output contains the following information:

column 1: ID of the query matrix
column 2: length of the query matrix
column 3: start position, in the query matrix, of the part that overlaps with the database matrix
column 4: ID of the database matrix
column 5: length of the database matrix
column 6: start position, in the database matrix, of the part that overlaps with the query matrix
column 7: length of the overlapping part
column 8: score of the alignment
column 9: indicates whether the overlap is found in the direct version of the database matrix or in its reverse complement
column 10: consensus site in the query matrix
column 11: consensus site in the database matrix
column 12: p-value of the alignment (= 0 if the number of shuffles s is 0)
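
The column layout listed above can be read into a small record type, for example as in the following Python sketch. The class and field names are hypothetical, and any fields beyond the twelve documented columns are ignored.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    query_id: str
    query_length: int
    query_start: int
    db_id: str
    db_length: int
    db_start: int
    overlap_length: int
    score: float
    strand: str           # column 9, kept as text (e.g. "+1" or "-1")
    query_consensus: str
    db_consensus: str
    p_value: float

def parse_hit(line):
    """Convert one tab-separated output line into a Hit."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError("expected at least 12 tab-separated columns")
    return Hit(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), int(f[5]),
               int(f[6]), float(f[7]), f[8], f[9], f[10], float(f[11]))
```

With such records, hits can be filtered on the p-value, for instance `[h for h in hits if h.p_value < 0.05]`. This filter is only meaningful when the run used a non-zero number of shuffles, since otherwise every p-value is 0.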

Take a look at the example of the output file 'blockaligner.out' and overlapping matrix file 'blockaligner.matrix'. The resulting files should look more or less like this.
