Background model file names format

This page describes the format of a file that lists organisms together with the names of the files describing their respective genome-specific background models. Note that the background model itself (representing the probability of observing the nucleotides A, C, G, T in non-functional, non-coding data) is NOT described in this file.
All required and optional fields are described here in case you need to supply such a file as input to MotifSuite.

A list of background model file names is required as input for NOrthoMotifSampler.
Page contents:
    File format
    Conversion requirements
    Example



Background model file names format

The different organisms and their corresponding file names are described on separate lines. Each line in the file starts with a greater-than symbol ('>'), immediately followed by the name of an organism. The organism names must exactly match the organism identifiers used in your sequences FASTA file (in that file, the organism names also follow the '>' symbol in the sequence identifier lines, see Fasta format). On the same line, separated by a tab, follows the name of the file that describes the background model for that organism.
The file should end with a blank line (line return) to ensure that the last entry in the file is also loaded by the program.
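
The line layout described above can be illustrated with a small reader. This is only a minimal sketch and not part of MotifSuite: the function name parse_background_list and the organism and file names in the example are hypothetical, and it assumes that blank lines and '#'-prefixed comment lines can be skipped.

```python
def parse_background_list(text):
    """Return a dict mapping organism name -> background model file name."""
    mapping = {}
    for number, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines and '#'-prefixed comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        if not line.startswith(">"):
            raise ValueError(f"line {number}: expected '>' at start of line")
        # The organism name follows '>' directly; a tab separates it
        # from the background model file name.
        organism, tab, filename = line[1:].partition("\t")
        organism, filename = organism.strip(), filename.strip()
        if not tab or not organism or not filename:
            raise ValueError(f"line {number}: expected '>organism<TAB>filename'")
        mapping[organism] = filename
    return mapping


# Hypothetical example content; the organism and file names are made up.
example = (
    "# background models for two yeast species\n"
    ">Scer\tScer_background.bg\n"
    ">Spar\tSpar_background.bg\n"
)
print(parse_background_list(example))
```

Running such a check before starting a job makes it easy to spot a missing tab or a missing '>' symbol.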



Conversion requirements

- Lines describing organisms that do not have sequences in your FASTA file are simply ignored by our software.
- Any file name extension can be used, but the simple '.txt' extension is preferred to avoid confusion with the actual background model files (which typically have the '.bg' file extension).
- Use the prefix '#' for comment lines (lines with information for your own use); '#'-lines are skipped when the file is loaded.
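
As an illustration of these requirements, the sketch below builds the contents of such a file from a Python dictionary. It is an assumption-laden example, not a MotifSuite tool: the function name format_background_list, the organism names and the file names are all hypothetical, and it assumes that a single trailing line return is sufficient.

```python
def format_background_list(mapping, comment=None):
    """Build file contents: an optional '#' comment line, then one
    '>organism<TAB>filename' line per organism, ending with a line return."""
    lines = []
    if comment is not None:
        # '#'-lines are skipped when the file is loaded.
        lines.append("#" + comment)
    for organism, filename in mapping.items():
        for value in (organism, filename):
            # A tab or line break inside a field would corrupt the layout.
            if any(ch in value for ch in "\t\r\n"):
                raise ValueError(f"invalid character in {value!r}")
        lines.append(f">{organism}\t{filename}")
    # End with a line return so the last entry is terminated.
    return "\n".join(lines) + "\n"


# Hypothetical organism and file names, written to a '.txt' file.
contents = format_background_list(
    {"Scer": "Scer_background.bg", "Spar": "Spar_background.bg"},
    comment=" background models for two yeast species",
)
with open("bgfiles_example.txt", "w", newline="\n") as handle:
    handle.write(contents)
```

The output file name bgfiles_example.txt is likewise only an example; any name with the '.txt' extension follows the recommendation above.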



Example

[Figure: Sacc_bgfiles.png]



Feedback

Contact us if you have comments, questions or suggestions, or simply want to respond to the contents of this guideline. Thank you.