The file starts with a comment line (#INCLUSive Background Model v1.0) that names our program and lets our applications recognise the file as a background model when it is loaded as input.
Next comes the order of the background model (#order), followed by two informational fields: a genome identifier (#organism) and the path to the sequence data from which the model was derived (#Sequences).
The single nucleotide frequencies for A, C, G and T are given as 4 tab-separated values (each between 0 and 1) on the line following #snf. Each value is the probability (Pr) of finding the respective nucleotide in the sequence dataset for which the background is modelled, independent of the position of that nucleotide in the sequences.
#snf
Pr(A) Pr(C) Pr(G) Pr(T)
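As an illustration of how the #snf section could be read, here is a minimal Python sketch. The function name parse_snf and the example values are assumptions made for this illustration; they are not part of the INCLUSive software.

```python
def parse_snf(lines):
    """Return the single nucleotide frequencies found on the line after '#snf'.

    `lines` is a list of text lines from a background model file.
    The result maps each nucleotide (A, C, G, T) to its probability.
    """
    for i, line in enumerate(lines):
        if line.strip() == "#snf":
            # The 4 values are tab separated; split() accepts any whitespace.
            values = [float(v) for v in lines[i + 1].split()]
            if len(values) != 4:
                raise ValueError("expected 4 values on the line after #snf")
            if any(not 0.0 <= v <= 1.0 for v in values):
                raise ValueError("frequencies must lie between 0 and 1")
            return dict(zip("ACGT", values))
    raise ValueError("no #snf section found")


# Illustrative values only, not taken from a real model file.
example = ["#snf", "0.3\t0.2\t0.2\t0.3"]
snf = parse_snf(example)
```

The dictionary keeps the column order A, C, G, T described above, so snf["G"] is the third value on the line.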
The section following #oligo gives the probability of every possible combination of the nucleotides A, C, G, T of length equal to the background model order (such a combination is also called an oligonucleotide) in the sequence dataset for which the background is modelled. Each oligonucleotide probability is printed on a separate line, and the total number of lines equals 4 raised to the power of the background model order (e.g. 16 for a second order model). The section starts with the oligonucleotide consisting of all A; in the following lines the last position runs through A, C, G, T first, then the position before it, and so on. The example below is for a second order background model.
#oligo
Pr(AA)
Pr(AC)
Pr(AG)
Pr(AT)
Pr(CA)
Pr(CC)
Pr(CG)
Pr(CT)
Pr(GA)
Pr(GC)
Pr(GG)
Pr(GT)
Pr(TA)
Pr(TC)
Pr(TG)
Pr(TT)
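The line order shown above can be reproduced programmatically. The following Python sketch is only an illustration (the function name oligo_order is an assumption); it generates the oligonucleotides in the order in which their probabilities appear in the #oligo section.

```python
from itertools import product


def oligo_order(order):
    """List all oligonucleotides of the given length in file order.

    product() varies the last position fastest, which matches the
    AA, AC, AG, AT, CA, ... order of the #oligo section.
    """
    return ["".join(p) for p in product("ACGT", repeat=order)]


oligos = oligo_order(2)
```

Pairing this list with the values read from the file, for example with dict(zip(oligos, values)), gives a lookup from oligonucleotide to probability.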
The higher-order background model is described in the section following #transition matrix. Each line in this section gives the tab-separated probabilities (Pr) of finding nucleotide A, C, G and T, respectively, given a preceding oligonucleotide of length equal to the background model order. The total number of lines equals 4 raised to the power of the background model order. The preceding oligonucleotide for the first line consists of all A, and the following lines run through the preceding oligonucleotides in the same order as in the #oligo section. In the example below, which is for a second order background model, the first letter inside Pr() is the nucleotide whose probability is given and the remaining letters are the preceding oligonucleotide (e.g. Pr(CAG) is the probability of C given the preceding AG).
#transition matrix
Pr(AAA) Pr(CAA) Pr(GAA) Pr(TAA)
Pr(AAC) Pr(CAC) Pr(GAC) Pr(TAC)
Pr(AAG) Pr(CAG) Pr(GAG) Pr(TAG)
Pr(AAT) Pr(CAT) Pr(GAT) Pr(TAT)
Pr(ACA) Pr(CCA) Pr(GCA) Pr(TCA)
Pr(ACC) Pr(CCC) Pr(GCC) Pr(TCC)
Pr(ACG) Pr(CCG) Pr(GCG) Pr(TCG)
Pr(ACT) Pr(CCT) Pr(GCT) Pr(TCT)
Pr(AGA) Pr(CGA) Pr(GGA) Pr(TGA)
Pr(AGC) Pr(CGC) Pr(GGC) Pr(TGC)
Pr(AGG) Pr(CGG) Pr(GGG) Pr(TGG)
Pr(AGT) Pr(CGT) Pr(GGT) Pr(TGT)
Pr(ATA) Pr(CTA) Pr(GTA) Pr(TTA)
Pr(ATC) Pr(CTC) Pr(GTC) Pr(TTC)
Pr(ATG) Pr(CTG) Pr(GTG) Pr(TTG)
Pr(ATT) Pr(CTT) Pr(GTT) Pr(TTT)
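To show how the transition matrix rows map onto preceding oligonucleotides, here is a minimal Python sketch. The function name parse_transition_matrix and the numeric values are assumptions made for this illustration; the sketch only relies on the row and column order described above.

```python
from itertools import product


def parse_transition_matrix(rows, order):
    """Map each preceding oligonucleotide to its A, C, G, T probabilities.

    `rows` holds the lines following '#transition matrix', one per
    preceding oligonucleotide, in the same order as the #oligo section.
    """
    prefixes = ["".join(p) for p in product("ACGT", repeat=order)]
    if len(rows) != len(prefixes):
        raise ValueError("expected %d rows, got %d" % (len(prefixes), len(rows)))
    matrix = {}
    for prefix, row in zip(prefixes, rows):
        probs = [float(v) for v in row.split()]
        if len(probs) != 4:
            raise ValueError("expected 4 values on the row for " + prefix)
        matrix[prefix] = dict(zip("ACGT", probs))
    return matrix


# Illustrative second order model: uniform everywhere except the
# second row, which belongs to the preceding oligonucleotide AC.
rows = ["0.25\t0.25\t0.25\t0.25"] * 16
rows[1] = "0.1\t0.2\t0.3\t0.4"
matrix = parse_transition_matrix(rows, 2)
```

With this layout, matrix["AC"]["T"] corresponds to the entry written as Pr(TAC) in the example above.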
