Required Data and Formats
Interaction Network Data
Key to the use of IAMBEE is an interaction network. The interaction network is a compilation of the available interactomics data of the organism under research.
The interaction network file consists of two parts:
- The definition of the interaction types (Header).
- The list of interactions (Body).
Every interaction type used in the network has to be defined. Each definition line must start with a "%", give the name of the interaction type (freely chosen by the user) and state whether the interaction is regulatory or not, using the keyword "regulatory" or "non-regulatory". Below is an example of the definition of different interaction types.
Each interaction type line starts with a "%" symbol, contains the name of the interaction type (e.g. metabolic, srna, pp, pd, ...) and indicates whether the interaction type is regulatory or not (with "regulatory" or "non-regulatory"). Fields must be separated from each other by a space.
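As an illustration of the header rules described above, here is a minimal Python sketch that reads interaction type definitions. It is not part of IAMBEE: the function name parse_interaction_types and the sample lines are hypothetical, and whether a space is expected between "%" and the type name is an assumption, so the sketch accepts both forms.

```python
def parse_interaction_types(lines):
    """Return {type_name: is_regulatory} for every '%'-prefixed line."""
    types = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("%"):
            continue  # not a type definition; body lines are handled elsewhere
        # Drop the "%" and split on whitespace, so "% pd" and "%pd" both work.
        fields = line[1:].split()
        if len(fields) != 2 or fields[1] not in ("regulatory", "non-regulatory"):
            raise ValueError(f"malformed interaction type line: {line!r}")
        name, kind = fields
        types[name] = (kind == "regulatory")
    return types


# Hypothetical header lines, made up for this sketch.
example_header = [
    "% pd regulatory",
    "% pp non-regulatory",
    "% metabolic non-regulatory",
]
interaction_types = parse_interaction_types(example_header)
print(interaction_types)
```

Raising an error on a misspelled keyword reports the typo immediately instead of letting it pass unnoticed.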
The main body of the interaction network file consists of the interaction definitions. Each interaction line contains the identifier of the start gene, the identifier of the end gene, the type of the interaction (which must be one of the defined interaction types) and the direction of the interaction, given with the keyword "directed" or "undirected". Below is an example of some interaction lines.
Example of interaction lines. Each interaction line lists the id of the start gene, the id of the end gene, the type of the interaction and then the direction of the interaction. All fields must be separated by a comma.
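The body format described above can be checked with a short Python sketch like the following. It is only an illustration under stated assumptions: the names Interaction and parse_interactions, the gene ids and the type names are hypothetical and do not come from IAMBEE itself.

```python
from collections import namedtuple

# One parsed body line: start gene, end gene, interaction type, direction flag.
Interaction = namedtuple("Interaction", "source target kind directed")


def parse_interactions(lines, known_types):
    """Parse comma-separated body lines, skipping '%' definition lines."""
    interactions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 4:
            raise ValueError(f"expected 4 comma-separated fields: {line!r}")
        source, target, kind, direction = fields
        if kind not in known_types:
            raise ValueError(f"undefined interaction type {kind!r} in {line!r}")
        if direction not in ("directed", "undirected"):
            raise ValueError(f"unknown direction {direction!r} in {line!r}")
        interactions.append(
            Interaction(source, target, kind, direction == "directed")
        )
    return interactions


# Hypothetical gene ids and type names, made up for this sketch.
example_body = [
    "geneA,geneB,pd,directed",
    "geneB,geneC,pp,undirected",
]
network = parse_interactions(example_body, known_types={"pd", "pp"})
print(network)
```

Checking each interaction type against the defined types catches body lines that refer to a type missing from the header.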
Mutation Data
IAMBEE is designed to analyze sequencing data from evolving populations before and after a selective sweep. The program prioritizes adaptive mutations by combining, for each individual mutation, information inferred from its functional impact score and its relative frequency increase during a selective sweep.
The mutation data file must be comma-delimited. Its header line is also comma-delimited but starts with a "#" symbol.
The fields of each mutation must be given in the proper order. To distinguish the header from the other comma-separated rows, it must start with a "#" symbol, without spaces. The header contains the following elements:
The gene name column must match the identifiers of the network file.
Example of mutation file lines. Each line starts with the variant position, followed by the gene id and the reference and alternative nucleotide(s).
The last two columns indicate the conditions. Each selective sweep can be specified as a separate condition. IAMBEE requires at least two conditions; if there is only one sweep, the parental condition can be used as the initial sweep.
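A file laid out this way can be read with Python's standard csv module. The sketch below is a hedged illustration, not IAMBEE code: the function name read_mutation_table is hypothetical, and the column names and values in the sample (position, gene, ref, alt, sweep1, sweep2) are placeholders chosen for the example, not the official header.

```python
import csv
import io


def read_mutation_table(handle):
    """Return (header, rows) from a comma-delimited file with a '#' header."""
    header = None
    rows = []
    for record in csv.reader(handle):
        if not record:
            continue  # skip blank lines
        record = [cell.strip() for cell in record]
        if record[0].startswith("#"):
            # Header line: remove the leading "#" from the first column name.
            record[0] = record[0][1:].strip()
            header = record
            continue
        if header is None:
            raise ValueError("data row found before the '#' header line")
        if len(record) != len(header):
            raise ValueError(f"row has {len(record)} fields, expected {len(header)}")
        rows.append(dict(zip(header, record)))
    return header, rows


# Placeholder content: column names and values are assumptions for this sketch.
sample = io.StringIO(
    "#position,gene,ref,alt,sweep1,sweep2\n"
    "1045,geneA,A,G,0.05,0.80\n"
    "20311,geneX,C,T,0.10,0.95\n"
)
header, mutations = read_mutation_table(sample)

# Gene ids must match the network file; report any that do not.
network_genes = {"geneA", "geneB", "geneC"}  # hypothetical network ids
unknown_genes = {row["gene"] for row in mutations} - network_genes
print(header, unknown_genes)
```

Reporting the gene ids that are missing from the network is a quick way to spot identifier mismatches before running an analysis.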
Gene-Names Mapping File
We know that gene nomenclature can be hard to deal with, since ids change depending on the database. Another scenario is that you simply want to focus attention on genes of interest. For these reasons we added the option of supplying a file that maps the gene ids used in the network and mutation files to the ones you are more comfortable working with.
The mapping file must be comma-delimited. Its header line is also comma-delimited but starts with a "#" symbol.
The information for each gene must be defined using a from -> to structure. To distinguish the header from the other comma-separated rows, it must start with a "#" symbol, without spaces. The header contains the following elements:
The from column must match the identifiers of the network file and mutation file.
Example of mapping file lines. Each line starts with the gene id used in the previous files, followed by the new id to be used.
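The from -> to structure can be loaded into a dictionary, as in the Python sketch below. This is an illustration only: the function names load_gene_mapping and rename, the header "#from,to" and the ids are hypothetical. Keeping the original id when a gene has no mapping entry is an assumption of this sketch, not documented IAMBEE behavior.

```python
import csv
import io


def load_gene_mapping(handle):
    """Return {original_id: new_id} from a two-column comma-delimited file."""
    mapping = {}
    for record in csv.reader(handle):
        if not record or record[0].startswith("#"):
            continue  # skip blank lines and the '#' header line
        if len(record) != 2:
            raise ValueError(f"expected 2 fields (from, to), got {record!r}")
        original_id, new_id = (cell.strip() for cell in record)
        mapping[original_id] = new_id
    return mapping


def rename(gene_id, mapping):
    """Translate an id; unmapped ids are kept unchanged (an assumption)."""
    return mapping.get(gene_id, gene_id)


# Hypothetical mapping file content, made up for this sketch.
sample = io.StringIO("#from,to\ngeneA,alias1\ngeneB,alias2\n")
gene_mapping = load_gene_mapping(sample)
renamed = [rename(g, gene_mapping) for g in ["geneA", "geneB", "geneC"]]
print(renamed)
```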