Newick tree format

This page describes the format of a file that describes a rooted phylogenetic tree with branch lengths in NEWICK format.
We comment on all required and optional fields in case you need to supply such a file as input to MotifSuite.

In MotifSuite, a tree file is supplied as input for PHMS and NOMS.
Page contents:
    File format
    Conversion requirements
    Example



Newick Tree file format

In bioinformatics, NEWICK is a text-based format for representing trees in computer-readable form using (nested) parentheses and commas. For detailed information on how to build such a nested-parentheses tree, we refer to ~/phylip/newicktree.html.

The Newick tree occupies a single line, starting with a greater-than symbol ('>') in the first column and a tree-recognition string, in our case 'Tree' or 'Star', immediately followed by the (nested) parentheses describing the relations between the species involved in your study:
e.g.
        >Star(species1:branchlength1,species2:branchlength2,species3:branchlength3);
or
        >Tree(species1:branchlength1,(species2:branchlength2,species3:branchlength3):node_branchlength1);
or, with (optional) internal node identification:
        >Tree(species1:branchlength1,(species2:branchlength2,species3:branchlength3)NODE1:node_branchlength1);

The species names must match exactly the species identifiers used in your sequences FASTA file (in that file, the species names also follow the greater-than symbol '>' in the sequence identifier lines, see Fasta format). A branch length consists of digits and, for decimal numbers, a dot as decimal separator (not a comma). Branch lengths are always preceded by a colon ':'. Internal nodes may, but need not, be identified by a string (our software does not use such node strings any further). The label 'NODE1' in the example above is thus optional. The whole line is loaded as one unit, so no white space or tabs are allowed in any of the identifiers or numbers, nor before or after parentheses, commas or colons.
Note: Our software requires a string ('Star' or 'Tree') preceding the first parenthesis in order to recognize the Newick format, but the choice between the two is not strict: you can use either 'Tree' or 'Star' for both star and tree topologies. The choice does not influence how the root, node, ancestor and branch length information is loaded.
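
The formatting rules above can be checked before a file is handed to the software. The following Python sketch is only an illustration and is not part of MotifSuite; the function names check_tree_line and leaf_names are hypothetical. It assumes that the line ends with a semicolon (as in the examples above), that every species carries a branch length, and that the trailing line return has already been stripped.

```python
import re

# Assumption: a branch length is a run of digits, optionally followed by
# a dot and more digits (a dot, never a comma, as decimal separator).
BRANCH_LENGTH = re.compile(r"\d+(\.\d+)?")


def check_tree_line(line):
    """Return a list of problems found in one tree line (empty = looks fine)."""
    problems = []
    if re.search(r"\s", line):
        problems.append("whitespace is not allowed anywhere in the line")
    if not (line.startswith(">Tree(") or line.startswith(">Star(")):
        problems.append("line must start with '>Tree(' or '>Star('")
    if not line.endswith(";"):
        problems.append("line must end with ';'")

    # Parentheses must nest correctly.
    depth = 0
    for ch in line:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                break
    if depth != 0:
        problems.append("parentheses are not balanced")

    # Every entry that follows '(' or ',' must carry a ':' and a branch length.
    for m in re.finditer(r"(?<=[(,])([^(),;]+)", line):
        if ":" not in m.group(1):
            problems.append(f"missing branch length: {m.group(1)!r}")

    # Whatever follows a ':' must be a number written with a dot.
    for m in re.finditer(r":([^,();]*)", line):
        if not BRANCH_LENGTH.fullmatch(m.group(1)):
            problems.append(f"bad branch length: {m.group(1)!r}")
    return problems


def leaf_names(line):
    """Species identifiers: names that directly follow '(' or ','."""
    return re.findall(r"(?<=[(,])([^(),:;]+)(?=:)", line)
```

The list returned by leaf_names can be compared with the species identifiers of your FASTA file to confirm that the names match exactly. Internal node labels such as 'NODE1' follow a closing parenthesis and are therefore not reported as species.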

(expert use, ask for details)
if (pInput[0] == "GeneBias") StoreGeneMutationRates(pInput);
else if (pInput[0] == "CoregBias") StoreCoregulationWeights(pInput);



Conversion requirements

- There is no standard file extension for a text file containing a Newick Tree. We propose the use of '.tree' or simply '.txt'.
- When a file is loaded by our software, lines starting with the symbol '#' are skipped. You can use such lines to add comments for your own reference.
- IMPORTANT: In case of a non-star topology, species that are not involved in your motif detection study (i.e. there are no sequences for these species in your FASTA file) cannot simply be removed from the Newick tree description: the parentheses may no longer nest correctly, which obstructs the definition of internal nodes in our software. In such a case, leave the original rooted Newick tree as it is. Our software will load the full tree and internally remove the branches that describe non-involved species, as well as internal nodes that only connect to non-involved species. Note that we choose not to remove internal nodes that connect to only a single involved species (i.e. we do not sum the values of the branch lengths).
Alternatively, you can redesign the tree (using appropriate software or by correctly resetting the nesting parentheses) so that it describes only the involved species, and provide the corresponding Newick description as input for our software.
- The file should end with a line return (newline) to ensure that all information is loaded by the program.
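
To make the loading and pruning behaviour described above more concrete, here is a minimal Python sketch. It is not the MotifSuite implementation: all function names are hypothetical, the parser assumes a well-formed line, and branch lengths are kept as text. It skips '#' lines when reading, removes leaves that are not in the set of involved species, drops internal nodes left without any involved species, and keeps internal nodes with a single remaining child without summing branch lengths.

```python
def read_tree_line(path):
    """Return the first line that is neither empty nor a '#' comment."""
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if line and not line.startswith("#"):
                return line
    raise ValueError("no tree line found")


def parse_newick(text):
    """Parse '(a:1,(b:2,c:3)N:4)' into nested (name, length, children) tuples."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                        # skip '('
            children.append(node())
            while text[pos] == ",":
                pos += 1                    # skip ','
                children.append(node())
            pos += 1                        # skip ')'
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]              # empty for unlabeled internal nodes
        length = None
        if pos < len(text) and text[pos] == ":":
            start = pos = pos + 1
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = text[start:pos]        # kept as text, not converted
        return (name, length, children)

    return node()


def prune(tree, involved):
    """Drop leaves not in 'involved' and internal nodes left without children."""
    name, length, children = tree
    if not children:
        return tree if name in involved else None
    kept = [c for c in (prune(child, involved) for child in children) if c is not None]
    # A node with one remaining child is kept; branch lengths are not summed.
    return (name, length, kept) if kept else None


def to_newick(tree):
    """Write a (name, length, children) tuple back as nested parentheses."""
    name, length, children = tree
    inner = "(" + ",".join(to_newick(c) for c in children) + ")" if children else ""
    return inner + name + (":" + length if length is not None else "")


line = ">Tree(sp1:0.1,(sp2:0.2,sp3:0.3)NODE1:0.4);"
tree = parse_newick(line[5:].rstrip(";"))   # strip the '>Tree' prefix and ';'
print(">Tree" + to_newick(prune(tree, {"sp1", "sp2"})) + ";")
# prints >Tree(sp1:0.1,(sp2:0.2)NODE1:0.4);
```

In the printed result, NODE1 still exists with its own branch length even though only sp2 remains below it, which mirrors the choice not to collapse such nodes.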



Example

[Figure: Sacc_NewickTree.png]

The figure above shows the contents of a star tree file for 5 Saccharomyces species related as shown in:
[Figure: Sacc_star_tree.png]

---------
[Figure: Proteobact_NewickTree.png]

The figure above shows the contents of a rooted tree file for 8 Gamma-proteobacterial species related as shown in:
[Figure: Proteobact_rooted_tree.png]



Feedback

Contact us if you have comments, questions or suggestions, or simply want to respond to the contents of this guideline. Thank you.