Frequency-based haplotype reconstruction from deep sequencing data of bacterial populations

Sergio Pulido-Tamayo, Aminael Sánchez-Rodríguez, Toon Swings, Bram Van den Bergh, Akanksha Dubey, Hans Steenackers, Jan Michiels, Jan Fostier, Kathleen Marchal

EVORhA (EVOlutionary Reconstruction of hAplotypes) is a haplotype reconstruction method that also works for slowly evolving species.

Download

How to use EVORhA

Example:
java -jar EVORhA.jar completeAnalysis example/EcoliK12MG1655_truncated.fasta example/file_sorted.bam


Usage: java -jar EVORhA.jar <method> <files>
Methods:
- snpCounter: Count SNPs in sorted bam file
files: <genotype fasta> <annotation gff> <sorted bam> <outputfile snpout>

- subsMatrixBuilder: build a substitution matrix from observed data
files: <snp count snpout> <outputfile blosum>

- windowFinder: find and order a list of windows
files: <snp count snpout> <outputfile windows>

- windowProcessor: process each window in window file
files: <genotype fasta> <sorted bam> <bam index bam.bai> <snp count snpout> <window list windows> <sub matrix blosum> <output sampling file>

- windowExtention: load processing results and extend window haplotypes
files: <genotype fasta> <snp count snpout> <sub matrix blosum> <process file> <output haplotype fasta file>

- globalReconstruction: load extension results and construct haplotypes
files: <genotype fasta> <snp count snpout> <sub matrix blosum> <extension file> <output haplotype fasta file>

- joinGlobalHap2fasta: join several global.hapfreq files into a single haplotype file in fasta format
files: <genotype fasta> <file1.global.hapfreq> <file2.global.hapfreq> ... <fileN.global.hapfreq> <output.join.fasta file>

- completeAnalysis: run the complete pipeline on a bam file
files: <genotypeName - .fasta & .gff> <sorted bam>

- evolutionExperiment: analyze different time points of an evolution experiment
files: <genotype fasta> <baseNameTimePoint1> <baseNameTimePoint2> ... <baseNameTimePointN> <joinFile>

NOTE: if only one file name is given, it is assumed that all files share that base name
e.g. "HaplotypeReconstruction snpCounter REL606" is the same as "HaplotypeReconstruction snpCounter REL606_sorted.bam REL606.fa REL606.gff REL606.snpout"
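
The individual methods listed above can also be chained by hand. The following is a minimal sketch, assuming the stages are meant to run in the order they are listed and that each stage's output file feeds the next one. The function name evorha_steps and every output file name derived from the bam name (.snpout, .blosum, .windows, .sampling, .extension, .haplotypes.fasta) are hypothetical choices for illustration, not names EVORhA requires. The function only prints the commands so they can be reviewed before running.

```shell
#!/bin/sh
# Sketch: print one command per pipeline stage, following the argument
# order given in the usage listing. Nothing is executed here.
# Assumption: the reference base name has matching .fasta and .gff files.
# Assumption: all intermediate and output file names are hypothetical.
evorha_steps() {
    ref=$1              # reference base name, e.g. path/to/genome
    bam=$2              # sorted bam file
    out=${bam%.bam}     # prefix for the (hypothetical) output names
    jar="java -jar EVORhA.jar"

    echo "$jar snpCounter $ref.fasta $ref.gff $bam $out.snpout"
    echo "$jar subsMatrixBuilder $out.snpout $out.blosum"
    echo "$jar windowFinder $out.snpout $out.windows"
    echo "$jar windowProcessor $ref.fasta $bam $bam.bai $out.snpout $out.windows $out.blosum $out.sampling"
    echo "$jar windowExtention $ref.fasta $out.snpout $out.blosum $out.sampling $out.extension"
    echo "$jar globalReconstruction $ref.fasta $out.snpout $out.blosum $out.extension $out.haplotypes.fasta"
}

# Example call with placeholder names; prints the six commands.
evorha_steps genome sample_sorted.bam
```

Once the printed commands look right, they can be run by piping the output to sh. For a single sample, completeAnalysis remains the simpler route; running the stages separately is mainly useful for rerunning one stage without repeating the earlier ones.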

Contact

Bug reports and feature requests are very welcome!
Please contact us by email at sergio.pulidotamayo[[@]]ugent.be.