Extraction of over-represented alleles in Bulk Segregant Analysis

Simulate datasets

Number of simulated selected segregants in the pool.
Length of the chromosome to simulate (min 100,000).
Number of polymorphic sites to be randomly selected.
Fold coverage to be simulated.
Recombination rate to be used in the simulation (cM/kb).
Give the recombination rate in centiMorgan per kilobase pair (e.g. 0.37 for yeast).
Sequencing error rate per base sequenced.
Percentage of segregants expected in the final population to have the causal site.
Lower values mean more segregants are in the selected pool for reasons unrelated to their genotype (e.g. noise).
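
To give a feel for the recombination rate unit, the short sketch below converts a rate in cM/kb into the expected number of crossovers per meiosis over a chromosome of a given length. It relies only on the definition that 1 cM corresponds to an expected 0.01 crossovers. The function name `expected_crossovers` and its parameters are hypothetical and chosen for illustration; how this page uses the rate internally is not described here.

```python
def expected_crossovers(rate_cm_per_kb, length_bp):
    """Expected crossovers per meiosis over a region (hypothetical helper).

    Uses the definition 1 cM = 0.01 expected crossovers, so
    100 cM (1 Morgan) corresponds to one expected crossover.
    """
    length_kb = length_bp / 1000.0              # base pairs -> kilobase pairs
    map_length_cm = rate_cm_per_kb * length_kb  # genetic length in centiMorgan
    return map_length_cm / 100.0                # centiMorgan -> expected crossovers


if __name__ == "__main__":
    # Example: the yeast rate from the help text on the minimum chromosome length.
    print(expected_crossovers(0.37, 100_000))
```

With the yeast example rate and the minimum chromosome length, the genetic length is 37 cM, so well under one crossover is expected per meiosis on the simulated chromosome.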


This page simulates an artificial chromosome of the specified length with random polymorphic sites. A single site is randomly chosen to be causative. The selected pool is then constructed from the entered proportion of segregants with the causal site, as follows. Each segregant originates by randomly combining both parental alleles, so each segregant has a 50% probability of carrying the causal variant. A segregant with the causal variant enters the final pool with a probability equal to the entered proportion, whereas a segregant without the causal variant enters with a probability of 1 minus the entered proportion. Segregants are added to the pool until the requested number of selected segregants is reached. The proportion of segregants with the causal site thus acts as the noise level. The idea behind this is to make no assumptions about the cause of the noise, which can be attributed either to an incomplete QTL effect or to a difficult selection procedure for the segregants. Note that in these simulations a higher number of segregants does not increase the noise level.
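
The pool construction described above can be sketched as a simple accept/reject loop. This is a minimal illustration and not the code behind this page. It tracks only whether each segregant carries the causal variant and ignores the rest of the chromosome, recombination, coverage and sequencing errors. The names `build_selected_pool`, `pool_size`, `causal_proportion` and `seed` are hypothetical, and the proportion is assumed to be given as a fraction between 0 and 1.

```python
import random


def build_selected_pool(pool_size, causal_proportion, seed=None):
    """Return one boolean per selected segregant (True = carries the causal variant).

    Hypothetical helper: causal_proportion is assumed to be a fraction in [0, 1].
    """
    if not 0.0 <= causal_proportion <= 1.0:
        raise ValueError("causal_proportion must be between 0 and 1")
    rng = random.Random(seed)
    pool = []
    while len(pool) < pool_size:
        # Each segregant inherits the causal allele from one parent with probability 0.5.
        has_causal = rng.random() < 0.5
        # Carriers are accepted with the entered proportion, non-carriers with 1 minus it.
        accept_prob = causal_proportion if has_causal else 1.0 - causal_proportion
        if rng.random() < accept_prob:
            pool.append(has_causal)
    return pool


if __name__ == "__main__":
    pool = build_selected_pool(pool_size=1000, causal_proportion=0.9, seed=1)
    print("pool size:", len(pool))
    print("fraction with causal variant:", sum(pool) / len(pool))
```

Under this scheme a candidate segregant is accepted with overall probability 0.5 × p + 0.5 × (1 − p) = 0.5, whatever the entered proportion p. The expected fraction of carriers in the final pool is 0.5 × p / 0.5 = p. That fraction does not depend on the pool size, which is consistent with the note that a higher number of segregants does not increase the noise level.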