Appendix : details on selected collections from JASPAR
-----------------------------------------------------------

Source URL : http://jaspar.genereg.net/docs

- CORE = contains a curated, non-redundant set of TF binding profiles,
  derived from published collections of experimentally defined
  transcription factor binding sites for eukaryotes.

- UNVALIDATED = for these profiles, the JASPAR curators failed to find any
  support in the existing literature. They encourage the community to
  perform experiments and/or point them to literature that the curators
  missed in order to support these profiles. The collection should not be
  used as validation or as support in publications.

Additional collections for M. musculus :

- PBM = built using new in-vitro techniques, based on k-mer microarrays.
  PBM matrix models have their own database which is specialized for the
  data: UniPROBE. The PBM collection is derived by Badis et al.
  (Science 2009) from binding preferences of 104 mouse transcription
  factors. It should be used when it is important that each matrix was
  derived using the same protocol.

- PBM_HOMEO = built using new in-vitro techniques, based on k-mer
  microarrays. PBM matrix models have their own database which is
  specialized for the data: UniPROBE. The PBM_HOMEO collection is derived
  by Berger et al. (Cell 2008) and includes 176 profiles from mouse
  homeodomains. It should be used when it is important that each matrix
  was derived using the same protocol, with a focus on homeobox factors.

Additional collections for H. sapiens :

- CNE = collection of 233 matrix profiles derived by Xie et al.
  (PNAS 2007) based on clustering of overrepresented motifs from human
  conserved non-coding elements. While the biochemical and biological role
  of most of these patterns is still unknown, Xie et al. have shown that
  the most abundant ones correspond to known DNA-binding proteins, most
  notably the insulator-binding protein CTCF. The CNE collection is best
  used when characterizing regulatory inputs in long-range developmental
  gene regulation in vertebrates or when analyzing properties of potential
  enhancers.

- PHYLOFACTS = consists of 174 profiles that were extracted from
  phylogenetically conserved gene upstream elements by Xie et al.
  (Nature 2005). It is a mix of motifs corresponding to known and
  undefined transcription factors. They are useful when one expects that
  other factors might determine promoter characteristics, such as
  structural aspects and tissue specificity. They are highly complementary
  to the JASPAR CORE matrices (for H. sapiens and M. musculus), so they
  are best used in combination.

Sources :

- Badis et al. (Science 2009) Diversity and complexity in DNA recognition
  by transcription factors.
  URL https://science.sciencemag.org/content/324/5935/1720.long

- Berger et al. (Cell 2008) Variation in homeodomain DNA binding revealed
  by high-resolution analysis of sequence preferences.
  URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2531161/

- Xie et al. (Nature 2005) Systematic discovery of regulatory motifs in
  human promoters and 3' UTRs by comparison of several mammals.
  URL https://www.ncbi.nlm.nih.gov/pubmed/15735639?dopt=Abstract

- Xie et al. (PNAS 2007) Systematic discovery of regulatory motifs in
  conserved regions of the human genome, including thousands of CTCF
  insulator sites.
  URL https://www.ncbi.nlm.nih.gov/pubmed/17442748?dopt=Citation

End of file.