Using an ensemble of statistical metrics to quantify large sets of plant transcription factor binding sites
1 Department of Bioinformatics and Computational Biology, George Mason University, Manassas, Virginia, USA
2 Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland, USA
3 Soybean Genomics and Improvement Laboratory, United States Department of Agriculture, Beltsville, Maryland, USA
Plant Methods 2013, 9:12 doi:10.1186/1746-4811-9-12
Published: 11 April 2013
From initial seed germination through reproduction, plants continuously reprogram their transcriptional repertoire to facilitate growth and development. This dynamic is mediated by a diverse but inextricably linked catalog of regulatory proteins called transcription factors (TFs). Statistically quantifying TF binding site (TFBS) abundance in the promoters of differentially expressed genes can reveal binding-site patterns that are closely related to stress response. Output from today’s transcriptomic assays necessitates statistically oriented software that can handle large promoter-sequence sets in a computationally tractable fashion.
We present Marina, an open-source software tool for identifying over-represented TFBSs among large sets of promoter sequences, using an ensemble of seven statistical metrics and binding-site profiles. Through software comparison, we show that Marina can identify considerably more over-represented plant TFBSs than a popular software alternative.
Marina was used to identify over-represented TFBSs in a two-time-point RNA-Seq study exploring the transcriptomic interplay between soybean (Glycine max) and soybean rust (Phakopsora pachyrhizi). Marina identified numerous abundant TFBSs recognized by transcription factors that are associated with defense response, such as WRKY, HY5 and MYB2. Comparing results from Marina with those of a popular software alternative suggests that, regardless of the number of promoter sequences, Marina identifies significantly more over-represented TFBSs.
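To make the notion of an over-represented TFBS concrete, the following is a minimal Python sketch of one common way such a question can be posed: a one-sided hypergeometric test that asks whether a site occurs in more promoters of differentially expressed genes than expected from the full promoter set. This is not Marina's implementation, and the abstract does not name the seven metrics in its ensemble; the hypergeometric test is used here only as a familiar stand-in. The function names, the toy sequences, and the use of an exact-match string (the W-box core TTGAC, commonly associated with WRKY factors) in place of a binding-site profile are all assumptions made for illustration.

```python
from math import comb

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_site(promoter: str, site: str) -> bool:
    """True if the exact site occurs on either strand of the promoter."""
    promoter = promoter.upper()
    site = site.upper()
    return site in promoter or reverse_complement(site) in promoter


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n promoters are drawn without replacement from N,
    of which K contain the site."""
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)
    )
    return favourable / comb(N, n)


def site_enrichment(foreground: list[str], background: list[str], site: str) -> float:
    """One-sided p-value for the site being over-represented in foreground.

    Assumes the foreground promoters are a subset of the background set.
    """
    k = sum(has_site(p, site) for p in foreground)
    K = sum(has_site(p, site) for p in background)
    return hypergeom_upper_tail(k, len(foreground), K, len(background))


# Toy data (hypothetical): six promoters, two of them "differentially expressed".
background = [
    "AATTGACCGT",  # site on the forward strand
    "GGGGCCCC",
    "ACGTACGT",
    "CCGGTCAAGG",  # site on the reverse strand
    "TTTTAAAA",
    "GCGCGCGC",
]
foreground = [background[0], background[3]]

p_value = site_enrichment(foreground, background, "TTGAC")
print(f"p = {p_value:.4f}")
```

A tool that combines several metrics, as the abstract describes, would compute more than one such score per site; a single test is shown here only to illustrate the kind of quantity involved.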