PhosphoRice: a meta-predictor of rice-specific phosphorylation sites

Que, Shufu; Li, Kuan; Chen, Min; Wang, Yongfei; Yang, Qiaobin; Zhang, Wenfeng; Zhang, Baoqian; Xiong, Bangshu; He, Huaqin

doi:10.1186/1746-4811-8-5

Methodology
Open access
Published: 03 February 2012

Plant Methods volume 8, Article number: 5 (2012)

Abstract

Background

As a result of the growing body of protein phosphorylation site data, the number of phosphoprotein databases is constantly increasing, and dozens of tools are available for fast, automatic prediction of protein phosphorylation sites. However, none of the existing tools has been developed to predict protein phosphorylation sites in rice.

Results

In this paper, the phosphorylation site predictors NetPhos 2.0, NetPhosK, KinasePhos, Scansite, DISPHOS and PredPhospho were integrated to construct meta-predictors of rice-specific phosphorylation sites using several methods, including unweighted voting, unreduced weighted voting, reduced weighted voting and weighted voting strategies. PhosphoRice, the meta-predictor produced by the weighted voting strategy with parameters selected by a restricted grid search and a conditional random search, performed best at predicting phosphorylation sites in rice. Its Matthews correlation coefficient (MCC) and accuracy (ACC) reached 0.474 and 73.8%, respectively. Compared with the best individual element predictor (Disphos_default), PhosphoRice achieved a significant increase in MCC of 0.071 (P < 0.01) and an increase in ACC of 4.6%.

Conclusions

PhosphoRice is a powerful tool for predicting unidentified phosphorylation sites in rice. Compared to the existing methods, we found that our tool showed greater robustness in ACC and MCC. PhosphoRice is available to the public at http://bioinformatics.fafu.edu.cn/PhosphoRice.

Background

Protein phosphorylation is the most common form of protein post-translational modification (PTM) [1–3]. Phosphorylation and dephosphorylation of proteins are a universal mechanism for regulating protein function in eukaryotes, prokaryotes and archaea. Given the importance of protein phosphorylation in regulating cellular signaling, large-scale identification of phosphorylated proteins has been carried out in yeast [4], mice [5], humans [6], Arabidopsis [7, 8], rice [9–12] and Medicago [13]. As the data grow, the available phosphoprotein databases are increasing in number and size and becoming more complex. The Phospho.ELM database contains validated phosphorylation sites that are mostly derived from mammals [14], Phosida contains large-scale data from Homo sapiens and Bacillus subtilis [15], PhosphoSite (http://www.phosphosite.org/) is a curated site that focuses on vertebrate systems [16], and PhosPhAt is a phosphorylation site database specific to Arabidopsis [17].

The growing body of protein phosphorylation site data has stimulated the development of computational approaches to predict these sites from protein sequences. Over the past decade, a series of algorithms has been developed to predict phosphorylation sites from amino acid sequences [18]. Several well-maintained web sites offering prediction of protein phosphorylation sites have been made freely available to the scientific community, including NetPhos [19], NetPhosK [20], KinasePhos [21], KinasePhos 2.0 [22], DISPHOS [23], Scansite [24], PPSP [25], GPS [26], PredPhospho [27], NetPhosYeast [28], GANNPhos [29] and Musite [30]. However, the existing protein phosphorylation site prediction tools show a data sampling bias: the predictors achieve high accuracy only for individual species [17]. Many existing prediction programs were derived primarily from mammalian data and perform poorly at predicting plant phosphorylation sites. Therefore, organism-specific predictors have been developed based on the experimentally validated phosphorylation sites of specific model organisms. NetPhosYeast, a yeast-specific predictor, outperforms existing generic predictors in identifying phosphorylation sites in yeast [28]. PhosPhAt, which predicts phosphoserine sites in Arabidopsis, has been benchmarked to perform better on Arabidopsis sequences than other generic predictors [17]. To our knowledge, no existing method has been developed specifically to predict protein phosphorylation sites in rice.

Just as Arabidopsis thaliana (L.) serves as a model dicotyledonous species, rice (Oryza sativa L.) is a representative model monocotyledonous (monocot) species. Moreover, rice has an immense socio-economic impact on human civilization. In the past decade, proteomic technologies and the availability of genome sequences have propelled rice proteomic research to new heights, which is crucial for a better understanding of monocot plants [31]. Rice therefore also serves as a cornerstone for the study of functional genomics in cereal plants [31]. However, current predictors perform poorly when used individually to predict phosphorylation sites in rice phosphoproteins [18]. In our previous work, we constructed three different phosphorylation site datasets to test the performance of different predictors, and found that the predictors were complementary to some extent [18]. Therefore, building a meta-server that maximizes the complementarity of individual predictors may be a promising approach to developing an improved prediction system. In this study, we developed a rice-specific meta-predictor of protein phosphorylation sites by integrating these individual predictors.

Results

Preprocessing performance assessment of element predictors

All of the protein sequences in the dataset were run through all 15 element predictors. Perl scripts were developed to submit jobs to the servers with the specified prediction options and then to analyze the prediction performance. As shown in Table 1, the element predictors showed different performances in predicting rice phosphorylation sites. The element predictor that provided the best prediction performance was Disphos_default (ACC: 69.2%, MCC: 0.403).

Table 1 Prediction performance of the element predictors on the test dataset

Unweighted voting, unreduced weighted voting and reduced weighted voting strategies

We combined the element predictors to construct meta-predictors using unweighted voting, unreduced weighted voting and reduced weighted voting strategies. In a two-class phosphorylation site prediction problem, a score threshold must be set. In the earlier formulation of these strategies [32], the threshold score was set to half of the sum of the weights of all the element predictors. In this paper, however, the threshold scores (T) were less than half of the total weight of the predictors.

As shown in Table 2, compared with the best element predictor (ACC: 69.2%, MCC: 0.403), the meta-predictors constructed by unweighted voting, unreduced weighted voting and reduced weighted voting strategies achieved a significant increase in MCC of between 0.046 and 0.051, and a slight increase in ACC of between 3.2% and 3.7%. The reduced weighted voting meta-predictor (with weights set by MCC) showed the best prediction performance (MCC: 0.455) among these meta-predictors.

Table 2 The prediction performance of meta-predictors constructed by unweighted voting, unreduced weighted voting and reduced weighted voting strategies

Restricted grid search and Conditional random search

We also used a weighted voting strategy with parameters selected by a restricted grid search to construct meta-predictors for phosphorylation sites in rice. As shown in Table 3, this strategy produced a satisfactory meta-predictor with outstanding prediction performance (ACC: 73.5%, MCC: 0.469). Compared with the best element predictor, it improved MCC by 0.066 and ACC by 4.3%.

Table 3 The parameters in the weighted voting meta-predictors selected by a restricted grid search and a conditional random search

Following the restricted grid search, we developed a conditional random search scheme to select the values of the 16 parameters. The weight of each element predictor was allowed to fluctuate within a range bounded by the previous and the next grid values of the parameter selected by the restricted grid search (Table 3). For instance, the weight of NetPhos2.0 selected by the restricted grid search was 1, with a previous grid value of 0 and a next grid value of 3. Then, in the conditional random search, the weight of NetPhosK_0.5 was set to fluctuate between 0 and 3 (Table 3). Using this strategy, we produced a conditional random search meta-predictor that outperformed all the individual predictors and the meta-predictors described above (Table 3). Its MCC was significantly higher, by 0.071, than that of the best individual element predictor (Disphos_default), and its ACC was 4.6% higher. We named this optimal conditional random search meta-predictor PhosphoRice.

Moreover, we generated receiver operating characteristic (ROC) curves from the predicted scores of the meta-predictors. An ROC curve plots the true-positive rate (sensitivity) against the false-positive rate (1 - specificity), and the area under the ROC curve (AUC) summarizes the trade-off between sensitivity and specificity across all thresholds. The ROC curves of all the meta-predictors, in comparison with that of the best element predictor (Disphos_default), are shown in Figure 1. All meta-predictors had larger ROC areas than the best element predictor (Table 4). We also calculated the area under the ROC curve to compare the predictive performance of PhosphoRice with that of Musite, a Java-based standalone application for predicting both general and kinase-specific protein phosphorylation sites [30]. As shown in Table 5, PhosphoRice performed significantly better than Musite.

Table 4 Areas under the ROC curves for the best element predictor, meta-predictors constructed by unweighted voting, unreduced weighted voting, reduced weighted voting and weighted voting strategies.

Table 5 The prediction performance of PhosphoRice in comparison to that of Musite

Discussion

Prediction performance of element predictors

Before being integrated into the meta-predictors, the existing phosphorylation site predictors used in this study were tested and assessed on the rice phosphorylation site dataset. All of the element predictors achieved an ACC above 50.0%. However, their MCC values differed considerably, ranging from 0.07 to 0.403. Different predictors may perform differently in phosphorylation site prediction because they use different algorithms and training datasets. The results also showed that some kinase family-specific predictors could perform well when run without kinase-specific settings, such as KinasePhos_95 (ACC: 70.0%, MCC: 0.396).

Prediction performance of unweighted voting, unreduced weighted voting and reduced weighted voting meta-predictors

In this paper, the unweighted voting, unreduced weighted voting and reduced weighted voting meta-predictors outperformed the best element predictor (ACC: 69.2%, MCC: 0.403), with a significant increase in MCC (P < 0.01). The good performance achieved by these meta-predictors was due to the complementarity of the element predictors. The reduced weighted voting strategy had previously been applied to produce meta-predictors for protein subcellular localization prediction [33] and for kinase family-specific phosphorylation site prediction [32], with different outcomes: it produced good meta-predictors in the subcellular localization problem [33], but failed to yield meta-predictors with the expected performance in predicting phosphorylation sites for the CK2 kinase family [32]. Wan et al. (2008) suggested that stronger correlation among the element predictors might account for the failure [32]. However, we argue that the selection of element predictors is vital to the prediction performance of meta-predictors. The prediction performance of the six element prediction programs used in this study was evaluated in Que et al. (2010) [18], where we found that the element predictors were complementary to some extent.

Prediction performance of PhosphoRice

In this study, we applied a more general form of the weighted voting strategy. First, we used a restricted grid search to determine a range for the parameters. Second, we used the ranges around the parameters selected by the restricted grid search to perform a conditional random search. The restricted grid search is efficient in both running time and parameter selection, and it has been widely used to construct meta-predictors, including a serine/threonine phosphorylation site predictor [32] and a protein-protein interaction site predictor [34]. Using the restricted grid search, we selected 9 non-zero weight parameters for the final meta-predictor (Table 3). However, a drawback of the restricted grid search is that it may find a local, rather than a global, optimum. Therefore, starting from the result of the restricted grid search, we ran a broader search, the conditional random search, to determine the 16 parameters. The conditional random search produced a good meta-predictor whose rice phosphorylation site prediction performance not only exceeded that of the best element predictor, but also surpassed that of the meta-predictors built with unweighted voting, unreduced weighted voting and reduced weighted voting strategies. These results suggest that combining a restricted grid search with a conditional random search may be a good approach for determining the parameters of a weighted voting strategy.

Conclusion

To summarize, we created a meta-predictor, PhosphoRice, using a weighted voting strategy whose parameters were selected by a restricted grid search and a conditional random search. It performs well in predicting rice phosphorylation sites, as measured by MCC and ACC: its MCC was significantly higher, by 0.071, than that of the best individual element predictor (Disphos_default), and its ACC was 4.6% higher. We have also provided a web service for the prediction of rice protein phosphorylation sites, which can be accessed at http://bioinformatics.fafu.edu.cn/PhosphoRice.

Methods

Preprocessing of dataset

We collected rice phosphorylation sites from recent literature, including Nakagami et al. (2010) [12], and from the feature table of the Swiss-Prot database. After removing redundant phosphorylation sites, the numbers of serine (S), threonine (T) and tyrosine (Y) sites were 4220, 605 and 141, respectively (Table 6). These phosphorylation sites were located in 2162 proteins (Additional file 1). The 25-residue sequence windows (positions -12 to +12) around each phosphorylation site were extracted from the protein sequences to construct the dataset. Because all of the phosphorylation sites in the positive dataset were experimentally verified, they were regarded as positive (+) sites. The Ser, Thr and Tyr residues in the dataset that were not annotated as phosphorylation sites were regarded as negative (-) sites (i.e., non-phosphorylation sites). We balanced the dataset so that the positive and negative sets were of equal size during the cross-validation processes (Table 6).
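The window extraction and labelling described above can be sketched in Python. This is a minimal illustration, not the authors' Perl pipeline: the helper names, the use of "X" to pad windows that run past a protein terminus, and random undersampling of negatives to balance the classes are all assumptions, since the text does not say how edge cases or balancing were handled.

```python
import random
from typing import List, Set, Tuple

FLANK = 12  # residues on each side of the centre, giving a 25-mer (-12 .. +12)
PAD = "X"   # assumed padding character for positions beyond the protein termini


def extract_window(sequence: str, position: int, flank: int = FLANK) -> str:
    """Return the (2 * flank + 1)-residue window centred on a 1-based position."""
    centre = position - 1  # convert to a 0-based index
    if not 0 <= centre < len(sequence):
        raise ValueError("position lies outside the sequence")
    padded = PAD * flank + sequence + PAD * flank
    # After left padding the centre residue sits at centre + flank, so a
    # window starting at index `centre` has it exactly in the middle.
    return padded[centre:centre + 2 * flank + 1]


def label_sites(sequence: str, phospho_positions: Set[int]) -> List[Tuple[str, int]]:
    """Label every S/T/Y: 1 if annotated as phosphorylated (+), else 0 (-)."""
    examples = []
    for pos, residue in enumerate(sequence, start=1):
        if residue in "STY":
            label = 1 if pos in phospho_positions else 0
            examples.append((extract_window(sequence, pos), label))
    return examples


def balance(examples: List[Tuple[str, int]], seed: int = 0) -> List[Tuple[str, int]]:
    """Randomly undersample negatives down to the number of positives (assumed scheme)."""
    rng = random.Random(seed)
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    negatives = rng.sample(negatives, min(len(positives), len(negatives)))
    return positives + negatives
```

For a toy sequence, `label_sites("MSTAY", {2})` yields three windows (centred on S2, T3 and Y5) labelled 1, 0 and 0.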

Table 6 Number of phosphoserine, phosphothreonine and phosphotyrosine sites in positive and negative dataset

We used standard 10-fold cross-validation to optimize the weights of all the individual predictors, and calculated the ACC and MCC of each meta-predictor. The dataset was randomly partitioned into 10 subsets; in each round, one subset served as the test set and the remaining nine as the training set. The weights were updated and the ACC and MCC were recalculated. The new weights were kept only if both the ACC and MCC increased; otherwise, the weights were rolled back to their previous values. Using this strategy, the meta-predictors were trained by shifting the test subset stepwise, so that every subset had been used for both training and testing when the procedure was complete.
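A minimal Python sketch of the cross-validation bookkeeping above follows. The function names, the interleaved fold assignment, and the `propose`/`evaluate` callbacks (standing in for whatever weight-update and scoring routines were actually used) are illustrative assumptions; only the accept-if-both-improve rule comes from the text.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Metrics = Dict[str, float]  # e.g. {"ACC": 0.70, "MCC": 0.40}


def kfold_splits(n_samples: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Randomly partition sample indices into k folds; each fold is the test set once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # The remaining k - 1 folds form the training set for this round.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def keep_new_weights(old: Metrics, new: Metrics) -> bool:
    """Keep an update only if both ACC and MCC increase."""
    return new["ACC"] > old["ACC"] and new["MCC"] > old["MCC"]


def update_step(
    weights: Sequence[float],
    current: Metrics,
    propose: Callable[[Sequence[float]], Sequence[float]],
    evaluate: Callable[[Sequence[float]], Metrics],
) -> Tuple[Sequence[float], Metrics]:
    """Propose new weights, re-evaluate them, and roll back if they do not help."""
    candidate = propose(weights)
    metrics = evaluate(candidate)
    if keep_new_weights(current, metrics):
        return candidate, metrics
    return weights, current  # roll back to the previous weights
```

Requiring both metrics to rise is stricter than improving either one alone, so an update that trades ACC for MCC is rejected.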

Selection of element predictors

Six phosphorylation site prediction programs, NetPhosK, NetPhos2.0, KinasePhos, PredPhospho 1.0, Scansite and DISPHOS, were selected as element prediction programs. NetPhosK, KinasePhos, PredPhospho 1.0 and Scansite are kinase family-specific phosphorylation site predictors, while NetPhos2.0 and DISPHOS are not. All of the element predictors were run without kinase-specific settings. Their prediction performance was evaluated in our previous work [18]. Fifteen element predictors derived from these programs were used to form rice-specific meta-predictors of phosphorylation sites (Additional file 2). The methods for obtaining these 15 element predictors are described below.

NetPhos and NetPhosK (http://www.cbs.dtu.dk/services/NetPhosK/) use artificial neural network algorithms to predict phosphorylation sites. On the NetPhosK prediction server, the option "prediction without filtering" was selected. Threshold values of 0.5 and 0.7 were used to determine whether or not a site is predicted as phosphorylated. The result at each threshold value was used as a separate element predictor; these were named NetPhosK_0.5 and NetPhosK_0.7.

DISPHOS (DISorder-enhanced PHOSphorylation site predictor, http://core.ist.temple.edu/pred/) uses position-specific amino acid composition and predicted structural disorder information to distinguish phosphorylation sites from non-phosphorylation sites. In this study, the "default predictor", "Eukaryotes" and "A. thaliana" options were each used to predict phosphorylation sites in rice, and the resulting element predictors were named Disphos_default, Disphos_Eukaryotes and Disphos_Arabidopsis, respectively.

KinasePhos (http://kinasephos.mbc.nctu.edu.tw/index.php) employs profile hidden Markov models (HMMs) to predict kinase family-specific phosphorylation sites. In this study, KinasePhos was run with 90%, 95% and 100% prediction specificity and with the default HMM bit score, and KinasePhos 2.0 was run with 80% prediction specificity. These five settings resulted in five separate element predictors, termed KinasePhos_90, KinasePhos_95, KinasePhos_100, KinasePhos_default and KinasePhos 2.0_80.

Scansite (http://scansite.mit.edu/) uses scores calculated from position-specific scoring matrices (PSSMs) to search for motifs within proteins that are likely to be phosphorylated by specific protein kinases. In this work, high, medium and low stringency levels were each selected, resulting in three separate element predictors named Scansite_high, Scansite_medium and Scansite_low, respectively.

PredPhospho (http://pred.ngri.re.kr/PredPhospho.htm) predicts various kinase-specific phosphorylation sites by training SVMs. In this study, the prediction was made by considering all kinase groups and families.

Prediction and performance measures

It was difficult to compare the numerical scores produced by the individual element predictors because the scores differ in mathematical meaning [32]. In this study, the score values were therefore ignored; instead, each prediction was assigned a binary value (phosphorylated or not phosphorylated), and performance was then compared across prediction programs.
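The binarization step can be sketched as follows, using NetPhosK's two cut-offs (0.5 and 0.7) as the example. Treating a missing score as a negative call and using `>=` at the cut-off are assumptions, since the source specifies neither; the function names and toy scores are hypothetical.

```python
from typing import Dict, List, Optional

Scores = List[Optional[float]]  # one raw score per candidate site; None = not reported


def binarize(scores: Scores, threshold: float) -> List[int]:
    """Map raw scores to 1 (phosphorylated) if score >= threshold, else 0.

    Sites for which the tool reported no score are treated as negative calls.
    """
    return [1 if s is not None and s >= threshold else 0 for s in scores]


def element_predictors(raw: Dict[str, Scores], cutoffs: Dict[str, List[float]]) -> Dict[str, List[int]]:
    """Turn each (tool, cut-off) pair into one binary element predictor."""
    predictors = {}
    for tool, scores in raw.items():
        for cutoff in cutoffs.get(tool, []):
            predictors[f"{tool}_{cutoff}"] = binarize(scores, cutoff)
    return predictors


preds = element_predictors(
    {"NetPhosK": [0.91, 0.62, None, 0.45]},  # toy scores, not real server output
    {"NetPhosK": [0.5, 0.7]},
)
# preds["NetPhosK_0.5"] == [1, 1, 0, 0]; preds["NetPhosK_0.7"] == [1, 0, 0, 0]
```

Once every tool is reduced to 0/1 calls, predictors with incomparable score scales can be combined by simple voting.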

Four measures, sensitivity (Sn), specificity (Sp), accuracy (ACC) and the Matthews correlation coefficient (MCC), were used to evaluate the performance of the tested predictors (definitions below):

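\begin{gathered} \mathrm{Sn} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \\ \mathrm{Sp} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}}, \\ \mathrm{ACC} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{FP} + \mathrm{TN} + \mathrm{FN}}, \end{gathered}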

and

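\mathrm{MCC} = \frac{(\mathrm{TP} \times \mathrm{TN}) - (\mathrm{FN} \times \mathrm{FP})}{\sqrt{(\mathrm{TP} + \mathrm{FN}) \times (\mathrm{TN} + \mathrm{FP}) \times (\mathrm{TP} + \mathrm{FP}) \times (\mathrm{TN} + \mathrm{FN})}} .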

where TP, FP, FN and TN denote the numbers of true positives, false positives, false negatives and true negatives, respectively. Sn and Sp are the fractions of positive and negative sites, respectively, that are correctly predicted. Because MCC is much less sensitive to the ratio of positive to negative samples in the dataset, it is the most widely used performance measure for two-class prediction programs [32].
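The four measures translate directly into code. In the Python sketch below, returning 0 when a denominator is zero is our assumption (the definitions above leave that case open), and the counts in the worked example are invented for illustration.

```python
import math
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN and FN for binary labels (1 = phosphorylated)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1:
            counts["TP" if t == 1 else "FP"] += 1
        else:
            counts["FN" if t == 1 else "TN"] += 1
    return counts


def performance(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sn, Sp, ACC and MCC as defined above; undefined ratios are reported as 0."""
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    mcc = ((tp * tn) - (fn * fp)) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


# Worked example with invented counts: TP = 40, FN = 10, TN = 35, FP = 15
m = performance(tp=40, fp=15, tn=35, fn=10)
# Sn = 0.80, Sp = 0.70, ACC = 0.75, MCC = 1250 / sqrt(50 * 50 * 55 * 45) ≈ 0.503
```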

We used SPSS 16.0 to create receiver operating characteristic (ROC) curves to measure the performance of the meta-predictors. For each possible threshold, the sensitivity and specificity were evaluated, the ROC curve [sensitivity versus (1 - specificity)] was plotted, and the area under this curve was calculated. In this study, ROC curves were used to compare the predictive performance of each meta-predictor with that of the best element predictor, Disphos_default. The area under the ROC curve was also calculated to compare the predictive performance of PhosphoRice with that of Musite, a recently published predictor [30].
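The ROC computation above was done in SPSS; purely as an illustration of the same idea, the Python sketch below sweeps every distinct score as a threshold and integrates with the trapezoidal rule. Using the weighted vote sum of Equation (1) as a meta-predictor's continuous score is our assumption about how that score was obtained, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(1 - specificity, sensitivity) pairs for thresholds at every distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative sites")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Move all sites tied at this score across the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: vote sums for four sites, two of them true phosphosites.
print(auc(roc_points([9, 7, 7, 2], [1, 1, 0, 0])))  # 0.875
```

Grouping tied scores matters for voting meta-predictors, whose integer vote sums produce many ties; a tie between a positive and a negative site contributes half a unit of area.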

Unweighted voting, unreduced weighted voting and reduced weighted voting strategies

The unweighted voting, unreduced weighted voting and reduced weighted voting strategies were used to construct meta-predictors according to the procedures outlined by Liu et al. (2007) [33] and Wan et al. (2008) [32]. In general, a linear voting-based two-class classifier makes a positive prediction if the following condition is satisfied:

\sum_{j = 1}^{N} [P_{j} \cdot w_{j}] \geq T

(1)

where N is the total number of element predictors (in this experiment, N = 15), w_j is the weight of the jth prediction method (w_j = 1 for all element predictors in the unweighted voting strategy), P_j is the prediction made by the jth predictor (P_j = 1 for a positive prediction, otherwise P_j = 0), and T is the threshold score.

For a simple weighted voting strategy, the threshold T can be set to half of the total weight of the predictors:

T = \frac{1}{2} \sum_{j = 1}^{N} w_{j}

(2)
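Equations (1) and (2) amount to a few lines of code. In this Python sketch (function names hypothetical), `predictions` holds the binary calls P_j of the element predictors for one site and `weights` holds the w_j; when no threshold is supplied, T defaults to half the total weight as in Equation (2).

```python
from typing import Optional, Sequence


def vote_score(predictions: Sequence[int], weights: Sequence[float]) -> float:
    """Left-hand side of Equation (1): sum over j of P_j * w_j, with P_j in {0, 1}."""
    if len(predictions) != len(weights):
        raise ValueError("one weight per element predictor is required")
    return sum(p * w for p, w in zip(predictions, weights))


def meta_predict(predictions: Sequence[int], weights: Sequence[float],
                 threshold: Optional[float] = None) -> int:
    """Return 1 (phosphorylated) if the weighted vote reaches the threshold T."""
    t = 0.5 * sum(weights) if threshold is None else threshold  # Equation (2) default
    return 1 if vote_score(predictions, weights) >= t else 0


# Unweighted voting over N = 15 predictors: w_j = 1, so T = 7.5 by default.
unweighted = [1] * 15
print(meta_predict([1] * 8 + [0] * 7, unweighted))  # 1 (8 votes >= 7.5)
print(meta_predict([1] * 7 + [0] * 8, unweighted))  # 0 (7 votes < 7.5)
```

Passing an explicit, lower `threshold` corresponds to the thresholds below half the total weight that were used for the meta-predictors in this paper.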

Restricted grid search

In Equation (1), appropriate weight parameters (w_j) produce a classifier with good prediction performance. In this study, 16 parameters (the 15 weights w_j and the threshold T) needed to be determined to obtain the best-performing classifier. We applied the restricted grid search method, which has been widely used in two-class classification problems [32, 33], to select the values of these 16 parameters. Two critical restrictions were imposed in our study. First, the weight of each element predictor was limited to one of the following values: 0, 1, 3, 5, 7, 9, 11, 13 and 15. Second, the weights of all 15 element predictors had to sum to 15 (Table 7). The restricted grid search of the 16 parameters was conducted on the dataset with 10-fold cross-validation.
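The two restrictions define a finite grid that can be enumerated directly. The Python sketch below generates every admissible weight vector and scores each with MCC over a range of thresholds; it illustrates the idea under stated assumptions and is not the authors' code. In particular, trying integer T from 1 to the total weight, evaluating on a single data split (the 10-fold loop is omitted), and the function names are our assumptions.

```python
import math
from typing import Iterator, List, Optional, Sequence, Tuple

GRID = (0, 1, 3, 5, 7, 9, 11, 13, 15)  # allowed weight values
Best = Tuple[float, Optional[Tuple[int, ...]], Optional[int]]


def weight_vectors(n: int, total: int, grid: Sequence[int] = GRID) -> Iterator[Tuple[int, ...]]:
    """Yield every length-n tuple of grid values that sums exactly to `total`."""
    if n == 0:
        if total == 0:
            yield ()
        return
    for w in grid:
        if w <= total:
            for rest in weight_vectors(n - 1, total - w, grid):
                yield (w,) + rest


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    return (tp * tn - fn * fp) / denom if denom else 0.0


def restricted_grid_search(P: List[List[int]], y: List[int], total: int = 15,
                           grid: Sequence[int] = GRID) -> Best:
    """Return (best MCC, weights, threshold T) over the restricted grid.

    P[i][j] is the binary call of element predictor j on site i; y[i] is its label.
    """
    best: Best = (float("-inf"), None, None)
    for w in weight_vectors(len(P[0]), total, grid):
        scores = [sum(p * wj for p, wj in zip(row, w)) for row in P]
        for t in range(1, total + 1):  # assumed: integer thresholds 1..total
            m = mcc(y, [1 if s >= t else 0 for s in scores])
            if m > best[0]:
                best = (m, w, t)
    return best
```

For N = 15 the grid is large, so a pure-Python loop like this is slow; it is meant to show the search logic, not an efficient implementation.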

Table 7 Weight combinations, permutations and possible weights sum values in the restricted grid search scheme

Conditional random search

Conditional random fields were first introduced by Lafferty and colleagues in 2001 [35]. For the conditional random search, the threshold T was set to a random value no greater than the total weight of the predictors:

T = \mathrm{rand}\left(\sum_{j = 1}^{N} w_{j}\right)

(3)

Randomized algorithms are often simple, elegant and efficient for selecting parameters. They rely on sequences of unrelated and unpredictable numbers; however, a computer cannot produce truly random numbers, only pseudorandom ones. The conditional random search method can be summarized as follows:

a. Take the weights selected by the restricted grid search.
b. Set the random search range of each parameter between the previous and the next grid values of the parameter selected by the restricted grid search.
c. Run the random search program.
d. Train on the training set and test on the test set.
e. Stop at a parameter combination that achieves a higher MCC than that of the restricted grid search.
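Steps a to e can be sketched in Python as below. This is a hedged illustration, not the authors' program: drawing each weight uniformly (as a real number) from its neighbouring-grid range, drawing T uniformly between 0 and the total weight as one reading of Equation (3), judging candidates by training-set MCC, and capping the number of draws are our assumptions, as are all function names.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

GRID = (0, 1, 3, 5, 7, 9, 11, 13, 15)


def neighbour_range(w: int, grid: Sequence[int] = GRID) -> Tuple[int, int]:
    """Step b: previous and next grid values around a grid-selected weight."""
    i = list(grid).index(w)
    return grid[max(i - 1, 0)], grid[min(i + 1, len(grid) - 1)]


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    return (tp * tn - fn * fp) / denom if denom else 0.0


def predict(P: List[List[int]], w: Sequence[float], t: float) -> List[int]:
    """Equation (1) applied to every site."""
    return [1 if sum(p * wj for p, wj in zip(row, w)) >= t else 0 for row in P]


def conditional_random_search(
    P_train: List[List[int]], y_train: List[int],
    P_test: List[List[int]], y_test: List[int],
    grid_weights: Sequence[int], baseline_mcc: float,
    max_draws: int = 10000, seed: int = 0,
) -> Optional[Tuple[float, float, List[float], float]]:
    """Return (train MCC, test MCC, weights, T), or None if no draw beats the baseline."""
    rng = random.Random(seed)  # pseudorandom and reproducible
    ranges = [neighbour_range(w) for w in grid_weights]  # steps a and b
    for _ in range(max_draws):  # step c
        w = [rng.uniform(lo, hi) for lo, hi in ranges]
        t = rng.uniform(0, sum(w))  # Equation (3)
        train_mcc = mcc(y_train, predict(P_train, w, t))  # step d
        if train_mcc > baseline_mcc:  # step e
            return train_mcc, mcc(y_test, predict(P_test, w, t)), w, t
    return None
```

For the example in the Results, `neighbour_range(1)` gives (0, 3), the range within which a grid weight of 1 is allowed to fluctuate.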

References

Hubbard MJ, Cohen P: On target with a new mechanism for the regulation of protein phosphorylation. Trends Biochem Sci. 1993, 18: 172-177. 10.1016/0968-0004(93)90109-Z.
Article CAS PubMed Google Scholar
Peck SC: Early phosphorylation events in biotic stress. Current Opinion Plant Biology. 2003, 6: 334-338. 10.1016/S1369-5266(03)00056-6.
Article CAS Google Scholar
Khan M, Takasaki H, Komatsu S: Comprehensive phosphoproteome analysis in Rice and identification of phosphoproteins responsive to different hormones/stresses. Journal of Proteome Research. 2005, 4: 1592-1599. 10.1021/pr0501160.
Article CAS PubMed Google Scholar
Ficarro SB, McCleland ML, Stukenberg PT, Burke DJ, Ross MM, Shabanowitz J, Hunt DF, White FM: Phosphoproteome analysis by mass spectrometry and its application to Saccharomyces cerevisiae. Nat Biotechnol. 2002, 20: 301-305. 10.1038/nbt0302-301.
Article CAS PubMed Google Scholar
Ballif BA, Villen J, Beausoleil SA, Schwartz D, Gygi SP: Phosphoproteomic analysis of the developing mouse brain. Mol Cell Proteomics. 2004, 3: 1093-1101. 10.1074/mcp.M400085-MCP200.
Article CAS PubMed Google Scholar
Lim YP, Diong LS, Qi R, Druker BJ, Epstein RJ: Phosphoproteomic fingerprinting of epidermal growth factor signaling and anticancer drug action in human tumor cells. Mo Cancer Ther. 2003, 2: 1369-77.
CAS Google Scholar
Nuhse TS, Stensballe A, Jensen ON, Peck SC: Phosphoproteomics of the Arabidopsis plasma membrane and a new phosphorylation site database. Plant Cell. 2004, 16: 2394-2405. 10.1105/tpc.104.023150.
Article PubMed Central PubMed Google Scholar
Sugiyama N, Nakagami H, Mochida K, Daudi A: Large-scale phosphorylation mapping reveals the extent of tyrosine phosphorylation in Arabidopsis. Mol Syst Biol. 2008, 4: 193-
Article PubMed Central PubMed Google Scholar
Tan F, Li G, Chitteti BR, Peng Z: Proteome and phosphoproteome analysis of chromatin associated proteins in rice (Oryza sativa). Proteomics. 2007, 7: 4511-4527. 10.1002/pmic.200700580.
Article CAS PubMed Google Scholar
He H, Li J: Proteomic analysis of phosphoproteins regulated by abscisic acid in rice leaves. Biochemical Biophysical Research Communication. 2008, 371: 883-888. 10.1016/j.bbrc.2008.05.001.
Article CAS Google Scholar
Ke Y, Han G, Chen X, He H: Differential regulation of proteins and phosphoproteins in rice under drought stress. Biochemical Biophysical Research Communication. 2009, 379: 133-138. 10.1016/j.bbrc.2008.12.067.
Article CAS Google Scholar
Nakagami H, Sugiyama N, Mochida K, Daudi A: Large-scale comparative phosphoproteomics identifies conserved phosphorylation sites in plants. Plant Physiol. 2010, 153: 1161-1674. 10.1104/pp.110.157347.
Article PubMed Central CAS PubMed Google Scholar
Grimsrud PA, den OD, Wenger CD, Swaney DL: Large-scale phosphoprotein analysis in Medicago truncatula roots provides insight into in vivo kinase activity in legumes. Plant Physiol. 2010, 152: 19-28. 10.1104/pp.109.149625.
Article PubMed Central CAS PubMed Google Scholar
Diella F, Cameron S, Gemünd C, Linding R, Via A, Kuster B, Sicheritz-Pontén T, Blom B, Gibson T: Phospho.ELM: A database of experimentally verified phosphorylation sites in eukaryotic proteins. BMC Bioinformatics. 2004, 5: 79-10.1186/1471-2105-5-79.
Article PubMed Central PubMed Google Scholar
Gnad F, Ren S, Cox J, Olsen J, Macek B, Oroshi M, Mann M: PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites. Genome Biology. 2007, 8: R250-10.1186/gb-2007-8-11-r250.
Article PubMed Central PubMed Google Scholar
Hornbeck PV, Chabra I, Kornhauser JM, Skrzypek E, Zhang B: PhosphoSite: A bioinformatics resource dedicated to physiological protein phosphorylation. Proteomics. 2004, 4: 1551-1561. 10.1002/pmic.200300772.
Article CAS PubMed Google Scholar
Heazlewood JL, Durek P, Hummel J, Selbig J, Weckwerth W, Walther D, Schulze WX: PhosPhAt: a database of phosphorylation sites in Arabidopsis thaliana and a plant-specific phosphorylation site predictor. Nucleic Acids Research. 2007, 36: D1015-21. 10.1093/nar/gkm812.
Article PubMed Central PubMed Google Scholar
Que S, Wang Y, Chen P, Tang Y, Zhang Z, He H: Evaluation of Protein Phosphorylation Site Predictors. Protein and Peptide Letters. 2010, 17: 64-69. 10.2174/092986610789909412.
Article CAS PubMed Google Scholar
Blom N, Gammeltoft S, Brunak S: Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol. 1999, 294: 1351-1362. 10.1006/jmbi.1999.3310.
Article CAS PubMed Google Scholar
Blom N, Sicheritz-Ponten T, Gupta R, Gammeltoft S, Brunak S: Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics. 2004, 4: 1633-49. 10.1002/pmic.200300771.
Article CAS PubMed Google Scholar
Huang HD, Lee TY, Tzeng SW, Horng JT: KinasePhos: a web tool for identifying protein kinase-specific phosphorylation sites. Nucleic Acids Res. 2005, 33: W226-9. 10.1093/nar/gki471.
Article PubMed Central CAS PubMed Google Scholar
Wong YH, Lee TY, Liang HK, Huang CM, Yang YH, Chu CH, Huang HD, Ko MT, Hwang JK: KinasePhos 2.0: a web server for identifying protein kinase-specific phosphorylation sites based on sequences and coupling patterns. Nucleic Acids Research. 2007, 35: W588-594. 10.1093/nar/gkm322.
Article PubMed Central PubMed Google Scholar
Iakoucheva LM, Radivojac P, Brown CJ, O'Connor TR, Sikes JG, Obradovic Z, Dunker AK: The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res. 2004, 32: 1037-1049. 10.1093/nar/gkh253.
Article PubMed Central CAS PubMed Google Scholar
Obenauer JC, Cantley LC, Yaffe MB: Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res. 2003, 31: 3635-3641. 10.1093/nar/gkg584.
Article PubMed Central CAS PubMed Google Scholar
Xue Y, Li A, Wang L, Feng H, Yao X: PPSP: prediction of PK-specific phosphorylation site with Bayesian decision theory. BMC Bioinformatics. 2006, 7: 163-10.1186/1471-2105-7-163.
Article PubMed Central PubMed Google Scholar
Xue Y, Zhou F, Zhu M, Ahmed K, Chen G, Yao X: GPS: a comprehensive www server for phosphorylation sites prediction. Nucleic Acids Res. 2005, 33: W184-187. 10.1093/nar/gki393.
Article PubMed Central CAS PubMed Google Scholar
Kim JH, Lee J, Oh B, Kim K, Koh I: Prediction of phosphorylation sites using SVMs. Bioinformatics. 2004, 20: 3179-3184. 10.1093/bioinformatics/bth382.
Article CAS PubMed Google Scholar
Ingrell CR, Miller ML, Jensen ON, Blom N: NetPhosYeast: prediction of protein phosphorylation sites in yeast. Bioinformatics. 2007, 23: 895-897. 10.1093/bioinformatics/btm020.
Article CAS PubMed Google Scholar
Tang YR, Chen YZ, Canchaya CA, Zhang Z: GANNPhos: a new phosphorylation site predictor based on a genetic algorithm integrated neural network. Protein Engineering Design & Selection. 2007, 20: 405-412. 10.1093/protein/gzm035.
Article CAS Google Scholar
Gao J, Thelen JJ, Dunker AK, Xu D: Musite, a tool for global prediction of general and kinase specific phosphorylation sites. Mol Cell Proteomics. 2010, 9: 2586-2600. 10.1074/mcp.M110.001388.
Article PubMed Central CAS PubMed Google Scholar
Agrawal GK, Rakwal R: Rice proteomics: A Cornerstone for cereal food crop proteomics. Mass Spectrometry Reviews. 2006, 25: 1-53. 10.1002/mas.20056.
Article CAS PubMed Google Scholar
Wan J, Kang S, Tang C, Yan J, Ren Y, Liu J, Gao X, Banerjee A, Ellis L, Li T: Meta-prediction of phosphorylation sites with weighted voting and restricted grid search parameter selection. Nucleic Acids Res. 2008, 36: e22-
Liu J, Kang S, Tang C, Ellis L, Li T: Meta-prediction of protein subcellular localization with reduced voting. Nucleic Acids Res. 2007, 35: e96. 10.1093/nar/gkm562.
Deng L, Guan J, Dong Q, Zhou S: Prediction of protein-protein interaction sites using an ensemble method. BMC Bioinformatics. 2009, 10: 26. 10.1186/1471-2105-10-26.
Lafferty J, McCallum A, Pereira F: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proceedings of the 18th International Conference on Machine Learning. 2001, Morgan Kaufmann, San Francisco, CA, 282-289.

Acknowledgements

We thank the anonymous referees, whose constructive comments were very helpful in improving the quality of this work. This work was supported by the Natural Science Foundation of China and Fujian (No. 31070402, 61163047 and 2011J01075), a grant from the Education Department of Fujian (No. JA10103) and the Key Program of Ecology, Fujian, China (No. 0608507 and No. 0b08b005).

Author information

Authors and Affiliations

Key Laboratory of Ministry of Education for Genetic, Breeding and Multiple Utilization of Crops, Fuzhou, 350002, China
Shufu Que, Wenfeng Zhang, Baoqian Zhang & Huaqin He
College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
Shufu Que, Kuan Li, Min Chen, Yongfei Wang, Qiaobin Yang, Wenfeng Zhang, Baoqian Zhang & Huaqin He
Key Laboratory of Nondestructive Test of Ministry of Education, Nanchang Hangkong University, Nanchang, 330063, China
Bangshu Xiong

Corresponding author

Correspondence to Huaqin He.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

HQH conceived the study, designed the experiments, analyzed the data and revised the manuscript. SFQ designed and carried out the restricted grid search and conditional random search. KL developed the Perl scripts. MC analyzed the performance of the element predictors and meta-predictors. QBY constructed the dataset. YFW participated in the dataset construction. WFZ and BQZ developed and maintained the website. BSX helped to write the computer program. All authors read and approved the final manuscript.

Shufu Que, Kuan Li and Min Chen contributed equally to this work.

Electronic supplementary material

13007_2011_177_MOESM1_ESM.XLS

Additional file 1: Rice phosphorylation sites data. Data file listing Accession Number, full-length sequence, phosphorylated amino acid and its site position. (XLS 3 MB)

13007_2011_177_MOESM2_ESM.DOC

Additional file 2: Summary of the 15 element predictors. Summary file listing the name, references and URLs of the 15 element predictors used to produce meta-predictors. (DOC 36 KB)

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Que, S., Li, K., Chen, M. et al. PhosphoRice: a meta-predictor of rice-specific phosphorylation sites. Plant Methods 8, 5 (2012). https://doi.org/10.1186/1746-4811-8-5

Received: 07 December 2011
Accepted: 03 February 2012
Published: 03 February 2012
DOI: https://doi.org/10.1186/1746-4811-8-5