1st International and 10th National Iranian Conference on Bioinformatics
Improving the prediction of physical protein interaction by Balanced Random Forest inter-protein residue contact predictions using sequence covariation information
Paper ID : 1166-ICB10
Authors:
Sara Salmanian *1, Hamid Pezeshk2, Mehdi Sadeghi3
1Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
2School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran (currently visiting Department of Mathematics and Statistics, Concordia University, Montreal, Canada)
3National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
Abstract:
Protein-protein interactions are essential for most cellular processes. Protein interactions are numerous, and a large number of protein sequences have unknown interacting partners. Predicting protein interactions from sequence information has always been a great challenge. The task is even more challenging when the goal is to detect physical, rather than functional, protein interactions. Therefore, developing new approaches for the accurate sequence-based prediction of physical protein interactions could be an important advance in computational biology. Residue positions that are in spatial contact across two proteins exhibit correlated patterns of sequence evolution in multiple sequence alignments. This co-evolution can be exploited for the prediction of physical protein interactions.
It has been shown that feeding the norm values of the whole covariation information of protein heterodimers into Support Vector Machines (SVM) can accurately predict, from sequence information, whether those dimers physically interact [1]. In the present study, Balanced Random Forest (BRF) models were trained with the covariations of inter-protein residues at different hypothetical interacting sites, and the models were then employed to predict possible inter-protein residue contacts. Instead of considering the whole co-evolutionary information, these BRF predictions make it possible to take into account only the covariation information of the residues more likely to be in physical contact when predicting protein dimers at the higher, protein-level scale. BRF assigned the more probable contacting residue pairs to the positive class and all other inter-protein residue pairs to the negative class. After the BRF predictions, the previously computed covariation scores of the negatively predicted residue pairs were set to zero, thereby removing the contribution of those pairs from the final calculation of the norm values. The results of the current study indicated that feeding the updated norm values of the residue-residue covariation matrices, obtained after the BRF predictions, into SVM models could significantly increase the accuracy of the final protein interaction predictions at the protein family level.
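The masking step described above can be illustrated with a minimal sketch. It assumes that the "norm value" is the Frobenius norm of the inter-protein covariation matrix; the abstract does not state which norm was used. The function name `masked_frobenius_norm` and the toy scores are hypothetical and are not taken from the study.

```python
import math


def masked_frobenius_norm(cov, predicted_contact):
    """Return the Frobenius norm of `cov` after zeroing masked-out pairs.

    cov               -- covariation scores, rows = residues of protein A,
                         columns = residues of protein B (nested lists)
    predicted_contact -- same shape; True where the residue pair was
                         predicted as a contact (positive class)

    Assumption: the Frobenius norm stands in for the unspecified
    "norm value" of the abstract.
    """
    total = 0.0
    for cov_row, mask_row in zip(cov, predicted_contact):
        for score, is_contact in zip(cov_row, mask_row):
            # Pairs predicted as non-contacts contribute zero to the norm.
            if is_contact:
                total += score * score
    return math.sqrt(total)


# Hypothetical 2 x 2 covariation matrix and contact predictions.
cov = [[3.0, 1.0],
       [2.0, 4.0]]
predicted_contact = [[True, False],
                     [False, True]]
all_pairs = [[True, True],
             [True, True]]

whole_norm = masked_frobenius_norm(cov, all_pairs)            # all pairs kept
updated_norm = masked_frobenius_norm(cov, predicted_contact)  # after masking
```

In this toy case the updated norm is smaller than the whole-matrix norm because only the two pairs predicted as contacts contribute. In the workflow described above, it is the updated value that would be passed to the SVM as a feature.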
Keywords:
residue contacts, physical interaction, covariation, protein interaction prediction
Status : Paper Accepted (Poster Presentation)