1st International and 10th National Iranian Conference on Bioinformatics
Predicting the membrane proteins' classification using multi-dimensional wavelet and random forest classifier
Paper ID : 1006-ICB10
Authors:
Parham Hajishafiezahramini, Parviz Abdolmaleki *
Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Jalal AleAhmad, Nasr, P.O. Box 14115-111, Tehran, Iran
Abstract:
Because experimental methods for determining protein structure and function are difficult and complex, computational techniques have become established tools for protein function prediction. Although many machine-learning techniques have been proposed, to our knowledge none combines multidimensional discrete wavelet transform (DWT) analysis with machine learning. In this study, we devised a practical, accurate, and time-efficient predictive model, based on DWT analysis and machine learning, that classifies membrane proteins into five classes: single-pass type 1, single-pass type 2, multi-pass, lipid-chain-anchored, and GPI-anchored membrane proteins.
We applied the proposed method to Chou's membrane protein datasets, which contain 2059 and 2625 membrane protein sequences from the five classes. Most previous studies have used these same datasets in their entirety. Each protein sequence was first transformed into a six-dimensional signal built from the hydropathy scale, polarity, secondary structure, molecular volume, codon diversity, and electrostatic charge indexes. These six-dimensional signals were then passed to the multidimensional discrete wavelet transform, which analyzes the entire signal. A feature vector was generated for every protein from selected criteria computed on the approximation and detail coefficients. Finally, the feature vectors were fed to a random forest classifier, chosen to limit overfitting and to allow variable importance to be measured.
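The encoding and wavelet steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses only two of the six property channels (Kyte-Doolittle hydropathy and a simplified side-chain charge), applies a one-dimensional Haar wavelet to each channel separately, and summarizes the coefficients by their mean and standard deviation. The wavelet family, the decomposition depth, the summary statistics, and the names `encode`, `haar_step`, and `wavelet_features` are all hypothetical choices made for the example.

```python
import math
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified side-chain charge (assumption: an illustrative stand-in for
# the electrostatic charge index named in the abstract).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def encode(seq):
    """Turn a protein sequence into a list of numeric channels."""
    return [
        [HYDROPATHY[aa] for aa in seq],
        [CHARGE.get(aa, 0.0) for aa in seq],
    ]


def haar_step(signal):
    """One level of the Haar DWT; returns (approximation, detail)."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]  # pad odd lengths with the last value
    root2 = math.sqrt(2.0)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in pairs]
    return approx, detail


def wavelet_features(seq, levels=3):
    """Fixed-length feature vector from per-channel wavelet coefficients."""
    features = []
    for channel in encode(seq):
        approx = channel
        for _ in range(levels):
            approx, detail = haar_step(approx)
            features += [mean(detail), pstdev(detail)]
        features += [mean(approx), pstdev(approx)]
    return features


if __name__ == "__main__":
    vector = wavelet_features("MKTLLVAGGILLA")  # made-up sequence
    print(len(vector))
```

Summarizing the coefficients at each level gives every protein a feature vector of the same length regardless of sequence length, which is what a classifier such as a random forest needs as input.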
The model achieved accuracies of 91.7% on the independent dataset and 89.6% in the jackknife test. These results indicate that the proposed method compares favorably with earlier approaches.
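For readers unfamiliar with the jackknife test, it is a leave-one-out evaluation: each protein is held out in turn and predicted by a model built from all the others. A minimal sketch follows, assuming feature vectors are plain lists of numbers. The nearest-neighbour rule is a toy stand-in chosen to keep the example self-contained; it is not the random forest used in the study, and the function names are hypothetical.

```python
def nearest_neighbour(train_X, train_y, x):
    """Toy stand-in classifier: return the label of the closest training vector."""
    closest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[closest]


def jackknife_accuracy(X, y, predict=nearest_neighbour):
    """Hold out each sample once, predict it from the rest, return the accuracy."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


if __name__ == "__main__":
    # Made-up one-dimensional feature vectors for two classes.
    X = [[0.0], [0.1], [1.0], [1.1]]
    y = ["a", "a", "b", "b"]
    print(jackknife_accuracy(X, y))
```

Any classifier can be plugged in through the `predict` argument, as long as it takes the training vectors, the training labels, and the held-out vector.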
Keywords:
Membrane proteins; Predictive model; Discrete wavelet transform; Hydropathy scale
Status : Paper Accepted (Poster Presentation)