1st International and 10th National Iranian Conference on Bioinformatics
Functional Annotation of Missense Mutations Based on Protein Features
Paper ID : 1421-ICB10
Authors:
Motahareh Hakiminejad *1, Hesam Montazeri2, Bahram Goliaei1
1Department of Biophysics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
2Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Iran
Abstract:
Background:
Among all genetic alterations in cancer, single nucleotide variants (SNVs) are the most common type of mutation. Still, identifying cancer-driving SNVs (driver mutations) among the many non-driving ones (passenger mutations) remains a challenge due to the inherent imbalance between the two populations and the fact that most driver mutations are rare. We present a random forest tool that annotates missense mutations as driver or passenger based on protein information and helps find novel driver mutations.

Materials and Methods:
Pan-cancer mutation data from the TCGA database (n = 600k) were fetched and labeled as passenger or driver based on their prevalence. For these mutations, a feature set containing five categories was built: 1) physicochemical changes of the substituted amino acid, 2) changes in the pseudo-amino acid composition of the 21-mer sequence around the mutation site, 3) disorder of the mutated region, 4) regions/functions reported in UniProt for the mutation site, and 5) whether the gene is reported to be an oncogene, a tumor suppressor, or neither. A random forest model was trained on the feature set using the ranger package in R.
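
As an illustration of the 21-mer window in feature category 2, the following minimal Python sketch extracts the residues surrounding a mutation site. The function name `mutation_window` and the choice to truncate the window at the ends of the protein are assumptions made for this example; the abstract does not state how sites near the termini were handled.

```python
def mutation_window(protein_seq: str, position: int, flank: int = 10) -> str:
    """Return the residues within `flank` positions of a 1-based mutation site.

    With flank=10 the window is a 21-mer centred on the mutated residue.
    Assumption: near the protein termini the window is simply truncated.
    """
    if not 1 <= position <= len(protein_seq):
        raise ValueError("position lies outside the protein sequence")
    idx = position - 1                      # convert to 0-based index
    start = max(0, idx - flank)             # clamp at the N-terminus
    end = min(len(protein_seq), idx + flank + 1)  # clamp at the C-terminus
    return protein_seq[start:end]


# Hypothetical 40-residue sequence, used only to exercise the function.
example_seq = "ACDEFGHIKLMNPQRSTVWY" * 2
window = mutation_window(example_seq, position=15)
```

Sequence-derived descriptors such as the pseudo-amino acid composition would then be computed on the wild-type and mutant windows and compared.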

Results:
The accuracy of the method on test data is 99% (sensitivity = 99%, specificity = 54%). The method was evaluated against other cancer missense annotation methods, namely CHASMplus, CHASM, Mutation Assessor, PolyPhen-2, and VEST, on experimentally labeled cancer missense mutations. The areas under the receiver operating characteristic curve (auROC) of these methods were 88%, 67%, 67%, 59%, and 72%, respectively, and the auROC of our method was 83%. The method was also tested against a cancer SNV gold standard based on extensive literature and database review, on which its accuracy was 72% (sensitivity = 74%, specificity = 71%).
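
An accuracy of 99% alongside a specificity of 54% is what one would expect when one class strongly dominates the test set. The minimal Python sketch below shows the arithmetic with invented confusion-matrix counts; the counts and the function name `summarize` are hypothetical and are not taken from the study, and the abstract does not state which class is treated as positive.

```python
def summarize(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float, float]:
    """Compute sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # fraction of positives recovered
    specificity = tn / (tn + fp)            # fraction of negatives recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical, heavily imbalanced test set: 9,900 positives and 100 negatives.
sens, spec, acc = summarize(tp=9801, fn=99, tn=54, fp=46)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.4f}")
```

With these invented counts the accuracy is 0.9855 even though only 54 of the 100 minority-class examples are classified correctly, which is why sensitivity and specificity are reported next to accuracy.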

Conclusion:
We developed a random forest method that discriminates driver from passenger missense mutations. As the method is based solely on protein descriptors, it can give insight into a mutation's mode of action.
Keywords:
Protein structure/function, cancer-type-specific driver, missense mutation, rare drivers
Status : Paper Accepted (Poster Presentation)