1st International and 10th National Iranian Conference on Bioinformatics
Performance evaluation of different machine learning classification models on expression profiles of tumor educated platelets data
Paper ID : 1395-ICB10
Authors:
Sajedeh Bahonar *, Fahimeh Palizban, Hesam Montazeri
Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran.
Abstract:
Because liquid biopsy is less invasive than tissue biopsy, liquid biopsy biomarkers for the early detection of cancer have attracted growing attention. Expression profiles of tumor-educated platelets (TEPs) obtained from liquid biopsy are one such candidate biomarker. Machine learning classification models trained on a feature space derived from TEP expression data make it possible to predict the class of a sample. Here, the aim is to evaluate the performance of different classification models for cancer diagnosis based on platelet expression profiles. First, expression profiles of TEPs from 230 patients with breast, liver, colorectal, brain, pancreatic, and lung cancers, together with profiles of 55 healthy individuals, were downloaded from the GEO database (GSE68086). The data were then normalized using the edgeR package (R software version 4.1.0), and the 2000 genes with the highest variance were selected. Next, several classification models, namely SVM, LDA, logistic regression, boosting, classification tree, and random forest, were evaluated on the feature-selected data using 10-fold cross-validation. In addition, the variable importance of the selected genes was obtained using the polynomial SVM. Pathway enrichment analysis was then performed with the preranked GSEA method on the H, C6, and C7 gene set collections of the MSigDB database. The polynomial SVM had the highest performance on the validation set (accuracy ~ 95%, mean AUC ~ 0.994, SD of AUC ~ 0.0093), and the linear SVM had the second-best performance (mean AUC ~ 0.9917). In the pathway enrichment analysis, 10 immunological pathways were enriched in cancer samples compared to healthy donors. Overall, the results show that the polynomial SVM performs well for classifying TEP data and indicate that the expression profile of TEPs can be considered a candidate biomarker in liquid biopsy.
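The feature selection and cross-validation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, which was written in R with edgeR. The function names top_variance_genes and stratified_folds, the toy expression values, and the use of stratified folds are all assumptions made for illustration; the abstract states only that the 2000 highest-variance genes were kept and that 10-fold cross-validation was used.

```python
import random
from statistics import pvariance


def top_variance_genes(expr, k):
    """Return the k gene names with the highest variance across samples.

    expr maps a gene name to its list of normalized expression values,
    one value per sample (hypothetical input layout).
    """
    ranked = sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)
    return ranked[:k]


def stratified_folds(labels, n_folds, seed=0):
    """Assign each sample a fold index in 0..n_folds-1.

    Samples are shuffled within each class and dealt out round-robin,
    so every fold keeps roughly the same class proportions.
    Stratification is an assumption, not something the abstract states.
    """
    rng = random.Random(seed)
    folds = [None] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[i] = pos % n_folds
    return folds


# Toy data: three genes measured in four samples (made-up values).
expr = {
    "geneA": [1.0, 1.1, 0.9, 1.0],
    "geneB": [0.0, 5.0, 0.0, 5.0],
    "geneC": [2.0, 2.5, 1.5, 2.0],
}
selected = top_variance_genes(expr, 2)
print(selected)

# Class sizes taken from the abstract: 230 cancer patients, 55 healthy donors.
labels = ["cancer"] * 230 + ["healthy"] * 55
folds = stratified_folds(labels, 10)
print([folds.count(f) for f in range(10)])
```

In each round of cross-validation, the samples in one fold would serve as the validation set and the remaining nine folds as the training set. The mean and standard deviation of the AUC are then taken over the 10 rounds.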
Keywords:
Tumor-educated platelets (TEP), classification models, cancer, pathway enrichment analysis
Status : Paper Accepted (Poster Presentation)