1st International and 10th National Iranian Conference on Bioinformatics
Structural Variation Detection from Paired-end NGS data using Hidden Markov Model
Paper ID : 1347-ICB10
Authors:
Mohammad Amin Rahimi *
Abstract:
The objective of the present study was to identify regions of structural variation in the samples'
genomes using a Hidden Markov Model. Single-read data from Illumina technology show the same
correlation as Array-CGH data, so almost the same algorithms can be applied to both [1]. In this
study, Hidden Markov Models were applied to paired-end data, which are more precise and more
abundant but are rarely used with this type of algorithm. Two identification methods were used
[2, 3]: 1) in the first method, ratios relative to normal samples were extracted using a specific
threshold, the variant regions were labelled, and the Hidden Markov Model was then applied; 2) the
second method used ground truth data and a Support Vector Machine (SVM) machine learning
technique to label variant regions, after which the Hidden Markov Model was applied to re-label
them. Finally, to evaluate the model, artificial data were generated by simulation. After the variant
regions had been identified by the Hidden Markov Model, the percentages of duplications, deletions,
and translocations found were calculated. The novelty of this study lies in identifying structural
variations in paired-end data with a Hidden Markov Model, whereas previous studies used
single-read data. Furthermore, this study identifies translocations using a Hidden Markov Model for
the first time.
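As an illustration of the first method only, the following is a minimal sketch and not the author's
implementation. It assumes per-window read depths for a sample and a normal reference, labels each
window by thresholding the log2 depth ratio, and then re-labels the windows with Viterbi decoding
over a three-state Hidden Markov Model. The function names, the threshold of 0.4, the pseudocount
of 1, and the transition and emission probabilities are all hypothetical choices made for this
example. Translocations, which depend on paired-end mapping information, are not covered.

```python
import numpy as np

# State indices used throughout this sketch (an assumption, not from the paper).
DELETION, NORMAL, DUPLICATION = 0, 1, 2


def label_by_threshold(sample_depth, normal_depth, threshold=0.4):
    """Label each window from the log2 ratio of sample to normal read depth."""
    sample = np.asarray(sample_depth, dtype=float)
    normal = np.asarray(normal_depth, dtype=float)
    # A pseudocount of 1 avoids division by zero in empty windows.
    ratio = np.log2((sample + 1.0) / (normal + 1.0))
    labels = np.full(ratio.shape, NORMAL, dtype=int)
    labels[ratio < -threshold] = DELETION
    labels[ratio > threshold] = DUPLICATION
    return labels


def viterbi(obs, log_start, log_trans, log_emit):
    """Return the most likely hidden state path for a discrete observation sequence."""
    n_states = log_trans.shape[0]
    n_obs = len(obs)
    score = np.empty((n_obs, n_states))
    back = np.zeros((n_obs, n_states), dtype=int)
    score[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, n_obs):
        # cand[i, j] is the score of being in state i at t-1 and moving to state j.
        cand = score[t - 1][:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], np.arange(n_states)] + log_emit[:, obs[t]]
    path = np.empty(n_obs, dtype=int)
    path[-1] = np.argmax(score[-1])
    for t in range(n_obs - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


def relabel_with_hmm(labels, stay=0.95, correct=0.9):
    """Smooth threshold labels with a 3-state HMM (probabilities are illustrative)."""
    n = 3
    trans = np.full((n, n), (1.0 - stay) / (n - 1))
    np.fill_diagonal(trans, stay)      # states tend to persist across windows
    emit = np.full((n, n), (1.0 - correct) / (n - 1))
    np.fill_diagonal(emit, correct)    # threshold labels are usually right
    start = np.full(n, 1.0 / n)
    return viterbi(np.asarray(labels, dtype=int),
                   np.log(start), np.log(trans), np.log(emit))


if __name__ == "__main__":
    normal = [100] * 12
    sample = [100, 100, 100, 210, 100, 100, 200, 200, 200, 200, 100, 100]
    raw = label_by_threshold(sample, normal)
    print("threshold labels:", raw.tolist())
    print("HMM re-labelled :", relabel_with_hmm(raw).tolist())
```

In this toy input, the single high-depth window is an isolated outlier while the run of four
high-depth windows is a plausible duplication. The high self-transition probability is what lets
the model treat the two cases differently. In practice these parameters would be estimated from
data rather than fixed by hand.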
Keywords:
Structural Variation; Hidden Markov Model; Support Vector Machine; Next Generation Sequencing.
Status : Paper Accepted (Poster Presentation)