Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/3677
Title: A Computational Approach to Identify High Quality Mutations in Exome Sequencing
Authors: Deshappriya, K.G.H.U.
Keywords: Next Generation Sequencing
Exome Sequencing
High Quality Mutations
Single Nucleotide Polymorphisms
Supervised Machine Learning Algorithms
Class Imbalance
Issue Date: 8-Sep-2016
Abstract: Sequencing the human genome or exome makes it possible to identify genetic mutations. Because the majority of disease-causing mutations (i.e., SNPs) lie in the exome, more attention has been given to sequencing the exome alone. Next Generation Sequencing (NGS) is the sequencing technology currently in use because of its high-throughput capability. However, NGS-based mutation detection is prone to erroneous calls caused by sequencing and read-mapping errors. An individual exome contains about 15,000 to 20,000 genetic mutations, so it is hard for geneticists to analyze all of them manually. Considering the limitations in the existing literature, this study proposes a computational approach that uses supervised machine learning techniques to identify high-quality SNPs in exome sequencing, which helps reduce the large volume of data to a humanly manageable amount. A series of machine learning algorithms, including Naïve Bayes, SVM and ANN, were evaluated. Data were obtained from the Human Genetics Unit, Faculty of Medicine, Colombo, Sri Lanka, and a systematic feature engineering process was applied to transform the initial data set into a model-compatible format. The study used a range of data-level, algorithmic and hybrid techniques to overcome the class imbalance problem. For the evaluation, we used a wide range of evaluation measures to analyze the performance of each learning algorithm before and after applying class imbalance mitigation techniques. The experimental results indicated that an ANN model trained with over-sampling and boosting techniques is the best model for identifying high-quality SNPs in the sequenced exome of an individual.
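The abstract names over-sampling as one of the data-level techniques used against class imbalance, but this record does not say which variant the thesis applied. As an illustration only, the following minimal sketch shows plain random over-sampling, assuming a binary label where 1 marks a high-quality SNP call. The function name `random_oversample`, the toy feature values, and the two feature columns (a quality score and a read depth) are hypothetical and are not taken from the thesis.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (random over-sampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    # Start from a copy of the original data so no row is lost.
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        rows = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(rows))
            out_y.append(cls)
    return out_x, out_y


# Hypothetical toy data: each row is [quality score, read depth];
# label 1 = high-quality SNP call, label 0 = low-quality call.
X = [[30.0, 12], [45.5, 20], [10.2, 3], [50.1, 25], [8.7, 2], [41.0, 18]]
y = [1, 1, 0, 1, 0, 1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have the same number of rows
```

Over-sampling of this kind should be applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.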
URI: http://hdl.handle.net/123456789/3677
Appears in Collections: SCS Individual Project - Final Thesis (2015)

Files in This Item:
File: 11002085_finalDissertation.pdf (Restricted Access)
Size: 1.9 MB
Format: Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.