Please use this identifier to cite or link to this item:
https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/3677
Title: | A Computational Approach to Identify High Quality Mutations in Exome Sequencing |
Authors: | Deshappriya, K.G.H.U. |
Keywords: | Next Generation Sequencing; Exome Sequencing; High Quality Mutations; Single Nucleotide Polymorphisms; Supervised Machine Learning Algorithms; Class Imbalance |
Issue Date: | 8-Sep-2016 |
Abstract: | Sequencing the human genome or exome facilitates the identification of genetic mutations. Since the majority of disease-causing mutations (i.e., SNPs) lie in the exome, more focus has been given to sequencing the exome alone. Next Generation Sequencing (NGS) is the sequencing technology currently in use because of its high-throughput capability. However, NGS-based mutation detection is prone to erroneous calls caused by sequencing and read-mapping errors. An individual exome contains about 15,000 to 20,000 genetic mutations, so it is hard for geneticists to analyze all of them manually. Considering the limitations in the existing literature, this study proposes a computational approach that uses supervised machine learning techniques to identify high quality SNPs in exome sequencing, which helps reduce the large volume of data to a humanly manageable amount. A series of machine learning algorithms, such as Naïve Bayes, SVM and ANN, were tested. Data was obtained from the Human Genetics Unit, Faculty of Medicine, Colombo, Sri Lanka, and a systematic feature engineering process was applied to transform the initial data set into a model-compatible format. The study used a range of data-level, algorithmic and hybrid techniques to overcome the class imbalance problem. For the evaluation, we used a wide range of measures to analyze the performance of each learning algorithm before and after applying class imbalance mitigation techniques. The experimental results indicated that an ANN model trained with over-sampling and boosting techniques is the best model for identifying high quality SNPs in the sequenced exome of an individual. |
URI: | http://hdl.handle.net/123456789/3677 |
Appears in Collections: | SCS Individual Project - Final Thesis (2015) |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
11002085_finalDissertation.pdf (Restricted Access) | | 1.9 MB | Adobe PDF |
Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.