Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/3507
Title: Identification & Verification of Mutations in Human Genome
Authors: Kamalsooriya, D.N.; Ranasinghe, R.D.T.W.
Keywords: Mutations in Human Genome
Issue Date: 8-Jun-2016
Abstract: The completion of the Human Genome Project (HGP) catalyzed the development of Next Generation Sequencing (NGS) technologies. NGS techniques are used to detect the full range of genomic variations, which researchers then study extensively to better understand how diseases occur, to identify people who are at risk, and to produce personalized treatments. The volume of scientific articles published rises each year, fueled by new sequencing technologies. At present, however, gaps and inconsistencies in the variant analysis process make it difficult for researchers to discriminate disease-associated variations from the large number of genomic variants, and state-of-the-art text mining systems are not routinely used because they do not meet researchers’ varying needs and expectations. This study addresses these challenges by proposing a framework to discriminate variants with deleterious effects. The focus was first placed on identifying mutations by constructing a logistic regression model, which quantitatively assesses the harmful nature of variants by taking into account annotated information about the variants from online databases. In addition, a text mining framework is proposed to find conflicting information in the literature about the clinical impacts of variants and thereby to verify the mutations. A document relevancy ranking algorithm based on Term Frequency-Inverse Document Frequency (TF-IDF) is used to improve the efficiency of document retrieval. The regression model successfully narrowed the variants down to a very small number of candidates, achieving an average accuracy of 89.5%. Spearman’s rho and Kendall’s tau were measured to assess the performance of the ranking algorithm; the values obtained, 85.5% and 68.8% respectively, confirm the validity of the framework in verifying the deleterious impacts of the identified mutations.
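As a rough illustration of the ranking and evaluation steps named in the abstract, the sketch below scores documents against a query with TF-IDF and compares two rankings using Spearman's rho and Kendall's tau. It is not the thesis implementation: the function names, the whitespace tokenizer, the smoothed IDF variant, and the toy documents are all assumptions made for this example, and the rank-correlation formulas assume rankings without ties.

```python
import math
from collections import Counter


def tfidf_rank(query, docs):
    """Rank documents by a simple TF-IDF score for the query (hypothetical helper)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if tf[term]:
                # Relative term frequency times a smoothed IDF (one common variant).
                idf = math.log((1 + n) / (1 + df[term])) + 1.0
                score += (tf[term] / len(toks)) * idf
        scores.append(score)
    order = sorted(range(n), key=lambda i: -scores[i])
    return order, scores


def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two rankings of the same items, assuming no ties."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs, no ties."""
    n = len(rank_a)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


# Toy documents, invented for illustration only.
docs = [
    "brca1 variant pathogenic breast cancer",
    "benign variant population frequency",
    "weather report sunny",
]
order, scores = tfidf_rank("pathogenic variant", docs)

# Compare a system ranking against a reference ranking of four items.
rho = spearman_rho([1, 2, 3, 4], [1, 3, 2, 4])
tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
```

Here `order` lists document indices from most to least relevant, while `rho` and `tau` measure how closely the second ranking agrees with the first (1.0 means identical order).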
URI: http://hdl.handle.net/123456789/3507
Appears in Collections: BICT Group project (2015)

Files in This Item:
File: ICT4001_1_FinalThesis.pdf (Restricted Access)
Size: 3.11 MB
Format: Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.