Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4428
Title: Categorizing high dimensional unlabelled genomic data
Authors: Ranasinghe, R.D.T.W.
Issue Date: 4-Aug-2021
Abstract: Genomic data exploration became an important research area with the completion of the Human Genome Project, and the tools and techniques used in genomics have improved since then. These data-generation tools and techniques have increased the volume of data available to researchers, and that volume continues to grow rapidly. However, the high-dimensional nature of these data makes it difficult to analyze them and to draw valuable conclusions or predictions. The data come in different formats, with several parameters, from different data sources. Thousands of DNA combinations have been identified as indicators of susceptibility to specific diseases. Categorizing these data by their similarities, which can be hidden features, can reveal important properties of these data collections. Clustering is one of the major methods used for data analysis. In this study I present a novel approach to clustering high-dimensional genomic data in order to make important and valuable predictions from the available data, taking into account annotated information about genes in prostate cancers from online databases such as cBioPortal. These data have different characteristics: numerical, categorical, sparse and dense. Hence different normalization methods and different clustering approaches were used. These approaches were built on three main clustering algorithms: K-means, hierarchical clustering and DBSCAN. The algorithms were applied in different procedures using several dimensionality reduction methods and different data normalization methods. Each approach was evaluated using different measures in order to find the better approach for clustering genomic data when the data are high-dimensional. The silhouette score and the Davies–Bouldin index were used to evaluate the clusters produced by each approach.
The selected novel hybrid approach to clustering genomic data gives the best scores on these measures, confirming the validity of the approach for clustering high-dimensional genomic data.
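The two evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the thesis's pipeline: the six 2-D points and both label assignments are invented toy data, the helper names are hypothetical, and Euclidean distance is an assumption. A higher silhouette score and a lower Davies–Bouldin index both indicate tighter, better separated clusters.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    return clusters


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (range -1 to 1)."""
    clusters = group_by_label(points, labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def davies_bouldin_index(points, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid gap."""
    clusters = group_by_label(points, labels)
    if len(clusters) < 2:
        raise ValueError("Davies-Bouldin needs at least two clusters")
    centroids = {
        key: [sum(dim) / len(members) for dim in zip(*members)]
        for key, members in clusters.items()
    }
    # scatter: mean distance from each member to its cluster centroid
    scatter = {
        key: sum(euclidean(p, centroids[key]) for p in members) / len(members)
        for key, members in clusters.items()
    }
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data (invented for illustration): two well separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # deliberately mixes the two groups

good_sil = silhouette_score(points, good_labels)
bad_sil = silhouette_score(points, bad_labels)
good_db = davies_bouldin_index(points, good_labels)
bad_db = davies_bouldin_index(points, bad_labels)

print(f"silhouette:     good={good_sil:.3f}  bad={bad_sil:.3f}")
print(f"Davies-Bouldin: good={good_db:.3f}  bad={bad_db:.3f}")
```

Comparing candidate approaches then amounts to computing both measures for the labels each approach produces and preferring the one with the higher silhouette score and the lower Davies–Bouldin index.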
URI: http://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4428
Appears in Collections: 2019

Files in This Item:
File: 2016MCS088.pdf, Size: 2.35 MB, Format: Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.