Neural Network Based Phylogenetic Analysis

Halgaswaththa, T.

Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/1738

Title:	Neural Network Based Phylogenetic Analysis
Authors:	Halgaswaththa, T.
Issue Date:	12
Abstract:	When someone finds an unknown bone fragment, it is important to understand which category that fragment belongs to. The standard way is to extract a common gene from it, construct a phylogenetic tree, and then determine the category. But this method is difficult because it involves tasks such as multiple sequence alignment. In this research we implement a neural network to determine the category of an unknown DNA sequence without using the phylogenetic tree approach. Our approach is to first identify the main categories of the data sequences using a phylogenetic tree and then train a neural network to map those sequences to the target categories. The trained neural network can then assign an unknown DNA sequence to the most suitable target category. We used transferrin sequences as the primary data set and 400 mitochondrial DNA sequences as the secondary data set. We divided the sequences into appropriate training and testing sets for the neural network. Although the number of training samples should be about ten times the number of features, it was difficult to find that many sequences, so we used the maximum number of mitochondrial DNA sequences available for the experiment. We used the trigram method as the sequence encoding scheme and various codon searching mechanisms to obtain the codon content of the whole sequence as the feature extraction mechanism for preparing the input vector of the neural network. The features of the input vector, which has 64 dimensions, are the probabilities of each codon in the sequence. We used both probabilistic neural networks and feed-forward neural networks as supervised neural networks. According to our results, the most suitable supervised neural network type for this kind of analysis was the probabilistic neural network.
Using this approach, DNA sequences can be categorized into the main target categories, and the most suitable category for an unknown sequence can be identified using such a neural network. Although the phylogenetic approach can be used to understand the relationships between species, it is difficult to understand those relationships using the neural network approach. If we find unknown sequences that relate to the trained data sequences, we can use this approach to identify their category of origin in less time, without phylogenetic analysis. Our results suggest that this classification can be done using a neural network approach without performing multiple sequence alignment or constructing a phylogenetic tree.
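
The pipeline described in the abstract (64-dimensional codon probability features fed to a probabilistic neural network) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The function names, the `overlapping` flag, the Gaussian kernel with smoothing parameter `sigma`, and the toy sequences and labels are all assumptions made for illustration. The abstract does not say whether codons were counted in a sliding window or in a fixed reading frame, so both options are offered.

```python
from itertools import product
import math

# All 64 codons over the DNA alphabet, in a fixed order.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_INDEX = {c: i for i, c in enumerate(CODONS)}

def codon_probabilities(seq, overlapping=True):
    """Return a 64-dimensional vector of codon (trigram) relative frequencies.

    overlapping=True slides a window one base at a time; False reads
    non-overlapping triplets starting at position 0 (assumed frame).
    """
    seq = seq.upper()
    step = 1 if overlapping else 3
    counts = [0] * 64
    for i in range(0, len(seq) - 2, step):
        idx = CODON_INDEX.get(seq[i:i + 3])
        if idx is not None:  # skip trigrams containing ambiguous bases such as N
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 64

def pnn_classify(x, training, sigma=0.1):
    """Classify x with a basic probabilistic neural network.

    training maps each category label to a list of feature vectors.
    Each class score is the mean Gaussian kernel response over that
    class's training vectors; the highest-scoring class is returned.
    sigma is an assumed smoothing parameter that would need tuning.
    """
    scores = {}
    for label, vectors in training.items():
        kernel_sum = 0.0
        for v in vectors:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
            kernel_sum += math.exp(-sq_dist / (2 * sigma ** 2))
        scores[label] = kernel_sum / len(vectors)
    return max(scores, key=scores.get)

# Toy data: hypothetical sequences and labels, not taken from the thesis.
training = {
    "group_a": [codon_probabilities("ATGATGATGATGATG"),
                codon_probabilities("ATGATGATGATGCTG")],
    "group_b": [codon_probabilities("CCGCCGCCGCCGCCG"),
                codon_probabilities("CCGCCGCCGCCGCCA")],
}
unknown = codon_probabilities("ATGATGATGATGATC")
print(pnn_classify(unknown, training))
```

Because the feature vector depends only on codon counts, sequences of different lengths map to the same 64 dimensions without any alignment step, which is the property the abstract relies on. In practice, category labels would come from the phylogenetic tree built on the training sequences, as the abstract describes.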
URI:	http://hdl.handle.net/123456789/1738
Appears in Collections:	SCS Individual Project - Final Thesis (2011)

Files in This Item:

File	Description	Size	Format
12.pdf Restricted Access		4.16 MB	Adobe PDF
