Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/2476
Title: Mitigation of Breast Cancer Deaths using Data Mining Techniques : In Sri Lankan Context
Authors: Wickramasekara, N.K.
Issue Date: 20-May-2014
Abstract: The incidence of breast cancer among females in Sri Lanka has increased over the last twenty years. National Cancer Control Program statistics reveal that breast cancer causes the highest number of cancer deaths among females in Sri Lanka. According to the Ministry of Health, around thirty percent of diagnosed breast cancer patients die early because the cancer is diagnosed at a late stage. Therefore, the most important step in mitigating breast cancer deaths in Sri Lanka is to diagnose the disease at a very early stage. To assist early diagnosis, computer scientists can develop a breast cancer risk score model with which people can calculate their risk of developing breast cancer from their values for breast cancer risk factors. Based on the risk scores they obtain, people can then seek medical checkups and treatment, which helps to diagnose breast cancer at an early stage. The objectives of this research were to: 1) identify the breast cancer cause factors that have become a major concern worldwide, 2) identify and explore the social issues connected with the identified cause factors, 3) identify the methods and techniques that researchers in data mining have used to mitigate breast cancer death risk, 4) investigate how the identified cause factors, social issues, methods and techniques apply to the Sri Lankan context, and 5) develop and evaluate a model that serves as an early warning to mitigate breast cancer deaths in Sri Lanka. Some breast cancer risk models are currently available online, but they may not be applicable to the Sri Lankan context, since the literature study indicated that Sri Lankan cancer data may differ from that of developed European countries because of geographical and cultural differences.
Since the secondary breast cancer data from the National Cancer Registry was not sufficient to proceed with the research, a questionnaire was designed to collect data about breast cancer risk factors (social factors). Interviews were conducted with two hundred female breast cancer patients who attended the breast cancer clinic at the National Cancer Institute, Maharagama, and two hundred control cases were collected from the same hospital. According to the literature study, data mining techniques play a major role in the development of computer-aided tools, especially in the medical domain. This research therefore applied five data mining techniques that the literature study identified as the most prominent in breast cancer research (association rules, decision trees, linear regression, k-nearest neighbor and naive Bayes classifiers) and compared their accuracy and reliability in order to arrive at a better breast cancer risk model. After the initial steps of the knowledge discovery process, such as data cleaning and data transformation, the five selected techniques were applied to the data set. As a result, fifteen valid association rules were discovered using the Apriori algorithm, and a decision tree model with thirty-one rules was built using the C4.5 algorithm (the J48 algorithm in WEKA). The k-nearest neighbor algorithm and linear regression were not successful because of high error rates. Among the five techniques, the naive Bayes classifier achieved higher performance than the other techniques and was therefore used to build the breast cancer risk model for Sri Lanka. The implemented breast cancer risk score model can deliver the knowledge gained through this research to the general public. It will help to identify breast cancer risk subgroups in Sri Lanka, make them aware of their breast cancer risk, and ultimately mitigate breast cancer deaths in Sri Lanka.
The model can be used as an early warning to society through community medicine personnel such as MOH, PHI and nurses; early identification will help breast cancer risk groups to overcome the disease by receiving the relevant treatment at the right time. It will also help government administrators to take precautions regarding breast cancer. Because of the limited scope of the research, only two hundred breast cancer patients could be interviewed, all of them at the Maharagama National Cancer Hospital. As future work, more cancer data can be collected by interviewing more breast cancer patients, not only from the Maharagama hospital but also from other areas of Sri Lanka, which might improve the results, since data mining techniques tend to give better results on larger data sets. In addition, further data mining techniques such as Neural Networks, Self-Organizing Maps, Support Vector Machines, Genetic Algorithms, Fuzzy Sets and Rough Sets could be tried, which might give better results than the five techniques used in this research.
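As an illustration of how a naive Bayes risk score of the kind described in the abstract could work, the following is a minimal sketch only, not the thesis implementation (which used WEKA). The feature names (`family_history`, `age_group`), the six training rows and the smoothing parameter `alpha` are all hypothetical and invented for the example; the abstract does not list the actual questionnaire items.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Fit a categorical naive Bayes model with Laplace smoothing."""
    class_counts = Counter(labels)
    # value_counts[(feature, label)][value] = number of matching training rows
    value_counts = defaultdict(Counter)
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            value_counts[(feature, label)][value] += 1
            feature_values[feature].add(value)
    return {
        "class_counts": class_counts,
        "value_counts": value_counts,
        "feature_values": feature_values,
        "alpha": alpha,
    }


def risk_score(model, row, positive="case"):
    """Return P(positive | row) under the naive independence assumption."""
    alpha = model["alpha"]
    total = sum(model["class_counts"].values())
    joint = {}
    for label, n_label in model["class_counts"].items():
        p = n_label / total  # class prior
        for feature, value in row.items():
            n_values = len(model["feature_values"][feature])
            count = model["value_counts"][(feature, label)][value]
            # Smoothed conditional probability P(value | label)
            p *= (count + alpha) / (n_label + alpha * n_values)
        joint[label] = p
    return joint[positive] / sum(joint.values())


# Hypothetical toy data: NOT taken from the thesis.
rows = [
    {"family_history": "yes", "age_group": "50+"},
    {"family_history": "yes", "age_group": "under_50"},
    {"family_history": "no", "age_group": "50+"},
    {"family_history": "no", "age_group": "under_50"},
    {"family_history": "no", "age_group": "under_50"},
    {"family_history": "no", "age_group": "50+"},
]
labels = ["case", "case", "case", "control", "control", "control"]

model = train_naive_bayes(rows, labels)
high = risk_score(model, {"family_history": "yes", "age_group": "50+"})
low = risk_score(model, {"family_history": "no", "age_group": "under_50"})
print(f"higher-risk profile: {high:.2f}, lower-risk profile: {low:.2f}")
```

The score is the posterior probability of the "case" class, so it falls between 0 and 1; a deployed tool would need thresholds chosen and validated on real data.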
URI: http://hdl.handle.net/123456789/2476
Appears in Collections: SCS Individual Project - Final Thesis (2013)

Files in This Item:
File: 9001638.pdf (2.31 MB, Adobe PDF, Restricted Access)


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.