Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4272
Title: Data Leakage Prevention Framework Through Information Sensitivity Classification
Authors: Lakshman, M.S.C.
Issue Date: 28-Jul-2021
Abstract: A Data Leakage Prevention (DLP) system is one of the core elements of an information security toolset, alongside Intrusion Detection and Prevention Systems, firewalls, Security Information and Event Management (SIEM) systems, spam filters and similar utilities. Today, many security incidents are caused by insiders who, intentionally or unintentionally, share information with unauthorized personnel or systems. A DLP system therefore plays a key role in securing information assets. It operates across three principal domains: Information Asset Discovery, Monitoring and Prevention. The Discovery stage should identify the information assets available within an organization and determine the sensitivity level of each asset. Currently, this step is handled in one of two ways. It is either a manual process, in which the information asset owner assigns the classification label, or an automated process, in which various classification mechanisms are applied to the assets. Automated classification has not yet been fully adopted into commercial DLP systems because its accuracy is unpredictable. This experiment aimed to identify a technique that classifies the information assets of a domain-specific data set with higher accuracy. A Multi-Layer Perceptron (MLP) neural network achieved ~98% classification accuracy on the data set considered. The highest accuracies observed for the Random Forest and Convolutional Neural Network (CNN) techniques were ~97% and ~96%, respectively. A non-standard model combining Random Forest with a CNN was also evaluated, but it reached a maximum accuracy of only 60%. The proposed MLP technique therefore improved accuracy by ~1 percentage point over Random Forest, a widely accepted algorithm for data set classification.
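The abstract does not describe the MLP's architecture, input features or training setup. The sketch below is therefore illustrative only and is not the thesis's implementation. It shows, in dependency-free Python, how a one-hidden-layer MLP can classify documents into the four sensitivity classes listed in the abstract and how accuracy is measured. The binary bag-of-words features, the toy documents and their labels, the hidden-layer size, the learning rate and the epoch count are all hypothetical assumptions. Random Forest and CNN are not reproduced here.

```python
import math
import random

# Sensitivity classes as listed in the abstract; the indices are this sketch's own choice.
LABELS = ["High Sensitive", "Sensitive", "Non-Sensitive", "Open"]


def build_vocab(texts):
    """Map every distinct lower-cased token to a feature index."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize(text, vocab):
    """Binary bag-of-words vector; tokens outside the vocabulary are ignored."""
    x = [0.0] * len(vocab)
    for token in text.lower().split():
        if token in vocab:
            x[vocab[token]] = 1.0
    return x


class MLP:
    """One tanh hidden layer and a softmax output, trained with per-sample SGD."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        """Return (hidden activations, class probabilities)."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = [sum(w * hi for w, hi in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        m = max(z)  # subtract the max logit for numerical stability
        e = [math.exp(zi - m) for zi in z]
        s = sum(e)
        return h, [ei / s for ei in e]

    def train_step(self, x, y, lr):
        h, p = self.forward(x)
        # Cross-entropy gradient w.r.t. the logits: p - one_hot(y)
        dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        # Gradient w.r.t. hidden pre-activations, computed before w2 is updated
        da = [(1 - h[j] ** 2) * sum(dz[k] * self.w2[k][j] for k in range(len(dz)))
              for j in range(len(h))]
        for k in range(len(dz)):
            for j in range(len(h)):
                self.w2[k][j] -= lr * dz[k] * h[j]
            self.b2[k] -= lr * dz[k]
        for j in range(len(da)):
            for i in range(len(x)):
                self.w1[j][i] -= lr * da[j] * x[i]
            self.b1[j] -= lr * da[j]

    def predict(self, x):
        _, p = self.forward(x)
        return max(range(len(p)), key=p.__getitem__)


def accuracy(model, xs, ys):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(model.predict(x) == y for x, y in zip(xs, ys)) / len(ys)


# Hypothetical toy corpus: (document text, index into LABELS).
docs = [
    ("salary payroll bonus", 0), ("merger litigation settlement", 0),
    ("internal policy procedure", 1), ("solution design architecture", 1),
    ("datasheet specification ports", 2), ("partner pricing brochure", 2),
    ("whitepaper overview trends", 3), ("press release announcement", 3),
]
vocab = build_vocab(text for text, _ in docs)
xs = [vectorize(text, vocab) for text, _ in docs]
ys = [label for _, label in docs]

model = MLP(len(vocab), n_hidden=8, n_out=len(LABELS), seed=42)
for _ in range(500):
    for x, y in zip(xs, ys):
        model.train_step(x, y, lr=0.1)

print(f"training accuracy: {accuracy(model, xs, ys):.2f}")
print(LABELS[model.predict(vectorize("payroll bonus summary", vocab))])
```

A real evaluation would report accuracy on held-out documents rather than on the training set. The abstract does not state how its accuracy figures were computed.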
A realistic data set was prepared as part of this experiment, targeting the Systems Integrator industry. The data set comprises legal documents, HR documents, data sheets, solution documents, agreements, policy documents and white papers. The documents were classified into four classes by sensitivity level, in line with industry-accepted practice: High Sensitive, Sensitive, Non-Sensitive and Open.
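The labels produced in the Discovery stage are what the Monitoring and Prevention stages act on. The following is a minimal sketch under two assumptions. First, the four classes are ordered as listed, with Open the least sensitive. Second, it applies a hypothetical clearance rule that the abstract does not describe: a document may be shared only with a recipient whose clearance is at least the document's sensitivity level. The file names and clearances are invented for illustration.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """The four classes from the abstract; ordering (Open lowest) is an assumption."""
    OPEN = 0
    NON_SENSITIVE = 1
    SENSITIVE = 2
    HIGH_SENSITIVE = 3


def may_share(doc_label: Sensitivity, recipient_clearance: Sensitivity) -> bool:
    """Hypothetical prevention rule: allow sharing only if the recipient's
    clearance is at least the document's sensitivity level."""
    return recipient_clearance >= doc_label


# Hypothetical outbound events: (document, classifier label, recipient clearance)
events = [
    ("white_paper.pdf", Sensitivity.OPEN, Sensitivity.OPEN),
    ("hr_salaries.xlsx", Sensitivity.HIGH_SENSITIVE, Sensitivity.SENSITIVE),
    ("solution_design.docx", Sensitivity.SENSITIVE, Sensitivity.SENSITIVE),
]

for name, label, clearance in events:
    action = "allow" if may_share(label, clearance) else "block"
    print(f"{name}: {label.name} -> {action}")
```

Using an `IntEnum` keeps the labels comparable with `>=`, so the policy check reduces to a single ordering comparison.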
Appears in Collections: 2019

Files in This Item:
File: 2016MIS018.docx | Size: 2.01 MB | Format: Microsoft Word XML


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.