Automated Microsoft Office Word Document Marking System

Gunathilaka, H.M.I

Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/3732

Title:	Automated Microsoft Office Word Document Marking System
Authors:	Gunathilaka, H.M.I
Issue Date:	15-Sep-2016
Abstract:	Information Technology is a rapidly growing and changing field in the world. This field has a lot of ongoing research and development that engages with various other fields to create innovative products that make real-world scenarios easier, quicker and more comfortable. This research focuses on a new application that automates the Microsoft Office Word document marking process to achieve fast, high-quality document marking. Time and accuracy are the main attributes of document marking. The key challenge is to mark more documents in less time with higher accuracy. The Microsoft Office package covers a vast area. This research focuses mainly on one specific area: Microsoft Office Word documents in the OOXML (Office Open XML) file format. The OOXML file format uses XML and ZIP archive technologies to create a document file. This research briefly discusses the file structure of the OOXML file format and data extraction from Microsoft Office Word document files in that format. The Apache Tika and POI APIs are used to extract data (content and format) from OOXML-formatted Microsoft Office Word document files. The Apache POI API provides many classes to extract content and format from Microsoft Office documents, and the XWPFDocument class can be used to read (extract content and format) and write Microsoft Office Word document files in the OOXML file format. This class represents the entire document content and format, and it acts as the root of the class hierarchy. It aggregates various classes provided by the Apache POI API that represent the content types in an OOXML document, such as paragraphs, tables and pictures, and all of those classes contain the format of the content. Content and format matching is the main task of this research: we extract and match the content and format of two Microsoft Office Word documents in the OOXML file format.
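As a rough illustration of the ZIP-plus-XML structure described above, the following Python sketch reads run text and a bold flag directly from the word/document.xml part of a package. It is not the Apache POI approach the thesis uses; it relies only on the Python standard library. The sample package is a stripped-down stand-in (a real .docx also contains [Content_Types].xml and relationship parts), and the names build_sample_package and extract_runs are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hypothetical sample content: one bold paragraph and one plain paragraph.
SAMPLE_DOCUMENT_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Heading</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Plain text</w:t></w:r></w:p>"
    "</w:body></w:document>"
)


def build_sample_package():
    """Return the bytes of a ZIP holding only word/document.xml (not a complete .docx)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", SAMPLE_DOCUMENT_XML)
    return buffer.getvalue()


def extract_runs(package_bytes):
    """Return one list per paragraph containing (text, is_bold) tuples, one per run."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iterfind(".//w:p", NS):
        runs = []
        for run in paragraph.iterfind("w:r", NS):
            text = "".join(node.text or "" for node in run.iterfind("w:t", NS))
            bold = run.find("w:rPr/w:b", NS)
            # <w:b/> switches bold on unless its w:val attribute says otherwise.
            is_bold = bold is not None and bold.get(f"{{{W_NS}}}val", "true") not in ("0", "false")
            runs.append((text, is_bold))
        paragraphs.append(runs)
    return paragraphs


if __name__ == "__main__":
    for runs in extract_runs(build_sample_package()):
        print(runs)
```

This sketch only inspects direct run formatting; formatting inherited from styles, which a marking system would also need to resolve, is left out.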
The implemented system has three modules: a "document content and format extraction" module, a "documents content and format matching" module and a "produce result document" module. The implemented system can recognize completely matching, partially matching and mismatching content and format in the two compared Microsoft Office Word documents in the OOXML file format. We describe our approach, along with the design of a prototype implementation and its evaluation, in this report.
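The three matching outcomes named above can be sketched in Python as follows. The abstract does not state the criteria the thesis applies, so the rule here is an assumption made for illustration: an item is "complete" when both text and format agree, "partial" when exactly one of them agrees, and "mismatch" otherwise. The function names and the (text, format) tuple representation are hypothetical.

```python
from itertools import zip_longest


def classify_match(expected, submitted):
    """Classify one pair of items; each item is a (text, format_dict) tuple or None."""
    if expected is None or submitted is None:
        return "mismatch"  # assumed rule: an item missing on either side is a mismatch
    text_equal = expected[0] == submitted[0]
    format_equal = expected[1] == submitted[1]
    if text_equal and format_equal:
        return "complete"
    if text_equal or format_equal:
        return "partial"
    return "mismatch"


def compare_documents(expected_items, submitted_items):
    """Pair items by position and classify each pair."""
    return [
        classify_match(expected, submitted)
        for expected, submitted in zip_longest(expected_items, submitted_items)
    ]


if __name__ == "__main__":
    model_answer = [("Title", {"bold": True}), ("Body", {"bold": False}), ("End", {"bold": False})]
    student_answer = [("Title", {"bold": True}), ("Body", {"bold": True}), ("Fin", {"bold": True})]
    print(compare_documents(model_answer, student_answer))
```

Pairing items by position keeps the sketch short; a practical marker would more likely align paragraphs before comparing them, so that one inserted paragraph does not shift every later comparison.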
URI:	http://hdl.handle.net/123456789/3732
Appears in Collections:	Master of Computer Science - 2016

Files in This Item:

File	Description	Size	Format
13440234.pdf Restricted Access		5.36 MB	Adobe PDF
