Title: An Email Interpretation System for Dry Cargo Chartering
Authors: Jayawardhana, M.U.K.
Keywords: Research Subject Categories::TECHNOLOGY
data mining
Issue Date: 24-Sep-2013
Abstract: In the dry cargo chartering industry, information about open vessels (ready ships) and open cargo opportunities (commodities to ship) is communicated among charterers by email. These emails are manually written, unstructured documents that do not follow grammar rules, so complete sentences are rare. The emails originate from thousands of different charterers, follow no specific format or general writing convention, and often contain content copied and pasted from other emails. Because there is no limit on the information an email may contain, a single email may mix multiple vessels and cargoes. There is, however, a set of keywords and abbreviations commonly used in these emails, and relationships can be seen between those keywords: synonyms or aliases, parent-child relationships, opposite meanings, and the occurrence of only one word from a set. Readers have to be well versed in industry terms to understand and extract the information, especially vessel names, ports and their locations, cargo types, the conditions applied to their shipping, and details of other chartering companies. This research targets the extraction of vessel and cargo information from these unstructured emails and its conversion into a structured format that allows further processing and analysis. After a study of the relevant literature, we first discuss a few possible approaches to information extraction from these emails. A machine learning based approach and a natural language processing tool based approach are analyzed in terms of their strengths, weaknesses and relevance to our scenario. A mixed approach to the problem is then presented, combining natural language processing, text analytics, text mining and some statistical methods, with the necessary theoretical and implementation-level details.
Text pre-processing, text segmentation, construction of entity dictionaries, named entity recognition, word sense disambiguation, content rewriting, text processing template development, rule-based processing, and weight- and threshold-based calculation methods are the main underlying concepts of the solution. A sufficiently large, manually labeled baseline data set is used for testing and evaluation. Finally, the outcome of the presented solution is evaluated using precision, recall and F-measure.
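As a rough illustration of two of the concepts named in the abstract, the Python sketch below tags entities by dictionary lookup with alias normalization and then scores the result against a manually labeled set using precision, recall and F-measure. It is not the method of the thesis: the dictionary entries, the sample email text, the expected labels and the function names `tag_entities` and `precision_recall_f` are all hypothetical and chosen only for the example.

```python
import re

# Hypothetical alias dictionary: surface form -> (entity type, canonical value).
# These entries are illustrative assumptions, not data from the thesis.
ENTITY_DICT = {
    "spore": ("PORT", "Singapore"),
    "singapore": ("PORT", "Singapore"),
    "sgp": ("PORT", "Singapore"),
    "coal": ("CARGO", "coal"),
    "iron ore": ("CARGO", "iron ore"),
    "io": ("CARGO", "iron ore"),
}


def tag_entities(text):
    """Return the set of (type, canonical value) pairs found by dictionary lookup."""
    lowered = text.lower()
    found = set()
    for alias, entity in ENTITY_DICT.items():
        # Word boundaries stop the alias "io" from matching inside "position".
        if re.search(r"\b" + re.escape(alias) + r"\b", lowered):
            found.add(entity)
    return found


def precision_recall_f(predicted, expected):
    """Compute precision, recall and the balanced F-measure over two sets."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical email fragment and hand-labeled expectation.
email = "open spore 5 oct, position firm, can load IO or coal"
predicted = tag_entities(email)
expected = {("PORT", "Singapore"), ("CARGO", "iron ore"), ("CARGO", "grain")}
precision, recall, f_measure = precision_recall_f(predicted, expected)
```

In this toy case the aliases "spore" and "IO" are normalized to their canonical forms, "coal" counts as a false positive and "grain" as a miss, so two of three predictions and two of three expected labels match.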
Appears in Collections: Master of Computer Science - 2013

Files in This Item:
Access: Restricted Access
Size: 1.32 MB
Format: Adobe PDF

Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.