Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4388
Title: Automated Cricket News Generation in Sri Lankan Style using Natural Language Generation
Authors: Gunasiri, M. H. D. Y.
Issue Date: 3-Aug-2021
Abstract: Automated sports journalism is a relatively new, technologically innovative field in which journalistic content is created through Natural Language Generation (NLG). NLG is a subtask of natural language processing in which human-readable text is composed from a non-linguistic representation of information. This dissertation provides a bibliographic review of applications of NLG in journalism and of other related NLG research, as well as a review of the tasks that make up the NLG process. Techniques and tools that support performing and evaluating those tasks are also discussed. Cricket is one of the most followed sports in South Asia, and there is wide demand for news to be produced within a short period after a cricket match. In Sri Lankan publishing, writing these news articles is a manual process that requires journalists with domain and language competencies. Most of the time, sports journalists cannot focus on in-depth reporting because of time and cost constraints. An automated process would therefore be efficient and cost-effective. As a solution, this research presents an Automatic Cricket News Generation System and demonstrates how template-based natural language generation is used to implement such a system, and how suitable it is for the task. The system, which generates a journalistic summary of a cricket match in Sri Lankan style from a scorecard, is implemented using the pipeline approach to NLG. A methodology based on the pipeline architecture is proposed for the system, and it states how the data is transformed at each level of the architecture. The work also focuses on variation in the generated output, which is not typically a feature of template-based NLG systems. The templates are created from an actual corpus written by journalists for Sri Lankan newspapers. The system is evaluated under manual and automatic generation.
At the same time, each module is unit tested. The generated text is evaluated by comparing it with a reference text on three parameters: similarity score, degree of closeness, and data count. The results show that the generation system is capable of producing a grammatically correct and easy-to-read news piece for a given cricket match. The generated summary was also compared with a summary written manually by an expert; the comparison shows that, although creativity is not at a satisfactory level, accurate information can be obtained from the generated summary.
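The template-based pipeline described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the `Scorecard` fields, the stage names `select_content` and `realise`, and the template strings are all hypothetical, chosen only to show how scorecard data might pass through a content-selection stage and a realisation stage that picks among several templates to vary the output.

```python
import random
from dataclasses import dataclass


@dataclass
class Scorecard:
    """Hypothetical, heavily simplified scorecard; team_b batted second."""
    team_a: str
    runs_a: int
    team_b: str
    runs_b: int
    wickets_b: int  # wickets lost by the side batting second


def plural(n: int, noun: str) -> str:
    """Return e.g. '1 run' or '6 wickets'."""
    return f"{n} {noun}" if n == 1 else f"{n} {noun}s"


def select_content(card: Scorecard) -> dict:
    """Content stage: turn raw scores into the facts worth reporting."""
    if card.runs_b > card.runs_a:
        # The chasing side wins by the wickets it still had in hand.
        return {"result": "win", "winner": card.team_b, "loser": card.team_a,
                "margin": plural(10 - card.wickets_b, "wicket")}
    if card.runs_a > card.runs_b:
        # The side batting first wins by the difference in runs.
        return {"result": "win", "winner": card.team_a, "loser": card.team_b,
                "margin": plural(card.runs_a - card.runs_b, "run")}
    return {"result": "tie", "team_a": card.team_a, "team_b": card.team_b,
            "runs": card.runs_a}


# Several templates per message type give some variation in the output.
# These sentences are invented for illustration, not taken from a corpus.
TEMPLATES = {
    "win": [
        "{winner} beat {loser} by {margin}.",
        "{winner} sealed victory over {loser} by {margin}.",
        "A win by {margin} went to {winner} against {loser}.",
    ],
    "tie": [
        "{team_a} and {team_b} played out a tie on {runs} runs each.",
    ],
}


def realise(facts: dict, rng: random.Random) -> str:
    """Realisation stage: pick a template and fill its slots."""
    template = rng.choice(TEMPLATES[facts["result"]])
    return template.format(**facts)


def generate_summary(card: Scorecard, seed=None) -> str:
    """Run the two stages in sequence; a fixed seed makes output repeatable."""
    return realise(select_content(card), random.Random(seed))


if __name__ == "__main__":
    card = Scorecard("India", 250, "Sri Lanka", 251, 4)
    print(generate_summary(card))
```

Keeping the stages as separate functions mirrors the pipeline idea, in which each stage can be unit tested on its own, and the random template choice is one simple way to avoid identical wording for every match.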
URI: http://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4388
Appears in Collections: 2019

Files in This Item:
File: 2016MCS034.pdf
Size: 1.32 MB
Format: Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.