Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4160
Title: Cross Language Information Retrieval for Accessing the English Web in Sinhala
Authors: Hisan, Mohamed Hunais Mohamed
Keywords: Cross Language Information Retrieval (CLIR); Translation Models; Word Embedding based Approach
Issue Date: 19-Jul-2021
Abstract: People turn to the Internet in search of knowledge. An immense amount of information is available in many different languages, and it can be accessed by people irrespective of location and time. However, it has been observed that search engines do not always provide relevant answers to queries in a less popular language such as Sinhala, which is one of the native languages of Sri Lanka. Although relevant documents are available for a given query, search engines are not able to link the query to the appropriate documents, since the query and the documents are in two different languages. This study focuses on performing Cross Language Information Retrieval (CLIR) from Sinhala to English to retrieve relevant web documents. This includes determining whether a system can be built that performs such a task effectively. To the best of my knowledge, no efforts have been made to perform CLIR involving the Sinhala language. In addition to the normal procedure of retrieving documents, this study checks whether the order of importance of the documents changes when they are translated back to the language of the query. A word embedding based approach was used to represent words, since word embeddings have been shown to be effective in representing text data. Several translation models were employed to obtain the equivalent English query for a given Sinhala query, and the Linear Transformation combined with the Standard Nearest Neighbour Retrieval method performed well. Among the re-ranking models used in this study, the LSI based re-ranking model performed well. However, re-ranking the documents did not show a positive impact. A brief user-based evaluation was performed, and the results showed that it is possible to perform Sinhala to English CLIR using a word embedding based approach.
URI: http://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4160
Appears in Collections: 2019
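
Illustrative note: the abstract names a Linear Transformation combined with Standard Nearest Neighbour Retrieval as the translation model that performed well. The sketch below shows the general idea of that family of methods under stated assumptions; it is not taken from the thesis. It assumes a small seed dictionary of (source vector, target vector) pairs, learns a matrix W by batch gradient descent on the squared error, maps a query word into the English space, and picks the English word with the highest cosine similarity. The 2-dimensional vectors, the placeholder tokens such as "si_house", and the function names fit_linear_map and translate are all hypothetical.

```python
import math


def mat_vec(x, W):
    """Multiply row vector x by matrix W."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def fit_linear_map(pairs, dim_src, dim_tgt, lr=0.1, steps=200):
    """Learn W minimising sum ||x W - z||^2 over seed pairs (x, z)."""
    W = [[0.0] * dim_tgt for _ in range(dim_src)]
    for _ in range(steps):
        # Batch gradient of the squared error with respect to W.
        grad = [[0.0] * dim_tgt for _ in range(dim_src)]
        for x, z in pairs:
            pred = mat_vec(x, W)
            for i in range(dim_src):
                for j in range(dim_tgt):
                    grad[i][j] += 2.0 * x[i] * (pred[j] - z[j])
        for i in range(dim_src):
            for j in range(dim_tgt):
                W[i][j] -= lr * grad[i][j]
    return W


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translate(word, src_vecs, tgt_vecs, W):
    """Map a source word into the target space and return its nearest neighbour."""
    mapped = mat_vec(src_vecs[word], W)
    return max(tgt_vecs, key=lambda t: cosine(mapped, tgt_vecs[t]))


# Toy, hypothetical embeddings: placeholder source tokens and English words.
src_vecs = {
    "si_water": [1.0, 0.0],
    "si_tree": [0.0, 1.0],
    "si_book": [1.0, 1.0],
    "si_house": [2.0, 1.0],  # not in the seed dictionary
}
tgt_vecs = {
    "water": [0.0, 1.0],
    "tree": [-1.0, 0.0],
    "book": [-1.0, 1.0],
    "house": [-0.5, 1.0],
}

# Seed dictionary used to fit W (three known translation pairs).
seed = [("si_water", "water"), ("si_tree", "tree"), ("si_book", "book")]
pairs = [(src_vecs[s], tgt_vecs[t]) for s, t in seed]

W = fit_linear_map(pairs, dim_src=2, dim_tgt=2)
held_out = translate("si_house", src_vecs, tgt_vecs, W)
print(held_out)
```

With real embeddings, the vectors would have hundreds of dimensions and W would more commonly be fitted with a linear-algebra library; the nearest-neighbour step stays the same.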

Files in This Item:
File               Description   Size     Format
2015 CS 053.pdf                  2.7 MB   Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.