Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4160
Full metadata record
DC Field | Value | Language
dc.contributor.author | Hisan, Mohamed Hunais Mohamed | -
dc.date.accessioned | 2021-07-19T08:07:17Z | -
dc.date.available | 2021-07-19T08:07:17Z | -
dc.date.issued | 2021-07-19 | -
dc.identifier.uri | http://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4160 | -
dc.description.abstract | People turn to the Internet in search of knowledge. An immense amount of information is available in many different languages, and it can be accessed irrespective of location and time. However, search engines do not always provide relevant answers when the query is written in a less popular language, such as Sinhala, one of the native languages of Sri Lanka. Although relevant documents exist for a given query, search engines cannot link the query to the appropriate documents because the query and the documents are in two different languages. This study focuses on performing Cross Language Information Retrieval (CLIR) from Sinhala to English to retrieve relevant web documents, and on determining whether a system can be built that performs this task effectively. To the best of my knowledge, no previous effort has been made to perform CLIR involving the Sinhala language. In addition to the normal procedure of retrieving documents, this study checks whether the order of importance of the documents changes when they are translated back into the language of the query. A word embedding based approach was used to represent words, since word embeddings have been shown to be effective in representing text data. Several translation models were employed to obtain the equivalent English query for a given Sinhala query; the Linear Transformation combined with the Standard Nearest Neighbour Retrieval method performed well. Among the re-ranking models used in this study, the LSI-based re-ranking model performed well. However, re-ranking the documents did not show a positive impact. A brief user-based evaluation was performed, and the results showed that it is possible to perform Sinhala to English CLIR using a word embedding based approach. | en_US
dc.language.iso | en | en_US
dc.subject | Cross Language Information Retrieval (CLIR) | en_US
dc.subject | Translation Models | en_US
dc.subject | Word Embedding based Approach | en_US
dc.title | Cross Language Information Retrieval for Accessing the English Web in Sinhala | en_US
dc.type | Thesis | en_US
Appears in Collections: 2019

Files in This Item:
File | Description | Size | Format
2015 CS 053.pdf | | 2.7 MB | Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.