Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/1778
Title: Using Active Learning to Gather Sinhala / English Data to Enhance Machine Translation System Performance
Authors: Dilani, W.G.D.T.
Issue Date:  12
Abstract: Statistical machine translation (SMT) systems require a bilingual parallel corpus of source and target languages to learn the knowledge needed for translation. The quality of translations depends on the amount of parallel data available in the corpus. Some languages are spoken by only a small population, and others have few online parallel resources even though they are widely used. Both kinds are called low-density languages. Active learning has been shown to help in situations where SMT systems face a shortage of parallel data. SMT systems rely on statistical parameters, which brings a few drawbacks. Kernel ridge regression is a newer technique used for string-to-string mapping applications in natural language processing. It has also been used to map source-language features to target-language features in machine translation. This technique does not rely on probabilistic estimation. This dissertation therefore investigates the applicability of active learning to kernel-based machine translation. It presents a detailed description of the kernel ridge regression framework, decoding with de Bruijn graphs, sentence selection using active learning, and a comparison of active learning against random selection, with results. Active learning performed well relative to random selection for kernel-based machine translation, even with a small training dataset. The active learning selection strategy tends to select sentences that contain more words unseen in the available parallel dataset.
URI: http://hdl.handle.net/123456789/1778
Appears in Collections: SCS Individual Project - Final Thesis (2012)

Files in This Item:
File: 10.pdf (Restricted Access)
Size: 1.03 MB
Format: Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.