Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/1662
Full metadata record
DC Field | Value | Language
dc.thesis.supervisor | Ranasinghe, D.N. (Dr.) | -
dc.contributor.author | Karunadasa, R.G.N.P. | en_US
dc.date.accessioned | 2013-12-18T11:58:46Z | -
dc.date.available | 2013-12-18T11:58:46Z | -
dc.date.issued | 2013-12-18 | -
dc.identifier.uri | http://hdl.handle.net/123456789/1662 | -
dc.description.abstract | CUDA-programmed GPUs are rapidly becoming a major choice in high-performance computing, and a growing number of applications are being ported to the CUDA platform. However, much less research has been carried out to evaluate performance when CUDA is integrated with other parallel programming paradigms. We have developed a general-purpose matrix multiplication algorithm and a Conjugate Gradient algorithm using CUDA and MPI. In this approach, MPI works as the data distribution mechanism between the GPU nodes and CUDA as the main computing engine. This enables the programmer to connect GPU nodes via high-speed Ethernet without special technologies; it also helps the programmer to see the separate GPU nodes as they are and to execute different components of a program on several GPU nodes. We have achieved a significant performance gain with the CUDA+MPI based Strassen's algorithm compared to the MPI-only Strassen's algorithm running on a six-node cluster. The CUDA+MPI based Conjugate Gradient algorithm performs comparatively poorly against its MPI-only counterpart. We identify the categories of applications that can use the combined power of CUDA and MPI effectively. | en_US
dc.title | Performance Comparison of Parallel Algorithms on Small GPU Clusters | en_US
Appears in Collections: SCS Individual Project - Final Thesis (2009)

Files in This Item:
File | Description | Size | Format
20.pdf (Restricted Access) | | 460.4 kB | Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.