Performance Comparison of Parallel Algorithms on Small GPU Clusters

Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/1662

Title:	Performance Comparison of Parallel Algorithms on Small GPU Clusters
Authors:	Karunadasa, R.G.N.P.
Issue Date:	18-Dec-2013
Abstract:	CUDA-programmed GPUs are rapidly becoming a major choice in high-performance computing, and a growing number of applications are being ported to the CUDA platform. However, much less research has been carried out to evaluate performance when CUDA is integrated with other parallel programming paradigms. We have developed a general-purpose matrix multiplication algorithm and a Conjugate Gradient algorithm using CUDA and MPI. In this approach, MPI serves as the data-distribution mechanism between the GPU nodes and CUDA as the main computing engine. This lets the programmer connect GPU nodes via high-speed Ethernet without special technologies, treat each GPU node as a separate unit, and execute different components of a program on several GPU nodes. We achieved a significant performance gain with the CUDA+MPI-based Strassen's algorithm compared to the MPI-only Strassen's algorithm running on a six-node cluster. The CUDA+MPI-based Conjugate Gradient algorithm, by contrast, performs comparatively poorly against its MPI-only counterpart. We identify the categories of applications that can use the combined power of CUDA and MPI effectively.
URI:	http://hdl.handle.net/123456789/1662
Appears in Collections:	SCS Individual Project - Final Thesis (2009)

Files in This Item:

File	Description	Size	Format
20.pdf Restricted Access		460.4 kB	Adobe PDF