Runtime Systems: Taming the High Performance Computing Beast
The speaker is Prof. Cal Ribbens from the Department of Computer Science at Virginia Tech. The abstract of his talk follows.

High performance computing (HPC) is an area of computer science and engineering that has always evolved rapidly, sometimes leading and sometimes riding successive waves of technical innovation. While HPC application developers and users have continued to benefit from the increasing power of these high-end resources, the increasing complexity of HPC execution environments will require more and more reliance on runtime systems. Parallelism, load balancing, power, fault tolerance, and hardware heterogeneity are just a few of the emerging dominant issues that require runtime solutions. In this talk I will briefly describe some of the motivations and trends in runtime systems for HPC.

I will then describe two recent projects we have worked on at Virginia Tech. The first, ReSHAPE, is a runtime system that allows the number of nodes assigned to a job running on a cluster to be changed at run time. Experimental results from a prototype implementation of ReSHAPE illustrate the potential of "malleable" jobs for improving overall cluster utilization and reducing turnaround time for individual jobs. The second project, Samhita, is a distributed shared memory (DSM) execution environment that allows programs based on the widely used Pthreads library for shared-memory thread parallelism to be easily ported to a distributed-memory (cluster) platform. Samhita not only allows a wide range of parallel codes to be ported to a new context, but its design also reduces the problem of DSM to a cache management problem, with corresponding opportunities for exploiting locality at runtime.