GUMSMP: a scalable parallel Haskell implementation

Aljabri, Malak Saleh (2015) GUMSMP: a scalable parallel Haskell implementation. PhD thesis, University of Glasgow.

Full text available as:
Download (5MB) | Preview


The most widely available high performance platforms today are hierarchical, with shared memory leaves, e.g. clusters of multi-cores, or NUMA with multiple regions. The Glasgow Haskell Compiler (GHC) provides a number of parallel Haskell implementations targeting different parallel architectures. In particular, GHC-SMP supports shared memory architectures, and GHC-GUM supports distributed memory machines. Both implementations use different, but related, runtime system (RTS) mechanisms and achieve good performance. A specialised RTS for the ubiquitous hierarchical architectures is lacking. This thesis presents the design, implementation, and evaluation of a new parallel Haskell RTS, GUMSMP, that combines shared and distributed memory mechanisms to exploit hierarchical architectures more effectively. The design evaluates a variety of design choices and aims to efficiently combine scalable distributed memory parallelism, using a virtual shared heap over a hierarchical architecture, with low-overhead shared memory parallelism on shared memory nodes. Key design objectives in realising this system are to prefer local work, and to exploit mostly passive load distribution with pre-fetching. Systematic performance evaluation shows that the automatic hierarchical load distribution policies must be carefully tuned to obtain good performance. We investigate the impact of several policies including work pre-fetching, favouring inter-node work distribution, and spark segregation with different export and select policies. We present the performance results for GUMSMP, demonstrating good scalability for a set of benchmarks on up to 300 cores. Moreover, our policies provide performance improvements of up to a factor of 1.5 compared to GHC- GUM. The thesis provides a performance evaluation of distributed and shared heap implementations of parallel Haskell on a state-of-the-art physical shared memory NUMA machine. The evaluation exposes bottlenecks in memory management, which limit scalability beyond 25 cores. We demonstrate that GUMSMP, that combines both distributed and shared heap abstractions, consistently outper- forms the shared memory GHC-SMP on seven benchmarks by a factor of 3.3 on average. Specifically, we show that the best results are obtained when shar- ing memory only within a single NUMA region, and using distributed memory system abstractions across the regions.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: parallel, multi-core
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Colleges/Schools: College of Science and Engineering > School of Computing Science
Funder's Name: UNSPECIFIED
Supervisor's Name: Trinder, Professor Phil and Loidl, Dr. Hans-Wolfgang
Date of Award: 2015
Depositing User: Mrs Malak Saleh Aljabri
Unique ID: glathesis:2015-6822
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 03 Nov 2015 10:29
Last Modified: 19 Nov 2015 08:40

Actions (login required)

View Item View Item