GUMSMP: a scalable parallel Haskell implementation

Aljabri, Malak Saleh (2015) GUMSMP: a scalable parallel Haskell implementation. PhD thesis, University of Glasgow.


Abstract

The most widely available high performance platforms today are hierarchical,
with shared memory leaves, e.g. clusters of multi-cores, or NUMA with multiple
regions. The Glasgow Haskell Compiler (GHC) provides a number of parallel
Haskell implementations targeting different parallel architectures. In particular,
GHC-SMP supports shared memory architectures, and GHC-GUM supports
distributed memory machines. The two implementations use different but related
runtime system (RTS) mechanisms and achieve good performance. A specialised
RTS for the ubiquitous hierarchical architectures is lacking.
This thesis presents the design, implementation, and evaluation of a new
parallel Haskell RTS, GUMSMP, that combines shared and distributed memory
mechanisms to exploit hierarchical architectures more effectively. The design
process evaluates a variety of choices and aims to efficiently combine scalable
distributed memory parallelism, using a virtual shared heap over a hierarchical
architecture, with low-overhead shared memory parallelism on shared memory
nodes. Key design objectives in realising this system are to prefer local work,
and to exploit mostly passive load distribution with pre-fetching.
Systematic performance evaluation shows that the automatic hierarchical load
distribution policies must be carefully tuned to obtain good performance. We
investigate the impact of several policies including work pre-fetching, favouring
inter-node work distribution, and spark segregation with different export and
select policies. We present the performance results for GUMSMP, demonstrating
good scalability for a set of benchmarks on up to 300 cores. Moreover, our policies
provide performance improvements of up to a factor of 1.5 compared to GHC-GUM.
The thesis provides a performance evaluation of distributed and shared heap
implementations of parallel Haskell on a state-of-the-art physical shared memory
NUMA machine. The evaluation exposes bottlenecks in memory management,
which limit scalability beyond 25 cores. We demonstrate that GUMSMP, which
combines both distributed and shared heap abstractions, consistently outperforms
the shared memory GHC-SMP on seven benchmarks by a factor of 3.3
on average. Specifically, we show that the best results are obtained when sharing
memory only within a single NUMA region, and using distributed memory
system abstractions across the regions.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: parallel, multi-core
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Colleges/Schools: College of Science and Engineering > School of Computing Science
Supervisor's Name: Trinder, Professor Phil and Loidl, Dr. Hans-Wolfgang
Date of Award: 2015
Depositing User: Mrs Malak Saleh Aljabri
Unique ID: glathesis:2015-6822
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 03 Nov 2015 10:29
Last Modified: 19 Nov 2015 08:40
URI: http://theses.gla.ac.uk/id/eprint/6822
