Algorithmic skeletons for exact combinatorial search at scale

Archibald, Blair (2018) Algorithmic skeletons for exact combinatorial search at scale. PhD thesis, University of Glasgow.

Full text available as:
PDF: 2018ArchibaldPhD.pdf (1MB)
Printed Thesis Information: https://eleanor.lib.gla.ac.uk/record=b3330293

Abstract

Exact combinatorial search is essential to a wide range of application areas including constraint optimisation, graph matching, and computer algebra. Solutions to combinatorial problems are found by systematically exploring a search space, either to enumerate solutions, determine if a specific solution exists, or to find an optimal solution. Combinatorial searches are computationally hard both in theory and practice, and efficiently exploring the huge number of combinations is a real challenge, often addressed using approximate search algorithms. Alternatively, exact search can be parallelised to reduce execution time. However, parallel search is challenging due to both highly irregular search trees and sensitivity to search order, leading to anomalies that can cause unexpected speedups and slowdowns. As core counts continue to grow, parallel search becomes increasingly useful for improving the performance of existing searches, and allowing larger instances to be solved.
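As an illustrative aside (not drawn from the thesis), the systematic exploration of a search space described above can be pictured as a recursive depth-first traversal of a search tree. The sketch below enumerates subsets of a small set of positive integers that sum to a target, pruning branches that overshoot; the example problem and all names are our own, chosen only to make the idea concrete.

    // Illustrative only: a minimal sequential exact search (subset-sum enumeration).
    // Each node of the search tree decides whether to include or exclude one item;
    // branches that overshoot the target are pruned (items are assumed positive).
    #include <cstdio>
    #include <vector>

    static long countSubsets(const std::vector<int>& items, std::size_t depth, int remaining) {
      if (remaining == 0) return 1;                          // exact match: one solution found
      if (remaining < 0 || depth == items.size()) return 0;  // dead end: prune this branch
      return countSubsets(items, depth + 1, remaining - items[depth])  // include items[depth]
           + countSubsets(items, depth + 1, remaining);                // exclude items[depth]
    }

    int main() {
      const std::vector<int> items{3, 5, 7, 9, 11};
      std::printf("subsets summing to 12: %ld\n", countSubsets(items, 0, 12));
      return 0;
    }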
A high-level approach to parallel search allows non-expert users to benefit from increasing core counts. Algorithmic Skeletons provide reusable implementations of common parallelism patterns that are parameterised with user code determining the specific computation, e.g. a particular search. We define a set of skeletons for exact search, requiring the user, in the minimal case, to provide a single class that specifies how the search tree is generated and a parameter that specifies the type of search required. The five skeletons are: Sequential search; three general-purpose parallel search methods: Depth-Bounded, Stack-Stealing, and Budget; and a specific parallel search method, Ordered, that guarantees replicable performance. We implement and evaluate the skeletons in a new C++ parallel search framework, YewPar. YewPar provides both high-level skeletons and low-level search-specific schedulers and utilities to deal with the irregularity of search and knowledge exchange between workers. YewPar is based on the HPX library for distributed task-parallelism, potentially allowing search to execute on multi-cores, clusters, clouds, and high performance computing systems.
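To convey the shape of the skeleton approach described above, the hypothetical sketch below refactors the same toy subset-sum search into (a) a user-supplied class that generates the children of a search-tree node and (b) a generic traversal playing the role of a Sequential skeleton. All identifiers are invented for illustration and do not reflect YewPar's actual API; the point is that in a skeleton library the generic traversal could be swapped for a parallel scheduler (e.g. depth-bounded spawning or work stealing) without changing the user class.

    // Illustrative mock of the skeleton idea; identifiers are invented for this
    // sketch and are not YewPar's actual API.
    #include <cstdio>
    #include <vector>

    // A node of the search tree: which item to decide on next, and the value still needed.
    struct SubsetNode {
      std::size_t depth;
      int remaining;
    };

    // The user-supplied part: how children of a node are generated, and what counts as a solution.
    struct SubsetGen {
      std::vector<int> items;
      std::vector<SubsetNode> children(const SubsetNode& n) const {
        if (n.remaining <= 0 || n.depth == items.size()) return {};       // solution or dead end: leaf
        return { SubsetNode{n.depth + 1, n.remaining - items[n.depth]},   // include items[n.depth]
                 SubsetNode{n.depth + 1, n.remaining} };                  // exclude items[n.depth]
      }
      bool isSolution(const SubsetNode& n) const { return n.remaining == 0; }
    };

    // A stand-in "Sequential" counting skeleton: the generic traversal lives here,
    // so a parallel variant could reuse the same user class unchanged.
    template <typename Gen, typename Node>
    long seqCount(const Gen& g, const Node& root) {
      long total = g.isSolution(root) ? 1 : 0;
      for (const auto& c : g.children(root)) total += seqCount(g, c);
      return total;
    }

    int main() {
      SubsetGen gen{{3, 5, 7, 9, 11}};
      std::printf("solutions: %ld\n", seqCount(gen, SubsetNode{0, 12}));
      return 0;
    }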
Underpinning the skeleton design is a novel formal model, MT^3, a parallel operational semantics for multi-threaded tree traversals that allows reasoning about parallel search, e.g. describing common parallel search phenomena such as performance anomalies.
YewPar is evaluated using seven different search applications (and over 25 specific instances): Maximum Clique, k-Clique, Subgraph Isomorphism, Travelling Salesperson, Binary Knapsack, Enumerating Numerical Semigroups, and the Unbalanced Tree Search Benchmark. The search instances are evaluated at multiple scales from 1 to 255 workers, on a 17-host, 272-core Beowulf cluster. The overheads of the skeletons are low, with a mean 6.1% slowdown compared to hand-coded sequential implementations. Crucially, for all search applications YewPar reduces search times by an order of magnitude, i.e. from hours/minutes to minutes/seconds, and we commonly see average parallel efficiency greater than 60% for up to 255 workers. Comparing skeleton performance reveals that no one skeleton is best for all searches, highlighting a benefit of the skeleton approach: multiple parallelisations can be explored with minimal refactoring.
The Ordered skeleton avoids slowdown anomalies where, due to search knowledge being order dependent, a parallel search takes longer than a sequential search. Analysis of Ordered shows that, while it is 41% slower on average (73% in the worst case) than Depth-Bounded, in nearly all cases it maintains the following replicable performance properties: 1) parallel executions are no slower than single-worker sequential executions, 2) runtimes do not increase as workers are added, and 3) variance between repeated runs is low. In particular, where Ordered maintains a relative standard deviation (RSD) of less than 15%, Depth-Bounded suffers from an RSD greater than 50%, showing the importance of carefully controlling search orders for repeatability.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: Algorithmic skeletons, combinatorics, parallelism.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Colleges/Schools: College of Science and Engineering > School of Computing Science
Funder's Name: Engineering and Physical Sciences Research Council (EPSRC)
Supervisor's Name: Trinder, Professor Phil, Maier, Dr. Patrick and Stewart, Dr. Rob
Date of Award: 2018
Depositing User: Mr. Blair Archibald
Unique ID: glathesis:2018-31000
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 06 Nov 2018 10:51
Last Modified: 04 Jan 2019 09:48
URI: https://theses.gla.ac.uk/id/eprint/31000
Related URLs:
