Compiler-hardware co-design in High-Level Synthesis

Szafarczyk, Robert (2025) Compiler-hardware co-design in High-Level Synthesis. PhD thesis, University of Glasgow.

Full text available as:
PDF (1MB)

Abstract

High-Level Synthesis (HLS) simplifies the hardware design process by generating specialized hardware directly from an algorithmic software description. Current HLS tools work well on regular code, but are suboptimal on irregular code with data-dependent memory accesses and control flow. This is because they follow a Finite State Machine with Datapath (FSMD) model of computation, which requires the compiler to schedule operations statically at compile time and therefore cannot adapt to runtime conditions. A Dynamic Dataflow (DDF) model of computation in HLS augments each functional unit with additional scheduling logic that enables dataflow scheduling at runtime, naturally adapting to the unpredictable behavior of irregular codes. However, the hardware generated by DDF HLS uses more area and has longer critical paths than necessary, because every operator in the circuit is scheduled dynamically, even if only a few exhibit irregular behavior.
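As a minimal illustration (not an example taken from the thesis), consider a histogram-style C++ kernel of the kind described above: the store address hist[idx[i]] and the branch on weight[i] are only known at runtime, so a statically scheduled FSMD pipeline must conservatively assume that every store may feed the next load, while a dynamically scheduled circuit can run at full throughput whenever consecutive indices do not collide.

    // Illustrative sketch, not from the thesis: an irregular kernel with a
    // data-dependent memory access and data-dependent control flow.
    void histogram(const int *idx, const float *weight, float *hist, int n) {
      for (int i = 0; i < n; ++i) {
        if (weight[i] > 0.0f) {        // data-dependent control flow
          hist[idx[i]] += weight[i];   // data-dependent load and store
        }
      }
    }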

In this thesis, we propose a closer compiler-hardware co-design to make the HLS of irregular codes more efficient. We make four significant contributions. First, we show how the FSMD computational model can be extended with DDF behavior without scheduling the entire circuit dynamically. This is achieved by letting the compiler discover the sources of irregularity that prevent efficient static scheduling and by decoupling the original code into multiple FSMD instances along those sources of irregularity. Second, we show how a compiler can automatically generate a Decoupled Access/Execute (DAE) architecture to enable efficient out-of-order dynamic memory scheduling in HLS, and how a compiler can automatically parametrize hardware structures, such as a Load-Store Queue (LSQ), to maximize throughput at minimal area cost. Third, we introduce compiler support for speculation in DAE architectures with two algorithms: one that speculates memory requests in the access program slice, and another that poisons mis-speculations in the compute slice, all without the need for mis-speculation recovery or synchronization. Finally, we show that a close compiler-hardware co-design enables new optimization opportunities by presenting dynamic loop fusion, a novel technique that fuses the execution of sibling loops at runtime by resolving inter-loop memory dependencies in a hardware structure parametrized by the compiler. To enable dynamic loop fusion, we introduce a new hardware-optimized program-order schedule inspired by polyhedral compilers, and we exploit the concept of monotonically non-decreasing address expressions, a larger class of functions than the affine expressions required by static loop fusion. Our FPGA-based experiments show that these four contributions consistently deliver at least an order of magnitude area-delay improvement over state-of-the-art HLS tools on irregular codes.
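To make the decoupled access/execute idea concrete, the following software sketch splits a simple gather kernel into an access slice and an execute slice communicating through a FIFO. The Fifo type and the slice names are illustrative assumptions, not the interfaces generated by the compiler described in the thesis, and the sketch deliberately avoids store-to-load dependencies, which in the thesis are handled by a compiler-parametrized load-store queue.

    #include <queue>

    // Hypothetical FIFO standing in for an HLS pipe/stream (assumed interface).
    template <typename T> struct Fifo {
      std::queue<T> q;
      void write(const T &v) { q.push(v); }
      T read() { T v = q.front(); q.pop(); return v; }
    };

    // Original (coupled) kernel: for (i = 0..n-1) y[i] = 2.0f * x[col[i]];

    // Access slice: owns all address generation and memory traffic.
    void access_slice(const float *x, const int *col, int n, Fifo<float> &vals) {
      for (int i = 0; i < n; ++i)
        vals.write(x[col[i]]);         // irregular, data-dependent load
    }

    // Execute slice: a regular datapath that only consumes the value stream,
    // so it can be statically scheduled with a fixed initiation interval.
    void execute_slice(float *y, int n, Fifo<float> &vals) {
      for (int i = 0; i < n; ++i)
        y[i] = 2.0f * vals.read();
    }

Only the access slice has to cope with unpredictable memory behavior; the execute slice remains a regular statically scheduled pipeline, which is the efficiency argument for decoupling only along the discovered sources of irregularity.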

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Additional Information: Supported by a scholarship from the UK Engineering and Physical Sciences Research Council.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Colleges/Schools: College of Science and Engineering > School of Computing Science
Funder's Name: Engineering and Physical Sciences Research Council (EPSRC)
Supervisor's Name: Vanderbauwhede, Professor Wim and Nabi, Dr. Syed Waqar
Date of Award: 2025
Depositing User: Theses Team
Unique ID: glathesis:2025-85229
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 20 Jun 2025 08:33
Last Modified: 20 Jun 2025 08:35
Thesis DOI: 10.5525/gla.thesis.85229
URI: https://theses.gla.ac.uk/id/eprint/85229