Haris, Jude Christudas (2025) Hardware-software co-design of FPGA-based neural network accelerators for edge inference. PhD thesis, University of Glasgow.
Full text available as: PDF (6MB)
Abstract
The demand for efficient Deep Neural Network (DNN) accelerators has grown with the increasing adoption of DNNs in applications such as image classification, speech recognition, and natural language processing. However, the computational intensity and memory requirements of DNN models make it challenging to design flexible, reconfigurable, and efficient accelerators for them. Field-Programmable Gate Arrays (FPGAs) have therefore become a popular choice for implementing DNN accelerators, as they can be reconfigured to suit the requirements of a workload and are more energy efficient than traditional general-purpose CPUs and GPUs. Even so, designing efficient accelerators with FPGAs remains difficult on resource-constrained edge devices, and this thesis focuses on overcoming those difficulties to build new efficient DNN accelerators for edge FPGAs.
First, this thesis presents the SECDA methodology (SystemC Enabled Co-design of DNN Accelerators), which enables hardware-software co-design of resource-constrained hardware accelerators for DNN inference on edge FPGAs. Building on SECDA, the SECDA-TFLite and SECDA-LLM toolkits were developed to enable quick adoption of the methodology within TensorFlow Lite and llama.cpp, two popular frameworks for DNN inference on edge devices.
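To illustrate the style of simulation-driven design SECDA builds on, below is a minimal SystemC sketch of an accelerator compute unit with a small testbench; the module, port, and signal names are illustrative assumptions for this sketch, not components from the thesis.

```cpp
#include <systemc.h>

// A single multiply-accumulate (MAC) unit, a basic building block of many
// DNN accelerators, modelled as a clocked SystemC process. Names here are
// illustrative only.
SC_MODULE(MacUnit) {
    sc_in<bool> clk;
    sc_in<int>  a, b;   // operands streamed in by the host side
    sc_out<int> acc;    // running accumulation result

    int sum = 0;

    void compute() {
        sum += a.read() * b.read();
        acc.write(sum);
    }

    SC_CTOR(MacUnit) {
        SC_METHOD(compute);
        sensitive << clk.pos();  // fire once per rising clock edge
    }
};

int sc_main(int, char*[]) {
    sc_clock clk("clk", 10, SC_NS);
    sc_signal<int> a, b, acc;

    MacUnit mac("mac");
    mac.clk(clk);
    mac.a(a);
    mac.b(b);
    mac.acc(acc);

    a.write(3);
    b.write(4);
    sc_start(45, SC_NS);  // simulate a few clock cycles
    std::cout << "acc = " << acc.read() << std::endl;
    return 0;
}
```

In a co-design flow of this kind, the same simulated module can be exercised by the real inference framework before committing to hardware synthesis, which is what makes rapid design iteration possible.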
Second, this thesis presents the design of the MM2IM architecture for accelerating Transposed Convolution (TCONV) operations within Generative Adversarial Networks (GANs) for resource-constrained edge devices. This architecture was developed utilising the SECDA methodology and the SECDA-TFLite toolkit. The MM2IM accelerator achieved an average speedup of 84× across 261 TFLite TCONV problem configurations compared to an ARM Neon-optimised CPU baseline.
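To make the TCONV computation concrete, here is a minimal scalar reference for a 1-D transposed convolution; it shows the overlapping scatter-and-accumulate pattern that makes TCONV awkward for GEMM-only accelerators and that MM2IM must map efficiently. This is a simplified sketch, not the MM2IM dataflow itself.

```cpp
#include <cstdio>
#include <vector>

// Naive 1-D transposed convolution: each input element scatters a scaled
// copy of the kernel into the output, with overlapping writes accumulated.
std::vector<float> tconv1d(const std::vector<float>& in,
                           const std::vector<float>& w, int stride) {
    size_t out_len = (in.size() - 1) * stride + w.size();
    std::vector<float> out(out_len, 0.0f);
    for (size_t i = 0; i < in.size(); ++i)
        for (size_t k = 0; k < w.size(); ++k)
            out[i * stride + k] += in[i] * w[k];
    return out;
}

int main() {
    auto out = tconv1d({1, 2, 3}, {1, 1}, 2);  // -> 1 1 2 2 3 3
    for (float v : out) std::printf("%g ", v);
    std::printf("\n");
    return 0;
}
```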
Finally, this thesis presents AXI4MLIR, an extension to the MLIR compiler framework that enables efficient host-accelerator communication by automatically generating host driver code that is aware of the accelerator architecture and capable of performing efficient data transfers. Our experiments using specialised FPGA accelerators demonstrate AXI4MLIR’s versatility across different types of accelerators and problems, showcasing significant CPU cache reference reductions (up to 56%) and up to a 1.65× speedup compared to manually optimised driver code implementations.
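For context, host driver code for a memory-mapped FPGA accelerator typically resembles the sketch below, which is the kind of code AXI4MLIR aims to generate automatically; the device path, base address, register offsets, and start/done protocol here are hypothetical placeholders, not the interfaces used in the thesis experiments.

```cpp
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical accelerator register map for illustration only.
constexpr off_t  ACCEL_BASE = 0x40000000;  // assumed AXI base address
constexpr size_t MAP_SIZE   = 0x1000;
constexpr size_t REG_CTRL   = 0x00;        // assumed control register
constexpr size_t REG_STATUS = 0x04;        // assumed status register

int main() {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) return 1;

    // Map the accelerator's control registers into user space.
    auto* regs = static_cast<volatile uint32_t*>(
        mmap(nullptr, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
             fd, ACCEL_BASE));
    if (regs == MAP_FAILED) { close(fd); return 1; }

    regs[REG_CTRL / 4] = 1;                     // start the accelerator
    while ((regs[REG_STATUS / 4] & 1) == 0) {}  // busy-wait until done

    munmap((void*)regs, MAP_SIZE);
    close(fd);
    return 0;
}
```

Writing and tuning this kind of register- and transfer-level code by hand for every accelerator and problem shape is exactly the burden AXI4MLIR removes by generating architecture-aware drivers from the compiler.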
Item Type: | Thesis (PhD)
---|---
Qualification Level: | Doctoral
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software
Colleges/Schools: | College of Science and Engineering > School of Computing Science
Supervisor's Name: | Cano Reyes, Dr. Jose
Date of Award: | 2025
Depositing User: | Theses Team
Unique ID: | glathesis:2025-85185
Copyright: | Copyright of this thesis is held by the author.
Date Deposited: | 13 Jun 2025 07:39
Last Modified: | 13 Jun 2025 07:41
Thesis DOI: | 10.5525/gla.thesis.85185
URI: | https://theses.gla.ac.uk/id/eprint/85185
Related URLs: |