Haris, Jude Christudas (2025) Hardware-software co-design of FPGA-based neural network accelerators for edge inference. PhD thesis, University of Glasgow.
Full text available as: PDF (6MB)
Abstract
The demand for efficient Deep Neural Network (DNN) accelerators has grown with the increasing adoption of DNNs in applications such as image classification, speech recognition, and natural language processing. However, the computational intensity and memory requirements of DNN models make it challenging to design flexible, reconfigurable, and efficient accelerators for them. Field-Programmable Gate Arrays (FPGAs) have therefore become a popular choice for implementing DNN accelerators, as they can be reconfigured to suit the requirements of a workload and are more energy efficient than traditional general-purpose CPUs and GPUs. Even so, designing efficient accelerators with FPGAs remains difficult on resource-constrained edge devices, and this thesis focuses on overcoming those difficulties to build new efficient DNN accelerators for edge FPGAs.
First, this thesis presents the SECDA methodology (SystemC Enabled Co-design of DNN Accelerators), which enables hardware-software co-design of resource-constrained hardware accelerators for DNN inference on edge FPGAs. Building on SECDA, the SECDA-TFLite and SECDA-LLM toolkits were developed to enable quick adoption of the methodology within TensorFlow Lite and llama.cpp, two popular frameworks for DNN inference on edge devices.
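To illustrate the style of simulation-driven design SECDA builds on, below is a minimal SystemC sketch of an accelerator compute unit with a small testbench; the module, port, and signal names are illustrative assumptions for this sketch, not components from the thesis.

```cpp
#include <systemc.h>

// A single multiply-accumulate (MAC) unit, a basic building block of many
// DNN accelerators, modelled as a clocked SystemC process. Names here are
// illustrative only.
SC_MODULE(MacUnit) {
    sc_in<bool> clk;
    sc_in<int>  a, b;   // operands streamed in by the host side
    sc_out<int> acc;    // running accumulation result

    int sum = 0;

    void compute() {
        sum += a.read() * b.read();
        acc.write(sum);
    }

    SC_CTOR(MacUnit) {
        SC_METHOD(compute);
        sensitive << clk.pos();  // fire once per rising clock edge
    }
};

int sc_main(int, char*[]) {
    sc_clock clk("clk", 10, SC_NS);
    sc_signal<int> a, b, acc;

    MacUnit mac("mac");
    mac.clk(clk);
    mac.a(a);
    mac.b(b);
    mac.acc(acc);

    a.write(3);
    b.write(4);
    sc_start(45, SC_NS);  // simulate a few clock cycles
    std::cout << "acc = " << acc.read() << std::endl;
    return 0;
}
```

In a co-design flow of this kind, the same simulated module can be exercised by the real inference framework before committing to hardware synthesis, which is what makes rapid design iteration possible.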
Second, this thesis presents the design of the MM2IM architecture for accelerating Transposed Convolution (TCONV) operations within Generative Adversarial Networks (GANs) for resource-constrained edge devices. This architecture was developed utilising the SECDA methodology and the SECDA-TFLite toolkit. The MM2IM accelerator achieved an average speedup of 84× across 261 TFLite TCONV problem configurations compared to an ARM Neon-optimised CPU baseline.
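To make the TCONV computation concrete, here is a minimal scalar reference for a 1-D transposed convolution; it shows the overlapping scatter-and-accumulate pattern that makes TCONV awkward for GEMM-only accelerators and that MM2IM must map efficiently. This is a simplified sketch, not the MM2IM dataflow itself.

```cpp
#include <cstdio>
#include <vector>

// Naive 1-D transposed convolution: each input element scatters a scaled
// copy of the kernel into the output, with overlapping writes accumulated.
std::vector<float> tconv1d(const std::vector<float>& in,
                           const std::vector<float>& w, int stride) {
    size_t out_len = (in.size() - 1) * stride + w.size();
    std::vector<float> out(out_len, 0.0f);
    for (size_t i = 0; i < in.size(); ++i)
        for (size_t k = 0; k < w.size(); ++k)
            out[i * stride + k] += in[i] * w[k];
    return out;
}

int main() {
    auto out = tconv1d({1, 2, 3}, {1, 1}, 2);  // -> 1 1 2 2 3 3
    for (float v : out) std::printf("%g ", v);
    std::printf("\n");
    return 0;
}
```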
Finally, this thesis presents AXI4MLIR, an extension to the MLIR compiler framework that enables efficient host-accelerator communication by automatically generating host driver code that is aware of the accelerator architecture and capable of performing efficient data transfers. Our experiments using specialised FPGA accelerators demonstrate AXI4MLIR’s versatility across different types of accelerators and problems, showcasing significant CPU cache reference reductions (up to 56%) and up to a 1.65× speedup compared to manually optimised driver code implementations.
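For context, host driver code for a memory-mapped FPGA accelerator typically resembles the sketch below, which is the kind of code AXI4MLIR aims to generate automatically; the device path, base address, register offsets, and start/done protocol here are hypothetical placeholders, not the interfaces used in the thesis experiments.

```cpp
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical accelerator register map for illustration only.
constexpr off_t  ACCEL_BASE = 0x40000000;  // assumed AXI base address
constexpr size_t MAP_SIZE   = 0x1000;
constexpr size_t REG_CTRL   = 0x00;        // assumed control register
constexpr size_t REG_STATUS = 0x04;        // assumed status register

int main() {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) return 1;

    // Map the accelerator's control registers into user space.
    auto* regs = static_cast<volatile uint32_t*>(
        mmap(nullptr, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
             fd, ACCEL_BASE));
    if (regs == MAP_FAILED) { close(fd); return 1; }

    regs[REG_CTRL / 4] = 1;                     // start the accelerator
    while ((regs[REG_STATUS / 4] & 1) == 0) {}  // busy-wait until done

    munmap((void*)regs, MAP_SIZE);
    close(fd);
    return 0;
}
```

Writing and tuning this kind of register- and transfer-level code by hand for every accelerator and problem shape is exactly the burden AXI4MLIR removes by generating architecture-aware drivers from the compiler.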
Item Type: | Thesis (PhD)
---|---
Qualification Level: | Doctoral
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software
Colleges/Schools: | College of Science and Engineering > School of Computing Science
Supervisor's Name: | Cano Reyes, Dr. Jose
Date of Award: | 2025
Depositing User: | Theses Team
Unique ID: | glathesis:2025-85185
Copyright: | Copyright of this thesis is held by the author.
Date Deposited: | 13 Jun 2025 07:39
Last Modified: | 13 Jun 2025 07:41
Thesis DOI: | 10.5525/gla.thesis.85185
URI: | https://theses.gla.ac.uk/id/eprint/85185
Related URLs: |