Hu, Wenhao (2026) Exploiting compression techniques for efficient edge AI. PhD thesis, University of Glasgow.
Full text available as: PDF (22MB)
Abstract
Deep Neural Network (DNN) model compression is a key technique for achieving Artificial Intelligence on edge devices (edge AI). There are three major categories of DNN model compression techniques: a) pruning, methods that remove weights from DNN models; b) quantization, methods that reduce the bit width of the weight and/or activation values of DNN models; and c) knowledge distillation, methods that transfer knowledge from a larger model to a smaller one. However, while these model compression techniques mainly focus on theoretical feasibility in providing good balances of accuracy and compression rates, their practicality in the real world is commonly ignored. The real world has many limitations, such as limited available data, time, energy, and hardware resources, which can reduce the usability of those model compression techniques.
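To make the first two categories concrete, the sketch below shows generic textbook versions of magnitude-based unstructured pruning and uniform quantization in NumPy. These are minimal illustrations of the concepts only, not implementations of the thesis's ICE-Pruning or eDQA methods; the function names and parameters are our own.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Unstructured pruning: zero out the smallest-magnitude
    fraction (`sparsity`) of the weights."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

def uniform_quantize(x, bits):
    """Uniform quantization: map values onto 2**bits evenly spaced
    levels between min and max, then dequantize back to floats."""
    levels = 2 ** bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((x - lo) / scale)   # integer code in [0, levels]
    return q * scale + lo            # dequantized approximation
```

For example, pruning a 2x2 weight matrix at 50% sparsity keeps only the two largest-magnitude entries, and 2-bit quantization collapses a tensor onto at most four distinct values; knowledge distillation, the third category, instead trains a small model to match a large model's outputs and has no comparably short sketch.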
In this thesis, we identify three practical challenges in DNN model compression for edge AI and make three contributions to address them: (1) Iterative pruning of a DNN model can be slow and inefficient. To solve this challenge, we propose ICE-Pruning, an efficient iterative structured pruning pipeline; (2) Existing DNN model activation quantization methods can be expensive and not hardware friendly, especially for edge devices. For this challenge, we propose eDQA, a deep quantization method for DNN activations; and (3) Fine-tuning pruned DNN models can be infeasible when access to labeled data is limited. For this, we propose Neural-Mimicking, a method to recover DNN accuracy after unstructured pruning without using fine-tuning.
Overall, these contributions provide significant improvements: (1) ICE-Pruning accelerates iterative pruning by up to 5.82x while maintaining accuracy; (2) eDQA shows up to 75% better accuracy compared to three existing methods, achieving up to 309x speedup on an edge device compared to a state-of-the-art method; and (3) Neural-Mimicking improves accuracy by up to 27% compared to three state-of-the-art methods and requires 88% fewer floating point operations compared to the conventional method.
These contributions advocate for bringing attention to real-world practicality (beyond only balancing accuracy and compression rates) in DNN model compression research, and can be used to optimize future DNN model compression methods for edge AI.
| Item Type: | Thesis (PhD) |
|---|---|
| Qualification Level: | Doctoral |
| Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software |
| Colleges/Schools: | College of Science and Engineering > School of Computing Science |
| Supervisor's Name: | Cano Reyes, Dr. Jose |
| Date of Award: | 2026 |
| Depositing User: | Theses Team |
| Unique ID: | glathesis:2026-85864 |
| Copyright: | Copyright of this thesis is held by the author. |
| Date Deposited: | 16 Apr 2026 10:08 |
| Last Modified: | 16 Apr 2026 10:08 |
| Thesis DOI: | 10.5525/gla.thesis.85864 |
| URI: | https://theses.gla.ac.uk/id/eprint/85864 |
| Related URLs: | |