Hu, Wenhao (2026) Exploiting compression techniques for efficient edge AI. PhD thesis, University of Glasgow.
Full text available as: PDF (22MB)
Abstract
Deep Neural Network (DNN) model compression is a key technique for achieving Artificial Intelligence on edge devices (edge AI). There are three major categories of DNN model compression techniques: a) pruning, methods that remove weights from DNN models; b) quantization, methods that reduce the bit width of the weight and/or activation values of DNN models; and c) knowledge distillation, methods that transfer knowledge from a larger model to a smaller one. However, while these model compression techniques mainly focus on theoretical feasibility in providing good balances of accuracy and compression rates, their practicality in the real world is commonly ignored. The real world has many limitations, such as limited available data, time, energy, and hardware resources, which can reduce the usability of those model compression techniques.
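To make the first two categories concrete, the sketch below shows generic textbook versions of magnitude-based unstructured pruning and uniform quantization in NumPy. These are minimal illustrations of the concepts only, not implementations of the thesis's ICE-Pruning or eDQA methods; the function names and parameters are our own.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Unstructured pruning: zero out the smallest-magnitude
    fraction (`sparsity`) of the weights."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

def uniform_quantize(x, bits):
    """Uniform quantization: map values onto 2**bits evenly spaced
    levels between min and max, then dequantize back to floats."""
    levels = 2 ** bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((x - lo) / scale)   # integer code in [0, levels]
    return q * scale + lo            # dequantized approximation
```

For example, pruning a 2x2 weight matrix at 50% sparsity keeps only the two largest-magnitude entries, and 2-bit quantization collapses a tensor onto at most four distinct values; knowledge distillation, the third category, instead trains a small model to match a large model's outputs and has no comparably short sketch.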
In this thesis, we identify three practical challenges in DNN model compression for edge AI and make three contributions to address them: (1) Iterative pruning of a DNN model can be slow and inefficient. To solve this challenge, we propose ICE-Pruning, an efficient iterative structured pruning pipeline; (2) Existing DNN model activation quantization methods can be expensive and not hardware friendly, especially for edge devices. For this challenge, we propose eDQA, a deep quantization method for DNN activations; and (3) Fine-tuning pruned DNN models can be infeasible when access to labeled data is limited. For this, we propose Neural-Mimicking, a method to recover DNN accuracy after unstructured pruning without using fine-tuning.
Overall, these contributions provide significant improvements: (1) ICE-Pruning accelerates iterative pruning by up to 5.82x while maintaining accuracy; (2) eDQA shows up to 75% better accuracy compared to three existing methods, achieving up to 309x speedup on an edge device compared to a state-of-the-art method; and (3) Neural-Mimicking improves accuracy by up to 27% compared to three state-of-the-art methods and requires 88% fewer floating point operations compared to the conventional method.
These contributions advocate for bringing attention to real-world practicality (beyond only balancing accuracy and compression rates) in DNN model compression research, and can be used to optimize future DNN model compression methods for edge AI.
| Item Type: | Thesis (PhD) |
|---|---|
| Qualification Level: | Doctoral |
| Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software |
| Colleges/Schools: | College of Science and Engineering > School of Computing Science |
| Supervisor's Name: | Cano Reyes, Dr. Jose |
| Date of Award: | 2026 |
| Depositing User: | Theses Team |
| Unique ID: | glathesis:2026-85864 |
| Copyright: | Copyright of this thesis is held by the author. |
| Date Deposited: | 16 Apr 2026 10:08 |
| Last Modified: | 16 Apr 2026 10:08 |
| Thesis DOI: | 10.5525/gla.thesis.85864 |
| URI: | https://theses.gla.ac.uk/id/eprint/85864 |
| Related URLs: | |