Gibson, Perry (2023) Compiler-centric across-stack deep learning acceleration. PhD thesis, University of Glasgow.
Full text available as: PDF Download (14MB)
Abstract
Optimizing the deployment of Deep Neural Networks (DNNs) is hard. Although deep learning approaches increasingly provide state-of-the-art solutions to a variety of difficult problems, such as computer vision and natural language processing, DNNs can be prohibitively expensive, for example in terms of inference time or memory usage. Effective exploration of the design space requires a holistic approach that spans a range of topics from machine learning, systems, and hardware. The rapid proliferation of deep learning applications has raised demand for efficient exploration and acceleration of deep learning based solutions. However, managing the range of optimization techniques, as well as how they interact with each other across the stack, is a non-trivial task. A family of emerging specialized compilers for deep learning, tensor compilers, appears to be a strong candidate to help manage the complexity of across-stack optimization choices and to enable new approaches.
This thesis presents new techniques and explorations of the Deep Learning Acceleration Stack (DLAS), with the perspective that the tensor compiler will increasingly be the center of this stack. First, we motivate the challenges in exploring DLAS by describing the experience of running a perturbation study that varies parameters at every layer of the stack. The core of the study is implemented using a tensor compiler, which reduces the complexity of evaluating the wide range of variants, although it still requires significant engineering effort to realize. Next, we develop a new algorithm for grouped convolution, a model optimization technique for which existing solutions provided poor inference time scaling. We implement and optimize our algorithm using a tensor compiler, outperforming existing approaches by 5.1× on average (arithmetic mean). Finally, we propose a technique, transfer-tuning, that reduces the search time required for automatic tensor compiler code optimization by 6.5× on average.
The techniques and contributions of this thesis across these interconnected domains demonstrate the exciting potential of tensor compilers to simplify and improve design space exploration for DNNs and their deployment. The outcomes of this thesis open new lines of research that help machine learning developers keep up with the rapidly evolving landscape of neural architectures and hardware.
| Item Type: | Thesis (PhD) | 
|---|---|
| Qualification Level: | Doctoral | 
| Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software | 
| Colleges/Schools: | College of Science and Engineering > School of Computing Science | 
| Supervisor's Name: | Cano Reyes, Dr. Jose | 
| Date of Award: | 2023 | 
| Depositing User: | Theses Team | 
| Unique ID: | glathesis:2023-83959 | 
| Copyright: | Copyright of this thesis is held by the author. | 
| Date Deposited: | 30 Nov 2023 11:57 | 
| Last Modified: | 05 Dec 2023 12:07 | 
| Thesis DOI: | 10.5525/gla.thesis.83959 | 
| URI: | https://theses.gla.ac.uk/id/eprint/83959 | 