Compiler-centric across-stack deep learning acceleration

Gibson, Perry (2023) Compiler-centric across-stack deep learning acceleration. PhD thesis, University of Glasgow.

Full text available as: PDF, 2023GibsonPhD.pdf (14MB)

Abstract

Optimizing the deployment of Deep Neural Networks (DNNs) is hard. Despite deep learning approaches increasingly providing state-of-the-art solutions to a variety of difficult problems, such as computer vision and natural language processing, DNNs can be prohibitively expensive, for example, in terms of inference time or memory usage. Effective exploration of the design space requires a holistic approach, spanning topics from machine learning, systems, and hardware. The rapid proliferation of deep learning applications has raised demand for efficient exploration and acceleration of deep learning-based solutions. However, managing the range of optimization techniques, as well as how they interact with each other across the stack, is a non-trivial task. A family of emerging specialized compilers for deep learning, tensor compilers, appears to be a strong candidate to help manage the complexity of across-stack optimization choices and to enable new approaches.

This thesis presents new techniques and explorations of the Deep Learning Acceleration Stack (DLAS), with the perspective that the tensor compiler will increasingly be the center of this stack. First, we motivate the challenges in exploring DLAS by describing the experience of running a perturbation study varying parameters at every layer of the stack. The core of the study is implemented using a tensor compiler, which reduces the complexity of evaluating the wide range of variants, although it still requires significant engineering effort to realize. Next, we develop a new algorithm for grouped convolution, a model optimization technique for which existing solutions provided poor inference-time scaling. We implement and optimize our algorithm using a tensor compiler, outperforming existing approaches by 5.1× on average (arithmetic mean). Finally, we propose transfer-tuning, a technique that reduces the search time required for automatic tensor compiler code optimization by 6.5× on average.
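(For context: grouped convolution splits the input channels into independent groups, with each filter seeing only its own group's channels, reducing compute and parameter counts. The following minimal NumPy sketch illustrates the operation's semantics only; it is not the optimized algorithm developed in the thesis.)

    import numpy as np

    def grouped_conv2d(x, w, groups):
        # Naive grouped 2D convolution (stride 1, no padding), for
        # illustration only. x: (C_in, H, W); w: (C_out, C_in//groups, KH, KW).
        c_in, h, wd = x.shape
        c_out, c_in_g, kh, kw = w.shape
        assert c_in % groups == 0 and c_out % groups == 0
        assert c_in_g == c_in // groups
        oh, ow = h - kh + 1, wd - kw + 1
        c_out_g = c_out // groups
        out = np.zeros((c_out, oh, ow), dtype=x.dtype)
        for g in range(groups):
            # Each group's filters see only their own slice of input channels.
            xg = x[g * c_in_g:(g + 1) * c_in_g]
            for f in range(g * c_out_g, (g + 1) * c_out_g):
                for i in range(oh):
                    for j in range(ow):
                        out[f, i, j] = np.sum(xg[:, i:i + kh, j:j + kw] * w[f])
        return out

With groups=1 this reduces to standard convolution; as groups grows, each filter touches proportionally fewer input channels, which is the source of the scaling behavior the thesis studies.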
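(Similarly, the transfer-tuning idea can be sketched at a high level: rather than running an expensive schedule search for every kernel, reuse schedules already tuned for similar kernels. The helper names below — signature, tune_from_scratch, apply_schedule — are hypothetical placeholders, not the thesis's or any tensor compiler's actual API.)

    def transfer_tune(kernels, signature, tune_from_scratch, apply_schedule):
        # Hypothetical sketch: tune one representative kernel per
        # signature class, then transfer (reuse) its schedule to
        # similar kernels, avoiding a fresh search for each one.
        cache = {}
        compiled = []
        for k in kernels:
            sig = signature(k)  # e.g. (operator type, loop structure)
            if sig not in cache:
                cache[sig] = tune_from_scratch(k)  # costly search, once per class
            compiled.append(apply_schedule(k, cache[sig]))
        return compiled

The search-time savings come from amortization: the costly search runs once per kernel class rather than once per kernel.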

The techniques and contributions of this thesis across these interconnected domains demonstrate the exciting potential of tensor compilers to simplify and improve design space exploration for DNNs and their deployment. The outcomes of this thesis open new lines of research that help machine learning developers keep up with the rapidly evolving landscape of neural architectures and hardware.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Colleges/Schools: College of Science and Engineering > School of Computing Science
Supervisor's Name: Cano Reyes, Dr. Jose
Date of Award: 2023
Depositing User: Theses Team
Unique ID: glathesis:2023-83959
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 30 Nov 2023 11:57
Last Modified: 05 Dec 2023 12:07
Thesis DOI: 10.5525/gla.thesis.83959
URI: https://theses.gla.ac.uk/id/eprint/83959
