Killick, George William (2025) Image classification with foveated neural networks. PhD thesis, University of Glasgow.
Full text available as: PDF (8MB)
Abstract
Foveated vision draws inspiration from the way many biological vision systems process visual information. It features space-variant resolution, concentrating high-resolution sampling in a small area known as the fovea. This approach aims to deliver the same visual acuity and field of view as uniform vision but with significantly fewer pixels. This reduction in pixel count can potentially lower the computational demands of subsequent visual processes without compromising their effectiveness in performing visual perception tasks. Despite these qualities of foveated vision, a uniform approach remains the dominant paradigm in computer vision. This thesis investigates the use of deep neural networks on foveated images, aiming to determine whether foveated vision can improve the ability of such systems to classify challenging datasets composed of natural images.
In Chapter 1, we outline the motivations for exploring foveated vision in conjunction with deep neural networks, the research gaps, and the corresponding questions that we aim to answer through this thesis. Furthermore, we provide an overview of biological vision processes, computational models of foveated vision, and the relationship between foveated vision systems and the active vision paradigm.
In Chapter 2, we explore the application of convolutional neural networks (CNNs) to foveated images. Prior works have frequently shown that foveated sampling does not improve the accuracy of CNNs. Motivated by this observation, we analyse the implications of convolutional processing of foveated images through the lens of geometric deep learning. We hypothesise that the application of CNNs to foveated images often requires imposing a suboptimal coordinate frame for representing foveated image data, inhibiting classification accuracy. We test this hypothesis through a novel graph convolution layer that allows for coordinate frames to be freely defined. We show that the classification accuracy of a foveated CNN is highly sensitive to the choice of coordinate frame.
In Chapter 3, we expand upon the studies conducted in Chapter 2 and explore foveated CNNs in the presence of visual attention to guide the sensor. We propose a two-stage approach in which a separate CNN first localises objects, informing the foveated classifier where to centre its gaze. Empirical results corroborate the previous chapter's findings on the importance of coordinate frames, and our novel graph convolution layer allows us to build a foveated CNN that significantly outperforms a uniform CNN under an equivalent pixel budget. Finally, we propose a novel foveated sensor with a parameterised sampling layout; we show how sensitive classification accuracy is to this parameterisation and find that smaller, higher-resolution foveae are favourable for sensors with fewer pixels.
In Chapter 4, we conduct studies similar to those in Chapter 3, but in the context of non-convolutional models such as vision transformers. We propose a simple reformulation of image tokenisation to a foveated setting. We also show how the sampling layout of this method can be optimised by backpropagation using only gradients from a classification loss. We show that foveated sensing can improve the classification accuracy of these models and is increasingly beneficial as the total number of pixels in the sensor decreases. Furthermore, we explore the parameterisation of the sampling layout and how the optimal configuration relates to the properties of the data itself. We show that as the range in the scale of objects increases, it becomes increasingly beneficial to have smaller, higher-resolution foveae in order to classify objects accurately at all scales.
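The core idea of foveated tokenisation can be illustrated in miniature: patches grow with eccentricity (distance from the gaze centre), yet each is resampled to a fixed token resolution so the downstream model sees uniformly shaped tokens. The sketch below is illustrative only — a 1-D toy with hypothetical function names and a simple geometric growth rule, not the thesis's actual implementation:

```python
import numpy as np

def foveated_patch_grid(fovea_size, num_rings, growth=1.5):
    """Hypothetical sketch: patch widths grow geometrically with
    eccentricity, so peripheral rings cover more image area per token."""
    return [fovea_size * growth**r for r in range(num_rings)]

def tokenise(image, centre, patch_widths, token_res=16):
    """Extract one patch per ring along the horizontal axis (1-D toy)
    and resample each to a fixed token resolution."""
    tokens = []
    x = centre
    for w in patch_widths:
        w = int(round(w))
        patch = image[:, x:x + w]
        # Nearest-neighbour resample to token_res columns: every token
        # has the same shape regardless of the area it covers.
        idx = np.linspace(0, patch.shape[1] - 1, token_res).astype(int)
        tokens.append(patch[:, idx])
        x += w
    return np.stack(tokens)
```

Because the growth rule is a smooth function of its parameters, a layout like this could in principle be tuned by gradient descent on a classification loss, which is the spirit of the optimisation described above.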
Chapter 5 explores a sequential approach for foveated vision systems, where they can repeatedly attend to an image. For each observation, feature vectors are computed using a foveated CNN, integrated into a single representation, and used as input to a classifier. Despite using only a single dedicated convolution layer to implement attention, we show that these models can perform as well as a two-stage method where a dedicated CNN is used to perform attention. Furthermore, we show that classification accuracy increases the more times the model attends to an image and that a simple averaging approach suffices for integrating information from multiple observations. Finally, we explore an architecture based on vision transformers that maintains a memory of previous observations in all hidden layers. We show that Legendre Memory Units can effectively replace self-attention and allow such a system to run in O(1) time complexity, as opposed to self-attention’s O(N) complexity, where N is the number of previous observations.
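The averaging integration step described above admits a constant-cost incremental form: rather than storing every observation's feature vector, a running mean is updated in place, so each new glimpse costs O(1) time and memory regardless of how many observations preceded it. The class and method names below are illustrative, not the thesis's code:

```python
import numpy as np

class RunningObservationMemory:
    """Hypothetical sketch: maintain a running mean of per-observation
    feature vectors as a fixed-size memory of all glimpses so far."""

    def __init__(self, dim):
        self.mean = np.zeros(dim)
        self.count = 0

    def update(self, features):
        # Incremental mean: mean += (x - mean) / n, an O(1) update
        # equivalent to averaging all observations seen so far.
        self.count += 1
        self.mean += (features - self.mean) / self.count
        return self.mean
```

A memory that is recurrent in this sense, with fixed state size per step, is what allows the transformer variant above to sidestep self-attention's O(N) dependence on the number of previous observations.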
In Chapter 6, we summarise the contributions made in this thesis in relation to the research questions we set out to answer and provide several avenues for future work that can further the field of foveated vision.
This work was supported by the Engineering and Physical Sciences Research Council, grant number 2443519, and has appeared in the following papers:
1. Killick, G., Aragon-Camarasa, G. and Siebert, J.P., 2022. Monte-Carlo Convolutions on Foveated Images. (VISAPP 2022)
2. Killick, G., Henderson, P., Siebert, P. and Aragon-Camarasa, G., 2023. Foveation in the Era of Deep Learning. (BMVC 2023)
Item Type: Thesis (PhD)
Qualification Level: Doctoral
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Colleges/Schools: College of Science and Engineering > School of Computing Science
Funder's Name: Engineering and Physical Sciences Research Council (EPSRC)
Supervisor's Name: Aragon Camarasa, Dr. Gerardo and Siebert, Dr. Paul
Date of Award: 2025
Depositing User: Theses Team
Unique ID: glathesis:2025-85208
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 18 Jun 2025 10:26
Last Modified: 18 Jun 2025 10:28
Thesis DOI: 10.5525/gla.thesis.85208
URI: https://theses.gla.ac.uk/id/eprint/85208