GaitTriViT and GaitVViT: transformer-based methods emphasizing spatial or temporal aspects in Gait Recognition

Sheng, Hongyun (2024) GaitTriViT and GaitVViT: transformer-based methods emphasizing spatial or temporal aspects in Gait Recognition. MSc(R) thesis, University of Glasgow.

Abstract

In image recognition tasks, identifying subjects at long distance and low resolution remains a challenge, whereas Gait Recognition, which identifies subjects by their walking patterns, is considered one of the most promising biometric technologies due to its stability and efficiency. Previous Gait Recognition methods mostly focused on constructing sophisticated model structures to better extract spatial and temporal features from frame sequences, aiming to increase the distinctiveness between different feature representations for better model performance during evaluation. Moreover, these methods were primarily based on traditional Convolutional Neural Networks (CNNs) due to the dominance of CNNs in Computer Vision.

However, since the Vision Transformer, a variant of the Transformer architecture that was originally widely applied in Natural Language Processing (NLP), was introduced into the Computer Vision field, it has attracted strong attention for its outstanding performance in various tasks. Thus, unlike previous methods based mainly on CNNs, this project introduces two Transformer-based methods: GaitTriViT, a gait recognition method based entirely on the Vision Transformer, and GaitVViT, a method based on the Video Vision Transformer. GaitTriViT leverages the Vision Transformer to obtain more fine-grained spatial features, while GaitVViT enhances the capacity for temporal feature extraction. This work evaluates their performance on two of the most popular benchmarks. The results show gaps that still exist, as well as several encouraging cases in which the methods outperform the current State-of-the-Art (SOTA), demonstrating the difficulties and challenges that these Transformer-based methods will continue to encounter. Nevertheless, I still believe in the promising future of Vision Transformers in the field of Gait Recognition.

Item Type: Thesis (MSc(R))
Qualification Level: Masters
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Colleges/Schools: College of Science and Engineering > School of Computing Science
Supervisor's Name: Mahmoud, Dr. Marwa
Date of Award: 2024
Depositing User: Theses Team
Unique ID: glathesis:2024-84475
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 22 Jul 2024 15:46
Last Modified: 22 Jul 2024 15:46
Thesis DOI: 10.5525/gla.thesis.84475
URI: https://theses.gla.ac.uk/id/eprint/84475
