Interpreting intermediate feature representations of raw-waveform deep CNNs by sonification

Yadav, Sarthak (2022) Interpreting intermediate feature representations of raw-waveform deep CNNs by sonification. MSc(R) thesis, University of Glasgow.

Most recent work on the interpretability of raw-waveform deep neural networks (DNNs) for audio processing focuses on spectral and frequency-response information, is often limited to visual and signal-theoretic means of interpretation, and considers only the first layer. This work proposes sonification, a method for interpreting intermediate feature representations of sound event recognition (SER) 1D convolutional neural networks (1D-CNNs) trained on raw waveforms. Sonification maps these representations back into the discrete-time input signal domain, highlighting the substructures in the input that maximally activate a feature map as intelligible acoustic events. Sonification is used to compare supervised and contrastive self-supervised feature representations, and the latter are observed to learn more acoustically discernible representations, especially in the deeper layers. A metric that quantifies the acoustic similarity between the interpretations and their corresponding inputs is also proposed, and a layer-by-layer analysis of the trained feature representations using this metric supports these observations.
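
The idea of mapping a feature-map activation back into the input signal domain can be illustrated with a toy example. The sketch below is not the method developed in the thesis: it assumes a single hypothetical 1D convolutional filter with a ReLU, finds the strongest activation in its feature map, and keeps only the input samples inside that activation's receptive field. The names conv1d_valid and sonify_top_activation are invented for this illustration.

```python
import numpy as np

def conv1d_valid(x, w, stride=1):
    """Cross-correlate signal x with kernel w, without padding."""
    k = len(w)
    n_out = (len(x) - k) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + k], w) for i in range(n_out)])

def sonify_top_activation(x, w, stride=1):
    """Toy stand-in for sonification (an assumption, not the thesis's method).

    Returns a signal of the same length as x that is zero everywhere except
    in the receptive field of the strongest activation, plus that
    activation's index in the feature map.
    """
    feat = np.maximum(conv1d_valid(x, w, stride), 0.0)  # ReLU feature map
    t = int(np.argmax(feat))                            # strongest activation
    start = t * stride                                  # first input sample it sees
    out = np.zeros_like(x)
    out[start:start + len(w)] = x[start:start + len(w)]
    return out, t

# A silent signal with one short event that matches the filter.
x = np.zeros(16)
x[5:8] = [1.0, -1.0, 1.0]
w = np.array([1.0, -1.0, 1.0])

out, t = sonify_top_activation(x, w)
print(t, out)
```

In a deep network the receptive field of a unit spans many layers, so a real implementation would need to propagate the activation back through every intermediate layer rather than slice the input directly as this single-layer sketch does.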

Item Type: Thesis (MSc(R))
Qualification Level: Masters
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Colleges/Schools: College of Science and Engineering > School of Computing Science
Supervisor's Name: Foster, Dr. Mary Ellen
Date of Award: 2022
Depositing User: Theses Team
Unique ID: glathesis:2022-82820
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 20 Apr 2022 10:48
Last Modified: 20 Apr 2022 10:50
Thesis DOI: 10.5525/gla.thesis.82820
