Speech-based automatic depression detection via biomarkers identification and artificial intelligence approaches

Tao, Fuxiang (2024) Speech-based automatic depression detection via biomarkers identification and artificial intelligence approaches. PhD thesis, University of Glasgow.

Full text available as:
[thumbnail of 2023TaoPhD.pdf] PDF
Download (4MB)


Depression has become one of the most prevalent mental health issues, affecting more than 300 million people all over the world. However, due to factors such as limited medical resources and accessibility to health care, there are still a large number of patients undiagnosed. In addition, the traditional approaches to depression diagnosis have limitations because they are usually time-consuming, and depend on clinical experience that varies across different clinicians. From this perspective, the use of automatic depression detection can make the diagnosis process much faster and more accessible. In this thesis, we present the possibility of using speech for automatic depression detection. This is based on the findings in neuroscience that depressed patients have abnormal cognition mechanisms thus leading to the speech differs from that of healthy people.

Therefore, in this thesis, we show two ways of benefiting from automatic depression detection, i.e., identifying speech markers of depression and constructing novel deep learning models to improve detection accuracy.
The identification of speech markers tries to capture measurable depression traces left in speech. From this perspective, speech markers such as speech duration, pauses and correlation matrices are proposed. Speech duration and pauses take speech fluency into account, while correlation matrices represent the relationship between acoustic features and aim at capturing psychomotor retardation in depressed patients. Experimental results demonstrate that these proposed markers are effective at improving the performance in recognizing depressed speakers. In addition, such markers show statistically significant differences between depressed patients and non-depressed individuals, which explains the possibility of using these markers for depression detection and further confirms that depression leaves detectable traces in speech.

In addition to the above, we propose an attention mechanism, Multi-local Attention (MLA), to emphasize depression-relevant information locally. Then we analyse the effectiveness of MLA on performance and efficiency. According to the experimental results, such a model can significantly improve performance and confidence in the detection while reducing the time required for recognition. Furthermore, we propose Cross-Data Multilevel Attention (CDMA) to emphasize different types of depression-relevant information, i.e., specific to each type of speech and common to both, by using multiple attention mechanisms. Experimental results demonstrate that the proposed model is effective to integrate different types of depression-relevant information in speech, improving the performance significantly for depression detection.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Subjects: R Medicine > RC Internal medicine > RC0321 Neuroscience. Biological psychiatry. Neuropsychiatry
T Technology > TK Electrical engineering. Electronics Nuclear engineering
Colleges/Schools: College of Science and Engineering > School of Engineering
Supervisor's Name: Vinciarelli, Professor Alessandro
Date of Award: 2024
Depositing User: Theses Team
Unique ID: glathesis:2024-84055
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 05 Feb 2024 08:50
Last Modified: 05 Feb 2024 08:52
Thesis DOI: 10.5525/gla.thesis.84055
URI: https://theses.gla.ac.uk/id/eprint/84055
Related URLs:

Actions (login required)

View Item View Item


Downloads per month over past year