When a few words are not enough: improving text classification through contextual information

Aloshban, Nujud (2021) When a few words are not enough: improving text classification through contextual information. PhD thesis, University of Glasgow.

Abstract

Traditional text classification approaches may be ineffective when applied to texts with an insufficient or limited number of words, owing to the brevity of the text and the sparsity of the feature space. The lack of contextual information can make texts ambiguous; hence, text classification approaches relying solely on words may not properly capture the critical features of a real-world problem. One popular approach to overcoming this problem is to enrich texts with additional domain-specific features. This thesis shows how this can be done in two real-world problems in which text information alone is insufficient for classification: depression detection based on the automatic analysis of clinical interviews, and the detection of fake online news.

Depression profoundly affects how people behave, perceive, and interact. Language reveals our ideas, moods, feelings, beliefs, behaviours and personalities. However, because of inherent variations in the speech system, no single cue is sufficiently discriminative as a sign of depression on its own. This means that language alone may not be adequate for understanding a person’s mental characteristics and states, and that adding contextual information can represent the critical features of texts more properly. Speech includes both linguistic content (what people say) and acoustic aspects (how words are said), which provide important clues about the speaker’s emotional, physiological and mental characteristics. Therefore, we study the possibility of effectively detecting depression using unobtrusive and inexpensive technologies based on the automatic analysis of language (what you say) and speech (how you say it).

In fake news, people seem to use their cognitive abilities to hide information, which induces behavioural change and thereby alters their writing style and word choices. The spread of such false claims has polluted the web. However, these claims are relatively short and include limited content, so capturing only the text features of a claim will not provide sufficient information to detect deception. Evidence articles can help support a factual claim by representing its central content more authentically. Therefore, we propose an automated credibility assessment approach based on linguistic analysis of the claim and its evidence articles.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Colleges/Schools: College of Science and Engineering > School of Computing Science
Supervisor's Name: Vinciarelli, Prof. Alessandro
Date of Award: 2021
Depositing User: Theses Team
Unique ID: glathesis:2021-82571
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 23 Nov 2021 15:31
Last Modified: 08 Apr 2022 17:08
Thesis DOI: 10.5525/gla.thesis.82571
URI: http://theses.gla.ac.uk/id/eprint/82571