Automatic role recognition

Salamin, Hugues Eric (2013) Automatic role recognition. PhD thesis, University of Glasgow.


Abstract

The computing community is making significant efforts towards the development of automatic approaches for the analysis of social interactions. The way people interact depends on the context, but there is one aspect that all social interactions seem to have in common: humans behave according to roles. Therefore, recognizing the roles of participants is an essential step towards understanding social interactions and constructing socially aware computers. This thesis addresses the problem of automatically recognizing the roles of participants in multi-party recordings. The objective is to assign a role to each participant. All the proposed approaches follow a similar strategy. They start by segmenting the audio into turns, which serve as the basic units of analysis. The next step is to extract features that account for the organization of turns. The more sophisticated approaches extend these features with features drawn from either prosody or semantics. Finally, people or turns are mapped to roles using statistical models. The goal of this thesis is to gain a better understanding of role recognition, and it investigates three aspects that can influence the performance of the system: the impact of modelling the dependency between roles; the contribution of different modalities to the effectiveness of role recognition approaches; and the effectiveness of the approach in different scenarios. Three models are proposed and tested on three different corpora totalling more than 90 hours of audio. The first contribution of this thesis is to investigate the combination of turn-taking features and semantic information for role recognition, improving the accuracy of role recognition from a baseline of 46.4% to 67.9% on the AMI meeting corpus. The second contribution is to use features extracted from prosody to assign roles. The performance of this model is 89.7% on broadcast news and 87.0% on talk-shows. Finally, the third contribution is the development of a model robust to changes in the social setting. This model achieved an accuracy of 86.7% on a database composed of a mixture of broadcast news and talk-shows.
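
The turn-based strategy outlined in the abstract (segment the audio into turns, then derive features from how the turns are organized) can be illustrated with a minimal Python sketch. It assumes the segmentation has already been done and is available as a list of (speaker, start, end) records. The `Turn` type, the `turn_taking_features` function, and the three statistics it computes are hypothetical choices made for illustration; the abstract does not specify the thesis's actual feature set.

```python
from collections import defaultdict
from typing import NamedTuple


class Turn(NamedTuple):
    """One speaker turn; a hypothetical record type, not taken from the thesis."""
    speaker: str
    start: float  # seconds
    end: float    # seconds


def turn_taking_features(turns: list[Turn]) -> dict[str, dict[str, float]]:
    """Compute simple per-speaker turn-taking statistics.

    The feature set here is an illustrative assumption, not the thesis's.
    """
    total_time = sum(t.end - t.start for t in turns)
    if not turns or total_time <= 0:
        return {}  # nothing to analyse, and avoids dividing by zero

    counts: dict[str, int] = defaultdict(int)
    durations: dict[str, float] = defaultdict(float)
    for t in turns:
        counts[t.speaker] += 1
        durations[t.speaker] += t.end - t.start

    return {
        spk: {
            # share of all turns taken by this speaker
            "turn_fraction": counts[spk] / len(turns),
            # share of the total speaking time held by this speaker
            "time_fraction": durations[spk] / total_time,
            # average length of this speaker's turns, in seconds
            "mean_turn_duration": durations[spk] / counts[spk],
        }
        for spk in counts
    }


# Hypothetical toy recording: three speakers, five turns, 30 seconds of speech.
turns = [
    Turn("A", 0.0, 10.0),
    Turn("B", 10.0, 14.0),
    Turn("A", 14.0, 20.0),
    Turn("C", 20.0, 22.0),
    Turn("A", 22.0, 30.0),
]
features = turn_taking_features(turns)
```

In this toy example speaker A takes three of the five turns and holds 24 of the 30 seconds. A per-speaker vector of this kind is what a statistical model would then map to a role; the choice of model is left open here.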

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: Social Signal Processing, Automatic Role Recognition
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Colleges/Schools: College of Science and Engineering > School of Computing Science
Funder's Name: UNSPECIFIED
Supervisor's Name: Vinciarelli, Dr. Alessandro
Date of Award: 2013
Depositing User: Mr Hugues Salamin
Unique ID: glathesis:2013-4367
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 11 Jun 2013 08:34
Last Modified: 11 Jun 2013 08:34
URI: http://theses.gla.ac.uk/id/eprint/4367
