Investigating the neural mechanisms underlying audio-visual perception using electroencephalography (EEG)

Boyle, Stephanie Claire (2018) Investigating the neural mechanisms underlying audio-visual perception using electroencephalography (EEG). PhD thesis, University of Glasgow.

Available under License Creative Commons Attribution.

Traditionally, research into how we perceive our external world focused on the unisensory approach, examining how information is processed by one sense at a time. This produced a vast literature of results revealing how our brains process information from the different senses, from fields such as psychophysics, animal electrophysiology, and neuroimaging. However, we know from our own experiences that we use more than one sense at a time to understand our external world. Therefore, to fully understand perception, we must understand not only how the brain processes information from individual sensory modalities, but also how and when this information interacts and combines with information from other modalities. In short, we need to understand the phenomenon of multisensory perception.

This thesis describes three experiments that aimed to provide new insights into this topic. Specifically, the three experiments focused on examining when and where effects related to multisensory perception emerged in neural signals, and whether or not these effects could be related to behaviour in a time-resolved way and on a trial-by-trial basis. These experiments were carried out using a novel combination of psychophysics, high-density electroencephalography (EEG), and advanced computational methods (linear discriminant analysis and mutual information analysis).

Experiment 1 (Chapter 3) investigated how behavioural and neural signals are modulated by the reliability of sensory information. Previous work has shown that subjects will weight sensory cues in proportion to their relative reliabilities; high-reliability cues are assigned a higher weight and have more influence on the final perceptual estimate, while low-reliability cues are assigned a lower weight and have less influence. Despite this widespread finding, it remains unclear when neural correlates of sensory reliability emerge during a trial, and whether or not modulations in neural signals due to reliability relate to modulations in behavioural reweighting. To investigate these questions, we used a combination of psychophysics, EEG-based neuroimaging, single-trial decoding, and regression modelling. Subjects performed an audio-visual rate discrimination task where the modality (auditory, visual, audio-visual), stimulus stream rate (8 to 14 Hz), visual reliability (high/low), and congruency in rate between audio-visual stimuli (± 2 Hz) were systematically manipulated. For the behavioural and EEG components (derived using linear discriminant analysis), a set of perceptual and neural weights was calculated for each time point. The behavioural results revealed that participants weighted sensory information based on reliability: as visual reliability decreased, auditory weighting increased. These modulations in perceptual weights emerged early after stimulus onset (48 ms). The EEG data revealed that neural correlates of sensory reliability and perceptual weighting were also evident in decoding signals, and that these occurred surprisingly early in the trial (84 ms). Finally, source localisation suggested that these correlates originated in early sensory (occipital/temporal) and parietal regions, respectively.
Overall, these results provide the first insights into the temporal dynamics underlying human cue weighting in the brain, and suggest that it is an early, dynamic, and distributed process.
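
The reliability weighting described for Experiment 1 is commonly formalised as inverse-variance weighting, where each cue's weight is proportional to the reciprocal of its noise variance. The sketch below illustrates that textbook formulation only; it is not taken from the thesis, and the function names and the example noise values are hypothetical.

```python
def reliability_weights(sigma_auditory, sigma_visual):
    """Return (auditory weight, visual weight) under inverse-variance weighting.

    Assumption: each cue's reliability is 1 / sigma**2, where sigma is the
    standard deviation of that cue's sensory noise.
    """
    reliability_a = 1.0 / sigma_auditory ** 2
    reliability_v = 1.0 / sigma_visual ** 2
    weight_a = reliability_a / (reliability_a + reliability_v)
    return weight_a, 1.0 - weight_a


def combined_estimate(rate_auditory, rate_visual, sigma_auditory, sigma_visual):
    """Weighted average of the two single-cue rate estimates (in Hz)."""
    weight_a, weight_v = reliability_weights(sigma_auditory, sigma_visual)
    return weight_a * rate_auditory + weight_v * rate_visual


# Hypothetical incongruent trial: auditory stream at 10 Hz, visual at 12 Hz.
# Doubling the visual noise (lower visual reliability) shifts the weight
# towards audition, so the combined estimate moves towards 10 Hz.
high_reliability = combined_estimate(10.0, 12.0, sigma_auditory=1.0, sigma_visual=1.0)
low_reliability = combined_estimate(10.0, 12.0, sigma_auditory=1.0, sigma_visual=2.0)
```

Under these assumed values, lowering visual reliability raises the auditory weight, which matches the direction of the behavioural reweighting reported above.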

Experiment 2 (Chapter 4) expanded on this work by investigating how oscillatory power was modulated by the reliability of sensory information. To this end, we used a time-frequency approach to analyse the data collected for the work in Chapter 3. Our results showed that significant effects in the theta and alpha bands over fronto-central regions occurred during the same early time windows as a shift in perceptual weighting (100 ms and 250 ms, respectively). Specifically, we found that theta power (4–6 Hz) was lower and alpha power (10–12 Hz) was higher in audio-visual conditions where visual reliability was low, relative to conditions where visual reliability was high. These results suggest that changes in oscillatory power may underlie reliability-based cue weighting in the brain, and that these changes occur early during the sensory integration process.
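
To make the notion of band-limited power concrete, the following is a minimal sketch that averages spectral power within a frequency band using a plain discrete Fourier transform. It is a simplification: the thesis used a time-frequency approach whose details are not given here, whereas this example computes power over a whole segment. The function name, sampling rate, and synthetic signals are assumptions for illustration.

```python
import cmath
import math


def band_power(signal, sampling_rate, f_low, f_high):
    """Mean spectral power of `signal` over DFT bins in [f_low, f_high] Hz."""
    n = len(signal)
    powers = []
    for k in range(n // 2 + 1):
        frequency = k * sampling_rate / n
        if f_low <= frequency <= f_high:
            # DFT coefficient for bin k, computed directly from its definition.
            coefficient = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)
            )
            powers.append(abs(coefficient) ** 2 / n ** 2)
    if not powers:
        raise ValueError("no DFT bins fall inside the requested band")
    return sum(powers) / len(powers)


# Synthetic 2 s segments sampled at 100 Hz (hypothetical values).
fs = 100
times = [t / fs for t in range(2 * fs)]
theta_like = [math.sin(2 * math.pi * 5 * t) for t in times]    # 5 Hz oscillation
alpha_like = [math.sin(2 * math.pi * 11 * t) for t in times]   # 11 Hz oscillation

theta_power = band_power(theta_like, fs, 4, 6)
alpha_power = band_power(theta_like, fs, 10, 12)
```

In practice, a windowed or wavelet-based transform would be used to obtain power at each time point, so that effects can be localised to windows such as those reported above.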

Finally, Experiment 3 (Chapter 5) moved away from examining reliability-based cue weighting and focused on investigating cases where spatially and temporally incongruent auditory and visual cues interact to affect behaviour. These interactions are known collectively as “cross-modal associations”, and past work has shown that observers have preferred and non-preferred stimulus pairings. For example, subjects will frequently pair high-pitched tones with small objects and low-pitched tones with large objects. However, it is still unclear when and where these associations are reflected in neural signals, and whether they emerge at an early perceptual level or a later decisional level. To investigate these questions, we used a modified version of the implicit association test (IAT) to examine the modulation of behavioural and neural signals underlying an auditory pitch – visual size cross-modal association. Congruency was manipulated by assigning two stimuli (one auditory and one visual) to each of the left or right response keys and changing this assignment across blocks to create congruent (left key: high tone – small circle, right key: low tone – large circle) and incongruent (left key: low tone – small circle, right key: high tone – large circle) pairings of stimuli. On each trial, subjects were presented with only one of the four stimuli (auditory high tone, auditory low tone, visual small circle, visual large circle), and asked to indicate which was presented as quickly and accurately as possible. The key assumption with such a design is that subjects should respond faster when associated (i.e. congruent) stimuli are assigned to the same response key than when two non-associated stimuli are. In line with this, our behavioural results demonstrated that subjects responded faster on blocks where congruent pairings of stimuli were assigned to the response keys (high pitch – small circle and low pitch – large circle) than on blocks where incongruent pairings were.
The EEG results demonstrated that information about auditory pitch and visual size could be extracted from neural signals using two approaches to single-trial analysis (linear discriminant analysis and mutual information analysis) early during the trial (50 ms), with the strongest information contained over posterior and temporal electrodes for auditory trials, and posterior electrodes for visual trials. EEG components related to auditory pitch were significantly modulated by cross-modal congruency over temporal and frontal regions early in the trial (~100 ms), while EEG components related to visual size were modulated later (~220 ms) over frontal and temporal electrodes. For the auditory trials, these EEG components were significantly predictive of single-trial reaction times, yet for the visual trials the components were not. As a result, the data support an early and short-latency origin of cross-modal associations, and suggest that these may originate in a bottom-up manner during early sensory processing rather than from high-level inference processes. Importantly, the findings were consistent across both analysis methods, suggesting these effects are robust.
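
As an illustration of what a mutual information analysis quantifies, the sketch below computes a simple plug-in estimate of the mutual information (in bits) between a discrete stimulus label and a discretised neural response. The estimator actually used in the thesis is not specified in this abstract, so this is only an assumed, simplified form; the function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information_bits(stimuli, responses):
    """Plug-in mutual information between two equal-length discrete sequences."""
    if len(stimuli) != len(responses):
        raise ValueError("sequences must have the same length")
    n = len(stimuli)
    joint_counts = Counter(zip(stimuli, responses))
    stimulus_counts = Counter(stimuli)
    response_counts = Counter(responses)

    information = 0.0
    for (s, r), count in joint_counts.items():
        p_joint = count / n
        p_independent = (stimulus_counts[s] / n) * (response_counts[r] / n)
        information += p_joint * math.log2(p_joint / p_independent)
    return information


# Toy single-trial data: stimulus label (0 = low tone, 1 = high tone) and a
# binarised EEG amplitude (0 = below median, 1 = above median).
tones = [0, 0, 1, 1]
informative_response = [0, 0, 1, 1]     # response fully determined by the tone
uninformative_response = [0, 1, 0, 1]   # response independent of the tone

mi_informative = mutual_information_bits(tones, informative_response)
mi_uninformative = mutual_information_bits(tones, uninformative_response)
```

With real EEG data, the plug-in estimate is biased upwards for small trial counts, so a bias correction or permutation baseline would normally be applied before interpreting the values.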

To summarise, the results across all three experiments showed that it is possible to extract meaningful, single-trial information from the EEG signal and relate it to behaviour on a time-resolved basis. As a result, the work presented here steps beyond previous studies to provide new insights into the temporal dynamics of audio-visual perception in the brain. All experiments, although employing different paradigms and investigating different processes, showed early neural correlates related to audio-visual perception emerging in neural signals across early sensory, parietal, and frontal regions. Together, these results provide support for the prevailing modern view that the entire cortex is essentially multisensory and that multisensory effects can emerge at all stages during the perceptual process.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: multisensory, audio-visual, EEG, perception, multivariate analysis, single-trial decoding.
Subjects: B Philosophy. Psychology. Religion > BF Psychology
H Social Sciences > H Social Sciences (General)
Q Science > Q Science (General)
Colleges/Schools: College of Medical Veterinary and Life Sciences > School of Psychology & Neuroscience
Supervisor's Name: Kayser, Prof. Christoph and Schyns, Prof. Philippe
Date of Award: 2018
Unique ID: glathesis:2018-8874
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 22 Mar 2018 10:04
Last Modified: 08 Dec 2023 16:12
