Real-time event detection using Twitter

McMinn, Andrew James (2018) Real-time event detection using Twitter. PhD thesis, University of Glasgow.

Full text available as: PDF (2018McMinnPhD.pdf, 899kB)
Printed Thesis Information: https://eleanor.lib.gla.ac.uk/record=b3335434

Abstract

Twitter has become the social network of news and journalism. Monitoring what is said on Twitter is a frequent task for anyone who requires timely access to information: journalists, traders, and the emergency services have all invested heavily in monitoring Twitter in recent years. Given this, there is a need to develop systems that can automatically monitor Twitter to detect real-world events as they happen, and alert users to novel events. However, this is not an easy task due to the noise and volume of data produced by social media streams such as Twitter. Although a range of approaches has been developed, many have not been evaluated, cannot scale beyond low-volume streams, or can detect only specific types of event.

In this thesis, we develop novel approaches to event detection, and enable the evaluation and comparison of event detection approaches by creating a large-scale test collection called Events 2012, which contains 120 million tweets and relevance judgements for over 500 events. We use existing event detection approaches and Wikipedia to generate candidate events, then use crowdsourcing to gather annotations.

We propose a novel entity-based, real-time event detection approach that we evaluate using the Events 2012 collection, and show that it outperforms existing state-of-the-art approaches to event detection whilst also being scalable. We examine and compare automated and crowdsourced evaluation methodologies for the evaluation of event detection.
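The abstract does not describe how the entity-based approach works, so the following is only a minimal, generic sketch of one common idea behind entity-based event detection: flag a named entity when its mention count in the latest time window rises well above its own recent history. It assumes entities have already been extracted from each tweet upstream; the class `EntityBurstDetector` and its parameters (`window_seconds`, `history_windows`, `k`, `min_count`) are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


class EntityBurstDetector:
    """Hypothetical sketch (not the thesis method): flags entities whose
    mention count in a closed time window is far above their recent history."""

    def __init__(self, window_seconds=60, history_windows=10, k=3.0, min_count=5):
        self.window_seconds = window_seconds
        self.k = k                      # how many "spreads" above baseline counts as a burst
        self.min_count = min_count      # ignore entities with very few mentions
        self.current_window = None      # index of the window currently being filled
        self.counts = defaultdict(int)  # entity -> mentions in the current window
        # entity -> counts from the most recent closed windows
        self.history = defaultdict(lambda: deque(maxlen=history_windows))

    def _close_window(self):
        """Score every known entity for the window being closed, then reset counts."""
        bursts = []
        # Entities absent from this window are scored (and recorded) with a count of 0.
        for entity in set(self.history) | set(self.counts):
            count = self.counts.get(entity, 0)
            past = self.history[entity]
            baseline = mean(past) if past else 0.0
            # Floor the spread at 1.0 so a perfectly flat history cannot divide the signal.
            spread = max(pstdev(past), 1.0) if len(past) > 1 else 1.0
            if count >= self.min_count and count > baseline + self.k * spread:
                bursts.append((self.current_window, entity, count))
            past.append(count)
        self.counts = defaultdict(int)
        return sorted(bursts)

    def process(self, timestamp, entities):
        """Add one tweet's entities; return bursts from any windows this tweet closes."""
        window = int(timestamp // self.window_seconds)
        bursts = []
        if self.current_window is None:
            self.current_window = window
        # Close every window that has ended, including empty ones.
        while window > self.current_window:
            bursts.extend(self._close_window())
            self.current_window += 1
        for entity in entities:
            self.counts[entity] += 1
        return bursts
```

Design note: empty windows are recorded as zero counts, so an entity that goes quiet and then spikes is still compared against its real recent baseline. The per-entity history is never pruned, which a system consuming a full Twitter stream would need to address.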

Finally, we propose a Newsworthiness score that is learned in real-time from heuristically labelled data. The score is able to accurately classify individual tweets as newsworthy or noise in real-time. We adapt the score for use as a feature for event detection, and find that it can easily be used to filter out noisy clusters and improve existing event detection techniques.
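The abstract does not give the Newsworthiness formula either. As a hedged illustration of scoring tweets with a model learned online from heuristically labelled data, the sketch below swaps in a generic technique: a smoothed two-class term model whose average per-token log-likelihood ratio is positive for tweets resembling the "news" class and negative for those resembling "noise". The labelling heuristic is assumed to run upstream, and `IncrementalNewsworthiness`, `tokenize`, `filter_clusters`, `alpha` and `threshold` are hypothetical names, not the thesis's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase alphanumeric runs, keeping # and @."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


class IncrementalNewsworthiness:
    """Hypothetical online two-class term model (a generic stand-in, not the
    thesis's score): score > 0 leans 'news', score < 0 leans 'noise'."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant
        self.counts = {"news": Counter(), "noise": Counter()}
        self.totals = {"news": 0, "noise": 0}
        self.vocab = set()

    def update(self, text, label):
        """Add one heuristically labelled tweet; label is 'news' or 'noise'."""
        tokens = tokenize(text)
        self.counts[label].update(tokens)
        self.totals[label] += len(tokens)
        self.vocab.update(tokens)

    def _log_prob(self, term, label):
        # Smoothed log P(term | label) over the vocabulary seen so far.
        v = max(len(self.vocab), 1)
        numerator = self.counts[label][term] + self.alpha
        return math.log(numerator / (self.totals[label] + self.alpha * v))

    def score(self, text):
        """Mean per-token log-likelihood ratio of 'news' versus 'noise'."""
        tokens = tokenize(text)
        if not tokens:
            return 0.0
        ratio = sum(self._log_prob(t, "news") - self._log_prob(t, "noise") for t in tokens)
        return ratio / len(tokens)


def filter_clusters(model, clusters, threshold=0.0):
    """Keep clusters (lists of tweet texts) whose mean tweet score exceeds threshold."""
    kept = []
    for cluster in clusters:
        scores = [model.score(t) for t in cluster]
        if scores and sum(scores) / len(scores) > threshold:
            kept.append(cluster)
    return kept
```

Because `update` only increments counts, the model can be refreshed continuously as newly labelled tweets arrive, and `filter_clusters` shows one way a per-tweet score might be averaged to discard noisy clusters.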

We conclude with a summary of our research findings and answers to our research questions. We discuss some of the difficulties that remain to be solved in event detection on Twitter and propose some possible future directions for research into real-time event detection on Twitter.

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Colleges/Schools: College of Science and Engineering > School of Computing Science
Funder's Name: Engineering and Physical Sciences Research Council (EPSRC)
Supervisor's Name: Jose, Professor Joemon
Date of Award: 2018
Depositing User: Dr Andrew James McMinn
Unique ID: glathesis:2018-38990
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 08 Jan 2019 14:35
Last Modified: 11 Feb 2019 11:46
URI: https://theses.gla.ac.uk/id/eprint/38990
