Detecting worm mutations using machine learning

Sharma, Oliver (2008) Detecting worm mutations using machine learning. PhD thesis, University of Glasgow.

Worms are malicious programs that spread over the Internet without human intervention. Since worms generally spread faster than humans can respond, the only viable defence is to automate their detection.

Network intrusion detection systems typically detect worms by examining packet or flow logs for known signatures. Not only does this approach mean that new worms cannot be detected until the corresponding signatures are created, but also that mutations of known worms will remain undetected, because each mutation usually has a different signature. The intuitive and seemingly most effective solution is to write more generic signatures, but this has been found to increase false alarm rates and is therefore impractical.
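
To illustrate why exact signatures miss mutations, the following minimal Python sketch treats a signature as a fixed byte string and checks for it by substring search. The byte strings are hypothetical placeholders, not taken from any real worm or intrusion detection rule set, and real systems use richer rule languages than a single substring test.

```python
def matches_signature(payload: bytes, signature: bytes) -> bool:
    """Return True if the payload contains the signature as an exact byte substring."""
    return signature in payload

# Hypothetical placeholder bytes, not a real worm signature.
signature = b"ABCD1234"
original = b"header|ABCD1234|trailer"
mutant = b"header|ABCD1Z34|trailer"  # a single byte changed ("2" -> "Z")

print(matches_signature(original, signature))  # True
print(matches_signature(mutant, signature))    # False
```

A one-byte change is enough to defeat the exact match, while the mutant still shares almost all of its content with the original.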

This dissertation investigates the feasibility of using machine learning to detect mutations of known worms automatically. The first part asks whether Support Vector Machines can detect such mutations.
Support Vector Machines have been shown to be well suited to pattern recognition tasks such as text categorisation and handwritten digit recognition. Since detecting worms is effectively a pattern recognition problem, this work evaluates how well Support Vector Machines perform at it.

The second part of this dissertation compares Support Vector Machines to other machine learning techniques in detecting worm mutations.
Gaussian Processes, unlike Support Vector Machines, automatically return confidence values as part of their result. Since confidence values can be used to reduce false alarm rates, this dissertation determines how Gaussian Processes compare to Support Vector Machines in terms of detection accuracy. For further comparison, this work also compares Support Vector Machines to K-nearest neighbours, a technique known for its simplicity and solid results in other domains.
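
A minimal sketch of K-nearest-neighbour classification, assuming (for illustration only) that each payload is represented by its byte bi-gram counts, that distance is Euclidean, and that the label is chosen by majority vote among the k closest training examples. The helper names, the choice of k = 3 and the toy training payloads are hypothetical; the abstract does not specify the thesis's exact K-nearest-neighbours configuration.

```python
import math
from collections import Counter

def bigram_counts(payload: bytes) -> Counter:
    """Count each pair of adjacent bytes in the payload."""
    return Counter(zip(payload, payload[1:]))

def distance(a: Counter, b: Counter) -> float:
    """Euclidean distance between two sparse bi-gram count vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))

def knn_predict(train: list[tuple[Counter, str]], query: Counter, k: int = 3) -> str:
    """Label the query by majority vote among its k nearest training examples."""
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy training set (placeholder bytes).
train = [
    (bigram_counts(b"ABCD1234ABCD1234"), "worm"),
    (bigram_counts(b"ABCD1234ABCE1234"), "worm"),
    (bigram_counts(b"hello world, hello"), "benign"),
    (bigram_counts(b"the quick brown fox"), "benign"),
]
query = bigram_counts(b"ABCD1Z34ABCD1234")  # a mutated variant
print(knn_predict(train, query))  # worm
```

Here the mutated query lies closest to the two worm examples, so two of its three nearest neighbours vote "worm". Unlike a Gaussian Process, this procedure returns only a label, with no built-in confidence value.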

The third part of this dissertation investigates the automatic generation of training data. Classifier accuracy depends on good-quality training data: the wider the range of examples the training data covers, the higher the classifier's accuracy.
This dissertation describes the design and implementation of a worm mutation generator whose output is fed to the machine learning techniques as training data. It then evaluates whether this generated data can train classifiers of sufficiently high quality to detect worm mutations.
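
One simple way to generate mutated training examples is random byte substitution. The sketch below is a hypothetical stand-in, not the generator described in the thesis (whose mutation operators are not given in this abstract). It copies a seed payload, overwrites a fixed number of randomly chosen positions, and labels each variant as a worm. The function names and the seed payload are illustrative assumptions, and the payload is harmless placeholder bytes.

```python
import random

def mutate(payload: bytes, n_substitutions: int, rng: random.Random) -> bytes:
    """Return a copy of payload with n_substitutions distinct positions overwritten."""
    data = bytearray(payload)
    for pos in rng.sample(range(len(data)), n_substitutions):
        # Pick a value different from the current byte so every substitution is a real change.
        data[pos] = rng.choice([v for v in range(256) if v != data[pos]])
    return bytes(data)

def generate_training_set(seed_payload: bytes, count: int,
                          n_substitutions: int = 2, seed: int = 0) -> list[tuple[bytes, str]]:
    """Produce `count` labelled mutants of seed_payload (hypothetical generator)."""
    rng = random.Random(seed)  # seeded so the generated set is reproducible
    return [(mutate(seed_payload, n_substitutions, rng), "worm") for _ in range(count)]

seed_payload = b"ABCD1234ABCD1234"  # harmless placeholder bytes
for variant, label in generate_training_set(seed_payload, count=3):
    print(label, variant)
```

Each variant differs from the seed in exactly `n_substitutions` bytes; a richer generator would also vary length, ordering or encoding to widen the training data spectrum.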

The findings of this work demonstrate that Support Vector Machines can be used to detect worm mutations, and that the optimal configuration for detection of worm mutations is to use a linear kernel with unnormalised bi-gram frequency counts. Moreover, the results show that Gaussian Processes and Support Vector Machines exhibit similar accuracy on average in detecting worm mutations, while K-nearest neighbours consistently produces lower quality predictions. The generated worm mutations are shown to be of sufficiently high quality to serve as training data.
Combined, the results demonstrate that machine learning is capable of accurately detecting mutations of known worms.
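
The configuration reported as optimal, a linear kernel over unnormalised bi-gram frequency counts, can be sketched as follows. This assumes byte-level bi-grams (the abstract does not say whether bi-grams are taken over bytes or some other token); "unnormalised" is read here as raw occurrence counts, not divided by payload length. The linear kernel is then the dot product of two count vectors, computed only over the bi-grams the two payloads share.

```python
from collections import Counter

def bigram_counts(payload: bytes) -> Counter:
    """Unnormalised byte bi-gram frequency counts (raw occurrence counts)."""
    return Counter(zip(payload, payload[1:]))

def linear_kernel(a: Counter, b: Counter) -> int:
    """Linear kernel K(x, y) = <x, y>; bi-grams absent from b contribute 0."""
    return sum(count * b[bg] for bg, count in a.items())

x = bigram_counts(b"ABAB")  # AB occurs twice, BA once
y = bigram_counts(b"ABC")   # AB once, BC once
print(linear_kernel(x, y))  # 2
print(linear_kernel(x, x))  # 5
```

Because the kernel is linear, a trained Support Vector Machine effectively assigns one weight per bi-gram, and the raw counts let frequently repeated byte pairs carry proportionally more weight.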

Item Type: Thesis (PhD)
Qualification Level: Doctoral
Keywords: Worms, worm mutations, machine learning, network intrusion detection
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Colleges/Schools: College of Science and Engineering > School of Computing Science
Supervisor's Name: Sventek, Prof Joseph
Date of Award: 2008
Depositing User: Mr Oliver Sharma
Unique ID: glathesis:2008-469
Copyright: Copyright of this thesis is held by the author.
Date Deposited: 12 Nov 2008
Last Modified: 10 Dec 2012 13:18
