Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 
WelshClear Cookie - decide language by browser settings

Getting to the root of the problem: A detailed comparison of kernel and user level data for dynamic malware analysis

Nunes, Matthew ORCID: https://orcid.org/0000-0003-1990-5814, Burnap, Peter ORCID: https://orcid.org/0000-0003-0396-633X, Rana, Omer ORCID: https://orcid.org/0000-0003-3597-2646, Reinecke, Philipp ORCID: https://orcid.org/0000-0002-2411-0891 and Lloyd, Kaelon 2019. Getting to the root of the problem: A detailed comparison of kernel and user level data for dynamic malware analysis. Journal of Information Security and Applications 48 , 102365. 10.1016/j.jisa.2019.102365

[thumbnail of 1-s2.0-S2214212619300109-main.pdf]
Preview
PDF - Published Version
Available under License Creative Commons Attribution.

Download (1MB) | Preview

Abstract

Dynamic malware analysis is fast gaining popularity over static analysis since it is not easily defeated by evasion tactics such as obfuscation and polymorphism. During dynamic analysis it is common practice to capture the system calls that are made to better understand the behaviour of malware. There are several techniques to capture system calls, the most popular of which is a user-level hook. To study the effects of collecting system calls at different privilege levels and viewpoints, we collected data at a process-specific user-level using a virtualised sandbox environment and a system-wide kernel-level using a custom-built kernel driver. We then tested the performance of several state-of-the-art machine learning classifiers on the data. Random Forest was the best performing classifier with an accuracy of 95.2% for the kernel driver and 94.0% at a user-level. The combination of user and kernel level data gave the best classification results with an accuracy of 96.0% for Random Forest. This may seem intuitive but was hitherto not empirically demonstrated. Additionally, we observed that machine learning algorithms trained on data from the user-level tended to use the anti-debug/anti-vm features in malware to distinguish it from benignware. Whereas, when trained on data from our kernel driver, machine learning algorithms seemed to use the differences in the general behaviour of the system to make their prediction, which explains why they complement each other so well. Our results show that capturing data at different privilege levels will affect the classifier's ability to detect malware, with kernel-level providing more utility than user-level for malware classification. Despite this, there exist more established user-level tools than kernel-level tools, suggesting more research effort should be directed at kernel-level. In short, this paper provides the first objective, evidence-based comparison of user and kernel level data for the purposes of malware classification.

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Publisher: Elsevier
ISSN: 2214-2126
Funders: EPSRC
Date of First Compliant Deposit: 30 July 2019
Date of Acceptance: 26 July 2019
Last Modified: 05 Jan 2024 06:55
URI: https://orca.cardiff.ac.uk/id/eprint/124535

Citation Data

Cited 8 times in Scopus. View in Scopus. Powered By Scopus® Data

Actions (repository staff only)

Edit Item Edit Item

Downloads

Downloads per month over past year

View more statistics