System Call Processing Using Lightweight NLP for IoT Behavioral Malware Detection

by John Carter, Spiros Mancoridis, Malvin Nkomo, Steven Weber and Kapil R. Dandekar

Abstract:

Although much of the work in behaviorally detecting malware lies in collecting the best explanatory data and using the most efficacious machine learning models, the processing of the data can sometimes prove to be the most important step in the data pipeline. In this work, we collect kernel-level system calls on a resource-constrained Internet of Things (IoT) device, apply lightweight Natural Language Processing (NLP) techniques to the data, and feed this processed data to two simple machine learning classification models: Logistic Regression (LR) and a Neural Network (NN). For the data processing, we group the system calls into n-grams that are sorted by the timestamp in which they are recorded. To demonstrate the effectiveness, or lack thereof, of using n-grams, we deploy two types of malware onto the IoT device: a Denial-of-Service (DoS) attack, and an Advanced Persistent Threat (APT) malware. We examine the effects of using lightweight NLP on malware like the DoS and the stealthy APT malware. For stealthier malware, such as the APT, using more advanced, but far more resource-intensive, NLP techniques will likely increase detection capability, which is saved for future work.
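The n-gram step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `(timestamp, syscall_name)` trace format, the function names `syscall_ngrams` and `ngram_counts`, and the sample trace are all assumptions made for the example. The resulting counts are the kind of feature that could be passed to a simple classifier such as logistic regression.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed trace format: (timestamp, syscall_name) pairs.
# This layout is hypothetical and not taken from the paper.
Event = Tuple[float, str]


def syscall_ngrams(events: Iterable[Event], n: int = 2) -> List[Tuple[str, ...]]:
    """Sort events by timestamp, then return sliding-window n-grams of syscall names."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # Order the calls by the time at which they were recorded.
    ordered = [name for _, name in sorted(events, key=lambda e: e[0])]
    # Slide a window of length n over the ordered sequence.
    return [tuple(ordered[i:i + n]) for i in range(len(ordered) - n + 1)]


def ngram_counts(events: Iterable[Event], n: int = 2) -> Counter:
    """Count how often each n-gram occurs (a bag-of-n-grams feature)."""
    return Counter(syscall_ngrams(events, n))


# Illustrative, made-up trace; the events are deliberately out of order.
trace = [(0.03, "read"), (0.01, "open"), (0.02, "read"), (0.04, "close")]
counts = ngram_counts(trace, n=2)
```

Sorting before windowing matters because n-grams encode call order; an unsorted trace would yield sequences that never occurred on the device.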

Reference:

System Call Processing Using Lightweight NLP for IoT Behavioral Malware Detection (John Carter, Spiros Mancoridis, Malvin Nkomo, Steven Weber and Kapil R. Dandekar), In Ubiquitous Security (Wang, Guojun, Choo, Kim-Kwang Raymond, Wu, Jie, Damiani, Ernesto, eds.), Springer Nature Singapore, pp. 103-115, 2023.

Bibtex Entry:

@InProceedings{10.1007/978-981-99-0272-9_7,
author="John Carter and Spiros Mancoridis and Malvin Nkomo and Steven Weber and Kapil R. Dandekar",
editor="Wang, Guojun
and Choo, Kim-Kwang Raymond
and Wu, Jie
and Damiani, Ernesto",
title="System Call Processing Using Lightweight {NLP} for {IoT} Behavioral Malware Detection",
booktitle="Ubiquitous Security",
year="2023",
doi={10.1007/978-981-99-0272-9_7},
publisher="Springer Nature Singapore",
address="Singapore",
pages="103--115",
abstract="Although much of the work in behaviorally detecting malware lies in collecting the best explanatory data and using the most efficacious machine learning models, the processing of the data can sometimes prove to be the most important step in the data pipeline. In this work, we collect kernel-level system calls on a resource-constrained Internet of Things (IoT) device, apply lightweight Natural Language Processing (NLP) techniques to the data, and feed this processed data to two simple machine learning classification models: Logistic Regression (LR) and a Neural Network (NN). For the data processing, we group the system calls into n-grams that are sorted by the timestamp in which they are recorded. To demonstrate the effectiveness, or lack thereof, of using n-grams, we deploy two types of malware onto the IoT device: a Denial-of-Service (DoS) attack, and an Advanced Persistent Threat (APT) malware. We examine the effects of using lightweight NLP on malware like the DoS and the stealthy APT malware. For stealthier malware, such as the APT, using more advanced, but far more resource-intensive, NLP techniques will likely increase detection capability, which is saved for future work.",
isbn="978-981-99-0272-9"
}