This repository contains the code developed and used for a master's thesis at DTU Compute in 2018.
Professor Ole Winther supervised the thesis.
Alex Omø Agerholm from Napatech co-supervised the project.
In this thesis we examined and evaluated different ways of classifying encrypted network traffic with neural networks. For this purpose we created a dataset with a streaming/non-streaming focus. The dataset comprises seven classes: five streaming and two non-streaming. The thesis serves as a preliminary proof of concept for Napatech A/S.
We propose a novel approach that uses the unencrypted parts of network traffic, namely the headers. The initial headers of a session are concatenated to form a single signature datapoint, as illustrated in the accompanying figure.
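The concatenation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from this repository: the function name `session_signature`, the 54-byte header length (Ethernet, IPv4 and TCP headers without options), the zero-padding of short sessions and the scaling of bytes to [0, 1] are all hypothetical choices made for the example.

```python
from typing import List, Sequence

# Assumption: bytes kept per packet (Ethernet 14 + IPv4 20 + TCP 20, no options).
HEADER_LEN = 54


def session_signature(headers: Sequence[bytes], num_headers: int = 8,
                      header_len: int = HEADER_LEN) -> List[float]:
    """Concatenate the first `num_headers` packet headers of one session.

    Each header is truncated or zero-padded to `header_len` bytes, and
    sessions with fewer packets are zero-padded (an assumption), so every
    signature has exactly num_headers * header_len features in [0, 1].
    """
    signature = bytearray()
    for i in range(num_headers):
        raw = headers[i] if i < len(headers) else b""
        signature += raw[:header_len].ljust(header_len, b"\x00")
    return [byte / 255.0 for byte in signature]


# Hypothetical session with three captured packets of dummy header bytes.
session = [bytes([i + 1] * HEADER_LEN) for i in range(3)]
sig = session_signature(session, num_headers=8)
print(len(sig))  # 8 headers x 54 bytes = 432 features
```

Padding every session to the same length keeps the input dimension fixed, which is what a fully connected network needs; with 16 headers the same function would yield 864 features.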
The datasets created from the first 8 and the first 16 headers are available in the datasets folder of this repository. We explored the data by running t-SNE on the concatenated-header datasets. The t-SNE plot below, which shows all the individual datasets merged, suggests that classifying the individual classes is feasible.
In experiments with the header-based approach we achieve very promising results: a simple neural network with a single hidden layer of fewer than 50 units predicts the individual classes with an accuracy of 96.4% and a per-class AUC of 0.99 to 1.00, as shown in the following figures.
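To make the size of such a model concrete, the following sketch runs the forward pass of a network with one hidden layer in plain Python. It is not the thesis implementation: the weights are random and untrained, and the input size (432), the hidden width (40), the ReLU activation and all function names are assumptions chosen only to match the description above.

```python
import math
import random
from typing import List

NUM_FEATURES = 432  # assumption: 8 headers x 54 bytes each
NUM_HIDDEN = 40     # assumption: any width below 50 fits the description
NUM_CLASSES = 7     # five streaming and two non-streaming classes

Matrix = List[List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    """Return an n_out x (n_in + 1) weight matrix; the last column is the bias."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def dense(weights: Matrix, x: List[float]) -> List[float]:
    """Affine transform: weighted sum of the inputs plus the bias term."""
    # zip stops at len(x), so the bias column is excluded from the sum.
    return [sum(w * v for w, v in zip(row, x)) + row[-1] for row in weights]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over the class logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x: List[float], w_hidden: Matrix, w_out: Matrix) -> List[float]:
    """Forward pass: input -> hidden layer (ReLU, assumed) -> class probabilities."""
    hidden = [max(0.0, h) for h in dense(w_hidden, x)]
    return softmax(dense(w_out, hidden))


rng = random.Random(0)
w_hidden = init_layer(NUM_FEATURES, NUM_HIDDEN, rng)  # untrained weights
w_out = init_layer(NUM_HIDDEN, NUM_CLASSES, rng)

x = [rng.random() for _ in range(NUM_FEATURES)]  # stand-in for one signature
probs = predict_proba(x, w_hidden, w_out)
predicted_class = max(range(NUM_CLASSES), key=probs.__getitem__)
print(predicted_class, len(probs))
```

With these assumed sizes the model has 433 x 40 + 41 x 7 = 17,607 parameters, which illustrates how small a classifier of this kind is; training and the reported metrics are outside the scope of this sketch.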
The thesis thus provides a solution for classifying network traffic using only the unencrypted headers.