The 3ntr0py from 3nthusia5t

3ntr0py's Introduction

Utilizing CUDA + Numba to calculate entropy.

Entropy is normally calculated with the solution below. The Numba + CUDA solution is around 10% faster than this for a single file and up to 300 times faster for multiple files (on my equipment, an NVIDIA 3060).

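import numpy as np
# Import under an alias so the function below does not shadow it and call itself.
from scipy.stats import entropy as scipy_entropy

def entropy(labels, base=2):
  # Interpret the raw bytes as an array of unsigned 8-bit values.
  labels = np.frombuffer(labels, dtype=np.uint8)

  # Count how often each byte value occurs; scipy normalizes the counts itself.
  _, counts = np.unique(labels, return_counts=True)
  return scipy_entropy(counts, base=base)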
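
For reference, the quantity computed by the snippet above is the Shannon entropy of the byte values, in bits per byte. The symbols below are notation introduced here for illustration: c_i is the number of occurrences of byte value i, and N is the total number of bytes.

```latex
H = -\sum_{i=0}^{255} p_i \log_2 p_i ,
\qquad p_i = \frac{c_i}{N},
\qquad N = \sum_{i=0}^{255} c_i
```

Byte values that never occur (c_i = 0) contribute nothing to the sum, so H ranges from 0 (a single repeated byte) to 8 (all 256 values equally likely).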

Still in development

Goal

Quickly calculate the entropy of over 200k malware samples (110 GB) without using any CPU multiprocessing. Processing all 200k samples took 10522.09 seconds (about 2.9 hours). The malware was stored on network-attached storage, which greatly reduced I/O performance.

With CPU multiprocessing I was able to maximize the use of my computer's resources and process the 110 GB in around 3600 seconds (1 hour). The data was stored on network-attached storage.

Testing

Currently, tests cannot be run on GitHub Actions because no NVIDIA GPU is available there. If possible, I will set up a self-hosted runner in the future.

Remarks

The code is not yet optimized or cleaned up.

3ntr0py's Issues

Make sure that it works for differential entropy

The code should also be able to calculate differential entropy on the GPU.

Create a GitHub Actions workflow to test code

The tests should be executed in the workflow.

Upload samples, benchmark

Create samples directory
Create benchmark comparing to some common solutions and various numbers of files.

Create tests

Create tests to ensure that entropy is estimated properly.
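
One possible starting point for such tests, sketched under assumptions: compare any entropy implementation against inputs whose entropy is known exactly. The names reference_entropy, KNOWN_CASES and check_entropy_function are hypothetical and not part of the project; the GPU function would be passed in as entropy_fn.

```python
import math
from collections import Counter


def reference_entropy(data: bytes) -> float:
    """CPU reference: Shannon entropy of a byte string, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    probabilities = [count / total for count in Counter(data).values()]
    return -sum(p * math.log2(p) for p in probabilities)


# Inputs whose entropy is known exactly (hypothetical test fixtures).
KNOWN_CASES = [
    (b"\x00" * 1024, 0.0),         # one repeated byte carries no information
    (b"ab" * 512, 1.0),            # two equally likely values: 1 bit
    (bytes(range(256)) * 4, 8.0),  # all 256 values equally likely: 8 bits
]


def check_entropy_function(entropy_fn, tolerance=1e-6):
    """Raise AssertionError if entropy_fn disagrees with a known value."""
    for data, expected in KNOWN_CASES:
        actual = entropy_fn(data)
        assert math.isclose(actual, expected, abs_tol=tolerance), (expected, actual)


# The CPU reference itself must pass the check.
check_entropy_function(reference_entropy)
```

The tolerance is an assumed value, chosen loosely so that a GPU implementation working in lower precision is not rejected for rounding differences.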

Write proper sum_array function instead of using np.sum()

The idea behind sum_array is to keep the number of sequential operations to a minimum.

Example illustrating the idea:

[ 1, 1, 1, 1, 0, 0]

will be changed to

sum [1, 1] - thread 0
sum [1, 1] - thread 1
sum [0, 0] - thread 2

Those calculations result in [2, 2, 0], which is then reduced again:

sum [2, 2] - thread 0
sum [0] - thread 1

Result: [4, 0]

sum [4, 0] - thread 0 which results in 4

Conclusion:

In the best case, this would double the performance of NumPy's sum() function, which sums the array sequentially. The CUDA solution can be slower for small arrays, such as one with 3 elements, because of overhead (for example, copying memory between host and device). Despite that, I think it may benefit performance, since the goal is to process file samples, which are at least 97 bytes (http://www.phreedom.org/research/tinype/).
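
The pairing scheme from the example can be sketched on the CPU in plain Python. This is only an illustration of the order of operations, not the planned CUDA kernel: the function name pairwise_sum is hypothetical, and the list comprehension stands in for the threads that would each add one pair in parallel on the GPU.

```python
def pairwise_sum(values):
    """Sum a sequence by repeatedly adding adjacent pairs.

    Returns the total and the intermediate array after each pass.
    """
    data = list(values)
    if not data:
        return 0, []
    passes = []
    while len(data) > 1:
        if len(data) % 2:
            # Odd length: pad with a zero so the last element has a partner.
            data = data + [0]
        # One "thread" per pair: thread k adds elements 2k and 2k + 1.
        data = [data[i] + data[i + 1] for i in range(0, len(data), 2)]
        passes.append(data)
    return data[0], passes


total, passes = pairwise_sum([1, 1, 1, 1, 0, 0])
print(total)   # 4
print(passes)  # [[2, 2, 0], [4, 0], [4]]
```

Each pass halves the array, so n elements need about log2(n) sequential steps instead of n - 1, provided enough threads are available to process all pairs of a pass at once.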
