netsharecmu / netshare
(SIGCOMM '22) Practical GAN-based Synthetic IP Header Trace Generation using NetShare
Home Page: https://www.pcapshare.com/
License: BSD 3-Clause Clear License
I'm trying to train the model on a small test dataset with naive differential privacy. I changed a few settings in the configuration, but the generated output is empty.
Here is the changed configuration:
{
  "global_config": {
    "original_data_file": "../traces/simple-network/Switch1_Ethernet1_to_PC1_Ethernet0-correct/raw.pcap",
    "dataset_type": "pcap",
    "n_chunks": 1,
    "dp": true
  },
  "model_manager": {
    "class": "NetShareManager",
    "config": {
      "pretrain_non_dp": false,
      "pretrain_non_dp_reduce_time": null,
      "pretrain_dp": false
    }
  },
  "model": {
    "class": "DoppelGANgerTFModel",
    "config": {
      "batch_size": 1,
      "sample_len": [1],
      "iteration": 80000,
      "extra_checkpoint_freq": 4000,
      "epoch_checkpoint_freq": 1000,
      "gen_feature_num_layers": 1,
      "gen_feature_num_units": 100,
      "gen_attribute_num_layers": 1,
      "gen_attribute_num_units": 32,
      "disc_num_layers": 1,
      "disc_num_units": 32,
      "attr_disc_num_layers": 1,
      "attr_disc_num_units": 32,
      "dp_noise_multiplier": 0.2797,
      "dp_l2_norm_clip": 1.0
    }
  },
  "default": "pcap.json"
}
I suspect this is partly because I set pretrain_dp=false. But if I set pretrain_dp=true, I am asked to provide a model pretrained on a public dataset.
When I tried to run NetShare in a Docker container, I realized it has an implicit dependency on libpcap (needed by pcap2csv.so). I installed the missing library with:
sudo apt install libpcap-dev
Maybe it would be helpful to add this dependency to the README file?
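A quick way to check from inside the container whether libpcap is visible (a small sketch using only the Python standard library, not part of NetShare):
from ctypes.util import find_library

# Returns something like "libpcap.so.1" when the library is installed, None otherwise.
if find_library("pcap") is None:
    print("libpcap not found; install it, e.g. `sudo apt install libpcap-dev`")
else:
    print("libpcap found:", find_library("pcap"))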
Hi! I'm following the README.md in the new_dataset branch, and after pip3 install -e ., I encountered the following error. Is this a known issue, or is there an additional dependency I need to install? Thanks!
ERROR: Command errored out with exit status 1:
command: /Users/dorothyko/opt/anaconda3/envs/NetShare/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-install-0xh9kpj3/dm-tree_8e1c2948f02b4e8dbaffa8699b00febb/setup.py'"'"'; __file__='"'"'/private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-install-0xh9kpj3/dm-tree_8e1c2948f02b4e8dbaffa8699b00febb/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-record-t1fpvnsw/install-record.txt --single-version-externally-managed --compile --install-headers /Users/dorothyko/opt/anaconda3/envs/NetShare/include/python3.6m/dm-tree
cwd: /private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-install-0xh9kpj3/dm-tree_8e1c2948f02b4e8dbaffa8699b00febb/
Complete output (57 lines):
running install
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-3.6
creating build/lib.macosx-10.9-x86_64-3.6/tree
copying tree/sequence.py -> build/lib.macosx-10.9-x86_64-3.6/tree
copying tree/__init__.py -> build/lib.macosx-10.9-x86_64-3.6/tree
copying tree/tree_test.py -> build/lib.macosx-10.9-x86_64-3.6/tree
copying tree/tree_benchmark.py -> build/lib.macosx-10.9-x86_64-3.6/tree
running build_ext
Traceback (most recent call last):
File "/private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-install-0xh9kpj3/dm-tree_8e1c2948f02b4e8dbaffa8699b00febb/setup.py", line 77, in _check_build_environment
subprocess.check_call(['cmake', '--version'])
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/subprocess.py", line 306, in check_call
retcode = call(*popenargs, **kwargs)
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/subprocess.py", line 287, in call
with Popen(*popenargs, **kwargs) as p:
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/subprocess.py", line 1364, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'cmake': 'cmake'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-install-0xh9kpj3/dm-tree_8e1c2948f02b4e8dbaffa8699b00febb/setup.py", line 155, in <module>
keywords='tree nest flatten',
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/distutils/core.py", line 148, in setup
dist.run_commands()
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/site-packages/setuptools/command/install.py", line 61, in run
return orig.install.run(self)
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/distutils/command/install.py", line 545, in run
self.run_command('build')
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-install-0xh9kpj3/dm-tree_8e1c2948f02b4e8dbaffa8699b00febb/setup.py", line 70, in run
self._check_build_environment()
File "/private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-install-0xh9kpj3/dm-tree_8e1c2948f02b4e8dbaffa8699b00febb/setup.py", line 82, in _check_build_environment
) from e
RuntimeError: CMake must be installed to build the following extensions: _tree
----------------------------------------
ERROR: Command errored out with exit status 1: /Users/dorothyko/opt/anaconda3/envs/NetShare/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-install-0xh9kpj3/dm-tree_8e1c2948f02b4e8dbaffa8699b00febb/setup.py'"'"'; __file__='"'"'/private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-install-0xh9kpj3/dm-tree_8e1c2948f02b4e8dbaffa8699b00febb/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-record-t1fpvnsw/install-record.txt --single-version-externally-managed --compile --install-headers /Users/dorothyko/opt/anaconda3/envs/NetShare/include/python3.6m/dm-tree Check the logs for full command output. ```
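For context, the failing check in dm-tree's setup.py is just a call to cmake --version (see the traceback above). A minimal sketch to verify cmake is reachable before retrying the install; installing CMake via conda, Homebrew, or apt is my suggestion, not something the README states:
import shutil
import subprocess

# Mirror the check that dm-tree's setup.py performs before building _tree.
if shutil.which("cmake") is None:
    print("cmake is not on PATH; install it first (e.g. conda install cmake)")
else:
    subprocess.check_call(["cmake", "--version"])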
I have just followed the instructions and run the script driver.py. Here is the error message:
Traceback (most recent call last):
File "/home/runwei/NetShare/netshare/models/model.py", line 27, in train
log_folder=log_folder)
File "/home/runwei/NetShare/netshare/models/doppelganger_tf_model.py", line 176, in _train
gan.build()
File "/home/runwei/NetShare/netshare/models/doppelganger_tf/doppelganger.py", line 293, in build
self.build_loss()
File "/home/runwei/NetShare/netshare/models/doppelganger_tf/doppelganger.py", line 708, in build_loss
self.g_loss, var_list=self.generator.trainable_vars
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/optimizer.py", line 413, in minimize
name=name)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/optimizer.py", line 597, in apply_gradients
self._create_slots(var_list)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/adam.py", line 131, in _create_slots
self._zeros_slot(v, "m", self._name)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/optimizer.py", line 1156, in _zeros_slot
new_slot_variable = slot_creator.create_zeros_slot(var, op_name)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py", line 190, in create_zeros_slot
colocate_with_primary=colocate_with_primary)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py", line 164, in create_slot_with_initializer
dtype)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py", line 74, in _create_slot_var
validate_shape=validate_shape)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/ops/variable_scope.py", line 1500, in get_variable
aggregation=aggregation)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/ops/variable_scope.py", line 1243, in get_variable
aggregation=aggregation)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/ops/variable_scope.py", line 567, in get_variable
aggregation=aggregation)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/ops/variable_scope.py", line 519, in _true_getter
aggregation=aggregation)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/ops/variable_scope.py", line 868, in _get_single_variable
(err_msg, "".join(traceback.format_list(tb))))
ValueError: Variable DoppelGANgerGenerator/attribute_real/layer0/linear/matrix/Adam/ already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/lv_scratch/scratch/zirul13/trans/NetShare/netshare/pre_post_processors/netshare/word2vec_embedding.py", line 31, in custom_load
return CustomUnpickler(f).load()
File "mtrand.pyx", line 180, in numpy.random.mtrand.RandomState.init
TypeError: init() takes at most 1 positional argument (2 given)
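Not a NetShare-specific fix, just a sketch of the usual TF1 pattern the error message hints at: the duplicate ".../Adam" slot variables typically appear when gan.build() runs twice in the same default graph, so rebuilding in a clean graph avoids the collision (whether that is the actual trigger here is an assumption on my part):
import tensorflow as tf

def rebuild_gan(build_fn):
    # Discard variables left over from a previous build attempt in this process,
    # then build again in a fresh default graph.
    tf.compat.v1.reset_default_graph()
    build_fn()  # e.g. lambda: gan.build()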
Could you share the configuration file and driver code for the Wikipedia web-traffic dataset?
Hello,
I'm wondering whether you rename the attributes of the .csv dataset in your preprocessing step.
For CIDDS, the program netshare_pre_post_processor.py seems to look up columns named 'td' and 'ts'. Neither of these attributes is present in the original .csv.
For UGR'16, the .csv provided by the team has no column names; did you name them yourselves? If yes, how (which name corresponds to which column number)?
Could you please give a quick explanation of all the attributes/columns your program expects, so one can adapt a dataset accordingly?
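For reference, a sketch of how one could attach names to the header-less UGR'16 CSV before preprocessing; the column names below are my own guesses, not names confirmed by your code:
import pandas as pd

# Guessed nfdump-style column names for the UGR'16 CSV -- please correct them
# to whatever NetsharePrePostProcessor actually expects.
cols = ["ts", "td", "srcip", "dstip", "srcport", "dstport",
        "proto", "flag", "fwd", "stos", "pkt", "byt"]
df = pd.read_csv("../traces/ugr16/raw.csv", header=None, names=cols)
df.to_csv("../traces/ugr16/raw_named.csv", index=False)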
When I was running examples/driver.py using generator = Generator(config="pcap/config_example_pcap_nodp.json"), an error occurred:
Traceback (most recent call last):
File "driver.py", line 10, in <module>
generator = Generator(config="pcap/config_example_pcap_nodp.json")
File "/home/xinyu/NetShare/netshare/generators/generator.py", line 39, in __init__
self._overwrite = global_config['overwrite']
File "/home/xinyu/anaconda3/envs/NetShare/lib/python3.6/site-packages/config_io/config.py", line 12, in __mi ssing__
raise AttributeError(key)
AttributeError: overwrite
After some debugging, I found it's likely because the pcap example config file examples/config_example_pcap_nodp.json is missing a "default" argument, which makes config_io fail to read the default settings.
The config file currently is:
{
  "global_config": {
    "original_data_file": "../traces/caida/raw.pcap",
    "dataset_type": "pcap",
    "n_chunks": 10,
    "dp": false
  }
}
It should be
{
  "global_config": {
    "original_data_file": "../traces/caida/raw.pcap",
    "dataset_type": "pcap",
    "n_chunks": 10,
    "dp": false
  },
  "default": "pcap.json"
}
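A small sanity check that would have caught this (a sketch using only the json standard library, not config_io itself); it simply verifies the config declares a "default" file to inherit from:
import json

with open("pcap/config_example_pcap_nodp.json") as f:
    cfg = json.load(f)

# The AttributeError above came from keys (like "overwrite") that only exist
# in the inherited default config.
assert "default" in cfg, 'config is missing the "default" entry (e.g. "pcap.json")'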
I'm running on a single machine (Ubuntu 16.04.3 LTS, GNU/Linux 4.4.0-148-generic x86_64) with Ray turned on.
The driver.py looks like:
import netshare.ray as ray
from netshare import Generator

if __name__ == '__main__':
    # Change to False if you would not like to use Ray
    ray.config.enabled = True
    ray.init(address="auto")

    # configuration file
    generator = Generator(config="netflow/config_example_netflow_nodp_cidds.json")

    # `work_folder` should not exist o/w an overwrite error will be thrown.
    # Please set the `worker_folder` as *absolute path*
    # if you are using Ray with multi-machine setup
    # since Ray has bugs when dealing with relative paths.
    generator.train_and_generate(work_folder='/data1/maoyuning/NetShare-master/results/test_cidds')

    ray.shutdown()
The netflow/config_example_netflow_nodp_cidds.json is as follows:
{
  "global_config": {
    "original_data_file": "../traces/cidds/raw.csv",
    "dataset_type": "netflow",
    "n_chunks": 10,
    "dp": false
  },
  "pre_post_processor": {
    "class": "NetsharePrePostProcessor",
    "config": {
      "max_flow_len": null
    }
  },
  "model": {
    "class": "DoppelGANgerTFModel",
    "config": {
      "iteration": 20,
      "extra_checkpoint_freq": 10,
      "epoch_checkpoint_freq": 5
    }
  },
  "default": "netflow.json"
}
Here is the error message:
Traceback (most recent call last):
File "driver.py", line 17, in <module>
generator.train_and_generate(work_folder='/data1/maoyuning/NetShare-master/results/test_cidds')
File "/data1/maoyuning/NetShare-master/netshare/generators/generator.py", line 205, in train_and_generate
if not self.generate(work_folder):
File "/data1/maoyuning/NetShare-master/netshare/generators/generator.py", line 176, in generate
work_folder)):
File "/data1/maoyuning/NetShare-master/netshare/generators/generator.py", line 110, in _post_process
log_folder=log_folder)
File "/data1/maoyuning/NetShare-master/netshare/pre_post_processors/pre_post_processor.py", line 38, in post_process
log_folder=log_folder)
File "/data1/maoyuning/NetShare-master/netshare/pre_post_processors/netshare/netshare_pre_post_processor.py", line 532, in _post_process
os.path.join(output_folder, "syn.csv")
File "/data1/maoyuning/.conda/envs/tf-1.15-py36/lib/python3.6/shutil.py", line 120, in copyfile
with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/data1/maoyuning/NetShare-master/results/test_cidds/generated_data/best_syn_dfs/syn.csv'
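A small debugging sketch (my own, not part of NetShare) to see whether any synthetic CSVs were written at all under the work folder; the FileNotFoundError above suggests the generation step never produced best_syn_dfs/syn.csv:
import glob
import os

work_folder = "/data1/maoyuning/NetShare-master/results/test_cidds"
# List every CSV produced anywhere under the work folder.
hits = glob.glob(os.path.join(work_folder, "**", "*.csv"), recursive=True)
print(f"{len(hits)} csv file(s) under {work_folder}")
for path in hits[:20]:
    print(" ", path)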
Could you please upload the synthetic datasets that are generated from NetShare trained on the 200 CPU cluster mentioned in the paper? The training of NetShare is quite compute-intensive and nearly impossible without a cluster. I have found the synthetic CAIDA and UGR16 from this repo (as csv files), but I can't find synthetic data of other datasets.
Can you people release the configuration and driver scripts for evaluating the model accuracy when using the full TON dataset?
This should include both training/testing on the real dataset as well as training on synthetic and testing on the real dataset.
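For clarity, a sketch of the evaluation protocol meant here (train on real / test on real versus train on synthetic / test on real) with a generic scikit-learn classifier; the file names and the "label" column are placeholders, not the actual TON setup:
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Placeholder paths and a placeholder "label" column.
real_train = pd.read_csv("ton_real_train.csv")
real_test = pd.read_csv("ton_real_test.csv")
syn_train = pd.read_csv("ton_synthetic.csv")

features = [c for c in real_test.columns if c != "label"]

def fit_and_score(train_df):
    # Train on the given set, always evaluate on the held-out real test set.
    clf = GradientBoostingClassifier().fit(train_df[features], train_df["label"])
    return accuracy_score(real_test["label"], clf.predict(real_test[features]))

print("train-real / test-real:     ", fit_and_score(real_train))
print("train-synthetic / test-real:", fit_and_score(syn_train))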
Hello,
I'm trying to implement a simple generation of NetFlow from the UGR16 dataset. I'm on Linux with Python 3.6.
My config.json is:
{
  "global_config": {
    "original_data_file": "../traces/ugr16/raw.csv",
    "dataset_type": "netflow",
    "n_chunks": 10,
    "dp": false,
    "max_flow_len": false
  },
  "default": "netflow.json"
}
(max_flow_len is set to false to avoid an AttributeError being raised in netshare_pre_post_processor.py.)
My driver.py is
from netshare import Generator

if __name__ == '__main__':
    # configuration file
    generator = Generator(config="netflow/config_example_netflow_nodp.json")

    # `work_folder` should not exist o/w an overwrite error will be thrown.
    # Please set the `worker_folder` as *absolute path*
    # if you are using Ray with multi-machine setup
    # since Ray has bugs when dealing with relative paths.
    # generator.visualize(work_folder='../results/vis_test')
    generator.train_and_generate(work_folder='../results/vis_test/')
After building all the chunks, the program starts working on the first chunk and raises an AttributeError linked to Ray (which is not used in this program).
The error traceback is:
Traceback (most recent call last):
File "/srv/tempdd/aschoen/NetShare/netshare/pre_post_processors/pre_post_processor.py", line 29, in pre_process
log_folder=log_folder)
File "/srv/tempdd/aschoen/NetShare/netshare/pre_post_processors/netshare/netshare_pre_post_processor.py", line 406, in _pre_process
flowkeys_chunkidx=flowkeys_chunkidx,
File "/srv/tempdd/aschoen/NetShare/netshare/ray/remote.py", line 25, in remote
import ray
File "/udd/aschoen/.local/lib/python3.6/site-packages/ray/__init__.py", line 169, in <module>
from ray import autoscaler # noqa:E402
File "/udd/aschoen/.local/lib/python3.6/site-packages/ray/autoscaler/__init__.py", line 1, in <module>
from ray.autoscaler import sdk
File "/udd/aschoen/.local/lib/python3.6/site-packages/ray/autoscaler/sdk/__init__.py", line 1, in <module>
from ray.autoscaler.sdk.sdk import (
File "/udd/aschoen/.local/lib/python3.6/site-packages/ray/autoscaler/sdk/sdk.py", line 9, in <module>
from ray.autoscaler._private import commands
File "/udd/aschoen/.local/lib/python3.6/site-packages/ray/autoscaler/_private/commands.py", line 24, in <module>
from ray.autoscaler._private import subprocess_output_util as cmd_output_util
File "/udd/aschoen/.local/lib/python3.6/site-packages/ray/autoscaler/_private/subprocess_output_util.py", line 8, in <module>
from ray.autoscaler._private.cli_logger import cf, cli_logger
File "/udd/aschoen/.local/lib/python3.6/site-packages/ray/autoscaler/_private/cli_logger.py", line 61, in <module>
import colorful as _cf
File "/udd/aschoen/.conda/envs/NetShare-env/lib/python3.6/site-packages/colorful/__init__.py", line 133, in <module>
sys.modules[__name__] = ColorfulModule(Colorful(), __name__)
File "/udd/aschoen/.conda/envs/NetShare-env/lib/python3.6/site-packages/colorful/core.py", line 342, in __init__
colormode = terminal.detect_color_support(env=os.environ)
File "/udd/aschoen/.conda/envs/NetShare-env/lib/python3.6/site-packages/colorful/terminal.py", line 48, in detect_color_support
if not sys.stdout.isatty():
I think it is the import of ray in the RemoteFunctionWrapper of remote.py that is causing this fault. Is there a way to skip this wrapper by giving a specific argument in the JSON file?
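In the meantime, a sketch of what I understand might avoid the wrapper, based on the ray.config.enabled flag used in the repo's example driver (whether this also skips the import in remote.py is an assumption):
import netshare.ray as ray
from netshare import Generator

if __name__ == '__main__':
    # Run everything locally instead of through the Ray wrapper
    # (flag taken from the repo's example driver).
    ray.config.enabled = False

    generator = Generator(config="netflow/config_example_netflow_nodp.json")
    generator.train_and_generate(work_folder='../results/vis_test/')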
If you have any advice, I would really appreciate it.
Good luck with your big merge ;)
Adrien
Hello,
I met this error while trying to generate data using a GPU. The whole error message is here:
Traceback (most recent call last):
File "/home/ubuntu/xzj/NetShare-new/netshare/models/model.py", line 34, in generate
return self._generate(
File "/home/ubuntu/xzj/NetShare-new/netshare/models/doppelganger_torch_model.py", line 247, in _generate
) = dg.generate(
File "/home/ubuntu/xzj/NetShare-new/netshare/models/doppelganger_torch/doppelganger.py", line 237, in generate
attribute, attribute_discrete, feature = tuple(
File "/home/ubuntu/xzj/NetShare-new/netshare/models/doppelganger_torch/doppelganger.py", line 238, in <genexpr>
np.concatenate(d, axis=0) for d in zip(*generated_data_list)
File "<__array_function__ internals>", line 200, in concatenate
File "/home/ubuntu/.conda/envs/NetShare-new/lib/python3.9/site-packages/torch/_tensor.py", line 970, in __array__
return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
It's just a little bug; I solved it by changing the function DoppelGANger._generate in ./netshare/models/doppelganger_torch/doppelganger.py.
The changed code looks like this:
def _generate(
    self,
    real_attribute_noise,
    addi_attribute_noise,
    feature_input_noise,
    h0,
    c0,
    given_attribute=None,
    given_attribute_discrete=None,
):
    self.generator.eval()
    self.discriminator.eval()
    if self.use_attr_discriminator:
        self.attr_discriminator.eval()

    if given_attribute is None and given_attribute_discrete is None:
        with torch.no_grad():
            attribute, attribute_discrete, feature = self.generator(
                real_attribute_noise=real_attribute_noise.to(self.device),
                addi_attribute_noise=addi_attribute_noise.to(self.device),
                feature_input_noise=feature_input_noise.to(self.device),
                h0=h0.to(self.device),
                c0=c0.to(self.device)
            )
    else:
        given_attribute = torch.from_numpy(given_attribute).float()
        given_attribute_discrete = torch.from_numpy(
            given_attribute_discrete).float()
        with torch.no_grad():
            attribute, attribute_discrete, feature = self.generator(
                real_attribute_noise=real_attribute_noise.to(self.device),
                addi_attribute_noise=addi_attribute_noise.to(self.device),
                feature_input_noise=feature_input_noise.to(self.device),
                h0=h0.to(self.device),
                c0=c0.to(self.device),
                given_attribute=given_attribute.to(self.device),
                given_attribute_discrete=given_attribute_discrete.to(self.device),
            )

    return attribute.cpu(), attribute_discrete.cpu(), feature.cpu()
Hi, I have one problem about the configuration. I notice that the camera-ready branch sets the default iteration to 40 for pcap without DP, while the main branch sets it to 80000. Could you please explain this difference? Are the results in the NetShare paper produced by experiments using the config in the camera-ready branch?
I followed the instructions and ran the script driver.py. The configuration is as follows:
{
  "global_config": {
    "original_data_file": "../traces/caida/raw.pcap",
    "dataset_type": "pcap",
    "n_chunks": 5,
    "dp": false
  },
  "model": {
    "class": "DoppelGANgerTFModel",
    "config": {
      "iteration": 10,
      "extra_checkpoint_freq": 5,
      "epoch_checkpoint_freq": 2
    }
  },
  "default": "pcap.json"
}
Here is the error message:
Traceback (most recent call last):
File "driver.py", line 16, in <module>
generator.train_and_generate(work_folder='/data1/maoyuning/NetShare-master/results/test_caida')
File "/data1/maoyuning/NetShare-master/netshare/generators/generator.py", line 205, in train_and_generate
if not self.generate(work_folder):
File "/data1/maoyuning/NetShare-master/netshare/generators/generator.py", line 176, in generate
work_folder)):
File "/data1/maoyuning/NetShare-master/netshare/generators/generator.py", line 110, in _post_process
log_folder=log_folder)
File "/data1/maoyuning/NetShare-master/netshare/pre_post_processors/pre_post_processor.py", line 38, in post_process
log_folder=log_folder)
File "/data1/maoyuning/NetShare-master/netshare/pre_post_processors/netshare/netshare_pre_post_processor.py", line 500, in _post_process
configs)
File "/data1/maoyuning/NetShare-master/netshare/pre_post_processors/netshare/util.py", line 503, in _recalulate_config_ids_in_each_config_group
config["result_folder"]))
File "/data1/maoyuning/NetShare-master/netshare/model_managers/netshare_manager/netshare_util.py", line 77, in get_configid_from_kv
raise ValueError("{}: {} not found in configs!".format(k, v))
ValueError: result_folder: /data1/maoyuning/NetShare-master/results/test_caida/models/chunkid-1/sample_len-100 not found in configs!
When loading from the chunk 0 checkpoint by setting skip_chunk0_train to true, the function _configs2configsgroup in netshare/model_managers/netshare_manager/netshare_util.py can't find the checkpoint correctly. I believe this is due to a bug in this function: the for loop that finds the latest checkpoint uses an incorrect checkpoint file format.
The current format is
ckpt_dir = os.path.join(
    configs[config_id]["result_folder"],
    "checkpoint",
    "epoch_id-{}".format(epoch_id)
)
whereas the correct format should be
ckpt_dir = os.path.join(
    configs[config_id]["result_folder"],
    "checkpoint",
    "epoch_id-{}.pt".format(epoch_id)
)
Hello,
I'm trying to adapt your solution for a new dataset based on the CIFlowMeter feature extractor.
I would like to know what the expected outputs of the preprocessing and postprocessing steps are, because I already have my data in the format required by DoppelGANger, so I was thinking about just loading my data in these functions.
In the output of preprocessing, before it is given to the training step, should all my attributes be continuous, or should I keep some discrete attributes that will be managed by word2vec? If so, where do I indicate which attribute is continuous and which is discrete?
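For concreteness, this is roughly the format my data is already in, following the original DoppelGANger repo (the shapes below are made up, and I am not sure NetShare's preprocessor expects exactly these keys):
import numpy as np

# Hypothetical shapes: 1000 flows, up to 50 records per flow,
# 5 per-record features, 8 per-flow attribute dimensions.
data_feature = np.zeros((1000, 50, 5), dtype=np.float32)   # per-record (time-series) features
data_attribute = np.zeros((1000, 8), dtype=np.float32)     # per-flow attributes (metadata)
data_gen_flag = np.ones((1000, 50), dtype=np.float32)      # 1 while a flow is still active

np.savez(
    "data_train.npz",
    data_feature=data_feature,
    data_attribute=data_attribute,
    data_gen_flag=data_gen_flag,
)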
Also, in your example on Zeek, you didn't change any of the arguments in the PrePostProcessor config field of the config file:
  },
  "pre_post_processor": {
    "class": "ZeeklogPrePostProcessor",
    "config": {
      "norm_option": 0,
      "split_name": "multichunk_dep_v2",
      "df2chunks": "fixed_time",
      "full_IP_header": true,
      "encode_IP": "bit"
    }
Wouldn't that be a problem anywhere else down the pipeline?
And if so, could you please provide some info on what each argument does? If it is not mandatory to change them, we can just keep them as they are.
Thanks in advance.
If the netshare package is installed through pip3 install . (without the editable -e option), I get the following error message when importing netshare:
Traceback (most recent call last):
File "test.py", line 1, in <module>
import netshare
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/netshare/__init__.py", line 1, in <module>
from .generators.generator import Generator
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/netshare/generators/generator.py", line 5, in <module>
import netshare.pre_post_processors as pre_post_processors
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/netshare/pre_post_processors/__init__.py", line 2, in <module>
from .netshare.netshare_pre_post_processor import NetsharePrePostProcessor
ModuleNotFoundError: No module named 'netshare.pre_post_processors.netshare'
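A guess at the cause (not verified against the repo's setup.py): if the packages list only covers the top-level packages, nested subpackages such as netshare.pre_post_processors.netshare are not copied into site-packages by a regular install, while pip3 install -e . masks the problem because it points back at the source tree. A sketch of a setup() call that picks up all subpackages:
from setuptools import setup, find_packages

setup(
    name="netshare",
    # find_packages() walks the tree and includes nested subpackages such as
    # netshare.pre_post_processors.netshare, so a plain `pip3 install .` works.
    packages=find_packages(),
    include_package_data=True,
)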
Hello,
Thank you for your work. I'm interested in implementing your solution for NetFlow traffic generation.
Unfortunately, at the end of the chunk0 training, I get the following error (I copy/paste only the last part of stdout/stderr, because the entire output would be too long).
Stop data_loader #0: True
Stop data_loader #1: True
data loader ended
-------------
data loader ended
-------------
data loader endeddata loader ended
-------------
-------------
data loader ended
-------------
Stop data_loader #2: True
Stop data_loader #3: True
Stop data_loader #4: True
data loader ended
-------------
Stop data_loader #5: True
Stop data_loader #6: True
Stop data_loader #7: True
Stop data_loader #8: True
Stop data_loader #9: True
-------------
Finish launching chunk0 experiments ...
Start waiting for chunk0 from config_group_id 0experiments finished ...
Traceback (most recent call last):
File "/srv/tempdd/aschoen/NetShare/netshare/model_managers/model_manager.py", line 34, in train
model_config=model_config)
File "/srv/tempdd/aschoen/NetShare/netshare/model_managers/netshare_manager/netshare_manager.py", line 42, in _train
log_folder=log_folder)
File "/srv/tempdd/aschoen/NetShare/netshare/ray/remote.py", line 34, in remote
return ResultWrapper(self._ray_args[0](*args, **kwargs))
File "/srv/tempdd/aschoen/NetShare/netshare/model_managers/netshare_manager/train_helper.py", line 104, in _train_specific_config_group
log_folder)
File "/srv/tempdd/aschoen/NetShare/netshare/model_managers/netshare_manager/train_helper.py", line 24, in _launch_other_chunks_training
raise ValueError("Pretrain_dir {} does not exist!")
ValueError: Pretrain_dir {} does not exist!
Traceback (most recent call last):
File "example_netflow.py", line 15, in <module>
generator.train_and_generate(work_folder='../results/vis_test/')
File "/srv/tempdd/aschoen/NetShare/netshare/generators/generator.py", line 203, in train_and_generate
if not self.train(work_folder):
File "/srv/tempdd/aschoen/NetShare/netshare/generators/generator.py", line 196, in train
log_folder=self._get_model_log_folder(work_folder)):
File "/srv/tempdd/aschoen/NetShare/netshare/generators/generator.py", line 122, in _train
model_config=self._model_config)
File "/srv/tempdd/aschoen/NetShare/netshare/model_managers/model_manager.py", line 34, in train
model_config=model_config)
File "/srv/tempdd/aschoen/NetShare/netshare/model_managers/netshare_manager/netshare_manager.py", line 42, in _train
log_folder=log_folder)
File "/srv/tempdd/aschoen/NetShare/netshare/ray/remote.py", line 34, in remote
return ResultWrapper(self._ray_args[0](*args, **kwargs))
File "/srv/tempdd/aschoen/NetShare/netshare/model_managers/netshare_manager/train_helper.py", line 104, in _train_specific_config_group
log_folder)
File "/srv/tempdd/aschoen/NetShare/netshare/model_managers/netshare_manager/train_helper.py", line 24, in _launch_other_chunks_training
raise ValueError("Pretrain_dir {} does not exist!")
ValueError: Pretrain_dir {} does not exist!
Maybe it has something to do with my config file, especially the pretrain_dir argument. I've tried pretrain_dir = null and pretrain_dir = '/path/to/a/directory/on/my/computer', but both give me this error.
This is my complete config file:
{
  "global_config": {
    "overwrite": true,
    "original_data_file": "../traces/ugr16/raw.csv",
    "dataset_type": "netflow",
    "n_chunks": 10,
    "dp": false,
    "word2vec_vecSize": 10,
    "timestamp": "interarrival",
    "truncate": "per_chunk",
    "max_flow_len": false
  },
  "pre_post_processor": {
    "class": "NetsharePrePostProcessor",
    "config": {
      "norm_option": 0,
      "split_name": "multichunk_dep_v2",
      "df2chunks": "fixed_time",
      "full_IP_header": true,
      "encode_IP": "bit"
    }
  },
  "model_manager": {
    "class": "NetShareManager",
    "config": {
      "pretrain_dir": null,
      "skip_chunk0_train": false,
      "pretrain_non_dp": true,
      "pretrain_non_dp_reduce_time": 4.0,
      "pretrain_dp": false,
      "run": 0
    }
  },
  "model": {
    "class": "DoppelGANgerTFModel",
    "config": {
      "batch_size": 1000,
      "sample_len": [1, 5, 10],
      "sample_len_expand": true,
      "iteration": 500,
      "vis_freq": 2001,
      "vis_num_sample": 5,
      "d_rounds": 5,
      "g_rounds": 1,
      "num_packing": 1,
      "noise": true,
      "attr_noise_type": "normal",
      "feature_noise_type": "normal",
      "rnn_mlp_num_layers": 0,
      "feed_back": false,
      "g_lr": 0.0001,
      "d_lr": 0.0001,
      "d_gp_coe": 10.0,
      "gen_feature_num_layers": 1,
      "gen_feature_num_units": 100,
      "gen_attribute_num_layers": 5,
      "gen_attribute_num_units": 512,
      "disc_num_layers": 5,
      "disc_num_units": 512,
      "initial_state": "random",
      "leaky_relu": false,
      "attr_d_lr": 0.0001,
      "attr_d_gp_coe": 10.0,
      "g_attr_d_coe": 1.0,
      "attr_disc_num_layers": 5,
      "attr_disc_num_units": 512,
      "aux_disc": true,
      "self_norm": false,
      "fix_feature_network": false,
      "debug": false,
      "combined_disc": true,
      "use_gt_lengths": false,
      "use_uniform_lengths": false,
      "num_cores": null,
      "sn_mode": null,
      "scale": 1.0,
      "extra_checkpoint_freq": 20000,
      "epoch_checkpoint_freq": 1000,
      "dp_noise_multiplier": null,
      "dp_l2_norm_clip": null
    }
  }
}
Thank you in advance for your help.
Hallavar