netsharecmu / netshare
(SIGCOMM '22) Practical GAN-based Synthetic IP Header Trace Generation using NetShare
Home Page: https://www.pcapshare.com/
License: BSD 3-Clause Clear License
I'm trying to train the model on a small test dataset with naive differential privacy. I changed a few settings in the configuration, but the generated output is empty.
Here is the changed configuration:
{
  "global_config": {
    "original_data_file": "../traces/simple-network/Switch1_Ethernet1_to_PC1_Ethernet0-correct/raw.pcap",
    "dataset_type": "pcap",
    "n_chunks": 1,
    "dp": true
  },
  "model_manager": {
    "class": "NetShareManager",
    "config": {
      "pretrain_non_dp": false,
      "pretrain_non_dp_reduce_time": null,
      "pretrain_dp": false
    }
  },
  "model": {
    "class": "DoppelGANgerTFModel",
    "config": {
      "batch_size": 1,
      "sample_len": [1],
      "iteration": 80000,
      "extra_checkpoint_freq": 4000,
      "epoch_checkpoint_freq": 1000,
      "gen_feature_num_layers": 1,
      "gen_feature_num_units": 100,
      "gen_attribute_num_layers": 1,
      "gen_attribute_num_units": 32,
      "disc_num_layers": 1,
      "disc_num_units": 32,
      "attr_disc_num_layers": 1,
      "attr_disc_num_units": 32,
      "dp_noise_multiplier": 0.2797,
      "dp_l2_norm_clip": 1.0
    }
  },
  "default": "pcap.json"
}
I suspect this is partly because I set pretrain_dp=false. But if I set pretrain_dp=true, I am asked to provide a model pretrained on a public dataset.
When I tried to run NetShare in a Docker container, I realized it has an implicit dependency on libpcap (needed by pcap2csv.so). I installed the missing library with:
sudo apt install libpcap-dev
Maybe it would be helpful to add this dependency to the README file?
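A quick way to check from inside the container whether libpcap is visible (a small sketch using only the Python standard library, not part of NetShare):
from ctypes.util import find_library

# Returns something like "libpcap.so.1" when the library is installed, None otherwise.
if find_library("pcap") is None:
    print("libpcap not found; install it, e.g. `sudo apt install libpcap-dev`")
else:
    print("libpcap found:", find_library("pcap"))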
Hi! I'm following the README.md in the new_dataset branch, and after pip3 install -e ., I encountered the following error. Is this a known issue, or is there an additional dependency I need to install? Thanks!
ERROR: Command errored out with exit status 1:
command: /Users/dorothyko/opt/anaconda3/envs/NetShare/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-install-0xh9kpj3/dm-tree_8e1c2948f02b4e8dbaffa8699b00febb/setup.py'"'"'; __file__='"'"'/private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-install-0xh9kpj3/dm-tree_8e1c2948f02b4e8dbaffa8699b00febb/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-record-t1fpvnsw/install-record.txt --single-version-externally-managed --compile --install-headers /Users/dorothyko/opt/anaconda3/envs/NetShare/include/python3.6m/dm-tree
cwd: /private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-install-0xh9kpj3/dm-tree_8e1c2948f02b4e8dbaffa8699b00febb/
Complete output (57 lines):
running install
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-3.6
creating build/lib.macosx-10.9-x86_64-3.6/tree
copying tree/sequence.py -> build/lib.macosx-10.9-x86_64-3.6/tree
copying tree/__init__.py -> build/lib.macosx-10.9-x86_64-3.6/tree
copying tree/tree_test.py -> build/lib.macosx-10.9-x86_64-3.6/tree
copying tree/tree_benchmark.py -> build/lib.macosx-10.9-x86_64-3.6/tree
running build_ext
Traceback (most recent call last):
File "/private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-install-0xh9kpj3/dm-tree_8e1c2948f02b4e8dbaffa8699b00febb/setup.py", line 77, in _check_build_environment
subprocess.check_call(['cmake', '--version'])
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/subprocess.py", line 306, in check_call
retcode = call(*popenargs, **kwargs)
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/subprocess.py", line 287, in call
with Popen(*popenargs, **kwargs) as p:
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/subprocess.py", line 1364, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'cmake': 'cmake'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-install-0xh9kpj3/dm-tree_8e1c2948f02b4e8dbaffa8699b00febb/setup.py", line 155, in <module>
keywords='tree nest flatten',
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/distutils/core.py", line 148, in setup
dist.run_commands()
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/site-packages/setuptools/command/install.py", line 61, in run
return orig.install.run(self)
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/distutils/command/install.py", line 545, in run
self.run_command('build')
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/Users/dorothyko/opt/anaconda3/envs/NetShare/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-install-0xh9kpj3/dm-tree_8e1c2948f02b4e8dbaffa8699b00febb/setup.py", line 70, in run
self._check_build_environment()
File "/private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-install-0xh9kpj3/dm-tree_8e1c2948f02b4e8dbaffa8699b00febb/setup.py", line 82, in _check_build_environment
) from e
RuntimeError: CMake must be installed to build the following extensions: _tree
----------------------------------------
ERROR: Command errored out with exit status 1: /Users/dorothyko/opt/anaconda3/envs/NetShare/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-install-0xh9kpj3/dm-tree_8e1c2948f02b4e8dbaffa8699b00febb/setup.py'"'"'; __file__='"'"'/private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-install-0xh9kpj3/dm-tree_8e1c2948f02b4e8dbaffa8699b00febb/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/8z/rhgl1dpd44j80z0vry1zh7k80000gn/T/pip-record-t1fpvnsw/install-record.txt --single-version-externally-managed --compile --install-headers /Users/dorothyko/opt/anaconda3/envs/NetShare/include/python3.6m/dm-tree Check the logs for full command output. ```
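For context, the failing check in dm-tree's setup.py is just a call to cmake --version (see the traceback above). A minimal sketch to verify cmake is reachable before retrying the install; installing CMake via conda, Homebrew, or apt is my suggestion, not something the README states:
import shutil
import subprocess

# Mirror the check that dm-tree's setup.py performs before building _tree.
if shutil.which("cmake") is None:
    print("cmake is not on PATH; install it first (e.g. conda install cmake)")
else:
    subprocess.check_call(["cmake", "--version"])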
I have just followed the instructions and run the script driver.py. Here is the error message:
Traceback (most recent call last):
File "/home/runwei/NetShare/netshare/models/model.py", line 27, in train
log_folder=log_folder)
File "/home/runwei/NetShare/netshare/models/doppelganger_tf_model.py", line 176, in _train
gan.build()
File "/home/runwei/NetShare/netshare/models/doppelganger_tf/doppelganger.py", line 293, in build
self.build_loss()
File "/home/runwei/NetShare/netshare/models/doppelganger_tf/doppelganger.py", line 708, in build_loss
self.g_loss, var_list=self.generator.trainable_vars
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/optimizer.py", line 413, in minimize
name=name)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/optimizer.py", line 597, in apply_gradients
self._create_slots(var_list)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/adam.py", line 131, in _create_slots
self._zeros_slot(v, "m", self._name)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/optimizer.py", line 1156, in _zeros_slot
new_slot_variable = slot_creator.create_zeros_slot(var, op_name)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py", line 190, in create_zeros_slot
colocate_with_primary=colocate_with_primary)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py", line 164, in create_slot_with_initializer
dtype)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py", line 74, in _create_slot_var
validate_shape=validate_shape)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/ops/variable_scope.py", line 1500, in get_variable
aggregation=aggregation)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/ops/variable_scope.py", line 1243, in get_variable
aggregation=aggregation)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/ops/variable_scope.py", line 567, in get_variable
aggregation=aggregation)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/ops/variable_scope.py", line 519, in _true_getter
aggregation=aggregation)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/ops/variable_scope.py", line 868, in _get_single_variable
(err_msg, "".join(traceback.format_list(tb))))
ValueError: Variable DoppelGANgerGenerator/attribute_real/layer0/linear/matrix/Adam/ already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/lv_scratch/scratch/zirul13/trans/NetShare/netshare/pre_post_processors/netshare/word2vec_embedding.py", line 31, in custom_load
return CustomUnpickler(f).load()
File "mtrand.pyx", line 180, in numpy.random.mtrand.RandomState.init
TypeError: init() takes at most 1 positional argument (2 given)
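Not a NetShare-specific fix, just a sketch of the usual TF1 pattern the error message hints at: the duplicate ".../Adam" slot variables typically appear when gan.build() runs twice in the same default graph, so rebuilding in a clean graph avoids the collision (whether that is the actual trigger here is an assumption on my part):
import tensorflow as tf

def rebuild_gan(build_fn):
    # Discard variables left over from a previous build attempt in this process,
    # then build again in a fresh default graph.
    tf.compat.v1.reset_default_graph()
    build_fn()  # e.g. lambda: gan.build()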
Could you share the configuration file and driver code for the Wikipedia web-traffic dataset?
Hello,
I'm wondering whether you rename the attributes of the .csv dataset in your preprocessing step.
For CIDDS, the program netshare_pre_post_processor.py seems to look up columns named 'td' and 'ts'. Neither of these attributes is present in the original .csv.
For UGR'16, the .csv provided by the team has no column names; did you name them yourselves? If yes, how (which name corresponds to which column number)?
Could you please give a quick explanation of all the attributes/columns your program expects, so one can adapt a dataset accordingly?
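For reference, a sketch of how one could attach names to the header-less UGR'16 CSV before preprocessing; the column names below are my own guesses, not names confirmed by your code:
import pandas as pd

# Guessed nfdump-style column names for the UGR'16 CSV -- please correct them
# to whatever NetsharePrePostProcessor actually expects.
cols = ["ts", "td", "srcip", "dstip", "srcport", "dstport",
        "proto", "flag", "fwd", "stos", "pkt", "byt"]
df = pd.read_csv("../traces/ugr16/raw.csv", header=None, names=cols)
df.to_csv("../traces/ugr16/raw_named.csv", index=False)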
When I was running examples/driver.py using generator = Generator(config="pcap/config_example_pcap_nodp.json"), an error occurred:
Traceback (most recent call last):
File "driver.py", line 10, in <module>
generator = Generator(config="pcap/config_example_pcap_nodp.json")
File "/home/xinyu/NetShare/netshare/generators/generator.py", line 39, in __init__
self._overwrite = global_config['overwrite']
File "/home/xinyu/anaconda3/envs/NetShare/lib/python3.6/site-packages/config_io/config.py", line 12, in __mi ssing__
raise AttributeError(key)
AttributeError: overwrite
After some debugging, I found it's likely because the pcap example config file examples/config_example_pcap_nodp.json is missing a "default" argument, which makes config_io fail to read the default settings.
The config file currently is:
{
  "global_config": {
    "original_data_file": "../traces/caida/raw.pcap",
    "dataset_type": "pcap",
    "n_chunks": 10,
    "dp": false
  }
}
It should be
{
  "global_config": {
    "original_data_file": "../traces/caida/raw.pcap",
    "dataset_type": "pcap",
    "n_chunks": 10,
    "dp": false
  },
  "default": "pcap.json"
}
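A small sanity check that would have caught this (a sketch using only the json standard library, not config_io itself); it simply verifies the config declares a "default" file to inherit from:
import json

with open("pcap/config_example_pcap_nodp.json") as f:
    cfg = json.load(f)

# The AttributeError above came from keys (like "overwrite") that only exist
# in the inherited default config.
assert "default" in cfg, 'config is missing the "default" entry (e.g. "pcap.json")'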
I'm running on a single machine (Ubuntu 16.04.3 LTS, GNU/Linux 4.4.0-148-generic x86_64) with Ray turned on.
The driver.py looks like:
import netshare.ray as ray
from netshare import Generator

if __name__ == '__main__':
    # Change to False if you would not like to use Ray
    ray.config.enabled = True
    ray.init(address="auto")

    # configuration file
    generator = Generator(config="netflow/config_example_netflow_nodp_cidds.json")

    # `work_folder` should not exist o/w an overwrite error will be thrown.
    # Please set the `worker_folder` as *absolute path*
    # if you are using Ray with multi-machine setup
    # since Ray has bugs when dealing with relative paths.
    generator.train_and_generate(work_folder='/data1/maoyuning/NetShare-master/results/test_cidds')

    ray.shutdown()
The netflow/config_example_netflow_nodp_cidds.json is as follows:
{
  "global_config": {
    "original_data_file": "../traces/cidds/raw.csv",
    "dataset_type": "netflow",
    "n_chunks": 10,
    "dp": false
  },
  "pre_post_processor": {
    "class": "NetsharePrePostProcessor",
    "config": {
      "max_flow_len": null
    }
  },
  "model": {
    "class": "DoppelGANgerTFModel",
    "config": {
      "iteration": 20,
      "extra_checkpoint_freq": 10,
      "epoch_checkpoint_freq": 5
    }
  },
  "default": "netflow.json"
}
Here is the error message:
Traceback (most recent call last):
File "driver.py", line 17, in <module>
generator.train_and_generate(work_folder='/data1/maoyuning/NetShare-master/results/test_cidds')
File "/data1/maoyuning/NetShare-master/netshare/generators/generator.py", line 205, in train_and_generate
if not self.generate(work_folder):
File "/data1/maoyuning/NetShare-master/netshare/generators/generator.py", line 176, in generate
work_folder)):
File "/data1/maoyuning/NetShare-master/netshare/generators/generator.py", line 110, in _post_process
log_folder=log_folder)
File "/data1/maoyuning/NetShare-master/netshare/pre_post_processors/pre_post_processor.py", line 38, in post_process
log_folder=log_folder)
File "/data1/maoyuning/NetShare-master/netshare/pre_post_processors/netshare/netshare_pre_post_processor.py", line 532, in _post_process
os.path.join(output_folder, "syn.csv")
File "/data1/maoyuning/.conda/envs/tf-1.15-py36/lib/python3.6/shutil.py", line 120, in copyfile
with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/data1/maoyuning/NetShare-master/results/test_cidds/generated_data/best_syn_dfs/syn.csv'
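A small debugging sketch (my own, not part of NetShare) to see whether any synthetic CSVs were written at all under the work folder; the FileNotFoundError above suggests the generation step never produced best_syn_dfs/syn.csv:
import glob
import os

work_folder = "/data1/maoyuning/NetShare-master/results/test_cidds"
# List every CSV produced anywhere under the work folder.
hits = glob.glob(os.path.join(work_folder, "**", "*.csv"), recursive=True)
print(f"{len(hits)} csv file(s) under {work_folder}")
for path in hits[:20]:
    print(" ", path)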
Could you please upload the synthetic datasets that are generated from NetShare trained on the 200 CPU cluster mentioned in the paper? The training of NetShare is quite compute-intensive and nearly impossible without a cluster. I have found the synthetic CAIDA and UGR16 from this repo (as csv files), but I can't find synthetic data of other datasets.
Can you people release the configuration and driver scripts for evaluating the model accuracy when using the full TON dataset?
This should include both training/testing on the real dataset as well as training on synthetic and testing on the real dataset.
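For clarity, a sketch of the evaluation protocol meant here (train on real / test on real versus train on synthetic / test on real) with a generic scikit-learn classifier; the file names and the "label" column are placeholders, not the actual TON setup:
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Placeholder paths and a placeholder "label" column.
real_train = pd.read_csv("ton_real_train.csv")
real_test = pd.read_csv("ton_real_test.csv")
syn_train = pd.read_csv("ton_synthetic.csv")

features = [c for c in real_test.columns if c != "label"]

def fit_and_score(train_df):
    # Train on the given set, always evaluate on the held-out real test set.
    clf = GradientBoostingClassifier().fit(train_df[features], train_df["label"])
    return accuracy_score(real_test["label"], clf.predict(real_test[features]))

print("train-real / test-real:     ", fit_and_score(real_train))
print("train-synthetic / test-real:", fit_and_score(syn_train))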
Hello,
I'm trying to implement a simple generation of NetFlow from the UGR16 dataset. I'm on Linux with Python 3.6.
My config.json is:
{
  "global_config": {
    "original_data_file": "../traces/ugr16/raw.csv",
    "dataset_type": "netflow",
    "n_chunks": 10,
    "dp": false,
    "max_flow_len": false
  },
  "default": "netflow.json"
}
(max_flow_len is set to false to avoid an AttributeError being raised in netshare_pre_post_processor.py.)
My driver.py is
from netshare import Generator

if __name__ == '__main__':
    # configuration file
    generator = Generator(config="netflow/config_example_netflow_nodp.json")

    # `work_folder` should not exist o/w an overwrite error will be thrown.
    # Please set the `worker_folder` as *absolute path*
    # if you are using Ray with multi-machine setup
    # since Ray has bugs when dealing with relative paths.
    # generator.visualize(work_folder='../results/vis_test')
    generator.train_and_generate(work_folder='../results/vis_test/')
After building all the chunks, the program starts working on the first chunk and raises an AttributeError linked to Ray (which is not used in this program).
The error traceback is:
Traceback (most recent call last):
File "/srv/tempdd/aschoen/NetShare/netshare/pre_post_processors/pre_post_processor.py", line 29, in pre_process
log_folder=log_folder)
File "/srv/tempdd/aschoen/NetShare/netshare/pre_post_processors/netshare/netshare_pre_post_processor.py", line 406, in _pre_process
flowkeys_chunkidx=flowkeys_chunkidx,
File "/srv/tempdd/aschoen/NetShare/netshare/ray/remote.py", line 25, in remote
import ray
File "/udd/aschoen/.local/lib/python3.6/site-packages/ray/__init__.py", line 169, in <module>
from ray import autoscaler # noqa:E402
File "/udd/aschoen/.local/lib/python3.6/site-packages/ray/autoscaler/__init__.py", line 1, in <module>
from ray.autoscaler import sdk
File "/udd/aschoen/.local/lib/python3.6/site-packages/ray/autoscaler/sdk/__init__.py", line 1, in <module>
from ray.autoscaler.sdk.sdk import (
File "/udd/aschoen/.local/lib/python3.6/site-packages/ray/autoscaler/sdk/sdk.py", line 9, in <module>
from ray.autoscaler._private import commands
File "/udd/aschoen/.local/lib/python3.6/site-packages/ray/autoscaler/_private/commands.py", line 24, in <module>
from ray.autoscaler._private import subprocess_output_util as cmd_output_util
File "/udd/aschoen/.local/lib/python3.6/site-packages/ray/autoscaler/_private/subprocess_output_util.py", line 8, in <module>
from ray.autoscaler._private.cli_logger import cf, cli_logger
File "/udd/aschoen/.local/lib/python3.6/site-packages/ray/autoscaler/_private/cli_logger.py", line 61, in <module>
import colorful as _cf
File "/udd/aschoen/.conda/envs/NetShare-env/lib/python3.6/site-packages/colorful/__init__.py", line 133, in <module>
sys.modules[__name__] = ColorfulModule(Colorful(), __name__)
File "/udd/aschoen/.conda/envs/NetShare-env/lib/python3.6/site-packages/colorful/core.py", line 342, in __init__
colormode = terminal.detect_color_support(env=os.environ)
File "/udd/aschoen/.conda/envs/NetShare-env/lib/python3.6/site-packages/colorful/terminal.py", line 48, in detect_color_support
if not sys.stdout.isatty():
I think it is the import of ray in the RemoteFunctionWrapper of remote.py that is causing this fault. Is there a way to skip this wrapper by giving a specific argument in the JSON file?
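In the meantime, a sketch of what I understand might avoid the wrapper, based on the ray.config.enabled flag used in the repo's example driver (whether this also skips the import in remote.py is an assumption):
import netshare.ray as ray
from netshare import Generator

if __name__ == '__main__':
    # Run everything locally instead of through the Ray wrapper
    # (flag taken from the repo's example driver).
    ray.config.enabled = False

    generator = Generator(config="netflow/config_example_netflow_nodp.json")
    generator.train_and_generate(work_folder='../results/vis_test/')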
If you have any advice, I would really appreciate it.
Good luck with your big merge ;)
Adrien
Hello,
I met this error while trying to generate data using a GPU. The whole error message is here:
Traceback (most recent call last):
File "/home/ubuntu/xzj/NetShare-new/netshare/models/model.py", line 34, in generate
return self._generate(
File "/home/ubuntu/xzj/NetShare-new/netshare/models/doppelganger_torch_model.py", line 247, in _generate
) = dg.generate(
File "/home/ubuntu/xzj/NetShare-new/netshare/models/doppelganger_torch/doppelganger.py", line 237, in generate
attribute, attribute_discrete, feature = tuple(
File "/home/ubuntu/xzj/NetShare-new/netshare/models/doppelganger_torch/doppelganger.py", line 238, in <genexpr>
np.concatenate(d, axis=0) for d in zip(*generated_data_list)
File "<__array_function__ internals>", line 200, in concatenate
File "/home/ubuntu/.conda/envs/NetShare-new/lib/python3.9/site-packages/torch/_tensor.py", line 970, in __array__
return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
It's just a little bug; I solved it by changing the function DoppelGANger._generate in ./netshare/models/doppelganger_torch/doppelganger.py.
The changed code looks like this:
def _generate(
    self,
    real_attribute_noise,
    addi_attribute_noise,
    feature_input_noise,
    h0,
    c0,
    given_attribute=None,
    given_attribute_discrete=None,
):
    self.generator.eval()
    self.discriminator.eval()
    if self.use_attr_discriminator:
        self.attr_discriminator.eval()

    if given_attribute is None and given_attribute_discrete is None:
        with torch.no_grad():
            attribute, attribute_discrete, feature = self.generator(
                real_attribute_noise=real_attribute_noise.to(self.device),
                addi_attribute_noise=addi_attribute_noise.to(self.device),
                feature_input_noise=feature_input_noise.to(self.device),
                h0=h0.to(self.device),
                c0=c0.to(self.device)
            )
    else:
        given_attribute = torch.from_numpy(given_attribute).float()
        given_attribute_discrete = torch.from_numpy(
            given_attribute_discrete).float()
        with torch.no_grad():
            attribute, attribute_discrete, feature = self.generator(
                real_attribute_noise=real_attribute_noise.to(self.device),
                addi_attribute_noise=addi_attribute_noise.to(self.device),
                feature_input_noise=feature_input_noise.to(self.device),
                h0=h0.to(self.device),
                c0=c0.to(self.device),
                given_attribute=given_attribute.to(self.device),
                given_attribute_discrete=given_attribute_discrete.to(self.device),
            )

    return attribute.cpu(), attribute_discrete.cpu(), feature.cpu()
Hi, I have one problem about the configuration. I notice that the camera-ready branch sets the default iteration to 40 for pcap without DP, while the main branch sets it to 80000. Could you please explain this difference? Are the results in the NetShare paper produced by experiments using the config in the camera-ready branch?
I followed the instructions and ran the script driver.py. The configuration is as follows:
{
  "global_config": {
    "original_data_file": "../traces/caida/raw.pcap",
    "dataset_type": "pcap",
    "n_chunks": 5,
    "dp": false
  },
  "model": {
    "class": "DoppelGANgerTFModel",
    "config": {
      "iteration": 10,
      "extra_checkpoint_freq": 5,
      "epoch_checkpoint_freq": 2
    }
  },
  "default": "pcap.json"
}
Here is the error message:
Traceback (most recent call last):
File "driver.py", line 16, in <module>
generator.train_and_generate(work_folder='/data1/maoyuning/NetShare-master/results/test_caida')
File "/data1/maoyuning/NetShare-master/netshare/generators/generator.py", line 205, in train_and_generate
if not self.generate(work_folder):
File "/data1/maoyuning/NetShare-master/netshare/generators/generator.py", line 176, in generate
work_folder)):
File "/data1/maoyuning/NetShare-master/netshare/generators/generator.py", line 110, in _post_process
log_folder=log_folder)
File "/data1/maoyuning/NetShare-master/netshare/pre_post_processors/pre_post_processor.py", line 38, in post_process
log_folder=log_folder)
File "/data1/maoyuning/NetShare-master/netshare/pre_post_processors/netshare/netshare_pre_post_processor.py", line 500, in _post_process
configs)
File "/data1/maoyuning/NetShare-master/netshare/pre_post_processors/netshare/util.py", line 503, in _recalulate_config_ids_in_each_config_group
config["result_folder"]))
File "/data1/maoyuning/NetShare-master/netshare/model_managers/netshare_manager/netshare_util.py", line 77, in get_configid_from_kv
raise ValueError("{}: {} not found in configs!".format(k, v))
ValueError: result_folder: /data1/maoyuning/NetShare-master/results/test_caida/models/chunkid-1/sample_len-100 not found in configs!
When loading from the chunk 0 checkpoint by setting skip_chunk0_train to true, the function _configs2configsgroup in netshare/model_managers/netshare_manager/netshare_util.py can't find the checkpoint correctly. I believe this is due to a bug in this function: the for loop that finds the latest checkpoint uses an incorrect checkpoint file format.
The current format is
ckpt_dir = os.path.join(
    configs[config_id]["result_folder"],
    "checkpoint",
    "epoch_id-{}".format(epoch_id)
)
whereas the correct format should be
ckpt_dir = os.path.join(
    configs[config_id]["result_folder"],
    "checkpoint",
    "epoch_id-{}.pt".format(epoch_id)
)
Hello,
I'm trying to adapt your solution for a new dataset based on the CIFlowMeter feature extractor.
I would like to know what the expected outputs of the preprocessing and postprocessing steps are, because I already have my data in the format required by DoppelGANger, so I was thinking about just loading my data in these functions.
In the output of preprocessing, before it is given to the training step, should all my attributes be continuous, or should I keep some discrete attributes that will be managed by word2vec? If so, where do I indicate which attribute is continuous and which is discrete?
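For concreteness, this is roughly the format my data is already in, following the original DoppelGANger repo (the shapes below are made up, and I am not sure NetShare's preprocessor expects exactly these keys):
import numpy as np

# Hypothetical shapes: 1000 flows, up to 50 records per flow,
# 5 per-record features, 8 per-flow attribute dimensions.
data_feature = np.zeros((1000, 50, 5), dtype=np.float32)   # per-record (time-series) features
data_attribute = np.zeros((1000, 8), dtype=np.float32)     # per-flow attributes (metadata)
data_gen_flag = np.ones((1000, 50), dtype=np.float32)      # 1 while a flow is still active

np.savez(
    "data_train.npz",
    data_feature=data_feature,
    data_attribute=data_attribute,
    data_gen_flag=data_gen_flag,
)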
Also, in your example on Zeek, you didn't change any of the arguments in the PrePostProcessor config field of the config file:
  },
  "pre_post_processor": {
    "class": "ZeeklogPrePostProcessor",
    "config": {
      "norm_option": 0,
      "split_name": "multichunk_dep_v2",
      "df2chunks": "fixed_time",
      "full_IP_header": true,
      "encode_IP": "bit"
    }
Wouldn't that be a problem anywhere else down the pipeline?
And if so, could you please provide some info on what each argument does? If it is not mandatory to change them, we can just keep them as they are.
Thanks in advance.
If the netshare package is installed through pip3 install . (without the editable -e option), I get the following error message when importing netshare:
Traceback (most recent call last):
File "test.py", line 1, in <module>
import netshare
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/netshare/__init__.py", line 1, in <module>
from .generators.generator import Generator
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/netshare/generators/generator.py", line 5, in <module>
import netshare.pre_post_processors as pre_post_processors
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/netshare/pre_post_processors/__init__.py", line 2, in <module>
from .netshare.netshare_pre_post_processor import NetsharePrePostProcessor
ModuleNotFoundError: No module named 'netshare.pre_post_processors.netshare'
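A guess at the cause (not verified against the repo's setup.py): if the packages list only covers the top-level packages, nested subpackages such as netshare.pre_post_processors.netshare are not copied into site-packages by a regular install, while pip3 install -e . masks the problem because it points back at the source tree. A sketch of a setup() call that picks up all subpackages:
from setuptools import setup, find_packages

setup(
    name="netshare",
    # find_packages() walks the tree and includes nested subpackages such as
    # netshare.pre_post_processors.netshare, so a plain `pip3 install .` works.
    packages=find_packages(),
    include_package_data=True,
)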
Hello,
Thank you for your work. I'm interested in implementing your solution for NetFlow traffic generation.
Unfortunately, at the end of the chunk0 training, I get the following error (I copy/paste only the last part of stdout/stderr, because the entire output would be too long).
Stop data_loader #0: True
Stop data_loader #1: True
data loader ended
-------------
data loader ended
-------------
data loader endeddata loader ended
-------------
-------------
data loader ended
-------------
Stop data_loader #2: True
Stop data_loader #3: True
Stop data_loader #4: True
data loader ended
-------------
Stop data_loader #5: True
Stop data_loader #6: True
Stop data_loader #7: True
Stop data_loader #8: True
Stop data_loader #9: True
-------------
Finish launching chunk0 experiments ...
Start waiting for chunk0 from config_group_id 0experiments finished ...
Traceback (most recent call last):
File "/srv/tempdd/aschoen/NetShare/netshare/model_managers/model_manager.py", line 34, in train
model_config=model_config)
File "/srv/tempdd/aschoen/NetShare/netshare/model_managers/netshare_manager/netshare_manager.py", line 42, in _train
log_folder=log_folder)
File "/srv/tempdd/aschoen/NetShare/netshare/ray/remote.py", line 34, in remote
return ResultWrapper(self._ray_args[0](*args, **kwargs))
File "/srv/tempdd/aschoen/NetShare/netshare/model_managers/netshare_manager/train_helper.py", line 104, in _train_specific_config_group
log_folder)
File "/srv/tempdd/aschoen/NetShare/netshare/model_managers/netshare_manager/train_helper.py", line 24, in _launch_other_chunks_training
raise ValueError("Pretrain_dir {} does not exist!")
ValueError: Pretrain_dir {} does not exist!
Traceback (most recent call last):
File "example_netflow.py", line 15, in <module>
generator.train_and_generate(work_folder='../results/vis_test/')
File "/srv/tempdd/aschoen/NetShare/netshare/generators/generator.py", line 203, in train_and_generate
if not self.train(work_folder):
File "/srv/tempdd/aschoen/NetShare/netshare/generators/generator.py", line 196, in train
log_folder=self._get_model_log_folder(work_folder)):
File "/srv/tempdd/aschoen/NetShare/netshare/generators/generator.py", line 122, in _train
model_config=self._model_config)
File "/srv/tempdd/aschoen/NetShare/netshare/model_managers/model_manager.py", line 34, in train
model_config=model_config)
File "/srv/tempdd/aschoen/NetShare/netshare/model_managers/netshare_manager/netshare_manager.py", line 42, in _train
log_folder=log_folder)
File "/srv/tempdd/aschoen/NetShare/netshare/ray/remote.py", line 34, in remote
return ResultWrapper(self._ray_args[0](*args, **kwargs))
File "/srv/tempdd/aschoen/NetShare/netshare/model_managers/netshare_manager/train_helper.py", line 104, in _train_specific_config_group
log_folder)
File "/srv/tempdd/aschoen/NetShare/netshare/model_managers/netshare_manager/train_helper.py", line 24, in _launch_other_chunks_training
raise ValueError("Pretrain_dir {} does not exist!")
ValueError: Pretrain_dir {} does not exist!
Maybe it has something to do with my config file, especially the pretrain_dir argument. I've tried pretrain_dir = null and pretrain_dir = '/path/to/a/directory/on/my/computer', but both give me this error.
This is my complete config file:
{
  "global_config": {
    "overwrite": true,
    "original_data_file": "../traces/ugr16/raw.csv",
    "dataset_type": "netflow",
    "n_chunks": 10,
    "dp": false,
    "word2vec_vecSize": 10,
    "timestamp": "interarrival",
    "truncate": "per_chunk",
    "max_flow_len": false
  },
  "pre_post_processor": {
    "class": "NetsharePrePostProcessor",
    "config": {
      "norm_option": 0,
      "split_name": "multichunk_dep_v2",
      "df2chunks": "fixed_time",
      "full_IP_header": true,
      "encode_IP": "bit"
    }
  },
  "model_manager": {
    "class": "NetShareManager",
    "config": {
      "pretrain_dir": null,
      "skip_chunk0_train": false,
      "pretrain_non_dp": true,
      "pretrain_non_dp_reduce_time": 4.0,
      "pretrain_dp": false,
      "run": 0
    }
  },
  "model": {
    "class": "DoppelGANgerTFModel",
    "config": {
      "batch_size": 1000,
      "sample_len": [1, 5, 10],
      "sample_len_expand": true,
      "iteration": 500,
      "vis_freq": 2001,
      "vis_num_sample": 5,
      "d_rounds": 5,
      "g_rounds": 1,
      "num_packing": 1,
      "noise": true,
      "attr_noise_type": "normal",
      "feature_noise_type": "normal",
      "rnn_mlp_num_layers": 0,
      "feed_back": false,
      "g_lr": 0.0001,
      "d_lr": 0.0001,
      "d_gp_coe": 10.0,
      "gen_feature_num_layers": 1,
      "gen_feature_num_units": 100,
      "gen_attribute_num_layers": 5,
      "gen_attribute_num_units": 512,
      "disc_num_layers": 5,
      "disc_num_units": 512,
      "initial_state": "random",
      "leaky_relu": false,
      "attr_d_lr": 0.0001,
      "attr_d_gp_coe": 10.0,
      "g_attr_d_coe": 1.0,
      "attr_disc_num_layers": 5,
      "attr_disc_num_units": 512,
      "aux_disc": true,
      "self_norm": false,
      "fix_feature_network": false,
      "debug": false,
      "combined_disc": true,
      "use_gt_lengths": false,
      "use_uniform_lengths": false,
      "num_cores": null,
      "sn_mode": null,
      "scale": 1.0,
      "extra_checkpoint_freq": 20000,
      "epoch_checkpoint_freq": 1000,
      "dp_noise_multiplier": null,
      "dp_l2_norm_clip": null
    }
  }
}
Thank you in advance for your help.
Hallavar