ando-khachatryan / hidden

PyTorch implementation of the paper "HiDDeN: Hiding Data With Deep Networks" by Jiren Zhu, Russell Kaplan, Justin Johnson, and Li Fei-Fei

License: MIT License

Python 99.19% Shell 0.81%

hidden's Introduction

HiDDeN

PyTorch implementation of the paper "HiDDeN: Hiding Data With Deep Networks" by Jiren Zhu*, Russell Kaplan*, Justin Johnson, and Li Fei-Fei: https://arxiv.org/abs/1807.09937
*: These authors contributed equally

The authors' Lua+Torch implementation is available here: https://github.com/jirenz/HiDDeN

Note that this is a work in progress, and I have not yet been able to fully reproduce the results of the original paper.

Requirements

You need PyTorch 1.0 with TorchVision to run this. If you want to use TensorBoard, you also need to install TensorboardX and TensorBoard, which lets you use a subset of TensorBoard's functionality to visualize training. This is optional. The code has been tested with Python 3.6+ on Ubuntu 16.04, Ubuntu 18.04, and Windows 10.

Data

We use 10,000 images for training and 1,000 images for validation. Following the original paper, we chose these 10,000 + 1,000 images randomly from one of the COCO datasets: http://cocodataset.org/#download
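The README does not say how the random subset was drawn. A minimal sketch, assuming you already have a list of image filenames from a local COCO download, is to draw training and validation images in a single sample without replacement so the two sets cannot overlap. split_random_subset is a hypothetical helper, not part of this repository.

```python
import random
from typing import List, Sequence, Tuple


def split_random_subset(
    filenames: Sequence[str], n_train: int = 10_000, n_val: int = 1_000, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Draw disjoint random train/val subsets from a pool of image filenames (hypothetical helper)."""
    if n_train + n_val > len(filenames):
        raise ValueError(
            f"need {n_train + n_val} images but only {len(filenames)} are available"
        )
    rng = random.Random(seed)
    # A single sample without replacement guarantees train and val never share an image.
    chosen = rng.sample(list(filenames), n_train + n_val)
    return chosen[:n_train], chosen[n_train:]
```

The selected files can then be copied into the train/ and val/ folders of the layout described next.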

The data directory has the following structure:

<data_root>/
  train/
    train_class/
      train_image1.jpg
      train_image2.jpg
      ...
  val/
    val_class/
      val_image1.jpg
      val_image2.jpg
      ...

The train_class and val_class folders are there so that the standard torchvision data loaders can be used without changes.
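The class subfolders are needed because torchvision's ImageFolder dataset expects images grouped as root/<class_name>/<image>. A small sanity check can confirm that a data directory matches the layout above before starting a run. count_images is a hypothetical helper (not part of the repo), and the accepted file suffixes are an assumption.

```python
from pathlib import Path
from typing import Dict

# Assumed set of image extensions; extend it if your data uses other formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(data_root: str) -> Dict[str, int]:
    """Count images per split, checking the <data_root>/{train,val}/<class>/ layout."""
    counts = {}
    for split in ("train", "val"):
        split_dir = Path(data_root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        # ImageFolder-style loaders need at least one class subfolder per split.
        class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
        if not class_dirs:
            raise FileNotFoundError(f"no class subfolder inside {split_dir}")
        counts[split] = sum(
            1
            for class_dir in class_dirs
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

For the data set described above, count_images("<data_root>") should report 10,000 training and 1,000 validation images.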

Running

You will need to install the requirements, then run

python main.py new --name <experiment_name> --data-dir <data_root> --batch-size <b>

By default, TensorBoard logging is disabled; to enable it, use the --tensorboard switch. To continue an incomplete training run, use

python main.py continue --folder <incomplete_run_folder>

There are additional parameters for main.py. Use

python main.py --help

to see a description of all parameters. Each run creates a folder ./runs/<experiment_name date-and-time> and stores all information about the run there.
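To make the two subcommands concrete, here is a hypothetical argparse sketch that mirrors only the options mentioned in this README. It is not the repository's main.py: which options exist, their defaults, and which are required should be checked with python main.py --help.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the documented CLI; the real main.py defines more options."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command")
    sub.required = True  # set as an attribute so this also works on Python 3.6

    new = sub.add_parser("new", help="start a new training run")
    new.add_argument("--name", required=True)
    new.add_argument("--data-dir", required=True)
    new.add_argument("--batch-size", type=int, required=True)
    new.add_argument("--tensorboard", action="store_true")  # logging is off unless given
    new.add_argument("--noise", default=None)  # e.g. 'crop((0.2,0.3),(0.4,0.5))+jpeg()'

    cont = sub.add_parser("continue", help="resume an incomplete run")
    cont.add_argument("--folder", required=True)
    return parser
```

With this sketch, "new" and "continue" are mutually exclusive subcommands, so the options of one are not accepted by the other.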

Running with Noise Layers

You can specify the noise layer configuration with the --noise switch, followed by the configuration of one or more noise layers. For instance, the command

python main.py new --name 'combined-noise' --data-dir /data/ --batch-size 12 --noise 'crop((0.2,0.3),(0.4,0.5))+cropout((0.11,0.22),(0.33,0.44))+dropout(0.2,0.3)+jpeg()'

runs training with four noise layers: crop, cropout, dropout, and JPEG compression. The parameters of the layers are explained below. It is important to put quotes around the noise configuration and to avoid redundant spaces. To combine several noise layers, join them with + in the noise configuration, as shown in the example.
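The configuration string is a +-separated list of name(arguments) terms. A minimal sketch of how such a string could be split into layer names and numeric parameters, assuming no + appears inside the arguments. parse_noise_config and its dict output format are hypothetical, not the repo's own parser.

```python
import ast
import re
from typing import Any, Dict, List


def parse_noise_config(config: str) -> List[Dict[str, Any]]:
    """Split 'name(args)+name(args)+...' into [{'type': name, 'params': tuple}, ...]."""
    layers = []
    for part in config.split("+"):  # assumes '+' never appears inside the arguments
        match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", part)
        if match is None:
            raise ValueError(f"cannot parse noise layer: {part!r}")
        name, args = match.group(1).lower(), match.group(2).strip()
        # Wrapping the arguments in a tuple lets both "0.2,0.3" and
        # "(0.2,0.3),(0.4,0.5)" parse safely with ast.literal_eval.
        params = ast.literal_eval(f"({args},)") if args else ()
        layers.append({"type": name, "params": params})
    return layers
```

Using ast.literal_eval rather than eval means only literals (numbers, tuples) are accepted, so a malformed configuration raises an error instead of executing code.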

Update 16.04.2019: Previously, the noise layers in our implementation were applied sequentially; that is, resize(...)+jpeg() first resized the image and then applied JPEG compression. This differs from the behaviour described in the paper. Since 16.04.2019, if several noise layers are specified, one of them is picked at random and applied to the batch.
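The difference between the old and new behaviour can be sketched with plain callables standing in for noise layers. RandomNoiseSelector and apply_sequentially are illustrative names, not the repo's classes, and each noise layer is assumed to be a function that takes a batch and returns a noised batch.

```python
import random
from functools import reduce
from typing import Any, Callable, Optional, Sequence

NoiseLayer = Callable[[Any], Any]


class RandomNoiseSelector:
    """Current behaviour: exactly one noise layer, chosen uniformly at random, per batch."""

    def __init__(self, noise_layers: Sequence[NoiseLayer], seed: Optional[int] = None):
        if not noise_layers:
            raise ValueError("at least one noise layer is required")
        self.noise_layers = list(noise_layers)
        self.rng = random.Random(seed)

    def __call__(self, batch: Any) -> Any:
        layer = self.rng.choice(self.noise_layers)
        return layer(batch)


def apply_sequentially(noise_layers: Sequence[NoiseLayer], batch: Any) -> Any:
    """Pre-16.04.2019 behaviour: every layer is applied, in the order given."""
    return reduce(lambda acc, layer: layer(acc), noise_layers, batch)
```

Over many batches, each layer is picked roughly equally often, but any single batch sees only one distortion.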

Noise Layer parameters

Crop((height_min,height_max),(width_min,width_max)): (height_min,height_max) is the range from which we draw a random number and keep that fraction of the height of the original image; (width_min,width_max) does the same for the width. Put another way, given an image with dimensions H x W, Crop() randomly crops it to dimensions H' x W', where H'/H is in the range (height_min,height_max) and W'/W is in the range (width_min,width_max). In the paper, the authors use a single parameter p, the ratio (H' * W')/(H * W), i.e., the fraction of the area to keep. In our setting, you can obtain a given p by setting height_min, height_max, width_min, and width_max all equal to sqrt(p).
Cropout((height_min,height_max),(width_min,width_max)): the parameters have the same meaning as for Crop.
Dropout(keep_min, keep_max): the ratio of pixels to keep from the watermarked image, keep_ratio, is drawn uniformly from the range (keep_min,keep_max).
Resize(keep_min, keep_max): the resize ratio is drawn uniformly from the range (keep_min, keep_max) and applies to both dimensions. For instance, with Resize(0.7, 0.9), if we draw 0.8 for a particular image, the resulting image has dimensions (H * 0.8, W * 0.8).
Jpeg does not have any parameters.
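A minimal sketch of how the parameters above translate into sampled sizes and masks, using plain Python lists instead of tensors. The function names are hypothetical; rounding sizes to the nearest integer, the per-pixel random draw in dropout, and taking dropped pixels from the cover image are all assumptions to check against the repo's noise layers.

```python
import random
from typing import List, Tuple

Range = Tuple[float, float]


def sample_crop_size(height: int, width: int, height_range: Range,
                     width_range: Range, rng: random.Random) -> Tuple[int, int]:
    """Crop/Cropout: keep a random fraction of the height and, independently, of the width."""
    h_ratio = rng.uniform(*height_range)
    w_ratio = rng.uniform(*width_range)
    return int(round(height * h_ratio)), int(round(width * w_ratio))


def sample_resize(height: int, width: int, ratio_range: Range,
                  rng: random.Random) -> Tuple[int, int]:
    """Resize: a single ratio, drawn uniformly, scales both dimensions."""
    ratio = rng.uniform(*ratio_range)
    return int(round(height * ratio)), int(round(width * ratio))


def dropout(encoded: List[float], cover: List[float], keep_range: Range,
            rng: random.Random) -> List[float]:
    """Dropout: keep each watermarked pixel with probability keep_ratio, else use the cover pixel."""
    keep_ratio = rng.uniform(*keep_range)
    return [e if rng.random() < keep_ratio else c for e, c in zip(encoded, cover)]
```

With a per-pixel draw, the fraction of kept pixels equals keep_ratio only on average; an implementation could instead keep an exact count of pixels.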

Experiments

The data for some of the experiments are stored in the ./experiments folder. This includes figures, detailed training and validation losses, all settings in pickle format, and the checkpoint file of the trained model. Here, we provide a summary of the experiments.

Setup

We try to follow the experimental setup of the original paper as closely as possible. We train the network on 10,000 random images from the COCO dataset for 200-400 epochs, and validate on 1,000 images. During training, we take randomly positioned crops of the images, which makes it very unlikely that the network sees the exact same cropped image twice. For validation, we take deterministic center crops, so we can compare metrics exactly from one epoch to another.
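In torchvision this is typically done with transforms.RandomCrop for training and transforms.CenterCrop for validation. The position logic behind the two can be sketched without any library; crop_origin is a hypothetical helper, and the image and crop sizes in the checks are arbitrary examples.

```python
import random
from typing import Optional, Tuple


def crop_origin(img_h: int, img_w: int, crop_h: int, crop_w: int,
                rng: Optional[random.Random] = None) -> Tuple[int, int]:
    """Return the (top, left) corner of a crop: random if rng is given, centered otherwise."""
    if crop_h > img_h or crop_w > img_w:
        raise ValueError("crop is larger than the image")
    if rng is None:
        # Validation: deterministic center crop, identical in every epoch.
        return (img_h - crop_h) // 2, (img_w - crop_w) // 2
    # Training: any position that keeps the crop inside the image (randint is inclusive).
    return rng.randint(0, img_h - crop_h), rng.randint(0, img_w - crop_w)
```

Because the validation origin depends only on the image and crop sizes, the same validation crops are evaluated after every epoch.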

Due to random cropping, we observed no overfitting, and our training and validation metrics (mean squared error, binary cross-entropy) were extremely close. For this reason, we only show the validation metrics here.

When measuring decoder accuracy, we do not use error-correcting codes as in the paper. We take the decoder output, clip it to the range [0, 1], and round it to the nearest integer; the fraction of bits that differ from the original message is what we call the "decoder bitwise error". For consistency with the paper, we also report the mean squared error of the decoder.
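A minimal sketch of this metric on plain nested lists (one inner list per message in the batch). decoder_bitwise_error is a hypothetical name, and averaging over all bits in the batch is an assumption.

```python
from typing import Sequence


def decoder_bitwise_error(decoded: Sequence[Sequence[float]],
                          messages: Sequence[Sequence[int]]) -> float:
    """Fraction of message bits that are wrong after clipping and rounding the decoder output."""
    if len(decoded) != len(messages):
        raise ValueError("batch sizes differ")
    total_bits = 0
    wrong_bits = 0
    for decoded_row, message_row in zip(decoded, messages):
        if len(decoded_row) != len(message_row):
            raise ValueError("decoded output and message must have the same length")
        for value, bit in zip(decoded_row, message_row):
            # Clip to [0, 1], then round to the nearest integer (Python rounds 0.5 to 0).
            predicted_bit = round(min(max(value, 0.0), 1.0))
            wrong_bits += abs(predicted_bit - bit)
            total_bits += 1
    if total_bits == 0:
        raise ValueError("empty batch")
    return wrong_bits / total_bits
```

For example, a decoder output of [0.9, -0.2, 0.4, 1.3] for the message [1, 0, 1, 1] rounds to [1, 0, 0, 1], giving one wrong bit out of four.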

Experimental runs

This table summarizes experimental runs. Detailed information about the runs can be found in ./experiments folder.

experiment_name	loss	encoder_mse	bitwise-error	dec_mse	epoch
crop(0.2-0.25)	0.046	0.0019	0.0603	0.0435	300
cropout(0.55-0.6)	0.071	0.0011	0.0647	0.0662	300
dropout(0.55-0.6)	0.033	0.0019	0.008	0.0298	300
jpeg	0.0272	0.0025	0.0096	0.0253	300
resize(0.7-0.8)	0.0251	0.0016	0.0052	0.0238	300
combined-noise	0.1681	0.0028	0.2109	0.1648	400

No noise means no noise layers.
Crop((0.2,0.25),...) is shorthand for Crop((0.2,0.25),(0.2,0.25)). This means that the height and width ratios of the cropped image have an expected value of (0.2 + 0.25)/2 = 0.225. Therefore, the ratio of the (expected) area of the cropped image to that of the original image is 0.225 x 0.225 ≈ 0.05. The paper used p = 0.035.
Cropout((0.55,0.6),...) is shorthand for Cropout((0.55,0.6),(0.55,0.6)). As with Crop(...), this corresponds to a ratio of cropped to original image area of p ≈ 0.575 x 0.575 ≈ 0.33. The paper used p = 0.3.
Jpeg is the same as the Jpeg layer from the paper: a differentiable approximation of JPEG compression with the highest compression coefficient.
combined-noise is the configuration 'crop((0.4,0.55),(0.4,0.55))+cropout((0.25,0.35),(0.25,0.35))+dropout(0.25,0.35)+resize(0.4,0.6)+jpeg()'. This is somewhat similar to the combined noise configuration in the paper.
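The area figures in the list above follow from two facts: a uniform draw on (low, high) has mean (low + high)/2, and, assuming the height and width ratios are drawn independently, the expected area ratio is the product of the two means. A small sketch with hypothetical function names:

```python
import math
from typing import Tuple

Range = Tuple[float, float]


def expected_area_ratio(height_range: Range, width_range: Range) -> float:
    """Expected kept-area fraction for independent, uniform height and width ratios."""
    # For independent draws, E[h * w] = E[h] * E[w]; a uniform draw has mean (low + high) / 2.
    mean_h = sum(height_range) / 2
    mean_w = sum(width_range) / 2
    return mean_h * mean_w


def side_ratio_for_area(p: float) -> float:
    """Equal height/width ratio that keeps exactly a fraction p of the area."""
    return math.sqrt(p)
```

This gives 0.050625 for Crop((0.2,0.25),...) and 0.330625 for Cropout((0.55,0.6),...); matching the paper's p = 0.035 exactly would need height and width ratios of about 0.187. The crop term in combined-noise, (0.4,0.55), corresponds to an expected area ratio of 0.475 x 0.475 ≈ 0.23.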

hidden's Issues

how to change the message_length?

Possible error in JPEG

def jpeg_compress_decompress(image, downsample_c=False, rounding=diff_round, factor=1., device=True):
    # TODO: the input is in the [0, 1] range, not the [-1, 1] range
    # changed image=(image+1)/2 to:
    image *= 255
    b, c, h, w = image.size()

Would you explain the meaning of "highest compression coefficient" of Jpeg?

Hello, Ando.

Thank you for sharing the great repo.

Thanks to this repo, I can understand paper.

I have a question about JPEG compression.

In the paper, they randomly picked the quality factor.

But your implementation does not do this, and you left the note
"Jpeg the same as the Jpeg layer from the paper. It is a differentiable approximation of Jpeg compression with the highest compression coefficient."

Do you mean that the current implementation is equivalent to Q = 50?

Thank you.

How to use multi-GPUs to train the model?

Hello, I am trying to train the model with an image size of 3×512×512, but training is too slow because it uses only one GPU. I added code such as 'torch.nn.DataParallel(model)' to parallelize it, but it doesn't work at all.
Can this program run on multiple GPUs in parallel?

Help me run this project

Could you kindly give me detailed steps to run this project for a beginner like me? I can't figure out where to put my dataset folders (training and testing). Please help me with this. Thanks in advance.

Error in JpegCompression

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 1, 256, 256]], which is output 0 of UnsqueezeBackward1, is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

About the test setting of the trained model

Thanks for your contribution to this repository.
I have a question. In the original paper, the authors mention that all reported results are tested under real attacks, such as JPEG compression with different quality factors, whereas in this repository the trained models seem to be validated and tested only against simulated noise attacks. Is that right?

TypeError

Hello, thanks for your code. I tried to run it but got a TypeError in HiDDeN/model/encoder.py line 40. The error log is: expected Tensor as element 1 in argument 0, but got Sequential. Is it caused by the different type of encoded_image and image?

Not matching in preprocess data

Hi. I have a question about your code.
In the utils module, you normalize the input with mean = 0.5 and std = 0.5. But in the test_model module, you rescale the input from [0, 1] to [-1, 1]. Can you explain why?
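For reference, the two operations in this question are the same affine map: (x - 0.5)/0.5 = 2x - 1. So, assuming the input has already been scaled to [0, 1] (for example by torchvision's ToTensor), normalizing with mean = 0.5 and std = 0.5 and rescaling to [-1, 1] give identical results. A quick check in plain Python, with hypothetical function names:

```python
import math


def normalize(x: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Per-channel normalization, (x - mean) / std, as computed by transforms.Normalize."""
    return (x - mean) / std


def rescale_to_minus_one_one(x: float) -> float:
    """Linear map from [0, 1] to [-1, 1]."""
    return 2.0 * x - 1.0
```

Both functions send 0 to -1, 0.5 to 0, and 1 to 1, and agree everywhere in between.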

a question about function"eval()"

Hello, I would like to ask a question. Why does calling eval() in validation mode increase the bitwise error? After removing eval(), the validation bitwise error is similar to the training one.
Looking forward to your reply.

Modify the input binary information watermark to input an image watermark

Hi, thanks for the code.
I am currently trying to modify the embedded watermark into an image, but I have encountered many problems. Do you and your team have any research in this direction? After all, the application of binary information as a watermark in real life is less meaningful.
Anyway, thank you!

Can the image watermark resist printing and photographing?

Memory requirements for your experiments

Dear Ando,
I would like to know how much memory / gpu is required to run the same experiments as yours.
Your 400-epoch training took about 20 hours according to train.csv.
What is the configuration you used for your own? Is it close to the original paper?

In issue #12, they were using 131663888 kB of memory and still got an out-of-memory error.
Is it that heavy?

Thank you very much

Out of Memory

I run this command in my lab servers.
th main.lua --develop --name test-run --type float
And I got error like this.
{
maxPoolStride : 2
noProgress : false
name : "test-run"
learningRate : 0.001
transmissionJPEGU_yc : 5
batchSize : 12
develop : true
optimType : "adam"
adversaryFeatureDepth : 64
messageLength : 30
transmissionCropout : 0.4
transmissionDropout : 0.4
transmissionJPEGQuality : 50
type : "float"
transmissionCropSize : 0.5
decoderConvolutions : 6
loadCheckpoint : ""
fixImage : false
encoderPreMessageConvolution : 3
noSave : false
seed : 1234
maxPoolWindowSize : 4
transmissionGaussianSigma : 2
small : false
encoderFeatureDepth : 64
confusionPer : 20
imageSize : 128
savePer : 20
imagePenaltyCoef : 1
testPer : 1
save : "checkpoints"
transmissionJPEGU_yd : 0
fixMessage : false
epochs : 200
decoderFeatureDepth : 64
transmissionNoiseType : "identity"
thin : false
transmissionJPEGCutoff : 5
transmissionJPEGU_uvd : 0
transmissionJPEGU_uvc : 3
small16 : false
transmissionOutsize : 128
transmissionCombinedRecipe : ""
adversary_gradient_scale : 0.1
adversaryConvolutions : 2
messagePenaltyCoef : 1
grayscale : false
transmissionConcatenatedRecipe : ""
encoderPostMessageConvolution : 1
randomImage : false
}
{
beta1 : 0.9
epsilon : 1e-08
learningRateDecay : 0
learningRate : 0.001
beta2 : 0.999
}
Loading training dataset
Accepting non-grayscale input
test-run: starting to train

epoch: 1

slurmstepd: error: Detected 1 oom-kill event(s) in step 10160.1 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: wmc-slave-g6: task 0: Out Of Memory

It looks like I don't have enough memory to run this. I just cloned the code with git and ran the test command.

Could you please share the memory requirements for this? Thank you!

By the way, I hope you can release the pretrained models for research.

Thank you !

help! RuntimeError: result type Float can't be cast to the desired output type Long

File "D:\AP\Anaconda\ANaconda\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "D:\AP\Anaconda\ANaconda\lib\site-packages\torch\nn\modules\loss.py", line 713, in forward
return F.binary_cross_entropy_with_logits(input, target,
File "D:\AP\Anaconda\ANaconda\lib\site-packages\torch\nn\functional.py", line 3132, in binary_cross_entropy_with_logits
return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
RuntimeError: result type Float can't be cast to the desired output type Long

Is the embedding capacity ≈ 0.0006 when considering robustness?

Dear author,
When we consider watermarking (robustness),
the embedding capacity is 30 ÷ 3 ÷ 128 ÷ 128 ≈ 0.00061, right?

Overflow/Underflow of pixel value

Hi, thanks for your contribution to the source code. I encountered a problem when training the encoder and decoder. After embedding a watermark into the cover image, pixel values may overflow/underflow, which degrades the visual quality of the encoded image (some weird noise appears). To tackle this, I clip pixel values to the range [-1,1]. However, this increases the bit error rate between the extracted message and the original one. What should I do? Besides, converting float values to integers also seems to increase the bit error rate due to the loss of precision. Is there a good solution?
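A sketch of the clipping and 8-bit conversion described in this question, assuming the encoder output lives in [-1, 1] and is mapped linearly to integer levels 0..255. It shows that rounding to 8 bits changes each value by at most half a quantization step, i.e. 1/255 in [-1, 1] units. to_uint8 and from_uint8 are hypothetical helpers.

```python
from typing import List


def to_uint8(values: List[float]) -> List[int]:
    """Clip to [-1, 1], then map linearly to integer pixel levels 0..255."""
    return [int(round((min(max(v, -1.0), 1.0) + 1.0) / 2.0 * 255)) for v in values]


def from_uint8(pixels: List[int]) -> List[float]:
    """Map integer levels 0..255 back to floats in [-1, 1]."""
    return [p / 255 * 2.0 - 1.0 for p in pixels]
```

Values outside [-1, 1] are changed by clipping, and values inside it move by up to 1/255 through rounding; both effects are perturbations the decoder has to tolerate.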

Hide a large message

Hello.
Firstly, sorry for my poor English.
Your work is great, and I'm very excited to dig into it. But I have a problem: I want to hide information in a 128x128 RGB image. Is it possible to embed a 3200-bit message into the image in order to maintain bpp = 0.2? If not, what should I do?
Thank you!
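The capacity figures in this issue and in the earlier 30-bit question depend on what is counted in the denominator: pixel positions (H × W) or individual channel values (H × W × C). A small sketch with hypothetical helper names:

```python
def bits_per_pixel(message_bits: int, height: int, width: int) -> float:
    """Message bits divided by the number of pixel positions (H * W)."""
    return message_bits / (height * width)


def bits_per_value(message_bits: int, height: int, width: int, channels: int) -> float:
    """Message bits divided by the number of stored values (H * W * C)."""
    return message_bits / (height * width * channels)
```

A 30-bit message in a 128×128 RGB image gives 30 / 49152 ≈ 0.00061 bits per channel value (≈ 0.0018 bits per pixel), which matches the arithmetic in the earlier question. A 3200-bit message in a 128×128 image gives 3200 / 16384 ≈ 0.195 bits per pixel, slightly below 0.2; exactly 0.2 bpp would correspond to 3276.8 bits.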

Images dimensions

Hello, first of all, thank you for your code, it's very clear and it's a big pleasure to read it.
Could you help me understand whether it is possible to train on images of arbitrary shapes and also make predictions for arbitrary images? As far as I can see, this project currently crops the original images.

Error

Hello! When I run main.py, I get the following error and couldn't find a solution:

Running validation for epoch 1/200
Traceback (most recent call last):
File "F:/fu/HiDDeN-master/main.py", line 147, in
main()
File "F:/fu/HiDDeN-master/main.py", line 143, in main
train(model, device, hidden_config, train_options, this_run_folder, tb_logger)
File "F:\fu\HiDDeN-master\train.py", line 88, in train
os.path.join(this_run_folder, 'images'), resize_to=saved_images_size)
File "F:\fu\HiDDeN-master\utils.py", line 58, in save_images
torchvision.utils.save_image(stacked_images, filename, original_images.shape[0], normalize=False)
File "F:\fu\StegaStamp\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "F:\fu\StegaStamp\venv\lib\site-packages\torchvision\utils.py", line 135, in save_image
im.save(fp, format=format)
File "F:\fu\StegaStamp\venv\lib\site-packages\PIL\Image.py", line 1978, in save
if format.upper() not in SAVE:
AttributeError: 'int' object has no attribute 'upper'

Process finished with exit code 1

Security issues of the second threat scenario

Recently, I was reproducing your work. It's really promising and inspiring, and I ran into some issues along the way. Is the random seed the only difference among these 5 models? Perhaps I have overlooked some details, but the detection accuracy of the trained model on encoded images generated by the 6th model was over 50%, or even higher... That's really confusing, and your confirmation would help a lot. @ando-khachatryan

Results with Combined Noise lower than expected

Hi and thanks for the code and architecture!
I was wondering why the results with the combined noise are lower than the ones from the paper. Indeed, the original paper recovers 100% accuracy for the combined models when dealing with untransformed watermarked images, while we only obtain around 90% for networks trained with combined noise here.
Do you happen to know why ?

the question about the loss function in the paper

Regarding the optimization target L_G for the encoder, I think it should optimize log(A(I_en)) instead of log(1 - A(I_en)). This is because the generator needs to generate images that the discriminator cannot distinguish, meaning the value of A(I_en) should decrease, and consequently, the value of the log should also decrease. It seems the loss function in the paper is written the opposite way. Can someone help me with this question?

Error occurs when validating a distortion with its hyperparameters, e.g. crop

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[1], line 76
73 image = image.to(device)
74 message = torch.Tensor(np.random.choice([0, 1], (image.shape[0], hidden_config.message_length))).to(device)
---> 76 losses, (encoded_images, noised_images, decoded_messages) = model.validate_on_batch([image, message])
78 # Calculate average bitwise error
79 decoded_rounded = decoded_messages.detach().cpu().numpy().round().clip(0, 1)

File ~/feiyuchen3/HiDDeN-master/model/hidden.py:166, in Hidden.validate_on_batch(self, batch)
163 ######
164 d_loss_on_cover = self.bce_with_logits_loss(d_on_cover, d_target_label_cover)
--> 166 encoded_images, noised_images, decoded_messages = self.encoder_decoder(images, messages)
168 d_on_encoded = self.discriminator(encoded_images)
169 ######

File ~/anaconda3/envs/pytorch_lab20/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
...
30 random_noise_layer = np.random.choice(self.noise_layers, 1)[0]
31 print(random_noise_layer)
---> 32 return random_noise_layer(encoded_and_cover)

TypeError: 'dict' object is not callable

This is the error log.

After checking, I believe the bug lies in the stored configuration of the trained experiments.

Take crop-0.2-025 as an example. In this case, random_noise_layer = np.random.choice(self.noise_layers, 1)[0] returns a dictionary of the form {'type': 'crop', 'height_ratios': (0.2, 0.25), 'width_ratios': (0.2, 0.25)}, whereas a callable noise layer is needed here.
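A minimal sketch of the conversion this report implies is missing: turning stored config dicts back into callable layers before one is picked at random. Crop here is a hypothetical stand-in rather than the repo's noise layer class, NOISE_LAYER_FACTORIES and to_noise_layer are illustrative names, and only the dict keys are taken from the example above.

```python
from typing import Any, Callable, Dict, Tuple


class Crop:
    """Hypothetical stand-in for a crop noise layer (the repo's real class may differ)."""

    def __init__(self, height_ratios: Tuple[float, float], width_ratios: Tuple[float, float]):
        self.height_ratios = height_ratios
        self.width_ratios = width_ratios

    def __call__(self, batch: Any) -> Any:
        # Placeholder: a real layer would crop the encoded images here.
        return batch


# Maps the 'type' field of a stored config dict to a constructor (hypothetical registry).
NOISE_LAYER_FACTORIES: Dict[str, Callable[[Dict[str, Any]], Callable]] = {
    "crop": lambda cfg: Crop(cfg["height_ratios"], cfg["width_ratios"]),
}


def to_noise_layer(entry: Any) -> Callable:
    """Return entry if it is already callable, otherwise build a layer from its config dict."""
    if callable(entry):
        return entry
    if isinstance(entry, dict):
        layer_type = entry.get("type")
        if layer_type not in NOISE_LAYER_FACTORIES:
            raise ValueError(f"unknown noise layer type: {layer_type!r}")
        return NOISE_LAYER_FACTORIES[layer_type](entry)
    raise TypeError(f"cannot build a noise layer from {type(entry).__name__}")
```

Applied, for example, as self.noise_layers = [to_noise_layer(e) for e in self.noise_layers] before sampling, this would leave already-callable layers untouched and rebuild only the stored dicts; other layer types would need their own registry entries.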

How do I test a trained model

If I want to test with a different attack intensity, for example crop 0.5 to 0.8, do I need to retrain the model? If not, how do I adjust the attack intensity parameters?

Cannot reproduce the results provided in the "experiments" folder?

I created a training set by sampling 10000 images from the Coco datasets. I have some questions:

I can reproduce the "crop" experiment's training footprint with a similar scale of loss. But I cannot reproduce the 'combined' experiment. The total loss increased after several epochs and the training bitwise-error remained very high (0.46). It seems the model can not converge very well.
In the paper, the authors use YUV color space. Did you use YUV color space?

Help! Which version of Torch was used in the paper?

Running validation for epoch 1/300
Traceback (most recent call last):
File "main.py", line 147, in
main()
File "main.py", line 143, in main
train(model, device, hidden_config, train_options, this_run_folder, tb_logger)
File "/content/HiDDeN/train.py", line 85, in train
utils.save_images(image.cpu()[:images_to_save, :, :, :],
File "/content/HiDDeN/utils.py", line 57, in save_images
torchvision.utils.save_image(stacked_images, filename, original_images.shape[0], normalize=False)
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torchvision/utils.py", line 728, in save_image
im.save(fp, format=str(format))
File "/usr/local/lib/python3.8/dist-packages/PIL/Image.py", line 2123, in save
save_handler = SAVE[format.upper()]
KeyError: '8'

When reproducing the experiment, we ran into many errors caused by differences between our Torch version and yours. We would like to know which version of Torch you used in the paper. We appreciate it.

Typo in model/hidden.py: def to_stirng(self):

On line 183 of model/hidden.py:

def to_stirng(self):

should be

def to_string(self):

Can the capacity increase?

Hello,
in your steganography experiment,
the binary message length L is 52 and the cover image is 16×16 (grayscale).
Can L be longer? That is, can the capacity increase?