mjkwon2021 / cat-net

Official code for CAT-Net: Compression Artifact Tracing Network. Image manipulation detection and localization.

Python 88.58% C++ 5.69% Cuda 5.73%
forensics image-forensics image-forgery-detection image-manipulation-detection image-splicing-detection discrete-cosine-transform forgery-detection jpeg

cat-net's People

Contributors

cauchycomplete

cat-net's Issues

Question about training

Hi,
I'm trying to detect removal tampering in images. As mentioned in this issue, CAT-Net is unable to detect these forgeries because of a lack of training data. So my question is: if I train this model on a removal dataset, how much of the model architecture will I need to change, or will no changes be required?

Thanks in Advance.

about infer.py

This is a great project!! Can you tell me why there is no output when I run infer.py and there is no output image in the out_pre folder?

Using pretrained when training

According to your paper, you initialized the RGB stream with weights pretrained on ImageNet and the DCT stream with weights pretrained on double JPEG classification. Do you think I can train the CAT-Net model without pretrained weights? I am reimplementing your model in TensorFlow and it is very difficult to load the weights of the pretrained model.
Thank you <3

About JPEGIO

Thanks for publishing the good work.

My main issue is with installing JPEGIO. I am encountering the following error, even after trying all the suggested solutions.

  error: command 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.36.32532\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2

CUDA: 11.8
Python: 3.9

Is there any solution, or an alternative library that we can use? I tried other alternatives, but they failed when collecting the components.

Generalization on new data!

Hi!

Thanks for sharing this code. I have a question about training CAT-Net on a custom forgery dataset that is comparable to IMD2020 in size.

While training CAT-Net on the custom data, whether starting from the pre-trained weights or from scratch, performance keeps decreasing and the model overfits (val_loss keeps increasing while train_loss keeps decreasing).

Maybe we should fine-tune it instead of training the whole network, but even with a very low learning rate the metrics do not stabilize.

Can you explain (I have not found this in the paper) how many datasets your original model was trained on, and can you suggest how to train CAT-Net so that it generalizes well to a custom dataset?

Smallest Dataset size

I am trying to train on a small (400 images) CASIA dataset for debugging purposes.
I get an assertion error: "assert 0 <= index < len(self.tamp_list), f"Index {index} is not available!""
In data_core.py I found "self.smallest = 1869 # smallest dataset size (IMD:1869)" and presume my dataset is too small.
I wonder why the smallest dataset size is exactly 1869?
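
For anyone hitting the same assertion, here is a rough sketch of how a fixed per-dataset size can interact with a smaller dataset. It assumes the combined dataset maps a global index with index // smallest and index % smallest (the expression that shows up in the data_core.py traceback in the "CASIA2.0 notfoundfile" issue on this page) and that the combined length is the number of datasets times smallest. The helper names locate and first_invalid_index are hypothetical and not part of the repository.

```python
SMALLEST = 1869  # value quoted from data_core.py in the issue above


def locate(index, smallest=SMALLEST):
    """Split a global index into (dataset number, index inside that dataset)."""
    return index // smallest, index % smallest


def first_invalid_index(dataset_sizes, smallest=SMALLEST):
    """Return the first global index that points past the end of its dataset.

    Assumption: the combined dataset exposes len(dataset_sizes) * smallest items.
    Returns None when every dataset holds at least `smallest` items.
    """
    for global_index in range(len(dataset_sizes) * smallest):
        dataset_no, local_index = locate(global_index, smallest)
        if local_index >= dataset_sizes[dataset_no]:
            return global_index
    return None


if __name__ == "__main__":
    # A 400-image dataset fails as soon as local index 400 is requested.
    print(first_invalid_index([400]))
```

Under these assumptions, either the dataset needs at least `smallest` images or `smallest` has to be lowered to the size of the smallest dataset actually in use.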

dataset processing

Hello, thank you for your wonderful code! I see the COLUMB dataset is used in the paper. May I ask how you deal with the masks of this dataset?

NaN

Thank you again for your answer. Now there is a new problem: the loss is NaN at the beginning of training. Do you know how to solve this problem?

Epoch: [1/200] Iter:[280/934], Time: 7.62, lr: 0.0049, Loss: nan
NaN or Inf found in input tensor.

F1 score

Can you upload the code for F1 score?
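
This is not the authors' evaluation code, just a minimal sketch of a pixel-level F1 score for binary masks. It assumes both masks are nested lists of 0/1 with the same shape, and it returns 0.0 when neither mask has any positive pixel (a convention chosen here, other choices exist). The name pixel_f1 is hypothetical.

```python
def pixel_f1(pred_mask, true_mask):
    """Pixel-level F1 between two binary masks given as nested lists of 0/1."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for pred, true in zip(pred_row, true_row):
            if pred and true:
                tp += 1  # tampered pixel correctly flagged
            elif pred and not true:
                fp += 1  # authentic pixel flagged as tampered
            elif true and not pred:
                fn += 1  # tampered pixel missed
    denominator = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 here when there are no positives.
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    prediction = [[1, 1], [0, 0]]
    ground_truth = [[1, 0], [1, 0]]
    print(pixel_f1(prediction, ground_truth))
```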

error

Hi, sorry to bother you. I am trying to run tools/infer.py with
args = argparse.Namespace(cfg='../experiments/CAT_DCT_only.yaml', opts=['TEST.MODEL_FILE', '../output/splicing_dataset/CAT_DCT_only/DCT_only_v2.pth.tar', 'TEST.FLIP_TEST', 'False', 'TEST.NUM_SAMPLES', '0'])
and
test_dataset = splicing_dataset(crop_size=None, grid_crop=True, blocks=('DCTvol', 'qtable'), DCT_channels=1, mode='arbitrary', read_from_jpeg=True) # DCT stream

All I get is:
D:\conda\envs\cat\python.exe C:/Users/ty/Desktop/suanfayichengxu/CAT-Net-main/tools/infer.py
<Splicing.data.dataset_arbitrary.arbitrary object at 0x00000206334F4F28>(0)
crop_size=None, grid_crop=True, blocks=('DCTvol', 'qtable'), mode=arbitrary, read_from_jpeg=True, class_weight=tensor([1., 1.])

=> loading model from ../output/splicing_dataset/CAT_DCT_only/DCT_only_v2.pth.tar
0it [00:00, ?it/s]Epoch: 199
0it [00:01, ?it/s]

Process finished with exit code 0

Memory leak when training

Hi,
When I tried to train this model, I found a memory leak (CPU RAM) in the image-loading part. Memory usage increases almost linearly while the script runs.
I used memory_profiler to locate the leak, and it shows:

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    99   4743.2 MiB   4743.2 MiB           1   @profile
   100                                         def train_cls(config, epoch, num_epoch, epoch_iters, base_lr, num_iters,
   101                                                   trainloader, optimizer, model, writer_dict, final_output_dir):
   102                                             # Training
   103   4743.2 MiB      0.0 MiB           1       model.train()
   104   4743.2 MiB      0.0 MiB           1       batch_time = AverageMeter()
   105   4743.2 MiB      0.0 MiB           1       ave_loss = AverageMeter()
   106   4743.2 MiB      0.0 MiB           1       tic = time.time()
   107   4743.2 MiB      0.0 MiB           1       cur_iters = epoch * epoch_iters
   108   4743.2 MiB      0.0 MiB           1       writer = writer_dict['writer']
   109   4743.2 MiB      0.0 MiB           1       global_steps = writer_dict['train_global_steps']
   110   4743.2 MiB      0.0 MiB           1       world_size = get_world_size()
   111                                         
   112   5984.3 MiB -53718.8 MiB          41       for i_iter, (images, labels, qtable) in enumerate(trainloader):
   113                                                 # images, labels, _, _ = batch
   114   7328.2 MiB  55016.1 MiB          41           images = images.cuda()
   115   7328.3 MiB    -43.6 MiB          41           labels = labels.long().cuda()
   116                                         
   117   7328.3 MiB   1126.4 MiB          41           losses, _ = model(images, labels, qtable)  # _ : output of the model (see utils.py)
   118   7328.3 MiB     -0.4 MiB          41           loss = losses.mean()

It seems that the memory leak happens at images = images.cuda().
When I set workers > 0 (workers=4), the memory of every worker increases (as seen with the top command).

Do you have any idea why there is a memory leak during data loading?

Thanks a lot! : )

Questions about crop processing of DCT volume

Hi~
I saw that there is a parameter self._grid_crop in the dataset class, which determines whether the random crop offset is rounded down to an integer multiple of 8.
Does it have a large influence if I set this parameter to False?

if self._grid_crop:
    s_r = (random.randint(0, max(h - crop_size[0], 0)) // 8) * 8
    s_c = (random.randint(0, max(w - crop_size[1], 0)) // 8) * 8
else:
    s_r = random.randint(0, max(h - crop_size[0], 0))
    s_c = random.randint(0, max(w - crop_size[1], 0))
# crop img_RGB
img_RGB = img_RGB[s_r:s_r+crop_size[0], s_c:s_c+crop_size[1], :]

I worry that when I set this parameter to False, the crop will not align with the original DCT blocks (it seems the DCT in jpegio is computed on 8x8 blocks, so cropping may break these blocks and produce new 8x8 blocks whose content differs from the original DCT information from jpegio).
Btw, I am also stuck on how jpegio computes the DCT. Its output is quite different from the result of cv2.dct(Y) on the Y channel. Can you give me a brief explanation? :)
Thanks a lot
<3
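
To make the alignment concern concrete, here is a small sketch that mirrors the offset logic quoted above and checks whether a crop keeps each pixel at the same position inside its 8x8 block. It only illustrates the arithmetic and is not taken from the repository. The names crop_offset and crop_keeps_grid are hypothetical.

```python
import random

BLOCK = 8  # JPEG codes each channel in 8x8 blocks


def crop_offset(size, crop, grid_crop, rng=random):
    """Pick a start offset along one axis, mirroring the snippet quoted above."""
    start = rng.randint(0, max(size - crop, 0))
    return (start // BLOCK) * BLOCK if grid_crop else start


def crop_keeps_grid(offset):
    """True if pixel i of the crop sits at the same in-block position as before.

    Pixel i of the crop was pixel (offset + i) of the original image, so the
    8x8 grid is preserved only when (offset + i) % 8 == i % 8 for every i.
    """
    return all((offset + i) % BLOCK == i % BLOCK for i in range(BLOCK))


if __name__ == "__main__":
    rng = random.Random(0)
    aligned = crop_offset(1000, 512, grid_crop=True, rng=rng)
    print(aligned, crop_keeps_grid(aligned))  # aligned offsets keep the grid
    print(13, crop_keeps_grid(13))            # an unaligned offset shifts it
```

With grid_crop=True the offset is always a multiple of 8, so the cropped region starts on a block boundary. With grid_crop=False, most offsets shift the block grid of the crop relative to the original compression grid.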

train dataset

Hi, can you please provide the download links for your five custom datasets?

About validate performance

Hi, I'm confused about the validation performance after each training epoch. I am now training on my own dataset. The training loss keeps dropping after each epoch, but the validation numbers stay exactly the same after every epoch. I think they should change at least a bit. May I know what causes that? Thank you!

insufficient shared memory in training phase

Hi,
I ran into the same problem as #24 when I tried to train this model. I saved all the images in '.jpg' format. However, RAM usage increased linearly during training (even when I only iterate over the training data and skip everything else). Eventually the system reported "RuntimeError: DataLoader worker (pid 11676) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit."
I have tried everything and still cannot fix this bug. Can you give me some advice?
BTW, I found that the variable "out_view" in AbstractDataset._get_jpeg_info is not used. I'm curious what it does.

DCT_coef is all 0

In AbstractDataset.py, at line 96, "DCT_coef.append(out_arr)", I notice that the out_arr matrix is all zeros. Why?

About training time

Very happy to see your work!
When I was downloading the tampCOCO dataset, I found it is 40 GB. How long does training take on 2x NVIDIA TITAN RTX?
If I have only one NVIDIA TITAN RTX, will training take much longer?
Thank you for answering my questions.

About training dataset 'Fantastic Reality'

Hi, your work is really fascinating to me. Can you offer some help with accessing the Fantastic Reality dataset? It is really hard to obtain and I really need your help. Thanks a lot!

Docker image for inference

Is there any Docker image that I can at least use for inference? That would be really helpful!

For some reason, I am finding it hard to install jpegio.

Inference on CPU

Hello,

I would like to test your code and your pre-trained model, and I wanted to know whether it can run on CPU only.
Do you know how long it takes to process one image?

Thanks in advance.

Pretraining DCT

Hi,

Thank you for the code.
I was trying to understand how you do the pretraining step using only the DCT stream.
You say in the paper that you use data from Park et al. containing single and double compressed images. Regarding pretraining, I have a couple of questions that I hope you can help me with:

  • Looking at the code, I see that Splicing/data/AbstractDataset.py returns a tensor together with the qtables and the gt masks. But for the DJPEG annotations it receives an integer 0 or 1 instead of a mask, so when the tensor is created in the _create_tensor function it throws AttributeError: 'int' object has no attribute 'shape' at line 133.
    Shouldn't this code create a mask full of zeros when the image is single compressed and full of ones when it is double compressed?

  • If these images are 256x256, doesn't padding them to 512x512 with pixels of value 127.5 affect performance during pretraining? The masks would have 1/4 of their pixels set to 0 or 1 and the remaining 3/4 set to gray values. Maybe I didn't understand it correctly.

Thank you.
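
To make the two points concrete, here is a plain-Python sketch of what is being proposed. It is not the repository's code: label_to_mask and padded_fraction are hypothetical helpers, and the real pipeline works on arrays and tensors rather than nested lists.

```python
def label_to_mask(label, height, width):
    """Expand an image-level 0/1 label into a constant height x width mask.

    0 = single compressed (all zeros), 1 = double compressed (all ones),
    as suggested in the first bullet above.
    """
    if label not in (0, 1):
        raise ValueError(f"expected label 0 or 1, got {label!r}")
    return [[label] * width for _ in range(height)]


def padded_fraction(size, padded_size):
    """Fraction of a square padded canvas that is padding rather than image."""
    return 1 - (size * size) / (padded_size * padded_size)


if __name__ == "__main__":
    mask = label_to_mask(1, 256, 256)
    print(len(mask), len(mask[0]))    # mask dimensions
    print(padded_fraction(256, 512))  # share of the 512x512 canvas that is padding
```

The second helper reproduces the arithmetic in the second bullet: a 256x256 image covers one quarter of a 512x512 canvas, so three quarters of it is padding.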

The results are different

Hello, thanks for the code you provided. I downloaded the dataset from the CASIA address given in dataset_CASIA.py, set the dataset path, and ran dataset_CASIA.py. However, the result is not the same as the one you provided. I guess this is why training reports the error "FileNotFoundError: [Errno 2] No such file or directory: 'E:\cat\datasets\CASIA\CASIA 2.0\jpg\Tp_S_NRN_S_N_sec20081_sec20081_01671.jpg'". Can you help me figure out what went wrong? Thank you very much. The results are as follows:

                             yours   mine
#CASIA2 imlist               6042    5122
#mask validation new_list    6025    5105
#CASIA2 authentic imlist     6042    12613
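
When the counts differ like this, it can help to check which listed files are actually on disk before training starts. Below is a minimal sketch, assuming the list file holds one image path per line relative to the dataset root. The real list format in the repository may differ, and split_existing is a hypothetical helper.

```python
from pathlib import Path


def split_existing(list_file, root):
    """Return (existing, missing) for the image paths named in list_file.

    Assumption: one path per line, relative to `root`; blank lines are skipped.
    """
    existing, missing = [], []
    for line in Path(list_file).read_text().splitlines():
        name = line.strip()
        if not name:
            continue
        target = existing if (Path(root) / name).is_file() else missing
        target.append(name)
    return existing, missing
```

Calling split_existing with the list file and the dataset root, then printing len(missing) and the first few missing names, would show whether the download is incomplete or the list simply points at different file names.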

RGB Only model?

Dear @mjkwon2021
I saw that you have built a model with the RGB stream only and that its results are not bad (in your paper).
Can you share the pretrained model and explain how to run inference with the RGB stream only?
Thank you so much <3

about training

How can I train a model that matches the performance of CAT_full_v1?

A question about the dataset and train

Thanks for sharing your code. I have a question about training.

You say to "Set training and validation set configuration in Splicing/data/data_core.py." However, data_core.py contains

" self.dataset_list.append(tampCOCO(crop_size, grid_crop, blocks, DCT_channels, "Splicing/data/cm_COCO_train_list.txt"))"

tampCOCO is expected to have a train list and a test list, but the tampCOCO dataset I downloaded only has a list.txt. How can I get train_list.txt, and what is its file format?

CAT_full.yaml

Hello, I followed your steps to run the code, but the following error occurred. I checked the file directory and found no problem. Could you give me some advice? Thank you very much.

Traceback (most recent call last):
  File "/home/x1234/Dan/Dan_test/CAT/tools/train.py", line 233, in <module>
    main()
  File "/home/x1234/Dan/Dan_test/CAT/tools/train.py", line 73, in main
    update_config(config, args)
  File "/home/x1234/Dan/Dan_test/CAT/tools/../lib/config/default.py", line 114, in update_config
    cfg.merge_from_file(args.cfg)
  File "/home/x1234/anaconda3/envs/cat/lib/python3.6/site-packages/yacs/config.py", line 211, in merge_from_file
    with open(cfg_filename, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'experiments/CAT_full.yaml'

Process finished with exit code 1
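
The path in the error is relative, so it is resolved against the current working directory, which may not be the repository root when the script is launched from an IDE. One possible workaround is sketched below. It assumes the script lives in tools/ and the configs in experiments/ at the repository root, as the paths in the traceback suggest. resolve_cfg is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def resolve_cfg(cfg, script_path):
    """Resolve a relative config path against the repo root instead of the CWD.

    Assumption: `script_path` is <repo>/tools/<script>.py, so the repo root
    is two levels up from the script file.
    """
    cfg_path = Path(cfg)
    if cfg_path.is_absolute():
        return cfg_path
    repo_root = Path(script_path).resolve().parent.parent
    return repo_root / cfg_path
```

Inside tools/train.py this could be used as resolve_cfg(args.cfg, __file__) before the config is loaded. Running the script from the repository root is the alternative that needs no code change.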

the Resolution is lower

When I test your code, I find that the resolution of the output is lower than that of the input. How can I solve this problem?

false negative report

Hi, I found that if the forged picture goes through a compression step after tampering, such as converting a PNG picture to JPG at some compression quality, the model no longer works. The same applies to resampling, for example cropping the picture with Windows software, which may resample it automatically.

About class weight

Hi @CauchyComplete again, have a nice day!
In your paper and in your training code, I see that the tampered class gets five times more weight (0.5/2.5 in your code).
I wonder whether this depends on the dataset (the number of tampered and authentic images)?
If it does, how can I calculate this ratio?
Thank you for your reply <3
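
One common heuristic is inverse-frequency weighting. Whether the 0.5/2.5 values were chosen this way is not confirmed anywhere on this page, so treat the sketch below as an assumption. The function name and the pixel counts are made up for illustration.

```python
def inverse_frequency_weights(authentic_count, tampered_count):
    """Weights proportional to 1 / class frequency for a two-class problem.

    Each weight is total / (2 * class_count), so two balanced classes
    both get weight 1.0 and the rarer class gets the larger weight.
    """
    if authentic_count <= 0 or tampered_count <= 0:
        raise ValueError("both classes need at least one sample")
    total = authentic_count + tampered_count
    return total / (2 * authentic_count), total / (2 * tampered_count)


if __name__ == "__main__":
    # Hypothetical counts: five authentic pixels for every tampered pixel.
    w_authentic, w_tampered = inverse_frequency_weights(5000, 1000)
    print(w_authentic, w_tampered, w_tampered / w_authentic)
```

With a 5:1 ratio of authentic to tampered pixels this heuristic gives the tampered class five times the weight of the authentic class, which happens to match the 1:5 ratio of 0.5/2.5.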

Error when run with CAT_DCT_only

Hi, thank you for this great project <3
I run python tools/infer.py
with this line: args = argparse.Namespace(cfg='experiments/CAT_DCT_only.yaml', opts=['TEST.MODEL_FILE', 'output/splicing_dataset/CAT_DCT_only/DCT_only_v2.pth.tar', 'TEST.FLIP_TEST', 'False', 'TEST.NUM_SAMPLES', '0'])
to use CAT_DCT_only, but I got this error. Can you help me fix it? Thank you

=> loading model from output/splicing_dataset/CAT_DCT_only/DCT_only_v2.pth.tar
Epoch: 199
  0% 0/2 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "tools/infer.py", line 155, in <module>
    main()
  File "tools/infer.py", line 132, in main
    _, pred = model(image, label, qtable)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/Colab Notebooks/CAT-net/CAT-Net/tools/../lib/utils/utils.py", line 34, in forward
    outputs = self.model(inputs, qtable)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/Colab Notebooks/CAT-net/CAT-Net/tools/../lib/models/network_DCT.py", line 408, in forward
    x = self.dc_layer0_dil(DCTcoef)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size 64 21 3 3, expected input[1, 24, 1280, 960] to have 21 channels, but got 24 channels instead
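
A possible reading of this error, offered as a guess: 24 = 3 + 21, which would be consistent with the dataset still producing an RGB block in addition to the DCT volume while the DCT-only network expects the 21 DCT channels alone. The sketch below only does that arithmetic. The per-block channel counts are inferred from the error message and are not read from the repository, and input_channels is a hypothetical helper.

```python
# Assumed channel counts per block, inferred from "expected ... 21 channels,
# but got 24 channels": 'RGB' adds 3, 'DCTvol' adds 21 per DCT channel.
RGB_CHANNELS = 3
DCTVOL_CHANNELS = 21


def input_channels(blocks, dct_channels=1):
    """Number of channels in the stacked input tensor for the given blocks."""
    total = 0
    for name in blocks:
        if name == "RGB":
            total += RGB_CHANNELS
        elif name == "DCTvol":
            total += DCTVOL_CHANNELS * dct_channels
        # Other blocks such as 'qtable' are assumed to be passed separately.
    return total


if __name__ == "__main__":
    print(input_channels(("RGB", "DCTvol", "qtable")))  # full two-stream input
    print(input_channels(("DCTvol", "qtable")))         # DCT-only input
```

If that reading is right, the dataset line in infer.py would also need to be switched to the DCT-only variant with blocks=('DCTvol', 'qtable'), as in the "error" issue above.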

torch 1.1.0

ERROR: Could not find a version that satisfies the requirement torch==1.1.0 (from -r requirements.txt (line 1)) (from versions: 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0)
ERROR: No matching distribution found for torch==1.1.0 (from -r requirements.txt (line 1))

Loss function problem

Hi @CauchyComplete again 💯
I am implementing your model in TensorFlow. I tried training it on CASIAv2/Tp (without pretraining) for 6 epochs, but the loss decreases very little and has not decreased since epoch 5. Can you share the plot of your loss without pretraining?

CASIA2.0 notfoundfile

Hi, I saw the CASIA2.0 address in your code dataset_CASIA.py, downloaded it, and set the layout and path of the dataset as required, but errors keep occurring when I run the code (see below). I have checked the dataset paths and they are all correct. I don't know why this happens. Could you give me some suggestions?

warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
Epoch: [0/200] Iter:[0/1869], Time: 1069.73, lr: 0.005000, Loss: 0.695368
Traceback (most recent call last):
  File "train.py", line 234, in <module>
    main()
  File "train.py", line 185, in main
    trainloader, optimizer, model, writer_dict, final_output_dir)
  File "E:\cat\tools\..\lib\core\function.py", line 58, in train
    for i_iter, (images, labels, qtable) in enumerate(trainloader):
  File "C:\Users\dell\anaconda3\envs\cat\lib\site-packages\torch\utils\data\dataloader.py", line 568, in __next__
    return self._process_next_batch(batch)
  File "C:\Users\dell\anaconda3\envs\cat\lib\site-packages\torch\utils\data\dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
FileNotFoundError: Traceback (most recent call last):
  File "C:\Users\dell\anaconda3\envs\cat\lib\site-packages\torch\utils\data\_utils\worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "C:\Users\dell\anaconda3\envs\cat\lib\site-packages\torch\utils\data\_utils\worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "E:\cat\tools\..\Splicing\data\data_core.py", line 99, in __getitem__
    return self.dataset_list[index//self.smallest].get_tamp(index%self.smallest)
  File "E:\cat\tools\..\Splicing\data\dataset_CASIA.py", line 55, in get_tamp
    return self._create_tensor(tamp_path, mask)
  File "E:\cat\tools\..\Splicing\data\AbstractDataset.py", line 106, in _create_tensor
    img_RGB = np.array(Image.open(im_path).convert("RGB"))
  File "C:\Users\dell\anaconda3\envs\cat\lib\site-packages\PIL\Image.py", line 2912, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'E:\cat\datasets\CASIA\CASIA 2.0\jpg\Tp_S_NNN_S_N_cha00089_cha00089_00871.jpg'

can i use new arch?

Hi
I want to change HRNet to UNet. What changes should I make? I have changed the HRNet architecture to UNet, but for pretraining I am unable to find UNet weights pretrained on ImageNet.
Please help.

Evaluation result

Hi,

I appreciate your code. I was wondering how to get the evaluation results for each testing dataset.

I appreciate your help.

Thanks

fusion part

Can you tell me where this fusion part is in the code? I can't find it. Thank you.
:D
