dong03 / mvss-net
Code for Image Manipulation Detection by Multi-View Multi-Scale Supervision
Hello!
Thanks for your inspiring work; I'm very interested in it. I have a small doubt about training: how is the batch size handled? After tampered and non-tampered images are shuffled in the dataloader, the images within a batch require different loss terms. In my setup, with a batch size of 8, I don't know how to compute the loss. Should I iterate over the samples one by one, set the batch size to 1, or use some other method?
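(For readers with the same question, here is a minimal sketch of one common way to handle this, an assumption rather than the authors' recipe: compute the loss per sample and select terms with a per-sample label tensor, so no per-sample loop or batch size of 1 is needed. `pred_mask`, `gt_mask`, and `is_tampered` are illustrative names.)

```python
import torch
import torch.nn.functional as F

def mixed_batch_loss(pred_mask, gt_mask, is_tampered):
    """Sketch: handle a shuffled batch of tampered/authentic images.
    pred_mask, gt_mask: [B, 1, H, W], pred_mask already sigmoid-activated;
    is_tampered: float tensor [B] with 1 for tampered, 0 for authentic."""
    # per-pixel BCE, reduced to one value per sample -> shape [B]
    per_sample = F.binary_cross_entropy(
        pred_mask, gt_mask, reduction='none').mean(dim=(1, 2, 3))
    # authentic images simply have an all-zero GT mask, so the same pixel
    # loss applies; a term that only applies to tampered samples can be
    # selected with the label tensor instead of a Python loop:
    tampered_only = (per_sample * is_tampered).sum() / is_tampered.sum().clamp(min=1)
    return per_sample.mean(), tampered_only
```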
Hi,
Thanks for your great work. I have read your paper and the released code these days, and I have a question about the Edge-Supervised Branch.
In the paper, the predicted manipulation edge map, denoted {Gedge(xi)}, is obtained by transforming the output of the last ERB with a sigmoid layer.
But in your code, I don't see the sigmoid layer; the last ERB is used directly as the output:
```python
if self.sobel:
    res1 = self.erb_db_1(run_sobel(self.sobel_x1, self.sobel_y1, c1))
    res1 = self.erb_trans_1(res1 + self.upsample(self.erb_db_2(run_sobel(self.sobel_x2, self.sobel_y2, c2))))
    res1 = self.erb_trans_2(res1 + self.upsample_4(self.erb_db_3(run_sobel(self.sobel_x3, self.sobel_y3, c3))))
    res1 = self.erb_trans_3(res1 + self.upsample_4(self.erb_db_4(run_sobel(self.sobel_x4, self.sobel_y4, c4))), relu=False)
else:
    res1 = self.erb_db_1(c1)
    res1 = self.erb_trans_1(res1 + self.upsample(self.erb_db_2(c2)))
    res1 = self.erb_trans_2(res1 + self.upsample_4(self.erb_db_3(c3)))
    res1 = self.erb_trans_3(res1 + self.upsample_4(self.erb_db_4(c4)), relu=False)

if self.constrain:
    x = rgb2gray(x)
    x = self.constrain_conv(x)
    constrain_features, _ = self.noise_extractor.base_forward(x)
    constrain_feature = constrain_features[-1]
    c4 = torch.cat([c4, constrain_feature], dim=1)

outputs = []

x = self.head(c4)
x0 = F.interpolate(x[0], size, mode='bilinear', align_corners=True)
outputs.append(x0)

if self.aux:
    x1 = F.interpolate(x[1], size, mode='bilinear', align_corners=True)
    x2 = F.interpolate(x[2], size, mode='bilinear', align_corners=True)
    outputs.append(x1)
    outputs.append(x2)

return res1, x0
```
I think 'res1' is the output, with no sigmoid layer.
Did I miss something? Could you help me? Thank you very much.
Best regards.
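(For readers hitting the same question: a minimal sketch, assuming only the standard PyTorch API, of applying the paper's sigmoid outside the model, since the released `forward()` returns the raw ERB output as `res1`. Names are illustrative.)

```python
import torch

def edge_probability(model, img):
    """Sketch, not the authors' code: turn the raw edge logits (res1)
    returned by the released forward() into the probability map the
    paper denotes G_edge(x)."""
    edge_logits, seg_out = model(img)    # res1, x0 in the snippet above
    return torch.sigmoid(edge_logits)
```

During training the same effect is usually obtained with `F.binary_cross_entropy_with_logits`, which fuses the sigmoid into the loss for numerical stability, which may be why no explicit sigmoid layer appears in the model.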
Hello!
Thanks for your inspiring work; I'm very interested in it.
Over the last two weeks, I read your great work and decided to re-implement the network according to the paper.
The network is now roughly finished, and I trained it on the CASIA v2 dataset, but the performance is not good. There must be something wrong with my code, so I have a few questions.
Dear authors,
Thanks for sharing the source code of MVSS-Net; it is really great work!
I am wondering how the edge loss is calculated during training.
Since CASIA v2 does not provide ground-truth edge maps, did you extract the edge maps from the ground-truth pixel masks on your own? If so, could you please share how you did it?
Thank you!
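(One common way to derive edge ground truth from a binary pixel mask, offered here as an assumption rather than the authors' method, is the morphological gradient, i.e. dilation minus erosion of the mask:)

```python
import cv2
import numpy as np

def edge_from_mask(mask, ksize=5):
    """Sketch: morphological gradient of a binary GT mask as an edge map.
    `mask` is a uint8 grayscale image; `ksize` controls edge thickness
    and is an illustrative choice, not a value from the paper."""
    kernel = np.ones((ksize, ksize), np.uint8)
    binary = (mask > 127).astype(np.uint8) * 255
    return cv2.morphologyEx(binary, cv2.MORPH_GRADIENT, kernel)
```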
Hi, since there is no training code, I tried the pretrained models on my own dataset (real-world data that may have gone through resampling, resizing, and multiple rounds of compression). I tried the MVSS models pretrained on CASIA and DEFACTO, but the results are really disappointing (worse than ManTra-Net). Should the model be trained on my dataset first, or should I change some prediction parameters?
Dear @dong03
Thank you for the code!
Is the pixel-level ground-truth mask available for CASIA v2, or do we have to create it from the edge mask?
Hello! May I ask whether you converted the ground truth of the Columbia dataset into binary masks for training? After converting it myself, I found the tampered boundaries were not cleanly converted and the result was not good enough. Could you share your converted ground truth?
Thanks for your work.
As the paper shows, the block consists of four sublayers, but I wonder how the Sobel results along x and y are fused together.
Is the code below right?
```python
def forward(self, x):
    # Sobel responses along x and y (sobel_kernel_x/y are fixed weight tensors)
    sobel_x = F.conv2d(x, sobel_kernel_x, padding=1)
    sobel_y = F.conv2d(x, sobel_kernel_y, padding=1)   # was conv2d(y, ...): typo
    # fuse the two directions as the gradient magnitude sqrt(gx^2 + gy^2)
    sobel_rs = torch.sqrt(torch.pow(sobel_x, 2) + torch.pow(sobel_y, 2))
    sobel_rs = F.normalize(sobel_rs, p=2)              # was normalize(x, ...): typo
    sobel_rs = self.bn(sobel_rs)
    sobel_rs = torch.sigmoid(sobel_rs)                 # F.sigmoid is deprecated
    return x * sobel_rs                                # gate the input features
```
Hi! Sorry to disturb you again.
I downloaded the pretrained model from your link and found that the constrained conv layer has only one parameter, of size [1, 3, 24]. But I expected [3, 1, 5, 5] according to your figure.
Could you please point out where I went wrong?
Thanks sincerely.
Does it mean that the center pixel of the conv kernel is always -1, and that you implemented this by storing the free weights in an array which is then filled into the kernel matrix?
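(For readers puzzling over the same shape: a sketch of a Bayar-style constrained convolution consistent with a [1, 3, 24] parameter, assuming the 24 entries are the off-center weights of a 5x5 kernel whose center is fixed to -1 and whose other weights are renormalized to sum to 1. This is an inference from the checkpoint, not the authors' code.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayarConv2d(nn.Module):
    """Sketch of a Bayar-style constrained conv (shapes inferred from the
    [1, 3, 24] checkpoint tensor, not taken from the released code)."""
    def __init__(self, in_channels=1, out_channels=3, kernel_size=5):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        # 24 learnable weights per kernel; the center is filled in forward()
        self.weight = nn.Parameter(
            torch.rand(in_channels, out_channels, kernel_size ** 2 - 1) * 1e-3)

    def forward(self, x):
        w = self.weight / self.weight.sum(dim=-1, keepdim=True)  # sum to 1
        center = w.new_full((self.in_channels, self.out_channels, 1), -1.0)
        mid = (self.kernel_size ** 2 - 1) // 2
        # splice the fixed -1 center back into the flattened kernel
        w = torch.cat([w[..., :mid], center, w[..., mid:]], dim=-1)
        w = w.view(self.in_channels, self.out_channels,
                   self.kernel_size, self.kernel_size)
        w = w.permute(1, 0, 2, 3).contiguous()   # (out, in, k, k) for conv2d
        return F.conv2d(x, w, padding=self.kernel_size // 2)
```

Under this reading, [1, 3, 24] would be (in_channels, out_channels, 24 free weights), matching a grayscale input and three noise-feature channels.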
Hello, I highly appreciate your recent work, which achieved SOTA, but there are still some things I want to figure out. My questions are as follows:
(1) In the SOTA comparison, you used CASIA v2 as the training set and other public datasets as the test sets, but what is the composition of the validation set?
(2) The paper's description of the online data augmentation is short and brief, but it is clear that you include both tampered and non-tampered images. How is the "naive manipulation by cropping and pasting a squared area" implemented? Is it just cropping and pasting within the same image from the non-tampered set? (A sketch of one possible interpretation follows this issue.)
(3) Another question has been bothering me: previous work such as GSR-Net did not use non-tampered data, and with this extra non-tampered data MVSS-Net achieves large gains (some metrics by more than 10 percentage points). Is the comparison fair when the amount of training data is more than doubled?
Hope you are well; looking forward to your detailed answers!
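(One possible interpretation of the "naive manipulation" augmentation from question (2), purely a guess at the paper's wording; the square size and names are illustrative:)

```python
import numpy as np

def naive_square_copy_paste(img, size=64, rng=None):
    """Sketch: copy a random square within the SAME authentic image and
    paste it elsewhere; returns the tampered image and its binary mask."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    ys, xs = rng.integers(0, h - size), rng.integers(0, w - size)  # source
    yt, xt = rng.integers(0, h - size), rng.integers(0, w - size)  # target
    out = img.copy()
    out[yt:yt + size, xt:xt + size] = img[ys:ys + size, xs:xs + size]
    mask = np.zeros((h, w), np.uint8)
    mask[yt:yt + size, xt:xt + size] = 255
    return out, mask
```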
Hello! Thanks for your inspiring work. Could you provide the model code for MVSS-Net++?
Hello, I have just downloaded the CASIA dataset, but it does not include the mask files. I tried to derive the masks from the pixel difference between tampered and authentic images, but they are not very accurate. Could you please tell me how you obtained the ground-truth masks? Thanks a lot.
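(For what it's worth, a rough sketch of the pixel-difference approach with morphological clean-up; this is an approximation that is known to be noisy, it assumes each tampered image is paired with its authentic source at identical size, and the threshold and kernel size are illustrative choices:)

```python
import cv2
import numpy as np

def diff_mask(tampered_path, authentic_path, thresh=15):
    """Sketch: approximate GT mask from a tampered/authentic image pair."""
    t = cv2.imread(tampered_path).astype(np.int16)
    a = cv2.imread(authentic_path).astype(np.int16)   # must be same size
    diff = np.abs(t - a).max(axis=2).astype(np.uint8)  # max over channels
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # drop specks
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill holes
    return mask
```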
In the README it says:
Update: 22.02.17, pretrained model for the Real-World Image Forgery Localization Challenge
But I could not find the model anywhere. Could you please help me with that? Thank you!
It seems that the image names in the Columbia dataset differ from the version downloaded from here: https://www.ee.columbia.edu/ln/dvmm/downloads/AuthSplicedDataSet/AuthSplicedDataSet.htm
Is this the right version of the dataset?
Hello!
Thanks for your inspiring work; I'm very interested in it.
Since the DEFACTO dataset can no longer be downloaded, could you share it?
Your Baidu link is not working. Could you provide a Google Drive link?
Is the code for MVSS-Net++, or the training code, going to be made available? Specifically, based on the checkpoint you shared, there seems to be a non-trainable layer between the ReLU activation and the 1x1 Conv in the ConvGEM module. Could you share what this layer is?
Thank you!
Hi,
Thank you very much for your work. I read your paper and tested your code on CASIA v1.0 these days, but I have some doubts about the pixel-level F1 score.
As shown in the image (screenshot omitted), with a fixed threshold the image-level F1 and pixel-level F1 are consistent, but I want to obtain the results at the best threshold. What can I do to get it? Thanks.
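(One straightforward way to get a best-threshold pixel F1 is to sweep binarization thresholds over the sigmoid prediction map; a minimal sketch with illustrative names, not code from the repo:)

```python
import numpy as np

def best_threshold_f1(pred, gt, thresholds=np.linspace(0.05, 0.95, 19)):
    """Sketch: return the threshold and pixel-level F1 that maximize F1.
    `pred` is a flattened float array of probabilities, `gt` binary GT."""
    best_f1, best_t = 0.0, 0.5
    gt = gt.astype(bool)
    for t in thresholds:
        p = pred >= t
        tp = np.logical_and(p, gt).sum()
        fp = np.logical_and(p, ~gt).sum()
        fn = np.logical_and(~p, gt).sum()
        f1 = 2 * tp / (2 * tp + fp + fn + 1e-6)
        if f1 > best_f1:
            best_f1, best_t = f1, t
    return best_t, best_f1
```

Note that tuning the threshold per test set gives an optimistic number; papers usually report a fixed threshold (e.g. 0.5) for comparability.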
Hello, thank you for your work.
Could the pretrained weights of MVSS-Net++ be uploaded to Google Drive too?
Thank you in advance
Hello! Thanks for your inspiring work. Could you provide the training code?
Thank you for your very meaningful work; I have learned a lot. Could you please share the processed NIST16 dataset? Some images in the NIST16 dataset I downloaded were corrupted.
There is a severe error in the calculation of the image-level F1 score. The problem can be found exactly here:
Lines 44 to 45 in cc2aed7
The correct metric would be F1 = 2 * true_pos / (2 * true_pos + false_pos + false_neg + eps), where eps is a numerical-stability factor. Alternatively, compute it from the Recall instead of the Specificity, i.e. F1 = 2 * Precision * Recall / (Precision + Recall).
Your implementation of the pixel-level F1 score is correct; you can refer to this part of the code:
Lines 57 to 59 in cc2aed7
I hope you will recalculate and revise the image-level F1 metrics, and at least publish a patched table of metrics on GitHub; otherwise it will be very unfair to future research, as this calculation inflates your F1 values.
To create a better research environment, we hope that you will value research integrity. Thank you very much for your inspiring work!
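(A minimal sketch of the corrected image-level F1 following the formula quoted above; variable names are illustrative, not from the repo:)

```python
import numpy as np

def image_level_f1(scores, labels, threshold=0.5, eps=1e-6):
    """Sketch: image-level F1 from per-image manipulation probabilities
    `scores` and binary ground truth `labels` (1 = manipulated)."""
    pred = np.asarray(scores) >= threshold
    labels = np.asarray(labels).astype(bool)
    tp = np.logical_and(pred, labels).sum()
    fp = np.logical_and(pred, ~labels).sum()
    fn = np.logical_and(~pred, labels).sum()
    return 2 * tp / (2 * tp + fp + fn + eps)
```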
```
Traceback (most recent call last):
  File "D:/PyCharmProject/MVSS-Net-main/inference.py", line 53, in
    model.load_state_dict(checkpoint, strict=True)
  File "D:\Anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1051, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for MVSSNet:
    Missing key(s) in state_dict: "model.conv1.weight", "model.bn1.weight", "model.bn1.bias", "model.bn1.running_mean", "model.bn1.running_var", "model.layer1.0.conv1.weight", "model.layer1.0.bn1.weight", "model.layer1.0.bn1.bias", "model.layer1.0.bn1.running_mean", "model.layer1.0.bn1.running_var", "model.layer1.0.conv2.weight", "model.layer1.0.bn2.weight", "
```
Hi, @dong03
Thanks for your nice work!
After reading your paper, I still have some questions:
Hi, @dong03
Sorry to disturb you. I found that MVSS-Net is the baseline model in round 1 of the TIANCHI challenge.
I wonder whether the provided checkpoint was trained only on that dataset.
If so, since its images are all fake, how did you train the model under this condition?
By the way, when will MVSS-Net's training script be available? (Open-sourced once the TIANCHI challenge round is over?)
To reproduce this repo, we need to generate edge maps first in order to train with the edge loss, but even knowing that cv2.findContours is used, it is still hard to reproduce.
I used the default settings of the OpenCV functions, but the edge extraction results are quite bad. The normal edge-extraction process is to binarize with cv2.threshold first, then call cv2.findContours, which also has several settings. With all these settings unknown, it is hard to reproduce your results.
Can you share just the code for the edge-map extraction process?
I don't think disclosing this pre-processing code will directly affect your interests.
Wishing for your reply, and thanks again.
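(For reference, a plausible reconstruction under the steps named above, cv2.threshold followed by cv2.findContours; this is a guess rather than the authors' code, and the threshold value and contour thickness are assumptions. Requires OpenCV 4.x for the two-value findContours return.)

```python
import cv2
import numpy as np

def mask_to_edge(mask_path, thickness=7):
    """Sketch: binarize the GT mask, trace region contours, and draw
    them with a fixed thickness to obtain an edge ground-truth map."""
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    edge = np.zeros_like(binary)
    cv2.drawContours(edge, contours, -1, 255, thickness)
    return edge
```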
Dear all
Did you find any training code? Could you provide that?
The images from the Corel dataset that are included in CASIAv1plus.txt have different names from the Corel images downloaded from here: https://sites.google.com/site/dctresearch/Home/content-based-image-retrieval (this link is included in data/README.md).