endlesssora / deeperforensics-1.0

[CVPR 2020] A Large-Scale Dataset for Real-World Face Forgery Detection

Python 95.90% Shell 4.10%

benchmark cvpr2020 dataset deepfakes face-forensics face-forgery-detection face-manipulation method perturbations real-world videos

deeperforensics-1.0's Issues

What is the FF++ original video compression level corresponding to end_to_end_random_level?

Any updates on the code & dataset?

Hi,

Thank you for an interesting & well-written paper! Do you have any updates on the code & dataset?

Thank you,
Johannes

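Looking forward to the release

When will it actually be released? If the download only opens during the Spring Festival, I won't be able to relax over the holiday... haha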

Dear DeeperForensics authors,
Great work, thank you so much! To download the whole dataset automatically, I wrote a bash script using gdown. It worked at first, but after a while it breaks, for large files only, with the following message:

Access denied with the following error:

        Too many users have viewed or downloaded this file recently. Please
        try accessing the file again later. If the file you are trying to
        access is particularly large or is shared with many people, it may
        take up to 24 hours to be able to view or download the file. If you
        still can't access a file after 24 hours, contact your domain
        administrator. 

You may still be able to access the file from the browser:

Update: I realized that posting the download script here is not appropriate. @EndlessSora, let me know if you would like me to share the script with you privately, so that you can provide it to people who have access to the dataset.
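For the quota error quoted above, one option is to wait and retry rather than fail on the first refusal. The following is only a minimal sketch: `download_with_retry` and its parameters are hypothetical names, and it assumes the wrapped download call either returns a falsy value or raises when it fails. Since the quota message mentions waits of up to 24 hours, retrying may still not succeed within a short window.

```python
import time


def download_with_retry(download, max_attempts=5, base_delay=60.0, sleep=time.sleep):
    """Call download() until it returns a truthy value or attempts run out.

    download: zero-argument callable wrapping the real download call
              (assumed to return a falsy value or raise on failure).
    base_delay: seconds to wait after the first failure; doubled each retry.
    sleep: injectable for testing; defaults to time.sleep.
    """
    for attempt in range(max_attempts):
        try:
            result = download()
        except Exception:
            # Treat any exception as a failed attempt.
            result = None
        if result:
            return result
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return None


# Hypothetical usage with gdown (url and output are placeholders):
#   import gdown
#   download_with_retry(lambda: gdown.download(url, output, quiet=False))
```

The download call is passed in as a callable so the same wrapper works with whatever tool actually fetches the file.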

Which FF++ compression level of the original videos does the std set correspond to?

FF++ provides videos at three compression levels: raw, c23, and c40. Which of these corresponds to the end_to_end set? Is it raw?

Dataset split for the standard set and training

Hi, thank you for your work.
According to your paper, the standard set only includes 1k YouTube videos and 1k manipulated videos (end_to_end), right?
And if one wants to train a model on "std+std/sing", one needs to apply the same perturbations to the real videos (1k from FF++ and the 100 actors' videos), since the provided real videos have no perturbations, right?

Looking forward to your reply

Another question about dataset split

Thanks for your interest in our work.

Yes. The standard set only includes 1k YouTube videos and 1k manipulated videos (end_to_end).
Almost correct. Perturbations with a similar distribution should be applied to both the real videos (the 1k YouTube videos from FF++; the 100 actors' videos need none, since they are source videos used for face manipulation) and the fake videos if one would like to train a model on "std+std/sing".

Originally posted by @EndlessSora in #10 (comment)
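The reply above can be read as: draw one perturbation per video from the same distribution for both the real and the fake set. Below is a minimal sketch of that idea under stated assumptions. The distortion type names are placeholders, not the repository's identifiers, the five levels and the uniform sampling are assumptions, and `assign_perturbations` is a hypothetical helper.

```python
import random

# Placeholder distortion names and levels: assumptions for illustration only.
DISTORTION_TYPES = ["type_a", "type_b", "type_c"]
LEVELS = [1, 2, 3, 4, 5]


def assign_perturbations(video_ids, seed=0):
    """Map each video id to a (distortion_type, level) pair.

    Every video draws independently and uniformly from the same
    distribution, so real and fake sets end up similarly perturbed.
    """
    rng = random.Random(seed)
    return {
        vid: (rng.choice(DISTORTION_TYPES), rng.choice(LEVELS))
        for vid in video_ids
    }


# Real and fake videos are sampled from the same distribution,
# using different seeds so the draws are not identical.
real_plan = assign_perturbations(["real_000", "real_001"], seed=1)
fake_plan = assign_perturbations(["fake_000", "fake_001"], seed=2)
```

Fixing the seed keeps the assignment reproducible across runs, which helps when comparing results with other people.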

JPEG compression in distortions.py

Hi,

I just wanted to point out that jpeg_compression in distortions.py actually performs pixelation instead.

About std/x dataset in your experiments in the paper

Hi!
Thank you for your wonderful work.
I noticed that you used different data settings in your experiments: std/sing, std/rand, and std/mix. I am unsure whether, in these experiments, you added the same perturbations to the original video data as you did to the manipulated data.

About DF-VAE

Thank you for your work!

Do you intend to release the relevant code and training scripts about the DF-VAE?

How did you generate deepfakes for frames where no faces were detected in the original video in FaceForensics++?

Thanks for the great dataset. While creating a deepfake dataset, I came up with a question.
When generating deepfakes, sometimes faces are not detected in the original video frames. How did you handle this situation in the frames of the original FaceForensics++ video?

Questions of benchmark

As the title says, I am having difficulty reproducing the results of the XceptionNet baseline.
I hope you can share some non-private details of your experiments if you still remember them, or point out the errors in my own process.

Thank you anyway.

Our full process is as follows:

1. Run a face detector (MTCNN) on all frames of the FF++ c23 videos to get the original face bounding boxes. [Boxes come only from FF++ c23.]
2. Enlarge each bounding box by a scale of 1.3 (also trying to keep it a rectangle), then use these boxes to extract faces from both the FF++ c23 videos and the corresponding DF-1.0 end_to_end fake videos. [1.3x faces from both.]

2.5) This gives two big folders, each with 1,000 sub-folders of images (1000 + 1000 == 2000).

3. Train XceptionNet on the two folders (train:val:test is about 7:1:2, so about 0.7 x 2000 == 1400 sub-folders for training). Each video/sub-folder contributes 270 frames at regular intervals (frame_0, frame_2, ..., frame_538 if the total number of frames is larger than 540). [270 frames from each video.]
4. The XceptionNet parameters are:
4.1) batch_size = 32, epochs = 40
4.2) optimizer_ft = optim.Adam(model.parameters(), lr=0.0002)  # other arguments left at their defaults
4.3) exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=2, gamma=0.9)
4.4) Validation is run after each training epoch.
5. Testing uses all images in the test sub-folders (about 0.2 x 2000 == 400).
6. When testing on another set, such as end_to_end_level_1, the test set is built the same way (about 0.2 x 2000 == 400 sub-folders).
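The arithmetic in the steps above can be sketched in plain Python. This is a hedged reading of the post, not the poster's actual code. All three helper names are hypothetical. Making the enlarged box square and clipping it to the image are assumptions. The stride rule for videos much longer than 540 frames is also a guess, since the post only gives the stride-2 case.

```python
def enlarge_box(box, scale, img_w, img_h):
    """Enlarge an (x1, y1, x2, y2) box around its center.

    Assumption: the result is a square whose side is the longer
    box side times `scale`, clipped to the image bounds.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half = max(x2 - x1, y2 - y1) * scale / 2.0
    nx1 = max(0, int(round(cx - half)))
    ny1 = max(0, int(round(cy - half)))
    nx2 = min(img_w, int(round(cx + half)))
    ny2 = min(img_h, int(round(cy + half)))
    return nx1, ny1, nx2, ny2


def sample_frame_indices(total_frames, num_frames=270):
    """Pick up to num_frames indices at a regular stride from frame 0.

    With at least 540 and fewer than 810 frames this yields
    0, 2, ..., 538, matching the example in the post.
    """
    step = max(1, total_frames // num_frames)
    return list(range(0, total_frames, step))[:num_frames]


def step_lr(base_lr, epoch, step_size=2, gamma=0.9):
    """Learning rate of a StepLR-style schedule at a given epoch."""
    return base_lr * gamma ** (epoch // step_size)
```

With lr=0.0002, step_size=2 and gamma=0.9, the learning rate after 40 epochs has been decayed 19 times, to roughly 0.0002 x 0.9^19, which is about 2.7e-5.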

Unable to reproduce the experimental results of the paper

Can you provide the training log corresponding to the experiment of the paper?

I applied the same distortions to the source videos of FaceForensics++ and used this dataset to train the face detection model. The model converges quickly during training, but its performance on the hidden dataset is very poor. Do you know what the problem is?

In addition, this is my training log. Thank you very much!

2020-09-22 10:27:08,954 - INFO: Epoch:0 || Iter:0/549 || Loss:0.69330(0.69330) || Accuracy:0.56250(0.56250)
2020-09-22 10:27:13,801 - INFO: Epoch:0 || Iter:10/549 || Loss:0.33249(0.54624) || Accuracy:0.83594(0.72301)
2020-09-22 10:27:18,548 - INFO: Epoch:0 || Iter:20/549 || Loss:0.10217(0.38276) || Accuracy:0.96875(0.81659)
2020-09-22 10:27:23,361 - INFO: Epoch:0 || Iter:30/549 || Loss:0.09741(0.29400) || Accuracy:0.93750(0.86139)
2020-09-22 10:27:28,143 - INFO: Epoch:0 || Iter:40/549 || Loss:0.09310(0.24562) || Accuracy:0.96094(0.88472)
2020-09-22 10:27:32,883 - INFO: Epoch:0 || Iter:50/549 || Loss:0.05807(0.20966) || Accuracy:0.98438(0.90227)
2020-09-22 10:27:37,603 - INFO: Epoch:0 || Iter:60/549 || Loss:0.07660(0.18957) || Accuracy:0.98438(0.91342)
2020-09-22 10:27:42,391 - INFO: Epoch:0 || Iter:70/549 || Loss:0.05925(0.17513) || Accuracy:0.96875(0.92066)
2020-09-22 10:27:47,103 - INFO: Epoch:0 || Iter:80/549 || Loss:0.07028(0.16092) || Accuracy:0.96875(0.92824)
2020-09-22 10:27:51,832 - INFO: Epoch:0 || Iter:90/549 || Loss:0.06247(0.14881) || Accuracy:0.96094(0.93389)
2020-09-22 10:27:56,724 - INFO: Epoch:0 || Iter:100/549 || Loss:0.09728(0.13896) || Accuracy:0.96875(0.93858)
2020-09-22 10:28:01,467 - INFO: Epoch:0 || Iter:110/549 || Loss:0.04423(0.13061) || Accuracy:0.97656(0.94264)
2020-09-22 10:28:06,290 - INFO: Epoch:0 || Iter:120/549 || Loss:0.09134(0.12317) || Accuracy:0.96875(0.94602)
2020-09-22 10:28:11,023 - INFO: Epoch:0 || Iter:130/549 || Loss:0.02772(0.11808) || Accuracy:0.99219(0.94853)
2020-09-22 10:28:15,779 - INFO: Epoch:0 || Iter:140/549 || Loss:0.02995(0.11276) || Accuracy:0.98438(0.95063)
2020-09-22 10:28:20,518 - INFO: Epoch:0 || Iter:150/549 || Loss:0.02055(0.10843) || Accuracy:1.00000(0.95266)
2020-09-22 10:28:25,399 - INFO: Epoch:0 || Iter:160/549 || Loss:0.04992(0.10441) || Accuracy:0.96094(0.95414)
2020-09-22 10:28:30,255 - INFO: Epoch:0 || Iter:170/549 || Loss:0.02497(0.10071) || Accuracy:0.99219(0.95587)
2020-09-22 10:28:35,166 - INFO: Epoch:0 || Iter:180/549 || Loss:0.03729(0.09727) || Accuracy:0.98438(0.95740)
2020-09-22 10:28:39,957 - INFO: Epoch:0 || Iter:190/549 || Loss:0.03673(0.09374) || Accuracy:0.97656(0.95877)
2020-09-22 10:28:44,687 - INFO: Epoch:0 || Iter:200/549 || Loss:0.03946(0.09064) || Accuracy:0.99219(0.96028)
2020-09-22 10:28:49,426 - INFO: Epoch:0 || Iter:210/549 || Loss:0.02468(0.08788) || Accuracy:0.98438(0.96131)
2020-09-22 10:28:54,239 - INFO: Epoch:0 || Iter:220/549 || Loss:0.04746(0.08512) || Accuracy:0.98438(0.96249)
2020-09-22 10:28:58,963 - INFO: Epoch:0 || Iter:230/549 || Loss:0.03039(0.08289) || Accuracy:0.98438(0.96341)
2020-09-22 10:29:03,685 - INFO: Epoch:0 || Iter:240/549 || Loss:0.08809(0.08134) || Accuracy:0.95312(0.96398)
2020-09-22 10:29:08,470 - INFO: Epoch:0 || Iter:250/549 || Loss:0.02432(0.07950) || Accuracy:0.97656(0.96473)
2020-09-22 10:29:13,233 - INFO: Epoch:0 || Iter:260/549 || Loss:0.02534(0.07781) || Accuracy:1.00000(0.96558)
2020-09-22 10:29:18,048 - INFO: Epoch:0 || Iter:270/549 || Loss:0.03035(0.07645) || Accuracy:0.97656(0.96616)
2020-09-22 10:29:22,891 - INFO: Epoch:0 || Iter:280/549 || Loss:0.01610(0.07478) || Accuracy:0.99219(0.96694)
2020-09-22 10:29:27,696 - INFO: Epoch:0 || Iter:290/549 || Loss:0.02178(0.07328) || Accuracy:0.98438(0.96770)
2020-09-22 10:29:32,495 - INFO: Epoch:0 || Iter:300/549 || Loss:0.01254(0.07157) || Accuracy:1.00000(0.96846)
2020-09-22 10:29:37,226 - INFO: Epoch:0 || Iter:310/549 || Loss:0.01840(0.06979) || Accuracy:0.99219(0.96928)
2020-09-22 10:29:41,951 - INFO: Epoch:0 || Iter:320/549 || Loss:0.02171(0.06842) || Accuracy:0.98438(0.96982)
2020-09-22 10:29:46,696 - INFO: Epoch:0 || Iter:330/549 || Loss:0.00467(0.06701) || Accuracy:1.00000(0.97035)
2020-09-22 10:29:51,466 - INFO: Epoch:0 || Iter:340/549 || Loss:0.01416(0.06609) || Accuracy:1.00000(0.97063)
2020-09-22 10:29:56,222 - INFO: Epoch:0 || Iter:350/549 || Loss:0.01028(0.06513) || Accuracy:1.00000(0.97106)
2020-09-22 10:30:00,983 - INFO: Epoch:0 || Iter:360/549 || Loss:0.02263(0.06392) || Accuracy:0.99219(0.97156)
2020-09-22 10:30:05,720 - INFO: Epoch:0 || Iter:370/549 || Loss:0.03179(0.06285) || Accuracy:0.97656(0.97197)
2020-09-22 10:30:10,445 - INFO: Epoch:0 || Iter:380/549 || Loss:0.03527(0.06230) || Accuracy:0.98438(0.97240)
2020-09-22 10:30:15,216 - INFO: Epoch:0 || Iter:390/549 || Loss:0.00949(0.06134) || Accuracy:1.00000(0.97279)
2020-09-22 10:30:20,041 - INFO: Epoch:0 || Iter:400/549 || Loss:0.05724(0.06046) || Accuracy:0.97656(0.97317)
2020-09-22 10:30:24,888 - INFO: Epoch:0 || Iter:410/549 || Loss:0.00370(0.05961) || Accuracy:1.00000(0.97354)
2020-09-22 10:30:29,716 - INFO: Epoch:0 || Iter:420/549 || Loss:0.04780(0.05885) || Accuracy:0.96875(0.97391)
2020-09-22 10:30:34,467 - INFO: Epoch:0 || Iter:430/549 || Loss:0.04402(0.05810) || Accuracy:0.96875(0.97419)
2020-09-22 10:30:39,184 - INFO: Epoch:0 || Iter:440/549 || Loss:0.05830(0.05733) || Accuracy:0.98438(0.97456)
2020-09-22 10:30:43,892 - INFO: Epoch:0 || Iter:450/549 || Loss:0.02611(0.05658) || Accuracy:0.97656(0.97490)
2020-09-22 10:30:48,628 - INFO: Epoch:0 || Iter:460/549 || Loss:0.02152(0.05582) || Accuracy:0.98438(0.97519)

About training set and test set

Do you use the entire DeeperForensics_1.0\source_videos as the training set? If so, when generating the swapped dataset, the model has actually been trained on all source images, so the whole model is not subject-agnostic. Is that right?
