
fvd-comparison's Introduction

About the repo

In this repo, we demonstrate that the FVD implementation from the StyleGAN-V paper is equivalent to the original one when the videos are already loaded into memory and resized to the necessary resolution. The main difference between our FVD evaluation protocol and the original paper's is that we strictly specify how the data should be processed, how clips should be sampled, etc.

Disclaimer: this repo only verifies that our PyTorch FVD implementation is identical to the TensorFlow one. If you want to compute FVD on your own videos — please use the src/scripts/calc_metrics_for_dataset.py script in the main repo.

Why did we implement FVD ourselves?

The problem with the original implementation is that it does not specify:

  • data processing: in which format videos are stored (directories of JPG/PNG frames, MP4 files, etc.), how frames are resized, normalized, etc.
  • clip sampling strategy: how clips are selected (from the beginning of the video or randomly, at which framerate, how many clips per video, etc.)
  • how many fake and real videos should be used
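To make the ambiguity concrete, here is a hypothetical clip sampler (my own illustration, not the paper's protocol): a unified pipeline has to pin down the clip length, the frame stride, and whether the start offset is random or fixed, because each choice changes the resulting FVD.

```python
import numpy as np

def sample_clip_indices(video_len, clip_len=16, stride=1, rng=None):
    """Pick frame indices for one clip at a random start offset.

    A protocol must fix: clip length, frame stride (effective framerate),
    and whether the start offset is random or always 0.
    """
    rng = rng or np.random.default_rng(0)
    span = (clip_len - 1) * stride + 1  # frames covered by the clip
    assert video_len >= span, "video too short for this clip configuration"
    start = rng.integers(0, video_len - span + 1)
    return start + stride * np.arange(clip_len)

idx = sample_clip_indices(video_len=64, clip_len=16, stride=2)
```

Changing `stride` from 1 to 2 halves the effective framerate the I3D network sees, which is exactly the kind of unstated choice that makes scores from different projects incomparable.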

That's why every project computes FVD in its own way, which leads to a lot of discrepancies. In StyleGAN-V, we seek to establish a unified evaluation pipeline.

Also, the original snippet is implemented in TensorFlow v1, whose final release (v1.15.5) came out on January 6, 2021 (i.e. more than a year ago), and it will not be updated any further: https://github.com/tensorflow/tensorflow/releases/tag/v1.15.5

How to launch the comparison

We provide two comparisons:

  1. A comparison between tf.hub's I3D model and our torchscript port, to demonstrate that our port is a precise copy (up to numerical precision) of tf.hub's model.
  2. A comparison between the FVD metrics themselves. It is done by generating two dummy datasets of 256 videos each, with two different random seeds.
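A minimal sketch of what such a dummy set-up could look like (the function name and the small shapes below are illustrative, not the repo's exact code; the real comparison uses 256 videos per dataset):

```python
import numpy as np

def make_dummy_videos(num_videos=256, video_len=16, size=224, seed=0):
    """Generate a random uint8 video dataset of shape (N, T, H, W, 3)."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 256, size=(num_videos, video_len, size, size, 3),
                        dtype=np.uint8)

# Two "datasets" that differ only in the random seed (tiny sizes for brevity).
real = make_dummy_videos(num_videos=4, video_len=8, size=32, seed=1)
fake = make_dummy_videos(num_videos=4, video_len=8, size=32, seed=2)
```

Because both implementations then see byte-identical inputs, any difference in the resulting FVD scores can only come from the metric implementations themselves.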

Installing dependencies

We list all the dependencies in requirements.txt. You can install them by running:

pip install -r requirements.txt

1. Launching models' comparison

To compare the models between each other (in terms of L2 distance of their output), run:

python compare_models.py

In our case, it gives the output:

L_2 difference is 0.00026316816207043225

This means that both models perform equivalent operations (note that even two equivalent convolutional layers in TF and PyTorch produce slightly different outputs due to numerical precision).
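The comparison boils down to feeding the same batch to both models and taking the L2 norm of the difference between their feature vectors. Schematically, with random arrays standing in for the two models' outputs (loading the actual TF-hub and torchscript I3D models is omitted here):

```python
import numpy as np

def l2_difference(feats_a, feats_b):
    """L2 norm of the element-wise difference between two feature batches."""
    return float(np.linalg.norm(np.asarray(feats_a) - np.asarray(feats_b)))

# Stand-ins for the TF-hub and torchscript I3D outputs on the same batch:
# identical features plus tiny simulated floating-point noise.
rng = np.random.default_rng(0)
feats_tf = rng.normal(size=(4, 400))
feats_pt = feats_tf + rng.normal(scale=1e-5, size=(4, 400))
print(l2_difference(feats_tf, feats_pt))
```

A value on the order of 1e-4 for this noise level mirrors the ~3e-4 reported above: nonzero, but attributable purely to floating-point differences between frameworks.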

2. Launching metrics' comparison

To compare the metrics between each other, run:

python compare_metrics.py

On our machine, this gives the output:

[FVD scores] Theirs: 10.13808536529541. Ours: 10.138084766713924

So the difference is on the order of 1e-6, which is negligible.

Note: computing FVD with the TensorFlow implementation might take a while, since it computes the exact matrix square root. In our case, we use a very accurate approximation of the square root.
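For reference, FVD is the Fréchet distance between two Gaussians fitted to the I3D features of the real and fake videos. The sketch below is my own numpy-only illustration (not the repo's approximation): it sidesteps an explicit matrix square root by using the identity tr(sqrtm(S1 @ S2)) = sum of the square roots of the eigenvalues of S1 @ S2.

```python
import numpy as np

def frechet_distance(feats1, feats2):
    """Frechet distance between Gaussians fitted to two feature sets (N, D)."""
    mu1, mu2 = feats1.mean(axis=0), feats2.mean(axis=0)
    sigma1 = np.cov(feats1, rowvar=False)
    sigma2 = np.cov(feats2, rowvar=False)
    # tr(sqrtm(S1 @ S2)) == sum of sqrt of eigenvalues of S1 @ S2;
    # np.abs() discards tiny imaginary parts from numerical noise.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.abs(eigvals)).sum()
    return float(((mu1 - mu2) ** 2).sum()
                 + np.trace(sigma1) + np.trace(sigma2) - 2 * tr_sqrt)

rng = np.random.default_rng(0)
a = rng.normal(size=(512, 16))  # stand-in for real-video features
b = rng.normal(size=(512, 16))  # stand-in for fake-video features
print(frechet_distance(a, b))
```

The expensive step in the exact TensorFlow version is precisely the matrix square root of the covariance product, which is why approximating it (as our implementation does) speeds the metric up with negligible loss of accuracy.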

fvd-comparison's People

Contributors

universome


fvd-comparison's Issues

Input value scale and resolution

Hi. Thanks for sharing your work!

Input value scale

  • Is it right that the input values for the torch I3D model should be in the range [0, 1]?

  • Meanwhile, it seems the input range [-1, 1] is used for the tensorflow I3D.

  • If I use FVD with torch, which one is correct: [0, 1] or [-1, 1]?

Input resolution

  • If I set the input resolution above 224, this code returns [batch_size, 400, feat_h, feat_w]. If I want to use resolution 256, is it right to average [batch_size, 400, feat_h, feat_w] down to shape [batch_size, 400] (i.e. average across the spatial dims)?
    detector_kwargs = dict(rescale=False, resize=False, return_features=True) # Return raw features before the softmax layer.

Zap spider scan throwing connection refused error

requests.exceptions.ProxyError: HTTPConnectionPool(host='127.0.0.1', port=8080): Max retries exceeded with url: http://zap/JSON/spider/action/scan/?apikey=&url=https%3A%2F%2Fdlq7l0b00rmgi.cloudfront.net (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f10a3706e80>: Failed to establish a new connection: [Errno 111] Connection refused')))
Error: Process completed with exit code 1.

How to use your code on my video?

Thanks for your great work!
I am trying to use your code on my videos, but I'm not sure whether I am using it correctly. My procedure is as follows:

  1. Read all frames from the real and fake videos using opencv, and reshape them to (num_videos, video_len, 224, 224, 3).
  2. Run compare_metrics.py (without any data pre-processing, such as transforms.Normalize() in pytorch).

Is this correct? @universome

requests.exceptions.ProxyError: HTTPSConnectionPool(host='www.dropbox.com', port=443)

Hi, when I use your code, I run into this problem:
requests.exceptions.ProxyError: HTTPSConnectionPool(host='www.dropbox.com', port=443): Max retries exceeded with url: /s/ge9e5ujwgetktms/i3d_torchscript.pt?dl=1 (Caused by ProxyError('Cannot connect to proxy.', RemoteDisconnected('Remote end closed connection without response')))
What should I do?
I also have another question: must the video resolution be 224x224? If my videos are 128x128, how should I handle them?
Thanks!
