This project forked from junyaohu/common_metrics_on_video_quality

You can easily calculate FVD, PSNR, SSIM, LPIPS for evaluating the quality of generated or predicted videos.

common_metrics_on_video_quality

You can easily calculate the following video quality metrics:

  • FVD: Fréchet Video Distance
  • PSNR: peak signal-to-noise ratio
  • SSIM: structural similarity index measure
  • LPIPS: learned perceptual image patch similarity

The FVD code is adapted from MVCD and other websites and projects; only the part relevant to the calculation has been extracted. It can be used to evaluate FVD scores for generative or predictive models.

  • This project supports grayscale and RGB videos.
  • This project supports Ubuntu; it may not work correctly on Windows. If you can fix that, PRs are welcome.
  • If the project does not run correctly, please open an issue or a PR.
  • Use scipy==1.7.3 or 1.9.3. With 1.11.3 you will get a WRONG FVD VALUE! See Notice below for details.
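
As a guard against the scipy issue above, a small check can be run before computing FVD. This is only a sketch and not part of the project: the helper names `classify_scipy_version` and `check_installed_scipy` are hypothetical, and the version sets contain only the versions named in this README, so any other version is reported as untested rather than judged.

```python
from importlib import metadata

# Versions named in this README; any other version is untested here.
KNOWN_GOOD = {"1.7.3", "1.9.3"}
KNOWN_BAD = {"1.11.3"}


def classify_scipy_version(version: str) -> str:
    """Return 'good', 'bad', or 'unknown' for a scipy version string."""
    if version in KNOWN_GOOD:
        return "good"
    if version in KNOWN_BAD:
        return "bad"
    return "unknown"


def check_installed_scipy() -> str:
    """Classify the installed scipy, or return 'missing' if it is absent."""
    try:
        return classify_scipy_version(metadata.version("scipy"))
    except metadata.PackageNotFoundError:
        return "missing"


if __name__ == "__main__":
    print(check_installed_scipy())
```

A result of "bad" means the FVD value should not be trusted; "unknown" means this README says nothing about that version.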

Example

A batch of 8 videos, each with 30 frames, 3 channels, and a size of 64x64.

import json

import torch
from calculate_fvd import calculate_fvd
from calculate_psnr import calculate_psnr
from calculate_ssim import calculate_ssim
from calculate_lpips import calculate_lpips

# Videos are shaped [batch, time, channel, height, width] with values in [0, 1].
NUMBER_OF_VIDEOS = 8
VIDEO_LENGTH = 30
CHANNEL = 3
SIZE = 64
videos1 = torch.zeros(NUMBER_OF_VIDEOS, VIDEO_LENGTH, CHANNEL, SIZE, SIZE, requires_grad=False)
videos2 = torch.ones(NUMBER_OF_VIDEOS, VIDEO_LENGTH, CHANNEL, SIZE, SIZE, requires_grad=False)

# Use the GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Collect every metric in one dictionary and print it as JSON.
result = {}
result['fvd'] = calculate_fvd(videos1, videos2, device)
result['ssim'] = calculate_ssim(videos1, videos2)
result['psnr'] = calculate_psnr(videos1, videos2)
result['lpips'] = calculate_lpips(videos1, videos2, device)
print(json.dumps(result, indent=4))

It means we calculate:

  • FVD-frames[:10], FVD-frames[:11], ..., FVD-frames[:30]
  • avg-PSNR/SSIM/LPIPS-frame[0], avg-PSNR/SSIM/LPIPS-frame[1], ..., avg-PSNR/SSIM/LPIPS-frame[29], and their std.

FVD cannot be calculated on fewer than 10 frames (for example FVD-frames[:8]), so those lengths are skipped during the calculation; see item 6 in Notice.
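
The per-frame averages and standard deviations described above can be sketched for PSNR as follows. This is a NumPy illustration under stated assumptions, not the code in calculate_psnr: the helpers `psnr` and `per_frame_stats` are hypothetical, the peak value is taken to be 1.0 because pixel values are expected in [0, 1], and the statistics are assumed to be taken across the videos in the batch.

```python
import numpy as np


def psnr(frame1: np.ndarray, frame2: np.ndarray) -> float:
    """PSNR in dB for two frames with pixel values in [0, 1]."""
    mse = np.mean((frame1 - frame2) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10 * np.log10(1.0 / mse))  # peak value is 1.0


def per_frame_stats(videos1: np.ndarray, videos2: np.ndarray):
    """Inputs are [batch, time, channel, height, width].

    Returns two dicts keyed by frame index: mean and std over the batch.
    """
    mean, std = {}, {}
    for t in range(videos1.shape[1]):
        scores = [psnr(v1[t], v2[t]) for v1, v2 in zip(videos1, videos2)]
        mean[t] = float(np.mean(scores))
        std[t] = float(np.std(scores))
    return mean, std


# Same shapes as the example above: all-zero videos against all-one videos.
videos1 = np.zeros((8, 30, 3, 64, 64))
videos2 = np.ones((8, 30, 3, 64, 64))
mean, std = per_frame_stats(videos1, videos2)
```

With an all-zero and an all-one input the mean squared error is 1 for every frame, so every per-frame PSNR and its standard deviation are 0.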

The result shows that for an all-zero tensor and an all-one tensor, FVD-30 (FVD[:30]) is 152.15. The standard deviation is also reported, under value_std. The other metrics are read the same way.

{
    "fvd": {
        "value": {
            "10": 570.07320378183,
            "11": 486.1906542471159,
            "12": 552.3373915075898,
            "13": 146.6242330185728,
            "14": 172.57268402948895,
            "15": 133.88932632116126,
            "16": 153.11023578170108,
            "17": 357.56400892781204,
            "18": 382.1335612721498,
            "19": 306.7100176942531,
            "20": 338.18221898178774,
            "21": 77.95587603163293,
            "22": 82.49997632357349,
            "23": 64.41624523513073,
            "24": 66.08097153313875,
            "25": 314.4341061962642,
            "26": 316.8616746151064,
            "27": 288.884418528541,
            "28": 287.8192683223724,
            "29": 152.15076552354864
        },
        "video_setting": [
            8,
            3,
            30,
            64,
            64
        ],
        "video_setting_name": "batch_size, channel, time, heigth, width"
    },
    "ssim": {
        "value": {
            "0": 9.999000099990664e-05,
            ...,
            "29": 9.999000099990664e-05
        },
        "value_std": {
            "0": 0.0,
            ...,
            "29": 0.0
        },
        "video_setting": [
            30,
            3,
            64,
            64
        ],
        "video_setting_name": "time, channel, heigth, width"
    },
    "psnr": {
        "value": {
            "0": 0.0,
            ...,
            "29": 0.0
        },
        "value_std": {
            "0": 0.0,
            ...,
            "29": 0.0
        },
        "video_setting": [
            30,
            3,
            64,
            64
        ],
        "video_setting_name": "time, channel, heigth, width"
    },
    "lpips": {
        "value": {
            "0": 0.8140146732330322,
            ...,
            "29": 0.8140146732330322
        },
        "value_std": {
            "0": 0.0,
            ...,
            "29": 0.0
        },
        "video_setting": [
            30,
            3,
            64,
            64
        ],
        "video_setting_name": "time, channel, heigth, width"
    }
}

Notice

  1. Install lpips first: pip install lpips.
  2. Make sure the pixel values of the videos are in [0, 1].
  3. If you have trouble downloading the FVD pre-trained model, manually download either of the following and put it into the FVD folder.
    • i3d_torchscript.pt from here
    • i3d_pretrained_400.pt from here
  4. For grayscale videos, the single channel is repeated to 3 channels, as it says.
  5. SSIM is averaged over channels when images have 3 channels. SSIM is the only metric that is extremely sensitive to gray being compared with black and white.
  6. FVD needs at least 10 frames, so the FVD calculation begins from the 10th frame, as in the example above.
  7. Use scipy==1.7.3 or 1.9.3. With 1.11.3 you will get a WRONG FVD VALUE!
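
Items 2 and 4 can be handled before calling the metric functions. The following is a rough sketch, assuming the input is a NumPy array shaped [batch, time, channel, height, width]; `prepare_videos` is a hypothetical helper and not part of the project, which may already repeat grayscale channels internally.

```python
import numpy as np


def prepare_videos(videos: np.ndarray) -> np.ndarray:
    """Return float32 videos in [0, 1] with 3 channels.

    uint8 input is scaled by 1/255; a single grayscale channel is
    repeated three times along the channel axis.
    """
    if videos.ndim != 5:
        raise ValueError("expected [batch, time, channel, height, width]")
    out = videos.astype(np.float32)
    if videos.dtype == np.uint8:
        out /= 255.0
    if out.min() < 0.0 or out.max() > 1.0:
        raise ValueError("pixel values must be in [0, 1]")
    if out.shape[2] == 1:
        out = np.repeat(out, 3, axis=2)
    return out
```

The result can be turned into a tensor with torch.from_numpy before it is passed to the calculate_* functions.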

VFID

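python calculate_fid.py requires two video directories: one with the generated videos and one with the ground truth (GT). The number of videos in each directory must be a multiple of 4 (the reason is unknown; it may be related to how the TorchScript model is loaded and run). The workflow: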

def load_feature(videos):
    # Extract one feature per video with the pre-trained I3D model.
    videos_feature = []
    for video in videos:
        feature = I3D(video)
        videos_feature.append(feature)
    return videos_feature

features_predict = load_feature(predict_videos)
features_gt = load_feature(gt_videos)
calculate_fid(features_predict, features_gt)  # this step is the same as the FID calculation for images
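
The last step, which matches the image FID calculation, is the Fréchet distance between two Gaussians fitted to the feature sets. Below is a minimal NumPy sketch of that formula, not the project's calculate_fid: `frechet_distance` is a hypothetical name, features are assumed to be arrays shaped [n, d] with d of at least 2, and the trace of the matrix square root is computed from eigenvalues rather than with scipy.linalg.sqrtm.

```python
import numpy as np


def frechet_distance(features1: np.ndarray, features2: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two [n, d] feature sets.

    d^2 = ||mu1 - mu2||^2 + Tr(S1) + Tr(S2) - 2 Tr((S1 S2)^(1/2))
    """
    mu1, mu2 = features1.mean(axis=0), features2.mean(axis=0)
    sigma1 = np.cov(features1, rowvar=False)
    sigma2 = np.cov(features2, rowvar=False)
    # Tr((S1 S2)^(1/2)) is the sum of the square roots of the eigenvalues
    # of S1 @ S2, which are real and non-negative up to rounding error.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)
```

As a sanity check, a feature set compared with itself gives a distance near 0, and shifting every feature by 1 in d dimensions gives a distance near d, because the covariances cancel and only the squared mean difference remains.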
