
anchen1011 / toflow


TOFlow: Video Enhancement with Task-Oriented Flow

Home Page: http://toflow.csail.mit.edu

License: MIT License

Lua 32.19% Cuda 7.46% CMake 0.80% C 4.08% MATLAB 54.99% Shell 0.37% M 0.10%
video video-demo video-denoising dataset video-deblocking super-resolution deep-learning optical-flow video-processing interpolation

toflow's Introduction

TOFlow: Video Enhancement with Task-Oriented Flow

This repository is based on our IJCV publication TOFlow: Video Enhancement with Task-Oriented Flow (PDF). It contains pre-trained models and demo code. It also includes a description of, and download scripts for, the Vimeo-90K dataset we collected. If you use this code or dataset in your work, please cite:

@article{xue2019video,
  title={Video Enhancement with Task-Oriented Flow},
  author={Xue, Tianfan and Chen, Baian and Wu, Jiajun and Wei, Donglai and Freeman, William T},
  journal={International Journal of Computer Vision (IJCV)},
  volume={127},
  number={8},
  pages={1106--1125},
  year={2019},
  publisher={Springer}
}

Video Demo


If you cannot access YouTube, please download the 1080p video from here.

Prerequisites

Torch

Our implementation is based on Torch 7 (http://torch.ch).

CUDA [optional]

CUDA (https://developer.nvidia.com/cuda-toolkit) is recommended for fast inference. The demo code still runs without CUDA, but much more slowly.

Matlab [optional]

We use Matlab (https://www.mathworks.com/products/matlab.html) to generate the video denoising/super-resolution datasets and to run the quantitative evaluation. It is not needed for the demo code.

FFmpeg [optional]

We use FFmpeg (http://ffmpeg.org) to generate the video deblocking dataset. It is not needed for the demo code.

Installation

Our current release has been tested on Ubuntu 14.04.

Clone the repository

git clone https://github.com/anchen1011/toflow.git

Install dependencies

cd toflow/src/stnbhwd
luarocks make

This installs the 'stn' package for Lua. Its components are:

require 'stn'
nn.AffineGridGeneratorBHWD(height, width)
-- takes B x 2 x 3 affine transform matrices as input, 
-- outputs a height x width grid in normalized [-1,1] coordinates
-- output layout is B,H,W,2 where the first coordinate in the 4th dimension is y, and the second is x
nn.BilinearSamplerBHWD()
-- takes a table {inputImages, grids} as inputs
-- outputs the interpolated images according to the grids
-- inputImages is a batch of samples in BHWD layout
-- grids is a batch of grids (output of AffineGridGeneratorBHWD)
-- output is also BHWD
nn.AffineTransformMatrixGenerator(useRotation, useScale, useTranslation)
-- takes a B x nbParams tensor as inputs
-- nbParams depends on the constrained transformation
-- The parameters for the selected transformation(s) should be supplied in the
-- following order: rotationAngle, scaleFactor, translationX, translationY
-- If no transformation is specified, it generates a generic affine transformation (nbParams = 6)
-- outputs B x 2 x 3 affine transform matrices
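To make the grid conventions above concrete, here is a minimal pure-Python sketch of what a grid generator and a bilinear sampler compute for one single-channel image. It is not the stnbhwd code. It assumes that a 2 x 3 matrix maps homogeneous target coordinates (y, x, 1) to source coordinates (y', x'), that normalized coordinates -1 and 1 map to the first and last pixel centers, and that samples falling outside the image read as zero. The names `affine_grid` and `bilinear_sample` are illustrative only.

```python
import math


def affine_grid(theta, height, width):
    """Return a height x width grid of (y, x) source coordinates in [-1, 1].

    theta is a 2x3 matrix (list of lists); row 0 produces y', row 1 produces x',
    and the columns multiply (y, x, 1). This layout is an assumption.
    """
    grid = []
    for i in range(height):
        y = -1.0 + 2.0 * i / (height - 1) if height > 1 else 0.0
        row = []
        for j in range(width):
            x = -1.0 + 2.0 * j / (width - 1) if width > 1 else 0.0
            ys = theta[0][0] * y + theta[0][1] * x + theta[0][2]
            xs = theta[1][0] * y + theta[1][1] * x + theta[1][2]
            row.append((ys, xs))
        grid.append(row)
    return grid


def bilinear_sample(image, grid):
    """Bilinearly sample a single-channel image (list of rows) at grid points.

    Neighbours outside the image contribute zero (assumed border handling).
    """
    h, w = len(image), len(image[0])

    def pixel(r, c):
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = []
    for grid_row in grid:
        out_row = []
        for ys, xs in grid_row:
            # Map normalized [-1, 1] coordinates back to pixel indices [0, size - 1].
            r = (ys + 1.0) * (h - 1) / 2.0
            c = (xs + 1.0) * (w - 1) / 2.0
            r0, c0 = math.floor(r), math.floor(c)
            dr, dc = r - r0, c - c0
            # Weighted sum of the four surrounding pixels.
            val = ((1 - dr) * (1 - dc) * pixel(r0, c0)
                   + (1 - dr) * dc * pixel(r0, c0 + 1)
                   + dr * (1 - dc) * pixel(r0 + 1, c0)
                   + dr * dc * pixel(r0 + 1, c0 + 1))
            out_row.append(val)
        out.append(out_row)
    return out
```

With the identity matrix [[1, 0, 0], [0, 1, 0]] the sampler reproduces its input; adding 1.0 to the x translation moves every sampling location by (width - 1) / 2 pixels along x.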

Download pretrained models (104MB)

cd ../../
./download_models.sh

Run Demo Code

cd src
th demo.lua -mode interp -inpath ../data/example/low_frame_rate
th demo.lua -mode denoise -inpath ../data/example/noisy
th demo.lua -mode deblock -inpath ../data/example/block
th demo.lua -mode sr -inpath ../data/example/blur

There are a few options in demo.lua:

nocuda: Set this option when CUDA is not available.

gpuId: GPU device ID.

mode: There are four options:

  • 'interp': temporal frame interpolation
  • 'denoise': video denoising
  • 'deblock': video deblocking
  • 'sr': video super-resolution

inpath: The path to the input sequence.

outpath: The directory where results are stored (default: ../demo_output).
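The same invocations can be scripted. The helper below is a hypothetical sketch that only assembles the argument list for `th demo.lua` from the options described above. It assumes the flags are spelled `-mode`, `-inpath`, `-outpath`, `-gpuId` and `-nocuda`, and that `-nocuda` is a bare switch; check both against the option definitions in demo.lua.

```python
VALID_MODES = {"interp", "denoise", "deblock", "sr"}


def demo_command(mode, inpath, outpath=None, gpu_id=None, nocuda=False):
    """Build the argument list for one demo.lua run (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(VALID_MODES)}")
    cmd = ["th", "demo.lua", "-mode", mode, "-inpath", inpath]
    if outpath is not None:
        cmd += ["-outpath", outpath]      # otherwise demo.lua uses ../demo_output
    if gpu_id is not None:
        cmd += ["-gpuId", str(gpu_id)]
    if nocuda:
        cmd.append("-nocuda")             # assumed to be a flag without a value
    return cmd
```

For example, `subprocess.run(demo_command("denoise", "../data/example/noisy"), cwd="src", check=True)` would run the denoising demo from the repository root.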

Vimeo-90K Dataset

We also built a large-scale, high-quality video dataset, Vimeo-90K, designed for the following four video processing tasks: temporal frame interpolation, video denoising, video deblocking, and video super-resolution.

Vimeo-90K is built upon 5,846 selected videos downloaded from vimeo.com, covering a large variety of scenes and actions. This video set is a subset of the AoT dataset, and all video links are here.

Figure: Vimeo-90K dataset (image: https://github.com/anchen1011/toflow/raw/master/data/doc/dataset.png)

We further split these videos into 89,800 video clips and build two datasets from these clips:

Triplet dataset for temporal frame interpolation

The triplet dataset consists of 73,171 3-frame sequences with a fixed resolution of 448 x 256, extracted from 15K selected video clips from Vimeo-90K. This dataset is designed for temporal frame interpolation. Download links are:

Test set only: zip (1.7GB).

Both training and test set: zip (33GB).

Septuplet dataset for video denoising, super-resolution, and deblocking

The septuplet dataset consists of 91,701 7-frame sequences with a fixed resolution of 448 x 256, extracted from 39K selected video clips from Vimeo-90K. This dataset is designed for video denoising, deblocking, and super-resolution.

The test set for video denoising: zip (16GB).

The test set for video deblocking: zip (11GB).

The test set for video super-resolution: zip (6GB).

The original test set (not downsampled or degraded by noise): zip (15GB).

The original training + test set (91,701 sequences, not downsampled or degraded by noise): zip (82GB).
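Training or custom evaluation code usually starts by turning a split list into frame paths. The sketch below assumes the full downloads unpack to a `sequences/<clip>/<sequence>/im1.png ... imN.png` layout and that split lists such as sep_trainlist.txt hold one `<clip>/<sequence>` entry per line (for example `00001/0001`); verify both against your download. `num_frames` is 3 for the triplet dataset and 7 for the septuplet dataset.

```python
import os


def read_sequence_list(list_path):
    """Return the non-empty entries of a split list, e.g. '00001/0001'."""
    with open(list_path) as f:
        return [line.strip() for line in f if line.strip()]


def frame_paths(root, entry, num_frames):
    """Paths to im1.png .. im<num_frames>.png for one sequence (assumed layout)."""
    seq_dir = os.path.join(root, "sequences", *entry.split("/"))
    return [os.path.join(seq_dir, f"im{k}.png") for k in range(1, num_frames + 1)]
```

For the septuplet set, `[frame_paths(root, e, 7) for e in read_sequence_list(os.path.join(root, "sep_trainlist.txt"))]` would list every training sequence, assuming the list file sits in the dataset root.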

Generate Testing Sequences

See src/generate_testing_sample for the functions to generate noisy/low-resolution sequences.

To generate noisy sequences with Matlab under src/generate_testing_sample, run

add_noise_to_input(data_path, output_path);

and the results will be stored under output_path.
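If you want to synthesize noisy training data outside Matlab, the following sketch adds independent Gaussian noise to frames with intensities in [0, 1] and clips the result. It is not add_noise_to_input: the noise level used for the Vimeo-90K test set is not stated here, so `sigma` is a free, hypothetical parameter, and clipping to [0, 1] is an assumption.

```python
import random


def add_gaussian_noise(frames, sigma, seed=None):
    """Return noisy copies of frames, each a list of rows of floats in [0, 1]."""
    rng = random.Random(seed)  # one generator: noise is independent across frames and pixels
    noisy = []
    for frame in frames:
        noisy.append([[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
                      for row in frame])
    return noisy
```

Passing a fixed `seed` makes the noisy sequence reproducible, which helps when comparing models on the same synthetic inputs.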

To generate blurred sequences with Matlab, run

blur_input(data_path, output_path);

and the results will be stored under output_path.
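blur_input resizes the clean frames with dsr_imresize to produce the super-resolution inputs. As an illustration of one possible downsampling step, the sketch below applies a box filter, averaging each non-overlapping `scale` x `scale` block. It is not dsr_imresize (a precompiled .p file whose kernel is not documented here) and will not match MATLAB imresize bit-for-bit; for 448 x 256 frames and `scale = 4` it yields 112 x 64 frames.

```python
def box_downsample(frame, scale):
    """Average non-overlapping scale x scale blocks of a single-channel frame."""
    h, w = len(frame), len(frame[0])
    if h % scale or w % scale:
        raise ValueError("frame height and width must be divisible by scale")
    out = []
    for i in range(0, h, scale):
        row = []
        for j in range(0, w, scale):
            block = [frame[i + di][j + dj] for di in range(scale) for dj in range(scale)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```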

Blocky sequences are generated by compressing the frames with FFmpeg. Our test set was generated with the following configuration:

ffmpeg -i *.png -q 20 -vcodec jpeg2000 -format j2k name.mov 
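When scripting this step, note that an unquoted *.png is expanded by the shell into several arguments before FFmpeg sees it. The sketch below is a hypothetical helper pair: one command hands the pattern to FFmpeg's own glob input (`-pattern_type glob`, which requires an FFmpeg build with glob support) and applies the settings above, and the other decodes the resulting .mov back into numbered PNG frames (im1.png, im2.png, ...). Only the argument lists are built here; run them with `subprocess.run(cmd, check=True)`.

```python
import os


def encode_command(frame_dir, out_mov, quality=20):
    """Compress all PNG frames in frame_dir into a JPEG 2000 .mov."""
    return ["ffmpeg",
            "-pattern_type", "glob",
            "-i", os.path.join(frame_dir, "*.png"),  # expanded by FFmpeg, not the shell
            "-q", str(quality), "-vcodec", "jpeg2000", "-format", "j2k",
            out_mov]


def decode_command(in_mov, out_dir):
    """Decode the compressed video back into numbered PNG frames."""
    return ["ffmpeg", "-i", in_mov, os.path.join(out_dir, "im%d.png")]
```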

Run Quantitative Evaluation

Download all four Vimeo-90K test sets (52GB)

./download_testset.sh

Run inference on the Vimeo-90K test sets

cd src
th demo_vimeo90k.lua -mode interp
th demo_vimeo90k.lua -mode denoise
th demo_vimeo90k.lua -mode deblock
th demo_vimeo90k.lua -mode sr

Evaluation

We use three metrics to evaluate the performance of our algorithm: PSNR, SSIM, and Abs. To run the evaluation, execute the following commands in Matlab:

cd src/evaluation
evaluate(output_dir, target_dir, task_name);

For example, to evaluate results generated in the previous step, run

cd src/evaluation
evaluate('../../output/interp', '../../data/vimeo_interp_test/target', 'interp')
evaluate('../../output/denoise', '../../data/vimeo_test_clean/sequences', 'denoise')
evaluate('../../output/deblock', '../../data/vimeo_test_clean/sequences', 'deblock')
evaluate('../../output/sr', '../../data/vimeo_test_clean/sequences', 'sr')
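For a quick sanity check outside MATLAB, PSNR and a mean absolute difference can be computed as below. This is a minimal sketch, not evaluate.m: it assumes 8-bit intensities (peak 255) and single-channel frames given as lists of rows. How evaluate.m defines the Abs metric and handles color channels or borders is not documented here, so numbers may differ from the paper.

```python
import math


def _pixel_pairs(output, target):
    """Flatten two equally sized frames into (output, target) pixel pairs."""
    if len(output) != len(target) or any(len(o) != len(t) for o, t in zip(output, target)):
        raise ValueError("output and target must have the same size")
    return [(o, t) for orow, trow in zip(output, target) for o, t in zip(orow, trow)]


def psnr(output, target, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    pairs = _pixel_pairs(output, target)
    mse = sum((o - t) ** 2 for o, t in pairs) / len(pairs)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def mean_abs_diff(output, target):
    """Mean absolute per-pixel difference (one plausible reading of 'Abs')."""
    pairs = _pixel_pairs(output, target)
    return sum(abs(o - t) for o, t in pairs) / len(pairs)
```

An MSE of 650.25 on the 0 to 255 scale corresponds to 20 dB, since 255^2 / 650.25 = 100.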

The scripts assume that our datasets are unzipped under data/ without being renamed, and that results are placed under [output_root]/[task_name] (e.g., output/sr, output/interp, output/denoise, output/deblock) with exactly the same subfolder structure as our datasets.
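Because outputs are matched to targets by path, a missing or misnamed result folder is an easy mistake. The sketch below is a hypothetical helper, not part of the repository: it walks the target directory, looks up each PNG under the same relative path in the output directory, and reports which targets have no matching output.

```python
import os


def paired_files(output_dir, target_dir, ext=".png"):
    """Pair each target file with the output at the same relative path."""
    pairs, missing = [], []
    for dirpath, _, filenames in os.walk(target_dir):
        for name in sorted(filenames):
            if not name.endswith(ext):
                continue
            target_path = os.path.join(dirpath, name)
            rel = os.path.relpath(target_path, target_dir)
            out_path = os.path.join(output_dir, rel)
            if os.path.isfile(out_path):
                pairs.append((out_path, target_path))
            else:
                missing.append(rel)  # no result with the same subfolder structure
    return pairs, missing
```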

References

  1. Our warping code is based on qassemoquab/stnbhwd.
  2. Our flow utilities and transformation utilities are based on anuragranj/spynet.
  3. There is an unofficial PyTorch implementation by coldog2333/pytoflow.

toflow's People

Contributors

anchen1011, donglaiw, jiajunwu, tfxue


toflow's Issues

structure about the SPN net

@anchen1011 Hi, I am confused about the details of the SPN network. I referred to the spatial transformer network and tried to find the localisation net in your project but failed. Can you share your SPN structures? Thanks!


Running toflow on consecutive frames

I would like to run your system inference on consecutive frames coming directly from the network, rather than on a stored video file on the disk. Is it possible? How would you suggest to do it?

Output at higher resolution

Hi, I have several high-resolution videos (4240x2382) that I would like to test with this algorithm.

What are the options to get the same resolution as output?

What's dsr_imresize?

Hi,
In src/generate_testing_sample/blur_input.m, you use a custom function dsr_imresize.p (which is a binary file) to downsample and upsample images. But you mention that you use MATLAB imresize function to generate LR images in your paper Section 5. I check the output of MATLAB imresize and your dsr_imresize. There are some small differences between the LR images (from the same HR image).

What's the difference between dsr_imresize and MATLAB imresize? Why not use the built-in MATLAB imresize function?

Issue in SSIM implementation

Hey, you have amazing work. I am facing an issue with the SSIM scores you have calculated. Reshaping the image and passing it to the SSIM function is not consistent with the original implementation by the authors of SSIM. Please see the code below to reproduce it.

% LR image available here 'https://raw.githubusercontent.com/mugheesahmad/Fun_testing/master/LR0000001.jpg' 
% HR image available here 'https://raw.githubusercontent.com/mugheesahmad/Fun_testing/master/HR0000001.jpg' 
lr = imread('LR0000001.jpg');
hr = imread('HR0000001.jpg');
ssim(hr, lr) %colored image
% ans = 0.8433
ssim(rgb2gray(hr), rgb2gray(lr)) %builtin MATLAB function
% ans =    0.7570
original_ssim(rgb2gray(hr), rgb2gray(lr)) %author implementation available here https://ece.uwaterloo.ca/~z70wang/research/ssim/
% original implementation does not accept RGB images
% ans =    0.7574 
lru = reshape(lr, [380*672,3]);  %your way of doing
hru = reshape(hr, [380*672,3]);
ssim(lru, hru)
%ans = 86.70

As per the MATLAB SSIM docs, only grayscale images can be passed. Your way of using it is not consistent with the original implementation, nor with the SSIM implementations in skimage and PyTorch. See this and this Colab file.

Loading the model:

To reproduce the results of the paper on video denoising, I want to load the pre-trained model to evaluate on the test set of Vimeo-90 dataset. But when I reload the model using torch7, I have encountered the following problem:

Warning: Failed to load function from bytecode: binary string: not a precompiled chunk
Warning: Failed to load function from bytecode: [string ""]:1: unexpected symbol near char(5)
/home/mars/torch/install/bin/lua: /home/mars/torch/install/share/lua/5.2/torch/File.lua:375: unknown object
stack traceback:
[C]: in function 'error'
/home/mars/torch/install/share/lua/5.2/torch/File.lua:375: in function 'readObject'
/home/mars/torch/install/share/lua/5.2/torch/File.lua:307: in function 'readObject'
/home/mars/torch/install/share/lua/5.2/torch/File.lua:369: in function 'readObject'
/home/mars/torch/install/share/lua/5.2/nn/Module.lua:192: in function 'read'
...rs/torch/install/share/lua/5.2/nn/BatchNormalization.lua:185: in function 'read'
/home/mars/torch/install/share/lua/5.2/torch/File.lua:351: in function 'readObject'
/home/mars/torch/install/share/lua/5.2/torch/File.lua:369: in function 'readObject'
/home/mars/torch/install/share/lua/5.2/torch/File.lua:369: in function 'readObject'
/home/mars/torch/install/share/lua/5.2/nn/Module.lua:192: in function 'read'
...
/home/mars/torch/install/share/lua/5.2/nn/Module.lua:192: in function 'read'
/home/mars/torch/install/share/lua/5.2/torch/File.lua:351: in function 'readObject'
/home/mars/torch/install/share/lua/5.2/torch/File.lua:369: in function 'readObject'
/home/mars/torch/install/share/lua/5.2/torch/File.lua:369: in function 'readObject'
/home/mars/torch/install/share/lua/5.2/nn/Module.lua:192: in function 'read'
/home/mars/torch/install/share/lua/5.2/torch/File.lua:351: in function 'readObject'
/home/mars/torch/install/share/lua/5.2/torch/File.lua:409: in function 'load'
demo.lua:65: in main chunk
[C]: in function 'dofile'
...mars/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: in ?
Could you help me check if the pre-trained model has any problem? @anchen1011

Confusion on Image Transformation Module

Hi, according to the paper section 4.3, STN layer is used for transformation estimation. Seems the implementation you use in this repo only support affine transformation? So does it mean in Fig2 the "warped input" is just an affine transformation of "Input frames"?

Using ffmpeg to generate image sequence?

Hello there,

In the readme.md, the line ffmpeg -i *.png -q 20 -vcodec jpeg2000 -format j2k name.mov can generate a video with extension MOV.

I just wonder whether it is possible to directly generate a sequence of images.

Thank you!

runtime crash

I'm trying to run this using this command:

th demo.lua -mode denoise -inpath ../data/example/noisy -outpath ./out

And I'm getting this runtime error:

/torch/install/bin/luajit: ./main/loader.lua:27: argument 1 expected a 'string', got a 'nil'
stack traceback:
	./main/loader.lua:27: in function 'gen_path'
	demo.lua:47: in main chunk

divide vimeo90K dataset into training set and test set

Thank you very much for your work. It is extremely good work.
I have a question about how to divide the Vimeo-90K dataset into a training set and LR images.

I have seen that there is a get_path_sep.m in src/evaluation and sep_trainlist.txt in data.
I want to know how to call it to generate the corresponding training set.

Thanks a lot.

Resolution mismatching for interpolation

Thank you so much for sharing your code.
I use TOFLow to interpolate 2k images, their resolution is 2560x1440.
16 can divide 2560 and 16 can also divide 1440.
However, after doing interpolation, the resolution of my result is 2096x1184. How could I resolve this problem?
Thank you.

resolution mismatch for interp

thank you for sharing your code as well as your dataset.

I would like to evaluate your interp implementation. However, given an input of size 960x540, the output is 960x528 instead. My intermediate solution is to rescale the output with bicubic interpolation so that it matches the input again. What do you suggest, so that the evaluation is fair? Thanks!

Pre-training the flow estimation network

Hi, @anchen1011 . I pre-trained the flownet on the Sintel dataset but that does not converge . The batchsize is 16 and learning rate is 0.0001, the loss is defined by calculating the l1 difference between the last sub-net's output and the ground truth. Can you share the details about pre-training the flownet?

the noise level of the pretrained denoise model

Hi @anchen1011 thanks for your sharing the pretrained model, the denoising effect is very impressive!

I want to train the denoise model based on my own dataset.
I noticed that there is a file add_noise_to_input to generate the "test" sequences. May I ask that whether it is the same file (same noise var) to generate the "train" sequences?

Thanks!

Not all septuplets listed in "sep_testlist.txt" and "sep_trainlist.txt".

Hello Tianfan,
Thank you for sharing the dataset and code. I downloaded the full septuplet dataset (training + test data) and found that "sep_testlist.txt" and "sep_trainlist.txt" contain only 72,436 septuplets instead of 91,701. However, the folder with septuplets contains all 91,701 septuplets. Why don't all septuplets appear in "sep_testlist.txt" and "sep_trainlist.txt"?

About Vimeo Dataset.

What is the frame rate for the videos in the dataset? Also, what is the sampling frequency for the triplet and septuplet datasets?

2x model

Hi,

Thank you for the great code! In the code, the model is actually a 4x super-resolution model. Is it possible to use the proposed algorithm for 2x? Thank you!

Best,
Yongcheng

License of Vimeo-90K?

Hi, thank you for your great work!

I would like to know the license of Vimeo-90K.
I know the license for the code is MIT, but I wonder if same license is applied to Vimeo-90K dataset.

Thank you!

About the box down-sample kernel

Can the box down-sample kernel mentioned in your paper be implemented with imresize(img_hr, 1/up_scale, 'box')? I'm not sure whether they are the same.

thanks~

demo.lua fails to run: cuDNN not found

demo.lua does not run after installing CUDA and cuDNN. I also installed cudnn with 'luarocks install cudnn'.
$ th demo.lua -mode denoise -inpath ../data/example/noisy
==> initializing...
/home/xhuv/torch/install/bin/luajit: /home/xhuv/torch/install/share/lua/5.1/trepl/init.lua:389: /home/xhuv/torch/install/share/lua/5.1/trepl/init.lua:389: /home/xhuv/torch/install/share/lua/5.1/trepl/init.lua:389: /home/xhuv/torch/install/share/lua/5.1/cudnn/ffi.lua:1603: 'libcudnn (R5) not found in library path.
Please install CuDNN from https://developer.nvidia.com/cuDNN
Then make sure files named as libcudnn.so.5 or libcudnn.5.dylib are placed in
your library load path (for example /usr/local/lib , or manually add a path to LD_LIBRARY_PATH)

Alternatively, set the path to libcudnn.so.5 or libcudnn.5.dylib
to the environment variable CUDNN_PATH and rerun torch.
For example: export CUDNN_PATH="/usr/local/cuda/lib64/libcudnn.so.5"

stack traceback:
[C]: in function 'error'
/home/xhuv/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
demo.lua:20: in main chunk
[C]: in function 'dofile'
...xhuv/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk

Output of SR is much smaller than the input (not an 8x problem)

thank you for sharing your code as well as your dataset.

I would like to run the SR implementation. However, given an input of size 1920x1072, the output is 1920x1072, while when the input is 3840x2144 (divisible by 8), the output is 2112x1168 instead, and W/H also changes.

What are the options to get the same resolution as output? thanks!

vimeo-90k

Hi @anchen1011 ,

Thank you for your work.
I wonder, can I get original video clips from your Vimeo-90k dataset?
Or is it possible for you to create longer clips? for example 30-frame sequences

the results of sr model is not correct

I have tested the sr model with the command line th demo.lua -mode sr -inpath ../data/example/low_resolution/ and it does not produce correct results: the output looks like a blurred version of the input frame. When I test with th demo.lua in denoise mode, I get correct results. Could you help me fix the bug in the sr model?
