This project forked from picsart-ai-research/videoinr-continuous-space-time-super-resolution

[CVPR 2022] VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution

Home Page: https://arxiv.org/abs/2206.04647

Shell 0.10% C++ 5.85% Python 70.55% C 2.17% Cuda 21.33%

VideoINR

This repository contains the official implementation for VideoINR introduced in the following paper:

VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution
Zeyuan Chen, Yinbo Chen, Jingwen Liu, Xingqian Xu, Vidit Goel, Zhangyang Wang, Humphrey Shi, Xiaolong Wang
CVPR 2022

You can find more visual results and a brief introduction to VideoINR on our project page.

Method Overview

Two consecutive input frames are concatenated and encoded as a discrete feature map. Based on the feature, the spatial and temporal implicit neural representations decode a 3D space-time coordinate to a motion flow vector. We then sample a new feature vector by warping according to the motion flow, and decode it as the RGB prediction of the query coordinate.
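The warping step described above can be illustrated with a toy sketch. This is not the repository's implementation: in VideoINR the motion flow is predicted by the implicit neural representations and the features are multi-channel tensors, whereas here the flow is simply given and the feature map is a small scalar grid. The function names bilinear_sample and warp_query are hypothetical.

```python
import math


def bilinear_sample(feat, x, y):
    """Sample a 2D grid `feat` (list of rows) at a continuous (x, y), clamping to the border."""
    h, w = len(feat), len(feat[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - ax) * feat[y0][x0] + ax * feat[y0][x1]
    bottom = (1 - ax) * feat[y1][x0] + ax * feat[y1][x1]
    return (1 - ay) * top + ay * bottom


def warp_query(feat, x, y, flow):
    """Shift the query location by a motion flow (dx, dy), then sample the feature there."""
    dx, dy = flow
    return bilinear_sample(feat, x + dx, y + dy)


feat = [[0.0, 1.0],
        [2.0, 3.0]]
print(warp_query(feat, 0.0, 0.0, (0.5, 0.5)))  # 1.5
```

In the full model, the sampled feature would then be decoded to an RGB value for the query coordinate.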

Citation

If you find our work useful in your research, please cite:

@inproceedings{chen2022vinr,
  author    = {Chen, Zeyuan and Chen, Yinbo and Liu, Jingwen and Xu, Xingqian and Goel, Vidit and Wang, Zhangyang and Shi, Humphrey and Wang, Xiaolong},
  title     = {VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
}

Environmental Setup

The code is tested in:

If you are using Anaconda, the following command can be used to build the environment:

conda create -n videoinr python=3.6
conda activate videoinr
conda install pytorch=1.4 torchvision -c pytorch

pip install opencv-python pillow tqdm pyyaml
cd models/modules/DCNv2/
python setup.py install
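After installation, a quick sanity check like the following can confirm that the Python packages are importable. It is a minimal sketch rather than part of the repository. The import names are the standard ones for the packages installed above (opencv-python as cv2, pillow as PIL, pyyaml as yaml). The helper name find_missing is hypothetical, and the compiled DCNv2 extension is not checked here.

```python
import importlib.util

# Import names for the packages installed above
# (opencv-python -> cv2, pillow -> PIL, pyyaml -> yaml).
REQUIRED = ["torch", "torchvision", "cv2", "PIL", "tqdm", "yaml"]


def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    print("missing:", missing if missing else "none")
```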

Demo

  1. Download the pre-trained model from Google Drive.

  2. Convert your video of interest to a sequence of images. Many tools can do this, e.g. ffmpeg or Adobe Premiere Pro.

The folder that contains this image sequence should have a structure as follows:

data_path
├── img_1.jpg
├── img_2.jpg
├── ...
├── img_n.jpg

  3. Use VideoINR to perform space-time super-resolution. You can adjust the up-sampling scales by setting space_scale and time_scale.

python demo.py --space_scale 4 --time_scale 8 --data_path [YOUR_DATA_PATH]

  4. The output consists of three folders: the low-resolution images, the bicubic-upsampled images, and the results of VideoINR.
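For the frame-extraction step, the following sketch builds an ffmpeg command that writes frames using the img_1.jpg, img_2.jpg, ... naming shown above, and checks a folder listing for gaps in the numbering. The helper names ffmpeg_extract_command and missing_frames are hypothetical. Whether demo.py strictly requires contiguous numbering is an assumption, so treat the check as a convenience only.

```python
import os
import re


def ffmpeg_extract_command(video_path, data_path):
    """Build an ffmpeg command that writes img_1.jpg, img_2.jpg, ... into data_path."""
    pattern = os.path.join(data_path, "img_%d.jpg")
    return ["ffmpeg", "-i", video_path, pattern]


def missing_frames(filenames):
    """Return the indices missing from an img_1.jpg ... img_n.jpg sequence."""
    indices = set()
    for name in filenames:
        match = re.fullmatch(r"img_(\d+)\.jpg", name)
        if match:  # ignore files that do not follow the naming scheme
            indices.add(int(match.group(1)))
    if not indices:
        return []
    return [i for i in range(1, max(indices) + 1) if i not in indices]
```

A possible usage, assuming ffmpeg is on the PATH and the output folder already exists: pass the result of ffmpeg_extract_command("input.mp4", "data_path") to subprocess.run(..., check=True), then call missing_frames(os.listdir("data_path")) and confirm it returns an empty list.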

Preparing Dataset

We use the Adobe240 dataset for training.

  1. Download the zip file here, which contains the original high-FPS videos.

  2. To extract the frames of each video into a separate folder, open generate_frames_from_adobe240fps.py, set videoFolder to where you saved the unzipped videos and frameFolder to DATASET_PATH, then run the script. It automatically splits the data into train/test/val sets.

python generate_frames_from_adobe240fps.py

Training

  1. Configure the training settings, which can be found in options/train. The default training settings are in train_zsm.yml. You need to change a few lines in the config file for training to run on your machine:

    • name & mode (Line 12 & 13): As described in the paper, we adopt a two-stage training strategy, so there are two modes for the training set: Adobe and Adobe_a (see Line 47 in data/__init__.py). Adobe fixes the down-sampling scale to 4, while Adobe_a randomly samples down-sampling scales in [2, 4]. For the first stage (0 - 450000 iterations), set name & mode to Adobe. For the second stage (450000 - 600000 iterations), set name & mode to Adobe_a.
    • dataroot_GT & dataroot_LQ (Line 17 & 18): Path to the Adobe240 dataset. Set both to DATASET_PATH/train (dataroot_LQ is not used in the current implementation).
    • models & training_state (Line 47 & 48): Paths where the model parameters and the training state (used to resume training) are saved.
  2. Run the training code. The default setting requires four RTX 2080 Ti GPUs. Note that to apply the two-stage training strategy, you might have to run train.py twice.

python train.py -opt options/train/train_zsm.yml
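The two-stage schedule described above can be summarised in a short sketch. It is illustrative only and is not taken from the repository. The names mode_for_iteration and sample_down_scale are hypothetical, and drawing the Adobe_a scale uniformly from a continuous range is an assumption, since the text only says the scale is sampled randomly in [2, 4].

```python
import random

STAGE_SWITCH_ITER = 450000  # stage 1 ends here, per the schedule above
TOTAL_ITERS = 600000


def mode_for_iteration(iteration):
    """Return the dataset mode ("Adobe" or "Adobe_a") used at a given iteration."""
    if not 0 <= iteration < TOTAL_ITERS:
        raise ValueError("iteration outside the training schedule")
    return "Adobe" if iteration < STAGE_SWITCH_ITER else "Adobe_a"


def sample_down_scale(mode, rng=random):
    """Return the spatial down-sampling scale for one training sample."""
    if mode == "Adobe":
        return 4.0  # fixed scale in stage 1
    if mode == "Adobe_a":
        # Assumption: uniform sampling over the continuous range [2, 4].
        return rng.uniform(2.0, 4.0)
    raise ValueError("unknown mode: " + mode)
```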

Additional Note

Throughout the training process, we calculate the loss by summing distances of all pixels between prediction and ground-truth. However, this can be unreasonable for stage 2 (450000 - 600000 iterations) since the ground-truth images have different resolutions, resulting in different loss scales. Using mean distances for the loss value in stage 2 can be helpful for the final model performance.
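A small sketch makes the scale issue concrete. It assumes an absolute (L1) per-pixel distance, which may differ from the distance used in the actual training code, and the function names are hypothetical. With the same per-pixel error, the summed loss grows with the number of pixels while the mean stays constant.

```python
def l1_sum(pred, target):
    """Sum of absolute per-pixel differences (assumed distance, for illustration)."""
    return sum(abs(p - t) for p, t in zip(pred, target))


def l1_mean(pred, target):
    """Mean of absolute per-pixel differences."""
    return l1_sum(pred, target) / len(pred)


# Same per-pixel error (0.5) at two ground-truth resolutions.
small_pred, small_gt = [0.5] * 16, [0.0] * 16  # 4 x 4 image, flattened
large_pred, large_gt = [0.5] * 64, [0.0] * 64  # 8 x 8 image, flattened

print(l1_sum(small_pred, small_gt), l1_sum(large_pred, large_gt))    # 8.0 32.0
print(l1_mean(small_pred, small_gt), l1_mean(large_pred, large_gt))  # 0.5 0.5
```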

Many thanks to @sichun233746 for his testing!

Acknowledgments

Our code is built on Zooming-Slow-Mo-CVPR-2020 and LIIF. Thanks to the authors for sharing their code!

Contributors

zychen-ustc