alex04072000 / CyclicGen
157 stars · 7 watchers · 25 forks · 504 KB

Deep Video Frame Interpolation using Cyclic Frame Generation

Languages: Python 98.81%, Shell 1.19%
Topics: video-frame-interpolation, interpolation, frame, motion, video, cycle-consistency-loss, dvf, motion-linearity-loss, aaai, tensorflow


cyclicgen's Issues

Error while testing the pretrained model

```
InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Incompatible shapes: [65536] vs. [131072]
	 [[{{node Cycle_DVF/interpolate/add_4}}]]
	 [[Cycle_DVF/add_3/_135]]
  (1) Invalid argument: Incompatible shapes: [65536] vs. [131072]
	 [[{{node Cycle_DVF/interpolate/add_4}}]]
```

I am facing this issue while testing the pretrained model.
Any help appreciated!
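
For what it's worth, 131072 is exactly twice 65536 (256 × 256), which suggests the two tensors being added cover different numbers of pixels — for example the two input frames differ in size, or a dimension is not a multiple of 32 as the network's down/up-sampling path expects. Below is a minimal pre-flight check; this is a sketch of my own, not the repo's code, and the multiple-of-32 requirement is inferred from the padding logic in run.py:

```python
import numpy as np

def pad_to_multiple_of_32(frame):
    """Symmetrically pad an HxWxC frame so H and W are multiples of 32."""
    h, w = frame.shape[:2]
    target_h = -(-h // 32) * 32  # ceiling division
    target_w = -(-w // 32) * 32
    pad_top = (target_h - h + 1) // 2
    pad_left = (target_w - w + 1) // 2
    return np.pad(frame,
                  ((pad_top, target_h - h - pad_top),
                   (pad_left, target_w - w - pad_left),
                   (0, 0)),
                  mode='symmetric')

# Hypothetical 480x270 frame; both input frames must also share one shape.
frame = np.zeros((270, 480, 3), dtype=np.uint8)
print(pad_to_multiple_of_32(frame).shape)  # (288, 480, 3)
```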

Wrong pixel scale for VGGnet

Hi,

I just noticed that throughout the algorithm, the pixel values for the network input/output are scaled to the range -1 to 1. However, the VGG16 network is supposed to take input tensors with pixel values in the range 0 to 255. The following code in vgg16.py instead scales the pixel range to -510 to 0:

```python
rgb_scaled = tf.subtract((input_image + tf.ones_like(input_image)), 2) * 255.
```

The VGG weights are set as constants and are not trained. I suspect the VGG net cannot extract proper edge features because of this wrong scaling. This raises the question: is the VGG net really needed for edge guidance, or does the edge information simply not help this algorithm?
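
For comparison, here is a minimal sketch of a scaling that would map the network's [-1, 1] range back to the [0, 255] range VGG16 expects before its mean subtraction. This is an assumption about the intended behavior, not the repo's code:

```python
import tensorflow as tf  # TensorFlow 1.x, as used by this repo

input_image = tf.placeholder(tf.float32, shape=(None, None, None, 3))  # values in [-1, 1]
# (x + 1) * 127.5 maps -1 -> 0 and 1 -> 255.
rgb_scaled = (input_image + tf.ones_like(input_image)) * 127.5
```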

High resolution with larger displacement

Hi @alex04072000,
Thank you for your great work!
When I test your model on high-resolution images (1080 × 1920) with larger displacements, the interpolated frame becomes blurry, as this sample shows:

[sample image]

Could you share some ideas about this issue? Would training data with larger displacements or an enlarged receptive field help?

Should I replace CyclicGen_train_stage1's `ucf101_train_files_frameX.txt` with `frameX.txt`?

Should I replace CyclicGen_train_stage1's ucf101_train_files_frameX.txt with the frameX.txt files generated by the following steps?

```bash
# 1. download UCF101 and rename:
wget https://www.crcv.ucf.edu/data/UCF101/UCF101.rar && unrar x UCF101.rar && mv UCF-101 UCF101
# 2. download and prepare the train/test list:
mkdir ucfTrainTestlist && mv ucf101_train_test_split/*.txt ucfTrainTestlist
# 3. split UCF101 into train/test:
python3 1_move_file.py
# 4. split .avi into .png:
brew install parallel
parallel -j 12 ./extract_only.sh ::: $(find ./ -name '*.avi')
# 5. generate frame1.txt frame2.txt frame3.txt:
python3 2_filter_psnr.py
```
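
A small sanity check may help before swapping the lists in. This is a hedged sketch; it assumes frame1.txt, frame2.txt, and frame3.txt each list one frame path per line, with line i across the three files forming one training triplet:

```python
# Verify the three generated lists line up into triplets of equal length.
with open('frame1.txt') as f1, open('frame2.txt') as f2, open('frame3.txt') as f3:
    lists = [f1.read().splitlines(), f2.read().splitlines(), f3.read().splitlines()]

assert len(lists[0]) == len(lists[1]) == len(lists[2]), 'list lengths differ'
print('%d triplets, e.g. %s' % (len(lists[0]), tuple(l[0] for l in lists)))
```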

Pre-trained model restored from chkpt model

Error:

```
I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Pre-trained model restored from ./ckpt/ckpt/CyclicGen/model
```

I am facing the above issue while executing the run file:

```bash
python run.py --pretrained_model_checkpoint_path=./ckpt/ckpt/CyclicGen/model --first=./myData/ucf101_interp_ours/1/frame_00.png --second=./myData/ucf101_interp_ours/1/frame_01_gt.png --out=./myData/ucf101_interp_ours/1/Output/out.png
```

Any help appreciated!

The motion of the video is not as smooth as in your demo video.

I just tested on the "surf" video from the DAVIS dataset.
I don't know what went wrong, but, as the title says, the motion is not as smooth as in your demo.
The interpolated frame seems closer to frame 1 than to frame 3, so the interpolated time is probably not exactly 0.5.
Regarding the modification, I refrained from changing anything significant; I just added a loop to run over the whole video.
Here are the modified test script for video and the result.

Video: https://drive.google.com/open?id=1Hg8e1YvIBYM4lzGe71w4ke4t6yfQSJvL

```python
"""Train a voxel flow model on ucf101 dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import os
import tensorflow as tf
from datetime import datetime
from CyclicGen_model_large import Voxel_flow_model
import cv2
from vgg16 import Vgg16

FLAGS = tf.app.flags.FLAGS

# Define necessary FLAGS
tf.app.flags.DEFINE_string('pretrained_model_checkpoint_path', None,
                        """If specified, restore this pretrained model """
                        """before beginning any training.""")
tf.app.flags.DEFINE_integer('batch_size', 1, 'The number of samples in each batch.')
tf.app.flags.DEFINE_string('video', '',
                        """Path to the input video file.""")
tf.app.flags.DEFINE_string('out', '',
                        """Directory where the output video is written.""")


def normalize(img):
    """Scale uint8 pixel values from [0, 255] to floats in [-1.0, 1.0]."""
    return img / 127.5 - 1.0


def test(video_dir, out_dir):

    _name = os.path.basename(video_dir).split('.')[0]
    cap = cv2.VideoCapture(video_dir)
    _, first = cap.read()
    first = cv2.cvtColor(first, cv2.COLOR_BGR2RGB)
    fps = cap.get(cv2.CAP_PROP_FPS)
    h,w,_ = first.shape
    print('HxW: {}, FPS: {}'.format((h,w), fps))

    fourcc = cv2.VideoWriter_fourcc(*'XVID')
    out = cv2.VideoWriter(os.path.join(out_dir, _name + '_x2.avi'), fourcc, fps*2, (w,h))

    while True:
        _, second = cap.read()
        if second is None:
            break

        second = cv2.cvtColor(second, cv2.COLOR_BGR2RGB)
        data_frame1 = np.expand_dims(normalize(first), 0)
        data_frame3 = np.expand_dims(normalize(second), 0)

        H = data_frame1.shape[1]
        W = data_frame1.shape[2]

        adaptive_H = int(np.ceil(H / 32.0) * 32.0)
        adaptive_W = int(np.ceil(W / 32.0) * 32.0)

        pad_up = int(np.ceil((adaptive_H - H) / 2.0))
        pad_bot = int(np.floor((adaptive_H - H) / 2.0))
        pad_left = int(np.ceil((adaptive_W - W) / 2.0))
        pad_right = int(np.floor((adaptive_W - W) / 2.0))

        print(str(H) + ', ' + str(W))
        print(str(adaptive_H) + ', ' + str(adaptive_W))

        """Perform test on a trained model."""
        with tf.Graph().as_default():
            # Create input and target placeholder.
            input_placeholder = tf.placeholder(tf.float32, shape=(None, H, W, 6))

            input_pad = tf.pad(input_placeholder, [[0, 0], [pad_up, pad_bot], [pad_left, pad_right], [0, 0]], 'SYMMETRIC')

            edge_vgg_1 = Vgg16(input_pad[:, :, :, :3], reuse=None)
            edge_vgg_3 = Vgg16(input_pad[:, :, :, 3:6], reuse=True)

            edge_1 = tf.nn.sigmoid(edge_vgg_1.fuse)
            edge_3 = tf.nn.sigmoid(edge_vgg_3.fuse)

            edge_1 = tf.reshape(edge_1, [-1, input_pad.get_shape().as_list()[1], input_pad.get_shape().as_list()[2], 1])
            edge_3 = tf.reshape(edge_3, [-1, input_pad.get_shape().as_list()[1], input_pad.get_shape().as_list()[2], 1])

            with tf.variable_scope("Cycle_DVF"):
                # Prepare model.
                model = Voxel_flow_model(is_train=False)
                prediction = model.inference(tf.concat([input_pad, edge_1, edge_3], 3))[0]

            # Create a saver and load.
            gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.2)
            sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

            # Restore checkpoint from file.
            if FLAGS.pretrained_model_checkpoint_path:
                restorer = tf.train.Saver()
                restorer.restore(sess, FLAGS.pretrained_model_checkpoint_path)
                print('%s: Pre-trained model restored from %s' %
                    (datetime.now(), FLAGS.pretrained_model_checkpoint_path))

            feed_dict = {input_placeholder: np.concatenate((data_frame1, data_frame3), 3)}
            # Run single step update.
            prediction_np = sess.run(prediction, feed_dict=feed_dict)

            output = prediction_np[-1, pad_up:adaptive_H - pad_bot, pad_left:adaptive_W - pad_right, :]
            output = np.round(((output + 1.0) * 255.0 / 2.0)).astype(np.uint8)
            output = output[:, :, ::-1]  # RGB -> BGR for cv2.VideoWriter
            out.write(cv2.cvtColor(first, cv2.COLOR_RGB2BGR))
            out.write(output)
        first = second
    out.write(cv2.cvtColor(first, cv2.COLOR_RGB2BGR))
    cap.release()
    out.release()

if __name__ == '__main__':
    #os.environ["CUDA_VISIBLE_DEVICES"] = ""

    video = FLAGS.video
    out = FLAGS.out

    test(video, out)

```
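
Independent of the smoothness question, one clear inefficiency in the script above is that the TensorFlow graph is rebuilt, and the checkpoint re-restored, for every frame pair. Here is a hedged restructuring sketch (my own, not the author's code); it assumes every frame of the video shares one resolution, which holds for a single video file, and reuses the repo's modules:

```python
import numpy as np
import tensorflow as tf
from CyclicGen_model_large import Voxel_flow_model
from vgg16 import Vgg16

def build_graph(H, W, pads):
    """Build the interpolation graph once for a fixed HxW frame size."""
    (pad_up, pad_bot), (pad_left, pad_right) = pads
    input_ph = tf.placeholder(tf.float32, shape=(None, H, W, 6))
    input_pad = tf.pad(input_ph,
                       [[0, 0], [pad_up, pad_bot], [pad_left, pad_right], [0, 0]],
                       'SYMMETRIC')
    edge_vgg_1 = Vgg16(input_pad[:, :, :, :3], reuse=None)
    edge_vgg_3 = Vgg16(input_pad[:, :, :, 3:6], reuse=True)
    shape = input_pad.get_shape().as_list()
    edge_1 = tf.reshape(tf.nn.sigmoid(edge_vgg_1.fuse), [-1, shape[1], shape[2], 1])
    edge_3 = tf.reshape(tf.nn.sigmoid(edge_vgg_3.fuse), [-1, shape[1], shape[2], 1])
    with tf.variable_scope("Cycle_DVF"):
        model = Voxel_flow_model(is_train=False)
        prediction = model.inference(tf.concat([input_pad, edge_1, edge_3], 3))[0]
    return input_ph, prediction

# Usage inside test(): compute H, W and the four pad sizes once from the
# first frame, then
#   input_ph, prediction = build_graph(H, W, ((pad_up, pad_bot), (pad_left, pad_right)))
#   sess = tf.Session()
#   tf.train.Saver().restore(sess, FLAGS.pretrained_model_checkpoint_path)
# and run only `sess.run(prediction, feed_dict=...)` inside the while loop.
```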

Video demo code

Hi,
I saw your video, which was very cool. Could you share the demo code that, given an input video, generates the interpolated output video?

Currently I see that run.py only takes two consecutive frames as input, so I'm not sure how to produce a whole video. Thanks.
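
Until demo code is published, one possible workaround is to shell out to run.py for every consecutive pair of extracted frames. This is a hedged sketch; the frame naming, directories, and checkpoint path are assumptions based on the commands quoted in the issues above:

```python
import glob
import subprocess

# Hypothetical layout: frames/00000.png, frames/00001.png, ... extracted
# beforehand (e.g. with ffmpeg); interpolated frames land in interp/.
frames = sorted(glob.glob('frames/*.png'))
for i, (first, second) in enumerate(zip(frames, frames[1:])):
    subprocess.check_call([
        'python', 'run.py',
        '--pretrained_model_checkpoint_path=./ckpt/ckpt/CyclicGen/model',
        '--first=' + first,
        '--second=' + second,
        '--out=interp/%05d.png' % i,
    ])
```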

Different evaluation result on UCF dataset using frames generated by run.py

Hi!

I used run.py to generate interpolated frames on the UCF101 dataset and then calculated the average PSNR. However, there is a small difference between my result and the one reported in the paper. The model I used is CyclicGen_model.py, and I also tested your result images. Without the motion mask, the difference is around 0.4 dB. So is the gap due to the motion mask, or is there some other factor that influences the result images?
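
For reference, here is a hedged sketch of the two PSNR variants being compared, plain PSNR versus PSNR restricted to a motion mask. This is my own code, and the mask is a hypothetical boolean HxW array loaded from whatever mask files the evaluation used:

```python
import numpy as np

def psnr(pred, gt, mask=None):
    """PSNR between two uint8 images; with a mask, only masked pixels count."""
    err = (pred.astype(np.float64) - gt.astype(np.float64)) ** 2
    if mask is not None:
        err = err[mask]  # boolean HxW mask selects pixels across all channels
    return 10.0 * np.log10(255.0 ** 2 / err.mean())

# Hypothetical usage with random data:
pred = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
gt = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
mask = np.zeros((256, 256), dtype=bool)
mask[64:192, 64:192] = True
print(psnr(pred, gt), psnr(pred, gt, mask))
```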

training data

Hi, could you provide a link for downloading your training and testing datasets?
