alex04072000 / CyclicGen
Deep Video Frame Interpolation using Cyclic Frame Generation
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Incompatible shapes: [65536] vs. [131072]
[[{{node Cycle_DVF/interpolate/add_4}}]]
[[Cycle_DVF/add_3/_135]]
(1) Invalid argument: Incompatible shapes: [65536] vs. [131072]
[[{{node Cycle_DVF/interpolate/add_4}}]]
I'm facing this issue.
Any help is appreciated!
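For what it's worth, 131072 is exactly twice 65536, so one guess (only a guess) is that the two input frames have different resolutions, or that the input size doesn't go through the multiple-of-32 padding that run.py applies. A minimal diagnostic sketch along those lines:
```python
# A diagnostic sketch (an assumption about the cause, not a confirmed fix):
# verify both HxWxC input frames share a shape, then pad symmetrically so
# H and W are multiples of 32, mirroring what run.py does before inference.
import numpy as np

def prepare_pair(frame1, frame3):
    assert frame1.shape == frame3.shape, \
        'input frames differ: {} vs {}'.format(frame1.shape, frame3.shape)
    h, w = frame1.shape[:2]
    H = int(np.ceil(h / 32.0) * 32)
    W = int(np.ceil(w / 32.0) * 32)
    pad = ((int(np.ceil((H - h) / 2.0)), int(np.floor((H - h) / 2.0))),
           (int(np.ceil((W - w) / 2.0)), int(np.floor((W - w) / 2.0))),
           (0, 0))
    return np.pad(frame1, pad, 'symmetric'), np.pad(frame3, pad, 'symmetric')
```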
Hi,
I just noticed that throughout the algorithm, the pixel values for the network input/output are scaled to [-1, 1]. However, the VGG16 network is supposed to take input tensors with pixels in the range [0, 255]. The following code in vgg16.py actually scales the pixel range to [-510, 0]:
rgb_scaled = tf.subtract((input_image+tf.ones_like(input_image)),2)*255.
The VGG weights are set as constants and are not trained, so I suspect the VGG net cannot extract proper edge features because of the wrong scaling. This raises the question: is the VGG net really needed for edge guidance, or is the edge information simply not helping this algorithm?
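For reference, the scaling I would expect, mapping [-1, 1] back to [0, 255], is something like this (my sketch, not the repository's code):
```python
# Map pixel values from [-1, 1] to [0, 255], the range VGG16 was trained on:
# (x + 1) in [0, 2], times 127.5 gives [0, 255].
rgb_scaled = (input_image + tf.ones_like(input_image)) * 127.5
```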
Hi @alex04072000,
It looks like the loss may have been forgotten in CyclicGen_train_stage1.py:
prediction1, flow1 = model1.inference(tf.concat([input1, input3, edge_1, edge_3], 3))
total_loss = prediction1
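For comparison, here is a minimal sketch of the kind of loss I would expect at this point, assuming the stage-1 target is a ground-truth middle frame input2 (my assumption, not the author's confirmed loss):
```python
# Hypothetical stage-1 loss: L1 reconstruction error between the interpolated
# frame and an assumed ground-truth middle frame input2.
prediction1, flow1 = model1.inference(tf.concat([input1, input3, edge_1, edge_3], 3))
total_loss = tf.reduce_mean(tf.abs(prediction1 - input2))
```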
Hi, @alex04072000
Thank you for your great work!
When I test your model on high-resolution images with larger displacement, the interpolated frame becomes blurry, as this 1080x1920 sample shows:
Could you share some ideas about this issue? Would using training datasets with larger displacement, or enlarging the receptive field, help?
Should I replace CyclicGen_train_stage1's ucf101_train_files_frameX.txt with the frameX.txt files generated by the following steps?
# 1. download UCF101 and rename:
wget https://www.crcv.ucf.edu/data/UCF101/UCF101.rar && unrar x UCF101.rar && mv UCF-101 UCF101
# 2. download and prepare train/test list:
mkdir ucfTrainTestlist && mv ucf101_train_test_split/*.txt ucfTrainTestlist
# 3. split UCF101 to train/test:
python3 1_move_file.py
# 4. split .avi to .png:
brew install parallel
parallel -j 12 ./extract_only.sh ::: $( find ./ -name '*.avi' )
# 5. generate frame1.txt frame2.txt frame3.txt
python3 2_filter_psnr.py
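For context, this is my guess (based only on the file name) at what step 5 produces; the directory layout, PSNR threshold, and output format are all assumptions:
```python
# Hypothetical sketch of step 5: group consecutive frames of each clip into
# (frame1, frame2, frame3) triplets, drop near-static triplets whose
# frame1->frame3 PSNR is too high, and write one path per line.
import glob
import os

import cv2
import numpy as np

def psnr(a, b):
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse) if mse > 0 else float('inf')

frames = sorted(glob.glob('UCF101_train/**/*.png', recursive=True))
with open('frame1.txt', 'w') as f1, open('frame2.txt', 'w') as f2, open('frame3.txt', 'w') as f3:
    for a, b, c in zip(frames, frames[1:], frames[2:]):
        if os.path.dirname(a) != os.path.dirname(c):
            continue  # triplet spans two different clips
        if psnr(cv2.imread(a), cv2.imread(c)) > 35.0:  # threshold is a guess
            continue  # frames nearly identical; too little motion to learn from
        f1.write(a + '\n')
        f2.write(b + '\n')
        f3.write(c + '\n')
```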
Error:
I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Pre-trained model restored from ./ckpt/ckpt/CyclicGen/model
I am facing the above issue while executing the run file:
python run.py --pretrained_model_checkpoint_path=./ckpt/ckpt/CyclicGen/model --first=./myData/ucf101_interp_ours/1/frame_00.png --second=./myData/ucf101_interp_ours/1/frame_01_gt.png --out=./myData/ucf101_interp_ours/1/Output/out.png
Any help appreciated!!
Also, when I load the DVF pretrained model, the following error occurs:
W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key Cycle_DVF/conv1/BatchNorm/beta not found in checkpoint
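One generic way to check this (a TF1 checkpoint-inspection sketch, not repository code) is to list the keys actually stored in the checkpoint and compare them with the variables the graph defines; the error suggests the DVF checkpoint simply does not contain the Cycle_DVF/.../BatchNorm variables this graph expects:
```python
# Minimal TF1 sketch: print every variable name stored in a checkpoint so it
# can be compared against the graph's variables. The path is illustrative.
import tensorflow as tf

ckpt_path = './ckpt/DVF/model'  # hypothetical path to the DVF checkpoint
for name, shape in tf.train.list_variables(ckpt_path):
    print(name, shape)
```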
Thanks for sharing your work!
I wonder, could you provide the extracted frame-triplet training dataset?
Looking forward to your reply.
I just tested on the video "surf" in the DAVIS dataset.
I don't know where it went wrong, but, as the title says, the motion is not as smooth as in your demo.
The interpolated frame seems closer to Frame 1 than to Frame 3; the interpolated time is probably not exactly 0.5.
Regarding the modification, I refrained from changing anything significant; I just added a loop to run over a whole video.
Here are the modified test script for video and the result.
Video: https://drive.google.com/open?id=1Hg8e1YvIBYM4lzGe71w4ke4t6yfQSJvL
```python
"""Interpolate a whole video with the trained voxel flow model (modified from run.py)."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
from datetime import datetime

import cv2
import numpy as np
import tensorflow as tf

from CyclicGen_model_large import Voxel_flow_model
from vgg16 import Vgg16

FLAGS = tf.app.flags.FLAGS

# Define necessary FLAGS.
tf.app.flags.DEFINE_string('pretrained_model_checkpoint_path', None,
                           """If specified, restore this pretrained model """
                           """before beginning any training.""")
tf.app.flags.DEFINE_integer('batch_size', 1, 'The number of samples in each batch.')
tf.app.flags.DEFINE_string('video', '', """Input video path.""")
tf.app.flags.DEFINE_string('out', '', """Output directory.""")


def normalize(img):
    """Scale pixel values from [0, 255] to [-1, 1]."""
    return img / 127.5 - 1.0


def test(video_dir, out_dir):
    """Interpolate between every pair of consecutive frames and write a 2x-fps video."""
    _name = os.path.basename(video_dir).split('.')[0]
    cap = cv2.VideoCapture(video_dir)
    _, first = cap.read()
    first = cv2.cvtColor(first, cv2.COLOR_BGR2RGB)
    fps = cap.get(cv2.CAP_PROP_FPS)
    h, w, _ = first.shape
    print('HxW: {}, FPS: {}'.format((h, w), fps))
    fourcc = cv2.VideoWriter_fourcc(*'XVID')
    out = cv2.VideoWriter(os.path.join(out_dir, _name + '_x2.avi'), fourcc, fps * 2, (w, h))

    # The network needs spatial dimensions that are multiples of 32, so pad
    # symmetrically and crop the prediction back afterwards (same as run.py).
    H, W = h, w
    adaptive_H = int(np.ceil(H / 32.0) * 32.0)
    adaptive_W = int(np.ceil(W / 32.0) * 32.0)
    pad_up = int(np.ceil((adaptive_H - H) / 2.0))
    pad_bot = int(np.floor((adaptive_H - H) / 2.0))
    pad_left = int(np.ceil((adaptive_W - W) / 2.0))
    pad_right = int(np.floor((adaptive_W - W) / 2.0))

    # Build the graph and restore the checkpoint once, outside the frame loop
    # (rebuilding them per frame leaks memory and is very slow).
    with tf.Graph().as_default():
        input_placeholder = tf.placeholder(tf.float32, shape=(None, H, W, 6))
        input_pad = tf.pad(input_placeholder,
                           [[0, 0], [pad_up, pad_bot], [pad_left, pad_right], [0, 0]],
                           'SYMMETRIC')

        # Edge maps from the (frozen) VGG16, one per input frame.
        edge_vgg_1 = Vgg16(input_pad[:, :, :, :3], reuse=None)
        edge_vgg_3 = Vgg16(input_pad[:, :, :, 3:6], reuse=True)
        edge_1 = tf.nn.sigmoid(edge_vgg_1.fuse)
        edge_3 = tf.nn.sigmoid(edge_vgg_3.fuse)
        edge_1 = tf.reshape(edge_1, [-1, input_pad.get_shape().as_list()[1],
                                     input_pad.get_shape().as_list()[2], 1])
        edge_3 = tf.reshape(edge_3, [-1, input_pad.get_shape().as_list()[1],
                                     input_pad.get_shape().as_list()[2], 1])

        with tf.variable_scope("Cycle_DVF"):
            # Prepare model.
            model = Voxel_flow_model(is_train=False)
            prediction = model.inference(tf.concat([input_pad, edge_1, edge_3], 3))[0]

        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.2)
        sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

        # Restore checkpoint from file.
        if FLAGS.pretrained_model_checkpoint_path:
            restorer = tf.train.Saver()
            restorer.restore(sess, FLAGS.pretrained_model_checkpoint_path)
            print('%s: Pre-trained model restored from %s' %
                  (datetime.now(), FLAGS.pretrained_model_checkpoint_path))

        while True:
            _, second = cap.read()
            if second is None:
                break
            second = cv2.cvtColor(second, cv2.COLOR_BGR2RGB)
            data_frame1 = np.expand_dims(normalize(first), 0)
            data_frame3 = np.expand_dims(normalize(second), 0)

            feed_dict = {input_placeholder: np.concatenate((data_frame1, data_frame3), 3)}
            prediction_np = sess.run(prediction, feed_dict=feed_dict)

            # Crop the padding off and rescale from [-1, 1] back to [0, 255].
            output = prediction_np[-1, pad_up:adaptive_H - pad_bot,
                                   pad_left:adaptive_W - pad_right, :]
            output = np.round((output + 1.0) * 255.0 / 2.0).astype(np.uint8)
            output = output[:, :, ::-1]  # RGB -> BGR for VideoWriter

            out.write(cv2.cvtColor(first, cv2.COLOR_RGB2BGR))
            out.write(output)
            first = second

        # Write the final frame of the video.
        out.write(cv2.cvtColor(first, cv2.COLOR_RGB2BGR))
        sess.close()

    cap.release()
    out.release()


if __name__ == '__main__':
    test(FLAGS.video, FLAGS.out)
```
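Assuming the script above is saved as run_video.py (a name I picked for illustration), it can be invoked along these lines, with the checkpoint path from the earlier posts: python run_video.py --pretrained_model_checkpoint_path=./ckpt/ckpt/CyclicGen/model --video=./surf.mp4 --out=./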
Hi,
I saw your video, which was very cool. But can you share the demo code that, given a video, generates the interpolated output video?
Currently I see that run.py only takes two consecutive frames as input, so I'm not sure how to produce a video. Thanks.
Hi!
I used run.py to generate interpolated frames on the UCF101 dataset first and then calculated the average PSNR. However, there is a small difference between my result and the one reported in the paper. The model I use is CyclicGen_model.py, and I also tested against your result images. Without the motion mask, the difference is around 0.4 dB. So is it the motion mask, or some other factor, that influences the result images?
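For reference, this is roughly how I computed the average PSNR; the mask argument is my assumption of how a motion-masked variant would restrict the metric to moving pixels:
```python
# PSNR between two uint8 images; mask, if given, is an HxW binary map
# selecting the (assumed) motion region.
import numpy as np

def psnr(gt, pred, mask=None):
    gt = gt.astype(np.float64)
    pred = pred.astype(np.float64)
    se = (gt - pred) ** 2
    mse = se[mask > 0].mean() if mask is not None else se.mean()
    return 10.0 * np.log10(255.0 ** 2 / mse)
```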
Hi, can you provide the links for downloading your training and testing datasets?
@alex04072000
Do the steps in stage 1 need to be the same as in stage 2?