tarun005 / flavr

Code for FLAVR: A fast and efficient frame interpolation technique.

License: Apache License 2.0

Python 90.94% Jupyter Notebook 9.06%
8x-interpolation artificial-intelligence deep-learning machine-learning slomo-filter video

flavr's People

Contributors

around-star, n00mkrad, tarun-kalluri, tarun005, virtualramblas

flavr's Issues

Blur output

Input -

cat_video.mp4

Output -

cat_video_8x.mp4

Hey @tarun005, I used the 8x pretrained model on this video. The output looks blurry, mostly at the edges. Can this be improved?

training issue about PSNR

Hi Tarun, excellent work on video interpolation!
I tried to run your code, but I ran into some trouble. I set my config as
batch_size=2, beta1=0.9, beta2=0.99, checkpoint_dir='.', cuda=True, data_root='/home/Changchen/dataset./vimeo_septuplet', dataset='vimeo90K', exp_name='exp', joinType='concat', load_from=None, log_iter=60, loss='1*L1', lr=0.0002, max_epoch=50, model='unet_18', n_outputs=1, nbr_frame=4, nbr_width=1, num_gpu=1, num_workers=16, pretrained=None, random_seed=12345, resume=False, resume_exp=None, start_epoch=0, test_batch_size=1, upmode='transpose', use_tensorboard=False, val_freq=1
At the beginning, PSNR was normal at about 20, but it has gradually decreased to about 14. I wonder why training does not seem to converge.
Thank you for any help!

Can't run FLAVR

At first I tried to use Flowframes, but since it gave an error I tried following your instructions on GitHub. When I tried to run python interpolate.py --input_video input.mp4 --factor 8 --load_model FLAVR8X.pth I got a very similar, if not identical, error message:

13.000209881905063
Traceback (most recent call last):
  File "interpolate.py", line 133, in <module>
    videoTensor , resizes = video_transform(videoTensor , args.downscale)
  File "interpolate.py", line 121, in video_transform
    videoTensor = transforms(videoTensor)
  File "C:\Users\frangamer1892roblox\MiniConda3\lib\site-packages\torchvision\transforms\transforms.py", line 60, in __call__
    img = t(img)
  File "D:\FLAVR\dataset\transforms.py", line 333, in __call__
    return to_tensor(clip)
  File "D:\FLAVR\dataset\transforms.py", line 107, in to_tensor
    return clip.float().permute(3, 0, 1, 2) / 255.0
RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:79] data. DefaultCPUAllocator: not enough memory: you tried to allocate 25944883200 bytes.

I am not really sure what to do now. These are my specs, in case they help in any way:

DxDiag.txt
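
A note for readers hitting the same error: the failing line, `clip.float().permute(3, 0, 1, 2) / 255.0`, converts the whole clip to 32-bit floats at once, so the allocation grows with frame count times resolution. The sketch below is plain arithmetic and not part of FLAVR. `float_clip_bytes` is a hypothetical helper, and the 2346-frame 1280x720 clip is only one combination that happens to reproduce the reported 25944883200 bytes; the actual video's dimensions are not stated in the issue.

```python
def float_clip_bytes(num_frames, height, width, channels=3, bytes_per_value=4):
    """Bytes needed to hold a clip as one float32 tensor (4 bytes per value)."""
    return num_frames * height * width * channels * bytes_per_value


# Hypothetical clip: 2346 RGB frames at 1280x720.
needed = float_clip_bytes(2346, 720, 1280)
print(needed)                     # 25944883200
print(round(needed / 2**30, 1))   # the same figure expressed in GiB
```

Running the same estimate on your own clip before starting gives a rough idea of whether it can fit in RAM, or whether a shorter clip or a lower resolution is needed.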

Train on low rate video (5-10fps)?

I find that your model behaves badly on low frame rate video (5-10fps). How can I fix this? Would training on low frame rate videos help? Thanks a lot.

error with there is no sep_trainlist.txt

Hi there,

When I try to run test.py with the pretrained model that you provided, I get an error saying that there is no such file: xxxxxx/sep_trainlist.txt (under my input data path). Can you tell me how to fix this? Thank you

16x or higher factor trained model

Hi Tarun,

It is remarkable that inference at 16x or higher factors is faster than Super SloMo while still performing well. Will you publish the trained models for 16x and higher, and add support for them afterward?

Question about Training Time

Hi, thank you for releasing the code.

Your paper says: "We use 8 GPUs and a mini-batch of 32 to train each model, and training is completed in about 36 hours for 2× a on 2080ti."
But I found that 200-epoch training on Vimeo-90K takes at least 5 days on 8 V100 GPUs.
Am I overlooking something?

Question on PSNR evaluation on 8x and 4x (Table. 2 and Table. 3)

Hi,

I have a question about the evaluation of the 8x and 4x cases in Table 2 and Table 3 on the Adobe dataset in the paper. The 4x case seems to have a much higher PSNR than the 8x case.

Let's say the 7 intermediate frames are denoted as t1, t2, t3, t4, t5, t6, t7. To my understanding the PSNR values are normally:
(t1 close to t7) > (t2 close to t6) > (t3 close to t5) > t4
At least, this is what I have observed for DAIN, SuperSloMo and QVI. It is expected that as the temporal distance to the input frame increases, the interpolated quality decreases (lower PSNR).

For 4x, you would only have t2, t4, t6, so the average PSNR values should be expected to be lower than 8x.

However, for 4x in Table 3, FLAVR is 5.62dB higher compared to 8x in Table 2. The other methods (DAIN, QVI and SuperSloMo) also all show a much higher PSNR. To my understanding, 5.62dB is a huge increase.

The expected trend should be similar to Table.3 in BMBC paper: https://arxiv.org/pdf/2007.12622.pdf
where PSNR(2x) < PSNR(4x) < PSNR(8x).

I am wondering if there is anything I missed for the evaluation that causes my confusion?

Thanks
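
For reference, the comparison above rests on the usual PSNR definition, 10 * log10(MAX^2 / MSE). The sketch below is plain Python with illustrative names (`psnr` and `mse_ratio_for_gain` are not functions from the FLAVR code base). It also shows why a 5.62dB gap is large: it corresponds to the mean squared error shrinking by a factor of roughly 3.6.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(db_gain):
    """Factor by which MSE shrinks when PSNR rises by `db_gain` dB."""
    return 10.0 ** (db_gain / 10.0)


print(psnr([10, 20, 30], [11, 21, 31]))   # MSE of 1 on a 0-255 scale
print(mse_ratio_for_gain(5.62))           # MSE ratio implied by a 5.62dB gap
```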

Cannot reproduce AdaCoF results

Hi,

I used the official AdaCoF code and the weights that you provide, but only achieved 34.93dB on the Vimeo dataset, which is much lower than 35.40. Could you please share the script with me? Many thanks!

silent failure

I'm running this on ubuntu w/a 3090 on a video file (4k mkv if it somehow matters, seems to read fine) but it fails silently at

def video_transform(videoTensor , downscale=1):
and the process is simply killed with no error - nothing indicating not enough memory, etc.

Any thoughts on debugging?

about the training tricks

Hello, your work is impressive! I'd like to ask you a few questions. I downloaded your code for training, set the batch size to 6, changed the data volume to around 20000 and also used Vimeo. After training for more than 70 epochs, the loss, PSNR and SSIM on the training and test sets have not converged. The learning rate has already dropped to a low value by this point, so I don't think it is worth training any further. The final PSNR is below 20, far from the metrics in your paper. Why is my training result so poor, and can you give some advice?
image

The training batch size of 8x VFI on GoPro

Hi,
What is the batch size for training FLAVR for 8x VFI? I see in the paper that it is 32 with a frame size of 512x512. But when I train on 8 GPUs (1080Ti, whose memory is the same as the 2080Ti), I get an OOM error, and only a batch size of 16 works. Besides that, neither random frame order reversal nor random horizontal flipping, the augmentation strategies, can be found in GoPro.py. Can I reproduce the results with this code?
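
The two augmentations named in the question can be sketched as follows, assuming frames are NumPy arrays shaped (H, W, C). This is a generic illustration and is not taken from GoPro.py or from the paper's code; `augment_clip` is a hypothetical name and the 0.5 probabilities are assumptions.

```python
import random

import numpy as np


def augment_clip(frames, rng=random):
    """Randomly reverse frame order and/or flip every frame horizontally.

    `frames` is a list of arrays shaped (H, W, C). The same flip decision is
    applied to all frames so the clip stays temporally consistent.
    """
    if rng.random() < 0.5:          # assumed probability
        frames = frames[::-1]       # temporal order reversal
    if rng.random() < 0.5:          # assumed probability
        # Flip along the width axis.
        frames = [np.ascontiguousarray(f[:, ::-1]) for f in frames]
    return frames
```

For training, the list passed in would need to contain the input frames and the ground-truth frames together, in temporal order, so that both receive the same reversal and the same flip.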

custom video,output video size changed

Hi,
When I ran interpolate.py on a custom video, I found that the video size had changed. The input size is (960, 540) and the output size is (960, 536). This is where I found the problem:
downscale = int(downscale * 8)
resizes = 8 *( H // downscale), 8 *( W // downscale)
image
So I changed the code like this:
resizes = (8 * H // downscale), (8 * W // downscale)
But it raises an error.
image
How can I fix this?
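
The numbers follow from integer division: assuming `downscale` is 1, `int(downscale * 8)` is 8 and `8 * (540 // 8)` is 536. One possible workaround, sketched below under the assumption that the network needs both dimensions to be multiples of 8 (the original rounding suggests this, but the issue does not confirm it), is to pad frames up to the next multiple and crop the output back afterwards. `pad_to_multiple` and `crop_to` are hypothetical helpers, not part of FLAVR.

```python
import numpy as np


def pad_to_multiple(frame, multiple=8):
    """Pad an (H, W, C) frame on the bottom/right so H and W divide by `multiple`."""
    height, width = frame.shape[:2]
    pad_h = (-height) % multiple
    pad_w = (-width) % multiple
    padded = np.pad(frame, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    return padded, (height, width)


def crop_to(frame, size):
    """Undo the padding by cropping back to the original (H, W)."""
    height, width = size
    return frame[:height, :width]


frame = np.zeros((540, 960, 3), dtype=np.uint8)
padded, original_size = pad_to_multiple(frame)
print(8 * (540 // 8))                          # 536, the size reported in the issue
print(padded.shape)                            # (544, 960, 3)
print(crop_to(padded, original_size).shape)    # (540, 960, 3)
```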

Attention weight score

Hi, I was reading your work and was wondering, how do you obtain the highest attention weight for the feature maps in figure 5? Do you just sum up the tensor along the channel dimension and sort that or do you use some other method?
Thanks!

Vimeo90K triplet test dataset performance issue

Hi,

I am impressed with your new video frame interpolation paper.

When I tested, I got 32.59dB on the Vimeo90K triplet test set.

Following your Middleburry.py in the dataset directory, I changed the VimeoSepTuplet class into the VimeoTriplet class shown below.

What is the problem with my modified code?

I am also wondering if I could get custom triplet interpolation code which takes two input frames and yields an intermediate frame.

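    import os

    from PIL import Image
    from torch.utils.data import Dataset
    # Assumption: to_tensor is torchvision's helper; the post does not show its import.
    from torchvision.transforms.functional import to_tensor


    class VimeoTriplet(Dataset):
        """Vimeo90K triplet test set: im1 and im3 are inputs, im2 is the ground truth."""

        def __init__(self, data_root):
            self.data_root = data_root
            self.image_root = os.path.join(self.data_root, 'sequences')

            test_fn = os.path.join(self.data_root, 'tri_testlist.txt')

            # One sequence path per line.
            with open(test_fn, 'r') as txt:
                self.seq_list = [line.strip() for line in txt]

        def __getitem__(self, index):
            seq_dir = os.path.join(self.image_root, self.seq_list[index])
            im1 = Image.open(os.path.join(seq_dir, 'im1.png')).convert('RGB')
            gt = Image.open(os.path.join(seq_dir, 'im2.png')).convert('RGB')
            im3 = Image.open(os.path.join(seq_dir, 'im3.png')).convert('RGB')

            im1, gt, im3 = map(to_tensor, (im1, gt, im3))

            # Each input frame is duplicated to fill the four input slots.
            return [im1, im1, im3, im3], [gt]

        def __len__(self):
            return len(self.seq_list)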

forward interpolation

Given latency constraints, can FLAVR use only past frames for multi-frame prediction?

Thanks,

UCF101 testing dataset

Hi, I found the original UCF101 dataset in avi format and the UCF101 triplet dataset in png format, but there is no 5-frame dataset available. Can you provide the method to generate the UCF101 testing dataset for FLAVR?

Models trained with Huber and VGG loss

Hi Tarun,

Your work is really interesting!
I was wondering if you could share the models trained in the ablation experiments.

I am curious to see how the different loss functions affect downstream tasks like action recognition. In the paper, you mentioned that L1 loss results in sharper images. But does this also translate to better action recognition results?

Please do share any experiments/insights regarding this. I would love to hear your thoughts.

Testing result on vimeo90k_septuplet

Hello, my friend! I tested the model with your pretrained model 'FLAVR_4x.pth' on the dataset 'vimeo90k_septuplet', and the PSNR I got was 28.376122. I don't know why this happens.

About test data of Adobe

Hi, thanks for your code. I tested the pretrained 8x model on the GoPro test set and on the whole Adobe dataset, and got PSNR values of 31.31 and 31.83 respectively. The GoPro result matches the one reported in the paper, but the Adobe result does not (32.20 in the paper).
I would like to know whether the results in the paper were not tested on the whole Adobe dataset. Could you provide more information about the 8x interpolation experiment on Adobe?
Thanks!

Unable to write out results

Hey,

I've managed to get up and running with flavr, right up until the final stage. I'm using a directory with a png sequence in it, which successfully runs through the network. But when it comes to writing it out I simply get:

Writing to  in_2xmp4.mp4
in_2xmp4: No such file or directory
Traceback (most recent call last):
  File "interpolate.py", line 164, in <module>
    os.remove(output_video)
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'in_2xmp4'

I'm reading a sequence of pngs from a directory, using is_folder which is great; is there a way to write out a sequence of pngs rather than a video?

Do not load entire raw video into RAM

Unlike other video interpolation implementations like RIFE (https://github.com/hzwer/Arxiv2020-RIFE), this code loads all frames from disk into RAM.

This makes it impossible to interpolate videos longer than a minute or two, unless you have insane amounts of RAM.

It would be great if it's possible to instead load frames on-the-fly using a buffer, like RIFE does, for example.
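
As a rough illustration of the buffering idea, the generator below keeps only a fixed number of frames in memory and yields overlapping windows as new frames arrive. It is a minimal sketch, not RIFE's or FLAVR's actual implementation. `sliding_windows` is a hypothetical name, and the window size of 4 is an assumption based on the `nbr_frame=4` setting quoted in other issues on this page.

```python
from collections import deque


def sliding_windows(frames, size=4):
    """Yield overlapping windows of `size` consecutive frames from any iterable."""
    window = deque(maxlen=size)   # the oldest frame is dropped automatically
    for frame in frames:
        window.append(frame)
        if len(window) == size:
            yield tuple(window)


# Integers stand in for decoded frames here.
for window in sliding_windows(range(6), size=4):
    print(window)
```

Because `frames` can be any iterable, it could be fed by a video decoder that reads one frame at a time, so memory use would depend on the window size rather than on the length of the video.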

Training issue

Hi, I've been trying to train this network on an A100 GPU. However, as torch 1.5.0 doesn't support this GPU, I am forced to use torch 1.9.0. Training is broken for torch versions > 1.5.0, but I cannot find the reason why. I have looked at the differences between the torch versions, but nothing makes clear why this happens. Basically, the model stays stuck at around 20dB for the duration of training. I previously tested this code on a 1080Ti with torch 1.5.0 and that worked fine. But due to memory constraints and training time, the A100 would be the better option.
Do you have any idea why this occurs and any possible solutions?

Thanks

How to cascade different speed models?

Hi Tarun,

From issue #32 I know we can cascade different models to reach higher interpolation factors, for example cascading the 2x and 8x models to get 16x interpolation. But how is the cascade done? Do I first use the 2x model to generate a 2x slow-motion sequence, and then apply the 8x model to that sequence?
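
The frame-count arithmetic behind such a cascade can be sketched as below, assuming each stage inserts (factor - 1) new frames between every pair of adjacent frames. This ignores boundary handling, which may differ in FLAVR because the model takes several input frames, and it says nothing about output quality. Both function names are hypothetical.

```python
def interpolated_length(num_frames, factor):
    """Frames after inserting (factor - 1) new frames between each adjacent pair."""
    if num_frames < 2:
        return num_frames
    return (num_frames - 1) * factor + 1


def cascaded_length(num_frames, factors):
    """Apply several interpolation stages one after another."""
    for factor in factors:
        num_frames = interpolated_length(num_frames, factor)
    return num_frames


print(cascaded_length(10, [2, 8]))    # 145
print(interpolated_length(10, 16))    # 145
```

Under that assumption, running 2x and then 8x yields the same number of frames as a single 16x pass.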

about the vimeo dataset for training

To the best of my knowledge, other methods train their networks using vimeo_triplet, which contains three consecutive frames, instead of vimeo_septuplet (used in video super-resolution). Why do you provide the "vimeo_septuplet" dataset for testing? Is it a fair comparison with the existing approaches?

parameters

Hello! Since I have not yet trained or tested the network, I would like to know what the parameters of this model are. Thanks

Unreliable FPS readout causes error

When I try to interpolate a video, this error pops up:

  File "interpolate.py", line 120, in <module>
    videoTensor = video_to_tensor(input_video)
  File "interpolate.py", line 101, in video_to_tensor
    fps = md["video_fps"]
KeyError: 'video_fps'

I suspect it fails to read the frame rate for some reason.

This is one of the reasons I am asking for a manual input: #4
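
A manual override of the kind requested could look like the sketch below. It assumes `md` is a plain dictionary, as the traceback's `md["video_fps"]` suggests. `resolve_fps` and `fallback_fps` are hypothetical names, not part of interpolate.py.

```python
def resolve_fps(metadata, fallback_fps=None):
    """Return the frame rate from `metadata`, or a user-supplied fallback."""
    fps = metadata.get("video_fps")
    if fps is None:
        if fallback_fps is None:
            raise ValueError("frame rate missing from metadata; pass fallback_fps")
        fps = fallback_fps
    return float(fps)


print(resolve_fps({"video_fps": 29.97}))       # 29.97
print(resolve_fps({}, fallback_fps=30))        # 30.0
```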

Training issue

Hi, author, thank you for sharing the code on GitHub. The code performed well in test, but the PSNR value was always maintained at about 17dB during training. What is the reason?

about QVI model

Hi. Thanks for your efforts on benchmarking existing models. Which repo are you using for the quadratic video interpolation (QVI) model? Could you please share the link?

UCF101

Hi there,

I was just wondering how you created the UCF101 dataset for your experiments? The only version of the dataset I can find is either still in avi form or has only 2 frames. For a fair comparison, I would like to use the same dataset as you, is this available somewhere?

Windows version

Great work Tarun. Can you share an updated version where one can run FLAVR inferencing and training (using small number of images) on windows as well: e.g., using pycharm etc.

periodic pause of interpolated video

sprite.mp4
sprite_FLAVR_8x.mp4

Hi,

I am using the pretrained 8x model to interpolate the demo sprite video shown on the project homepage. But I find that the output seems to "pause" once per second. Do you know why? Thanks!

config.py not affecting interpolate.py behaviour

Hey team ! Congrats for the amazing job

When I modify the config file (e.g. reducing the batch size), it does not seem to affect the behaviour of the script (interpolate.py) when launched.

I'm pretty new to Python, so I may be misunderstanding something here, but just in case!

best

Lucien

Finetune problem

Hello, thanks for your brilliant work, but I have a problem with fine-tuning. When I fine-tune your model on my own dataset, the fine-tuned model produces flickering videos. When I output the predicted frames, I found that each predicted frame was darker than its adjacent frames. I then tried training the model from scratch using Unet34, but got similar, darker results. The PSNR and training loss were improving, but the inference results were worse.
Could you please explain this to me a little?
Here are the training details:
python main.py --batch_size 8 --test_batch_size 8 --dataset vimeo90K_septuplet --loss 1*L1 -max_epoch 200 --lr 0.00001 --n_outputs 1
Namespace(batch_size=8, beta1=0.9, beta2=0.99, checkpoint_dir='.', cuda=True, data_root='/vimeo_septuplet', dataset='vimeo90K_septuplet', exp_name='exp', joinType='concat', load_from=None, log_iter=60, loss='1*L1', lr=1e-05, max_epoch=200, model='unet_18', n_outputs=1, nbr_frame=4, nbr_width=1, num_gpu=1, num_workers=16, pretrained='FLAVR_2x.pth', random_seed=12345, resume=False, resume_exp=None, start_epoch=0, test_batch_size=8, upmode='transpose', use_tensorboard=False, val_freq=1)

Questions about the inference time

Hi, thanks for your interesting work!
I tested the inference time on vimeo90K_septuplet using your script and measured 0.004 s, which seems too fast.
After modifying the code and testing again, I measured 0.195 s.
image
image
So, I wonder how the time in your paper was measured?
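
One common cause of such a gap, which may or may not apply here since the screenshots are not available, is that GPU work is launched asynchronously, so a timer can stop before the computation has finished. The sketch below is a generic timing helper rather than the script used for the paper. `timed_call` and its `sync` parameter are hypothetical; with PyTorch on a GPU, `sync` would typically be `torch.cuda.synchronize`.

```python
import time


def timed_call(fn, *args, sync=None, warmup=1, repeats=5):
    """Average wall-clock seconds per call of fn(*args), plus the last result.

    `sync` is an optional zero-argument callable that blocks until pending
    device work is done; it is called before starting and stopping the timer.
    """
    for _ in range(warmup):          # warm-up runs are not timed
        fn(*args)
    if sync is not None:
        sync()
    start = time.perf_counter()
    result = None
    for _ in range(repeats):
        result = fn(*args)
    if sync is not None:
        sync()
    elapsed = (time.perf_counter() - start) / repeats
    return elapsed, result


seconds, total = timed_call(sum, range(1000))
print(total)      # 499500
print(seconds)    # average seconds per call
```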

Work with custom videos

I want to run this on a custom set of videos, but I'm unsure how I'm supposed to set up the data properly to do so. I currently have a folder with my videos inside it, and if I run:
python test.py --dataset vimeo90K_septuplet --data_root "vimeo90K_septuplet" --load_from "./FLAVR_2x.pth" --n_outputs 1
where the vimeo90k_septuplet folder just contains my two videos, I get the error
FileNotFoundError: [Errno 2] No such file or directory: 'vimeo90K_septuplet\\sep_trainlist.txt'

I'm unsure how to set up the text file for this, and there may be further setup issues, but I'm not quite sure where to go from here. Any help would be really appreciated.
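
If the loader expects the Vimeo-90K layout (frame folders under `sequences/<group>/<clip>/` plus a list file with one `<group>/<clip>` entry per line, as the VimeoTriplet snippet elsewhere on this page suggests), a list file could be generated as sketched below. This is an assumption about the expected layout, not a confirmed answer. `write_sequence_list` is a hypothetical helper, and the videos would first have to be split into frame folders.

```python
from pathlib import Path


def write_sequence_list(data_root, list_name="sep_trainlist.txt"):
    """Write one '<group>/<clip>' line per clip folder under data_root/sequences."""
    root = Path(data_root)
    entries = sorted(
        f"{clip.parent.name}/{clip.name}"
        for clip in (root / "sequences").glob("*/*")
        if clip.is_dir()
    )
    (root / list_name).write_text("\n".join(entries) + "\n")
    return entries
```

Whether test.py is the right entry point for custom videos at all is not settled in this thread; other issues on this page use interpolate.py for that purpose.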
